
Issue #1380 - ReaR recovery fails when the OS contains a Thin Pool/Volume #1806

Merged
merged 5 commits into rear:master from rmetrich:ThinPool_1380 on May 28, 2018

Conversation

rmetrich
Contributor

Using 'vgcfgrestore' to restore Thin LVs doesn't work, because the thin metadata is not in the configuration file. To be able to use Thin LVs, upon failure of 'vgcfgrestore' (or when migrating), traditional 'vgcreate'/'lvcreate' commands are used instead.

This approach is sub-optimal because it requires knowing all the possible options to 'lvcreate', which is hard/impossible to do. Nonetheless, this is better than nothing, since the issue with 'lvcreate' options was already there in the migration case.

The code using 'lvcreate' only runs upon 'vgcfgrestore' failure, which typically happens only with Thin LVs (IMHO). Hence, on most setups, the optimized code will still run, preventing any regression.

Signed-off-by: Renaud Métrich <rmetrich@redhat.com>

PLEASE TEST MORE.

I tested as follows:

  • regular system with 1 VG, no thin pool
  • system with 1 VG, and root filesystem on thin pool + some special LVs (mirror, raid1, raid5, etc).

I plan to test tomorrow on a system with 1 VG, no thin pool but some special LVs (mirror, raid, etc) to verify that vgcfgrestore also works with special LVs.

@rmetrich
Contributor Author

The proposed code does the following:

  1. During backup
  • Collect additional LV properties:

    origin: originating LV (for caches and snapshots)
    lv_layout: type of the LV (mirror, raid, thin, etc.)
    pool_lv: thin pool hosting the LV
    chunk_size: chunk size, for various volume types
    stripes: number of stripes, for mirrored and striped volumes
    stripe_size: stripe size, for RAID volumes

  • Skip caches and snapshots (not supported)

  2. During restore (a rough sketch of this decision flow follows after the list)
  • If in Migration mode (e.g. different disks but same size), go through the vgcreate/lvcreate code (Legacy Method), printing warnings because the initial layout may not be preserved (we do not save all the attributes needed to re-create the LVM volumes)

  • Otherwise, try "vgcfgrestore"

  • If it fails:

    • Try "vgcfgrestore --force"
    • If that also fails, use vgcreate/lvcreate (Legacy Method)
    • Otherwise, remove the Thin pools (which are broken due to the --force flag)
    • Re-create the Thin pools using the Legacy Method (but do not re-create the other LVs, which have already been successfully restored)
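For illustration, here is a minimal hedged sketch of that restore decision flow in plain shell. This is NOT the code ReaR generates; the VG name, the metadata backup path and the thin pool name are placeholders, and the Migration-mode branch is omitted:

    #!/bin/bash
    # Hedged sketch of the restore-time decision flow described above.
    vg="rootvg"
    backup_file="/var/lib/rear/layout/lvm/${vg}.cfg"   # placeholder path

    recreate_with_lvcreate() {
        # Legacy Method: build vgcreate/lvcreate calls from the LV properties
        # collected during backup (lv_layout, pool_lv, chunk_size, stripes, ...).
        echo "recreating $1 of VG $vg with vgcreate/lvcreate"
    }

    if lvm vgcfgrestore -f "$backup_file" "$vg" ; then
        echo "all LVs of $vg restored from the LVM metadata backup"
    elif lvm vgcfgrestore --force -f "$backup_file" "$vg" ; then
        # --force restores the VG but leaves the thin pools unusable:
        # remove them and re-create only the thin pools/volumes,
        # the other LVs are already restored correctly.
        lvm lvremove -f -y "$vg/mythinpool"              # placeholder LV name
        recreate_with_lvcreate "the thin pools/volumes only"
    else
        # Even --force failed: re-create the whole VG the legacy way.
        recreate_with_lvcreate "all LVs"
    fi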

@gdha gdha added the enhancement Adaptions and new features label May 16, 2018
@gdha gdha self-assigned this May 16, 2018
@jsmeix
Member

jsmeix commented May 17, 2018

@gdha
I am currently too busy with other stuff, so I cannot have a look here right now.
Is this enhancement useful for ReaR 2.4 or can it be postponed to ReaR 2.5?

@jsmeix jsmeix removed their request for review May 17, 2018 11:24
@rmetrich
Contributor Author

Thin pools are widely used along with docker.
The code has been implemented to prevent any regression when not using thin pools (vgcfgrestore should just work). When in migration mode, the code should enhance the recovery, but this requires some testing (I tested with thin pools, linear volumes and RAID/mirror volumes).

@gozora
Member

gozora commented May 17, 2018

Hello @rmetrich, at first glance it looks good to me. I'm however not that good at reading plain code :-/ ...
But I can run a couple of tests during this weekend if needed ...

V.

@gdha
Member

gdha commented May 17, 2018

@rmetrich At the end of next week I can do some automated tests - this weekend is occupied (the marriage of my daughter and other work duties, unfortunately). At the beginning of next week I'm taking a few days off to really relax (in Germany and Luxembourg).

Additional note: ReaR 2.4 is OK for me, as my main customer is also struggling with this kind of stuff (with docker containers). And I believe Red Hat had plans to replace their 2.0 version with the latest 2.4, if I'm not mistaken.

@gdha gdha added this to the ReaR v2.4 milestone May 17, 2018
@rmetrich
Contributor Author

👍 Perfect!
We do indeed plan to rebase to 2.4 ASAP (RHEL 7.6), but it's not committed yet.

@jsmeix
Member

jsmeix commented May 17, 2018

@rmetrich
only a side note FYI, in general, regarding what you wrote in your initial comment:

This approach is sub-optimal because it requires knowing all the possible options to 'lvcreate', which is hard/impossible to do.

You may have a look at my general opinion about such cases
e.g. in
#1513 (comment)
and
#1204 (comment)
and
#1242 (comment)
and in the latter follow the links to the other issue comments.

But I know basically nothing at all about Thin Pools/Volumes or even docker, so I don't know whether it could make sense in this particular case to implement config variables where the user could - if needed - specify the settings for his particular Thin Pool/Volume setup.

@jsmeix
Member

jsmeix commented May 17, 2018

@rmetrich
another side note FYI, in general, regarding "when in migration mode":

When in migration mode it is perfectly fine to ask the user via the UserInput function for needed values and/or let him check and confirm something before it actually gets set up.

Only when not in migration mode should "rear recover" normally run unattended - but even if UserInput is needed in that case, the user can predefine UserInput values to get it to run unattended, cf. the USER_INPUT_... variables in default.conf.

@schabrolles
Member

schabrolles commented May 19, 2018

@rmetrich,
I tried your patch on a sles11sp4 (POWER arch) and got the following error during rear recover:

No code has been generated to recreate fs:/ (fs).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
UserInput -I ADD_CODE_TO_RECREATE_MISSING_FSFS needed in /usr/share/rear/layout/prepare/default/600_show_unprocessed.sh line 33
Manually add code that recreates fs:/ (fs)
1) View /var/lib/rear/layout/diskrestore.sh
2) Edit /var/lib/rear/layout/diskrestore.sh
3) Go to Relax-and-Recover shell
4) Continue 'rear recover'
5) Abort 'rear recover'
(default '4' timeout 10 seconds)

FYI, the layout is the sles11 default LVM layout:
/boot on a partition
/ on LVM

I don't have time to look into the reason why ... but just to be sure I've tested several times. I also confirm that I don't get any problem with the current master branch on this system.

I attach the debug log of rear -D recover:
rear-rear-sles11-144.log

I'm going to test on other systems (SLES12 / RHEL7 / RHEL6 and Ubuntu)


@rmetrich
Contributor Author

@schabrolles From the debug log, I do not see any create_lvmvol() call, which is what causes the issue.
Can you provide the debug log of the rescue creation? The issue is likely there.

@schabrolles
Member

@rmetrich,

What is strange is that the /boot fs is present in the disklayout file (I checked in the rescue image before running rear recover):

# Disk /dev/vda
# Format: disk <devname> <size(bytes)> <partition label type>
disk /dev/vda 53687091200 msdos
# Partitions on /dev/vda
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
part /dev/vda 413696 1048576 primary boot,prep /dev/vda1
part /dev/vda 1077936128 213909504 primary none /dev/vda2
part /dev/vda 52395245568 1291845632 primary lvm /dev/vda3
# Format for LVM PVs
# lvmdev <volume_group> <device> [<uuid>] [<size(bytes)>]
lvmdev /dev/rootvg /dev/vda3 9cavrA-gSqy-ROyO-7GWa-PXVI-Ker3-cBtZld 102334464
# Format for LVM VGs
# lvmgrp <volume_group> <extentsize> [<size(extents)>] [<size(bytes)>]
lvmgrp /dev/rootvg 4096 12491 51163136
# Filesystems (only ext2,ext3,ext4,vfat,xfs,reiserfs,btrfs are supported).
# Format: fs <device> <mountpoint> <fstype> [uuid=<uuid>] [label=<label>] [<attributes>]
fs /dev/mapper/rootvg-lvroot / ext3 uuid=487f413a-b41a-4347-89cd-e4249207ab47 label= blocksize=4096 reserved_blocks=0% max_mounts=27 check_interval=180d bytes_per_inode=13207 options=rw,acl,user_xattr
fs /dev/vda2 /boot ext3 uuid=189efaae-c4e6-4df3-af91-5150b91001f1 label= blocksize=4096 reserved_blocks=0% max_mounts=36 check_interval=180d bytes_per_inode=16380 options=rw,acl,user_xattr
# Swap partitions or swap files
# Format: swap <filename> uuid=<uuid> label=<label>
swap /dev/mapper/rootvg-lv_swap uuid=89296247-5f68-43c6-884f-862dc506ee26 label=

Here is the log from rear -D mkrescue:
rear-rear-sles11-144.log

@rmetrich
Contributor Author

The issue is on /dev/mapper/rootvg-lvroot, which has no lvmvol description.
This is due to my new code (me--), which seems not to work with your LVM version (the help text gets printed):

 2845 ++ lvm lvs --separator=: --noheadings --units b --nosuffix -o origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes,stripe_size
 2846   Logical Volume Fields
 2847   ---------------------
 2848     lv_all               - All fields in this section.
 2849     lv_uuid              - Unique identifier.
...
 2957   Unrecognised field: lv_layout

What is your LVM version?

@schabrolles
Member

@rmetrich

LVM version from sles11sp4

lvm> version
  LVM version:     2.02.98(2) (2012-10-15)
  Library version: 1.03.01 (2011-10-15)
  Driver version:  4.25.0

@rmetrich
Contributor Author

OK, I can reproduce this on RHEL 6.4, which has a similar LVM version.

@rmetrich
Contributor Author

@schabrolles That should fix it now. I had to add quite a lot of code for older LVM versions.
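For illustration only, here is a hedged sketch of one way such a version fallback can be done: probe whether the installed LVM knows the lv_layout field before requesting it. The probing approach and the segtype fallback field are assumptions for this sketch, not necessarily what this PR actually does:

    # Probe for lv_layout support; older LVM (e.g. 2.02.98 on SLES 11 SP4
    # or RHEL 6.4) does not know this field and would print its help text.
    if lvm lvs -o lv_layout --noheadings &>/dev/null ; then
        lv_fields="origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes,stripe_size"
    else
        # Assumed fallback: segtype gives comparable layout information
        lv_fields="origin,lv_name,vg_name,lv_size,segtype,pool_lv,chunk_size,stripes,stripe_size"
    fi
    lvm lvs --separator=: --noheadings --units b --nosuffix -o "$lv_fields"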

@schabrolles
Member

@rmetrich This is working now ... I'm planning to test it tonight on all the Linux distributions I have (rhel6, rhel7, ubuntu, sles11, sles12) and to test migration tomorrow morning.

Member

@schabrolles left a comment

Just tested with standard LVM (no thinpool) on:

  • rhel6 (ppc64)
  • rhel7 (ppc64/ppc64le)
  • sles11sp4 (ppc64)
  • sles12sp2 (ppc64le)
  • ubuntu16.04 (ppc64le)

Tested during migration on

  • sles11sp4
  • sles12sp2
  • rhel7

Tested with lv_thinpool on rhel7 (ppc64le)

Member

@gdha left a comment

Waiting on the blessing of @jsmeix before I approve this PR


if lvs --noheadings -o thin_count | grep -q -v "^\s*$" ; then
# There are Thin Pools on the system, include required binaries
PROGS=( "${PROGS[@]}" /usr/sbin/thin_* )
Member

@rmetrich Could you list (in a comment) which executables you need at minimum? I'm thinking of thin_check (cf. Red Hat solution 3031921 - https://access.redhat.com/solutions/3031921).
Why do I ask? I know it has been tested and approved for RHEL7 by @schabrolles - but what about SLES? Are the executables named the same? To be verified by @jsmeix.

Personally I will approve once @jsmeix gives his blessing.

Member

I cannot test or verify this within a reasonable time: not this week, next week I am not in the office, and afterwards - with probability one - some other issues will have piled up that have higher priority.

Contributor Author

In fact they are not really needed, except thin_check, since the thin volumes are recreated using lvcreate commands, so I'll remove the others (only the libraries are needed).
Let me test again without them ...
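A hedged sketch of what the reduced include could look like, based on the comment above (keeping only thin_check plus the lvm2 libraries; this is not necessarily the final code of the PR):

    if lvs --noheadings -o thin_count | grep -q -v "^\s*$" ; then
        # Thin pools exist on the system: keep thin_check so the restored
        # thin pools can be checked/activated in the rescue system, drop the
        # other thin_* tools, and keep the lvm2 libraries.
        PROGS=( "${PROGS[@]}" /usr/sbin/thin_check )
        LIBS=( "${LIBS[@]}" /usr/lib64/*lvm2* )
    fi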

if lvs --noheadings -o thin_count | grep -q -v "^\s*$" ; then
# There are Thin Pools on the system, include required binaries
PROGS=( "${PROGS[@]}" /usr/sbin/thin_* )
LIBS=( "${LIBS[@]}" /usr/lib64/*lvm2* )
Member

@rmetrich Same comment as with the PROGS array. Any comments, @jsmeix?

@gdha gdha requested a review from jsmeix May 23, 2018 11:29
Member

@jsmeix left a comment

@schabrolles @gdha
I can only blindly trust you here
(just looking at the code changes does not help me here).

@rmetrich
Contributor Author

Tested (RHEL7 + RHEL6) and pushed. Would you prefer squashed commits?

@jsmeix
Member

jsmeix commented May 23, 2018

I prefer the default behaviour with non-squashed commits (i.e. keep the individual commits in a pull request as they happened, to preserve the history of how something was implemented), because the final merge commit that merges a whole pull request into the ReaR upstream master code is already such a squashed commit.

For example see in
#1805
the individual commits "inside" the pull request
65d0fc3
bc1fa98
0ebeeac
versus the final merge commit
51d5f94
which result in the following git log entries in the ReaR upstream master branch:

|
* s.chabrolles@fr.ibm.com 51d5f949fdd17a0fb205d865f9a4d5704a6650fe Fri May 18 09:24:23 2018 +0200
|\ Merge pull request #1805 from schabrolles/get_only_partition_from_holders :
| | Verify if dm-X is a partition before adding to sysfs_paths
| |
| * s.chabrolles@fr.ibm.com 0ebeeac6d71cf3dba4e7e0085e56e066cf5f9db2 Tue May 15 07:23:51 2018 +0200
| | Update comment :
| |
| * s.chabrolles@fr.ibm.com bc1fa9878dab19ed4c5a0392d98c873c1f7845a7 Mon May 14 22:04:42 2018 +0200
| | add comments :
| |
| * s.chabrolles@fr.ibm.com 65d0fc32e0814545e1fa28c44586e7400a5859e9 Mon May 14 19:04:47 2018 +0200
|/ Verify if dm-X is a partition before adding to sysfs_paths :
|

@rmetrich
Contributor Author

Alright, not touching anything then. I also prefer having the history.

@gdha gdha merged commit b8630a6 into rear:master May 28, 2018
@rmetrich
Contributor Author

Thanks to all of you for accepting this.

@razorfish-sl

razorfish-sl commented Jun 3, 2018

I noticed today that ReaR was not backing up thin pools on physical devices...
Nothing was lost; I just noticed on recovery that the old data was still in the thin pool.

However.....
I was just thinking of a case from this week:

I had a "thin pool" that was mapped to a block file, then remounted on loop, and ReaR obviously dealt with it correctly because it was a regular file on /dev/sda, in the directory tree. ReaR handles this case because it sees a file in a directory; the LVM config is something separate.

So we may need to be careful: in this case LVM "sees" a thin pool, but that is actually a lie - it is really a raw image mounted as a loop device.

We may get into a situation where ReaR backs it up correctly as a regular file, BUT then backs it up AGAIN as an LVM thin pool. In that situation, how would it re-create the physical device, and how would it avoid getting confused that it is actually a file mounted as a "block device"?

@rmetrich rmetrich deleted the ThinPool_1380 branch June 6, 2018 20:06
pcahyna added a commit to pcahyna/rear that referenced this pull request Sep 15, 2021
Fix a problem introduced in commits
b184194 and
311bfb3 (PR rear#1806) that shows up when
there are multiple VGs to restore.

Using variables create_thin_volumes_only and create_logical_volumes to
propagate information from VG creation to LV creation does not work well
in the case of multiple VGs, because the variables are global and if
there are multiple VGs, their values will leak from one VG to another.
The generated diskrestore.sh script does not guarantee that the LVs of a
given VG are created immediately after their VG and before creating
another VG. Currently, the script first creates all VGs and then all LVs,
so all the LVs in all VGs will see the value of create_logical_volumes
and create_thin_volumes_only from the last VG, not from their own. This
matters when different VGs behave differently (typically if one has a
thin pool and the other does not).

Fix by replacing the scalar values by arrays of VG names. If a given VG
is in the array, it is the equivalent of the former scalar value being 1
for the given VG, if it is not in the array, it is an equivalent of a
former value of 0.

For the create_volume_group variable the change is not needed, but do it
nevertheless for symmetry with other variables.
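A minimal, self-contained sketch of the scalar-to-array change described in the commit message above. The function names and example VG names here are illustrative only, not ReaR's actual code:

    #!/bin/bash
    # Before the fix: one global scalar flag, overwritten by every VG.
    # After the fix: arrays remembering *which* VGs are affected.
    create_logical_volumes=()      # VGs whose LVs must be recreated with lvcreate
    create_thin_volumes_only=()    # VGs where only thin pools/volumes need lvcreate

    in_array() {
        local needle=$1 element
        shift
        for element in "$@" ; do
            [[ $element == "$needle" ]] && return 0
        done
        return 1
    }

    # While generating the restore code for each VG, record its situation:
    mark_vg() {
        local vg=$1 vgcfgrestore_failed=$2 thin_pool_removed=$3
        [[ $vgcfgrestore_failed == yes ]] && create_logical_volumes+=( "$vg" )
        [[ $thin_pool_removed == yes ]] && create_thin_volumes_only+=( "$vg" )
        return 0
    }

    mark_vg vg_with_thin no yes
    mark_vg vg_plain     no no

    # Later, when generating the lvcreate code for an LV of "$vg", membership
    # is tested per VG instead of reading a global flag leaked from the last VG:
    for vg in vg_with_thin vg_plain ; do
        if in_array "$vg" "${create_thin_volumes_only[@]}" ; then
            echo "$vg: re-create only its thin pools/volumes with lvcreate"
        else
            echo "$vg: all LVs were already restored by vgcfgrestore"
        fi
    done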
pcahyna added a commit to pcahyna/rear that referenced this pull request Oct 4, 2021
pcahyna added a commit to pcahyna/rear that referenced this pull request Oct 4, 2021
pcahyna added a commit to pcahyna/rear that referenced this pull request Feb 17, 2023