
Issue #1380 - ReaR recovery fails when the OS contains a Thin Pool/Volume #1806

Merged
merged 5 commits into rear:master from rmetrich:ThinPool_1380 on May 28, 2018

Conversation

rmetrich
Contributor

Using 'vgcfgrestore' to restore Thin LVs doesn't work, because the thin metadata is not in the configuration file. To be able to use Thin LVs, upon failure of 'vgcfgrestore' (or when migrating), traditional 'vgcreate'/'lvcreate' commands are used instead.

This approach is sub-optimal because it requires knowing all the possible options to 'lvcreate', which is hard/impossible to do. Nonetheless, this is better than nothing, since the issue with 'lvcreate' options was already there in the migration case.

The code using 'lvcreate' only runs upon 'vgcfgrestore' failure, which typically happens only with Thin LVs (IMHO). Hence, on most setups, the optimized code will still run, preventing any regression.

Signed-off-by: Renaud Métrich <rmetrich@redhat.com>

PLEASE TEST MORE.

I tested as follows:

  • regular system with 1 VG, no thin pool
  • system with 1 VG, and root filesystem on thin pool + some special LVs (mirror, raid1, raid5, etc).

I plan to test tomorrow on a system with 1 VG, no thin pool but some special LVs (mirror, raid, etc) to verify that vgcfgrestore also works with special LVs.

@rmetrich
Contributor Author

The proposed code does the following:

  1. During backup
  • Collect additional LV properties:

    origin: originating LV (for caches and snapshots)
    lv_layout: type of the LV (mirror, raid, thin, etc.)
    pool_lv: thin pool hosting the LV
    chunk_size: chunk size, for various volume types
    stripes: number of stripes, for mirrored and striped volumes
    stripe_size: stripe size, for RAID volumes

  • Skip caches and snapshots (not supported)

  2. During restore (a rough sketch of this decision flow follows after the list)
  • If in Migration mode (e.g. different disks but same size), go through the vgcreate/lvcreate code (Legacy Method), printing warnings because the initial layout may not be preserved (we do not save all the attributes needed to re-create the LVM volumes)

  • Otherwise, try "vgcfgrestore"

  • If it fails:

    • Try "vgcfgrestore --force"
    • If that also fails, use vgcreate/lvcreate (Legacy Method)
    • Otherwise, remove the Thin pools (which are broken due to the --force flag)
    • Re-create the Thin pools using the Legacy Method (but do not re-create the other LVs, which have already been successfully restored)
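For illustration, here is a minimal hedged sketch of that restore decision flow in plain shell. This is NOT the code ReaR generates; the VG name, the metadata backup path and the thin pool name are placeholders, and the Migration-mode branch is omitted:

    #!/bin/bash
    # Hedged sketch of the restore-time decision flow described above.
    vg="rootvg"
    backup_file="/var/lib/rear/layout/lvm/${vg}.cfg"   # placeholder path

    recreate_with_lvcreate() {
        # Legacy Method: build vgcreate/lvcreate calls from the LV properties
        # collected during backup (lv_layout, pool_lv, chunk_size, stripes, ...).
        echo "recreating $1 of VG $vg with vgcreate/lvcreate"
    }

    if lvm vgcfgrestore -f "$backup_file" "$vg" ; then
        echo "all LVs of $vg restored from the LVM metadata backup"
    elif lvm vgcfgrestore --force -f "$backup_file" "$vg" ; then
        # --force restores the VG but leaves the thin pools unusable:
        # remove them and re-create only the thin pools/volumes,
        # the other LVs are already restored correctly.
        lvm lvremove -f -y "$vg/mythinpool"              # placeholder LV name
        recreate_with_lvcreate "the thin pools/volumes only"
    else
        # Even --force failed: re-create the whole VG the legacy way.
        recreate_with_lvcreate "all LVs"
    fi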

@gdha gdha added the enhancement Adaptions and new features label May 16, 2018
@gdha gdha self-assigned this May 16, 2018
@jsmeix
Member

jsmeix commented May 17, 2018

@gdha
I am currently too busy with other stuff, so I cannot have a look here right now.
Is this enhancement useful for ReaR 2.4 or can it be postponed to ReaR 2.5?

@jsmeix jsmeix removed their request for review May 17, 2018 11:24
@rmetrich
Contributor Author

Thin pools are widely used along with docker.
The code has been implemented to prevent any regression when not using thin pools (vgcfgrestore should just work). When in migration mode, the code should enhance the recovery, but this requires some testing (I tested with thin pools, linear volumes and RAID/mirror volumes).

@gozora
Member

gozora commented May 17, 2018

Hello @rmetrich, at first glance it looks good to me. I'm however not that good at reading plain code :-/ ...
But I can run a couple of tests during this weekend if needed ...

V.

@gdha
Member

gdha commented May 17, 2018

@rmetrich At the end of next week I can do some automated tests - this weekend is occupied (the marriage of my daughter and other work duties, unfortunately). At the beginning of next week I'm taking a few days off to really relax (in Germany and Luxembourg).

Additional note: ReaR 2.4 is OK for me, as my main customer is also struggling with this kind of stuff (with docker containers). And I believe Red Hat had plans to replace their 2.0 version with the latest 2.4, if I'm not mistaken.

@gdha gdha added this to the ReaR v2.4 milestone May 17, 2018
@rmetrich
Contributor Author

👍 Perfect!
We do indeed plan to rebase to 2.4 ASAP (RHEL 7.6), but it's not committed yet.

@jsmeix
Member

jsmeix commented May 17, 2018

@rmetrich
only a side note FYI, in general, regarding what you wrote in your initial comment:

This approach is sub-optimal because it requires knowing all the possible options to 'lvcreate', which is hard/impossible to do.

You may have a look at my general opinion about such cases
e.g. in
#1513 (comment)
and
#1204 (comment)
and
#1242 (comment)
and in the latter follow the links to the other issue comments.

But I know basically nothing at all about Thin Pools/Volumes or even docker, so I don't know whether it could make sense in this particular case to implement config variables where the user could - if needed - specify the settings for his particular Thin Pool/Volume setup.

@jsmeix
Member

jsmeix commented May 17, 2018

@rmetrich
another side note FYI, in general, regarding "when in migration mode":

When in migration mode it is perfectly fine to ask the user via the UserInput function for needed values and/or let him check and confirm something before it actually gets set up.

Only when not in migration mode should "rear recover" normally run unattended - but even if UserInput is needed in that case, the user can predefine UserInput values to get it to run unattended, cf. the USER_INPUT_... variables in default.conf.

@schabrolles
Member

schabrolles commented May 19, 2018

@rmetrich,
I tried your patch on a sles11sp4 (POWER arch) and got the following error during rear recover:

No code has been generated to recreate fs:/ (fs).
    To recreate it manually add code to /var/lib/rear/layout/diskrestore.sh or abort.
UserInput -I ADD_CODE_TO_RECREATE_MISSING_FSFS needed in /usr/share/rear/layout/prepare/default/600_show_unprocessed.sh line 33
Manually add code that recreates fs:/ (fs)
1) View /var/lib/rear/layout/diskrestore.sh
2) Edit /var/lib/rear/layout/diskrestore.sh
3) Go to Relax-and-Recover shell
4) Continue 'rear recover'
5) Abort 'rear recover'
(default '4' timeout 10 seconds)

FYI, the layout is the sles11 default LVM layout:
/boot on a partition
/ on LVM

I don't have time to look into the reason why ... but just to be sure I've tested several times. I also confirm that I don't get any problem with the current master branch on this system.

I attach the debug log of rear -D recover:
rear-rear-sles11-144.log

I'm going to test on other systems (SLES12 / RHEL7 / RHEL6 and Ubuntu)


@rmetrich
Contributor Author

@schabrolles From the debug log, I do not see any create_lvmvol() call, which is what causes the issue.
Can you provide the debug log of the rescue creation? The issue is likely there.

@schabrolles
Member

@rmetrich,

What is strange is that the /boot fs is present in the disklayout file (I checked in the rescue image before running rear recover):

# Disk /dev/vda
# Format: disk <devname> <size(bytes)> <partition label type>
disk /dev/vda 53687091200 msdos
# Partitions on /dev/vda
# Format: part <device> <partition size(bytes)> <partition start(bytes)> <partition type|name> <flags> /dev/<partition>
part /dev/vda 413696 1048576 primary boot,prep /dev/vda1
part /dev/vda 1077936128 213909504 primary none /dev/vda2
part /dev/vda 52395245568 1291845632 primary lvm /dev/vda3
# Format for LVM PVs
# lvmdev <volume_group> <device> [<uuid>] [<size(bytes)>]
lvmdev /dev/rootvg /dev/vda3 9cavrA-gSqy-ROyO-7GWa-PXVI-Ker3-cBtZld 102334464
# Format for LVM VGs
# lvmgrp <volume_group> <extentsize> [<size(extents)>] [<size(bytes)>]
lvmgrp /dev/rootvg 4096 12491 51163136
# Filesystems (only ext2,ext3,ext4,vfat,xfs,reiserfs,btrfs are supported).
# Format: fs <device> <mountpoint> <fstype> [uuid=<uuid>] [label=<label>] [<attributes>]
fs /dev/mapper/rootvg-lvroot / ext3 uuid=487f413a-b41a-4347-89cd-e4249207ab47 label= blocksize=4096 reserved_blocks=0% max_mounts=27 check_interval=180d bytes_per_inode=13207 options=rw,acl,user_xattr
fs /dev/vda2 /boot ext3 uuid=189efaae-c4e6-4df3-af91-5150b91001f1 label= blocksize=4096 reserved_blocks=0% max_mounts=36 check_interval=180d bytes_per_inode=16380 options=rw,acl,user_xattr
# Swap partitions or swap files
# Format: swap <filename> uuid=<uuid> label=<label>
swap /dev/mapper/rootvg-lv_swap uuid=89296247-5f68-43c6-884f-862dc506ee26 label=

Here is the log from rear -D mkrescue:
rear-rear-sles11-144.log

@rmetrich
Contributor Author

The issue is on /dev/mapper/rootvg-lvroot, which has no lvmvol description.
This is due to my new code (me--), which seems not to work with your LVM version (the help text gets printed):

 2845 ++ lvm lvs --separator=: --noheadings --units b --nosuffix -o origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes,stripe_size
 2846   Logical Volume Fields
 2847   ---------------------
 2848     lv_all               - All fields in this section.
 2849     lv_uuid              - Unique identifier.
...
 2957   Unrecognised field: lv_layout

What is your LVM version?

@schabrolles
Member

@rmetrich

LVM version from sles11sp4

lvm> version
  LVM version:     2.02.98(2) (2012-10-15)
  Library version: 1.03.01 (2011-10-15)
  Driver version:  4.25.0

@rmetrich
Contributor Author

OK, I can reproduce this on RHEL 6.4, which has a similar LVM version.

@rmetrich
Contributor Author

@schabrolles That should fix it now. I had to add quite a lot of code for older LVM versions.
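For illustration only, here is a hedged sketch of one way such a version fallback can be done: probe whether the installed LVM knows the lv_layout field before requesting it. The probing approach and the segtype fallback field are assumptions for this sketch, not necessarily what this PR actually does:

    # Probe for lv_layout support; older LVM (e.g. 2.02.98 on SLES 11 SP4
    # or RHEL 6.4) does not know this field and would print its help text.
    if lvm lvs -o lv_layout --noheadings &>/dev/null ; then
        lv_fields="origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes,stripe_size"
    else
        # Assumed fallback: segtype gives comparable layout information
        lv_fields="origin,lv_name,vg_name,lv_size,segtype,pool_lv,chunk_size,stripes,stripe_size"
    fi
    lvm lvs --separator=: --noheadings --units b --nosuffix -o "$lv_fields"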

@schabrolles
Member

@rmetrich This is working now ... I'm planning to test it tonight on all the Linux distributions I have (rhel6, rhel7, ubuntu, sles11, sles12) and to test migration tomorrow morning.

Member

@schabrolles left a comment

Just tested with standard LVM (no thinpool) on:

  • rhel6 (ppc64)
  • rhel7 (ppc64/ppc64le)
  • sles11sp4 (ppc64)
  • sles12sp2 (ppc64le)
  • ubuntu16.04 (ppc64le)

Tested during migration on

  • sles11sp4
  • sles12sp2
  • rhel7

Tested with lv_thinpool on rhel7 (ppc64le)

Member

@gdha left a comment

Waiting on the blessing of @jsmeix before I approve this PR


if lvs --noheadings -o thin_count | grep -q -v "^\s*$" ; then
# There are Thin Pools on the system, include required binaries
PROGS=( "${PROGS[@]}" /usr/sbin/thin_* )
Member

@rmetrich Could you list (in a comment) which executables you need at minimum? I'm thinking of thin_check (cf. Red Hat solution 3031921 - https://access.redhat.com/solutions/3031921).
Why do I ask? I know it has been tested and approved for RHEL7 by @schabrolles - but what about SLES? Are the executables named the same? To be verified by @jsmeix.

Personally I will approve once @jsmeix gives his blessing.

Member

I cannot test or verify this within a reasonable time: not this week, next week I am not in the office, and afterwards - with probability one - some other issues will have piled up that have higher priority.

Contributor Author

In fact they are not really needed, except thin_check, since the thin volumes are recreated using lvcreate commands, so I'll remove the others (only the libraries are needed).
Let me test again without them ...
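A hedged sketch of what the reduced include could look like, based on the comment above (keeping only thin_check plus the lvm2 libraries; this is not necessarily the final code of the PR):

    if lvs --noheadings -o thin_count | grep -q -v "^\s*$" ; then
        # Thin pools exist on the system: keep thin_check so the restored
        # thin pools can be checked/activated in the rescue system, drop the
        # other thin_* tools, and keep the lvm2 libraries.
        PROGS=( "${PROGS[@]}" /usr/sbin/thin_check )
        LIBS=( "${LIBS[@]}" /usr/lib64/*lvm2* )
    fi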

if lvs --noheadings -o thin_count | grep -q -v "^\s*$" ; then
# There are Thin Pools on the system, include required binaries
PROGS=( "${PROGS[@]}" /usr/sbin/thin_* )
LIBS=( "${LIBS[@]}" /usr/lib64/*lvm2* )
Member

@rmetrich Same comment as with the PROGS array. Any comments, @jsmeix?

@gdha gdha requested a review from jsmeix May 23, 2018 11:29
Member

@jsmeix left a comment

@schabrolles @gdha
I can only blindly trust you here
(just looking at the code changes does not help me here).

@rmetrich
Contributor Author

Tested (RHEL7 + RHEL6) and pushed. Would you prefer squashed commits?

@jsmeix
Member

jsmeix commented May 23, 2018

I prefer the default behaviour with non-squashed commits (i.e. keep the individual commits in a pull request as they happened, to preserve the history of how something was implemented), because the final merge commit that merges a whole pull request into the ReaR upstream master code is already such a squashed commit.

For example see in
#1805
the individual commits "inside" the pull request
65d0fc3
bc1fa98
0ebeeac
versus the final merge commit
51d5f94
which result in the following git log entries in the ReaR upstream master branch:

|
* s.chabrolles@fr.ibm.com 51d5f949fdd17a0fb205d865f9a4d5704a6650fe Fri May 18 09:24:23 2018 +0200
|\ Merge pull request #1805 from schabrolles/get_only_partition_from_holders :
| | Verify if dm-X is a partition before adding to sysfs_paths
| |
| * s.chabrolles@fr.ibm.com 0ebeeac6d71cf3dba4e7e0085e56e066cf5f9db2 Tue May 15 07:23:51 2018 +0200
| | Update comment :
| |
| * s.chabrolles@fr.ibm.com bc1fa9878dab19ed4c5a0392d98c873c1f7845a7 Mon May 14 22:04:42 2018 +0200
| | add comments :
| |
| * s.chabrolles@fr.ibm.com 65d0fc32e0814545e1fa28c44586e7400a5859e9 Mon May 14 19:04:47 2018 +0200
|/ Verify if dm-X is a partition before adding to sysfs_paths :
|

@rmetrich
Contributor Author

Alright, not touching anything then. I also prefer having the history.

@gdha gdha merged commit b8630a6 into rear:master May 28, 2018
@rmetrich
Contributor Author

Thanks to all of you for accepting this.

@razorfish-sl

razorfish-sl commented Jun 3, 2018

I noticed today that ReaR was not backing up thin pools on physical devices...
Nothing was lost; I just noticed on recovery that the old data was still in the thin pool.

However.....
I was just thinking of a case from this week:

I had a "thin pool" that was mapped to a block file, then remounted on loop, and ReaR obviously dealt with it correctly because it was a regular file on /dev/sda, in the directory tree. ReaR handles this case because it sees a file in a directory; the LVM config is something separate.

So we may need to be careful: in this case LVM "sees" a thin pool, but that is actually a lie - it is really a raw image mounted as a loop device.

We may get into a situation where ReaR backs it up correctly as a regular file, BUT then backs it up AGAIN as an LVM thin pool. In that situation, how would it re-create the physical device, and how would it avoid getting confused that it is actually a file mounted as a "block device"?

@rmetrich rmetrich deleted the ThinPool_1380 branch June 6, 2018 20:06
pcahyna added a commit to pcahyna/rear that referenced this pull request Sep 15, 2021
Fix a problem introduced in commits
b184194 and
311bfb3 (PR rear#1806) that shows up when
there are multiple VGs to restore.

Using variables create_thin_volumes_only and create_logical_volumes to
propagate information from VG creation to LV creation does not work well
in the case of multiple VGs, because the variables are global and if
there are multiple VGs, their values will leak from one VG to another.
The generated diskrestore.sh script does not guarantee that the LVs of a
given VG are created immediately after their VG and before creating
another VG. Currently, the script first creates all VGs and then all LVs,
so all the LVs in all VGs will see the value of create_logical_volumes
and create_thin_volumes_only from the last VG, not from their own. This
matters when different VGs behave differently (typically if one has a
thin pool and the other does not).

Fix by replacing the scalar values by arrays of VG names. If a given VG
is in the array, it is the equivalent of the former scalar value being 1
for the given VG, if it is not in the array, it is an equivalent of a
former value of 0.

For the create_volume_group variable the change is not needed, but do it
nevertheless for symmetry with other variables.
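A minimal, self-contained sketch of the scalar-to-array change described in the commit message above. The function names and example VG names here are illustrative only, not ReaR's actual code:

    #!/bin/bash
    # Before the fix: one global scalar flag, overwritten by every VG.
    # After the fix: arrays remembering *which* VGs are affected.
    create_logical_volumes=()      # VGs whose LVs must be recreated with lvcreate
    create_thin_volumes_only=()    # VGs where only thin pools/volumes need lvcreate

    in_array() {
        local needle=$1 element
        shift
        for element in "$@" ; do
            [[ $element == "$needle" ]] && return 0
        done
        return 1
    }

    # While generating the restore code for each VG, record its situation:
    mark_vg() {
        local vg=$1 vgcfgrestore_failed=$2 thin_pool_removed=$3
        [[ $vgcfgrestore_failed == yes ]] && create_logical_volumes+=( "$vg" )
        [[ $thin_pool_removed == yes ]] && create_thin_volumes_only+=( "$vg" )
        return 0
    }

    mark_vg vg_with_thin no yes
    mark_vg vg_plain     no no

    # Later, when generating the lvcreate code for an LV of "$vg", membership
    # is tested per VG instead of reading a global flag leaked from the last VG:
    for vg in vg_with_thin vg_plain ; do
        if in_array "$vg" "${create_thin_volumes_only[@]}" ; then
            echo "$vg: re-create only its thin pools/volumes with lvcreate"
        else
            echo "$vg: all LVs were already restored by vgcfgrestore"
        fi
    done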
pcahyna added a commit to pcahyna/rear that referenced this pull request Oct 4, 2021
pcahyna added a commit to pcahyna/rear that referenced this pull request Oct 4, 2021
pcahyna added a commit to pcahyna/rear that referenced this pull request Feb 17, 2023