
Implement a generic "cleanupdisk" function. #540

Closed
jsmeix opened this issue Jan 27, 2015 · 17 comments
Assignees: jsmeix
Labels: bug (The code does not do what it is meant to do), enhancement (Adaptions and new features), fixed / solved / done
Milestone: Rear v1.18

Comments

jsmeix (Member) commented Jan 27, 2015

I think rear needs a generic "cleanupdisk" function that basically makes an already used harddisk behave as if it was a new harddisk.

When recovery is done on an already used harddisk, the still existing old data on the disk causes various kinds of unexpected weird failures, where each kind is difficult to reproduce (because it depends on what exact old data there is).

For some examples see #533 (therein the issue that on RHEL6 on a used disk mdadm interferes with parted) and #415 ("mkfs -t btrfs" needs option "-f" to enforce making a btrfs on a disk where there is already a btrfs).

Implementing it might become complicated in practice. (I have new harddisks in mind that might have, ex factory, a somewhat hidden special partition which must not be wiped, or something like this - isn't there something like this for UEFI boot? See http://en.wikipedia.org/wiki/EFI_System_partition ).

A starting point could be "wipefs", a tool that wipes filesystem signatures from a device, see
http://karelzak.blogspot.de/2009/11/wipefs8.html
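
For illustration, a minimal sketch of how wipefs could be used (only an example, not rear code; the device name /dev/sdb is assumed):

    # list the filesystem/RAID/partition-table signatures that wipefs can detect
    wipefs /dev/sdb
    # erase all signatures that wipefs knows about ("-a" = all)
    wipefs -a /dev/sdb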

Regardless of how complicated it is in practice, one dedicated function that cleans up the disk before anything else is done makes how rear works much cleaner (instead of various workarounds here and there as needed).

tbsky (Contributor) commented Jan 28, 2015

@jsmeix

In my opinion it is much easier to use the specific right tools to do the job. It is very hard to have a generic cleanupdisk function, because Linux supports so many different storage objects, and these objects can depend on each other. If the bottom object is missing, then you cannot easily reach the upper objects without scanning the whole disk. Take one machine in my environment as an example:

it has two hard disks: sda, sdb
partitions are: sda1, sda2, sdb1, sdb2
software RAID is made above the partitions: md0 (sda1+sdb1), md1 (sda2+sdb2)
LVM is made above the software RAID: rootvg -> my-lv1, my-lv2 (above md1)
DRBD is made above LVM: drbd1 (above my-lv1)
a filesystem is made above DRBD: xfs (above drbd1)

Each storage object has its own metadata structure at the beginning/end/middle (LVM LV) of the object.
If the harddisk partition table is corrupted, all the upper-layer information is lost, so a cleanup-disk tool cannot reach these layers to do its cleanup job.
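
For illustration with the stack above: a layer-by-layer cleanup only works top-down while all layers are still assembled, roughly like this sketch (the DRBD resource name "r1" is an assumption, the other names come from the example; it also assumes drbd1 is primary and nothing is mounted):

    wipefs -a /dev/drbd1               # remove the XFS signature on top of DRBD
    drbdadm down r1                    # deactivate the DRBD resource (resource name assumed)
    lvremove -f rootvg                 # remove my-lv1 and my-lv2
    vgremove rootvg
    pvremove /dev/md1
    mdadm --stop /dev/md0 /dev/md1     # stop the RAID arrays
    mdadm --zero-superblock /dev/sda1 /dev/sdb1 /dev/sda2 /dev/sdb2   # wipe the MD metadata on the members

If a lower layer is already gone (e.g. the partition table), the upper layers cannot be reached this way any more.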

And there are several metadata revisions of each storage object; it is hard for a general tool to track all the changes. But the job is easy for the specific right tools (mdadm, lvcreate, mkfs.xfs, ...), and all these tools have parameters or workarounds to overwrite existing metadata.

So maybe we just need to make sure that the specific tools that deal with the storage objects can overwrite existing data correctly; then we are safe.

Or we could fully erase all the disks, but as you said, that would take too many hours/days.

jsmeix (Member, Author) commented Jan 28, 2015

@tbsky,
again many thanks for your valuable descriptive information!

For me it is perfectly o.k. to have a separate generic cleanup function for each kind of storage object.

In particular because I very much prefer to Keep Separated Stuff Separated ( "KSSS" ;-) cf. what I wrote "on 11 Dec 2014" in #497

tbsky (Contributor) commented Jan 28, 2015

@jsmeix

I know I will enjoy the results of your hard work when RHEL officially supports btrfs someday. I hope it will be there this year :)

jsmeix (Member, Author) commented Jan 28, 2015

@tbsky

I forgot to reply to your "maybe we just need to make sure that the specific tools that deal with the storage objects can overwrite existing data correctly":

From what I learned during #533 it is not possible to rely on the specific tools to overwrite existing data, because by the time the specific tool runs it can already be too late.

In #533 it failed for RHEL6 at the partitioning level because of old data from the MD level, so the MD tool would have to be run before partitioning to clean up the old MD data.

This is exactly the reason why I think we need a generic way to clean up old data.
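
For illustration, such a pre-partitioning cleanup would have to look roughly like this (only a sketch; the device names are examples and /dev/md127 is a hypothetical auto-assembled array):

    # stop an MD array that udev may have auto-assembled from stale metadata
    mdadm --stop /dev/md127
    # zero the old MD superblocks on the former member partitions
    mdadm --zero-superblock /dev/sda1 /dev/sda2
    # only afterwards let parted (re)partition the disk
    parted -s /dev/sda mklabel msdos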

Perhaps your information may lead to the conclusion that such a generic way is not possible.

This would also be a perfectly valid result, because then no installer could reliably install on an already used disk with arbitrary old data on it.

In this case the solution is to document this. Then it is left to the user to make sure in advance that his disks are sufficiently clean, and if anything goes wrong because of old data on the disk, it is no longer an issue for an installer (in particular rear).

jsmeix (Member, Author) commented Jan 28, 2015

Only for fun:

I predict that when RHEL officially supports btrfs, they will devise yet another special way to set up btrfs where my current implementation fails ;-)

@tbsky
I rely on you to report issues with btrfs on RHEL early!

jsmeix (Member, Author) commented Jan 28, 2015

Regarding clean up DRBD I found in
https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_drbd_overview.html

DRBD uses the last 128 MB of the raw device for metadata

This means that what @schlomo wrote in #533 ("maybe just delete the first and last couple of MB on each previously existing partition?") does not work - specifically, "just a couple of MB" is not sufficient (if 128 MB is not "just a couple of MB").

@gdha gdha added this to the Rear future milestone Jan 28, 2015
schlomo (Member) commented Jan 28, 2015

I update my suggestion: wipe the first 256 MB, the last 256 MB and the middle 256 MB of each disk / device :-)
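
A rough sketch of what that suggestion could look like with dd (only an illustration, not rear code; $device is assumed to hold the disk or partition device node, and the last chunk is approximate if the size is not MiB-aligned):

    size_bytes=$( blockdev --getsize64 $device )
    size_mb=$(( size_bytes / 1024 / 1024 ))
    dd if=/dev/zero of=$device bs=1M count=256                          # first 256 MiB
    dd if=/dev/zero of=$device bs=1M seek=$(( size_mb / 2 )) count=256  # middle 256 MiB
    dd if=/dev/zero of=$device bs=1M seek=$(( size_mb - 256 )) count=256  # last 256 MiB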


tbsky (Contributor) commented Jan 28, 2015

@jsmeix
The SUSE DRBD document seems not quite correct: DRBD with fixed-size external metadata is limited to 128 MB, but with internal metadata it may be more than 128 MB. Please check the link below if you are interested :)
http://lists.linbit.com/pipermail/drbd-user/2008-June/009628.html

External metadata may make the cleanup even crazier, but I think it still fits schlomo's 256 MB plan :-D

jsmeix (Member, Author) commented Nov 6, 2015

Regarding "wipefs" see also #649

In particular see therein
#649 (comment)
for how wipefs could be used by default in rear - even though wipefs
did not help in that particular case, it could nevertheless be used
by default in rear to avoid possible issues.

gdha (Member) commented Nov 17, 2015

@jsmeix A wipefs function could be useful indeed. However, we should foresee a fallback for when wipefs is not available, e.g. use dd instead.
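
Such a fallback might look roughly like this (only a sketch under assumptions; the 16 MiB amount is an arbitrary placeholder, and as discussed above for DRBD the end of the device may also need to be wiped):

    if type -p wipefs >/dev/null ; then
        wipefs -a $device
    else
        # fallback: zero the beginning of the device where most signatures live
        dd if=/dev/zero of=$device bs=1M count=16
    fi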

jsmeix added a commit to jsmeix/rear that referenced this issue Nov 19, 2015
before creating filesystems there,
see rear#540
and rear#649 (comment)

Currently "when available" means that one has to manually add it
to the rear recovery system in /etc/rear/local.conf via
REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" wipefs )

Making wipefs automatically available to the rear recovery system
when it is available in the original system is a next step.

jsmeix (Member, Author) commented Nov 19, 2015

As a first step I implemented using wipefs when available in #704.

Currently "when available" means that one has to manually add it
to the rear recovery system in /etc/rear/local.conf via

REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" wipefs )

cf. #649 (comment)

Making wipefs automatically available to the rear recovery system
when it is available in the original system is the next step that I will
implement.

Using another program (e.g. dd) as fallback if wipefs is not available
is something for the future.

@jsmeix jsmeix self-assigned this Nov 20, 2015
@jsmeix jsmeix added enhancement Adaptions and new features and removed discuss / RFC labels Nov 20, 2015
jsmeix added a commit to jsmeix/rear that referenced this issue Dec 2, 2015

jsmeix (Member, Author) commented Dec 2, 2015

With #728 I think this issue is sufficiently fixed.

Using another program (e.g. dd) as fallback if wipefs is not available
is something for the future - perhaps best via a separated follow-up issue - provided such a fallback is really needed on nowadays systems: wipefs is available on SLE11 and SLE12 (wipefs was added on Apr 13 2011 in the SUSE's util-linux RPM (therefore wipefs is not available on SLE10-SP4 where the util-linux RPM changelog ends on Feb 03 2011) and I guess wipefs is also available on recent Red Hat systems.

jsmeix (Member, Author) commented Dec 3, 2015

With #728 this issue is sufficiently fixed.

jsmeix (Member, Author) commented Dec 9, 2015

Right now (during #732) I detected that in some cases one must use

wipefs -a -f /dev/sdXn

because without '-f' (--force) wipefs will not erase a partition table
on a block device that is a partition (e.g. /dev/sda1).

I had the strange case that wipefs detected a DOS partition table
on the partition /dev/sda2 (note that /dev/sda2 is not the whole disk /dev/sda,
where a partition table must exist - here there was a partition table
at the beginning of a partition).

With the DOS partition table at the beginning of /dev/sda2
the subsequent "mkfs -t ext4 /dev/sda2" stopped with
a yes/no question whether or not to proceed.

Probably it is even a bug when "rear recover" hangs
because of such an issue?

Perhaps an alternative to "wipefs -a -f /dev/sdXn" is

mkfs.ext4 -F /dev/sdXn

I will think a bit about which is better.

Currently I prefer the generic way via "wipefs -a -f ..."
over adding specific "force" options to each mkfs call.

I only need to ensure to never run "wipefs -a -f /dev/sda"
because that would erase the partition table on the disk
that was just created by parted (see diskrestore.sh).
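
A minimal sketch of that idea (not the actual rear code; $disk is assumed to hold the whole-disk device, e.g. /dev/sda):

    # wipe signatures on each partition of $disk, but never on $disk itself,
    # so that the partition table that parted just created stays intact
    for partition in $( lsblk -lnpo NAME,TYPE $disk | awk '$2 == "part" { print $1 }' ) ; do
        wipefs -a -f $partition
    done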

@jsmeix jsmeix reopened this Dec 9, 2015
@jsmeix jsmeix added bug The code does not do what it is meant to do waiting for info labels Dec 9, 2015
jsmeix added a commit to jsmeix/rear that referenced this issue Dec 9, 2015
It is still a SUSE-specific hack.
Not at all generic.
But it works!
See rear#732

Additionally it uses "wipefs -f".
See rear#540 (comment)

tbsky (Contributor) commented Dec 9, 2015

hi jsmeix:

I don't use ext4 so I am surprised by your finding. As I said, I don't think wipefs can detect/clean complicated storage objects (like stacked mdadm/drbd/lvm), so the correct behavior of the specific storage tools is the last resort. All the tools I need in rear now have correct behavior like below:

mkfs.xfs -f
mdadm --create --force
lvm lvcreate <<<y
drbdadm -- --force create-md

So I think that no matter what wipefs can do, "mkfs.ext4 -F" must be the last resort.

tbsky (Contributor) commented Dec 10, 2015

@jsmeix

It's maybe off topic. As you mentioned, the original problem that wipefs is meant to solve is like issue #533.
The root cause of these issues is the "automatic behavior" of the Linux environment. If Linux didn't have this "automatic behavior", rear's job would be much easier.

In my case I found two kinds of "automatic behavior":

The first is from boot-time module loading, which causes issues like #480 and #626. We can blacklist useless modules, or maybe whitelist useful modules after booting. I don't know if there is a general way to do this for every distribution. Maybe hide the modules at the beginning and unhide them after booting?

The second comes from udev events, which cause issues like #518. At first I thought solving the issue via udevadm was a great idea, but unfortunately SUSE also tried to solve the same problem, so the two patches conflicted, hence issue #533.

I don't know if hiding/unhiding the modules will help stop this kind of "automatic behavior". But as there will be more storage modules in the future, if we can find a general way to handle the "automatic behavior", then it will be much easier for rear to re-create these storage objects.
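
Regarding the udev kind of "automatic behavior", one general way could be to pause udev's event processing while the storage objects are re-created, roughly like this sketch (it assumes udevadm is available in the recovery system):

    udevadm control --stop-exec-queue    # queue udev events instead of acting on them
    # ... re-create partitions, RAID, LVM, filesystems here ...
    udevadm control --start-exec-queue   # resume processing of the queued events
    udevadm settle                       # wait until the udev event queue is empty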

jsmeix added a commit to jsmeix/rear that referenced this issue Dec 10, 2015
@jsmeix jsmeix modified the milestones: Rear v1.18, Rear future Mar 15, 2016

jsmeix (Member, Author) commented Mar 15, 2016

I close this one because it is "somewhat done" for version 1.18, and probably it will become "mostly obsoleted" by #799, which might even (hopefully) really fix those issues.
