Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure rescue contains all COPY_AS_IS files #3027

Merged
merged 2 commits into from Jul 20, 2023

Conversation

rmetrich
Copy link
Contributor

Relax-and-Recover (ReaR) Pull Request Template

Please fill in the following items before submitting a new pull request:

Pull Request Details:
  • Type: Bug Fix

  • Impact: Normal

  • Reference to related issue (URL): N/A

  • How was this pull request tested?

Using an automated reproducer implemented using systemtap

  1. Create a file in /dev which will be embedded in the rescue environment

    # dd if=/dev/urandom of=/dev/shrinking bs=10K count=3
    3+0 records in
    3+0 records out
    30720 bytes (31 kB, 30 KiB) copied, 0.000689566 s, 44.5 MB/s
    
  2. Execute the systemtap script in charge of shrinking the file while being copied

    # stap -v -g ./shrinking.stp
    [...]
    Pass 5: starting run.
    
  3. Execute rear mkrescue from another terminal

  • Brief description of the changes in this pull request:

The files copied as-is in the rescue are copied using a tar -c | tar -x command.
It may happen that inodes are shrinking during the copy (e.g. /dev/mqueue/nnsc (HPE device)), which can cause the tar -x command to stop extracting the next files, due to the padding zeros inserted in the shrank inode.
This leads to an error when checking the integrity of the rescue environment.

The solution is to add -i option when extracting, to continue the processing.

@rmetrich
Copy link
Contributor Author

Without the fix:

ERROR: ReaR recovery system in '/var/tmp/rear.tpfZyNy6ayS53wP/rootfs' not usable (required libraries are missing)
Some latest log messages since the last called script 990_verify_rootfs.sh:
  wipefs is /bin/wipefs
  mkfs is /bin/mkfs
  mkfs.xfs is /bin/mkfs.xfs
  xfs_admin is /bin/xfs_admin
  mkswap is /bin/mkswap
  cryptsetup is /bin/cryptsetup
  dmsetup is /bin/dmsetup
  ldconfig is /bin/ldconfig

The reason is everything after /dev/shrinking is not copied, because tar -x stopped silently (without error, believing the end of the archive was reached):

# grep -A 2 shrinking /var/tmp/rear.tpfZyNy6ayS53wP/tmp/copy-as-is-filelist 
/dev/shrinking
tar: /dev/shrinking: File shrank by 5120 bytes; padding with zeros
/dev/vcsa6
/dev/vcs6

# ls -l /var/tmp/rear.tpfZyNy6ayS53wP/rootfs/dev/shrinking /var/tmp/rear.tpfZyNy6ayS53wP/rootfs/dev/vcsa6
ls: cannot access '/var/tmp/rear.tpfZyNy6ayS53wP/rootfs/dev/vcsa6': No such file or directory
-rw-r--r--. 1 root root 30720 Jul 19 10:12 /var/tmp/rear.tpfZyNy6ayS53wP/rootfs/dev/shrinking

@@ -112,7 +112,7 @@
# cf. https://github.com/rear/rear/pull/2405#issuecomment-633512932
# FIXME: The following code fails if file names contain characters from IFS (e.g. blanks),
# cf. https://github.com/rear/rear/issues/1372
if ! tar -v -X $copy_as_is_exclude_file -P -C / -c ${COPY_AS_IS[*]} 2>$copy_as_is_filelist_file | tar $v -C $ROOTFS_DIR/ -x 1>/dev/null ; then
if ! tar -v -X $copy_as_is_exclude_file -P -C / -c ${COPY_AS_IS[*]} 2>$copy_as_is_filelist_file | tar $v -C $ROOTFS_DIR/ -x -i 1>/dev/null ; then

Check notice

Code scanning / Shellcheck (reported by Codacy)

Double quote to prevent globbing and word splitting. Note

Double quote to prevent globbing and word splitting.
@@ -112,7 +112,7 @@
# cf. https://github.com/rear/rear/pull/2405#issuecomment-633512932
# FIXME: The following code fails if file names contain characters from IFS (e.g. blanks),
# cf. https://github.com/rear/rear/issues/1372
if ! tar -v -X $copy_as_is_exclude_file -P -C / -c ${COPY_AS_IS[*]} 2>$copy_as_is_filelist_file | tar $v -C $ROOTFS_DIR/ -x 1>/dev/null ; then
if ! tar -v -X $copy_as_is_exclude_file -P -C / -c ${COPY_AS_IS[*]} 2>$copy_as_is_filelist_file | tar $v -C $ROOTFS_DIR/ -x -i 1>/dev/null ; then

Check notice

Code scanning / Shellcheck (reported by Codacy)

Use "${array[@]}" (with quotes) to prevent whitespace problems. Note

Use "${array[@]}" (with quotes) to prevent whitespace problems.
@rmetrich
Copy link
Contributor Author

systemtap shrinking.stp script:

global openat
global to_catch
global shrinking_fd
global nread

probe syscall.openat {
	if (execname() != "tar") next
	if (filename_unquoted != "shrinking") next
	to_catch[tid()] = 1
	openat[tid()] = 1
}

probe syscall.openat.return {
	if (! openat[tid()]) next
	delete openat[tid()]
	shrinking_fd = retval
	nread = 0
	printf("FD is %ld\n", retval)
}

probe syscall.close {
	if (! to_catch[tid()]) next
	if (fd != shrinking_fd) next
	delete to_catch[tid()]
}

probe syscall.read {
	if (! to_catch[tid()]) next
	if (fd != shrinking_fd) next
	nread++
	if (nread < 3) next
	printf("Truncating the file\n")
	system("truncate -s 25K /dev/shrinking")
	mdelay(2000)
	exit()
}

@gdha
Copy link
Member

gdha commented Jul 19, 2023

@rmetrich Hi - is this related to Message Queue brokers? Could you say something about the background of this issue? I also saw from time to time that dd seems to hang, but this could be the same issue as I saw it on kubernetes nodes with RabbitMQ pods...

@rmetrich
Copy link
Contributor Author

@rmetrich Hi - is this related to Message Queue brokers? Could you say something about the background of this issue? I also saw from time to time that dd seems to hang, but this could be the same issue as I saw it on kubernetes nodes with RabbitMQ pods...

Yes this was seen with /dev/mqueue/nnsc device node, something shipped by HPE apparently. One of my customer constantly hits the issue when building the ISO.

@pcahyna
Copy link
Member

pcahyna commented Jul 19, 2023

Is /dev/mqueue/nnsc really a device node, or a regular file? I think the content of device special files should not be copied - only the inode (metadata) ...

@rmetrich
Copy link
Contributor Author

Is /dev/mqueue/nnsc really a device node, or a regular file? I think the content of device special files should not be copied - only the inode (metadata) ...

It seems to be a regular file:

/dev/mqueue:
total 0
drwxrwxrwt.  2 0 0   60 Jun 17 20:04 .
drwxr-xr-x. 23 0 0 5740 Jun 18 19:30 ..
---x--x--T.  1 0 0   80 Jun 21 13:30 nnsc

But none of this really matters, all we must take care of is archives with zero padding can be extracted properly.

@jsmeix jsmeix added the bug label Jul 19, 2023
@jsmeix jsmeix added this to the ReaR v2.8 milestone Jul 19, 2023
Copy link
Member

@jsmeix jsmeix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because even the old GNU tar version 1.15.1 in SLES10-SP4 supports (according to its man page)

-i, --ignore-zeros
ignore blocks of zeros in archive (normally mean EOF)

and because I cannot imagine that using '-i' may cause
unexpected bad consequences or regressions in certain cases,
I "just approve" it.

@jsmeix jsmeix self-assigned this Jul 19, 2023
@jsmeix jsmeix requested a review from a team July 19, 2023 11:34
The files copied as-is in the rescue are copied using a 'tar -c | tar
-x' command. It may happen that inodes are shrinking during the copy
(e.g. /dev/mqueue/nnsc (HPE device)), which can cause the 'tar -x'
command to stop extracting the next files, due to the padding zeros
inserted in the shrank inode.
The solution is to add '-i' option when extracting, to continue the
processing.
@jsmeix
Copy link
Member

jsmeix commented Jul 19, 2023

@rmetrich
thank you for your prompt explanatory comment!

@jsmeix
Copy link
Member

jsmeix commented Jul 19, 2023

@rear/contributors
I would like to merge it tomorrow afternoon
unless there are objections

Copy link
Member

@schlomo schlomo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmetrich nice find and thanks for making ReaR more robust in general.

Beyond that, does the rescue system actually need those mqueue files? If not then I'd suggest to also exclude them from the rescue image, as it will debloat or unclutter the rescue image.

If this is a common thing then we can also add it to the default exclude list, IMHO.

@jsmeix
Copy link
Member

jsmeix commented Jul 19, 2023

Only for the log (for completeness):

I will never fully understand how 'tar' works.
Neither the old GNU tar version 1.15.1 man page (see above)
nor the newer GNU tar version 1.34 man page (in openSUSE Leap 15.4)

-i, --ignore-zeros
Ignore zeroed blocks in archive.
Normally two consecutive 512-blocks filled with zeroes mean EOF
and tar stops reading after encountering them.
This option instructs it to read further and
is useful when reading archives created with the -A option.

explain why "blocks filled with zeroes mean EOF" in 'tar'.
At least for me this is unexpected behaviour of a program
when reading a sequence of zero bytes means EOF
so "sequence of zero bytes" == "end of file" ?

@rmetrich
Copy link
Contributor Author

@rmetrich nice find and thanks for making ReaR more robust in general.

Beyond that, does the rescue system actually need those mqueue files? If not then I'd suggest to also exclude them from the rescue image, as it will debloat or unclutter the rescue image.

If this is a common thing then we can also add it to the default exclude list, IMHO.

It's probably not useful, as most of the devices in /dev are (knowing that all is supposed to be generated, on RHEL at least).

@jsmeix
Copy link
Member

jsmeix commented Jul 19, 2023

Related to this pull request
but not actually belonging to this pull request:

We copy by default all in /dev/ into the recovery system

# find /dev | wc -l
732

# usr/sbin/rear -D mkrescue
...

# find /var/tmp/rear.nSzlPYGQ1Nk9v9w/rootfs/dev | wc -l
728

because of

COPY_AS_IS+=( /dev ...

in usr/share/rear/conf/GNU/Linux.conf
https://github.com/rear/rear/blob/master/usr/share/rear/conf/GNU/Linux.conf#L231
which is there since the beginning according to

# git log --follow -p usr/share/rear/conf/GNU/Linux.conf

I made a separated issue
#3028

Moved the comment before the FIXME part and
expained in the comment why 'tar' exits regularly
when it reads a (longer) sequence of zeroes, cf.
rear#3027 (comment)
@jsmeix jsmeix merged commit 41c2d9b into rear:master Jul 20, 2023
20 checks passed
@jsmeix
Copy link
Member

jsmeix commented Jul 20, 2023

@rmetrich
thank you for your problem analysis and your fix!

It is much appreciated to get ReaR working
more fail-safe even in corner cases like this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants