misc-test 006 sometimes fails when UUID tree is created at mkfs time #118

Closed
t-msn opened this issue Mar 28, 2018 · 2 comments

@t-msn
Contributor

t-msn commented Mar 28, 2018

With the patch fa754dd ("btrfs-progs: mkfs: precreate the uuid tree"), currently in devel, misc-test 006 occasionally fails (kernel is 4.16.0-rc7):

$ sudo make test-misc TEST=006\*
    [LD]     fssum
    [TEST]   misc-tests.sh
    [TEST/misc]   006-image-on-missing-device
/usr/data/src/btrfs-progs/tests//common: line 177: 10819 Aborted
                 (core dumped) $INSTRUMENT "$@" >> "$RESULTS" 2>&1
mayfail: returned code 134 (SIGABRT), not ignored
test failed for case 006-image-on-missing-device
make: *** [Makefile:329: test-misc] Error 1

The full log from misc-tests-results.txt for a failing run is below:

=== Entering /usr/data/src/btrfs-progs/tests//misc-tests/006-image-on-missing-device
############### losetup --find --show img1
/dev/loop0
############### losetup --find --show img2
/dev/loop1
############### /usr/data/src/btrfs-progs/mkfs.btrfs -f -d raid1 -m raid1 /dev/loop0 /dev/loop1
btrfs-progs v4.15.1
See http://btrfs.wiki.kernel.org for more information.

Performing full device TRIM /dev/loop0 (2.00GiB) ...
Performing full device TRIM /dev/loop1 (2.00GiB) ...
Label:              (null)
UUID:               220272d0-2454-429c-a198-83eaf954aceb
Node size:          16384
Sector size:        4096
Filesystem size:    4.00GiB
Block group profiles:
  Data:             RAID1           204.75MiB
  Metadata:         RAID1           204.75MiB
  System:           RAID1             8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  2
Devices:
   ID        SIZE  PATH
    1     2.00GiB  /dev/loop0
    2     2.00GiB  /dev/loop1

############### mount /dev/loop0 /usr/data/src/btrfs-progs/tests//mnt
############### dd if=/dev/zero of=/usr/data/src/btrfs-progs/tests//mnt/a bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.00643606 s, 1.6 GB/s
############### dd if=/dev/zero of=/usr/data/src/btrfs-progs/tests//mnt/b bs=4k count=1000 conv=sync
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB, 3.9 MiB) copied, 0.0056388 s, 726 MB/s
############### umount /usr/data/src/btrfs-progs/tests//mnt
############### /usr/data/src/btrfs-progs/btrfs check /dev/loop0
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/loop0
UUID: 220272d0-2454-429c-a198-83eaf954aceb
found 14843904 bytes used, no error found
total csum bytes: 14240
total tree bytes: 131072
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 108171
file data blocks allocated: 14712832
 referenced 14712832
############### /usr/data/src/btrfs-progs/btrfs-image /dev/loop0 /tmp/test-img.dump
############### /usr/data/src/btrfs-progs/btrfs filesystem show /dev/loop0
Label: none  uuid: 220272d0-2454-429c-a198-83eaf954aceb
        Total devices 2 FS bytes used 14.16MiB
        devid    1 size 2.00GiB used 417.50MiB path /dev/loop0
        devid    2 size 2.00GiB used 417.50MiB path /dev/loop1

############### wipefs -a /dev/loop1
/dev/loop1: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d
############### losetup -d /dev/loop1
############### /usr/data/src/btrfs-progs/btrfs filesystem show /dev/loop0
warning, device 2 is missing
Label: none  uuid: 220272d0-2454-429c-a198-83eaf954aceb
        Total devices 2 FS bytes used 14.16MiB
        devid    1 size 2.00GiB used 417.50MiB path /dev/loop0
        *** Some devices missing

############### /usr/data/src/btrfs-progs/btrfs check /dev/loop0
checking extents
checking free space cache
failed to load free space cache for block group 30408704
failed to load free space cache for block group 245104640
checking fs roots
checking csums
checking root refs
warning, device 2 is missing
Checking filesystem on /dev/loop0
UUID: 220272d0-2454-429c-a198-83eaf954aceb
found 14843904 bytes used, no error found
total csum bytes: 14240
total tree bytes: 131072
total fs tree bytes: 32768
total extent tree bytes: 16384
btree space waste bytes: 108171
file data blocks allocated: 14712832
 referenced 14712832
############### /usr/data/src/btrfs-progs/btrfs-image /dev/loop0 /tmp/test-img.dump
Couldn't map the block 459800576
free(): invalid pointer
failed (ignored, ret=134): /usr/data/src/btrfs-progs/btrfs-image /dev/loop0 /tmp/test-img.dump
mayfail: returned code 134 (SIGABRT), not ignored
test failed for case 006-image-on-missing-device

So, the last btrfs-image failed.
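
As a side note on the "returned code 134 (SIGABRT)" line in the log above: shells such as bash report a child killed by signal N as exit status 128 + N, and SIGABRT is signal 6 on Linux. A minimal sketch of that arithmetic follows; the helper names are hypothetical and are not part of the btrfs-progs test suite.

```c
#include <assert.h>
#include <signal.h>

/*
 * Hypothetical helper: the exit status a bash-like shell reports for a
 * child that was killed by signal `signo` (convention: 128 + signo).
 */
static int shell_status_for_signal(int signo)
{
    assert(signo > 0);
    return 128 + signo;
}

/*
 * Hypothetical inverse: recover the signal number from a shell exit
 * status, or return 0 if the status does not denote death by signal.
 */
static int signal_from_shell_status(int status)
{
    return status > 128 ? status - 128 : 0;
}
```

With these helpers, `shell_status_for_signal(SIGABRT)` gives the 134 seen in the log, while the `ret=1` of the passing run maps to no signal at all.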
Also, it seems that even when the test succeeds, the last btrfs-image actually fails:

############### /usr/data/src/btrfs-progs/btrfs-image /dev/loop0 /tmp/test-img.dump
Couldn't map the block 459800576
ERROR: failed to flush pending data: -5
ERROR: create failed: Bad address
warning, device 2 is missing
failed (ignored, ret=1): /usr/data/src/btrfs-progs/btrfs-image /dev/loop0 /tmp/test-img.dump

The test runs this command through the mayfail helper, so the failure is ignored and the test itself still passes.

However, with v4.15.1 (or with the commit reverted), the last btrfs-image does not fail:

############### /usr/data/src/btrfs-progs/btrfs-image /dev/loop0 /tmp/test-img.dump
warning, device 2 is missing
@adam900710
Collaborator

adam900710 commented Mar 30, 2018

EDIT: Sorry, the btrfs_map_block() part is not changed at all, and it's doing what it is supposed to do.

It's my fault.
Although I added the ability to read the extra copy in btrfs-image, I made an off-by-one error in mirror_num, so for RAID1 it only ever reads the first mirror and never reaches the second.
So for certain chunks (the data chunk in this case), if the missing device holds the first stripe, it never tries to read the second (valid) copy.
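
To illustrate the kind of off-by-one on mirror_num described above, here is a toy model, not the actual btrfs-image code. It assumes mirror numbers are 1-based and that a RAID1 block has two copies; `read_copy`, `read_block_buggy`, and `read_block_fixed` are hypothetical names made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for reading one copy of a block.
 * Assumption: mirror numbers are 1-based, and the copy numbered
 * `bad_mirror` lives on the missing device, so reading it fails.
 */
static bool read_copy(int mirror_num, int bad_mirror)
{
    return mirror_num != bad_mirror;
}

/*
 * Buggy loop: the bound stops one short, so with num_copies == 2
 * only mirror 1 is ever tried. Returns the mirror that was read,
 * or -1 if no copy could be read.
 */
static int read_block_buggy(int num_copies, int bad_mirror)
{
    assert(num_copies >= 1);
    for (int mirror_num = 1; mirror_num < num_copies; mirror_num++) {
        if (read_copy(mirror_num, bad_mirror))
            return mirror_num;
    }
    return -1;
}

/* Fixed loop: every mirror from 1 to num_copies inclusive is tried. */
static int read_block_fixed(int num_copies, int bad_mirror)
{
    assert(num_copies >= 1);
    for (int mirror_num = 1; mirror_num <= num_copies; mirror_num++) {
        if (read_copy(mirror_num, bad_mirror))
            return mirror_num;
    }
    return -1;
}
```

In this model, when the first copy is on the missing device (`bad_mirror == 1`), the buggy loop gives up without touching mirror 2, while the fixed loop falls through to the valid second copy. When the second copy is the missing one, both variants succeed on mirror 1, which would be consistent with the bug staying hidden until a chunk's first stripe landed on the missing device.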

I'd say it's the new UUID tree metadata that makes it generate the space cache, which in turn can trigger the hidden bug.

Since the cause has been located, a fix is under way.

Thanks for reporting this.

@kdave kdave added the bug label Mar 30, 2018
@kdave kdave added this to the v4.16 milestone Mar 30, 2018
@kdave
Owner

kdave commented Mar 30, 2018

Added to devel, thanks.

@kdave kdave closed this as completed Mar 30, 2018