Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign uprepair: extent_io.c:603: free_extent_buffer: Assertion `!(eb->flags & 1)' failed. #1
Comments
|
I've fixed a bunch of shit today, can you give it another go around and see if it's fixed? |
|
I just pulled in your changes... Apparently the problem persists. BTW: I pulled your changes into integration-20111012 from darksatanic because otherwise I get an error about unsupported features:
Anything I can send you for further debugging? The kernel is not able to access some files - it ooopses. That's why I'm trying your repair command - and of course I'd like to support development, btrfs needs a repair command. This is the current output with your latest commits:
I tried commenting out some of the BUG_ON lines but after a few steps of doing so I reach some BUG_ON lines which I should clearly not comment out. |
|
run it in gdb and do a bt when that happens so I know where it came from. |
|
Well, this seems not very helpful... Strange thing is: when I tried to run it in gdb before I got meaningfull backtraces.
|
If a device could not be opened in volumes.c:read_one_dev(), a btrfs_device instance was allocated and added to the list of devices of the fs - however this device instance had its fd, name and label fields not initialized. This is problematic in disk-io.c:close_all_devices() as it tried to sync, fadvise and close the (invalid) fd of the device, and kfree() its name and label, which pointed to random memory locations. Thread 1 (Thread 0x7f0a3d2d1740 (LWP 23585)): #0 __GI___libc_free (mem=0xa5a5a5a5a5a5a5a5) at malloc.c:2970 #1 0x000000000042054b in close_all_devices (fs_info=0x1e92bf0) at disk-io.c:1276 #2 0x0000000000421dcd in close_ctree (root=<optimized out>) at disk-io.c:1336 #3 0x0000000000418cfa in cmd_check (argc=<optimized out>, argv=<optimized out>) at cmds-check.c:4171 #4 0x0000000000403ed4 in main (argc=2, argv=0x7fff9a583d28) at btrfs.c:295 v2: Added Liu Bo's review mention. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
A mdrestore_struct was being written to without its mutex being held.
This race was found with ThreadSanitizer; the relevant part of the report
looks like this:
WARNING: ThreadSanitizer: data race (pid=18828)
Write of size 8 at 0x7fffffc3d088 by main thread:
#0 build_chunk_tree .../btrfs-progs/btrfs-image.c:2233
#1 __restore_metadump .../btrfs-progs/btrfs-image.c:2294
#2 restore_metadump .../btrfs-progs/btrfs-image.c:2345
#3 main .../btrfs-progs/btrfs-image.c:2545
Previous read of size 8 at 0x7fffffc3d088 by thread T1 (mutexes: write M0):
#0 restore_worker .../btrfs-progs/btrfs-image.c:1636
Location is stack of main thread.
Mutex M0 created at:
#0 pthread_mutex_init ??:0
#1 mdrestore_init .../btrfs-progs/btrfs-image.c:1766
#2 __restore_metadump .../btrfs-progs/btrfs-image.c:2286
#3 restore_metadump .../btrfs-progs/btrfs-image.c:2345
#4 main .../btrfs-progs/btrfs-image.c:2545
Thread T1 (tid=18830, running) created by main thread at:
#0 pthread_create ??:0
#1 mdrestore_init .../btrfs-progs/btrfs-image.c:1784
#2 __restore_metadump .../btrfs-progs/btrfs-image.c:2286
#3 restore_metadump .../btrfs-progs/btrfs-image.c:2345
#4 main .../btrfs-progs/btrfs-image.c:2545
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
When a struct btrfs_fs_devices was being torn down by
btrfs_close_devices(), there was an invalidated pointer in the global
list fs_uuids which still pointed to it; if a device was closed and
then reopened (which btrfs-convert does), freed memory would be
accessed.
This was found using ThreadSanitizer (pretty much doing what
AddressSanitizer would, but not exiting after the first failure).
To reproduce, build with -fsanitize=thread and run 'make test'.
Representative output is below.
This change makes the current tests TSan-clean.
WARNING: ThreadSanitizer: heap-use-after-free (pid=29161)
Read of size 8 at 0x7d180000eee0 by main thread:
#0 memcmp ??:0
#1 find_fsid .../volumes.c:81
#2 device_list_add .../volumes.c:95
#3 btrfs_scan_one_device .../volumes.c:259
#4 btrfs_scan_fs_devices .../disk-io.c:1002
#5 __open_ctree_fd .../disk-io.c:1090
#6 open_ctree_fd .../disk-io.c:1191
#7 do_convert .../btrfs-convert.c:2317
#8 main .../btrfs-convert.c:2745
Previous write of size 8 at 0x7d180000eee0 by main thread:
#0 free ??:0
#1 btrfs_close_devices .../volumes.c:191
#2 close_ctree .../disk-io.c:1401
#3 do_convert .../btrfs-convert.c:2300
#4 main .../btrfs-convert.c:2745
Location is heap block of size 96 at 0x7d180000eee0 allocated by main thread:
#0 calloc ??:0 (exe+0x00000002acc6)
#1 device_list_add .../volumes.c:97
#2 btrfs_scan_one_device .../volumes.c:259
#3 btrfs_scan_fs_devices .../disk-io.c:1002
#4 __open_ctree_fd .../disk-io.c:1090
#5 open_ctree_fd .../disk-io.c:1191
#6 do_convert .../btrfs-convert.c:2256
#7 main .../btrfs-convert.c:2745
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Running "btrfsck --repair /dev/sdd2" crashed as it can happen in (corrupted) file systems, that slot > nritems: > (gdb) bt full > #0 0x00007ffff7020e71 in __memmove_sse2_unaligned_erms () from /lib/x86_64-linux-gnu/libc.so.6 > #1 0x0000000000438764 in btrfs_del_ptr (trans=<optimized out>, root=0x6e4fe0, path=0x1d17880, level=0, slot=7) > at ctree.c:2611 > parent = 0xcd96980 > nritems = <optimized out> > __func__ = "btrfs_del_ptr" > #2 0x0000000000421b15 in repair_btree (corrupt_blocks=<optimized out>, root=<optimized out>) at cmds-check.c:3539 > key = {objectid = 77990592512, type = 168 '\250', offset = 16384} > trans = 0x8f48c0 > path = 0x1d17880 > level = 0 > #3 check_fs_root (wc=<optimized out>, root_cache=<optimized out>, root=<optimized out>) at cmds-check.c:3703 > corrupt = 0x1d17880 > corrupt_blocks = {root = {rb_node = 0x6e80c60}} > path = {nodes = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, slots = {0, 0, 0, 0, 0, 0, 0, 0}, locks = {0, 0, > 0, 0, 0, 0, 0, 0}, reada = 0, lowest_level = 0, search_for_split = 0, skip_check_block = 0} > nrefs = {bytenr = {271663104, 271646720, 560021504, 0, 0, 0, 0, 0}, refs = {1, 1, 1, 0, 0, 0, 0, 0}} > wret = 215575372 > root_node = {cache = {rb_node = {__rb_parent_color = 0, rb_right = 0x0, rb_left = 0x0}, objectid = 0, > start = 0, size = 0}, root_cache = {root = {rb_node = 0x0}}, inode_cache = {root = { > rb_node = 0x781c80}}, current = 0x819530, refs = 0} > status = 215575372 > rec = 0x1 > #4 check_fs_roots (root_cache=0xcd96b6d, root=<optimized out>) at cmds-check.c:3809 > path = {nodes = {0x6eed90, 0x6a2f40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, slots = {18, 2, 0, 0, 0, 0, 0, 0}, > locks = {0, 0, 0, 0, 0, 0, 0, 0}, reada = 0, lowest_level = 0, search_for_split = 0, > skip_check_block = 0} > key = {objectid = 323, type = 132 '\204', offset = 18446744073709551615} > wc = {shared = {root = {rb_node = 0x0}}, nodes = {0x0, 0x0, 0x7fffffffe428, 0x0, 0x0, 0x0, 0x0, 0x0}, > active_node = 2, root_level = 2} > leaf = 0x6e4fe0 > tmp_root = 0x6e4fe0 > #5 0x00000000004287c3 in cmd_check (argc=215575372, argv=0x1d17880) at cmds-check.c:11521 > root_cache = {root = {rb_node = 0x98c2940}} > info = 0x6927b0 > bytenr = 6891440 > tree_root_bytenr = 0 > uuidbuf = "f65ff1a1-76ef-456e-beb5-c6c3841e7534" > num = 215575372 > readonly = 218080104 > qgroups_repaired = 0 > #6 0x000000000040a41f in main (argc=3, argv=0x7fffffffebe8) at btrfs.c:243 > cmd = 0x689868 > bname = <optimized out> > ret = <optimized out> in that case the count of remaining items (nritems - slot - 1) gets negative. That is then casted to (unsigned long len), which leads to the observed crash. Change the tests before the move to handle only the non-corrupted case, were slow < nritems. This does not fix the corruption, but allows btrfsck to finish without crashing. Signed-off-by: Philipp Hahn <hahn@univention.de> Signed-off-by: David Sterba <dsterba@suse.com>
The status display was reading the state while the task was updating
it. Use a mutex to prevent the race.
This race was detected using ThreadSanitizer and
misc-tests/005-convert-progress-thread-crash.
==================
WARNING: ThreadSanitizer: data race
Write of size 8 by main thread:
#0 ext2_copy_inodes btrfs-progs/convert/source-ext2.c:853
#1 copy_inodes btrfs-progs/convert/main.c:145
#2 do_convert btrfs-progs/convert/main.c:1297
#3 main btrfs-progs/convert/main.c:1924
Previous read of size 8 by thread T1:
#0 print_copied_inodes btrfs-progs/convert/main.c:124
Location is stack of main thread.
Thread T1 (running) created by main thread at:
#0 pthread_create <null>
#1 task_start btrfs-progs/task-utils.c:50
#2 do_convert btrfs-progs/convert/main.c:1295
#3 main btrfs-progs/convert/main.c:1924
SUMMARY: ThreadSanitizer: data race
btrfs-progs/convert/source-ext2.c:853 in ext2_copy_inodes
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Making the code data-race safe requires that reads *and* writes
happen under a mutex lock, if any of the access are writes. See
Dmitri Vyukov, "Benign data races: what could possibly go wrong?"
for more details.
The fix here was to put most of the main loop of restore_worker
under a mutex lock.
This race was detected using fsck-tests/012-leaf-corruption.
==================
WARNING: ThreadSanitizer: data race
Write of size 4 by main thread:
#0 add_cluster btrfs-progs/image/main.c:1931
#1 restore_metadump btrfs-progs/image/main.c:2566
#2 main btrfs-progs/image/main.c:2859
Previous read of size 4 by thread T6:
#0 restore_worker btrfs-progs/image/main.c:1720
Location is stack of main thread.
Thread T6 (running) created by main thread at:
#0 pthread_create <null>
#1 mdrestore_init btrfs-progs/image/main.c:1868
#2 restore_metadump btrfs-progs/image/main.c:2534
#3 main btrfs-progs/image/main.c:2859
SUMMARY: ThreadSanitizer: data race btrfs-progs/image/main.c:1931 in
add_cluster
Signed-off-by: Adam Buchbinder <abuchbinder@google.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
Didn't see this problem in a long time... |
…_info_cache()
This bug is exposed by fsck-test with D=asan, hit by test case 020, with
the following error report:
=================================================================
==10740==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x621000061580 at pc 0x56051f0db6cd bp 0x7ffe170f3e20 sp 0x7ffe170f3e10
READ of size 1 at 0x621000061580 thread T0
#0 0x56051f0db6cc in btrfs_extent_inline_ref_type /home/adam/btrfs/btrfs-progs/ctree.h:1727
#1 0x56051f13b669 in build_roots_info_cache /home/adam/btrfs/btrfs-progs/cmds-check.c:14306
#2 0x56051f13c86a in repair_root_items /home/adam/btrfs/btrfs-progs/cmds-check.c:14450
#3 0x56051f13ea89 in cmd_check /home/adam/btrfs/btrfs-progs/cmds-check.c:14965
#4 0x56051efe75bb in main /home/adam/btrfs/btrfs-progs/btrfs.c:302
#5 0x7f04ddbb0f49 in __libc_start_main (/usr/lib/libc.so.6+0x20f49)
#6 0x56051efe68c9 in _start (/home/adam/btrfs/btrfs-progs/btrfs+0x5b8c9)
0x621000061580 is located 0 bytes to the right of 4224-byte region [0x621000060500,0x621000061580)
allocated by thread T0 here:
#0 0x7f04ded50ce1 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:70
#1 0x56051f04685e in __alloc_extent_buffer /home/adam/btrfs/btrfs-progs/extent_io.c:553
#2 0x56051f047563 in alloc_extent_buffer /home/adam/btrfs/btrfs-progs/extent_io.c:687
#3 0x56051efff1d1 in btrfs_find_create_tree_block /home/adam/btrfs/btrfs-progs/disk-io.c:187
#4 0x56051f000133 in read_tree_block /home/adam/btrfs/btrfs-progs/disk-io.c:327
#5 0x56051efeddb8 in read_node_slot /home/adam/btrfs/btrfs-progs/ctree.c:652
#6 0x56051effb0d9 in btrfs_next_leaf /home/adam/btrfs/btrfs-progs/ctree.c:2853
#7 0x56051f13b343 in build_roots_info_cache /home/adam/btrfs/btrfs-progs/cmds-check.c:14267
#8 0x56051f13c86a in repair_root_items /home/adam/btrfs/btrfs-progs/cmds-check.c:14450
#9 0x56051f13ea89 in cmd_check /home/adam/btrfs/btrfs-progs/cmds-check.c:14965
#10 0x56051efe75bb in main /home/adam/btrfs/btrfs-progs/btrfs.c:302
#11 0x7f04ddbb0f49 in __libc_start_main (/usr/lib/libc.so.6+0x20f49)
It's completely possible that one extent/metadata item has no inline
reference, while build_roots_info_cache() doesn't have such check.
Fix it by checking @iref against item end to avoid such problem.
Issue: #92
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
…y wrong condition to free delayed ref/head.
[BUG]
When btrfs-progs is compiled with D=asan, it can't pass even the very
basic fsck tests due to btrfs-image has memory leak:
=== START TEST /home/adam/btrfs/btrfs-progs/tests//fsck-tests/001-bad-file-extent-bytenr
restoring image default_case.img
=================================================================
==7790==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 104 byte(s) in 1 object(s) allocated from:
#0 0x7f1d3b738389 in __interceptor_malloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:86
#1 0x560ca6b7f4ff in btrfs_add_delayed_tree_ref /home/adam/btrfs/btrfs-progs/delayed-ref.c:569
#2 0x560ca6af2d0b in btrfs_free_extent /home/adam/btrfs/btrfs-progs/extent-tree.c:2155
#3 0x560ca6ac16ca in __btrfs_cow_block /home/adam/btrfs/btrfs-progs/ctree.c:319
#4 0x560ca6ac1d8c in btrfs_cow_block /home/adam/btrfs/btrfs-progs/ctree.c:383
#5 0x560ca6ac6c8e in btrfs_search_slot /home/adam/btrfs/btrfs-progs/ctree.c:1153
#6 0x560ca6ab7e83 in fixup_device_size image/main.c:2113
#7 0x560ca6ab9279 in fixup_chunks_and_devices image/main.c:2333
#8 0x560ca6ab9ada in restore_metadump image/main.c:2455
#9 0x560ca6abaeba in main image/main.c:2723
#10 0x7f1d3b148ce2 in __libc_start_main (/usr/lib/libc.so.6+0x23ce2)
... tons of similar leakage for delayed_tree_ref ...
Direct leak of 96 byte(s) in 1 object(s) allocated from:
#0 0x7f1d3b738389 in __interceptor_malloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cc:86
#1 0x560ca6b7f5fb in btrfs_add_delayed_tree_ref /home/adam/btrfs/btrfs-progs/delayed-ref.c:583
#2 0x560ca6af5679 in alloc_tree_block /home/adam/btrfs/btrfs-progs/extent-tree.c:2503
#3 0x560ca6af57ac in btrfs_alloc_free_block /home/adam/btrfs/btrfs-progs/extent-tree.c:2524
#4 0x560ca6ac115b in __btrfs_cow_block /home/adam/btrfs/btrfs-progs/ctree.c:290
#5 0x560ca6ac1d8c in btrfs_cow_block /home/adam/btrfs/btrfs-progs/ctree.c:383
#6 0x560ca6b7bb15 in commit_tree_roots /home/adam/btrfs/btrfs-progs/transaction.c:98
#7 0x560ca6b7c525 in btrfs_commit_transaction /home/adam/btrfs/btrfs-progs/transaction.c:192
#8 0x560ca6ab92be in fixup_chunks_and_devices image/main.c:2337
#9 0x560ca6ab9ada in restore_metadump image/main.c:2455
#10 0x560ca6abaeba in main image/main.c:2723
#11 0x7f1d3b148ce2 in __libc_start_main (/usr/lib/libc.so.6+0x23ce2)
... tons of similar leakage for delayed_ref_head ...
SUMMARY: AddressSanitizer: 1600 byte(s) leaked in 16 allocation(s).
failed to restore image ./default_case.img
[CAUSE]
Commit c603970 ("btrfs-progs: Add delayed refs infrastructure")
introduces delayed ref infrastructure for free space tree, however the
refcount_dec_and_test() from kernel code is wrongly backported.
refcount_dec_and_test() will return true if the refcount reaches 0.
So kernel code will free the allocated space as expected:
if (refcount_dec_and_test(&ref->refs)) {
kmem_cache_free();
}
However btrfs-progs backport is using the opposite condition:
if (--ref->refs) {
kfree();
}
This will not free the memory for the last user, but for refs >= 2.
Causing both use-after-free and memory leak for any offline write
operation.
[FIX]
Fix the (--ref->refs) condition to (--ref->refs == 0) to fix the
backport error.
Fixes: c603970 ("btrfs-progs: Add delayed refs infrastructure")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG] For certain fuzzed image, `btrfs check` will fail with the following call trace: Checking filesystem on issue_213.raw UUID: 99e50868-0bda-4d89-b0e4-7e8560312ef9 [1/7] checking root items [2/7] checking extents Program received signal SIGABRT, Aborted. 0x00007ffff7c88f25 in raise () from /usr/lib/libc.so.6 (gdb) bt #0 0x00007ffff7c88f25 in raise () from /usr/lib/libc.so.6 #1 0x00007ffff7c72897 in abort () from /usr/lib/libc.so.6 #2 0x00005555555abc3e in run_next_block (...) at check/main.c:6398 #3 0x00005555555b0f36 in deal_root_from_list (...) at check/main.c:8408 #4 0x00005555555b1a3d in check_chunks_and_extents (fs_info=0x5555556a1e30) at check/main.c:8690 #5 0x00005555555b1e3e in do_check_chunks_and_extents (fs_info=0x5555556a1e30) a #6 0x00005555555b5710 in cmd_check (cmd=0x555555696920 <cmd_struct_check>, argc #7 0x0000555555568dc7 in cmd_execute (cmd=0x555555696920 <cmd_struct_check>, ar #8 0x0000555555569713 in main (argc=2, argv=0x7fffffffde70) at btrfs.c:386 [CAUSE] This fuzzed images has a corrupted EXTENT_DATA item in data reloc tree: item 1 key (256 EXTENT_DATA 256) itemoff 16111 itemsize 12 generation 0 type 2 (prealloc) prealloc data disk byte 16777216 nr 0 prealloc data offset 0 nr 0 There are several problems with the item: - Bad item size 12 is too small. - Bad key offset offset of EXTENT_DATA type key represents file offset, which should always be aligned to sector size (4K in this particular case). [FIX] Do extra item size and key offset check for original mode, and remove the abort() call in run_next_block(). And to show off how robust lowmem mode is, lowmem can handle it without any hiccup. With this fix, original mode can detect the problem properly: Checking filesystem on issue_213.raw UUID: 99e50868-0bda-4d89-b0e4-7e8560312ef9 [1/7] checking root items [2/7] checking extents ERROR: invalid file extent item size, have 12 expect (21, 16283] ERROR: errors found in extent allocation tree or chunk allocation [3/7] checking free space cache [4/7] checking fs roots root 18446744073709551607 root dir 256 error root 18446744073709551607 inode 256 errors 62, no orphan item, odd file extent, bad file extent ERROR: errors found in fs roots found 131072 bytes used, error(s) found total csum bytes: 0 total tree bytes: 131072 total fs tree bytes: 32768 total extent tree bytes: 16384 btree space waste bytes: 124774 file data blocks allocated: 0 referenced 0 Issue: #213 Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Su Yue <Damenly_Su@gmx.com> Signed-off-by: David Sterba <dsterba@suse.com>
free_extent_buffer dumps core after failing an assertion. It looks like the extent has the dirty flag set.
Thank you for developing a btrfs repair utility.