zdb 2.1.4 segfault on reading corrupted pool #14016

Open
charlesnix opened this issue Oct 11, 2022 · 5 comments
Labels
Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@charlesnix

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | UbuntuMate |
| Distribution Version | 22.04 |
| Kernel Version | 5.15.0-43 |
| Architecture | X86_64 |
| OpenZFS Version | 2.1.4 |

Describe the problem you're observing

A 2-disk mirrored SSD root pool, in operation since 2018, suddenly refuses to import with the error "blkptr is invalid type 102" at several different locations on the mirror. The problem persists with only a single disk of the mirror attached. The pool was created and operated with Ubuntu 18.04, kernel 4.15.0, zfs 0.7.5+. Debugging was undertaken on a 22.04 LiveUSB, which is where the problem described below happens.

zpool import shows mirror available and online
zdb -e -bLAAA poolname shows "Traversing all blocks" then issues error:

5.93G completed (  77MB/s) estimated time remaining: 0hr 12min 25sec        type < ZDB_OT_TOTAL
ASSERT at zdb.c:5178:zdb_count_block()

Segmentation fault (core dumped)

I am not sure how the pool became corrupted in the first place. However, it seems zdb should never segfault on an error, especially with the -AAA option.

From dmesg:

[  681.962123] zdb[18908]: segfault at 2b70 ip 000055f08091a2c3 sp 00007ffe1942a4c0 error 4 in zdb[55f08090f000+1c000]
@charlesnix charlesnix added the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 11, 2022
@ryao
Contributor

ryao commented Oct 12, 2022

The assertion was suppressed. The segfault is from a NULL pointer dereference, which is not always avoidable when something is wrong.
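
The dmesg line in the report is consistent with this. As a rough sketch (the variable names are only illustrative, and the values are copied from that one dmesg line), the fields can be decoded with plain shell arithmetic. It assumes the standard x86 page-fault error code layout, where bit 0 means protection violation, bit 1 means write, and bit 2 means user mode:

```shell
# Values copied from the dmesg segfault line in the report.
fault_addr=0x2b70
ip=0x55f08091a2c3
map_base=0x55f08090f000
map_size=0x1c000
err=4

# Offset of the faulting instruction inside the zdb mapping named by dmesg.
# Note: this is relative to that mapping, not necessarily to the file start.
offset=$(printf '0x%x' $(( $ip - $map_base )))

# Decode the x86 page-fault error code bits.
if [ $(( $err & 1 )) -ne 0 ]; then present="protected"; else present="not-present"; fi
if [ $(( $err & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
if [ $(( $err & 4 )) -ne 0 ]; then mode="user"; else mode="kernel"; fi

echo "instruction offset in mapping: $offset"
echo "fault: $mode-mode $access of a $present page at $fault_addr"
```

A user-mode read of an unmapped page at an address as low as 0x2b70 is what one would expect from reading a field at some offset from a NULL base pointer, although this alone does not say which pointer it was.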

@ryao
Contributor

ryao commented Oct 12, 2022

You might want to try running zdb from master. It is a little more resilient:

git clone https://github.com/openzfs/zfs.git
cd zfs
./autogen.sh && ./configure --enable-debug --enable-debuginfo --with-config=all && make -j$(nproc)
env LD_LIBRARY_PATH=.libs ./.libs/zdb ...

@charlesnix
Author

That was an adventure working from a LiveUSB environment. But I found and installed enough packages to complete the build.

Unfortunately, zdb from master also fails.

root@ubuntu-mate:/home/ubuntu-mate/zfs# env LD_LIBRARY_PATH=.libs ./.libs/zdb -e -bLAAA harppool

Traversing all blocks ...

5.51G completed (  83MB/s) estimated time remaining: 0hr 11min 32sec        type < ZDB_OT_TOTAL
ASSERT at cmd/zdb/zdb.c:5315:zdb_count_block()Segmentation fault (core dumped)

Even if it would build, is this the kind of error in the pool that could be corrected manually?

@ryao
Contributor

ryao commented Oct 12, 2022

I am not sure. You should use gdb to run a zdb built with debug symbols and debugger-friendly optimization (e.g. CFLAGS="-g -Og") so you can get a stack trace from the segmentation fault (gdb will catch it when it happens). We really should be printing a backtrace when we segfault, but for some reason, that is not working at the moment. It is something that I hope to investigate in the near future.

A backtrace should tell us roughly what is causing it to crash. You can run env LD_LIBRARY_PATH=.libs gdb ./.libs/zdb and then, at the gdb prompt, enter run -e -bLAAA harppool so that zdb runs under gdb. When it stops on the segfault, enter bt to print the backtrace.
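
The same steps can also be done non-interactively. The following is a minimal sketch, not something that was run against this pool: `zdb_bt_cmd` is a hypothetical helper that only assembles the command line, using gdb's `-batch`, `-ex`, and `--args` options and the paths from the build instructions earlier in the thread.

```shell
# Hypothetical helper: build a gdb batch command that runs zdb with the
# given arguments and prints a backtrace once the program stops.
zdb_bt_cmd() {
    printf '%s\n' "env LD_LIBRARY_PATH=.libs gdb -batch -ex run -ex bt --args ./.libs/zdb $*"
}

# Assemble the command for the invocation used in this issue.
cmd=$(zdb_bt_cmd -e -bLAAA harppool)
echo "$cmd"

# To actually run it, from the top of the zfs build tree:
#   eval "$cmd" 2>&1 | tee zdb-backtrace.txt
```

With `-batch`, gdb exits after the listed commands finish, so the backtrace ends up in the captured output and can be pasted into the issue.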

@stale

stale bot commented Oct 15, 2023

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Oct 15, 2023