ASSERT at zdb.c:3715:load_concrete_ms_allocatable_trees() #7672

Closed
dioni21 opened this issue Jul 2, 2018 · 9 comments
dioni21 commented Jul 2, 2018

System information

Distribution Name | Fedora
Distribution Version | 27
Linux Kernel | 4.16.16-200.fc27.x86_64
Architecture | x86_64
ZFS Version | zfs-0.7.9-1.fc27.x86_64 (from yum repo)
SPL Version | spl-0.7.9-1.fc27.x86_64

Describe the problem you're observing

I am debugging a problem while copying a whole pool to a new drive using zfs send/recv. Some files on the receiving side have different checksums.

While running zdb -cccv with the installed version, I got a segmentation fault. So I tried a newer version and compiled zdb from the master repo (commit e03a41a). Now I get an assertion failure instead.
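
(For reference, a build from the git tree looks roughly like this — a sketch, not the exact commands used here; 0.7-era sources may also need a matching SPL tree passed to configure via --with-spl:)

git clone https://github.com/zfsonlinux/zfs.git
cd zfs && ./autogen.sh && ./configure && make -j"$(nproc)"
./cmd/zdb/zdb -cccvv tank    # run the in-tree zdb (a libtool wrapper, hence the "lt-zdb" seen in the logs)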

Describe how to reproduce the problem

./zdb -cccvv tank

Traversing all blocks to verify checksums and verify nothing leaked ...

loading concrete vdev 0, metaslab 69 of 145 ...space_map_load(msp->ms_sm, msp->ms_allocatable, maptype) == 0 (0x5 == 0x0)
ASSERT at zdb.c:3715:load_concrete_ms_allocatable_trees()Aborted (core dumped)

Include any warning/errors/backtraces from the system logs

Nothing useful:

Jul  2 13:12:52 nexus systemd[1]: Started Process Core Dump (PID 21018/UID 0).
Jul  2 13:12:52 nexus systemd-coredump[21020]: Resource limits disable core dumping for process 31773 (lt-zdb).
Jul  2 13:12:52 nexus systemd-coredump[21020]: Process 31773 (lt-zdb) of user 0 dumped core.
Jul  2 13:12:52 nexus abrt-dump-journal-core[17302]: Failed to obtain all required information from journald
Jul  2 13:12:52 nexus abrt-dump-journal-core[17302]: Failed to save detect problem data in abrt database

How can I help further?

dioni21 commented Jul 5, 2018

Running zdb -cccvvAAA gets past this point but eventually dies with SIGSEGV.
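
(Each -A relaxes a failure mode in zdb: -A ignores assertion failures, -AA enables panic recovery, and -AAA does both, so a run like the one below can limp past the ASSERT until the later crash:)

./zdb -AAA -cccvv tank    # -A: ignore assertions, -AA: panic recovery, -AAA: both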

No errors from the operating system.

No errors were found after two scrub passes.

rincebrain commented:

@dioni21 Mixing and matching userland and kernel versions is going to produce exciting results. I would suggest that, if you want to try a git version, you purge all traces of the old SPL/ZFS packages and then install the git build.
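
(A rough sketch of what that purge could look like on Fedora — the package names are assumptions and depend on how ZFS was installed:)

dnf remove zfs zfs-dkms libzfs2 libzpool2 spl spl-dkms    # drop the packaged 0.7.9 userland and kernel modules
find /lib/modules -name 'zfs.ko*' -o -name 'spl.ko*'      # check that no stale modules are left behind
cat /sys/module/zfs/version                               # after loading the git build, confirm the running module version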

What do you mean by "have differing checksums"? According to e.g. md5sum, or zdb examining the affected files, or ...?

You haven't included anything about the source pool layout, or the properties on the datasets, or even the arguments for running send|recv.

dioni21 commented Jul 5, 2018

@rincebrain Thanks for your answer.

Using a mixed zdb was a last resort after a SIGSEGV with no further info. I know it is not recommended. Since the matching zdb/kernel setup did not hit this assertion, should I maybe close this issue? Sorry about that...

I think I found the reason for this SEGV (the default inflight I/Os setting); I'll file another issue as soon as I confirm it. Since my disks are SATA, every full-disk operation takes a long time.

"differing checksums" => According to md5sum, or, to be more specific, mtree (yes, I'm a FreeBSD user running Linux)

This is a simple home setup. I am upgrading a 2x4TB pool to a 2x10TB pool, both configured as simple mirrors, and both with log and cache on SSD LVM partitions. There are about 11 datasets, some with dedup, some with compression, some without.

Also, I think the file corruption was caused by using zfs send with the full set of options (--dedup --large-block --replicate --embed --compressed --props). I've seen a previous bug with this configuration, but it is marked as resolved.
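
(In other words, a replication of roughly this shape — the pool, dataset, and snapshot names here are illustrative, not the exact commands used:)

zfs snapshot -r tank@migrate
zfs send --dedup --large-block --replicate --embed --compressed --props tank@migrate | zfs recv -Fu newtank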

rincebrain commented:

@dioni21 Which bug did you see with this config?

The only ones I can think of involving misbehavior with send|recv are #6224 or #4809; the former shouldn't happen if you specify -L, as I understand it, and the latter should be mitigated by the default-on tunable on platforms from 2017 on.

dioni21 commented Jul 5, 2018

@rincebrain If I understood correctly, #6224 does not apply to my setup (0.7.9). That may explain why I got errors even without any zfs send options. I'll try with only --large-block --replicate --props as soon as the current zdb run finishes.

#4809 looks exactly like my problem; I'm not sure if it is the one I read about before. feature@hole_birth is active on both my pools. The source pool is very old, but the destination pool was just created, under 0.7.9. Also, hole_birth handling is currently disabled via module parameters:

/sys/module/zfs/parameters/ignore_hole_birth:1
/sys/module/zfs/parameters/send_holes_without_birth_time:1

Should I worry?

rincebrain commented:

@dioni21 If the source doing the sending has either of those tunables set, #4809 shouldn't happen. What makes you think it's #4809 and not some other kind of data mangling? Have you looked to see whether the affected files are the same every time you send/recv, and how they differ between src and dst using e.g. zdb?
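
(For example, something along these lines — the paths and object numbers are placeholders; on ZFS the inode number reported by ls -i is the object number zdb expects:)

md5sum /tank/data/bigfile /newtank/data/bigfile    # confirm the copies really differ
ls -i /tank/data/bigfile /newtank/data/bigfile     # object numbers (they will differ between the two pools)
zdb -ddddd tank/data <src-object#>                 # dump block pointers for the object on the source
zdb -ddddd newtank/data <dst-object#>              # ...and on the destination, then diff the two dumps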

dioni21 commented Jul 5, 2018

@rincebrain I do not yet know the reason for the corruption. Still searching.

The zdb failure is what started this issue. Right now I can only use mtree/md5sum and/or rsync -c to check file consistency.

What I already know:

  1. A new copy of the same dataset above with all options except --replicate (not entirely sure about that, though) generated a faulty dataset, but with different corrupted files than the previous copy.
  2. A new copy of one dataset (with dedup and compression) using only --props in zfs send completed without errors.
  3. I checked one large file with errors: only the final part of it differs (similar to Silently corrupted file in snapshots after send/receive #4809?). See the comparison sketch below.

Note that I made these new copies without deleting the previous ones, since the new pool has much more free space. Also, the source pool is still in "production" with all my personal stuff, so its contents are changing as we speak...
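
(The comparison sketch mentioned in item 3 — one way to see where the copies diverge; the paths are illustrative:)

cmp -l /tank/data/bigfile /newtank/data/bigfile | head     # offsets of the first differing bytes
cmp -l /tank/data/bigfile /newtank/data/bigfile | wc -l    # how many bytes differ in total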

dioni21 commented Jul 10, 2018

My latest tests lead me to believe the cause of the corruption is zfs send --dedup. Removing it was enough to copy all datasets with no md5sum mismatches in their contents.
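
(i.e. roughly the working variant — the same flags minus --dedup; names are illustrative as before:)

zfs send --large-block --replicate --embed --compressed --props tank@migrate | zfs recv -Fu newtank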

I opened a new issue, #7703

Now that I could copy all data without corruption, I'll try zdb -ccc again ASAP.

dioni21 commented Jul 10, 2018

@rincebrain I'm closing this issue since, as you pointed out, it could have been caused by mixing kernel and userland binary versions. I'll open another one if I can find more details about the SIGSEGV.

Thanks a lot...
