BUG: unable to handle kernel paging request at ffffc90029404000 #5742
Receiving stream from …
@gmelikov is it send or receive related? I already tried master on the recv side; the same bug occurred at the same time.
@hunbalazs IIRC it's receive related.
I double checked, v0.7.0-rc3_72_g298ec40 is actually current master.
@hunbalazs do you still have the send stream lying around? It'd be interesting to see what the record that is causing the fault looks like. Can you patch zstreamdump like this:

```diff
diff --git a/cmd/zstreamdump/zstreamdump.c b/cmd/zstreamdump/zstreamdump.c
index aa594b8..dc1f9d0 100644
--- a/cmd/zstreamdump/zstreamdump.c
+++ b/cmd/zstreamdump/zstreamdump.c
@@ -95,6 +95,8 @@ ssread(void *buf, size_t len, zio_cksum_t *cksum)
 	if ((outlen = fread(buf, len, 1, send_stream)) == 0)
 		return (0);
 
+	fwrite(buf, len, 1, stderr);
+
 	if (do_cksum) {
 		if (do_byteswap)
 			fletcher_4_incremental_byteswap(buf, len, cksum);
```

and then try to receive the send stream again with: …

I'm not sure it's going to work, but if it's the only data remaining from the old pool it's probably worth a try.
@loli10K thank you for the help
@hunbalazs hum, I'm not very good with the shell; maybe remove the "cat >" part? The idea here is to "print" the stream data in human-readable form (via zstreamdump) just before we receive it; hopefully the very last DRR printed is the one causing the fault. We try to do that with the shell by piping the stream through both commands.
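The shape of that pipeline might look something like this (a sketch, not the exact command from the thread; file and dataset names are placeholders, and plain commands stand in for zstreamdump/zfs recv so the example runs anywhere):

```shell
# Hypothetical pipeline (names are placeholders): duplicate the stream
# with tee so one copy can be inspected with zstreamdump while the
# other goes to zfs recv, e.g.:
#   cat backup.zstream | tee >(zstreamdump -v >&2) | zfs recv -v -s -F tank/restore
# POSIX-safe stand-in showing the same shape with plain commands:
printf 'stream-bytes' | tee /tmp/inspect_copy | cat > /tmp/recv_copy
wc -c < /tmp/inspect_copy    # the inspected copy
cat /tmp/recv_copy           # the "received" copy, byte-identical
```

The point of tee is that the stream is consumed only once, so inspection does not disturb what zfs recv sees.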
@loli10K I'm not a shell wizard either 😄
There is no checksum at the end; as far as I can see, other objects have a checksum after the offset line.
@hunbalazs that doesn't look like anything to me, unfortunately; I would have expected a … Can you share the stream that is causing the fault? The next step I'd take to troubleshoot something like this is to get kernel+zfs with debugging symbols, then try some …
@loli10K I compiled everything again with --debug-* and imported everything without a hitch.
@hunbalazs if you feel comfortable, I'd encourage you to try to debug this issue on a live kernel by yourself (I think you can find a lot of tutorials online on how to do it, and hopefully one day on the ZoL wiki https://github.com/zfsonlinux/zfs/wiki/Debugging). The way I usually do this for development is a QEMU box with a serial port I can attach gdb to, so if anything goes wrong I can just bounce it and restart the debugging session. Basically just attach gdb to the virtual machine, load all the symbols, and set up some breakpoints (I'd start with …).

If you don't mind sharing the backup stream, I'd also like to try to debug this issue. It's a real pity we lost the source pool; now we don't know if it's reproducible on the send side.
I'm finding this very easy to reproduce using tag:zfs-0.7.2 built from source on Ubuntu 16.04.3. (And this bug itself is a huge nightmare for me, but that's beside the point.)
For me, also using -L in step 4 makes absolutely no difference; it still crashes. The exact same filesystem sent without -D receives fine (everything as above but no -D; still compressed, etc.). If you try the above and still can't reproduce the problem, ping me and I'll provide more help.
It's failing in
Judging from that, copying a linear 128KB in kernel memory sounds off. Maybe something to do with the ABD changes, e.g. trying to do a linear copy on a non-linear buffer?
@williamstein are you able to provide a disassembly of your
Reason being, I'm trying to work out what the actual args to the
Hi @chrisrd, I'm with @williamstein… and I have to admit I barely know what I'm doing here. The kernel is from Ubuntu, and running this gdb command on the variant with debug symbols gives this:
Hi @haraldschilly, thanks, that's exactly what I was after. However, your disassembly is (practically) identical to my disassembly (using a different kernel version and likely a different compiler version), i.e. neither contains the code sequence from the crash dump. That's a mystery!
ok, wait, maybe there is an inconsistency I'm not aware of |
Scratch that! If you ignore the 'nops' at the start of the function, they match after all. I think that gives me enough to work on; I'll see what I can see. Thanks again!
I was just checking the kernel versions, they're all the same. |
So, looks like there's something wrong with the source address (and/or size), i.e.
Basically:

- The faulting instruction …
- The CR2 register contains the faulting address …
- The previous instruction …

And now it's way past my bedtime! G'night.
@hunbalazs The problem is the difference between compressed size and uncompressed size. The bcopy is trying to copy the uncompressed size for a compressed buffer. The ARC buffer in …
Actually, it looks like an ABD issue after all, although compression likely comes into play, too. |
Also, it seems that in some cases a compressed ARC buffer is in the dirty record, but its header indicates no compression. I've been playing with a fix in dweeezil:issue-5742, but it's as likely as not that the real problem lies well outside the code touched there.

The goal of the patch referenced above is to allocate a new compressed ARC buffer when the dirty record's ARC buffer is compressed. Unfortunately, there's no API for the dbuf layer to determine this, so I added one.

As this patch stands now, it does eliminate the panics; however, there is typically at least one corrupted file after the receive is done, and I suspect it has to do with the mismatch between the compression flag on the ARC buffer and its corresponding header. I'm not sure where to go with this because I'm not really very tuned into the intricacies of the compressed ARC. FWIW, it seems that this bug is only triggered when receiving send streams with both compression and dedup enabled.
Just to clarify, do you mean (a) or (b)? I hope you mean (a) and that (b) is completely irrelevant to this bug. I'm only asking because I'm trying to find a workaround: I'm going to be storing over half a million ZFS streams soon, where some of the data is very dedup-friendly (collected homework assignments in classes), so it would be very good if I could use dedup on them. If completely avoiding ZFS stream compression and instead just piping through lz4 would work, that would be fine for me.
@williamstein I meant (a), but I need to run a few more tests to be sure. That said, it's not clear to me why this is only ever triggered by the act of receiving specifically-formatted send streams as opposed to normal filesystem access. I have a feeling that …

Keep in mind that deduped send streams (zfs send -D) don't really have anything to do with ZFS' dedup facility but are more of a fancy form of compression; a deduped send stream can be received on a filesystem which is not dedup-enabled.
@dweeezil thanks. I also tested a lot yesterday, and (a) seems to be true, at least in all my tests. I'm now storing my zfs streams using the output of …
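A sketch of that workaround (dataset and file names are placeholders; the runnable part uses gzip merely to demonstrate that the pipe shape is lossless, and lz4 works the same way):

```shell
# Hypothetical workaround: keep -D for dedup but compress externally
# instead of using in-stream compression, e.g.:
#   zfs send -R -D rpool@beforesend | lz4 > rpool.zstream.lz4
#   lz4 -dc rpool.zstream.lz4 | zfs recv -v -s -F rpool
# Round-trip demo of the same shape with gzip:
printf 'send-stream-data' | gzip > /tmp/stream.gz
gunzip -c /tmp/stream.gz > /tmp/stream.out
cat /tmp/stream.out
```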
@williamstein That's good information. For my part, I'll not likely have a chance to do more work on this until the coming weekend, during which I'll try to track down how we're getting a compressed ARC buffer with an associated ARC header indicating no compression (which is what I think is causing the corruption I'm seeing even with the WIP patch referenced above).
I worked on this issue with @grwilson at the OpenZFS summit and I believe we've gotten to the root causes. There should be a PR posted within the next couple of days with a fix. |
@dweeezil Thank you very much! |
I tested a complete version of the patch and it now receives the stream properly. After cleaning up the patch, it'll be posted as a PR tomorrow. |
In __dbuf_hold_impl(), if a buffer is currently syncing and is still referenced from db_data, a copy is made in case it is dirtied again in the txg. Previously, the buffer for the copy was simply allocated with arc_alloc_buf(), which doesn't handle compressed or encrypted buffers (which are a special case of a compressed buffer). The result was typically an invalid memory access because the newly-allocated buffer was of the uncompressed size. This commit fixes the problem by handling the 2 compressed cases, encrypted and unencrypted, respectively, with arc_alloc_raw_buf() and arc_alloc_compressed_buf().

Although using the proper allocation functions fixes the invalid memory access by allocating a buffer of the compressed size, another unrelated issue made it impossible to properly detect compressed buffers in the first place. The header's compression flag was set to ZIO_COMPRESS_OFF in arc_write() when it was possible that an attached buffer was actually compressed. This commit adds logic to only set ZIO_COMPRESS_OFF in the non-ZIO_RAW case, which will handle both cases of compressed buffers (encrypted or unencrypted).

Fixes: openzfs#5742
Signed-off-by: Tim Chase <tim@chase2k.com>
In __dbuf_hold_impl(), if a buffer is currently syncing and is still referenced from db_data, a copy is made in case it is dirtied again in the txg. Previously, the buffer for the copy was simply allocated with arc_alloc_buf(), which doesn't handle compressed or encrypted buffers (which are a special case of a compressed buffer). The result was typically an invalid memory access because the newly-allocated buffer was of the uncompressed size. This commit fixes the problem by handling the 2 compressed cases, encrypted and unencrypted, respectively, with arc_alloc_raw_buf() and arc_alloc_compressed_buf().

Although using the proper allocation functions fixes the invalid memory access by allocating a buffer of the compressed size, another unrelated issue made it impossible to properly detect compressed buffers in the first place. The header's compression flag was set to ZIO_COMPRESS_OFF in arc_write() when it was possible that an attached buffer was actually compressed. This commit adds logic to only set ZIO_COMPRESS_OFF in the non-ZIO_RAW case, which will handle both cases of compressed buffers (encrypted or unencrypted).

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #5742
Closes #6797
In __dbuf_hold_impl(), if a buffer is currently syncing and is still referenced from db_data, a copy is made in case it is dirtied again in the txg. Previously, the buffer for the copy was simply allocated with arc_alloc_buf(), which doesn't handle compressed buffers. The result was typically an invalid memory access because the newly-allocated buffer was of the uncompressed size. This commit fixes the problem by handling compressed buffers with arc_alloc_compressed_buf().

Although using the proper allocation function fixes the invalid memory access by allocating a buffer of the compressed size, another unrelated issue made it impossible to properly detect compressed buffers in the first place. The header's compression flag was set to ZIO_COMPRESS_OFF in arc_write() when it was possible that an attached buffer was actually compressed. This commit adds logic to only set ZIO_COMPRESS_OFF in the non-ZIO_RAW case, which will handle compressed buffers.

Signed-off-by: Tim Chase <tim@chase2k.com>
Closes openzfs#5742
Closes openzfs#6797
Requires-spl: refs/heads/spl-0.7-release
System information
Describe the problem you're observing
Saved a pool with the same version (fully upgraded pool):

zfs send -R -D -e -v rpool@beforesend

When I want to recv it back, I get the following BUG:
Describe how to reproduce the problem

1. zfs upgrade with 0.7.0-rc3
2. zfs send -R -D -e -v <pool>@<snap>
3. zfs recv -v -s -F <pool>
Include any warning/errors/backtraces from the system logs