
Handle compressed buffers in __dbuf_hold_impl() #6797

Merged
merged 2 commits into openzfs:master on Nov 8, 2017

Conversation

dweeezil (Contributor)

Description

In __dbuf_hold_impl(), if a buffer is currently syncing and is still
referenced from db_data, a copy is made in case it is dirtied again in
the txg. Previously, the buffer for the copy was simply allocated with
arc_alloc_buf() which doesn't handle compressed or encrypted buffers
(which are a special case of a compressed buffer). The result was
typically an invalid memory access because the newly-allocated buffer
was of the uncompressed size.

This commit fixes the problem by handling the 2 compressed cases,
encrypted and unencrypted, respectively, with arc_alloc_raw_buf() and
arc_alloc_compressed_buf().

Although using the proper allocation functions fixes the invalid memory
access by allocating a buffer of the compressed size, another unrelated
issue made it impossible to properly detect compressed buffers in the
first place. The header's compression flag was set to ZIO_COMPRESS_OFF
in arc_write() when it was possible that an attached buffer was actually
compressed. This commit adds logic to only set ZIO_COMPRESS_OFF in
the non-ZIO_RAW case, which will handle both cases of compressed
buffers (encrypted or unencrypted).
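The size distinction at the heart of the fix can be sketched as follows. This is a toy model, not the actual ZFS code: the enum and helper are hypothetical stand-ins for the checks __dbuf_hold_impl() performs before choosing between arc_alloc_raw_buf(), arc_alloc_compressed_buf(), and arc_alloc_buf().

```c
#include <stddef.h>

/*
 * Toy model of the bug: a copy of a syncing buffer must be allocated
 * at the size the data actually occupies.  Compressed buffers (and
 * encrypted ones, which are a special case of compressed) hold psize
 * bytes, not lsize; the pre-fix code always allocated lsize via
 * arc_alloc_buf(), so the copy overran the compressed source.  The
 * type and helper below are illustrative only.
 */
typedef enum { BUF_PLAIN, BUF_COMPRESSED, BUF_ENCRYPTED } buf_kind_t;

static size_t
copy_alloc_size(buf_kind_t kind, size_t psize, size_t lsize)
{
	/*
	 * Encrypted buffers are handled like compressed ones: both are
	 * stored at their physical (psize) size.  In the real fix these
	 * two cases call arc_alloc_raw_buf() and
	 * arc_alloc_compressed_buf() respectively.
	 */
	if (kind == BUF_ENCRYPTED || kind == BUF_COMPRESSED)
		return (psize);
	return (lsize);		/* uncompressed: arc_alloc_buf() */
}
```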

Motivation and Context

Fixes #5742 and possibly other yet-undetected cases.

How Has This Been Tested?

A compressed deduped send stream which previously caused the system to panic was used as the test for this patch. The fix to __dbuf_hold_impl() fixes the out-of-bounds memory access, and the fix to arc_write() causes the received dataset to properly match the original dataset.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

Fixes: openzfs#5742
Signed-off-by: Tim Chase <tim@chase2k.com>
@gmelikov (Member)

I think it would be great to add the test for this case in zfs-tests.

@dinatale2 (Contributor)

@dweeezil I agree with @gmelikov. If possible, please add a test case for this to the test suite. Even if it's for only this specific case, it will set the ground work for others to make similar test cases.

@dweeezil (Contributor, Author)

@dinatale2 and @gmelikov I'm open to ideas as to how to design a test suite component for this. In this case, I have a 50MB send stream saved in a file which was used for testing. I suppose the test suite could try to fabricate the send stream but there would be no guarantee that the send stream would actually trigger the bug in the first place. The bug fixed here is highly timing dependent. I think at some point, the test suite infrastructure ought to have the ability to fetch a known-triggering send stream from somewhere. Maybe we could store them in the zfs-images repository? I'll try to work up a test case script which creates the send stream as I did manually because I think it's the best we can do now.

codecov bot commented Oct 29, 2017

Codecov Report

Merging #6797 into master will decrease coverage by 0.08%.
The diff coverage is 75%.


@@            Coverage Diff             @@
##           master    #6797      +/-   ##
==========================================
- Coverage   75.35%   75.26%   -0.09%     
==========================================
  Files         297      297              
  Lines       94422    94433      +11     
==========================================
- Hits        71151    71077      -74     
- Misses      23271    23356      +85
Flag      Coverage Δ
#kernel   74.69% <75%> (+0.15%) ⬆️
#user     67.55% <75%> (-0.2%) ⬇️

Impacted Files                Coverage Δ
module/zfs/arc.c              89.09% <100%> (+0.19%) ⬆️
module/zfs/dbuf.c             94.71% <72.22%> (-0.26%) ⬇️
cmd/zed/agents/zfs_agents.c   75.78% <0%> (-15.63%) ⬇️
module/zfs/dmu_tx.c           81.67% <0%> (-4.99%) ⬇️
module/zfs/lzjb.c             96.29% <0%> (-3.71%) ⬇️
cmd/zed/agents/zfs_retire.c   77.01% <0%> (-2.49%) ⬇️
module/zfs/spa_stats.c        68.5% <0%> (-1.25%) ⬇️
module/zcommon/zfs_uio.c      91.86% <0%> (-1.17%) ⬇️
module/zfs/vdev_disk.c        69.75% <0%> (-1.07%) ⬇️
module/zfs/spa.c              83.1% <0%> (-1.02%) ⬇️
... and 41 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4ae39a...94219cb.

@behlendorf (Contributor) left a comment:

The fix looks right to me, nice find! Review comments inline.

As for adding a test case, normally I'm all for it. But I'd expect this to be a difficult issue to hit so it may not be worth the complexity. @dweeezil unless you've thought of a clever way to generate a stream which reliably hits this I'm OK with not including a test case for this one.

@@ -2784,15 +2789,45 @@ __dbuf_hold_impl(struct dbuf_hold_impl_data *dh)
dh->dh_dn->dn_object != DMU_META_DNODE_OBJECT &&
dh->dh_db->db_state == DB_CACHED && dh->dh_db->db_data_pending) {
dh->dh_dr = dh->dh_db->db_data_pending;

if (dh->dh_dr->dt.dl.dr_data == dh->dh_db->db_buf) {
behlendorf (Contributor):

For the sake of readability, what do you think about moving the contents of this conditional into a helper function? We can declare it noinline static void to prevent gcc from inlining it and increasing the stack depth. This should also let you avoid having to add an additional member to the dbuf_hold_impl_data struct.
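The pattern being suggested can be sketched like this. Names are hypothetical (the actual refactor landed later as behlendorf/zfs@950e2b1); the point is that a noinline static helper keeps its locals out of the caller's frame, which matters because __dbuf_hold_impl() can recurse through parent dbufs.

```c
/*
 * Sketch of the suggested refactor: the body of the conditional moves
 * into a separate function marked noinline, so gcc cannot inline it
 * back into the (recursive) caller and grow its stack frame.  The
 * names below are illustrative, not the actual patch.
 */
#define	noinline	__attribute__((noinline))

struct dbuf_hold_impl_data;	/* opaque stand-in for the real struct */

static noinline int
dbuf_hold_copy(struct dbuf_hold_impl_data *dh)
{
	/*
	 * Readability-friendly locals (dr, db, a freshly allocated arc
	 * buf, ...) would live here, in a frame that exists only for
	 * the duration of this call, instead of as extra members of
	 * dbuf_hold_impl_data.
	 */
	(void) dh;
	return (0);	/* placeholder so the sketch is checkable */
}
```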

dweeezil (Contributor, Author):

Which conditional are you referring to? In line 2792, both dh_dr and dh_db were already in the dbuf_hold_impl_data struct. I'm not sure how this could be made more readable versus the rest of the function.

behlendorf (Contributor):

@dweeezil sorry I wasn't very clear. What I'm suggesting is pulling this entire block out into a helper function like this: behlendorf/zfs@950e2b1 (untested). As long as the function is marked noinline, we should be able to prevent gcc from inlining it and contributing to the stack depth. Then we can use some local variables to make it more readable. What do you think?

dweeezil (Contributor, Author):

@behlendorf OK, that makes sense. I'll refactor it as you suggest and rerun the tests. For future regression testing, maybe we could stash the send stream I'm using (50MiB) in the zfs-images repo or maybe even make a new repo for send streams.

behlendorf (Contributor):

Sure, that would make sense. Another idea which was floated was to provide a counterpart to zstreamdump which would make it easy to generate interesting streams for testing.

@dweeezil (Contributor, Author)

@behlendorf Regarding a test case, I could give it a try, but as I alluded to in a previous comment, I'm not sure how reliable it would be. My plan for a test case would be to concoct a test which did something similar to what I did to reproduce this in the first place:

  • Create a pool with dedup and compression enabled on the top-level dataset
  • Create a new dataset in the pool
  • Repeatedly copy /usr/bin into different directories in the dataset
    • cp -a /usr/bin /tank/ds/bin1
    • cp -a /usr/bin /tank/ds/bin2
    • ... a bunch more times
  • Snapshot the dataset
  • Produce a compressed deduped send stream from the dataset and receive it to a new dataset

Then compare the received dataset against the original dataset. Using /usr/bin can be a pain because in most distros the /usr/bin directory contains a bunch of "wide" symlinks which aren't handled properly by a simple "diff -ur" between the source and the copy.

Also, as I suggested in a previous comment, there's no guarantee that such a send stream would even trigger the bug so the test wouldn't be completely reliable in the first place.
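For reference, the manual reproduction outlined above might look roughly like the following. Pool name, device, and iteration count are illustrative; it requires root and a scratch disk, and since the bug is timing-dependent there is no guarantee any given run triggers it.

```shell
# Hypothetical repro sketch -- requires root and a scratch device.
zpool create -O compression=on -O dedup=on tank /dev/sdX
zfs create tank/ds
# Populate with many duplicate copies so dedup has something to do.
for i in $(seq 1 10); do cp -a /usr/bin "/tank/ds/bin$i"; done
zfs snapshot tank/ds@snap
# Compressed (-c) deduped (-D) stream, received back into the pool.
zfs send -Dc tank/ds@snap | zfs receive tank/recv
# Compare; symlinks in /usr/bin may need more careful handling
# than a plain diff -ur.
diff -ur /tank/ds /tank/recv
```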

@behlendorf (Contributor)

I'm ok with relying on manual testing for this one.

dweeezil (Contributor, Author) commented Nov 6, 2017

I just re-pushed the branch with @behlendorf's re-factoring and added a comment to the new function. For the moment, the re-factoring is a separate commit to preserve the original patch. They can be squashed before merging.

@behlendorf (Contributor) left a comment:

@dweeezil thanks. It looks like you got really unlucky on the test run with unrelated problems; I've resubmitted the failed builds just to make sure, but this looks good to me. Let's get @grwilson's thoughts on the patch.

@behlendorf (Contributor) left a comment:

Looks good.

@behlendorf behlendorf merged commit 71a24c3 into openzfs:master Nov 8, 2017
@behlendorf (Contributor)

@dweeezil merged, thanks for getting to the root cause. Since this issue impacts the 0.7.x releases, could you also open a new PR against the zfs-0.7-release branch, sans the encryption portion?

@behlendorf behlendorf added this to PR Needed for 0.7.4 in 0.7.4 Nov 8, 2017
dweeezil (Contributor, Author) commented Nov 9, 2017

@behlendorf OK, I'll get an 0.7 version made ASAP.

dweeezil added a commit to dweeezil/zfs that referenced this pull request Nov 13, 2017
In __dbuf_hold_impl(), if a buffer is currently syncing and is still
referenced from db_data, a copy is made in case it is dirtied again in
the txg.  Previously, the buffer for the copy was simply allocated with
arc_alloc_buf() which doesn't handle compressed buffers.  The result was
typically an invalid memory access because the newly-allocated buffer
was of the uncompressed size.

This commit fixes the problem by handling compressed buffers with
arc_alloc_compressed_buf().

Although using the proper allocation functions fixes the invalid memory
access by allocating a buffer of the compressed size, another unrelated
issue made it impossible to properly detect compressed buffers in the
first place.  The header's compression flag was set to ZIO_COMPRESS_OFF
in arc_write() when it was possible that an attached buffer was actually
compressed.  This commit adds logic to only set ZIO_COMPRESS_OFF in
the non-ZIO_RAW case, which will handle compressed buffers.

Signed-off-by: Tim Chase <tim@chase2k.com>
Closes openzfs#5742
Closes openzfs#6797
dweeezil added a commit to dweeezil/zfs that referenced this pull request Nov 13, 2017

Requires-spl: refs/heads/spl-0.7-release
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Nov 21, 2017
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes openzfs#5742
Closes openzfs#6797
@behlendorf behlendorf moved this from PR Needed for 0.7.4 to Merged to 0.7.4 in 0.7.4 Dec 12, 2017
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this pull request Jan 29, 2018
Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this pull request Feb 13, 2018
FransUrbo pushed a commit to FransUrbo/zfs that referenced this pull request Apr 28, 2019
Successfully merging this pull request may close these issues.

BUG: unable to handle kernel paging request at ffffc90029404000