Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dnode_sync is careless with range tree #10823

Merged
merged 1 commit into from
Aug 27, 2020
Merged

Conversation

pfmooney
Copy link
Contributor

Because dnode_sync_free_range() must drop dn_mtx during its processing,
using it as a callback to range_tree_vacate() is not safe. No other
operations (besides destroy) are allowed once range_tree_vacate() has
begun, and dropping dn_mtx would leave a window open for another thread
to observe that invalid (and unsafe) state via dnode_block_freed().

Signed-off-by: Patrick Mooney pmooney@oxide.computer
Closes #10708

Motivation and Context

While running a specific workload under a DEBUG illumos kernel, we observed panics in dnode_block_freed() as it accessed what was then an invalid zfs btree. This btree was undergoing a range_tree_vacate() operation from another thread, which had dropped dn_mtx on the dnode while it synced the free state. This is written up in #10708 and illumos #13034.

Description

Since the underlying implementation of range_tree_vacate() states that while the vacate is under way, no other operations are valid on the data structure, using a callback to perform the sync as part of vacate is impossible to do safely. Instead a separate range_tree_walk() call can perform the syncs (which drop the dn_mtx) followed by a callback-less range_tree_vacate() call which empties that element of dn_free_ranges[] under the safety of dn_mtx.

How Has This Been Tested?

This bug was only observed under very specific circumstances (a HVM workload atop a zvol, while running a DEBUG kernel). Re-running the workload was a fairly reliable reproducer, so the absence of a crash when running a DEBUG kernel featuring the fix was taken as a positive result.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain Signed-off-by.

Because dnode_sync_free_range() must drop dn_mtx during its processing,
using it as a callback to range_tree_vacate() is not safe.  No other
operations (besides destroy) are allowed once range_tree_vacate() has
begun, and dropping dn_mtx would leave a window open for another thread
to observe that invalid (and unsafe) state via dnode_block_freed().

Signed-off-by: Patrick Mooney <pmooney@oxide.computer>
Closes openzfs#10708
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Aug 26, 2020
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Aug 26, 2020
@ikozhukhov
Copy link
Contributor

i hope it'll fix my one old known panic too

@codecov
Copy link

codecov bot commented Aug 27, 2020

Codecov Report

Merging #10823 into master will increase coverage by 0.09%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10823      +/-   ##
==========================================
+ Coverage   79.69%   79.78%   +0.09%     
==========================================
  Files         395      395              
  Lines      125044   125045       +1     
==========================================
+ Hits        99654    99773     +119     
+ Misses      25390    25272     -118     
Flag Coverage Δ
#kernel 80.40% <100.00%> (+0.08%) ⬆️
#user 64.99% <100.00%> (-1.18%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
module/zfs/dnode_sync.c 93.58% <100.00%> (+0.01%) ⬆️
module/zstd/zfs_zstd.c 80.74% <0.00%> (-4.28%) ⬇️
module/zfs/vdev_rebuild.c 93.26% <0.00%> (-4.14%) ⬇️
module/zfs/vdev_indirect_mapping.c 96.61% <0.00%> (-2.42%) ⬇️
module/zfs/vdev_removal.c 94.28% <0.00%> (-2.41%) ⬇️
module/zfs/zio_compress.c 92.72% <0.00%> (-1.82%) ⬇️
module/zfs/zrlock.c 95.08% <0.00%> (-1.64%) ⬇️
module/zfs/metaslab.c 94.35% <0.00%> (-1.49%) ⬇️
module/zfs/zap_micro.c 85.35% <0.00%> (-1.26%) ⬇️
module/zfs/vdev_mirror.c 94.07% <0.00%> (-1.12%) ⬇️
... and 51 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b596585...188e839. Read the comment docs.

@behlendorf behlendorf merged commit 8d42c98 into openzfs:master Aug 27, 2020
@behlendorf behlendorf added this to Backport for OpenZFS 2.0 in OpenZFS 2.0 Aug 27, 2020
behlendorf pushed a commit that referenced this pull request Aug 27, 2020
Because dnode_sync_free_range() must drop dn_mtx during its processing,
using it as a callback to range_tree_vacate() is not safe.  No other
operations (besides destroy) are allowed once range_tree_vacate() has
begun, and dropping dn_mtx would leave a window open for another thread
to observe that invalid (and unsafe) state via dnode_block_freed().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Patrick Mooney <pmooney@oxide.computer>
Closes #10708
Closes #10823
@behlendorf behlendorf moved this from Backport for OpenZFS 2.0 to FreeBSD Features (In Progress) in OpenZFS 2.0 Aug 28, 2020
@behlendorf behlendorf moved this from FreeBSD Features (In Progress) to Done in OpenZFS 2.0 Aug 30, 2020
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
Because dnode_sync_free_range() must drop dn_mtx during its processing,
using it as a callback to range_tree_vacate() is not safe.  No other
operations (besides destroy) are allowed once range_tree_vacate() has
begun, and dropping dn_mtx would leave a window open for another thread
to observe that invalid (and unsafe) state via dnode_block_freed().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Patrick Mooney <pmooney@oxide.computer>
Closes openzfs#10708 
Closes openzfs#10823
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
Because dnode_sync_free_range() must drop dn_mtx during its processing,
using it as a callback to range_tree_vacate() is not safe.  No other
operations (besides destroy) are allowed once range_tree_vacate() has
begun, and dropping dn_mtx would leave a window open for another thread
to observe that invalid (and unsafe) state via dnode_block_freed().

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Patrick Mooney <pmooney@oxide.computer>
Closes openzfs#10708 
Closes openzfs#10823
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
No open projects
OpenZFS 2.0
  
Done
Development

Successfully merging this pull request may close these issues.

dnode_sync is careless with range tree
4 participants