dnode_sync is careless with range tree #10823
Because dnode_sync_free_range() must drop dn_mtx during its processing, using it as a callback to range_tree_vacate() is not safe. No other operations (besides destroy) are allowed once range_tree_vacate() has begun, and dropping dn_mtx would leave a window open for another thread to observe that invalid (and unsafe) state via dnode_block_freed().
Signed-off-by: Patrick Mooney <pmooney@oxide.computer>
Closes openzfs#10708
I hope it'll fix my one old known panic too.
Codecov Report
@@ Coverage Diff @@
## master #10823 +/- ##
==========================================
+ Coverage 79.69% 79.78% +0.09%
==========================================
Files 395 395
Lines 125044 125045 +1
==========================================
+ Hits 99654 99773 +119
+ Misses 25390 25272 -118
Because dnode_sync_free_range() must drop dn_mtx during its processing, using it as a callback to range_tree_vacate() is not safe. No other operations (besides destroy) are allowed once range_tree_vacate() has begun, and dropping dn_mtx would leave a window open for another thread to observe that invalid (and unsafe) state via dnode_block_freed().
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Signed-off-by: Patrick Mooney <pmooney@oxide.computer>
Closes #10708
Closes #10823
Motivation and Context
While running a specific workload under a DEBUG illumos kernel, we observed panics in dnode_block_freed() as it accessed what was then an invalid zfs btree. This btree was undergoing a range_tree_vacate() operation from another thread, which had dropped dn_mtx on the dnode while it synced the free state. This is written up in #10708 and illumos #13034.
Description
Since the underlying implementation of range_tree_vacate() states that while the vacate is under way, no other operations are valid on the data structure, using a callback to perform the sync as part of vacate is impossible to do safely. Instead a separate range_tree_walk() call can perform the syncs (which drop the dn_mtx) followed by a callback-less range_tree_vacate() call which empties that element of dn_free_ranges[] under the safety of dn_mtx.
How Has This Been Tested?
This bug was only observed under very specific circumstances (an HVM workload atop a zvol, while running a DEBUG kernel). Re-running the workload was a fairly reliable reproducer, so the absence of a crash when running a DEBUG kernel featuring the fix was taken as a positive result.
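The walk-then-vacate ordering from the Description can be illustrated with a small single-threaded model. This is only a sketch, not the actual patch: every `toy_*` name is hypothetical, the range tree is reduced to a fixed array, and dn_mtx is modeled as a plain flag rather than a real kernel mutex. The `window_unsafe` field records whether the lock was ever dropped while a vacate was in progress, which is the state another thread could have observed through dnode_block_freed().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SEGS 8

/* Hypothetical callback type: invoked once per segment. */
typedef void toy_func_t(void *arg, uint64_t start, uint64_t size);

/* Toy stand-in for a range tree (assumption: a flat array is enough here). */
typedef struct toy_tree {
    uint64_t start[TOY_MAX_SEGS];
    uint64_t size[TOY_MAX_SEGS];
    size_t count;
    bool vacating;          /* true only while toy_tree_vacate() runs */
} toy_tree_t;

/* Toy stand-in for a dnode; mtx_held models dn_mtx as a simple flag. */
typedef struct toy_dnode {
    bool mtx_held;
    toy_tree_t free_ranges;
    uint64_t synced;        /* total size handed to the sync callback */
    bool window_unsafe;     /* lock was dropped during a vacate */
} toy_dnode_t;

/* Visit every segment; the tree stays valid and unmodified. */
static void toy_tree_walk(toy_tree_t *t, toy_func_t *func, void *arg)
{
    assert(!t->vacating);   /* walking a tree mid-vacate is invalid */
    for (size_t i = 0; i < t->count; i++)
        func(arg, t->start[i], t->size[i]);
}

/* Empty the tree; no other operation is valid until this returns. */
static void toy_tree_vacate(toy_tree_t *t, toy_func_t *func, void *arg)
{
    t->vacating = true;
    for (size_t i = 0; i < t->count; i++) {
        if (func != NULL)
            func(arg, t->start[i], t->size[i]);
    }
    t->count = 0;
    t->vacating = false;
}

/* Sync callback that must drop the lock, as dnode_sync_free_range() does. */
static void toy_sync_free_range(void *arg, uint64_t start, uint64_t size)
{
    toy_dnode_t *dn = arg;
    (void) start;
    dn->mtx_held = false;   /* another thread could take the lock here */
    if (dn->free_ranges.vacating)
        dn->window_unsafe = true;
    dn->mtx_held = true;
    dn->synced += size;
}

/* Old pattern: the lock-dropping callback runs inside the vacate. */
static void toy_sync_unsafe(toy_dnode_t *dn)
{
    dn->mtx_held = true;
    toy_tree_vacate(&dn->free_ranges, toy_sync_free_range, dn);
    dn->mtx_held = false;
}

/* New pattern: walk first, then a callback-less vacate under the lock. */
static void toy_sync_safe(toy_dnode_t *dn)
{
    dn->mtx_held = true;
    toy_tree_walk(&dn->free_ranges, toy_sync_free_range, dn);
    toy_tree_vacate(&dn->free_ranges, NULL, NULL);
    dn->mtx_held = false;
}

/* Populate a toy dnode with two free ranges: [0, 4) and [16, 24). */
static void toy_dnode_init(toy_dnode_t *dn)
{
    dn->mtx_held = false;
    dn->free_ranges = (toy_tree_t){
        .start = { 0, 16 }, .size = { 4, 8 }, .count = 2
    };
    dn->synced = 0;
    dn->window_unsafe = false;
}
```

In this model both variants sync the same ranges and leave the tree empty. Only `toy_sync_unsafe()` sets `window_unsafe`, because its callback releases the lock while the tree is marked as vacating. In `toy_sync_safe()` the lock is still dropped during the walk, but the tree is fully valid at that point.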