Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent race in blkptr_verify against device removal #9112

Merged
merged 1 commit into from Aug 14, 2019

Conversation

pcd1193182
Copy link
Contributor

Motivation and Context

When we check the vdev of the blkptr in zfs_blkptr_verify, we can run into a race condition where that vdev is temporarily unavailable. This happens when a device removal operation and the old vdev_t has been removed from the array, but the new indirect vdev has not yet been inserted.

Description

We hold the spa_config_lock while doing our sensitive verification. To ensure that we don't deadlock, we only grab the lock if we don't have config_writer held. In addition, I had to const the tags of the refcounts and the spa_config_lock arguments.

How Has This Been Tested?

zfs-test, zloop

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

Copy link
Contributor

@sdimitro sdimitro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a correctness standpoint this makes sense to me. I just have a nit and two questions below:
[1] Do you have a stack trace or anything from the original issue that we add in the description of the PR or something, so it is at least a little bit searchable.
[2] Do we know if that change has any performance implication due to any introduced lock contention between zio_reads() that try to grab the lock under different circumstances (e.g. some reads are blocked because they want to grab that lock as writers and wait for all the reads that hold the lock as readers to exit the critical section)...or that's not even possible?

module/zfs/zio.c Show resolved Hide resolved
@pcd1193182
Copy link
Contributor Author

@sdimitro 1 is done, 2 I haven't tested, but it shouldn't be much of an issue because we grab this config lock else elsewhere in the IO path. This is the lowest level config lock, and it shouldn't have much contention on it.

Signed-off-by: Paul Dagnelie <pcd@delphix.com>
@ahrens ahrens added the Status: Code Review Needed Ready for review and testing label Aug 5, 2019
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Aug 13, 2019
@behlendorf behlendorf merged commit dc04a8c into openzfs:master Aug 14, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 24, 2019
When we check the vdev of the blkptr in zfs_blkptr_verify, we can run
into a race condition where that vdev is temporarily unavailable. This
happens when a device removal operation and the old vdev_t has been
removed from the array, but the new indirect vdev has not yet been
inserted.

We hold the spa_config_lock while doing our sensitive verification.
To ensure that we don't deadlock, we only grab the lock if we don't
have config_writer held. In addition, I had to const the tags of the
refcounts and the spa_config_lock arguments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#9112
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 27, 2019
When we check the vdev of the blkptr in zfs_blkptr_verify, we can run
into a race condition where that vdev is temporarily unavailable. This
happens when a device removal operation and the old vdev_t has been
removed from the array, but the new indirect vdev has not yet been
inserted.

We hold the spa_config_lock while doing our sensitive verification.
To ensure that we don't deadlock, we only grab the lock if we don't
have config_writer held. In addition, I had to const the tags of the
refcounts and the spa_config_lock arguments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes openzfs#9112
tonyhutter pushed a commit that referenced this pull request Jan 23, 2020
When we check the vdev of the blkptr in zfs_blkptr_verify, we can run
into a race condition where that vdev is temporarily unavailable. This
happens when a device removal operation and the old vdev_t has been
removed from the array, but the new indirect vdev has not yet been
inserted.

We hold the spa_config_lock while doing our sensitive verification.
To ensure that we don't deadlock, we only grab the lock if we don't
have config_writer held. In addition, I had to const the tags of the
refcounts and the spa_config_lock arguments.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Signed-off-by: Paul Dagnelie <pcd@delphix.com>
Closes #9112
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants