Issue 2523 (DEBUG) #2658

Closed
wants to merge 2 commits into from

behlendorf
Contributor

These are debugging patches designed to help determine what is happening in issue #2523.

@behlendorf
Contributor Author

@nedbass, could you please review these debug patches? I'd like to merge them to master so we can start ruling out some of the possible causes of #2523.

From the information gathered about issue openzfs#2523 it's clear that
somehow the spin lock is being damaged.  This could occur if the
spin lock is reinitialized, if the memory is accidentally overwritten,
or if it is freed back to the cache too soon.  To determine exactly
what is happening, this patch adds a few new sanity checks.

* A zio->io_magic field is added before the io_lock.  This field
  is designed to act as a red zone, allowing us to detect if the
  zio has been overwritten.

* The zio->io_magic field is also used to detect if the
  constructor or destructor is somehow running multiple times for
  the same object.  This would effectively cause the spin lock to
  be reinitialized.

* The destructor has been updated to poison the entire structure.
  This should cause us to quickly detect any use-after-free bugs.

Once the root cause of this issue has been determined, this patch
should be reverted.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2523
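To make the three checks above concrete, here is a minimal user-space sketch in C. It is not the code from this patch: `debug_zio_t`, `DEBUG_ZIO_MAGIC`, `DEBUG_ZIO_POISON` and the helper functions are hypothetical, `calloc()` stands in for a kmem cache allocation, and a `pthread_mutex_t` stands in for the kernel spin lock. The magic value sits directly before the lock, so a stray write running forward through the structure clobbers it first; the constructor refuses an object that already carries the magic value, and the destructor poisons the whole structure.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical values, chosen for illustration only. */
#define	DEBUG_ZIO_MAGIC		0x210210210210210ULL
#define	DEBUG_ZIO_POISON	0x5a

typedef struct debug_zio {
	uint64_t	io_magic;	/* red zone placed directly before the lock */
	pthread_mutex_t	io_lock;	/* stand-in for the kernel spin lock */
	int		io_state;
} debug_zio_t;

/* Constructor: a live object already carrying the magic means a double construct. */
static int
debug_zio_cons(debug_zio_t *zio)
{
	assert(zio->io_magic != DEBUG_ZIO_MAGIC);
	memset(zio, 0, sizeof (*zio));
	zio->io_magic = DEBUG_ZIO_MAGIC;
	return (pthread_mutex_init(&zio->io_lock, NULL));
}

/* Destructor: check the magic, then poison everything to expose use-after-free. */
static void
debug_zio_dest(debug_zio_t *zio)
{
	/* A second destructor call finds poison here instead of the magic value. */
	assert(zio->io_magic == DEBUG_ZIO_MAGIC);
	pthread_mutex_destroy(&zio->io_lock);
	memset(zio, DEBUG_ZIO_POISON, sizeof (*zio));
}

/* Per-use check: a damaged red zone means the zio was overwritten or freed. */
static int
debug_zio_valid(const debug_zio_t *zio)
{
	return (zio->io_magic == DEBUG_ZIO_MAGIC);
}

/* Stand-in for a cache allocation: fresh zeroed memory, then construct. */
static debug_zio_t *
debug_zio_create(void)
{
	debug_zio_t *zio = calloc(1, sizeof (*zio));

	if (zio != NULL && debug_zio_cons(zio) != 0) {
		free(zio);
		zio = NULL;
	}
	return (zio);
}
```

The `assert()` checks disappear when compiled with `NDEBUG`, so a debugging build like this one must leave it undefined.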

There is a plausible cache concurrency issue in zio_wait().  It
might occur because zio->io_waiter is not assigned under a lock in
zio_wait(), is not checked under a lock in zio_done(), and the zio
may be dispatched to another thread for handling.  That said, none
of the actual crash dumps I've looked at show that this has ever
occurred.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2523
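A minimal user-space sketch of the locking discipline this commit message describes, using pthreads. The diff itself is not reproduced here: `sketch_zio_t` and its functions are hypothetical, `io_waiter` is reduced to a flag, and the zio is "dispatched" by running its completion on a newly created thread. The waiter is published under `io_lock` before dispatch, and the completion path sets a done flag and checks for a waiter under the same lock, so the wakeup cannot be lost regardless of which thread runs first.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct sketch_zio {
	pthread_mutex_t	io_lock;
	pthread_cond_t	io_cv;
	bool		io_waiter;	/* hypothetical: set once zio_wait() will block */
	bool		io_executed;	/* set by the completion path */
} sketch_zio_t;

static void
sketch_zio_init(sketch_zio_t *zio)
{
	pthread_mutex_init(&zio->io_lock, NULL);
	pthread_cond_init(&zio->io_cv, NULL);
	zio->io_waiter = false;
	zio->io_executed = false;
}

/* Completion path, run on another thread: the waiter is checked under the lock. */
static void *
sketch_zio_done(void *arg)
{
	sketch_zio_t *zio = arg;

	pthread_mutex_lock(&zio->io_lock);
	zio->io_executed = true;
	if (zio->io_waiter)
		pthread_cond_broadcast(&zio->io_cv);
	pthread_mutex_unlock(&zio->io_lock);
	return (NULL);
}

/* Publish the waiter under the lock, dispatch the zio, then sleep until done. */
static int
sketch_zio_wait(sketch_zio_t *zio)
{
	pthread_t tid;

	pthread_mutex_lock(&zio->io_lock);
	zio->io_waiter = true;
	pthread_mutex_unlock(&zio->io_lock);

	if (pthread_create(&tid, NULL, sketch_zio_done, zio) != 0)
		return (-1);

	pthread_mutex_lock(&zio->io_lock);
	while (!zio->io_executed)
		pthread_cond_wait(&zio->io_cv, &zio->io_lock);
	pthread_mutex_unlock(&zio->io_lock);

	return (pthread_join(tid, NULL));
}
```

Re-checking `io_executed` in a `while` loop under the lock covers both spurious wakeups and the case where the zio completes before the waiter ever sleeps.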
@behlendorf
Contributor Author

Closing; the real root cause has been identified.

@behlendorf behlendorf closed this Dec 19, 2014