Fixed a race condition in 'spl_cv_wait' that could potentially cause a thread to miss a broadcast event #324

Merged
lundman merged 1 commit into openzfsonwindows:master from DataCoreSoftware:zfs-449 on Jan 14, 2021

@arun-kv
Contributor

@arun-kv arun-kv commented Jan 12, 2021

Moved mutex_enter above the waiters_count == 1 check in spl_cv_wait. This prevents the current thread from wrongly concluding that it is the only thread waiting for the event when another thread has already acquired the mutex and is about to increment 'waiters_count' a few lines above. That false conclusion could cause a thread to miss a broadcast event.
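To illustrate the idea behind the fix, here is a minimal sketch in portable C with pthreads. It is not the ZFSin code: `cv_model_t`, `cv_model_leave`, and `simulate_broadcast` are hypothetical names, and a pthread mutex stands in for the kernel mutex. It only models the rule the change relies on, namely that the "am I the last waiter?" check reads `waiters_count` while holding the same lock that guards the increment. To keep the result deterministic, the sketch registers all waiters up front and lets the threads only leave.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_WAITERS 64

/* Hypothetical model of a condition variable's bookkeeping (not the ZFSin type). */
typedef struct {
    pthread_mutex_t lock;   /* guards every access to the fields below */
    int waiters_count;      /* threads currently registered as waiting */
    int last_waiter_hits;   /* threads that concluded "I am the last waiter" */
} cv_model_t;

/*
 * Called by a thread that has been woken. The lock is taken BEFORE the
 * waiters_count == 1 check, so the check cannot interleave with another
 * thread that is updating the counter.
 */
static bool cv_model_leave(cv_model_t *cv)
{
    pthread_mutex_lock(&cv->lock);
    assert(cv->waiters_count > 0);
    bool last = (cv->waiters_count == 1);
    cv->waiters_count--;
    if (last)
        cv->last_waiter_hits++;
    pthread_mutex_unlock(&cv->lock);
    return last;
}

static void *woken_thread(void *arg)
{
    (void)cv_model_leave((cv_model_t *)arg);
    return NULL;
}

/*
 * Pretend n threads were waiting and a broadcast woke all of them.
 * Returns how many of them decided they were the last waiter.
 */
static int simulate_broadcast(int n)
{
    cv_model_t cv;
    pthread_t tids[MAX_WAITERS];
    int i, rc;

    assert(n > 0 && n <= MAX_WAITERS);
    rc = pthread_mutex_init(&cv.lock, NULL);
    assert(rc == 0);
    cv.waiters_count = n;       /* all waiters registered before the wake-up */
    cv.last_waiter_hits = 0;

    for (i = 0; i < n; i++) {
        rc = pthread_create(&tids[i], NULL, woken_thread, &cv);
        assert(rc == 0);
    }
    for (i = 0; i < n; i++) {
        rc = pthread_join(tids[i], NULL);
        assert(rc == 0);
    }
    pthread_mutex_destroy(&cv.lock);
    (void)rc;                   /* silence unused warning when NDEBUG is set */
    return cv.last_waiter_hits;
}
```

Under these assumptions, `simulate_broadcast(n)` is expected to return 1 for any valid `n`, because only one thread can observe the counter at 1 while holding the lock. If the check were made before taking the lock, two threads could both read a stale value, which is the kind of misdetection the PR description refers to.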
@imtiazdc
Contributor

@lundman For some background, we ran into an issue where the dump seems to indicate that at least one thread is waiting endlessly, blocking the other threads that depend on it.

I spent some time going through this change with @arun-kv, and this PR looks good to me. Is there any specific test you would like us to run, given that this change is at the core of synchronization in ZFSin?

@lundman
Collaborator

lundman commented Jan 13, 2021

That looks quite reasonable. I suspect there is a further issue: I have had mutex_exit() panic because the mutex was not held. But it happens so infrequently that it has been hard to track down.

@lundman lundman merged commit 979482d into openzfsonwindows:master Jan 14, 2021
3 participants