oshmem: fix race condition on new contexts #7065
Conversation
@jladd-mlnx FYI
what was the race?
oshmem/mca/spml/ucx/spml_ucx.c
Outdated
@@ -698,6 +698,11 @@ int mca_spml_ucx_ctx_create(long options, shmem_ctx_t *ctx)
        opal_progress_register(spml_ucx_ctx_progress);
    }

    if (options & SHMEM_CTX_PRIVATE) {
minor: you could avoid the code duplication by handling the if (!(options & SHMEM_CTX_PRIVATE)) case instead
Good point. @janjust, you can move
https://github.com/open-mpi/ompi/pull/7065/files#diff-8af288f2560951b1798fcbf1131caa78L701-L703
into the if statement, since
https://github.com/open-mpi/ompi/pull/7065/files#diff-8af288f2560951b1798fcbf1131caa78R702-R703
and https://github.com/open-mpi/ompi/pull/7065/files#diff-8af288f2560951b1798fcbf1131caa78R710-R711
are essentially the same.
bot:ompi:retest
@brminich the race occurs when a private context, which is not lock-protected, is progressed by some other thread as part of opal progress while the thread that owns the private context is accessing it.
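To make the race described above concrete, here is a minimal sketch of the idea behind the fix, not the actual spml_ucx code. It assumes a simplified context type and a fixed-size active list. Every name in it (CTX_PRIVATE, progress_ctx_t, progress_ctx_register, progress_all) is hypothetical and stands in for the real Open MPI structures. The shared progress callback only walks the active list, so a context that is never published there cannot be touched by another thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for SHMEM_CTX_PRIVATE; the real value comes from shmem.h. */
#define CTX_PRIVATE 0x1L
#define MAX_ACTIVE  16

/* Hypothetical, simplified context; the real one wraps UCX workers and endpoints. */
typedef struct progress_ctx {
    long options;
    int  progress_calls;   /* how many times the shared progress loop touched it */
} progress_ctx_t;

/* Contexts that the shared progress callback is allowed to touch. */
static progress_ctx_t *active_list[MAX_ACTIVE];
static size_t active_count = 0;

/* Initialize a context and publish it to the active list unless it is private.
 * Returns 0 on success, -1 if the active list is full. */
static int progress_ctx_register(progress_ctx_t *ctx, long options)
{
    ctx->options = options;
    ctx->progress_calls = 0;

    if (!(options & CTX_PRIVATE)) {
        if (active_count == MAX_ACTIVE) {
            return -1;
        }
        active_list[active_count++] = ctx;
    }
    return 0;
}

/* Stand-in for the progress callback that any thread may run.
 * It only sees contexts on the active list. Returns the number progressed. */
static int progress_all(void)
{
    for (size_t i = 0; i < active_count; i++) {
        active_list[i]->progress_calls++;
    }
    return (int)active_count;
}
```

With this layout a private context is only ever progressed by the thread that created it, which is why it needs no lock. Testing the negated flag also keeps a single registration path, as suggested in the review comment earlier.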
Per discussion with @manjugv:
PR isn't ready yet, don't merge
07bb274 to 81c2e23
@brminich please have a look.
1) Race condition: Do not add private contexts to the active list. Private contexts are only visible to the user.
2) Recycled contexts: Destroyed contexts are put on an idle list until finalize, so continuous context creation will lead to an out-of-memory (OOM) condition. Instead, check whether a context from the idle list meets the new context's requirements and reuse it.

Co-authored with: Artem Y. Polyakov <artemp@mellanox.com>, Manjunath Gorentla Venkata <manjunath@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
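The recycling idea in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the names (recycle_ctx_t, recycle_ctx_release, recycle_ctx_reuse) are hypothetical, the idle list is a fixed-size array, and "meets the new context's requirements" is assumed to mean an exact match on the options flags.

```c
#include <stddef.h>

#define MAX_IDLE 16

/* Hypothetical, simplified context; only the creation options matter here. */
typedef struct recycle_ctx {
    long options;
} recycle_ctx_t;

/* Destroyed contexts wait here until finalize, or until they are reused. */
static recycle_ctx_t *idle_list[MAX_IDLE];
static size_t idle_count = 0;

/* Park a destroyed context on the idle list.
 * Returns 0 on success, -1 if the list is full. */
static int recycle_ctx_release(recycle_ctx_t *ctx)
{
    if (idle_count == MAX_IDLE) {
        return -1;
    }
    idle_list[idle_count++] = ctx;
    return 0;
}

/* Look for an idle context created with the same options (assumed criterion).
 * On a match, remove it from the idle list and return it; otherwise return
 * NULL so the caller knows it has to allocate a fresh context. */
static recycle_ctx_t *recycle_ctx_reuse(long options)
{
    for (size_t i = 0; i < idle_count; i++) {
        if (idle_list[i]->options == options) {
            recycle_ctx_t *ctx = idle_list[i];
            /* Fill the hole with the last entry; order is not preserved. */
            idle_list[i] = idle_list[--idle_count];
            return ctx;
        }
    }
    return NULL;
}
```

A create path would call recycle_ctx_reuse first and allocate only when it returns NULL, so repeated create/destroy cycles stop growing memory.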
@artpol84 pushed the latest, and tested
@janjust is this PR ready to merge now?
@hppritcha yes
Do not add private contexts to the active list; they need only be visible to the user
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
Signed-off-by: Artem Y. Polyakov <artemp@mellanox.com>