Skip to content

Conversation

janjust
Copy link
Contributor

@janjust janjust commented Oct 9, 2019

Do not add private contexts to active list, they need only be visible to user

Signed-off-by: Tomislav Janjusic tomislavj@mellanox.com
Signed-off-by: Artem Y. Polyakov artemp@mellanox.com

@janjust janjust requested review from artpol84 and brminich October 9, 2019 16:32
@janjust
Copy link
Contributor Author

janjust commented Oct 9, 2019

@jladd-mlnx fyi

@jladd-mlnx jladd-mlnx added the bug label Oct 9, 2019
@jladd-mlnx jladd-mlnx added this to the v4.0.3 milestone Oct 9, 2019
Copy link
Member

@brminich brminich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was the race?

@@ -698,6 +698,11 @@ int mca_spml_ucx_ctx_create(long options, shmem_ctx_t *ctx)
opal_progress_register(spml_ucx_ctx_progress);
}

if (options & SHMEM_CTX_PRIVATE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: could avoid code duplication if handle if (!(options & SHMEM_CTX_PRIVATE)) instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hppritcha
Copy link
Member

bot:ompi:retest

@artpol84
Copy link
Contributor

@brminich the race is when the private context that is not lock-protected is being progressed bo some other thread as part of opal progress while the thread that owns this private context is accessing it.

@artpol84
Copy link
Contributor

Per discussion with @manjugv:
A private context should only be progressed by the thread that has created it. This allows avoiding locking around it to get the best performance.
This PR excludes private contexts from being included in the list of contexts to be processed during opal progress so it won't be implicitly progressed by an arbitrary thread calling progress.

@janjust
Copy link
Contributor Author

janjust commented Oct 29, 2019

PR isn't ready yet, don't merge

@janjust janjust force-pushed the master branch 2 times, most recently from 07bb274 to 81c2e23 Compare November 21, 2019 22:25
@janjust janjust requested review from artpol84 and brminich November 21, 2019 22:26
@artpol84
Copy link
Contributor

@brminich please have a look.
This PR addresses 2 issues now:

  • Race condition for PRIV contexts
  • Memory leak when a large number of contexts is created/destroyed. Now we reuse released contexts when possible.

1) Race condition: Do not add private contexts to active list.
Private contexts are only visible to the user.
2) Recycled contexts: Destroyed contexts are put on an idle list until
finalize, continuous context creation will lead to oom condition.
Instead, check if context from idle list meets new context requirements
and reuse it.

Co-authored with: Artem Y. Polyakov <artemp@mellanox.com>,
                  Manjunath Gorentla Venkata <manjunath@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
@janjust
Copy link
Contributor Author

janjust commented Nov 23, 2019

@artpol84 pushed the latest, and tested

@hppritcha
Copy link
Member

@janjust is this PR ready to merge now?

@janjust
Copy link
Contributor Author

janjust commented Nov 25, 2019

@hppritcha yes

@hppritcha hppritcha removed this from the v4.0.3 milestone Nov 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants