-
Notifications
You must be signed in to change notification settings - Fork 935
UCX support for ompi and oshmem #1008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Adding @bosilca |
|
Need to ensure this works with the new add procs behavior. We need to be able to support the direct modex. As far as I can tell, there is only support for bulk add procs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bosilca Are you ok with adding this member to the ompi_datatype_t?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This addition only affects the OMPI datatypes, everything below (and the real datatype engine is down in OPAL) remains unaffected.
|
@jladd-mlnx UCX itself does not make any assumptions about wire up protocols used by the above layer. Upper layers may use direct/inderect/bulk wire ups. |
|
@shamisp I think @jladd-mlnx was referring to the recent change in the PML add_procs() behavior to allow partial wireup. His question is not related to wire-line protocols. |
|
@jsquyres - I reviewed this code as well already existing code in the master. Our add_proc() behavior is aligned with the add_proc behavior in the master. |
|
@shamisp Ok, sweet. |
|
FWIW, the "OSHMEM/SPML/IKRIT: fixes typo in .h file" commit should be squashed down into the commit that it is fixing before this is merged in. |
|
@jsquyres - sure, we will squash it once we address all the comments. |
b025251 to
e6be425
Compare
|
Test FAILed. |
|
Which version of ucx do I need to compile this code? |
|
@hppritcha https://github.com/yosefe/ucx/tree/topic/nb-tag-matching |
|
I'm not sure if |
|
hello_c timed out after 3 minutes. Does that happen if you --disable-dlopen with ucx? |
|
Hi Jeff. That may be spurious (the hang). |
|
bot:retest |
|
Has anyone configured with --enable-picky? I get lots of warnings with it configured that way. |
|
@rolfv I've fixed the warning in OMPI code, some warnings come from UCX headers, and this is addressed by openucx/ucx#448 |
|
Thanks, things look much better. |
|
Updated to latest ucx and then recompiled Open MPI and I know see this: |
|
ucx probe is not merged yet, we just need to finalize the api |
|
@yosefe Did you commit something to Open MPI that does not yet exist in the downstream UCX API? That does not seem like a good idea. |
|
@jsquyres Yes, i've opened this PR before all code was merged to UCX, to get comments and address them as early as possibly. Obviously, this will not be merged to Open MPI before all UCX code is in place. |
|
@yosefe Ah, got it -- the code in question is just here on the PR, not already in master. Thanks for the clarification. |
Fast path lookup is done in inline funcion.
PML UCX will use it to cache a handle for UCX datatype.
a17394e to
a313588
Compare
|
squashed the various fixes into the commits. |
|
👍 |
|
@yosefe - please integrate this PR into v2.x and v1.10.1 |
UCX support for ompi and oshmem
osc/pt2pt: do no use OPAL_THREAD_ADD64 for lock serial number
@miked-mellanox @shamisp @yosefe @jladd-mlnx
please take a look