ompi: ompi_mpi_init(): do not export threading level to modex. #4826
Conversation
This is a minor enhancement that reduces the modex size by 20% for some configurations.
We were using it to choose between multiple algorithms for selecting communicator IDs, but that functionality has been disabled for some time.
Thank you, George. This was my understanding as well.
Yup. I tried to remove it a while ago. If you succeed 👍
I have no objections - I'd only point out that the percentage is a little misleading. It's a big percentage because we have emptied out the modex, not because the threading level is a large blob of info (I think we pass an int, since that is the MPI standard, which means 4 bytes/proc). Still worth removing if it isn't being used.
jjhursey left a comment:
I think this is good. Reducing the modex size even if just by a little per-process is generally a good thing.
One minor change request:
Since you are removing the use of threadlevel_bf in this function, can you remove the declaration:
uint8_t threadlevel_bf;
@rhc54, our measurements showed a 40-byte reduction per process. In our case the modex was reduced from 199 B down to 159 B, which is 20%.
This is per proc.
We are continuing to look into this, as out of the 158 B the UCX endpoint is only 50 B. All the rest is PMIx overhead (200%!).
For some of our configurations this flag increases the per-process contribution by ~20% while it is not currently being used. The consumer of this flag was the communicator ID calculation logic, but that was changed in 0bf06de.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>

(force-pushed from b296e2c to b601dd5)
@jjhursey, thank you. Addressed.
Not sure how it can be PMIx-driven, @artpol84 - PMIx doesn't put anything in the modex. OMPI adds things in various places, and ORTE adds something (on a per-node basis) to tag the operation, since multiple fences can be occurring in parallel. The numbers you describe sound like they are coming from debug buffers - i.e., buffers that are fully described, which means they carry all the extra data identifying data types, etc. You might check, as there is a considerable size difference when you switch to optimized buffers.
Thank you for the hint, @rhc54.
@rhc54, to clarify here:
@jsquyres @bwbarrett @jladd-mlnx Do you think we can take this into v3.1? It seems self-contained and harmless, though it might violate our rules.
Okay - not unexpected then. Each payload has to be tagged with the identity of the proc that contributed it, so nothing unusual there. The only thing we could perhaps do is tag the fence as involving only procs from one nspace, and then mark the payload with just the rank.
@artpol84 I'm fine with these changes in 3.1. |
@bwbarrett great! Will cherry-pick now. |