Conversation

@artpol84
Contributor

For some of our configurations this flag increases the per-process contribution
by ~20%, while it is not currently being used.

The consumer of this flag was the communicator ID calculation logic, but that
was changed in 0bf06de.

Signed-off-by: Artem Polyakov artpol84@gmail.com

@artpol84 artpol84 requested a review from jjhursey February 16, 2018 21:55
@artpol84
Contributor Author

This is a minor enhancement that reduces the modex size by 20% for some of the configurations.
I'd like to hear what others think about this flag and its possible uses.

@bosilca
Member

bosilca commented Feb 16, 2018

We were using it to choose between multiple algorithms for selecting communicator IDs, but that functionality has been disabled for some time.

@artpol84
Contributor Author

Thank you, George. This was my understanding as well.

@hjelmn
Member

hjelmn commented Feb 17, 2018

Yup. I tried to remove it a while ago. If you succeed 👍

@rhc54
Contributor

rhc54 commented Feb 17, 2018

I have no objections - I'd only point out that the percentage is a little misleading. It's a big percentage because we have emptied out the modex, not because the threading level is a large blob of info (I think we pass an int since that is the MPI standard, which means 4 bytes/proc). Still worth removing if it isn't being used.
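
For readers not familiar with the mechanism under discussion, here is a minimal sketch of how a small per-process value like this ends up in the modex via the PMIx client API. The key name "example.threadlevel" and the function name are hypothetical; this is an illustration, not the actual code being removed.

    /* Minimal sketch (illustrative only): contributing a small per-process
     * value to the modex through the PMIx client API. The key name
     * "example.threadlevel" is hypothetical. */
    #include <stdint.h>
    #include <pmix.h>

    static pmix_status_t contribute_thread_level(uint8_t threadlevel)
    {
        pmix_value_t val;
        pmix_status_t rc;

        /* Describe the 1-byte value */
        val.type = PMIX_UINT8;
        val.data.uint8 = threadlevel;

        /* Stage it for the modex exchange */
        rc = PMIx_Put(PMIX_GLOBAL, "example.threadlevel", &val);
        if (PMIX_SUCCESS != rc) {
            return rc;
        }

        /* Push staged data to the local PMIx server; a later PMIx_Fence
         * (not shown) circulates it to the other processes. */
        return PMIx_Commit();
    }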

Member

@jjhursey jjhursey left a comment

I think this is good. Reducing the modex size, even if just by a little per process, is generally a good thing.

One minor change request:
Since you are removing the use of threadlevel_bf in this function, can you remove the declaration:
uint8_t threadlevel_bf;

@artpol84
Contributor Author

artpol84 commented Feb 17, 2018

@rhc54, our measurements showed a 40-byte reduction per process.
The key name alone is 17 bytes, and it is carried in the modex.
Still, 40 bytes is somewhat high, but we haven't looked closer there; it might be additional bufrops overhead.

In our case the modex was reduced from 199 B down to 159 B, which is 20%.
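
To sanity-check the quoted numbers, a trivial calculation (values taken from the comment above; this is just arithmetic, not a measurement tool):

    /* Quick check of the quoted modex sizes: 199 B -> 159 B is a 40 B,
     * roughly 20% reduction. */
    #include <stdio.h>

    int main(void)
    {
        double before = 199.0, after = 159.0;
        printf("saved %.0f B (%.1f%%)\n",
               before - after, 100.0 * (before - after) / before);
        return 0;
    }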

@artpol84
Contributor Author

This is per process.

@artpol84
Contributor Author

We are continuing to look into that, as out of 158 B the UCX endpoint is only 50 B. All the rest is PMIx overhead (200%!!).

@artpol84
Contributor Author

@jjhursey, thank you. Addressed.

@rhc54
Contributor

rhc54 commented Feb 17, 2018

Not sure how it can be PMIx-driven, @artpol84 - PMIx doesn't put anything in the modex. OMPI adds things in various places, and ORTE adds something (on a per-node basis) to tag the operation since multiple fences can be occurring in parallel.

The numbers you describe sound like they are coming from debug buffers - i.e., buffers that are fully described, which means they carry all the extra data identifying data types etc. You might check as there is a considerable size difference when you switch to optimized buffers.
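
To make the described-vs-optimized distinction concrete, here is a self-contained sketch; it is not the actual OPAL/PMIx buffer code, just an illustration of why a fully described buffer inflates a 1-byte value:

    /* Illustration only (NOT the real OPAL/PMIx packing code): a "fully
     * described" pack stores a type tag and a length before the payload,
     * while an "optimized" pack stores only the raw bytes. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    enum { TYPE_UINT8 = 1 };

    /* Described layout: [type:1][len:4][payload:1] = 6 bytes */
    static size_t pack_described(uint8_t *buf, uint8_t value)
    {
        uint32_t len = sizeof(value);
        buf[0] = TYPE_UINT8;
        memcpy(buf + 1, &len, sizeof(len));
        buf[1 + sizeof(len)] = value;
        return 1 + sizeof(len) + sizeof(value);
    }

    /* Optimized layout: [payload:1] = 1 byte */
    static size_t pack_optimized(uint8_t *buf, uint8_t value)
    {
        buf[0] = value;
        return sizeof(value);
    }

    int main(void)
    {
        uint8_t buf[16];
        size_t described = pack_described(buf, 3);
        size_t optimized = pack_optimized(buf, 3);
        printf("described: %zu bytes, optimized: %zu bytes\n",
               described, optimized);
        return 0;
    }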

@artpol84
Contributor Author

Thank you for the hint, @rhc54.
As I said, we haven't yet looked closely at why it was 40 B. We only knew that there were ~20 B of real payload consisting of the key name and its value.
It is quite possible that @karasevb was using a non-optimized build during these measurements. We will need to revisit it.

@artpol84 artpol84 merged commit cd35c49 into open-mpi:master Feb 17, 2018
@artpol84
Contributor Author

artpol84 commented Feb 17, 2018

@rhc54 to clarify here:

  • We were inspecting modex dumps and saw only 2 keys: the thread level and the UCX endpoint. The key names are clearly visible there, since the dump is in both HEX and ASCII (a generic dump helper of this kind is sketched after this list).
  • One thing that also contributed to the modex size was the namespace and proc description.
  • There were also areas of binary data that we haven't yet pinpointed. It is likely that your guess is correct and these are fully described buffers with debug info.
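
The kind of combined hex-and-ASCII dump mentioned in the first bullet can be produced with a few lines of C; this is only a generic helper, not the tool used for these measurements:

    /* Generic hex + ASCII dump helper (illustrative only). Key names in a
     * modex blob show up as readable text in the ASCII column. */
    #include <ctype.h>
    #include <stddef.h>
    #include <stdio.h>

    static void hexdump(const unsigned char *data, size_t len)
    {
        for (size_t off = 0; off < len; off += 16) {
            printf("%08zx  ", off);
            /* hex column */
            for (size_t i = 0; i < 16; i++) {
                if (off + i < len) printf("%02x ", data[off + i]);
                else               printf("   ");
            }
            /* ASCII column */
            printf(" |");
            for (size_t i = 0; i < 16 && off + i < len; i++) {
                unsigned char c = data[off + i];
                putchar(isprint(c) ? c : '.');
            }
            printf("|\n");
        }
    }

    int main(void)
    {
        /* Hypothetical payload: a key name followed by a 1-byte value */
        const unsigned char sample[] = "example.threadlevel\x03";
        hexdump(sample, sizeof(sample));
        return 0;
    }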

@artpol84
Contributor Author

@jsquyres @bwbarrett @jladd-mlnx Do you think we can take it into v3.1?

It seems self-contained and harmless. Though it might violate our rules,
I'd really like to have it in 3.1, so I couldn't resist suggesting it :).

@rhc54
Contributor

rhc54 commented Feb 17, 2018

Okay - not unexpected then. Each payload has to be tagged with the identity of the proc that contributed it, so nothing unusual there. The only thing we could perhaps do is tag the fence as involving only procs from one nspace, and then mark the payload with just the rank.
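
A rough back-of-the-envelope sketch of the saving being suggested; the namespace string is hypothetical and the exact PMIx wire format is ignored:

    /* Sketch: if every proc in the fence belongs to one nspace, the
     * per-payload tag could shrink from "nspace string + rank" to just the
     * rank. The nspace value below is made up for illustration. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *nspace = "example-nspace-1234";

        size_t full_tag  = strlen(nspace) + 1 + sizeof(uint32_t); /* nspace + NUL + rank */
        size_t rank_only = sizeof(uint32_t);                      /* rank alone */

        printf("full tag: %zu bytes, rank-only tag: %zu bytes\n",
               full_tag, rank_only);
        return 0;
    }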

@bwbarrett
Member

@artpol84 I'm fine with these changes in 3.1.

@artpol84
Contributor Author

@bwbarrett great! Will cherry-pick now.
