Relax the definition of SHMEM_TEAM_SHARED #307
davidozog wants to merge 1 commit into openshmem-org:master
Change SHMEM_TEAM_SHARED to include only a subset of the PEs that are accessible via shmem_ptr, not necessarily all of them.
Signed-off-by: David Ozog <david.m.ozog@intel.com>

@davidozog I actually think it might be more straightforward for users if the spec ensures that

Whoops, sorry for the slow response, @minsii. Implementations (like the SOS prototype) may prefer to represent teams using only a (start, stride, size) triplet. As of now, the entire teams proposal supports this design, except for SHMEM_TEAM_SHARED. This does not prevent implementations from including all the accessible PEs if they want to. I hope that the SOS prototype eventually supports arbitrary PE sets, but that would require implementing lookup-table translations between teams and is potentially a bit less efficient.

Thanks for the explanation. I assume that SOS needs to store network-address-related information for each remote PE, and that any communication over

Yes, that's pretty much correct (except when we enable XPMEM or CMA, in which case communication may not go through the network). I was also referring to the efficiency of the

I do not think the cost of

I'm sure it depends on the implementation, but the translation cost may also affect RMA and AMO operations if the implementation needs to use world-based PE numbers. We already know we need table lookups for RKEYs and whatnot; the current rigidity of the teams structure (linear-stride PE triplet) is intended to minimize the overhead of these translations. I would expect that an implementation with an RKEY table indexes this table by world-based PE numbers, so point-to-point operations on a team would need to know the world-based PE number to provide such an index. In that case, the translation overhead would appear in the critical path for low-latency operations. #JustAUserNotAnImplementer

Yes. On the critical path of every RMA and AMO operation (outside the default context), the SOS prototype does a translation to This is a little different than

I believe the lookup-table overhead can be non-negligible on the issuing side (we have to pay that cost in MPI in some cases :-( ). Anyway, I agree that the spec should allow the implementation to define the scope of However, from a user's standpoint, is it confusing that both If the above understanding is correct, I'd think it is better to have a consistent PE scope for both routines.

Initially, the motivation for this relaxation made sense to me. But I'm getting confused. To me, there is enough leeway in the current specification to implement a complete, lookup-table-free PE subset list for all possible configurations on every team.
@davidozog Are you saying that even for RMA and AMO operations performed through I can understand this argument for

@naveen-rn Sorry for the confusion. I don't believe this discussion regarding the performance impact of PE index translation is a strong motivation for the change. To me, it's a pertinent side discussion. The primary motivation for the proposal is that the definition of I think @minsii is correct that implementations do have the freedom to change the behavior of Anyway, regarding Naveen's question:
I'm not saying "you" need to do it, only that SOS currently does it. It's definitely not an absolute requirement.

Either approach is fine with me if I were a user. If we go with the latter, I think the spec should clearly define the expected usage of the two routines to avoid any confusion.

The collectives requirement (overlapping AS and Teams) makes sense.

I suggest that we do a quick straw poll. Please vote for your preference, as follows: 👍 -- SHMEM_TEAM_SHARED and shmem_ptr must match; 👎 -- SHMEM_TEAM_SHARED is a subset of shmem_ptr.

On Nov 20, 2019, at 7:19 AM, James Dinan wrote:
I suggest that we do a quick straw poll. Please vote for your preference, as follows:
👍 -- SHMEM_TEAM_SHARED and shmem_ptr must match (keep the current semantic). This may require an implementation to restrict shmem_ptr to what can be captured in SHMEM_TEAM_SHARED in implementations that are strictly <start, stride, size>.
👎 -- SHMEM_TEAM_SHARED is a subset of shmem_ptr. This provides more flexibility to implementors and users. In most cases, user complexity is opt-in if apps choose to discover more shared memory than is captured by SHMEM_TEAM_SHARED.
I’d go for the (potential) subset; it seems like a bit of future-proofing for interesting architectures.
Tony

I voted

@naveen-rn It sounds like you are advocating for

@jdinan Ah, I see the issue. I'm not sure how to vote. Maybe I should clarify my previous statement with an example. Take an example job configuration where the following PEs share the same node: 0, 1, 7, 8, 100, 101, and all of these PEs can access each other's symmetric heap (SHEAP) using shmem_ptr. As per the options to vote:

The relaxed option can help optimize OpenSHMEM implementations for a use case that I would identify as unusual for this community.

@naveen-rn's example captures my opinion exactly!

Thank you all for the discussion and helpful examples. Looks like we have 3 votes for each option. A split like this doesn't seem to support a change to the existing text. Anybody have a case we should think about that would better motivate the change to SHMEM_TEAM_SHARED <= shmem_ptr?

On Dec 4, 2019, at 3:16 PM, James Dinan wrote:
Thank you all for the discussion and helpful examples. Looks like we have 3 votes for each option. A split like this doesn't seem to support a change to the existing text. Anybody have a case we should think about that would better motivate the change to SHMEM_TEAM_SHARED <= shmem_ptr?
I was mulling over this looking for loopholes and edge cases…so here’s a riff…
SHMEM_TEAM_SHARED does indeed seem to imply PEs where shmem_ptr() returns non-NULL.
I was thinking of a situation where you have some kind of future architecture SoC nodes that contain e.g. federated NUMA islands potentially involving accelerators, so that some PEs can do shmem_ptr() with each other and others can’t, despite being on the same “compute node”.
What kind of team structure would help that?
Tony

@tonycurtis, you raise a good point. During the RMA WG call yesterday, we discussed similar issues with the definition of SHMEM_TEAM_SHARED.
Anyway, I'm not sure where these observations take us, but it makes me leery of defining

On Dec 6, 2019, at 2:02 PM, David Ozog wrote:
@tonycurtis <https://github.com/tonycurtis> - you raise a good point.
During the RMA WG call yesterday, we discussed similar issues with the definition of SHMEM_TEAM_SHARED. Let me capture a few observations we made:
shmem_ptr may not have symmetry with regard to accessibility. For instance, PE 1 may return non-null values from shmem_ptr when passing PE 2, but perhaps PE 2 may return NULL values when passing PE 1. Mustn't we require shmem_ptr to have such symmetry for the definition of SHMEM_TEAM_SHARED to make sense?
shmem_ptr may not return a consistent (null or non-null) value for ALL symmetric objects. For instance, PE 1 might return non-null for object A on PE 2, but NULL for object B on PE 2. In particular, @jdinan <https://github.com/jdinan> says that OpenMPI-SHMEM returns null for objects on the data segment and non-null for objects on the symmetric heap.
We just updated OSSS-UCX to handle self-PE in bss/[ro]data (Thanks @wenblu).
Tony
Closing this PR, which is superseded by #325.
This PR changes SHMEM_TEAM_SHARED to include only a subset of the PEs that are accessible via shmem_ptr, not necessarily all of them. This would better support implementations that track teams with a (start, stride, size) triplet, as opposed to a table/list of PEs.