Queries on OpenSHMEM collectives usage #58

naveen-rn · 2016-10-19T18:14:34Z

I just happened to look at the OpenSHMEM usage in the library; it looks like the collectives usage is a little buggy. As per the OpenSHMEM standard, "Every element of this array (here, the pSync array) must be initialized with the value SHMEM_SYNC_VALUE (in C/C++) or SHMEM_SYNC_VALUE (in Fortran) before any of the PEs in the Active set enter the reduction routine."

A random example from the library: in CartesianCommunicator::GlobalSumVector(double *d,int N), it looks like psync is never initialized to SHMEM_SYNC_VALUE.

paboyle · 2016-10-25T22:17:36Z

yes... it is incorrect at present. Thanks.

Hasn't been tested for some time, and fails under OpenSHMEM, and you may have saved me a lot of time debugging this.

Also -- perhaps you can help me. I'm really annoyed that the OpenSHMEM and CraySHMEM argument ordering for shmem_align is reversed. Any comments?

Peter

naveen-rn · 2016-10-26T16:48:22Z

> I'm really annoyed that the OpenSHMEM and CraySHMEM argument ordering for shmem_align is reversed.

It was a bug, and it is fixed now. The fix will be available from CraySHMEM/7.5.1.

This bug went unnoticed because shmem_align is one of the least used routines in CraySHMEM, and there are some fundamental functional differences between OpenSHMEM and CraySHMEM on this routine. By default, the maximum alignment value CraySHMEM allows for shmem_align is 64 bytes. We prefer that users not self-align to anything more than 64 bytes, and attempting anything larger errors out. This is because supporting alignment values greater than 64 bytes would waste too much memory.

That said, if there are any actual use cases that show performance benefits for alignments greater than 64 bytes, we can always look for ways to implement it. Let me know your shmem_align usage and I can look at it.

paboyle · 2016-10-26T21:31:44Z

The alignment is normally the L2 line size, which on Intel is 64 bytes, but I would prefer to support 128 bytes on Power, for example.

See no need to go to a page size for alignment, though, despite prefetches not crossing page boundaries. Rather, page sizes and cache line sizes should both go up after around 45 years....

I honestly think Intel should move to a larger L2 line size, as L2 prefetch overhead gets suppressed by the line size. Issuing an L2 prefetch, an L1 prefetch and a load for each individual 512-bit vector is now ridiculous.

Clearly if the vectors went to 1024 bits, they and you would need >128B align, but... surely the above argument about line touching is true.

Or to put this in more "sexy" Hennessy and Patterson language, there is no way to obtain gain from spatial locality of reference in the memory system when the vector size is equal to the cache line size. :)
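The arithmetic behind that remark can be sketched as follows. A 512-bit vector is 64 bytes, so with a 64-byte line every vector load touches a fresh line, while a 128-byte line would serve two vectors per fetch. The helper name `vectors_per_line` is hypothetical and only there to illustrate the point.

```c
#include <stddef.h>

/* Number of whole vectors that fit in one cache line.
   A result of 1 means each vector load is the first touch of its line,
   so a second access never benefits from the line already being resident. */
size_t vectors_per_line(size_t line_bytes, size_t vector_bits) {
    return line_bytes / (vector_bits / 8);
}
```

For example, vectors_per_line(64, 512) is 1, vectors_per_line(128, 512) is 2, and vectors_per_line(128, 1024) is back to 1.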

naveen-rn · 2016-11-07T03:15:13Z

> That said, if there are any actual use cases that show performance benefits for alignments greater than 64 bytes, we can always look for ways to implement it. Let me know your shmem_align usage and I can look at it.

FYI, shmem_align() in Cray SHMEM is fixed. We can use any value for alignment size.

paboyle · 2017-04-26T08:00:52Z

Just a small comment -- Chris Kelly cleaned up the SHMEM comms recently and made it all work again. Closing, thanks.

paboyle closed this as completed Apr 26, 2017
