Add new FI_ORDER bits for RMA and atomics #5019

Merged 2 commits on May 14, 2019


shefty
Member

@shefty shefty commented May 9, 2019

The new bits allow ordering among RMA operations, or among atomic operations, without also requiring ordering between RMA and atomics, which the current bits do.
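
To make the distinction concrete, here is a minimal sketch in C. The `DEMO_ORDER_*` names and bit positions are placeholders invented for this example, not the values defined in libfabric's headers, and `demo_order_satisfied` is a hypothetical helper, not a libfabric call. It only illustrates why an offer of separate RMA and atomic ordering is not the same as the combined legacy bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits for illustration only; NOT libfabric's actual values. */
#define DEMO_ORDER_WAW        (UINT64_C(1) << 0) /* legacy: RMA and atomics ordered together */
#define DEMO_ORDER_RMA_WAW    (UINT64_C(1) << 1) /* new: RMA write after RMA write */
#define DEMO_ORDER_ATOMIC_WAW (UINT64_C(1) << 2) /* new: atomic write after atomic write */

/*
 * Hypothetical helper: returns true when every ordering bit the
 * application requires is present in the offered mask.
 */
static bool demo_order_satisfied(uint64_t required, uint64_t offered)
{
    return (required & ~offered) == 0;
}
```

Under these assumptions, an application that only needs write-after-write ordering among RMA operations would pass `DEMO_ORDER_RMA_WAW` as `required`, and an offer that orders RMA and atomics independently would satisfy it without promising any ordering between the two classes.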

shefty added 2 commits May 8, 2019 04:05
The current FI_ORDER bits combine RMA and atomic ordering together
under a single set of bits.  However, it's possible for both apps
and providers to have these unordered wrt each other.

Introduce a new set of bits that allow ordering RMA with other RMA
operations, and atomic with other atomics, but keep RMA and atomics
separate.

This does not change the existing bits, but allows for fine-grained
control over ordering.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
The existing RAR, RAW, WAR, and WAW ordering bits automatically
imply the equivalent RMA and ATOMIC ordering.  Enable
those bits in the provider and core.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
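
One way to picture the implication described in the second commit is a helper that widens a mask of legacy bits into the finer-grained ones. This is a sketch under stated assumptions: the `EX_ORDER_*` constants and their bit positions are made up for illustration, `ex_expand_order` is a hypothetical function, and the actual provider and core changes are in the PR's diff, which is not reproduced here.

```c
#include <stdint.h>

/* Placeholder bit positions, for illustration only (not libfabric's values). */
#define EX_ORDER_RAR        (UINT64_C(1) << 0)
#define EX_ORDER_RAW        (UINT64_C(1) << 1)
#define EX_ORDER_WAR        (UINT64_C(1) << 2)
#define EX_ORDER_WAW        (UINT64_C(1) << 3)
#define EX_ORDER_RMA_RAR    (UINT64_C(1) << 8)
#define EX_ORDER_RMA_RAW    (UINT64_C(1) << 9)
#define EX_ORDER_RMA_WAR    (UINT64_C(1) << 10)
#define EX_ORDER_RMA_WAW    (UINT64_C(1) << 11)
#define EX_ORDER_ATOMIC_RAR (UINT64_C(1) << 12)
#define EX_ORDER_ATOMIC_RAW (UINT64_C(1) << 13)
#define EX_ORDER_ATOMIC_WAR (UINT64_C(1) << 14)
#define EX_ORDER_ATOMIC_WAW (UINT64_C(1) << 15)

/*
 * Hypothetical helper: for each legacy bit that is set, also set the
 * RMA-only and atomic-only bits it implies. Bits that are already
 * fine-grained pass through unchanged.
 */
static uint64_t ex_expand_order(uint64_t order)
{
    if (order & EX_ORDER_RAR)
        order |= EX_ORDER_RMA_RAR | EX_ORDER_ATOMIC_RAR;
    if (order & EX_ORDER_RAW)
        order |= EX_ORDER_RMA_RAW | EX_ORDER_ATOMIC_RAW;
    if (order & EX_ORDER_WAR)
        order |= EX_ORDER_RMA_WAR | EX_ORDER_ATOMIC_WAR;
    if (order & EX_ORDER_WAW)
        order |= EX_ORDER_RMA_WAW | EX_ORDER_ATOMIC_WAW;
    return order;
}
```

The expansion only goes one way in this sketch: a fine-grained bit never turns on a legacy bit, because the legacy bit additionally promises ordering between RMA and atomics.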
@shefty
Member Author

shefty commented May 9, 2019

@j-xiong @a-ilango @aingerson @mblockso @vkrishna -- These changes touch most of the providers.

@hppritcha - This doesn't change the gni provider. I only see SAS support listed in that provider. But maybe the new bits apply where the old ones didn't fit?

@shefty
Member Author

shefty commented May 9, 2019

@pkcoff - adding Paul for BGQ updates

@mblockso
Contributor

mblockso commented May 9, 2019

BGQ code looks ok to me.

Contributor

@arn314 arn314 left a comment

verbs, rxm changes look okay.

@shefty
Member Author

shefty commented May 9, 2019

The Windows build is having issues connecting to GitHub. The tests that do connect work.

This is from the Intel CI:

 Starting test 83-1-1: [shm, latency, sendmsg--, FI_EP_RDM, FI_AV_UNSPEC, eq_wait_none, cq_wait_none, cntr_wait_none, comp_cntr -- tx (bind-FI_SELECTIVE_COMPLETION, op-FI_COMPLETION), rx: (bind-NONE, op-NONE),  FI_PROGRESS_MANUAL, [], [], [FI_MSG, FI_RECV, FI_SEND]]
    lat                                               16      10k     312k        0.07s      4.46       3.59       0.28
    lat                                               32      10k     625k        0.07s      9.04       3.54       0.28
    lat                                               64      10k     1.2m        0.07s     18.81       3.40       0.29
    lat                                               128     10k     2.4m        0.08s     34.08       3.76       0.27
    lat                                               192     10k     3.6m        0.07s     53.31       3.60       0.28
    lat                                               256     10k     4.8m        0.07s     70.44       3.63       0.28
    lat                                               384     10k     7.3m        0.07s    105.17       3.65       0.27
    lat                                               512     10k     9.7m        0.07s    137.04       3.74       0.27
    lat                                               768     10k     14m         0.07s    205.22       3.74       0.27
    Killed by signal 15.

It's unrelated, but I don't know what happened here. @aingerson

@j-xiong
Contributor

j-xiong commented May 9, 2019

psm and psm2 changes look fine.

@vkrishna
Contributor

vkrishna commented May 9, 2019

tcp changes look good

@aingerson
Contributor

Shm change looks good.
As for the CI, I'm not sure what's going on. It's happened a couple of times, almost always on one specific node. The last time, I traced it to some Intel CI processes that hadn't been cleaned up and were taking over the CPU, causing the test to take too long and time out, but I don't see any on the node right now. I'll look into it, but for now I'll just trigger a retest and see if it happens again.

@shefty shefty merged commit e790e6b into ofiwg:master May 14, 2019

6 participants