Skip to content
This repository was archived by the owner on Sep 30, 2022. It is now read-only.

Conversation

@hjelmn
Copy link
Member

@hjelmn hjelmn commented Oct 7, 2014

before exiting synchronization calls. The problem is that the active
target counter is always increasing (never zeroed). If over 2^31-1
messages are sent this causes the counter to overflow (in itself this
isn't an error). This causes test/wait to return before the communication
is complete. There is an additional error in the use of the fragment
flush function. If PSCW synchronization is in use this function CAN NOT
be called unless a post message has arrived.

Relevant mailing list thread: http://www.open-mpi.org/community/lists/devel/2014/10/16016.php

This commit fixes both issues. Tested against MTT and issue reproducer.

before exiting synchronization calls. The problem is that the active
target counter is always increasing (never zeroed). If over 2^31-1
messages are sent this causes the counter to overflow (in itself this
isn't an error). This causes test/wait to return before the communication
is complete. There is an additional error in the use of the fragment
flush function. If PSCW synchronization is in use this function CAN NOT
be called unless a post message has arrived.

Relevant mailing list thread: http://www.open-mpi.org/community/lists/devel/2014/10/16016.php

This commit fixes both issues. Tested against MTT and issue reproducer.
@jsquyres jsquyres added this to the v1.8.4 milestone Oct 8, 2014
@jsquyres
Copy link
Member

jsquyres commented Oct 8, 2014

@hjelmn Who did you want to review this PR?

@hjelmn
Copy link
Member Author

hjelmn commented Oct 9, 2014

Master hash open-mpi/ompi@eed7b45

@hjelmn
Copy link
Member Author

hjelmn commented Oct 9, 2014

@jsquyres Not many besides myself know the OSC code. Either you or dave could probably review it. I added the reproducer in ompi-tests. c_post_start

@jsquyres
Copy link
Member

Nathan: use the "open-mpi/ompi@[hash]" syntax to get appropriate cross-linking in the Github web UI.

Thanks!

@goodell
Copy link
Member

goodell commented Oct 10, 2014

I took a quick look at this. I didn't see any obvious coding mistakes or anything like that, but I don't really have time to grok the one-sided impl again to give this a really thorough review.

@goodell
Copy link
Member

goodell commented Oct 10, 2014

Actually, I did spot one issue that isn't in this commit/PR directly. The active_incoming_frag_count and similar fields are all int32_t types that are intended to overflow. But signed overflow in C is undefined. I think you should either manually implement the overflow logic or convert these types to unsigned.

@hjelmn
Copy link
Member Author

hjelmn commented Oct 22, 2014

Was after a minimal change but I see no problem with also addressing Dave's comments. I will fix in master then refile this CMR.

@hjelmn
Copy link
Member Author

hjelmn commented Oct 22, 2014

Closing this so I can refactor my branch and refile.

@hjelmn hjelmn closed this Oct 22, 2014
mike-dubman added a commit to mike-dubman/ompi-release that referenced this pull request Nov 13, 2014
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants