Skip to content
This repository was archived by the owner on Sep 30, 2022. It is now read-only.

Conversation

@derbeyn
Copy link

@derbeyn derbeyn commented Mar 10, 2016

If tuned is unselected, it's up to the basic module to handle the collectives.
The problem is that the request arrays are not consistently used in that module.
Cherry picking three commits fixes the issue.

bosilca added 3 commits March 10, 2016 13:38
module_data to hold one with the largest necessary size. This array is
only allocated when needed, and it is released upon communicator
destruction.

(cherry picked from commit a324602)
number of requests.

Remove unnecessary fields.We don't need these fields.

(cherry picked from commit 01b32ca)
in the base).
Correctly deal with persistent requests (they must be always freed when
they are stored in the request array associated with the communicator).
Always use MPI_STATUS_IGNORE for single request waiting functions.

(cherry picked from commit 88492a1)
@mellanox-github
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1423/ for details.

@jsquyres
Copy link
Member

@derbeyn Please mark a milestone and appropriate labels for this issue. See https://github.com/open-mpi/ompi/wiki/OmpiReleaseBotCommands. If this issue is a must-fix for v2.0.0, please add the "blocker" label to this issue.

@derbeyn These commits look like they came from @bosilca; are you you saying that you have reviewed them? (per our requirement that all PRs to release branches must be reviewed)

@derbeyn
Copy link
Author

derbeyn commented Mar 11, 2016

For the labels, I didn't know I had to add them my self.
Didn't review the code. I'll do it and come back to you.

@jsquyres
Copy link
Member

@derbeyn Thanks!

@rhc54 rhc54 added the bug label Mar 15, 2016
@rhc54 rhc54 added this to the v2.0.0 milestone Mar 15, 2016
@derbeyn
Copy link
Author

derbeyn commented Mar 21, 2016

Jeff, I just finished reviewing the 3 commits, and I have some remarks.
Sorry for the stupid question, but what should I do now? send my review to @bosilca?

@jsquyres
Copy link
Member

@derbeyn Not a stupid question at all! Click on the "Files changed" tab on this PR, and click on the "+" sign on any line number where you want to ask a question. It'll pop open a text box where you can type your comment / question / etc. Feel free to put "@bosilca" in there and use like 1st person address -- it'll guarantee to send him a mail with any comment that contains @bosilca.

All your comments will also show up here on the main PR tab, too, with some code around it for context. It's a very handy way to comment on exactly the code you're talking about.

Make sense?

@jsquyres
Copy link
Member

BTW, I see that Github just recently added some changes to the "Files changed" tab -- you can view the changes per commit (e.g., there are 3 commits on this PR), and/or by file (vs. looking at the union of all changes / all diffs on this PR). So if it helps you break up your review to look by commit and/or by individual file, you can do so.

github-changes

@derbeyn
Copy link
Author

derbeyn commented Mar 21, 2016

Thx a lot!

ompi_request_t** requests;

requests = (ompi_request_t**)malloc( size * sizeof(ompi_request_t*) );
requests = coll_base_comm_get_reqs(module->base_data, size);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bosilca (minor issue) Missing check on the pointer returned by coll_base_comm_get_reqs()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we don't have checks regarding the return from the coll_base_comm_get_reqs anywhere.

preq = coll_base_comm_get_reqs(base_module->base_data, size * 2);

if (i == rank) {
/* Copy the data into the temporary buffer */
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bosilca (line 77 above): no need to allocate size*2 requests, we are only posting 2 requests in each turn of the loop

@bosilca
Copy link
Member

bosilca commented Mar 23, 2016

@derbeyn you might want to cherry-pick open-mpi/ompi@4b38b6bd. It addresses most of your concerns, except the one where you suggest to move the check for NULL in the free_reqs.

@derbeyn
Copy link
Author

derbeyn commented Mar 24, 2016

@bosilca OK, Thanks

@bosilca
Copy link
Member

bosilca commented Mar 24, 2016

And also open-mpi/ompi@57eadb0d to fix an issue identified by Coverity.

@derbeyn
Copy link
Author

derbeyn commented Mar 25, 2016

Thanks!

@jsquyres
Copy link
Member

@derbeyn Could you add those 2 commits to this PR in the very near future? We're trying to release the next rc for v2.0.0.

Do you know how to add these 2 commits to this PR? You should be able to cherry-pick them from master (see the wiki) and then push them to your branch here. The PR will automatically update (because a PR always reflects the current state of a branch -- so if you add/change commits on that branch, they will automatically be reflected here on the PR).

@jsquyres
Copy link
Member

@derbeyn Any update, perchance?

This patch addresses most (if not all) @derbeyn concerns
expressed on open-mpi#1015. I added checks for the requests allocation
in all functions, ompi_coll_base_free_reqs is called with the
right number of requests, I removed the unnecessary basic_module_comm_t
and use the base_module_comm_t instead, I remove all uses of the
COLL_BASE_BCAST_USE_BLOCKING define, and other minor fixes.

(cherry picked from commit 4b38b6b)
@derbeyn
Copy link
Author

derbeyn commented Mar 29, 2016

So sorry for the late answer! We had a long week-end for eastern, so I didn't connect.
Actually, I'll push only one commit. The 2nd one is not appropriate for 2.0.0: it fixes something that has not been added yet to this branch.
Now, I still have some issues for the last commit I cherry-picked. Actually, these issues are not critical since they are in error paths.

@mellanox-github
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1483/ for details.


/* Initiate all send/recv to/from others. */
reqs = coll_base_comm_get_reqs(base_module->base_data, 2);
if( NULL == reqs ) { err = OMPI_ERR_OUT_OF_RESOURCE; line = __LINE__; goto error_hndl; }
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bosilca
Why not just returning OMPI_ERR_OUT_OF_RESOURCE here?
If you don't, you should check reqs!=NULL before calling free_reqs() at error_hndl
(or insert a check inside the routine free_reqs() as proposed in the first review)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 reasons:

  • I prefer to have a single return statement for errors
  • to get access to the error message.

hjelmn and others added 2 commits March 31, 2016 08:17
Fix CID 1325868 (open-mpi#1 of 1): Dereference after null check (FORWARD_NULL):
Fix CID 1325869 (open-mpi#1-2 of 2): Dereference after null check (FORWARD_NULL):

Here reqs can indeed be NULL. Added a check to
ompi_coll_base_free_reqs to prevent dereferencing NULL pointer.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 2f4e532)
(cherry picked from commit 004c0cc)
@mellanox-github
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ompi-release-pr/1485/ for details.

@derbeyn
Copy link
Author

derbeyn commented Mar 31, 2016

@jsquyres can you please tell me 2 things:

    • where can I find the definitions for the labels, more precisely the pushed-back label that has been set on my PR.
    • where can I find some kind of flow chart describing the whole process: what label should I set on my PR to indicate it is actually ready
      Thx

@jsquyres
Copy link
Member

@derbeyn Here's all the labels on the ompi-release repo: https://github.com/open-mpi/ompi-release/labels (i.e., go to the github repo, go into the pull requests, and click on the "labels" button). We probably don't have a wiki page describing the process (probably should add that...).

Generally, a PR on ompi-release needs to be reviewed and RM approved, and then we'll merge it in. The "thumbs up" emoticon adds "reviewed" and removes "pushed-back". The thumbs down is the opposite (it adds "pushed-back" and removes "reviewed"). These labels are just a simple visual way for someone to scan the list of PRs in https://github.com/open-mpi/ompi-release/pulls and know what still needs to be done.

Loosely speaking:

  • reviewed: means someone reviewed the PR and is happy with it
  • pushed-back: means that someone reviewed the PR and was unhappy with it, and pushed it back to the PR submitter (i.e., it means the PR is not ready). Pushed back is removed when the submitter addresses the review comments and wants the PR reviewed again.
  • RM-Approved: is added by the release managers of a PR is approved to go in to the specific milestone on the PR.

Does that help?

@derbeyn
Copy link
Author

derbeyn commented Mar 31, 2016

Thanks a lot!

@derbeyn
Copy link
Author

derbeyn commented Mar 31, 2016

I'm OK with the review
👍

@jsquyres
Copy link
Member

@hppritcha Good to go

@hppritcha hppritcha merged commit e7f909e into open-mpi:v2.x Mar 31, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants