Bring the ofi/rml component online #3836
Does this fix the orte_no_op over ofi?
Yes - I've been launching MPI jobs with it just fine. One of the builds in Jenkins is failing, though, so I'm trying to track that down.
Hmmm... I can't make it fail. I wonder if this is something peculiar to SUSE? @bwbarrett, is there any way to get more debug info out of that build? I've tried building with the same config, forcing various components off, etc. - to no avail. However, I don't have a SUSE build.
…for the daemons. Clean up the current confusion over how connection info gets created and passed to make it all flow through the opal/pmix "put/get" operations. Update the PMIx code to latest master to pick up some required behaviors. Remove the no-longer-required get_contact_info and set_contact_info from the RML layer.
Add an MCA param to allow the ofi/rml component to route messages if desired. This is mainly for experimentation at this point as we aren't sure if routing will be beneficial at large scales. Leave it "off" by default.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
PMIX_DESTRUCT(&pbkt);
return rc;
}
<<<<<<< c632784ca34c467055eadcb4efe84a25e5a3911b:opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c
}
}

<<<<<<< c632784ca34c467055eadcb4efe84a25e5a3911b:opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c
@rhc54 - this looks like an unresolved merge.
Nah, looks like I just missed removing the last line of the conflict flag - I can clean that up.
I saw several of them, and there were clearly unresolved merges.
Not sure that last part is true, but no matter - it runs just fine and passed an MTT scan, so I think things resolved correctly. Anyway, I'm updating from the tarball now.
https://github.com/rhc54/ompi/blob/f7e8780a42eb512ed5c54715471168814ef63383/opal/mca/pmix/pmix2x/pmix/src/mca/gds/ds12/gds_dstore.c#L2811:L2829
I'm 100% sure that this will not compile!
Relax, dude - do you not see the .pmix_ignore in that component? It is prevented from compiling due to the reported problems in the dstore. I am removing the ignore now as it seems functional, although the memory footprint problem remains.
What memory footprint problem? Is there an issue filed for that?
I think that @karasevb has fixed (at least) one of the possible causes: he discovered that keys were sent in the message from the server to the client even when the dstore was used.
Look at the issue he filed this morning in PMIx.
OK, thanks. I thought it was something else.
Still only used if rml_ofi_desired=1
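For anyone wanting to try the component, here is a minimal sketch of setting the parameter named above. It assumes Open MPI's usual conventions for MCA parameters (an `OMPI_MCA_<name>` environment variable, or `--mca <name> <value>` on the `mpirun` command line). The application name `./my_app` and the process count are placeholders, not taken from this PR.

```shell
# Assumption: MCA parameters can be set through OMPI_MCA_<name> environment
# variables; rml_ofi_desired is the parameter name quoted in this PR.
export OMPI_MCA_rml_ofi_desired=1

# Equivalent command-line form (./my_app is a hypothetical application):
#   mpirun --mca rml_ofi_desired 1 -np 2 ./my_app
```

Leaving the variable unset keeps the component out of the picture, which matches the "only used if rml_ofi_desired=1" note above.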