Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Add MPI support on Ray cluster #40917
[core] Add MPI support on Ray cluster #40917
Changes from 11 commits
0da2538
40e2346
356fda7
c6806ad
3978df9
51d523d
14d03ae
0d92333
789a4e2
43ae02a
e471637
7727229
183dfbb
032a5fb
2bdf42e
7005fa3
c9611da
97ccd24
a6831d1
02afd02
3852f9c
271e49f
f780c05
a784b57
5002815
a492234
2807165
7268695
8844056
f4abd66
9044563
22eabbd
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we kill proc here? Or does it guarantee the proc is killed? (if so can you comment here?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modify context is in runtime env agent I think. Exception should be good? I can test it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the main function? or the parser?
This will will be used as the mpi entry point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main function. I don't see any main function from other plugins though. Maybe it should be a part of mpi_worker.py not here? (this means the function is executed when you import it?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MPIRUN is like a fork and the rest plugin doesn't have this.
The function won't execute when import since it checks main. If you import,
__name__
won't be__main__
.This piece of code is part of the plugin, that's why I put it here and it's simple.
But open to move if you insist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I see. I think this is a bit confusing when I read code first time. I prefer to move it to a separate file (something like mpi_start.py), but it is also okay if you add comments in details in the main block. E.g., "the plugin starts a subprocess that runs this main method. It is not executed as a part of normal plugin" or sth like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about moving it to
python/ray/_private/workers/mpi_workers.py
but feel it just moves the code to far away from the place where it's used. I think split it and move it to the other file is better.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should update a doc, or consider
_mpi
!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll update the doc. i think it's a feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does it find the file? From the current directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be clear about it in the docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's from the working dir. I'll add the doc.