[core] Add GPU support for MPI #41349
Conversation
I'd like to understand why we should require users to call mpi_init. I feel like it should be possible for us to just call it ourselves under the hood.
I think we should choose one of these instead:
1. raise an exception if mpi_init is not called
2. automatically call mpi_init all the time
I'd like to understand why option 2 is not possible here. What's the reason we can't just call mpi_init when the rank is 0?
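For context, here is a minimal sketch of the usage pattern being debated. Only the requirement to call mpi_init comes from this PR; the import path and the task body are assumptions for illustration:

```python
import ray
from mpi4py import MPI

# Assumed import path for the helper discussed in this PR.
from ray.runtime_env import mpi_init


@ray.remote(num_gpus=1)
def mpi_task():
    # The call under debate: option 1 would raise if this call is missing,
    # option 2 would have Ray invoke it implicitly before user code runs.
    mpi_init()
    return MPI.COMM_WORLD.Get_rank()
```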
Hi Sang, thanks for helping review this.
For example, there are two ways to do it: wrap f inside the runtime env and call it there, or do it in Ray Core's code, but neither fits.
I think this is also hard. If it could be done, option 1 would be simple. The reason is the same as above.
Hmm, what about always passing a special env var (RAY_MPI_WORKER=1) and invoking this method whenever that env var is set in default_worker.py?
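A sketch of that suggestion, assuming the env var name from the comment and a purely illustrative hook in the worker startup path:

```python
import os

# Assumed import path for the helper; see the discussion above.
from ray.runtime_env import mpi_init

# Hypothetical check that default_worker.py could perform at startup:
# only call mpi_init() when the launcher marked this process as an
# MPI worker.
if os.environ.get("RAY_MPI_WORKER") == "1":
    mpi_init()
```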
@rkooo567 it's not about the default worker. The problem here is that the GPU assignment happens at runtime when you execute the task, so the only way to do it is to wrap the actual function. That means updating the @ray.remote implementation, and I'm not sure that's the right approach. If we do this, the flow is like:
In this way, the user doesn't need to do anything. But the downside is that the implementation is not clean. The best way to do this would be to make the runtime env able to rewrite the function, but the difficulty is that a runtime env can also be at the cluster layer, which doesn't fit the scenario here. So we would end up with a runtime env specific to functions, which might also be good. I think we can do this and mark it experimental in the documentation. If people complain, and we see other places where wrapping the function in a runtime env is useful, we can update it.
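A rough sketch of the function-wrapping flow described above; this illustrates where the implicit call would sit, and is not the shipped implementation:

```python
import functools

# Assumed import path for the helper discussed in this PR.
from ray.runtime_env import mpi_init


def _wrap_mpi_entry(f):
    """Hypothetical wrapper that @ray.remote could apply when a task's
    runtime env requests MPI. It runs after the task's GPU assignment
    is known, so mpi_init() can fix up GPU visibility before user code."""

    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        mpi_init()  # restrict each MPI worker to the GPUs Ray assigned
        return f(*args, **kwargs)

    return wrapper
```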
Why are these changes needed?
By default, each MPI worker will see all GPUs on the node. This causes issues when `num_gpus` is given for tasks or actors, because their views of the visible GPUs differ. This PR fixes that: it requires the user to call `mpi_init` inside the function before any MPI operations.
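For illustration, a minimal sketch of the resulting usage; the `mpi` runtime env field names and the import path are assumptions, and only the requirement to call `mpi_init` is stated by this PR:

```python
import ray
from mpi4py import MPI

# Assumed import path for the helper added in this PR.
from ray.runtime_env import mpi_init


@ray.remote(
    num_gpus=1,
    # Field names here are assumptions for illustration.
    runtime_env={"mpi": {"args": ["-n", "2"]}},
)
def train():
    # Without this call, every MPI worker spawned for this task would
    # see all GPUs on the node instead of only the one Ray assigned.
    mpi_init()
    return MPI.COMM_WORLD.Get_rank()
```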
Related issue number
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.