Continuous Integration of an existing MPI application with Pytorch MPI Backend #33943
Comments
I would agree with your diagnosis. Do you think you'd be able to submit a PR fixing this?
Sure, I will do that. Thanks a lot for checking this.
I am following the contributor's guide. As I understand it, I may need to modify the places in the code where MPI is initialized without first checking whether it has already been initialized.
CPP Tests
Is there a developer guide for running the C++ test cases? For instance, this test: https://github.com/pytorch/pytorch/blob/master/caffe2/mpi/mpi_test.cc
Python Tests
I want to set up a few test cases for the distributed module:
python3 test/distributed/test_distributed.py
I guess this is the best place to test the code base. Is this the correct place to work on the test cases?
@vibhatha The Python location looks reasonable. I wouldn't worry too much about the C++ tests; they'll run on our CI when you open a PR, and we can talk about how to run them if they're specifically failing. (BTW, the caffe2 tests aren't applicable, so don't worry about those.)
Great :) I will look into the code and evaluate the possible changes. I'll keep you posted :)
Has there been an update on this issue by any chance? I may be running into a similar situation.
I couldn't work on this when I reported it, but I think I can allocate some time for it now.
🐛 Bug
I am using the PyTorch MPI backend with an existing MPI application. The data pre-processing logic is written with MPI (using mpi4py), and I want to feed the processed data to PyTorch.
I ran into a problem doing this with the distributed package and the MPI backend.
To Reproduce
To reproduce the error, please run the following code. Simply re-running it reproduces the error.
The error is the following:
Expected behaviour
My guess is that the dist backend initializes MPI without checking whether MPI is already initialized. If it checked whether MPI is initialized first, I think this error would not occur.
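The guard described above can be sketched in a few lines. This is only a minimal model, not PyTorch's actual code: `FakeMPIRuntime` and `init_if_needed` are hypothetical names made up for illustration, and the fake runtime just mimics the "a second initialization is an error" behaviour.

```python
class FakeMPIRuntime:
    """Hypothetical stand-in for an MPI runtime; not a real MPI binding."""

    def __init__(self):
        self._initialized = False

    def is_initialized(self):
        return self._initialized

    def init(self):
        # Mimic the failure mode: a second initialization is an error.
        if self._initialized:
            raise RuntimeError("MPI is already initialized")
        self._initialized = True


def init_if_needed(runtime):
    """Initialize the runtime only if nobody has done so yet.

    Returns True when this call performed the initialization.
    """
    if runtime.is_initialized():
        return False
    runtime.init()
    return True


if __name__ == "__main__":
    runtime = FakeMPIRuntime()
    runtime.init()                  # the existing MPI application goes first
    print(init_if_needed(runtime))  # the backend then skips its own init
```

With mpi4py, the equivalent query is `MPI.Is_initialized()`. Note that importing `mpi4py.MPI` initializes MPI by default, so in the scenario above MPI is already up before the dist backend starts.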
Environment
PyTorch installation (conda, pip, source): Installed from source
Additional context