-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Document change for DDP enabled on Windows platform #45392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document change for DDP enabled on Windows platform #45392
Conversation
Update my fork from Pytorch repo
Rebase from pytorch/pytorch master
Rebase latest code from master branch of PyTorch repo
Rebase code
Rebase to latest commits
Rebase to latest commits
Rebase to latest commit
Rebase to latest commits
Rebase to latest commits
Rebase to latest commits
Rebase latest commits
Rebase latest commits
💊 CI failures summary and remediationsAs of commit 30ad222 (more details on the Dr. CI page):
ci.pytorch.org: 1 failedThis comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group. This comment has been revised 7 times. |
Codecov Report
@@ Coverage Diff @@
## master #45392 +/- ##
=======================================
Coverage 68.15% 68.15%
=======================================
Files 396 396
Lines 51133 51133
=======================================
+ Hits 34849 34850 +1
+ Misses 16284 16283 -1
Continue to review full report at Codecov.
|
```bash | ||
# Add these packages and set libuv_ROOT environment variable if torch.distributed is needed | ||
conda install -y -q -c rdonnelly libuv | ||
set libuv_ROOT={conda active env location}\Library |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why MacOS does not require this? MacOS requires install pkg-config instead. Can we do the same for Windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Gloo PR #45392, I first tried with pkg-config, but Windows doesn't come with pkg-config, later on, base on the code review suggestion, I switched to use find_library and find_file to work with libuv_ROOT. So libuv_ROOT need to be set as a environment variable, or defined in cmake command line as -Dlibuv_ROOT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but Windows doesn't come with pkg-config, later on, base on the code review suggestion
Does this mean even if we do conda install pkg-config libuv
, we still cannot find libuv on Windows automatically?
README.md
Outdated
|
||
On Windows | ||
```bash | ||
# Add these packages and set libuv_ROOT environment variable if torch.distributed is needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add one line:
# Distributed package support on Windows is a prototype feature and is subject to changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs/source/distributed.rst
Outdated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
PyTorch distributed currently only supports Linux. By default, the Gloo and NCCL backends | ||
PyTorch distributed currently only supports Linux and Windows. By default for Linux, the Gloo and NCCL backends |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"PyTorch distributed currently only supports Linux and Windows." -> "PyTorch distributed package supports Linux (stable), MacOS (stable), and Windows (prototype)."
docs/source/distributed.rst
Outdated
optional backend that can only be included if you build PyTorch from source. (e.g. | ||
building PyTorch on a host that has MPI installed.) | ||
|
||
.. note :: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use .. warning::
instead of .. note::
docs/source/distributed.rst
Outdated
building PyTorch on a host that has MPI installed.) | ||
|
||
.. note :: | ||
On Windows, Pytorch distributed currently only supports Gloo backend, and FileStore as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of PyTorch v1.7, Windows support for the distributed package only covers collective
communications with Gloo backend, `FileStore`, and `DistributedDataParallel`. Therefore,
the `init_method` argument in :func:`init_process_group` must point to a file. This
works for both local and shared file systems: local file system example
``init_method="file:///d:/tmp/some_file"``, shared file system example
``init_method="file://////{machine_name}/{share_folder_name}/some_file"``.
Similarly, if you directly pass in a `store` argument, it must be a ``FileStore``
instance.
``torch.distributed`` is available on Linux, MacOS and Windows. Set | ||
``USE_DISTRIBUTED=1`` to enable it when building PyTorch from source. | ||
Currently, the default value is ``USE_DISTRIBUTED=1`` for Linux and | ||
Currently, the default value is ``USE_DISTRIBUTED=1`` for Linux and Windows, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should make USE_DISTRIBUTED=0
by default, and let users set it to 0 when necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Document change for DDP enabled on Windows platform