Skip to content

Conversation

gunandrose4u
Copy link
Contributor

Document change for DDP enabled on Windows platform

gunandrose4u and others added 13 commits August 17, 2020 11:27
Update my fork from Pytorch repo
Rebase from pytorch/pytorch master
Rebase latest code from master branch of PyTorch repo
Rebase to latest commits
Rebase to latest commits
Rebase to latest commit
Rebase to latest commits
Rebase to latest commits
Rebase to latest commits
@dr-ci
Copy link

dr-ci bot commented Sep 27, 2020

💊 CI failures summary and remediations

As of commit 30ad222 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 7 times.

@codecov
Copy link

codecov bot commented Sep 27, 2020

Codecov Report

Merging #45392 into master will increase coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #45392   +/-   ##
=======================================
  Coverage   68.15%   68.15%           
=======================================
  Files         396      396           
  Lines       51133    51133           
=======================================
+ Hits        34849    34850    +1     
+ Misses      16284    16283    -1     
Impacted Files Coverage Δ
torch/distributed/__init__.py 87.50% <ø> (ø)
torch/testing/_internal/expecttest.py 78.57% <0.00%> (+1.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bc5710f...30ad222. Read the comment docs.

```bash
# Add these packages and set libuv_ROOT environment variable if torch.distributed is needed
conda install -y -q -c rdonnelly libuv
set libuv_ROOT={conda active env location}\Library
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why MacOS does not require this? MacOS requires install pkg-config instead. Can we do the same for Windows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Gloo PR #45392, I first tried with pkg-config, but Windows doesn't come with pkg-config, later on, base on the code review suggestion, I switched to use find_library and find_file to work with libuv_ROOT. So libuv_ROOT need to be set as a environment variable, or defined in cmake command line as -Dlibuv_ROOT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but Windows doesn't come with pkg-config, later on, base on the code review suggestion

Does this mean even if we do conda install pkg-config libuv, we still cannot find libuv on Windows automatically?

README.md Outdated

On Windows
```bash
# Add these packages and set libuv_ROOT environment variable if torch.distributed is needed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add one line:

# Distributed package support on Windows is a prototype feature and is subject to changes. 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jspisak @gchanan is it OK to mention a prototype feature in master README.md?

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PyTorch distributed currently only supports Linux. By default, the Gloo and NCCL backends
PyTorch distributed currently only supports Linux and Windows. By default for Linux, the Gloo and NCCL backends
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"PyTorch distributed currently only supports Linux and Windows." -> "PyTorch distributed package supports Linux (stable), MacOS (stable), and Windows (prototype)."

optional backend that can only be included if you build PyTorch from source. (e.g.
building PyTorch on a host that has MPI installed.)

.. note ::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use .. warning:: instead of .. note::

building PyTorch on a host that has MPI installed.)

.. note ::
On Windows, Pytorch distributed currently only supports Gloo backend, and FileStore as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of PyTorch v1.7, Windows support for the distributed package only covers collective 
communications with Gloo backend, `FileStore`, and `DistributedDataParallel`.  Therefore,
 the `init_method` argument in :func:`init_process_group` must point to a file. This 
works for both local and shared file systems: local file system example 
``init_method="file:///d:/tmp/some_file"``, shared file system example 
``init_method="file://////{machine_name}/{share_folder_name}/some_file"``.
Similarly, if you directly pass in a `store` argument, it must be a ``FileStore``
instance. 

``torch.distributed`` is available on Linux, MacOS and Windows. Set
``USE_DISTRIBUTED=1`` to enable it when building PyTorch from source.
Currently, the default value is ``USE_DISTRIBUTED=1`` for Linux and
Currently, the default value is ``USE_DISTRIBUTED=1`` for Linux and Windows,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should make USE_DISTRIBUTED=0 by default, and let users set it to 0 when necessary?

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrshenli has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mrshenli merged this pull request in 47debdc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants