Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add allgather_base as per our discussion re: ProcessGroup interface. #31892

Closed
wants to merge 1 commit into from

Conversation

agolynski
Copy link
Contributor

Summary: Introduce ProcessGroup::allgather_base. No implementation yet: plan to add it one PG backend at a time in a follow up.

Test Plan: No functional changes, no tests yet.

Differential Revision: D19290739

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D19290739

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D19290739

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D19290739

@kostmo
Copy link
Member

kostmo commented Jan 6, 2020

💊 CircleCI build failures summary and remediations

As of commit 287e6a1:

None of the build failures appear to be your fault.

  • 1/1 broken upstream at merge base 3d01e3d since Jan 15

    Please rebase on the viable/strict branch (expand for instructions)

    If your commit is newer than viable/strict, you can try basing on an older, stable commit:

    git fetch origin viable/strict
    git rebase --onto viable/strict $(git merge-base origin/master HEAD)
    

    If your commit is older than viable/strict:

    git fetch origin viable/strict
    git rebase viable/strict
    

    Check out the recency history of this "viable master" tracking branch.

Detailed failure analysis

One may explore the probable reasons each build failed interactively on the Dr. CI website.

🚧 1 upstream failure recognized by patterns:

These builds matched patterns, but were probably caused by upstream breakages:


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

This comment has been revised 24 times.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D19290739

torch/lib/c10d/ProcessGroup.cpp Outdated Show resolved Hide resolved
torch/lib/c10d/ProcessGroup.hpp Outdated Show resolved Hide resolved
@@ -128,6 +128,15 @@ class ProcessGroup {
std::vector<at::Tensor>& inputTensors,
const AllgatherOptions& opts = AllgatherOptions()) = 0;

// Q: Should we move it to protected or annotate with _ since it can be
// confusing to use and is intended for internal use only?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could expose it for power users. I think it's fine to leave public. Btw, didn't we call it single before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we agreed on _base during our meeting with @mrshenli

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D19290739

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D19290739

Copy link
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pietern this looks OK to me. Is this sufficient to unblock #28068?

torch/lib/c10d/ProcessGroup.hpp Outdated Show resolved Hide resolved
@pietern
Copy link
Contributor

pietern commented Jan 9, 2020

@mrshenli We need to remove the coalesced variant first as well.

@mrshenli
Copy link
Contributor

mrshenli commented Jan 9, 2020

Try to understand the minimum requirements to unblock #28068

  1. The allgather_coalesced API is already in the public API (but not in doc yet), is it OK to remove it? Or do we have to replace it with an implementation in c10d/comm.h/cpp?
  2. I guess this also applies to the allreduce_coalesced API. Do we need to replace that with allreduce_base too?

@pietern @agolynski

@agolynski
Copy link
Contributor Author

Discussed with @mrshenli: If we just remove allgather_coalesced it'll break python API right now. How about we leave it and add comments not to touch it: "deprecated. do not use it. do not implement it".

Let's do similar exercise with allreduce_coalesced in a follow up?

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D19290739

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D19290739

@agolynski
Copy link
Contributor Author

This pull request was exported from Phabricator. Differential Revision: [D19290739](https://phabricator.intern.facebook.com/D19290739)

Fixed broken build

Copy link
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failure is benign. Good to go. Thanks, @agolynski!

…ytorch#31892)

Summary:
Pull Request resolved: pytorch#31892

Introduce ProcessGroup::allgather_base. No implementation yet: plan to add it one PG backend at a time in a follow up.

Test Plan: No functional changes, no tests yet.

Differential Revision: D19290739

fbshipit-source-id: ec91720ef4f1f6aa445836d8153324ae769cdfd1
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D19290739

1 similar comment
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D19290739

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agolynski has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@agolynski merged this pull request in 74621ca.

wuhuikx pushed a commit to wuhuikx/pytorch that referenced this pull request Jan 30, 2020
…ytorch#31892)

Summary:
Introduce ProcessGroup::allgather_base. No implementation yet: plan to add it one PG backend at a time in a follow up.
Pull Request resolved: pytorch#31892

Test Plan: No functional changes, no tests yet.

Differential Revision: D19290739

Pulled By: agolynski

fbshipit-source-id: c2f4947d2980995724c539de7c6d97618e1ba11a
ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020
…ytorch#31892)

Summary:
Introduce ProcessGroup::allgather_base. No implementation yet: plan to add it one PG backend at a time in a follow up.
Pull Request resolved: pytorch#31892

Test Plan: No functional changes, no tests yet.

Differential Revision: D19290739

Pulled By: agolynski

fbshipit-source-id: c2f4947d2980995724c539de7c6d97618e1ba11a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants