
Conversation

@kumpera (Contributor) commented Mar 2, 2023

_functional_collectives.py: ensure we always wait on all collectives (see the sketch below).
derivatives.yaml: mark all_reduce as non-differentiable.
gen_variable_type.py: add all_reduce to DONT_ENFORCE_TENSOR_IMPL_USE_COUNT.
common_dtensor.py: replace dist.barrier with all_reduce.
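A minimal sketch of the "always wait" contract the first item describes; the class name and fields here are hypothetical, not the PR's actual _functional_collectives.py code:

```python
class _PendingCollective:
    """Hypothetical wrapper illustrating the invariant: every async
    collective handle must be waited on before its result is used."""

    def __init__(self, tensor, work):
        self._tensor = tensor  # buffer the collective writes into
        self._work = work      # async work handle returned by the collective
        self._waited = False

    def wait(self):
        # Idempotent: waiting twice is harmless; never waiting is the
        # bug this change guards against.
        if not self._waited:
            self._work.wait()
            self._waited = True
        return self._tensor
```

Under such a contract, any code path that can observe the tensor's data is routed through wait() first.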

@pytorch-bot (bot) commented Mar 2, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/95897

Note: Links to docs will display an error until the doc builds have completed.

❌ 1 Failure

As of commit 82e9af2:

BROKEN TRUNK - The following jobs failed but were present on the merge base 004bcff:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@kumpera kumpera requested a review from soulitzer as a code owner March 2, 2023 19:05
@facebook-github-bot (Contributor) commented
@kumpera has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment

@wconstab (Contributor) commented Mar 3, 2023

Oops, @wanchao already landed reduce_scatter, so that op also needs the derivatives.yaml and codegen fix.

You can do it in a separate PR if it's easier to land this one first.
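For reference, a hedged sketch of what that codegen fix looks like in tools/autograd/gen_variable_type.py; the set contents shown are illustrative, and reduce_scatter is included only as the follow-up suggested here:

```python
# Ops exempted from autograd's tensor-impl use-count enforcement, since
# collectives hand buffers to async work handles. Contents illustrative.
DONT_ENFORCE_TENSOR_IMPL_USE_COUNT = {
    "all_reduce",      # added by this PR
    "reduce_scatter",  # assumption: the follow-up suggested above
}
```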

@fegin added the ciflow/trunk label Mar 3, 2023
@fegin (Contributor) left a comment

Thanks for the fix!

@kumpera added the topic: bug fixes, module: dtensor, and release notes: distributed (dtensor) labels Mar 3, 2023
@kumpera (Author) commented Mar 3, 2023

@pytorchmergebot merge

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.


Rodrigo Kumpera added 2 commits March 3, 2023 20:08
_functional_collectives.py: ensure we always wait on all collectives.
derivatives.yaml: mark all_reduce as non-differentiable.
gen_variable_type.py: add all_reduce to DONT_ENFORCE_TENSOR_IMPL_USE_COUNT.
common_dtensor.py: replace dist.barrier with all_reduce.
@facebook-github-bot (Contributor) commented
@kumpera has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kumpera (Author) commented Mar 6, 2023

@pytorchmergebot merge -f "the inductor failure is unrelated"

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 12, 2023
_functional_collectives.py: ensure we always wait on all collectives.
derivatives.yaml: mark all_reduce as non-differentiable.
gen_variable_type.py: add all_reduce to DONT_ENFORCE_TENSOR_IMPL_USE_COUNT.
common_dtensor.py: replace dist.barrier with all_reduce.

Pull Request resolved: pytorch/pytorch#95897
Approved by: https://github.com/wconstab, https://github.com/fegin
ydwu4 added a commit to ydwu4/pytorch that referenced this pull request Mar 13, 2023
_functional_collectives.py: ensure we always wait on all collectives.
derivatives.yaml: mark all_reduce as non-differentiable.
gen_variable_type.py: add all_reduce to DONT_ENFORCE_TENSOR_IMPL_USE_COUNT.
common_dtensor.py: replace dist.barrier with all_reduce.

Pull Request resolved: pytorch#95897
Approved by: https://github.com/wconstab, https://github.com/fegin
pytorchmergebot pushed a commit that referenced this pull request Mar 13, 2023
…arts of the codebase (#96460)

Recent master breakage on focal and bionic PTD tests since we switched to all_reduce in #95897
Pull Request resolved: #96460
Approved by: https://github.com/fegin
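The breakage fixed in #96460 traces back to this PR's dist.barrier → all_reduce swap in common_dtensor.py. A minimal sketch of that pattern, with a hypothetical helper name and assuming the default process group is already initialized:

```python
import torch
import torch.distributed as dist

def barrier_via_all_reduce(device=None):
    # Synchronize ranks by all-reducing a dummy tensor instead of calling
    # dist.barrier(): the collective cannot complete until every rank
    # joins, giving barrier-like behavior.
    dummy = torch.ones(1, device=device)
    dist.all_reduce(dummy)
```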

Labels

ciflow/inductor, ciflow/trunk, Merged, module: dtensor, release notes: distributed (dtensor), topic: bug fixes
