
Conversation

Collaborator

@wanchaol commented Nov 13, 2023

Stack from ghstack (oldest at bottom):

As titled, built on top of the work @wz337 enabled, this could save some runtime CPU time otherwise spent recreating DTensor parameters with the correct shape/stride, and avoid issues when sharding parameters unevenly.

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @fduwjj @wz337 @tianyu-l @wconstab @yf225 @kiukchung @d4l3k @LucasLLC

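For context, here is a minimal, hedged sketch (not code from this PR) of the uneven-sharding situation the description refers to. It assumes the torch.distributed._tensor API of that era (DeviceMesh, Shard, distribute_tensor) and a process group that is already initialized, e.g. via torchrun; the helper function itself is hypothetical.

    # Illustrative sketch only, not code from this PR.
    import torch
    import torch.distributed as dist
    from torch.distributed._tensor import DeviceMesh, Shard, distribute_tensor

    def show_uneven_shard() -> None:
        rank, world_size = dist.get_rank(), dist.get_world_size()
        mesh = DeviceMesh("cpu", torch.arange(world_size))

        # A 5-row weight sharded on dim 0 over 2 ranks splits as 3 + 2 rows,
        # so the global shape cannot be recovered by simply multiplying the
        # local shape by the number of shards; the DTensor has to carry the
        # global shape/stride rather than re-derive it from the local tensor.
        weight = torch.randn(5, 4)
        dweight = distribute_tensor(weight, mesh, [Shard(0)])
        print(rank, dweight.to_local().shape, dweight.shape, dweight.stride())

Re-deriving the global shape/stride from local shards on every parameter rebuild is the kind of runtime CPU work, and uneven-sharding pitfall, that the description above is getting at.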

pytorch-bot bot commented Nov 13, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/113547

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8132ed0 with merge base b3a76cc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wanchaol added a commit that referenced this pull request Nov 13, 2023
As titled, built on top of the work wz337 enabled, this could save some runtime CPU time otherwise spent recreating DTensor parameters with the correct shape/stride, and avoid issues when sharding parameters unevenly.

ghstack-source-id: 15cdbca
Pull Request resolved: #113547
@wanchaol requested a review from wz337 November 13, 2023 22:55
Contributor

@XilunWu left a comment


LGTM! Ready to accept when these 2 comments get addressed.

@wanchaol requested a review from XilunWu November 14, 2023 00:58
@wanchaol added the ciflow/trunk label Nov 14, 2023
Contributor

@XilunWu left a comment


LGTM!

@wanchaol
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@wanchaol
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x bded860e6865885f81d0b64336cafa2ea0e246a5 returned non-zero exit code 1

Auto-merging torch/distributed/_tensor/api.py
The previous cherry-pick is now empty, possibly due to conflict resolution.
If you wish to commit it anyway, use:

    git commit --allow-empty

Otherwise, please use 'git cherry-pick --skip'
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

You are currently cherry-picking commit bded860e686.
  (all conflicts fixed: run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

nothing to commit, working tree clean

@wanchaol
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x bded860e6865885f81d0b64336cafa2ea0e246a5 returned non-zero exit code 1

Auto-merging torch/distributed/_tensor/api.py
The previous cherry-pick is now empty, possibly due to conflict resolution.
If you wish to commit it anyway, use:

    git commit --allow-empty

Otherwise, please use 'git cherry-pick --skip'
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

You are currently cherry-picking commit bded860e686.
  (all conflicts fixed: run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

nothing to commit, working tree clean


pytorch-bot bot commented Nov 15, 2023

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -c/--classification

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.

@wanchaol
Collaborator Author

@pytorchbot --help


pytorch-bot bot commented Nov 15, 2023

PyTorchBot Help

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci} ...

In order to invoke the bot on your PR, include a line that starts with
@pytorchbot anywhere in a comment. That line will form the command; no
multi-line commands are allowed. 

Example:
    Some extra context, blah blah, wow this PR looks awesome

    @pytorchbot merge

optional arguments:
  -h, --help            Show this help message and exit.

command:
  {merge,revert,rebase,label,drci}
    merge               Merge a PR
    revert              Revert a PR
    rebase              Rebase a PR
    label               Add label to a PR
    drci                Update Dr. CI

Merge

usage: @pytorchbot merge [-f MESSAGE | -i] [-ic] [-r [{viable/strict,main}]]

Merge an accepted PR, subject to the rules in .github/merge_rules.json.
By default, this will wait for all required checks (lint, pull) to succeed before merging.

optional arguments:
  -f MESSAGE, --force MESSAGE
                        Merge without checking anything. This requires a reason for auditing purposes, for example:
                        @pytorchbot merge -f 'Minor update to fix lint. Expecting all PR tests to pass'
                        
                        Please use `-f` as last resort, prefer `--ignore-current` to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.
  -i, --ignore-current  Merge while ignoring the currently failing jobs.  Behaves like -f if there are no pending jobs.
  -ic                   Old flag for --ignore-current. Deprecated in favor of -i.
  -r [{viable/strict,main}], --rebase [{viable/strict,main}]
                        Rebase the PR to re-run checks before merging.  Accepts viable/strict or main as branch options and will default to viable/strict if not specified.

Revert

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Revert a merged PR. This requires that you are a Meta employee.

Example:
  @pytorchbot revert -m="This is breaking tests on trunk. hud.pytorch.org/" -c=nosignal

optional arguments:
  -m MESSAGE, --message MESSAGE
                        The reason you are reverting will be put in the commit message. Must be longer than 3 words.
  -c {nosignal,ignoredsignal,landrace,weird,ghfirst}, --classification {nosignal,ignoredsignal,landrace,weird,ghfirst}
                        A machine-friendly classification of the revert reason.

Rebase

usage: @pytorchbot rebase [-s | -b BRANCH]

Rebase a PR. Rebasing defaults to the stable viable/strict branch of pytorch.
Repeat contributor may use this command to rebase their PR.

optional arguments:
  -s, --stable          [DEPRECATED] Rebase onto viable/strict
  -b BRANCH, --branch BRANCH
                        Branch you would like to rebase to

Label

usage: @pytorchbot label labels [labels ...]

Adds label to a PR

positional arguments:
  labels  Labels to add to given Pull Request

Dr CI

usage: @pytorchbot drci 

Update Dr. CI. Updates the Dr. CI comment on the PR in case it's gotten out of sync with actual CI results.

@wanchaol
Collaborator Author

@pytorchbot revert -m "broken compile test" -c "ghfirst"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@wanchaol your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Nov 15, 2023
This reverts commit 9337245.

Reverted #113547 on behalf of https://github.com/wanchaol due to broken compile test.
Contributor

@wz337 left a comment


LGTM!

@albanD added the oncall: distributed label and removed the module: distributed label Dec 8, 2023
wanchaol added a commit that referenced this pull request Jan 12, 2024
Reland of #113547, as the previous PR was reverted because of a torch.compile symbolic shape issue. Since we now disable tensor unflatten with dynamo.disable, we should not hit this issue again.

[ghstack-poisoned]
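As an aside, a hedged sketch of the dynamo.disable pattern the reland message refers to: torch._dynamo.disable is the real API, while unflatten_params and its body are hypothetical stand-ins, not the function this PR actually changed.

    # Hypothetical sketch: keep a shape-sensitive helper out of torch.compile
    # tracing so it is not specialized on symbolic shapes; torch.compile will
    # graph-break around the call and run it eagerly.
    import torch
    import torch._dynamo as dynamo

    @dynamo.disable
    def unflatten_params(flat: torch.Tensor, shapes: list[torch.Size]) -> list[torch.Tensor]:
        out, offset = [], 0
        for shape in shapes:
            numel = shape.numel()
            out.append(flat[offset:offset + numel].view(shape))
            offset += numel
        return out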
wanchaol added a commit that referenced this pull request Jan 13, 2024
Reland of #113547, as the previous PR was reverted because of a torch.compile symbolic shape issue. Since we now disable tensor unflatten with dynamo.disable, we should not hit this issue again.

ghstack-source-id: a3b704e
Pull Request resolved: #117340
pytorchmergebot pushed a commit that referenced this pull request Jan 13, 2024
Reland of #113547, as the previous PR was reverted because of a torch.compile symbolic shape issue. Since we now disable tensor unflatten with dynamo.disable, we should not hit this issue again.

Pull Request resolved: #117340
Approved by: https://github.com/Skylion007
ghstack dependencies: #117336
@wanchaol
Collaborator Author

this relanded as a separate PR, closing this

@wanchaol closed this Jan 17, 2024
@facebook-github-bot deleted the gh/wanchaol/392/head branch January 20, 2024 15:23
Labels
ciflow/trunk, Merged, oncall: distributed, release notes: distributed (dtensor), release notes: distributed (fsdp), Reverted