[PP] Forward only schedule #132177

H-Huang · 2024-07-30T21:02:22Z

Stack from ghstack (oldest at bottom):

-> [PP] Forward only schedule #132177

python test/distributed/pipelining/test_schedule_multiproc.py -k test_forward_only

cc @XilunWu @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o

[ghstack-poisoned]

pytorch-bot · 2024-07-30T21:02:25Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/132177

✅ You can merge normally! (1 Unrelated Failure)

As of commit 15931cf with merge base 19db4f6:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / before-test / llm-retrieval (gh) (matched llm-retrieval rule in flaky-rules.json)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: c35bcc1 Pull Request resolved: #132177

lessw2020

looks great, thanks a ton for adding!

minor q on future test addition
linting errors are easy fixes

lessw2020 · 2024-07-31T18:28:51Z

test/distributed/pipelining/test_schedule_multiproc.py

+
+        # Run
+        num_iters = 20
+        for _ in range(num_iters):


generic q - is there an easy way to verify that things looped successfully (i.e. seed value for rand and then confirm expected final values after 20 iters)?
right now the test confirms it ran w/o error, but maybe future test should also verify an expected output?

H-Huang · 2024-07-31

Thanks! Added a validation check at the end of the test to compare the pipelined output with a reference model.
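The kind of check discussed in this thread can be illustrated with a small framework-free sketch. It is not the code added in this PR: it uses plain Python lists instead of tensors, runs in a single process, and every name in it (`make_stage`, `reference_forward`, `forward_only_pipeline`, `run_check`) is hypothetical. It only shows the idea of seeding the inputs, looping a fixed number of iterations, and comparing the microbatched forward-only output against a non-pipelined reference.

```python
import math
import random


def make_stage(scale, shift):
    """Hypothetical stand-in for one pipeline stage: an elementwise affine map."""
    def stage(microbatch):
        return [scale * x + shift for x in microbatch]
    return stage


def reference_forward(stages, batch):
    """Reference model: push the full batch through every stage in order."""
    for stage in stages:
        batch = stage(batch)
    return batch


def forward_only_pipeline(stages, batch, n_microbatches):
    """Forward-only run: split into microbatches, run each through all stages,
    and merge the outputs in order. No loss and no backward pass."""
    if len(batch) % n_microbatches != 0:
        raise ValueError("batch size must be divisible by n_microbatches")
    size = len(batch) // n_microbatches
    outputs = []
    for i in range(n_microbatches):
        microbatch = batch[i * size:(i + 1) * size]
        for stage in stages:
            microbatch = stage(microbatch)
        outputs.extend(microbatch)
    return outputs


def run_check(seed=0, num_iters=20, batch_size=8, n_microbatches=4):
    """Loop num_iters times with seeded inputs and compare against the reference."""
    rng = random.Random(seed)  # fixed seed keeps the inputs reproducible
    stages = [make_stage(2.0, 1.0), make_stage(0.5, -3.0)]
    for _ in range(num_iters):
        batch = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        expected = reference_forward(stages, batch)
        actual = forward_only_pipeline(stages, batch, n_microbatches)
        assert len(actual) == len(expected)
        for a, e in zip(actual, expected):
            assert math.isclose(a, e, rel_tol=1e-9, abs_tol=1e-12)
    return True
```

Calling `run_check()` returns `True` when every iteration matches the reference. In a tensor-based test the elementwise comparison would presumably be replaced by something like `torch.testing.assert_close`; that is an assumption here, not a detail taken from this PR.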

ghstack-source-id: 8bbec63 Pull Request resolved: #132177

H-Huang · 2024-07-31T22:50:57Z

@pytorchbot merge

pytorchmergebot · 2024-07-31T22:53:06Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2024-07-31T23:08:44Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Lint / lintrunner-noclang / linux-job

Failing merge rule: Core Maintainers

ghstack-source-id: f6b32f4 Pull Request resolved: #132177

H-Huang · 2024-08-01T13:47:17Z

@pytorchbot merge

pytorchmergebot · 2024-08-01T13:49:37Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

[draft] [PP] Forward only schedule

a69bb22

[ghstack-poisoned]

pytorch-bot bot added the oncall: distributed label Jul 30, 2024

H-Huang marked this pull request as draft July 30, 2024 21:02

H-Huang requested a review from lessw2020 July 30, 2024 21:03

Update on "[draft] [PP] Forward only schedule"

4c60a87

H-Huang added a commit that referenced this pull request Jul 30, 2024

[draft] [PP] Forward only schedule

f731fde

ghstack-source-id: c35bcc1 Pull Request resolved: #132177

lessw2020 approved these changes Jul 31, 2024

H-Huang marked this pull request as ready for review July 31, 2024 21:51

Update on "[draft] [PP] Forward only schedule"

0b9a246

H-Huang changed the title ~~[draft] [PP] Forward only schedule~~ [PP] Forward only schedule Jul 31, 2024

H-Huang added the release notes: distributed (pipeline) and module: pipelining labels Jul 31, 2024

Update on "[PP] Forward only schedule"

a869ded

H-Huang added a commit that referenced this pull request Jul 31, 2024

[PP] Forward only schedule

7db6e67

ghstack-source-id: 8bbec63 Pull Request resolved: #132177

pytorch-bot bot added the ciflow/trunk label Jul 31, 2024

pytorchmergebot added the merging label Jul 31, 2024

pytorchmergebot removed the merging label Jul 31, 2024

Update on "[PP] Forward only schedule"

15931cf

H-Huang added a commit that referenced this pull request Aug 1, 2024

[PP] Forward only schedule

ac4465f

ghstack-source-id: f6b32f4 Pull Request resolved: #132177

pytorchmergebot added the merging label Aug 1, 2024

pytorchmergebot added the Merged label Aug 1, 2024

pytorchmergebot closed this in c59f3ff Aug 1, 2024

pytorchmergebot removed the merging label Aug 1, 2024

github-actions bot deleted the gh/H-Huang/134/head branch September 1, 2024 02:10
