mengluy0125 (Contributor) commented Nov 21, 2024

Summary:
We found another corner case in merge splits: when the first split node's getitem indices are not consecutive, the merge must be skipped.

{F1964255863}
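
To make the corner case concrete, here is a minimal sketch of what a "consecutive getitem indices" check could look like. It is not the code from this PR: the function names are hypothetical, the indices are passed in as plain integers rather than read from graph nodes, and treating an empty list as "skip" is an assumption.

```python
from typing import Iterable


def has_consecutive_indices(indices: Iterable[int]) -> bool:
    """Return True if the unique indices form a gap-free run, e.g. 0, 1, 2."""
    ordered = sorted(set(indices))
    if not ordered:
        # Assumption: with no getitem users there is nothing to merge.
        return False
    return ordered == list(range(ordered[0], ordered[0] + len(ordered)))


def should_skip_merge(first_split_getitem_indices: Iterable[int]) -> bool:
    """Hypothetical guard: skip the merge when the indices have a gap."""
    return not has_consecutive_indices(first_split_getitem_indices)
```

Under these assumptions, indices such as 0, 2, 3 would be skipped because index 1 is missing, while 0, 1, 2 would be allowed through.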

Test Plan:

local reproduce

buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split  --flow_id 666002198 2>&1 | tee ~/cmf.txt

P1683429791

Differential Revision: D66275387

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@pytorch-bot

pytorch-bot bot commented Nov 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141194

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures, 5 Unrelated Failures

As of commit c36b50e with merge base 0155a11:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D66275387

Summary:

The merge splits logic strongly assumes that the next split node always uses all of the split getitems. This is not always true, and we have seen many such cases.
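
As an illustration of the assumption described above, the sketch below lists which outputs of a split are never read by a getitem consumer. The helper name and its plain-integer inputs are hypothetical and are not taken from the PyTorch source; a real check would inspect the graph nodes instead.

```python
from typing import Iterable, List


def unused_split_outputs(num_outputs: int, used_indices: Iterable[int]) -> List[int]:
    """Return the output positions of a split that no getitem consumer reads."""
    used = set(used_indices)
    return [i for i in range(num_outputs) if i not in used]
```

A non-empty result means the "all getitems are used" assumption does not hold for that split.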

Test Plan:
# local reproduce
```
buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split  --flow_id 666002198 2>&1 | tee ~/cmf.txt
```

P1683429791

# E2E
before fix
f666002198

after fix
f666722020

Differential Revision: D66275387

pytorch-bot bot added the `ciflow/trunk` label (Trigger trunk jobs on your pull request) on Nov 22, 2024
@mengluy0125
Contributor Author

@pytorchbot merge -i

@mengluy0125
Contributor Author

@pytorchbot merge -f

@pytorch-bot

pytorch-bot bot commented Nov 22, 2024

❌ 🤖 pytorchbot command failed:

@pytorchbot merge: error: argument -f/--force: expected one argument

usage: @pytorchbot merge [-f MESSAGE | -i] [-ic] [-r [{viable/strict,main}]]

Try @pytorchbot --help for more info.

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/rexporting the PR!

Details for Dev Infra team Raised by workflow job

@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 jobs have failed, first few of them are: inductor-rocm / rocm6.2-py3.10-inductor / test (inductor, 1, 2, linux.rocm.gpu.2), inductor-rocm / rocm6.2-py3.10-inductor / test (inductor, 2, 2, linux.rocm.gpu.2)

Details for Dev Infra team Raised by workflow job

@izaitsevfb
Contributor

@pytorchbot merge -i

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Summary:
We found another corner case in merge splits: when the first split node's getitem indices are not consecutive, the merge must be skipped.

{F1964255863}

Test Plan:
# local reproduce
```
buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split  --flow_id 666002198 2>&1 | tee ~/cmf.txt
```

P1683429791

Differential Revision: D66275387

Pull Request resolved: pytorch#141194
Approved by: https://github.com/jackiexu1992