Skip to content

Conversation

@mengluy0125
Copy link
Contributor

@mengluy0125 mengluy0125 commented Aug 2, 2024

Summary: We observe the stack mpde can be transformed to cat node to elimiate split nodes, which could further enable the unbind cat optimization, thus we add a more advanced pattern to do the graph transformation

Test Plan:

unit test

CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 test //caffe2/test/inductor:split_cat_fx_passes

Buck UI: https://www.internalfb.com/buck2/de6c1cda-3d74-4a30-8980-7b209b6fe5dc
Test UI: https://www.internalfb.com/intern/testinfra/testrun/12103424042268125
Network: Up: 485KiB Down: 728KiB (reSessionID-2f2c01c3-79bb-4e37-b5be-fb77ec09b264)
Jobs completed: 29. Time elapsed: 5:19.8s.
Cache hits: 0%. Commands: 4 (cached: 0, remote: 0, local: 4)
Tests finished: Pass 9. Fail 0. Fatal 0. Skip 1. Build failure 0

benchmark

CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split --model_type "ig_ctr" --flow_id 584880697

P1503698962

before and after graph transformation
https://www.internalfb.com/intern/diffing/?paste_number=1504050718

Differential Revision: D60411560

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

@pytorch-bot
Copy link

pytorch-bot bot commented Aug 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/132542

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2af6de8 with merge base 1954bfa (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60411560

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60411560

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60411560

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60411560

mengluy0125 added a commit to mengluy0125/pytorch that referenced this pull request Aug 2, 2024
Summary:
Pull Request resolved: pytorch#132542

We observe the stack mpde can be transformed to cat node to elimiate split nodes, which could further enable the unbind cat optimization, thus we add a more advanced pattern to do the graph transformation

Test Plan:
# unit test

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 test //caffe2/test/inductor:split_cat_fx_passes
```
Buck UI: https://www.internalfb.com/buck2/de6c1cda-3d74-4a30-8980-7b209b6fe5dc
Test UI: https://www.internalfb.com/intern/testinfra/testrun/12103424042268125
Network: Up: 485KiB  Down: 728KiB  (reSessionID-2f2c01c3-79bb-4e37-b5be-fb77ec09b264)
Jobs completed: 29. Time elapsed: 5:19.8s.
Cache hits: 0%. Commands: 4 (cached: 0, remote: 0, local: 4)
Tests finished: Pass 9. Fail 0. Fatal 0. Skip 1. Build failure 0

# benchmark

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split --model_type "ig_ctr" --flow_id 584880697
```
P1503698962

before and after graph transformation
https://www.internalfb.com/intern/diffing/?paste_number=1504050718

Differential Revision: D60411560
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60411560

mengluy0125 added a commit to mengluy0125/pytorch that referenced this pull request Aug 4, 2024
Summary:
Pull Request resolved: pytorch#132542

We observe the stack mpde can be transformed to cat node to elimiate split nodes, which could further enable the unbind cat optimization, thus we add a more advanced pattern to do the graph transformation

Test Plan:
# unit test

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 test //caffe2/test/inductor:split_cat_fx_passes
```
Buck UI: https://www.internalfb.com/buck2/de6c1cda-3d74-4a30-8980-7b209b6fe5dc
Test UI: https://www.internalfb.com/intern/testinfra/testrun/12103424042268125
Network: Up: 485KiB  Down: 728KiB  (reSessionID-2f2c01c3-79bb-4e37-b5be-fb77ec09b264)
Jobs completed: 29. Time elapsed: 5:19.8s.
Cache hits: 0%. Commands: 4 (cached: 0, remote: 0, local: 4)
Tests finished: Pass 9. Fail 0. Fatal 0. Skip 1. Build failure 0

# benchmark

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split --model_type "ig_ctr" --flow_id 584880697
```
P1503698962

before and after graph transformation
https://www.internalfb.com/intern/diffing/?paste_number=1504050718

Differential Revision: D60411560
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60411560

pytorch-bot bot pushed a commit that referenced this pull request Aug 6, 2024
Summary:
Pull Request resolved: #132542

We observe the stack mpde can be transformed to cat node to elimiate split nodes, which could further enable the unbind cat optimization, thus we add a more advanced pattern to do the graph transformation

Test Plan:
# unit test

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 test //caffe2/test/inductor:split_cat_fx_passes
```
Buck UI: https://www.internalfb.com/buck2/de6c1cda-3d74-4a30-8980-7b209b6fe5dc
Test UI: https://www.internalfb.com/intern/testinfra/testrun/12103424042268125
Network: Up: 485KiB  Down: 728KiB  (reSessionID-2f2c01c3-79bb-4e37-b5be-fb77ec09b264)
Jobs completed: 29. Time elapsed: 5:19.8s.
Cache hits: 0%. Commands: 4 (cached: 0, remote: 0, local: 4)
Tests finished: Pass 9. Fail 0. Fatal 0. Skip 1. Build failure 0

# benchmark

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split --model_type "ig_ctr" --flow_id 584880697
```
P1503698962

before and after graph transformation
https://www.internalfb.com/intern/diffing/?paste_number=1504050718

Differential Revision: D60411560
mengluy0125 added a commit to mengluy0125/pytorch that referenced this pull request Aug 6, 2024
Summary:
Pull Request resolved: pytorch#132542

We observe the stack mpde can be transformed to cat node to elimiate split nodes, which could further enable the unbind cat optimization, thus we add a more advanced pattern to do the graph transformation

Test Plan:
# unit test

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 test //caffe2/test/inductor:split_cat_fx_passes
```
Buck UI: https://www.internalfb.com/buck2/de6c1cda-3d74-4a30-8980-7b209b6fe5dc
Test UI: https://www.internalfb.com/intern/testinfra/testrun/12103424042268125
Network: Up: 485KiB  Down: 728KiB  (reSessionID-2f2c01c3-79bb-4e37-b5be-fb77ec09b264)
Jobs completed: 29. Time elapsed: 5:19.8s.
Cache hits: 0%. Commands: 4 (cached: 0, remote: 0, local: 4)
Tests finished: Pass 9. Fail 0. Fatal 0. Skip 1. Build failure 0

# benchmark

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split --model_type "ig_ctr" --flow_id 584880697
```
P1503698962

before and after graph transformation
https://www.internalfb.com/intern/diffing/?paste_number=1504050718

Differential Revision: D60411560
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60411560

1 similar comment
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60411560

Summary:
Pull Request resolved: pytorch#132542

We observe the stack mpde can be transformed to cat node to elimiate split nodes, which could further enable the unbind cat optimization, thus we add a more advanced pattern to do the graph transformation

Test Plan:
# unit test

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 test //caffe2/test/inductor:split_cat_fx_passes
```
Buck UI: https://www.internalfb.com/buck2/de6c1cda-3d74-4a30-8980-7b209b6fe5dc
Test UI: https://www.internalfb.com/intern/testinfra/testrun/12103424042268125
Network: Up: 485KiB  Down: 728KiB  (reSessionID-2f2c01c3-79bb-4e37-b5be-fb77ec09b264)
Jobs completed: 29. Time elapsed: 5:19.8s.
Cache hits: 0%. Commands: 4 (cached: 0, remote: 0, local: 4)
Tests finished: Pass 9. Fail 0. Fatal 0. Skip 1. Build failure 0

# benchmark

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split --model_type "ig_ctr" --flow_id 584880697
```
P1503698962

before and after graph transformation
https://www.internalfb.com/intern/diffing/?paste_number=1504050718

# numerical check and e2e test
see in D60750275

Differential Revision: D60411560
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 6, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60411560

@facebook-github-bot
Copy link
Contributor

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants