[PT2][Optimus] Add unbind_stack_to_cat_pass #132542
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/132542
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 2af6de8 with merge base 1954bfa.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D60411560
7e85273 to f4093f4
f4093f4 to 43ed9b6
43ed9b6 to 5775b4e
Summary:
Pull Request resolved: pytorch#132542

We observe that a stack node can be transformed into a cat node to eliminate split nodes, which can further enable the unbind cat optimization. We therefore add a more advanced pattern to perform this graph transformation.

Test Plan:
# unit test
```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 test //caffe2/test/inductor:split_cat_fx_passes
```
Buck UI: https://www.internalfb.com/buck2/de6c1cda-3d74-4a30-8980-7b209b6fe5dc
Test UI: https://www.internalfb.com/intern/testinfra/testrun/12103424042268125
Network: Up: 485KiB Down: 728KiB (reSessionID-2f2c01c3-79bb-4e37-b5be-fb77ec09b264)
Jobs completed: 29. Time elapsed: 5:19.8s.
Cache hits: 0%. Commands: 4 (cached: 0, remote: 0, local: 4)
Tests finished: Pass 9. Fail 0. Fatal 0. Skip 1. Build failure 0

# benchmark
```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split --model_type "ig_ctr" --flow_id 584880697
```
P1503698962

before and after graph transformation
https://www.internalfb.com/intern/diffing/?paste_number=1504050718

Differential Revision: D60411560
5775b4e to ad1445c
ad1445c to 8831088
8831088 to 780650e
780650e to 270e20c
Summary:
Pull Request resolved: pytorch#132542

We observe that a stack node can be transformed into a cat node to eliminate split nodes, which can further enable the unbind cat optimization. We therefore add a more advanced pattern to perform this graph transformation.

Test Plan:
# unit test
```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 test //caffe2/test/inductor:split_cat_fx_passes
```
Buck UI: https://www.internalfb.com/buck2/de6c1cda-3d74-4a30-8980-7b209b6fe5dc
Test UI: https://www.internalfb.com/intern/testinfra/testrun/12103424042268125
Network: Up: 485KiB Down: 728KiB (reSessionID-2f2c01c3-79bb-4e37-b5be-fb77ec09b264)
Jobs completed: 29. Time elapsed: 5:19.8s.
Cache hits: 0%. Commands: 4 (cached: 0, remote: 0, local: 4)
Tests finished: Pass 9. Fail 0. Fatal 0. Skip 1. Build failure 0

# benchmark
```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split --model_type "ig_ctr" --flow_id 584880697
```
P1503698962

before and after graph transformation
https://www.internalfb.com/intern/diffing/?paste_number=1504050718

# numerical check and e2e test
See D60750275

Differential Revision: D60411560
270e20c to 2af6de8
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Summary: We observe that a stack node can be transformed into a cat node to eliminate split nodes, which can further enable the unbind cat optimization. We therefore add a more advanced pattern to perform this graph transformation.
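
For readers unfamiliar with why a stack can be rewritten as a cat, the following is a minimal sketch of the underlying identity only. It is not the Inductor pass from this PR, which operates on FX graph nodes. It models 2-D tensors as plain Python lists of rows, and every helper name (`unbind_dim1`, `stack_dim1`, `unsqueeze_dim1`, `cat_dim1`) is hypothetical and chosen for illustration. In PyTorch terms, the identity is that `torch.stack(xs, dim=1)` produces the same values as `torch.cat([x.unsqueeze(1) for x in xs], dim=1)`.

```python
# Hypothetical pure-Python model of 2-D tensors as lists of rows.
# This is NOT the Inductor pass; it only illustrates the identity
#   stack(xs, dim=1) == cat([unsqueeze(x, 1) for x in xs], dim=1)

def unbind_dim1(x):
    """Split a 2-D tensor [N][K] into K 1-D tensors of length N."""
    return [[row[k] for row in x] for k in range(len(x[0]))]

def stack_dim1(xs):
    """Stack K 1-D tensors of length N into a 2-D tensor [N][K]."""
    return [[x[i] for x in xs] for i in range(len(xs[0]))]

def unsqueeze_dim1(x):
    """Turn a 1-D tensor of length N into a 2-D tensor [N][1]."""
    return [[v] for v in x]

def cat_dim1(xs):
    """Concatenate 2-D tensors with equal row counts along dim 1."""
    return [sum((x[i] for x in xs), []) for i in range(len(xs[0]))]

x = [[1, 2, 3], [4, 5, 6]]   # shape [2][3]
pieces = unbind_dim1(x)      # three 1-D tensors of length 2

# Path 1: unbind followed by stack.
via_stack = stack_dim1(pieces)

# Path 2: the same result expressed with unsqueeze + cat.
via_cat = cat_dim1([unsqueeze_dim1(p) for p in pieces])
```

Both paths rebuild the original `x`, which is why a stack fed by unbind outputs can be expressed with cat instead, under the assumption that all inputs share the same shape.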
Test Plan:
unit test
Buck UI: https://www.internalfb.com/buck2/de6c1cda-3d74-4a30-8980-7b209b6fe5dc
Test UI: https://www.internalfb.com/intern/testinfra/testrun/12103424042268125
Network: Up: 485KiB Down: 728KiB (reSessionID-2f2c01c3-79bb-4e37-b5be-fb77ec09b264)
Jobs completed: 29. Time elapsed: 5:19.8s.
Cache hits: 0%. Commands: 4 (cached: 0, remote: 0, local: 4)
Tests finished: Pass 9. Fail 0. Fatal 0. Skip 1. Build failure 0
benchmark
P1503698962
before and after graph transformation
https://www.internalfb.com/intern/diffing/?paste_number=1504050718
Differential Revision: D60411560
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang