[PT2][Inductor] Fix "example_value" absent for stack nodes #120655
See artifacts and rendered test results at hud.pytorch.org/pr/120655
✅ No Failures as of commit 9d91ae6 with merge base f36e00b. This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D54140488
c26b5a6 to 561950c
81296e0 to b379313
Summary:
We observed that stack nodes are missing `example_value` in DPA+FIRST, which causes issues for the subsequent split cat optimization. Full error log: P1187633689.

pre grad graph: https://www.internalfb.com/intern/everpaste/?color=0&handle=GPUFOBWniTeB6s8DAN8z9sHTadpxbr0LAAAz

We found that the problem was introduced by the new stack nodes created in group batch fusion, so we fix the bug there to enable further split cat optimization.

Test Plan:
```
buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode split_batch
```

before fix: P1187633689
```
W0221 13:32:09.334000 139773455527936 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: sigmoid_16
W0221 13:32:09.335000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_19
W0221 13:32:09.335000 139773455527936 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: add_16
W0221 13:32:09.335000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_6
W0221 13:32:09.335000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_5
W0221 13:32:09.336000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_4
W0221 13:32:09.517000 139773455527936 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: add_20
W0221 13:32:09.518000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_18
W0221 13:32:09.518000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_17
W0221 13:32:09.521000 139773455527936 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: add_19
W0221 13:32:09.521000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_15
W0221 13:32:09.521000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_14
W0221 13:32:09.522000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_16
W0221 13:32:09.524000 139773455527936 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: add_18
W0221 13:32:09.525000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_12
W0221 13:32:09.525000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_11
W0221 13:32:09.525000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_13
W0221 13:32:09.527000 139773455527936 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: add_17
W0221 13:32:09.528000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_9
W0221 13:32:09.528000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_8
W0221 13:32:09.528000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_10
W0221 13:32:09.528000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_7
```

after fix: P1189491364
```
W0226 13:19:56.542000 139770599518208 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: sigmoid_16
W0226 13:19:56.543000 139770599518208 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: add_16
W0226 13:19:56.703000 139770599518208 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: add_20
W0226 13:19:56.707000 139770599518208 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: add_19
W0226 13:19:56.711000 139770599518208 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: add_18
W0226 13:19:56.713000 139770599518208 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: add_17
```

Reviewed By: jackiexu1992

Differential Revision: D54140488
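To compare the two logs in the test plan without reading them line by line, the warnings can be tallied per op name. The snippet below is only a sketch for that purpose and is not part of this PR: the helper name `tally_absent_nodes` and the two sample strings are hypothetical, and the regular expression simply assumes the warning text keeps the form "example value absent for node: <name>" shown in the logs above.

```python
import re
from collections import Counter

# Assumed warning format, copied from the log lines quoted in the test plan.
ABSENT_RE = re.compile(r"example value absent for node: (\w+)")


def tally_absent_nodes(log_text: str) -> Counter:
    """Count 'example value absent' warnings, grouped by op name.

    The op name is the node name with its trailing numeric suffix
    removed, e.g. 'stack_19' is counted under 'stack'.
    """
    counts: Counter = Counter()
    for node_name in ABSENT_RE.findall(log_text):
        op_name = re.sub(r"_\d+$", "", node_name)
        counts[op_name] += 1
    return counts


# Hypothetical samples: the first three "before" lines and the first two
# "after" lines from the logs in the test plan.
sample_before = "\n".join([
    "W0221 13:32:09.334000 139773455527936 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: sigmoid_16",
    "W0221 13:32:09.335000 139773455527936 torch/_inductor/fx_passes/split_cat.py:274] [0/0_1] example value absent for node: stack_19",
    "W0221 13:32:09.335000 139773455527936 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: add_16",
])
sample_after = "\n".join([
    "W0226 13:19:56.542000 139770599518208 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: sigmoid_16",
    "W0226 13:19:56.543000 139770599518208 torch/_inductor/fx_passes/split_cat.py:186] [0/0_1] example value absent for node: add_16",
])

before_counts = tally_absent_nodes(sample_before)
after_counts = tally_absent_nodes(sample_after)
```

Counting the full logs quoted above by hand gives 16 stack, 5 add and 1 sigmoid warnings before the fix, and 5 add and 1 sigmoid warnings after it, so the stack warnings are the ones this change removes.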
b379313 to 9d91ae6
@pytorchbot merge -f 'Landed internally'
(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler @amjames