[cpu] Modify inductor opt flag --- ftree-loop-vectorize #136827
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136827
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 7f539db with merge base 565a794:
BROKEN TRUNK - One job failed but was also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
The two outliers ( |
The two outliers are expected to be fixed by #136422. We will land this PR after that.
torch/_inductor/config.py
# Use ftree-loop-vectorize when compiling
enable_tree_loop_vec_opt_flag = (
    os.environ.get("TORCHINDUCTOR_CPP_ENABLE_TREE_LOOP_VEC_OPT_FLAG", "0") == "1"
)
When is it recommended to turn on this flag? Please add a meaningful note here. Or, would it work if we always disable this compiler flag without an option to turn it on?
The option was kept in case some vectorization still depends on this compiler flag. But since most vectorization cases are supported now, I think it is fine to remove the option.
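For reference, the opt-in behavior discussed in this thread could look roughly like the sketch below. It is only an illustration: the environment variable name comes from the diff above, but `tree_loop_vec_enabled`, `build_optimization_flags`, and the `-O3` baseline are hypothetical and are not taken from Inductor's actual flag-building code.

```python
import os
from typing import List, Mapping


def tree_loop_vec_enabled(env: Mapping[str, str] = os.environ) -> bool:
    # Same check as the config snippet in the diff: off unless explicitly set to "1".
    return env.get("TORCHINDUCTOR_CPP_ENABLE_TREE_LOOP_VEC_OPT_FLAG", "0") == "1"


def build_optimization_flags(enable_tree_loop_vec: bool) -> List[str]:
    # Hypothetical helper; "-O3" is an assumed baseline used only for illustration.
    flags = ["-O3"]
    if enable_tree_loop_vec:
        # The flag this PR stops passing by default.
        flags.append("-ftree-loop-vectorize")
    return flags


if __name__ == "__main__":
    print(build_optimization_flags(tree_loop_vec_enabled()))
```

With the option removed entirely, as suggested in the review, the helper would reduce to returning the baseline flags unconditionally.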
4e9434c to 599ddb0
599ddb0 to fa384c1
@jgong5 Please help review. The model regressions and CI failures are all resolved.
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
a98efcc to 7f539db
Thanks, Jason. I have made the changes; please review again.
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Reopen pytorch#121782, as more optimizations have landed.
Fixes pytorch#115261, pytorch#113017.
For CPU inductor path, remove -ftree-loop-vectorize from optimization flags to fix functional issues.
### Validation on 3 benchmark suites
#### FP32
Outlier models (speedup<0.8, single socket): None.
#### BF16
Outlier models (speedup<0.8, single socket multi threads):
- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57
Pull Request resolved: pytorch#136827
Approved by: https://github.com/jansel, https://github.com/jgong5
…rch#136827)" This reverts commit cf0bb6c. Reverted pytorch#136827 on behalf of https://github.com/ZainRizvi due to Sorry but this breaks internally. See D65605094 for more details ([comment](pytorch#136827 (comment)))
Reopen #121782, as more optimizations have landed.
Fixes #115261, #113017.
For CPU inductor path, remove -ftree-loop-vectorize from optimization flags to fix functional issues.
Validation on 3 benchmark suites
FP32
Outlier models (speedup<0.8, single socket): None.
BF16
Outlier models (speedup<0.8, single socket multi threads):
- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57
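The outlier criterion used in this validation (speedup below 0.8) can be sketched as a simple filter. This is a minimal illustration, not the benchmark tooling used for the PR: `find_outliers` is a hypothetical helper, the two BF16 speedups are the ones quoted in the commit message, and `placeholder_model` is an invented entry included only to show a model that passes the threshold.

```python
from typing import Dict

# Threshold quoted in the PR description: speedup < 0.8 counts as an outlier.
SPEEDUP_THRESHOLD = 0.8


def find_outliers(
    speedups: Dict[str, float], threshold: float = SPEEDUP_THRESHOLD
) -> Dict[str, float]:
    # Keep only the models whose speedup falls strictly below the threshold.
    return {model: s for model, s in speedups.items() if s < threshold}


bf16_speedups = {
    "functorch_dp_cifar10": 0.58,  # from the commit message
    "opacus_cifar10": 0.57,        # from the commit message
    "placeholder_model": 1.0,      # hypothetical entry, not a real measurement
}

outliers = find_outliers(bf16_speedups)
for model, speedup in sorted(outliers.items()):
    print(f"{model}: {speedup:.2f}")
```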
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov