Skip to content

Conversation

Valentine233
Copy link
Collaborator

@Valentine233 Valentine233 commented Sep 27, 2024

Reopen #121782, as more optimizations have landed.

Fixes #115261, #113017.
For CPU inductor path, remove -ftree-loop-vectorize from optimization flags to fix functional issues.

Validation on 3 benchmark suites

FP32

image

Outlier models (speedup<0.8, single socket): None.

BF16

image

Outlier models (speedup<0.8, single socket multi threads):

  • functorch_dp_cifar10 0.58
  • opacus_cifar10 0.57

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

Copy link

pytorch-bot bot commented Sep 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136827

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 7f539db with merge base 565a794 (image):

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Valentine233
Copy link
Collaborator Author

Valentine233 commented Oct 9, 2024

The two outliers (functorch_dp_cifar10 and opacus_cifar10) are caused by the lack of int64 vectorization. With disabling config enable_tiling_heuristics, the outliers can disappear. cc @leslie-fang-intel @jiayisunx

@Valentine233
Copy link
Collaborator Author

The two outliers are expected to be fixed by #136422. Would land this PR after that.

@Valentine233 Valentine233 marked this pull request as ready for review October 10, 2024 05:19
@Valentine233 Valentine233 requested a review from jgong5 October 10, 2024 05:19
@Valentine233 Valentine233 added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 10, 2024
@soulitzer soulitzer requested review from jansel and eellison October 10, 2024 15:08
@soulitzer soulitzer added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Oct 10, 2024
Comment on lines 821 to 861
# Use ftree-loop-vectorize when compiling
enable_tree_loop_vec_opt_flag = (
os.environ.get("TORCHINDUCTOR_CPP_ENABLE_TREE_LOOP_VEC_OPT_FLAG", "0") == "1"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is it recommended to turn on this flag? Please add a meaningful note here. Or, would it work if we always disable this compiler flag without an option to turn it on?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kept for the need of potential vectorization by this compiler flag. But as most of the vectorizations are supported now, I suppose that it is fine to remove the option.

@Valentine233 Valentine233 force-pushed the tree_loop_vec_target branch 2 times, most recently from 4e9434c to 599ddb0 Compare October 18, 2024 01:03
@eellison eellison removed their request for review October 23, 2024 15:27
@Valentine233
Copy link
Collaborator Author

@jgong5 Please help review, the model regressions and CI failures are all resolved.

@Valentine233 Valentine233 requested a review from jgong5 November 7, 2024 01:09
@Valentine233
Copy link
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@Valentine233
Copy link
Collaborator Author

I think we need an "is gcc" check on this flag.

Thanks, Jason. I have modified and please help review again.

@Valentine233 Valentine233 requested a review from jansel November 11, 2024 05:31
@Valentine233
Copy link
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Reopen pytorch#121782, as more optimizations have landed.

Fixes pytorch#115261, pytorch#113017.
For CPU inductor path, remove -ftree-loop-vectorize from optimization flags to fix functional issues.

### Validation on 3 benchmark suites

#### FP32
![image](https://github.com/user-attachments/assets/ec920928-fa36-467f-ba07-d2c05c51b92e)

Outlier models (speedup<0.8, single socket): None.

#### BF16
![image](https://github.com/user-attachments/assets/4a301e5e-147d-4b74-beb1-40290969ed80)

Outlier models (speedup<0.8, single socket multi threads):

- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57

Pull Request resolved: pytorch#136827
Approved by: https://github.com/jansel, https://github.com/jgong5
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…rch#136827)"

This reverts commit cf0bb6c.

Reverted pytorch#136827 on behalf of https://github.com/ZainRizvi due to Sorry but this breaks internally. See D65605094 for more details ([comment](pytorch#136827 (comment)))
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Reopen pytorch#121782, as more optimizations have landed.

Fixes pytorch#115261, pytorch#113017.
For CPU inductor path, remove -ftree-loop-vectorize from optimization flags to fix functional issues.

### Validation on 3 benchmark suites

#### FP32
![image](https://github.com/user-attachments/assets/ec920928-fa36-467f-ba07-d2c05c51b92e)

Outlier models (speedup<0.8, single socket): None.

#### BF16
![image](https://github.com/user-attachments/assets/4a301e5e-147d-4b74-beb1-40290969ed80)

Outlier models (speedup<0.8, single socket multi threads):

- functorch_dp_cifar10 0.58
- opacus_cifar10 0.57

Pull Request resolved: pytorch#136827
Approved by: https://github.com/jansel, https://github.com/jgong5
@github-actions github-actions bot deleted the tree_loop_vec_target branch December 12, 2024 02:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-no-td Do not run TD on this PR ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged module: inductor open source Reverted topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Inductor][cpu][miscompile] Outputs of torch.matmul abnormally change with extra outputs
7 participants