
Conversation

@jiayisunx (Collaborator) commented Jul 3, 2025

Stack from ghstack (oldest at bottom):

Summary:
Currently, Linear in FP32 dynamic mode (batch_size has free symbols) does not support weight prepacking, since MKL Linear does not support dynamic shapes. This PR uses oneDNN Linear instead, so that Linear weight prepacking also works in FP32 dynamic mode.
I ran the Inductor benchmarks in FP32 dynamic mode on CPU with this PR and saw an ~8% improvement in the timm_models geomean speedup, an ~2% improvement in the torchbench geomean speedup, and no change in huggingface. About 18 models improve to varying degrees; BERT_pytorch, soft_actor_critic, BlenderbotForCausalLM, ElectraForCausalLM, crossvit_9_240, mobilevit_s, and twins_pcpvt_base each improve by more than 20%.
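
For reference, a minimal sketch of the case this PR targets (module and batch sizes here are illustrative, not taken from the PR): an FP32 `nn.Linear` compiled on CPU with `torch.compile(dynamic=True)`, which makes the batch dimension a free symbol. With this change, Inductor can prepack the weight through oneDNN rather than leaving the GEMM unpacked as the MKL path required:

```python
# Illustrative sketch only: sizes and batch values are assumptions, not from the PR.
import torch

mod = torch.nn.Linear(128, 256).eval()  # FP32 weights, CPU

with torch.no_grad():
    compiled = torch.compile(mod, dynamic=True)  # batch_size becomes a free symbol
    for bs in (1, 7, 64):  # varying batch sizes exercise the dynamic-shape path
        x = torch.randn(bs, 128)
        out = compiled(x)
        assert out.shape == (bs, 256)
```

Whether the packed oneDNN kernel was actually emitted can be checked by inspecting Inductor's generated code (for example with TORCH_LOGS="output_code") for the oneDNN linear op rather than a plain GEMM.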

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela @mlazos

pytorch-bot bot commented Jul 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157542

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 0ccd7e7 with merge base 3008d98:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jiayisunx added a commit that referenced this pull request Jul 3, 2025
ghstack-source-id: 6aa9624
Pull Request resolved: #157542
[ghstack-poisoned]
@jiayisunx marked this pull request as draft July 4, 2025 03:16
jiayisunx added a commit that referenced this pull request Jul 4, 2025
ghstack-source-id: b245438
Pull Request resolved: #157542
[ghstack-poisoned]
jiayisunx added a commit that referenced this pull request Jul 7, 2025
ghstack-source-id: 63d4055
Pull Request resolved: #157542
[ghstack-poisoned]
@CaoE changed the title from "[indcutor] pack linear for FP32 dynamic mode" to "[inductor] pack linear for FP32 dynamic mode" Jul 16, 2025
jiayisunx added a commit that referenced this pull request Jul 29, 2025
ghstack-source-id: 4c437fd
Pull Request resolved: #157542
[ghstack-poisoned]
@jiayisunx marked this pull request as ready for review August 4, 2025 03:38
@jiayisunx requested a review from CaoE August 4, 2025 03:39
jiayisunx added a commit that referenced this pull request Aug 4, 2025
ghstack-source-id: ced9db2
Pull Request resolved: #157542
[ghstack-poisoned]
jiayisunx added a commit that referenced this pull request Aug 13, 2025
ghstack-source-id: b7ed168
Pull Request resolved: #157542
[ghstack-poisoned]
@jiayisunx requested a review from CaoE August 13, 2025 07:00
@jiayisunx requested a review from jansel August 15, 2025 01:53
@jiayisunx (Collaborator, Author) commented:
@pytorchbot merge

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Aug 18, 2025
@pytorchmergebot (Collaborator) commented:
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

facebook-github-bot pushed a commit to pytorch/benchmark that referenced this pull request Aug 19, 2025
Summary: same as the PR description above.

X-link: pytorch/pytorch#157542
Approved by: https://github.com/CaoE, https://github.com/jansel

Reviewed By: seemethere

Differential Revision: D80465691

fbshipit-source-id: 1a3627884c3769f292eec4c3ad396e7c91162c46
can-gaa-hou pushed a commit to can-gaa-hou/pytorch that referenced this pull request Aug 22, 2025
Summary: same as the PR description above.

Pull Request resolved: pytorch#157542
Approved by: https://github.com/CaoE, https://github.com/jansel