
Conversation

@jiayisunx (Collaborator) commented Jul 3, 2025

Stack from ghstack (oldest at bottom):

Summary:
Currently, Linear in FP32 dynamic mode (batch_size has free symbols) does not support weight prepacking, since MKL Linear does not support dynamic shapes. This PR uses oneDNN Linear instead, so that Linear weight prepacking also works in FP32 dynamic mode.
I ran the Inductor benchmarks in FP32 dynamic mode on CPU with this PR and saw an ~8% improvement in the timm_models geomean speedup, an ~2% improvement in the torchbench geomean speedup, and no change in huggingface. About 18 models improve to varying degrees; BERT_pytorch, soft_actor_critic, BlenderbotForCausalLM, ElectraForCausalLM, crossvit_9_240, mobilevit_s, and twins_pcpvt_base each improve by more than 20%.
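
For reference, a minimal sketch of the case this PR targets (module and batch sizes here are illustrative, not taken from the PR): an FP32 `nn.Linear` compiled on CPU with `torch.compile(dynamic=True)`, which makes the batch dimension a free symbol. With this change, Inductor can prepack the weight through oneDNN rather than leaving the GEMM unpacked as the MKL path required:

```python
# Illustrative sketch only: sizes and batch values are assumptions, not from the PR.
import torch

mod = torch.nn.Linear(128, 256).eval()  # FP32 weights, CPU

with torch.no_grad():
    compiled = torch.compile(mod, dynamic=True)  # batch_size becomes a free symbol
    for bs in (1, 7, 64):  # varying batch sizes exercise the dynamic-shape path
        x = torch.randn(bs, 128)
        out = compiled(x)
        assert out.shape == (bs, 256)
```

Whether the packed oneDNN kernel was actually emitted can be checked by inspecting Inductor's generated code (for example with TORCH_LOGS="output_code") for the oneDNN linear op rather than a plain GEMM.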

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela @mlazos

pytorch-bot bot commented Jul 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157542

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 0ccd7e7 with merge base 3008d98:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jiayisunx added a commit that referenced this pull request Jul 3, 2025
ghstack-source-id: 6aa9624
Pull Request resolved: #157542
[ghstack-poisoned]
@jiayisunx marked this pull request as draft July 4, 2025 03:16
jiayisunx added a commit that referenced this pull request Jul 4, 2025
ghstack-source-id: b245438
Pull Request resolved: #157542
[ghstack-poisoned]
jiayisunx added a commit that referenced this pull request Jul 7, 2025
ghstack-source-id: 63d4055
Pull Request resolved: #157542
[ghstack-poisoned]
@CaoE changed the title from "[indcutor] pack linear for FP32 dynamic mode" to "[inductor] pack linear for FP32 dynamic mode" Jul 16, 2025
jiayisunx added a commit that referenced this pull request Jul 29, 2025
ghstack-source-id: 4c437fd
Pull Request resolved: #157542
[ghstack-poisoned]
@jiayisunx marked this pull request as ready for review August 4, 2025 03:38
@jiayisunx requested a review from CaoE August 4, 2025 03:39
jiayisunx added a commit that referenced this pull request Aug 4, 2025
ghstack-source-id: ced9db2
Pull Request resolved: #157542
[ghstack-poisoned]
jiayisunx added a commit that referenced this pull request Aug 13, 2025
ghstack-source-id: b7ed168
Pull Request resolved: #157542
[ghstack-poisoned]
@jiayisunx requested a review from CaoE August 13, 2025 07:00
@jiayisunx requested a review from jansel August 15, 2025 01:53
@jiayisunx (Collaborator, Author) commented:
@pytorchbot merge

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Aug 18, 2025
@pytorchmergebot (Collaborator) commented:
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

facebook-github-bot pushed a commit to pytorch/benchmark that referenced this pull request Aug 19, 2025
Summary: same as the PR description above.

X-link: pytorch/pytorch#157542
Approved by: https://github.com/CaoE, https://github.com/jansel

Reviewed By: seemethere

Differential Revision: D80465691

fbshipit-source-id: 1a3627884c3769f292eec4c3ad396e7c91162c46
can-gaa-hou pushed a commit to can-gaa-hou/pytorch that referenced this pull request Aug 22, 2025
Summary: same as the PR description above.

Pull Request resolved: pytorch#157542
Approved by: https://github.com/CaoE, https://github.com/jansel