Inductor Freezing #100652
Conversation
Currently, the packed op doesn't support autocast, and the packing path runs before AOTAutograd, which changes the default autocast behavior. For now, we disable the packing path; the bfloat16 packing path can work once it is moved after AOTAutograd (I will do that after #100652 is done). Pull Request resolved: #100844. Approved by: https://github.com/jgong5, https://github.com/jansel
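To picture the autocast interaction described above, here is a minimal, hypothetical repro sketch (the model and shapes are arbitrary placeholders, not from this PR): if a weight is pre-packed before AOTAutograd traces the graph, the packed op no longer goes through autocast's dispatch, so bfloat16 autocasting can be silently skipped for it.

```python
import torch

# Arbitrary toy model; any module whose weights the CPU backend would
# pre-pack (e.g. nn.Linear) could exhibit the behavior described above.
model = torch.nn.Linear(8, 8).eval()
compiled = torch.compile(model)

with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    out = compiled(torch.randn(2, 8))
    # Expected under autocast: torch.bfloat16. If packing ran before
    # AOTAutograd, the packed op could bypass autocast and produce float32,
    # which is why the packing path is disabled until it runs post-AOTAutograd.
    print(out.dtype)
```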
Gives 1% boost on hf_Bert inference.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki.

Merge failed. Reason: 1 job has failed; the first few failures: trunk / win-vs2019-cpu-py3 / test (default, 3, 3, windows.4xlarge.nonephemeral)
@pytorchbot merge -f "unrelated failures"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki.
@pytorchbot revert -m "This seems to be breaking test_aliased_param_return_cpu on trunk. See for more details: https://www.torch-ci.com/pytorch/pytorch/commit/d083d444ff41cfb2352f4f5e1780c1b9a2126049" -c landrace

@pytorchbot successfully started a revert job. Check the current status here.

Can't revert a PR that was landed via Phabricator as D46244033. Please revert by going to the internal diff and clicking Unland.

Test disabled here: #103466

There are now other failed tests besides the one disabled above (https://hud.pytorch.org/pytorch/pytorch/commit/c3d3165f16dccd88872139b72cd421e0ceafdd9b), and the diff hasn't been landed internally yet, so should we submit a revert PR or disable the whole test_inductor_freezing file?
During revert, use the title of the "Meta Internal-Only Changes Check" to determine whether or not an internal diff is associated with the PR. When a PR is merged/closed, the "Meta Internal-Only Changes Check" status is always success, but the title message can differ:

- "There is no internal Diff connected, this can be merged now" means there is no internal change associated with the PR (or it was landed via the GitHub First workflow).
- "The internal Diff has landed, this can be merged now" means the PR has an associated internal diff, and OSS and internal reverts must happen in sync using internal tooling (or a revert PR can be authored in OSS).

Add regression test for #100652, which originated from the internal diff but was merged as an OSS PR. Fixes #104232. Pull Request resolved: #104344. Approved by: https://github.com/bigfootjon, https://github.com/huydhn
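As a hedged illustration of the rule above (not official PyTorch tooling), one could read that message through the public GitHub REST statuses endpoint, assuming the check is exposed as a commit status whose `context` matches the quoted name and whose message lands in the `description` field; the repo name and SHA below are placeholders.

```python
import json
import urllib.request

def internal_diff_message(owner: str, repo: str, sha: str) -> str:
    # Public GitHub REST endpoint for commit statuses; each entry carries
    # "context" and "description" fields.
    url = f"https://api.github.com/repos/{owner}/{repo}/commits/{sha}/statuses"
    with urllib.request.urlopen(url) as resp:
        statuses = json.load(resp)
    for status in statuses:
        if status.get("context") == "Meta Internal-Only Changes Check":
            return status.get("description", "")
    return "check not found"

# "There is no internal Diff connected, ..." -> a plain OSS revert is safe.
# "The internal Diff has landed, ..."        -> revert must sync with internal tooling.
print(internal_diff_message("pytorch", "pytorch", "<commit sha>"))
```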
Stack from ghstack (oldest at bottom):
Adds a freezing pass that will constant fold parameters in inductor, `config.freezing`. This occurs post functionalization in aot autograd to capture dispatching and allow passes to occur post functionalization. A few notes:

- There is an option to discard parameters, `config.freezing_discard_parameters`, which will take the current eager modules and wrap parameters in a Tensor subclass that errors if used (see the sketch after this description).
- I needed to expose flat_params in aot_autograd in order to discard old references when we constant fold away parameters, like with amp. I also exposed `fw_metadata` to avoid constant folding mutated parameters.
- Caching parameter transformations/constant folding across different inferences: not yet implemented.
- Checking version_counter of constant folded params: not yet implemented.

I'm not really sure what the actual naming should be. In jit there was both "freezing", which was platform agnostic, and "optimize for inference", which made device-specific optimizations. We're doing the latter here, but maybe freezing is a better name.
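The discard-parameters idea from the first note can be pictured with a minimal sketch; this illustrates the approach rather than the PR's actual implementation, and the class name `ErasedTensor` here is assumed. Each constant-folded parameter is wrapped in a Tensor subclass whose ops all raise, so accidental eager use fails loudly instead of silently reading stale weights.

```python
import torch

class ErasedTensor(torch.Tensor):
    """Stand-in for a parameter that was constant-folded away by freezing."""

    @staticmethod
    def __new__(cls, elem):
        # Preserve the shape/dtype metadata of the original parameter.
        return torch.Tensor._make_subclass(cls, elem)

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Any torch op on this tensor fails with a descriptive error.
        raise RuntimeError(
            "this parameter was constant-folded during freezing and discarded; "
            "run the compiled module, not the eager one"
        )

weight = torch.nn.Parameter(torch.randn(4, 4))
erased = ErasedTensor(weight.data)
# erased @ torch.randn(4, 4)  # would raise the RuntimeError above
```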
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @aakhundov @soumith @desertfire
Differential Revision: D46244033
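For context, a minimal usage sketch of the feature added here, assuming the config flags named in this PR (`config.freezing` and `config.freezing_discard_parameters`; exact spelling and availability depend on your build):

```python
import torch
import torch._inductor.config as inductor_config

# Enable the freezing pass from this PR: parameters get constant-folded
# into the compiled graph on the inference path.
inductor_config.freezing = True
# Optionally discard the eager parameter copies once they are folded away.
inductor_config.freezing_discard_parameters = False

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 8),
).eval()

compiled = torch.compile(model)
with torch.no_grad():  # freezing targets inference graphs
    out = compiled(torch.randn(4, 16))
print(out.shape)
```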