Inductor Freezing #100652
Conversation
Currently, the packed op doesn't support autocast, and the packing path runs before AOTAutograd, which changes the default autocast behavior. For now, we disable the packing path; the bfloat16 packing path can work once it is moved after AOTAutograd (I will do that after #100652 is done). Pull Request resolved: #100844. Approved by: https://github.com/jgong5, https://github.com/jansel
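To picture the autocast interaction described above, here is a minimal, hypothetical repro sketch (the model and shapes are arbitrary placeholders, not from this PR): if a weight is pre-packed before AOTAutograd traces the graph, the packed op no longer goes through autocast's dispatch, so bfloat16 autocasting can be silently skipped for it.

```python
import torch

# Arbitrary toy model; any module whose weights the CPU backend would
# pre-pack (e.g. nn.Linear) could exhibit the behavior described above.
model = torch.nn.Linear(8, 8).eval()
compiled = torch.compile(model)

with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    out = compiled(torch.randn(2, 8))
    # Expected under autocast: torch.bfloat16. If packing ran before
    # AOTAutograd, the packed op could bypass autocast and produce float32,
    # which is why the packing path is disabled until it runs post-AOTAutograd.
    print(out.dtype)
```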
Gives 1% boost on hf_Bert inference.
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki.

Merge failed. Reason: 1 job has failed; the first few failures: trunk / win-vs2019-cpu-py3 / test (default, 3, 3, windows.4xlarge.nonephemeral)
@pytorchbot merge -f "unrelated failures"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki.
@pytorchbot revert -m "This seems to be breaking test_aliased_param_return_cpu on trunk. See for more details: https://www.torch-ci.com/pytorch/pytorch/commit/d083d444ff41cfb2352f4f5e1780c1b9a2126049" -c landrace

@pytorchbot successfully started a revert job. Check the current status here.

Can't revert a PR that was landed via Phabricator as D46244033. Please revert by going to the internal diff and clicking Unland.

Test disabled here: #103466

There are now other failed tests besides the one disabled above (https://hud.pytorch.org/pytorch/pytorch/commit/c3d3165f16dccd88872139b72cd421e0ceafdd9b), and the diff hasn't been landed internally yet, so should we submit a revert PR or disable the whole test_inductor_freezing file?
During revert, use the title of the "Meta Internal-Only Changes Check" to determine whether or not an internal diff is associated with the PR. When a PR is merged/closed, the "Meta Internal-Only Changes Check" status is always success, but the title message can differ:

- "There is no internal Diff connected, this can be merged now" means there is no internal change associated with the PR (or it was landed via the GitHub First workflow).
- "The internal Diff has landed, this can be merged now" means the PR has an associated internal diff, and OSS and internal reverts must happen in sync using internal tooling (or a revert PR can be authored in OSS).

Add regression test for #100652, which originated from the internal diff but was merged as an OSS PR. Fixes #104232. Pull Request resolved: #104344. Approved by: https://github.com/bigfootjon, https://github.com/huydhn
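As a hedged illustration of the rule above (not official PyTorch tooling), one could read that message through the public GitHub REST statuses endpoint, assuming the check is exposed as a commit status whose `context` matches the quoted name and whose message lands in the `description` field; the repo name and SHA below are placeholders.

```python
import json
import urllib.request

def internal_diff_message(owner: str, repo: str, sha: str) -> str:
    # Public GitHub REST endpoint for commit statuses; each entry carries
    # "context" and "description" fields.
    url = f"https://api.github.com/repos/{owner}/{repo}/commits/{sha}/statuses"
    with urllib.request.urlopen(url) as resp:
        statuses = json.load(resp)
    for status in statuses:
        if status.get("context") == "Meta Internal-Only Changes Check":
            return status.get("description", "")
    return "check not found"

# "There is no internal Diff connected, ..." -> a plain OSS revert is safe.
# "The internal Diff has landed, ..."        -> revert must sync with internal tooling.
print(internal_diff_message("pytorch", "pytorch", "<commit sha>"))
```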
Stack from ghstack (oldest at bottom):
Adds a freezing pass that will constant fold parameters in inductor, `config.freezing`. This occurs post functionalization in aot autograd to capture dispatching and allow passes to occur post functionalization. A few notes:

- There is an option to discard parameters, `config.freezing_discard_parameters`, which will take the current eager modules and wrap parameters in a Tensor subclass that errors if used (see the sketch after this description).
- I needed to expose flat_params in aot_autograd in order to discard old references when we constant fold away parameters, like with amp. I also exposed `fw_metadata` to avoid constant folding mutated parameters.
- Caching parameter transformations/constant folding across different inferences: not yet implemented.
- Checking version_counter of constant folded params: not yet implemented.

I'm not really sure what the actual naming should be. In jit there was both "freezing", which was platform agnostic, and "optimize for inference", which made device-specific optimizations. We're doing the latter here, but maybe freezing is a better name.
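The discard-parameters idea from the first note can be pictured with a minimal sketch; this illustrates the approach rather than the PR's actual implementation, and the class name `ErasedTensor` here is assumed. Each constant-folded parameter is wrapped in a Tensor subclass whose ops all raise, so accidental eager use fails loudly instead of silently reading stale weights.

```python
import torch

class ErasedTensor(torch.Tensor):
    """Stand-in for a parameter that was constant-folded away by freezing."""

    @staticmethod
    def __new__(cls, elem):
        # Preserve the shape/dtype metadata of the original parameter.
        return torch.Tensor._make_subclass(cls, elem)

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Any torch op on this tensor fails with a descriptive error.
        raise RuntimeError(
            "this parameter was constant-folded during freezing and discarded; "
            "run the compiled module, not the eager one"
        )

weight = torch.nn.Parameter(torch.randn(4, 4))
erased = ErasedTensor(weight.data)
# erased @ torch.randn(4, 4)  # would raise the RuntimeError above
```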
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @aakhundov @soumith @desertfire
Differential Revision: D46244033
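For context, a minimal usage sketch of the feature added here, assuming the config flags named in this PR (`config.freezing` and `config.freezing_discard_parameters`; exact spelling and availability depend on your build):

```python
import torch
import torch._inductor.config as inductor_config

# Enable the freezing pass from this PR: parameters get constant-folded
# into the compiled graph on the inference path.
inductor_config.freezing = True
# Optionally discard the eager parameter copies once they are folded away.
inductor_config.freezing_discard_parameters = False

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 8),
).eval()

compiled = torch.compile(model)
with torch.no_grad():  # freezing targets inference graphs
    out = compiled(torch.randn(4, 16))
print(out.shape)
```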