[AOTInductor] Include constants in AOTInductor .so file. #108473
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108473
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit 1868687 with merge base c458fa0: BROKEN TRUNK. The following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D48927532
Force-pushed from afaad01 to 814a456
Summary: Include constants in the AOTInductor .so file.

Key changes:
1) Serialize constants with ctypes instead of the native torch.storage serialization.
2) Use the underlying `for_blob` instead of `from_blob` to construct the Tensor.

Test Plan:

Unit tests:
```
test/inductor/test_aot_inductor.py
```

fb: MRS tests (https://fburl.com/gdoc/ffllzw72):
```
LOGLEVEL=DEBUG TORCHINDUCTOR_MAX_AUTOTUNE=1 CUDA_VISIBLE_DEVICES=0,1,2,3 ../buck-out/v2/gen/fbcode/3408cf5f8424049a/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --load=manifold://ig_inference_model/tree/user/facebook/fblearner/predictor/966480198/289/gpu_lowering/input.predictor --skip-trt --sync-mode=0 --enable-aot-inductor
```

Previously failing buck tests:
```
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/torch/fb/model_transform/experimental/benchmark/test:test_aot_inductor_benchmark -- --exact 'caffe2/torch/fb/model_transform/experimental/benchmark/test:test_aot_inductor_benchmark - test_aot_inductor_benchmark_oemae (caffe2.torch.fb.model_transform.experimental.benchmark.test.test_aot_inductor_benchmark.AOTInductorBenchmark)'
```

Differential Revision: D48927532
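The raw-buffer serialization idea in change (1) can be illustrated with a small, hypothetical sketch: instead of going through a framework-level serializer such as torch.storage, the bytes backing a buffer are copied out directly with ctypes and later reattached to a freshly allocated buffer. The helper names below are illustrative only, not the PR's actual functions:

```python
import ctypes

def buffer_to_bytes(arr) -> bytes:
    # Copy the raw bytes backing a ctypes buffer, bypassing any
    # higher-level serialization machinery.
    return ctypes.string_at(ctypes.addressof(arr), ctypes.sizeof(arr))

def bytes_to_buffer(raw: bytes, ctype, count):
    # Allocate a fresh buffer and memmove the serialized bytes into it,
    # analogous to attaching a constant blob to a newly created tensor.
    buf = (ctype * count)()
    ctypes.memmove(buf, raw, len(raw))
    return buf

src = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)
blob = buffer_to_bytes(src)          # 16 raw bytes (4 x float32)
dst = bytes_to_buffer(blob, ctypes.c_float, 4)
assert list(dst) == [1.0, 2.0, 3.0, 4.0]
```

The same round-trip is what embedding constants in the .so amounts to conceptually: the constant's bytes travel as an opaque blob and are rewrapped as a typed buffer on the other side.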
Force-pushed from 814a456 to df24558
Force-pushed from df24558 to 2293345
Force-pushed from 2293345 to 30bb690
Force-pushed from 30bb690 to 51e49d1
Force-pushed from 51e49d1 to 365e48c
This pull request was exported from Phabricator. Differential Revision: D48927532
Force-pushed from 365e48c to b419fc9
Merge failed. Reason: This PR needs a release notes label. To add a label, you can comment to pytorchbot, for example `@pytorchbot label "release notes: export"`. (Raised by workflow job.)
@pytorchbot label "release notes: export"
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@muchulee8 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Merge failed. Reason: New commits were pushed while merging. Please rerun the merge command. (Raised by workflow job.)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…108473)" Original commit changeset: 9494d031e3ac Original Phabricator Diff: D49075977 Differential Revision: [D49243049](https://our.internmc.facebook.com/intern/diff/D49243049/) [ghstack-poisoned]
…108473)" Original commit changeset: 9494d031e3ac Original Phabricator Diff: D49075977 Differential Revision: [D49243049](https://our.internmc.facebook.com/intern/diff/D49243049/) ghstack-source-id: 200617295 Pull Request resolved: #109243
…108473)" Original commit changeset: 9494d031e3ac Original Phabricator Diff: D49075977 Differential Revision: [D49243049](https://our.internmc.facebook.com/intern/diff/D49243049/) ghstack-source-id: 6f57b61f2d11162ba2dfa8dcebecc3d4cb74f2aa Pull Request resolved: #109243
…in AOTInductor .so file. (#108473)"" Original commit changeset: 9494d031e3ac Original Phabricator Diff: D49075977 Differential Revision: [D49243049](https://our.internmc.facebook.com/intern/diff/D49243049/) cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov [ghstack-poisoned]
…so file. (#108473)"" Original commit changeset: 9494d031e3ac Original Phabricator Diff: D49075977 Differential Revision: [D49243049](https://our.internmc.facebook.com/intern/diff/D49243049/) cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov [ghstack-poisoned]
Summary: Same as #109560; made a new PR because we need to land from internal.

Previously during performance benchmark testing, we would create an AOTInductorModelContainerHandle every time the compiled function was run with new inputs. However, after #108473 we now load the constants needed by the runtime when initializing the AOTInductorModelContainerHandle. This resulted in our benchmarks displaying a ~0.4x speedup.

This diff moves the initialization of AOTInductorModelContainerHandle outside of the code where we run the compiled function with different inputs. For example,
```
python benchmarks/dynamo/huggingface.py --performance --cold-start-latency --inference --bfloat16 --export-aot-inductor --disable-cudagraphs --device cuda --total-partitions 3 --partition-id 0 --only AlbertForMaskedLM
```
results in a 1.359x speedup.

Specifically, this adds `create_container_handle` and `delete_container_handle` functions which need to be called around `run`. We call `create_container_handle` to initialize the AOTInductorModelContainerHandle, call `run` to run the compiled .so with different inputs, and then `delete_container_handle` to delete it.

[Updated dashboard results](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2013%20Sep%202023%2021%3A03%3A55%20GMT&stopTime=Wed%2C%2020%20Sep%202023%2021%3A03%3A55%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=angelayi/aot_inductor_benchmark&lCommit=f9aa49c4c9a1a140b6f0c4520d1d6d99b57e12fa&rBranch=main&rCommit=015be4cedba357eb931e24bf188479235db7c5c8)

Test Plan: CI
Differential Revision: D49513934
Pull Request resolved: #109820
Approved by: https://github.com/desertfire
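The create/run/delete lifecycle described above can be mirrored with stand-in stubs. The real `create_container_handle`, `run`, and `delete_container_handle` live in the AOTInductor benchmark harness; everything below is a hypothetical sketch of the calling pattern only, showing why the expensive setup must happen once rather than per input batch:

```python
class FakeContainerHandle:
    """Stand-in for AOTInductorModelContainerHandle; tracks lifecycle."""
    def __init__(self):
        self.alive = True
        self.runs = 0

def create_container_handle():
    # Expensive step: in the real runtime this is where the .so's
    # embedded constants are loaded. Do it once, not per run.
    return FakeContainerHandle()

def run(handle, inputs):
    # Cheap per-batch step: reuses the already-initialized handle.
    assert handle.alive, "handle was already deleted"
    handle.runs += 1
    return [x * 2 for x in inputs]  # stand-in for the compiled model

def delete_container_handle(handle):
    handle.alive = False

# Create once, run with many input batches, delete once.
h = create_container_handle()
outs = [run(h, batch) for batch in ([1, 2], [3, 4], [5, 6])]
delete_container_handle(h)
assert h.runs == 3
```

Hoisting the handle creation out of the timed loop is exactly what recovered the benchmark numbers: the constant-loading cost is paid once at setup instead of being charged to every measured iteration.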
Summary:
Include constants in the AOTInductor .so file.

Key changes:
1) Serialize constants with ctypes instead of the native torch.storage serialization.
2) Use the underlying `for_blob` instead of `from_blob` to construct the Tensor.

Test Plan:
Unit tests:
```
test/inductor/test_aot_inductor.py
```

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @aakhundov @anijain2305