[AOTInductor] Include constants in AOTInductor .so file. #108473
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108473
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit 1868687 with merge base c458fa0: BROKEN TRUNK. The following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D48927532
Force-pushed from afaad01 to 814a456
Summary: Include constants in the AOTInductor .so file.

Key changes:
1) Serialize constants with ctypes instead of the native torch.storage serialization.
2) Use the underlying `for_blob` instead of `from_blob` to construct the Tensor.

Test Plan:

Unit tests:
```
test/inductor/test_aot_inductor.py
```

fb: MRS tests (https://fburl.com/gdoc/ffllzw72):
```
LOGLEVEL=DEBUG TORCHINDUCTOR_MAX_AUTOTUNE=1 CUDA_VISIBLE_DEVICES=0,1,2,3 ../buck-out/v2/gen/fbcode/3408cf5f8424049a/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --load=manifold://ig_inference_model/tree/user/facebook/fblearner/predictor/966480198/289/gpu_lowering/input.predictor --skip-trt --sync-mode=0 --enable-aot-inductor
```

Previously failing buck tests:
```
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/torch/fb/model_transform/experimental/benchmark/test:test_aot_inductor_benchmark -- --exact 'caffe2/torch/fb/model_transform/experimental/benchmark/test:test_aot_inductor_benchmark - test_aot_inductor_benchmark_oemae (caffe2.torch.fb.model_transform.experimental.benchmark.test.test_aot_inductor_benchmark.AOTInductorBenchmark)'
```

Differential Revision: D48927532
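The raw-buffer serialization idea in change (1) can be illustrated with a small, hypothetical sketch: instead of going through a framework-level serializer such as torch.storage, the bytes backing a buffer are copied out directly with ctypes and later reattached to a freshly allocated buffer. The helper names below are illustrative only, not the PR's actual functions:

```python
import ctypes

def buffer_to_bytes(arr) -> bytes:
    # Copy the raw bytes backing a ctypes buffer, bypassing any
    # higher-level serialization machinery.
    return ctypes.string_at(ctypes.addressof(arr), ctypes.sizeof(arr))

def bytes_to_buffer(raw: bytes, ctype, count):
    # Allocate a fresh buffer and memmove the serialized bytes into it,
    # analogous to attaching a constant blob to a newly created tensor.
    buf = (ctype * count)()
    ctypes.memmove(buf, raw, len(raw))
    return buf

src = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)
blob = buffer_to_bytes(src)          # 16 raw bytes (4 x float32)
dst = bytes_to_buffer(blob, ctypes.c_float, 4)
assert list(dst) == [1.0, 2.0, 3.0, 4.0]
```

The same round-trip is what embedding constants in the .so amounts to conceptually: the constant's bytes travel as an opaque blob and are rewrapped as a typed buffer on the other side.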
Force-pushed from 814a456 to df24558
Force-pushed from df24558 to 2293345
Force-pushed from 2293345 to 30bb690
Force-pushed from 30bb690 to 51e49d1
Force-pushed from 51e49d1 to 365e48c
This pull request was exported from Phabricator. Differential Revision: D48927532
Force-pushed from 365e48c to b419fc9
Merge failed. Reason: This PR needs a release notes label. To add a label, you can comment to pytorchbot, for example `@pytorchbot label "release notes: export"`. (Raised by workflow job.)
@pytorchbot label "release notes: export"
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@muchulee8 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Merge failed. Reason: New commits were pushed while merging. Please rerun the merge command. (Raised by workflow job.)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…108473)" Original commit changeset: 9494d031e3ac Original Phabricator Diff: D49075977 Differential Revision: [D49243049](https://our.internmc.facebook.com/intern/diff/D49243049/) [ghstack-poisoned]
…108473)" Original commit changeset: 9494d031e3ac Original Phabricator Diff: D49075977 Differential Revision: [D49243049](https://our.internmc.facebook.com/intern/diff/D49243049/) ghstack-source-id: 200617295 Pull Request resolved: #109243
…108473)" Original commit changeset: 9494d031e3ac Original Phabricator Diff: D49075977 Differential Revision: [D49243049](https://our.internmc.facebook.com/intern/diff/D49243049/) ghstack-source-id: 6f57b61f2d11162ba2dfa8dcebecc3d4cb74f2aa Pull Request resolved: #109243
…in AOTInductor .so file. (#108473)"" Original commit changeset: 9494d031e3ac Original Phabricator Diff: D49075977 Differential Revision: [D49243049](https://our.internmc.facebook.com/intern/diff/D49243049/) cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov [ghstack-poisoned]
…so file. (#108473)"" Original commit changeset: 9494d031e3ac Original Phabricator Diff: D49075977 Differential Revision: [D49243049](https://our.internmc.facebook.com/intern/diff/D49243049/) cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov [ghstack-poisoned]
Summary: Same as #109560; made a new PR because we need to land from internal.

Previously during performance benchmark testing, we would create an AOTInductorModelContainerHandle every time the compiled function was run with new inputs. However, after #108473 we now load the constants needed by the runtime when initializing the AOTInductorModelContainerHandle. This resulted in our benchmarks displaying a ~0.4x speedup.

This diff moves the initialization of AOTInductorModelContainerHandle outside of the code where we run the compiled function with different inputs. For example,
```
python benchmarks/dynamo/huggingface.py --performance --cold-start-latency --inference --bfloat16 --export-aot-inductor --disable-cudagraphs --device cuda --total-partitions 3 --partition-id 0 --only AlbertForMaskedLM
```
results in a 1.359x speedup.

Specifically, this adds `create_container_handle` and `delete_container_handle` functions which need to be called around `run`. We call `create_container_handle` to initialize the AOTInductorModelContainerHandle, call `run` to run the compiled .so with different inputs, and then `delete_container_handle` to delete it.

[Updated dashboard results](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2013%20Sep%202023%2021%3A03%3A55%20GMT&stopTime=Wed%2C%2020%20Sep%202023%2021%3A03%3A55%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=angelayi/aot_inductor_benchmark&lCommit=f9aa49c4c9a1a140b6f0c4520d1d6d99b57e12fa&rBranch=main&rCommit=015be4cedba357eb931e24bf188479235db7c5c8)

Test Plan: CI
Differential Revision: D49513934
Pull Request resolved: #109820
Approved by: https://github.com/desertfire
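The create/run/delete lifecycle described above can be mirrored with stand-in stubs. The real `create_container_handle`, `run`, and `delete_container_handle` live in the AOTInductor benchmark harness; everything below is a hypothetical sketch of the calling pattern only, showing why the expensive setup must happen once rather than per input batch:

```python
class FakeContainerHandle:
    """Stand-in for AOTInductorModelContainerHandle; tracks lifecycle."""
    def __init__(self):
        self.alive = True
        self.runs = 0

def create_container_handle():
    # Expensive step: in the real runtime this is where the .so's
    # embedded constants are loaded. Do it once, not per run.
    return FakeContainerHandle()

def run(handle, inputs):
    # Cheap per-batch step: reuses the already-initialized handle.
    assert handle.alive, "handle was already deleted"
    handle.runs += 1
    return [x * 2 for x in inputs]  # stand-in for the compiled model

def delete_container_handle(handle):
    handle.alive = False

# Create once, run with many input batches, delete once.
h = create_container_handle()
outs = [run(h, batch) for batch in ([1, 2], [3, 4], [5, 6])]
delete_container_handle(h)
assert h.runs == 3
```

Hoisting the handle creation out of the timed loop is exactly what recovered the benchmark numbers: the constant-loading cost is paid once at setup instead of being charged to every measured iteration.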
Summary:
Include constants in the AOTInductor .so file.

Key changes:
1) Serialize constants with ctypes instead of the native torch.storage serialization.
2) Use the underlying `for_blob` instead of `from_blob` to construct the Tensor.

Test Plan:
Unit tests:
```
test/inductor/test_aot_inductor.py
```

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @aakhundov @anijain2305