[AOTInductor] Include constants in AOTInductor .so file. #108473

Closed
muchulee8 wants to merge 4 commits

Conversation

@muchulee8 (Contributor) commented Sep 2, 2023

Summary:
Include constants in AOTInductor .so file.
Main differences:

  1. Serialize constants with ctypes instead of going through the native torch.storage serialization.
  2. Use the underlying for_blob instead of from_blob to construct the Tensor (see the sketch below).

Test Plan:
Unit tests:

test/inductor/test_aot_inductor.py

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @aakhundov @anijain2305

pytorch-bot bot commented Sep 2, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108473

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 1868687 with merge base c458fa0:

BROKEN TRUNK - The following job failed but was already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D48927532

muchulee8 added a commit to muchulee8/pytorch that referenced this pull request Sep 4, 2023
[AOTInductor] Include constants in AOTInductor .so file. (#108473)

Summary:

Include constants in AOTInductor .so file.
Main differences:
1) Serialize constants with ctypes instead of going through the native torch.storage serialization.
2) Use the underlying for_blob instead of from_blob to construct the Tensor.

Test Plan:
Unit tests:
```
test/inductor/test_aot_inductor.py
```
fb:
MRS tests (https://fburl.com/gdoc/ffllzw72):
```
LOGLEVEL=DEBUG TORCHINDUCTOR_MAX_AUTOTUNE=1 CUDA_VISIBLE_DEVICES=0,1,2,3 ../buck-out/v2/gen/fbcode/3408cf5f8424049a/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --load=manifold://ig_inference_model/tree/user/facebook/fblearner/predictor/966480198/289/gpu_lowering/input.predictor --skip-trt --sync-mode=0 --enable-aot-inductor
```

Previous failed buck tests:
```
buck2 test 'fbcode//mode/dev-nosan' fbcode//caffe2/torch/fb/model_transform/experimental/benchmark/test:test_aot_inductor_benchmark -- --exact 'caffe2/torch/fb/model_transform/experimental/benchmark/test:test_aot_inductor_benchmark - test_aot_inductor_benchmark_oemae (caffe2.torch.fb.model_transform.experimental.benchmark.test.test_aot_inductor_benchmark.AOTInductorBenchmark)'
```

Differential Revision: D48927532
@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: This PR needs a `release notes:` label.
If your changes are user facing and intended to be part of the release notes, please use a label starting with `release notes:`.

If not, please add the `topic: not user facing` label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: raised by workflow job.

@muchulee8 (Contributor, Author) commented:

@pytorchbot label "release notes: export"

@muchulee8 (Contributor, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@facebook-github-bot (Contributor) commented:

@muchulee8 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: New commits were pushed while merging. Please rerun the merge command.

Details for Dev Infra team: raised by workflow job.

@muchulee8 (Contributor, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

bertmaher added a commit that referenced this pull request Sep 13, 2023
…108473)"

Original commit changeset: 9494d031e3ac

Original Phabricator Diff: D49075977

Differential Revision: [D49243049](https://our.internmc.facebook.com/intern/diff/D49243049/)

ghstack-source-id: 200617295
Pull Request resolved: #109243
bertmaher added a commit that referenced this pull request Sep 13, 2023
…in AOTInductor .so file. (#108473)""

Original commit changeset: 9494d031e3ac

Original Phabricator Diff: D49075977

Differential Revision: [D49243049](https://our.internmc.facebook.com/intern/diff/D49243049/)

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 ipiszy ngimel yf225 chenyang78 kadeng muchulee8 aakhundov

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Sep 21, 2023
Summary: Same as #109560, made a new PR because we need to land from internal

Previously, during performance benchmark testing, we would create an AOTInductorModelContainerHandle every time the compiled function was run with new inputs. However, after #108473 we now load the constants needed by the runtime when initializing the AOTInductorModelContainerHandle. This resulted in our benchmarks reporting a ~0.4x speedup.

This diff moves the initialization of AOTInductorModelContainerHandle outside of the code where we run the compiled function with different inputs.

For example,
```
python benchmarks/dynamo/huggingface.py --performance --cold-start-latency --inference --bfloat16 --export-aot-inductor --disable-cudagraphs --device cuda --total-partitions 3 --partition-id 0 --only AlbertForMaskedLM
```
results in `1.359x` speedup.

Specifically, this adds `create_container_handle` and `delete_container_handle` functions that need to be called around `run` (a rough sketch of the calling order follows). We call `create_container_handle` to initialize the AOTInductorModelContainerHandle, call `run` to run the compiled .so with different inputs, and then `delete_container_handle` to delete it.
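
For illustration, a minimal sketch of that calling order; the wrapper class below is a stand-in invented for this sketch, and only the method names `create_container_handle`, `run`, and `delete_container_handle` come from the commit message:

```
class CompiledModuleStub:
    """Stand-in for the benchmark's compiled-module wrapper; not a real PyTorch API."""

    def __init__(self):
        self.handle = None

    def create_container_handle(self):
        # In the real flow this constructs the AOTInductorModelContainerHandle
        # once, which is also when the constants embedded in the .so are loaded.
        self.handle = object()

    def run(self, inputs):
        # Each benchmark iteration reuses the already-created handle.
        assert self.handle is not None, "call create_container_handle() first"
        return inputs  # placeholder for running the compiled .so

    def delete_container_handle(self):
        # Torn down once, after all iterations.
        self.handle = None


module = CompiledModuleStub()
module.create_container_handle()           # once, outside the timed loop
try:
    for batch in ([1.0, 2.0], [3.0, 4.0]):
        module.run(batch)                  # timed region only calls run()
finally:
    module.delete_container_handle()       # once, after all runs
```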

[Updated dashboard results](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2013%20Sep%202023%2021%3A03%3A55%20GMT&stopTime=Wed%2C%2020%20Sep%202023%2021%3A03%3A55%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=angelayi/aot_inductor_benchmark&lCommit=f9aa49c4c9a1a140b6f0c4520d1d6d99b57e12fa&rBranch=main&rCommit=015be4cedba357eb931e24bf188479235db7c5c8)

Test Plan: CI

Differential Revision: D49513934

Pull Request resolved: #109820
Approved by: https://github.com/desertfire