[FP8] Fix Benchmarking for certain Priors #155722
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155722. ✅ No failures as of commit 08ad237 with merge base ffac0de. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Summary:
Pull Request resolved: pytorch#155722

For priors like layer norm, the weight quantization kernel may be generated in a different order and therefore carry a different suffix, so we match kernel names with a regular expression instead of an exact match.
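The matching change is easy to picture with a minimal sketch; the kernel names below are hypothetical stand-ins, not the actual kernels the benchmark inspects:

```python
import re

# Hypothetical kernel names as they might appear in a benchmark trace.
# The numeric suffix reflects the order in which kernels were generated,
# so a prior such as layer norm can shift it from one model to the next.
kernels = [
    "triton_per_fused_native_layer_norm_2",
    "triton_poi_fused_quantize_weight_5",
]

# Exact matching breaks as soon as the suffix changes:
print("triton_poi_fused_quantize_weight_3" in kernels)  # False

# A regular expression tolerates any suffix:
pattern = re.compile(r"triton_poi_fused_quantize_weight_\d+")
print([k for k in kernels if pattern.fullmatch(k)])
# ['triton_poi_fused_quantize_weight_5']
```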
Test Plan:
Running this on model id 737772166 with

```
buck2 run mode/opt mode/inplace -c fbcode.platform010_cuda_version=12 -c fbcode.nvcc_arch=h100 \
  caffe2/torch/fb/model_transform/experimental/benchmark:mts_gpu_benchmark -- \
  --lower-backend=AOT_INDUCTOR --model-snapshot-id=737772166_0 \
  --trace-aot-inductor-module=True --disable-acc-tracer=False --batch-size=1024 \
  --node_replacement_dict "{'(autotune)':{'(1000+,1000+)':'fp8_float_model_dynamic_quantization_rowwise'}}"
```

allows more linears to be correctly replaced with fp8.
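The `--node_replacement_dict` value is a Python dict literal passed as a string. How the benchmark interprets the keys is not spelled out here; the reading in the comment below is an assumption, not documented behavior:

```python
import ast

# Assumed reading: under the '(autotune)' selector, linears whose shapes
# match '(1000+,1000+)' (both dimensions 1000 or larger) are replaced with
# the named FP8 rowwise dynamic-quantization implementation.
raw = "{'(autotune)':{'(1000+,1000+)':'fp8_float_model_dynamic_quantization_rowwise'}}"
replacement = ast.literal_eval(raw)
print(replacement["(autotune)"]["(1000+,1000+)"])
# fp8_float_model_dynamic_quantization_rowwise
```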
An example of the GPU trace can be found at https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/hpc/new/models/feed/benchmark/libkineto_activities_773108_f58b57e208c04787acd3bcb01a3e8771.json.gz&bucket=gpu_traces.

Rollback Plan:

Reviewed By: frank-wei

Differential Revision: D76092551
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)

Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov