Conversation

@oniononion36 (Contributor) commented Jun 11, 2025

Summary: For priors such as layer norm, the weight quantization kernel may be emitted in a different order and therefore carry a different suffix, so we match kernel names with a regular expression instead of an exact suffix.
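As a rough illustration of the change (the kernel names below are hypothetical, not the actual Inductor-generated names), matching with a regular expression tolerates the token reordering and changed numeric suffix that an exact-suffix comparison misses:

```python
import re

# Hypothetical kernel names: with a prior such as layer norm, the weight
# quantization kernel can be emitted at a different point in the graph,
# so its token order and numeric suffix change.
kernel_names = [
    "triton_poi_fused_quantize_weight_3",
    "triton_poi_fused_weight_quantize_7",
]

# Exact-suffix matching only finds the first ordering ...
exact_matches = [n for n in kernel_names if n.endswith("quantize_weight_3")]

# ... while a regex keyed on the tokens matches both orderings.
pattern = re.compile(r"quantize.*weight|weight.*quantize")
regex_matches = [n for n in kernel_names if pattern.search(n)]

print(exact_matches)  # only the first name
print(regex_matches)  # both names
```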

Test Plan:
Trying this on model id 737772166 with

buck2 run mode/opt mode/inplace -c fbcode.platform010_cuda_version=12 -c fbcode.nvcc_arch=h100 caffe2/torch/fb/model_transform/experimental/benchmark:mts_gpu_benchmark -- --lower-backend=AOT_INDUCTOR --model-snapshot-id=737772166_0 --trace-aot-inductor-module=True --disable-acc-tracer=False --batch-size=1024 --node_replacement_dict "{'(autotune)':{'(1000+,1000+)':'fp8_float_model_dynamic_quantization_rowwise'}}"

will allow more linears to be correctly replaced with fp8.
An example GPU trace can be found at https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/hpc/new/models/feed/benchmark/libkineto_activities_773108_f58b57e208c04787acd3bcb01a3e8771.json.gz&bucket=gpu_traces.
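On the --node_replacement_dict value: assuming the benchmark CLI parses it as a Python dict literal (an assumption about the internal tool, shown here only to illustrate the structure), the value needs balanced braces and maps an autotune context and a size pattern to the replacement kernel:

```python
import ast

# The corrected flag value from the command above; note the balanced '}}' at the end.
value = "{'(autotune)':{'(1000+,1000+)':'fp8_float_model_dynamic_quantization_rowwise'}}"
node_replacement_dict = ast.literal_eval(value)

# Autotune context -> (M, N) size pattern -> replacement kernel name.
print(node_replacement_dict["(autotune)"]["(1000+,1000+)"])
# fp8_float_model_dynamic_quantization_rowwise
```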

Rollback Plan:

Differential Revision: D76092551

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

@pytorch-bot bot commented Jun 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155722

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 08ad237 with merge base ffac0de:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76092551

oniononion36 added a commit to oniononion36/pytorch that referenced this pull request Jun 11, 2025

pytorch-bot bot pushed a commit that referenced this pull request Jun 16, 2025
Summary:
Pull Request resolved: #155722

For priors such as layer norm, the weight quantization kernel may be emitted in a different order and therefore carry a different suffix, so we match kernel names with a regular expression instead of an exact suffix.

Test Plan:
Trying this on model id 737772166 with
```
buck2 run mode/opt mode/inplace -c fbcode.platform010_cuda_version=12 -c fbcode.nvcc_arch=h100 caffe2/torch/fb/model_transform/experimental/benchmark:mts_gpu_benchmark -- --lower-backend=AOT_INDUCTOR --model-snapshot-id=737772166_0 --trace-aot-inductor-module=True --disable-acc-tracer=False --batch-size=1024 --node_replacement_dict "{'(autotune)':{'(1000+,1000+)':'fp8_float_model_dynamic_quantization_rowwise'}}"
```
will allow more linears to be correctly replaced with fp8.
An example GPU trace can be found at https://www.internalfb.com/intern/perfdoctor/trace_view?filepath=tree/hpc/new/models/feed/benchmark/libkineto_activities_773108_f58b57e208c04787acd3bcb01a3e8771.json.gz&bucket=gpu_traces.

Rollback Plan:

Reviewed By: frank-wei

Differential Revision: D76092551

oniononion36 added a commit to oniononion36/pytorch that referenced this pull request Jun 16, 2025

pytorch-bot bot pushed a commit that referenced this pull request Jun 23, 2025

pytorch-bot bot pushed a commit that referenced this pull request Jun 24, 2025

@facebook-github-bot (Contributor)

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.
