support eval of float8_a1x128_w128x128 #3269

vkuzo · 2025-10-31T18:35:24Z

Summary:

Adds support for the new float8 scaling recipe in the official eval
scripts used to generate accuracy numbers in the README.
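The recipe name encodes its scaling granularity. Per the related DeepSeek-style PR #3257, activations get one scale per 1x128 tile (one row, 128 contiguous elements along K), and weights get one scale per 128x128 block. The pure-Python sketch below only shows how those scale grids are shaped and computed. It assumes the float8 e4m3 format (max finite value 448.0) and the convention `x_fp8 = x / scale`. The function names are hypothetical and are not torchao APIs.

```python
# Assumption: float8 e4m3 (torch.float8_e4m3fn), whose max finite value is 448.0
FP8_E4M3_MAX = 448.0
BLOCK = 128
EPS = 1e-12  # avoids a zero scale for all-zero tiles


def activation_scales_1x128(x):
    """Hypothetical helper: one scale per (row, 128-column tile) of an M x K activation.

    Returns an M x ceil(K / 128) grid, so each token gets its own scales.
    """
    scales = []
    for row in x:
        row_scales = []
        for k0 in range(0, len(row), BLOCK):
            amax = max(abs(v) for v in row[k0:k0 + BLOCK])
            row_scales.append(max(amax, EPS) / FP8_E4M3_MAX)
        scales.append(row_scales)
    return scales


def weight_scales_128x128(w):
    """Hypothetical helper: one scale per 128x128 block of an N x K weight.

    Returns a ceil(N / 128) x ceil(K / 128) grid.
    """
    n, k = len(w), len(w[0])
    scales = []
    for n0 in range(0, n, BLOCK):
        block_row = []
        for k0 in range(0, k, BLOCK):
            amax = max(abs(v) for r in w[n0:n0 + BLOCK] for v in r[k0:k0 + BLOCK])
            block_row.append(max(amax, EPS) / FP8_E4M3_MAX)
        scales.append(block_row)
    return scales
```

For example, a 2 x 256 activation yields a 2 x 2 scale grid, and a 256 x 384 weight yields a 2 x 3 grid. Finer activation tiles let an outlier affect only its own 128 elements instead of the whole tensor.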

For now, I am using this as a smoke test that the scaling works on
a real model, and it does. We can add official benchmark results after we
hook up the cuBLAS binding on H100, which should make the user experience
of running evals a lot better.

Test Plan:

Smoke test on Llama-3.1-8B; accuracy looks good:

```
# download checkpoint
with-proxy python scripts/download.py --hf_token {token} --repo_id meta-llama/Meta-Llama-3.1-8B

# prepare checkpoint
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3.1-8B

# run bf16 eval on a single task
with-proxy time python torchao/_models/llama/eval.py --checkpoint_path checkpoints/meta-llama/Meta-Llama-3.1-8B/model.pth --tasks 'winogrande'
# ...
# winogrande: {'alias': 'winogrande', 'acc,none': 0.7426992896606156, 'acc_stderr,none': 0.012285989618865697}

# run float8 eval on the same task
with-proxy time python torchao/_models/llama/eval.py --checkpoint_path checkpoints/meta-llama/Meta-Llama-3.1-8B/model.pth --tasks 'winogrande' --quantization float8_a1x128_w128x128 --compile
# ...
# winogrande: {'alias': 'winogrande', 'acc,none': 0.7419100236779794, 'acc_stderr,none': 0.012298278833972477}
```
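The following quick check puts "accuracy looks good" in numbers. It copies the two winogrande results above, computes the accuracy delta, and compares the delta to a conservative combined standard error. It assumes lm-eval scores winogrande on the 1,267-example validation split. Under that assumption, the two accuracies correspond to integer counts of correct answers. `n_correct` is an illustrative helper.

```python
import math

# Accuracy and stderr copied from the winogrande results above.
bf16_acc, bf16_se = 0.7426992896606156, 0.012285989618865697
fp8_acc, fp8_se = 0.7419100236779794, 0.012298278833972477

# Assumption: winogrande is scored on its 1267-example validation split.
N_EXAMPLES = 1267


def n_correct(acc, n=N_EXAMPLES):
    """Illustrative helper: recover the count of correct answers from a mean accuracy."""
    return round(acc * n)


delta = fp8_acc - bf16_acc
# Conservative: treats the two runs as independent. Both runs score the same
# examples, so the true stderr of the difference is likely smaller.
combined_se = math.sqrt(bf16_se ** 2 + fp8_se ** 2)

if __name__ == "__main__":
    print(f"bf16 correct: {n_correct(bf16_acc)}, float8 correct: {n_correct(fp8_acc)}")
    print(f"delta = {delta:+.5f}, |delta| / combined stderr = {abs(delta) / combined_se:.3f}")
```

Under that assumption, the float8 run answers one fewer example correctly than bf16 (940 vs. 941). The difference is well within one standard error.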


vkuzo · 2025-10-31T18:35:25Z

Stack from ghstack (oldest at bottom):

pytorch-bot · 2025-10-31T18:35:28Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3269

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

ROCm failures during provisioning step due to network issues

✅ No Failures

As of commit cafe668 with merge base f856d36:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


Update

22d1a14

[ghstack-poisoned]

meta-cla bot added the CLA Signed label Oct 31, 2025

This was referenced Oct 31, 2025

add a_1_128_w_128_128 (DeepSeek) float8 scaling for inference #3257

Merged

add bias handling for a_1_128_w_128_128 float8 scaling #3259

Merged

Makes fallback float8 1x128 by 128x128 gemm output bfloat16 #3265

Open

vkuzo requested review from andrewor14, jainapurva and jerryzh168 October 31, 2025 18:36

vkuzo added the topic: for developers label Oct 31, 2025

Update

9a995b5

[ghstack-poisoned]

Update

485ee80

[ghstack-poisoned]

vkuzo mentioned this pull request Nov 3, 2025

make float8 a1x128_w128x128 granularity serializeable #3279

Open

Update

cafe668

[ghstack-poisoned]
