Conversation


@vkuzo vkuzo commented Oct 31, 2025

Summary:

Adds support for the new float8 scaling recipe in the official eval
scripts used to generate accuracy numbers in the README.

For now, I am using this as a smoke test that the scaling works on
a real model, and it does. We can add official benchmark results after we
hook up the cuBLAS binding on H100, which should make the UX of
running evals a lot better.
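For context (this is an illustration, not code from this PR): the recipe name `float8_a1x128_w128x128` suggests one scale per 1x128 block of the activation and one scale per 128x128 block of the weight. A minimal numpy sketch of how such per-block scales could be computed, assuming float8 e4m3 with a maximum finite magnitude of 448, might look like:

```python
import numpy as np

F8_E4M3_MAX = 448.0  # largest finite magnitude representable in float8 e4m3fn

def block_scales(x: np.ndarray, block_rows: int, block_cols: int) -> np.ndarray:
    """Compute one scale per (block_rows x block_cols) tile of x, chosen so
    each tile's absolute max maps to the float8 e4m3 max value."""
    r, c = x.shape
    assert r % block_rows == 0 and c % block_cols == 0
    tiles = x.reshape(r // block_rows, block_rows, c // block_cols, block_cols)
    absmax = np.abs(tiles).max(axis=(1, 3))  # shape: (r//block_rows, c//block_cols)
    return absmax / F8_E4M3_MAX

rng = np.random.default_rng(0)

# activation: one scale per 1x128 block (per row, per 128-column group)
a = rng.standard_normal((4, 256)).astype(np.float32)
a_scales = block_scales(a, 1, 128)    # shape (4, 2)

# weight: one scale per 128x128 block
w = rng.standard_normal((256, 256)).astype(np.float32)
w_scales = block_scales(w, 128, 128)  # shape (2, 2)

print(a_scales.shape, w_scales.shape)  # (4, 2) (2, 2)
```

The finer 1x128 granularity on activations presumably helps track per-token outliers, while 128x128 tiles on weights keep the scale-storage overhead small.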

Test Plan:

Smoke test on Llama-3.1-8B; accuracy looks good:

```
# download checkpoint
with-proxy python scripts/download.py --hf_token {token} --repo_id meta-llama/Meta-Llama-3.1-8B

# prepare checkpoint
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3.1-8B

# run bf16 eval on a single task
with-proxy time python torchao/_models/llama/eval.py --checkpoint_path checkpoints/meta-llama/Meta-Llama-3.1-8B/model.pth --tasks 'winogrande'
...
winogrande: {'alias': 'winogrande', 'acc,none': 0.7426992896606156, 'acc_stderr,none': 0.012285989618865697}

# run float8 eval on the same task
with-proxy time python torchao/_models/llama/eval.py --checkpoint_path checkpoints/meta-llama/Meta-Llama-3.1-8B/model.pth --tasks 'winogrande' --quantization float8_a1x128_w128x128 --compile
...
winogrande: {'alias': 'winogrande', 'acc,none': 0.7419100236779794, 'acc_stderr,none': 0.012298278833972477}
```
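As a quick sanity check on the two results above (using only the reported numbers, not this PR's code): the bf16-to-float8 accuracy delta is well within one standard error, which supports the "accuracy looks good" conclusion:

```python
# Reported winogrande results from the test plan above
bf16_acc, bf16_stderr = 0.7426992896606156, 0.012285989618865697
fp8_acc = 0.7419100236779794

delta = bf16_acc - fp8_acc
print(f"accuracy delta: {delta:.4f}, bf16 stderr: {bf16_stderr:.4f}")
# the drop (~0.0008) is far smaller than one standard error (~0.0123)
```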

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]

pytorch-bot bot commented Oct 31, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3269

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit cafe668 with merge base f856d36:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo added a commit that referenced this pull request Oct 31, 2025
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 31, 2025
@vkuzo vkuzo added the topic: for developers Use this tag if this PR is mainly developer facing label Oct 31, 2025
vkuzo added a commit that referenced this pull request Oct 31, 2025
vkuzo added a commit that referenced this pull request Oct 31, 2025