Makes fallback float8 1x128 by 128x128 gemm output bfloat16 #3265

vkuzo · 2025-10-30T19:55:59Z

Summary:

For now, we just care about bf16 output. We can add fp32 and a flag to
control it later, if needed.
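
For illustration only (this is not the kernel touched by this PR): a minimal pure-Python sketch of what a 1x128 by 128x128 blockwise-scaled gemm with bf16 output computes. The tensor layouts (weights stored as N x K, one activation scale per row per 128-wide K block, one weight scale per 128x128 tile), the names `BLOCK`, `round_to_bf16` and `blockwise_gemm_bf16`, and the use of plain Python floats in place of float8 values are all assumptions made for the sketch.

```python
import struct

BLOCK = 128  # assumed block size along K (and along N for the weight scales)


def round_to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a float.

    Sketch only: NaN, inf and float32 overflow are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest even on the low 16 bits
    bits &= 0xFFFF0000                   # keep sign, exponent, top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def blockwise_gemm_bf16(a_q, a_scale, w_q, w_scale):
    """Reference gemm with 1x128 activation scales and 128x128 weight scales.

    a_q:     M x K quantized activations (stored here as plain floats)
    a_scale: M x (K // BLOCK), one scale per row per K block
    w_q:     N x K quantized weights (stored here as plain floats)
    w_scale: (N // BLOCK) x (K // BLOCK), one scale per 128x128 tile
    Returns an M x N list of floats, each rounded to a bfloat16 value.
    """
    M, K, N = len(a_q), len(a_q[0]), len(w_q)
    out = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = 0.0  # high-precision accumulator
            for kb in range(K // BLOCK):
                ks = range(kb * BLOCK, (kb + 1) * BLOCK)
                partial = sum(a_q[m][k] * w_q[n][k] for k in ks)
                # apply both block scales to this K block's partial sum
                acc += partial * a_scale[m][kb] * w_scale[n // BLOCK][kb]
            # the output dtype decision: cast the accumulator down to bf16
            out[m][n] = round_to_bf16(acc)
    return out
```

In PyTorch, the final cast would typically be a `.to(torch.bfloat16)` on the accumulated result; returning `acc` unrounded here would correspond to the fp32 output mentioned above as a possible later addition.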

Test Plan:

```
pytest test/quantization/quantize_/workflows/float8/test_float8_tensor.py -s -k fp8_linear_variants -x
```

pytorch-bot · 2025-10-30T19:56:02Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3265

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit c877d67 with merge base f856d36:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Run 1xH100 Tests / test (H100, linux.aws.h100, --pre torch torchvision torchaudio fbgemm-gpu-genai --index-url https... / linux-job (gh) (trunk failure)
test_expected_gpu_kernel_fbgemm
Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://downloa... / linux-job (gh) (trunk failure)
test/sparsity/test_sparse_api.py::TestQuantSemiSparse::test_sparse_marlin_compile_True

This comment was automatically generated by Dr. CI and updates every 15 minutes.

danielvegamyhre

LGTM. Btw, in my benchmarks I found the torch._scaled_mm cutlass kernel for blockwise gemms to be much faster than the triton kernels. That was a few months ago; you can run the benchmark scripts in this dir if you want: https://github.com/pytorch/ao/tree/main/benchmarks/prototype/blockwise_fp8_training

vkuzo added 20 commits October 29, 2025 04:05

Update 990ef89
Update cce08f0
Update 681277a
Update 26ade98
Update f76e10b
Update 6994e20
Update 1aff468
Update f6fa134
Update 1911212
Update 9ec8ce1
Update 57b8876
Update 1161f7f
Update c5be7c0
Update 00c6bbb
Update d40ec7c
Update ce5a8eb
Update be5a9bb
Update 6a3684b
Update 1d4a2f7
Update d28b0ae

meta-cla bot added the CLA Signed label Oct 30, 2025

This was referenced Oct 30, 2025

add a_1_128_w_128_128 (DeepSeek) float8 scaling for inference #3257

Merged

add bias handling for a_1_128_w_128_128 float8 scaling #3259

Merged

vkuzo requested a review from danielvegamyhre October 30, 2025 19:56

vkuzo added the topic: bug fix label Oct 31, 2025

vkuzo added 2 commits October 31, 2025 06:26

Update 6c087b4
Update 4de79c9
Update 1938209

vkuzo added 3 commits October 31, 2025 09:59

Update c4769a6
Update eb95772
Update 526b741

vkuzo mentioned this pull request Oct 31, 2025

support eval of float8_a1x128_w128x128 #3269

Open

vkuzo added 3 commits October 31, 2025 12:43

Update 76671f9
Update 4a29159
Update c877d67

vkuzo changed the base branch from gh/vkuzo/159/head to main October 31, 2025 19:44

danielvegamyhre approved these changes Oct 31, 2025
