add gptq benchmark, and speed up by ~3x with compile by vkuzo · Pull Request #4310 · pytorch/ao

vkuzo · 2026-04-21T18:31:14Z

Summary:

1. add gptq algo benchmark
2. capture a profiling run, which shows two main bottlenecks: (a) the to_nvfp4 kernels and (b) the hessian + weight updates
3. speed up (a) significantly by using `torch.compile`. We need to enable `TORCHINDUCTOR_EMULATE_DIVISION_ROUNDING` to keep parity with eager mode. Without it there are significant accuracy regressions (small differences accumulate with each incremental hessian + weight update), so we force it on by default.
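One way to turn the flag on by default can be sketched as follows. This is a minimal sketch, not the code in this PR: the helper name `enable_division_rounding_parity` is hypothetical, the value `"1"` is assumed to be what enables the flag, and the helper is meant to run before `import torch` on the assumption that Inductor reads the variable when its config is first loaded.

```python
import os

ENV_VAR = "TORCHINDUCTOR_EMULATE_DIVISION_ROUNDING"


def enable_division_rounding_parity(override: bool = False) -> str:
    """Hypothetical helper: turn the flag on and return its final value.

    With override=False an explicit user setting is left untouched;
    with override=True the flag is set to "1" unconditionally.
    """
    if override:
        os.environ[ENV_VAR] = "1"
    else:
        os.environ.setdefault(ENV_VAR, "1")
    return os.environ[ENV_VAR]


# Intended call order (left as comments so the sketch runs without torch):
#   enable_division_rounding_parity()
#   import torch
#   quantize = torch.compile(quantize_fn)  # quantize_fn is a placeholder name
```

Using `setdefault` keeps an explicit user setting intact, while `override=True` matches a stricter reading of "force it on".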

e2e gptq run on LLaMa 3.2 1B speeds up by ~2.5x (172 seconds -> 69 seconds) with no change in accuracy:

time torchao/prototype/gptq/gptq_nvfp4_llama3_2_1b_nonsequential_wikitext.sh 2>&1 | tee ~/tmp/20260421_gptq_run_compile.txt
// logs: https://gist.github.com/vkuzo/4ebd0bfa71185b711b19c712216eae29
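As a quick check of the quoted figure, the wall-clock numbers above (172 seconds before, 69 seconds after) work out as follows. The function names are illustrative only.

```python
def speedup(before_s: float, after_s: float) -> float:
    """Ratio of the old wall time to the new wall time."""
    return before_s / after_s


def time_saved_fraction(before_s: float, after_s: float) -> float:
    """Fraction of the original wall time that was eliminated."""
    return (before_s - after_s) / before_s


before_s, after_s = 172.0, 69.0  # seconds, from the e2e run above
print(f"speedup: {speedup(before_s, after_s):.2f}x")                    # speedup: 2.49x
print(f"time saved: {time_saved_fraction(before_s, after_s):.0%}")      # time saved: 60%
```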

Test Plan:



pytorch-bot · 2026-04-21T18:31:18Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4310

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Rolling out OSDC (ARC) runners on pull & trunk workflows in PyTorch main

❌ 1 New Failure

As of commit d302888 with merge base 1b04df3:

NEW FAILURE - The following job has failed:

Run 1xH100 Tests / test (H100, linux.aws.h100, --pre torch torchvision torchaudio mslk --index-url https://download.... / linux-job (gh)
RuntimeError: Command docker exec -t e92b6a762ea62a740525282623b6cd68d12412f1a28ec5f150f5fc07b7eee2b1 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.


vkuzo added 10 commits April 20, 2026 20:52

Update f46445f
Update 3c92c1a
Update b513b61
Update a669b9e
Update 53bd8d0
Update 4c86363
Update 3cc91ed
Update 9b7dc74
Update d69b32a
Update 294c9cc

vkuzo requested a review from jerryzh168 as a code owner April 21, 2026 18:31

meta-cla Bot added the CLA Signed label Apr 21, 2026

This was referenced Apr 21, 2026

hook up gptq prototype to nvfp4 #4302

Merged

gptq + nvfp4 smoke test on LLaMa 3.2 1B + wikitext perplexity #4303

Merged

vkuzo added the module: not user facing label Apr 21, 2026

Update 65fae62
Update 5ee2ad2

vkuzo mentioned this pull request Apr 22, 2026

gptq example: remove transformers version check #4313

Merged

vkuzo added 3 commits April 22, 2026 11:24

Update 6463808
Update d121bff
Update d302888

vkuzo changed the base branch from gh/vkuzo/249/head to main April 22, 2026 11:25

vkuzo mentioned this pull request Apr 22, 2026

emulated nvfp4 support torch._grouped_mm for inference #4314

Merged

jerryzh168 approved these changes Apr 22, 2026


This was referenced Apr 22, 2026

make NVFP4Tensor handle per-expert outer scale #4315

Merged

hook up real nvfp4 grouped_gemm #4316

Merged

vkuzo merged commit b49d8cb into main Apr 23, 2026
50 of 53 checks passed

vkuzo merged 15 commits into main from gh/vkuzo/250/head
