
Conversation

@jainapurva (Contributor) commented on Oct 6, 2025:

This pull request adds support for benchmarking the Gemma 3 12B and 27B models with TorchAO quantization (FP8 and INT4) across latency, throughput, and serving tests on CUDA devices. It also updates the workflow to install the required torchao and fbgemm-gpu-genai dependencies when running on CUDA devices, and bumps the torch version to 2.9.0 for compatibility.
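For context, here is a minimal sketch of what TorchAO's FP8 and INT4 quantization modes look like at the API level before a model is benchmarked. The toy model, shapes, and settings below are illustrative assumptions, not the PR's actual code; the PR wires these modes up through the benchmark workflow config instead.

```python
import torch
import torch.nn as nn
from torchao.quantization import (
    quantize_,
    float8_dynamic_activation_float8_weight,
    int4_weight_only,
)

# Toy stand-in for the Gemma 3 checkpoints the benchmarks actually load.
model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
)
model = model.to(device="cuda", dtype=torch.bfloat16)

# FP8 mode: dynamic quantization of activations and weights to float8.
# (Requires FP8-capable hardware, e.g. SM89+; fbgemm-gpu-genai provides
# the fused kernels on supported GPUs.)
quantize_(model, float8_dynamic_activation_float8_weight())

# INT4 mode (alternative): weight-only quantization; group_size=128 is an
# assumed, commonly used setting, not necessarily what the benchmarks use.
# quantize_(model, int4_weight_only(group_size=128))
```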

The meta-cla bot added the `cla signed` label on Oct 6, 2025.
@jainapurva changed the title from "Add latency benchmarks for pytorch models" to "Add benchmarks for pytorch models" on Nov 3, 2025.
@huydhn (Contributor) left a comment:


LGTM! Just FYI, I'm working on a fix for HPU at #100, but it will need some help from the Intel team who maintain the runner.

@huydhn merged commit 81c4dc6 into main on Nov 3, 2025 (34 of 38 checks passed).