Conversation

@huydhn huydhn commented Sep 11, 2025

The main change adds DeepSeek-V3 and DeepSeek-R1 on B200. To achieve this, I introduce a PLATFORM_SKIPS configuration that skips these models on A100 and H100. I also use this new configuration to shift Llama4 and the bigger Gemma3 and Qwen3 variants to B200, while keeping Llama3 and the smaller variants of these two on A100/H100 (for more coverage).

For reference: configs for internal jobs for these models P1930074197
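A skip configuration like the one described might look roughly like the sketch below. This is a minimal illustration, not the actual PR code: the dictionary shape, the `should_run` helper, and all model/platform names here are assumptions for illustration only.

```python
# Hypothetical sketch of a PLATFORM_SKIPS-style configuration (names are
# placeholders, not taken from the actual PR). Each platform maps to the
# set of models that should NOT be benchmarked there, so the largest
# models run only on B200.
PLATFORM_SKIPS = {
    "a100": {"deepseek-v3", "deepseek-r1", "llama4", "gemma3-27b", "qwen3-32b"},
    "h100": {"deepseek-v3", "deepseek-r1", "llama4", "gemma3-27b", "qwen3-32b"},
    "b200": set(),  # B200 runs everything, including the biggest models
}

def should_run(model: str, platform: str) -> bool:
    """Return True if the benchmark for `model` should run on `platform`.

    Unknown platforms skip nothing, so new hardware runs the full suite
    by default.
    """
    return model not in PLATFORM_SKIPS.get(platform, set())
```

With this shape, moving a model between platforms is a one-line change to the skip sets rather than an edit to each benchmark job definition.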

Signed-off-by: Huy Do <huydhn@gmail.com>
@huydhn huydhn requested a review from BoyuanFeng September 11, 2025 01:54
@meta-cla meta-cla bot added the cla signed label Sep 11, 2025
@huydhn huydhn requested a deployment to pytorch-x-vllm September 11, 2025 01:57 — with GitHub Actions In progress
@huydhn huydhn commented Sep 11, 2025

The benchmark surfaces this recent error https://github.com/pytorch/pytorch-integration-testing/actions/runs/17631888148/job/50100699186#step:16:5520 from vLLM vllm-project/vllm#23582 (only on B200, I think). As a result, some metrics like latency or tokens/s are missing for now. No action is likely needed on the CI side.

@huydhn huydhn merged commit 963053c into main Sep 11, 2025
1 check passed