@xuzhao9 (Contributor) commented Nov 25, 2025

We would like to utilize the Blackwell runners for TritonBench benchmarking.

It will pull the latest Docker image from https://github.com/meta-pytorch/tritonbench/pkgs/container/tritonbench and run the nightly benchmark. We might add more TritonBench benchmarks later.

TODOs:

@meta-cla meta-cla bot added the cla signed label Nov 25, 2025
```yaml
working-directory: triton-benchmarks/tritonbench
run: |
  latest_result_json=$(find ./results/${TRITONBENCH_SIDE_A_ENV} -name "result.json" | sort -r | head -n 1)
  python3 ./.ci/upload/scribe.py --json ${latest_result_json}
```
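A side note on the `find | sort -r | head -n 1` pipeline above: it selects the lexicographically last `result.json` path, which only picks the newest run if result directories sort by timestamp. A minimal sketch of that selection logic, using a hypothetical timestamped layout (the directory names here are my own illustration, not TritonBench's actual layout):

```shell
#!/bin/sh
# Sketch: reproduce the workflow's "latest result.json" selection against a
# hypothetical results tree with timestamp-named run directories.
set -eu

results_dir=$(mktemp -d)
mkdir -p "$results_dir/20251125_120000" "$results_dir/20251126_120000"
echo '{"run": "old"}' > "$results_dir/20251125_120000/result.json"
echo '{"run": "new"}' > "$results_dir/20251126_120000/result.json"

# Reverse lexicographic sort puts the newest timestamped path first.
latest_result_json=$(find "$results_dir" -name "result.json" | sort -r | head -n 1)
echo "$latest_result_json"
```

This picks the `20251126_120000` run; if run directories were not named in a sortable way, the pipeline would need `find -newer` or a mtime sort instead.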
Contributor: Do we need to set up any permissions to make the Scribe upload work here?

Contributor: Ah, let me know if you need to set up TRITONBENCH_SCRIBE_GRAPHQL_ACCESS_TOKEN; I'm not sure if that can be done self-serve.

@xuzhao9 (author): Yes, we will need to set up TRITONBENCH_SCRIBE_GRAPHQL_ACCESS_TOKEN as a GitHub secret.


```python
# This mapping is needed to find out the platform of the runner
RUNNER_TO_PLATFORM_MAPPING = {
    "linux.dgx.b200": "cuda",
    # ...
}
```
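To illustrate how a runner-to-platform mapping like this might be consumed, here is a small sketch; the `runner_platform` helper and its error behavior are my own assumptions for illustration, not code from this PR:

```python
# Hypothetical consumer of a runner-to-platform mapping; the helper name
# and the ValueError-on-unknown-runner behavior are illustrative
# assumptions, not part of the PR.
RUNNER_TO_PLATFORM_MAPPING = {
    "linux.dgx.b200": "cuda",
}


def runner_platform(runner_name: str) -> str:
    """Look up the platform for a runner label, failing loudly on unknowns."""
    try:
        return RUNNER_TO_PLATFORM_MAPPING[runner_name]
    except KeyError:
        raise ValueError(f"Unknown runner: {runner_name!r}") from None
```

Failing loudly on an unmapped runner label keeps a misconfigured workflow from silently benchmarking on the wrong platform.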
Contributor: Just FYI, B200 runners are offline at the moment waiting for NVIDIA to re-image them, so it will take a few more days for them to be back in CI. The tracking issue is pytorch/pytorch#169386.

@huydhn (Contributor) commented Dec 2, 2025

One note: I don't think the CI container has the permission to run meta-pytorch/tritonbench#668. In the B200 multi-tenancy setup, the workflow already runs inside a container. That said, the daemon does fix the power and clock (https://github.com/meta-pytorch/pytorch-gha-infra/blob/main/multi-tenant/services/ghad-manager/ghad-manager.py#L578-L580) before it launches the container, but this is done outside of CI.

@xuzhao9 (author) commented Dec 2, 2025

Thanks for the feedback! Glad to know the GPU is already tuned in the infra.

@xuzhao9 xuzhao9 merged commit 9709270 into pytorch:main Dec 3, 2025
1 check passed
@xuzhao9 xuzhao9 deleted the xz9/tritonbench-b200 branch December 3, 2025 20:20