[tritonbench] Add initial tritonbench benchmark config #110
Conversation
working-directory: triton-benchmarks/tritonbench
run: |
  latest_result_json=$(find ./results/${TRITONBENCH_SIDE_A_ENV} -name "result.json" | sort -r | head -n 1)
  python3 ./.ci/upload/scribe.py --json ${latest_result_json}
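The `find … | sort -r | head -n 1` pipeline above picks the lexicographically last `result.json` path, which selects the newest run when run directories are named by timestamp. A minimal Python sketch of the same selection logic (the helper name `find_latest_result` is hypothetical; the workflow does this in shell):

```python
import os


def find_latest_result(results_dir: str):
    """Collect all result.json paths under results_dir and return the
    lexicographically greatest one, mimicking `sort -r | head -n 1`.
    Returns None when no result.json exists."""
    matches = []
    for root, _dirs, files in os.walk(results_dir):
        if "result.json" in files:
            matches.append(os.path.join(root, "result.json"))
    # Taking the head of a reverse sort is the same as taking the max.
    return max(matches, default=None)


# Timestamped run directories sort lexicographically, so the newest wins.
paths = ["results/20240101/result.json", "results/20240301/result.json"]
print(max(paths))  # → results/20240301/result.json
```

This only works as a "latest run" selector because the directory names sort in chronological order; if they did not, the workflow would need to sort by file mtime instead.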
Are there any permissions we need to set up to make the Scribe upload work here?
Ah, let me know if you need to set up TRITONBENCH_SCRIBE_GRAPHQL_ACCESS_TOKEN; I'm not sure if that can be done self-serve.
Yes, we will need to set up TRITONBENCH_SCRIBE_GRAPHQL_ACCESS_TOKEN as a GitHub secret.
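Once the secret is exposed to the job's environment, the upload script would typically read it from there and fail fast with a clear message if it is missing. A hedged sketch (the helper name `get_scribe_token` is hypothetical; the actual scribe.py implementation is not shown in this thread):

```python
import os


def get_scribe_token() -> str:
    """Read the Scribe GraphQL token from the environment.

    Hypothetical helper: raises immediately if the GitHub secret was not
    exported into this job's environment, so a misconfigured workflow
    fails with an actionable error instead of an opaque upload failure.
    """
    token = os.environ.get("TRITONBENCH_SCRIBE_GRAPHQL_ACCESS_TOKEN")
    if not token:
        raise RuntimeError(
            "TRITONBENCH_SCRIBE_GRAPHQL_ACCESS_TOKEN is not set; "
            "add it as a repository secret and export it in the workflow env."
        )
    return token
```

In the workflow, the secret would be passed through an `env:` entry on the upload step so the script can pick it up.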
# This mapping is needed to find out the platform of the runner
RUNNER_TO_PLATFORM_MAPPING = {
    "linux.dgx.b200": "cuda",
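A minimal sketch of how such a runner-to-platform mapping might be consumed, with an explicit error for unknown runner labels. Only the `"linux.dgx.b200": "cuda"` entry appears in the diff; the lookup function `platform_for_runner` and its error behavior are assumptions for illustration:

```python
# Entry taken from the diff; any other entries would be hypothetical.
RUNNER_TO_PLATFORM_MAPPING = {
    "linux.dgx.b200": "cuda",
}


def platform_for_runner(runner: str) -> str:
    """Resolve a CI runner label to its hardware platform.

    Raising on unknown labels makes a missing mapping entry an obvious
    configuration error rather than a silent wrong-platform run.
    """
    try:
        return RUNNER_TO_PLATFORM_MAPPING[runner]
    except KeyError:
        raise ValueError(
            f"Unknown runner {runner!r}; add it to RUNNER_TO_PLATFORM_MAPPING"
        ) from None


print(platform_for_runner("linux.dgx.b200"))  # → cuda
```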
Just FYI, B200 runners are offline at the moment waiting for NVIDIA to re-image them, so it would take a few more days for them to be back to CI. The tracking issue is here pytorch/pytorch#169386
One note: I don't think the CI container has the permission to run meta-pytorch/tritonbench#668. In the B200 multi-tenancy setup, the workflow already runs inside a container. Nevertheless, the daemon does fix the power and clock here https://github.com/meta-pytorch/pytorch-gha-infra/blob/main/multi-tenant/services/ghad-manager/ghad-manager.py#L578-L580 before it launches the container, but this is done outside of CI.
Thanks for the feedback! Glad to know that the GPU is already tuned in the infra.
We would like to utilize the Blackwell runners for TritonBench benchmarking.
The workflow will pull the latest Docker image from https://github.com/meta-pytorch/tritonbench/pkgs/container/tritonbench and run the nightly benchmark. We might add more TritonBench benchmarks later.

TODOs:
- Add TRITONBENCH_SCRIBE_GRAPHQL_ACCESS_TOKEN to the repo, so that the metrics can be written to Scuba/Scribe