
[cuda] Switch cuda2 on and cuda1 off by default #16107

Merged
merged 3 commits into iree-org:main from cuda2-try-switch
Jan 23, 2024

Conversation


@antiagainst antiagainst commented Jan 12, 2024

This commit switches the cuda2 HAL driver on and the cuda HAL driver (renamed to cuda1) off by default in CMake. To make the transition simple, it also switches cuda2 to use stream-based command buffers by default, matching cuda1's behavior.

Fixes #13245

benchmark-extra: cuda-large
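For illustration, the default flip described above might look like the following at CMake configure time. The option names below are assumptions for the sketch, not verified against the IREE build; check the project's actual CMake options before using them.

```shell
# Hypothetical configure invocations; option names are illustrative only.

# Before this PR: the original cuda HAL driver is on, the rewrite (cuda2) is opt-in.
cmake -B build -GNinja \
  -DIREE_HAL_DRIVER_CUDA=ON \
  -DIREE_HAL_DRIVER_CUDA2=OFF .

# After this PR: cuda2 is the default; the old driver (renamed cuda1) is opt-in.
cmake -B build -GNinja \
  -DIREE_HAL_DRIVER_CUDA2=ON \
  -DIREE_HAL_DRIVER_CUDA1=OFF .
```

With defaults flipped in CMake, users who build without passing either flag transparently pick up the rewritten driver, while anyone depending on the old one can still re-enable it explicitly during the deprecation window.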

@antiagainst antiagainst added the hal/cuda Runtime CUDA HAL backend label Jan 12, 2024
@antiagainst antiagainst force-pushed the cuda2-try-switch branch 2 times, most recently from 8ab514e to 84396e3 Compare January 12, 2024 05:58
@antiagainst antiagainst added benchmarks:cuda Run default CUDA benchmarks and removed benchmarks:cuda Run default CUDA benchmarks labels Jan 12, 2024
@antiagainst antiagainst added the benchmarks:cuda Run default CUDA benchmarks label Jan 17, 2024

github-actions bot commented Jan 17, 2024

Abbreviated Benchmark Summary

@ commit e0e4e48a08cae52b6c7492e1bde31cf9a8eb6dd0 (vs. base 13dad384f9c0645cbc86eb735f486bff99084082)

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MiniLML12H384Uncased(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] | 1.681 (vs. 1.530, 9.87%↑) | 1.679 | 0.011 |
| matmul_128x256x8192_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] | 0.026 (vs. 0.025, 5.54%↑) | 0.026 | 0.000 |

No improved or regressed compilation metrics 🏖️

For more information:

Source Workflow Run

@antiagainst antiagainst force-pushed the cuda2-try-switch branch 3 times, most recently from a782a67 to 0d5f442 Compare January 23, 2024 16:36
This commit changes the benchmark capture steps to start the
capture process first so that we can reduce the number of
benchmark repetitions during capture to reduce capture size.
@antiagainst antiagainst changed the title [cuda] Try to switch cuda2 on as the default [cuda] Switch cuda2 on as the default Jan 23, 2024
@antiagainst antiagainst changed the title [cuda] Switch cuda2 on as the default [cuda] Switch cuda2 on and cuda off by default Jan 23, 2024
@antiagainst antiagainst changed the title [cuda] Switch cuda2 on and cuda off by default [cuda] Switch cuda2 on and cuda1 off by default Jan 23, 2024
@antiagainst antiagainst marked this pull request as ready for review January 23, 2024 19:16
@antiagainst
Contributor Author

Okay, this is good to go now. Only two benchmarks regressed, and only slightly; I won't worry about that too much.

@ScottTodd ScottTodd (Collaborator) left a comment


Thanks for staging this work into separable PRs! Next steps are to remove cuda1 and drop the '2' from cuda2 names?

@antiagainst
Copy link
Contributor Author

> Thanks for staging this work into separable PRs! Next steps are to remove cuda1 and drop the '2' from cuda2 names?

Yup, exactly.

@antiagainst antiagainst merged commit 3b3cef9 into iree-org:main Jan 23, 2024
58 checks passed
@antiagainst antiagainst deleted the cuda2-try-switch branch January 23, 2024 22:11
Labels
benchmarks:cuda Run default CUDA benchmarks hal/cuda Runtime CUDA HAL backend
Development

Successfully merging this pull request may close these issues.

[Epic] CUDA HAL driver rewrite for production
3 participants