
CUDA Benchmarking #7612

Open
gmarkall opened this issue Dec 2, 2021 · 7 comments
Labels
CUDA CUDA related issue/PR performance performance related issue

Comments

@gmarkall
Member

gmarkall commented Dec 2, 2021

There is presently no benchmark suite for Numba’s CUDA target, and there is a gap between Numba’s performance and the maximum achievable. To support performance optimization efforts, a benchmark suite is needed that:

  • Runs regularly (with CI, perhaps before merge)
  • Measures compilation time - this has been drifting upwards with each release.
  • Measures kernel launch time - this is known to be slow in comparison to CUDA C/C++.
  • Contains a set of micro-benchmarks to spot existing opportunities for optimization and avoid regressions.
  • Benchmarks common real-world workloads (e.g. ETL operations, custom filters, etc.)
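The compilation-time and launch-time bullets above both come down to separating first-call cost from steady-state cost. A minimal, hypothetical sketch of such a measurement (pure stdlib; `first_vs_steady` is a made-up name, and a plain Python function stands in for a `@cuda.jit` kernel so the sketch runs anywhere):

```python
import statistics
import time

def first_vs_steady(fn, *args, repeats=100):
    """Time the first call to fn(*args) separately from the median of
    subsequent calls. For a Numba-jitted function, the first call
    includes JIT compilation, so the difference between the two
    numbers approximates compilation time."""
    t0 = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - t0
    later = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        later.append(time.perf_counter() - t0)
    return first, statistics.median(later)

# A plain function stands in for a @cuda.jit kernel here:
first, steady = first_vs_steady(sum, range(2000))
print(f"first call: {first * 1e6:.1f} us, steady median: {steady * 1e6:.1f} us")
```

Tracking both numbers per release would make the compilation-time drift mentioned above visible directly.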
@gmarkall gmarkall added CUDA CUDA related issue/PR performance performance related issue labels Dec 2, 2021
@quasiben
Contributor

quasiben commented Dec 2, 2021

@pentschev and I have built infrastructure similar to what is being asked for here:

These were designed to run nightly and post results to a public GH issue. We liked this model because it's public and relatively low-noise, with high impact for spotting regressions in a once-a-day viewing.

@gmarkall
Member Author

gmarkall commented Dec 2, 2021

A useful benchmark for kernel launch time is here: #3003 (comment)
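The linked benchmark isn't reproduced here, but the general shape of a launch-overhead measurement can be sketched. This is an illustrative assumption, not the benchmark from #3003: the timing helper is pure stdlib, and the Numba part (an empty `@cuda.jit` kernel launched repeatedly) is guarded so the script falls back to timing a no-op when a CUDA-capable Numba isn't available:

```python
import statistics
import time

def launch_overhead(launch, n=10000):
    """Median host-side time per call of launch() over n calls.
    Kernel launches are asynchronous, so timing each call without
    synchronizing measures the host-side launch overhead itself."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        launch()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

try:
    from numba import cuda

    @cuda.jit
    def empty_kernel():
        pass

    empty_kernel[1, 1]()  # first call triggers JIT compilation
    cuda.synchronize()
    per_launch = launch_overhead(lambda: empty_kernel[1, 1]())
    print(f"median launch time: {per_launch * 1e6:.1f} us")
except Exception:
    # No Numba/CUDA here; time a no-op to show the timer floor instead.
    print(f"timer floor: {launch_overhead(lambda: None) * 1e6:.3f} us")
```

Comparing that median against the same loop written in CUDA C/C++ would quantify the launch-time gap the issue describes.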

@pentschev
Contributor

To add to @quasiben's comment: the one thing that can't be done is running before merge, as it would need access to the repo, which we don't do today for UCX-Py. For that, maybe we could check whether we have the resources in gpuCI, similar to what has been done in Dask. What do you think, @quasiben?

@gmarkall
Member Author

gmarkall commented Dec 2, 2021

The benchmark in the following comment could probably be used, with some tweaking, for general measurement and for comparison with CuPy's JIT: #4647 (comment)

@gmarkall
Member Author

gmarkall commented Dec 2, 2021

the thing that can't be done is running before merging as it would need access to the repo

Why wouldn't the repo be accessible? I'm guessing I'm missing some understanding here?

@pentschev
Contributor

Why wouldn't the repo be accessible? I'm guessing I'm missing some understanding here?

Sorry, I didn't mean it can't be done, but rather that you would need specific permissions from the GH API/GH Actions to query each new open PR/run tests on it, like gpucibot has for all RAPIDS projects. The infrastructure mentioned in #7612 (comment) has no special rights to any repos, so it won't do any of those things today.

@gmarkall
Member Author

gmarkall commented Dec 2, 2021

The infrastructure mentioned in #7612 (comment) has no special rights to any repos, so it won't do any of those things today.

Ah, I see - many thanks for the clarification!

3 participants