
CUDA Benchmarking #7612

Open
gmarkall opened this issue Dec 2, 2021 · 7 comments
Labels
CUDA CUDA related issue/PR performance performance related issue

Comments

@gmarkall
Member

gmarkall commented Dec 2, 2021

There is presently no benchmark suite for Numba’s CUDA target, and there is a gap between Numba’s performance and the maximum achievable. To support performance optimization efforts, a benchmark suite is needed that:

  • Runs regularly (with CI, perhaps before merge)
  • Measures compilation time - this has been drifting upwards with each release.
  • Measures kernel launch time - this is known to be slow in comparison to CUDA C/C++.
  • Contains a set of micro-benchmarks to spot existing opportunities for optimization and avoid regressions.
  • Benchmarks common real-world workloads (e.g. ETL operations, custom filters, etc.)
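The compilation-time and launch-time bullets above both come down to separating first-call cost from steady-state cost. A minimal, hypothetical sketch of such a measurement (pure stdlib; `first_vs_steady` is a made-up name, and a plain Python function stands in for a `@cuda.jit` kernel so the sketch runs anywhere):

```python
import statistics
import time

def first_vs_steady(fn, *args, repeats=100):
    """Time the first call to fn(*args) separately from the median of
    subsequent calls. For a Numba-jitted function, the first call
    includes JIT compilation, so the difference between the two
    numbers approximates compilation time."""
    t0 = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - t0
    later = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        later.append(time.perf_counter() - t0)
    return first, statistics.median(later)

# A plain function stands in for a @cuda.jit kernel here:
first, steady = first_vs_steady(sum, range(2000))
print(f"first call: {first * 1e6:.1f} us, steady median: {steady * 1e6:.1f} us")
```

Tracking both numbers per release would make the compilation-time drift mentioned above visible directly.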
@gmarkall gmarkall added CUDA CUDA related issue/PR performance performance related issue labels Dec 2, 2021
@quasiben
Contributor

quasiben commented Dec 2, 2021

@pentschev and I have built infrastructure similar to what is being asked for here:

These were designed to run nightly and post results to a public GH issue. We liked this model because it's public and relatively low-noise, with high impact for spotting regressions in a once-a-day viewing.

@gmarkall
Member Author

gmarkall commented Dec 2, 2021

A useful benchmark for kernel launch time is here: #3003 (comment)
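The linked benchmark isn't reproduced here, but the general shape of a launch-overhead measurement can be sketched. This is an illustrative assumption, not the benchmark from #3003: the timing helper is pure stdlib, and the Numba part (an empty `@cuda.jit` kernel launched repeatedly) is guarded so the script falls back to timing a no-op when a CUDA-capable Numba isn't available:

```python
import statistics
import time

def launch_overhead(launch, n=10000):
    """Median host-side time per call of launch() over n calls.
    Kernel launches are asynchronous, so timing each call without
    synchronizing measures the host-side launch overhead itself."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        launch()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

try:
    from numba import cuda

    @cuda.jit
    def empty_kernel():
        pass

    empty_kernel[1, 1]()  # first call triggers JIT compilation
    cuda.synchronize()
    per_launch = launch_overhead(lambda: empty_kernel[1, 1]())
    print(f"median launch time: {per_launch * 1e6:.1f} us")
except Exception:
    # No Numba/CUDA here; time a no-op to show the timer floor instead.
    print(f"timer floor: {launch_overhead(lambda: None) * 1e6:.3f} us")
```

Comparing that median against the same loop written in CUDA C/C++ would quantify the launch-time gap the issue describes.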

@pentschev
Contributor

To add to @quasiben's comment: the one thing that can't be done is running before merge, as it would need access to the repo, which we don't do today for UCX-Py. For that, maybe we could check whether we have the resources in gpuCI, similar to what has been done in Dask. What do you think, @quasiben?

@gmarkall
Member Author

gmarkall commented Dec 2, 2021

The benchmark in the following comment could probably be used, with some tweaking, for general measurement and for comparison with CuPy's JIT: #4647 (comment)

@gmarkall
Member Author

gmarkall commented Dec 2, 2021

the thing that can't be done is running before merging as it would need access to the repo

Why wouldn't the repo be accessible? I'm guessing I'm missing some understanding here?

@pentschev
Contributor

Why wouldn't the repo be accessible? I'm guessing I'm missing some understanding here?

Sorry, I didn't mean it can't be done, but rather that you would need specific permissions from the GH API/GH Actions to query each new open PR/run tests on it, like gpucibot has for all RAPIDS projects. The infrastructure mentioned in #7612 (comment) has no special rights to any repos, so it won't do any of those things today.

@gmarkall
Member Author

gmarkall commented Dec 2, 2021

The infrastructure mentioned in #7612 (comment) has no special rights to any repos, so it won't do any of those things today.

Ah, I see - many thanks for the clarification!

3 participants