Unstable performance #31
Comments
This is probably the exact same asynchronous/synchronous issue that explains #30; in other words, it's an artifact of how you're testing it.
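The timing pitfall being described can be sketched like this. This is a hedged reconstruction, not code from the issue: it assumes the CUDArt.jl/CUBLAS.jl API of the time (`CudaArray`, `CUBLAS.gemm!`, `device_synchronize`) and a hypothetical matrix size `n`:

```julia
using CUDArt, CUBLAS

n = 1024                                   # hypothetical size, not from the issue
d_A  = CudaArray(rand(Float32, n, n))      # random matrix on the device
d_Im = CudaArray(eye(Float32, n))          # identity matrix on the device
d_C  = CudaArray(zeros(Float32, n, n))     # output buffer

# Misleading: gemm! only *launches* the kernel asynchronously, so @time
# returns as soon as the launch is queued. The first call therefore looks
# very fast, while later calls block waiting for the queue to drain.
@time CUBLAS.gemm!('N', 'N', 1f0, d_A, d_Im, 0f0, d_C)

# Fair timing: force the GPU to finish before stopping the clock.
@time begin
    CUBLAS.gemm!('N', 'N', 1f0, d_A, d_Im, 0f0, d_C)
    device_synchronize()
end
```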
Indeed, adding
Is that expected as well?
Yes. (Not necessarily the precise numerical factor, but the general slowdown.) You're inhibiting the GPU from getting started on the next job as resources get freed from the last one.
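The pipelining point can be illustrated with the same hedged sketch (same assumed old CUDArt.jl/CUBLAS.jl API; sizes and names are illustrative, not from the issue):

```julia
using CUDArt, CUBLAS

n = 1024
d_A  = CudaArray(rand(Float32, n, n))
d_Im = CudaArray(eye(Float32, n))
d_C  = CudaArray(zeros(Float32, n, n))

# Synchronizing after every launch stalls the pipeline: the GPU sits idle
# while control bounces back to the CPU between kernels.
@time for i in 1:100
    CUBLAS.gemm!('N', 'N', 1f0, d_A, d_Im, 0f0, d_C)
    device_synchronize()        # per-iteration barrier: slower overall
end

# Letting launches queue up keeps the GPU busy back-to-back; a single
# barrier at the end is enough to measure total throughput.
@time begin
    for i in 1:100
        CUBLAS.gemm!('N', 'N', 1f0, d_A, d_Im, 0f0, d_C)
    end
    device_synchronize()
end
```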
That makes sense. Thank you!
Disclaimer: I used matrix multiplication from CUBLAS.jl as an example operation, since CUDArt.jl doesn't provide anything like that, so the results may be biased by it. Anyway, I'd be glad to see any pointers.
With a random `CudaArray` `d_A` and an identity matrix `d_Im`, I run several performance tests like this:

If you are not familiar with BLAS (or just don't like the cryptic names), this code multiplies `d_A` by the identity matrix `d_Im` and puts the result back into `d_A`. When I run the same test on the CPU, I always get very similar, consistent results, but on the GPU the benchmarks give totally different numbers: the first call is really fast, while all subsequent calls take ~200x longer. If I wait some time (say, 10 seconds), multiplication becomes fast again, but only for a single test, and then it slows down again.
Is this expected behavior? Am I using `CudaArray`s correctly at all?