Updated vis_gpu API to match vis_cpu #50
This is currently failing on the single-precision test on GPU. For some reason it's saying
This would seem to indicate that it requires more threads, shared memory, or registers than are available. But it works perfectly fine for double precision, which makes no sense to me. Anyone with CUDA knowledge, please step in! Maybe @AaronParsons could help here... Other than that, everything seems to be working fine and is tested at 100% coverage. To get coverage on GPU I set up a self-hosted runner on my own laptop (clearly not sustainable, but I'll transfer it to our ASU cluster when I can).
Pretty major changes here! A lot of improvements have been made, and overall they look good to me. There are some API breakages that should be documented in the release notes, and I can't really give useful comments on the CUDA code. I have only a few minor comments to address, mostly about documentation, plus getting the tests to pass.
Looks good to me. I think this can be merged (with a version bump) once the tests are passing again.
This PR is quite large, and should generate a new major version.
Fixes #42
Fixes #14
Changes
It removes the following:
It adds the following:
- `viscpu profile`: a script that can run a simulation with arbitrary numbers of freqs, times, sources, antennas and beams, and outputs line-profiling information.

It improves the following:
Performance Measurements
GPU vs. original (einsum) CPU
CPU: einsum vs matprod (new)
CPU: performance vs. Nthreads
Caveats / Things to do
- The `viscpu profile` script works really well for line profiling on the CPU, but on the GPU the whole thing needs to be run inside `nvprof`, which makes it slightly non-uniform. This could be improved.
- `viscpu profile` always runs with a particular coarse beam, which is not really representative of real problems. We should check the effect of this.

We may not need to do all of the above in this PR, but I thought I'd lay it out for discussion.
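For discussion of what a profiling run exercises, here is a minimal, hypothetical sketch of a harness that builds random inputs with arbitrary numbers of freqs, times, sources, antennas and beams, then runs a simplified visibility loop. The functions `make_inputs` and `simulate`, the random delays, and the real-valued per-source beam arrays are all illustrative assumptions, not the `viscpu profile` implementation; swapping the random `beams` array for a finer, more realistic beam is where the coarse-beam caveat would come in.

```python
import time

import numpy as np


def make_inputs(nfreq, ntime, nsrc, nant, nbeam, seed=0):
    """Hypothetical random inputs with the requested dimensions."""
    rng = np.random.default_rng(seed)
    freqs = np.linspace(100e6, 200e6, nfreq)          # Hz
    flux = rng.uniform(0.1, 1.0, size=(nfreq, nsrc))  # source intensities
    # Per-time, per-antenna geometric delay toward each source (seconds).
    tau = rng.uniform(-1e-7, 1e-7, size=(ntime, nant, nsrc))
    # One real beam value per (beam, freq, source); antennas share beams.
    beams = rng.uniform(0.0, 1.0, size=(nbeam, nfreq, nsrc))
    beam_idx = rng.integers(0, nbeam, size=nant)      # antenna -> beam index
    return freqs, flux, tau, beams, beam_idx


def simulate(freqs, flux, tau, beams, beam_idx):
    """Simplified visibility loop: V = z @ z^H per frequency and time."""
    ntime, nant, _ = tau.shape
    vis = np.empty((len(freqs), ntime, nant, nant), dtype=np.complex128)
    for fi, nu in enumerate(freqs):
        # Beam-weighted amplitude of each source as seen by each antenna: (nant, nsrc).
        amp = beams[beam_idx, fi, :] * np.sqrt(flux[fi])
        for ti in range(ntime):
            z = amp * np.exp(2j * np.pi * nu * tau[ti])  # apply per-antenna phases
            vis[fi, ti] = z @ z.conj().T
    return vis


if __name__ == "__main__":
    inputs = make_inputs(nfreq=4, ntime=3, nsrc=500, nant=20, nbeam=2)
    t0 = time.perf_counter()
    vis = simulate(*inputs)
    print(vis.shape, f"{time.perf_counter() - t0:.3f} s")
```

To get a per-line breakdown of a loop like this, one option is to decorate `simulate` with `@profile` and run the script under `kernprof -l -v` from the line_profiler package; on the GPU side, `nvprof` plays the analogous role, which is the non-uniformity noted above.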