
Updated vis_gpu API to match vis_cpu #50

Merged: 91 commits from clean-gpu into main on Aug 11, 2022
Conversation

@steven-murray (Contributor) commented on Jul 26, 2022

This PR is quite large, and should generate a new major version.

Fixes #42
Fixes #14

Changes

It removes the following:

  • Ability to use an (l,m) grid for beam interpolation, in both CPU and GPU versions
  • Ability to pass anything except UVBeam or AnalyticBeam objects in the beam list.

It adds the following:

  • GPU version now performs beam interpolation on a regular theta/phi grid (but can also use direct function evaluation if the beam is an AnalyticBeam); a minimal sketch of these two paths follows this list.
  • Beam polarization now fully supported in GPU.
  • A new CLI command, viscpu profile, that runs a simulation with arbitrary numbers of frequencies, times, sources, antennas and beams, and outputs line-profiling information.
  • Updated tests for both CPU and GPU, comparing both against each other and pyuvsim.
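
For reference, here is a minimal numpy sketch of the two beam-evaluation paths mentioned above (bilinear interpolation on a regular theta/phi grid vs. direct analytic evaluation). All function and variable names here are hypothetical illustrations only; the actual GPU implementation does the interpolation in CUDA (see src/vis_cpu/gpu_src/beam_interpolation.cu).

```python
import numpy as np

def interp_beam_regular_grid(beam_grid, theta_grid, phi_grid, theta_src, phi_src):
    """Bilinearly interpolate a beam tabulated on a regular (theta, phi) grid."""
    dth = theta_grid[1] - theta_grid[0]
    dph = phi_grid[1] - phi_grid[0]

    # Fractional indices of each source position within the grid.
    it = np.clip((theta_src - theta_grid[0]) / dth, 0, len(theta_grid) - 2)
    ip = np.clip((phi_src - phi_grid[0]) / dph, 0, len(phi_grid) - 2)
    i0, j0 = it.astype(int), ip.astype(int)
    ft, fp = it - i0, ip - j0

    # Standard bilinear weights on the four surrounding grid points.
    return (
        beam_grid[i0, j0] * (1 - ft) * (1 - fp)
        + beam_grid[i0 + 1, j0] * ft * (1 - fp)
        + beam_grid[i0, j0 + 1] * (1 - ft) * fp
        + beam_grid[i0 + 1, j0 + 1] * ft * fp
    )

def eval_analytic_beam(theta_src, sigma=0.1):
    """Direct evaluation, e.g. a simple Gaussian beam standing in for an AnalyticBeam."""
    return np.exp(-(theta_src**2) / (2 * sigma**2))
```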

It improves the following:

  • CPU version made ~10x faster by switching from an einsum to a matrix product (illustrated by the short sketch below).
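
To make the einsum-to-matrix-product change concrete, here is a schematic sketch; the array names and shapes are illustrative only, not the actual vis_cpu variables. Both lines compute the same contraction over the source axis, but the matrix product dispatches to BLAS and is typically much faster.

```python
import numpy as np

nant, nsrc = 100, 10000
rng = np.random.default_rng(0)
z = rng.standard_normal((nant, nsrc)) + 1j * rng.standard_normal((nant, nsrc))

# Original approach: explicit einsum over the source axis.
vis_einsum = np.einsum("ik,jk->ij", z, z.conj())

# New approach: the same contraction expressed as a matrix product.
vis_matmul = z @ z.conj().T

assert np.allclose(vis_einsum, vis_matmul)
```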

Performance Measurements

GPU vs. original (einsum) CPU

[figure]

CPU: einsum vs. matprod (new)

[figure]

CPU: performance vs. Nthreads

[figure]

Caveats / Things to do

  • The GPU version has not yet been tested against varying numbers of threads, maximum memory, etc.
  • The CPU version has no "chunking", which would allow it to run within smaller memory allocations. Since increasing Ncores doesn't improve performance much, the best way to break up the simulation seems to be by time and frequency, running a single time and frequency per core. However, since that memory wouldn't be shared, we'd need a way to run with low per-core memory (a rough sketch of this chunking idea follows this list).
  • The dominant cost in the CPU version is the element-wise product and exponentiation of the fringe. Further speedups may be possible here via memory layout, and possibly via threading if we tried hard (which would also reduce the need for memory chunking).
  • The viscpu profile script works really well for line-profiling the CPU version, but the GPU version has to be run inside nvprof, which makes the workflow slightly non-uniform. This could be improved.
  • viscpu profile always runs with a particular coarse beam, which is not really representative of real problems. We should check the effect of this.
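
As a rough sketch of the chunking idea above, the loop structure could look something like the following. Here simulate_vis and its signature are hypothetical stand-ins for the real simulator call; the point is only the per-chunk structure and the memory implication noted in the comments.

```python
import numpy as np

def simulate_in_chunks(freqs, times, simulate_vis, nfreq_chunk=1, ntime_chunk=1):
    """Run one (time, frequency) block at a time to bound per-process memory."""
    vis = {}
    for i in range(0, len(freqs), nfreq_chunk):
        for j in range(0, len(times), ntime_chunk):
            fchunk = freqs[i : i + nfreq_chunk]
            tchunk = times[j : j + ntime_chunk]
            # Each chunk is independent, so these calls could be farmed out to
            # separate cores/processes -- but then each process needs its own
            # copy of the source/beam data, hence the per-core memory concern.
            vis[(i, j)] = simulate_vis(freqs=fchunk, times=tchunk)
    return vis
```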

We may not need to do all of the above in this PR, but I thought I'd lay it out for discussion.

@steven-murray (Contributor, Author) commented:
This is currently failing the single-precision test on GPU. For some reason it reports:

pycuda._driver.LaunchError: cuLaunchKernel failed: too many resources requested for launch

This would seem to indicate that the kernel requires more threads/shared memory/registers than are available, yet it works perfectly fine in double precision, which makes no sense to me. Anyone with CUDA knowledge, please step in! Maybe @AaronParsons could help here.
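
One way to dig into this error is to query the compiled kernel's resource usage with pycuda and compare it against the device limits; if the single-precision build happens to use more registers per thread or more shared memory per block than the double-precision one, the same launch configuration could fail only in single precision. This is purely a diagnostic sketch: the kernel source and the name "vis_kernel" below are placeholders, not the real vis_gpu kernel.

```python
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Placeholder kernel; substitute the actual single-precision kernel source here.
kernel_source = r"""
__global__ void vis_kernel(float *out) { out[threadIdx.x] = 0.0f; }
"""
mod = SourceModule(kernel_source)
func = mod.get_function("vis_kernel")

# Per-kernel resource usage as reported by the compiler/driver.
attr = drv.function_attribute
print("registers per thread :", func.get_attribute(attr.NUM_REGS))
print("shared mem per block :", func.get_attribute(attr.SHARED_SIZE_BYTES))
print("max threads per block:", func.get_attribute(attr.MAX_THREADS_PER_BLOCK))

# Corresponding device limits for comparison.
dev = drv.Context.get_device()
print("device max threads/block:", dev.get_attribute(drv.device_attribute.MAX_THREADS_PER_BLOCK))
print("device regs per block   :", dev.get_attribute(drv.device_attribute.MAX_REGISTERS_PER_BLOCK))
```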

Other than that, everything seems to be working fine, and is tested at 100% coverage. To get coverage on GPU I set up a self-hosted runner on my own laptop (clearly not sustainable, but I'll transfer this over to our ASU cluster when I can).

@philbull (Collaborator) left a comment:
Pretty major changes here! Looks like a lot of improvements have been made, which look good to me overall. There are some API breakages that should be documented in the release notes, and I can't really give useful comments on the CUDA code. Only a few minor comments to address, mostly about documentation, plus getting the tests to pass.

Review comments were left on:

  • src/vis_cpu/cli.py
  • src/vis_cpu/conversions.py
  • src/vis_cpu/gpu_src/beam_interpolation.cu
  • src/vis_cpu/vis_cpu.py
  • src/vis_cpu/vis_gpu.py
  • src/vis_cpu/wrapper.py

@philbull (Collaborator) left a comment:

Looks good to me. I think this can be merged (with a version bump) once the tests are passing again.

@steven-murray merged commit c741675 into main on Aug 11, 2022
@steven-murray deleted the clean-gpu branch on Aug 11, 2022 at 18:06
Successfully merging this pull request may close these issues:

  • [Feature Req] Get vis_gpu up to API parity with vis_cpu
  • vis_gpu() arguments don't match how it is called