Sparse GPU Topographica implementation and tests (cleaned up commit version) #621
Conversation
Perfect; thanks so much!
Fantastic! I am looking forward to making use of your work. If I could get an 8× speed-up on TCAL I would be absolutely delighted; having a simulation take 6 hours instead of 2 days would make a huge difference to me!
Glad to see it finally merged. It would be great if you let me know the speed-ups you manage to achieve.
Hi everybody,

2- I have run the gcal_sparse.ty model on CPU (4 physical cores) and on GPU (GeForce 980M), and here is what I get.

On GPU (GeForce 980M, after running 15 iterations by hand):
topo_t000015.00_c1>>> %time topo.sim.run(10000)

On CPU (4 cores):
topo_t001015.00_c2>>> %time topo.sim.run(10000)

So even though the cores are utilized at 1/4 of their capacity in GPU mode, the performance advantage of GPU over CPU seems non-existent from a wall-clock perspective. Has anybody observed this?

3- Looking at the sparse implementation of CF/Projection, the signatures of the functions (response_fn, learning_fn, ...) are different from CFProjection's. This means one cannot easily switch between a non-sparse implementation and a sparse/GPU one. I could hack the signatures to make them callable, but that would be a horrible hack. Any suggestions?
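(For reference, a minimal sketch of such a wall-clock comparison without IPython's %time magic; only topo.sim.run() is part of the real API here, and the helper name and iteration counts are illustrative.)

```python
import time
import topo

def time_run(iterations=10000):
    # Assumes a model such as gcal_sparse.ty has already been loaded,
    # so topo.sim is populated; topo.sim.run() is the only Topographica
    # call used here, everything else is plain Python.
    start = time.time()
    topo.sim.run(iterations)
    elapsed = time.time() - start
    print("%d iterations in %.1f s (%.4f s/iteration)"
          % (iterations, elapsed, elapsed / iterations))
    return elapsed

# Warm up by hand, then time the long run, as in the session above:
# topo.sim.run(15)
# time_run(10000)
```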
Hi @mjabri, sorry for taking so long to reply. What cortex density value are you using for the V1 sheets? As described in the conclusions of my thesis, there is almost no advantage to using the GPU when the model has relatively low cortical density, but it may run several times faster than the CPU simulation when a high cortical density (like 162) is selected. Please let me know your findings, and if the GPU implementation is still not faster for you at high cortical density values, we can try to investigate further. Ignotas
Thanks Ignotas, I will run with cortex_density of 162 and share the results.
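(A rough back-of-the-envelope note on why the density matters so much: for a sheet of fixed area, the number of V1 units grows with the square of cortex_density, so per-iteration work and memory traffic grow at least as fast. The low-density value below is purely illustrative.)

```python
def unit_ratio(low_density, high_density):
    # For a sheet of fixed area, the unit count scales with the square of
    # the linear density; connection fields also grow, so the actual
    # compute ratio is even larger than this.
    return (high_density / low_density) ** 2

# e.g. an illustrative low test density vs. the suggested 162:
print(unit_ratio(49.0, 162.0))  # ~10.9x more V1 units per sheet
```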
Indeed, as Ignotas mentioned above, there is a BIG difference when the V1 density is larger. I tried 162 and here are the results (note the GPU is a GeForce 980M and the CPU is an i7-4860HQ). I ran twice for each:

CORTEX DENSITY 162.0:
CPU (two separate runs):

As I still cannot display projections/CFs, I looked at the activities of both GPU and CPU side by side, and they looked identical to my eyes. I haven't looked at the GPU implementation closely, so I don't really understand where the GPU benefits are coming from, whether they are specific to GCAL where there are many zeros (to Philipp's point) in the artificially generated input patterns, and whether these benefits would still exist in the case of natural images. I still have a problem displaying projections (the KeyError problem mentioned above), so it would be good to know whether others have this KeyError problem or whether it is specific to me. I tried on two systems, one VM (so it cannot run the GPU, but the problem is still there in CPU mode) and one physical machine, and they both show the same KeyError problem. If this issue can be resolved I could then spend more effort on the Sparse/GPU API...

Thanks
Marwan
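(If it helps, the side-by-side activity check can also be done numerically; a hedged sketch, assuming the gcal "V1" sheet name and the sheet's .activity array, with the file name being hypothetical.)

```python
import numpy as np
import topo

# After a CPU run, save the final V1 activity to disk:
# np.save("v1_activity_cpu.npy", topo.sim["V1"].activity)

# After an otherwise identical GPU run, compare against the saved array:
cpu = np.load("v1_activity_cpu.npy")
gpu = np.asarray(topo.sim["V1"].activity)

print("max abs difference:", np.max(np.abs(cpu - gpu)))
print("identical within tolerance:", np.allclose(cpu, gpu, atol=1e-6))
```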
The benefits are almost certainly just down to the memory bandwidth of the GPU vs the CPU. For small densities or areas the CPU can probably fit a lot of each operation in the CPU cache and doesn't have to constantly wait on new chunks of memory to be transferred. In larger models the CPU is probably starved for data and spends a lot of time waiting, while the GPU can fit much larger chunks in its memory and process them. So I don't think it has anything to do with the sparsity in this case. Actual sparsity is probably just another way the GPU can outperform the CPU implementation, because unlike the CPU implementation the GPU one uses sparse arrays. I also think that the GPU performance gains should be independent of the input patterns (although sparser patterns probably do have some effect). I'll start looking into the GPU implementation myself later this week for my own work, so hopefully I'll have some news on your KeyError issue.
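(To make the sparsity point concrete: a generic NumPy/SciPy sketch, unrelated to the projection code in this PR, showing how much less memory a sparse weight matrix has to stream through for the same dot product.)

```python
import numpy as np
from scipy import sparse

n = 3000  # illustrative sheet size (units), not taken from the model
np.random.seed(0)

# Dense n x n weight matrix in which only ~2% of connections are nonzero.
dense = np.random.rand(n, n) * (np.random.rand(n, n) < 0.02)
csr = sparse.csr_matrix(dense)          # store only the nonzero weights
activity = np.random.rand(n)

dense_mb = dense.nbytes / 1e6
sparse_mb = (csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes) / 1e6
print("dense: %.0f MB, sparse: %.1f MB" % (dense_mb, sparse_mb))

# The response computation touches proportionally less memory in the
# sparse case, which matters once the working set no longer fits in cache.
assert np.allclose(dense.dot(activity), csr.dot(activity))
```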
Ok. BTW, the KeyError 'Afferent' I am getting occurs not only on gcal_sparse.ty but also on gcal.ty.
Very odd that they work fine for tiny but not for gcal. As I said, I'll look into it.
These are the changes I've been working on for my bachelor's project: a GPU-based version of Topographica with sparse projections, which makes the simulations several times faster.
A detailed description of the architecture of GPU Topographica, together with design justifications, implementation issues and benchmarking results, can be found at http://homepages.inf.ed.ac.uk/s1137931/thesis.pdf.
This is a branch with a cleaned-up commit history containing all of the changes, split into 4 commits.