
Sparse GPU Topographica implementation and tests (cleaned up commit version) #621

Merged: 4 commits merged into ioam:master on Apr 23, 2015

Conversation

@Tasignotas (Contributor)

These are the changes I've been working on for my bachelor's project: a GPU-based version of Topographica with sparse projections, making the simulations several times faster.

A detailed description of the architecture of GPU Topographica, together with design justifications, implementation issues, and benchmarking results, can be found at http://homepages.inf.ed.ac.uk/s1137931/thesis.pdf.

This branch has a cleaned-up commit history, with all of the changes split into 4 commits.
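To illustrate the core idea, here is a minimal CPU sketch of the data layout using scipy.sparse (illustration only; the sheet sizes and density below are made up, and the actual implementation performs the equivalent product on the GPU via cusparse, per the discussion later in this thread, rather than with scipy):

import numpy as np
import scipy.sparse as sp

# Each row of W holds one target unit's connection-field weights, stored in
# compressed sparse row (CSR) form, so computing the response of a whole
# projection becomes a single sparse matrix-vector product.
n_src = n_dest = 48 * 48                    # two 48x48 sheets (made-up sizes)
W = sp.random(n_dest, n_src, density=0.01, format='csr', random_state=0)
x = np.random.default_rng(0).random(n_src)  # presynaptic sheet activity
response = W @ x                            # one SpMV per projection per step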

jbednar pushed a commit that referenced this pull request Apr 23, 2015
Sparse GPU Topographica implementation and tests (cleaned up commit version)
@jbednar jbednar merged commit d77ace6 into ioam:master Apr 23, 2015
@jbednar (Member) commented Apr 23, 2015

Perfect; thanks so much!

@jlstevens (Member)

Fantastic!

I am looking forward to making use of your work. If I could get an 8X speedup on TCAL I would be absolutely delighted: having a simulation take 6 hours instead of 2 days would really make a huge difference to me!

@Tasignotas (Contributor, Author)

Glad to see it finally merged. It would be great if you could let me know what speedups you manage to achieve.

@mjabri (Contributor) commented Sep 8, 2015

Hi everybody,
I am looking at the sparse/GPU implementation and managed to run some tests, though with the following caveats:
1. scikits.cuda.cusparse seems to have changed, and some functions are missing in the version that installs with pip. So I found another implementation (https://github.com/grlee77/python-cuda-cffi) which seems to work. But the test results are not that encouraging.

2. I have run the gcal_sparse.ty model on the CPU (4 physical cores) and on the GPU (GeForce 980m), and here is what I get:

On the GPU (GeForce 980m), after running 15 by hand:

topo_t000015.00_c1>>> %time topo.sim.run(10000)
CPU times: user 7min 33s, sys: 726 ms, total: 7min 34s
Wall time: 2min 32s

On the CPU (4 cores ...):

topo_t001015.00_c2>>> %time topo.sim.run(10000)
CPU times: user 21min 51s, sys: 778 ms, total: 21min 52s
Wall time: 2min 46s

So even though the CPU cores are utilized at 1/4 of capacity in GPU mode, the performance advantage of GPU over CPU seems non-existent from a wall-time perspective. Has anybody observed this?

3. Looking at the sparse implementation of CF/Projection, it seems the signatures of the functions (response_fn, learning_fn, ...) differ from CFProjection's. This means one cannot easily switch between a non-sparse implementation and a sparse/GPU one. I can hack the signatures to make them callable, but that would be a horrible hack. Any suggestions?
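One illustrative direction, purely as a hypothetical sketch (the class and the translate hook below are invented for illustration; the actual sparse and dense signatures would need to be read from the merged code):

class SignatureAdapter(object):
    """Hypothetical shim: wraps a callable written for one calling
    convention so it can be passed where another is expected."""
    def __init__(self, fn, translate):
        self.fn = fn                # e.g. a dense CFProjection-style response_fn
        self.translate = translate  # maps sparse-style args to dense-style args

    def __call__(self, *args, **kwargs):
        new_args, new_kwargs = self.translate(*args, **kwargs)
        return self.fn(*new_args, **new_kwargs)

A cleaner long-term fix would of course be to unify the signatures in the projection classes themselves, so the same response/learning functions are accepted by both implementations.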

@Tasignotas (Contributor, Author)

Hi @mjabri,

Sorry for taking so long to reply. What cortex_density value are you using for the V1 sheets? As described in the conclusions of my thesis, there is almost no advantage to using the GPU when the model has relatively low cortical density, but it may run several times faster than the CPU simulation when a high cortical density (like 162) is selected.
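For reference, the density can be overridden from the Topographica command line with the standard -p mechanism (a sketch, assuming gcal_sparse.ty lives in examples/ and exposes cortex_density via global_params the way gcal.ty does):

./topographica -p cortex_density=162 examples/gcal_sparse.ty -c "topo.sim.run(10000)"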

Please let me know your findings, and if the GPU implementation is still not faster for you at high cortical density values, we can try to investigate further.

Ignotas

@mjabri (Contributor) commented Sep 13, 2015

Thanks Ignotas, I will run with a cortex_density of 162 and share the results.

@mjabri (Contributor) commented Sep 14, 2015

Indeed, as Ignotas mentioned above, there is a BIG difference when V1 density is larger. I tried at 162, and here are the results (note the GPU is a GeForce 980m and the CPU an i7-4860HQ). I ran twice for each:

CORTEX DENSITY 162.0:
GPU (two separate runs):
topo_t000010.00_c4>>> %time topo.sim.run(10000)
CPU times: user 28min 44s, sys: 3min 36s, total: 32min 21s
Wall time: 26min 57s
topo_t000010.00_c3>>> %time topo.sim.run(10000)
CPU times: user 29min 6s, sys: 3min 15s, total: 32min 21s
Wall time: 26min 58s

CPU (two separate runs):
topo_t000010.00_c4>>> %time topo.sim.run(10000)
CPU times: user 23h 5min 30s, sys: 9.67 s, total: 23h 5min 40s
Wall time: 2h 53min 27s
topo_t000001.00_c1>>> %time topo.sim.run(10000)
CPU times: user 23h 9min 58s, sys: 9.58 s, total: 23h 10min 7s
Wall time: 2h 53min 59s

As I still cannot display projections/CFs, I looked at the activities of the GPU and CPU runs side by side, and they looked identical to my eyes.

I haven't looked at the GPU implementation closely, so I don't really understand where the GPU benefits are coming from, whether they are specific to GCAL, where there are many zeros (to Philipp's point) in the artificially generated input patterns, and whether these benefits would still exist with natural images.

I still have a problem displaying projections (the KeyError problem mentioned above), so it would be good to know whether others have this KeyError or whether it is specific to me! I tried on two systems, one VM (which cannot run the GPU, but still shows the problem in CPU mode) and one physical machine, and both show the same KeyError. If this issue can be resolved, I could then spend more effort on the sparse/GPU API...

Thanks

Marwan

@philippjfr (Member)

The benefits are almost certainly just down to the memory bandwidth of the GPU vs. the CPU. For small densities or areas, the CPU can probably fit a lot of each operation in cache and doesn't have to constantly wait on new chunks of memory to be transferred. In larger models the CPU is probably starved for data and spends a lot of time waiting, while the GPU can fit much larger chunks in its memory and process them. So I don't think it has anything to do with the sparsity in this case.

Actual sparsity is probably just another way the GPU implementation can outperform the CPU one, because unlike the CPU version it uses sparse arrays. I also think the GPU performance gains should be independent of the input patterns (although sparser patterns probably do have some effect).
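For intuition, here is a back-of-envelope sketch of how projection weight storage scales with density (every parameter below is an illustrative assumption, not a value taken from GCAL or this thread). With a fixed connection-field radius in sheet coordinates, weights per projection grow roughly as density to the fourth power, so a model whose weights fit comfortably in CPU cache at density 47 is streaming tens of megabytes per projection from DRAM at density 162:

import math

# Total weight storage for one projection on a unit-area sheet, assuming a
# connection-field radius of 0.1 in sheet coordinates and float32 weights.
# All of these numbers are illustrative assumptions.
def weight_megabytes(density, cf_radius=0.1):
    units = int(density) ** 2                       # units in the sheet
    cf_size = math.pi * (cf_radius * density) ** 2  # weights per connection field
    return units * cf_size * 4 / 1e6                # float32 bytes -> MB

for d in (47, 98, 162):
    print("density %3d: ~%.1f MB of weights" % (d, weight_megabytes(d)))

Under these assumptions this prints roughly 0.6 MB at density 47 but about 86 MB at density 162, which is consistent with the cache/bandwidth explanation above.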

I'll start looking into the GPU implementation myself later this week for my own work, so hopefully I'll have some news on your KeyError issue.

@mjabri (Contributor) commented Sep 14, 2015

OK. BTW, the KeyError: 'Afferent' I am getting occurs not only with gcal_sparse.ty but also with gcal.ty.
Also, tiny.ty seems to show projections OK.

@philippjfr (Member)

Very odd that they work fine for tiny.ty but not for gcal.ty. As I said, I'll look into it.
