Sparse GPU Topographica implementation and tests (cleaned up commit version) #621
Conversation
Perfect; thanks so much!
Fantastic! I am looking forward to making use of your work. If I could get an 8× speed-up on TCAL I would be absolutely delighted; having a simulation take 6 hours instead of 2 days would make a huge difference to me!
Glad to see it finally merged. It would be great if you let me know the speed-ups you manage to achieve.
Hi everybody,

2- I have run the gcal_sparse.ty model on CPU (4 physical cores) and on GPU (GeForce 980M), and here is what I get.

On GPU (GeForce 980M, after running 15 iterations by hand):
topo_t000015.00_c1>>> %time topo.sim.run(10000)

On CPU (4 cores):
topo_t001015.00_c2>>> %time topo.sim.run(10000)

So even though the cores are utilized at 1/4 of their capacity in GPU mode, the performance advantage of GPU over CPU seems non-existent from a wall-clock perspective. Has anybody observed this?

3- Looking at the sparse implementation of CF/Projection, the signatures of the functions (response_fn, learning_fn, ...) are different from CFProjection's. This means one cannot easily switch between a non-sparse implementation and a sparse/GPU one. I could hack the signatures to make them callable, but that would be a horrible hack. Any suggestions?
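(For reference, a minimal sketch of such a wall-clock comparison without IPython's %time magic; only topo.sim.run() is part of the real API here, and the helper name and iteration counts are illustrative.)

```python
import time
import topo

def time_run(iterations=10000):
    # Assumes a model such as gcal_sparse.ty has already been loaded,
    # so topo.sim is populated; topo.sim.run() is the only Topographica
    # call used here, everything else is plain Python.
    start = time.time()
    topo.sim.run(iterations)
    elapsed = time.time() - start
    print("%d iterations in %.1f s (%.4f s/iteration)"
          % (iterations, elapsed, elapsed / iterations))
    return elapsed

# Warm up by hand, then time the long run, as in the session above:
# topo.sim.run(15)
# time_run(10000)
```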
Hi @mjabri, sorry for taking so long to reply. What cortex density value are you using for the V1 sheets? As described in the conclusions of my thesis, there is almost no advantage to using the GPU when the model has relatively low cortical density, but it may run several times faster than the CPU simulation when a high cortical density (like 162) is selected. Please let me know your findings, and if the GPU implementation is still not faster for you at high cortical density values, we can try to investigate further. Ignotas
Thanks Ignotas, I will run with cortex_density of 162 and share the results.
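(A rough back-of-the-envelope note on why the density matters so much: for a sheet of fixed area, the number of V1 units grows with the square of cortex_density, so per-iteration work and memory traffic grow at least as fast. The low-density value below is purely illustrative.)

```python
def unit_ratio(low_density, high_density):
    # For a sheet of fixed area, the unit count scales with the square of
    # the linear density; connection fields also grow, so the actual
    # compute ratio is even larger than this.
    return (high_density / low_density) ** 2

# e.g. an illustrative low test density vs. the suggested 162:
print(unit_ratio(49.0, 162.0))  # ~10.9x more V1 units per sheet
```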
Indeed, as Ignotas mentioned above, there is a BIG difference when the V1 density is larger. I tried 162 and here are the results (note the GPU is a GeForce 980M and the CPU is an i7-4860HQ). I ran twice for each:

CORTEX DENSITY 162.0:
CPU (two separate runs):

As I still cannot display projections/CFs, I looked at the activities of both GPU and CPU side by side, and they looked identical to my eyes. I haven't looked at the GPU implementation closely, so I don't really understand where the GPU benefits are coming from, whether they are specific to GCAL where there are many zeros (to Philipp's point) in the artificially generated input patterns, and whether these benefits would still exist in the case of natural images. I still have a problem displaying projections (the KeyError problem mentioned above), so it would be good to know whether others have this KeyError problem or whether it is specific to me. I tried on two systems, one VM (so it cannot run the GPU, but the problem is still there in CPU mode) and one physical machine, and they both show the same KeyError problem. If this issue can be resolved I could then spend more effort on the Sparse/GPU API...

Thanks
Marwan
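(If it helps, the side-by-side activity check can also be done numerically; a hedged sketch, assuming the gcal "V1" sheet name and the sheet's .activity array, with the file name being hypothetical.)

```python
import numpy as np
import topo

# After a CPU run, save the final V1 activity to disk:
# np.save("v1_activity_cpu.npy", topo.sim["V1"].activity)

# After an otherwise identical GPU run, compare against the saved array:
cpu = np.load("v1_activity_cpu.npy")
gpu = np.asarray(topo.sim["V1"].activity)

print("max abs difference:", np.max(np.abs(cpu - gpu)))
print("identical within tolerance:", np.allclose(cpu, gpu, atol=1e-6))
```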
The benefits are almost certainly just down to the memory bandwidth of the GPU vs the CPU. For small densities or areas the CPU can probably fit a lot of each operation in the CPU cache and doesn't have to constantly wait on new chunks of memory to be transferred. In larger models the CPU is probably starved for data and spends a lot of time waiting, while the GPU can fit much larger chunks in its memory and process them. So I don't think it has anything to do with the sparsity in this case. Actual sparsity is probably just another way the GPU can outperform the CPU implementation, because unlike the CPU implementation the GPU one uses sparse arrays. I also think that the GPU performance gains should be independent of the input patterns (although sparser patterns probably do have some effect). I'll start looking into the GPU implementation myself later this week for my own work, so hopefully I'll have some news on your KeyError issue.
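(To make the sparsity point concrete: a generic NumPy/SciPy sketch, unrelated to the projection code in this PR, showing how much less memory a sparse weight matrix has to stream through for the same dot product.)

```python
import numpy as np
from scipy import sparse

n = 3000  # illustrative sheet size (units), not taken from the model
np.random.seed(0)

# Dense n x n weight matrix in which only ~2% of connections are nonzero.
dense = np.random.rand(n, n) * (np.random.rand(n, n) < 0.02)
csr = sparse.csr_matrix(dense)          # store only the nonzero weights
activity = np.random.rand(n)

dense_mb = dense.nbytes / 1e6
sparse_mb = (csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes) / 1e6
print("dense: %.0f MB, sparse: %.1f MB" % (dense_mb, sparse_mb))

# The response computation touches proportionally less memory in the
# sparse case, which matters once the working set no longer fits in cache.
assert np.allclose(dense.dot(activity), csr.dot(activity))
```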
Ok. BTW, the KeyError 'Afferent' I am getting occurs not only on gcal_sparse.ty but also on gcal.ty.
Very odd that they work fine for tiny but not for gcal. As I said, I'll look into it.
These are the changes I've been working on for my bachelor's project: a GPU-based version of Topographica with sparse projections, which makes the simulations several times faster.
A detailed description of the architecture of GPU Topographica, together with design justifications, implementation issues and benchmarking results, can be found at http://homepages.inf.ed.ac.uk/s1137931/thesis.pdf.
This is a branch with a cleaned-up commit history containing all of the changes, split into 4 commits.