Parallel execution, Multithreading #255
Comments
Thanks for starting a tracker issue for this subject! A computer has a fixed number of threads, so it does not make sense to parallelize everything at every level. We should focus on 1 or 2 high-impact areas.
I agree, and NetworkAPI Regions seem like a major candidate, mostly because there is very little interaction between them apart from the data buffers.
True, I was considering this too. I think this constraint (not parallelizing too much) will be a disadvantage of the low-level #214 approach. For the latter two, I'd suggest something like:
So if you have a large network, you'll get the best utilization by giving all threads to the simultaneously running regions. But in a case like the MNIST example (my motivation), we have one algo/region (SP), so one could try to give the threads to that instead. Giving it a priority would make the process seamless:
Definitely, it would probably also be the first thing we achieve.
All of the algorithms are basically compute bound. This means that priorities will not get things done faster, but spreading the work over multiple threads would help. My machine has 6 cores (12 logical threads) and they could all be put to work, but even there the best I could expect is around 10 times as fast. Ultimately the biggest advantage would be to reduce the amount of work that needs to be done: replace loops with other ways of organizing data so loops are not needed, and reduce the number of times that data must be copied.
Yes, but even a speedup linear in the number of threads would be a nice win.
Yep, that goes in parallel (sic) with this, #3. It would be nice if we could vectorize the for loops; that was my intention with SDR_t, but I'm not sure SDR is helping that (at least it prepares us for sparse data structures).
Hmmm, 'vectorizing' the for loops does not get rid of the loops; they are just harder to find. What I mean is using things like map and set objects in places where it makes sense, to avoid having to iterate: go from O(n) to O(log n) kinds of operations. But I don't know, it might require a whole new way of looking at the algorithms to make that work.
Aim: run algorithms in parallel as much as possible, with maximal efficiency.
This will be an umbrella issue based on the following idea:
from #214 (comment)
EDIT:
TODO:
-DNUMTHREADS=8
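Assuming NUMTHREADS is meant as a compile-time definition passed through the build system (the exact mechanism is not stated in the issue; the invocation below is a hypothetical CMake example):

```shell
# Hypothetical: pass the thread count as a compile definition.
cmake -B build -DCMAKE_CXX_FLAGS="-DNUMTHREADS=8" .
cmake --build build
```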