Multithreading for gradient descent not working #43
Comments
To be more precise: we don't really know that. We are looking at the per-core CPU usage. For comparison, during the Annoy loop, all cores are working at 100%.
Another update: if we set the perplexity a lot higher (e.g. perplexity=500), then all cores are working at 30-40%, so it becomes clear that the process is actually multithreaded... PS. In all these cases the sample size is moderately small, n=11k.
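For reference, a minimal sketch of that perplexity experiment, assuming the FIt-SNE Python wrapper function `fast_tsne` and its `perplexity`/`nthreads` argument names (these names are assumptions, not quoted from the thread):

```python
# Hypothetical repro sketch: same moderate sample size, much higher perplexity,
# so a larger share of each iteration is spent in the multithreaded parts.
import numpy as np
from fast_tsne import fast_tsne  # FIt-SNE Python wrapper (assumed import path)

X = np.random.randn(11000, 50)                 # stand-in for the real 11k-point data
Y = fast_tsne(X, perplexity=500, nthreads=24)  # watch per-core load while this runs
```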
@dkobak @msayhan With n=11k, the bottleneck in the gradient descent is the FFT, which is not multithreaded (multithreading the FFT does not give much speed-up for the typical number of interpolation points and boxes we use, and it requires users to compile FFTW with extra flags, complicating the install). So it's going to be hard to really see the speed-up on the non-FFT parts, or even to catch the split second when all cores are working before all the threads finish and we are back in the FFT again. Increasing the perplexity was a great way to check it, though. But perhaps even simpler is to just try it with a large N. In that case the bottleneck becomes the attractive forces, and those should parallelize nicely. You still might not catch all cores at 100% (because each iteration is so fast and they have to go to 0% between iterations), but you should definitely see some multicore action happening. I typically just use …
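A corresponding sketch of the large-N check suggested above (same assumed wrapper; the N and dimensionality are just illustrative values):

```python
# Hypothetical check: with a large N the attractive forces, not the FFT,
# dominate the gradient descent, so the parallel section is easier to spot.
import numpy as np
from fast_tsne import fast_tsne  # assumed import, as above

X = np.random.randn(1_000_000, 50)            # large synthetic dataset
Y = fast_tsne(X, perplexity=30, nthreads=24)  # expect visible multicore activity
```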
This makes sense. In fact, I think @msayhan was observing CPU usage of less than 100%. Thanks for the explanations! I close this now.
A colleague of mine @msayhan downloaded the current master and is running t-SNE on our server under Python (24 cores; inside a docker container; whatever). It looks like the Annoy search is multithreaded (all 24 cores are working at 100% when t-SNE is started), but once Annoy finishes, only one core is active during the gradient descent.
Given that Annoy calls are multithreaded "manually" and the gradient descent is multithreaded using PARALLEL_FOR, I suspect that the latter does not work for him for some reason.
How can we troubleshoot that?
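One generic way to troubleshoot this from the Python side (a hypothetical sketch, not from the thread; it assumes `psutil` and the `fast_tsne` wrapper with an `nthreads` argument) is to sample per-core CPU usage in a background thread while the embedding runs and check how many cores are busy after the Annoy phase:

```python
# Hypothetical troubleshooting sketch: record per-core CPU load while
# fast_tsne runs, then see whether more than one core was ever busy.
import threading

import numpy as np
import psutil
from fast_tsne import fast_tsne  # FIt-SNE Python wrapper (assumed import path)

samples = []              # list of per-core utilization snapshots
stop = threading.Event()

def sample_cpu():
    # psutil.cpu_percent blocks for `interval` seconds and returns one value per core.
    while not stop.is_set():
        samples.append(psutil.cpu_percent(interval=0.5, percpu=True))

monitor = threading.Thread(target=sample_cpu, daemon=True)
monitor.start()

X = np.random.randn(11000, 50)   # stand-in for the real data
Y = fast_tsne(X, nthreads=24)    # argument name assumed

stop.set()
monitor.join()

# How many cores exceeded 50% load in the busiest sample?
busiest = max(sum(core > 50 for core in snapshot) for snapshot in samples)
print("max cores above 50% in any sample:", busiest)
```

If only one core ever shows up as busy once the Annoy phase has finished, the gradient-descent parallelization is most likely running single-threaded on that machine.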