Multithreading for gradient descent not working #43

Closed
dkobak opened this issue Sep 25, 2018 · 4 comments

dkobak (Collaborator) commented Sep 25, 2018

A colleague of mine, @msayhan, downloaded the current master and is running t-SNE from Python on our server (24 cores, inside a Docker container). It looks like the Annoy search is multithreaded (all 24 cores run at 100% when t-SNE starts), but once Annoy finishes, only one core is active during the gradient descent.

Given that Annoy calls are multithreaded "manually" and the gradient descent is multithreaded using PARALLEL_FOR, I suspect that the latter does not work for him for some reason.

How can we troubleshoot that?
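
One quick external check (a rough sketch, not FIt-SNE-specific; it assumes psutil is installed and that the PID of the running process is read off htop) would be to watch the process-level CPU usage during the gradient descent. Anything well above 100% means more than one core is busy:

```python
# Rough sketch: monitor the t-SNE process from outside while the gradient
# descent runs. Assumes psutil is installed; `pid` below is a hypothetical
# placeholder for the PID shown in htop.
import psutil

pid = 12345                      # hypothetical: PID of the running process
p = psutil.Process(pid)
p.cpu_percent(interval=None)     # prime the counter
for _ in range(10):
    usage = p.cpu_percent(interval=1.0)   # summed over cores, so it can exceed 100%
    print(f"threads={p.num_threads():3d}  cpu={usage:6.1f}%")
```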

dkobak (Collaborator, Author) commented Sep 25, 2018

To be more precise about "only one core is active during the gradient descent": we don't really know that. We are looking at htop, and it shows almost all cores at around 15% usage. It almost looks like they are not being used at all, but who knows, maybe they are.

For comparison, during the Annoy loop all cores are at 100%.

dkobak (Collaborator, Author) commented Sep 25, 2018

Another update: if we set the perplexity a lot higher (e.g. perplexity=500), then all cores run at 30-40%, so the process clearly is multithreaded after all.

PS. In all these cases the sample size is moderately small (n=11k).
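
For reference, this is essentially the check we ran (a minimal sketch with random data as a stand-in for ours; it assumes the fast_tsne.py Python wrapper shipped with the repo is importable):

```python
# Minimal sketch of the perplexity check: same moderate sample size (n=11k),
# but a much larger perplexity, which makes the multithreaded parts heavier.
# Assumes the fast_tsne.py wrapper from this repo is on the path.
import numpy as np
from fast_tsne import fast_tsne

X = np.random.randn(11_000, 50)      # stand-in for our n=11k dataset
Y = fast_tsne(X, perplexity=500)     # with this, htop shows all cores at ~30-40%
```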

linqiaozhi (Member) commented

@dkobak @msayhan With n=11k, the bottleneck in the gradient descent is the FFT, which is not multithreaded (multithreading the FFT does not give much speed-up for the typical number of interpolation points and boxes we use, and it would require users to compile FFTW with extra flags, complicating the install). So it's going to be hard to really see the speed-up on the non-FFT parts, or even to catch the split second when all cores are working before the threads finish and we are back in the FFT again.

Increasing the perplexity was a great way to check it, though. But perhaps even simpler is to try it with a large N. In that case the bottleneck becomes the attractive forces, and those should parallelize nicely. You still might not catch all cores at 100% (each iteration is so fast that they drop back to 0% between iterations), but you should definitely see some multicore activity. I typically just use top: if the CPU usage of the fast_tsne process exceeds 100%, I know it is multithreading.
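
A minimal version of that large-N check might look like this (a sketch, not a benchmark; it assumes the fast_tsne.py Python wrapper and uses synthetic data, and argument names can differ between versions):

```python
# Sketch of the large-N check: with many points the attractive forces dominate
# the per-iteration cost, so `top` should show the process well above 100% CPU.
# Assumes the fast_tsne.py wrapper from this repo; the data is synthetic.
import numpy as np
from fast_tsne import fast_tsne

X = np.random.randn(200_000, 50)     # large synthetic dataset
Y = fast_tsne(X, perplexity=30)      # run `top` in another terminal meanwhile
```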

dkobak (Collaborator, Author) commented Sep 26, 2018

This makes sense. In fact, I think @msayhan was seeing CPU usage below 100% in top; that's exactly what confused us and made us post this issue. But as soon as he increased N (or the perplexity), the usage went above 100%.

Thanks for the explanations! I'll close this now.

dkobak closed this as completed Sep 26, 2018