Multithreading for gradient descent not working #43

Closed
dkobak opened this issue Sep 25, 2018 · 4 comments

dkobak (Collaborator) commented Sep 25, 2018

A colleague of mine, @msayhan, downloaded the current master and is running t-SNE from Python on our server (24 cores, inside a Docker container). It looks like the Annoy search is multithreaded (all 24 cores run at 100% when t-SNE starts), but once Annoy finishes, only one core is active during the gradient descent.

Given that Annoy calls are multithreaded "manually" and the gradient descent is multithreaded using PARALLEL_FOR, I suspect that the latter does not work for him for some reason.

How can we troubleshoot that?
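
One quick external check (a rough sketch, not FIt-SNE-specific; it assumes psutil is installed and that the PID of the running process is read off htop) would be to watch the process-level CPU usage during the gradient descent. Anything well above 100% means more than one core is busy:

```python
# Rough sketch: monitor the t-SNE process from outside while the gradient
# descent runs. Assumes psutil is installed; `pid` below is a hypothetical
# placeholder for the PID shown in htop.
import psutil

pid = 12345                      # hypothetical: PID of the running process
p = psutil.Process(pid)
p.cpu_percent(interval=None)     # prime the counter
for _ in range(10):
    usage = p.cpu_percent(interval=1.0)   # summed over cores, so it can exceed 100%
    print(f"threads={p.num_threads():3d}  cpu={usage:6.1f}%")
```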

dkobak (Collaborator, Author) commented Sep 25, 2018

To be more precise about "only one core is active during the gradient descent": we don't really know that. We are looking at htop, and it shows almost all cores at around 15% usage. It almost looks like they are not being used at all, but who knows, maybe they are.

For comparison, during the Annoy loop all cores are at 100%.

dkobak (Collaborator, Author) commented Sep 25, 2018

Another update: if we set the perplexity a lot higher (e.g. perplexity=500), then all cores run at 30-40%, so the process clearly is multithreaded after all.

PS. In all these cases the sample size is moderately small (n=11k).
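
For reference, this is essentially the check we ran (a minimal sketch with random data as a stand-in for ours; it assumes the fast_tsne.py Python wrapper shipped with the repo is importable):

```python
# Minimal sketch of the perplexity check: same moderate sample size (n=11k),
# but a much larger perplexity, which makes the multithreaded parts heavier.
# Assumes the fast_tsne.py wrapper from this repo is on the path.
import numpy as np
from fast_tsne import fast_tsne

X = np.random.randn(11_000, 50)      # stand-in for our n=11k dataset
Y = fast_tsne(X, perplexity=500)     # with this, htop shows all cores at ~30-40%
```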

linqiaozhi (Member) commented

@dkobak @msayhan With n=11k, the bottleneck in the gradient descent is the FFT, which is not multithreaded (multithreading the FFT does not give much speed-up for the typical number of interpolation points and boxes we use, and it would require users to compile FFTW with extra flags, complicating the install). So it's going to be hard to really see the speed-up on the non-FFT parts, or even to catch the split second when all cores are working before the threads finish and we are back in the FFT again.

Increasing the perplexity was a great way to check it, though. But perhaps even simpler is to try it with a large N. In that case the bottleneck becomes the attractive forces, and those should parallelize nicely. You still might not catch all cores at 100% (each iteration is so fast that they drop back to 0% between iterations), but you should definitely see some multicore activity. I typically just use top: if the CPU usage of the fast_tsne process exceeds 100%, I know it is multithreading.
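
A minimal version of that large-N check might look like this (a sketch, not a benchmark; it assumes the fast_tsne.py Python wrapper and uses synthetic data, and argument names can differ between versions):

```python
# Sketch of the large-N check: with many points the attractive forces dominate
# the per-iteration cost, so `top` should show the process well above 100% CPU.
# Assumes the fast_tsne.py wrapper from this repo; the data is synthetic.
import numpy as np
from fast_tsne import fast_tsne

X = np.random.randn(200_000, 50)     # large synthetic dataset
Y = fast_tsne(X, perplexity=30)      # run `top` in another terminal meanwhile
```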

dkobak (Collaborator, Author) commented Sep 26, 2018

This makes sense. In fact, I think @msayhan was seeing CPU usage below 100% in top; that's exactly what confused us and made us post this issue. But as soon as he increased N (or the perplexity), the usage went above 100%.

Thanks for the explanations! I'll close this now.

dkobak closed this as completed Sep 26, 2018