Multi-threading aware multi-threading #16990
Comments
Does this reproduce with the much more elementary …?
Yes.
Then it sounds like this is completely out of our control, as our implementation for your test is likely just:
…
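The snippet referenced above was lost in extraction, but the point being made is that `np.corrcoef` boils down to a covariance/matrix-product computation that NumPy hands off to its BLAS backend, which is where the threads come from. As a rough stdlib-only sketch of the underlying math (not NumPy's actual implementation, and only for two 1-D vectors), the Pearson coefficient is:

```python
from math import sqrt

def corrcoef(x, y):
    # Pearson correlation: cov(x, y) / (std(x) * std(y)).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)
```

For large matrices the analogous covariance step is a single big matrix product, which is exactly the kind of call a multi-threaded BLAS parallelizes on its own.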
I guess it depends on your own threading implementation. The answer for NumPy at the moment is: use the ….
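The specific recommendation was stripped from this comment, but one widely used way to cap BLAS parallelism (an assumption here, not necessarily what was suggested) is to set the backend thread-count environment variables before NumPy is first imported:

```python
import os

# These must be set BEFORE the first `import numpy`: BLAS runtimes
# size their thread pools when the shared library is loaded.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ[var] = "1"

# import numpy as np  # only import NumPy after the variables are set
```

Alternatively, the third-party `threadpoolctl` package can inspect and change these limits at runtime, after NumPy is already loaded.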
@rgommers: Thanks for the pointer to …. Two notes though:
…
It's not a ….
Fair point - yes, this is hard. BTW, is there any known way to get "composable multi-threading" with NumPy? For example, even using Intel's Python distribution (which presumably uses MKL and TBB), the problem persists even when running with …. FWIW, ….
I don't know; NumPy uses neither, it's all MKL at that point. You should look at what libraries are loaded; perhaps you're using a wrong API for threading, or perhaps it's an Intel packaging problem.
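To see which threading runtimes a process has actually loaded, one option on Linux (a sketch; library file names vary by distribution) is to scan the process's own memory map:

```python
import re
import sys

def loaded_threading_libs():
    """Return which BLAS/threading runtimes appear in this process's
    memory map (Linux-only; returns [] on other platforms)."""
    if not sys.platform.startswith("linux"):
        return []
    names = ("mkl", "openblas", "blis", "tbb", "gomp", "iomp")
    hits = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            for name in names:
                # Match e.g. libmkl_rt.so, libgomp.so.1, libtbb.so
                if re.search(r"lib\w*" + name, line):
                    hits.add(name)
    return sorted(hits)

print(loaded_threading_libs())
```

Running this after importing NumPy shows whether MKL, OpenBLAS, OpenMP (`gomp`/`iomp`), or TBB is actually in the process.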
I have contacted the relevant Intel maintainers. It seems that currently the TBB implementation has some issues with the …. The good news is that, if you use the Intel Python distribution and run …. I am not aware of anything equivalent using the vanilla Python distribution, though. It seems the …. Bottom line, I'm keeping my code using the ….
Thanks, that's very useful info.
If I invoke `np.corrcoef` on a large enough matrix, it uses multiple threads to speed up the computation. In fact, it uses all the CPUs (`nproc`). On my server, this is a lot (56 threads).

However, if I am using multiple threads myself and invoke `np.corrcoef` in each one (again, on a large enough matrix), then each invocation uses its own set of `nproc` threads. The result is that the OS can see up to `nproc`^2 threads (in my case, over 3000 threads!), with all the memory and scheduling overheads this entails (most likely the program just gets killed due to out-of-memory issues).

One would expect that, when using internal NumPy threads, NumPy should itself be multi-threading aware, and only use up to `nproc` total such additional internal threads, regardless of the number of application threads that invoke NumPy.

One could say this isn't NumPy's issue, but an issue of the underlying parallel framework (OpenMP, TBB). That said, TBB is supposed to solve this problem in theory, but I still see it when running Intel's Python distribution with `python -m tbb`. Either "using TBB doesn't mean what you think it means" or there's some deeper issue here.

Reproducing the problem: run `np.corrcoef` on a large matrix from `nproc` different Python threads, and watch the system's load average soar up to `nproc`^2 (most likely the program will be killed before it gets to that point).
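The oversubscription described above can be demonstrated without NumPy at all. In the stdlib-only sketch below, each of `nproc` application threads naively spins up its own `nproc`-worker pool (standing in for a BLAS call that spawns `nproc` workers per invocation), so the number of live threads climbs well past `nproc`:

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

NPROC = os.cpu_count() or 4

peak = 0
peak_lock = threading.Lock()

def record_peak():
    # Track the highest number of simultaneously live threads seen.
    global peak
    with peak_lock:
        peak = max(peak, threading.active_count())

def inner_work(_):
    record_peak()

def outer_task(_):
    # Each application thread creates its own full-size worker pool,
    # mimicking a non-composable BLAS runtime.
    with ThreadPoolExecutor(max_workers=NPROC) as inner:
        list(inner.map(inner_work, range(NPROC * 4)))

with ThreadPoolExecutor(max_workers=NPROC) as outer:
    list(outer.map(outer_task, range(NPROC)))

print(f"nproc={NPROC}, peak live threads={peak}")
```

A composable runtime would share one `nproc`-sized pool across all the application threads instead of letting each caller create its own.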