Excessive cpu usage for np.unique #12374

Closed
jimmyyhwu opened this issue Nov 13, 2018 · 20 comments

@jimmyyhwu

Calling np.unique seems to result in >100% CPU usage (no multiprocessing is involved).

Reproducing code example:

import numpy as np

for _ in range(1000):
    arr = np.zeros((1024, 6144), dtype=np.uint16)
    np.unique(arr)

Error message:

htop shows ~3600% CPU usage.

Numpy/Python version information:

1.15.3 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16)
[GCC 7.3.0]

@charris
Member

charris commented Nov 13, 2018

I don't see that here on master or 1.14.6, Python 3.7.1. What is your setup? Maybe Anaconda?

@jimmyyhwu
Author

I am using conda 4.5.11, python 3.6.6, numpy 1.15.3

@jimmyyhwu
Author

Steps to create the conda environment:

conda create -n numpy-env python=3.6
conda activate numpy-env
conda install numpy

@charris
Member

charris commented Nov 13, 2018

I don't see anything in the function that should cause problems unless conda has done something to the sorting. What happens if you pip install numpy?

@jimmyyhwu
Author

It looks like this only happens when installing via conda. The following does not cause problems:

conda create -n numpy-env python=3.6
conda activate numpy-env
pip install numpy

@seberg
Member

seberg commented Nov 13, 2018

@jimmyyhwu out of curiosity, can you check whether the issue goes away with OMP_NUM_THREADS=1 set as an environment variable (put it on the same line as the python invocation)?
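
For anyone following along, that suggestion amounts to a one-off run like the following (a sketch; repro.py is a hypothetical file name for the reproducer from the issue description):

OMP_NUM_THREADS=1 python repro.py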

@jimmyyhwu
Author

Yes, for the numpy installed via conda, setting OMP_NUM_THREADS=1 indeed makes the issue go away.

@seberg
Member

seberg commented Nov 13, 2018

So I guess it is some MKL hook, but I am not sure which parts of numpy Intel/Anaconda monkeypatch. I know there is monkeypatching for FFT, and of course linear algebra is always external (and tends to be parallel), but neither of those is used here. @oleksandr-pavlyk, do you happen to know where we can quickly see which parts of numpy might be monkeypatched here?

Also, in case gh-11826 moves forward, this might be good to keep an eye on.

@oleksandr-pavlyk
Contributor

Probably np.copyto or np.copy.

Try setting KMP_BLOCKTIME=0 before running this example. It will cause threads to terminate as soon as they are done with their work.

Doing so reduces CPU usage to about 110% for me.
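
The same kind of one-off test works for this variable too (a sketch; repro.py is again a hypothetical file containing the reproducer):

KMP_BLOCKTIME=0 python repro.py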

@seberg
Member

seberg commented Nov 13, 2018

@oleksandr-pavlyk good to know. What is the issue tracker for the project? I know about mkl_fft, but I am a bit at a loss about the apparently many monkey patches that exist by now. I am also wondering whether some of these changes would be worth pushing upstream into numpy proper, although I guess that is mostly not possible and not a high priority for you (if nothing else, having the tests in numpy proper can't hurt).

@oleksandr-pavlyk
Contributor

There is no dedicated issue tracker at the moment, and it's a good idea to create one.

Changes to np.copyto are not monkey patches. When building NumPy we apply patches to the tagged sources. These patches can be found in the info/recipe folder inside the conda tarball downloadable from Anaconda Cloud, e.g. https://anaconda.org/intel/numpy-base/1.15.4/download/linux-64/numpy-base-1.15.4-py36_2.tar.bz2.

If you have Intel's numpy installed in a conda environment, they can be accessed in /path/to/miniconda/pkgs/numpy-base-${NUMPY_VERSION}-py36_0/info/recipe/parent/.
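
For anyone who wants to inspect those patches without installing the package, a rough sketch of fetching and unpacking the tarball linked above (assuming curl and tar are available; the output directory name is arbitrary):

curl -L -o numpy-base-1.15.4-py36_2.tar.bz2 https://anaconda.org/intel/numpy-base/1.15.4/download/linux-64/numpy-base-1.15.4-py36_2.tar.bz2
mkdir numpy-base-pkg
tar -xjf numpy-base-1.15.4-py36_2.tar.bz2 -C numpy-base-pkg
ls numpy-base-pkg/info/recipe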

@mattip
Member

mattip commented Nov 14, 2018

Is there a public repo for those patches?

It seems some of them are not MKL-specific and could be merged into numpy, but I could not find licensing info in the diffs.

@mattip
Member

mattip commented Feb 19, 2019

Is the CPU usage still an issue?

@jimmyyhwu
Author

Yes, I just tried the above with numpy 1.15.4 and the problem persists.

@DeltaProg

DeltaProg commented Mar 7, 2019

A similar problem occurs with the following setup: Win10 x64, conda=4.6.7, python=3.6.
Example:

import numpy as np

arrayDim = (1080, 1920, 3)
npDest = np.zeros(arrayDim, dtype=np.uint8)
for i in range(2000):
    npSrc = np.random.randint(256, size=arrayDim, dtype=np.uint8)  # np.random doesn't influence CPU usage; the same was observed with data loaded from SSD
    npDest[:, :, :] = npSrc  # this assignment is what results in the higher CPU usage

I tried different numpy versions via
conda install numpy=x.xx.x
The problem appears when mkl is upgraded and disappears when it is downgraded. So numpy 1.15.4 works with both mkl packages, but CPU usage is higher with the newer mkl:

mkl                                            2018.0.3-1 --> 2019.1-144
mkl_fft                              1.0.6-py36hdbbee80_0 --> 1.0.10-py36h14836fe_0
mkl_random                           1.0.1-py36h77b88f5_1 --> 1.0.2-py36h343c172_0
numpy                               1.15.4-py36ha559c80_0 --> 1.16.0-py36h19fb1c0_1
numpy-base                          1.15.4-py36h8128ebf_0 --> 1.16.0-py36hc3f5095_1

On two Intel machines (i7-6700K), CPU usage increases from 15% to 52%, while the copy time drops from 600 µs to 530 µs. So 3.5 times more CPU yields only a 1.13× speedup.
Setting KMP_BLOCKTIME=0 solves the problem.

@njzjz

njzjz commented Aug 5, 2019

I faced this issue when using tensorflow-gpu with Anaconda's numpy. Setting KMP_BLOCKTIME=0 solves it.

@njzjz

njzjz commented Oct 14, 2019

@jjhelmus Could you please take a look at this issue? Thank you.

@rgommers
Member

Rather than tagging an individual maintainer of Anaconda or Intel, it may be useful to open an issue on the correct tracker: https://github.com/ContinuumIO/anaconda-issues/issues. @njzjz it would be very helpful if you could do this and let us know so we can close this issue.

@rgommers
Member

For the record:

  • This kind of thing has been an issue for a couple of years.
  • Patches to NumPy like these are undesirable. Using MKL for linalg is perfectly okay of course. fft and random are less desirable, but we've so far not made progress on a backend system that allows a cleaner integration. Changing things like np.unique and np.copyto is not okay; there doesn't seem to be a good justification, and we'd like to see that not being done at all (and certainly not in Anaconda defaults).
  • We (the NumPy maintainers) haven't been very good at having a structural discussion with Anaconda and Intel about it.
  • I've started such a conversation a week ago (with Anaconda first).

@mattip
Member

mattip commented Dec 2, 2020

Closing the issue here; interested parties should follow the discussion on the open Anaconda issue.

mattip closed this as completed Dec 2, 2020