Excessive cpu usage for np.unique #12374

Closed
jimmyyhwu opened this issue Nov 13, 2018 · 20 comments

@jimmyyhwu

Calling np.unique seems to result in >100% CPU usage (no multiprocessing is involved).

Reproducing code example:

import numpy as np

for _ in range(1000):
    arr = np.zeros((1024, 6144), dtype=np.uint16)
    np.unique(arr)

Error message:

htop shows ~3600% CPU usage.

Numpy/Python version information:

1.15.3 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16)
[GCC 7.3.0]

@charris
Member

charris commented Nov 13, 2018

I don't see that here on master or 1.14.6, Python 3.7.1. What is your setup? Maybe Anaconda?

@jimmyyhwu
Author

I am using conda 4.5.11, python 3.6.6, numpy 1.15.3

@jimmyyhwu
Author

Steps to create the conda environment:

conda create -n numpy-env python=3.6
conda activate numpy-env
conda install numpy

@charris
Member

charris commented Nov 13, 2018

I don't see anything in the function that should cause problems unless conda has done something to the sorting. What happens if you pip install numpy?

@jimmyyhwu
Author

It looks like this only happens when installing via conda. The following does not cause problems:

conda create -n numpy-env python=3.6
conda activate numpy-env
pip install numpy

@seberg
Member

seberg commented Nov 13, 2018

@jimmyyhwu out of curiosity, can you check whether the issue goes away with OMP_NUM_THREADS=1 set as an environment variable (put it on the same line as the python invocation)?
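
For anyone following along, that suggestion amounts to a one-off run like the following (a sketch; repro.py is a hypothetical file name for the reproducer from the issue description):

OMP_NUM_THREADS=1 python repro.py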

@jimmyyhwu
Author

Yes, for the numpy installed via conda, setting OMP_NUM_THREADS=1 indeed makes the issue go away.

@seberg
Member

seberg commented Nov 13, 2018

So I guess it is some MKL hook, but I am not sure which parts of numpy Intel/Anaconda monkeypatch. I know there is monkeypatching for FFT, and of course linear algebra is always external (and tends to be parallel), but neither of those is used here. @oleksandr-pavlyk, do you happen to know where we can quickly see which parts of numpy might be monkeypatched here?

Also, in case gh-11826 moves forward, this might be good to keep an eye on.

@oleksandr-pavlyk
Contributor

Probably np.copyto or np.copy.

Try setting KMP_BLOCKTIME=0 before running this example. It will cause threads to terminate as soon as they are done with their work.

Doing so reduces CPU usage to about 110% for me.
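
The same kind of one-off test works for this variable too (a sketch; repro.py is again a hypothetical file containing the reproducer):

KMP_BLOCKTIME=0 python repro.py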

@seberg
Member

seberg commented Nov 13, 2018

@oleksandr-pavlyk good to know. What is the issue tracker for the project? I know about mkl_fft, but I am a bit at a loss about the apparently many monkey patches that exist by now. I am also wondering whether some of these changes would be worth pushing upstream into numpy proper, although I guess that is mostly not possible and not a high priority for you (if nothing else, having the tests in numpy proper can't hurt).

@oleksandr-pavlyk
Contributor

There is no dedicated issue tracker at the moment, and it's a good idea to create one.

Changes to np.copyto are not monkey patches. When building NumPy we apply patches to the tagged sources. These patches can be found in the info/recipe folder inside the conda tarball downloadable from Anaconda Cloud, e.g. https://anaconda.org/intel/numpy-base/1.15.4/download/linux-64/numpy-base-1.15.4-py36_2.tar.bz2.

If you have Intel's numpy installed in a conda environment, they can be accessed in /path/to/miniconda/pkgs/numpy-base-${NUMPY_VERSION}-py36_0/info/recipe/parent/.
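
For anyone who wants to inspect those patches without installing the package, a rough sketch of fetching and unpacking the tarball linked above (assuming curl and tar are available; the output directory name is arbitrary):

curl -L -o numpy-base-1.15.4-py36_2.tar.bz2 https://anaconda.org/intel/numpy-base/1.15.4/download/linux-64/numpy-base-1.15.4-py36_2.tar.bz2
mkdir numpy-base-pkg
tar -xjf numpy-base-1.15.4-py36_2.tar.bz2 -C numpy-base-pkg
ls numpy-base-pkg/info/recipe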

@mattip
Member

mattip commented Nov 14, 2018

Is there a public repo for those patches?

It seems some of them are not MKL-specific and could be merged into numpy, but I could not find licensing info in the diffs.

@mattip
Member

mattip commented Feb 19, 2019

Is the CPU usage still an issue?

@jimmyyhwu
Author

Yes, I just tried the above with numpy 1.15.4 and the problem persists.

@DeltaProg

DeltaProg commented Mar 7, 2019

A similar problem occurs with the following setup: Win10 x64, conda=4.6.7, python=3.6.
Example:

import numpy as np

arrayDim = (1080, 1920, 3)
npDest = np.zeros(arrayDim, dtype=np.uint8)
for i in range(2000):
    npSrc = np.random.randint(256, size=arrayDim, dtype=np.uint8)  # np.random doesn't influence CPU usage; the same was observed with data loaded from SSD
    npDest[:, :, :] = npSrc  # this assignment is what results in the higher CPU usage

I tried different numpy versions via
conda install numpy=x.xx.x
The problem appears when mkl is upgraded and disappears when it is downgraded. So numpy 1.15.4 works with both mkl packages, but CPU usage is higher with the newer mkl:

mkl                                            2018.0.3-1 --> 2019.1-144
mkl_fft                              1.0.6-py36hdbbee80_0 --> 1.0.10-py36h14836fe_0
mkl_random                           1.0.1-py36h77b88f5_1 --> 1.0.2-py36h343c172_0
numpy                               1.15.4-py36ha559c80_0 --> 1.16.0-py36h19fb1c0_1
numpy-base                          1.15.4-py36h8128ebf_0 --> 1.16.0-py36hc3f5095_1

On two Intel machines (i7-6700K), CPU usage increases from 15% to 52%, while the copy time drops from 600 µs to 530 µs. So 3.5 times more CPU yields only a 1.13× speedup.
Setting KMP_BLOCKTIME=0 solves the problem.

@njzjz

njzjz commented Aug 5, 2019

I faced this issue when using tensorflow-gpu with Anaconda's numpy. Setting KMP_BLOCKTIME=0 solves it.

@njzjz

njzjz commented Oct 14, 2019

@jjhelmus Could you please take a look at this issue? Thank you.

@rgommers
Member

Rather than tagging an individual maintainer of Anaconda or Intel, it may be useful to open an issue on the correct tracker: https://github.com/ContinuumIO/anaconda-issues/issues. @njzjz it would be very helpful if you could do this and let us know so we can close this issue.

@rgommers
Member

For the record:

  • This kind of thing has been an issue for a couple of years.
  • Patches to NumPy like these are undesirable. Using MKL for linalg is perfectly okay of course. fft and random are less desirable, but we've so far not made progress on a backend system that allows a cleaner integration. Changing things like np.unique and np.copyto is not okay; there doesn't seem to be a good justification, and we'd like to see that not being done at all (and certainly not in Anaconda defaults).
  • We (the NumPy maintainers) haven't been very good at having a structural discussion with Anaconda and Intel about it.
  • I've started such a conversation a week ago (with Anaconda first).

@mattip
Member

mattip commented Dec 2, 2020

Closing the issue here; interested parties should follow the discussion on the open Anaconda issue.

mattip closed this as completed Dec 2, 2020