
NearestNeighbors radius_neighbors memory leaking #11051

Closed
EemeliSaari opened this issue May 2, 2018 · 21 comments · Fixed by #11056


@EemeliSaari

Description

NearestNeighbors uses a large chunk of memory at run time without releasing it, even after calling del or assigning an empty array to the variable. The memory is only released once the Python process terminates.

I noticed the bug while trying to fit a DBSCAN model, and by looking deeper into the issue I was able to reproduce the same memory leak with random data using the NearestNeighbors.radius_neighbors() method.

On a machine with 12 GB of RAM, about 10% of the memory was leaked.

Steps/Code to Reproduce

from sklearn.neighbors import NearestNeighbors
import numpy as np

X = np.random.rand(20000, 2) # Random data

neighbors_model = NearestNeighbors()
neighbors_model.fit(X)

neighborhoods = neighbors_model.radius_neighbors(X, 0.5, return_distance=False)

del neighborhoods
del neighbors_model
del X

while True:
	pass

The algorithm modes that produced the leak were:

  • auto
  • ball_tree
  • kd_tree

With the algorithm brute the memory leak didn't happen, but when increasing the data size by a factor of 10 the error shown below occurred.
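
For reference, a minimal sketch of the brute-force variant (assuming the 10x larger dataset means 200000 random points):

from sklearn.neighbors import NearestNeighbors
import numpy as np

# 10x the original data size; with algorithm='brute' the dense pairwise
# distance matrix no longer fits in 12 GB of RAM.
X = np.random.rand(200000, 2)

neighbors_model = NearestNeighbors(algorithm='brute')
neighbors_model.fit(X)

# On the 0.19.1 setup above this raised MemoryError (see the traceback below).
neighborhoods = neighbors_model.radius_neighbors(X, 0.5, return_distance=False)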

Actual Results

MemoryError                               Traceback (most recent call last)
<ipython-input-14-df89794bb3b3> in <module>()
----> 1 neighborhoods = neighbors_model.radius_neighbors(X, 0.5, return_distance=False)

/usr/local/lib/python3.5/dist-packages/sklearn/neighbors/base.py in radius_neighbors(self, X, radius, return_distance)
    588             if self.effective_metric_ == 'euclidean':
    589                 dist = pairwise_distances(X, self._fit_X, 'euclidean',
--> 590                                           n_jobs=self.n_jobs, squared=True)
    591                 radius *= radius
    592             else:

/usr/local/lib/python3.5/dist-packages/sklearn/metrics/pairwise.py in pairwise_distances(X, Y, metric, n_jobs, **kwds)
   1245         func = partial(distance.cdist, metric=metric, **kwds)
   1246 
-> 1247     return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
   1248 
   1249 

/usr/local/lib/python3.5/dist-packages/sklearn/metrics/pairwise.py in _parallel_pairwise(X, Y, func, n_jobs, **kwds)
   1088     if n_jobs == 1:
   1089         # Special case to avoid picklability checks in delayed
-> 1090         return func(X, Y, **kwds)
   1091 
   1092     # TODO: in some cases, backend='threading' may be appropriate

/usr/local/lib/python3.5/dist-packages/sklearn/metrics/pairwise.py in euclidean_distances(X, Y, Y_norm_squared, squared, X_norm_squared)
    244         YY = row_norms(Y, squared=True)[np.newaxis, :]
    245 
--> 246     distances = safe_sparse_dot(X, Y.T, dense_output=True)
    247     distances *= -2
    248     distances += XX

/usr/local/lib/python3.5/dist-packages/sklearn/utils/extmath.py in safe_sparse_dot(a, b, dense_output)
    138         return ret
    139     else:
--> 140         return np.dot(a, b)
    141 
    142 

MemoryError: 

Versions

  • Linux-4.13.0-39-generic-x86_64-with-Ubuntu-16.04-xenial
  • Python 3.5.2 (default, Nov 23 2017, 16:37:01)
  • [GCC 5.4.0 20160609]
  • NumPy 1.14.3
  • SciPy 1.0.1
  • Scikit-Learn 0.19.1
@jnothman
Member

jnothman commented May 2, 2018

@recamshak do you happen to be aware of this?

@EemeliSaari have you checked this is present in master? This was recently changed.

@jnothman
Member

jnothman commented May 2, 2018 via email

@EemeliSaari
Author

EemeliSaari commented May 2, 2018

The issue is about the memory leaking with the other algorithms.

Edited: Also, I'm working with the latest released version of sklearn, so the answer to the master branch question is no.

@jnothman
Member

jnothman commented May 2, 2018 via email

@EemeliSaari
Author

No change to the memory usage when running the test script with the master branch. Might this be a problem only on my machine?

@rth
Member

rth commented May 2, 2018

I can reproduce on master using Linux and Python 3.6 for both ball_tree and kd_tree,

import gc

from sklearn.neighbors import NearestNeighbors
import numpy as np
from memory_profiler import memory_usage


def get_memory_usage():
    return memory_usage(timeout=0.2)[0]


print('Initial memory usage: %.2f MB' % get_memory_usage())

X = np.random.rand(20000, 2)  # Random data

print('Allocated X: %.2f MB' % get_memory_usage())
print('X.nbytes == %.2f MB' % (X.nbytes / 1e6))

neighbors_model = NearestNeighbors(algorithm='ball_tree')
neighbors_model.fit(X)
print('Fitted NearestNeighbors: %.2f MB' % get_memory_usage())

neighborhoods = neighbors_model.radius_neighbors(X, 0.5, return_distance=False)
print('Computed radius_neighbors: %.2f MB' % get_memory_usage())

del neighborhoods
del neighbors_model
del X

gc.collect()

print('Unallocated everything: %.2f MB' % get_memory_usage())

produces,

Initial memory usage: 88.03 MB
Allocated X: 88.41 MB
X.nbytes == 0.32 MB
Fitted NearestNeighbors: 88.52 MB
Computed radius_neighbors: 1570.81 MB
Unallocated everything: 1569.59 MB

@rth
Member

rth commented May 2, 2018

It's also the case in 0.18.1 and 0.19.1.

Seems fairly credible given the widespread use of C pointers in the kd / ball tree implementation. After a quick look I wasn't able to find the issue, but given the amount of memory leaked there shouldn't be that many objects that need to allocate 1.5 GB of memory in this example. In particular, it seems to be on the order of an (n_samples, n_samples) matrix in 32 bit (20000² × 4 bytes ≈ 1.6 GB, which roughly matches the ~1.5 GB observed).

Furthermore, each additional call to radius_neighbors increases the amount of memory used by roughly the same amount,

Initial memory_usage: 87.38 MB
Allocated X: 87.77 MB
X.nbytes == 0.32 MB
Fitted NearestNeighbors: 88.06 MB
Computed radius_neighbors: 1560.98 MB
Computed radius_neighbors: 3032.30 MB
Computed radius_neighbors: 4502.32 MB
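
For reference, a sketch of the loop that produced the numbers above (a hypothetical reconstruction reusing the pattern of the benchmark script earlier in this thread):

import numpy as np
from sklearn.neighbors import NearestNeighbors
from memory_profiler import memory_usage

X = np.random.rand(20000, 2)
neighbors_model = NearestNeighbors(algorithm='ball_tree').fit(X)

for _ in range(3):
    # Each call grows the reported RSS by roughly the same ~1.5 GB.
    neighbors_model.radius_neighbors(X, 0.5, return_distance=False)
    print('Computed radius_neighbors: %.2f MB' % memory_usage(timeout=0.2)[0])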

@recamshak
Contributor

@jnothman I am not aware of any memory leak, but I'll check my MR.

@recamshak
Contributor

It seems that setting the NPY_OWNDATA flag on a numpy array is not doing what I thought it would. The memory is not freed even when the array gets garbage collected.

@jnothman
Member

jnothman commented May 2, 2018 via email

@recamshak
Contributor

On 0.19.1 I got:

Initial memory usage: 82.68 MB
Allocated X: 82.93 MB
X.nbytes == 0.32 MB
Fitted NearestNeighbors: 83.42 MB
Computed radius_neighbors: 1564.74 MB
Unallocated everything: 84.88 MB

@rth
Member

rth commented May 3, 2018

It's also the case in 0.18.1 and 0.19.1.

On 0.19.1 I got:
Unallocated everything: 84.88 MB

Yes, I previously built 0.18.1 and 0.19.1 from source with Python 3.6, Cython 0.25.2, NumPy 1.12.1 and SciPy 0.19.0 (installed with conda), and there I can reproduce this issue.

On the other hand, using either the 0.18.1 / 0.19.1 conda binaries or the manylinux wheels from PyPI seems fine as well. These use the latest NumPy 1.14.1 and SciPy 1.0.1.

Maybe I'm missing something with my local builds, but in any case @EemeliSaari was also seeing this issue on 0.19.1, not on master, so it should normally be unrelated to #10887.

@jnothman jnothman reopened this May 3, 2018
@jnothman
Member

jnothman commented May 3, 2018

Sorry, I shouldn't have let #11056 close this. I think that solved a separate issue.

@rth
Member

rth commented May 3, 2018

In an Ubuntu xenial container with parameters identical to the OP's, I can reproduce on master (after the #11056 fix) but not in 0.19.1, using the same benchmark script as above and the following setup,

docker run --rm ubuntu:xenial-20180417 /bin/bash -c "

apt-get update &&
apt-get install -yq libatlas-dev libatlas-base-dev liblapack-dev gfortran python3-pip wget git &&
pip3 install numpy scipy Cython==0.25.2 memory_profiler psutil &&
# adapt the git tag as needed 
pip3 install git+https://github.com/scikit-learn/scikit-learn.git@master &&
# run the benchmark script 
wget -O - https://github.com/scikit-learn/scikit-learn/files/1970318/test_memory_leak.txt | python3 -
"

@recamshak
Contributor

With a slightly modified benchmark script on master you can see that the memory gets reused:

sklearn: 0.20.dev0
Initial memory usage: 74.74 MB
Allocated X: 74.74 MB
X.nbytes == 0.32 MB
Fitted NearestNeighbors: 75.08 MB
Computed radius_neighbors: 1554.91 MB
Computed radius_neighbors: 1555.02 MB
Computed radius_neighbors: 1554.81 MB
Computed radius_neighbors: 1554.81 MB
Computed radius_neighbors: 1554.81 MB
Unallocated everything: 1553.53 MB

On a Windows 10 Home system, the same script gives:

sklearn: 0.20.dev0
Initial memory usage: 51.81 MB
Allocated X: 52.11 MB
X.nbytes == 0.32 MB
Fitted NearestNeighbors: 52.43 MB
Computed radius_neighbors: 1534.98 MB
Computed radius_neighbors: 1535.32 MB
Computed radius_neighbors: 1535.39 MB
Computed radius_neighbors: 1535.36 MB
Computed radius_neighbors: 1535.39 MB
Unallocated everything: 54.31 MB

Is it expected that malloc/free does not immediately release the memory back to the OS in certain cases? If so, I'm not sure why 0.19.1 and master don't give the same memory usage. Maybe numpy is using a different memory allocator? As far as I can tell it's using the malloc defined in Python.h: https://github.com/numpy/numpy/blob/v1.14.2/numpy/core/src/multiarray/ctors.c#L1062 -> https://github.com/numpy/numpy/blob/v1.14.2/numpy/core/src/multiarray/alloc.c#L92 -> https://github.com/numpy/numpy/blob/v1.14.2/numpy/core/src/multiarray/alloc.c#L201

So I tried the following change on master to make sure I use the same allocator, but it had no effect:

diff --git a/sklearn/neighbors/binary_tree.pxi b/sklearn/neighbors/binary_tree.pxi
index edf78257c..3d5986ec4 100755
--- a/sklearn/neighbors/binary_tree.pxi
+++ b/sklearn/neighbors/binary_tree.pxi
@@ -144,7 +144,6 @@
 cimport cython
 cimport numpy as np
 from libc.math cimport fabs, sqrt, exp, cos, pow, log
-from libc.stdlib cimport calloc, malloc, free
 from libc.string cimport memcpy
 from sklearn.utils.lgamma cimport lgamma
 
@@ -161,6 +160,11 @@ from dist_metrics cimport (DistanceMetric, euclidean_dist, euclidean_rdist,
 cdef extern from "numpy/arrayobject.h":
     void PyArray_ENABLEFLAGS(np.ndarray arr, int flags)
 
+cdef extern from "Python.h":
+    void* malloc(int size) nogil
+    void* calloc(int num, int size) nogil
+    void free(void *p) nogil
+
 np.import_array()
 
 # some handy constants

@jnothman
Member

jnothman commented May 7, 2018

I can't remember the terminology, but I recall previously identifying similar behaviour - memory being kept for reuse - as a Linux (I think) feature that made it look like memory consumption was increasing when it wasn't.

@rth
Member

rth commented May 7, 2018

This sounds relevant, https://www.linuxquestions.org/questions/programming-9/how-to-free-memory-immediately-after-calling-free-in-c-programming-916896/#post4540924

In the C language, free happens immediately, and the space is given back to the C heap [...]

However, the usual implementation of the heap itself is such that if it runs out of space, it is increased in size. It never decreases in size, even if used space is given back to the heap. It is done this way because it is an efficient implementation.

This is not typically a problem (ignoring pathological fragmentation), because the areas no longer being used will eventually be paged out of real memory.

But it does mean that you cannot use programs like top to monitor what is happening, because they cannot tell that part of the C heap is no longer being used, and will continue to show the current total memory space of the process (including both used and unused parts of the C heap).

But then if this is true, I don't understand why it's not an issue when benchmarking memory usage in general...

In case it's relevant: in the benchmark script, memory_profiler uses psutil.memory_info (when psutil is installed) to measure memory usage, which on Linux is read from /proc/<PID>/statm.
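
For anyone reproducing this, a minimal way to read the same figure directly (assuming psutil is installed):

import psutil

# RSS of the current process, i.e. the value memory_profiler reports here;
# on Linux psutil reads it from /proc/<PID>/statm.
print('RSS: %.2f MB' % (psutil.Process().memory_info().rss / 1e6))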

@rth
Member

rth commented May 7, 2018

Maybe @aabadie would have some ideas about this?

@recamshak
Contributor

After digging into glibc I found that the memory manager behavior can be changed via environment variables. See the end of man mallopt for an (exhaustive?) list.

I tried to run MALLOC_MMAP_THRESHOLD_=0 python test_memory_leak.py (glibc 2.26) and the memory leak seems to be gone:

sklearn: 0.20.dev0
Initial memory usage: 122.58 MB
Allocated X: 122.97 MB
X.nbytes == 0.32 MB
Fitted NearestNeighbors: 123.52 MB
Computed radius_neighbors: 1716.45 MB
Computed radius_neighbors: 1716.46 MB
Computed radius_neighbors: 1716.46 MB
Computed radius_neighbors: 1716.46 MB
Computed radius_neighbors: 1716.47 MB
Unallocated everything: 123.07 MB

It's significantly slower though: 12s instead of 9s.

So it seems that there is actually no memory leak. But as @rth said, I don't understand why the behavior differs depending on whether we do the malloc in sklearn (master branch) or let numpy do it (0.19.1).
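
As a side note, a possible workaround sketch (glibc/Linux only, untested here): the allocator can be asked to hand free heap memory back to the OS explicitly with malloc_trim, e.g. via ctypes:

import ctypes
import ctypes.util

# Linux/glibc only: malloc_trim(0) asks glibc to release free memory
# held in the heap back to the operating system.
libc = ctypes.CDLL(ctypes.util.find_library('c'))
libc.malloc_trim(0)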

@jnothman
Member

jnothman commented May 7, 2018

Thanks @EemeliSaari for helping us find a different issue, but I think we will close this unless it can be verified more reliably.

@jnothman jnothman closed this as completed May 7, 2018
@rth
Member

rth commented May 8, 2018

Thanks for investigating @recamshak !

After digging into glibc I found that the memory manager behavior can be changed via environment variables. See the end of man mallopt for an (exhaustive?) list.

Interesting, in particular,

Note: Nowadays, glibc uses a dynamic mmap threshold by default. The initial value of the threshold is 128*1024, but when blocks larger than the current threshold and less than or equal to DEFAULT_MMAP_THRESHOLD_MAX are freed, the threshold is adjusted upward to the size of the freed block. When dynamic mmap thresholding is in effect, the threshold for trimming the heap is also dynamically adjusted to be twice the dynamic mmap threshold. Dynamic adjustment of the mmap threshold is disabled if any of the M_TRIM_THRESHOLD, M_TOP_PAD, M_MMAP_THRESHOLD, or M_MMAP_MAX parameters is set.

sounds also relevant, but I can't wrap my head around it (and I'm not sure I want to)...

For the record, I also ran a more in-depth memory benchmark with all fields from psutil.Process().memory_full_info(), using the following function,

def get_memory_usage():
    import psutil
    res = psutil.Process().memory_full_info()
    return ' '.join('%s: %.3f MB' % (key, val/2**20)
                    for key, val in res._asdict().items() if val)

(and adapting the print statements in the benchmark script) which produces,

Initial memory usage: rss: 71.637 MB vms: 547.602 MB shared: 31.797 MB text: 2.746 MB data: 41.824 MB uss: 67.371 MB pss: 67.648 MB
Allocated X: rss: 72.410 MB vms: 548.320 MB shared: 31.809 MB text: 2.746 MB data: 42.543 MB uss: 68.195 MB pss: 68.473 MB
X.nbytes == 0.32 MB
Fitted NearestNeighbors: rss: 72.984 MB vms: 548.691 MB shared: 31.871 MB text: 2.746 MB data: 42.914 MB uss: 68.625 MB pss: 68.902 MB
Computed radius_neighbors: rss: 1545.211 MB vms: 2020.824 MB shared: 31.996 MB text: 2.746 MB data: 1515.047 MB uss: 1540.930 MB pss: 1541.207 MB
Computed radius_neighbors: rss: 1545.867 MB vms: 2021.324 MB shared: 31.996 MB text: 2.746 MB data: 1515.547 MB uss: 1541.441 MB pss: 1541.719 MB
Computed radius_neighbors: rss: 1546.266 MB vms: 2021.973 MB shared: 31.996 MB text: 2.746 MB data: 1516.195 MB uss: 1541.965 MB pss: 1542.242 MB
Computed radius_neighbors: rss: 1546.785 MB vms: 2022.359 MB shared: 31.996 MB text: 2.746 MB data: 1516.582 MB uss: 1542.352 MB pss: 1542.629 MB
Computed radius_neighbors: rss: 1546.785 MB vms: 2022.359 MB shared: 31.996 MB text: 2.746 MB data: 1516.582 MB uss: 1542.352 MB pss: 1542.629 MB
Unallocated everything  : rss: 1545.758 MB vms: 2021.109 MB shared: 31.996 MB text: 2.746 MB data: 1515.332 MB uss: 1541.250 MB pss: 1541.527 MB

The original benchmark script only shows the RSS memory, but in the end I don't think this provides any additional insight.

I agree with @jnothman about closing this because,

  • we were not able to reproduce the original problem on 0.19.0
  • on master this seems to be another issue. The fact that repeated calls don't increase memory suggests there is no leak, and the rest seems to be a peculiarity of how the Linux OS handles memory allocations. So I'm not sure there is anything we can/should do about it...
