[MRG+1] ENH use pairwise_distances_chunked in brute nearest neighbors #11136

Merged
4 commits merged into scikit-learn:master on Jun 6, 2018

Conversation

@jnothman jnothman (Member) commented May 25, 2018

Formerly part of #10280

Fixes #7287

@jnothman jnothman changed the title from "[MRG+1] ENH use pairwise_distances_chunked in brute nearest neighbors" to "[MRG] ENH use pairwise_distances_chunked in brute nearest neighbors" on May 25, 2018
@jnothman jnothman (Member Author) left a comment:

Note that I believe no tests are needed: the functionality of algorithm='brute' is tested, and the memory consumption of pairwise_distances_chunked is tested elsewhere.

@rth rth (Member) left a comment:

This will be very useful!

Some benchmarks to double-check that this doesn't affect performance too much on a standard use case with the default working_memory size would be helpful. There were some benchmarks in #10280 (comment), but I'm not sure to what extent they would generalize to this.

For readability, it would help to have some docstrings in the reduce functions specifying the type and shape of the inputs.

@@ -276,9 +278,25 @@ def _pairwise(self):
class KNeighborsMixin(object):
    """Mixin for k-neighbors searches"""

    def _kneighbors_reduce_func(self, dist, start,
                                n_neighbors, return_distance):
@rth rth (Member) commented May 27, 2018:

Would it make sense to decorate it with @staticmethod and remove self, to help readability?

@jnothman jnothman (Member Author) commented May 28, 2018:

It uses self.effective_metric_
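
For illustration, here is a minimal, self-contained sketch of what such a chunk-reduce function might look like, written as a plain function rather than a method (the real method uses self.effective_metric_ as noted above; everything beyond the signature shown in the diff is an assumption, and the docstring shows the kind of type/shape information requested in the review):

import numpy as np

def kneighbors_reduce_func_sketch(dist, start, n_neighbors, return_distance):
    """Reduce a chunk of the distance matrix to its k nearest neighbors.

    Parameters
    ----------
    dist : ndarray of shape (n_samples_chunk, n_samples_fit)
        Distances between one chunk of query samples and the fitted data.
    start : int
        Index of the first row of this chunk in the full query set
        (unused here; part of the reduce-function signature).
    n_neighbors : int
        Number of neighbors to keep per query sample.
    return_distance : bool
        Whether to return distances along with indices.

    Returns
    -------
    dist, neigh_ind : ndarrays of shape (n_samples_chunk, n_neighbors)
        Distances are only returned when return_distance is True.
    """
    sample_range = np.arange(dist.shape[0])[:, None]
    # argpartition selects the n_neighbors smallest distances per row in O(n)
    neigh_ind = np.argpartition(dist, n_neighbors - 1, axis=1)[:, :n_neighbors]
    # argpartition does not sort within the selection, so sort explicitly
    neigh_ind = neigh_ind[sample_range,
                          np.argsort(dist[sample_range, neigh_ind])]
    if return_distance:
        return dist[sample_range, neigh_ind], neigh_ind
    return neigh_ind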

@jnothman jnothman (Member Author) commented May 28, 2018:

Benchmark:

from pprint import pprint
import time
import sklearn
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import euclidean_distances
n_features = 100


results = []
for n_samples in [1000, 10000, 20000, 50000, ]:
    X = np.random.rand(n_samples, n_features)
    sampleD = euclidean_distances(X[:100])
    avg_dist = np.mean(sampleD[np.triu_indices_from(sampleD, 1)])
    for metric in ['euclidean']:
        nn = NearestNeighbors(radius=avg_dist * .9, n_neighbors=10,
                              metric=metric, algorithm='brute').fit(X)
        for method in ['kneighbors', 'radius_neighbors']:
            # for working_mem in ['master']:
            for working_mem in [100, 1024, 999999999]:
                with sklearn.config_context(working_memory=working_mem):
                    s = time.time()
                    out = getattr(nn, method)()
                    t = time.time() - s
                    n_neighbors = sum(len(x) for x in out[0])
                    d = {'metric': metric,
                         'n_samples': n_samples,
                         'method': method,
                         'working_memory': working_mem,
                         'total_n_neighbors': n_neighbors,
                         'time': t}
                    pprint(d)
                    results.append(d)

[plot of the benchmark results]

All timings:

,method,metric,n_samples,time,total_n_neighbors,working_memory
0,kneighbors,euclidean,1000,0.030091047286987305,10000,master
1,radius_neighbors,euclidean,1000,0.020231008529663086,35066,master
2,kneighbors,euclidean,10000,3.1793291568756104,100000,master
3,radius_neighbors,euclidean,10000,2.0275309085845947,5868760,master
4,kneighbors,euclidean,20000,17.558682918548584,200000,master
5,radius_neighbors,euclidean,20000,9.4132080078125,20476428,master
6,kneighbors,euclidean,50000,1119.4494569301605,500000,master
7,radius_neighbors,euclidean,50000,1070.3520658016205,117407558,master
0,kneighbors,euclidean,1000,0.031382083892822266,10000,100
1,kneighbors,euclidean,1000,0.021708011627197266,10000,1024
2,kneighbors,euclidean,1000,0.026649951934814453,10000,999999999
3,radius_neighbors,euclidean,1000,0.0228121280670166,45472,100
4,radius_neighbors,euclidean,1000,0.019867897033691406,45472,1024
5,radius_neighbors,euclidean,1000,0.03182816505432129,45472,999999999
6,kneighbors,euclidean,10000,2.182466983795166,100000,100
7,kneighbors,euclidean,10000,2.8718271255493164,100000,1024
8,kneighbors,euclidean,10000,2.8780999183654785,100000,999999999
9,radius_neighbors,euclidean,10000,1.3253610134124756,5574012,100
10,radius_neighbors,euclidean,10000,1.8732130527496338,5574012,1024
11,radius_neighbors,euclidean,10000,1.8151168823242188,5574012,999999999
12,kneighbors,euclidean,20000,9.68129301071167,200000,100
13,kneighbors,euclidean,20000,13.157173156738281,200000,1024
14,kneighbors,euclidean,20000,22.281371116638184,200000,999999999
15,radius_neighbors,euclidean,20000,9.239833116531372,19401726,100
16,radius_neighbors,euclidean,20000,6.303516149520874,19401726,1024
17,radius_neighbors,euclidean,20000,8.991750001907349,19401726,999999999
18,kneighbors,euclidean,50000,68.80946683883667,500000,100
19,kneighbors,euclidean,50000,72.32541084289551,500000,1024
20,kneighbors,euclidean,50000,1429.7003951072693,500000,999999999
21,radius_neighbors,euclidean,50000,53.88025784492493,108698316,100
22,radius_neighbors,euclidean,50000,49.64297103881836,108698316,1024
23,radius_neighbors,euclidean,50000,1224.8581790924072,108698316,999999999
24,kneighbors,euclidean,100000,214.5971851348877,1000000,100

@jnothman jnothman (Member Author) commented:

Basically, we see good improvements with capped working_memory, and negligible change from master.
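
For reference, the capped setting used above can be reproduced either per block with sklearn.config_context or globally with sklearn.set_config; the snippet below assumes the working_memory budget (in MiB) introduced in #10280 is available, as in the benchmark script above:

import numpy as np
import sklearn
from sklearn.neighbors import NearestNeighbors

X = np.random.rand(20000, 100)
nn = NearestNeighbors(n_neighbors=10, algorithm='brute').fit(X)

# Cap the chunked distance computation at roughly 100 MiB for this block only
with sklearn.config_context(working_memory=100):
    dist, ind = nn.kneighbors()

# Or set the budget globally for the rest of the session
sklearn.set_config(working_memory=1024)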

@jnothman jnothman (Member Author) commented Jun 2, 2018:

@rth, let's merge this too?

@rth rth (Member) commented Jun 3, 2018:

Thanks for the benchmarks!

I was actually curious whether this has an impact on the run time; it turns out that it does, a little. Below are a few more benchmarks.

For `NearestNeighbors.kneighbors`:
import numpy as np
from sklearn.neighbors import NearestNeighbors

from neurtu import delayed, Benchmark

n_features = 100

n_samples_predict = 1000

def runs():
    for n_samples in [5000, 10000, 20000, 40000, 80000, 160000, 320000]:
        for n_neighbors in [1, 5, 10]:
            tags = {'n_samples': n_samples, 'n_neighbors': n_neighbors}

            X = np.random.rand(n_samples, n_features)
            Xp = np.random.rand(n_samples_predict, n_features)
            estimator = NearestNeighbors(n_neighbors=n_neighbors, algorithm='brute').fit(X)
            yield delayed(estimator, tags=tags).kneighbors(Xp)

df = Benchmark(wall_time=True, peak_memory={'interval': 0.0001}, repeat=1)(runs())
df.set_index(['n_neighbors', 'n_samples'], inplace=True)

df = df.sort_index()
df = df.rename(columns={'wall_time_mean': 'wall_time', 'peak_memory_mean': 'peak_memory'})

df = df[['wall_time', 'peak_memory']]
df['peak_memory'] = df['peak_memory'].round(2)
df['wall_time'] = df['wall_time'].round(4)
print(df)
                      wall_time           peak_memory          
                         master PR #11136      master PR #11136
n_neighbors n_samples                                          
1           5000         0.0633    0.0633       75.19     74.69
            10000        0.1206    0.1212      152.39    151.02
            20000        0.2369    0.2362      303.52    305.51
            40000        0.4633    0.4630      609.49    610.18
            80000        0.9132    0.9129     1221.02   1220.41
            160000       1.8354    1.8600     2441.99   2045.79
            320000       3.6594    3.8362     4883.88   2045.83
5           5000         0.1007    0.1002       76.09     76.25
            10000        0.1818    0.1887      152.25    152.59
            20000        0.3785    0.3681      304.92    305.06
            40000        0.7574    0.7430      610.27    610.15
            80000        1.4955    1.4317     1220.46   1220.11
            160000       2.9148    3.0222     2441.36   2045.79
            320000       5.5406    5.7484     4882.66   2045.89
10          5000         0.1044    0.1008       76.30     76.30
            10000        0.1955    0.1981      152.48    152.40
            20000        0.3810    0.3534      305.16    305.00
            40000        0.7431    0.7677      610.25    610.17
            80000        1.4157    1.4905     1220.71   1220.60
            160000       2.8404    3.0266     2441.35   2045.79
            320000       5.7615    5.9708     4882.64   2045.86
For `NearestNeighbors.radius_neighbors`:
import numpy as np
from sklearn.neighbors import NearestNeighbors

from neurtu import delayed, Benchmark

n_features = 100
n_neighbors = 5

n_samples_predict = 1000
radius = 0.5

def runs():
    for n_samples in [5000, 10000, 20000, 40000, 80000, 160000, 320000]:
        tags = {'n_samples': n_samples, 'radius': radius}

        X = np.random.rand(n_samples, n_features)
        Xp = np.random.rand(n_samples_predict, n_features)
        estimator = NearestNeighbors(radius=radius, algorithm='brute').fit(X)
        yield delayed(estimator, tags=tags).radius_neighbors(Xp)

df = Benchmark(wall_time=True, peak_memory={'interval': 0.0001}, repeat=1)(runs())
df.set_index(['radius', 'n_samples'], inplace=True)

df = df.sort_index()
df = df.rename(columns={'wall_time_mean': 'wall_time', 'peak_memory_mean': 'peak_memory'})

df = df[['wall_time', 'peak_memory']]
df['peak_memory'] = df['peak_memory'].round(2)
df['wall_time'] = df['wall_time'].round(4)
print(df)
                 wall_time         peak_memory         
                    master      PR      master       PR
radius n_samples                                       
0.5    5000         0.0557  0.0556       39.09    38.92
       10000        0.0997  0.0999       76.57    76.56
       20000        0.1984  0.1930      152.74   152.73
       40000        0.3736  0.3784      305.18   305.18
       80000        0.7296  0.7248      610.66   610.66
       160000       1.4284  1.4596     1221.32  1022.96
       320000       2.8373  2.9735     2442.63  1022.96

So the memory usage works as expected; however, we pay for it with a performance overhead of around 4-5% once the whole distance matrix no longer fits into a single chunk. I imagine the overhead is due to chunking and concatenating. Overall this sounds like a reasonable compromise to me.
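
To make the source of that overhead concrete, here is a rough sketch (not the PR's actual code) of how a reduce function plugs into pairwise_distances_chunked: each chunk of rows is reduced independently, and the per-chunk results then have to be stitched back together, which is where the extra few percent plausibly goes.

import numpy as np
from sklearn.metrics import pairwise_distances_chunked

X = np.random.rand(20000, 100)
n_neighbors = 5

def reduce_func(dist_chunk, start):
    # Keep only the n_neighbors smallest distances per row of this chunk
    # (left unsorted within the selection, for brevity).
    ind = np.argpartition(dist_chunk, n_neighbors - 1, axis=1)[:, :n_neighbors]
    rows = np.arange(dist_chunk.shape[0])[:, None]
    return dist_chunk[rows, ind], ind

# One reduced result per chunk of rows of X; the chunk size is derived
# from the working_memory budget (in MiB).
chunks = list(pairwise_distances_chunked(X, reduce_func=reduce_func,
                                         working_memory=100))

# Reassembling the full output requires concatenating across chunks.
dists = np.vstack([d for d, _ in chunks])
inds = np.vstack([i for _, i in chunks])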

@rth rth (Member) left a comment:

LGTM, if some docstrings are added to the private reduce functions (#11136 (review))

Thanks @jnothman !

@rth rth changed the title from "[MRG] ENH use pairwise_distances_chunked in brute nearest neighbors" to "[MRG+1] ENH use pairwise_distances_chunked in brute nearest neighbors" on Jun 3, 2018
@jnothman jnothman (Member Author) commented Jun 4, 2018:

Huh. My benchmarks above show substantial wall time decreases for large n_samples.

@rth rth (Member) commented Jun 4, 2018:

> Huh. My benchmarks above show substantial wall time decreases for large n_samples.

The plots represent run time? I thought it was memory usage. Do you still have the benchmark code?

@jnothman jnothman (Member Author) commented Jun 4, 2018 via email

@jnothman jnothman (Member Author) commented Jun 4, 2018:

(although naively using t = time.time() - s)

@rth rth (Member) commented Jun 4, 2018:

How much RAM were you running these benchmarks with? I get a memory error for 50000 samples with 32GB RAM...

Hmm, but for the rest, yes; with your script I can confirm the results:

                                time                          
working_memory                   100      1024 999999999    master   
n_samples method                                                     
1000      kneighbors        0.023986  0.023552  0.023704  0.022691   
          radius_neighbors  0.023470  0.023864  0.022067  0.023892   
10000     kneighbors        1.832281  2.249930  2.242692  2.268365   
          radius_neighbors  1.321087  1.760668  1.740842  1.787875   
20000     kneighbors        8.025549  7.530514  9.977264  9.978430   
          radius_neighbors  4.835002  4.589210  6.991414  7.119566   

The difference appears to be that in your benchmark you were running kneighbors(X=None), while I was using a separate random X.

The 20k case for me takes around 20 GB of RAM (out of 32 GB available). I'm not familiar enough with the details of dynamic memory allocation: could it be that allocating such a large contiguous array without chunking carries some overhead at the OS level (fragmentation issues, etc.) compared to allocating/deallocating smaller chunks? In any case the results look good in both benchmarks!
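
For clarity, the two query modes being compared above differ in the size of the distance matrix that gets computed; a small sketch (with hypothetical sizes) of the difference:

import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.rand(10000, 100)   # training set
Xp = np.random.rand(1000, 100)   # separate query set
nn = NearestNeighbors(n_neighbors=5, algorithm='brute').fit(X)

# First benchmark: query the training set against itself (X=None), so a
# 10000 x 10000 distance matrix is computed and each point's own
# zero-distance match is excluded.
dist_self, ind_self = nn.kneighbors()

# Second benchmark: query a fixed-size test set, so only a 1000 x 10000
# distance matrix is computed; the two setups are not directly comparable.
dist_new, ind_new = nn.kneighbors(Xp)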

@jnothman jnothman (Member Author) commented Jun 4, 2018:

Hmm... I think I ran these on my laptop, with 16 GB... Yes, your benchmarks probably had fewer experimental variables. I forgot that I should probably keep the size of the test data constant, although in practice users often use X=None with NearestNeighbors.

Let's not worry about it too much, given that we have separately confirmed that this PR is a good thing.

@jnothman jnothman requested a review from TomDLT June 5, 2018 21:38
@TomDLT TomDLT (Member) left a comment:

LGTM

@TomDLT TomDLT merged commit 646ceb1 into scikit-learn:master Jun 6, 2018