# Description

`_get_parts` has now been optimized, based on the previous profiling tests.

Here we profile the function `cdist_parts` again.

This is the final test, with clustermatch's functions fully optimized.

This notebook was run on my laptop.

# Remove pycache dir

In [1]:
!echo ${CODE_DIR}

/opt/code


In [2]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print

/opt/code/libs/clustermatch/scipy/__pycache__
/opt/code/libs/clustermatch/sklearn/__pycache__
/opt/code/libs/clustermatch/__pycache__
/opt/code/libs/clustermatch/pytorch/__pycache__


In [3]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -exec rm -rf {} \;

find: ‘/opt/code/libs/clustermatch/scipy/__pycache__’: No such file or directory
find: ‘/opt/code/libs/clustermatch/sklearn/__pycache__’: No such file or directory
find: ‘/opt/code/libs/clustermatch/__pycache__’: No such file or directory
find: ‘/opt/code/libs/clustermatch/pytorch/__pycache__’: No such file or directory
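The `find: ... No such file or directory` messages above are harmless. `find` deletes each matching directory with `-exec rm -rf` and then tries to descend into it; adding `-prune` to the expression would silence them. The sketch below does the same cleanup in Python. `remove_pycache_dirs` is a hypothetical helper, not part of clustermatch; it collects the matches first and only then deletes them, so it never walks into a directory that is already gone.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def remove_pycache_dirs(root: Path) -> list[Path]:
    """Delete every __pycache__ directory under root and return the removed paths."""
    # Collect all matches before deleting anything, so the walk never
    # tries to enter a directory that was removed mid-iteration
    targets = [p for p in root.rglob("__pycache__") if p.is_dir()]
    for path in targets:
        # ignore_errors covers the rare case of a nested __pycache__
        # that disappeared together with its parent
        shutil.rmtree(path, ignore_errors=True)
    return targets


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for sub in ("pkg", "pkg/sub"):
            (root / sub / "__pycache__").mkdir(parents=True)
        removed = remove_pycache_dirs(root)
        print(len(removed))  # 2
        print(list(root.rglob("__pycache__")))  # []
```

In the notebook this would be called as `remove_pycache_dirs(Path(os.environ["CODE_DIR"]))`, assuming `CODE_DIR` is set as in the cells above.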


In [4]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print

# Modules

In [5]:
import numpy as np

from clustermatch.coef import cm

# Settings

In [6]:
N_REPS = 10

In [7]:
np.random.seed(0)

# Setup

In [8]:
# let numba compile all the code before profiling
cm(np.random.rand(10), np.random.rand(10))

0.15625

# Run with small `n_samples`

## `n_samples=50`

In [9]:
N_SAMPLES = 50

In [10]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [11]:
def func():
    for i in range(N_REPS):
        cm(x, y)

In [12]:
%%timeit func()
func()

18.5 ms ± 458 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
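In `%%timeit func()`, the `func()` on the magic line is setup code that IPython runs before timing (an extra warm-up here); only the cell body is timed. One "loop" is one call to `func()`, that is `N_REPS = 10` calls to `cm`, so 18.5 ms per loop corresponds to roughly 1.85 ms per `cm` call. IPython picks the loop count automatically (100 here) and repeats the measurement 7 times. The sketch below approximates that summary with the standard `timeit` module; `ipython_style_timeit` is a hypothetical helper, and its use of the population standard deviation is an assumption about how IPython computes the spread.

```python
import statistics
import timeit


def ipython_style_timeit(stmt, repeat=7, number=100):
    """Approximate %timeit's 'mean ± std. dev. per loop' summary."""
    # Each entry is the total time of `number` back-to-back executions of stmt
    totals = timeit.repeat(stmt, repeat=repeat, number=number)
    per_loop = [t / number for t in totals]
    mean = statistics.mean(per_loop)
    # Spread across the `repeat` runs (population std. dev., assumed)
    std = statistics.pstdev(per_loop)
    return mean, std


if __name__ == "__main__":
    mean, std = ipython_style_timeit(lambda: sum(range(1000)), repeat=7, number=100)
    print(f"{mean * 1e6:.1f} µs ± {std * 1e6:.1f} µs per loop "
          f"(mean ± std. dev. of 7 runs, 100 loops each)")
```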


In [13]:
%%prun -s cumulative -l 20 -T 08-n_samples_small_50.txt
func()

 
*** Profile printout saved to text file '08-n_samples_small_50.txt'. 


         6942 function calls in 0.032 seconds

   Ordered by: cumulative time
   List reduced from 120 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.032    0.032 {built-in method builtins.exec}
        1    0.000    0.000    0.032    0.032 <string>:1(<module>)
        1    0.000    0.000    0.032    0.032 1517976664.py:1(func)
       10    0.001    0.000    0.032    0.003 coef.py:252(cm)
       10    0.001    0.000    0.023    0.002 coef.py:387(compute_coef)
       10    0.000    0.000    0.022    0.002 coef.py:380(cdist_func)
       10    0.002    0.000    0.022    0.002 coef.py:168(cdist_parts_parallel)
      135    0.001    0.000    0.018    0.000 threading.py:280(wait)
      558    0.017    0.000    0.017    0.000 {method 'acquire' of '_thread.lock' objects}
       72    0.000    0.000    0.017    0.000 threading.py:556(wait)
       70    0.001    0.000    0.014    0.000 _base.py:201(as_comple
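`%%prun -s cumulative -l 20 -T <file>` runs the cell under `cProfile`, sorts the entries by cumulative time (`cumtime`, which includes time spent in sub-calls, unlike `tottime`), prints only the top 20 rows and saves the printout to `<file>`. Outside IPython a comparable report can be produced with the standard library, as sketched below; `profile_to_file` is a hypothetical helper, not an IPython or clustermatch function.

```python
import cProfile
import io
import pstats


def profile_to_file(func, path, sort="cumulative", limit=20):
    """Rough stdlib counterpart of `%%prun -s <sort> -l <limit> -T <path>`."""
    profiler = cProfile.Profile()
    profiler.runcall(func)  # profile a single call of func
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort like `-s cumulative` and keep only the first `limit` rows like `-l 20`
    stats.sort_stats(sort).print_stats(limit)
    report = stream.getvalue()
    # Save the printout like `-T <path>`
    with open(path, "w") as fh:
        fh.write(report)
    return report
```

For example, `profile_to_file(func, "08-n_samples_small_50.txt")` would profile one call of the `func` defined above and write a report with the same column layout.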

## `n_samples=100`

In [14]:
N_SAMPLES = 100

In [15]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [16]:
def func():
    for i in range(N_REPS):
        cm(x, y)

In [17]:
%%timeit func()
func()

45.7 ms ± 536 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [18]:
%%prun -s cumulative -l 20 -T 08-n_samples_small_100.txt
func()

 
*** Profile printout saved to text file '08-n_samples_small_100.txt'. 


         9232 function calls in 0.065 seconds

   Ordered by: cumulative time
   List reduced from 120 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.065    0.065 {built-in method builtins.exec}
        1    0.000    0.000    0.065    0.065 <string>:1(<module>)
        1    0.000    0.000    0.065    0.065 1517976664.py:1(func)
       10    0.001    0.000    0.065    0.006 coef.py:252(cm)
       10    0.001    0.000    0.050    0.005 coef.py:387(compute_coef)
       10    0.000    0.000    0.049    0.005 coef.py:380(cdist_func)
       10    0.005    0.000    0.048    0.005 coef.py:168(cdist_parts_parallel)
      808    0.041    0.000    0.041    0.000 {method 'acquire' of '_thread.lock' objects}
      200    0.001    0.000    0.040    0.000 threading.py:280(wait)
      106    0.000    0.000    0.038    0.000 threading.py:556(wait)
      100    0.001    0.000    0.036    0.000 _base.py:201(as_comple
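In both profiles so far, most of the time is spent in `{method 'acquire' of '_thread.lock' objects}`, reached through `threading.py:wait` and `_base.py:201(as_completed)`. This is consistent with `cdist_parts_parallel` handing work to a `concurrent.futures` thread pool: the work done by worker threads is not part of the main thread's call stack, so from the main thread's point of view it shows up as blocking on a lock while waiting for futures. The sketch below reproduces that pattern with a hypothetical `slow_task` that only sleeps; it is an illustration, not clustermatch's code.

```python
import cProfile
import pstats
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_task(i):
    # Hypothetical stand-in for per-partition work running in a worker thread
    time.sleep(0.05)
    return i * i


def run_parallel(n_tasks=8, n_workers=4):
    # Submit all tasks, then block in as_completed until each future finishes
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        futures = [executor.submit(slow_task, i) for i in range(n_tasks)]
        return sorted(f.result() for f in as_completed(futures))


if __name__ == "__main__":
    profiler = cProfile.Profile()
    result = profiler.runcall(run_parallel)
    # Expect the main thread's time to sit mostly under lock 'acquire'/'wait' rows
    pstats.Stats(profiler).sort_stats("tottime").print_stats(5)
```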

## `n_samples=500`

In [19]:
N_SAMPLES = 500

In [20]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [21]:
def func():
    for i in range(N_REPS):
        cm(x, y)

In [22]:
%%timeit func()
func()

63.6 ms ± 3.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [23]:
%%prun -s cumulative -l 20 -T 08-n_samples_small_500.txt
func()

 
*** Profile printout saved to text file '08-n_samples_small_500.txt'. 


         9358 function calls in 0.082 seconds

   Ordered by: cumulative time
   List reduced from 120 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.082    0.082 {built-in method builtins.exec}
        1    0.000    0.000    0.082    0.082 <string>:1(<module>)
        1    0.000    0.000    0.082    0.082 1517976664.py:1(func)
       10    0.001    0.000    0.081    0.008 coef.py:252(cm)
       10    0.001    0.000    0.064    0.006 coef.py:387(compute_coef)
       10    0.000    0.000    0.062    0.006 coef.py:380(cdist_func)
       10    0.003    0.000    0.062    0.006 coef.py:168(cdist_parts_parallel)
      213    0.001    0.000    0.060    0.000 threading.py:280(wait)
      844    0.059    0.000    0.059    0.000 {method 'acquire' of '_thread.lock' objects}
      107    0.000    0.000    0.053    0.000 threading.py:556(wait)
      100    0.001    0.000    0.052    0.001 _base.py:201(as_comple

## `n_samples=1000`

In [24]:
N_SAMPLES = 1000

In [25]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [26]:
def func():
    for i in range(N_REPS):
        cm(x, y)

In [27]:
%%timeit func()
func()

92.6 ms ± 2.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [28]:
%%prun -s cumulative -l 20 -T 08-n_samples_small_1000.txt
func()

 
*** Profile printout saved to text file '08-n_samples_small_1000.txt'. 


         9578 function calls in 0.113 seconds

   Ordered by: cumulative time
   List reduced from 120 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.113    0.113 {built-in method builtins.exec}
        1    0.000    0.000    0.113    0.113 <string>:1(<module>)
        1    0.000    0.000    0.113    0.113 1517976664.py:1(func)
       10    0.001    0.000    0.113    0.011 coef.py:252(cm)
      222    0.001    0.000    0.092    0.000 threading.py:280(wait)
      882    0.091    0.000    0.091    0.000 {method 'acquire' of '_thread.lock' objects}
       10    0.001    0.000    0.088    0.009 coef.py:387(compute_coef)
       10    0.000    0.000    0.087    0.009 coef.py:380(cdist_func)
       10    0.002    0.000    0.087    0.009 coef.py:168(cdist_parts_parallel)
      116    0.001    0.000    0.079    0.001 threading.py:556(wait)
      100    0.001    0.000    0.077    0.001 _base.py:201(as_comple

# Run with large `n_samples`

## `n_samples=50000`

In [29]:
N_SAMPLES = 50000

In [30]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [31]:
def func():
    for i in range(N_REPS):
        cm(x, y)

In [32]:
%%timeit func()
func()

3.79 s ± 93.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [33]:
%%prun -s cumulative -l 20 -T 08-n_samples_large_50000.txt
func()

 
*** Profile printout saved to text file '08-n_samples_large_50000.txt'. 


         9639 function calls in 3.811 seconds

   Ordered by: cumulative time
   List reduced from 120 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.811    3.811 {built-in method builtins.exec}
        1    0.000    0.000    3.811    3.811 <string>:1(<module>)
        1    0.000    0.000    3.811    3.811 1517976664.py:1(func)
       10    0.005    0.000    3.811    0.381 coef.py:252(cm)
      221    0.002    0.000    3.781    0.017 threading.py:280(wait)
      892    3.779    0.004    3.779    0.004 {method 'acquire' of '_thread.lock' objects}
       10    0.001    0.000    2.694    0.269 coef.py:387(compute_coef)
       10    0.000    0.000    2.692    0.269 coef.py:380(cdist_func)
       10    0.002    0.000    2.692    0.269 coef.py:168(cdist_parts_parallel)
      120    0.001    0.000    2.680    0.022 threading.py:556(wait)
      100    0.002    0.000    2.680    0.027 _base.py:201(as_comple

## `n_samples=100000`

In [34]:
N_SAMPLES = 100000

In [35]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [36]:
def func():
    for i in range(N_REPS):
        cm(x, y)

In [37]:
%%timeit func()
func()

7.98 s ± 110 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
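Because each `%%timeit` loop calls `cm` `N_REPS = 10` times, dividing the reported means by 10 gives the cost of a single call. The snippet below only reuses the `%%timeit` means printed in this notebook; it computes the per-call time for each size and the ratio to the previous size.

```python
# Mean time per %%timeit loop in seconds, copied from the outputs in this notebook
timeit_means = {
    50: 18.5e-3,
    100: 45.7e-3,
    500: 63.6e-3,
    1000: 92.6e-3,
    50000: 3.79,
    100000: 7.98,
}
N_REPS = 10  # calls to cm per loop

# Time of a single cm call for each n_samples
per_call = {n: t / N_REPS for n, t in timeit_means.items()}
sizes = sorted(per_call)

print(f"n={sizes[0]:>6}: {per_call[sizes[0]] * 1e3:8.2f} ms per call")
for prev, cur in zip(sizes, sizes[1:]):
    ratio = per_call[cur] / per_call[prev]
    print(f"n={cur:>6}: {per_call[cur] * 1e3:8.2f} ms per call "
          f"({ratio:.2f}x vs n={prev}, size ratio {cur / prev:g}x)")
```

Going from 50,000 to 100,000 samples roughly doubles the time per call (about 2.1x), while for the small sizes the per-call time grows much more slowly than `n_samples`, which suggests that fixed costs dominate there.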


In [38]:
%%prun -s cumulative -l 20 -T 08-n_samples_large_100000.txt
func()

 
*** Profile printout saved to text file '08-n_samples_large_100000.txt'. 


         9676 function calls in 8.328 seconds

   Ordered by: cumulative time
   List reduced from 120 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    8.328    8.328 {built-in method builtins.exec}
        1    0.000    0.000    8.328    8.328 <string>:1(<module>)
        1    0.000    0.000    8.328    8.328 1517976664.py:1(func)
       10    0.008    0.001    8.328    0.833 coef.py:252(cm)
      225    0.002    0.000    8.288    0.037 threading.py:280(wait)
      900    8.286    0.009    8.286    0.009 {method 'acquire' of '_thread.lock' objects}
       10    0.001    0.000    5.853    0.585 coef.py:387(compute_coef)
       10    0.000    0.000    5.851    0.585 coef.py:380(cdist_func)
       10    0.003    0.000    5.851    0.585 coef.py:168(cdist_parts_parallel)
      100    0.002    0.000    5.836    0.058 _base.py:201(as_completed)
      120    0.001    0.000    5.835    0.049 threading.py:556(w