# Description

Now that `_get_parts` has been optimized based on the previous profiling tests, here we profile the function `cdist_parts` again.

This is the final test, using the fully optimized clustermatch functions.

This notebook was run on my laptop.

# Remove `__pycache__` directories

In [1]:
!echo ${CODE_DIR}

/opt/code


In [2]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print

/opt/code/libs/clustermatch/scipy/__pycache__
/opt/code/libs/clustermatch/sklearn/__pycache__
/opt/code/libs/clustermatch/__pycache__
/opt/code/libs/clustermatch/pytorch/__pycache__


In [3]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -exec rm -rf {} \;

find: ‘/opt/code/libs/clustermatch/scipy/__pycache__’: No such file or directory
find: ‘/opt/code/libs/clustermatch/sklearn/__pycache__’: No such file or directory
find: ‘/opt/code/libs/clustermatch/__pycache__’: No such file or directory
find: ‘/opt/code/libs/clustermatch/pytorch/__pycache__’: No such file or directory


In [4]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print
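The `find` errors in the cell before last are harmless. `find` deletes each matching directory with `rm -rf` and then tries to descend into it, and the empty output of the last cell confirms the cleanup worked. Below is a minimal Python sketch of the same cleanup. It assumes only that the code lives under some root directory, and the helper name `remove_pycache` is hypothetical. It collects all matches before deleting, so it never walks into a removed directory:

```python
import shutil
from pathlib import Path


def remove_pycache(root):
    """Delete every __pycache__ directory under `root` and return how many were removed."""
    # Collect all matches first so we never try to descend into a deleted directory.
    targets = [p for p in Path(root).rglob("__pycache__") if p.is_dir()]
    removed = 0
    for path in targets:
        # A match can already be gone if it was nested inside an earlier target.
        if path.exists():
            shutil.rmtree(path)
            removed += 1
    return removed
```

For example, `remove_pycache(os.environ["CODE_DIR"])` would clean the same tree as the shell cells, assuming `CODE_DIR` is set as it is here.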

# Modules

In [5]:
import numpy as np

from clustermatch.coef import _cm

# Settings

In [6]:
N_REPS = 10

In [7]:
np.random.seed(0)

# Setup

In [8]:
# let numba compile all the code before profiling
_cm(np.random.rand(10), np.random.rand(10))

(array([0.15625]),
 array([[0, 1]], dtype=uint64),
 array([[[0, 1, 1, 0, 0, 1, 0, 1, 1, 0],
         [1, 2, 1, 1, 0, 2, 0, 2, 2, 0]],
 
        [[1, 0, 0, 1, 0, 0, 0, 1, 1, 1],
         [2, 1, 1, 2, 0, 0, 0, 2, 1, 2]]], dtype=int16))
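The warm-up call matters because numba compiles a jitted function without explicit signatures lazily, on its first call for a given set of argument types. An unwarmed first call would therefore add compilation time to the profile. The warm-up here uses float64 arrays from `np.random.rand`, the same type as the `x` and `y` profiled below. The following is a generic sketch of the warm-up-then-time pattern. The helper `time_after_warmup` is hypothetical and not part of clustermatch:

```python
import time


def time_after_warmup(fn, *args, n_reps=10):
    """Call fn once untimed, then return the mean wall-clock seconds per call over n_reps calls."""
    # Untimed warm-up: absorbs one-time costs such as numba JIT compilation.
    fn(*args)
    start = time.perf_counter()
    for _ in range(n_reps):
        fn(*args)
    return (time.perf_counter() - start) / n_reps
```

With the objects in this notebook, that would be `time_after_warmup(_cm, x, y, n_reps=N_REPS)`.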

# Run with `n_samples` small

In [9]:
N_SAMPLES = 100

In [10]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [11]:
def func():
    for i in range(N_REPS):
        _cm(x, y)

In [12]:
%%timeit func()
func()

34 ms ± 1.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
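In cell mode, `%%timeit` treats the statement on its first line as untimed setup code and times the cell body. Here that means `func()` also runs as setup before the timed runs. Each reported "loop" is one `func()` call, i.e. `N_REPS` = 10 calls to `_cm`, so 34 ms per loop is about 3.4 ms per `_cm` call. Outside IPython, a rough equivalent with the standard `timeit` module might look like the sketch below. The helper name is hypothetical, and IPython's exact statistics may differ slightly:

```python
import statistics
import timeit


def timeit_like(fn, repeat=7, number=10):
    """Return (mean, std) of per-loop seconds over `repeat` runs of `number` loops each."""
    # timeit.repeat accepts a callable and returns the total time of each run.
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_loop = [total / number for total in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)
```

`timeit_like(func)` would mirror the "7 runs, 10 loops each" setting reported above.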


In [13]:
%%prun -s cumulative -l 20 -T 08-n_samples_small.txt
func()

 
*** Profile printout saved to text file '08-n_samples_small.txt'. 


         9045 function calls in 0.052 seconds

   Ordered by: cumulative time
   List reduced from 116 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.052    0.052 {built-in method builtins.exec}
        1    0.000    0.000    0.052    0.052 <string>:1(<module>)
        1    0.000    0.000    0.052    0.052 1670587376.py:1(func)
       10    0.001    0.000    0.052    0.005 coef.py:214(_cm)
       10    0.001    0.000    0.044    0.004 coef.py:317(compute_coef)
       10    0.000    0.000    0.042    0.004 coef.py:312(<lambda>)
       10    0.003    0.000    0.042    0.004 coef.py:168(cdist_parts_parallel)
      204    0.001    0.000    0.035    0.000 threading.py:280(wait)
      818    0.034    0.000    0.034    0.000 {method 'acquire' of '_thread.lock' objects}
      105    0.000    0.000    0.034    0.000 threading.py:556(wait)
      100    0.001    0.000    0.032    0.000 _base.py:201(as_complet
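The `%%prun` options used here do three things: `-s cumulative` sorts the report by cumulative time, `-l 20` keeps only the first 20 rows, and `-T` saves the printout to a text file. The sketch below builds a similar report with the standard `cProfile` and `pstats` modules for use outside IPython. The function name and its arguments are our own, not an IPython API:

```python
import cProfile
import io
import pstats


def profile_to_file(fn, path, sort="cumulative", limit=20):
    """Profile one call of fn, write the sorted and truncated report to path, and return it."""
    profiler = cProfile.Profile()
    profiler.runcall(fn)  # profile a single call, like the body of a %%prun cell
    stream = io.StringIO()
    # Sort by the chosen key (like -s) and print at most `limit` rows (like -l).
    pstats.Stats(profiler, stream=stream).sort_stats(sort).print_stats(limit)
    report = stream.getvalue()
    # Save the printout as plain text (like -T).
    with open(path, "w") as fh:
        fh.write(report)
    return report
```

For example, `profile_to_file(func, "08-n_samples_small.txt")` would produce a report in the same layout as the one above.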

# Run with `n_samples` large

In [14]:
N_SAMPLES = 100000

In [15]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [16]:
def func():
    for i in range(N_REPS):
        _cm(x, y)

In [17]:
%%timeit func()
func()

7.15 s ± 703 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
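A quick comparison of the two `%%timeit` results follows, using only the numbers measured in this notebook. Each loop is one `func()` call, i.e. 10 calls to `_cm`:

```python
N_REPS = 10
small_per_loop_s = 34e-3  # n_samples = 100
large_per_loop_s = 7.15   # n_samples = 100000

# Convert per-loop seconds into milliseconds per single _cm call.
small_per_call_ms = small_per_loop_s / N_REPS * 1000
large_per_call_ms = large_per_loop_s / N_REPS * 1000
slowdown = large_per_call_ms / small_per_call_ms

print(f"{small_per_call_ms:.1f} ms -> {large_per_call_ms:.0f} ms per _cm call "
      f"({slowdown:.0f}x slower for 1000x more samples)")
# prints: 3.4 ms -> 715 ms per _cm call (210x slower for 1000x more samples)
```

This is a two-point comparison on a single machine, so it should not be read as a general scaling law.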


In [18]:
%%prun -s cumulative -l 20 -T 08-n_samples_large.txt
func()

 
*** Profile printout saved to text file '08-n_samples_large.txt'. 


         9434 function calls in 6.916 seconds

   Ordered by: cumulative time
   List reduced from 116 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    6.916    6.916 {built-in method builtins.exec}
        1    0.000    0.000    6.916    6.916 <string>:1(<module>)
        1    0.000    0.000    6.916    6.916 1670587376.py:1(func)
       10    0.007    0.001    6.915    0.692 coef.py:214(_cm)
      221    0.002    0.000    6.883    0.031 threading.py:280(wait)
      892    6.882    0.008    6.882    0.008 {method 'acquire' of '_thread.lock' objects}
       10    0.001    0.000    4.809    0.481 coef.py:317(compute_coef)
       10    0.000    0.000    4.808    0.481 coef.py:312(<lambda>)
       10    0.002    0.000    4.807    0.481 coef.py:168(cdist_parts_parallel)
      120    0.001    0.000    4.796    0.040 threading.py:556(wait)
      100    0.001    0.000    4.795    0.048 _base.py:201(as_complet
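In both profiles the largest `tottime` entry is `acquire` of `_thread.lock`: 0.034 of 0.052 s in the small run, and 6.882 of 6.916 s (over 99%) in the large run. It is reached through the `threading.py` `wait` calls and the truncated `_base.py:201(as_complet…` row, which appears to be `as_completed` from `concurrent.futures`. This is what a profile of the calling thread looks like when that thread is mostly blocked, waiting for parallel work to finish. The minimal sketch below reproduces the pattern. It assumes a thread pool, with `time.sleep` standing in for the real work, and all names in it are hypothetical rather than clustermatch's code:

```python
import cProfile
import pstats
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def work(seconds):
    # Placeholder for the real per-task computation done in a worker thread.
    time.sleep(seconds)
    return seconds


def run_parallel(n_tasks=4, seconds=0.05):
    # Submit tasks to a thread pool and collect results as they finish.
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        futures = [pool.submit(work, seconds) for _ in range(n_tasks)]
        return [f.result() for f in as_completed(futures)]


def lock_wait_seconds(fn):
    """Profile fn and return the total tottime recorded for lock `acquire` calls."""
    profiler = cProfile.Profile()
    profiler.runcall(fn)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (primitive calls, calls, tottime, cumtime, callers)
    return sum(
        tottime
        for (_, _, name), (_, _, tottime, _, _) in stats.stats.items()
        if "acquire" in name
    )
```

In a report like the ones above, the time the caller spends blocked in `as_completed` shows up as `acquire` time.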