# Description

Establishes a performance baseline by profiling the non-optimized (pure Python) version of `ccc`, for comparison with optimized versions.

# Remove pycache dir

In [1]:
!echo ${CODE_DIR}

/opt/code


In [2]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print

/opt/code/libs/clustermatch/__pycache__


In [3]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -prune -exec rm -rf {} \;


In [4]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print
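
The same cleanup can be done from Python instead of shelling out to `find`. The following is a minimal sketch, not part of the original notebook: `remove_pycache` is a hypothetical helper name, and the demonstration runs on a temporary directory rather than on the real `CODE_DIR`.

```python
import shutil
import tempfile
from pathlib import Path


def remove_pycache(root):
    """Delete every __pycache__ directory under root; return the removed paths."""
    # Collect the matches first so the tree is not modified while it is walked.
    targets = [p for p in Path(root).rglob("__pycache__") if p.is_dir()]
    for target in targets:
        shutil.rmtree(target, ignore_errors=True)
    return sorted(str(t) for t in targets)


# Demonstration on a throwaway directory (not the real CODE_DIR).
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "libs" / "pkg" / "__pycache__"
    cache.mkdir(parents=True)
    (cache / "mod.cpython-39.pyc").write_bytes(b"")
    removed = remove_pycache(tmp)
    remaining = list(Path(tmp).rglob("__pycache__"))
```

Collecting the matches before deleting avoids walking into a directory that has just been removed.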

# Modules

In [5]:
import numpy as np

from ccc.coef import _cm

# Settings

In [6]:
N_REPS = 10

In [7]:
np.random.seed(0)

# Setup

In [8]:
# let numba compile all the code before profiling
_cm.py_func(np.random.rand(10), np.random.rand(10))

(array([0.15625]),
 array([[0, 1]], dtype=uint64),
 [array([[0, 1, 1, 0, 0, 1, 0, 1, 1, 0],
         [1, 2, 1, 1, 0, 2, 0, 2, 2, 0]], dtype=int8),
  array([[1, 0, 0, 1, 0, 0, 0, 1, 1, 1],
         [2, 1, 1, 2, 0, 0, 0, 2, 1, 2]], dtype=int8)])

# Run with `n_samples` small

In [9]:
N_SAMPLES = 100

In [10]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [11]:
def func():
    for i in range(N_REPS):
        # py_func accesses the original python function, not the numba-optimized one
        # this is needed to be able to profile the function
        _cm.py_func(x, y)

In [12]:
%%timeit -n1 -r1 func()
func()

28 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [13]:
%%prun -s cumulative -l 20 -T 00-n_samples_small.txt
func()

 
*** Profile printout saved to text file '00-n_samples_small.txt'. 


         154 function calls in 0.034 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.034    0.034 {built-in method builtins.exec}
        1    0.000    0.000    0.034    0.034 <string>:1(<module>)
        1    0.000    0.000    0.034    0.034 1556911885.py:1(func)
       10    0.001    0.000    0.034    0.003 coef.py:265(_cm)
       10    0.024    0.002    0.024    0.002 coef.py:198(cdist_parts)
       20    0.009    0.000    0.009    0.000 coef.py:169(_get_parts)
       10    0.000    0.000    0.000    0.000 {method 'argmax' of 'numpy.ndarray' objects}
       20    0.000    0.000    0.000    0.000 coef.py:119(_get_range_n_clusters)
       20    0.000    0.000    0.000    0.000 {built-in method numpy.zeros}
       10    0.000    0.000    0.000    0.000 {built-in method numpy.empty}
       10    0.000    0.000    0.000    0.000 coef.py:248(unravel_index_2d)
       10    0.000    0.000    0.

The bottleneck functions are, in order of total time spent:
1. `cdist_parts` (0.024 s of 0.034 s)
2. `_get_parts` (0.009 s)
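
Outside of Jupyter, a comparable report can be produced with the standard-library `cProfile` and `pstats` modules. This is a rough sketch of what `%%prun -s cumulative -l 20` does, not the notebook's own code: `profile_call` and `busy_work` are hypothetical names, and `busy_work` merely stands in for `func()`.

```python
import cProfile
import io
import pstats


def profile_call(fn, limit=20):
    """Run fn() under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Equivalent in spirit to "-s cumulative -l 20": sort, then keep the top rows.
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def busy_work():
    # Hypothetical stand-in for func(); it just burns a little CPU.
    return sum(i * i for i in range(10_000))


report = profile_call(busy_work)
print(report)
```

Writing `report` to a text file would play the role of the `-T` option used in the cells above.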

# Run with `n_samples` large

In [14]:
N_SAMPLES = 100000

In [15]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [16]:
def func():
    for i in range(N_REPS):
        # py_func accesses the original python function, not the numba-optimized one
        # this is needed to be able to profile the function
        _cm.py_func(x, y)

In [17]:
%%timeit -n1 -r1 func()
func()

18.7 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [18]:
%%prun -s cumulative -l 20 -T 00-n_samples_large.txt
func()

 
*** Profile printout saved to text file '00-n_samples_large.txt'. 


         154 function calls in 18.817 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   18.817   18.817 {built-in method builtins.exec}
        1    0.000    0.000   18.817   18.817 <string>:1(<module>)
        1    0.004    0.004   18.817   18.817 1556911885.py:1(func)
       10    0.008    0.001   18.813    1.881 coef.py:265(_cm)
       20   10.568    0.528   10.568    0.528 coef.py:169(_get_parts)
       10    8.237    0.824    8.237    0.824 coef.py:198(cdist_parts)
       20    0.001    0.000    0.001    0.000 {built-in method numpy.zeros}
       10    0.000    0.000    0.000    0.000 {method 'argmax' of 'numpy.ndarray' objects}
       20    0.000    0.000    0.000    0.000 coef.py:119(_get_range_n_clusters)
       10    0.000    0.000    0.000    0.000 {built-in method numpy.empty}
       10    0.000    0.000    0.000    0.000 special.py:18(__new__)
       10    0.000    0.000    0.000   

With large `n_samples`, the same two functions dominate, but their order is **reversed**:
1. `_get_parts` (10.568 s of 18.817 s)
2. `cdist_parts` (8.237 s)
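
To make the shift between the two runs easier to see, the share of profiled time spent in each function can be computed from the `tottime` columns reported above. This is an illustrative sketch only: the numbers are copied by hand from the two printouts, and the `shares` helper is a hypothetical name.

```python
# tottime values (seconds) copied from the two profile printouts above.
profiles = {
    "small (n_samples=100)": {
        "total": 0.034,
        "cdist_parts": 0.024,
        "_get_parts": 0.009,
    },
    "large (n_samples=100000)": {
        "total": 18.817,
        "cdist_parts": 8.237,
        "_get_parts": 10.568,
    },
}


def shares(profile):
    """Return each function's percentage of the total profiled time."""
    total = profile["total"]
    return {
        name: round(100 * seconds / total, 1)
        for name, seconds in profile.items()
        if name != "total"
    }


summary = {label: shares(profile) for label, profile in profiles.items()}
for label, result in summary.items():
    print(label, result)
```

With these inputs, `cdist_parts` accounts for roughly 71% of the time in the small run but about 44% in the large run, while `_get_parts` grows from roughly 26% to about 56%.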