# Description

`_get_parts` was optimized in the previous profiling tests.

Here we profile the `cdist_parts` function again.

This is the final test, run with the fully optimized `ccc` functions.

This notebook was run on my laptop.

# Remove pycache dir

In [1]:
!echo ${CODE_DIR}

/opt/code


In [2]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print

/opt/code/libs/clustermatch/scipy/__pycache__
/opt/code/libs/clustermatch/sklearn/__pycache__
/opt/code/libs/clustermatch/__pycache__
/opt/code/libs/clustermatch/pytorch/__pycache__


In [3]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -exec rm -rf {} \;

find: ‘/opt/code/libs/clustermatch/scipy/__pycache__’: No such file or directory
find: ‘/opt/code/libs/clustermatch/sklearn/__pycache__’: No such file or directory
find: ‘/opt/code/libs/clustermatch/__pycache__’: No such file or directory
find: ‘/opt/code/libs/clustermatch/pytorch/__pycache__’: No such file or directory


In [4]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print
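
The `No such file or directory` messages printed by the `rm -rf` cell are harmless: `find` tries to descend into each `__pycache__` directory after `rm` has already deleted it, and the final `-print` cell confirms nothing is left. For reference, the same cleanup can be sketched in pure Python. This is a minimal sketch, not part of the original notebook; `remove_pycache_dirs` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def remove_pycache_dirs(root):
    """Delete every __pycache__ directory under `root` and return their paths.

    Hypothetical helper (not part of the ccc code base).
    """
    root = Path(root)
    # Collect the targets first so that deleting directories does not
    # interfere with the directory walk itself.
    targets = [p for p in root.rglob("__pycache__") if p.is_dir()]
    for path in targets:
        # ignore_errors covers the case where a parent was already removed.
        shutil.rmtree(path, ignore_errors=True)
    return targets
```

In this notebook it could be called as `remove_pycache_dirs(os.environ["CODE_DIR"])`, assuming `CODE_DIR` is set as in the first cell.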

# Modules

In [5]:
import numpy as np

from ccc.coef import ccc

# Settings

In [6]:
N_REPS = 10

In [7]:
np.random.seed(0)

# Setup

In [8]:
# let numba compile all the code before profiling
ccc(np.random.rand(10), np.random.rand(10))

0.15625

# Run with small `n_samples`

## `n_samples=50`

In [9]:
N_SAMPLES = 50

In [10]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [11]:
def func():
    for i in range(N_REPS):
        ccc(x, y)

In [12]:
%%timeit func()
func()

17.3 ms ± 647 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
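
In cell mode, `%%timeit func()` treats the statement on the first line as setup code (run but not timed) and times only the cell body. Outside IPython, a rough equivalent can be sketched with the standard-library `timeit` module. The helper name `time_like_cell_magic` and the stand-in `func` below are assumptions for illustration; in the notebook, `func` calls `ccc(x, y)` `N_REPS` times.

```python
import statistics
import timeit


def time_like_cell_magic(stmt, setup, repeat=7, number=100):
    """Return (mean, std dev) of seconds per loop over `repeat` runs.

    Hypothetical helper: `setup` runs once per run and is not timed,
    `stmt` runs `number` times per run.
    """
    totals = timeit.repeat(stmt=stmt, setup=setup, repeat=repeat, number=number)
    per_loop = [total / number for total in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


def func():
    # Stand-in workload; replace with the notebook's func() to time ccc.
    return sum(i * i for i in range(1000))


if __name__ == "__main__":
    mean_s, std_s = time_like_cell_magic(func, func, repeat=7, number=100)
    print(f"{mean_s * 1e3:.3f} ms ± {std_s * 1e6:.1f} µs per loop")
```

The defaults `repeat=7` and `number=100` mirror the "7 runs, 100 loops each" reported above; IPython picks the loop count automatically, so the other cells report different values.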


In [13]:
%%prun -s cumulative -l 20 -T 08-n_samples_small_50.txt
func()

 
*** Profile printout saved to text file '08-n_samples_small_50.txt'. 


         6815 function calls in 0.028 seconds

   Ordered by: cumulative time
   List reduced from 120 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.028    0.028 {built-in method builtins.exec}
        1    0.000    0.000    0.028    0.028 <string>:1(<module>)
        1    0.000    0.000    0.028    0.028 1517976664.py:1(func)
       10    0.001    0.000    0.028    0.003 coef.py:275(cm)
       10    0.001    0.000    0.020    0.002 coef.py:414(compute_coef)
       10    0.000    0.000    0.019    0.002 coef.py:407(cdist_func)
       10    0.002    0.000    0.019    0.002 coef.py:168(cdist_parts_parallel)
      132    0.001    0.000    0.015    0.000 threading.py:280(wait)
      540    0.015    0.000    0.015    0.000 {method 'acquire' of '_thread.lock' objects}
       65    0.000    0.000    0.014    0.000 threading.py:556(wait)
       70    0.000    0.000    0.012    0.000 _base.py:201(as_comple
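
The `%%prun -s cumulative -l 20 -T <file>` cells sort by cumulative time, keep the first 20 entries, and save the printout to a text file. A rough standard-library equivalent can be sketched with `cProfile` and `pstats`. The helper name `profile_to_text` is hypothetical, and the `strip_dirs()` call is an assumption made to match the short file names shown in the tables.

```python
import cProfile
import io
import pstats


def profile_to_text(func, limit=20, out_path=None):
    """Profile `func()` and return a report sorted by cumulative time.

    Hypothetical helper approximating `%%prun -s cumulative -l <limit> -T <out_path>`.
    """
    profiler = cProfile.Profile()
    profiler.runcall(func)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # strip_dirs() shortens paths such as /opt/code/.../coef.py to coef.py.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)

    report = buffer.getvalue()
    if out_path is not None:
        with open(out_path, "w") as fh:
            fh.write(report)
    return report


def work():
    # Stand-in workload; in the notebook this would be func().
    return sorted(str(i) for i in range(2000))


if __name__ == "__main__":
    print(profile_to_text(work, limit=20))
```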

## `n_samples=100`

In [14]:
N_SAMPLES = 100

In [15]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [16]:
def func():
    for i in range(N_REPS):
        ccc(x, y)

In [17]:
%%timeit func()
func()

34 ms ± 878 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [18]:
%%prun -s cumulative -l 20 -T 08-n_samples_small_100.txt
func()

 
*** Profile printout saved to text file '08-n_samples_small_100.txt'. 


         9175 function calls in 0.046 seconds

   Ordered by: cumulative time
   List reduced from 120 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.046    0.046 {built-in method builtins.exec}
        1    0.000    0.000    0.046    0.046 <string>:1(<module>)
        1    0.000    0.000    0.046    0.046 1517976664.py:1(func)
       10    0.001    0.000    0.045    0.005 coef.py:275(cm)
       10    0.001    0.000    0.037    0.004 coef.py:414(compute_coef)
       10    0.000    0.000    0.036    0.004 coef.py:407(cdist_func)
       10    0.002    0.000    0.036    0.004 coef.py:168(cdist_parts_parallel)
      203    0.001    0.000    0.030    0.000 threading.py:280(wait)
      810    0.029    0.000    0.029    0.000 {method 'acquire' of '_thread.lock' objects}
      100    0.000    0.000    0.028    0.000 threading.py:556(wait)
      100    0.001    0.000    0.027    0.000 _base.py:201(as_comple

## `n_samples=500`

In [19]:
N_SAMPLES = 500

In [20]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [21]:
def func():
    for i in range(N_REPS):
        ccc(x, y)

In [22]:
%%timeit func()
func()

49.8 ms ± 422 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [23]:
%%prun -s cumulative -l 20 -T 08-n_samples_small_500.txt
func()

 
*** Profile printout saved to text file '08-n_samples_small_500.txt'. 


         9391 function calls in 0.062 seconds

   Ordered by: cumulative time
   List reduced from 120 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.062    0.062 {built-in method builtins.exec}
        1    0.000    0.000    0.062    0.062 <string>:1(<module>)
        1    0.000    0.000    0.062    0.062 1517976664.py:1(func)
       10    0.001    0.000    0.062    0.006 coef.py:275(cm)
       10    0.001    0.000    0.048    0.005 coef.py:414(compute_coef)
       10    0.000    0.000    0.047    0.005 coef.py:407(cdist_func)
       10    0.002    0.000    0.047    0.005 coef.py:168(cdist_parts_parallel)
      215    0.001    0.000    0.045    0.000 threading.py:280(wait)
      850    0.045    0.000    0.045    0.000 {method 'acquire' of '_thread.lock' objects}
      108    0.000    0.000    0.040    0.000 threading.py:556(wait)
      100    0.001    0.000    0.039    0.000 _base.py:201(as_comple

## `n_samples=1000`

In [24]:
N_SAMPLES = 1000

In [25]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [26]:
def func():
    for i in range(N_REPS):
        ccc(x, y)

In [27]:
%%timeit func()
func()

69.3 ms ± 2.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [28]:
%%prun -s cumulative -l 20 -T 08-n_samples_small_1000.txt
func()

 
*** Profile printout saved to text file '08-n_samples_small_1000.txt'. 


         9577 function calls in 0.083 seconds

   Ordered by: cumulative time
   List reduced from 120 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.083    0.083 {built-in method builtins.exec}
        1    0.000    0.000    0.083    0.083 <string>:1(<module>)
        1    0.000    0.000    0.083    0.083 1517976664.py:1(func)
       10    0.001    0.000    0.083    0.008 coef.py:275(cm)
      223    0.001    0.000    0.066    0.000 threading.py:280(wait)
      882    0.065    0.000    0.065    0.000 {method 'acquire' of '_thread.lock' objects}
       10    0.001    0.000    0.064    0.006 coef.py:414(compute_coef)
       10    0.000    0.000    0.063    0.006 coef.py:407(cdist_func)
       10    0.002    0.000    0.062    0.006 coef.py:168(cdist_parts_parallel)
      115    0.000    0.000    0.055    0.000 threading.py:556(wait)
      100    0.001    0.000    0.054    0.001 _base.py:201(as_comple

# Run with large `n_samples`

## `n_samples=50000`

In [29]:
N_SAMPLES = 50000

In [30]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [31]:
def func():
    for i in range(N_REPS):
        ccc(x, y)

In [32]:
%%timeit func()
func()

2.47 s ± 110 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [33]:
%%prun -s cumulative -l 20 -T 08-n_samples_large_50000.txt
func()

 
*** Profile printout saved to text file '08-n_samples_large_50000.txt'. 


         9633 function calls in 2.469 seconds

   Ordered by: cumulative time
   List reduced from 120 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.469    2.469 {built-in method builtins.exec}
        1    0.000    0.000    2.469    2.469 <string>:1(<module>)
        1    0.000    0.000    2.469    2.469 1517976664.py:1(func)
       10    0.003    0.000    2.469    0.247 coef.py:275(cm)
      220    0.001    0.000    2.448    0.011 threading.py:280(wait)
      890    2.448    0.003    2.448    0.003 {method 'acquire' of '_thread.lock' objects}
       10    0.001    0.000    1.676    0.168 coef.py:414(compute_coef)
       10    0.000    0.000    1.675    0.168 coef.py:407(cdist_func)
       10    0.001    0.000    1.675    0.167 coef.py:168(cdist_parts_parallel)
      120    0.000    0.000    1.668    0.014 threading.py:556(wait)
      100    0.001    0.000    1.667    0.017 _base.py:201(as_comple
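
In the table above, almost all of the time is attributed to `{method 'acquire' of '_thread.lock' objects}` and `threading.py` `wait`. One likely reading (an assumption, the notebook does not state it) is that the actual computation runs in worker threads: `cProfile` records only the thread it was started in, so the main thread appears to spend its time waiting on locks. The sketch below reproduces that pattern with a thread pool and a stand-in task; `slow_task` and `run_in_threads` are hypothetical names, not part of `ccc`.

```python
import cProfile
import io
import pstats
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_task(seconds):
    # Stand-in for work done in a worker thread.
    time.sleep(seconds)
    return seconds


def run_in_threads(n_tasks=4, seconds=0.02):
    # The main thread only submits tasks and waits for their results.
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(slow_task, seconds) for _ in range(n_tasks)]
        return [future.result() for future in as_completed(futures)]


def profile_threaded_work(limit=15):
    """Profile run_in_threads() from the main thread and return the report."""
    profiler = cProfile.Profile()
    profiler.runcall(run_in_threads)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    # slow_task is absent from the report because it runs in worker threads;
    # lock acquisition in the main thread dominates instead.
    print(profile_threaded_work())
```

If this reading is right, seeing where the time goes inside the workers would require profiling within the worker threads or running the computation single-threaded.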

## `n_samples=100000`

In [34]:
N_SAMPLES = 100000

In [35]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [36]:
def func():
    for i in range(N_REPS):
        ccc(x, y)

In [37]:
%%timeit func()
func()

5.13 s ± 115 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [38]:
%%prun -s cumulative -l 20 -T 08-n_samples_large_100000.txt
func()

 
*** Profile printout saved to text file '08-n_samples_large_100000.txt'. 


         9647 function calls in 5.917 seconds

   Ordered by: cumulative time
   List reduced from 120 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.917    5.917 {built-in method builtins.exec}
        1    0.000    0.000    5.917    5.917 <string>:1(<module>)
        1    0.000    0.000    5.917    5.917 1517976664.py:1(func)
       10    0.005    0.001    5.917    0.592 coef.py:275(cm)
      222    0.001    0.000    5.890    0.027 threading.py:280(wait)
      894    5.889    0.007    5.889    0.007 {method 'acquire' of '_thread.lock' objects}
       10    0.001    0.000    4.013    0.401 coef.py:414(compute_coef)
       10    0.000    0.000    4.011    0.401 coef.py:407(cdist_func)
       10    0.002    0.000    4.011    0.401 coef.py:168(cdist_parts_parallel)
      120    0.000    0.000    4.002    0.033 threading.py:556(wait)
      100    0.001    0.000    4.001    0.040 _base.py:201(as_comple
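
Each `%%timeit` figure above covers `N_REPS = 10` calls to `ccc`, so the time per single call is the reported value divided by 10. The snippet below is a small sketch that does this arithmetic for the figures copied from the `%%timeit` outputs in this notebook; `per_call_ms` is a hypothetical helper name.

```python
N_REPS = 10

# Mean time per func() call in milliseconds, copied from the %%timeit outputs.
timeit_ms = {
    50: 17.3,
    100: 34.0,
    500: 49.8,
    1000: 69.3,
    50_000: 2470.0,
    100_000: 5130.0,
}


def per_call_ms(total_ms, n_reps=N_REPS):
    """Average time of a single ccc call, given the time of n_reps calls."""
    return total_ms / n_reps


if __name__ == "__main__":
    for n_samples, total_ms in timeit_ms.items():
        print(f"n_samples={n_samples:>6}: {per_call_ms(total_ms):8.2f} ms per ccc call")
```

For the two large cases, doubling `n_samples` from 50000 to 100000 roughly doubles the measured time (2.47 s versus 5.13 s for 10 calls).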