# Description

`cdist_parts` has already been optimized in previous profiling tests.

Here we profile the function `_get_parts`.

I disabled njit in `_get_parts` and `run_quantile_clustering` so that they can be profiled.

I also tried `scipy.stats.rankdata` instead of the `rank` function I wrote.

# Remove pycache dir

In [1]:
!echo ${CODE_DIR}

/opt/code


In [2]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print

/opt/code/libs/clustermatch/__pycache__


In [3]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -exec rm -rf {} \;

find: ‘/opt/code/libs/clustermatch/__pycache__’: No such file or directory


In [4]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print
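
The "No such file or directory" message above is harmless: `find` removes the directory through `rm -rf` and then tries to descend into it. The empty output of the last command confirms the directory is gone. As a minimal sketch of the same cleanup in Python, the helper below (`remove_pycache` is a hypothetical name, and the demo runs on a throwaway temporary tree rather than `/opt/code`) collects the matches first and only then deletes them:

```python
import shutil
import tempfile
from pathlib import Path


def remove_pycache(root):
    """Delete every __pycache__ directory under root; return the removed paths."""
    # Collect the matches first so the tree is not modified while it is walked.
    targets = sorted(p for p in Path(root).rglob("__pycache__") if p.is_dir())
    removed = []
    for path in targets:
        if path.exists():  # a nested match may already have been deleted
            shutil.rmtree(path)
            removed.append(path)
    return removed


# Demo on a temporary tree (hypothetical layout, used only for illustration)
demo_root = Path(tempfile.mkdtemp())
cache_dir = demo_root / "libs" / "pkg" / "__pycache__"
cache_dir.mkdir(parents=True)
(cache_dir / "mod.pyc").write_bytes(b"")

removed = remove_pycache(demo_root)
remaining = list(demo_root.rglob("__pycache__"))
print(len(removed), len(remaining))

shutil.rmtree(demo_root)  # clean up the demo tree
```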

# Modules

In [5]:
import numpy as np

from clustermatch.coef import _cm

# Settings

In [6]:
N_REPS = 10

In [7]:
np.random.seed(0)

# Setup

In [8]:
# let numba compile all the code before profiling
_cm(np.random.rand(10), np.random.rand(10))

(array([0.15625]),
 array([[0, 1]], dtype=uint64),
 array([[[0, 1, 1, 0, 0, 1, 0, 1, 1, 0],
         [1, 2, 1, 1, 0, 2, 0, 2, 2, 0]],
 
        [[1, 0, 0, 1, 0, 0, 0, 1, 1, 1],
         [2, 1, 1, 2, 0, 0, 0, 2, 1, 2]]], dtype=int16))

# Run with `n_samples` small

In [9]:
N_SAMPLES = 100

In [10]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [11]:
def func():
    for i in range(N_REPS):
        # njit is disabled in _get_parts and run_quantile_clustering (see Description),
        # so _cm is called directly and its callees show up in the profile
        _cm(x, y)

In [12]:
%%timeit -n1 -r1 func()
func()

37.5 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [13]:
%%prun -s cumulative -l 20 -T 06-n_samples_small.txt
func()

 
*** Profile printout saved to text file '06-n_samples_small.txt'. 


         21654 function calls in 0.059 seconds

   Ordered by: cumulative time
   List reduced from 64 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.059    0.059 {built-in method builtins.exec}
        1    0.000    0.000    0.059    0.059 <string>:1(<module>)
        1    0.000    0.000    0.059    0.059 691993785.py:1(func)
       10    0.001    0.000    0.059    0.006 coef.py:251(_cm)
       20    0.001    0.000    0.036    0.002 coef.py:154(_get_parts)
      180    0.006    0.000    0.032    0.000 coef.py:63(run_quantile_clustering)
      180    0.005    0.000    0.022    0.000 stats.py:8631(rankdata)
       10    0.022    0.002    0.022    0.002 coef.py:183(cdist_parts)
      360    0.004    0.000    0.012    0.000 index_tricks.py:323(__getitem__)
     1620    0.002    0.000    0.010    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
       20    0.000    0.000  

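In this case (small number of samples), `cdist_parts` is still the most time-consuming function by `tottime` (0.022 s), followed by `run_quantile_clustering` (0.006 s) and `rankdata` (0.005 s), which replaces `rank` in this run.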
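
The listing above is ordered by cumulative time, while the comparison is about `tottime` (time spent in the function itself, excluding callees). The sketch below shows how a profile can be sorted by `tottime` with the standard `cProfile` and `pstats` modules. `busy_sum` and `workload` are placeholder functions standing in for `func()`, not part of clustermatch. In the notebook, passing `-s tottime` to `%%prun` should give the same ordering.

```python
import cProfile
import io
import pstats


def busy_sum(n):
    # Placeholder workload; stands in for the code being profiled
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    return [busy_sum(20_000) for _ in range(5)]


profiler = cProfile.Profile()
profiler.runcall(workload)

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# "tottime" excludes time spent in callees; "cumulative" includes it
stats.sort_stats("tottime").print_stats(5)

report = buffer.getvalue()
print(report)
```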

# Run with `n_samples` large

In [14]:
N_SAMPLES = 100000

In [15]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [16]:
def func():
    for i in range(N_REPS):
        # njit is disabled in _get_parts and run_quantile_clustering (see Description),
        # so _cm is called directly and its callees show up in the profile
        _cm(x, y)

In [17]:
%%timeit -n1 -r1 func()
func()

7.14 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


In [18]:
%%prun -s cumulative -l 20 -T 06-n_samples_large.txt
func()

 
*** Profile printout saved to text file '06-n_samples_large.txt'. 


         21654 function calls in 7.834 seconds

   Ordered by: cumulative time
   List reduced from 64 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    7.834    7.834 {built-in method builtins.exec}
        1    0.000    0.000    7.834    7.834 <string>:1(<module>)
        1    0.000    0.000    7.834    7.834 691993785.py:1(func)
       10    0.016    0.002    7.834    0.783 coef.py:251(_cm)
       20    0.015    0.001    3.914    0.196 coef.py:154(_get_parts)
       10    3.902    0.390    3.902    0.390 coef.py:183(cdist_parts)
      180    0.134    0.001    3.602    0.020 coef.py:63(run_quantile_clustering)
      360    3.045    0.008    3.045    0.008 {method 'argsort' of 'numpy.ndarray' objects}
      180    0.300    0.002    1.943    0.011 stats.py:8631(rankdata)
     1620    0.019    0.000    1.858    0.001 {built-in method numpy.core._multiarray_umath.implement_array_function}
      540    0.0

**Large improvement** using `scipy.stats.rankdata`. The current `rank` function needs optimization. In this run, `cdist_parts` (3.902 s) and `argsort` (3.045 s) account for most of the `tottime`.
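
As one possible starting point for that optimization, the sketch below computes ranks with a single `argsort` followed by an inverse permutation. It is not the `rank` function from clustermatch, and it assumes the input has no ties, which holds in practice for `np.random.rand` data. With ties it produces ordinal ranks, which differ from the average ranks that `rankdata` returns by default. `rank_no_ties` is a hypothetical name.

```python
import numpy as np


def rank_no_ties(data):
    """Return 1-based ranks of a 1-D array, assuming all values are distinct."""
    order = np.argsort(data)  # indices that would sort the data
    ranks = np.empty(data.shape[0], dtype=np.int64)
    # Invert the permutation: the element at sorted position k gets rank k + 1
    ranks[order] = np.arange(1, data.shape[0] + 1)
    return ranks


values = np.array([0.3, 0.1, 0.4, 0.2])
print(rank_no_ties(values))  # [3 1 4 2]
```

Whether this helps would need to be checked with the same profiling setup, since `argsort` itself remains the dominant cost for large inputs.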