# Description

This notebook runs exactly the same code as `09` (`09-cdist_parts_v04`), but with numba's JIT compilation disabled.

# Disable numba

In [1]:
%env NUMBA_DISABLE_JIT=1

env: NUMBA_DISABLE_JIT=1
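Outside a notebook, the same switch can be set from plain Python. The sketch below is a minimal version that assumes the variable is set before numba is first imported, since numba reads its configuration at import time. `jit_disabled` is a hypothetical helper added only for illustration.

```python
import os

# Set the flag before the first `import numba` (direct or indirect);
# this mirrors the `%env NUMBA_DISABLE_JIT=1` magic used above.
os.environ["NUMBA_DISABLE_JIT"] = "1"


def jit_disabled() -> bool:
    """Hypothetical helper: True when the environment asks numba to skip JIT."""
    return os.environ.get("NUMBA_DISABLE_JIT", "0") not in ("", "0")


print(jit_disabled())
```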


# Remove pycache dir

In [2]:
!echo ${CODE_DIR}

/opt/code


In [3]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print

/opt/code/libs/clustermatch/__pycache__
/opt/code/libs/clustermatch/sklearn/__pycache__
/opt/code/libs/clustermatch/scipy/__pycache__
/opt/code/libs/clustermatch/pytorch/__pycache__


In [4]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -prune -exec rm -rf {} \;

In [5]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print
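The `find ... -exec rm -rf` step can also be done in Python, which may be handy where shell magics are not available. This is a sketch only: `remove_pycache` is a hypothetical helper, and the root directory is whatever `CODE_DIR` points to.

```python
import shutil
from pathlib import Path


def remove_pycache(root):
    """Delete every `__pycache__` directory under `root` and return their paths."""
    # Collect the targets first so nothing is deleted while the tree is walked.
    targets = sorted(p for p in Path(root).rglob("__pycache__") if p.is_dir())
    for path in targets:
        shutil.rmtree(path, ignore_errors=True)
    return targets


# Example call (assumes CODE_DIR is set in the environment):
# import os; remove_pycache(os.environ["CODE_DIR"])
```

Running it a second time should return an empty list, which matches the empty output of the second `find` call above.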

# Modules

In [6]:
import numpy as np

from ccc.coef import ccc

# Settings

In [7]:
N_REPS = 10

In [8]:
np.random.seed(0)

# Setup

In [9]:
# warm-up call kept from `09`; with NUMBA_DISABLE_JIT=1 there is nothing for numba to compile
ccc(np.random.rand(10), np.random.rand(10))

0.15625

# Run with `n_samples` small

## `n_samples=50`

In [10]:
N_SAMPLES = 50

In [11]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [12]:
def func():
    for i in range(N_REPS):
        ccc(x, y)

In [13]:
%%timeit func()
func()

141 ms ± 14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
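In cell mode, the code on the `%%timeit` line itself is setup code that runs but is not timed, so the `func()` on that line acts as one extra untimed warm-up call. A rough equivalent with the standard `timeit` module is sketched below. `workload` is a stand-in for `ccc(x, y)` so the block runs on its own, and the 7 runs of 10 loops are chosen only to mirror the output above.

```python
import statistics
import timeit

N_REPS = 10


def workload():
    # Stand-in for `ccc(x, y)`; replace it with the real call.
    return sum(i * i for i in range(1000))


def func():
    for _ in range(N_REPS):
        workload()


func()  # untimed warm-up, like the setup statement on the `%%timeit` line

n_loops = 10
runs = timeit.repeat(func, repeat=7, number=n_loops)  # total seconds per run
per_loop = [t / n_loops for t in runs]

mean_ms = statistics.mean(per_loop) * 1e3
std_ms = statistics.stdev(per_loop) * 1e3
print(f"{mean_ms:.3f} ms ± {std_ms:.3f} ms per loop "
      f"(mean ± std. dev. of {len(runs)} runs, {n_loops} loops each)")
```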


In [14]:
%%prun -s cumulative -l 20 -T 10-n_samples_small_50.txt
func()

 
*** Profile printout saved to text file '10-n_samples_small_50.txt'. 
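The `%%prun -s cumulative -l 20` cell can be approximated with `cProfile` and `pstats` from the standard library. The sketch below is only an approximation: `workload` is a stand-in for `func()`, and the report is kept in a string instead of being written to a file.

```python
import cProfile
import io
import pstats


def workload():
    # Stand-in for the `func()` defined above.
    return sorted(str(i) for i in range(20000))


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time and keep at most 20 entries, like `-s cumulative -l 20`.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(20)
report = buffer.getvalue()

print(report)
```

To mimic the `-T` option, `report` could be written to a text file of your choice.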


## `n_samples=100`

In [15]:
N_SAMPLES = 100

In [16]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [17]:
def func():
    for i in range(N_REPS):
        ccc(x, y)

In [18]:
%%timeit func()
func()

428 ms ± 39.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [19]:
%%prun -s cumulative -l 20 -T 10-n_samples_small_100.txt
func()

 
*** Profile printout saved to text file '10-n_samples_small_100.txt'. 


## `n_samples=500`

In [20]:
N_SAMPLES = 500

In [21]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [22]:
def func():
    for i in range(N_REPS):
        ccc(x, y)

In [23]:
%%timeit func()
func()

441 ms ± 24.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [24]:
%%prun -s cumulative -l 20 -T 10-n_samples_small_500.txt
func()

 
*** Profile printout saved to text file '10-n_samples_small_500.txt'. 


## `n_samples=1000`

In [25]:
N_SAMPLES = 1000

In [26]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [27]:
def func():
    for i in range(N_REPS):
        ccc(x, y)

In [28]:
%%timeit func()
func()

858 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [29]:
%%prun -s cumulative -l 20 -T 10-n_samples_small_1000.txt
func()

 
*** Profile printout saved to text file '10-n_samples_small_1000.txt'. 


**CONCLUSION:** as expected, with relatively small samples, the numba-compiled version (`09-cdist_parts_v04`) performs much better than the non-compiled one.

# Run with `n_samples` large

## `n_samples=50000`

In [30]:
N_SAMPLES = 50000

In [31]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [32]:
def func():
    for i in range(N_REPS):
        ccc(x, y)

In [33]:
%%timeit func()
func()

2.23 s ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [34]:
%%prun -s cumulative -l 20 -T 10-n_samples_large_50000.txt
func()

 
*** Profile printout saved to text file '10-n_samples_large_50000.txt'. 


## `n_samples=100000`

In [35]:
N_SAMPLES = 100000

In [36]:
x = np.random.rand(N_SAMPLES)
y = np.random.rand(N_SAMPLES)

In [37]:
def func():
    for i in range(N_REPS):
        ccc(x, y)

In [38]:
%%timeit func()
func()

4.25 s ± 11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [39]:
%%prun -s cumulative -l 20 -T 10-n_samples_large_100000.txt
func()

 
*** Profile printout saved to text file '10-n_samples_large_100000.txt'. 


**CONCLUSION:** this is unexpected. With very large samples, the pure Python version performs better than the numba-compiled one. This is something to look at in the future. The profiling file for 100,000 samples shows that `cdist_parts_parallel` takes more time in the numba-compiled version than in the Python version. The compiled ARI implementation could perhaps be improved for large samples.
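Each `%%timeit` figure in this notebook covers one `func()` call, that is, `N_REPS = 10` calls to `ccc`. The sketch below converts the reported means into a per-call time to make the comparison with `09` easier. It uses only the numbers printed above; the dictionary names are introduced here for illustration.

```python
N_REPS = 10

# Mean `%%timeit` results reported in this notebook, in seconds per `func()` call.
timings = {
    50: 0.141,
    100: 0.428,
    500: 0.441,
    1000: 0.858,
    50000: 2.23,
    100000: 4.25,
}

# Each `func()` call runs `ccc` N_REPS times, so divide to get the time per call.
per_call_ms = {n: t / N_REPS * 1e3 for n, t in timings.items()}
for n, ms in per_call_ms.items():
    print(f"n_samples={n:>6}: {ms:7.1f} ms per ccc call")

# Growth in run time when going from 50,000 to 100,000 samples.
ratio = timings[100000] / timings[50000]
print(f"100000 vs 50000 samples: {ratio:.2f}x")
```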