# Description

Similar to notebook `07`, but with numba disabled to compare against a pure Python implementation.

# Use only one CPU core

In [1]:
%env CM_N_JOBS=1
%env NUMBA_NUM_THREADS=1
%env MKL_NUM_THREADS=1
%env OPENBLAS_NUM_THREADS=1
%env NUMEXPR_NUM_THREADS=1
%env OMP_NUM_THREADS=1

env: CM_N_JOBS=1
env: NUMBA_NUM_THREADS=1
env: MKL_NUM_THREADS=1
env: OPENBLAS_NUM_THREADS=1
env: NUMEXPR_NUM_THREADS=1
env: OMP_NUM_THREADS=1
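
Outside IPython (for example in a plain script), the same single-core setup can be applied through `os.environ`. A minimal sketch, assuming the variables are set before NumPy, numba, or `ccc` are imported, since these libraries typically read thread settings at import or first use. The helper name `limit_to_one_thread` is hypothetical, and `OPENBLAS_NUM_THREADS` is the spelling OpenBLAS documents for its thread limit.

```python
import os

# Thread-limit variables from the cell above; CM_N_JOBS is the one this
# notebook sets for ccc itself.
SINGLE_CORE_VARS = (
    "CM_N_JOBS",
    "NUMBA_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
    "OMP_NUM_THREADS",
)


def limit_to_one_thread(names=SINGLE_CORE_VARS):
    """Set each variable to "1" and return the resulting mapping."""
    for name in names:
        os.environ[name] = "1"
    return {name: os.environ[name] for name in names}


# Call this before importing numpy, numba, or ccc.
settings = limit_to_one_thread()
```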


# Disable numba

In [2]:
%env NUMBA_DISABLE_JIT=1

env: NUMBA_DISABLE_JIT=1
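
To confirm that the setting took effect, numba's configuration can be inspected after import. A minimal sketch, assuming the variable is set before numba is first imported in the process; the helper name `jit_is_disabled` is hypothetical, and it returns `None` when numba is not installed.

```python
import os

# numba reads this variable when it is imported, so set it first.
os.environ["NUMBA_DISABLE_JIT"] = "1"


def jit_is_disabled():
    """Return True/False from numba's config, or None if numba is missing."""
    try:
        from numba import config
    except ImportError:
        return None
    return bool(config.DISABLE_JIT)


status = jit_is_disabled()
```

With JIT disabled, functions decorated with numba's `jit` run as ordinary Python functions, which is what makes this notebook a pure Python baseline.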


# Remove pycache dir

In [3]:
!echo ${CODE_DIR}

/opt/code


In [4]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print

/opt/code/libs/ccc/coef/__pycache__
/opt/code/libs/ccc/pytorch/__pycache__
/opt/code/libs/ccc/scipy/__pycache__
/opt/code/libs/ccc/utils/__pycache__
/opt/code/libs/ccc/__pycache__
/opt/code/libs/ccc/sklearn/__pycache__


In [5]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -prune -exec rm -rf {} \;

In [6]:
!find ${CODE_DIR} -regex '^.*\(__pycache__\)$' -print
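
The `find ... -exec rm -rf` step above can also be written in Python, which avoids depending on the shell. A minimal sketch using only the standard library; the function name `remove_pycache` is hypothetical.

```python
import shutil
from pathlib import Path


def remove_pycache(root):
    """Delete every __pycache__ directory under root; return removed paths."""
    # Collect the targets first so deletion does not disturb the traversal.
    targets = [p for p in Path(root).rglob("__pycache__") if p.is_dir()]
    removed = []
    for path in targets:
        # Skip anything that disappeared while an earlier target was removed.
        if path.exists():
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

In this notebook it would be called as `remove_pycache(os.environ["CODE_DIR"])`, and an empty result on a second call corresponds to the empty output of the last `find` cell.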

# Modules

In [7]:
import numpy as np

from ccc.coef import ccc

In [8]:
# warm-up call; in notebook `07` this triggers numba compilation, but with NUMBA_DISABLE_JIT=1 nothing is compiled here
ccc(np.random.rand(10), np.random.rand(10))

0.20454545454545456

# Data

In [9]:
n_genes, n_samples = 10, 30000

In [10]:
np.random.seed(0)

In [11]:
data = np.random.rand(n_genes, n_samples)

In [12]:
data.shape

(10, 30000)

# With default `internal_n_clusters`

In [13]:
def func():
    n_clust = list(range(2, 10 + 1))
    return ccc(data, internal_n_clusters=n_clust)

In [14]:
%%timeit func()
func()

4.47 s ± 14.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
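
A similar measurement can be taken without the `%%timeit` magic by using the standard `timeit` module. A minimal sketch that roughly mirrors the "7 runs, 1 loop each" setup; the helper name `time_call` is hypothetical, and it reports the population standard deviation, so its numbers are not guaranteed to match IPython's output exactly.

```python
import statistics
import timeit


def time_call(fn, repeat=7, number=1):
    """Return (mean, std) of the per-call wall time of fn, in seconds."""
    # timeit.repeat returns the total time of `number` calls for each run.
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.pstdev(per_call)
```

Here it would be used as `mean_s, std_s = time_call(func)`.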


In [15]:
%%prun -s cumulative -l 50 -T 07_01-disable_numba-cm_many_samples-default_internal_n_clusters.txt
func()

 
*** Profile printout saved to text file '09-cm_many_samples-default_internal_n_clusters.txt'. 


These results are only slightly slower than the numba-compiled version (notebook `07`).
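
The `%%prun -s cumulative -l 50` cell can be approximated outside IPython with `cProfile` and `pstats`. A minimal sketch; the helper name `profile_to_text` is hypothetical, and the report layout may differ slightly from what `%%prun` writes.

```python
import cProfile
import io
import pstats


def profile_to_text(fn, limit=50):
    """Profile fn() and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(fn)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Same ordering and row limit as `-s cumulative -l 50`.
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```

The returned string can be written to a text file, e.g. `Path("profile.txt").write_text(profile_to_text(func))`, where the file name is only a placeholder.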

# With reduced `internal_n_clusters`

In [16]:
def func():
    n_clust = list(range(2, 5 + 1))
    return ccc(data, internal_n_clusters=n_clust)

In [17]:
%%timeit func()
func()

579 ms ± 5.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
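
For a quick sense of how much the reduced range helps, the two `%%timeit` means reported in this notebook can be compared directly. The sketch below only does arithmetic on those two numbers and on the sizes of the two `internal_n_clusters` ranges; it is an illustration, not a claim about how `ccc` scales internally.

```python
# Mean timings reported by %%timeit in this notebook, in seconds.
default_s = 4.47   # internal_n_clusters = 2..10
reduced_s = 0.579  # internal_n_clusters = 2..5

speedup = default_s / reduced_s

# Number of cluster counts (values of k) in each range.
n_default = len(range(2, 10 + 1))
n_reduced = len(range(2, 5 + 1))

print(f"speedup: {speedup:.2f}x")
print(f"k values: {n_default} vs {n_reduced}")
```

The reduced range is about 7.7 times faster while using 4 instead of 9 values of k.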


In [18]:
%%prun -s cumulative -l 50 -T 07_01-disable_numba-cm_many_samples-less_internal_n_clusters.txt
func()

 
*** Profile printout saved to text file '09-cm_many_samples-less_internal_n_clusters.txt'. 


These results are slightly faster than the numba-compiled version (notebook `07`), which is surprising. A useful follow-up would be to disable threading here so that the profiling results are accurate enough to track down the cause.