# Description

Similar to `06`, but with numba disabled to compare against a pure Python implementation.

Here I had to reduce `n_genes`, since it takes too long otherwise.

# Disable numba

In [1]:
%env NUMBA_DISABLE_JIT=1

env: NUMBA_DISABLE_JIT=1
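Outside IPython, the same switch can be set from plain Python. The following is a minimal sketch, assuming the variable is set before numba (or any module that imports it, such as `ccc`) is first imported; the helper `jit_disabled_via_env` is a hypothetical name used only for illustration.

```python
import os

# Set the flag before the first `import numba` (directly or through another
# package); numba picks up its configuration when it is imported.
os.environ["NUMBA_DISABLE_JIT"] = "1"


def jit_disabled_via_env() -> bool:
    """Hypothetical helper: True if the environment requests no JIT."""
    return os.environ.get("NUMBA_DISABLE_JIT", "0") == "1"


print(jit_disabled_via_env())
```

With the flag set, functions decorated with numba's `jit` run as ordinary Python functions, which is what makes the comparison in this notebook possible.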


# Remove pycache dir

In [2]:
!echo ${CODE_DIR}

/opt/code


In [3]:
!find ${CODE_DIR}/libs -regex '^.*\(__pycache__\)$' -print

/opt/code/libs/ccc/pytorch/__pycache__
/opt/code/libs/ccc/utils/__pycache__
/opt/code/libs/ccc/scipy/__pycache__
/opt/code/libs/ccc/sklearn/__pycache__
/opt/code/libs/ccc/__pycache__
/opt/code/libs/ccc/coef/__pycache__


In [4]:
!find ${CODE_DIR}/libs -regex '^.*\(__pycache__\)$' -prune -exec rm -rf {} \;

In [5]:
!find ${CODE_DIR}/libs -regex '^.*\(__pycache__\)$' -print
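The `find` commands above can also be expressed in Python. This is a rough equivalent, not the notebook's own method; `remove_pycache` is a hypothetical helper, and the reason for clearing the directories (presumably to avoid reusing previously cached artifacts) is an assumption.

```python
import shutil
from pathlib import Path


def remove_pycache(root):
    """Delete every __pycache__ directory under root and return their paths."""
    root = Path(root)
    # Collect first, then delete, so the tree is not modified while walking it.
    cache_dirs = [p for p in root.rglob("__pycache__") if p.is_dir()]
    for cache_dir in cache_dirs:
        # ignore_errors covers a directory already removed with its parent.
        shutil.rmtree(cache_dir, ignore_errors=True)
    return cache_dirs
```

For this notebook the call would be something like `remove_pycache(Path(os.environ["CODE_DIR"]) / "libs")`, assuming `CODE_DIR` is set as shown above.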

# Modules

In [6]:
import numpy as np

from ccc.coef import ccc

In [7]:
# warm-up call; with numba's JIT disabled there is nothing to compile, so this only checks that ccc runs
ccc(np.random.rand(10), np.random.rand(10))

0.15625

# Data

In [8]:
n_genes, n_samples = 50, 1000

In [9]:
np.random.seed(0)

In [10]:
data = np.random.rand(n_genes, n_samples)

In [11]:
data.shape

(50, 1000)

# Profile

In [12]:
def func():
    n_clust = list(range(2, 10 + 1))
    return ccc(data, internal_n_clusters=n_clust)

In [13]:
%%timeit func()
func()

52.7 s ± 120 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
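A comparable measurement can be taken without the `%%timeit` magic by using the standard-library `timeit` module. The sketch below assumes 7 runs of 1 loop each, matching the output above; `time_callable` is a hypothetical helper, and the choice of population standard deviation is an assumption rather than a statement about how IPython computes its figure.

```python
import statistics
import timeit


def time_callable(fn, repeat=7, number=1):
    """Return (mean, std) of the per-loop wall time of fn, in seconds."""
    # timeit.repeat returns one total time per run, each covering `number` loops.
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_loop = [total / number for total in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)
```

In this notebook it would be called as `time_callable(func)`.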


In [14]:
%%prun -s cumulative -l 50 -T 08-cm_many_genes.txt
func()

 
*** Profile printout saved to text file '08-cm_many_genes.txt'. 
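The same kind of report can be produced without the `%%prun` magic using `cProfile` and `pstats` from the standard library. This is a minimal sketch that sorts by cumulative time and keeps the first 50 entries, mirroring the `-s cumulative -l 50` options; `profile_to_text` is a hypothetical helper.

```python
import cProfile
import io
import pstats


def profile_to_text(fn, limit=50):
    """Profile fn() and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(fn)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and print at most `limit` entries.
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()
```

Writing the returned string to a file, for example with `Path("08-cm_many_genes.txt").write_text(profile_to_text(func))`, would roughly correspond to the `-T` option.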


**CONCLUSIONS:** compared with notebook `06` (which has 500 rows (`n_genes`) instead of the 50 used here), this one would have taken 2.80 hours for 500 rows based on these results, whereas the numba-compiled version took ~7 minutes.