# Description

This notebook profiles the functions that compute the correlation between the predicted expression of genes. Each notebook in this series is meant to be run at a specific changeset.

**Before running this notebook**, make sure you are in this changeset:
```bash
# this changeset tries to improve performance by enabling lru_cache for the method Gene.get_tissues_correlations
git checkout e6a706cc8102cbf83b0adf29485a72428a28b6c0
```
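
For readers unfamiliar with the technique, the following is a minimal sketch of how `functools.lru_cache` memoizes a method so that repeated calls with the same arguments skip the computation. The `ToyGene` class and its `get_correlation` method are hypothetical stand-ins for illustration; they are not the code in `entity.py`.

```python
from functools import lru_cache


class ToyGene:
    """Hypothetical stand-in for a gene entity; not the real `entity.Gene`."""

    calls = 0  # counts how many times the expensive body actually runs

    def __init__(self, ensembl_id):
        self.ensembl_id = ensembl_id

    # lru_cache builds its key from the arguments (including `self`), so all
    # of them must be hashable. maxsize=None lets the cache grow without bound.
    @lru_cache(maxsize=None)
    def get_correlation(self, other):
        ToyGene.calls += 1
        # Placeholder for an expensive computation.
        return 1.0 if self.ensembl_id == other.ensembl_id else 0.5


g1 = ToyGene("ENSG00000180596")
g2 = ToyGene("ENSG00000180573")

first = g1.get_correlation(g2)   # computed
second = g1.get_correlation(g2)  # served from the cache
print(ToyGene.get_correlation.cache_info())
```

One trade-off to keep in mind: a cache on a method holds references to every instance it has seen, so those objects stay alive for as long as the cache does.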

In [1]:
%load_ext line_profiler

# Modules

In [2]:
from entity import Gene

# Functions

In [3]:
def compute_ssm_correlation(all_genes):
    res = []
    for g1_idx, g1 in enumerate(all_genes[:-1]):
        for g2 in all_genes[g1_idx:]:
            c = g1.get_ssm_correlation(
                g2,
                reference_panel="1000G",
                model_type="MASHR",
                use_within_distance=False,
            )
            res.append(c)
    return res
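
To see which pairs the nested loop visits, the same indexing can be replayed on plain labels. With four items it yields nine pairs, which matches the nine calls to `get_ssm_correlation` reported in the profiles. Because the inner slice starts at `g1_idx`, the first three items are also paired with themselves. The helper name `visited_pairs` and the labels are placeholders for illustration only.

```python
def visited_pairs(items):
    """Replay the loop structure of compute_ssm_correlation on any list."""
    pairs = []
    for i, a in enumerate(items[:-1]):
        # Same slice as in the notebook: starts at i, so (a, a) is included.
        for b in items[i:]:
            pairs.append((a, b))
    return pairs


pairs = visited_pairs(["g1", "g2", "g3", "g4"])
print(len(pairs))
print(pairs)
```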

# Test case

In [4]:
gene1 = Gene(ensembl_id="ENSG00000180596")
gene2 = Gene(ensembl_id="ENSG00000180573")
gene3 = Gene(ensembl_id="ENSG00000274641")
gene4 = Gene(ensembl_id="ENSG00000277224")

all_genes = [gene1, gene2, gene3, gene4]

In [5]:
assert len(set([g.chromosome for g in all_genes])) == 1

# Run timeit

In [6]:
%timeit compute_ssm_correlation(all_genes)

11.1 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


# Profile

In [7]:
%prun -l 20 -s cumulative compute_ssm_correlation(all_genes)

 

         27139 function calls (26446 primitive calls) in 0.020 seconds

   Ordered by: cumulative time
   List reduced from 229 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.020    0.020 {built-in method builtins.exec}
        1    0.000    0.000    0.020    0.020 <string>:1(<module>)
        1    0.000    0.000    0.020    0.020 85958312.py:1(compute_ssm_correlation)
        9    0.001    0.000    0.019    0.002 entity.py:1041(get_ssm_correlation)
       27    0.000    0.000    0.009    0.000 frame.py:2809(T)
       27    0.000    0.000    0.009    0.000 frame.py:2687(transpose)
        9    0.000    0.000    0.007    0.001 frame.py:1221(__rmatmul__)
       63    0.000    0.000    0.006    0.000 frame.py:441(__init__)
       36    0.000    0.000    0.005    0.000 frame.py:1105(dot)
       63    0.000    0.000    0.005    0.000 construction.py:143(init_ndarray)
       27    0.000    0.000    0.005

In [8]:
%prun -l 20 -s time compute_ssm_correlation(all_genes)

 

         27139 function calls (26446 primitive calls) in 0.020 seconds

   Ordered by: internal time
   List reduced from 229 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       18    0.003    0.000    0.004    0.000 linalg.py:1483(svd)
     4851    0.001    0.000    0.002    0.000 {built-in method builtins.isinstance}
        9    0.001    0.000    0.020    0.002 entity.py:1041(get_ssm_correlation)
2313/1746    0.000    0.000    0.001    0.000 {built-in method builtins.len}
  342/288    0.000    0.000    0.001    0.000 {built-in method numpy.array}
     1737    0.000    0.000    0.001    0.000 generic.py:10(_check)
2916/2898    0.000    0.000    0.001    0.000 {built-in method builtins.getattr}
       27    0.000    0.000    0.009    0.000 frame.py:2687(transpose)
       63    0.000    0.000    0.005    0.000 construction.py:143(init_ndarray)
       90    0.000    0.000    0.002    0.000 blocks.py:2677(get_block_type)
       99
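
Outside of IPython, a comparable report can be produced with the standard library. The sketch below uses `cProfile` and `pstats` to sort by cumulative time and keep the first 20 rows, roughly what `%prun -l 20 -s cumulative` does. The `work` function is a placeholder workload, assumed here only so the example runs on its own; in this notebook it would be replaced by the call to `compute_ssm_correlation(all_genes)`.

```python
import cProfile
import io
import pstats


def work():
    # Placeholder workload standing in for compute_ssm_correlation(all_genes).
    return sum(i * i for i in range(10_000))


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Collect the report in a string buffer instead of printing it directly.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(20)  # use "time" for internal time
report = buffer.getvalue()
print(report)
```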

# Conclusion

Considering the time reported by `%timeit`, the average time to compute the SSM correlations between the test genes was:
 * without core improvements (`00_00-gls-profiling-old_code.ipynb`): `40.4 s ± 30.5 ms`.
 * after core improvements (`01_01-gls-profiling-new_code.ipynb`): `1.41 s ± 8.15` (more than 28 times faster than the baseline).
 * after activating lru_cache: `11.1 ms ± 112 µs` (about 3639 times faster than the baseline and about 127 times faster than after the core improvements).
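
The speedup factors quoted above follow directly from the mean times; a short check, using only the means reported by `%timeit` and ignoring the standard deviations:

```python
baseline_s = 40.4    # without core improvements
improved_s = 1.41    # after core improvements
cached_s = 11.1e-3   # after activating lru_cache (11.1 ms)

speedup_improved = baseline_s / improved_s
speedup_cached_vs_baseline = baseline_s / cached_s
speedup_cached_vs_improved = improved_s / cached_s

print(f"core improvements vs baseline: {speedup_improved:.1f}x")
print(f"lru_cache vs baseline: {speedup_cached_vs_baseline:.0f}x")
print(f"lru_cache vs core improvements: {speedup_cached_vs_improved:.0f}x")
```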