# Description

This notebook profiles the functions that compute the correlation between the predicted expression of pairs of genes. Each notebook in this series is meant to be run against a specific changeset.

**Before running this notebook**, make sure you are in this changeset:
```bash
# this changeset tried to improve the performance of Gene.get_expression_correlation
git checkout f24bf2a8b93c7202bb22c39b088f48680aa84bfa
```

In [1]:
%load_ext line_profiler

# Modules

In [2]:
from entity import Gene

# Functions

In [3]:
def compute_ssm_correlation(all_genes):
    res = []
    for g1_idx, g1 in enumerate(all_genes[:-1]):
        for g2 in all_genes[g1_idx:]:
            c = g1.get_ssm_correlation(
                g2,
                reference_panel="1000G",
                model_type="MASHR",
                use_within_distance=False,
            )
            res.append(c)
    return res
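
The nested loops in `compute_ssm_correlation` do not visit every ordered pair. The sketch below replays the same iteration on placeholder labels to show which pairs are produced. The helper name `iter_gene_pairs` and the string labels are hypothetical; they are not part of `entity.py`.

```python
def iter_gene_pairs(genes):
    """Yield (g1, g2) in the same order as the loops in compute_ssm_correlation."""
    for g1_idx, g1 in enumerate(genes[:-1]):
        # The slice starts at g1_idx, so each gene is also paired with itself
        for g2 in genes[g1_idx:]:
            yield g1, g2


# Placeholder labels stand in for Gene objects (illustration only)
pairs = list(iter_gene_pairs(["g1", "g2", "g3", "g4"]))
print(len(pairs))
print(pairs)
```

For four genes this yields 4 + 3 + 2 = 9 pairs, which agrees with the 9 calls to `get_ssm_correlation` reported in the profile further down. The self-pairs for the first three genes are included, but the last gene is never paired with itself.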

# Test case

In [4]:
gene1 = Gene(ensembl_id="ENSG00000180596")
gene2 = Gene(ensembl_id="ENSG00000180573")
gene3 = Gene(ensembl_id="ENSG00000274641")
gene4 = Gene(ensembl_id="ENSG00000277224")

all_genes = [gene1, gene2, gene3, gene4]

In [5]:
# sanity check: all test genes are on the same chromosome
assert len({g.chromosome for g in all_genes}) == 1

# Run timeit

In [6]:
%timeit compute_ssm_correlation(all_genes)

1.41 s ± 8.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
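
Outside IPython, a comparable measurement can be sketched with the standard `timeit` module. The `workload` function here is a hypothetical stand-in for `compute_ssm_correlation(all_genes)`, and summarising with a population standard deviation is an assumption about how to report the spread, not a claim about what `%timeit` does internally.

```python
import statistics
import timeit


def workload():
    # Hypothetical stand-in for compute_ssm_correlation(all_genes)
    return sum(i * i for i in range(1000))


# Seven repeats of a single call each, mirroring "7 runs, 1 loop each"
timings = timeit.repeat(workload, repeat=7, number=1)

mean = statistics.mean(timings)
std = statistics.pstdev(timings)
print(f"{mean:.6f} s ± {std:.6f} s per loop ({len(timings)} runs, 1 loop each)")
```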


# Profile

In [7]:
%prun -l 20 -s cumulative compute_ssm_correlation(all_genes)

         4031835 function calls (4027083 primitive calls) in 2.514 seconds

   Ordered by: cumulative time
   List reduced from 460 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.514    2.514 {built-in method builtins.exec}
        1    0.000    0.000    2.514    2.514 <string>:1(<module>)
        1    0.000    0.000    2.514    2.514 85958312.py:1(compute_ssm_correlation)
        9    0.001    0.000    2.514    0.279 entity.py:1039(get_ssm_correlation)
       27    0.061    0.002    2.494    0.092 entity.py:987(get_tissues_correlations)
    64827    0.433    0.000    2.314    0.000 entity.py:898(get_expression_correlation)
    29560    0.298    0.000    1.000    0.000 entity.py:771(_get_snps_cov)
    88680    0.050    0.000    0.926    0.000 <__array_function__ internals>:2(ix_)
92007/89784    0.048    0.000    0.901    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}

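The `%prun` magic is built on the standard `cProfile` module, so a similar report can be produced in a plain script. This is a minimal sketch in which `workload` is a hypothetical stand-in for `compute_ssm_correlation(all_genes)`. Note that the totals under the profiler (about 2.5 s) are higher than the `%timeit` figure (1.41 s), most likely because of profiling overhead.

```python
import cProfile
import io
import pstats


def workload():
    # Hypothetical stand-in for compute_ssm_correlation(all_genes)
    return sorted(str(i) for i in range(5000))


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time and keep the first 20 rows,
# analogous to `%prun -l 20 -s cumulative`
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(20)
report = buffer.getvalue()
print(report)
```

Replacing `"cumulative"` with `"tottime"` sorts by internal time instead, which corresponds to the `-s time` run in the next cell.
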

In [8]:
%prun -l 20 -s time compute_ssm_correlation(all_genes)

         4031835 function calls (4027083 primitive calls) in 2.520 seconds

   Ordered by: internal time
   List reduced from 460 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    64827    0.433    0.000    2.320    0.000 entity.py:898(get_expression_correlation)
    29560    0.304    0.000    1.003    0.000 entity.py:771(_get_snps_cov)
    88680    0.261    0.000    0.819    0.000 index_tricks.py:34(ix_)
181134/180918    0.248    0.000    0.249    0.000 {built-in method numpy.array}
    59120    0.092    0.000    0.403    0.000 base.py:743(to_numpy)
   118564    0.089    0.000    0.217    0.000 numerictypes.py:360(issubdtype)
    59147    0.085    0.000    0.112    0.000 base.py:1032(__iter__)
   237128    0.077    0.000    0.117    0.000 numerictypes.py:286(issubclass_)
       27    0.061    0.002    2.500    0.093 entity.py:987(get_tissues_correlations)
   422969    0.058    0.000    0.058    0.000 {built-in method builtins.is

# Profile by line

## Function `get_expression_correlation`

In [9]:
%lprun -f Gene.get_expression_correlation compute_ssm_correlation(all_genes)

Timer unit: 1e-06 s

Total time: 3.30807 s
File: /opt/code/libs/entity.py
Function: get_expression_correlation at line 898

Line #      Hits         Time  Per Hit   % Time  Line Contents
   898                                               def get_expression_correlation(
   899                                                   self,
   900                                                   other_gene,
   901                                                   tissue: str,
   902                                                   other_tissue: str = None,
   903                                                   reference_panel: str = "GTEX_V8",
   904                                                   model_type: str = "MASHR",
   905                                                   use_within_distance=True,
   906                                               ):
   907                                                   """
   908                                                   Given anoth

## Function `_get_snps_cov`

In [10]:
%lprun -f Gene._get_snps_cov compute_ssm_correlation(all_genes)

Timer unit: 1e-06 s

Total time: 1.61329 s
File: /opt/code/libs/entity.py
Function: _get_snps_cov at line 771

Line #      Hits         Time  Per Hit   % Time  Line Contents
   771                                               @staticmethod
   772                                               def _get_snps_cov(
   773                                                   snps_ids_list1,
   774                                                   snps_ids_list2=None,
   775                                                   check=False,
   776                                                   reference_panel="GTEX_V8",
   777                                                   model_type="MASHR",
   778                                               ):
   779                                                   """
   780                                                   Given one or (optionally) two lists of SNPs IDs, it returns the
   781                                                   covariance