# Description

It profiles the functions that compute the correlation between the predicted expression of two genes. Each notebook in this series is meant to be run at a specific changeset.

**Before running this notebook**, check out this changeset:
```bash
# this changeset attempts to improve the performance of Gene._get_snps_cov
git checkout 4a60b950f0e75cd6c100181dfcd4ae3255f4765b
```
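A small guard cell can catch running the notebook at the wrong commit. This is a sketch, assuming the notebook runs inside the repository's working tree with `git` on the `PATH`; `is_at_changeset` is a hypothetical helper, not part of the project.

```python
import subprocess
from typing import Optional

# Commit hash taken from the checkout instruction above.
EXPECTED_CHANGESET = "4a60b950f0e75cd6c100181dfcd4ae3255f4765b"


def is_at_changeset(expected: str, head: Optional[str] = None) -> bool:
    """Return True if `head` (default: the repository's current HEAD) equals `expected`."""
    if head is None:
        # Ask git for the full hash of the currently checked-out commit.
        head = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout.strip()
    return head == expected
```

In the notebook it would be called as `assert is_at_changeset(EXPECTED_CHANGESET)` before any profiling cell.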

In [1]:
%load_ext line_profiler

# Modules

In [2]:
from entity import Gene

# Functions

In [3]:
def compute_ssm_correlation(all_genes):
    res = []
    for g1_idx, g1 in enumerate(all_genes[:-1]):
        for g2 in all_genes[g1_idx:]:
            c = g1.get_ssm_correlation(
                g2,
                reference_panel="1000G",
                model_type="MASHR",
                use_within_distance=False,
            )
            res.append(c)
    return res
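The nested loop visits every pair (g1, g2) where g2 sits at or after g1 in the list, so self-pairs are included for all genes except the last one. The sketch below enumerates the same index pairs to make the call count explicit: for the four test genes it yields 4 + 3 + 2 = 9 pairs, matching the 9 calls to `get_ssm_correlation` in the profiles. Both helper names are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import List, Tuple


def ssm_pair_indices(n: int) -> List[Tuple[int, int]]:
    """Index pairs (i, j) visited by the nested loop in compute_ssm_correlation."""
    return [(i, j) for i in range(n - 1) for j in range(i, n)]


def ssm_pair_indices_itertools(n: int) -> List[Tuple[int, int]]:
    """Same pairs via itertools: every i <= j, minus the last gene's self-pair."""
    last = (n - 1, n - 1)
    return [p for p in combinations_with_replacement(range(n), 2) if p != last]
```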

# Test case

In [4]:
gene1 = Gene(ensembl_id="ENSG00000180596")
gene2 = Gene(ensembl_id="ENSG00000180573")
gene3 = Gene(ensembl_id="ENSG00000274641")
gene4 = Gene(ensembl_id="ENSG00000277224")

all_genes = [gene1, gene2, gene3, gene4]

In [5]:
# all test genes must be on the same chromosome
assert len({g.chromosome for g in all_genes}) == 1

# Run timeit

In [6]:
%timeit compute_ssm_correlation(all_genes)

17.2 s ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
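Outside IPython, a comparable measurement can be taken with the standard-library `timeit` module. This sketch mirrors the `7 runs, 1 loop each` setting reported above and returns the mean and population standard deviation per loop; `time_like_ipython` is a hypothetical helper name.

```python
import statistics
import timeit


def time_like_ipython(func, repeat: int = 7, number: int = 1):
    """Rough stand-in for `%timeit -r 7 -n 1`: return (mean, std) seconds per loop."""
    # Each entry is the total time of `number` calls; divide to get per-loop time.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_loop = [t / number for t in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)
```

Usage: `mean, std = time_like_ipython(lambda: compute_ssm_correlation(all_genes))`.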


# Profile

In [7]:
%prun -l 20 -s cumulative compute_ssm_correlation(all_genes)

 

         68183719 function calls (67942487 primitive calls) in 34.418 seconds

   Ordered by: cumulative time
   List reduced from 500 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   34.418   34.418 {built-in method builtins.exec}
        1    0.000    0.000   34.418   34.418 <string>:1(<module>)
        1    0.000    0.000   34.418   34.418 85958312.py:1(compute_ssm_correlation)
        9    0.001    0.000   34.418    3.824 entity.py:1037(get_ssm_correlation)
       27    0.196    0.007   34.398    1.274 entity.py:985(get_tissues_correlations)
    64827    0.547    0.000   34.084    0.001 entity.py:896(get_expression_correlation)
    59174    0.109    0.000   31.253    0.001 indexing.py:864(__getitem__)
    59174    0.346    0.000   31.114    0.001 indexing.py:1078(_getitem_axis)
    59120    0.158    0.000   29.882    0.001 indexing.py:1011(_getitem_iterable)
    59120    0.280    0.000   22.612    0
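The `%prun` magic wraps the standard-library `cProfile` profiler. A sketch of the same report produced without IPython, assuming only `cProfile` and `pstats`; `profile_call` is a hypothetical helper. `sort="cumulative"` corresponds to `-s cumulative` and `sort="time"` to `-s time`, while `limit` plays the role of `-l`.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort: str = "cumulative", limit: int = 20, **kwargs):
    """Profile one call of func; return (result, report text) like `%prun -s <sort> -l <limit>`."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    # Sort by the requested key and keep only the top `limit` entries.
    pstats.Stats(profiler, stream=stream).sort_stats(sort).print_stats(limit)
    return result, stream.getvalue()
```

Usage: `_, report = profile_call(compute_ssm_correlation, all_genes); print(report)`.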

In [8]:
%prun -l 20 -s time compute_ssm_correlation(all_genes)

 

         68183719 function calls (67942487 primitive calls) in 34.335 seconds

   Ordered by: internal time
   List reduced from 500 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
236588/118294    2.751    0.000   14.617    0.000 base.py:293(__new__)
 16114676    2.586    0.000    4.185    0.000 {built-in method builtins.isinstance}
962134/961918    1.213    0.000    1.215    0.000 {built-in method numpy.array}
  5398467    1.187    0.000    1.598    0.000 generic.py:10(_check)
   189854    1.088    0.000    1.526    0.000 {pandas._libs.lib.infer_dtype}
  1184803    0.764    0.000    1.971    0.000 common.py:1460(is_extension_array_dtype)
8367109/8367091    0.707    0.000    0.717    0.000 {built-in method builtins.getattr}
  1184884    0.702    0.000    0.999    0.000 base.py:413(find)
  1325050    0.654    0.000    2.739    0.000 base.py:256(is_dtype)
  1256006    0.629    0.000    1.156    0.000 common.py:1600(_is_dtype_type)
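Read together, the two tables suggest the time goes into pandas' per-call overhead rather than arithmetic: `.loc` lookups with list keys (`indexing.py:1011(_getitem_iterable)`) account for most of the cumulative time, while `base.py:293(__new__)`, likely pandas' `Index` constructor, runs 118,294 primitive calls (about two per `.loc` call) and dtype checks dominate internal time. One common remedy, sketched here as an assumption rather than as what `entity.py` does, is to resolve labels to integer positions once with `Index.get_indexer` and then slice the underlying NumPy array. The helper names and the toy DataFrame in the test are hypothetical.

```python
import numpy as np
import pandas as pd


def select_loc(df: pd.DataFrame, rows, cols) -> np.ndarray:
    """Label-based selection; pays pandas' per-call indexing overhead every time."""
    return df.loc[rows, cols].to_numpy()


def resolve_positions(index: pd.Index, labels) -> np.ndarray:
    """Map labels to integer positions once (requires a unique index)."""
    pos = index.get_indexer(labels)
    if (pos < 0).any():  # get_indexer marks missing labels with -1
        raise KeyError("some labels are not in the index")
    return pos


def select_positional(values: np.ndarray, row_pos: np.ndarray, col_pos: np.ndarray) -> np.ndarray:
    """Same block via NumPy fancy indexing on precomputed positions."""
    return values[np.ix_(row_pos, col_pos)]
```

The positions and `df.to_numpy()` would be computed once outside the hot loop, so each iteration only does NumPy indexing.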
 

# Profile by line

## Function `get_expression_correlation`

In [9]:
%lprun -f Gene.get_expression_correlation compute_ssm_correlation(all_genes)

Timer unit: 1e-06 s

Total time: 43.857 s
File: /opt/code/libs/entity.py
Function: get_expression_correlation at line 896

Line #      Hits         Time  Per Hit   % Time  Line Contents
   896                                               def get_expression_correlation(
   897                                                   self,
   898                                                   other_gene,
   899                                                   tissue: str,
   900                                                   other_tissue: str = None,
   901                                                   reference_panel: str = "GTEX_V8",
   902                                                   model_type: str = "MASHR",
   903                                                   use_within_distance=True,
   904                                               ):
   905                                                   """
   906                                                   Given anothe
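The docstring is truncated here, so what follows is an assumption about the quantity involved, not the method's code. If predicted expression is a linear combination of SNP dosages, $\hat g_1 = X_1 w_1$ and $\hat g_2 = X_2 w_2$, its correlation follows from SNP covariances alone:

$$\mathrm{corr}(\hat g_1, \hat g_2) = \frac{w_1^\top \Sigma_{12} w_2}{\sqrt{(w_1^\top \Sigma_{11} w_1)(w_2^\top \Sigma_{22} w_2)}}$$

where $\Sigma_{12} = \mathrm{Cov}(X_1, X_2)$. A minimal NumPy sketch of that formula, with a hypothetical function name:

```python
import numpy as np


def predicted_expression_corr(w1, w2, cov11, cov12, cov22) -> float:
    """Correlation of the linear predictors X1 @ w1 and X2 @ w2 from SNP covariances.

    cov11 = Cov(X1, X1), cov12 = Cov(X1, X2), cov22 = Cov(X2, X2).
    """
    num = w1 @ cov12 @ w2
    den = np.sqrt((w1 @ cov11 @ w1) * (w2 @ cov22 @ w2))
    return float(num / den)
```

With sample covariances, this matches `np.corrcoef` applied to the predicted values directly, because the normalization constants cancel.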

## Function `_get_snps_cov`

In [10]:
%lprun -f Gene._get_snps_cov compute_ssm_correlation(all_genes)

Timer unit: 1e-06 s

Total time: 1.87564 s
File: /opt/code/libs/entity.py
Function: _get_snps_cov at line 771

Line #      Hits         Time  Per Hit   % Time  Line Contents
   771                                               @staticmethod
   772                                               def _get_snps_cov(
   773                                                   snps_ids_list1,
   774                                                   snps_ids_list2=None,
   775                                                   check=False,
   776                                                   reference_panel="GTEX_V8",
   777                                                   model_type="MASHR",
   778                                               ):
   779                                                   """
   780                                                   Given one or (optionally) two lists of SNPs IDs, it returns the
   781                                                   covariance
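The docstring is truncated, and the reference panel's covariance may well be precomputed, so this sketch only illustrates the quantity: the sample covariance between two lists of SNPs, computed from a hypothetical samples × SNPs dosage DataFrame whose columns are SNP IDs. `snps_cov` is a hypothetical name and omits the real method's `check`, `reference_panel`, and `model_type` parameters.

```python
from typing import Optional, Sequence

import numpy as np
import pandas as pd


def snps_cov(
    genotypes: pd.DataFrame,
    snps1: Sequence[str],
    snps2: Optional[Sequence[str]] = None,
) -> pd.DataFrame:
    """Covariance between SNPs in snps1 (rows) and snps2 (columns); snps2 defaults to snps1."""
    if snps2 is None:
        snps2 = snps1
    x1 = genotypes[list(snps1)].to_numpy(dtype=float)
    x2 = genotypes[list(snps2)].to_numpy(dtype=float)
    # Center each SNP column, then cross-multiply: Cov = X1c^T X2c / (n - 1).
    x1c = x1 - x1.mean(axis=0)
    x2c = x2 - x2.mean(axis=0)
    cov = x1c.T @ x2c / (x1.shape[0] - 1)
    return pd.DataFrame(cov, index=list(snps1), columns=list(snps2))
```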

# Profile by line - 2nd round

## Function `get_tissues_correlations`

In [11]:
%lprun -f Gene.get_tissues_correlations compute_ssm_correlation(all_genes)

Timer unit: 1e-06 s

Total time: 44.5961 s
File: /opt/code/libs/entity.py
Function: get_tissues_correlations at line 985

Line #      Hits         Time  Per Hit   % Time  Line Contents
   985                                               def get_tissues_correlations(
   986                                                   self,
   987                                                   other_gene,
   988                                                   tissues: list = None,
   989                                                   reference_panel: str = "GTEX_V8",
   990                                                   model_type: str = "MASHR",
   991                                                   use_within_distance=True,
   992                                               ):
   993                                                   """
   994                                                   It computes the correlation matrix for two genes across all tissues.
   995              
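Per its docstring, `get_tissues_correlations` builds a tissue-by-tissue correlation matrix for a gene pair. The `%prun` call counts are consistent with one `get_expression_correlation` call per ordered tissue pair: 64,827 calls over 27 `get_tissues_correlations` calls is 2,401 = 49 × 49 per call. The sketch below fills such a matrix with a hypothetical pairwise function `pair_corr(tissue, other_tissue)`; it illustrates the quadratic call count, not the method's actual body.

```python
from typing import Callable, Sequence

import numpy as np
import pandas as pd


def tissues_corr_matrix(
    tissues: Sequence[str],
    pair_corr: Callable[[str, str], float],
) -> pd.DataFrame:
    """Fill a len(tissues) x len(tissues) matrix by calling pair_corr on every ordered pair."""
    n = len(tissues)
    values = np.empty((n, n))
    for i, t1 in enumerate(tissues):
        for j, t2 in enumerate(tissues):
            values[i, j] = pair_corr(t1, t2)  # one call per ordered pair: n**2 in total
    return pd.DataFrame(values, index=list(tissues), columns=list(tissues))
```

With 49 tissues this is 2,401 calls per gene pair, so any fixed per-call overhead inside `get_expression_correlation`, such as the `.loc` lookups seen in the profile, is multiplied by that factor.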