# MIOPY: Use cases

In this tutorial, we demonstrate how MIOPY can be used to study microRNA/mRNA interactions from expression data.

For this tutorial, we use the TCGA-OV dataset.

## Use Case S1: MicroRNAs targeting immune modulators including PD-L1

We were interested in finding out which microRNAs are the most important regulators of immune checkpoints in tumor cells.

#### Loading the example dataset

In [1]:
import miopy as mp
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [2]:
dfMir, dfRna, metadata = mp.load_dataset("TCGA-OV")

**We filter to keep only primary tumor samples.**

In [3]:
dfExpr = mp.concat_matrix(dfMir,dfRna)
dfExpr = dfExpr.loc[metadata.query('sample_type == "PrimaryTumor"').index,:]

#### Run Correlation

In the use case from the publication, we used the Immune Checkpoint (ICBI) gene set; here we use only a few of its genes to reduce the computation time. All the methods can be run at once with mp.all_methods, and each method can also be run individually.
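As a rough illustration of what a correlation-based method measures for a single microRNA/mRNA pair, the following sketch computes a Pearson coefficient in plain Python. The expression values are made up for the example, and the `pearson` helper is hypothetical; it is not part of MIOPY and does not reproduce how mp.all_methods is implemented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical log-expression values across five samples (not TCGA data).
mir_expr = [8.0, 7.0, 6.0, 5.0, 4.0]
gene_expr = [1.0, 2.0, 3.0, 4.0, 5.0]

# A negative coefficient is the pattern expected for a repressing microRNA.
r = pearson(mir_expr, gene_expr)
print(round(r, 3))
```

A strongly negative coefficient means that samples with high microRNA expression tend to have low expression of the gene, which is why the filtering step later in this use case keeps only negative coefficients.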

In [4]:
lGene = open("genesets/geneset_Immune checkpoints [ICBI].txt","r").read().split()
lGene[0:1]

['PDCD1']

In [None]:
res, pearson = mp.all_methods(dfExpr, lMirUser = None, lGeneUser = lGene[0:5]+["CD274"], n_core = 4, background = True, test = True)

Obtain Concat Gene and MIR
Number of genes: 4
Number of miRNAs: 347
Number of samples: 371
Number of features: 12567
None
                 hsa-let-7a-5p  hsa-let-7a-2-3p  hsa-let-7b-5p  hsa-let-7b-3p   
TCGA-24-1558-01      15.885907         4.910062      14.740380       5.629762  \
TCGA-13-0766-01      12.432427         5.897493      14.427845       6.770913   
TCGA-36-1570-01      15.585205         1.541587      15.947901       5.939390   
TCGA-29-1763-01      15.913488         4.312351      15.449972       6.861324   
TCGA-29-1695-01      16.484594         4.992670      16.172414       6.733752   

                 hsa-let-7c-5p  hsa-let-7c-3p  hsa-let-7d-5p  hsa-let-7d-3p   
TCGA-24-1558-01      15.350796       5.137807       8.427782      11.349796  \
TCGA-13-0766-01      11.212190       1.253637       7.086527      13.601978   
TCGA-36-1570-01      14.710954       3.792548       9.549007      12.687799   
TCGA-29-1763-01      14.045072       2.511255       7.785404      11.629016

As a result, the function returns a table with all the microRNA/mRNA pairs and the coefficient obtained for each method.

In [None]:
res.loc[res["P-Value"] < 0.05,:].sort_values("P-Value")

**Filtering the results**

Let's now run mp.FilterDF() to keep the most important microRNA/mRNA pairs. FilterDF can filter the pairs by their coefficients, the adjusted p-value, and/or the number of prediction tools that predict the interaction. In the publication, we used FDR < 0.1, coef < -0.3, and min_db > 10.

In [None]:
table, matrix = mp.FilterDF(table = res, matrix = pearson, join = "or", low_coef = -0.2, high_coef = 1, pval = 0.1, analysis = "Correlation", min_db = 10)
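To make the three thresholds concrete, here is a minimal sketch that applies the publication settings (FDR < 0.1, coef < -0.3, min_db > 10) to a handful of invented pairs. It assumes a Benjamini-Hochberg adjustment and combines the three criteria with a logical "and". Both choices are assumptions made for illustration; the helper `benjamini_hochberg` and the pair names are hypothetical and do not describe the internals of mp.FilterDF.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical pairs: (miRNA, gene, coefficient, p-value, number of tools).
pairs = [
    ("miR-A", "CD274", -0.45, 0.001, 14),
    ("miR-B", "CD274", -0.35, 0.020, 12),
    ("miR-C", "CD274", -0.10, 0.030, 20),
    ("miR-D", "CD274", -0.50, 0.040, 3),
]

fdr = benjamini_hochberg([p[3] for p in pairs])

# Keep a pair only if it passes all three thresholds (assumed "and" logic).
kept = [
    (mir, gene)
    for (mir, gene, coef, _, n_tools), q in zip(pairs, fdr)
    if q < 0.1 and coef < -0.3 and n_tools > 10
]
print(kept)  # [('miR-A', 'CD274'), ('miR-B', 'CD274')]
```

In this toy table, miR-C is dropped because its coefficient is too weak and miR-D because too few tools predict the interaction, even though both have a small adjusted p-value.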

MIO implements the Borda ranking system, which uses all the metrics in the table to rank the microRNA/mRNA pairs, starting from the most relevant.

In [None]:
table[["Ranking","Mir","Gene"]].head()
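The idea behind a Borda ranking can be sketched in a few lines: every metric ranks the candidates, each rank is converted into points, and the points are summed. The version below is a generic Borda count on invented scores, where a lower (more negative) score counts as better. The function name, the metric names, and the point scheme are assumptions and may differ from the ranking MIO computes.

```python
def borda_ranking(metrics):
    """Rank items by summed Borda points over several metrics.

    metrics maps a metric name to {item: score}; a lower score is treated as better.
    """
    items = list(next(iter(metrics.values())))
    points = {item: 0 for item in items}
    for scores in metrics.values():
        ordered = sorted(items, key=lambda item: scores[item])
        n = len(ordered)
        for position, item in enumerate(ordered):
            points[item] += n - position  # the best item gets n points
    return sorted(items, key=lambda item: -points[item])

# Hypothetical coefficients from three methods for three microRNAs.
metrics = {
    "pearson": {"miR-A": -0.6, "miR-B": -0.4, "miR-C": -0.2},
    "spearman": {"miR-A": -0.5, "miR-B": -0.55, "miR-C": -0.1},
    "lasso": {"miR-A": -0.3, "miR-B": -0.1, "miR-C": 0.0},
}

ranking = borda_ranking(metrics)
print(ranking)  # ['miR-A', 'miR-B', 'miR-C']
```

Because the points are based on ranks rather than raw values, metrics on different scales can be combined without normalising them first.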

#### Predict Target

MIO integrates a custom database built from a variety of target prediction tools. In MIO, a target prediction can be done using only the 40 integrated prediction tools, or in combination with gene expression data. In this example, we predict the microRNAs targeting CD274 (PD-L1), first using only the database and then using the previous correlation results.

**Using only the 40 prediction tools**

In [None]:
table, matrix = mp.predict_target(lTarget = ["CD274",], min_db = 10)

In [None]:
table.sort_values("Number Prediction Tools", ascending=False).head()
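The role of min_db can be pictured as a simple vote count: for each microRNA/gene pair, count how many tools report it and keep the pairs with enough support. The sketch below uses three invented tools and treats min_db as an inclusive lower bound. The tool names, the helper functions, and the inclusive comparison are all assumptions, not a description of mp.predict_target.

```python
# Hypothetical predictions: tool name -> set of (miRNA, gene) pairs it reports.
predictions = {
    "tool_1": {("miR-A", "CD274"), ("miR-B", "CD274")},
    "tool_2": {("miR-A", "CD274")},
    "tool_3": {("miR-A", "CD274"), ("miR-C", "CD274")},
}

def count_supporting_tools(predictions):
    """Count, for every pair, how many tools predict it."""
    counts = {}
    for pairs in predictions.values():
        for pair in pairs:
            counts[pair] = counts.get(pair, 0) + 1
    return counts

def filter_by_min_tools(counts, min_db):
    """Keep pairs reported by at least min_db tools (inclusive bound assumed)."""
    return {pair: n for pair, n in counts.items() if n >= min_db}

counts = count_supporting_tools(predictions)
supported = filter_by_min_tools(counts, min_db=2)
print(supported)  # {('miR-A', 'CD274'): 3}
```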

**Using the correlation result**

In [None]:
table, matrix = mp.predict_target(table = res, matrix = None, lTarget = ["CD274",], lTools = None, method = "or", min_db = 5, low_coef = -0.2, high_coef = 1, pval = 0.1)
table.sort_values("Ranking").head()

## Use Case S2: Genes involved in antigen processing and presentation by microRNAs

Deficient or downregulated genes of the antigen processing and presentation machinery have been associated with response prediction to cancer immunotherapy. To study which microRNAs are potentially able to downregulate the complete pathway, we perform a correlation analysis using a weighted expression score.

In [None]:
lGene = open("genesets/geneset_Antigen Processig and Presentation [ImmPort].txt","r").read().split()
dfCor, dfPval, dfSetScore = mp.gene_set_correlation(dfExpr, lGene, GeneSetName = "Antigen Processig and Presentation [ImmPort]", 
                                                    lMirUser = ["hsa-miR-181a-2-3p","hsa-miR-125b-5p","hsa-miR-130a-3p"], n_core = 8)

In [None]:
dfCor

In [None]:
dfPval

gene_set_correlation returns three elements: the Pearson coefficients, the p-values, and the calculated module score for each sample and microRNA.

In [None]:
dfPval.columns = ["P.val"]

In [None]:
table = pd.concat([dfCor, dfPval], axis = 1)

In [None]:
table.sort_values("Antigen Processig and Presentation [ImmPort]").head()
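One way to picture a correlation against a gene set is to collapse the set into a single per-sample score and correlate a microRNA with that score. The sketch below uses a weighted mean of z-scored expression values with arbitrary weights. The weighting scheme, the gene names, and the helper functions are assumptions chosen for illustration; the score computed by gene_set_correlation may be defined differently.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def weighted_set_score(expr_by_gene, weights):
    """Per-sample weighted mean of z-scored gene expression."""
    z = {gene: zscores(values) for gene, values in expr_by_gene.items()}
    total = sum(weights[gene] for gene in z)
    n_samples = len(next(iter(z.values())))
    return [sum(weights[g] * z[g][i] for g in z) / total for i in range(n_samples)]

def pearson_from_z(x, y):
    """Pearson coefficient as the mean product of z-scores."""
    return mean(a * b for a, b in zip(zscores(x), zscores(y)))

# Hypothetical expression of two genes and one microRNA across four samples.
expr_by_gene = {
    "GENE1": [5.0, 6.0, 7.0, 8.0],
    "GENE2": [2.0, 2.5, 3.5, 4.0],
}
weights = {"GENE1": 1.0, "GENE2": 2.0}  # arbitrary illustrative weights
mir_expr = [9.0, 8.0, 6.5, 6.0]

score = weighted_set_score(expr_by_gene, weights)
r = pearson_from_z(mir_expr, score)
print(round(r, 2))
```

A microRNA whose expression falls as the set score rises gets a negative coefficient, which is the pattern sorted to the top of the table above.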

## Use Case S3: Identifying a microRNA signature predictive for survival

In the publication, we used the TCGA-CRC dataset to predict microRNAs related to microsatellite instability. Here, we use the TCGA-OV dataset to predict the survival (death status) of the samples. This is only an example of how to use the function.

In [None]:
from miopy.feature_selection import feature_selection

In [None]:
data = pd.concat([dfMir.transpose(),metadata.loc[:,"event"]], axis = 1)
data = data.dropna()

In [None]:
top_feature, dAll, DictScore = feature_selection(data, k = 10, topk = 25, group = "event")

The feature selection returns the top predictors, that is, the features most informative for separating death status in the TCGA-OV patients. Now we can use these predictors to train a model and see how robust these microRNAs are.

In [None]:
from miopy.classification import classification_cv

In [None]:
results = classification_cv(data, k = 5, name = "Random Forest", group = "event", lFeature = top_feature.index)
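Robustness is usually judged by cross-validation: the samples are split into folds, and each fold is held out once while the model is trained on the rest. The sketch below builds stratified folds in plain Python so that each fold keeps roughly the same proportion of events. It assumes that k is the number of folds and that the splits are stratified; both are assumptions about classification_cv, and the `stratified_kfold` helper is hypothetical.

```python
import random

def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions kept similar per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx

# Hypothetical event labels: 10 samples without an event, 5 with an event.
labels = [0] * 10 + [1] * 5
splits = list(stratified_kfold(labels, k=5))

for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx), [labels[i] for i in test_idx])
```

Each held-out fold here contains two samples without an event and one with an event, so every fold evaluates the model on both classes.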


## Use Case S4: MicroRNA target genes synthetic lethal to immune (therapy) essential genes

In order to identify synthetic lethal partner genes in tumor cells, we took advantage of previous efforts and used the ISLE algorithm (Lee et al., 2018), which is available within MIO. We were interested in identifying microRNAs targeting genes that are synthetic lethal to immune (therapy) essential genes. We used the option Target Prediction, miRNA Synthetic Lethal Prediction.

In addition, MIOPY can perform an overrepresentation analysis for microRNAs based on the number of synthetic lethal target genes compared to all potential target genes.

In [None]:
lGene = open("genesets/geneset_Immune essential genes [Patel].txt","r").read().split()

In [None]:
target, matrix, ora = mp.predict_lethality2(lQuery = lGene, lTools = None, method = "or", min_db = 25)

In [None]:
target.sort_values("Number Prediction Tools", ascending=False).head()

In [None]:
ora.sort_values("FDR").head()
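An overrepresentation analysis of this kind is commonly framed as a hypergeometric test: given all potential target genes, how surprising is the number of synthetic lethal genes among the targets of one microRNA? The sketch below computes that tail probability with the standard library. The counts are invented, and the choice of a hypergeometric test is an assumption; the statistic behind the `ora` table may be computed differently.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items from a population of M containing n successes."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total

# Hypothetical counts for one microRNA.
all_genes = 1000      # all potential target genes (background)
sl_genes = 50         # synthetic lethal genes within the background
mir_targets = 40      # predicted targets of the microRNA
mir_sl_targets = 8    # predicted targets that are synthetic lethal

# Only 2 synthetic lethal targets would be expected by chance (40 * 50 / 1000).
p_value = hypergeom_sf(mir_sl_targets, all_genes, sl_genes, mir_targets)
print(p_value < 0.01)  # True
```

When many microRNAs are tested this way, the resulting p-values need a multiple-testing correction, which is what the FDR column used for sorting above reflects.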