# miRe2e

This notebook is a quick guide to using the methods from:

J. Raad, L. A. Bugnon, D. H. Milone and G. Stegmayer, "**miRe2e: a full
end-to-end deep model based on Transformers for prediction
of pre-miRNAs from raw genome-wide data**", 2021.


## Installation
The package is available on PyPI and depends only on standard packages from the Python ecosystem, so installation is straightforward:

In [1]:
pip install -U miRe2e  > /dev/null

Note: you may need to restart the kernel to use updated packages.


## Quick start
In this demo we will predict pre-miRNAs on H. sapiens chromosome 19. Because the full run takes a while, you can first run a quick check of the model on a short input sequence like the following:


In [1]:
# Short excerpt of chr19
!wget https://raw.githubusercontent.com/sinc-lab/miRe2e/master/examples/chr19_13836201_13836660_true.fa  > /dev/null

# Notice that the input file is a raw sequence string, like this one
#>chr19
#AGGTCTGATTCTGAGTCCTCATCTCTGCTCCAAGCATCAGCCCACCCAGGGAAGGCAGGG
#GCTGCAGGCTCCAAGGGGGCTTGACCCCTGTTCCTGCTGAACTGAGCCAGTGTACACAAA
#CCAACTGTGTTTCAGCTCAGTAGGCACGGGAGGCAGAGCCCAGGGAGGCCAGGCAGCAGG
#ATGGCAGGCAGACAGGCGGCAGCAGGGGACAGGCGGCAAGGCCAGAGGAGGTGAGGGCCT
#GGGGGGCGGAACTTAGCCACTGTGAACACGACTTGGTGTGGACCCTGCTCACAAGCAGCT
#AAGCCCTGCTCCTCAGGCCAGGCACAGGCTTCGGGGCCTCTCTGCCACCCCGTCCCCGGG
#CAGCATCCTCGGTGGCAGAGCTCAGGGTCGGTTGGAAATCCCTGGCAATGTGATTTGTGA
#CAGGAAGCAAATCCCATCCCCAGGAACCCCAGCCGGCCG

filename = "chr19_13836201_13836660_true.fa"

--2023-07-08 21:08:08--  https://raw.githubusercontent.com/sinc-lab/miRe2e/master/examples/chr19_13836201_13836660_true.fa
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)[185.199.108.133]:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 474 [text/plain]
Saving to: ‘chr19_13836201_13836660_true.fa.6’


2023-07-08 21:08:08 (42.3 MB/s) - ‘chr19_13836201_13836660_true.fa.6’ saved [474/474]



The following runs the prediction on the raw RNA sequence. The input FASTA file is scanned with a sliding window, and a score is obtained for each window.
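
To make the sliding-window idea concrete, here is a minimal sketch of how fixed-length windows could be enumerated over a sequence. It is not the package's implementation: the helper `sliding_windows` is hypothetical, and the window length (100) and step (20) are assumptions inferred from the index labels printed later in this notebook (`chr19-0-100`, `chr19-20-120`, ...). How miRe2e handles the tail of a sequence is not covered here.

```python
def sliding_windows(seq, name, length=100, step=20):
    """Yield (label, subsequence) pairs for every full window of `seq`.

    Hypothetical helper for illustration only; `length` and `step` are
    assumed values, not documented miRe2e defaults.
    """
    for start in range(0, len(seq) - length + 1, step):
        end = start + length
        yield f"{name}-{start}-{end}", seq[start:end]


# Toy 459-nt sequence, used only to show how windows are counted and labeled.
toy_seq = "ACGT" * 114 + "ACG"
windows = list(sliding_windows(toy_seq, "chr19"))
labels = [label for label, _ in windows]

print(len(windows))
print(labels[:3])
```

Under these assumed settings, every window has the same length and consecutive windows overlap by 80 positions, so a candidate that straddles one window boundary is still fully contained in a neighboring window.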

First, let's load the model. The model has 3 stages:
1. Structure prediction model: predicts the RNA secondary structure using only the input sequence.
2. MFE estimation model: estimates the minimum free energy (MFE) when folding the secondary structure.
3. Pre-miRNA classifier: uses the input RNA sequence and the outputs of the two previous
  models to score the input sequence and determine whether it is a pre-miRNA candidate.


In [2]:
from miRe2e import MiRe2e

# Create an instance. Pre-trained weights are downloaded by default. New model
# weights can be given as well (see source documentation)
model = MiRe2e()

The ```predict``` method uses the pretrained model to analyze the input FASTA file and returns the scores.

In [3]:
scores_5_3, scores_3_5, index = model.predict(filename, batch_size=4096)

# It returns the scores for each window and its position in bp (stored in index)
print()
print(scores_5_3[:3])
print(index[:3])

Loading sequences...
Number of sequences: 36
Done


  0%|          | 0/1 [00:00<?, ?it/s]

100%|██████████| 1/1 [00:00<00:00,  3.68it/s]


[4.8296736e-04 2.5739661e-07 1.1705596e-03]
['chr19-0-100', 'chr19-20-120', 'chr19-40-140']
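
The labels in `index` appear to follow a `name-start-end` pattern, so they can be paired with the scores to rank windows. The sketch below works on the three values printed above rather than on a live model. `parse_label` and `top_windows` are hypothetical helpers, and the label format is an assumption based on this output only.

```python
def parse_label(label):
    """Split a label such as 'chr19-40-140' into (name, start, end).

    rsplit keeps any hyphens that belong to the sequence name itself.
    """
    name, start, end = label.rsplit("-", 2)
    return name, int(start), int(end)


def top_windows(scores, index, k=1):
    """Return the k highest-scoring windows as (name, start, end, score)."""
    # Indices of the windows sorted by descending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [(*parse_label(index[i]), float(scores[i])) for i in order]


# Values copied from the output above.
scores = [4.8296736e-04, 2.5739661e-07, 1.1705596e-03]
index = ["chr19-0-100", "chr19-20-120", "chr19-40-140"]

best = top_windows(scores, index, k=1)
print(best)
```

The same helpers should work with the full `scores_5_3` and `index` values returned by `predict`, assuming they have equal length and the labels keep this format.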



