# miRNA-Seq

(2019.04.02)

## Only valid miRNAs

In the paper [Evolutionary history of plant microRNAs](https://doi.org/10.1016/j.tplants.2013.11.008), miRBase is scanned for valid miRNAs. For a miRNA to be considered valid:

* The miRNA sequence must have high complementarity to the opposing arm (>= 15 nt)
* Precise 5' cleavage should be observed
* There should be little heterogeneity in the sequences matching the miRNA precursor. Otherwise, it is a siRNA, not a miRNA.
* The miRNA* should be present

The list of valid *Vitis vinifera* miRNAs is provided in the additional files. Here, I will scan my miRNA lists for the valid miRNAs and remove the non-validated ones.

In [None]:
import pandas

### Importing list of valid miRNAs

Note: This list does not contain only valid miRNAs. It indicates whether each miRNA is valid or not! A miRNA must pass either the miRNA* or the Structure check to be considered valid.
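As a minimal sketch of that rule, the toy table below uses made-up IDs (not the paper's data) with the same column names (`ID`, `miRNA*`, `Structure`) and the same ✓/✗ marks used in the cells that follow. It assumes that no other marks appear in those columns.

```python
import pandas

# Toy data: hypothetical IDs; the column layout of valid_mirnas.tsv is assumed
toy = pandas.DataFrame({
    'ID': ['mir-a', 'mir-b', 'mir-c', 'mir-d'],
    'miRNA*': ['✓', '✗', '✓', '✗'],
    'Structure': ['✓', '✓', '✗', '✗'],
})

# A miRNA passes if at least one of the two checks is marked with ✓
toy_pass = toy[(toy['miRNA*'] == '✓') | (toy['Structure'] == '✓')]
toy_pass_ids = toy_pass['ID'].tolist()
print(toy_pass_ids)
```

Only `mir-d`, which fails both checks, is left out of the pass list.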

In [None]:
valid_mirnas = pandas.read_csv('valid_mirnas.tsv',
                               sep = '\t'
                              )

valid_mirnas.head()

In [None]:
valid_mirnas.shape

In [None]:
valid_mirnas_pass = valid_mirnas[(valid_mirnas['miRNA*'] == '✓')
                                 | (valid_mirnas['Structure'] == '✓')
                                ]
valid_mirnas_pass.shape

Only 4 miRNAs should be removed from our lists. To be honest, we may not even have those in our data.

In [None]:
valid_mirnas_pass_list = valid_mirnas_pass['ID'].tolist()
valid_mirnas_all_list = valid_mirnas['ID'].tolist()

### Importing miRNAs from miRBase

#### Hairpins

In [None]:
mirbase_hairpins_counts = pandas.read_csv('../2019_miRNA_sequencing/mirbase_hairpins_counts.tsv',
                                          sep = '\t'
                                         )

mirbase_hairpins_counts.head()

In [None]:
mirbase_hairpins_counts.shape

Below is the list of miRNAs that were not considered valid. This does not mean that these miRNAs are invalid, only that they were not marked as valid (they may not even have been tested).

In [None]:
mirbase_hairpins_counts[~mirbase_hairpins_counts['miRNA'].isin(valid_mirnas_pass_list)]

In [None]:
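# miRNAs in our counts that are not in the pass list
mirbase_hairpins_counts_non_valid = mirbase_hairpins_counts[~mirbase_hairpins_counts['miRNA'].isin(valid_mirnas_pass_list)]['miRNA'].tolist()

# Of those, keep the ones that appear in the paper's table (tested, but not passed)
[mirna for mirna in mirbase_hairpins_counts_non_valid if mirna in valid_mirnas_all_list]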

The miRNAs listed above are present both in the analysis provided in the paper and in our results, but did not pass. The other miRNAs in the table were simply not checked in the paper.

Instead of keeping only the valid miRNAs, which would discard the miRNAs that were not processed in the paper, we should directly remove the non-valid miRNAs.
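The difference between the two approaches can be sketched on toy data. The IDs and counts below are hypothetical; `mir-z` stands for a miRNA that is in our results but was never tested in the paper.

```python
import pandas

# Hypothetical marks from the paper: mir-x passes, mir-y fails, mir-z is absent
toy_valid = pandas.DataFrame({
    'ID': ['mir-x', 'mir-y'],
    'miRNA*': ['✓', '✗'],
    'Structure': ['✗', '✗'],
})
# Hypothetical counts table from our own results
toy_counts = pandas.DataFrame({
    'miRNA': ['mir-x', 'mir-y', 'mir-z'],
    'count': [10, 20, 30],
})

pass_ids = toy_valid[(toy_valid['miRNA*'] == '✓')
                     | (toy_valid['Structure'] == '✓')
                    ]['ID'].tolist()
nopass_ids = toy_valid[(toy_valid['miRNA*'] == '✗')
                       & (toy_valid['Structure'] == '✗')
                      ]['ID'].tolist()

# Keeping only the validated miRNAs also drops the untested mir-z
kept_by_inclusion = toy_counts[toy_counts['miRNA'].isin(pass_ids)]['miRNA'].tolist()

# Removing only the failed miRNAs keeps the untested mir-z
kept_by_exclusion = toy_counts[~toy_counts['miRNA'].isin(nopass_ids)]['miRNA'].tolist()

print(kept_by_inclusion)
print(kept_by_exclusion)
```

Filtering by exclusion treats "not tested" as "keep", which is the behaviour wanted here.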

In [None]:
valid_mirnas_nopass = valid_mirnas[(valid_mirnas['miRNA*'] == '✗')
                                   & (valid_mirnas['Structure'] == '✗')
                                  ]
valid_mirnas_nopass.shape

In [None]:
valid_mirnas_nopass_list = valid_mirnas_nopass['ID'].tolist()

In [None]:
mirbase_hairpins_counts_pass = mirbase_hairpins_counts[~mirbase_hairpins_counts['miRNA'].isin(valid_mirnas_nopass_list)]
mirbase_hairpins_counts_pass.head()

These are the valid or non-processed miRNAs.

In [None]:
mirbase_hairpins_de = pandas.read_csv('../2019_miRNA_sequencing/mirbase_hairpins_diffexpression.tsv',
                                      sep = '\t'
                                     )

mirbase_hairpins_de.head()

In [None]:
mirbase_hairpins_de_pass = mirbase_hairpins_de[~mirbase_hairpins_de['miRNA'].isin(valid_mirnas_nopass_list)]
mirbase_hairpins_de_pass.head()

Next are the miRNAs that did not pass!

In [None]:
mirbase_hairpins_de[mirbase_hairpins_de['miRNA'].isin(valid_mirnas_nopass_list)].head()

### Saving the new lists to new files

In [None]:
mirbase_hairpins_counts_pass.to_csv('mirbase_hairpins_counts_pass.tsv',
                                    sep = '\t'
                                   )


mirbase_hairpins_de_pass.to_csv('mirbase_hairpins_diffexpression_pass.tsv',
                                sep = '\t'
                               )