# miRNA-mRNA target prediction


### 1. Problem definition

This project aims to <b>predict whether an input pair of sequences (miRNA and mRNA) will interact</b>. Molecules that interact constitute a pair, whereas molecules that do not interact are defined as a non-pair.

Since the mechanisms underlying the targeting process can differ between organisms, this study is <b>focused on A. thaliana</b>. This is a model organism, so the results obtained can be extrapolated to other plants; they can also help in understanding human diseases, given the conservation of protein function and cellular processes and the high percentage of genes shared between both species [31, 32].

Considering the above, this problem can be shaped as a <b>binary classification problem</b>, where 0 means non-pair and 1 represents a pair.

The data proposed to train a DNN able to distinguish between pair and non-pair RNA sequences consists of curated interactions that are publicly available and reported in the literature [14].

Although such miRNA-mRNA interacting pairs can be used to train the network, they represent only positive examples that were tested experimentally either in vivo or in vitro. This implies that the <b>available datasets are unbalanced</b>, so negative data must be incorporated before proceeding.

### 2. Dataset selection

<b>Positive examples</b> of interactions were downloaded from miRTarBase [14, 40]. This database was selected based on the following criteria:
- A. thaliana miRNA/mRNA target interactions are available.
- Interactions are curated based on experimental evidence.
- The database is continuously maintained and updated.
- The data is publicly available.

Validated <b>negative examples</b> are only available for H. sapiens. Using the available H. sapiens validated interactions [33], homologous sequences between A. thaliana and H. sapiens can be used to assemble a dataset.

### 3. Success measures

TODO


### 4. Evaluation protocols
TODO


### 5. Data preparation
In the context of this project, a negative example is a pair of molecules (miRNA and mRNA) that do NOT interact. Given the lack of experimentally confirmed negative examples, the available datasets for target prediction are highly unbalanced, containing exclusively positive data (pairs that interact). To overcome this problem, a publicly available, curated negative dataset for human (hsa) miRNA target prediction is proposed [33]. However, since this project studies A. thaliana (ath), only the homologous and highly conserved genes across both organisms will be considered.

The methodology and process of mapping those genes across organisms is presented in this section.

#### 5.1. Negative dataset - Preprocessing

For the preprocessing stage, the RefSeq IDs were converted into Gene symbol IDs using Genomics Biotools [35], then the invalid IDs and duplicates were removed from the dataset.
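
The clean-up step can be sketched as follows. This is a minimal illustration, assuming the converter output has been loaded as (miRNA name, gene symbol) tuples and that RefSeq IDs that could not be converted appear as an empty string or a placeholder. The function name, the placeholder values, and the sample records are hypothetical and are not taken from the real dataset.

```python
def clean_negative_pairs(pairs, invalid_symbols=("", "-", "NA")):
    """Drop pairs with an invalid gene symbol and remove exact duplicates.

    `pairs` is an iterable of (mirna_name, gene_symbol) tuples. The set of
    values treated as invalid is an assumption made for this illustration.
    """
    seen = set()
    cleaned = []
    for mirna, symbol in pairs:
        symbol = symbol.strip()
        if symbol in invalid_symbols:
            continue  # the RefSeq ID could not be mapped to a gene symbol
        key = (mirna, symbol)
        if key in seen:
            continue  # exact duplicate of a pair that was already kept
        seen.add(key)
        cleaned.append(key)
    return cleaned


# Hypothetical records used only to exercise the function.
example_pairs = [
    ("hsa-miR-1", "GENE_A"),
    ("hsa-miR-1", "GENE_A"),  # duplicate pair
    ("hsa-miR-2", ""),        # unmapped ID
    ("hsa-miR-2", "GENE_B"),
]
cleaned_pairs = clean_negative_pairs(example_pairs)
print(cleaned_pairs)
```

Note that only identical (miRNA, gene) pairs are collapsed; the same gene can still appear several times when it is paired with different miRNAs.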


#### 5.2. Negative dataset - Methodology
The methodology for preparing a negative dataset consists of:
- Extracting the H. sapiens miRNA sequence.
- Getting the H. sapiens target sequence.
- Finding the homologous miRNA sequences in A. thaliana (pairwise alignment - Local [41]).
- Getting the homologous target sequence in A. thaliana (pairwise alignment - Global [41]).

Since it is not guaranteed that the interaction miRNA-mRNA results will hold given the presence of gaps in the alignments, the opening of a gap in the alignment should be penalized as well as its size or extension.

Furthermore, mature miRNA sequences have a length of 17-22 nucleotides and pair with binding sites of the same length in targets [2, 4]. For such short sequences, finding the exact sequence in a homolog is more valuable than an overall high similarity obtained with a global alignment. For this reason, local alignments are used for the miRNA comparisons in this project.
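
To illustrate why a local alignment suits short sequences, the sketch below computes a Smith-Waterman local alignment score in plain Python. It is not the Biopython implementation used in this project: it uses a single linear gap penalty instead of separate opening and extension penalties, and the function name, default scores, and toy sequences are assumptions chosen for illustration.

```python
def local_alignment_score(seq_a, seq_b, match=1.0, mismatch=-2.0, gap=-0.5):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    cols = len(seq_b) + 1
    previous = [0.0] * cols
    best = 0.0
    for char_a in seq_a:
        current = [0.0] * cols
        for j, char_b in enumerate(seq_b, start=1):
            diagonal = previous[j - 1] + (match if char_a == char_b else mismatch)
            up = previous[j] + gap        # gap in seq_b
            left = current[j - 1] + gap   # gap in seq_a
            # A local alignment may restart anywhere, so scores never drop below 0.
            current[j] = max(0.0, diagonal, up, left)
            best = max(best, current[j])
        previous = current
    return best


# Toy sequences (not real miRNAs): the query is embedded in a longer sequence.
query = "GGAUCCGAUU"
embedded = "AAAA" + query + "CCCC"
embedded_score = local_alignment_score(query, embedded)
unrelated_score = local_alignment_score("AAAA", "CCCC")
print(embedded_score, unrelated_score)
```

Because the flanking nucleotides are ignored, the embedded query reaches the maximum score (its own length), which is the behavior wanted when looking for a short conserved sequence.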


#### 5.3. Negative dataset - Dataset generation
##### 5.3.1. Sequence extraction
To extract the miRNA sequences, a dataset containing all known mature miRNA sequences was downloaded from miRBase [36], whereas to locate the target sequences, the full H. sapiens genome was used [37, 38].

Once the sequences are extracted, they are integrated with the negative dataset [33].

In [1]:
# Load the FASTA file containing all known miRNA sequences for all organisms.
# The `with` block closes the file automatically.
with open('data/mature_mirna_all_organisms.fa') as f:
    mature_mirnas = f.read().split('>')[1:]

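# Isolate H. sapiens (hsa) and A. thaliana (ath) sequences.
# The hsa dictionary maps name -> sequence.
hsa_mature_mirnas_dict = {mirna.split(' ')[0]: mirna.split('\n')[1]
                          for mirna in mature_mirnas
                          if 'hsa-' in mirna}

# The ath dictionary maps sequence -> name, so miRNAs sharing an identical
# mature sequence collapse into a single entry.
ath_mature_mirnas_dict = {mirna.split('\n')[1]: mirna.split(' ')[0]
                          for mirna in mature_mirnas
                          if 'ath-' in mirna}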

print(f'Total hsa miRNAs: {len(hsa_mature_mirnas_dict)}')
print(f'Total ath miRNAs: {len(ath_mature_mirnas_dict)}')


Total hsa miRNAs: 2655
Total ath miRNAs: 350


To find the similar sequences in A. thaliana, every H. sapiens miRNA must be aligned against every A. thaliana miRNA. With the counts above, this amounts to 2655 × 350 = 929250 local pairwise alignments, and each pair can have one or more possible alignment configurations. To optimize memory usage and performance, section 5.3.2 (Sequence matching - Homology search and pairwise alignment in A. thaliana) retrieves only the best score for each pair.

In [2]:
# Load the list of miRNAs included in the negative dataset [33].
with open('data/negative_pairs/hsa_mirnas.txt') as f:
    hsa_negative_mirnas = f.read().split('\n')


In [3]:
# Match the negative dataset [33] miRNAs with the respective mature miRNA sequence from miRBase [36].
# Names that are not found in miRBase are kept unchanged.
hsa_negative_mirnas_seq = [hsa_mature_mirnas_dict.get(mirna, mirna)
                           for mirna in hsa_negative_mirnas]

# Create a separate file holding only the sequences.
with open('data/negative_pairs/hsa_mirnas_seq.txt', 'w') as f:
    f.write('\n'.join(hsa_negative_mirnas_seq))


In [4]:
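# Get the hsa target sequences from the human genome.
with open('data/hsa/GCF_000001405.40_GRCh38.p14_rna.fna') as f:
    hsa_genome = f.read().split('>')[1:]

# Key each transcript by the gene symbol found in parentheses in the FASTA header.
# When several records share a gene symbol, the last one read is kept.
hsa_genome_dict = {gene.split('\n')[0].split('),')[0].split(' (')[-1]: ''.join(gene.split('\n')[1:])
                   for gene in hsa_genome}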


In [5]:
# Get the list of hsa target genes included in the negative dataset.
with open('data/negative_pairs/hsa_targets.txt') as f:
    hsa_negative_targets = f.read().split('\n')


In [6]:
# Match the negative hsa target names with the respective sequence from the hsa genome [37, 38].
# Targets that are not found in the genome are marked with '-'.
hsa_negative_targets_seq = [hsa_genome_dict.get(target, '-')
                            for target in hsa_negative_targets]

# Create a separate file holding only the target sequences.
with open('data/negative_pairs/hsa_targets_seq.txt', 'w') as f:
    f.write('\n'.join(hsa_negative_targets_seq))


##### 5.3.2. Sequence matching - Homology search and pairwise alignment in A. thaliana
To create the negative dataset for A. thaliana, sequences homologous to those in the H. sapiens dataset [33] are considered. Potential target mRNA sequences from A. thaliana are extracted from the latest released genome [29], and the respective mature miRNA sequences are downloaded from the miRBase database [36].

To compare the similarity between sequences, a pairwise nucleotide local alignment with gap penalties is executed [41], and only sequences whose best score is at least 70% of the maximum achievable score are stored in the new dataset. Since gaps and mismatches matter more in miRNA comparisons, the penalties for gap opening and extension used for miRNA alignments are greater than those used for target comparisons.
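
The 70% criterion can be sketched as a small helper. This is an illustration under the assumption that each matching nucleotide scores +1, so the highest possible local score equals the length of the shorter sequence; the function names and example values are hypothetical.

```python
def normalized_score(score, len_a, len_b, match_score=1.0):
    """Express a raw alignment score as a fraction of the best achievable score.

    With `match_score` points per matching nucleotide, the best possible local
    score is match_score times the length of the shorter sequence.
    """
    shortest = min(len_a, len_b)
    if shortest == 0:
        return 0.0
    return score / (match_score * shortest)


def is_candidate_homolog(score, len_a, len_b, threshold=0.7):
    """Return True when the normalized score reaches the threshold."""
    return normalized_score(score, len_a, len_b) >= threshold


# Hypothetical raw scores for a 22 nt and a 21 nt sequence.
print(is_candidate_homolog(16, 22, 21))  # 16 / 21 is above 0.7
print(is_candidate_homolog(14, 22, 21))  # 14 / 21 is below 0.7
```

Normalizing by the shorter sequence keeps the threshold comparable across miRNAs of different lengths.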


In [7]:
import warnings
warnings.filterwarnings('ignore')
## TODO remove cell


In [8]:
import gc
from Bio import pairwise2
from Bio.Seq import Seq
from tqdm import tqdm

# Note: Bio.pairwise2 is deprecated in recent Biopython releases in favor of
# Bio.Align.PairwiseAligner; it is kept here to match the cited documentation [41].

# Minimum normalized score required to accept a sequence as a valid homolog.
threshold = 0.7

# Alignment for hsa and ath miRNAs.
matching_mirnas = {}
for hsa_mirna_name, hsa_mirna_seq in tqdm(hsa_mature_mirnas_dict.items()):
    # Convert the human sequence into a Biopython Seq object.
    hsa_seq = Seq(hsa_mirna_seq)
    hsa_len = len(hsa_mirna_seq)
    for ath_mirna_seq, ath_mirna_name in ath_mature_mirnas_dict.items():
        # Convert the A. thaliana sequence into a Biopython Seq object.
        ath_seq = Seq(ath_mirna_seq)
        try:
            # Perform the local alignment. With score_only=True, the tested
            # alignments are discarded and only the best score (a float) is
            # returned, which speeds up the calculation and saves memory.
            # The parameters for the local alignment include:
            #   Matching nucleotides     =   +1 score points
            #   Mismatching nucleotides  =   -2 score points
            #   Opening a gap            =   -0.5 score points
            #   Continuing the gap       =   -0.2 score points
            # Note: The maximum score is the total length of the shortest sequence.
            best_alignment_score = pairwise2.align.localms(hsa_seq, ath_seq,
                                                           1, -2,
                                                           -.5, -.2,
                                                           score_only=True)

            # Express the score as a fraction of the shortest sequence length.
            shortest_len = min(hsa_len, len(ath_mirna_seq))
            best_alignment_perc = best_alignment_score / shortest_len
            if best_alignment_perc >= threshold:
                matching_mirnas.setdefault(hsa_mirna_name, []).append(ath_mirna_name)

        except MemoryError:
            # The miRBase names already carry the hsa-/ath- prefixes.
            print(f'Memory err: {hsa_mirna_name}\t{ath_mirna_name}')
            gc.collect()
            continue

print(f'Total hsa miRNAs with at least one ath homologous: '
      f'{len(matching_mirnas)}')
print(f'Total ath miRNAs homologous to a hsa miRNA: '
      f'{sum(len(mirnas) for mirnas in matching_mirnas.values())}')


100%|██████████| 2655/2655 [00:52<00:00, 50.71it/s]

Total hsa miRNAs with at least one ath homologous: 100
Total ath miRNAs homologous to a hsa miRNA: 116





These results indicate that there will be at least 116 non-pair interactions to create a negative set. The number of negative examples in the dataset can increase considering that miRNAs can have more than one non-target.
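
One possible way to turn the homolog mappings into candidate negative pairs is sketched below. It assumes a dictionary shaped like `matching_mirnas` (hsa miRNA name mapped to a list of ath miRNA names) and an analogous, hypothetical `target_homologs` dictionary for genes. The function name and all sample records are invented for illustration, and this is not necessarily the procedure the project will adopt.

```python
from itertools import product


def expand_negative_pairs(hsa_pairs, mirna_homologs, target_homologs):
    """Map hsa (miRNA, target) non-pairs onto their ath homologs.

    Every ath miRNA homolog is combined with every ath target homolog of the
    same hsa non-pair; pairs lacking a homolog on either side are skipped.
    """
    candidates = set()
    for hsa_mirna, hsa_target in hsa_pairs:
        ath_mirnas = mirna_homologs.get(hsa_mirna, [])
        ath_targets = target_homologs.get(hsa_target, [])
        candidates.update(product(ath_mirnas, ath_targets))
    return sorted(candidates)


# Hypothetical records used only to exercise the function.
toy_hsa_pairs = [("hsa-miR-x", "GENE1"), ("hsa-miR-x", "GENE2"), ("hsa-miR-y", "GENE1")]
toy_mirna_homologs = {"hsa-miR-x": ["ath-miR-a", "ath-miR-b"]}
toy_target_homologs = {"GENE1": ["AT_GENE_1"], "GENE2": ["AT_GENE_2"]}

toy_candidates = expand_negative_pairs(toy_hsa_pairs, toy_mirna_homologs, toy_target_homologs)
print(len(toy_candidates))
```

In this toy case, one hsa miRNA with two ath homologs and two non-targets yields four candidate pairs, which shows how the number of negative examples can exceed the number of miRNA homologs.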

The next step in the process is to find the target genes that are homologous between both organisms. For this purpose, a global alignment is proposed with gap penalty. The global alignment is suitable for this purpose because unlike miRNAs, target transcripts are longer, which implies that the local alignment could take too long. Furthermore, the threshold and penalty values are set lower, selecting those homologous genes with at least 65% similarity.
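
For reference, a global alignment score can be sketched with the Needleman-Wunsch recurrence in plain Python. This is only an illustration: it uses a single linear gap penalty rather than separate opening and extension penalties, it is not the `align_targets` implementation used in this project, and the function name, default scores, and toy sequences are assumptions.

```python
def global_alignment_score(seq_a, seq_b, match=1.0, mismatch=-1.0, gap=-0.5):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # First row: aligning an empty prefix of seq_a against prefixes of seq_b.
    previous = [j * gap for j in range(len(seq_b) + 1)]
    for i, char_a in enumerate(seq_a, start=1):
        # First column: aligning a prefix of seq_a against an empty prefix of seq_b.
        current = [i * gap] + [0.0] * len(seq_b)
        for j, char_b in enumerate(seq_b, start=1):
            diagonal = previous[j - 1] + (match if char_a == char_b else mismatch)
            current[j] = max(diagonal, previous[j] + gap, current[j - 1] + gap)
        previous = current
    # Unlike a local alignment, the score covers both sequences end to end.
    return previous[-1]


# Toy sequences, not real transcripts.
identical_score = global_alignment_score("ACGUACGU", "ACGUACGU")
overhang_score = global_alignment_score("ACGU", "ACGUAA")
print(identical_score, overhang_score)
```

Every unaligned nucleotide at the ends is penalized here, so the score reflects the similarity of the two transcripts as a whole rather than that of their best matching region.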

In [8]:
# Get all the A. thaliana sequences in the genome.
with open('data/ath/GCF_000001735.4_TAIR10.1_rna.fna') as f:
    ath_genome = f.read().split('>')[1:]

# The dictionary maps sequence -> gene symbol, so records with an identical
# sequence collapse into a single entry.
ath_genome_dict = {''.join(gene.split('\n')[1:]): gene.split('\n')[0].split('),')[0].split(' (')[-1]
                   for gene in ath_genome}

print(f'Total genes in A. thaliana genome: {len(ath_genome_dict.keys())}')
print(f'Total target genes in negative dataset: {len(hsa_negative_targets)}')


Total genes in A. thaliana genome: 53743
Total target genes in negative dataset: 235


In this case, 12629605 alignments should be performed, thus a multiprocessing approach is adopted.
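
With this many comparisons, it can help to generate the gene pairs lazily and group them into batches, so that each batch could be submitted to a worker process as a single task. The sketch below shows only the batching part; the function name, batch size, and toy inputs are hypothetical, and how the batches are dispatched is left open.

```python
from itertools import islice, product


def iter_pair_batches(hsa_targets, ath_targets, batch_size):
    """Yield lists of (hsa_target, ath_target) pairs of at most `batch_size`.

    The pairs come from a lazy Cartesian product, so the full list of
    combinations is never held in memory at once.
    """
    pairs = product(hsa_targets, ath_targets)
    while True:
        batch = list(islice(pairs, batch_size))
        if not batch:
            return
        yield batch


# Toy (name, sequence) tuples, not real genes.
hsa_toy = [("H1", "ACGU"), ("H2", "GGCC")]
ath_toy = [("A1", "ACGA"), ("A2", "UUUU"), ("A3", "GGCA")]

batch_sizes = [len(batch) for batch in iter_pair_batches(hsa_toy, ath_toy, batch_size=4)]
print(batch_sizes)
```

Submitting one task per batch instead of one task per pair reduces the number of futures that must be tracked, at the cost of coarser progress reporting.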

In [9]:
import itertools

# Prepare a list with every combination (Cartesian product) of hsa and ath genes.
# Each element is a pair of (name, sequence) tuples.
hsa_ath_gene_permutations = list(itertools.product(
    list(zip(hsa_negative_targets, hsa_negative_targets_seq)),
    [(name, seq) for seq, name in ath_genome_dict.items()]
))


In [None]:
from concurrent.futures import ProcessPoolExecutor
from alignments_targets import align_targets
import multiprocessing
from tqdm import tqdm
import gc

num_cores = multiprocessing.cpu_count()

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=num_cores) as pool:
        with tqdm(total=len(hsa_ath_gene_permutations)) as progress:
            # Keep each future together with the gene pair it was created from,
            # so that a failure is reported for the right pair.
            futures = []
            results = []

            for hsa_gene, ath_gene in hsa_ath_gene_permutations:
                future = pool.submit(align_targets, [hsa_gene, ath_gene])
                future.add_done_callback(lambda x: progress.update())
                futures.append((future, hsa_gene, ath_gene))

            for future, hsa_gene, ath_gene in futures:
                try:
                    results.append(future.result())
                except Exception as e:
                    print('---ERROR: ', e, hsa_gene[0], ath_gene[0])
                    gc.collect()
                    # Re-raise the original exception to keep its traceback.
                    raise


  0%|          | 9379/12629605 [02:50<198:32:16, 17.66it/s] 

In [10]:
# Look for all target genes existing in both species. ## TODO: Check if remove cell or not.
for hsa_name in hsa_negative_targets:
    if hsa_name in ath_genome_dict.values():
        print(hsa_name)


CA2
CUL1
HIRA
GATB
ABCA4
PEX12
ARPC1A
PEX11B
KAT5
PES1
MSH2
MSH2
ARPC1B
GPX3
GATA6
CA2
GPX3
ABCG4
ACAT2


In [11]:
hsa_negative_targets

['ITK',
 'ACAP1',
 'SLC25A5',
 'ACAP1',
 'EZH2',
 'PSMD8',
 'GOLGA4',
 'THRB',
 'AMT',
 'CILP',
 'STXBP1',
 'NCKIPSD',
 'CA2',
 'SCHIP1',
 'ASIC2',
 'ARHGEF3',
 'SLBP',
 'ARHGEF4',
 'MPP1',
 'BSN',
 'POLR3K',
 'QDPR',
 'CHGB',
 'USP6',
 'PKM',
 'LMO2',
 'INPP5A',
 'MPP1',
 'STX16',
 'NPEPL1',
 'PPP1R8',
 'CUL1',
 'HIRA',
 'HARS1',
 'PRKCB',
 'POLR2E',
 'GATB',
 'AZGP1',
 'GNG11',
 'FXYD3',
 'HTR2B',
 'S100A2',
 'NPDC1',
 'PKP1',
 'SEMG2',
 'CALCRL',
 'RPS6KA5',
 'MAP3K6',
 'ABCA4',
 'MUC4',
 'SLC9A3R2',
 'FXYD3',
 'DNAJB2',
 'ALDOC',
 'ABR',
 'ARHGEF9',
 'AP3B2',
 'ELOC',
 'NEFL',
 'BMP3',
 'QDPR',
 'GLUD1',
 'NRCAM',
 'HRAS',
 'NRGN',
 'ANAPC5',
 'USF2',
 'GTF2F2',
 'GSTM4',
 'PLIN2',
 'RGN',
 'OGDH',
 'PRKCD',
 'FOLR2',
 'VWF',
 'AGPAT2',
 'PEX12',
 'HMBS',
 'CD37',
 'ATP5MC1',
 'HMOX2',
 'DGCR6',
 'KCNN4',
 'ELAC1',
 'HSPG2',
 'GAD1',
 'UBE2V2',
 'ZNF91',
 'ARPC1A',
 'PEX11B',
 'CRYM',
 'HLF',
 'HARS1',
 'FXYD6',
 'DUSP8',
 'FXYD5',
 'CPA3',
 'SPI1',
 'MAGEA1',
 'PMCH',
 'ACE2',
 'B

### 6. Model beating the baseline

### 7. Model overfitting


### 8. Regularization and hyperparameter tuning

In [12]:
import tensorflow as tf

2023-08-29 17:09:36.026473: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


### References

[1] A. Pla, X. Zhong, and S. Rayner, “miRAW: A deep learning-based approach to predict microRNA targets by analyzing whole microRNA transcripts,” PLOS Computational Biology, vol. 14, no. 7, p. e1006185, Jul. 2018, doi: 10.1371/journal.pcbi.1006185.

[2] J. O’Brien, H. Hayder, Y. Zayed, and C. Peng, “Overview of MicroRNA Biogenesis, Mechanisms of Actions, and Circulation,” Frontiers in Endocrinology, vol. 9, 2018, Accessed: May 04, 2023. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fendo.2018.00402

[3] A. Quillet et al., “Improving Bioinformatics Prediction of microRNA Targets by Ranks Aggregation,” Frontiers in Genetics, vol. 10, 2020, Accessed: May 04, 2023. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fgene.2019.01330

[4] H. Nakayashiki, ‘RNA silencing in fungi: Mechanisms and applications’, FEBS Letters, vol. 579, no. 26, pp. 5950–5957, Oct. 2005, doi: 10.1016/j.febslet.2005.08.016.

[5] T. Kakati, D. K. Bhattacharyya, J. K. Kalita, and T. M. Norden-Krichmar, ‘DEGnext: classification of differentially expressed genes from RNA-seq data using a convolutional neural network with transfer learning’, BMC Bioinformatics, vol. 23, no. 1, p. 17, Jan. 2022, doi: 10.1186/s12859-021-04527-4.

[6] B. Hanczar, F. Zehraoui, T. Issa, and M. Arles, ‘Biological interpretation of deep neural network for phenotype prediction based on gene expression’, BMC Bioinformatics, vol. 21, no. 1, p. 501, Nov. 2020, doi: 10.1186/s12859-020-03836-4.

[7] D. Urda, J. Montes-Torres, F. Moreno, L. Franco, and J. M. Jerez, ‘Deep Learning to Analyze RNA-Seq Gene Expression Data’, in Advances in Computational Intelligence, I. Rojas, G. Joya, and A. Catala, Eds., in Lecture Notes in Computer Science, vol. 10306. Cham: Springer International Publishing, 2017, pp. 50–59. doi: 10.1007/978-3-319-59147-6_5.

[8] ‘Central Dogma’, Genome.gov, Sep. 14, 2022. https://www.genome.gov/genetics-glossary/Central-Dogma (accessed May 07, 2023).

[9] A. Talukder, W. Zhang, X. Li, and H. Hu, “A deep learning method for miRNA/isomiR target detection,” Sci Rep, vol. 12, no. 1, Art. no. 1, Jun. 2022, doi: 10.1038/s41598-022-14890-8.

[10] O. P. Gupta, P. Sharma, R. K. Gupta, and I. Sharma, “Current status on role of miRNAs during plant–fungus interaction,” Physiological and Molecular Plant Pathology, vol. 85, pp. 1–7, Jan. 2014, doi: 10.1016/j.pmpp.2013.10.002.

[11] E. Marín-González and P. Suárez-López, “‘And yet it moves’: Cell-to-cell and long-distance signaling by plant microRNAs,” Plant Science, vol. 196, pp. 18–30, Nov. 2012, doi: 10.1016/j.plantsci.2012.07.009.

[12] T. Siddika and I. U. Heinemann, “Bringing MicroRNAs to Light: Methods for MicroRNA Quantification and Visualization in Live Cells,” Frontiers in Bioengineering and Biotechnology, vol. 8, 2021, Accessed: Apr. 18, 2023. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fbioe.2020.619583

[13] J. K. W. Lam, M. Y. T. Chow, Y. Zhang, and S. W. S. Leung, “siRNA Versus miRNA as Therapeutics for Gene Silencing,” Mol Ther Nucleic Acids, vol. 4, no. 9, p. e252, Sep. 2015, doi: 10.1038/mtna.2015.23.

[14] “miRTarBase: the experimentally validated microRNA-target interactions database.” https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2022/php/index.php (accessed May 08, 2023).

[15] “Gene Regulation,” Genome.gov, Sep. 14, 2022. https://www.genome.gov/genetics-glossary/Gene-Regulation (accessed May 09, 2023).

[16] C. Stylianopoulou, “Carbohydrates: Regulation of metabolism,” in Encyclopedia of Human Nutrition (Fourth Edition), B. Caballero, Ed., Oxford: Academic Press, 2023, pp. 126–135. doi: 10.1016/B978-0-12-821848-8.00173-6.

[17] L. He and G. J. Hannon, “MicroRNAs: small RNAs with a big role in gene regulation,” Nat Rev Genet, vol. 5, no. 7, Art. no. 7, Jul. 2004, doi: 10.1038/nrg1379.

[18] D. Pradhan, A. Kumar, H. Singh, and U. Agrawal, “Chapter 4 - High-throughput sequencing,” in Data Processing Handbook for Complex Biological Data Sources, G. Misra, Ed., Academic Press, 2019, pp. 39–52. doi: 10.1016/B978-0-12-816548-5.00004-6.

[19] B. Hanczar, F. Zehraoui, T. Issa, and M. Arles, “Biological interpretation of deep neural network for phenotype prediction based on gene expression,” BMC Bioinformatics, vol. 21, no. 1, p. 501, Nov. 2020, doi: 10.1186/s12859-020-03836-4.

[20] A. L. Leitão and F. J. Enguita, “A Structural View of miRNA Biogenesis and Function,” Non-Coding RNA, vol. 8, no. 1, Art. no. 1, Feb. 2022, doi: 10.3390/ncrna8010010.

[21] ‘Gene Expression | Learn Science at Scitable’. https://www.nature.com/scitable/topicpage/gene-expression-14121669/ (accessed May 07, 2023).

[22] W. Guo, Y. Xu, and X. Feng, ‘DeepMetabolism: A Deep Learning System to Predict Phenotype from Genome Sequencing’. arXiv, May 08, 2017. doi: 10.48550/arXiv.1705.03094.

[23] M. Wen, P. Cong, Z. Zhang, H. Lu, and T. Li, ‘DeepMirTar: a deep-learning approach for predicting human miRNA targets’, Bioinformatics, vol. 34, no. 22, pp. 3781–3787, Nov. 2018, doi: 10.1093/bioinformatics/bty424.

[24] X. M. Xu and S. G. Møller, ‘The value of Arabidopsis research in understanding human disease states’, Curr Opin Biotechnol, vol. 22, no. 2, pp. 300–307, Apr. 2011, doi: 10.1016/j.copbio.2010.11.007.

[25] G. P. Way and C. S. Greene, ‘Extracting a Biologically Relevant Latent Space from Cancer Transcriptomes with Variational Autoencoders’. bioRxiv, p. 174474, Aug. 11, 2017. doi: 10.1101/174474.

[26] J. Rocca, ‘Understanding Variational Autoencoders (VAEs)’, Medium, Mar. 21, 2021. https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73 (accessed Jun. 07, 2023).

[27] C. H. Grønbech, M. F. Vording, P. Timshel, C. K. Sønderby, T. H. Pers, and O. Winther, ‘scVAE: Variational auto-encoders for single-cell gene expression data’. bioRxiv, p. 318295, Oct. 02, 2019. doi: 10.1101/318295.

[28] K. Y. Gao, A. Fokoue, H. Luo, A. Iyengar, S. Dey, and P. Zhang, ‘Interpretable Drug Target Prediction Using Deep Neural Representation’, in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden: International Joint Conferences on Artificial Intelligence Organization, Jul. 2018, pp. 3371–3377. doi: 10.24963/ijcai.2018/468.

[29] ‘Arabidopsis thaliana (ID 4) - Genome - NCBI’. https://www.ncbi.nlm.nih.gov/genome/4?genome_assembly_id=380024 (accessed Jul. 02, 2023).

[30] G. B. Or and I. Veksler-Lublinsky, ‘Comprehensive machine-learning-based analysis of microRNA-target interactions reveals variable transferability of interaction rules across species’. bioRxiv, p. 2021.03.28.437385, Mar. 29, 2021. doi: 10.1101/2021.03.28.437385.

[31] ‘Arabidopsis thaliana (ID 4) - Genome - NCBI’. https://www.ncbi.nlm.nih.gov/genome/4?genome_assembly_id=380024 (accessed Jul. 02, 2023).

[32] X. Chen, ‘Small RNAs – secrets and surprises of the genome’, Plant J, vol. 61, no. 6, pp. 941–958, Mar. 2010, doi: 10.1111/j.1365-313X.2009.04089.x.

[33] S. Bandyopadhyay and R. Mitra, ‘TargetMiner: microRNA target prediction with systematic identification of tissue-specific negative examples’, Bioinformatics, vol. 25, no. 20, pp. 2625–2631, Oct. 2009, doi: 10.1093/bioinformatics/btp503.

[34] ‘PmiREN: Plant microRNA Encyclopedia’. https://www.pmiren.com/download (accessed Aug. 04, 2023).

[35] ‘refSeq Accession to Gene Symbol Converter - Genomics Biotools’. https://www.biotools.fr/mouse/refseq_symbol_converter (accessed Aug. 07, 2023).

[36] ‘miRBase - Downloads’. https://mirbase.org/download/ (accessed Aug. 13, 2023).

[37] ‘Genome’, NCBI. https://www.ncbi.nlm.nih.gov/datasets/genome/?taxon=9606 (accessed Aug. 13, 2023).

[38] ‘11968211 - Assembly - NCBI’. https://www.ncbi.nlm.nih.gov/assembly/?term=GCF_000001405 (accessed Aug. 13, 2023).

[39] B. Murcott, R. J. Pawluk, A. V. Protasio, R. Y. Akinmusola, D. Lastik, and V. L. Hunt, ‘stepRNA: Identification of Dicer cleavage signatures and passenger strand lengths in small RNA sequences’, Frontiers in Bioinformatics, vol. 2, 2022, Accessed: Aug. 18, 2023. [Online].

[40] H.-Y. Huang et al., ‘miRTarBase update 2022: an informative resource for experimentally validated miRNA–target interactions’, Nucleic Acids Research, vol. 50, no. D1, pp. D222–D230, Jan. 2022, doi: 10.1093/nar/gkab1079.

[41] ‘Bio.pairwise2 module — Biopython 1.75 documentation’. https://biopython.org/docs/1.75/api/Bio.pairwise2.html (accessed Aug. 29, 2023).
