Skip to content

s-a-nersisyan/miRGTF-net

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

miRGTF-net

About

miRGTF-net is a tool for construction of miRNA-gene-TF regulatory networks using both database-level and transcriptomics/miRNomics data.

Requirements

  • Python 3.6+
  • scikit-learn (0.23+), networkx (2.3+), pandas, numpy, scipy

Usage

python3 run.py /path/to/config.json

All running options should be specified in config.json file (see example with default values in the repository):

  • interaction_databases_path: path to folder where interaction databases are stored. A database should be presented as a two-column tab-separated (tsv) file, see examples in input_examples/interaction_databases folder (all human databases listed in the manuscript + databases for mouse)
  • sign_constraints: any user-defined constraints on interaction directions (signs) should be put there. Do not write anything for databases without constraints. In example we assume negative correlation between miRNA and its target gene (miRTarBase_v8_0) and positive correlation between host gene and corresponding intronic miRNA
  • gene_expression_table_path: path to tsv table containing gene expression estimates. Rows of a table should correspond to genes, columns to samples, see example in input_examples/gene_expression.tsv (100 breast cancer samples from TCGA-BRCA).
  • miRNA_expression_table_path: same as for gene expression but for miRNAs. Note that order of samples should match for gene and miRNA expression tables (see input_examples/miRNA_expression.tsv)
  • output_path: path were all output files should be stored
  • Spearman_correlation_cutoff_percentile: percentile, specifying fraction of edges which will be removed during pre-processing. For example, value equal to 90 will preserve edges with top-10% Spearman correlations. The lower value is, the more "weak" edges will appear in the constructed network
  • incoming_score_threshold: threshold on regression goodness of fit (R^2 values). The higher value is, the more confident regulations will be included in the network
  • interaction_score_cutoff_percentile: percentile specifying fraction of edges which should be discarded at the final step. For example, value equal to 10 will discard edges with lowest-10% interaction scores. The higher value is, the more edges will be discarded

TCGA data

ER+ breast cancer expression data from TCGA used in the manuscript are avaiable at https://drive.google.com/drive/folders/1c50BIlfF2SlnswJ2iTD0pXG6Rzt0Rs0m?usp=sharing.

For user convenience, we also added instruction and prepared a script how to import arbitrary TCGA data subset:

  1. Access GDC Data Portal (https://portal.gdc.cancer.gov/)
  2. Select a project (e.g. TCGA-BRCA), add *.FPKM.txt.gz (RNA-seq) and *.isoforms.quantification.txt (miRNA-seq) files to the cart
  3. Download the cart and Sample Sheet, unpack *.tar.gz archive
  4. Go to the TCGA folder and run python3 process.py /path/to/sample_sheet.tsv /path/to/gdc_download_unpacked_folder . This will generate gene and miRNA expression tables (TPM) for matched primary tumors as described in the manuscript. GENCODE v22 and miRBase v22.1 are used as default versions.

Note about miRNA/gene identifiers

Please be careful when using different miRNA/gene/TF databases – they could have built on different versions of miRBase/RefSeq/Ensembl, which will result in missing nodes/edges. We recommend using miRBaseConverter (https://taoshengxu.shinyapps.io/mirbaseconverter/) and HGNC Multi-symbol checker (https://www.genenames.org/tools/multi-symbol-checker/) for miRNA/gene name convertations. All pre-included miRNA databases (both for human and mouse) are already compatible with miRBase v22.1.

Network vizualization

miRGTF-net uses graphml format to export networks and their weakly/strongly connected components. We recommend using Gephi (https://gephi.org) or yEd Graph Editor (https://www.yworks.com/products/yed) to vizualize them in effective and beautiful way.

References

If you use miRGTF-net in your research, please cite the following manuscript:

Nersisyan S, Galatenko A, Galatenko V, Shkurnikov M, Tonevitsky A (2021) miRGTF-net: Integrative miRNA-gene-TF network analysis reveals key drivers of breast cancer recurrence. PLOS ONE 16(4): e0249424. https://doi.org/10.1371/journal.pone.0249424

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages