<a href="https://colab.research.google.com/github/monabiyan/github_test/blob/master/The_AMARETTO_framework_via_GitHub_and_Bioconductor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The &#42;AMARETTO framework in R via GitHub and Bioconductor
<i> Multiscale and multimodal inference of regulatory networks to identify cell circuits and their drivers shared and distinct across biological systems of human disease</i>
<br><br>
Mohsen Nabian, Celine Everaert, Jayendra Shinde, Shaimaa Bakr, Ted Liefeld, Mikel Hernaez, Thomas Baumert, Michael Reich, Jill Mesirov, Vincent Carey, Olivier Gevaert&#42;, Nathalie Pochet&#42;

## Introduction to the &#42;AMARETTO algorithm and software toolbox

Computational inference of regulatory networks underlying complex human diseases is one of the fundamental goals of systems biology and has shown great promise for deciphering the regulatory cell circuits driving complex disease biology, including cancer. The availability of increasing volumes of multimodal data ranging from multi-omics (e.g., genetic, epigenetic, transcriptomic and proteomic) to imaging and clinical data across multiscale systems (e.g., from model systems to patient studies, from <i>in vitro</i> to <i>in vivo</i> systems) promises to improve our understanding of the regulatory mechanisms underlying complex human diseases. The main challenges are to integrate the multiple levels of multimodal data and to translate them across multiscale biological systems to decipher the underpinnings of human diseases.

Here we introduce the <B>&#42;AMARETTO framework</B> as a toolbox for learning how regulatory networks - cell circuits and their drivers - are shared or distinct within and across biological systems with a broad range of applications, from disease subtyping to driver and drug discovery in studies of human disease, including cancer. The &#42;AMARETTO framework currently consists of two algorithmic software tools:

(<B>1</B>) The <B>AMARETTO algorithm</B> facilitates multimodal inference of regulatory networks within one biological system via multi-omics data fusion (e.g., genetic, epigenetic, transcriptomic, proteomic) and association with phenotypes derived from clinical (e.g., diagnostic and prognostic) and imaging (e.g., histopathology and radiographic) data.

(<B>2</B>) The <B>Community-AMARETTO algorithm</B> enables multiscale inference of how these regulatory networks are shared or distinct across biological systems and diseases (e.g., across model systems and patient studies, and across <i>in vitro</i> and <i>in vivo</i> systems).

The &#42;AMARETTO framework is available as user-friendly tools from GitHub, Bioconductor, GenePattern, GenomeSpace, GenePattern Notebook and R Jupyter Notebook (see <B>Resources</B>).

Beyond our recent applications to studies of cancer (see <B>References</B>), the &#42;AMARETTO software toolbox is more generally applicable to studies of human disease, including cancer, infectious, neurologic and immune-mediated diseases.

## &#42;AMARETTO core tools and downstream analytic functionalities

The &#42;AMARETTO framework currently consists of two algorithmic software tools. First, <B>AMARETTO</B> infers regulatory networks within each biological system via multi-omics data fusion. Specifically, AMARETTO identifies potential cancer drivers by identifying genes whose genetic and epigenetic cancer aberrations have a direct functional impact on their own transcriptomic or proteomic expression. These (epi)genetic drivers can be augmented, intersected or replaced with predefined candidate drivers with known regulatory function (e.g., transcription factors from TFutils). AMARETTO then connects these drivers in a regulatory program with modules of co-expressed target genes that they putatively control, defined as regulatory modules or cell circuits, using a penalized regulatory program. Next, <B>Community-AMARETTO</B> learns communities or subnetworks by connecting regulatory networks inferred from different systems using an edge betweenness community detection algorithm to identify cell circuits and drivers that are shared and distinct across biological systems and diseases.

The &#42;AMARETTO framework additionally offers tools for <B>downstream analytic functionalities</B> on both module and community levels, including functional annotation of modules and communities (e.g., using known functional categories from MSigDB), stratifying modules and communities for increasingly specific phenotypes (e.g., patient characteristics such as survival, molecular subclasses, known (epi)genetic cancer aberrations, or features derived from non-invasive or histopathology imaging, as well as in-depth studies of etiologies of cancer via spatiotemporal - time course and single-cell - studies in model systems), validation of predicted drivers (e.g., using genetic perturbation studies in model systems – knockdown or overexpression experiments of driver genes), discovering drugs targeting drivers and their predicted target genes (e.g., using chemical perturbation studies in model systems), and systematic assessment and benchmarking of the networks for generalized prediction performance of the (sub)networks.

# Step 1. Before you begin, install software and prepare data for running &#42;AMARETTO

## Installing software

Install and load the libraries required for running &#42;AMARETTO in this R Notebook by running the code cells below.

The &#42;AMARETTO toolbox can be installed from GitHub (development versions) or from Bioconductor (official releases).

For GitHub repositories you can proceed with <B>Step 1.a</B>.

For Bioconductor repositories you can proceed with <B>Step 1.b</B>.

## Preparing data

Example data used in this Notebook is available from <https://www.broadinstitute.org/~npochet/ExampleData/>.

Processed data from TCGA is available from <https://datasets.genepattern.org/?prefix=data/module_support_files/Amaretto/> (To Do: convert .gct files to .rds files) (To Do: prepare files for CCLE).



## Step 1.a. Installing the &#42;AMARETTO toolbox from GitHub (development versions)

### Installing systems libraries on Linux server

In [0]:
system("sudo apt-get install -y libv8-dev", intern = TRUE, ignore.stderr = TRUE)

### Installing AMARETTO

In [0]:
devtools::install_github("gevaertlab/AMARETTO", ref="for35_develop", dependencies=TRUE)

In [0]:
library("AMARETTO")

### Installing Community-AMARETTO

In [0]:
devtools::install_github("broadinstitute/CommunityAMARETTO", ref="develop", dependencies=TRUE)

In [0]:
library("CommunityAMARETTO")

## Step 1.b. Installing the &#42;AMARETTO toolbox from Bioconductor (official releases)

In [0]:
# To Do
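
Until this cell is filled in, the following is a minimal sketch of what a Bioconductor installation usually looks like. It assumes that the package is released on Bioconductor under the name `AMARETTO`, which is an assumption here because the submission is still listed as pending under <B>Resources</B>. `BiocManager::install()` is the standard Bioconductor installer.

```r
# Install the Bioconductor installer itself if it is missing.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Assumption: the released package name is "AMARETTO".
BiocManager::install("AMARETTO")
library("AMARETTO")
```

Community-AMARETTO is listed as not yet submitted, so the GitHub route in <B>Step 1.a</B> remains the only documented way to install it.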

# Step 2. Running AMARETTO on first example study: inferring regulatory networks via multi-omics data fusion for TCGA LIHC patient cohort

## Step 2.a. Preparing data and parameter settings for running AMARETTO

### Step 2.a.1. Loading multi-omics data from TCGA LIHC cohort

#### Step 2.a.1.1. Loading Gene Expression (MA) data from TCGA LIHC patient cohort (Required)

In [0]:
MA_matrix_LIHC <- readRDS(url("https://www.broadinstitute.org/~npochet/ExampleData/MA_matrix_LIHC.rds"))

#### Step 2.a.1.2. Loading Copy Number Variation (CNV) data from TCGA LIHC cohort (Optional)

In [0]:
CNV_matrix_LIHC <- readRDS(url("https://www.broadinstitute.org/~npochet/ExampleData/CNV_matrix_LIHC.rds"))

#### Step 2.a.1.3. Loading DNA Methylation (MET) data from TCGA LIHC cohort (Optional)

In [0]:
MET_matrix_LIHC <- readRDS(url("https://www.broadinstitute.org/~npochet/ExampleData/MET_matrix_LIHC.rds"))

#### Step 2.a.1.4. Combining multi-omics data sources from TCGA LIHC cohort (Required)

If all previous data sources are available, you can run the following code cell:

In [0]:
ProcessedData_LIHC = list(MA_matrix=MA_matrix_LIHC, CNV_matrix=CNV_matrix_LIHC, MET_matrix=MET_matrix_LIHC)

If only MA and CNV data sources are available, you can run the following code cell:

In [0]:
ProcessedData_LIHC = list(MA_matrix=MA_matrix_LIHC, CNV_matrix=CNV_matrix_LIHC, MET_matrix=NULL)

If only MA and MET data sources are available, you can run the following code cell:

In [0]:
ProcessedData_LIHC = list(MA_matrix=MA_matrix_LIHC, CNV_matrix=NULL, MET_matrix=MET_matrix_LIHC)

If only MA data is available, you can run the following code cell:

In [0]:
ProcessedData_LIHC = list(MA_matrix=MA_matrix_LIHC, CNV_matrix=NULL, MET_matrix=NULL)
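
One R detail is worth keeping in mind when building this list by hand. The sketch below uses a toy matrix rather than the TCGA data and shows that `list()` keeps a `NULL` entry as a named slot, whereas assigning `NULL` to a slot afterwards removes it. Whether AMARETTO treats a removed slot the same way as a `NULL` one is not stated in this Notebook, so the explicit form used in the cells above is the safer pattern.

```r
# Toy stand-in for a real expression matrix (hypothetical values).
MA_toy <- matrix(rnorm(6), nrow = 2,
                 dimnames = list(c("GeneA", "GeneB"), c("S1", "S2", "S3")))

# list() keeps NULL entries as named slots.
with_null <- list(MA_matrix = MA_toy, CNV_matrix = NULL, MET_matrix = NULL)
names(with_null)    # all three names are present
length(with_null)   # 3

# Assigning NULL afterwards removes the slot instead of emptying it.
dropped <- list(MA_matrix = MA_toy, CNV_matrix = MA_toy)
dropped$CNV_matrix <- NULL
names(dropped)      # only "MA_matrix" remains
```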

### Step 2.a.2. Defining List(s) of Candidate Driver Genes  (Required)

In case CNV and/or MET data are submitted, you can choose to either (1) use those as candidate drivers, or (2) take the union of these with predefined lists of candidate drivers, or (3) take the intersection with predefined lists of candidate drivers.

Alternatively, for example, if only functional genomics (transcriptomic or proteomic) data is available, you should select or upload a predefined list of candidate driver genes (required).

In [0]:
data("Driver_Genes")
TF_candidate_drivers <- Driver_Genes$TFs_TFutils_union

combination_method <- "union"
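
The three options described above can be illustrated with base R set operations. This is only a sketch of the set logic with hypothetical gene symbols; it is not AMARETTO's internal code, and the variable names are made up for the example.

```r
# Hypothetical gene symbols, for illustration only.
data_driven_drivers <- c("TP53", "MYC", "CTNNB1", "AXIN1")   # e.g. genes flagged by CNV/MET data
predefined_drivers  <- c("MYC", "CTNNB1", "HNF4A", "FOXA2")  # e.g. a transcription factor list

# Option 1: use only the data-driven candidates.
drivers_data_only <- data_driven_drivers

# Option 2: union with the predefined list.
drivers_union <- union(data_driven_drivers, predefined_drivers)

# Option 3: intersection with the predefined list.
drivers_intersect <- intersect(data_driven_drivers, predefined_drivers)

length(drivers_union)   # 6
drivers_intersect       # "MYC" "CTNNB1"
```

The union gives the broadest candidate pool, while the intersection keeps only data-driven candidates that also have a known regulatory role.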

### Step 2.a.3. Setting parameters for running AMARETTO (Required)

To Do: explain parameters and specify default values/range

In [0]:
NrModules = 10
VarPercentage = 10

### Step 2.a.4. Setting parameters for generating HTML results reports (Optional)

To Do: explain genesets/phenotypes/output

In [0]:
genesets_database_reference <- "H_C2_genesets.gmt"
download.file(url="https://www.broadinstitute.org/~npochet/ExampleData/H_C2_genesets.gmt", destfile=genesets_database_reference)

samples_phenotypes_annotation <- readr::read_delim(url("https://www.broadinstitute.org/~npochet/ExampleData/Samples_LIHC_phenotypes.txt"),"\t")

# Categorize and Factorize phenotypes for better visualization.
samples_phenotypes_annotation$SurvivalTime<-as.factor(cut(samples_phenotypes_annotation$SurvivalTime, 7, include.lowest=TRUE, dig.lab = 5))
samples_phenotypes_annotation$SurvivalCensoring<-as.factor(samples_phenotypes_annotation$SurvivalCensoring)



output_directory_TCGA = "./AMARETTO_report_TCGA/"
dir.create(output_directory_TCGA) 
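
To see what the `cut()` call in the cell above does, here is a small sketch on hypothetical survival times. It splits the numeric range into 7 equal-width intervals and returns a factor, so the surrounding `as.factor()` in the cell above is harmless but not strictly needed.

```r
# Hypothetical survival times in days.
survival_days <- c(12, 150, 420, 800, 1300, 2100, 3600)

# Same arguments as in the cell above: 7 bins, keep the lowest value,
# and print interval labels with up to 5 significant digits.
survival_bins <- cut(survival_days, 7, include.lowest = TRUE, dig.lab = 5)

is.factor(survival_bins)   # TRUE
nlevels(survival_bins)     # 7
table(survival_bins)       # number of samples per interval
```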

### Step 2.a.5. Setting parameters for computing on servers (Optional)

Select number of cores for parallel computing:

In [0]:
NrCores = 4
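
If the number of cores on the server is not known in advance, it can be detected instead of hard-coded. The sketch below uses `parallel::detectCores()` from base R; leaving one core free is a common convention and not an AMARETTO requirement.

```r
# detectCores() can return NA on some platforms, so fall back to 1.
available_cores <- parallel::detectCores()
if (is.na(available_cores)) available_cores <- 1L

# Keep one core free for the notebook itself, but never go below 1.
NrCores <- max(1L, available_cores - 1L)
NrCores
```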

## Step 2.b. Running AMARETTO

In [0]:
AMARETTOinit <- AMARETTO_Initialize(ProcessedData = ProcessedData_LIHC,
                                    Driver_list = TF_candidate_drivers,
                                    method = combination_method,
                                    NrModules = NrModules,
                                    VarPercentage = VarPercentage,
                                    NrCores = NrCores)

AMARETTOresults <- AMARETTO_Run(AMARETTOinit)

## Step 2.c. Generating an HTML report of AMARETTO results

In [0]:
AMARETTO_HTMLreport(AMARETTOinit = AMARETTOinit,
                    AMARETTOresults = AMARETTOresults,
                    ProcessedData = ProcessedData_LIHC,
                    VarPercentage = VarPercentage,
                    hyper_geo_test_bool = FALSE,
                    hyper_geo_reference = genesets_database_reference,
                    output_address = output_directory_TCGA,
                    SAMPLE_annotation = samples_phenotypes_annotation,
                    ID = 'Sample')
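
Judging by their names, the report options `hyper_geo_test_bool` and `hyper_geo_reference` control a hypergeometric enrichment test of modules against the gene sets in the .gmt file. The following sketch shows what such a test computes in general, using base R's `phyper()` and hypothetical counts. It is not taken from the AMARETTO source code.

```r
# Hypothetical counts, for illustration only.
universe_size <- 20000  # genes considered in the analysis
geneset_size  <- 200    # genes in one functional category
module_size   <- 70     # genes in one regulatory module
overlap       <- 10     # genes shared by the module and the category

# P(overlap >= 10) when drawing module_size genes at random
# without replacement from the universe.
p_value <- phyper(overlap - 1,
                  m = geneset_size,
                  n = universe_size - geneset_size,
                  k = module_size,
                  lower.tail = FALSE)
p_value
```

With these numbers the expected overlap by chance is below one gene, so an overlap of 10 yields a very small p-value.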

## Step 2.d. Saving and downloading AMARETTO results files and HTML report

To Do: downloading or visualizing??

In [0]:
saveRDS(AMARETTOinit, file="TCGA_AMARETTOinit.rds")
saveRDS(AMARETTOresults, file="TCGA_AMARETTOresults.rds")
zip(zipfile = "AMARETTO_report_TCGA.zip",files=output_directory_TCGA)

You can now download the three files above by right-clicking each file and selecting *Download*.

# Step 3. Running AMARETTO on second example study: inferring regulatory networks from RNA-Seq data from CCLE liver cell lines

## Step 3.a. Preparing data and parameter settings for running AMARETTO

### Step 3.a.1 Loading RNA-Seq data from CCLE liver cell lines

#### Step 3.a.1.1. Loading Gene Expression (MA) data from CCLE liver cell lines (Required)

In [0]:
MA_matrix_CCLE <- readRDS(url("https://www.broadinstitute.org/~npochet/ExampleData/MA_matrix_CCLE.rds"))
ProcessedData_CCLE = list(MA_matrix=MA_matrix_CCLE, CNV_matrix=NULL, MET_matrix=NULL)

### Step 3.a.2. Defining List(s) of Candidate Driver Genes  (Required)

In [0]:
candidate_drivers_CCLE <- readRDS(url("https://www.broadinstitute.org/~npochet/ExampleData/candidate_drivers_CCLE.rds"))

### Step 3.a.3. Setting parameters for running AMARETTO (Required)

Additional parameters that the user must set for running AMARETTO include:

(<B>1</B>) <B>Number of regulatory modules</B> (i.e., cell circuits and their drivers) to be inferred. As a rule of thumb, high-quality regulatory modules comprise ~60-80 genes, so the optimal range for the number of modules can be calculated by dividing the total number of genes in the analysis (see parameter % variation) by these numbers. Depending on the number of genes in the analysis, the ideal suggested range is ~100-200 modules.

(<B>2</B>) <B>Percent of most varying genes</B> across the sample population (% genes) to be included in the analysis. This is based on the functional genomics data, which can be population RNA-Seq, single-cell RNA-Seq, or proteomics data. While genes that do not vary across the population (i.e., stdev zero) are automatically filtered out from the analysis, it is recommended to adjust the % variation filter for each dataset. Depending on the type of data, the ideal suggested range is ~25%-75% genes that vary the most across the population.

In [0]:
NrModules = 10
VarPercentage = 10
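
The two parameters interact: the variation filter determines how many genes enter the analysis, and the rule of thumb of ~60-80 genes per module then suggests a range for the number of modules. The sketch below works through this on a toy matrix. It assumes that variability is measured by the standard deviation per gene; the exact statistic AMARETTO uses is not stated here, and all names in the block are hypothetical.

```r
set.seed(1)

# Toy expression matrix: 1000 genes (rows) x 20 samples (columns).
expr_toy <- matrix(rnorm(1000 * 20), nrow = 1000,
                   dimnames = list(paste0("Gene", 1:1000), paste0("S", 1:20)))

var_percentage <- 50  # keep the 50% most variable genes

# Drop genes that do not vary at all, then rank the rest by standard deviation.
gene_sd  <- apply(expr_toy, 1, sd)
gene_sd  <- gene_sd[gene_sd > 0]
n_keep   <- ceiling(length(gene_sd) * var_percentage / 100)
top_genes <- names(sort(gene_sd, decreasing = TRUE))[seq_len(n_keep)]
expr_filtered <- expr_toy[top_genes, , drop = FALSE]

# Rule of thumb from the text: ~60-80 genes per module.
modules_range <- c(low = floor(n_keep / 80), high = ceiling(n_keep / 60))

nrow(expr_filtered)   # 500
modules_range         # low = 6, high = 9
```

For a real dataset with, say, 10,000 retained genes, the same arithmetic gives roughly 125-167 modules, which falls within the suggested ~100-200 range.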

### Step 3.a.4. Setting parameters for generating HTML results reports (Optional)

Optional parameters that the user can set for running AMARETTO include:

(<B>1</B>) Define <B>collections of known functional categories</B> for functional characterization of the regulatory modules or cell circuits. One or more collections can be selected from the predefined MSigDB drop-down list (see <http://software.broadinstitute.org/gsea/msigdb/collections.jsp>) and/or uploaded by the user.

(<B>2</B>) Provide a <B>base "file name" for output files</B> generated by the AMARETTO analysis (e.g., myAmarettoAnalysis).

In [0]:
genesets_database_reference <- "H_C2_genesets.gmt"
download.file(url="https://www.broadinstitute.org/~npochet/ExampleData/H_C2_genesets.gmt", destfile=genesets_database_reference)

output_directory_CCLE = "./AMARETTO_report_CCLE/"
dir.create(output_directory_CCLE)
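
The gene set collections are passed as a .gmt file. In the GMT format each line is tab-delimited and holds a gene set name, a description, and then the member genes. The sketch below writes a tiny hypothetical .gmt file and parses it with base R, which can help when checking a user-supplied collection before running the report. `read_gmt` is a helper defined here for illustration, not an AMARETTO function.

```r
# Write a two-line toy GMT file (hypothetical gene sets).
gmt_path <- tempfile(fileext = ".gmt")
writeLines(c("SET_A\thttp://example.org/a\tTP53\tMYC\tCTNNB1",
             "SET_B\thttp://example.org/b\tHNF4A\tFOXA2"),
           gmt_path)

# Parse a GMT file: field 1 = name, field 2 = description, the rest = genes.
read_gmt <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  genesets <- lapply(fields, function(x) x[-(1:2)])
  names(genesets) <- vapply(fields, `[`, character(1), 1)
  genesets
}

genesets <- read_gmt(gmt_path)
lengths(genesets)   # SET_A: 3, SET_B: 2
```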

### Step 3.a.5. Setting parameters for computing on servers (Optional)

Select number of cores for parallel computing:

In [0]:
NrCores = 4

## Step 3.b. Running AMARETTO

In [0]:
AMARETTOinit <- AMARETTO_Initialize(ProcessedData = ProcessedData_CCLE,
                                    Driver_list = candidate_drivers_CCLE,
                                    NrModules = NrModules,
                                    VarPercentage = VarPercentage,
                                    NrCores = NrCores)

AMARETTOresults <- AMARETTO_Run(AMARETTOinit)

## Step 3.c. Generating an HTML report of AMARETTO results

In [0]:
AMARETTO_HTMLreport(AMARETTOinit = AMARETTOinit,
                    AMARETTOresults = AMARETTOresults,
                    ProcessedData = ProcessedData_CCLE,
                    VarPercentage = VarPercentage,
                    hyper_geo_test_bool = FALSE,
                    hyper_geo_reference = genesets_database_reference,
                    output_address = output_directory_CCLE,
                    SAMPLE_annotation = NULL,
                    ID = 'Sample')

## Step 3.d. Saving and downloading AMARETTO results files and HTML report

To Do: downloading or visualizing??

In [0]:
saveRDS(AMARETTOinit, file="CCLE_AMARETTOinit.rds")
saveRDS(AMARETTOresults, file="CCLE_AMARETTOresults.rds")
zip(zipfile = "AMARETTO_report_CCLE.zip",files=output_directory_CCLE)

You can now download the three files above by right-clicking each file and selecting *Download*.

# Step 4. Running Community-AMARETTO to combine both example studies: identifying regulatory (sub)networks shared/distinct between TCGA and CCLE cohorts

## Step 4.a. Preparing data and parameter settings for running Community-AMARETTO

To Do: explain input parameters for Community-AMARETTO


In [0]:
AMARETTOinit_TCGA<-readRDS(file="TCGA_AMARETTOinit.rds")
AMARETTOresults_TCGA<-readRDS(file="TCGA_AMARETTOresults.rds")

AMARETTOinit_CCLE<-readRDS(file="CCLE_AMARETTOinit.rds")
AMARETTOresults_CCLE<-readRDS(file="CCLE_AMARETTOresults.rds")

#gmt_filelist=list(ImmuneSignature = "ciberSort.gmt")

HTMLsAMARETTOlist <- c("TCGA"=output_directory_TCGA,"CCLE"=output_directory_CCLE)

hyper_geo_reference = "H_C2_genesets.gmt" 

output_directory_cAMARETTO = "./cAMARETTO_report/"
dir.create(output_directory_cAMARETTO)   
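
The cell above assumes that the four .rds files saved in Steps 2.d and 3.d are present in the working directory. A small check such as the following sketch can make a missing file easier to diagnose before `readRDS()` fails; `missing_inputs` is a hypothetical helper, not part of either package.

```r
# Hypothetical helper: return the paths that do not exist on disk.
missing_inputs <- function(paths) paths[!file.exists(paths)]

required_files <- c("TCGA_AMARETTOinit.rds", "TCGA_AMARETTOresults.rds",
                    "CCLE_AMARETTOinit.rds", "CCLE_AMARETTOresults.rds")

not_found <- missing_inputs(required_files)
if (length(not_found) > 0) {
  message("Run Steps 2 and 3 first; missing: ", paste(not_found, collapse = ", "))
}
```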


## Step 4.b. Running Community-AMARETTO


In [0]:
cAMARETTOresults<-cAMARETTO_Results(AMARETTOinit_all = list(TCGA=AMARETTOinit_TCGA,CCLE=AMARETTOinit_CCLE) ,
                                    AMARETTOresults_all = list(TCGA=AMARETTOresults_TCGA,CCLE=AMARETTOresults_CCLE),
                                    gmt_filelist = NULL,
                                    NrCores = 4,
                                    drivers=TRUE)

cAMARETTOnetworkM<-cAMARETTO_ModuleNetwork(cAMARETTOresults,
                                           pvalue=0.05,
                                           inter=5)

#Identify significantly connected subnetworks using the Girvan-Newman algorithm
cAMARETTOnetworkC<-cAMARETTO_IdentifyCom(cAMARETTOnetworkM,
                                         filterComm = FALSE,
                                         ratioCommSize=0.01,
                                         MinRuns=2,
                                         ratioRunSize=0.1,
                                         ratioEdgesInOut=0.5)

## Step 4.c. Generating an HTML report of Community-AMARETTO results

In [0]:
cAMARETTO_HTMLreport(cAMARETTOresults = cAMARETTOresults,
                     cAMARETTOnetworkM = cAMARETTOnetworkM,
                     cAMARETTOnetworkC = cAMARETTOnetworkC,
                     HTMLsAMARETTOlist = HTMLsAMARETTOlist,
                     CopyAMARETTOReport = TRUE,
                     hyper_geo_test_bool = TRUE,
                     hyper_geo_reference = hyper_geo_reference ,
                     MSIGDB = TRUE,
                     output_address= output_directory_cAMARETTO)
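
To give a feel for the edge betweenness (Girvan-Newman) community detection mentioned above, here is a toy sketch using the igraph package, assuming it is installed. It is not a reproduction of what `cAMARETTO_IdentifyCom` does internally; it only shows the general technique on a graph of two triangles joined by one bridge edge.

```r
library(igraph)

# Toy network: two triangles (A-B-C and D-E-F) joined by a single bridge edge C-D.
g <- graph_from_literal(A-B, B-C, A-C, C-D, D-E, E-F, D-F)

# Girvan-Newman: repeatedly remove the edge with the highest betweenness,
# then keep the partition with the best modularity.
communities_toy <- cluster_edge_betweenness(g)

membership(communities_toy)   # community label per node
length(communities_toy)       # number of communities found
```

The bridge edge carries all shortest paths between the two triangles, so it is removed first and the two triangles come out as separate communities. In the Community-AMARETTO setting the nodes would be modules from different cohorts rather than letters.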

## Step 4.d. Saving and downloading community-AMARETTO results files and the HTML report

To Do: downloading or visualizing??

In [0]:
saveRDS(cAMARETTOresults, file="cAMARETTOresults.rds")
saveRDS(cAMARETTOnetworkM, file="cAMARETTOnetworkM.rds")
saveRDS(cAMARETTOnetworkC, file="cAMARETTOnetworkC.rds")
zip(zipfile = "cAMARETTO_report.zip",files=output_directory_cAMARETTO)

You can now download the four files above by right-clicking each file and selecting *Download*.


# Time complexity of &#42;AMARETTO

Running time depends on the size of the data. For example, for TCGA cohorts of ~300-500 samples, running the &#42;AMARETTO algorithms and generating the reports can take up to ~2 hours on the GenePattern Amazon Cloud server.

# Resources

The source code and user-friendly tools of the current &#42;AMARETTO toolbox and future developments are available from GitHub, Bioconductor, GenePattern, GenomeSpace, GenePattern Notebook and R Jupyter Notebook.

#### &#42;AMARETTO in Bioconductor
- <B>AMARETTO in Bioconductor</B>: submitted (<https://github.com/Bioconductor/Contributions/issues/1001>)<br>
- <B>Community-AMARETTO in Bioconductor</B>: in preparation for submission<br>

#### &#42;AMARETTO in GitHub
- <B>AMARETTO in GitHub</B>: <https://github.com/gevaertlab/AMARETTO><br>
- <B>Community-AMARETTO in GitHub</B>: <https://github.com/broadinstitute/CommunityAMARETTO><br>

#### &#42;AMARETTO in GenePattern
- <B>AMARETTO in GenePattern</B>:<br>
<https://cloud.genepattern.org/gp/pages/index.jsf?lsid=urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00378><br>
- <B>Community-AMARETTO in GenePattern</B>:<br>
<https://cloud.genepattern.org/gp/pages/index.jsf?lsid=urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00381><br>

#### &#42;AMARETTO in GenomeSpace
The AMARETTO and Community-AMARETTO modules in GenePattern are also available within GenomeSpace: <http://www.genomespace.org/>

#### &#42;AMARETTO in GenePattern Notebook
This <B>&#42;AMARETTO in GenePattern Notebook</B> provides users with a complete analysis pipeline that enables running AMARETTO on one or multiple data cohorts and connecting them using Community-AMARETTO. Each AMARETTO and Community-AMARETTO analysis generates a detailed report of genome-wide networks inferred from one cohort and/or shared/distinct across multiple cohorts. These reports include queryable tables and visualizations (heatmaps and network graphs) of shared/distinct cell circuits and their drivers, as well as their functional and phenotypic characterizations.

#### &#42;AMARETTO example reports
<B>Studying hepatitis C & B virus-induced hepatocellular carcinoma using AMARETTO & Community-AMARETTO:</B><br>
- An example report that learns regulatory networks from multi-omics data for hepatocellular carcinoma based on integrating genetic, epigenetic and functional genomics data from TCGA: <a href = "http://portals.broadinstitute.org/pochetlab/example_reports/AMARETTO_results/LIHC_Report_TfUtils/AMARETTOhtmls/index.html">AMARETTO Report</a><br>
- An example report that integrates regulatory networks derived from >6 liver data sources (multi-omics hepatocellular carcinoma patient data from TCGA, ~25 liver cell line models from CCLE, time course hepatitis C virus infection data in Huh7 models, time course hepatitis B virus infection data in HepG2 models, single-cell hepatitis C virus infection data in Huh7 models, single-cell hepatitis B virus infection data in HepG2 models, further augmented with previously published prognostic network models that were derived from hepatocellular carcinoma patient data): <a href = "http://portals.broadinstitute.org/pochetlab/example_reports/Community-AMARETTO_results/cAMARETTO_all6_nonfiltered_SignaturesLiverHoshida/index.html">Community-AMARETTO Report</a><br>
- An example of ongoing work on developing gene-level ontology network representations from AMARETTO modules & Community-AMARETTO communities: <a href = "https://monabiyan.shinyapps.io/app_1/">Shiny App</a>

<B>Multi-omics & imaging data fusion for glioblastoma multiforme using AMARETTO:</B><br>
- An example report that integrates imaging data into the multi-omics regulatory networks for glioblastoma multiforme based on multi-omics and non-invasive imaging data from TCGA/TCIA (that we will later connect with networks learned from integrating RNA-Seq refined for anatomic structures and stem cells with histopathology imaging data from IvyGAP and that we will subsequently further refine based on single-cell RNA-Seq studies): <a href = "http://portals.broadinstitute.org/pochetlab/example_reports/AMARETTO_results/GBM_Report/AMARETTOhtmls/index.html">AMARETTO Report</a>

# References

1. Multiscale and multimodal inference of regulatory networks using &#42;AMARETTO. <i>In preparation for submission.</i>

2. Champion M., Brennan K., Croonenborghs T., Gentles A. J., Pochet N., Gevaert O. (2018). Module Analysis Captures Pancancer Genetically and Epigenetically Deregulated Cancer Driver Genes for Smoking and Antiviral Response. <i>EBioMedicine</i>, 27, 156-166. PMID:29331675 PMCID:PMC5828545

3. Gevaert O., Villalobos V., Sikic B. I., Plevritis S. K. (2014). Identification of ovarian cancer driver genes by using module network integration of multi-omics data. <i>Interface Focus</i>, 3(4), 20130013. PMID:24511378 PMCID:PMC3915833

4. Gevaert O., Tibshirani R., Plevritis S. K. (2015). Pancancer analysis of DNA methylation-driven genes using MethylMix. <i>Genome Biology</i>, 16(1), 17. PMID:25631659 PMCID:PMC4365533

5. Gevaert O. (2015). MethylMix: an R package for identifying DNA methylation-driven genes. <i>Bioinformatics</i>, 31(11), 1839-41. PMID:25609794 PMCID:PMC4443673

6. Stubbs B. J., Gopaulakrishnan S., Glass K., Pochet N., Everaert C., Raby B., Carey V. (2019). TFutils: Data structures for transcription factor bioinformatics. <i>F1000Research</i>, 8:152. (<https://doi.org/10.12688/f1000research.17976.1>)

7. Reich M., Liefeld T., Ocana M., Jang D., Bistline J., Robinson J., Carr P., Hill B., McLaughlin J., Pochet N., Borges-Rivera D., Tabor T., Thorvaldsdottir H., Regev A., Mesirov J. P. (2013). GenomeSpace: an environment for frictionless bioinformatics. <i>F1000Posters</i>, 4:804 (<https://f1000research.com/posters/1093972>)

8. Qu K., Garamszegi S., Wu F., Thorvaldsdottir H., Liefeld T., Ocana M., Borges-Rivera D., Pochet N., Robinson J. T., Demchak B., Hull T., Ben-Artzi G., Blankenberg D., Barber G. P., Lee B. T., Kuhn R. M., Nekrutenko A., Segal E., Ideker T., Reich M., Regev A., Chang H. Y., Mesirov J. P. (2016). Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace. <i>Nature Methods</i>, 13(3), 245-247. PMID:26780094 PMCID:PMC4767623

9. Cedoz P. L., Prunello M., Brennan K., Gevaert O. (2018). MethylMix 2.0: an R package for identifying DNA methylation genes. <i>Bioinformatics</i>, 34(17), 3044-3046. PMID:29668835 PMCID:PMC6129298 (<https://doi.org/10.1093/bioinformatics/bty156>)

# Questions?

For any questions about the <B>&#42;AMARETTO in R via GitHub and Bioconductor Notebook</B>, please contact <B>Nathalie Pochet</B> (<npochet@broadinstitute.org>) and <B>Olivier Gevaert</B> (<olivier.gevaert@stanford.edu>).