# Mastermap
------
A Mastermap is an Enrichment Map that combines more than two enrichment analyses.

## Input

1. ***Expression file (Optional)*** - a file with absolute or relative expression values for a set of genes and experiments.
  * Two different options:
    * One expression file with N columns, where each column represents one of the N experiments.
    * N expression files, one per experiment (possibly with control vs. treatment values).
    
1. ***Enrichment Results***
  * Two different options:
    * GSEA results
      * a GMT file
      * N GSEA output directories with enrichment results
    * Gprofiler/David/Bingo results
      * N enrichment results files.
      
1. ***User specified parameters***:
  * p-value threshold (Optional)
  * q-value threshold
  * Minimum number of experiments a pathway needs to appear in to be included in the output
  * ~~NES threshold (Optional, GSEA specific, not sure if it is required)~~

------

## Generating a Mastermap in the absence of an app.

### Collate all the gprofiler enrichment results.

1. Place all the gprofiler enrichment files into one directory.
1. Navigate to that directory and run the following shell script:
<pre>
    #go through each gprofiler results file
    for file in *_gprofiler_results.txt; do

        #build the analysis name from the first two underscore-separated fields of the file name
        NAME1=$(echo "$file" | cut -d'_' -f1)
        NAME2=$(echo "$file" | cut -d'_' -f2)
        NAME=${NAME1}_${NAME2}
        #skip the header line and prepend the analysis name as the first (tab-separated) column
        awk -v name="${NAME}" 'NR > 1 {print name "\t" $0}' "$file" > "${file}_forR"

    done
    #merge all results into one file
    cat *_forR > gprofiler_reports_alltissues.txt
    rm *_forR
</pre>
1. The shell script above processes every gprofiler results file in the directory (any file whose name matches *_gprofiler_results.txt), drops its header line, and prepends the analysis name, taken from the first two underscore-separated fields of the file name, as the first column. The per-file outputs are then concatenated into "gprofiler_reports_alltissues.txt".
  * gprofiler results files contain the following headers:
    1. NAME
    1. Description
    1. p-value
    1. q-value
    1. phenotype
    1. genes
  * The resulting "gprofiler_reports_alltissues.txt" file contains the same columns (without a header line), with one additional column in front of NAME:
    1. Analysis name - taken from the name of the gprofiler results file.
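The same collation can also be sketched directly in R. The helper name `collate_gprofiler` is hypothetical, and the sketch assumes every results file is tab-separated with a single header line, as described above. Unlike the shell script, it keeps the column names from the input files.

```r
# Hypothetical helper (not part of the original workflow): collate all
# *_gprofiler_results.txt files in a directory into one data frame.
collate_gprofiler <- function(dir = ".") {
  files <- list.files(dir, pattern = "_gprofiler_results\\.txt$", full.names = TRUE)
  tables <- lapply(files, function(f) {
    # analysis name = first two underscore-separated fields of the file name
    parts <- strsplit(basename(f), "_", fixed = TRUE)[[1]]
    analysis <- paste(parts[1:2], collapse = "_")
    # header = TRUE consumes the header line of each file
    results <- read.table(f, header = TRUE, sep = "\t", as.is = TRUE,
                          quote = "\"", comment.char = "", check.names = FALSE)
    # prepend the analysis name as the first column
    cbind(Experiment = rep(analysis, nrow(results)), results,
          stringsAsFactors = FALSE)
  })
  # stack the per-file tables (assumes all files share the same columns)
  do.call(rbind, tables)
}
```

Calling `collate_gprofiler(".")` from the directory that holds the results files would return one data frame whose first column is the analysis name.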

## User defined thresholds

pval_thresh <- 0.01
fdr_thresh <- 0.01
min_experiments <- 3

similarity_cutoff <- "0.25"
similarity_metric <- "JACCARD"

## Load file containing the collated gprofiler results

#set working directory
setwd("./One_expression_file_example/Gprofiler/");

# the collated file has no header line (the shell script removed it from every file), so read with header = FALSE
gprofiler_enrichments <- read.table("gprofiler_reports_alltissues.txt", header = FALSE, sep = "\t",as.is = TRUE, quote="\"")

colnames(gprofiler_enrichments) <- c("Experiment","Name","Description","p-value","q-value","phenotype",
                                     "genes");


#filter gprofiler enrichments by thresholds
#only keep rows whose p-value and q-value both pass the user defined thresholds
gprofiler_enrichments_filtered <- gprofiler_enrichments[which(gprofiler_enrichments[,'p-value']<=pval_thresh & 
                                                    gprofiler_enrichments[,'q-value']<=fdr_thresh ),]

#row_names - get the unique set of pathways that is contained in the collated data.  column 2 indicates the pathway name
row_names <- unique(gprofiler_enrichments_filtered[,2])
#column_names - get the unique set of experiments contained in the collated data.  column 1 indicates the experiment type
column_names <- unique(gprofiler_enrichments_filtered[,1])

#create a matrix which will store the tissue profiles for all genesets in the thresholded set
pathways2experiments_significant <- matrix(nrow=length(row_names), ncol=length(column_names),dimnames=list(row_names, column_names))
pathways2experiments_all <- matrix(nrow=length(row_names), ncol=length(column_names),dimnames=list(row_names, column_names))

for (i in seq_along(row_names)){
  for (j in seq_along(column_names)){
    #only add the -log(p-value) if this pathway is significant for this experiment
    sig_idx <- which(gprofiler_enrichments_filtered[,1] == column_names[j] & 
                     gprofiler_enrichments_filtered[,2] == row_names[i])
    if(length(sig_idx) > 0 ){
        #use the first match in case a pathway is listed more than once for an experiment
        pathways2experiments_significant[i,j] <- -log(gprofiler_enrichments_filtered[sig_idx[1], "p-value"])
    }
    #only add the -log(p-value) if it exists in the enrichments (irrespective of significance)
    all_idx <- which(gprofiler_enrichments[,1] == column_names[j] & 
                     gprofiler_enrichments[,2] == row_names[i])
    if(length(all_idx) > 0 ){
        pathways2experiments_all[i,j] <- -log(gprofiler_enrichments[all_idx[1], "p-value"])
    }
  }
}

# only include the pathways that are significant in at least min_experiments experiments
# compute the row selection once, before either matrix is subset, so that both matrices keep the same pathways
keep <- which(apply(pathways2experiments_significant,1, function(x){length(which(x!=0))}) >= min_experiments)
# drop=FALSE keeps the result a matrix even if only one pathway passes
pathways2experiments_significant <- pathways2experiments_significant[keep, , drop=FALSE]
pathways2experiments_all <- pathways2experiments_all[keep, , drop=FALSE]

#output the pathways2experiments_all
#
# pathways2experiments_significant is a matrix of pathways to experiments where each value is the -log(p-value) of a significant enrichment.
# It can be used to calculate which genesets pass the minimum experiment threshold but should not be used 
# as an expression file for vista clara plugin as it is missing values for pathways and experiments that were not significant.
# pathways2experiments_all keeps the values irrespective of significance, so it is the matrix written out here.
write.table(pathways2experiments_all, file="gprofiler_pathways2experiments_all.txt", sep="\t", row.names=TRUE, col.names=TRUE,quote=FALSE)


# create a fake gprofiler summary enrichment file.

#create a collapsed enrichment file
# p-value and q-value are the minimum across experiments
collapsedenr_column_names <- c("Name","Description","p-value","q-value","phenotype",
                                     "genes");
#limit to a subset
row_names_subset <- rownames(pathways2experiments_significant)[
    which(apply(pathways2experiments_significant,1, function(x){length(which(x!=0))}) >= min_experiments)]

collapsed_enrichments<- matrix(nrow=length(row_names_subset), ncol=length(collapsedenr_column_names),
                               dimnames=list(row_names_subset, collapsedenr_column_names))
#go through the genesets
for (i in seq_along(row_names_subset)){
  #get all the rows for this geneset from the filtered set
  indices <- which(gprofiler_enrichments_filtered[,2] == row_names_subset[i])
  subset <- gprofiler_enrichments_filtered[indices,];
  #name and description are the same in every experiment, so take them from the first row
  collapsed_enrichments[i,1] <- row_names_subset[i]
  collapsed_enrichments[i,2] <- subset[1,3]
  #get the minimum pvalue, qvalue across experiments
  collapsed_enrichments[i,3] <- min(subset[,4])
  collapsed_enrichments[i,4] <- min(subset[,5])
  #set the phenotype to 1 for every geneset
  collapsed_enrichments[i,5] <- 1
  #concatenate the gene lists from all experiments
  collapsed_enrichments[i,6] <- paste(subset[,7], collapse=",")

}
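
The pathway-by-experiment matrix and the minimum-experiment filter can be illustrated on a small made-up table. This is only a sketch using `tapply` in place of the nested loops. The `toy_*` names and all values are invented for illustration and do not come from the example data.

```r
# Toy collated table: three pathways across three experiments (values are made up)
toy <- data.frame(
  Experiment = c("liver", "heart", "lung", "liver", "heart", "liver"),
  Name       = c("GO:A",  "GO:A",  "GO:A", "GO:B",  "GO:B",  "GO:C"),
  p.value    = c(0.001,   0.002,   0.005,  0.001,   0.003,   0.004),
  stringsAsFactors = FALSE
)
toy_min_experiments <- 2

# tapply builds the pathway-by-experiment matrix of -log(p-value) in one call;
# pathway/experiment combinations that never occur are left as NA
toy_matrix <- tapply(-log(toy$p.value), list(toy$Name, toy$Experiment), max)

# count the experiments in which each pathway has a value
n_experiments <- rowSums(!is.na(toy_matrix))

# keep the pathways seen in at least toy_min_experiments experiments;
# drop = FALSE keeps the result a matrix even if only one row survives
toy_kept <- toy_matrix[n_experiments >= toy_min_experiments, , drop = FALSE]
print(toy_kept)
```

Here GO:C appears in a single experiment, so it is dropped, while GO:A and GO:B are kept.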


dim(collapsed_enrichments)

pathways2experiments_all[is.na(pathways2experiments_all)] <- 0
rownames(pathways2experiments_all) <- trimws(rownames(pathways2experiments_all))
output_pathways2experiments_all <- cbind(rownames(pathways2experiments_all), pathways2experiments_all)

write.table(output_pathways2experiments_all, file="gprofiler_pathways2experiments_all.txt", sep="\t", 
            row.names=FALSE, col.names=TRUE,quote=FALSE)

pathways2experiments_significant[is.na(pathways2experiments_significant)] <- 0
rownames(pathways2experiments_significant) <- trimws(rownames(pathways2experiments_significant))
output_pathways2experiments_significant <- cbind(rownames(pathways2experiments_significant), pathways2experiments_significant)

write.table(output_pathways2experiments_significant, file="gprofiler_pathways2experiments_significant.txt", sep="\t", 
            row.names=FALSE, col.names=TRUE,quote=FALSE)


enrichment_results_file_name <- paste("gprofiler_mastermap_enrichments_fdr", fdr_thresh, "_minexp", min_experiments,".txt", sep="")
write.table( collapsed_enrichments, file=enrichment_results_file_name, sep="\t", row.names=FALSE, col.names=TRUE,quote=FALSE)

## Initialize Cytoscape

library(RJSONIO)
library(httr)

port.number=1234
base.url = paste("http://localhost:",toString(port.number),"/v1", sep="")

print(base.url)

version.url = paste(base.url,"version", sep="/")
cytoscape.version = GET(version.url)
cy.version = fromJSON(rawToChar(cytoscape.version$content))
print(cy.version)

[1] "http://localhost:1234/v1"
      apiVersion cytoscapeVersion 
            "v1"          "3.4.0" 


enrichmentmap.url <- paste(base.url, "commands", "enrichmentmap", "build", sep="/")
#mac file paths
path_to_file="/Users/risserlin/Dropbox (Bader Lab)/Ruth Isserlin's files/Enrichment_Analyses/Mastermap/notebooks/One_expression_file_example/Gprofiler/"
exp_file="/Users/risserlin/Dropbox (Bader Lab)/Ruth Isserlin's files/Enrichment_Analyses/Mastermap/notebooks/One_expression_file_example/Human_Proteome_Map_spectral_count_gene_tissue.txt"
#windows file paths
#exp_file="C:/Users/zaphod/Ruth_dropbox/Dropbox (Bader Lab)/Ruth Isserlin's files/Enrichment_Analyses/Mastermap/notebooks/One_expression_file_example/Human_Proteome_Map_spectral_count_gene_tissue.gct"
#path_to_file="C:/Users/zaphod/Ruth_dropbox/Dropbox (Bader Lab)/Ruth Isserlin's files/Enrichment_Analyses/Mastermap/notebooks/One_expression_file_example/Gprofiler/"

enr_file = paste(path_to_file,enrichment_results_file_name,sep="")

em_params <- list(analysisType = "generic",enrichmentsDataset1 = enr_file,pvalue="1.0",qvalue="0.05",
                  expressionDataset1 = exp_file, 
                  similaritycutoff=similarity_cutoff,coeffecients=similarity_metric)

response <- GET(url=enrichmentmap.url, query=em_params)
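
The request above does not check whether Cytoscape accepted the command. A minimal sketch of such a check follows. The helper name `check_cyrest_response` is hypothetical, and the sketch assumes that a failed request is reported through the HTTP status code, which may not cover every kind of command error.

```r
library(httr)

# Hypothetical helper: stop with a readable message when a CyREST request
# comes back with an HTTP error status (400 or above).
check_cyrest_response <- function(response) {
  if (http_error(response)) {
    stop("EnrichmentMap build request failed with HTTP status ",
         status_code(response))
  }
  # return the response unchanged so the call can be chained
  invisible(response)
}
```

It could be called as `check_cyrest_response(response)` right after the `GET` request.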