Skip to content

Latest commit

 

History

History
68 lines (59 loc) · 3.57 KB

revigo.md

File metadata and controls

68 lines (59 loc) · 3.57 KB

Summarizing GO terms using semantic similarity

REVIGO clusters GO terms based on their semantic similarity, p-values, and relatedness resulting in a hierarchical clustering where less dispensable terms are placed closer to the root. The clustering is visualized in a clustered heatmap showing the p-values for each term.

REVIGO generates multiple plots; the easiest way to obtain them in high quality is to download the R scripts that it offers underneath each of the options (e.g. scatter plot, tree map) and to run those yourself.

Here, I've downloaded the R script after selecting the treemap plot ("Make R script for plotting"). The script is aptly named REVIGO_treemap.r. The plot can be generated as easily as this:

## this will generate a PDF file named "REVIGO_treemap.pdf" in your current
## working directory
source("~/Downloads/REVIGO_treemap.r")  

Since I personally don't like plots being immediately printed to PDF (much more difficult to include them in an Rmarkdown!), I've tweaked the function a bit; it's essentially the original REVIGO script that I downloaded minus the part where the revigo.data object is generated and with a couple more options to tune the resulting heatmap.

REVIGO_treemap <- function(revigo.data, col_palette = "Paired",
                           title = "REVIGO Gene Ontology treemap", ...){
  stuff <- data.frame(revigo.data)
  names(stuff) <- c("term_ID","description","freqInDbPercent","abslog10pvalue",
                    "uniqueness","dispensability","representative")
  stuff$abslog10pvalue <- as.numeric( as.character(stuff$abslog10pvalue) )
  stuff$freqInDbPercent <- as.numeric( as.character(stuff$freqInDbPercent) )
  stuff$uniqueness <- as.numeric( as.character(stuff$uniqueness) )
  stuff$dispensability <- as.numeric( as.character(stuff$dispensability) )
  # check the treemap command documentation for all possible parameters - 
  # there are a lot more
  treemap::treemap(
    stuff,
    index = c("representative","description"),
    vSize = "abslog10pvalue",
    type = "categorical",
    vColor = "representative",
    title = title,
    inflate.labels = FALSE,      
    lowerbound.cex.labels = 0,   
    bg.labels = 255,
    position.legend = "none",
    fontsize.title = 22, fontsize.labels=c(18,12,8),
    palette= col_palette, ...
  )
}

I still need the originally downloaded script to generate the revigo.data object, which I will then pass onto my newly tweaked function. Using the command line (!) tools sed and egrep, I'm going to only keep the lines between the one starting with "revigo.data" and the line starting with "stuff". The output of that will be parsed into a new file, which will only generate the revigo.data object that I'm after (no PDF!).

# the system function allows me to run command line stuff outside of R
# just for legibility purposes, I'll break up the command into individual 
# components, which I'll join back together using paste()
sed_cmd <- "sed -n '/^revigo\\.data.*/,/^stuff.*/p'"
fname <- "~/Downloads/REVIGO_treemap.r"
egrep_cmd <- "egrep '^rev|^c'"
out_fname <- "~/Downloads/REVIGO_myData.r"
system(paste(sed_cmd, fname, "|", egrep_cmd, ">", out_fname))
## upon sourcing; no treemap PDF will be generated, but the revigo.data object
## should appear in your R environment
source("~/Downloads/REVIGO_myData.r")
REVIGO_treemap(revigo.data)

If this is all too tedious, you may want to check out a recent Python implementation of REVIGO's principles: GO-figure.