# 2018-11-06 Gene ontology analysis
Here, I want to try gene ontology analysis using the several `R` packages that are available. The `GOenrichmentAnalysis` function from `WGCNA` seems to be outdated: calling it produces a warning that recommends another function, but the website it points to does not exist. I think the best approach is to manually create the objects containing the lists of genes that we want to analyse, and to perform the analysis with up-to-date tools.

## topGO tutorial
Here I want to explore the `topGO` package to perform gene ontology enrichment analysis, following the tutorial on the package's website.

In [None]:
library(topGO)
library(ALL)
data(ALL)
data(geneList)

In [None]:
affyLib <- paste(annotation(ALL), "db", sep = ".")
library(package = affyLib, character.only = TRUE)

In [None]:
sum(topDiffGenes(geneList))
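
`topDiffGenes` ships with the `geneList` example data. For our own data we would need an equivalent pair of objects: a named numeric vector of gene scores and a function that returns `TRUE` for the genes of interest. A minimal sketch, with made-up p-values, placeholder gene names and a hypothetical 0.01 cutoff:

```r
# Made-up p-values for five hypothetical genes (names are placeholders)
my.scores <- c(geneA = 0.001, geneB = 0.2, geneC = 0.004,
               geneD = 0.8, geneE = 0.03)

# Hypothetical selection function: a gene counts as interesting if p < 0.01
mySelector <- function(scores) scores < 0.01

# Number of selected genes, analogous to sum(topDiffGenes(geneList))
n.selected <- sum(mySelector(my.scores))

# Names of the selected genes
selected.genes <- names(my.scores)[mySelector(my.scores)]
```

Objects of this shape are what the `allGenes` and `geneSel` arguments receive in the `topGOdata` constructor call in this notebook.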

In [None]:
sampleGOdata <- new("topGOdata",
                    description = "Simple session",
                    ontology = "BP",
                    allGenes = geneList,
                    geneSel = topDiffGenes,
                    nodeSize = 10,
                    annot = annFUN.db,
                    affyLib = affyLib)

In [None]:
resultFisher <- runTest(sampleGOdata, algorithm = "classic", statistic = "fisher")

In [None]:
resultFisher
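
For intuition about what a classic Fisher enrichment test computes for a single GO term, the sketch below runs a one-sided Fisher's exact test on a 2x2 table in base `R`. The counts are made up for illustration (they are not taken from the `ALL` data), and whether this mirrors exactly what `topGO` does internally is an assumption that should be checked against the package documentation.

```r
# Hypothetical counts, for illustration only
n.genes       <- 1000  # genes in the universe
n.significant <- 50    # genes flagged as interesting
n.annotated   <- 40    # genes annotated to the GO term
n.overlap     <- 8     # genes that are both annotated and significant

# 2x2 contingency table (filled column by column):
# rows = significant yes/no, columns = annotated yes/no
cont.table <- matrix(
  c(n.overlap,
    n.annotated - n.overlap,
    n.significant - n.overlap,
    n.genes - n.significant - n.annotated + n.overlap),
  nrow = 2,
  dimnames = list(significant = c("yes", "no"),
                  annotated   = c("yes", "no"))
)

# One-sided test: is the term over-represented among significant genes?
p.fisher <- fisher.test(cont.table, alternative = "greater")$p.value

# Same quantity from the hypergeometric distribution:
# P(overlap >= n.overlap) when drawing n.significant genes at random
p.hyper <- phyper(n.overlap - 1, n.annotated, n.genes - n.annotated,
                  n.significant, lower.tail = FALSE)
```

The two p-values agree because the one-sided Fisher test on a 2x2 table is the upper tail of a hypergeometric distribution.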

In [None]:
resultKS <- runTest(sampleGOdata, algorithm = "classic", statistic = "ks")
resultKS.elim <- runTest(sampleGOdata, algorithm = "elim", statistic = "ks")

In [None]:
allRes <- GenTable(sampleGOdata, 
                   classicFisher = resultFisher,
                   classicKS = resultKS, elimKS = resultKS.elim,
                   orderBy = "elimKS", ranksOf = "classicFisher", topNodes = 10)

In [None]:
allRes

In [None]:
?termStat

In [None]:
pValue.classic <- score(resultKS)
pValue.elim <- score(resultKS.elim)[names(pValue.classic)]
gstat <- termStat(sampleGOdata, names(pValue.classic))
gSize <- gstat$Annotated / max(gstat$Annotated) * 4

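# colMap is defined here following the topGO vignette source:
# https://github.com/Bioconductor-mirror/topGO/blob/master/vignettes/topGO.Rnw
# It maps each value of x to a heat colour; equal values share a colour.
colMap <- function(x) {
  .col <- rep(rev(heat.colors(length(unique(x)))), times = table(x))
  return(.col[match(seq_along(x), order(x))])
}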

gCol <- colMap(gstat$Significant)
plot(pValue.classic, pValue.elim, xlab = "p-value classic", ylab = "p-value elim",
     pch = 19, cex = gSize, col = gCol)

In [None]:
sel.go <- names(pValue.classic)[pValue.elim < pValue.classic]
cbind(termStat(sampleGOdata, sel.go),
      elim = pValue.elim[sel.go],
      classic = pValue.classic[sel.go])

In [None]:
showSigOfNodes(sampleGOdata, score(resultKS.elim), firstSigNodes = 5, useInfo = 'all')

## Scratch code

In [None]:
# load the biomaRt library
library(biomaRt)

In [None]:
# load the data corresponding to human genome in the ENSEMBL Mart
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

Let's now write a piece of code that selects a module of interest, extracts the gene names, and retrieves the corresponding Gene Ontology IDs.

In [None]:
# labels2colors comes from WGCNA
library(WGCNA)

# Extract the names of the genes assigned to a given module colour in a given
# sample. Relies on the global objects `nets` and `datExpr`, which are not
# defined in this section of the notebook.
GenesOfColor <- function(sample.name, module.color) {
    geneColors <- labels2colors(nets[[sample.name]]$colors)
    module.idx <- which(geneColors == module.color)
    # index the column names directly so a single-gene module still works
    colnames(datExpr[[sample.name]])[module.idx]
}

In [None]:
# get the names of the genes of interest
darkGreenGenes <- GenesOfColor("P2449", "darkgreen")

In [None]:
# now let's query the mart for the GO IDs annotated to the genes that we are
# interested in
go.ids <- getBM(attributes = c("ensembl_gene_id_version", "go_id"),
                filters = "ensembl_gene_id_version",
                values = darkGreenGenes,
                mart = mart)$go_id

In [None]:
library(GO.db)

In [None]:
goterms <- Term(GOTERM)

In [None]:
goterms[go.ids]
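
To feed a module into an enrichment tool, the GO IDs need to stay attached to their genes rather than being collapsed into a single vector. The sketch below assumes the full `getBM` result is kept as a two-column data frame. The toy data frame `bm`, the gene IDs, and the universe `all.genes` are all made up for illustration, and the assumption that unannotated genes show up with an empty `go_id` should be checked on the real output.

```r
# Toy stand-in for a getBM() result (IDs are placeholders, not real output)
bm <- data.frame(
  ensembl_gene_id_version = c("ENSG_A.1", "ENSG_A.1", "ENSG_B.2", "ENSG_C.3"),
  go_id = c("GO:0006915", "GO:0008150", "GO:0006915", ""),
  stringsAsFactors = FALSE
)

# Drop rows without a GO annotation (assumed here to appear as empty strings)
bm <- bm[nzchar(bm$go_id), ]

# Named list: one character vector of GO IDs per gene
gene2GO <- split(bm$go_id, bm$ensembl_gene_id_version)

# Hypothetical gene universe and module membership
all.genes    <- c("ENSG_A.1", "ENSG_B.2", "ENSG_C.3", "ENSG_D.4")
module.genes <- c("ENSG_A.1", "ENSG_B.2")

# Two-level factor named by gene ID: 1 = in the module, 0 = not
gene.factor <- factor(as.integer(all.genes %in% module.genes), levels = c(0, 1))
names(gene.factor) <- all.genes
```

As far as I understand the `topGO` documentation, objects of this shape could then be passed as `allGenes = gene.factor`, `annot = annFUN.gene2GO` and `gene2GO = gene2GO` when building a `topGOdata` object, which would replace the `affyLib`-based annotation used in the tutorial.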