The ExpressionSet class from the Biobase package combines several sources of information into one standardized structure. It is a convenient way to represent genomic data because a single object can hold the expression data, the ‘meta-data’ describing the samples in the experiment, annotations for the genes, and a flexible description of the experiment itself. Here we explore the parts of an ExpressionSet.
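
To make the structure concrete before working with real data, here is a minimal sketch that builds a tiny ExpressionSet by hand. All values and the names `expr_mat`, `sample_info`, `gene_info`, and `toy` are made up for illustration and are not part of the GEO data set used below.

```r
library(Biobase)

# Toy expression matrix: 4 features (rows) x 3 samples (columns); values are made up
expr_mat <- matrix(c(5.1, 7.3, 2.2, 9.8,
                     4.9, 7.0, 2.5, 9.1,
                     8.2, 3.1, 2.4, 9.5),
                   nrow = 4,
                   dimnames = list(paste0("probe", 1:4), paste0("sample", 1:3)))

# Sample metadata: one row per sample, row names matching the matrix column names
sample_info <- data.frame(tissue = c("marrow", "marrow", "blood"),
                          row.names = colnames(expr_mat))

# Feature annotation: one row per feature, row names matching the matrix row names
gene_info <- data.frame(symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
                        row.names = rownames(expr_mat))

# Combine the three pieces into one object
toy <- ExpressionSet(assayData   = expr_mat,
                     phenoData   = AnnotatedDataFrame(sample_info),
                     featureData = AnnotatedDataFrame(gene_info))
toy
```

The row names of the two data frames tie the metadata to the matrix, which is why sample and feature names must be unique.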

## Load packages

In [None]:
library(Biobase)
library(GEOquery)
library(gplots)

## Download data from the GEO public gene expression data repository
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1577

In [None]:
eset <- getGEO('GSE1577')[[1]]
class(eset)
show(eset)

# Dimensions of the ExpressionSet
dim(eset)
# Number of features (genes)
nrow(eset) 
# Number of samples
ncol(eset) 

# "names" of samples--must be unique
sampleNames(eset)[1:8]
# "names" of features--must be unique
featureNames(eset)[1:8]

## featureData - gene information

In [None]:
class(fData(eset))
dim(fData(eset))
all.equal(featureNames(eset),rownames(fData(eset)))
head(fData(eset)[,c(10:12)])
colnames(fData(eset))

## phenoData - sample information

In [None]:
class(pData(eset))
dim(pData(eset))
all.equal(sampleNames(eset),rownames(pData(eset)))
head(pData(eset))
colnames(pData(eset))

## assayData - expression information

In [None]:
head(exprs(eset))
# OR
head(assayDataElement(eset,'exprs')) 
# OR
head(assayData(eset)[['exprs']])
 
class(exprs(eset))
summary(exprs(eset))
all.equal(colnames(exprs(eset)),sampleNames(eset))
all.equal(rownames(exprs(eset)),featureNames(eset))
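
The three accessors used above are expected to return the same matrix. A small sketch on a hand-made object (the name `toy` and its values are hypothetical) lets you check this without downloading anything:

```r
library(Biobase)

# Toy ExpressionSet with made-up values (2 features x 2 samples)
toy <- ExpressionSet(assayData = matrix(c(1, 2, 3, 4), nrow = 2,
                                        dimnames = list(c("f1", "f2"), c("s1", "s2"))))

# Retrieve the expression matrix in three different ways
m1 <- exprs(toy)
m2 <- assayDataElement(toy, "exprs")
m3 <- assayData(toy)[["exprs"]]

# Compare the results
identical(m1, m2)
identical(m1, m3)

# assayData can hold more than one matrix; list the element names it contains
assayDataElementNames(toy)
```

`exprs()` is the usual shorthand; the more general accessors matter when an object stores additional matrices next to `exprs`.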

## Subsetting ExpressionSets
Subsetting an ExpressionSet works much like subsetting a data.frame or matrix: rows represent features (genes) and columns represent samples.

In [None]:
eset[1:10,]
eset[,1:10]

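# Subset our ExpressionSet to include only the "Bone marrow sample" samples
# factor() guards against source_name_ch1 being stored as character, where levels() would return NULL
levels(factor(pData(eset)$source_name_ch1))
eset[,pData(eset)$source_name_ch1=="Bone marrow sample"]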
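
Subsetting by a sample annotation keeps the expression matrix and the sample metadata in sync. The sketch below illustrates this on a toy object; the `tissue` column and all values are assumptions made for the example, not columns of the GEO data.

```r
library(Biobase)

# Toy data: 4 features x 3 samples, with a made-up 'tissue' annotation per sample
expr_mat <- matrix(as.numeric(1:12), nrow = 4,
                   dimnames = list(paste0("probe", 1:4), paste0("sample", 1:3)))
sample_info <- data.frame(tissue = c("marrow", "marrow", "blood"),
                          row.names = colnames(expr_mat))
toy <- ExpressionSet(assayData = expr_mat,
                     phenoData = AnnotatedDataFrame(sample_info))

# which() turns the logical test into column indices and drops any NA matches
keep <- which(pData(toy)$tissue == "marrow")
marrow <- toy[, keep]

# The subset has all features but only the selected samples
dim(marrow)

# pData and exprs were subset together
rownames(pData(marrow))
colnames(exprs(marrow))
```

Wrapping the comparison in `which()` is a defensive choice: if the annotation column contains missing values, a bare logical index would carry `NA` into the subset.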

## Make a heatmap with the top 20 most variable genes

In [None]:
# Prepare data for heatmap
summary(exprs(eset))
hist(exprs(eset))
# Put the expression values in log2 space
summary(log2(exprs(eset)))
hist(log2(exprs(eset)))
# Replace the expression values in our ExpressionSet with the log2-transformed values
exprs(eset) <- log2(exprs(eset))
hist(exprs(eset))
# Compute the expression standard deviation for each feature in the ExpressionSet
stdDev <- apply(exprs(eset),1,sd)
hist(stdDev)
# Subset the expression data to include the top 20 most variable genes
eset2 <- eset[order(stdDev,decreasing=TRUE)[1:20],]
eset2
# Make the heatmap
heatmap.2(exprs(eset2),trace='none')
# Make a more informative heatmap - change probe ID to gene symbol and add the sample types as a color bar
cols <- pData(eset2)$source_name_ch1
heatmapColColors <- c("blue", "red")[factor(cols)]
heatmap.2(exprs(eset2),
          trace='none',
          labRow=fData(eset2)$`Gene Symbol`,
          labCol=pData(eset2)$source_name_ch1,
          ColSideColors=heatmapColColors,
          margins=c(10,12))
legend(3.5,4, legend=unique(cols),fill=unique(heatmapColColors), xpd=TRUE, box.lwd=NA, cex=0.7) 
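
The feature-selection step can also be tried in isolation. The following is a minimal base R sketch on simulated intensities (not real expression data; `raw`, `logged`, `feature_sd`, `top_idx`, and `top` are names chosen for this example). It also checks that all values are positive before taking logs, since `log2()` yields `NaN` or `-Inf` otherwise.

```r
set.seed(1)

# Simulated positive intensities: 100 features x 6 samples (not real data)
raw <- matrix(rlnorm(600, meanlog = 6, sdlog = 1), nrow = 100,
              dimnames = list(paste0("probe", 1:100), paste0("sample", 1:6)))

# log2 is only defined for positive values; stop early if that does not hold
stopifnot(all(raw > 0, na.rm = TRUE))
logged <- log2(raw)

# Per-feature standard deviation across samples
feature_sd <- apply(logged, 1, sd)

# Row indices of the 20 most variable features, most variable first
top_idx <- order(feature_sd, decreasing = TRUE)[1:20]
top <- logged[top_idx, ]
dim(top)
```

The resulting `top` matrix plays the same role as `exprs(eset2)` above and could be passed to `heatmap.2()` in the same way.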