# 2.2. (Mouse) Differential expression analysis

We extracted differentially expressed genes for the three time points (Day 1, Day 3 and Day 30), each compared to the baseline (Day 0), and between consecutive time points (Day 3 vs. Day 1 and Day 30 vs. Day 3).

Differential expression was tested with the edgeR quasi-likelihood F-test (QLF) using robust estimation, and the p-values were adjusted independently for each comparison with the Benjamini-Hochberg method.
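
To illustrate what adjusting each comparison independently means, here is a minimal sketch in base R. The p-values and the variable names (`p_day1vs0`, `p_day3vs0`, `padj_independent`, `padj_pooled`) are hypothetical and are not taken from the dataset; only `p.adjust` with `method = "BH"` matches what the notebook uses.

```r
# Hypothetical raw p-values for two comparisons (illustrative only)
p_day1vs0 = c(0.001, 0.01, 0.04)
p_day3vs0 = c(0.02, 0.03, 0.5)

# independent adjustment: each comparison is its own family of tests
padj_independent = p.adjust( p_day1vs0, method = "BH" )

# pooled adjustment: both comparisons are corrected together
padj_pooled = p.adjust( c(p_day1vs0, p_day3vs0), method = "BH" )[1:3]

print( padj_independent )
print( padj_pooled )
```

With these made-up values the two approaches give different adjusted p-values for the same genes, which is why the adjustment strategy is worth stating explicitly.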

## Input

* `data-create_networks/mouse/mouse_DRG_WT_timecourse_counts.txt`: raw counts generated by featureCounts from the mouse dataset.

## Output

* `data-create_networks/mouse/dea_quality.pdf`: quality control of the differential expression analysis.
* Files within the folder `data-create_networks/mouse/comparisons`: results of the DEA for the comparisons.
* `data-create_networks/mouse/mouse_logCPM.txt`: normalized counts in log CPM.

In [1]:
library("edgeR")

Loading required package: limma



In [2]:
count_file = '../../../data-create_networks/mouse/mouse_DRG_WT_timecourse_counts.txt'
dea_quality_file = '../../../data-create_networks/mouse/dea_quality.pdf'
results_folder = '../../../data-create_networks/mouse/comparisons/'
logCPM_file = '../../../data-create_networks/mouse/mouse_logCPM.txt'

## Import data

In [3]:
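# import data (skip = 1 drops the command-line comment written by featureCounts)
counts = read.table( count_file, skip = 1, header = TRUE, row.names = 1 )
counts = counts[ , 6:ncol(counts) ] # drop the annotation columns (Chr, Start, End, Strand, Length)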

# rename header
cols = gsub( 'Aligned.out.bam', '', colnames(counts) )
cols = gsub( 'bam_trimmed.DRG_WT_', '', cols )
colnames(counts) = paste('Day', cols, sep='')

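# experiment design: assumes two replicates per time point, with columns ordered by time point
N_groups = ncol(counts) / 2 # number of time points, not of replicates
group = factor( rep( 1:N_groups, each=2 ), levels=1:N_groups )
design = model.matrix( ~group ) # intercept = first group (Day 0)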

## Prepare dataset

In [4]:
initCds = DGEList( counts, group = group )

# filter lowly expressed genes
keepGenes = filterByExpr( initCds )
initCds = initCds[ keepGenes, , keep.lib.sizes=FALSE ]

cat( "Number of genes before filtering:", nrow( counts ), "\n" )
cat( "Number of genes after filtering:", nrow( initCds$counts ), "\n" )

# compute model
initCds = calcNormFactors( initCds )
initCds = estimateDisp( initCds, design, robust = TRUE )
fit = glmQLFit( initCds, design, robust = TRUE )
cat( "BCV =", sqrt( initCds$common.dispersion ) )

Number of genes before filtering: 24421 
Number of genes after filtering: 17328 
BCV = 0.08850174

In [5]:
pdf(dea_quality_file)

plotMDS( initCds )
plotBCV( initCds )

qlfTests = list()
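# contrasts on the model coefficients (intercept, then each later time point vs Day 0)
contrasts = list( c(0,1,0,0), c(0,0,1,0), c(0,0,0,1), c(0,-1,1,0), c(0,0,-1,1) )
# named `comparisons` to avoid shadowing the base function names()
comparisons = c('Day1vs0', 'Day3vs0', 'Day30vs0', 'Day3vs1', 'Day30vs3')
for( i in seq_along(contrasts) ){
    name = comparisons[i]
    # test
    qlfTests[[name]] = glmQLFTest( fit, contrast = contrasts[[i]] )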
    # adjusted p-value
    padj = p.adjust( qlfTests[[name]]$table$PValue, method = "BH" )
    qlfTests[[name]]$table$FDR = padj
    # display
    plotQLDisp( qlfTests[[name]], main=paste(name, "glm edgeR QL Dispersion") )
    hist(qlfTests[[name]]$table$PValue, 150,
         main=paste(name, "raw p-value distribution"), xlab="raw p-value")
}

dev.off()
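
The contrast vectors used above can be derived from the design matrix rather than written by hand. The following is a sketch in base R, assuming four groups with two replicates each, as in the design built earlier; the row labels `Day0` to `Day30` and the names `group_means` and `day3vs1` are introduced here for illustration only.

```r
# same design as in the notebook: 4 groups, 2 replicates each
group = factor( rep( 1:4, each = 2 ), levels = 1:4 )
design = model.matrix( ~group )

# one row per group: the group mean expressed in terms of the model coefficients
group_means = unique( design )
rownames(group_means) = c('Day0', 'Day1', 'Day3', 'Day30')

# a comparison between two groups is the difference of their rows
day3vs1 = group_means['Day3', ] - group_means['Day1', ]
print( day3vs1 )
```

The intercept cancels in the difference, so the result is the vector `c(0,-1,1,0)` that the notebook passes to `glmQLFTest` for `Day3vs1`.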

In [6]:
### exports
i = 1
for( name in names(qlfTests) ){
    write.table( qlfTests[[name]]$table,
                 file=paste( results_folder, formatC(i, width=2, flag="0"), "_mouse_",
                             name, ".txt", sep='' ),
                 sep = "\t", quote=FALSE, row.names=TRUE, col.names=NA )
    i = i + 1
}

write.table( cpm(initCds, prior.count=2, log=TRUE),
             file = logCPM_file,
             sep = "\t", quote=FALSE, row.names=TRUE, col.names=NA )