# 2.2. (NoRep) TMM normalization of read counts

Normalize read counts with the TMM method (edgeR).

Consider only the samples from 25min to 100min (first cell cycle).

## Input

* `data-create_networks/yeast_Kelliher_2016/yeast_WT_counts.txt`: raw counts generated by featureCounts from the Kelliher 2016 dataset.

## Output

* `data-create_networks/yeast_Kelliher_2016/yeast_WT_logCPM_noRep.txt`: TMM-normalized counts in log2 CPM (counts per million).
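
To make the output unit concrete, here is a minimal base-R sketch of how a log2 CPM value can be derived from raw counts. It is an illustration, not the notebook's code: the toy matrix, the `toy_*` names, and the normalization factors of 1 are all assumptions (edgeR estimates the factors with TMM), and the prior-count scaling follows what `cpm(..., log=TRUE)` is documented to do rather than reproducing its implementation.

```r
# Toy count matrix (made-up numbers): 3 genes x 2 samples,
# where s2 was sequenced exactly twice as deeply as s1
toy_counts = matrix( c( 10, 200, 30,     # column s1
                        20, 400, 60 ),   # column s2
                     nrow = 3,
                     dimnames = list( c('geneA', 'geneB', 'geneC'), c('s1', 's2') ) )

lib_size = colSums( toy_counts )
norm_factors = c( s1 = 1, s2 = 1 )   # assumed here; estimated by TMM in edgeR
eff_lib_size = lib_size * norm_factors

# plain CPM: counts divided by the effective library size, times one million
toy_cpm = t( t(toy_counts) / eff_lib_size ) * 1e6

# log2 CPM with a prior count scaled to each library size,
# so that zero counts do not produce -Inf
prior_count = 2
scaled_prior = prior_count * eff_lib_size / mean( eff_lib_size )
toy_logCPM = log2( t( ( t(toy_counts) + scaled_prior ) /
                      ( eff_lib_size + 2 * scaled_prior ) ) * 1e6 )

print( round( toy_logCPM, 3 ) )
```

Because the prior count is scaled with the library size, the two toy samples end up with identical log CPM values even though their raw counts differ by a factor of two.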

In [1]:
library("edgeR")

Loading required package: limma



In [2]:
count_file = '../../data-create_networks/yeast_Kelliher_2016/yeast_WT_counts.txt'
dea_quality_file = '../../data-create_networks/yeast_Kelliher_2016/dea_quality.pdf'
results_folder = '../../data-create_networks/yeast_Kelliher_2016/comparisons/'
logCPM_file = '../../data-create_networks/yeast_Kelliher_2016/yeast_WT_logCPM_noRep.txt'

## Import data

In [3]:
# import data
counts = read.table( count_file, skip = 1, header = TRUE, row.names = 1 )
counts = counts[ , 11:26 ] # keep the 16 sample columns (25min to 100min), drop the leading columns

# rename header
colnames(counts) = paste((5:20)*5, 'min', sep='')

# experiment design
N_samples = dim(counts)[2]
group = factor( 1:N_samples, levels=1:N_samples )
design = model.matrix( ~0+group )
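
The renaming step relies on the label vector having exactly one entry per kept column. A small stand-alone check of the labels produced by the `paste` expression above (the variable name `time_labels` is introduced here only for illustration):

```r
# labels generated by the same expression used to rename the columns
time_labels = paste( (5:20)*5, 'min', sep='' )

length( time_labels )     # number of labels, one per kept column
head( time_labels, 3 )    # first time points
tail( time_labels, 1 )    # last time point
```

The vector holds 16 labels running from `25min` to `100min` in 5-minute steps, which matches the 16 columns selected with `11:26`.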

## Normalize dataset

In [4]:
initCds = DGEList( counts, group = group )

# filter lowly expressed genes
keepGenes = filterByExpr( initCds )
initCds = initCds[ keepGenes, , keep.lib.sizes=FALSE ]

cat( "Number of genes before filtering:", nrow( counts ), "\n" )
cat( "Number of genes after filtering:", nrow( initCds$counts ), "\n" )

# compute TMM normalization factors
initCds = calcNormFactors( initCds )

Number of genes before filtering: 7126 
Number of genes after filtering: 6497 
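
For intuition about what `calcNormFactors` estimates, the following is a deliberately simplified, base-R sketch of a TMM-style factor between one sample and a reference. It is not edgeR's implementation: the function name `tmm_sketch`, the toy vectors, and the quantile-based trimming are assumptions made for illustration, and the sketch omits the precision weights, the automatic choice of the reference sample, and the final rescaling of the factors that edgeR applies.

```r
# Simplified TMM-style scaling factor of `obs` relative to `ref` (unweighted sketch)
tmm_sketch = function( obs, ref, trim_M = 0.3, trim_A = 0.05 ) {
    N_obs = sum( obs )
    N_ref = sum( ref )
    keep = obs > 0 & ref > 0                      # log ratios need positive counts
    p_obs = obs[keep] / N_obs
    p_ref = ref[keep] / N_ref
    M = log2( p_obs / p_ref )                     # log fold change per gene
    A = 0.5 * log2( p_obs * p_ref )               # average log abundance per gene
    # keep genes in the central range of both M and A
    in_M = M >= quantile( M, trim_M ) & M <= quantile( M, 1 - trim_M )
    in_A = A >= quantile( A, trim_A ) & A <= quantile( A, 1 - trim_A )
    2 ^ mean( M[ in_M & in_A ] )
}

# toy data (made-up): 20 genes, one of which is strongly induced in `obs`
ref = seq( 50, 1000, by = 50 )
obs = ref
obs[10] = ref[10] * 20

factor_same = tmm_sketch( ref * 2, ref )   # same composition, deeper sequencing
factor_shift = tmm_sketch( obs, ref )      # composition changed by one gene
print( c( factor_same, factor_shift ) )
```

In this toy case a sample that differs only in sequencing depth gets a factor of 1, while the sample with a single induced gene gets a factor below 1. Multiplying the library size by that factor shrinks the effective library size, so the unchanged genes are not reported as down-regulated.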


In [5]:
### exports
write.table( cpm(initCds, prior.count=2, log=TRUE),
             file = logCPM_file,
             sep = "\t", quote=FALSE, row.names=TRUE, col.names=NA )