# Removal of blank features and batch normalisation

Before statistical analysis, intensities in non-blank samples that were below 20 times the feature's mean relative intensity across all blank samples were set to zero, and mass spectral features with a total intensity of zero were removed. Relative intensities were then batch normalised by dividing each mass spectral feature by its root mean square within each batch (plate).

Load the preprocessed feature table exported from MZmine.

In [1]:
ft <- read.table('../PreprocessedData/PreprocessedData/pp_aligned_MS2.csv', sep = ',', check.names = FALSE, header = TRUE, row.names = 1)

Load the sample metadata.

In [4]:
md <- read.table('../../metadata.tsv', sep = '\t', header = TRUE, comment.char = '', check.names = FALSE, stringsAsFactors = FALSE)

In [11]:
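# sample IDs listed in the metadata but missing from the feature table
# (logical negation also works when nothing matches, unlike -which())
md$'#SampleID'[!(md$'#SampleID' %in% colnames(ft))]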

In [12]:
# feature table columns without a matching sample ID in the metadata
colnames(ft)[!(colnames(ft) %in% md$'#SampleID')]
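
The two checks above list IDs that occur in one table but not in the other. A minimal sketch, using made-up IDs that are not from this dataset, of how negative `which()` indexing and logical negation differ when nothing matches:

```r
# Toy IDs (hypothetical): none of the sample IDs occurs in the column names
ids  <- c("s1", "s2", "s3")
cols <- c("a", "b")

# which() returns integer(0) here, and ids[-integer(0)] selects nothing,
# so the result wrongly suggests that no ID is missing
neg_which <- ids[-which(ids %in% cols)]

# logical negation returns every unmatched ID, also in this edge case
logical_neg <- ids[!(ids %in% cols)]

print(neg_which)
print(logical_neg)
```

When at least one ID matches, both forms give the same answer, so the difference only shows up in the no-match case.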

In [13]:
dim(ft)
dim(md)

In [14]:
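# transpose so that samples are rows and features are columns
ft <- t(ft)
# drop the first two rows, which are not samples
# (presumably the m/z and retention time columns of the MZmine export)
ft <- ft[-c(1,2),]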

In [18]:
# drop samples (rows) whose intensities contain missing values
ft <- ft[!is.na(rowSums(ft)), , drop = FALSE]

In [19]:
dim(ft)

In [20]:
# both should return nothing: no row or column sum is NA any more
which(is.na(rowSums(ft)))
which(is.na(colSums(ft)))

Remove blank features.

H20 is the label used for the water blanks (8 water blanks per plate, 5 plates, 40 water samples in total).

In [21]:
table(md$SampleTypePlate)


   H20_1    H20_2    H20_3    H20_4    H20_5   Pool_1   Pool_2   Pool_3 
       8        8        8        8        8        4        3        4 
  Pool_4   Pool_5 Sample_1 Sample_2 Sample_3 Sample_4 Sample_5 
       4        4       75       80       80       80       68 

In [22]:
# number of water blanks present in the feature table
length(grep('H20',rownames(ft)))

In [23]:
# mean intensity of each feature across all water blanks
blankmeans <- colMeans(ft[grep('H20',rownames(ft)),])

In [24]:
# row indices of the water blanks
blankids <- grep('H20',rownames(ft))

In the non-blank samples, set intensities to zero where they are below 20 times the feature's mean relative intensity across all blanks.

In [28]:
for (i in seq_len(ncol(ft))){
    # for feature i, zero every non-blank intensity below 20x the blank mean
    ft[-blankids,i][which(ft[-blankids,i] < 20*blankmeans[i])] <- 0
}
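
The thresholding step can also be written without a loop. The following is a sketch on a small made-up matrix, not the notebook's data; the names `toy`, `is_blank`, `blank_mean` and `filtered` are hypothetical. It compares every non-blank intensity with 20 times the blank mean of its column using `sweep()`:

```r
# Toy feature table: rows = samples, columns = features (hypothetical values)
toy <- rbind(
  H20_a    = c(f1 = 1,  f2 = 0, f3 = 2),
  H20_b    = c(f1 = 3,  f2 = 0, f3 = 2),
  Sample_a = c(f1 = 50, f2 = 5, f3 = 10),
  Sample_b = c(f1 = 10, f2 = 0, f3 = 100)
)

is_blank   <- grepl("H20", rownames(toy))
blank_mean <- colMeans(toy[is_blank, , drop = FALSE])   # f1 = 2, f2 = 0, f3 = 2

# compare each non-blank value with 20x the blank mean of its column
samples <- toy[!is_blank, , drop = FALSE]
below   <- sweep(samples, 2, 20 * blank_mean, FUN = "<")
samples[below] <- 0

# write the thresholded sample rows back; blank rows stay untouched
filtered <- toy
filtered[!is_blank, ] <- samples
print(filtered)
```

With these values the threshold is 40 for `f1` and `f3` and 0 for `f2`, so `Sample_a`/`f3` and `Sample_b`/`f1` are set to zero while everything else is kept.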

In [29]:
# drop features whose intensities sum to zero over all rows (blanks included)
ft <- ft[, colSums(ft) != 0, drop = FALSE]
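
The column filter above sums over all rows, and the blank rows are never set to zero, so a feature that still has signal in the blanks keeps a non-zero sum even if every non-blank sample was zeroed. A sketch on made-up values (all names hypothetical) contrasting this criterion with one restricted to non-blank rows; which of the two is wanted is an analysis decision that this notebook does not state:

```r
# Toy table after thresholding (hypothetical values)
toy <- rbind(
  H20_a    = c(f1 = 1,  f2 = 0, f3 = 1),
  H20_b    = c(f1 = 3,  f2 = 0, f3 = 1),
  Sample_a = c(f1 = 50, f2 = 0, f3 = 0),
  Sample_b = c(f1 = 0,  f2 = 0, f3 = 0)
)
is_blank <- grepl("H20", rownames(toy))

# criterion used above: zero across all rows, blanks included
zero_everywhere <- colSums(toy) == 0

# alternative (an assumption, not the notebook's method): zero across non-blank rows
zero_in_samples <- colSums(toy[!is_blank, , drop = FALSE]) == 0

print(names(which(zero_everywhere)))
print(names(which(zero_in_samples)))
```

Here `f3` has signal only in the blanks: it survives the first criterion but would be dropped by the second.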

In [31]:
# reorder the metadata rows to follow the sample order of the feature table
md <- md[match(rownames(ft),md$'#SampleID'),]
identical(as.character(md$'#SampleID'),as.character(rownames(ft)))

In [32]:
dim(md)
dim(ft)
identical(as.character(md$'#SampleID'),as.character(rownames(ft)))

Scale feature table by dividing columns by the per-batch root mean square. The root mean square for a column is defined as $\sqrt{\sum x^2 / (n-1)}$, where $x$ is a vector of the non-missing values and $n$ is the number of non-missing values.
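
As a quick check of this definition, the sketch below computes the root mean square by hand for a made-up vector and compares it with what `scale()` uses when `center = FALSE` and `scale = TRUE`. The values and variable names are illustrative only:

```r
# Toy intensities (hypothetical)
x <- c(1, 2, 3, 4)

# root mean square as defined above: sqrt(sum(x^2) / (n - 1))
rms <- sqrt(sum(x^2) / (length(x) - 1))

# scale() without centring divides each column by exactly this value
scaled <- scale(matrix(x, ncol = 1), center = FALSE, scale = TRUE)

print(rms)
print(attr(scaled, "scaled:scale"))
print(all.equal(as.numeric(scaled), x / rms))
```

Because the data are not centred first, this divisor is not the standard deviation unless the column mean happens to be zero.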

In [35]:
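# split() below relies on md and ft being in the same sample order;
# compare against the sample ID column, because rownames(md) only hold row indices
identical(rownames(ft), as.character(md$'#SampleID'))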

In [36]:
# split the feature table into one data frame per plate (batch)
batchfts <- split( as.data.frame(ft) , f = md$Plate)

In [37]:
length(batchfts)

In [39]:
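# divide each feature by its root mean square within each plate (no centring)
batchft_FT <- lapply( batchfts, scale, center = FALSE, scale = TRUE)
# stack the per-plate matrices again; rows are now grouped by plate
batchft_FT <- do.call("rbind", batchft_FT)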
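
Two properties of this split, scale and rbind pattern are easy to miss, and the sketch below illustrates them on a made-up table (the names `toy`, `plate`, `scaled` and `restored` are hypothetical). First, the stacked rows come back grouped by plate, not in the original sample order. Second, a feature that is zero throughout a plate has a root mean square of 0 there, so the division yields `NaN`. Whether the second case occurs in this dataset is not shown here.

```r
# Toy feature table with two plates (hypothetical values and labels)
toy <- data.frame(
  f1 = c(1, 2, 3, 4),
  f2 = c(0, 1, 0, 2),
  row.names = c("s1", "s2", "s3", "s4")
)
plate <- c("P2", "P1", "P2", "P1")

# scale each plate separately, then stack the per-plate matrices
per_plate <- split(toy, f = plate)
scaled <- do.call("rbind", lapply(per_plate, scale, center = FALSE, scale = TRUE))

# rows are grouped by plate: s2, s4 (P1) followed by s1, s3 (P2)
print(rownames(scaled))

# restore the original sample order by indexing with the original row names
restored <- scaled[rownames(toy), , drop = FALSE]
print(restored)

# f2 is zero for both P2 samples, so its per-plate RMS is 0 and 0/0 gives NaN
print(is.nan(restored[c("s1", "s3"), "f2"]))
```

If the scaled table is later combined with the metadata, matching on row names rather than on position avoids depending on the row order.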

In [62]:
write.table(batchft_FT,'featuretable_blankfiltered_batchnormalised_withBlanksPools.tsv', sep = '\t', quote = F)