nQuant/featureCV return wrong sized matrix #208

sgibb · 2017-04-28T16:16:37Z

This PR fixes nQuants and featureCV. Both functions return a wrong-sized matrix if the grouping variable is a factor with unused levels, i.e. more levels than groups actually present in the data.
MWE:

library("MSnbase")

m <- new("MSnSet",
         exprs=matrix(1:10, nrow=5, ncol=2),
         featureData=new("AnnotatedDataFrame",
                         data=data.frame(accession=
                                         factor(c("A", "A", "A", "B", "B"),
                                                levels=LETTERS[1:10]))))
exprs(m)
#   1  2
# 1 1  6
# 2 2  7
# 3 3  8
# 4 4  9
# 5 5 10

nQuants(m, fData(m)$accession)
#   1 2
# A 3 3
# B 2 2
# C 0 0
# D 0 0
# E 0 0
# F 0 0
# G 0 0
# H 0 0
# I 0 0
# J 0 0
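For illustration only, here is a base R sketch (no MSnbase needed) of how a result with one row per level can arise. It assumes a counting approach built on split(), which is not necessarily what the old implementation did; `perLevel`, `counts` and `perGroup` are names made up for this example.

```r
# Same factor as in the MWE: 2 groups present, 10 levels declared.
accession <- factor(c("A", "A", "A", "B", "B"), levels = LETTERS[1:10])
x <- matrix(1:10, nrow = 5, ncol = 2)

# split() returns one (possibly empty) index vector per factor level.
perLevel <- split(seq_len(nrow(x)), accession)

# Counting non-NA values per piece therefore gives nlevels() rows,
# with zeros for every unused level.
counts <- t(vapply(perLevel,
                   function(i) colSums(!is.na(x[i, , drop = FALSE])),
                   numeric(2)))

# Dropping unused levels first restricts the result to groups present.
perGroup <- split(seq_len(nrow(x)), droplevels(accession))
```

Here `counts` has 10 rows, while `perGroup` only has the entries A and B.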

This results in an error if you want to apply TOP3 quantitation after subsetting (e.g. filtering), because the subset still carries all factor levels of the original data:

library("MSnbase")
data(msnset)

nQuants(msnset, fData(msnset)$ProteinAccession)
#         iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
# BSA              3          3          3          3
# ECA0172          1          1          1          1
# ECA0435          2          2          2          2
# ...
# ECA4514          6          6          6          6
# ENO              4          4          3          4

msnsetSubset <- msnset[1:5]
nQuants(msnsetSubset, fData(msnsetSubset)$ProteinAccession)
#         iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
# BSA              1          1          1          1
# ECA0172          0          0          0          0
# ECA0435          0          0          0          0
# ...
# ECA4514          0          0          0          0
# ENO              0          0          0          0


msnsetSubsetTop3 <- topN(msnsetSubset,
                         groupBy=fData(msnsetSubset)$ProteinAccession,
                         n=3)
nSubsetPeps <- nQuants(msnsetSubsetTop3,
                       groupBy=fData(msnsetSubset)$ProteinAccession)
msnsetSubsetTop3 <- combineFeatures(msnsetSubsetTop3,
                                    groupBy=fData(msnsetSubset)$ProteinAccession,
                                    fun="sum", na.rm=TRUE)

exprs(msnsetSubsetTop3) <- exprs(msnsetSubsetTop3) * (3/nSubsetPeps)
# Error in exprs(msnsetSubsetTop3) * (3/nSubsetPeps) :
#  non-conformable arrays

This PR removes utils.colSd and utils.applyColumnwiseByGroup, rewrites nQuants and featureCV, adds rowmean and rowsd (both similar to rowsum), and adds unit tests for all of them.
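To give an idea of what "similar to rowsum" can look like, here is a minimal sketch of group-wise column means and standard deviations built on base::rowsum. The names `rowmean_sketch`, `rowsd_sketch` and `groupCounts` are hypothetical, and this is not the code added to MSnbase; the actual rowmean/rowsd may differ in arguments and edge-case handling.

```r
# Hypothetical helper: per-group, per-column number of contributing values.
groupCounts <- function(x, group, na.rm = FALSE) {
    counted <- if (na.rm) !is.na(x) else matrix(TRUE, nrow(x), ncol(x))
    rowsum(counted + 0L, group = group)
}

# Group-wise column means: group sums divided by group counts.
rowmean_sketch <- function(x, group, na.rm = FALSE) {
    x <- as.matrix(x)
    rowsum(x, group = group, na.rm = na.rm) / groupCounts(x, group, na.rm)
}

# Group-wise column standard deviations: centre each row by its group
# mean, sum the squares per group and divide by (n - 1).
rowsd_sketch <- function(x, group, na.rm = FALSE) {
    x <- as.matrix(x)
    n <- groupCounts(x, group, na.rm)
    m <- rowsum(x, group = group, na.rm = na.rm) / n
    centred <- x - m[as.character(group), , drop = FALSE]
    sqrt(rowsum(centred^2, group = group, na.rm = na.rm) / (n - 1))
}

x <- matrix(1:10, nrow = 5, ncol = 2)
group <- factor(c("A", "A", "A", "B", "B"), levels = LETTERS[1:10])
rowmean_sketch(x, group)
rowsd_sketch(x, group)
```

Because rowsum only returns groups that occur in `group`, both results have two rows (A and B) even though the factor has ten levels. In this sketch a group with a single member yields NaN for the standard deviation, since it divides 0 by 0.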

As a side effect, it changes the output of nQuants and featureCV. They now return a matrix with one row per group that is actually present, i.e. nrow(x) == sum(levels(group) %in% group), instead of one row per factor level, i.e. nrow(x) == nlevels(group). I don't think anybody relies on the former output format, and in the current phase of the release cycle it should be safe to change the dimensions of the return values.
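The change in dimensions can be checked with base R alone. The sketch below counts non-NA values with rowsum, which is an assumption about how such a count can be done rather than a copy of the new nQuants; `present` and `padded` are illustrative names. It also shows how a caller who still needs one row per level could pad the new output.

```r
group <- factor(c("A", "A", "A", "B", "B"), levels = LETTERS[1:10])
x <- matrix(1:10, nrow = 5, ncol = 2)

# New-style result: one row per group that occurs in `group`.
present <- rowsum((!is.na(x)) + 0L, group = group)
nrow(present) == sum(levels(group) %in% group)

# Old-style shape, if really needed: one row per level, zeros for unused ones.
padded <- matrix(0L, nrow = nlevels(group), ncol = ncol(present),
                 dimnames = list(levels(group), colnames(present)))
padded[rownames(present), ] <- present
```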

Same example as above with the new implementation:

library("MSnbase")
data(msnset)

nQuants(msnset, fData(msnset)$ProteinAccession)
#         iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
# BSA              3          3          3          3
# ECA0172          1          1          1          1
# ECA0435          2          2          2          2
# ...
# ECA4514          6          6          6          6
# ENO              4          4          3          4

msnsetSubset <- msnset[1:5]
nQuants(msnsetSubset, fData(msnsetSubset)$ProteinAccession)
#         iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
# BSA              1          1          1          1
# ECA1364          1          1          1          1
# ECA1422          1          1          1          1
# ECA3882          1          1          1          1
# ECA4030          1          1          1          1

msnsetSubsetTop3 <- topN(msnsetSubset,
                         groupBy=fData(msnsetSubset)$ProteinAccession,
                         n=3)
nSubsetPeps <- nQuants(msnsetSubsetTop3,
                       groupBy=fData(msnsetSubset)$ProteinAccession)
msnsetSubsetTop3 <- combineFeatures(msnsetSubsetTop3,
                                    groupBy=fData(msnsetSubset)$ProteinAccession,
                                    fun="sum", na.rm=TRUE)

exprs(msnsetSubsetTop3) <- exprs(msnsetSubsetTop3) * (3/nSubsetPeps)
msnsetSubsetTop3
# MSnSet (storageMode: lockedEnvironment)
# assayData: 5 features, 4 samples
#   element names: exprs
# protocolData: none
# phenoData: none
# featureData
#   featureNames: BSA ECA1364 ... ECA4030 (5 total)
#   fvarLabels: spectrum ProteinAccession ... CV.iTRAQ4.117 (19 total)
#   fvarMetadata: labelDescription
# experimentData: use 'experimentData(object)'
# Annotation:
# - - - Processing information - - -
# Data loaded: Wed May 11 18:54:39 2011
# iTRAQ4 quantification by trapezoidation: Wed Apr  1 21:41:53 2015
# Subset [55,4][5,4] Fri Apr 28 18:13:18 2017
# Selected top 3 features [Fri Apr 28 18:13:57 2017]
# Subset [5,4][5,4] Fri Apr 28 18:13:57 2017
# Combined 5 features into 5 using sum: Fri Apr 28 18:13:57 2017
#  MSnbase version: 2.3.1

sgibb added 9 commits April 28, 2017 15:27

add unit test for nQuants

f1ae9d2

rewrite nQuants using rowsum instead of utils.applyColumnwiseByGroup

8f4fce5

extend nQuants unit test

bf2f6de

add rowmean with unit tests

4c2c704

add rowsd and unit tests

61fd335

rewrite featureCV and add unit test

ecf2a41

remove applyColumnwiseByGroup

006e957

update NEWS.md

f27f182

remove utils.colSd

d272524

lgatto merged commit fd32c77 into master Apr 29, 2017

sgibb deleted the nQuants branch April 29, 2017 17:16
