-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chromatogram fdata #290
Chromatogram fdata #290
Conversation
- Add featureData slot to Chromatograms class. - Add getter/setter method for featureData. - Add precursorMz, productMz and mz,Chromatograms method. - Add related unit tests and documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All in all a fine PR.
#' | ||
#' @noRd | ||
.mz_chromatograms <- function(x, mz = "mz") { | ||
mz <- match.arg(mz, c("mz", "precursorMz", "productMz")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just my personal preference. I like to see all choices for an argument in the definition of a function (especially for an exported and documented function):
.mz_chromatograms <- function(x, mz = c("mz", "precursorMz", "productMz")) {
mz <- match.arg(mz) ## will automatically use "mz" as default
## ...
}
## If we've got the values in the featureData, use these. | ||
if (mz %in% c("precursorMz", "productMz")) | ||
vl <- rep(paste0(sub(mz, pattern = "Mz", replacement = ""), | ||
"IsolationWindowTargetMZ"), 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paste0
is superfluous here: vl <- rep(sub("Mz", "IsolationWindowTargetMZ", mz), 2)
vl <- c("mzmin", "mzmax") | ||
if (all(vl %in% fvarLabels(x))) { | ||
## Want to return a matrix, not a data.frame | ||
cbind(mzmin = fData(x)[, vl[1]], mzmax = fData(x)[, vl[2]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not explicitly convert into a matrix (the return value of cbind
depends on its input, would be a numeric
matrix
in our use case but who knows ...).
m <- as.matrix(fData(x)[, vl])
dimnames(m) <- list(NULL, c("mzmin", "mzmax"))
m
Or in one line:
as.matrix(setNames(fData(x)[, vl], c("mzmin", "mzmax")), rownames.force=FALSE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check - I think there was something behind using cbind
here (performance wise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cbind
is slightly faster than as.matrix
. Not that it matters here, I just prefer using cbind
and extract individual columns from a data.frame
instead of anything that involves accessing multiple columns at once in a data.frame
, as that might/can involve copying of the data, while accessing single columns of a data.frame
never copies the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, cool. I wasn't aware of that. You are right:
library("microbenchmark")
set.seed(2017)
n <- 1e5
d <- data.frame(a=sample(n), b=sample(n), c=sample(n))
f1 <- function(x, vl=c("a", "b"))cbind(mzmin=x[, vl[1L]], mzmax=x[, vl[2L]])
f2 <- function(x, vl=c("a", "b"))matrix(c(x[, vl[1L]], x[, vl[2L]]), ncol=2L,
dimnames=list(NULL, c("mzmin", "mzmax")))
all.equal(f1(d), f2(d))
# [1] TRUE
microbenchmark(f1(d), f2(d))
# Unit: microseconds
# expr min lq mean median uq max neval
# f1(d) 112.814 122.4080 203.0495 126.7465 139.109 3693.441 100
# f2(d) 505.302 523.0355 638.2103 526.0990 536.168 4133.861 100
## the values in one row are not identical | ||
mzr <- matrix(nrow = nrow(x), ncol = 2, | ||
dimnames = list(NULL, c("mzmin", "mzmax"))) | ||
for (i in 1:nrow(mzr)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use seq_len(nrow(mzr))
instead of 1:nrow(mzr)
: https://bioconductor.org/developers/how-to/efficient-code/#avoid--style-iterations
#' @rdname Chromatograms-class | ||
#' | ||
#' @description \code{fData}: return the feature data as a \code{data.frame}. | ||
setMethod("fData", "Chromatograms", function(object) pData(object@featureData)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is pData
correct (instead of fData
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct, although confusing. pData,AnnotationDataFrame
accesses the adf@data
slot, but is also the accessor for object@phenoData@data
slot where object
is an eSet
type instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me - @sgibb did all the work already anyway :-).
@jotsetung - is this something that needs to be pushed to Bioc quickly? (sorry, initially commented on wrong PR) |
Nope, no need to push to Bioc now. This is some preparatory work for the future I'm confused - you did already merge? Then I'll change the requested minor stuff above in the |
Yes, I already merged, sorry. What about |
yes, I believe SRM and MRM are the same thing - with SRM being the correct term. Having a |
featureData
slot to theChromatograms
object.mz,Chromatograms
,precursorMz,Chromatograms
andproductMz,Chromatograms
.Short description: each row of an
Chromatograms
object should contain chromatogram data for the same ion or m/z range (+ eventually rt range). Having this data in afeatureData
allows users a quick way to access such data.