Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chromatogram fdata #290

Merged
merged 5 commits into from
Dec 19, 2017
Merged

Chromatogram fdata #290

merged 5 commits into from
Dec 19, 2017

Conversation

jorainer
Copy link
Collaborator

  • Add a featureData slot to the Chromatograms object.
  • Add mz,Chromatograms, precursorMz,Chromatograms and productMz,Chromatograms.
  • Add all related unit tests and documentation.

Short description: each row of an Chromatograms object should contain chromatogram data for the same ion or m/z range (+ eventually rt range). Having this data in a featureData allows users a quick way to access such data.

- Add featureData slot to Chromatograms class.
- Add getter/setter method for featureData.
- Add precursorMz, productMz and mz,Chromatograms method.
- Add related unit tests and documentation.
Copy link
Collaborator

@sgibb sgibb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All in all a fine PR.

#'
#' @noRd
.mz_chromatograms <- function(x, mz = "mz") {
mz <- match.arg(mz, c("mz", "precursorMz", "productMz"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just my personal preference. I like to see all choices for an argument in the definition of a function (especially for an exported and documented function):

.mz_chromatograms <- function(x, mz = c("mz", "precursorMz", "productMz")) {
    mz <- match.arg(mz) ## will automatically use "mz" as default
## ...
}

## If we've got the values in the featureData, use these.
if (mz %in% c("precursorMz", "productMz"))
vl <- rep(paste0(sub(mz, pattern = "Mz", replacement = ""),
"IsolationWindowTargetMZ"), 2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paste0 is superfluous here: vl <- rep(sub("Mz", "IsolationWindowTargetMZ", mz), 2)

vl <- c("mzmin", "mzmax")
if (all(vl %in% fvarLabels(x))) {
## Want to return a matrix, not a data.frame
cbind(mzmin = fData(x)[, vl[1]], mzmax = fData(x)[, vl[2]])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not explicitly convert into a matrix (the return value of cbind depends on its input, would be a numeric matrix in our use case but who knows ...).

m <- as.matrix(fData(x)[, vl])
dimnames(m) <- list(NULL, c("mzmin", "mzmax"))
m

Or in one line:

as.matrix(setNames(fData(x)[, vl], c("mzmin", "mzmax")), rownames.force=FALSE)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check - I think there was something behind using cbind here (performance wise).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cbind is slightly faster than as.matrix. Not that it matters here, I just prefer using cbind and extract individual columns from a data.frame instead of anything that involves accessing multiple columns at once in a data.frame, as that might/can involve copying of the data, while accessing single columns of a data.frame never copies the data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, cool. I wasn't aware of that. You are right:

library("microbenchmark")
set.seed(2017)
n <- 1e5
d <- data.frame(a=sample(n), b=sample(n), c=sample(n))

f1 <- function(x, vl=c("a", "b"))cbind(mzmin=x[, vl[1L]], mzmax=x[, vl[2L]])
f2 <- function(x, vl=c("a", "b"))matrix(c(x[, vl[1L]], x[, vl[2L]]), ncol=2L,
                                        dimnames=list(NULL, c("mzmin", "mzmax")))

all.equal(f1(d), f2(d))
# [1] TRUE

microbenchmark(f1(d), f2(d))
# Unit: microseconds
#   expr     min       lq     mean   median      uq      max neval
#  f1(d) 112.814 122.4080 203.0495 126.7465 139.109 3693.441   100
#  f2(d) 505.302 523.0355 638.2103 526.0990 536.168 4133.861   100

## the values in one row are not identical
mzr <- matrix(nrow = nrow(x), ncol = 2,
dimnames = list(NULL, c("mzmin", "mzmax")))
for (i in 1:nrow(mzr)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#' @rdname Chromatograms-class
#'
#' @description \code{fData}: return the feature data as a \code{data.frame}.
setMethod("fData", "Chromatograms", function(object) pData(object@featureData))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is pData correct (instead of fData)?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct, although confusing. pData,AnnotationDataFrame accesses the adf@data slot, but is also the accessor for object@phenoData@data slot where object is an eSet type instance.

Copy link
Owner

@lgatto lgatto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me - @sgibb did all the work already anyway :-).

@lgatto lgatto merged commit f620044 into master Dec 19, 2017
@lgatto
Copy link
Owner

lgatto commented Dec 19, 2017

@jotsetung - is this something that needs to be pushed to Bioc quickly?

(sorry, initially commented on wrong PR)

@jorainer
Copy link
Collaborator Author

Nope, no need to push to Bioc now. This is some preparatory work for the future readSRMData function to read mzML files with chromatogram data - but that depends on the related pull request in mzR (sneumann/mzR#142).

I'm confused - you did already merge? Then I'll change the requested minor stuff above in the master branch?

@lgatto
Copy link
Owner

lgatto commented Dec 19, 2017

Yes, I already merged, sorry.

What about readMRMData rather than readSRMData, or have the two that do the same thing? (SRMs and MRMs are essentially the same thing, as far as I know).

@jorainer
Copy link
Collaborator Author

yes, I believe SRM and MRM are the same thing - with SRM being the correct term. Having a readSRMData with an alias readMRMData could be OK. Hope its not confusing to the user.

jorainer added a commit that referenced this pull request Dec 20, 2017
@jorainer jorainer deleted the chromatogram_fdata branch December 20, 2017 06:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants