Skip to content

Commit

Permalink
refactor: modify featureArea function
Browse files Browse the repository at this point in the history
- Update `featureArea` function to consider all chromatographic peaks instead of
  just the one with the highes peak/apex intensity. As a consequence, returned
  m/z and rt ranges might be higher which affects `featureChromatograms`, and
  related, also EIC-based feature grouping. Related documentation was updated
  to describe this.
  • Loading branch information
jorainer committed Jan 18, 2024
1 parent bdc595d commit c8a27d5
Show file tree
Hide file tree
Showing 18 changed files with 286 additions and 111 deletions.
75 changes: 64 additions & 11 deletions R/AllGenerics.R
Original file line number Diff line number Diff line change
Expand Up @@ -569,9 +569,15 @@ setGeneric("family<-", function(object, value) standardGeneric("family<-"))
#' [XCMSnExp-class] object. The function returns for each feature the
#' extracted ion chromatograms (along with all associated chromatographic
#' peaks) in each sample. The chromatogram is extracted from the m/z - rt
#' region including all chromatographic peaks of that features (i.e. based on
#' the ranges of `"mzmin"`, `"mzmax"`, `"rtmin"`, `"rtmax"` of all
#' chromatographic peaks of the feature).
#' region including all chromatographic peaks of that features. This region is
#' by default, with `mzmin = min`, `mzmax = max`, `rtmin = min` and
#' `rtmax = max` defined using the **ranges** of `"mzmin"`, `"mzmax"`,
#' `"rtmin"`, `"rtmax"` of all chromatographic peaks of the feature. For some
#' features, and depending on the data, the m/z and rt range can thus be
#' relatively large. The ranges could be restricted by using a different
#' function to define them, e.g. by setting `mzmin = median` and
#' `mzmax = median` in which case the median `"mzmin"` and `"mzmax"` values
#' of all chromatographic peaks would be used.
#'
#' By default only chromatographic peaks associated with a feature are
#' included. For `object` being a `XCMSnExp` object parameter `include`
Expand All @@ -582,6 +588,14 @@ setGeneric("family<-", function(object, value) standardGeneric("family<-"))
#'
#' @note
#'
#' The **same** m/z and rt boundaries are used on every sample to extract
#' the ion chromatogram. The EIC might thus not exactly represent the actual
#' EIC of each individual chromatographic peak (i.e. signal for the ion in one
#' specific sample), since the m/z and rt boundaries might be slightly
#' different across samples. The [chromPeakChromatograms()] function could be
#' used to extract the actual EIC of the chromatographic peak in a specific
#' sample. See also examples below.
#'
#' Parameters `include`, `filled`, `n` and `value` are only supported
#' for `object` being an `XCMSnExp`.
#'
Expand Down Expand Up @@ -628,6 +642,16 @@ setGeneric("family<-", function(object, value) standardGeneric("family<-"))
#' Defaults to `"feature_only"`; See description above for options and
#' details.
#'
#' @param mzmax `function` defining how the upper boundary of the m/z region
#' from which the EIC is integrated should be defined. Defaults to
#' `mzmax = max` thus the largest `"mzmax"` value for all chromatographic
#' peaks of a feature will be used.
#'
#' @param mzmin `function` defining how the lower boundary of the m/z region
#' from which the EIC is integrated should be defined. Defaults to
#' `mzmin = min` thus the smallest `"mzmin"` value for all chromatographic
#' peaks of a feature will be used.
#'
#' @param n Only for `object` being an `XCMSnExp`: `integer(1)` to optionally
#' specify the number of *top n* samples from which the EIC should be
#' extracted.
Expand All @@ -642,6 +666,16 @@ setGeneric("family<-", function(object, value) standardGeneric("family<-"))
#' supported and the results are thus returned as an [XChromatograms()]
#' object.
#'
#' @param rtmax `function` defining how the upper boundary of the rt region
#' from which the EIC is integrated should be defined. Defaults to
#' `rtmax = max` thus the largest `"rtmax"` value for all chromatographic
#' peaks of a feature will be used.
#'
#' @param rtmin `function` defining how the lower boundary of the rt region
#' from which the EIC is integrated should be defined. Defaults to
#' `rtmin = min` thus the smallest `"rtmin"` value for all chromatographic
#' peaks of a feature will be used.
#'
#' @param value Only for `object` being an `XCMSnExp`: `character(1)`
#' specifying the column to be used to sort the samples. Can be either
#' `"maxo"` (the default) or `"into"` to use the maximal peak intensity
Expand Down Expand Up @@ -672,10 +706,8 @@ setGeneric("family<-", function(object, value) standardGeneric("family<-"))
#' ## Disable parallel processing for this example
#' register(SerialParam())
#'
#' ## Subset the object to a smaller retention time range
#' xdata <- filterRt(faahko_sub, c(2500, 3500))
#'
#' xdata <- groupChromPeaks(xdata,
#' ## Perform correspondence analysis
#' xdata <- groupChromPeaks(faahko_sub,
#' param = PeakDensityParam(minFraction = 0.8, sampleGroups = rep(1, 3)))
#'
#' ## Get the feature definitions
Expand All @@ -686,8 +718,28 @@ setGeneric("family<-", function(object, value) standardGeneric("family<-"))
#' chrs <- featureChromatograms(xdata,
#' features = rownames(featureDefinitions)[1:3])
#'
#' ## Plot the XIC for the first feature using different colors for each file
#' ## Plot the EIC for the first feature using different colors for each file.
#' plot(chrs[1, ], col = c("red", "green", "blue"))
#'
#' ## The EICs for all 3 samples use the same m/z and retention time range,
#' ## which was defined using the `featureArea` function:
#' featureArea(xdata, features = rownames(featureDefinitions(xdata))[1:3],
#' mzmin = min, mzmax = max, rtmin = min, rtmax = max)
#'
#' ## To extract the actual (exact) EICs for each chromatographic peak of
#' ## a feature in each sample, the `chromPeakChromatograms` function would
#' ## need to be used instead. Below we extract the EICs for all
#' ## chromatographic peaks of the first feature. We need to first get the
#' ## IDs of all chromatographic peaks assigned to the first feature:
#' peak_ids <- rownames(chromPeaks(xdata))[featureDefinitions(xdata)$peakidx[[1L]]]
#'
#' ## We can now pass these to the `chromPeakChromatograms` function with
#' ## parameter `peaks`:
#' eic_1 <- chromPeakChromatograms(xdata, peaks = peak_ids)
#'
#' ## To plot these into a single plot we need to use the
#' ## `plotChromatogramsOverlay` function:
#' plotChromatogramsOverlay(eic_1)
setGeneric("featureChromatograms", function(object, ...)
standardGeneric("featureChromatograms"))

Expand Down Expand Up @@ -800,8 +852,8 @@ setGeneric("filepaths<-", function(object, value) standardGeneric("filepaths<-")
#' `ChromPeakAreaParam`, the median `"mzmin"`, `"mzmax"`, `"rtmin"` and
#' `"rtmax"` values from all detected chromatographic peaks of a feature
#' would be used instead.
#' In contrast to the `FillChromPeaksParam` approach this method uses the
#' actual identified chromatographic peaks of a feature to define the area
#' In contrast to the `FillChromPeaksParam` approach this method uses (all)
#' identified chromatographic peaks of a feature to define the area
#' from which the signal should be integrated.
#'
#' @details
Expand Down Expand Up @@ -1822,7 +1874,8 @@ setGeneric("reconstructChromPeakSpectra", function(object, ...)
#'
#' @param ppm For `MergeNeighboringPeaksParam`: `numeric(1)` defining a m/z
#' relative value (in parts per million) by which the m/z range of each
#' chromatographic peak is expanded to check for overlapping peaks.
#' chromatographic peak is expanded (on each side) to check for overlapping
#' peaks.
#'
#' @param threshold For `FilterIntensityParam`: `numeric(1)` defining the
#' threshold below which peaks are removed.
Expand Down
72 changes: 67 additions & 5 deletions R/XcmsExperiment-functions.R
Original file line number Diff line number Diff line change
Expand Up @@ -934,15 +934,77 @@

#' @rdname XcmsExperiment
featureArea <- function(object, mzmin = min, mzmax = max, rtmin = min,
rtmax = max, msLevel = integer(),
features = character()) {
rtmax = max, features = character()) {
if (!hasFeatures(object))
stop("No correspondence results available. Please run ",
"'groupChromPeaks' first.")
if (!length(msLevel))
msLevel <- seq_len(10)
if (!length(features))
features <- rownames(featureDefinitions(object))
.features_ms_region(object, mzmin = mzmin, mzmax = mzmax, rtmin = rtmin,
rtmax = rtmax, msLevel = msLevel, features = features)
rtmax = rtmax, features = features)
}

#' @title Define MS regions for features
#'
#' @param x `XcmsExperiment` or `XCMSnExp`.
#'
#' @param mzmin, mzmax, rtmin, rtmax `function` to be applied to the values
#' (rtmin, ...) of the chrom peaks. Defaults to `median` but would also
#' work with `mean` etc.
#'
#' @param features `character` with the IDs of the features. Mandatory!
#'
#' @return `matrix` with columns `"mzmin"`, `"mzmax"`, `"rtmin"`, `"rtmax"`
#' defining the range of
#'
#' @author Johannes Rainer
#'
#' @noRd
.features_ms_region <- function(x, mzmin = median, mzmax = median,
rtmin = median, rtmax = median,
features = character()) {
features <- .i2index(features, ids = rownames(featureDefinitions(x)),
"features")
pks <- .chromPeaks(x)[, c("mzmin", "mzmax", "rtmin", "rtmax")]
res <- do.call(
rbind, lapply(featureDefinitions(x)$peakidx[features],
function(i) {
## maybe consider/drop gap-filled peaks?
c(mzmin(pks[i, "mzmin"]),
mzmax(pks[i, "mzmax"]),
rtmin(pks[i, "rtmin"]),
rtmax(pks[i, "rtmax"]))
}))
rownames(res) <- rownames(featureDefinitions(x))[features]
colnames(res) <- c("mzmin", "mzmax", "rtmin", "rtmax")
res
}

.features_ms_region_old <- function(x, mzmin = median, mzmax = median,
rtmin = median, rtmax = median,
msLevel = 1L,
features = character()) {
pk_idx <- featureValues(x, value = "index", method = "maxint",
msLevel = msLevel)
if (length(features)) {
## here's a silent bug: should consider msLevel also below.
features <- .i2index(
features, ids = rownames(featureDefinitions(x)), "features")
pk_idx <- pk_idx[features, , drop = FALSE]
}
n_ft <- nrow(pk_idx)
rt_min <- rt_max <- mz_min <- mz_max <- numeric(n_ft)
for (i in seq_len(n_ft)) {
idx <- pk_idx[i, ]
tmp_pks <- .chromPeaks(x)[idx[!is.na(idx)], , drop = FALSE]
rt_min[i] <- rtmin(tmp_pks[, "rtmin"])
rt_max[i] <- rtmax(tmp_pks[, "rtmax"])
mz_min[i] <- mzmin(tmp_pks[, "mzmin"])
mz_max[i] <- mzmax(tmp_pks[, "mzmax"])
}
res <- cbind(mzmin = mz_min, mzmax = mz_max, rtmin = rt_min, rtmax = rt_max)
rownames(res) <- rownames(pk_idx)
res
}

#' *Reconstruct* MS2 spectra for DIA data:
Expand Down
20 changes: 11 additions & 9 deletions R/XcmsExperiment.R
Original file line number Diff line number Diff line change
Expand Up @@ -245,9 +245,7 @@
#' `"rtmin"` and `"rtmax"` with the m/z and retention time range for each
#' feature (row) in `object`. By default these represent the minimal m/z
#' and retention times as well as maximal m/z and retention times for
#' the chromatographi peaks assigned to that feature. Note that if in
#' one sample more than one chromatographic peak is assigned to a feature
#' only the one with the higher intensity is considered. Parameter
#' all chromatographic peaks assigned to that feature. Parameter
#' `features` allows to extract these values for selected features only.
#' Parameters `mzmin`, `mzmax`, `rtmin` and `rtmax` allow to define
#' the function to calculate the reported `"mzmin"`, `"mzmax"`, `"rtmin"`
Expand Down Expand Up @@ -408,7 +406,7 @@
#'
#' @param features For `filterFeatureDefinitions` and `featureArea`: `logical`,
#' `integer` or `character` defining the features to keep or from which
#' to extract the feature are, respectively. See function description
#' to extract the feature area, respectively. See function description
#' for more information.
#'
#' @param file For `filterFile`: `integer` with the indices of the samples
Expand Down Expand Up @@ -1623,12 +1621,13 @@ setMethod(
"featureChromatograms", "XcmsExperiment",
function(object, expandRt = 0, expandMz = 0, aggregationFun = "max",
features = character(), return.type = "XChromatograms",
chunkSize = 2L, ..., progressbar = TRUE, BPPARAM = bpparam()) {
chunkSize = 2L, mzmin = min, mzmax = max, rtmin = min,
rtmax = max, ..., progressbar = TRUE, BPPARAM = bpparam()) {
return.type <- match.arg(return.type)
if (hasAdjustedRtime(object))
object <- applyAdjustedRtime(object)
area <- featureArea(object, mzmin = min, mzmax = max, rtmin = min,
rtmax = max, features = features, msLevel = 1:10)
area <- featureArea(object, mzmin = mzmin, mzmax = mzmax, rtmin = rtmin,
rtmax = rtmax, features = features)
if (expandRt != 0) {
area[, "rtmin"] <- area[, "rtmin"] - expandRt
area[, "rtmax"] <- area[, "rtmax"] + expandRt
Expand Down Expand Up @@ -1760,11 +1759,14 @@ setMethod(
if (!hasFeatures(object, msLevel = msLevel))
stop("No feature definitions for MS level ", msLevel, " present.")
## Define region to integrate from for each file
feature_ids <- rownames(featureDefinitions(object, msLevel = msLevel))
fr <- .features_ms_region(object, mzmin = param@mzmin,
mzmax = param@mzmax, rtmin = param@rtmin,
rtmax = param@rtmax, msLevel = msLevel)
fr <- cbind(fr, mzmed = featureDefinitions(object)$mzmed)
rtmax = param@rtmax, features = feature_ids)
fr <- cbind(
fr, mzmed = featureDefinitions(object, msLevel = msLevel)$mzmed)
fvals <- featureValues(object, value = "index", msLevel = msLevel)
## For each sample, keep features with some missing values.
pal <- lapply(seq_len(ncol(fvals)), function(i) {
fr[is.na(fvals[, i]), , drop = FALSE]
})
Expand Down
13 changes: 7 additions & 6 deletions R/functions-Params.R
Original file line number Diff line number Diff line change
Expand Up @@ -349,12 +349,13 @@ MergeNeighboringPeaksParam <- function(expandRt = 2, expandMz = 0, ppm = 10,
}

#' @rdname fillChromPeaks
ChromPeakAreaParam <- function(mzmin = function(z) quantile(z, probs = 0.25),
mzmax = function(z) quantile(z, probs = 0.75),
rtmin = function(z) quantile(z, probs = 0.25),
rtmax = function(z) quantile(z, probs = 0.75)) {
new("ChromPeakAreaParam", mzmin = mzmin, mzmax = mzmax, rtmin = rtmin,
rtmax = rtmax)
ChromPeakAreaParam <-
function(mzmin = function(z) quantile(z, probs = 0.25, names = FALSE),
mzmax = function(z) quantile(z, probs = 0.75, names = FALSE),
rtmin = function(z) quantile(z, probs = 0.25, names = FALSE),
rtmax = function(z) quantile(z, probs = 0.75, names = FALSE)) {
new("ChromPeakAreaParam", mzmin = mzmin, mzmax = mzmax, rtmin = rtmin,
rtmax = rtmax)
}

#' @rdname refineChromPeaks
Expand Down
41 changes: 0 additions & 41 deletions R/functions-XCMSnExp.R
Original file line number Diff line number Diff line change
Expand Up @@ -2014,47 +2014,6 @@ setMethod("hasFilledChromPeaks", "XCMSnExp", function(object) {
nobject
}

#' Define the MS region (m/z - rt range) for each feature based on the rtmin,
#' rtmax, mzmin, mzmax of the corresponding detected peaks.
#'
#' @param x `XCMSnExp` object
#'
#' @param mzmin, mzmax, rtmin, rtmax `function` to be applied to the values
#' (rtmin, ...) of the chrom peaks. Defaults to `median` but would also
#' work with `mean` etc.
#'
#' @return `matrix` with columns `"mzmin"`, `"mzmax"`, `"rtmin"`, `"rtmax"`
#' defining the range of
#'
#' @author Johannes Rainer
#'
#' @noRd
.features_ms_region <- function(x, mzmin = median, mzmax = median,
rtmin = median, rtmax = median,
msLevel = unique(msLevel(x)),
features = character()) {
pk_idx <- featureValues(x, value = "index", method = "maxint",
msLevel = msLevel)
if (length(features)) {
features <- .i2index(
features, ids = rownames(featureDefinitions(x)), "features")
pk_idx <- pk_idx[features, , drop = FALSE]
}
n_ft <- nrow(pk_idx)
rt_min <- rt_max <- mz_min <- mz_max <- numeric(n_ft)
for (i in seq_len(n_ft)) {
idx <- pk_idx[i, ]
tmp_pks <- chromPeaks(x)[idx[!is.na(idx)], , drop = FALSE]
rt_min[i] <- rtmin(tmp_pks[, "rtmin"])
rt_max[i] <- rtmax(tmp_pks[, "rtmax"])
mz_min[i] <- mzmin(tmp_pks[, "mzmin"])
mz_max[i] <- mzmax(tmp_pks[, "mzmax"])
}
res <- cbind(mzmin = mz_min, mzmax = mz_max, rtmin = rt_min, rtmax = rt_max)
rownames(res) <- rownames(pk_idx)
res
}

#' @param x `XCMSnExp` object of a single file.
#'
#' @param nValues `integer(1)` defining the number of values that have to be above
Expand Down
5 changes: 4 additions & 1 deletion R/methods-XCMSnExp.R
Original file line number Diff line number Diff line change
Expand Up @@ -2542,9 +2542,12 @@ setMethod("fillChromPeaks",
startDate <- date()
message("Defining peak areas for filling-in .",
appendLF = FALSE)
feature_ids <- rownames(featureDefinitions(object,
msLevel = msLevel))
fts_region <- .features_ms_region(
object, mzmin = param@mzmin, mzmax = param@mzmax,
rtmin = param@rtmin, rtmax = param@rtmax, msLevel = msLevel)
rtmin = param@rtmin, rtmax = param@rtmax,
features = feature_ids)
fts_region <- cbind(group_idx = seq_len(nrow(fts_region)),
fts_region,
mzmed = featureDefinitions(object)$mzmed)
Expand Down
12 changes: 12 additions & 0 deletions R/methods-group-features.R
Original file line number Diff line number Diff line change
Expand Up @@ -697,6 +697,12 @@ plotFeatureGroups <- function(x, xlim = numeric(), ylim = numeric(),
#'
#' @note
#'
#' At present the [featureChromatograms()] function is used to extract the
#' EICs for each feature, which does however use one m/z and rt range for
#' each feature and the EICs do thus not exactly represent the identified
#' chromatographic peaks of each sample (i.e. their specific m/z and
#' retention time ranges).
#'
#' While being possible to be performed on the full data set without prior
#' feature grouping, this is not suggested for the following reasons: I) the
#' selection of the top `n` samples with the highest signal for the
Expand Down Expand Up @@ -932,6 +938,12 @@ setMethod(
}
if (length(idx) > 1) {
eics <- do.call(
## NOTE: chromPeakChromatograms should be used here
## instead, since featureChromatograms uses the same
## m/z - rt boundary for all EICs and does thus not
## exactly represent the chromatographic peaks.
## TODO: check if we can't use chromPeakChromatograms
## instead (slower performance?).
featureChromatograms,
args = c(list(obj_sub, features = rownames(fvals)[idx],
filled = TRUE, progressbar = FALSE),
Expand Down
7 changes: 7 additions & 0 deletions inst/NEWS
Original file line number Diff line number Diff line change
@@ -1,6 +1,13 @@
Changes in version 4.1.6
----------------------

- Update `featureArea` function to consider all chromatographic peaks per
feature, not only the one with the highest intensity. As a consequence,
returned m/z and rt ranges might be higher which has an influence in
`featureChromatograms`, EIC-based feature grouping and, to a lesser extent
also in gap-filling. Related documentation was updated.
- Improve performance of the `featureArea` function (and related of the
`PeakAreaParam`-based gap filling).
- Add parameter `ppm` to `PeakDensityParam` to enable peak-density-based
correspondence throgh m/z-dependent bins along the m/z.

Expand Down
Loading

0 comments on commit c8a27d5

Please sign in to comment.