feat: add parameter ppm to PeakDensityParam
- Add parameter `ppm` to `PeakDensityParam` to enable m/z-dependent bin sizes
  for the correspondence analysis (issue #711).
jorainer committed Jan 16, 2024
1 parent 548c248 commit 46039ff
Showing 18 changed files with 146 additions and 46 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,5 +1,5 @@
Package: xcms
Version: 4.1.5
Version: 4.1.6
Title: LC-MS and GC-MS Data Analysis
Description: Framework for processing and visualization of chromatographically
separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF,
@@ -87,7 +87,7 @@ URL: https://github.com/sneumann/xcms
BugReports: https://github.com/sneumann/xcms/issues/new
VignetteBuilder: knitr
biocViews: ImmunoOncology, MassSpectrometry, Metabolomics
RoxygenNote: 7.2.3
RoxygenNote: 7.3.0
Collate:
'AllGenerics.R'
'functions-XChromatograms.R'
20 changes: 16 additions & 4 deletions R/AllGenerics.R
@@ -1235,10 +1235,16 @@ setGeneric("group", function(object, ...) standardGeneric("group"))
#'
#' - `PeakDensityParam`: correspondence using the *peak density* method
#' (Smith 2006) that groups chromatographic peaks along the retention time
#' axis within slices of (partially overlapping) m/z ranges. All peaks (from
#' the same or from different samples) with their apex position being close
#' on the retention time axis are grouped into a LC-MS feature. See in
#' addition [do_groupChromPeaks_density()] for the core API function.
#' axis within slices of (partially overlapping) m/z ranges. By default,
#' these m/z ranges (bins) have a constant size. By setting `ppm` to a value
#' larger than 0, m/z-dependent bin sizes are used instead (better
#' representing the m/z-dependent measurement error of some MS instruments).
#' All peaks (from the same or from different samples) with their apex
#' position being close on the retention time axis are grouped into a LC-MS
#' feature. See in addition [do_groupChromPeaks_density()] for the core API
#' function.
#'
#' - `NearestPeaksParam`: performs peak grouping based on the proximity of
#' chromatographic peaks from different samples in the m/z - rt space similar
@@ -1304,6 +1310,12 @@ setGeneric("group", function(object, ...) standardGeneric("group"))
#'
#' @param ppm For `MzClustParam`: `numeric(1)` representing the relative m/z
#' error for the clustering/grouping (in parts per million).
#' For `PeakDensityParam`: `numeric(1)` to define m/z-dependent, increasing
#' m/z bin sizes. If `ppm = 0` (the default) m/z bins are defined by the
#' sequence of values from the smallest to the largest m/z value with a
#' constant bin size of `binSize`. For `ppm` > 0 the size of each bin is
#' additionally increased by `ppm` of the (upper) m/z boundary of the
#' bin.
#'
#' @param param The parameter object selecting and configuring the algorithm.
#'
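To get a feel for the size of the `ppm` term described for `PeakDensityParam` above, the following minimal sketch computes `ppm` parts per million of a few m/z boundaries. The helper name `ppm_increment` and the example m/z values are assumptions made for illustration; they are not part of xcms.

```r
## Hypothetical helper: the amount that `ppm` parts per million of an m/z
## boundary contributes to the bin size.
ppm_increment <- function(mz, ppm) {
    mz * ppm * 1e-6
}

## Illustrative m/z boundaries (not taken from real data).
mz <- c(100, 500, 1000)
inc <- ppm_increment(mz, ppm = 10)

## The contribution scales linearly with m/z, so bins at high m/z grow more
## than bins at low m/z.
data.frame(mz = mz, ppm_increment = inc)
```

With the default `ppm = 0` the increment is zero for every boundary, which corresponds to the constant bin size described in the parameter documentation.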
8 changes: 5 additions & 3 deletions R/DataClasses.R
@@ -441,13 +441,13 @@ setClass("XProcessHistory",
#' method to extend the EIC to a integer base-2 length prior to being passed to
#' \code{convolve} rather than the default "reflect" method. See
#' https://github.com/sneumann/xcms/issues/445 for more information.
#'
#'
#' @param verboseBetaColumns Option to calculate two additional metrics of peak
#' quality via comparison to an idealized bell curve. Adds \code{beta_cor} and
#' \code{beta_snr} to the \code{chromPeaks} output, corresponding to a Pearson
#' correlation coefficient to a bell curve with several degrees of skew as well
#' as an estimate of signal-to-noise using the residuals from the best-fitting
#' bell curve. See https://github.com/sneumann/xcms/pull/685 and
#' bell curve. See https://github.com/sneumann/xcms/pull/685 and
#' https://doi.org/10.1186/s12859-023-05533-4 for more information.
#'
#' @details
@@ -1314,14 +1314,16 @@ setClass("PeakDensityParam",
minFraction = "numeric",
minSamples = "numeric",
binSize = "numeric",
maxFeatures = "numeric"),
maxFeatures = "numeric",
ppm = "numeric"),
contains = "Param",
prototype = prototype(
sampleGroups = numeric(),
bw = 30,
minFraction = 0.5,
minSamples = 1,
binSize = 0.25,
ppm = 0,
maxFeatures = 50),
validity = function(object) {
msg <- character()
3 changes: 2 additions & 1 deletion R/XcmsExperiment-functions.R
@@ -133,7 +133,8 @@
cp, sampleGroups = sampleGroups(param), bw = bw(param),
minFraction = minFraction(param),
minSamples = minSamples(param), binSize = binSize(param),
maxFeatures = maxFeatures(param), index = index)
maxFeatures = maxFeatures(param), ppm = ppm(param),
index = index)
},
MzClustParam = {
tmp <- do_groupPeaks_mzClust(
53 changes: 42 additions & 11 deletions R/do_groupChromPeaks-functions.R
@@ -8,22 +8,31 @@
#'
#' The `do_groupChromPeaks_density` function performs chromatographic peak
#' grouping based on the density (distribution) of peaks, found in different
#' samples, along the retention time axis in slices of overlapping mz ranges.
#' samples, along the retention time axis in slices of overlapping m/z ranges.
#' By default (with parameter `ppm = 0`) these m/z ranges all have the same
#' (constant) size (depending on parameter `binSize`). For values of `ppm`
#' larger than 0 the m/z bins (ranges or slices) will have increasing sizes
#' depending on the m/z value. This better models the m/z-dependent
#' measurement error/precision seen on some MS instruments.
#'
#' @details For overlapping slices along the mz dimension, the function
#' @details
#'
#' For overlapping slices along the mz dimension, the function
#' calculates the density distribution of identified peaks along the
#' retention time axis and groups peaks from the same or different samples
#' that are close to each other. See (Smith 2006) for more details.
#'
#' @note The default settings might not be appropriate for all LC/GC-MS setups,
#' @note
#'
#' The default settings might not be appropriate for all LC/GC-MS setups,
#' especially the `bw` and `binSize` parameters should be adjusted
#' accordingly.
#'
#' @param peaks A `matrix` or `data.frame` with the mz values and
#' retention times of the identified chromatographic peaks in all samples of an
#' experiment. Required columns are `"mz"`, `"rt"` and
#' `"sample"`. The latter should contain `numeric` values representing
#' the index of the sample in which the peak was found.
#' retention times of the identified chromatographic peaks in all samples
#' of an experiment. Required columns are `"mz"`, `"rt"` and
#' `"sample"`. The latter should contain `numeric` values representing
#' the index of the sample in which the peak was found.
#'
#' @param index An optional `integer` providing the indices of the peaks in the
#' original peak matrix.
@@ -83,7 +92,8 @@ do_groupChromPeaks_density <- function(peaks, sampleGroups,
bw = 30, minFraction = 0.5,
minSamples = 1, binSize = 0.25,
maxFeatures = 50, sleep = 0,
index = seq_len(nrow(peaks))) {
index = seq_len(nrow(peaks)),
ppm = 0) {
if (missing(sampleGroups))
stop("Parameter 'sampleGroups' is missing! This should be a vector of ",
"length equal to the number of samples specifying the group ",
@@ -120,9 +130,10 @@ do_groupChromPeaks_density <- function(peaks, sampleGroups,
rtRange <- range(peaks[, "rt"])

## Define the mass slices and the index in the peaks matrix with an mz
## value >= mass[i].
mass <- seq(peaks[1, "mz"], peaks[nrow(peaks), "mz"] + binSize,
by = binSize / 2)
## value >= mass[i]. If ppm != 0 the size of the individual bins will
## be dependent on the m/z value.
mass <- .breaks_ppm(peaks[1, "mz"], peaks[nrow(peaks), "mz"] + binSize,
by = binSize / 2, ppm = ppm)
masspos <- findEqualGreaterM(peaks[, "mz"], mass)

densFrom <- rtRange[1] - 3 * bw
@@ -551,3 +562,23 @@ do_groupChromPeaks_nearest <- function(peaks, sampleGroups, mzVsRtBalance = 10,
}
x
}

#' Generate a sequence of values with increasing difference between consecutive
#' values. The difference is increased using the `ppm`.
#'
#' Add that also to MsCoreUtils and eventually replace
#'
#' @noRd
.breaks_ppm <- function(from = 1, to = 1, by = 1, ppm = 0) {

## Review comment (jorainer, Author, Jan 16, 2024): Once the PR in
## MsCoreUtils is accepted we can remove this function here.
l <- ceiling((to - from + 1) / by)
res <- rep(NA_real_, l)
res[1L] <- a <- from
i <- 2L
while(a < to) {
a <- a + by
a <- a + (a * ppm * 1e-6)
res[i] <- a
i <- i + 1L
}
res[!is.na(res)]
}
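The effect of `ppm` on the sequence of m/z breaks can be tried out with a small self-contained sketch. The function below restates the logic of `.breaks_ppm` from the diff above under the hypothetical name `breaks_ppm_sketch`; the m/z range and parameter values are illustrative assumptions, not values from the commit.

```r
## Sketch only: restatement of the `.breaks_ppm` logic shown in the diff,
## under a hypothetical name so it can be run outside of xcms.
breaks_ppm_sketch <- function(from = 1, to = 1, by = 1, ppm = 0) {
    ## Initial allocation; unused slots stay NA and are dropped at the end.
    l <- ceiling((to - from + 1) / by)
    res <- rep(NA_real_, l)
    res[1L] <- a <- from
    i <- 2L
    while (a < to) {
        a <- a + by                  # constant step
        a <- a + (a * ppm * 1e-6)    # plus `ppm` of the new boundary
        res[i] <- a
        i <- i + 1L
    }
    res[!is.na(res)]
}

## Illustrative values (assumed, not taken from real data).
b_const <- breaks_ppm_sketch(from = 100, to = 101, by = 0.25, ppm = 0)
b_ppm <- breaks_ppm_sketch(from = 100, to = 101, by = 0.25, ppm = 10)

diff(b_const)   # step sizes with ppm = 0
diff(b_ppm)     # step sizes with ppm = 10
```

With `ppm = 0` the breaks match a regular `seq()` with step `by`. With `ppm > 0` every step is larger than `by`, and the steps grow as m/z increases. In `do_groupChromPeaks_density` the function is called with `by = binSize / 2`, which is what makes consecutive m/z slices overlap.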
10 changes: 5 additions & 5 deletions R/functions-Params.R
@@ -93,7 +93,7 @@ CentWaveParam <- function(ppm = 25, peakwidth = c(20, 50), snthresh = 10,
mzdiff = mzdiff, fitgauss = fitgauss, noise = noise,
verboseColumns = verboseColumns, roiList = roiList,
firstBaselineCheck = firstBaselineCheck, roiScales = roiScales,
extendLengthMSW = extendLengthMSW,
extendLengthMSW = extendLengthMSW,
verboseBetaColumns=verboseBetaColumns))
}

@@ -223,7 +223,7 @@ CentWavePredIsoParam <- function(ppm = 25, peakwidth = c(20, 50), snthresh = 10,
mzdiff = mzdiff, fitgauss = fitgauss, noise = noise,
verboseColumns = verboseColumns, roiList = roiList,
firstBaselineCheck = firstBaselineCheck, roiScales = roiScales,
extendLengthMSW = extendLengthMSW,
extendLengthMSW = extendLengthMSW,
verboseBetaColumns = verboseBetaColumns,
snthreshIsoROIs = snthreshIsoROIs, maxIso = as.integer(maxIso),
maxCharge = as.integer(maxCharge),
@@ -232,14 +232,14 @@ CentWavePredIsoParam <- function(ppm = 25, peakwidth = c(20, 50), snthresh = 10,

#' @rdname groupChromPeaks
PeakDensityParam <- function(sampleGroups = numeric(), bw = 30,
minFraction = 0.5, minSamples = 1,
binSize = 0.25, maxFeatures = 50) {
minFraction = 0.5, minSamples = 1,
binSize = 0.25, ppm = 0, maxFeatures = 50) {
if (length(sampleGroups) == 0 | any(is.na(sampleGroups)))
stop("Argument 'sampleGroups' has to be defined. It should not ",
"contain 'NA's")
new("PeakDensityParam", sampleGroups = sampleGroups, bw = bw,
minFraction = minFraction, minSamples = minSamples,
binSize = binSize, maxFeatures = maxFeatures)
binSize = binSize, ppm = ppm, maxFeatures = maxFeatures)
}
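A possible way to use the extended constructor is sketched below. It assumes an installed xcms version that contains this commit (4.1.6 or later); the `sampleGroups` vector and the `ppm` value are made-up illustration values, and the code is guarded so that it does nothing if such a version is not available.

```r
## Sketch only: needs an xcms version that includes this commit (>= 4.1.6).
if (requireNamespace("xcms", quietly = TRUE) &&
    utils::packageVersion("xcms") >= "4.1.6") {
    ## Hypothetical setup: three samples that all belong to the same group.
    pdp <- xcms::PeakDensityParam(sampleGroups = c(1, 1, 1),
                                  binSize = 0.25, ppm = 10)
    print(pdp)
}
```

The resulting parameter object would then be passed to `groupChromPeaks` through its `param` argument, which forwards `ppm` to `do_groupChromPeaks_density` as shown in the other hunks of this commit.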

#' @rdname groupChromPeaks
7 changes: 7 additions & 0 deletions R/methods-Params.R
@@ -1002,6 +1002,13 @@ setReplaceMethod("maxFeatures", "PeakDensityParam", function(object, value) {
return(object)
})

#' @rdname groupChromPeaks
setMethod("ppm", "PeakDensityParam", function(object) {
if (.hasSlot(object, "ppm"))

## Review comment (jorainer, Author, Jan 16, 2024): Ensures backward
## compatibility, e.g. if objects are loaded that were saved with a previous
## version and hence the slot @ppm was not yet defined.
object@ppm
else 0.0
})
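The backward-compatibility check in the accessor above can be illustrated with a self-contained sketch. The two classes and the helper `get_ppm` are hypothetical stand-ins (they are not xcms classes): one class lacks a `ppm` slot, mimicking an object created before the slot existed, and the other defines it.

```r
library(methods)

## Hypothetical classes, for illustration only.
setClass("OldParamSketch", representation(binSize = "numeric"))
setClass("NewParamSketch",
         representation(binSize = "numeric", ppm = "numeric"),
         prototype = prototype(ppm = 0))

## Same pattern as the accessor in the diff: fall back to 0 if the object
## has no `ppm` slot.
get_ppm <- function(object) {
    if (.hasSlot(object, "ppm"))
        object@ppm
    else 0.0
}

old_obj <- new("OldParamSketch", binSize = 0.25)
new_obj <- new("NewParamSketch", binSize = 0.25, ppm = 10)

get_ppm(old_obj)   # falls back to the default
get_ppm(new_obj)   # returns the stored value
```

A serialized object of the same class from an older package version is not exactly the same situation as two distinct classes, but the `.hasSlot` test behaves the same way in both cases: it inspects the object itself, not the current class definition.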


############################################################
## MzClustParam
1 change: 1 addition & 0 deletions R/methods-XCMSnExp.R
@@ -1482,6 +1482,7 @@ setMethod("groupChromPeaks",
minFraction = minFraction(param),
minSamples = minSamples(param),
binSize = binSize(param),
ppm = ppm(param),
maxFeatures = maxFeatures(param))
xph <- XProcessHistory(param = param, date. = startDate,
type. = .PROCSTEP.PEAK.GROUPING,
6 changes: 6 additions & 0 deletions inst/NEWS
@@ -1,3 +1,9 @@
Changes in version 4.1.6
----------------------

- Add parameter `ppm` to `PeakDensityParam` to enable peak-density-based
correspondence through m/z-dependent bins along the m/z dimension.

Changes in version 4.1.5
----------------------

2 changes: 1 addition & 1 deletion man/do_findChromPeaks_centWave.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/do_findChromPeaks_centWaveWithPredIsoROIs.Rd

23 changes: 19 additions & 4 deletions man/do_groupChromPeaks_density.Rd

4 changes: 2 additions & 2 deletions man/do_groupChromPeaks_nearest.Rd

12 changes: 9 additions & 3 deletions man/do_groupPeaks_mzClust.Rd

2 changes: 1 addition & 1 deletion man/findChromPeaks-centWave.Rd

2 changes: 1 addition & 1 deletion man/findChromPeaks-centWaveWithPredIsoROIs.Rd