diff --git a/DESCRIPTION b/DESCRIPTION index 607103dc7..5ba9178ae 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,5 +1,5 @@ Package: xcms -Version: 4.1.11 +Version: 4.1.12 Title: LC-MS and GC-MS Data Analysis Description: Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, @@ -39,7 +39,11 @@ Authors@R: c( person(given = "Pablo", family = "Vangeenderhuysen", email = "pablo.vangeenderhuysen@ugent.be", role = "ctb", - comment = c(ORCID = "0000-0002-5492-6904")) + comment = c(ORCID = "0000-0002-5492-6904")), + person(given = "Carl", family = "Brunius", + email = "carl.brunius@chalmers.se", + role = "ctb", + comment = c(ORCID = 0000-0003-3957-870X)) ) Depends: R (>= 4.0.0), @@ -56,7 +60,7 @@ Imports: S4Vectors, IRanges, SummarizedExperiment, - MsCoreUtils (>= 1.15.3), + MsCoreUtils (>= 1.15.5), MsFeatures, MsExperiment (>= 1.5.4), Spectra (>= 1.13.7), @@ -79,7 +83,8 @@ Suggests: RANN, multtest, MsBackendMgf, - signal + signal, + mgcv Enhances: Rgraphviz, rgl diff --git a/NAMESPACE b/NAMESPACE index 49aa0b7e2..1592c843d 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -33,7 +33,7 @@ importFrom("SummarizedExperiment", "rowData") importFrom("SummarizedExperiment", "rowData<-") importFrom("SummarizedExperiment", "assay") importFrom("MsCoreUtils", "rbindFill", "closest", "i2index", "sumi", "between", - "maxi", "breaks_ppm") + "maxi", "breaks_ppm", "force_sorted") importFrom("RColorBrewer", "brewer.pal") importFrom("graphics", "image", "boxplot", "matplot", "rect", "axis", @@ -49,7 +49,7 @@ importFrom("stats", "aov", "approx", "convolve", "cor", "deriv3", "dist", "fft", "fitted", "lm", "loess", "lsfit", "median", "na.omit", "nextn", "nls", "predict", "pt", "quantile", "runmed", "sd", "stepfun", "weighted.mean", "density", "approxfun", - "rnorm", "runif", "dbeta") + "rnorm", "runif", "dbeta", "resid") importFrom("utils", "flush.console", "head", "object.size", "packageVersion", "read.csv", "tail", "write.csv", "write.table", "capture.output", "data") @@ -251,7 +251,10 @@ export( "groupOverlaps", "estimatePrecursorIntensity", "featureArea", - "loadXcmsData" + "loadXcmsData", + "matchLamasChromPeaks", + "summarizeLamaMatch", + "matchedRtimes" ) ## New analysis methods @@ -275,6 +278,7 @@ exportClasses( "MzClustParam", "NearestPeaksParam", "PeakGroupsParam", + "LamaParama", "ObiwarpParam", "GenericParam", "FillChromPeaksParam", @@ -444,6 +448,7 @@ export("CentWaveParam", "MzClustParam", "NearestPeaksParam", "PeakGroupsParam", + "LamaParama", "ObiwarpParam", "GenericParam", "FillChromPeaksParam", @@ -593,7 +598,7 @@ exportMethods("storeResults") ## filtering features things importFrom("MetaboCoreUtils", "rowRsd", "rowDratio", "rowPercentMissing", - "rowBlank") + "rowBlank", "mclosest") export("RsdFilter") export("DratioFilter") export("PercentMissingFilter") diff --git a/R/AllGenerics.R b/R/AllGenerics.R index 966130d2f..2d69b37bf 100644 --- a/R/AllGenerics.R +++ b/R/AllGenerics.R @@ -19,14 +19,14 @@ setGeneric("addProcessHistory", function(object, ...) #' @description #' #' The `adjustRtime` method(s) perform retention time correction (alignment) -#' between chromatograms of different samples. Alignment is performed by defaul -#' on MS level 1 data. Retention times of spectra from other MS levels, if -#' present, are subsequently adjusted based on the adjusted retention times -#' of the MS1 spectra. Note that calling `adjustRtime` on a *xcms* result object -#' will remove any eventually present previous alignment results as well as -#' any correspondence analysis results. To run a second round of alignment, -#' raw retention times need to be replaced with adjusted ones using the -#' [applyAdjustedRtime()] function. +#' between chromatograms of different samples/dataset. Alignment is performed +#' by default on MS level 1 data. Retention times of spectra from other MS +#' levels, if present, are subsequently adjusted based on the adjusted +#' retention times of the MS1 spectra. Note that calling `adjustRtime` on a +#' *xcms* result object will remove any eventually present previous alignment +#' results as well as any correspondence analysis results. To run a second +#' round of alignment, raw retention times need to be replaced with adjusted +#' ones using the [applyAdjustedRtime()] function. #' #' The alignment method can be specified (and configured) using a dedicated #' `param` argument. @@ -40,7 +40,7 @@ setGeneric("addProcessHistory", function(object, ...) #' The alignment is performed directly on the [profile-matrix] and can hence #' be performed independently of the peak detection or peak grouping. #' -#' - `PeakGroupsParam`: performs retention time correctoin based on the +#' - `PeakGroupsParam`: performs retention time correction based on the #' alignment of features defined in all/most samples (corresponding to #' *house keeping compounds* or marker compounds) (Smith 2006). First the #' retention time deviation of these features is described by fitting either a @@ -60,6 +60,15 @@ setGeneric("addProcessHistory", function(object, ...) #' in `param`. See also [do_adjustRtime_peakGroups()] for the core API #' function. #' +#' - `LamaParama`: This function performs retention time correction by aligning +#' chromatographic data to an external reference dataset (concept and initial +#' implementation by Carl Brunius). The process involves identifying and +#' aligning peaks within the experimental chromatographic data, represented +#' as an `XcmsExperiment` object, to a predefined set of landmark features +#' called "lamas". These landmark features are characterized by their +#' mass-to-charge ratio (m/z) and retention time. see [LamaParama()] for more +#' information on the method. +#' #' @section Subset-based alignment: #' #' All alignment methods allow to perform the retention time correction on a @@ -189,9 +198,9 @@ setGeneric("addProcessHistory", function(object, ...) #' be used to interpolate corrected retention times for all peak groups. #' Can be either `"loess"` or `"linear"`. #' -#' @param span For `PeakGroupsParam`: `numeric(1)` defining the degree of -#' smoothing (if `smooth = "loess"`). This parameter is passed to the -#' internal call to [loess()]. +#' @param span For `PeakGroupsParam`: `numeric(1)` defining +#' the degree of smoothing (if `smooth = "loess"`). This parameter is +#' passed to the internal call to [loess()]. #' #' @param subset For `ObiwarpParam` and `PeakGroupsParam`: `integer` with the #' indices of samples within the experiment on which the alignment models @@ -206,7 +215,7 @@ setGeneric("addProcessHistory", function(object, ...) #' #' @param value For all assignment methods: the value to set/replace. #' -#' @param x An `ObiwarpParam` or `PeakGroupsParam` object. +#' @param x An `ObiwarpParam`, `PeakGroupsParam` or `LamaParama` object. #' #' @param ... ignored. #' @@ -219,7 +228,8 @@ setGeneric("addProcessHistory", function(object, ...) #' `XcmsExperiment` with the adjusted retention times stored in an new #' *spectra variable* `rtime_adjusted` in the object's `spectra`. #' -#' `ObiwarpParam` and `PeakGroupsParam` return the respective parameter object. +#' `ObiwarpParam`, `PeakGroupsParam` and `LamaParama` return the respective +#' parameter object. #' #' `adjustRtimeGroups` returns a `matrix` with the retention times of *marker* #' features in each sample (each row one feature, each row one sample). @@ -230,7 +240,7 @@ setGeneric("addProcessHistory", function(object, ...) #' #' @seealso [plotAdjustedRtime()] for visualization of alignment results. #' -#' @author Colin Smith, Johannes Rainer +#' @author Colin Smith, Johannes Rainer, Philippine Louail, Carl Brunius #' #' @references #' diff --git a/R/DataClasses.R b/R/DataClasses.R index e767698f5..fab0cc414 100644 --- a/R/DataClasses.R +++ b/R/DataClasses.R @@ -1471,6 +1471,45 @@ setClass("PeakGroupsParam", else TRUE }) +setClass("LamaParama", + slots = c(lamas = "matrix", + method = "character", + span = "numeric", + outlierTolerance = "numeric", + zeroWeight = "numeric", + ppm = "numeric", + tolerance = "numeric", + toleranceRt = "numeric", + bs = "character", + rtMap = "list", + nChromPeaks = "numeric"), + contains = "Param", + prototype = prototype( + lamas = matrix(ncol = 2, nrow = 0), + method = "loess", + span = 0.5, + outlierTolerance = 3, + zeroWeight = 10, + ppm = 20, + tolerance = 0, + toleranceRt = 20, + bs = "tp", + rtMap = list(), + nChromPeaks = numeric()), + validity = function(object) { + msg <- NULL + if (!nrow(object@lamas)) + msg <- c(msg, paste0("'lamas' cannot be empty")) + else { + } + if (length(object@method) > 1 | + !all(object@method %in% c("gam", "loess"))) + msg <- c(msg, paste0("'method' has to be either \"", + "gam\" or \"loess\"!")) + msg + }) + + setClass("ObiwarpParam", slots = c(binSize = "numeric", centerSample = "integer", diff --git a/R/XcmsExperiment.R b/R/XcmsExperiment.R index 7b3672046..1c86f0496 100644 --- a/R/XcmsExperiment.R +++ b/R/XcmsExperiment.R @@ -1357,6 +1357,66 @@ setMethod( object }) +#'@rdname LamaParama +setMethod( + "adjustRtime", signature(object = "XcmsExperiment", param = "LamaParama"), + function(object, param, BPPARAM = bpparam(), ...) { + if (!hasChromPeaks(object)) + stop("'object' needs to have detected chromPeaks. ", + "Run 'findChromPeaks()' first") + if (hasAdjustedRtime(object)) + stop("Alignment results already present. Please either remove ", + "them with 'dropAdjustedRtime' in order to perform an ", + "alternative, new, alignment, or use 'applyAdjustedRtime'", + " prior 'adjustRtime' to perform a second round of ", + "alignment.") + fidx <- as.factor(fromFile(object)) + rt_raw <- split(rtime(object), fidx) + idx <- seq_along(object) + + # Check if user as ran matching lama vs chrompeaks beforehand + if (length(param@rtMap) == 0) + param <- matchLamasChromPeaks(object, param) + rtMap <- param@rtMap + if (length(rtMap) != length(object)) + stop("Mismatch between the number of files matched to lamas: ", + length(rtMap), " and files in the object: ", length(object)) + + # Make model and adjust retention for each file + rt_adj <- bpmapply(rtMap, rt_raw, idx, FUN = function(x, y, i, param) { + if (nrow(x) >= 10) { # too strict ? Gam always throws error when less than that and loess does not work that well either. + .adjust_rt_model(y, method = param@method, + rt_map = x, span = param@span, + resid_ratio = param@outlierTolerance, + zero_weight = param@zeroWeight, + bs = param@bs) + } else { + warning("Too few chrompeaks could be assigned to external", + " reference peaks (lamas) for sample ", i, + ". Skipping alignment for this sample.") + y + } + }, SIMPLIFY = FALSE, BPPARAM = BPPARAM, MoreArgs = list(param = param)) + + # post processing housekeeping steps + pt <- vapply(object@processHistory, processType, character(1)) + idx_pg <- .match_last(.PROCSTEP.PEAK.GROUPING, pt, + nomatch = -1L) + if (idx_pg > 0) + ph <- object@processHistory[idx_pg] + else ph <- list() + object <- dropFeatureDefinitions(object) + object@spectra$rtime_adjusted <- unlist(rt_adj, use.names = FALSE) + object@chromPeaks <-.applyRtAdjToChromPeaks( + .chromPeaks(object), rtraw = rt_raw, rtadj = rt_adj) + xph <- XProcessHistory( + param = param, type. = .PROCSTEP.RTIME.CORRECTION, + fileIndex = seq_along(object)) + object@processHistory <- c(object@processHistory, ph, list(xph)) + validObject(object) + object + }) + #' @rdname XcmsExperiment setMethod("dropAdjustedRtime", "XcmsExperiment", function(object) { if (!hasAdjustedRtime(object)) diff --git a/R/do_adjustRtime-functions.R b/R/do_adjustRtime-functions.R index 80aaf825d..c45b0940a 100644 --- a/R/do_adjustRtime-functions.R +++ b/R/do_adjustRtime-functions.R @@ -582,3 +582,382 @@ adjustRtimeSubset <- function(rtraw, rtadj, subset, } rtadj } + +########################################################### +###### LamaParama + +#' @title Landmark-based alignment: aligning a dataset against an external +#' reference +#' +#' @aliases LamaParama-class +#' +#' @description +#' Alignment is achieved using the ['adjustRtime()'] method with a `param` of +#' class `LamaParama`. This method corrects retention time by aligning +#' chromatographic data with an external reference dataset. +#' +#' Chromatographic peaks in the experimental data are first matched to +#' predefined (external) landmark features based on their mass-to-charge ratio +#' and retention time and subsequently the data is aligned by minimizing the +#' differences in retention times between the matched chromatographic peaks and +#' lamas. This adjustment is performed file by file. +#' +#' Adjustable parameters such as `ppm`, `tolerance`, and `toleranceRt` define +#' acceptable deviations during the matching process. It's crucial to note that +#' only lamas and chromatographic peaks exhibiting a one-to-one mapping are +#' considered when estimating retention time shifts. If a file has no peaks +#' matching with lamas, no adjustment will be performed, and the the retention +#' times will be returned as-is. Users can evaluate this matching, for example, +#' by checking the number of matches and ranges of the matching peaks, by first +#' running `[matchLamasChromPeaks()]`. +#' +#' Different warping methods are available; users can choose to fit a *loess* +#' (`method = "loess"`, the default) or a *gam* (`method = "gam"`) between the +#' reference data points and observed matching ChromPeaks. Additional +#' parameters such as `span`, `weight`, `outlierTolerance`, `zeroWeight`, +#' and `bs` are specific to these models. These parameters offer flexibility +#' in fine-tuning how the matching chromatographic peaks are fitted to the +#' lamas, thereby generating a model to align the overall retention time for +#' a single file. +#' +#' Other functions related to this method: +#' +#' - `LamaParama()`: return the respective parameter object for alignment +#' using `adjustRtime()` function. It is also the input for the functions +#' listed below. +#' +#' - `matchLamasChromPeaks()`: quickly matches each file's ChromPeaks +#' to Lamas, allowing the user to evaluate the matches for each file. +#' +#' - `summarizeLamaMatch()`: generates a summary of the `LamaParama` method. +#' See below for the details of the return object. +#' +#' - `matchedRtimes()`: Access the list of `data.frame` saved in the +#' `LamaParama` object, generated by the `matchLamasChromPeaks()` function. +#' +#' - `plot()`:plot the chromatographic peaks versus the reference lamas as +#' well as the fitting line for the chosen model type. The user can decide +#' what file to inspect by specifying the assay number with the parameter +#' `assay` +#' +#' +#' @param BPPARAM For `matchLamasChromPeaks()`: parallel processing setup. +#' Defaults to `BPPARAM = bpparam()`. See [bpparam()] for more information. +#' +#' @param bs For `LamaParama()`: `character(1)` defining the GAM smoothing method. +#' (defaults to thin plate, `bs = "tp"`) +#' +#' @param colPoints For `plot()`: color for the plotting of the datapoint. +#' +#' @param colFit For `plot()`: color of the fitting line. +#' +#' @param index For `plot()`: `numeric(1)` index of the file that should be +#' plotted. +#' +#' @param lamas For `LamaParama`: `matrix` or `data.frame` with the m/z and +#' retention times values of features (as first and second column) from the +#' external dataset on which the alignment will be based on. +#' +#' +#' @param method For `LamaParama`:`character(1)` with the type of warping. +#' Either `method = "gam"` or `method = "loess"` (default). +#' +#' @param object An object of class `XcmsExperiment` with defined ChromPeaks. +#' +#' @param outlierTolerance For `LamaParama`: `numeric(1)` defining the settings +#' for outlier removal during the fitting. By default +#' (with `outlierTolerance = 3`), all data points with absolute residuals +#' larger than 3 times the mean absolute residual of all data points from +#' the first, initial fit, are removed from the final model fit. +#' +#' @param param An object of class `LamaParama` that will later be used for +#' adjustment using the `[adjustRtime()]` function. +#' +#' @param ppm For `LamaParama`: `numeric(1)` defining the m/z-relative maximal +#' allowed difference in m/z between `lamas` and chromatographic peaks. Used +#' for the mapping of identified chromatographic peaks and lamas. +#' +#' @param span For `LamaParama`: `numeric(1)` defining +#' the degree of smoothing (`method = "loess"`). This parameter is passed +#' to the internal call to [loess()]. +#' +#' @param tolerance For `LamaParama`: `numeric(1)` defining the absolute +#' acceptable difference in m/z between lamas and chromatographic peaks. +#' Used for the mapping of identified chromatographic peaks and `lamas`. +#' +#' @param toleranceRt For `LamaParama`: `numeric(1)` defining the absolute +#' acceptable difference in retention time between lamas and +#' chromatographic peaks. Used for the mapping of identified chromatographic +#' peaks and `lamas`. +#' +#' @param x For `plot()`: object of class `LamaParama` to be plotted. +#' +#' @param xlab,ylab For `plot()`: x- and y-axis labels. +#' +#' @param zeroWeight For `LamaParama`: `numeric(1)`: defines the weight of the +#' first data point (i.e. retention times of the first lama-chromatographic +#' peak pair). Values larger than 1 reduce warping problems in the early RT +#' range. +#' +#' @param ... For `plot()`: extra parameters to be passed to the function. +#' +#' @return +#' For `matchLamasChromPeaks()`: A `LamaParama` object with new slot `rtMap` +#' composed of a list of matrices representing the 1:1 matches between Lamas +#' (ref) and ChromPeaks (obs). To access this, `matchedRtimes()` can be used. +#' +#' For `matchedRtimes()`: A list of `data.frame` representing matches +#' between chromPeaks and `lamas` for each files. +#' +#' For `summarizeLamaMatch()`:A `data.frame` with: +#' +#' - "Total_peaks": total number of chromatographic peaks in the file. +#' +#' - "Matched_peak": The number of matched peaks to Lamas. +#' +#' - "Total_Lamas": Total number of Lamas. +#' +#' - "Model_summary": `summary.loess` or `summary.gam` object for each file. +#' +#' @examples +#' ## load test and reference datasets +#' ref <- loadXcmsData("xmse") +#' tst <- loadXcmsData("faahko_sub2") +#' +#' ## create lamas input from the reference dataset +#' library(MsExperiment) +#' f <- sampleData(ref)$sample_type +#' f[f == "QC"] <- NA +#' ref <- filterFeatures(ref, PercentMissingFilter(threshold = 0, f = f)) +#' ref_mz_rt <- featureDefinitions(ref)[, c("mzmed","rtmed")] +#' +#' ## Set up the LamaParama object +#' param <- LamaParama(lamas = ref_mz_rt, method = "loess", span = 0.5, +#' outlierTolerance = 3, zeroWeight = 10, ppm = 20, +#' tolerance = 0, toleranceRt = 20, bs = "tp") +#' +#' ## input into `adjustRtime()` +#' tst_adjusted <- adjustRtime(tst, param = param) +#' +#' ## run diagnostic functions to pre-evaluate alignment +#' param <- matchLamasChromPeaks(tst, param = param) +#' mtch <- matchedRtimes(param) +#' +#' ## Access summary of matches and model information +#' summary <- summarizeLamaMatch(param) +#' +#' ##coverage for each file +#' summary$Matched_peaks / summary$Total_peaks * 100 +#' +#' ## Access the information on the model of for the first file +#' summary$model_summary[[1]] +#' +#' @note +#' If there are no matches when using `matchLamasChromPeaks()`, the file +#' retention will not be adjusted when calling [adjustRtime()] with the same +#' `LamaParama` and `XcmsExperiment` object. +#' +#' To see examples on how to utilize this methods and its functionality, +#' see the vignette. +#' +#' @author Carl Brunius, Philippine Louail +#' +#' @name LamaParama +NULL + +#' @description +#' +#' Match anchor (reference) peaks to chrompeaks based on rt and m/z. Peaks with +#' multiple matches are excluded. +#' +#' @param obs_peaks `matrix` of 2 columns with the m/z and retention times of +#' chrompeaks of one data file. +#' +#' @param ref_anchors `matrix` of (external) reference anchor peaks with +#' columns representing the anchor peaks' m/z and retention times (in that +#' order!). Possibly generated by the `.getAnchorePeaks()` function. +#' +#' @param ppm maximal acceptable (m/z relative) difference in m/z values for +#' peaks to be considered matching. +#' +#' @param tolerance maximal absolute difference in m/z values for peaks to be +#' considered matching. +#' +#' @param toleranceRt maximl absolute difference in retention times for peaks +#' to be considered matching. +#' +#' @return a `data.frame` with columns `"ref"` and `"obs"` with the retention +#' times of the pairs of matched peaks. This `data.frame` can be used +#' in `.adjust_rt_model`'s parameter `rt_raw`. +#' +#' @author Johannes Rainer, Philippine Louail +#' +#' @importFrom MetaboCoreUtils mclosest +#' +#' @noRd +.match_reference_anchors <- function(obs_peaks, ref_anchors, ppm = 20, + tolerance = 0, toleranceRt = 5) { + idx <- mclosest(obs_peaks, ref_anchors, + ppm = c(ppm, 0), tolerance = c(tolerance, toleranceRt)) + nna <- !is.na(idx) + idx <- cbind(obs = which(nna), ref = idx[nna]) + dups <- idx[duplicated(idx[, 2L]), 2L] + idx <- idx[!idx[, 2L] %in% dups, , drop = FALSE] + data.frame(ref = ref_anchors[idx[, 2L], 2L], + obs = obs_peaks[idx[, 1L], 2L]) +} + +#' @description +#' +#' Compute a model representing the relationship between observed and reference +#' retention times (`rt_map`) and adjust the raw retention times (`rt_raw`) +#' based on this. +#' +#' @param rt_map `data.frame` with *reference* retention times of LaMas and +#' *observed* retention times of matching peaks in the same sample from +#' which the retention times in `rt_raw` are. +#' +#' @author Carl Brunius, Philippine Louail +#' +#' @importFrom stats predict +#' +#' @importFrom MsCoreUtils force_sorted +#' +#' @noRd +.adjust_rt_model <- function(rt_raw, + method = c("loess", "gam"), + rt_map, + span = 0.5, + resid_ratio = 3, + zero_weight = 10, + bs = "tp") { + model <- .rt_model(method = method, + rt_map, span = span, + resid_ratio = resid_ratio, + zero_weight = zero_weight, + bs = bs) + adj <- predict(model, newdata = data.frame(obs = rt_raw)) + if (is.unsorted(adj, na.rm = TRUE)){ + warning("Adjusted retention times are not sorted, linear ", + "interpolation will be performed for the unsorted data points") + adj <- force_sorted(adj) + } + idx <- which(rt_raw < min(rt_map$obs)) + lidx <- length(idx) + if (lidx) + adj[idx] <- rt_raw[idx] - (rt_raw[lidx + 1L] - adj[lidx + 1L]) + idx <- which(rt_raw > max(rt_map$obs)) + if (length(idx)) + adj[idx] <- rt_raw[idx] - (rt_raw[idx[1L] - 1L] - adj[idx[1L] - 1L]) + adj +} + +#' @description +#' +#' Get a model representing the differences between observed and reference +#' retention times (parameter `rt_map`). After an initial fit, the model is +#' re-fitted excluding potential outliers. +#' +#' @param rt_map `data.frame` with the observed (column `"obs"`) and reference +#' (column `"ref"`) retention time pairs. +#' +#' @importFrom stats loess resid +#' +#' @author Carl Brunius, Philippine Louail +#' +#' @noRd +.rt_model <- function(method = c("loess", "gam"), + rt_map, span = 0.5, + resid_ratio = 3, + zero_weight = 10, + bs = "tp"){ + rt_map <- rt_map[order(rt_map$obs), ] + # add first row of c(0,0) to set a fix timepoint. + rt_map <- rbind(c(0,0), rt_map) + weights <- rep(1, nrow(rt_map)) + weights[1L] <- zero_weight + + if (method == "gam") { + .check_gam_library() + model <- mgcv::gam(ref ~ s(obs, bs = bs), weights = weights, + data = rt_map) + } else + model <- loess(ref ~ obs, data = rt_map, span = span, + weights = weights) + ## compute outliers + SSq <- resid(model)^2 + meanSSq <- mean(SSq) + not_outlier <- (SSq / meanSSq) < resid_ratio + + ## re-run only if there is outliers and keep the zero. + if (any(!not_outlier)){ + not_outlier[1] <- TRUE + rt_map <- rt_map[not_outlier, , drop = FALSE] + weights <- weights[not_outlier] + if (method == "gam") { + model <- mgcv::gam(ref ~ s(obs, bs = "tp"), weights = weights, + data = rt_map) + } else { + model <- loess(ref ~ obs, data = rt_map, span = span, + weights = weights) + } + } + model +} + +#' Simple helper to ensure gam package is installed - if needed. +#' +#' @noRd +.check_gam_library <- function() { + if (!requireNamespace("mgcv", quietly = TRUE)) + stop("'method = \"gam\"' requires the package 'mgcv'. Please ", + "install with 'BiocInstaller::install(\"mgcv\")'") +} + +#' @export +#' @rdname LamaParama +matchLamasChromPeaks <- function(object, param, BPPARAM = bpparam()){ + if (!hasChromPeaks(object)) + stop("'object' needs to have detected ChromPeaks. ", + "Run 'findChromPeaks()' first.") + f <- factor(chromPeaks(object)[, "sample"], levels = seq_along(object)) + cp_raw <- split.data.frame(chromPeaks(object)[, c("mz", "rt")], f) + param@nChromPeaks <- vapply(cp_raw, nrow, numeric(1)) + param@rtMap <- bplapply(cp_raw, FUN = function(x) { + .match_reference_anchors(obs_peaks = x, ref_anchors = param@lamas, + ppm = param@ppm, tolerance = param@tolerance, + toleranceRt = param@toleranceRt)}, + BPPARAM = BPPARAM) + param +} + +#' @export +#' @rdname LamaParama +summarizeLamaMatch <- function(param){ + if (!inherits(param, "LamaParama")) + stop("The input needs to be of class 'LamaParama'") + if (length(param@nChromPeaks) == 0 || length(param@rtMap) == 0) + stop("Summary inputs are missing. Please run `matchLamasChromPeaks` ", + "first.") + res <- data.frame(Total_peaks = param@nChromPeaks, + Matched_peaks = vapply(param@rtMap, nrow, numeric(1)), + Total_lamas = nrow(param@lamas)) + res_model <- lapply(param@rtMap, function(x){ + s <- summary(.rt_model(method = param@method, + rt_map= x, span = param@span, + resid_ratio = param@outlierTolerance, + zero_weight = param@zeroWeight, + bs = param@bs)) + }) + res$Model_summary <- res_model + res +} + +#' @export +#' @rdname LamaParama +matchedRtimes <- function(param){ + if(!inherits(param, "LamaParama")) + stop("The inputs need to be of class 'LamaParama'") + rtMap <- param@rtMap + rtMap +} diff --git a/R/functions-Params.R b/R/functions-Params.R index 71c05f89d..104702ecd 100644 --- a/R/functions-Params.R +++ b/R/functions-Params.R @@ -57,6 +57,8 @@ return("nearest peaks") if (is(x, "PeakGroupsParam")) return("peak groups") + if (is(x, "LamaParama")) + return("lama") if (is(x, "ObiwarpParam")) return("obiwarp") return("unknown") @@ -272,6 +274,37 @@ PeakGroupsParam <- function(minFraction = 0.9, extraPeaks = 1, subset = as.integer(subset), subsetAdjust = subsetAdjust) } +#' @rdname LamaParama +LamaParama <- function(lamas = matrix(ncol = 2, nrow = 0, + dimnames = list(NULL, c("mz", "rt"))), + method = c("loess", "gam"), + span = 0.5, + outlierTolerance = 3, + zeroWeight = 10, + ppm = 20, + tolerance = 0, + toleranceRt = 5, + bs = "tp") { + method <- match.arg(method) + if (method == "gam") + .check_gam_library() + if (is.data.frame(lamas)) + lamas <- as.matrix(lamas) + if (ncol(lamas) != 2) + stop("the 'lamas' matrix needs to have two columns, composed of m/z, ", + "and retention time of the peaks from the reference dataset, in this ", + "order") + new("LamaParama", lamas = lamas, + method = method, + span = span, + outlierTolerance = outlierTolerance, + zeroWeight = zeroWeight, + ppm = ppm, + tolerance = tolerance, + toleranceRt = toleranceRt, + bs = bs) +} + #' @rdname adjustRtime ObiwarpParam <- function(binSize = 1, centerSample = integer(), response = 1L, distFun = "cor_opt", gapInit = numeric(), diff --git a/R/methods-Params.R b/R/methods-Params.R index d07c07c88..e08ffda42 100644 --- a/R/methods-Params.R +++ b/R/methods-Params.R @@ -1249,6 +1249,26 @@ setReplaceMethod("subsetAdjust", "PeakGroupsParam", function(object, value) { return(object) }) +############################################################ +## LamaParama + +#' @rdname LamaParama +setMethod("plot", signature(x = "LamaParama"), + function(x, index = 1L, + colPoints = "#00000060", + colFit = "#00000080", + xlab = "Matched Chromatographic peaks", + ylab = "Lamas",...){ + model <- .rt_model(method = x@method, + rt_map= x@rtMap[[index]], span = x@span, + resid_ratio = x@outlierTolerance, + zero_weight = x@zeroWeight, + bs = x@bs) + datap <- x@rtMap[[index]] + plot(datap, type = "p", xlab = xlab, ylab = ylab, col = colPoints, ...) + points(model, type = "l", col = colFit) +}) + ############################################################ ## ObiwarpParam diff --git a/inst/NEWS b/inst/NEWS index dc0e6e1ae..77bd2ba17 100644 --- a/inst/NEWS +++ b/inst/NEWS @@ -1,3 +1,13 @@ +Changes in version 4.1.12 +---------------------- + +- Implementation of the `LamaParama` class and method for the `adjustRtime()` + function. Allowing alignment of a dataset based on landmarks (lamas) from an + external reference dataset. +- Implementation of related user-level function `matchLamasChromPeaks()`, + `summarizeMatchLama()` and `plot(LamaParama)` which allows for evaluation of + matching between lamas and chromPeaks. + Changes in version 4.1.11 ---------------------- diff --git a/man/LamaParama.Rd b/man/LamaParama.Rd new file mode 100644 index 000000000..1d5e7904e --- /dev/null +++ b/man/LamaParama.Rd @@ -0,0 +1,216 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/XcmsExperiment.R, R/do_adjustRtime-functions.R, +% R/functions-Params.R, R/methods-Params.R +\name{adjustRtime,XcmsExperiment,LamaParama-method} +\alias{adjustRtime,XcmsExperiment,LamaParama-method} +\alias{LamaParama} +\alias{LamaParama-class} +\alias{matchLamasChromPeaks} +\alias{summarizeLamaMatch} +\alias{matchedRtimes} +\alias{plot,LamaParama,ANY-method} +\title{Landmark-based alignment: aligning a dataset against an external +reference} +\usage{ +\S4method{adjustRtime}{XcmsExperiment,LamaParama}(object, param, BPPARAM = bpparam(), ...) + +matchLamasChromPeaks(object, param, BPPARAM = bpparam()) + +summarizeLamaMatch(param) + +matchedRtimes(param) + +LamaParama( + lamas = matrix(ncol = 2, nrow = 0, dimnames = list(NULL, c("mz", "rt"))), + method = c("loess", "gam"), + span = 0.5, + outlierTolerance = 3, + zeroWeight = 10, + ppm = 20, + tolerance = 0, + toleranceRt = 5, + bs = "tp" +) + +\S4method{plot}{LamaParama,ANY}( + x, + index = 1L, + colPoints = "#00000060", + colFit = "#00000080", + xlab = "Matched Chromatographic peaks", + ylab = "Lamas", + ... +) +} +\arguments{ +\item{object}{An object of class `XcmsExperiment` with defined ChromPeaks.} + +\item{param}{An object of class `LamaParama` that will later be used for +adjustment using the `[adjustRtime()]` function.} + +\item{BPPARAM}{For `matchLamasChromPeaks()`: parallel processing setup. +Defaults to `BPPARAM = bpparam()`. See [bpparam()] for more information.} + +\item{...}{For `plot()`: extra parameters to be passed to the function.} + +\item{lamas}{For `LamaParama`: `matrix` or `data.frame` with the m/z and +retention times values of features (as first and second column) from the +external dataset on which the alignment will be based on.} + +\item{method}{For `LamaParama`:`character(1)` with the type of warping. +Either `method = "gam"` or `method = "loess"` (default).} + +\item{span}{For `LamaParama`: `numeric(1)` defining +the degree of smoothing (`method = "loess"`). This parameter is passed +to the internal call to [loess()].} + +\item{outlierTolerance}{For `LamaParama`: `numeric(1)` defining the settings +for outlier removal during the fitting. By default +(with `outlierTolerance = 3`), all data points with absolute residuals +larger than 3 times the mean absolute residual of all data points from +the first, initial fit, are removed from the final model fit.} + +\item{zeroWeight}{For `LamaParama`: `numeric(1)`: defines the weight of the +first data point (i.e. retention times of the first lama-chromatographic +peak pair). Values larger than 1 reduce warping problems in the early RT +range.} + +\item{ppm}{For `LamaParama`: `numeric(1)` defining the m/z-relative maximal +allowed difference in m/z between `lamas` and chromatographic peaks. Used +for the mapping of identified chromatographic peaks and lamas.} + +\item{tolerance}{For `LamaParama`: `numeric(1)` defining the absolute +acceptable difference in m/z between lamas and chromatographic peaks. +Used for the mapping of identified chromatographic peaks and `lamas`.} + +\item{toleranceRt}{For `LamaParama`: `numeric(1)` defining the absolute +acceptable difference in retention time between lamas and +chromatographic peaks. Used for the mapping of identified chromatographic +peaks and `lamas`.} + +\item{bs}{For `LamaParama()`: `character(1)` defining the GAM smoothing method. +(defaults to thin plate, `bs = "tp"`)} + +\item{x}{For `plot()`: object of class `LamaParama` to be plotted.} + +\item{index}{For `plot()`: `numeric(1)` index of the file that should be +plotted.} + +\item{colPoints}{For `plot()`: color for the plotting of the datapoint.} + +\item{colFit}{For `plot()`: color of the fitting line.} + +\item{xlab, ylab}{For `plot()`: x- and y-axis labels.} +} +\value{ +For `matchLamasChromPeaks()`: A `LamaParama` object with new slot `rtMap` +composed of a list of matrices representing the 1:1 matches between Lamas +(ref) and ChromPeaks (obs). To access this, `matchedRtimes()` can be used. + +For `matchedRtimes()`: A list of `data.frame` representing matches +between chromPeaks and `lamas` for each files. + +For `summarizeLamaMatch()`:A `data.frame` with: + +- "Total_peaks": total number of chromatographic peaks in the file. + +- "Matched_peak": The number of matched peaks to Lamas. + +- "Total_Lamas": Total number of Lamas. + +- "Model_summary": `summary.loess` or `summary.gam` object for each file. +} +\description{ +Alignment is achieved using the ['adjustRtime()'] method with a `param` of +class `LamaParama`. This method corrects retention time by aligning +chromatographic data with an external reference dataset. + +Chromatographic peaks in the experimental data are first matched to +predefined (external) landmark features based on their mass-to-charge ratio +and retention time and subsequently the data is aligned by minimizing the +differences in retention times between the matched chromatographic peaks and +lamas. This adjustment is performed file by file. + +Adjustable parameters such as `ppm`, `tolerance`, and `toleranceRt` define +acceptable deviations during the matching process. It's crucial to note that +only lamas and chromatographic peaks exhibiting a one-to-one mapping are +considered when estimating retention time shifts. If a file has no peaks +matching with lamas, no adjustment will be performed, and the the retention +times will be returned as-is. Users can evaluate this matching, for example, +by checking the number of matches and ranges of the matching peaks, by first +running `[matchLamasChromPeaks()]`. + +Different warping methods are available; users can choose to fit a *loess* +(`method = "loess"`, the default) or a *gam* (`method = "gam"`) between the +reference data points and observed matching ChromPeaks. Additional +parameters such as `span`, `weight`, `outlierTolerance`, `zeroWeight`, +and `bs` are specific to these models. These parameters offer flexibility +in fine-tuning how the matching chromatographic peaks are fitted to the +lamas, thereby generating a model to align the overall retention time for +a single file. + +Other functions related to this method: + + - `LamaParama()`: return the respective parameter object for alignment + using `adjustRtime()` function. It is also the input for the functions + listed below. + + - `matchLamasChromPeaks()`: quickly matches each file's ChromPeaks + to Lamas, allowing the user to evaluate the matches for each file. + + - `summarizeLamaMatch()`: generates a summary of the `LamaParama` method. + See below for the details of the return object. + + - `matchedRtimes()`: Access the list of `data.frame` saved in the + `LamaParama` object, generated by the `matchLamasChromPeaks()` function. + + - `plot()`:plot the chromatographic peaks versus the reference lamas as + well as the fitting line for the chosen model type. The user can decide + what file to inspect by specifying the assay number with the parameter + `assay` +} +\note{ +If there are no matches when using `matchLamasChromPeaks()`, the file +retention will not be adjusted when calling [adjustRtime()] with the same +`LamaParama` and `XcmsExperiment` object. + +To see examples on how to utilize this methods and its functionality, +see the vignette. +} +\examples{ +## load test and reference datasets +ref <- loadXcmsData("xmse") +tst <- loadXcmsData("faahko_sub2") + +## create lamas input from the reference dataset +library(MsExperiment) +f <- sampleData(ref)$sample_type +f[f == "QC"] <- NA +ref <- filterFeatures(ref, PercentMissingFilter(threshold = 0, f = f)) +ref_mz_rt <- featureDefinitions(ref)[, c("mzmed","rtmed")] + +## Set up the LamaParama object +param <- LamaParama(lamas = ref_mz_rt, method = "loess", span = 0.5, + outlierTolerance = 3, zeroWeight = 10, ppm = 20, + tolerance = 0, toleranceRt = 20, bs = "tp") + +## input into `adjustRtime()` +tst_adjusted <- adjustRtime(tst, param = param) + +## run diagnostic functions to pre-evaluate alignment +param <- matchLamasChromPeaks(tst, param = param) +mtch <- matchedRtimes(param) + +## Access summary of matches and model information +summary <- summarizeLamaMatch(param) + +##coverage for each file +summary$Matched_peaks / summary$Total_peaks * 100 + +## Access the information on the model of for the first file +summary$model_summary[[1]] + +} +\author{ +Carl Brunius, Philippine Louail +} diff --git a/man/adjustRtime.Rd b/man/adjustRtime.Rd index 70b8182a0..884a3778a 100644 --- a/man/adjustRtime.Rd +++ b/man/adjustRtime.Rd @@ -259,9 +259,9 @@ sample that are assigned to the group.} be used to interpolate corrected retention times for all peak groups. Can be either \code{"loess"} or \code{"linear"}.} -\item{span}{For \code{PeakGroupsParam}: \code{numeric(1)} defining the degree of -smoothing (if \code{smooth = "loess"}). This parameter is passed to the -internal call to \code{\link[=loess]{loess()}}.} +\item{span}{For \code{PeakGroupsParam}: \code{numeric(1)} defining +the degree of smoothing (if \code{smooth = "loess"}). This parameter is +passed to the internal call to \code{\link[=loess]{loess()}}.} \item{family}{For \code{PeakGroupsParam}: \code{character(1)} defining the method for loess smoothing. Allowed values are \code{"gaussian"} and \code{"symmetric"}. See @@ -331,7 +331,7 @@ initiating an alignment (for local alignment only).} \item{value}{The value for the slot.} -\item{x}{An \code{ObiwarpParam} or \code{PeakGroupsParam} object.} +\item{x}{An \code{ObiwarpParam}, \code{PeakGroupsParam} or \code{LamaParama} object.} } \value{ \code{adjustRtime} on an \code{OnDiskMSnExp} or \code{XCMSnExp} object will return an @@ -341,21 +341,22 @@ initiating an alignment (for local alignment only).} \code{XcmsExperiment} with the adjusted retention times stored in an new \emph{spectra variable} \code{rtime_adjusted} in the object's \code{spectra}. -\code{ObiwarpParam} and \code{PeakGroupsParam} return the respective parameter object. +\code{ObiwarpParam}, \code{PeakGroupsParam} and \code{LamaParama} return the respective +parameter object. \code{adjustRtimeGroups} returns a \code{matrix} with the retention times of \emph{marker} features in each sample (each row one feature, each row one sample). } \description{ The \code{adjustRtime} method(s) perform retention time correction (alignment) -between chromatograms of different samples. Alignment is performed by defaul -on MS level 1 data. Retention times of spectra from other MS levels, if -present, are subsequently adjusted based on the adjusted retention times -of the MS1 spectra. Note that calling \code{adjustRtime} on a \emph{xcms} result object -will remove any eventually present previous alignment results as well as -any correspondence analysis results. To run a second round of alignment, -raw retention times need to be replaced with adjusted ones using the -\code{\link[=applyAdjustedRtime]{applyAdjustedRtime()}} function. +between chromatograms of different samples/dataset. Alignment is performed +by default on MS level 1 data. Retention times of spectra from other MS +levels, if present, are subsequently adjusted based on the adjusted +retention times of the MS1 spectra. Note that calling \code{adjustRtime} on a +\emph{xcms} result object will remove any eventually present previous alignment +results as well as any correspondence analysis results. To run a second +round of alignment, raw retention times need to be replaced with adjusted +ones using the \code{\link[=applyAdjustedRtime]{applyAdjustedRtime()}} function. The alignment method can be specified (and configured) using a dedicated \code{param} argument. @@ -368,7 +369,7 @@ rt data using the \emph{obiwarp} method (Prince (2006)). It is based on the alignment of multiple samples by aligning each against a \emph{center} sample. The alignment is performed directly on the \link{profile-matrix} and can hence be performed independently of the peak detection or peak grouping. -\item \code{PeakGroupsParam}: performs retention time correctoin based on the +\item \code{PeakGroupsParam}: performs retention time correction based on the alignment of features defined in all/most samples (corresponding to \emph{house keeping compounds} or marker compounds) (Smith 2006). First the retention time deviation of these features is described by fitting either a @@ -387,6 +388,14 @@ function is used to define this \code{matrix}. This function identifies peak groups (features) for alignment in \code{object} based on the parameters defined in \code{param}. See also \code{\link[=do_adjustRtime_peakGroups]{do_adjustRtime_peakGroups()}} for the core API function. +\item \code{LamaParama}: This function performs retention time correction by aligning +chromatographic data to an external reference dataset (concept and initial +implementation by Carl Brunius). The process involves identifying and +aligning peaks within the experimental chromatographic data, represented +as an \code{XcmsExperiment} object, to a predefined set of landmark features +called "lamas". These landmark features are characterized by their +mass-to-charge ratio (m/z) and retention time. see \code{\link[=LamaParama]{LamaParama()}} for more +information on the method. } } \section{Subset-based alignment}{ @@ -429,6 +438,6 @@ Nonlinear Peak Alignment, Matching, and Identification" \emph{Anal. Chem.} \code{\link[=plotAdjustedRtime]{plotAdjustedRtime()}} for visualization of alignment results. } \author{ -Colin Smith, Johannes Rainer +Colin Smith, Johannes Rainer, Philippine Louail, Carl Brunius } \concept{retention time correction methods} diff --git a/man/do_adjustRtime_peakGroups.Rd b/man/do_adjustRtime_peakGroups.Rd index 28e4297c2..4d0d898ea 100644 --- a/man/do_adjustRtime_peakGroups.Rd +++ b/man/do_adjustRtime_peakGroups.Rd @@ -52,9 +52,9 @@ sample that are assigned to the group.} be used to interpolate corrected retention times for all peak groups. Can be either \code{"loess"} or \code{"linear"}.} -\item{span}{For \code{PeakGroupsParam}: \code{numeric(1)} defining the degree of -smoothing (if \code{smooth = "loess"}). This parameter is passed to the -internal call to \code{\link[=loess]{loess()}}.} +\item{span}{For \code{PeakGroupsParam}: \code{numeric(1)} defining +the degree of smoothing (if \code{smooth = "loess"}). This parameter is +passed to the internal call to \code{\link[=loess]{loess()}}.} \item{family}{For \code{PeakGroupsParam}: \code{character(1)} defining the method for loess smoothing. Allowed values are \code{"gaussian"} and \code{"symmetric"}. See diff --git a/tests/testthat.R b/tests/testthat.R index 0d93e1900..a3f2a3018 100644 --- a/tests/testthat.R +++ b/tests/testthat.R @@ -76,6 +76,14 @@ pdp <- PeakDensityParam(sampleGroups = rep(1, 3)) xmseg <- groupChromPeaks(xmse, param = pdp, add = FALSE) expect_true(length(processHistory(xmseg)) == 2L) +## Data for LamaParama checks +ref <- loadXcmsData("xmse") +f <- sampleData(ref)$sample_type +f[f == "QC"] <- NA +ref <- filterFeatures(ref, PercentMissingFilter(threshold = 0, f = f)) +ref_mz_rt <- featureDefinitions(ref)[, c("mzmed","rtmed")] +tst <- loadXcmsData("faahko_sub2") + test_check("xcms") bpstop(prm) diff --git a/tests/testthat/test_XcmsExperiment.R b/tests/testthat/test_XcmsExperiment.R index f2b0e6303..d538a7623 100644 --- a/tests/testthat/test_XcmsExperiment.R +++ b/tests/testthat/test_XcmsExperiment.R @@ -506,6 +506,16 @@ test_that("adjustRtime,MsExperiment,PeakGroupsParam works", { expect_true(length(processHistory(res_3)) == 1L) }) + +test_that("LamaParama works", { + expect_no_error(LamaParama(lamas = ref_mz_rt)) + expect_error(LamaParama(), "cannot be empty") + param <- LamaParama(lamas = ref_mz_rt) + expect_equal(ncol(param@lamas), 2) + expect_true(inherits(param@lamas, "matrix")) + expect_equal(length(param@method), 1) +}) + test_that("findChromPeaks,XcmsExperiment,MatchedFilterParam works", { mfp <- MatchedFilterParam(binSize = 20, impute = "lin") ref <- findChromPeaks(faahko_od, param = mfp) @@ -1001,7 +1011,7 @@ test_that("manualChromPeaks,XcmsExperiment works", { res2 <- manualChromPeaks(tmp, pks, samples = 2) expect_equal(unname(chromPeaks(res2)), unname(pks_2)) - + }) test_that("filterChromPeaks,XcmsExperiment works", { diff --git a/tests/testthat/test_do_adjustRtime-functions.R b/tests/testthat/test_do_adjustRtime-functions.R index e8a3e6b02..1f90ec845 100644 --- a/tests/testthat/test_do_adjustRtime-functions.R +++ b/tests/testthat/test_do_adjustRtime-functions.R @@ -265,3 +265,117 @@ test_that(".getPeakGroupsRtMatrix works with subsets", { extraPeaks = 1L) expect_true(!any(is.na(res))) }) + +test_that(".rt_model works", { + obs <- ref_mz_rt + rt_m <- data.frame(ref = obs[, "rtmed"], + obs = obs[, "rtmed"] + 2 + rnorm(nrow(obs), 0, 0.03)) + res <- .rt_model(rt_map = rt_m, method = "loess") + expect_true(is(res, "loess")) + expect_true(length(res$residuals) < nrow(obs)) + ## skip outlier removal + res <- .rt_model(rt_map = rt_m, method = "loess", resid_ratio = 100) + expect_true(is(res, "loess")) + expect_equal(length(res$residuals), nrow(obs) + 1) # for the c(0,0) extra row + + ## gam + res <- .rt_model(rt_map = rt_m, method = "gam") + expect_true(is(res, "gam")) + expect_true(length(res$residuals) < nrow(obs)) + ## skip outlier removal + res <- .rt_model(rt_map = rt_m, method = "gam", resid_ratio = 100) + expect_true(is(res, "gam")) + expect_equal(length(res$residuals), nrow(obs) + 1) +}) + +test_that(".match_reference_anchors works", { + a <- cbind(mz = c(200.1, 200.1, 200.1, 232.1, 233.1, 233.2), + rt = c(100, 150.1, 190, 190, 190, 192)) + b <- cbind(mz = c(200.2, 232, 233.1, 234), + rt = c(150, 190.4, 193, 240)) + + res <- .match_reference_anchors(a, b) + expect_true(is.data.frame(res)) + expect_equal(colnames(res), c("ref", "obs")) + expect_true(nrow(res) == 1L) + expect_equal(res$ref, 193.0) + expect_equal(res$obs, 190.0) + + ## no matches: + res <- .match_reference_anchors(a, b, tolerance = 0, toleranceRt = 0) + expect_true(is.data.frame(res)) + expect_equal(colnames(res), c("ref", "obs")) + expect_true(nrow(res) == 0L) + + ## skip multiple matches: rows 1 and 2 from `a` match row 1 from `b` and + ## rows 5 and 6 from `a` match row 3 from `b` + res <- .match_reference_anchors(a, b, tolerance = 0.1, toleranceRt = 52) + expect_true(is.data.frame(res)) + expect_equal(colnames(res), c("ref", "obs")) + expect_equal(res$ref, 190.4) + expect_equal(res$obs, 190.0) + + ## little relaxed matching: should match row 2 in `a` with row 1 in `b` and + ## row 4 in `a` with row 2 in `b`. rows 5 and 6 in `a` match both row 3 in + ## `b` and should thus not be reported. + res <- .match_reference_anchors(a, b, tolerance = 0.1, toleranceRt = 5) + expect_true(is.data.frame(res)) + expect_equal(colnames(res), c("ref", "obs")) + expect_equal(res$ref, c(150, 190.4)) + expect_equal(res$obs, c(150.1, 190.0)) + + ## Same but reducing toleranceRt to have also a match between row 6 in `a` + ## with row 3 in `b`. + res <- .match_reference_anchors(a, b, tolerance = 0.1, toleranceRt = 2) + expect_true(is.data.frame(res)) + expect_equal(colnames(res), c("ref", "obs")) + expect_equal(res$ref, c(150, 190.4, 193.0)) + expect_equal(res$obs, c(150.1, 190.0, 192.0)) +}) + +test_that(".adjust_rt_model works", { + ref_anchors <- ref_mz_rt + ## Filter the test data to the same rt range than the reference data + tst <- filterRt(tst, rt = c(2550, 4250)) + obs <- chromPeaks(tst[2]) + + rt_map <- .match_reference_anchors(obs[, c("mz", "rt")], ref_anchors, + tolerance = 0.01, + toleranceRt = 5) + + rt_raw <- rtime(tst[2]) + rt_adj <- .adjust_rt_model(rt_raw, method = "loess", rt_map = rt_map) + expect_equal(length(rt_raw), length(rt_adj)) + expect_true(!any(is.na(rt_adj))) + ## Adjusted retention times should be closer to the ones in the preprocessed + ## data set + rt_ref <- rtime(ref[2, keepAdjustedRtime = TRUE]) + expect_true(mean(abs(rt_adj - rt_ref)) < mean(abs(rt_raw - rt_ref))) +}) + +test_that("matchLamasChromPeaks works", { + param <- LamaParama(lamas = ref_mz_rt) + expect_equal(param@rtMap, list()) + param <- matchLamasChromPeaks(tst, param) + expect_true(inherits(param, "LamaParama")) + expect_equal(length(param@rtMap), length(tst)) + expect_equal(length(param@nChromPeaks), length(tst)) +}) + +test_that("summarizeLamaMatch works", { + param <- LamaParama(lamas = ref_mz_rt, toleranceRt = 10) + expect_error(summarizeLamaMatch(param), "missing") + param <- matchLamasChromPeaks(tst, param) + res <- summarizeLamaMatch(param) + expect_equal(nrow(res), length(tst)) + expect_equal(ncol(res), 4) + expect_true(inherits(res$Model_summary[[1]], "summary.loess")) +}) + +test_that("Accessing rtMap from LamaParama object works", { + param <- LamaParama(lamas = ref_mz_rt, toleranceRt = 10) + param <- matchLamasChromPeaks(tst, param) + expect_error(matchedRtimes(ObiwarpParam()), "class") + mtch <- matchedRtimes(param) + expect_equal(length(mtch), length(param@rtMap)) +}) diff --git a/vignettes/xcms.Rmd b/vignettes/xcms.Rmd index 1e1fa101a..924e1b870 100644 --- a/vignettes/xcms.Rmd +++ b/vignettes/xcms.Rmd @@ -136,7 +136,6 @@ performance. See the package vignette from the `r Biocpkg("Spectra")` package or the [SpectraTutorials](https://jorainer.github.io/SpectraTutorials) tutorial for more details on `Spectra` backends and how to change between them. - ## Initial data inspection The `MsExperiment` object is a simple and flexible container for MS @@ -1285,6 +1284,138 @@ identification of e.g. features with significant different intensities/abundances it is suggested to use functionality provided in other R packages, such as Bioconductor's excellent `limma` package. +## Alignment to an external reference dataset + +In certain experiments, aligning two different datasets is necessary. This +can involve comparing runs of the same samples conducted across different +laboratories or runs with MS2 recorded after the initial MS1 run. Across +laboratories and over time, the same samples may result in variation in +retention time, especially because the LC system can be quite unstable. In these +cases, an alignment step using the `adjustRtime()` function with the +`LamaParam` parameter can allow the user to perform this type of alignment. +We will go through this step by step below. + +Let's load an already analyzed dataset `ref` and our previous dataset before +alignment, which will be `tst`. We will first restrict their retention time +range to be the same for both dataset. + +```{r} +ref <- loadXcmsData("xmse") +tst <- loadXcmsData("faahko_sub2") +``` + +Now, we will attempt to align these two samples with the previous dataset. The +first step is to extract landmark features (referred to as `lamas`). To achieve +this, we will identify the features present in every QC sample of the `ref` +dataset. To do so, we will categorize (using `factor()`) our data by +`sample_type` and only retain the QC samples. This variable will be utilized to +filter the features using the `PercentMissingFilter()` parameter within the +`filterFeatures()` function (see section above for more information on this +method) + +```{r} +f <- sampleData(ref)$sample_type +f[f != "QC"] <- NA +ref <- filterFeatures(ref, PercentMissingFilter(threshold = 0, f = f)) +ref_mz_rt <- featureDefinitions(ref)[, c("mzmed","rtmed")] +ref_mz_rt +``` + +This is what the `lamas` input should look like for alignment. In terms of +how this method works, the alignment algorithm matches chromatographic peaks +from the experimental data to the lamas, fitting a model based on this +match to adjust their retention times and minimize differences between +the two datasets. + +Now we can define our `param` object `LamaParama` to prepare for the +alignment. Parameters such as `tolerance`, `toleranceRt`, and `ppm` relate +to the matching between chromatographic peaks and lamas. Other parameters +are related to the type of fitting generated between these data points. +More details on each parameter and the overall method can be found by +searching `?adjustRtime`. Below is an example using default parameters. + +```{r} +param <- LamaParama(lamas = ref_mz_rt, method = "loess", span = 0.5, + outlierTolerance = 3, zeroWeight = 10, ppm = 20, + tolerance = 0, toleranceRt = 20, bs = "tp") + +#' input into `adjustRtime()` +tst_adjusted <- adjustRtime(tst, param = param) +tst_adjusted <- applyAdjustedRtime(tst_adjusted) +``` + +We extract the base peak chromatogram (BPC) to visualize and evaluate the +alignment: + +```{r fig.height=12, fig.width=5} +#' evaluate the results with BPC +bpc <- chromatogram(ref, chromPeaks = "none") +bpc_tst_raw <- chromatogram(tst, chromPeaks = "none") +bpc_tst_adj <- chromatogram(tst_adjusted, chromPeaks = "none") +``` + +We generate plots to visually compare the alignment to the reference +dataset (black) both before (red) and after (blue) adjustment: + +```{r fig.height=4, fig.width=10} +#' BPC of a sample +par(mfrow = c(1, 2), mar = c(4, 2.5, 1, 0.5)) +plot(bpc[1, 1], col = "#00000080", main = "Before Alignment") +points(rtime(bpc_tst_raw[1, 1]), intensity(bpc_tst_raw[1, 1]), type = "l", + col = "#ff000080") +grid() + +plot(bpc[1, 1], col = "#00000080", main = "After Alignment") +points(rtime(bpc_tst_adj[1, 1]), intensity(bpc_tst_adj[1, 1]), type = "l", + col = "#0000ff80") +grid() +``` + +It appears that certain time intervals (2500 to 3000 and 3500 to 4500 seconds) +exhibit better alignment than others. This variance can be elucidated by +examining the distribution of matched peaks, as illustrated below. The +`matchLamaChromPeaks()` function facilitates the assessment of how well the +`lamas` correspond with the chromatographic peaks in each file. This analysis +can be conducted prior to any adjustments. + +```{r} +param <- matchLamasChromPeaks(tst, param = param) +mtch <- matchedRtimes(param) + +#' BPC of the first sample with matches to lamas overlay +par(mfrow = c(1, 1)) +plot(bpc[1, 1], col = "#00000080", main = "Distribution CP matched to Lamas") +points(rtime(bpc_tst_adj[1, 1]), intensity(bpc_tst_adj[1, 1]), type = "l", + col = "#0000ff80") +grid() +abline(v = mtch[[1]]$obs) +``` + +The overlay of BPC above provides insight into the correlation between accurate +alignment and the presence of peaks matching with `lamas.` Furthermore, a more +detailed examination of the matching and the model used for fitting each file +is possible. Numerical information can be obtained using the +`summarizeLamaMatch()` function. From this, the percentage of chromatographic +peaks utilized for alignment can be computed relative to the total number of +peaks in the file. Additionally, it is feasible to directly `plot()` the +`param` object for the file of interest, showcasing the distribution of these +chromatographic peaks along with the fitted model line. + +```{r} +#access summary of matches and model information +summary <- summarizeLamaMatch(param) +summary + +# coverage for each file +summary$Matched_peaks / summary$Total_peaks * 100 + +#access the information on the model of for the first file +summary$model_summary[[1]] + +# Plot obs vs. ref with fitting line +plot(param, index = 1L, main = "ChromPeaks versus Lamas for the first file", + colPoint = "red") +``` # Additional details and notes