docs: describe the ppm-based correspondence in vignette

sneumann · Jan 16, 2024 · 9c41f81
1 parent 46039ff
Showing 1 changed file with 50 additions and 6 deletions.
diff --git a/vignettes/xcms.Rmd b/vignettes/xcms.Rmd
@@ -60,7 +60,9 @@ This document describes data import, exploration and pre-processing of a simple
 test LC-MS data set with the *xcms* package version >= 4. The same functions can
 be applied to the older *MSnbase*-based workflows (xcms version 3). Additional
 documents and tutorials covering also other topics of untargeted metabolomics
-analysis are listed at the end of this document.
+analysis are listed at the end of this document. There is also an [xcms
+tutorial](https://jorainer.github.io/xcmsTutorials) available with more examples
+and details.
 
 
 # Pre-processing of LC-MS data
@@ -325,7 +327,7 @@ internal standard of known compound. It is suggested to inspect the ranges of
 m/z values for several compounds (either internal standards or compounds known
 to be present in the sample) and define the `ppm` parameter for *centWave*
 according to these. See also this
-[tutorial](https://jorainer.github.io/metabolomics2018) for additional
+[tutorial](https://jorainer.github.io/xcmsTutorials) for additional
 information and examples on choosing and testing peak detection settings.
 
 Chromatographic peak detection can also be performed on extracted ion
@@ -856,17 +858,59 @@ correspondence settings on manually defined m/z slices before applying them to
 the full data set. For the tested m/z slice the settings seemed to be OK and we
 are thus applying them to the full data set below. Especially the parameter `bw`
 will be very data set dependent (or more specifically LC-dependent) and should
-be adapted to each data set. See the [Metabolomics pre-processing with
-`xcms`](https://jorainer.github.io/metabolomics2018) tutorial for examples and
-more details.
+be adapted to each data set.
+
+Another important parameter is `binSize`, which defines the size of the m/z
+slices (bins) within which peaks are grouped. This parameter thus defines the
+required similarity in m/z values for the chromatographic peaks that are then
+assumed to represent signal from the same (type of ion of a) compound and hence
+evaluated for grouping. By default, a constant m/z bin size is used, but setting
+parameter `ppm` to a value larger than 0 results in m/z-relative bin sizes
+instead (i.e., the bin size increases with the m/z value, hence better
+representing the measurement error/precision of some MS instruments).
+
+See also the [xcms
+tutorial](https://jorainer.github.io/xcmsTutorials) for more examples and
+details.
 
 ```{r correspondence, message = FALSE }
-## Perform the correspondence
+## Perform the correspondence using fixed m/z bin sizes.
 pdp <- PeakDensityParam(sampleGroups = sampleData(faahko)$sample_group,
                         minFraction = 0.4, bw = 30)
 faahko <- groupChromPeaks(faahko, param = pdp)
 ```
 
+As an alternative, we perform the correspondence using m/z-relative bin sizes.
+
+```{r}
+## Drop feature definitions and re-perform the correspondence
+## using m/z-relative bin sizes.
+faahko_ppm <- groupChromPeaks(
+    dropFeatureDefinitions(faahko),
+    PeakDensityParam(sampleGroups = sampleData(faahko)$sample_group,
+                     minFraction = 0.4, bw = 30, ppm = 10))
+```
+
+The results will be *mostly* similar, except for the higher m/z range (in which
+larger m/z bins will be used). Below we plot the m/z width of each feature
+against its median m/z. For the present data set (acquired with a triple quad
+instrument) no clear difference can be seen between the two approaches, hence we
+continue the analysis with the fixed bin size setting. A stronger relationship
+would be expected, for example, for data measured on TOF instruments.
+
+```{r, fig.cap = "Relationship between a feature's m/z and the m/z width (max - min m/z) of the feature. Red points represent the results with the fixed m/z bin size, blue with the m/z-relative bin size."}
+## Calculate m/z width of features
+mzw <- featureDefinitions(faahko)$mzmax - featureDefinitions(faahko)$mzmin
+mzw_ppm <- featureDefinitions(faahko_ppm)$mzmax -
+    featureDefinitions(faahko_ppm)$mzmin
+plot(featureDefinitions(faahko_ppm)$mzmed, mzw_ppm,
+     xlab = "m/z", ylab = "m/z width", pch = 21,
+     col = "#0000ff20", bg = "#0000ff10")
+points(featureDefinitions(faahko)$mzmed, mzw, pch = 21,
+       col = "#ff000020", bg = "#ff000010")
+
+```
+
 Results from the correspondence analysis can be accessed with the
 `featureDefinitions` and `featureValues` function. The former returns a data
 frame with general information on each of the defined features, with each row