A dead simple R toolkit for quantitative chromatography

octopode/tidychrom

tidychrom

A Dead Simple Toolkit for Quantitative Chromatography

This package takes raw chromatographic data (mass spec or colorimetric) and outputs relative or absolute quantities of identified compounds for compositional biochemistry analyses. It reproduces functions of proprietary instrument software, so researchers can liberate raw data and script reproducible analyses.

(figure: workflow overview)

design principles

  1. ingredients you can pronounce

No fancy algorithms <cough>ordered bijective interpolated time warping</cough>, though nothing explicitly prevents their use with this package.

Caveat: since tidychrom implements neither retention time (RT) adjustment nor spectrum deconvolution, it expects chromatographic data with good separation and fairly consistent RTs (within a few seconds) across samples. Analyze highly complex mixtures at your own risk, and maybe with a dash of special sauce.
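A quick way to see whether a dataset meets the RT expectation is to measure, per compound, how far its retention time wanders across samples. The sketch below is only an illustration: the peak table, its column names (`samp`, `id`, `rt`), and the 3-second tolerance are all assumptions made for this example, not part of tidychrom.

```r
library(tibble)
library(dplyr)

# toy peak table: retention times in seconds; names and values are made up
peaks <- tribble(
  ~samp, ~id,     ~rt,
  "s1",  "C16:0",  600.2,
  "s2",  "C16:0",  601.0,
  "s3",  "C16:0",  599.5,
  "s1",  "C22:6", 1200.4,
  "s2",  "C22:6", 1203.1,
  "s3",  "C22:6", 1210.9
)

rt_tolerance <- 3  # seconds; an arbitrary threshold chosen for this example

# spread of each compound's RT across samples, flagged against the tolerance
rt_drift <- peaks %>%
  group_by(id) %>%
  summarize(rt_spread = max(rt) - min(rt), .groups = "drop") %>%
  mutate(consistent = rt_spread <= rt_tolerance)

print(rt_drift)
```

Compounds flagged as inconsistent are the ones most likely to need RT adjustment before this kind of analysis.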

  2. data you can see and touch

Existing R-based chromatography solutions rely on S3/S4 objects whose slots are not really standardized. This package attempts to keep all analysis products in tibbles and to facilitate downstream analysis with dplyr. Visualization is implemented with ggplot2.
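As a small illustration of what keeping analysis products in tibbles buys you, the sketch below builds a toy table of integrated peak areas and summarizes it with ordinary dplyr verbs. The table and its column names (`samp`, `id`, `area`) are assumptions made for this example, not a format that tidychrom guarantees.

```r
library(tibble)
library(dplyr)

# toy peak-area table; column names and values are hypothetical
areas <- tribble(
  ~samp,     ~id,     ~area,
  "JWL0012", "C16:0", 1200,
  "JWL0012", "C22:6",  800,
  "JWL0138", "C16:0",  900,
  "JWL0138", "C22:6", 2100
)

# every intermediate result is a plain tibble you can print, filter, or join
area_summary <- areas %>%
  group_by(id) %>%
  summarize(
    mean_area = mean(area),
    n_samp    = n_distinct(samp),
    .groups   = "drop"
  )

print(area_summary)
```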

(figure: master base peak chromatogram, from analyze_standards.R)

project structure

This repo contains a couple of pre-cooked workflows (see workflows below), but above all, tidychrom is meant to be modular. Take the handful of functions provided and use them in your own dplyr-based workflows, perhaps with inspiration from those provided here. Some tips on the repo's organization, to help you dissect it:

  1. There are no custom objects or methods, only functions.

  2. Functions are packaged one to a file.

  3. Visualization is implemented in ggplot2. Custom plotting functions return a list of gg objects, which can be

  • stored in a tibble column and retrieved later,

(from analyze_samples.R)

# retrieve the stored spectrum matchup for one sample and compound
ggplot() + areas_all_qc %>%
  filter(
    samp == "JWL0012" &
      id == "C22:6"
    ) %>%
  pull(b2b)

(figure: spectrum matchup)

  • arranged alongside other stored plots,

(as in analyze_standards.R)

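library(gridExtra)  # provides grid.arrange()
# wrap each stored layer list in an empty ggplot, then tile the plots 5 x 7
b2b <- lapply(scans_best$b2b, function(x){ggplot() + x})
do.call("grid.arrange", c(b2b, nrow = 5, ncol = 7))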

(figure: ALL spectrum matchups)

  • overlaid with other gg elements, like titles and other plots.

(from analyze_samples.R)

# to show all the ROIs integrated in a given sample
ggplot() + areas_all_qc %>%
  filter(
    samp == "JWL0138"
  ) %>%
  pull(xic) +
  ggtitle("all ROIs: sample JWL0138")

(figure: JWL0138 all ROIs)

# or to show all the samples found in a given ROI
ggplot() + areas_all_qc %>%
  filter(
    id == "C22:6"
  ) %>%
  pull(xic) +
  ggtitle("C22:6 (DHA): 39 samples")

(figure: DHA all XICs)
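The three uses above all rely on the same pattern: a plotting function returns a list of gg layers instead of a finished plot, and `ggplot() +` accepts such lists (including nested ones). Here is a self-contained sketch of that pattern with toy data. `plot_trace()` is a hypothetical stand-in written for this example, not a tidychrom function, and the column names are assumptions.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# hypothetical stand-in for a plotting function:
# returns a list of gg layers rather than a finished plot
plot_trace <- function(df, color) {
  list(geom_line(data = df, aes(x = rt, y = intensity), color = color))
}

# toy chromatographic traces
trace_a <- tibble(rt = 1:5, intensity = c(0, 2, 8, 3, 0))
trace_b <- tibble(rt = 1:5, intensity = c(0, 1, 5, 6, 1))

# store the layer lists in a list-column next to their metadata
plots <- tibble(
  samp = c("A", "B"),
  xic  = list(plot_trace(trace_a, "black"), plot_trace(trace_b, "red"))
)

# retrieve one stored plot...
p_one <- ggplot() + (plots %>% filter(samp == "A") %>% pull(xic))

# ...or overlay every stored trace and add a title
p_all <- ggplot() + pull(plots, xic) + ggtitle("all traces")
```

Because the stored objects are layers rather than complete plots, they stay cheap to combine: overlaying, titling, or arranging them is decided at display time.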

workflows

scanwise blanking

Preprocessing scripts to subtract blank data (like solvent peaks and column bleed):

Scanwise blanking

Scanwise blanking (multisession)
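To make the idea concrete, here is a minimal sketch of one way scanwise blank subtraction can be done with dplyr: match each sample data point to the blank data point at the same scan and m/z, subtract, and clamp at zero. It is an illustration under assumed column names (`scan`, `mz`, `intensity`) and toy values, and not necessarily how the scripts listed above do it.

```r
library(tibble)
library(dplyr)

# toy sample and blank data, one row per (scan, m/z) data point
sample_scans <- tribble(
  ~scan, ~mz, ~intensity,
  1,      74, 500,
  1,      87, 300,
  2,      74, 900,
  2,     207, 150
)
blank_scans <- tribble(
  ~scan, ~mz, ~intensity,
  1,      74, 100,
  2,     207, 200
)

blanked <- sample_scans %>%
  # pair each sample point with the blank point at the same scan and m/z
  left_join(blank_scans, by = c("scan", "mz"), suffix = c("", "_blank")) %>%
  mutate(
    # no matching blank signal means nothing to subtract
    intensity_blank = coalesce(intensity_blank, 0),
    # subtract the blank, never going below zero
    intensity = pmax(intensity - intensity_blank, 0)
  ) %>%
  select(scan, mz, intensity)

print(blanked)
```

Real data would first need the sample and blank runs aligned to a common scan index and m/z grid, which this sketch takes for granted.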

targeted relative quantitation

This workflow was made to calculate the relative abundances (molar percentages) of fatty acid methyl esters (FAMEs) in a set of biological samples. It has two main steps:

  1. Standard ID, saturation point determination, and spectrum extraction

  2. Sample ID and relative quantitation

Comments walk you through each script. All user-set parameters (including data directories) are defined at the top of each script. Data files used in the scripts will be hosted at a later date.
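The final arithmetic of relative quantitation is simple enough to sketch on its own. The example below assumes each integrated peak area already scales with molar amount (that is, any response calibration has been applied upstream) and uses hypothetical column names and toy values. It is not the workflow's actual code.

```r
library(tibble)
library(dplyr)

# toy integrated areas, assumed proportional to molar amount
areas <- tribble(
  ~samp,     ~id,     ~area,
  "JWL0012", "C16:0", 600,
  "JWL0012", "C18:1", 300,
  "JWL0012", "C22:6", 100,
  "JWL0138", "C16:0", 250,
  "JWL0138", "C22:6", 750
)

# molar percentage = a compound's share of the total area within its sample
mol_pct <- areas %>%
  group_by(samp) %>%
  mutate(mol_pct = 100 * area / sum(area)) %>%
  ungroup()

print(mol_pct)
```

Within each sample the percentages sum to 100, so compositions can be compared across samples regardless of how much material was injected.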

untargeted relative quantitation (work in progress)

  1. ROI determination by peak frequency, saturation analysis, and spectrum extraction

  2. Sample ID (against above ROIs) and relative quantitation
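Since this workflow is still in progress, the following is only a guess at what "ROI determination by peak frequency" could look like in dplyr terms: bin detected peaks by retention time and keep the bins in which a large enough fraction of samples has a peak. The peak table, the bin width, and the frequency cutoff are all assumptions made for this example.

```r
library(tibble)
library(dplyr)

# toy peak list: one row per detected peak, RT in minutes
peaks <- tribble(
  ~samp, ~rt,
  "s1",  10.02,
  "s2",  10.04,
  "s3",   9.98,
  "s1",  15.62,
  "s2",  15.58,
  "s3",  22.34
)

bin_width <- 0.2  # minutes; arbitrary for this example
min_frac  <- 0.5  # keep bins where at least half the samples have a peak
n_samp    <- n_distinct(peaks$samp)

rois <- peaks %>%
  # integer bin index avoids grouping on floating-point values
  mutate(bin = round(rt / bin_width)) %>%
  group_by(bin) %>%
  summarize(frac = n_distinct(samp) / n_samp, .groups = "drop") %>%
  filter(frac >= min_frac) %>%
  mutate(roi_rt = bin * bin_width)  # approximate RT at the bin center

print(rois)
```

Fixed-width binning is the simplest option but can split a peak that straddles a bin edge, so a real implementation would likely need something more forgiving.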
