# Metabolomics data processing by *metabengine*

Welcome to ```metabengine```!

* ```metabengine``` provides tools for accurate and reproducible metabolomics data processing. Informed by ion identity, ```metabengine``` groups the millions of ions in liquid chromatography-mass spectrometry (LC-MS) data into a list of unique chemical species.

* An artificial neural network (ANN) in ```metabengine``` automatically interprets chromatographic peak shapes and labels high-quality features with Gaussian-shaped peaks for reliable quantitative analysis.

* ```metabengine``` also labels isotopes, adducts, and in-source fragments based on peak-peak correlation and MS/MS spectra (if available).
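
To make the idea of peak-peak correlation concrete, here is a minimal sketch that computes the Pearson correlation between two intensity profiles recorded over the same scans. It is an illustration only, not the ```metabengine``` implementation: the function `pearson`, the example intensities, and the constant `PPR_THRESHOLD` are all hypothetical, and the threshold simply mirrors the value 0.7 assigned to `parameters.ppr` in Example 1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length intensity profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no peak-shape information
    return cov / sqrt(var_x * var_y)


# Hypothetical intensities measured over the same seven scans
parent = [0, 120, 900, 2400, 900, 120, 0]
isotope = [0, 14, 100, 260, 95, 12, 0]      # co-eluting, same peak shape
noise = [50, 10, 60, 5, 55, 20, 40]         # unrelated background signal

PPR_THRESHOLD = 0.7  # assumption: mirrors parameters.ppr from Example 1

parent_isotope_r = pearson(parent, isotope)
parent_noise_r = pearson(parent, noise)

print(parent_isotope_r > PPR_THRESHOLD)  # co-eluting ions pass the threshold
print(parent_noise_r > PPR_THRESHOLD)    # background signal does not
```

Ions that co-elute with the same peak shape correlate strongly, which is why such a score can help decide whether two ions belong to the same chemical species.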

## Example 1 | Untargeted metabolomics workflow (Data-dependent acquisition)

This section demonstrates the most common untargeted metabolomics data processing workflow. The goal is to generate an annotated feature table from raw LC-MS data.

The proposed workflow contains five steps:

1. Initialize the project folder and set parameters

2. Peak picking

3. Peak evaluation by artificial neural network (ANN) model

4. Feature grouping (isotopes, adducts, in-source fragments)

5. Feature annotation

🛎️**Note** 

Create a folder for the new project, and place the raw LC-MS data in the ```sample``` directory. Example:

```md
project/
├── sample
└── sample_table.csv
```
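
The layout above can be created with a few lines of standard-library Python. This is a sketch under stated assumptions: `init_project` is a hypothetical helper, not part of ```metabengine```, and it only creates an empty `sample_table.csv` placeholder because the required columns of that table are not described here.

```python
import tempfile
from pathlib import Path


def init_project(project_dir):
    """Create the project layout shown above and return the key paths."""
    root = Path(project_dir)
    sample_dir = root / "sample"
    sample_dir.mkdir(parents=True, exist_ok=True)   # raw LC-MS files go here
    sample_table = root / "sample_table.csv"
    sample_table.touch(exist_ok=True)               # empty placeholder to fill in
    return root, sample_dir, sample_table


# Demonstration in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    root, sample_dir, sample_table = init_project(Path(tmp) / "project")
    print(sorted(p.name for p in root.iterdir()))   # ['sample', 'sample_table.csv']
```

Calling the helper a second time on the same folder is harmless, since existing directories and files are left untouched.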

In [None]:
# import
import metabengine as mbe
from metabengine.params import Params

# STEP 1: Create a new project and set parameters
parameters = Params()
parameters.project_dir = "C:/Users/.../metabengine_project"   # Project directory, character string

parameters.rt_range = [0.0, 1000.0]   # RT range in minutes, list of two numbers
parameters.ms2_sim_tol = 0.8    # MS2 similarity tolerance
parameters.ion_mode = "positive"   # Ionization mode, "positive" or "negative"

# Parameters for feature detection
parameters.mz_tol_ms1 = 0.01    # m/z tolerance for MS1, default is 0.01
parameters.mz_tol_ms2 = 0.015   # m/z tolerance for MS2, default is 0.015
parameters.int_tol = 1000       # Intensity tolerance. We recommend 1000 for QTOF and 30000 for Orbitrap
parameters.roi_gap = 2          # Gap within a feature, default is 2 (i.e. 2 consecutive scans without signal)
parameters.min_ion_num = 10     # Minimum number of scans in a feature, default is 10
parameters.cut_roi = True       # Whether to cut ROI, default is True
parameters.ann_model = None     # ANN model for peak quality prediction, default is None

# Parameters for feature alignment
parameters.align_mz_tol = 0.01        # m/z tolerance for MS1, default is 0.01
parameters.align_rt_tol = 0.2         # RT tolerance, default is 0.2
parameters.discard_short_roi = True   # Whether to discard short ROIs with length < 5 and without MS2 from alignment, default is True

# Parameters for feature annotation
parameters.msms_library = None   # MS/MS library in MSP format, character string
parameters.ppr = 0.7             # Peak-peak correlation threshold, default is 0.7

# Parameters for output
parameters.output_single_file = False   # Whether to output a single file for each raw file, default is False
parameters.output_aligned_file = True   # Whether to output the aligned feature table

# see https: for more parameters and their default values

# STEP 2-5: Untargeted metabolomics workflow
mbe.untargeted_workflow(parameters)
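
To illustrate what the alignment tolerances mean, the sketch below treats two features from different files as the same feature when both their m/z difference and their RT difference fall within the tolerances. The function `same_feature` and the example values are hypothetical; the defaults copy `align_mz_tol = 0.01` and `align_rt_tol = 0.2` from the cell above. The alignment performed by ```metabengine``` may use additional logic that is not shown here.

```python
def same_feature(feat_a, feat_b, mz_tol=0.01, rt_tol=0.2):
    """Return True when two (m/z, RT) features agree within both tolerances."""
    mz_a, rt_a = feat_a
    mz_b, rt_b = feat_b
    return abs(mz_a - mz_b) <= mz_tol and abs(rt_a - rt_b) <= rt_tol


# Hypothetical (m/z, RT in minutes) features detected in three files
feature_file1 = (181.0707, 5.32)
feature_file2 = (181.0712, 5.41)   # close in both m/z and RT
feature_file3 = (181.0709, 6.10)   # close in m/z, but elutes much later

print(same_feature(feature_file1, feature_file2))  # True
print(same_feature(feature_file1, feature_file3))  # False
```

Tightening either tolerance reduces false matches between neighboring features, at the cost of splitting one feature into several when retention times drift between runs.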

### Output

The untargeted workflow will output files in the project folder as specified.

```md
project/
├── sample
│   ├── qc_1.mzML
│   ├── qc_2.mzML
│   ├── ...
│   └── sample_1000.mzML
├── single_file_output
│   ├── qc_1.csv
│   ├── qc_2.csv
│   ├── ...
│   └── sample_1000.csv
├── mbe_project.pickle
└── feature_table.csv
```
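
After a run, it can be useful to confirm that the expected outputs are present. The sketch below checks for the files named in the tree above, using only the standard library. `check_outputs` is a hypothetical helper, not a ```metabengine``` function, and it assumes that each raw file in `sample` should have a CSV with the same base name in `single_file_output`, as the tree suggests.

```python
from pathlib import Path

RAW_SUFFIXES = {".mzml", ".mzxml"}  # compared case-insensitively


def _stems(folder, suffixes):
    """Return file stems in `folder` whose lowercase suffix is in `suffixes`."""
    if not folder.is_dir():
        return set()
    return {p.stem for p in folder.iterdir() if p.suffix.lower() in suffixes}


def check_outputs(project_dir):
    """Report which of the outputs shown in the tree above are present."""
    root = Path(project_dir)
    raw = _stems(root / "sample", RAW_SUFFIXES)
    per_file = _stems(root / "single_file_output", {".csv"})
    return {
        "feature_table.csv": (root / "feature_table.csv").is_file(),
        "mbe_project.pickle": (root / "mbe_project.pickle").is_file(),
        # raw files without a matching per-file CSV
        "missing_single_file_csv": sorted(raw - per_file),
    }
```

Pass the project directory to `check_outputs` and inspect the returned dictionary. If per-file output was not requested, every raw file is expected to appear under `missing_single_file_csv`.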

## Example 2 | Load an existing project for reprocessing

This section demonstrates how to load an existing project and perform further analysis.

In [None]:
import metabengine as mbe

mbe.load_project("C:/Users/.../mbe_project.pickle")   # Load project

## Example 3 | Process single file for quick inspection

This example demonstrates quick processing of a single LC-MS data file for feature detection. The processed results can then be used for inspection.

In [None]:
# Set parameters for feature detection on a single file
from metabengine.params import Params

# STEP 1: Create a new project and set parameters
parameters = Params()
parameters.project_dir = "C:/Users/.../metabengine_project"   # Project directory, character string

parameters.rt_range = [0.0, 60.0]   # RT range in minutes, list of two numbers
parameters.mode = "dda"         # Acquisition mode, "dda", "dia", or "full_scan"
parameters.ms2_sim_tol = 0.7    # MS2 similarity tolerance
parameters.ion_mode = "pos"     # Ionization mode, "pos" or "neg"

# Parameters for feature detection
parameters.mz_tol_ms1 = 0.01    # m/z tolerance for MS1, default is 0.01
parameters.mz_tol_ms2 = 0.015   # m/z tolerance for MS2, default is 0.015
parameters.int_tol = 1000       # Intensity tolerance, recommend 10000 for Orbitrap and 1000 for QTOF MS
parameters.roi_gap = 2          # Gap within a feature, default is 2 (i.e. 2 consecutive scans without signal)
parameters.min_ion_num = 10     # Minimum number of scans in a feature, default is 10

# Parameters for feature annotation (optional, set to None if not needed)
parameters.msms_library = None  # MS/MS library in MSP format, character string

# see https: for more parameters and their default values
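
The `roi_gap` and `min_ion_num` parameters are easier to picture with a toy example. The sketch below splits a per-scan intensity trace for one m/z into regions of interest (ROIs). It assumes a simple rule that is not confirmed by this document: an ROI tolerates up to `roi_gap` consecutive scans without signal, is closed by a longer gap, and is kept only if it contains at least `min_ion_num` scans with signal. `split_rois` and the example trace are hypothetical and do not reproduce the ```metabengine``` peak picker.

```python
def split_rois(intensities, roi_gap=2, min_ion_num=10):
    """Split a per-scan intensity trace into (first_scan, last_scan) ROIs."""
    rois = []
    current = []     # scan indices with signal in the currently open ROI
    empty_run = 0    # consecutive empty scans since the last signal
    for scan_idx, intensity in enumerate(intensities):
        if intensity > 0:
            current.append(scan_idx)
            empty_run = 0
        elif current:
            empty_run += 1
            if empty_run > roi_gap:   # gap too long: close the open ROI
                rois.append(current)
                current = []
                empty_run = 0
    if current:
        rois.append(current)
    # keep only ROIs with enough scans that actually contain signal
    return [(roi[0], roi[-1]) for roi in rois if len(roi) >= min_ion_num]


# Hypothetical trace: a 2-scan gap is bridged, a 3-scan gap splits the ROI
trace = [0, 5, 9, 0, 0, 12, 20, 0, 0, 0, 7, 8, 0]
rois = split_rois(trace, roi_gap=2, min_ion_num=2)
print(rois)  # [(1, 6), (10, 11)]
```

With the default `min_ion_num=10`, neither ROI in this short trace would be kept, which shows how the setting filters out brief, noise-like signals.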

In [None]:
# Example 3-1: Quick inspection of a blank file for background noise

from metabengine import feat_detection

file_name = "C:/Users/.../blank.mzML"
blank_file = feat_detection(file_name, parameters)

# Export a csv file for background noise to a folder
export_path = "C:/Users/.../"   # Folder path
blank_file.output_roi_report(export_path)

# Export extracted ion chromatogram (EIC) for background noise features
export_path = "C:/Users/.../"  # Folder path
blank_file.plot_all_rois(export_path)

In [None]:
# Example 3-2: Quick inspection of a quality control file for internal standards

from metabengine import feat_detection

## Example 4 | Quality control analysis

This example shows how to evaluate the quality of an LC-MS analysis using pooled quality control (QC) samples.

## Example 5 | Inspection of carryover issues

This example shows how to inspect LC-MS data for carryover between injections.

## Example 6 | Generate a molecular network

This example shows how to compute the correlation between features to create a molecular network.

## Example 7 | Create an EIC for an ion

In [None]:
from metabengine import read_raw_file_to_obj

file_name = ""          # Path to the raw file (ending in .mzML or .mzXML)
d = read_raw_file_to_obj(file_name)
# EIC parameters
target_mz = 121.0508    # m/z value of the ion
targeted_rt = None      # RT value of the ion, set to None if you want to plot the whole EIC
mz_tol = 0.01           # m/z tolerance for EIC
rt_tol = 0.3            # RT window for EIC
output = False          # False shows the plot; set to a file path to save the plot instead
# plot EIC
d.plot_eic(target_mz, targeted_rt, mz_tol, rt_tol, output)
# get EIC data
eic_rt, eic_int, eic_mz, eic_scan_idx = d.get_eic_data(target_mz, targeted_rt, mz_tol, rt_tol)