# Metabolomics data processing by *metabengine*

Welcome to ```metabengine```!

* ```metabengine``` aims to provide tools for accurate and reproducible metabolomics data processing. Guided by ion identity, ```metabengine``` groups millions of ions in liquid chromatography-mass spectrometry (LC-MS) data to generate a list of unique chemical species.

* An artificial neural network (ANN) is employed in ```metabengine``` to automatically interpret chromatographic peak shapes, labeling high-quality features with Gaussian-shaped peaks for reliable quantitative analysis.

* ```metabengine``` also labels isotopes, adducts, and in-source fragments based on peak-peak correlation and MS/MS spectra (if available).
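The isotope and adduct labeling described above can be illustrated with a minimal sketch. It assumes singly charged ions, features stored as a hypothetical `Feature` record with an EIC sampled on a scan grid shared by all features, and illustrative tolerances. It pairs features by a known m/z spacing plus co-elution and EIC correlation; it is not ```metabengine```'s actual grouping algorithm.

```python
# Sketch: label isotope and adduct pairs by m/z spacing, co-elution and
# EIC correlation. Illustrative only; not metabengine's implementation.
from dataclasses import dataclass
from math import sqrt

C13_DELTA = 1.003355    # 13C - 12C mass difference (Da), assumes charge 1
NA_H_DELTA = 21.981944  # m/z([M+Na]+) - m/z([M+H]+) (Da)


@dataclass
class Feature:
    fid: int
    mz: float
    rt: float   # retention time (min)
    eic: list   # intensities on a scan grid shared by all features


def pearson(x, y):
    """Pearson correlation of two equal-length intensity lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def label_pairs(features, mz_tol=0.01, rt_tol=0.05, min_corr=0.9):
    """Return (lighter_id, heavier_id, label) for co-eluting, correlated pairs."""
    labels = []
    for a in features:
        for b in features:
            if b.mz <= a.mz or abs(a.rt - b.rt) > rt_tol:
                continue
            if pearson(a.eic, b.eic) < min_corr:
                continue
            delta = b.mz - a.mz
            if abs(delta - C13_DELTA) <= mz_tol:
                labels.append((a.fid, b.fid, "M+1 isotope"))
            elif abs(delta - NA_H_DELTA) <= mz_tol:
                labels.append((a.fid, b.fid, "[M+Na]+ / [M+H]+"))
    return labels


eic = [0, 10, 50, 100, 50, 10, 0]
features = [
    Feature(1, 180.0634, 5.20, eic),
    Feature(2, 181.0668, 5.21, [v * 0.07 for v in eic]),
    Feature(3, 202.0453, 5.20, [v * 0.3 for v in eic]),
]
print(label_pairs(features))
# [(1, 2, 'M+1 isotope'), (1, 3, '[M+Na]+ / [M+H]+')]
```

Requiring both the m/z spacing and a high EIC correlation keeps unrelated co-eluting ions with a coincidental mass difference from being grouped.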

## Example 1 | Untargeted metabolomics workflow (Data-dependent acquisition)

This section demonstrates the most commonly used untargeted metabolomics data processing workflow. The goal is to generate an annotated feature table from raw LC-MS data.

The proposed workflow contains five steps:

1. Initialize the project folder and set parameters

2. Peak picking

3. Peak evaluation by artificial neural network (ANN) model

4. Feature grouping (isotopes, adducts, in-source fragments)

5. Feature annotation
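Step 3 relies on an ANN model to judge peak shape. As a much simpler stand-in (a swapped-in heuristic, not the ANN used by ```metabengine```), the sketch below scores how Gaussian a chromatographic peak looks by correlating it with a Gaussian whose center and width come from the peak's intensity-weighted moments. The 0.9 cutoff is an arbitrary illustration.

```python
# Sketch: Gaussian-likeness score for a chromatographic peak.
# A moment-based heuristic, NOT metabengine's ANN model.
from math import exp, sqrt


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def gaussian_score(rts, intensities):
    """Correlation between a peak and a moment-matched Gaussian (1.0 = ideal)."""
    total = sum(intensities)
    if total <= 0 or len(rts) < 3:
        return 0.0
    mu = sum(t * i for t, i in zip(rts, intensities)) / total
    var = sum(i * (t - mu) ** 2 for t, i in zip(rts, intensities)) / total
    if var == 0:
        return 0.0
    height = max(intensities)
    model = [height * exp(-((t - mu) ** 2) / (2 * var)) for t in rts]
    return pearson(intensities, model)


def is_good_peak(rts, intensities, min_score=0.9):  # cutoff is illustrative
    return gaussian_score(rts, intensities) >= min_score


rts = [i * 0.01 for i in range(21)]
smooth = [1000 * exp(-((t - 0.1) ** 2) / (2 * 0.02 ** 2)) for t in rts]
split = [0, 100, 0, 0, 0, 100, 0]
print(is_good_peak(rts, smooth), is_good_peak(list(range(7)), split))  # True False
```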

🛎️**Note** 

Create a folder for the new project and place the raw LC-MS data files in the corresponding subfolders, for example:

```md
project/
├── pooled_qc
├── blank
└── sample
```
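If you prefer to prepare this layout from Python, a minimal sketch using the standard-library `pathlib` module is shown below. The subfolder names follow the example tree above; whether ```metabengine```'s `create_project` also creates or requires exactly these folders is not stated here, so treat the helper `init_project_dirs` as hypothetical.

```python
# Sketch: create the project layout shown above (hypothetical helper).
import tempfile
from pathlib import Path

SUBFOLDERS = ("pooled_qc", "blank", "sample")


def init_project_dirs(project_dir, subfolders=SUBFOLDERS):
    """Create project_dir and its data subfolders; safe to run twice."""
    root = Path(project_dir)
    for name in subfolders:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    print(init_project_dirs(Path(tmp) / "project"))  # ['blank', 'pooled_qc', 'sample']
```

The raw data files are then copied into the matching subfolders.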

```python
# Import metabengine and its parameter class
import metabengine as mbe
from metabengine.params import Params

# STEP 1: Create a new project and set parameters
parameters = Params()
parameters.project_dir = "C:/Users/.../metabengine_project"   # Project directory, character string

parameters.rt_range = [0.0, 60.0]  # RT range in minutes, list of two numbers
parameters.mode = "dda"            # Acquisition mode: "dda", "dia", or "full_scan"
parameters.ms2_sim_tol = 0.7       # MS2 similarity tolerance
parameters.ion_mode = "pos"        # Ionization mode: "pos" or "neg"

parameters.output_single_file_path = None   # Output path for single files, character string

# Parameters for feature detection
parameters.mz_tol_ms1 = 0.01    # m/z tolerance for MS1, default is 0.01
parameters.mz_tol_ms2 = 0.015   # m/z tolerance for MS2, default is 0.015
parameters.int_tol = 1000       # Intensity tolerance; recommended: 10000 for Orbitrap, 1000 for QTOF MS
parameters.roi_gap = 2          # Gap allowed within a feature, default is 2 (i.e. 2 consecutive scans without signal)
parameters.min_ion_num = 10     # Minimum number of scans for a feature, default is 10

# Parameters for feature alignment
parameters.align_mz_tol_ms1 = 0.01  # m/z tolerance for MS1, default is 0.01
parameters.align_rt_tol = 0.1       # RT tolerance, default is 0.1

# Parameters for feature annotation
parameters.msms_library = None  # MS/MS library in MSP format, character string. Example: "C:/Users/.../NIST20.MSP"

# see https: for more parameters and their default values

# Create a new project
mbe.create_project(parameters)

# STEP 2-5: Untargeted metabolomics workflow
mbe.untargeted_workflow(parameters)
```
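The comments on `mz_tol_ms1`, `int_tol`, `roi_gap` and `min_ion_num` describe how ions are traced across MS1 scans into features. The sketch below shows one simple way these four parameters can interact when building regions of interest (ROIs). It assumes centroided scans given as plain lists of (m/z, intensity) pairs and m/z tolerances in Da; it is an illustration, not ```metabengine```'s feature-detection code.

```python
# Sketch: trace ions across MS1 scans into ROIs using parameters like those above.
# Illustrative only; not metabengine's implementation.


def build_rois(scans, mz_tol_ms1=0.01, int_tol=1000, roi_gap=2, min_ion_num=10):
    """scans: list (in scan order) of lists of (mz, intensity) pairs."""
    active, finished = [], []
    for idx, peaks in enumerate(scans):
        for mz, inten in peaks:
            if inten < int_tol:  # ignore signals below the intensity threshold
                continue
            # open ROIs within tolerance that have no point in this scan yet
            candidates = [r for r in active
                          if abs(r["mz"] - mz) <= mz_tol_ms1 and r["last"] < idx]
            if not candidates:
                active.append({"mz": mz, "last": idx, "points": [(idx, mz, inten)]})
                continue
            roi = min(candidates, key=lambda r: abs(r["mz"] - mz))  # closest m/z
            roi["points"].append((idx, mz, inten))
            roi["last"] = idx
            roi["mz"] = sum(p[1] for p in roi["points"]) / len(roi["points"])
        # close ROIs that have gone more than roi_gap scans without a signal
        finished += [r for r in active if idx - r["last"] > roi_gap]
        active = [r for r in active if idx - r["last"] <= roi_gap]
    finished += active
    # keep only ROIs with enough scans to count as a feature
    return [r for r in finished if len(r["points"]) >= min_ion_num]


scans = []
for i in range(15):
    peaks = [(300.0, 500.0)]                 # below int_tol: ignored
    if i < 12 and i not in (5, 6):           # 2-scan gap at scans 5-6
        peaks.append((200.0 + 0.001 * (i % 3), 5000.0))
    if i < 3:                                # too short to be a feature
        peaks.append((400.0, 8000.0))
    scans.append(peaks)

rois = build_rois(scans)
print(len(rois), len(rois[0]["points"]))  # 1 10
```

With `roi_gap=1` the same m/z 200 trace would be split at the gap into two 5-scan pieces, and both would fall below `min_ion_num`.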

### Output

The untargeted workflow writes its output files to the project folder, organized as follows:

```md
project/
├── pooled_qc
├── blank
├── sample
├── single_files
│   ├── qc_1.csv
│   ├── qc_2.csv
│   ├── ...
│   └── sample_1000.csv
├── testing_project.pickle
└── feature_table.csv
```
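The feature table combines features detected in the individual files. Below is a minimal sketch of how per-file features can be aligned with tolerances like `align_mz_tol_ms1` and `align_rt_tol`, using greedy first-match alignment and assuming per-file features as (m/z, RT, intensity) tuples. The column layout it writes is hypothetical and may differ from the `feature_table.csv` that ```metabengine``` actually produces.

```python
# Sketch: greedy cross-sample alignment into a feature table (hypothetical layout).
import csv
import io


def align_features(samples, align_mz_tol_ms1=0.01, align_rt_tol=0.1):
    """samples: dict sample_name -> list of (mz, rt, intensity) tuples."""
    rows = []
    for name, feats in samples.items():
        for mz, rt, inten in feats:
            match = next((r for r in rows
                          if abs(r["mz"] - mz) <= align_mz_tol_ms1
                          and abs(r["rt"] - rt) <= align_rt_tol
                          and name not in r["values"]), None)
            if match is None:
                rows.append({"mz": mz, "rt": rt, "values": {name: inten}})
            else:
                match["values"][name] = inten
    # one column per sample; 0.0 where the feature was not detected
    return [{"mz": r["mz"], "rt": r["rt"],
             **{n: r["values"].get(n, 0.0) for n in samples}} for r in rows]


def write_feature_table(table, fh):
    writer = csv.DictWriter(fh, fieldnames=list(table[0]))
    writer.writeheader()
    writer.writerows(table)


samples = {
    "qc_1": [(180.0634, 5.20, 1e5), (255.2319, 12.40, 3e4)],
    "qc_2": [(180.0640, 5.25, 1.1e5)],
    "sample_1": [(255.2330, 12.45, 2.5e4), (300.0000, 1.00, 5e3)],
}
buf = io.StringIO()
write_feature_table(align_features(samples), buf)
print(buf.getvalue())
```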

## Example 2 | Load an existing project and reprocess

This section demonstrates how to load an existing project and reprocess its data to extract features with quantitative information and annotation.
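The output folder of Example 1 contains a `.pickle` file. Assuming it is a standard Python pickle of the project object (an assumption; ```metabengine```'s own loading API is not shown here), it can be reloaded with the standard-library `pickle` module. Unpickling re-imports the classes that were saved, so ```metabengine``` must be importable, and only files from trusted sources should be unpickled. The helpers `save_project` and `load_project` are hypothetical.

```python
# Sketch: save and reload a project object with the standard pickle module.
import pickle
import tempfile
from pathlib import Path


def save_project(obj, path):
    with Path(path).open("wb") as fh:
        pickle.dump(obj, fh)


def load_project(path):
    """Reload a pickled project; only unpickle files you trust."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Round trip with a stand-in object (a real project file holds metabengine objects)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "testing_project.pickle"
    save_project({"project_dir": tmp, "n_files": 3}, path)
    print(load_project(path)["n_files"])  # 3
```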

## Example 3 | Process a single file for quick inspection

This example demonstrates quick feature detection on a single LC-MS data file. The processed results can then be used for inspection.

```python
# Set parameters for feature detection on a single file
from metabengine.params import Params

# STEP 1: Set parameters
parameters = Params()
parameters.project_dir = "C:/Users/.../metabengine_project"   # Project directory, character string

parameters.rt_range = [0.0, 60.0]  # RT range in minutes, list of two numbers
parameters.mode = "dda"            # Acquisition mode: "dda", "dia", or "full_scan"
parameters.ms2_sim_tol = 0.7       # MS2 similarity tolerance
parameters.ion_mode = "pos"        # Ionization mode: "pos" or "neg"

# Parameters for feature detection
parameters.mz_tol_ms1 = 0.01    # m/z tolerance for MS1, default is 0.01
parameters.mz_tol_ms2 = 0.015   # m/z tolerance for MS2, default is 0.015
parameters.int_tol = 1000       # Intensity tolerance; recommended: 10000 for Orbitrap, 1000 for QTOF MS
parameters.roi_gap = 2          # Gap allowed within a feature, default is 2 (i.e. 2 consecutive scans without signal)
parameters.min_ion_num = 10     # Minimum number of scans for a feature, default is 10

# Parameters for feature annotation (optional, set to None if not needed)
parameters.msms_library = None  # MS/MS library in MSP format, character string

# see https: for more parameters and their default values
```
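`ms2_sim_tol` and `mz_tol_ms2` suggest that MS/MS spectra are compared by a similarity score with an m/z tolerance. A common choice for such a score is the cosine similarity of matched fragment peaks; the sketch below implements a greedy matched-peak cosine. Whether ```metabengine``` uses this exact score (or any intensity weighting) is an assumption, and the spectra are made up.

```python
# Sketch: greedy matched-peak cosine similarity between two MS/MS spectra.
# Assumed scoring; metabengine's own MS2 similarity may differ.
from math import sqrt


def cosine_similarity(spec_a, spec_b, mz_tol_ms2=0.015):
    """spec_a, spec_b: lists of (mz, intensity) fragment peaks."""
    norm_a = sqrt(sum(x * x for _, x in spec_a))
    norm_b = sqrt(sum(x * x for _, x in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # all peak pairs within tolerance, highest intensity product first
    pairs = sorted(((ia * ib, i, j)
                    for i, (ma, ia) in enumerate(spec_a)
                    for j, (mb, ib) in enumerate(spec_b)
                    if abs(ma - mb) <= mz_tol_ms2), reverse=True)
    used_a, used_b, dot = set(), set(), 0.0
    for prod, i, j in pairs:
        if i not in used_a and j not in used_b:  # each peak matched at most once
            used_a.add(i)
            used_b.add(j)
            dot += prod
    return dot / (norm_a * norm_b)


query = [(85.03, 20.0), (127.04, 100.0), (145.05, 40.0)]
library_hit = [(127.045, 100.0), (145.05, 40.0)]
score = cosine_similarity(query, library_hit)
print(round(score, 3), score >= 0.7)  # 0.983 True
```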

```python
# Example 3-1: Quick inspection of a blank file for background noise
from metabengine import feat_detection

file_name = "C:/Users/.../blank.mzML"
blank_file = feat_detection(file_name, parameters)

# Export a CSV report of background-noise features to a folder
export_path = "C:/Users/.../"   # Folder path
blank_file.output_roi_report(export_path)

# Export extracted ion chromatograms (EICs) of background-noise features
export_path = "C:/Users/.../"   # Folder path
blank_file.plot_all_rois(export_path)
```
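Once background features are known, a common follow-up is to flag sample features that are not clearly above the blank. A minimal sketch, assuming features as (m/z, RT, intensity) tuples and a hypothetical sample-to-blank fold threshold (3 here, chosen only for illustration):

```python
# Sketch: flag sample features that are not clearly above the blank.
# Tolerances and the fold threshold are illustrative, not metabengine defaults.


def flag_background(sample_feats, blank_feats, mz_tol=0.01, rt_tol=0.1, min_fold=3.0):
    """Return (mz, rt, intensity, is_background) for each sample feature."""
    flagged = []
    for mz, rt, inten in sample_feats:
        # strongest blank signal at the same m/z and RT, 0.0 if none
        blank_max = max((b_int for b_mz, b_rt, b_int in blank_feats
                         if abs(b_mz - mz) <= mz_tol and abs(b_rt - rt) <= rt_tol),
                        default=0.0)
        flagged.append((mz, rt, inten, blank_max > 0 and inten < min_fold * blank_max))
    return flagged


blank = [(149.0233, 8.00, 5e4)]
sample = [(149.0235, 8.02, 9e4), (180.0634, 5.20, 1e5)]
for row in flag_background(sample, blank):
    print(row)
# (149.0235, 8.02, 90000.0, True)
# (180.0634, 5.2, 100000.0, False)
```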

```python
# Example 3-2: Quick inspection of a quality control file for internal standards
from metabengine import feat_detection
```
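Example 3-2 stops after the import. A typical quick check on a QC file is whether internal standards appear at their expected m/z and RT. The sketch below reports the mass error (ppm) and RT shift of the strongest matching feature for each standard. The standard names, expected values and tolerances are hypothetical, and detected features are assumed to be available as (m/z, RT, intensity) tuples; how to obtain them from the object returned by `feat_detection` is not shown here.

```python
# Sketch: check internal standards in a QC file (hypothetical names and values).


def check_internal_standards(features, standards, ppm_tol=10.0, rt_tol=0.2):
    """features: (mz, rt, intensity) tuples; standards: name -> (expected_mz, expected_rt).
    Returns name -> (ppm_error, rt_shift, intensity), or None if not found."""
    report = {}
    for name, (exp_mz, exp_rt) in standards.items():
        best = None
        for mz, rt, inten in features:
            ppm = (mz - exp_mz) / exp_mz * 1e6
            shift = rt - exp_rt
            if abs(ppm) <= ppm_tol and abs(shift) <= rt_tol:
                if best is None or inten > best[2]:  # keep the strongest match
                    best = (ppm, shift, inten)
        report[name] = best
    return report


standards = {"IS_1": (200.1000, 4.00), "IS_2": (350.2000, 9.50)}
features = [(200.1010, 4.05, 2e5), (350.3000, 9.50, 1e4)]
for name, hit in check_internal_standards(features, standards).items():
    print(name, "not found" if hit is None else f"{hit[0]:.1f} ppm, {hit[1]:+.2f} min")
# IS_1 5.0 ppm, +0.05 min
# IS_2 not found
```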

## Example 4 | Quality control analysis

This example shows how to evaluate the quality of LC-MS analysis using pooled quality control (QC) samples.
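A widely used metric for this is the relative standard deviation (RSD) of each feature across repeated pooled QC injections. A minimal sketch, assuming intensities have already been collected per feature; the 30% cutoff is a commonly used choice, not a ```metabengine``` default:

```python
# Sketch: per-feature RSD across pooled QC injections.
from statistics import mean, stdev


def qc_rsd(qc_intensities, max_rsd=30.0):
    """qc_intensities: feature_id -> intensities in >= 2 pooled QC injections.
    Returns feature_id -> (rsd_percent, passes)."""
    result = {}
    for fid, values in qc_intensities.items():
        m = mean(values)
        rsd = stdev(values) / m * 100 if m > 0 else float("inf")
        result[fid] = (rsd, rsd <= max_rsd)
    return result


qc = {"F1": [100, 110, 90], "F2": [100, 300, 50]}
for fid, (rsd, ok) in qc_rsd(qc).items():
    print(fid, round(rsd, 1), ok)
# F1 10.0 True
# F2 88.2 False
```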

## Example 5 | Inspection of carryover

This example shows how to inspect carryover, i.e., signals from a previous injection that persist into the next one.
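One simple way to quantify carryover is to inject a blank right after a high-concentration sample and express each feature's blank intensity as a percentage of its intensity in the preceding injection. A minimal sketch, assuming both runs are given as (m/z, RT, intensity) tuples and using illustrative matching tolerances:

```python
# Sketch: carryover (%) of each feature into the blank injected right after it.


def carryover_percent(preceding, blank_after, mz_tol=0.01, rt_tol=0.1):
    """Return (mz, rt, carryover_percent) for each feature of the preceding run."""
    report = []
    for mz, rt, inten in preceding:
        carried = max((b_int for b_mz, b_rt, b_int in blank_after
                       if abs(b_mz - mz) <= mz_tol and abs(b_rt - rt) <= rt_tol),
                      default=0.0)
        report.append((mz, rt, 100.0 * carried / inten if inten > 0 else 0.0))
    return report


high_sample = [(180.0634, 5.20, 1e6), (255.2319, 12.40, 2e5)]
blank_after = [(180.0636, 5.21, 5e3)]
print(carryover_percent(high_sample, blank_after))
# [(180.0634, 5.2, 0.5), (255.2319, 12.4, 0.0)]
```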

## Example 6 | Generate molecular networking

This example shows how to compute correlations between features to build a molecular network.
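A minimal sketch of this idea, assuming each feature's intensities are listed in the same sample order: compute the Pearson correlation for every feature pair and keep pairs above a threshold as network edges. The 0.8 threshold is illustrative, and the resulting edge list can be loaded into a graph tool of your choice.

```python
# Sketch: correlation-based feature network as an edge list.
from itertools import combinations
from math import sqrt


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def correlation_edges(profiles, min_r=0.8):
    """profiles: feature_id -> intensities across the same samples."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= min_r:
            edges.append((a, b, round(r, 3)))
    return edges


profiles = {"F1": [1, 2, 3, 4, 5], "F2": [2, 4, 6, 8, 10], "F3": [5, 3, 4, 1, 2]}
print(correlation_edges(profiles))  # [('F1', 'F2', 1.0)]
```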

## Example 7 | Targeted analysis by a list of compounds

This example shows how to extract features for a predefined list of target compounds.
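For a target list, the expected m/z of each compound can be computed from its neutral monoisotopic mass and the adduct, then matched against detected features. A minimal sketch, assuming targets are given as neutral masses and features as (m/z, RT, intensity) tuples; the adduct set, target name and tolerance are illustrative, and `ion_mode` follows the "pos"/"neg" convention of the parameters above.

```python
# Sketch: match a target compound list to detected features by adduct m/z.
ADDUCT_SHIFTS = {           # m/z = neutral mass + shift (singly charged)
    "[M+H]+": 1.007276,     # + proton
    "[M+Na]+": 22.989221,   # + Na+ (Na atom minus one electron)
    "[M-H]-": -1.007276,    # - proton
}
ADDUCTS_BY_MODE = {"pos": ("[M+H]+", "[M+Na]+"), "neg": ("[M-H]-",)}


def extract_targets(targets, features, ion_mode="pos", mz_tol=0.01):
    """targets: name -> neutral monoisotopic mass (Da).
    Returns (name, adduct, mz, rt, intensity) for every matching feature."""
    hits = []
    for name, mass in targets.items():
        for adduct in ADDUCTS_BY_MODE[ion_mode]:
            expected = mass + ADDUCT_SHIFTS[adduct]
            for mz, rt, inten in features:
                if abs(mz - expected) <= mz_tol:
                    hits.append((name, adduct, mz, rt, inten))
    return hits


targets = {"target_A": 180.0634}
features = [(181.0707, 5.20, 1e5), (203.0526, 5.20, 2e4), (300.0000, 1.00, 5e3)]
for hit in extract_targets(targets, features):
    print(hit)
# ('target_A', '[M+H]+', 181.0707, 5.2, 100000.0)
# ('target_A', '[M+Na]+', 203.0526, 5.2, 20000.0)
```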