# Chromatographic Data Preprocessing for Analysis

To list the arguments accepted by the script, run:
```bash
python Features_Concensus_extraction.py --help
```
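
For orientation, the sketch below shows how a handful of the flags documented in this README could be declared with `argparse`. It is an assumption about how such a script might be structured, not its actual source: only the flag names and defaults come from the parameter lists in this README, and `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a small subset of the documented flags."""
    parser = argparse.ArgumentParser(
        description="Chromatographic data preprocessing (illustrative subset)."
    )
    parser.add_argument("--mass_trace_mass_error_ppm", type=float, default=10.0)
    parser.add_argument("--mass_trace_noise_threshold_int", type=float, default=1200.0)
    parser.add_argument("--elution_peak_width_filtering", type=str, default="auto")
    parser.add_argument(
        "--filter_type",
        type=str,
        default="window_mower",
        choices=["window_mower", "threshold_mower", "nlargest"],
    )
    parser.add_argument("--nlargest_n", type=int, default=200)
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--filter_type", "nlargest"])
print(args.filter_type, args.nlargest_n)
```

Flags that are not passed fall back to their defaults, so here `--nlargest_n` stays at 200.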

# Parameters for Preprocessing Chromatographic Data
## Mass Trace Detection Parameters
-  *--mass_trace_mass_error_ppm*: Mass error tolerance in parts per million (default: 10.0).
-  *--mass_trace_noise_threshold_int*: Intensity threshold for noise filtering (default: 1200).
-  *--mass_trace_chrom_peak_snr*: Signal-to-noise ratio for chromatographic peak detection (default: 3.0).
-  *--mass_trace_min_sample_rate*: Minimum required sampling rate (default: 0.5).
-  *--mass_trace_min_length*: Minimum length of mass traces (default: 5.0).
-  *--mass_trace_max_length*: Maximum length of mass traces. -1.0 for no maximum (default: -1.0).
-  *--mass_trace_quant_method*: Quantification method for mass traces ("area"; the underlying OpenMS parameter also accepts "median" and "max_height").
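
To give a feel for what a tolerance in parts per million means, the sketch below converts `--mass_trace_mass_error_ppm` into an absolute m/z window. `ppm_window` is a hypothetical helper used only for illustration; it is not part of the script.

```python
def ppm_window(mz: float, ppm: float = 10.0) -> tuple[float, float]:
    """Return the (low, high) m/z bounds for a tolerance given in ppm."""
    delta = mz * ppm * 1e-6  # 1 ppm is one millionth of the m/z value
    return mz - delta, mz + delta


low, high = ppm_window(500.0, ppm=10.0)
print(f"{low:.4f} - {high:.4f}")  # 499.9950 - 500.0050
```

With the default of 10 ppm, the window around m/z 500 is ±0.005, and it widens proportionally at higher m/z.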


## Elution Peak Detection Parameters
- *--elution_peak_width_filtering*: Setting for peak width filtering (default: "auto").
- *--elution_peak_chrom_fwhm*: Full width at half maximum for chromatographic peaks (default: 2.0).
- *--elution_peak_chrom_peak_snr*: Signal-to-noise ratio for elution peak detection (default: 3.0).
- *--elution_peak_min_fwhm*: Minimum full width at half maximum (default: 1.0).
- *--elution_peak_max_fwhm*: Maximum full width at half maximum (default: 60.0).
- *--elution_peak_masstrace_snr_filtering*: Enable/disable SNR filtering for mass traces (default: "false").
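
The sketch below illustrates how a fixed width filter could use `--elution_peak_min_fwhm` and `--elution_peak_max_fwhm`. It assumes that a "fixed" setting of `--elution_peak_width_filtering` keeps only peaks whose FWHM lies inside the closed interval `[min_fwhm, max_fwhm]`; the default "auto" setting derives its bounds from the data instead and is not reproduced here. `filter_by_fwhm` is a hypothetical helper.

```python
def filter_by_fwhm(fwhms, min_fwhm=1.0, max_fwhm=60.0):
    """Keep only peak widths inside [min_fwhm, max_fwhm] (assumed 'fixed' mode)."""
    return [w for w in fwhms if min_fwhm <= w <= max_fwhm]


kept = filter_by_fwhm([0.4, 1.0, 12.5, 75.0])
print(kept)  # [1.0, 12.5]
```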

## Feature Detection Parameters
- *--feature_detection_remove_single_traces*: Whether to remove features with single traces (default: "false").
- *--feature_detection_local_rt_range*: Local retention time range for feature detection (default: 2.0).
- *--feature_detection_local_mz_range*: Local m/z range for feature detection (default: 10.0).
- *--feature_detection_charge_lower_bound*: Lower bound for charge state detection (default: 1).
- *--feature_detection_charge_upper_bound*: Upper bound for charge state detection (default: 3).
- *--feature_detection_chrom_fwhm*: Chromatographic FWHM for feature detection (default: 2.0).
- *--feature_detection_report_summed_ints*: Whether to report summed intensities (default: "false").
- *--feature_detection_enable_RT_filtering*: Enable/disable RT filtering (default: "true").
- *--feature_detection_isotope_filtering_model*: Model for isotope filtering (default: "metabolites (5% RMS)").
- *--feature_detection_mz_scoring_13C*: Enable/disable 13C m/z scoring (default: "false").
- *--feature_detection_use_smoothed_intensities*: Use smoothed intensities (default: "true").
- *--feature_detection_report_convex_hulls*: Report convex hulls (default: "true").
- *--feature_detection_report_chromatograms*: Report chromatograms (default: "false").
- *--feature_detection_mz_scoring_by_elements*: Enable/disable m/z scoring by elements (default: "false").
- *--filename_feature_map*: Name of the featureXML files with the Feature map info. Do not add the extension. (default: feature_map).
- *--filename_consensus_map*: Name of the consensusXML files with the Consensus map info. Do not add the extension. (default: consensus_map).

## MS2-Feature Detection Parameters
- *--make_ms2_mz_tolerance*: m/z tolerance for MS2 features mapping (default: 0.01).
- *--make_ms2_rt_tolerance*: RT tolerance for MS2 features mapping (default: 5.0).
- *--filename_ms2s_mzml*: Name of the mzML files with the MS2 spectra. Do not add the extension. (default: ms2s).
- *--max_peak_filter_pptg*: Maximum peak filter for pptg (default: 0.2).
- *--merger_spectra_mz_binning_width*: m/z binning width for merging spectra (default: 5.0).
- *--merger_spectra_mz_binning_width_unit*: Unit for m/z binning width (default: "ppm").
- *--merger_spectra_sort_blocks*: Sorting method for blocks (default: "RT_ascending").
- *--merger_spectra_mz_tolerance*: m/z tolerance for merging spectra (default: 1.0e-04).
- *--merger_spectra_mass_tolerance*: Mass tolerance for merging spectra (default: 0.0).
- *--merger_spectra_rt_tolerance*: RT tolerance for merging spectra (default: 15.0).
- *--filter_type*: Type of filter to apply (default: "window_mower").
- *--window_mower_windowsize*: Window size for the window mower filter; used when --filter_type is "window_mower" (default: 50.0).
- *--window_mower_peakcount*: Number of peaks to keep per window in the window mower filter; used when --filter_type is "window_mower" (default: 2).
- *--window_mower_movetype*: Movement type for the window mower filter; used when --filter_type is "window_mower" (default: "slide").
- *--threshold_mower_threshold*: Intensity threshold for the threshold mower filter; used when --filter_type is "threshold_mower" (default: 0.05).
- *--nlargest_n*: Number of largest peaks to keep; used when --filter_type is "nlargest" (default: 200).
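
The sketch below illustrates how `--make_ms2_mz_tolerance` and `--make_ms2_rt_tolerance` could be used to assign an MS2 spectrum to a feature. It is a minimal sketch under stated assumptions, not the script's implementation: both tolerances are treated as absolute differences, ties are resolved by the smallest m/z difference, and `Feature` and `match_ms2_to_feature` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    feature_id: str
    mz: float
    rt: float


def match_ms2_to_feature(precursor_mz, precursor_rt, features,
                         mz_tol=0.01, rt_tol=5.0):
    """Return the feature closest in m/z within both tolerances, or None."""
    candidates = [
        f for f in features
        if abs(f.mz - precursor_mz) <= mz_tol and abs(f.rt - precursor_rt) <= rt_tol
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda f: abs(f.mz - precursor_mz))


features = [
    Feature("f1", 301.1410, 120.0),
    Feature("f2", 301.1480, 122.0),
    Feature("f3", 455.2900, 121.0),
]
hit = match_ms2_to_feature(301.1450, 123.5, features)
print(hit.feature_id)  # f2
```

Both `f1` and `f2` fall inside the tolerances; `f2` is chosen because its m/z is closer to the precursor.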

Example invocation from the command line:

```bash
python Features_Concensus_extraction.py --chromatograms_dir /shared/users/ptfi/data/CASMI/pos/A_M1_posPFP --model_dir /shared/users/ptfi/models --model_name dd_arch1_lf1_data_1.pth --output_dir /users/glara/scratch/ --num_pred 3 --device cpu --protocol CASMI
```

The run prints output similar to the following:

```text
RT window size calculated as 240 seconds.
Progress of 'mass trace detection':
-- done [took 2.17 s (CPU), 2.18 s (Wall)] -- 
Progress of 'elution peak detection':
-- done [took 3.07 s (CPU), 0.11 s (Wall)] -- 
Progress of 'assembling mass traces to features':
Loading metabolite isotope model with 5% RMS error
-- done [took 2.10 s (CPU), 0.11 s (Wall)] -- 
Progress of 'mass trace detection':
-- done [took 2.74 s (CPU), 2.76 s (Wall)] -- 
Progress of 'elution peak detection':
-- done [took 2.60 s (CPU), 0.08 s (Wall)] -- 
Progress of 'assembling mass traces to features':
-- done [took 2.09 s (CPU), 0.10 s (Wall)] -- 
Progress of 'computing RT transformations':
-- done [took 0.41 s (CPU), 0.41 s (Wall)] -- 
Progress of 'linking features':
-- done [took 0.47 s (CPU), 0.47 s (Wall)] -- 
/shared/users/ptfi/data/CASMI/pos/A_M1_posPFP/A_M1_posPFP_01.mzml
/shared/users/ptfi/data/CASMI/pos/A_M1_posPFP/A_M1_posPFP_02.mzml
Warning: SpectraDistance received the unknown parameter 'mass_tolerance'!
Number of M/z lists: 98
Number of Intensity lists: 98
Number of Retention times: 98
Number of MS2 Precursor masses: 98
MS data saved to /users/glara/scratch/ms_data.pkl
Cluster sizes:
<Loading metabolite isotope model with 5% RMS error> occurred 2 times
  size 2: 6x
  size 3: 5x
  size 4: 10x
  size 5: 2x
  size 6: 2x
  size 7: 1x
  size 8: 6x
  size 9: 2x
  size 10: 3x
  size 11: 4x
  size 12: 2x
  size 13: 1x
  size 14: 2x
  size 16: 3x
  size 17: 1x
  size 18: 1x
  size 21: 2x
  size 23: 2x
  size 24: 1x
  size 25: 1x
  size 26: 1x
  size 27: 1x
  size 28: 1x
  size 31: 1x
  size 32: 5x
  size 33: 2x
  size 36: 2x
  size 37: 1x
  size 45: 1x
  size 46: 1x
  size 47: 1x
  size 48: 1x
  size 52: 1x
  size 54: 1x
  size 66: 1x
  size 68: 1x
  size 83: 1x
  size 87: 1x
  size 106: 1x
  size 107: 1x
  size 144: 1x
  size 178: 1x
  size 189: 1x
  size 194: 1x
  size 205: 1x
  size 207: 1x
  size 212: 1x
  size 385: 1x
  size 487: 1x
  size 635: 1x
  size 640: 1x
  size 723: 1x
  size 1100: 1x
  size 1607: 1x
  size 3577: 1x
Number of merged peaks: 439595/383942 (114.50 %) of blocked spectra
```

# MS2 Spectra Merging
## --merger_spectra_mz_tolerance 1.0e-04
## --merger_spectra_rt_tolerance 15.0
<img src="https://pyopenms.readthedocs.io/en/latest/_images/spec_merging_3.png" width="800"/>
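
The sketch below shows one way the two tolerances could decide which MS2 spectra belong together before their peaks are merged. It is a simplified greedy grouping for illustration only, not the clustering performed by the underlying library: each spectrum joins the first group whose seed spectrum lies within both tolerances. `group_spectra` and the `(rt, precursor_mz)` tuple layout are assumptions.

```python
def group_spectra(spectra, mz_tol=1.0e-04, rt_tol=15.0):
    """Group spectra given as (rt, precursor_mz); returns lists of input indices."""
    order = sorted(range(len(spectra)), key=lambda i: spectra[i][0])
    groups = []  # each group stores its seed rt/mz and its member indices
    for i in order:
        rt, mz = spectra[i]
        for group in groups:
            if abs(group["mz"] - mz) <= mz_tol and abs(group["rt"] - rt) <= rt_tol:
                group["members"].append(i)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append({"rt": rt, "mz": mz, "members": [i]})
    return [group["members"] for group in groups]


spectra = [
    (100.0, 250.10000),
    (105.0, 250.10005),
    (130.0, 250.10002),  # same precursor, but more than 15 s from the seed
    (102.0, 400.20000),
]
groups = group_spectra(spectra)
print(groups)  # [[0, 1], [3], [2]]
```

The third spectrum shares its precursor m/z with the first group but starts its own group because it exceeds the RT tolerance.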

# MS2 Filter
## --filter_type: window_mower
<img src="https://pyopenms.readthedocs.io/en/latest/_images/window_mower.png" width="400"/>
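
As a rough illustration of the window mower idea, the sketch below keeps the `peakcount` most intense peaks in consecutive, non-overlapping m/z windows of width `windowsize`. This corresponds to a jump-style movement; the default "slide" movement uses overlapping windows and is not reproduced here. `window_mower_jump` is a hypothetical helper, and anchoring the first window at the lowest m/z is an assumption.

```python
def window_mower_jump(peaks, windowsize=50.0, peakcount=2):
    """Keep the top `peakcount` peaks per non-overlapping m/z window.

    `peaks` is a list of (mz, intensity) tuples.
    """
    if not peaks:
        return []
    peaks = sorted(peaks)
    first_mz = peaks[0][0]
    windows = {}
    for mz, intensity in peaks:
        index = int((mz - first_mz) // windowsize)  # window this peak falls into
        windows.setdefault(index, []).append((mz, intensity))
    kept = []
    for index in sorted(windows):
        top = sorted(windows[index], key=lambda p: p[1], reverse=True)[:peakcount]
        kept.extend(sorted(top))  # restore m/z order inside the window
    return kept


peaks = [
    (100.0, 10.0), (110.0, 50.0), (120.0, 30.0),
    (160.0, 5.0), (170.0, 80.0), (180.0, 20.0), (190.0, 60.0),
]
kept = window_mower_jump(peaks)
print(kept)  # [(110.0, 50.0), (120.0, 30.0), (170.0, 80.0), (190.0, 60.0)]
```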

## --filter_type: threshold_mower
<img src="https://pyopenms.readthedocs.io/en/latest/_images/threshold_mower.png" width="400"/>
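
The sketch below illustrates threshold-based filtering: peaks whose intensity is below the threshold are dropped. It assumes intensities are first scaled so that the most intense peak equals 1, which would make the default of 0.05 mean 5% of the base peak; whether the script normalizes spectra this way is not stated in this README. `normalize_to_max` and `threshold_mower` are hypothetical helpers.

```python
def normalize_to_max(peaks):
    """Scale intensities so that the most intense peak equals 1.0."""
    max_intensity = max(intensity for _, intensity in peaks)
    return [(mz, intensity / max_intensity) for mz, intensity in peaks]


def threshold_mower(peaks, threshold=0.05):
    """Drop peaks with intensity below `threshold`."""
    return [(mz, intensity) for mz, intensity in peaks if intensity >= threshold]


peaks = [(100.0, 2.0), (150.0, 100.0), (200.0, 4.0), (250.0, 40.0)]
kept = threshold_mower(normalize_to_max(peaks))
print([mz for mz, _ in kept])  # [150.0, 250.0]
```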

## --filter_type: nlargest
<img src="https://pyopenms.readthedocs.io/en/latest/_images/nlargest.png" width="400"/>
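
The sketch below illustrates the idea behind the `nlargest` filter: only the `n` most intense peaks of a spectrum are kept. `n_largest` is a hypothetical helper written with the standard library, not the call the script makes.

```python
import heapq


def n_largest(peaks, n=200):
    """Keep the `n` most intense (mz, intensity) peaks, returned in m/z order."""
    top = heapq.nlargest(n, peaks, key=lambda p: p[1])
    return sorted(top)


peaks = [(100.0, 10.0), (110.0, 50.0), (120.0, 30.0), (130.0, 5.0)]
kept = n_largest(peaks, n=2)
print(kept)  # [(110.0, 50.0), (120.0, 30.0)]
```

With the default of 200, spectra that contain fewer than 200 peaks pass through unchanged.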