Releases: maccoss/mars

MARS v0.1.4

28 Feb 22:31

Mars v0.1.4 Release Notes

Release Date: TBD

Overview

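This release makes it easier to pass mzML files on the command line: a --mzML alias, unquoted wildcards, positional file arguments, and a repeatable --mzml option. It contains no bug fixes.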

New Features

  • --mzML flag alias: The --mzml option now also accepts --mzML (matching the file extension casing) across all commands (calibrate, qc, apply).
  • Unquoted wildcards: Shell-expanded wildcards now work without quotes. For example, mars calibrate --mzml *.mzML works the same as mars calibrate --mzml "*.mzML".
  • Positional file arguments: mzML files can now be passed as positional arguments without the --mzml flag, e.g., mars calibrate *.mzML --prism-csv report.csv.
  • Repeatable --mzml: The --mzml option can be specified multiple times to pass individual files, e.g., --mzml a.mzML --mzml b.mzML.
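
All of the input styles above end up as one list of files. A minimal sketch of how such inputs could be merged, assuming a hypothetical helper `collect_mzml_paths` (not part of Mars; the real option handling may differ) that expands any wildcard the shell left unexpanded and drops duplicates:

```python
import glob
from pathlib import Path


def collect_mzml_paths(flag_values, positional):
    """Merge --mzml values and positional arguments into one ordered file list.

    Hypothetical helper for illustration only.
    """
    paths = []
    seen = set()
    for value in list(flag_values) + list(positional):
        # A quoted pattern such as "*.mzML" reaches the program unexpanded,
        # so expand it here; an already-expanded path passes through as-is.
        if any(ch in value for ch in "*?["):
            matches = sorted(glob.glob(value))
        else:
            matches = [value]
        for match in matches:
            key = str(Path(match))
            if key not in seen:
                seen.add(key)
                paths.append(key)
    return paths
```

With this shape, a quoted pattern, a shell-expanded list, and repeated --mzml values all produce the same result.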

Bug Fixes

(No bug fixes in this version)

Changes

  • The --mzml option now uses multiple=True internally, accepting one or more values.
  • All three subcommands (calibrate, qc, apply) accept a trailing [INPUT_FILES]... positional argument for mzML file paths.

Compatibility

  • Fully backward compatible with v0.1.3
  • Supported spectral library formats:
    • blib (BiblioSpec)
    • PRISM CSV
    • DIA-NN parquet (report-lib.parquet + report.parquet)
  • Output mzML files are compatible with:
    • DIA-NN
    • SeeMS (ProteoWizard)
    • MSConvert
    • Skyline
    • Other standard mzML readers

Upgrade Notes

pip install --upgrade mars-ms

MARS v0.1.3

31 Jan 00:20
96e96ad

Mars v0.1.3 Release Notes

Release Date: January 2026

Overview

This release adds support for DIA-NN parquet library files and fixes critical compatibility issues with mzML output files. The mzML writer has been completely rewritten to use a passthrough approach that preserves all original file metadata, ensuring compatibility with downstream tools like DIA-NN, SeeMS, and MSConvert.

New Features

DIA-NN Parquet Library Support

Mars now supports loading spectral libraries directly from DIA-NN parquet output files as an alternative to blib or PRISM CSV formats.

Usage:

mars calibrate --mzml input.mzML --library report-lib.parquet --output calibrated.mzML

How it works:

  • Library file (report-lib.parquet): Contains fragment ion information (m/z, ion types, charges) used for matching
  • Report file (report.parquet): Contains per-file retention time windows (RT.Start, RT.Stop) for each precursor
  • The report file is automatically detected in the same directory as the library file
  • If report.parquet is not found, Mars will exit with an error

Custom report location:

Use the --diann-report option to specify a different report file location:

mars calibrate --mzml input.mzML --library report-lib.parquet --diann-report /path/to/report.parquet
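
The report lookup described above can be sketched as follows. This is an illustration under stated assumptions, not Mars's actual code: `find_diann_report` is a hypothetical name, and raising `FileNotFoundError` stands in for however Mars reports the error before exiting.

```python
from pathlib import Path


def find_diann_report(library_path, report_path=None):
    """Return the report file to use for a DIA-NN library.

    Hypothetical helper: an explicit path (as with --diann-report) wins;
    otherwise look for report.parquet beside the library file.
    """
    if report_path is not None:
        candidate = Path(report_path)
    else:
        candidate = Path(library_path).parent / "report.parquet"
    if not candidate.is_file():
        raise FileNotFoundError(f"DIA-NN report file not found: {candidate}")
    return candidate
```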

File type detection:

Mars automatically detects if you accidentally provide the wrong file type (e.g., report.parquet instead of report-lib.parquet) based on column content, not filename. You'll receive a helpful error message pointing to the correct file.

Bug Fixes

Fixed: DIA-NN Compatibility

The previous psims-based writer generated mzML files that DIA-NN could not read. The issue was caused by differences in:

  • CV reference IDs: psims uses cvRef="PSI-MS" while ProteoWizard uses cvRef="MS"
  • Missing metadata: Thermo nativeID format, instrument configuration, and other CV terms were not preserved
  • Altered file structure: The psims writer generated a different XML structure than the original

The new passthrough writer preserves the original file structure byte-for-byte, only modifying the m/z binary data for MS2 spectra.

Fixed: SeeMS Metadata Display

SeeMS now correctly displays spectrum metadata in separate columns (Controllertype, Controllernumber, Scan) instead of a combined ID field. This is because the original Thermo nativeID format CV term is now preserved.

Fixed: Broken mzML Output Files

The original lxml-based writer caused several issues:

  • Invalid index offsets: The <indexList> section contained stale byte offsets after XML rewriting
  • XML formatting changes: Attribute reordering and whitespace changes could break strict parsers

Changes

New Passthrough mzML Writer

The new writer uses a fundamentally different approach:

  1. Preserves original file exactly - All metadata, CV terms, XML structure, and formatting are kept unchanged
  2. Only modifies m/z binary data - For MS2 spectra, the m/z array is decoded, calibrated, and re-encoded
  3. Regenerates index - Byte offsets are recalculated after m/z data changes
  4. Regenerates checksum - The SHA-1 file checksum is updated

This ensures maximum compatibility with all downstream tools.
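
Steps 3 and 4 can be illustrated with a short sketch. It assumes the usual indexed mzML layout, where each index offset points at the first byte of a `<spectrum ...>` tag and the SHA-1 covers the file from its start through the opening `<fileChecksum>` tag. The regex scan and both function names are simplifications chosen for this example, not Mars's implementation.

```python
import hashlib
import re

# Matches an opening <spectrum ...> tag (not <spectrumList>) and captures its id.
SPECTRUM_TAG = re.compile(rb'<spectrum\s[^>]*?\bid="([^"]+)"')


def spectrum_offsets(mzml_bytes):
    """Map each spectrum id to the byte offset of its opening <spectrum tag."""
    return {m.group(1).decode(): m.start() for m in SPECTRUM_TAG.finditer(mzml_bytes)}


def file_checksum(mzml_bytes):
    """SHA-1 over everything up to and including the opening <fileChecksum> tag."""
    marker = b"<fileChecksum>"
    end = mzml_bytes.index(marker) + len(marker)
    return hashlib.sha1(mzml_bytes[:end]).hexdigest()
```

Because re-encoded m/z arrays usually change length, every offset after the first modified spectrum shifts, which is why the index has to be rebuilt before the checksum is computed.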

Wide-Window MS2 Spectra Handling

  • When --max-isolation-window is specified, MS2 spectra exceeding that width are left unchanged (not calibrated)
  • The spectra remain in the output file but with original m/z values
  • This differs from the previous behavior where wide-window spectra were excluded entirely

Dependencies

  • Added lxml for XML parsing and serialization
  • psims is no longer used for mzML writing (still available as a dependency)

Technical Details

The new writer workflow:

  1. Reads the original mzML file as raw bytes
  2. Parses the mzML content with lxml while preserving structure
  3. Reads spectrum metadata with pyteomics for calibration calculations
  4. For each MS2 spectrum:
    • Decodes the m/z binary array (base64 + zlib)
    • Applies calibration function
    • Re-encodes with same compression settings
    • Updates the encodedLength attribute
  5. MS1 spectra and chromatograms remain completely unchanged
  6. Regenerates the index with correct byte offsets
  7. Recalculates the SHA-1 file checksum
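
The per-spectrum part of step 4 can be sketched with the standard library alone. This is a minimal illustration that assumes 64-bit little-endian floats; `recalibrate_encoded_mz` and its `correction` callable are hypothetical names, and a real file may use 32-bit floats or no compression.

```python
import base64
import struct
import zlib


def recalibrate_encoded_mz(encoded, correction, compressed=True):
    """Decode a base64 m/z array, correct each value, and re-encode it.

    `correction` is a hypothetical callable mapping an m/z value to its
    calibrated value. Returns the new text and its length.
    """
    raw = base64.b64decode(encoded)
    if compressed:
        raw = zlib.decompress(raw)
    count = len(raw) // 8  # 8 bytes per 64-bit float (assumed precision)
    mz = struct.unpack(f"<{count}d", raw)
    calibrated = struct.pack(f"<{count}d", *(correction(x) for x in mz))
    if compressed:
        # Keep the same compression setting as the input array.
        calibrated = zlib.compress(calibrated)
    new_encoded = base64.b64encode(calibrated).decode("ascii")
    # The caller would write len(new_encoded) into the encodedLength attribute.
    return new_encoded, len(new_encoded)
```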

Preserved Metadata

The following are now correctly preserved from the original file:

  • Thermo nativeID format CV term
  • Thermo RAW format CV term
  • Source file references (RAW file path, SHA-1)
  • Instrument configuration (Stellar, serial number)
  • Sample information
  • All spectrum CV parameters (base peak, TIC, filter string, etc.)
  • All chromatograms (TIC, pump pressure, etc.)
  • XML namespaces and schema locations

Compatibility

  • Fully backward compatible with v0.1.2
  • Supported spectral library formats:
    • blib (BiblioSpec)
    • PRISM CSV
    • DIA-NN parquet (report-lib.parquet + report.parquet) NEW
  • Output mzML files are compatible with:
    • DIA-NN
    • SeeMS (ProteoWizard)
    • MSConvert
    • Skyline
    • Other standard mzML readers
  • The calibration model format is unchanged

Upgrade Notes

Update to v0.1.3 and re-run calibration to regenerate valid mzML files. No other changes are needed.

pip install --upgrade mars-ms

MARS v0.1.2

15 Jan 06:01
bfe64d1

Mars v0.1.2 Release Notes

Release Date: January 2026

Overview

This release adds support for high-resolution Orbitrap/Astral analyzer data with PPM-based matching and visualization, plus major performance improvements for large PRISM libraries. Mars can now handle both Stellar Ion Trap data (Th-scale errors) and Astral analyzer data (ppm-scale errors) with automatic detection.

New Features

PPM Tolerance Support

  • New --tolerance-ppm CLI option for fragment matching in ppm (e.g., --tolerance-ppm 10 for ±10 ppm)
  • When specified, overrides the default --tolerance (Th) parameter
  • PPM tolerance scales dynamically with m/z, appropriate for high-resolution Orbitrap data
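
To make the scaling concrete, here is a small sketch of how a ppm tolerance translates into an absolute window. The function names are hypothetical and chosen for this example only.

```python
def ppm_to_th(mz, tolerance_ppm):
    """Convert a ppm tolerance into an absolute window (Th) at a given m/z."""
    return mz * tolerance_ppm / 1e6


def matches_within_ppm(observed_mz, expected_mz, tolerance_ppm):
    """True when the observed peak lies within the ppm window around the expected m/z."""
    return abs(observed_mz - expected_mz) <= ppm_to_th(expected_mz, tolerance_ppm)
```

At 10 ppm the window is ±0.005 Th at m/z 500 and ±0.010 Th at m/z 1000, whereas a fixed Th tolerance is the same width everywhere.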

Delta PPM Metrics

  • All match DataFrames now include both delta_mz (Th) and delta_ppm columns
  • After calibration, delta_ppm_calibrated is computed alongside delta_mz_calibrated
  • Logging shows statistics in both units for easier comparison

Adaptive QC Visualization

  • Auto-detection of ppm vs Th mode based on MAD (Median Absolute Deviation):
    • If MAD < 0.05 Th → ppm mode (high-resolution data)
    • If MAD ≥ 0.05 Th → Th mode (unit-resolution data)
  • All hexbin QC plots updated with use_ppm parameter:
    • Histogram
    • Heatmap (RT × fragment m/z)
    • Intensity vs error
    • RT vs error
    • Fragment m/z vs error
    • TIC vs error
    • Injection time vs error
    • TIC×Injection time vs error
    • Fragment ions vs error
    • Temperature vs error
    • Adjacent ion feature plots
  • Y-axis limits automatically adjust:
    • ppm mode: ±25 ppm
    • Th mode: ±0.25 Th
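
The mode selection can be sketched as below, using the 0.05 Th threshold stated above. MAD here means the median of absolute deviations from the median, so a constant offset in the errors does not push high-resolution data into Th mode. The function names are hypothetical.

```python
import statistics


def median_absolute_deviation(values):
    """MAD: median of |x - median(x)|."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)


def detect_error_mode(delta_mz, threshold_th=0.05):
    """Return "ppm" for tightly distributed errors, "Th" otherwise."""
    return "ppm" if median_absolute_deviation(delta_mz) < threshold_th else "Th"
```

For example, errors of 0.199, 0.200, and 0.201 Th have a median absolute error of 0.2 Th but a MAD of about 0.001 Th, so this sketch selects ppm mode.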

Performance Improvements

Automatic Replicate Filtering

  • PRISM library loading now automatically filters to only the replicates matching the mzML files being processed
  • Previously, large multi-replicate PRISM exports (e.g., 67M rows) would load entirely; now only relevant rows are processed
  • This dramatically reduces load time and memory usage for large studies

Optimized PRISM Library Loading

  • Column-selective loading: Only loads required columns, reducing I/O and memory
  • Vectorized replicate filtering: Uses pandas string methods instead of row-by-row apply()
  • Vectorized fragment parsing: Ion type, number, and loss type parsed in bulk
  • Faster iteration: Uses itertuples() instead of iterrows() (5-10x faster)
  • Progress logging: Reports progress every 50,000 peptides for large libraries
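
A minimal pandas sketch of these loading techniques follows. It assumes that a replicate is matched to an mzML file by the file's stem, which may not be how Mars does it, and it uses only the three PRISM columns named in these notes; `load_prism_rows` is a hypothetical name.

```python
from pathlib import Path

import pandas as pd

# Columns named in these release notes; a real export has many more.
REQUIRED_COLUMNS = ["Replicate Name", "Start Time", "End Time"]


def load_prism_rows(csv_source, mzml_paths):
    """Load only the needed columns and keep rows for the given mzML files."""
    stems = {Path(p).stem for p in mzml_paths}  # assumed matching rule
    # Column-selective loading: skip everything not listed in usecols.
    df = pd.read_csv(csv_source, usecols=REQUIRED_COLUMNS)
    # Vectorized replicate filter instead of a row-by-row apply().
    df = df.loc[df["Replicate Name"].isin(stems), REQUIRED_COLUMNS]
    # itertuples() avoids building a Series per row, unlike iterrows().
    return list(df.itertuples(index=False, name=None))
```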

Faster mzML Calibration

  • Vectorized space charge feature computation: The _compute_ions_in_range_vectorized function now uses fully vectorized NumPy operations instead of a Python for-loop
  • Uses np.searchsorted with array inputs to compute all intensity range sums simultaneously
  • Significantly improves calibration speed when writing large mzML files with many spectra
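
One way to compute all range sums at once with `np.searchsorted` is to pair it with a cumulative intensity sum, as sketched below. This illustrates the general technique rather than the body of `_compute_ions_in_range_vectorized`; the function name and signature are assumptions, and the spectrum m/z array must be sorted in ascending order.

```python
import numpy as np


def ions_in_ranges(spec_mz, spec_intensity, fragment_mz, lo, hi):
    """Sum intensity in (X + lo, X + hi] for every fragment m/z X at once."""
    spec_mz = np.asarray(spec_mz, dtype=float)
    frag = np.asarray(fragment_mz, dtype=float)
    # cumulative[i] is the total intensity of the first i peaks.
    cumulative = np.concatenate(([0.0], np.cumsum(spec_intensity)))
    # side="right" counts peaks with m/z <= bound, giving half-open (lo, hi] bins.
    start = np.searchsorted(spec_mz, frag + lo, side="right")
    stop = np.searchsorted(spec_mz, frag + hi, side="right")
    return cumulative[stop] - cumulative[start]
```

Each call costs two binary searches per fragment plus one cumulative sum per spectrum, with no Python-level loop over fragments.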

Dependabot Integration

  • Added .github/dependabot.yml for automated dependency updates
  • Monitors both Python (pip) and GitHub Actions dependencies weekly

Bug Fixes

  • Fixed auto-detection logic to use proper MAD (Median Absolute Deviation) instead of median(|delta_mz|) for determining visualization mode
  • Fixed DtypeWarning when loading large PRISM CSVs with mixed column types
  • Hexbin plots now use linear color scale (except injection time vs error which uses log scale)

Usage Examples

Stellar Ion Trap (Th-based, default)

mars calibrate \
  --mzml data.mzML \
  --prism-csv report.csv \
  --tolerance 0.3 \
  --output-dir output/

Astral Analyzer (ppm-based)

mars calibrate \
  --mzml data.mzML \
  --prism-csv report.csv \
  --tolerance-ppm 10 \
  --output-dir output/

Large Multi-Replicate Studies

# Mars automatically filters the PRISM library to only the replicates matching the mzML files being processed
mars calibrate \
  --mzml "plasma_samples/*.mzML" \
  --prism-csv full_study_prism_export.csv \
  --tolerance-ppm 10 \
  --output-dir output/

Technical Notes

  • The model still trains on delta_mz (Th) internally, as the XGBoost model works in absolute units
  • PPM conversion is applied at the matching and visualization stages
  • Temperature-based features remain relevant for Stellar data but are typically not present in Astral mzML files

Compatibility

  • Fully backward compatible with v0.1.1
  • Existing workflows using --tolerance (Th) continue to work unchanged
  • New Astral/Orbitrap workflows can use --tolerance-ppm

Space Charge Modeling Improvements

Ions Below Features

Added 6 new features to capture space charge effects from ions below the fragment m/z:

  • ions_below_0_1 - Total ions in (X-1.5, X-0.5] Th range below fragment m/z
  • ions_below_1_2 - Total ions in (X-2.5, X-1.5] Th range below fragment m/z
  • ions_below_2_3 - Total ions in (X-3.5, X-2.5] Th range below fragment m/z
  • adjacent_ratio_below_0_1 - ions_below_0_1 / fragment_ions
  • adjacent_ratio_below_1_2 - ions_below_1_2 / fragment_ions
  • adjacent_ratio_below_2_3 - ions_below_2_3 / fragment_ions

These complement the existing "ions above" features and provide a more complete picture of the local ion environment affecting each fragment.

New QC Plots

Added 6 new QC visualization plots for the ions below features:

  • mars_qc_ions_below_-1.5_-0.5_vs_error.png
  • mars_qc_ions_below_-2.5_-1.5_vs_error.png
  • mars_qc_ions_below_-3.5_-2.5_vs_error.png
  • mars_qc_adjacent_ratio_-1.5_-0.5_vs_error.png
  • mars_qc_adjacent_ratio_-2.5_-1.5_vs_error.png
  • mars_qc_adjacent_ratio_-3.5_-2.5_vs_error.png

Isotope-Centered m/z Bins

The space charge feature bins have been shifted by 0.5 Th to better center on isotope patterns for 1+ charge fragments:

Old Range  | New Range      | Description
(X, X+1]   | (X+0.5, X+1.5] | First isotope region above
(X+1, X+2] | (X+1.5, X+2.5] | Second isotope region above
(X+2, X+3] | (X+2.5, X+3.5] | Third isotope region above

The bins now reference the monoisotopic fragment m/z (expected m/z from library) rather than the observed m/z for more consistent feature calculation.

Space Charge Features Applied During Calibration

Previously, space charge features (ions_above_*, adjacent_ratio_*) were only computed during model training but set to zero during mzML calibration. Now all 12 space charge features are computed for every peak when writing calibrated mzML files, ensuring the model's learned corrections are fully applied.

Updated Feature Display Names

Feature importance plots and QC plot filenames now use the new bin naming convention:

  • ions_above_0.5_1.5 (was ions_above_0_1)
  • ions_above_1.5_2.5 (was ions_above_1_2)
  • ions_above_2.5_3.5 (was ions_above_2_3)
  • adjacent_ratio_0.5_1.5 (was adjacent_ratio_0_1)
  • etc.

Model Features

The model now supports up to 22 features (when all data is available):

  1. precursor_mz - DIA isolation window center
  2. fragment_mz - Fragment ion m/z
  3. log_tic - Log10 of total ion current
  4. log_intensity - Log10 of observed peak intensity
  5. absolute_time - Unix timestamp of acquisition
  6. injection_time - Ion injection time (seconds)
  7. tic_injection_time - TIC × injection time product
  8. fragment_ions - Fragment intensity × injection time
  9. ions_above_0_1, ions_above_1_2, ions_above_2_3 - Ions in ranges above
  10. ions_below_0_1, ions_below_1_2, ions_below_2_3 - Ions in ranges below
  11. adjacent_ratio_0_1, adjacent_ratio_1_2, adjacent_ratio_2_3 - Ratios above
  12. adjacent_ratio_below_0_1, adjacent_ratio_below_1_2, adjacent_ratio_below_2_3 - Ratios below
  13. rfa2_temp - RF amplifier temperature
  14. rfc2_temp - RF electronics temperature

MARS v0.1.1

13 Jan 08:52

Mars v0.1.1 Release Notes

Overview

Patch release v0.1.1 focuses on code quality improvements, fixing reported linting issues, and minor bug fixes in the command-line interface.

Changes

CLI Improvements

  • Added missing --max-isolation-window parameter to the mars qc command, allowing consistent filtering between calibration and QC steps.

Code Quality

  • Addressed multiple linting issues identified by ruff.
  • Updated obsolete string formatting to use modern f-strings in mzml.py.
  • Fixed potential logic errors by ensuring strict iterables in zip() calls.
  • Removed unused variables and unnecessary function arguments.
  • Improved code readability and standard compliance.

MARS v0.1.0

13 Jan 08:23

Mars v0.1.0 Release Notes

Overview

Initial release of Mars (Mass Accuracy Recalibration System), a mass calibration tool for Thermo Stellar unit resolution DIA mass spectrometry data. Mars learns m/z calibration corrections from spectral library fragment matches using an XGBoost model.

Features

Mars uses a machine learning approach to predict m/z corrections based on:

  • Fragment m/z: Mass-dependent calibration bias
  • Peak intensity: Higher intensity peaks provide more reliable calibration
  • Absolute time: Calibration drift over the acquisition run
  • Spectrum TIC: Space charge effects from high ion current
  • Ion injection time: Signal accumulation duration effects
  • Precursor m/z: DIA isolation window-specific effects
  • Adjacent ion population: Ion density in neighboring m/z ranges (0-1, 1-2, 2-3 Th above)
  • Adjacent ion ratios: Relative ion density (adjacent ions / fragment ions)
  • RF temperatures: Thermal effects from RF amplifier (RFA2) and electronics (RFC2)

Fragment Matching

  • Matches library peptides to DIA MS2 spectra using precursor m/z and RT windows
  • Selects the most intense peak within m/z tolerance (not closest)
  • Configurable minimum intensity threshold
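
The peak selection rule can be sketched as follows. The function name and the (m/z, intensity) tuple layout are assumptions made for this example.

```python
def match_fragment(peaks, expected_mz, tolerance, min_intensity=0.0):
    """Pick the most intense peak within ±tolerance of the expected m/z.

    `peaks` is an iterable of (mz, intensity) pairs. Returns None when no
    peak passes both the tolerance and the intensity threshold.
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in peaks
        if abs(mz - expected_mz) <= tolerance and intensity >= min_intensity
    ]
    if not candidates:
        return None
    # Most intense wins, even if another candidate is closer in m/z.
    return max(candidates, key=lambda peak: peak[1])
```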

PRISM Integration

  • Optional --prism-csv flag for using exact Skyline RT windows (Start Time, End Time)
  • Falls back to ±5 seconds around library RT when PRISM CSV not provided

Batch Processing

  • Process multiple mzML files with glob patterns (--mzml "*.mzML")
  • Process entire directories with --mzml-dir

QC Reports

Generated quality control outputs include:

  • Delta m/z distribution histogram with MAD and RMS statistics (before/after calibration)
  • 2D heatmap visualization (RT × m/z, color = delta)
  • Hexbin density plots (intensity, RT, m/z, injection time, TIC, fragment ions vs mass error)
  • Model feature importance plot
  • Calibration statistics summary

Output Files

File                                    | Description
{input}-mars.mzML                       | Recalibrated mzML file
mars_model.pkl                          | Trained XGBoost calibration model
mars_qc_histogram.png                   | Delta m/z distribution (before/after)
mars_qc_heatmap.png                     | 2D heatmap (RT × m/z, color = delta)
mars_qc_intensity_vs_error.png          | Intensity vs mass error hexbin
mars_qc_rt_vs_error.png                 | RT vs mass error hexbin
mars_qc_mz_vs_error.png                 | Fragment m/z vs mass error hexbin
mars_qc_tic_vs_error.png                | TIC vs mass error hexbin
mars_qc_injection_time_vs_error.png     | Injection time vs mass error hexbin
mars_qc_tic_injection_time_vs_error.png | TIC×injection time vs mass error hexbin
mars_qc_fragment_ions_vs_error.png      | Fragment ions vs mass error hexbin
mars_qc_rfa2_temperature_vs_error.png   | RFA2 temperature vs error (if available)
mars_qc_rfc2_temperature_vs_error.png   | RFC2 temperature vs error (if available)
mars_qc_feature_importance.png          | Model feature importance
mars_qc_summary.txt                     | Calibration statistics

Installation

git clone https://github.com/maccoss/mars.git
cd mars
pip install -e .

Or from PyPI:

pip install mars-ms==0.1.0

Requirements

  • Python 3.10+
  • Spectral library in blib format from Skyline
  • mzML files from Thermo Stellar (or similar unit resolution instrument)
  • PRISM CSV (optional): Skyline report with Start Time, End Time, Replicate Name columns

License

MIT