Releases: maccoss/mars
MARS v0.1.4
Mars v0.1.4 Release Notes
Release Date: TBD
Overview
This release adds more flexible ways to pass mzML input files on the command line. It contains no bug fixes.
New Features
- --mzML flag alias: The --mzml option now also accepts --mzML (matching the file extension casing) across all commands (calibrate, qc, apply).
- Unquoted wildcards: Shell-expanded wildcards now work without quotes. For example, mars calibrate --mzml *.mzML works the same as mars calibrate --mzml "*.mzML".
- Positional file arguments: mzML files can now be passed as positional arguments without the --mzml flag, e.g., mars calibrate *.mzML --prism-csv report.csv.
- Repeatable --mzml: The --mzml option can be specified multiple times to pass individual files, e.g., --mzml a.mzML --mzml b.mzML.
Bug Fixes
(No bug fixes this version)
Changes
- The --mzml option now uses multiple=True internally, accepting one or more values.
- All three subcommands (calibrate, qc, apply) accept a trailing [INPUT_FILES]... positional argument for mzML file paths.
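As an illustration of how the quoted, unquoted, positional, and repeated forms could all resolve to the same file list, here is a minimal sketch. The function name resolve_mzml_inputs and the de-duplication step are assumptions made for this example and are not taken from the Mars source.

```python
import glob
from pathlib import Path


def resolve_mzml_inputs(option_values, positional=()):
    """Merge --mzml values and positional paths into one ordered, de-duplicated list."""
    resolved, seen = [], set()
    for value in [*option_values, *positional]:
        # A quoted pattern ("*.mzML") reaches the program unexpanded, so glob it here.
        # Shell-expanded paths contain no wildcard characters and pass through as-is.
        has_wildcard = any(ch in value for ch in "*?[")
        matches = sorted(glob.glob(value)) if has_wildcard else [value]
        for path in matches:
            key = str(Path(path))
            if key not in seen:  # hypothetical: drop files that were passed twice
                seen.add(key)
                resolved.append(key)
    return resolved
```

With this shape, a quoted pattern and the equivalent shell-expanded list produce identical results, which is the behavior the notes above describe.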
Compatibility
- Fully backward compatible with v0.1.3
- Supported spectral library formats:
  - blib (BiblioSpec)
  - PRISM CSV
  - DIA-NN parquet (report-lib.parquet + report.parquet)
- Output mzML files are compatible with:
  - DIA-NN
  - SeeMS (ProteoWizard)
  - MSConvert
  - Skyline
  - Other standard mzML readers
Upgrade Notes
pip install --upgrade mars-ms
MARS v0.1.3
Mars v0.1.3 Release Notes
Release Date: January 2026
Overview
This release adds support for DIA-NN parquet library files and fixes critical compatibility issues with mzML output files. The mzML writer has been completely rewritten to use a passthrough approach that preserves all original file metadata, ensuring compatibility with downstream tools like DIA-NN, SeeMS, and MSConvert.
New Features
DIA-NN Parquet Library Support
Mars now supports loading spectral libraries directly from DIA-NN parquet output files as an alternative to blib or PRISM CSV formats.
Usage:
mars calibrate --mzml input.mzML --library report-lib.parquet --output calibrated.mzML
How it works:
- Library file (report-lib.parquet): Contains fragment ion information (m/z, ion types, charges) used for matching
- Report file (report.parquet): Contains per-file retention time windows (RT.Start, RT.Stop) for each precursor
- The report file is automatically detected in the same directory as the library file
- If report.parquet is not found, Mars will exit with an error
Optional filtering:
Use the --diann-report option to specify a different report file location:
mars calibrate --mzml input.mzML --library report-lib.parquet --diann-report /path/to/report.parquet
File type detection:
Mars automatically detects if you accidentally provide the wrong file type (e.g., report.parquet instead of report-lib.parquet) based on column content, not filename. You'll receive a helpful error message pointing to the correct file.
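The report lookup described in this section (an explicit --diann-report path if given, otherwise report.parquet next to the library file, and an error if neither exists) can be sketched as follows. The function name find_diann_report and the choice of FileNotFoundError are assumptions for illustration only.

```python
from pathlib import Path


def find_diann_report(library_path, diann_report=None):
    """Return the report.parquet path paired with a DIA-NN library file."""
    if diann_report:
        # An explicit --diann-report value takes precedence over auto-detection.
        candidate = Path(diann_report)
    else:
        # Default: look for report.parquet in the library file's directory.
        candidate = Path(library_path).parent / "report.parquet"
    if not candidate.is_file():
        raise FileNotFoundError(f"DIA-NN report not found: {candidate}")
    return candidate
```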
Bug Fixes
Fixed: DIA-NN Compatibility
The previous psims-based writer generated mzML files that DIA-NN could not read. The issue was caused by differences in:
- CV reference IDs: psims uses cvRef="PSI-MS" while ProteoWizard uses cvRef="MS"
- Missing metadata: Thermo nativeID format, instrument configuration, and other CV terms were not preserved
- Altered file structure: The psims writer generated a different XML structure than the original
The new passthrough writer preserves the original file structure byte-for-byte, only modifying the m/z binary data for MS2 spectra.
Fixed: SeeMS Metadata Display
SeeMS now correctly displays spectrum metadata in separate columns (Controllertype, Controllernumber, Scan) instead of a combined ID field. This is because the original Thermo nativeID format CV term is now preserved.
Fixed: Broken mzML Output Files
The original lxml-based writer caused several issues:
- Invalid index offsets: The <indexList> section contained stale byte offsets after XML rewriting
- XML formatting changes: Attribute reordering and whitespace changes could break strict parsers
Changes
New Passthrough mzML Writer
The new writer uses a fundamentally different approach:
- Preserves original file exactly - All metadata, CV terms, XML structure, and formatting are kept unchanged
- Only modifies m/z binary data - For MS2 spectra, the m/z array is decoded, calibrated, and re-encoded
- Regenerates index - Byte offsets are recalculated after m/z data changes
- Regenerates checksum - The SHA-1 file checksum is updated
This ensures maximum compatibility with all downstream tools.
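The index and checksum regeneration can be sketched as below. This assumes the standard indexed mzML conventions: each index offset points at the first byte of a <spectrum> opening tag, and the SHA-1 covers the file from its start through the opening <fileChecksum> tag. The regex-based scan and both function names are simplifications chosen for this example, not the Mars implementation.

```python
import hashlib
import re

# Matches a <spectrum ...> opening tag (not <spectrumList>) and captures its id.
SPECTRUM_TAG = re.compile(rb'<spectrum\s[^>]*?\bid="([^"]*)"')
CHECKSUM_TAG = b"<fileChecksum>"


def spectrum_offsets(mzml_bytes):
    """Return (spectrum id, byte offset of its opening tag) pairs."""
    return [(m.group(1).decode(), m.start()) for m in SPECTRUM_TAG.finditer(mzml_bytes)]


def file_checksum(mzml_bytes):
    """SHA-1 of everything up to and including the opening <fileChecksum> tag."""
    end = mzml_bytes.index(CHECKSUM_TAG) + len(CHECKSUM_TAG)
    return hashlib.sha1(mzml_bytes[:end]).hexdigest()
```

Because re-encoding an m/z array changes its byte length, every offset after the first modified spectrum shifts, which is why the offsets have to be recomputed from the final bytes rather than patched incrementally.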
Wide-Window MS2 Spectra Handling
- When --max-isolation-window is specified, MS2 spectra exceeding that width are left unchanged (not calibrated)
- The spectra remain in the output file but with original m/z values
- This differs from the previous behavior where wide-window spectra were excluded entirely
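The rule above amounts to a per-spectrum decision between recalibrating and passing through. A minimal sketch, with a hypothetical helper name and argument names that are not taken from the Mars source:

```python
def should_calibrate(ms_level, isolation_width, max_isolation_window=None):
    """Decide whether a spectrum's m/z array is recalibrated or passed through."""
    if ms_level != 2:
        return False  # only MS2 spectra are modified
    if max_isolation_window is None:
        return True  # no width limit was requested
    # Wide-window MS2 spectra stay in the output with their original m/z values.
    return isolation_width <= max_isolation_window
```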
Dependencies
- Added lxml for XML parsing and serialization
- psims is no longer used for mzML writing (still available as a dependency)
Technical Details
The new writer workflow:
- Reads the original mzML file as raw bytes
- Parses the mzML content with lxml while preserving structure
- Reads spectrum metadata with pyteomics for calibration calculations
- For each MS2 spectrum:
  - Decodes the m/z binary array (base64 + zlib)
  - Applies calibration function
  - Re-encodes with same compression settings
  - Updates the encodedLength attribute
- MS1 spectra and chromatograms remain completely unchanged
- Regenerates the index with correct byte offsets
- Recalculates the SHA-1 file checksum
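The per-spectrum step in this workflow (decode, calibrate, re-encode, update encodedLength) can be illustrated with the standard library alone. This is a sketch under stated assumptions: the function name recalibrate_binary is hypothetical, calibrate is any callable mapping an m/z value to its corrected value, and the array is assumed to be little-endian floats as in standard mzML.

```python
import base64
import struct
import zlib


def recalibrate_binary(encoded, calibrate, compressed=True, fmt="d"):
    """Decode an mzML m/z array, apply `calibrate` to each value, and re-encode it.

    fmt is "d" for 64-bit or "f" for 32-bit floats. Returns the new base64
    text and its length, which is the value for the encodedLength attribute.
    """
    raw = base64.b64decode(encoded)
    if compressed:
        raw = zlib.decompress(raw)
    count = len(raw) // struct.calcsize("<" + fmt)
    mz = struct.unpack(f"<{count}{fmt}", raw)  # little-endian floats
    packed = struct.pack(f"<{count}{fmt}", *(calibrate(m) for m in mz))
    if compressed:
        packed = zlib.compress(packed)  # keep the original compression setting
    new_text = base64.b64encode(packed).decode("ascii")
    return new_text, len(new_text)
```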
Preserved Metadata
The following are now correctly preserved from the original file:
- Thermo nativeID format CV term
- Thermo RAW format CV term
- Source file references (RAW file path, SHA-1)
- Instrument configuration (Stellar, serial number)
- Sample information
- All spectrum CV parameters (base peak, TIC, filter string, etc.)
- All chromatograms (TIC, pump pressure, etc.)
- XML namespaces and schema locations
Compatibility
- Fully backward compatible with v0.1.2
- Supported spectral library formats:
  - blib (BiblioSpec)
  - PRISM CSV
  - DIA-NN parquet (report-lib.parquet + report.parquet) NEW
- Output mzML files are compatible with:
  - DIA-NN
  - SeeMS (ProteoWizard)
  - MSConvert
  - Skyline
  - Other standard mzML readers
- The calibration model format is unchanged
Upgrade Notes
No action required. Simply update to v0.1.3 and re-run calibration to generate valid mzML files.
pip install --upgrade mars-ms
MARS v0.1.2
Mars v0.1.2 Release Notes
Release Date: January 2026
Overview
This release adds support for high-resolution Orbitrap/Astral analyzer data with PPM-based matching and visualization, plus major performance improvements for large PRISM libraries. Mars can now handle both Stellar Ion Trap data (Th-scale errors) and Astral analyzer data (ppm-scale errors) with automatic detection.
New Features
PPM Tolerance Support
- New --tolerance-ppm CLI option for fragment matching in ppm (e.g., --tolerance-ppm 10 for ±10 ppm)
- When specified, overrides the default --tolerance (Th) parameter
- PPM tolerance scales dynamically with m/z, appropriate for high-resolution Orbitrap data
Delta PPM Metrics
- All match DataFrames now include both delta_mz (Th) and delta_ppm columns
- After calibration, delta_ppm_calibrated is computed alongside delta_mz_calibrated
- Logging shows statistics in both units for easier comparison
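The relationship between the Th and ppm scales used above is a simple proportion. The helper names below are hypothetical, and the sign convention (observed minus expected) is an assumption, since the notes do not state it.

```python
def ppm_to_th(mz, ppm):
    """Half-width in Th of a ±ppm tolerance window at a given m/z."""
    return mz * ppm / 1e6


def delta_ppm(observed_mz, expected_mz):
    """Mass error in ppm relative to the expected (library) m/z."""
    return (observed_mz - expected_mz) / expected_mz * 1e6
```

For example, a 10 ppm tolerance corresponds to 0.005 Th at m/z 500 and 0.01 Th at m/z 1000, which is what "scales dynamically with m/z" means in practice.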
Adaptive QC Visualization
- Auto-detection of ppm vs Th mode based on MAD (Median Absolute Deviation):
  - If MAD < 0.05 Th → ppm mode (high-resolution data)
  - If MAD ≥ 0.05 Th → Th mode (unit-resolution data)
- All hexbin QC plots updated with use_ppm parameter:
  - Histogram
  - Heatmap (RT × fragment m/z)
  - Intensity vs error
  - RT vs error
  - Fragment m/z vs error
  - TIC vs error
  - Injection time vs error
  - TIC×Injection time vs error
  - Fragment ions vs error
  - Temperature vs error
  - Adjacent ion feature plots
- Y-axis limits automatically adjust:
  - ppm mode: ±25 ppm
  - Th mode: ±0.25 Th
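The mode selection can be sketched with the standard library. The function names are hypothetical; the 0.05 Th threshold and the MAD definition come from the notes above.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


def use_ppm_mode(delta_mz_th, threshold_th=0.05):
    """True when the spread of mass errors is small enough to plot in ppm."""
    return mad(delta_mz_th) < threshold_th
```

Because MAD measures spread around the median, data with a large constant offset but a tight distribution is still classified as high-resolution. A plain median of |delta_mz| would misclassify such data as Th mode.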
Performance Improvements
Automatic Replicate Filtering
- PRISM library loading now automatically filters to only the replicates matching the mzML files being processed
- Previously, large multi-replicate PRISM exports (e.g., 67M rows) would load entirely; now only relevant rows are processed
- This dramatically reduces load time and memory usage for large studies
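The idea behind replicate filtering can be shown in plain Python. This sketch assumes that a row's Replicate Name equals the mzML file name without its extension; the notes do not state the matching rule, and Mars itself does this filtering with vectorized pandas operations rather than row by row.

```python
import csv
import io
from pathlib import Path


def filter_replicates(csv_text, mzml_paths, column="Replicate Name"):
    """Keep only library rows whose replicate matches one of the mzML files."""
    # Assumption for illustration: replicate name == mzML file stem.
    wanted = {Path(p).stem for p in mzml_paths}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column] in wanted]
```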
Optimized PRISM Library Loading
- Column-selective loading: Only loads required columns, reducing I/O and memory
- Vectorized replicate filtering: Uses pandas string methods instead of row-by-row apply()
- Vectorized fragment parsing: Ion type, number, and loss type parsed in bulk
- Faster iteration: Uses itertuples() instead of iterrows() (5-10x faster)
- Progress logging: Reports progress every 50,000 peptides for large libraries
Faster mzML Calibration
- Vectorized space charge feature computation: The _compute_ions_in_range_vectorized function now uses fully vectorized NumPy operations instead of a Python for-loop
- Uses np.searchsorted with array inputs to compute all intensity range sums simultaneously
- Significantly improves calibration speed when writing large mzML files with many spectra
Dependabot Integration
- Added .github/dependabot.yml for automated dependency updates
- Monitors both Python (pip) and GitHub Actions dependencies weekly
Bug Fixes
- Fixed auto-detection logic to use proper MAD (Median Absolute Deviation) instead of median(|delta_mz|) for determining visualization mode
- Fixed DtypeWarning when loading large PRISM CSVs with mixed column types
- Hexbin plots now use linear color scale (except injection time vs error which uses log scale)
Usage Examples
Stellar Ion Trap (Th-based, default)
mars calibrate \
--mzml data.mzML \
--prism-csv report.csv \
--tolerance 0.3 \
--output-dir output/
Astral Analyzer (ppm-based)
mars calibrate \
--mzml data.mzML \
--prism-csv report.csv \
--tolerance-ppm 10 \
--output-dir output/
Large Multi-Replicate Studies
# Mars automatically filters the PRISM library to only the 3 files being processed
mars calibrate \
--mzml "plasma_samples/*.mzML" \
--prism-csv full_study_prism_export.csv \
--tolerance-ppm 10 \
--output-dir output/
Technical Notes
- The model still trains on delta_mz (Th) internally, as the XGBoost model works in absolute units
- PPM conversion is applied at the matching and visualization stages
- Temperature-based features remain relevant for Stellar data but are typically not present in Astral mzML files
Compatibility
- Fully backward compatible with v0.1.1
- Existing workflows using --tolerance (Th) continue to work unchanged
- New Astral/Orbitrap workflows can use --tolerance-ppm
Space Charge Modeling Improvements
Ions Below Features
Added 6 new features to capture space charge effects from ions below the fragment m/z:
- ions_below_0_1 - Total ions in (X-1.5, X-0.5] Th range below fragment m/z
- ions_below_1_2 - Total ions in (X-2.5, X-1.5] Th range below fragment m/z
- ions_below_2_3 - Total ions in (X-3.5, X-2.5] Th range below fragment m/z
- adjacent_ratio_below_0_1 - ions_below_0_1 / fragment_ions
- adjacent_ratio_below_1_2 - ions_below_1_2 / fragment_ions
- adjacent_ratio_below_2_3 - ions_below_2_3 / fragment_ions
These complement the existing "ions above" features and provide a more complete picture of the local ion environment affecting each fragment.
New QC Plots
Added 6 new QC visualization plots for the ions below features:
- mars_qc_ions_below_-1.5_-0.5_vs_error.png
- mars_qc_ions_below_-2.5_-1.5_vs_error.png
- mars_qc_ions_below_-3.5_-2.5_vs_error.png
- mars_qc_adjacent_ratio_-1.5_-0.5_vs_error.png
- mars_qc_adjacent_ratio_-2.5_-1.5_vs_error.png
- mars_qc_adjacent_ratio_-3.5_-2.5_vs_error.png
Isotope-Centered m/z Bins
The space charge feature bins have been shifted by 0.5 Th to better center on isotope patterns for 1+ charge fragments:
| Old Range | New Range | Description |
|---|---|---|
| (X, X+1] | (X+0.5, X+1.5] | First isotope region above |
| (X+1, X+2] | (X+1.5, X+2.5] | Second isotope region above |
| (X+2, X+3] | (X+2.5, X+3.5] | Third isotope region above |
The bins now reference the monoisotopic fragment m/z (expected m/z from library) rather than the observed m/z for more consistent feature calculation.
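To make the half-open bin boundaries concrete, the sketch below sums peak intensities in each (X+lo, X+hi] window using prefix sums and binary search, a pure-Python analogue of the np.searchsorted approach mentioned under Faster mzML Calibration. The function name, the constants, and the choice to sum raw intensity are assumptions for illustration; the exact quantity Mars accumulates may differ.

```python
from bisect import bisect_right
from itertools import accumulate

# Offsets relative to the expected fragment m/z X, taken from the bin definitions above.
ABOVE_BINS = [(0.5, 1.5), (1.5, 2.5), (2.5, 3.5)]
BELOW_BINS = [(-1.5, -0.5), (-2.5, -1.5), (-3.5, -2.5)]


def ions_in_bins(mz_sorted, intensities, fragment_mz, bins):
    """Sum intensity in each half-open (X+lo, X+hi] window around fragment_mz.

    mz_sorted must be in ascending order, with intensities aligned to it.
    """
    prefix = [0.0, *accumulate(intensities)]  # prefix[i] = sum of the first i intensities
    sums = []
    for lo, hi in bins:
        start = bisect_right(mz_sorted, fragment_mz + lo)  # first peak with m/z > X+lo
        end = bisect_right(mz_sorted, fragment_mz + hi)    # one past the last peak with m/z <= X+hi
        sums.append(prefix[end] - prefix[start])
    return sums
```

Using bisect_right at both edges is what makes the lower bound exclusive and the upper bound inclusive, so a peak exactly at X+1.5 falls in the first bin and not the second.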
Space Charge Features Applied During Calibration
Previously, space charge features (ions_above_*, adjacent_ratio_*) were only computed during model training but set to zero during mzML calibration. Now all 12 space charge features are computed for every peak when writing calibrated mzML files, ensuring the model's learned corrections are fully applied.
Updated Feature Display Names
Feature importance plots and QC plot filenames now use the new bin naming convention:
- ions_above_0.5_1.5 (was ions_above_0_1)
- ions_above_1.5_2.5 (was ions_above_1_2)
- ions_above_2.5_3.5 (was ions_above_2_3)
- adjacent_ratio_0.5_1.5 (was adjacent_ratio_0_1)
- etc.
Model Features
The model now supports up to 22 features (when all data is available):
- precursor_mz - DIA isolation window center
- fragment_mz - Fragment ion m/z
- log_tic - Log10 of total ion current
- log_intensity - Log10 of observed peak intensity
- absolute_time - Unix timestamp of acquisition
- injection_time - Ion injection time (seconds)
- tic_injection_time - TIC x injection time product
- fragment_ions - Fragment intensity x injection time
- ions_above_0_1, ions_above_1_2, ions_above_2_3 - Ions in ranges above
- ions_below_0_1, ions_below_1_2, ions_below_2_3 - Ions in ranges below
- adjacent_ratio_0_1, adjacent_ratio_1_2, adjacent_ratio_2_3 - Ratios above
- adjacent_ratio_below_0_1, adjacent_ratio_below_1_2, adjacent_ratio_below_2_3 - Ratios below
- rfa2_temp - RF amplifier temperature
- rfc2_temp - RF electronics temperature
MARS v0.1.1
Mars v0.1.1 Release Notes
Overview
Patch release v0.1.1 focuses on code quality improvements, fixing reported linting issues, and minor bug fixes in the command-line interface.
Changes
CLI Improvements
- Added missing --max-isolation-window parameter to the mars qc command, allowing consistent filtering between calibration and QC steps.
Code Quality
- Addressed multiple linting issues identified by ruff.
- Updated obsolete string formatting to use modern f-strings in mzml.py.
- Fixed potential logic errors by ensuring strict iterables in zip() calls.
- Removed unused variables and unnecessary function arguments.
- Improved code readability and standard compliance.
MARS v0.1.0
Mars v0.1.0 Release Notes
Overview
Initial release of Mars (Mass Accuracy Recalibration System), a mass calibration tool for Thermo Stellar unit resolution DIA mass spectrometry data. Mars learns m/z calibration corrections from spectral library fragment matches using an XGBoost model.
Features
Mars uses a machine learning approach to predict m/z corrections based on:
- Fragment m/z: Mass-dependent calibration bias
- Peak intensity: Higher intensity peaks provide more reliable calibration
- Absolute time: Calibration drift over the acquisition run
- Spectrum TIC: Space charge effects from high ion current
- Ion injection time: Signal accumulation duration effects
- Precursor m/z: DIA isolation window-specific effects
- Adjacent ion population: Ion density in neighboring m/z ranges (0-1, 1-2, 2-3 Th above)
- Adjacent ion ratios: Relative ion density (adjacent ions / fragment ions)
- RF temperatures: Thermal effects from RF amplifier (RFA2) and electronics (RFC2)
Fragment Matching
- Matches library peptides to DIA MS2 spectra using precursor m/z and RT windows
- Selects the most intense peak within m/z tolerance (not closest)
- Configurable minimum intensity threshold
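The peak selection rule (most intense within tolerance, not closest) can be sketched as follows. The function and parameter names are hypothetical, and peaks are assumed to be (m/z, intensity) pairs.

```python
def match_fragment(peaks, expected_mz, tolerance, min_intensity=0.0):
    """Return the most intense (mz, intensity) peak within ±tolerance, or None."""
    candidates = [
        (mz, intensity)
        for mz, intensity in peaks
        if abs(mz - expected_mz) <= tolerance and intensity >= min_intensity
    ]
    # Rank by intensity, not by distance from the expected m/z.
    return max(candidates, key=lambda peak: peak[1], default=None)
```

With a 0.3 Th tolerance around m/z 500.0, a peak at 500.20 with intensity 900 is chosen over a closer peak at 500.02 with intensity 100.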
PRISM Integration
- Optional --prism-csv flag for using exact Skyline RT windows (Start Time, End Time)
- Falls back to ±5 seconds around library RT when PRISM CSV not provided
Batch Processing
- Process multiple mzML files with glob patterns (--mzml "*.mzML")
- Process entire directories with --mzml-dir
QC Reports
Generated quality control outputs include:
- Delta m/z distribution histogram with MAD and RMS statistics (before/after calibration)
- 2D heatmap visualization (RT × m/z, color = delta)
- Hexbin density plots (intensity, RT, m/z, injection time, TIC, fragment ions vs mass error)
- Model feature importance plot
- Calibration statistics summary
Output Files
| File | Description |
|---|---|
| {input}-mars.mzML | Recalibrated mzML file |
| mars_model.pkl | Trained XGBoost calibration model |
| mars_qc_histogram.png | Delta m/z distribution (before/after) |
| mars_qc_heatmap.png | 2D heatmap (RT × m/z, color = delta) |
| mars_qc_intensity_vs_error.png | Intensity vs mass error hexbin |
| mars_qc_rt_vs_error.png | RT vs mass error hexbin |
| mars_qc_mz_vs_error.png | Fragment m/z vs mass error hexbin |
| mars_qc_tic_vs_error.png | TIC vs mass error hexbin |
| mars_qc_injection_time_vs_error.png | Injection time vs mass error hexbin |
| mars_qc_tic_injection_time_vs_error.png | TIC×injection time vs mass error hexbin |
| mars_qc_fragment_ions_vs_error.png | Fragment ions vs mass error hexbin |
| mars_qc_rfa2_temperature_vs_error.png | RFA2 temperature vs error (if available) |
| mars_qc_rfc2_temperature_vs_error.png | RFC2 temperature vs error (if available) |
| mars_qc_feature_importance.png | Model feature importance |
| mars_qc_summary.txt | Calibration statistics |
Installation
git clone https://github.com/maccoss/mars.git
cd mars
pip install -e .
Or from PyPI:
pip install mars-ms==0.1.0
Requirements
- Python 3.10+
- Spectral library in blib format from Skyline
- mzML files from Thermo Stellar (or similar unit resolution instrument)
- PRISM CSV (optional): Skyline report with Start Time, End Time, Replicate Name columns
License
MIT