# An autoencoder to perform Organic Aerosol source apportionment from Aerosol Mass Spectrometry measurements using literature-derived reference mass spectra

This repository contains a minimal Source-Based Autoencoder (SourceAE) pipeline for source apportionment using known (fixed) profiles. The model trains only the encoder while keeping the source profiles fixed.
The workflow:

- Load measurements (`Measurement`)
- Load known source profiles (`F_fixed`)
- Initialize the encoder from the known profiles
- Train the encoder only
- Export results and diagnostic plots via `Measurement`
The reconstruction follows:

```
X_hat = relu(X @ W^T) @ F^T
```

where:

- `F` — fixed known profiles (not trained)
- `W` — encoder weights (trained)
- `G = relu(X @ W^T)` — source contributions
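As a minimal NumPy sketch of this reconstruction (shapes and variable values here are illustrative assumptions, not the pipeline's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_mz, n_sources = 100, 50, 3

X = rng.random((n_samples, n_mz))    # measurements (samples x m/z)
F = rng.random((n_mz, n_sources))    # fixed known profiles (not trained)
W = rng.random((n_sources, n_mz))    # encoder weights (trained)

G = np.maximum(X @ W.T, 0.0)         # relu(X @ W^T): non-negative source contributions
X_hat = G @ F.T                      # reconstruction X_hat = G @ F^T

print(G.shape, X_hat.shape)          # (100, 3) (100, 50)
```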
From the project root, run:

```bash
python train_sourceae_fixed.py \
    --input "INPUT-PATH" \
    --output /results \
    --fixed_profiles "KNOWN PROFILES PATH" \
    --fixed_labels HOA BBOA "less oxidized" "more oxidized" \
    --epochs 1000 \
    --lr 1e-2
```

| Argument | Description |
|---|---|
| `--input` | Path to measurement data (`.xlsx` or `.csv`) |
| `--output` | Output prefix/directory used by `Measurement` |
| `--fixed_profiles` | Excel file containing known source profiles |
| Argument | Default | Description |
|---|---|---|
| `--fixed_labels` | `HOA CCOA BBOA` | Names of the source profiles to load |
| `--epochs` | `500` | Number of training epochs |
| `--lr` | `1e-2` | Learning rate |
| `--random_fixed_profiles` | `False` | Randomly select profiles from the library |
After training, the pipeline automatically generates:

- Estimated F (profiles)
- Estimated G (contributions)
- Additional diagnostics:
  - Source profiles
  - Time series of contributions
  - Scaled residuals
  - Uncertainty visualization
  - Ground truth comparison (if available)
All outputs are written using the `Measurement` utilities.
- Only fixed-profile SourceAE is implemented.
- No free-profile learning is performed.
- The encoder is initialized analytically from the known profiles.
- The model enforces non-negative contributions via ReLU.
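One plausible form of the analytic initialization is the pseudoinverse of the known profiles — a hedged sketch under that assumption, not necessarily the repository's exact method:

```python
import numpy as np

rng = np.random.default_rng(1)
n_mz, n_sources, n_samples = 50, 3, 200

# Hypothetical data: known profiles F (m/z x sources) and true contributions G_true
F = np.abs(rng.random((n_mz, n_sources)))
G_true = np.abs(rng.random((n_samples, n_sources)))
X = G_true @ F.T                     # synthetic measurements with no noise

# Analytic initialization: W = pinv(F) gives W @ F ~ I (F has full column rank),
# so the initial contributions G = relu(X @ W^T) already recover G_true here
W = np.linalg.pinv(F)                # shape (sources x m/z), matching the encoder

G = np.maximum(X @ W.T, 0.0)         # ReLU enforces non-negative contributions
X_hat = G @ F.T

print(np.allclose(G, G_true, atol=1e-6))   # True on this noise-free example
```

Training then only has to refine `W` for noisy, real data, since the fixed profiles `F` are never updated.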
**Excel errors**

Install the Excel reader dependency:

```bash
pip install openpyxl
```

**NaN loss**

Try:

- lowering the learning rate (`--lr 1e-3`)
- reducing the number of epochs
- checking the input uncertainties
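A quick sanity check on the input uncertainties can catch the zeros and NaNs that commonly produce NaN losses in uncertainty-weighted fits (a generic sketch; the helper name and array are hypothetical, not part of the pipeline):

```python
import numpy as np

def check_uncertainties(sigma):
    """Count values that typically break uncertainty-weighted losses:
    non-finite entries (NaN/inf) and non-positive entries (zero or negative)."""
    sigma = np.asarray(sigma, dtype=float)
    n_nonfinite = int(np.count_nonzero(~np.isfinite(sigma)))
    n_nonpositive = int(np.count_nonzero(sigma <= 0))
    return n_nonfinite, n_nonpositive

# Example: one NaN and one zero hiding in an otherwise valid uncertainty array
sigma = np.array([0.1, 0.2, np.nan, 0.0, 0.3])
print(check_uncertainties(sigma))   # (1, 1)
```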