Machine learning pipeline for astrophysical transient characterization and light curve generation.
ASTRAI provides two tasks:
- Characterization: predict physical parameters (mass, radius, energy, ...) from bolometric light curves.
- Generation: reconstruct bolometric light curves from physical parameters.
PNRR Project - Developed as part of the National Recovery and Resilience Plan at Koexai S.r.l.

    pip install -r requirements.txt

For GPU support, install PyTorch with CUDA following the official instructions.
Run preprocessing, characterizer, and generator training in sequence:

    python main.py --config configs/default_split.yaml

This is equivalent to running the three stages separately:

    python preprocess.py --config configs/default_split.yaml --out preprocessed
    python train_characterizer.py --config configs/default_split.yaml --prep preprocessed
    python train_generator.py --config configs/default_split.yaml --prep preprocessed

Single model with both branches trained jointly:

    python train.py --config configs/default.yaml

Run inference with the trained models:

    # Split model
    python inference_split.py \
        --exp_char experiments/characterizer/YYYYMMDD_HHMMSS \
        --exp_gen experiments/generator/YYYYMMDD_HHMMSS

    # Unified model
    python inference.py --exp experiments/YYYYMMDD_HHMMSS

Save predictions to file:

    python inference_split.py \
        --exp_char experiments/characterizer/YYYYMMDD_HHMMSS \
        --exp_gen experiments/generator/YYYYMMDD_HHMMSS \
        --output predictions.parquet

Preprocessing fits the scalers and PCA once on the full dataset, then creates K-Fold splits with LSST-augmented training data.

    python preprocess.py --config configs/default_split.yaml --out preprocessed

Output structure:

    preprocessed/
        x_scaler.pkl, y_scaler.pkl, pca.pkl        # Global artifacts
        x_raw.npy, y_raw.npy                       # Raw data (log-transformed params)
        fold_1/
            x_train_clean_pca.npy                  # Clean training curves (PCA space)
            x_train_aug_pca.npy                    # LSST-augmented training curves
            x_test_pca.npy                         # Test curves (PCA space)
            x_test_clean.npy                       # Test curves (original space)
            y_train_scaled.npy, y_test_scaled.npy  # Scaled parameters
            y_test.npy                             # Original test parameters
            train_idx.npy, test_idx.npy            # Fold indices
        fold_2/
            ...
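
The order of operations in preprocessing (global scaler and PCA first, fold splitting and augmentation afterwards) can be sketched as follows. This is a minimal illustration in plain NumPy on synthetic data, not the code in preprocess.py: the helper names, the array shapes, the number of folds, the seed, and the use of additive Gaussian noise as a stand-in for the LSST augmentation are all assumptions.

```python
import numpy as np

def fit_standardizer(a):
    """Return per-column (mean, std); std is floored to avoid division by zero."""
    return a.mean(axis=0), np.maximum(a.std(axis=0), 1e-12)

def fit_pca(x_scaled, n_components):
    """PCA via SVD of the centered data; returns (mean, components)."""
    mean = x_scaled.mean(axis=0)
    _, _, vt = np.linalg.svd(x_scaled - mean, full_matrices=False)
    return mean, vt[:n_components]

def kfold_indices(n_samples, n_splits, seed):
    """Shuffle once, then yield (train_idx, test_idx) for each fold."""
    order = np.random.default_rng(seed).permutation(n_samples)
    chunks = np.array_split(order, n_splits)
    for k in range(n_splits):
        train = np.concatenate([chunks[j] for j in range(n_splits) if j != k])
        yield train, chunks[k]

# Synthetic stand-ins for x_raw.npy / y_raw.npy (shapes are illustrative).
rng = np.random.default_rng(0)
x_raw = rng.normal(size=(100, 60))   # 100 curves, 60 time steps
y_raw = rng.normal(size=(100, 4))    # 4 physical parameters

# 1) Global artifacts, fitted once on the full dataset.
x_mean, x_std = fit_standardizer(x_raw)
y_mean, y_std = fit_standardizer(y_raw)
pca_mean, pca_components = fit_pca((x_raw - x_mean) / x_std, n_components=32)

def to_pca(x):
    """Scale curves with the global scaler, then project onto the PCA basis."""
    return ((x - x_mean) / x_std - pca_mean) @ pca_components.T

# 2) Per-fold arrays, named after the files in the output structure.
fold_data = []
for train_idx, test_idx in kfold_indices(len(x_raw), n_splits=5, seed=42):
    x_train = x_raw[train_idx]
    # Assumed augmentation: additive Gaussian noise with std 0.05.
    x_train_aug = x_train + rng.normal(scale=0.05, size=x_train.shape)
    fold_data.append({
        "x_train_clean_pca": to_pca(x_train),
        "x_train_aug_pca": to_pca(x_train_aug),
        "x_test_pca": to_pca(x_raw[test_idx]),
        "x_test_clean": x_raw[test_idx],
        "y_train_scaled": (y_raw[train_idx] - y_mean) / y_std,
        "y_test_scaled": (y_raw[test_idx] - y_mean) / y_std,
        "y_test": y_raw[test_idx],
        "train_idx": train_idx,
        "test_idx": test_idx,
    })
```

Because the scaler and PCA are fitted before splitting, every fold shares the same transforms, which is why they appear once at the top level of preprocessed/ rather than inside each fold directory.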
The characterizer stage trains a SplitMLPRegressor (one independent MLP per physical parameter) on PCA-compressed curves.
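
The "one independent MLP per physical parameter" layout can be illustrated with a forward-only sketch. It is written in plain NumPy with random weights so that it runs without a deep learning framework; the class names, layer sizes, and activation are assumptions and do not reproduce the project's SplitMLPRegressor.

```python
import numpy as np

class TinyMLP:
    """Small fully connected network with ReLU activations and one scalar output."""

    def __init__(self, in_dim, width, depth, rng):
        dims = [in_dim] + [width] * depth + [1]
        self.weights = [rng.normal(scale=np.sqrt(2.0 / a), size=(a, b))
                        for a, b in zip(dims[:-1], dims[1:])]
        self.biases = [np.zeros(b) for b in dims[1:]]

    def forward(self, x):
        h = x
        for w, b in zip(self.weights[:-1], self.biases[:-1]):
            h = np.maximum(h @ w + b, 0.0)               # hidden layers with ReLU
        return h @ self.weights[-1] + self.biases[-1]    # shape (batch, 1)

class SplitRegressorSketch:
    """One TinyMLP per target parameter; the heads share no weights."""

    def __init__(self, in_dim, n_params, width=64, depth=2, seed=0):
        rng = np.random.default_rng(seed)
        self.heads = [TinyMLP(in_dim, width, depth, rng) for _ in range(n_params)]

    def forward(self, x):
        # Each head sees the same PCA vector and predicts a single parameter.
        return np.concatenate([head.forward(x) for head in self.heads], axis=1)

model = SplitRegressorSketch(in_dim=32, n_params=4)
batch = np.random.default_rng(1).normal(size=(8, 32))   # 8 curves in PCA space
pred = model.forward(batch)                             # shape (8, 4)
```

Since the heads are independent, changing the weights of one head leaves the predictions for all other parameters untouched.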

    python train_characterizer.py --config configs/default_split.yaml --prep preprocessed

The generator stage trains an MLPWithResiduals to reconstruct PCA-compressed curves from physical parameters.
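
As a rough picture of the generator direction (parameters in, PCA coefficients out), the sketch below assumes that "residuals" in the model name refers to skip connections around the hidden blocks. That reading, along with the function names, layer sizes, and random weights, is an assumption; the actual MLPWithResiduals may be organized differently.

```python
import numpy as np

def init_weights(n_params, width, depth, n_components, seed=0):
    """Random weights for an input layer, `depth` residual blocks, and an output layer."""
    rng = np.random.default_rng(seed)

    def dense(a, b):
        return rng.normal(scale=1.0 / np.sqrt(a), size=(a, b)), np.zeros(b)

    return {
        "in": dense(n_params, width),
        "blocks": [dense(width, width) for _ in range(depth)],
        "out": dense(width, n_components),
    }

def residual_mlp_forward(params, weights):
    """Map scaled physical parameters to PCA coefficients of a light curve."""
    w, b = weights["in"]
    h = params @ w + b
    for w, b in weights["blocks"]:
        h = h + np.maximum(h @ w + b, 0.0)   # skip connection: block output added to its input
    w, b = weights["out"]
    return h @ w + b

weights = init_weights(n_params=4, width=64, depth=3, n_components=32)
scaled_params = np.random.default_rng(1).normal(size=(8, 4))
pca_curve = residual_mlp_forward(scaled_params, weights)   # shape (8, 32)
```

The predicted PCA coefficients would then be mapped back to a full light curve with the inverse PCA and scaler transforms saved during preprocessing.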

    python train_generator.py --config configs/default_split.yaml --prep preprocessed

Single supernova (SN2018hna):

    python infer_sn2018hna.py \
        --exp_char experiments/characterizer/YYYYMMDD_HHMMSS \
        --exp_gen experiments/generator/YYYYMMDD_HHMMSS \
        --csv SN2018hna.csv \
        --output sn2018hna_inference.pdf

Batch inference on all supernovae in the bol/ directory:

    python infer_bol_batch.py \
        --exp_char experiments/characterizer/YYYYMMDD_HHMMSS \
        --exp_gen experiments/generator/YYYYMMDD_HHMMSS \
        --bol_dir bol \
        --output_dir plots/batch

Per-timestep reconstruction error and best-sample overlay:

    python plot_results.py \
        --exp_char experiments/characterizer/YYYYMMDD_HHMMSS \
        --exp_gen experiments/generator/YYYYMMDD_HHMMSS \
        --fold 1 \
        --output_dir plots/

3-panel reconstruction view for the unified model (original vs augmented vs reconstructed):

    python visualize_reconstruction.py \
        --exp experiments/YYYYMMDD_HHMMSS \
        --top 5

Options: --index N for a specific sample, --top N for the N best by characterization RMSE.
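
Selecting the N best samples by characterization RMSE amounts to a per-sample error followed by a sort. The sketch below assumes the RMSE is taken across the physical parameters of each sample; whether the script works in scaled or original parameter space is not stated here, and the function name is hypothetical.

```python
import numpy as np

def top_n_by_rmse(y_true, y_pred, n):
    """Return (indices, rmse) of the n samples with the lowest per-sample RMSE, best first."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=1))
    order = np.argsort(rmse, kind="stable")[:n]
    return order, rmse[order]

# Three samples, two parameters each (made-up numbers).
y_true = np.array([[1.0, 2.0], [0.0, 0.0], [3.0, 3.0]])
y_pred = np.array([[1.5, 2.5], [0.0, 0.1], [1.0, 1.0]])
best_idx, best_rmse = top_n_by_rmse(y_true, y_pred, n=2)
```

Here the second sample has the smallest error and the first sample comes next, so best_idx lists them in that order.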
All hyperparameters are set via YAML config files in configs/.

Split pipeline (configs/default_split.yaml):

| Section | Key Parameters |
|---|---|
| data | format, n_days, n_params, param_names, samples_per_day |
| preprocessing | pca_components (32), n_splits (K-Fold), random_seed |
| augmentation | noise_std (0.05) |
| characterizer | model (width, depth, dropout), training (batch_size, epochs, lr) |
| generator | model (width, depth, dropout), training (batch_size, epochs, lr) |

Unified model (configs/default.yaml):

| Section | Key Parameters |
|---|---|
| data | Same as above |
| model | pca_components, width, depth, dropout |
| training | batch_size, epochs, learning_rate, n_splits |
| loss | alpha_char, alpha_gen (loss weights) |
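
One common way to use a pair of loss weights such as alpha_char and alpha_gen is a weighted sum of the two branch losses. The sketch below assumes exactly that, with mean squared error for both branches; the actual loss terms in train.py are not described in this README, so treat the formula and function names as illustrative.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two array-likes of the same shape."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def joint_loss(params_pred, params_true, curve_pred, curve_true,
               alpha_char=1.0, alpha_gen=1.0):
    """Assumed form: alpha_char * characterization loss + alpha_gen * generation loss."""
    loss_char = mse(params_pred, params_true)   # parameters predicted from curves
    loss_gen = mse(curve_pred, curve_true)      # curves reconstructed from parameters
    return alpha_char * loss_char + alpha_gen * loss_gen

# Made-up values: characterization MSE is 2.0, generation MSE is 1.0.
total = joint_loss([1.0, 2.0], [1.0, 4.0], [0.0, 0.0], [1.0, 1.0],
                   alpha_char=0.5, alpha_gen=2.0)
```

Raising one weight relative to the other shifts the joint training towards that branch.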
- parquet: single file with curve columns ("0", "1", ..., "n_days-1") and parameter columns.
- npy_csv: separate .npy for curves and .csv for parameters.
Set the format in the config under data.format.
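
A loader that dispatches on data.format might look like the sketch below. The config keys path, curves_path, and params_path are hypothetical names introduced for the example, and the parquet branch relies on pandas with a parquet engine installed; only the column naming ("0" to "n_days-1" plus the parameter columns) comes from the description above.

```python
import csv
import os
import tempfile

import numpy as np

def load_dataset(data_cfg):
    """Return (curves, params) arrays for either supported data format."""
    fmt = data_cfg["format"]
    param_names = data_cfg["param_names"]
    if fmt == "parquet":
        import pandas as pd  # imported lazily; needs a parquet engine such as pyarrow
        df = pd.read_parquet(data_cfg["path"])
        curve_cols = [str(i) for i in range(data_cfg["n_days"])]
        return df[curve_cols].to_numpy(), df[param_names].to_numpy()
    if fmt == "npy_csv":
        curves = np.load(data_cfg["curves_path"])
        with open(data_cfg["params_path"], newline="") as fh:
            rows = list(csv.DictReader(fh))
        params = np.array([[float(r[name]) for name in param_names] for r in rows])
        return curves, params
    raise ValueError(f"unknown data.format: {fmt!r}")

# Tiny npy_csv example written to a temporary directory.
tmp = tempfile.mkdtemp()
np.save(os.path.join(tmp, "curves.npy"), np.zeros((2, 3)))
with open(os.path.join(tmp, "params.csv"), "w", newline="") as fh:
    fh.write("mass,radius\n1.0,2.0\n3.0,4.0\n")

curves, params = load_dataset({
    "format": "npy_csv",
    "n_days": 3,
    "param_names": ["mass", "radius"],
    "curves_path": os.path.join(tmp, "curves.npy"),
    "params_path": os.path.join(tmp, "params.csv"),
})
```

Both branches return arrays with one row per sample, so downstream code does not need to know which format was used.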
Each training run creates a timestamped directory under experiments/:

    experiments/
        characterizer/YYYYMMDD_HHMMSS/
            best_characterizer.pth      # Model weights (best fold by R2)
            best_char_x_scaler.pkl      # Feature scaler
            best_char_y_scaler.pkl      # Target scaler
            best_char_pca.pkl           # PCA transformer
            code.zip                    # Source code snapshot
            default_split.yaml          # Config used
        generator/YYYYMMDD_HHMMSS/
            ...
Evaluation reports R2, RMSE, RRMSE, and MAE:
- Characterization: per-parameter metrics averaged across physical quantities, with bootstrap confidence intervals.
- Generation: flattened across all time-steps and samples.
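
The two aggregation schemes can be made concrete with a short NumPy sketch. Several details are assumptions: RRMSE is taken here as RMSE divided by the absolute mean of the true values (other normalizations exist), the confidence interval is a percentile bootstrap over samples, and all function names are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R2, RMSE, RRMSE, and MAE for 1-D arrays."""
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "r2": 1.0 - np.sum(err ** 2) / ss_tot,
        "rmse": rmse,
        "rrmse": rmse / np.abs(y_true.mean()),   # assumed normalization
        "mae": np.mean(np.abs(err)),
    }

def characterization_metrics(y_true, y_pred):
    """Compute metrics per parameter (column), then average across parameters."""
    per_param = [regression_metrics(y_true[:, j], y_pred[:, j])
                 for j in range(y_true.shape[1])]
    return {k: float(np.mean([m[k] for m in per_param])) for k in per_param[0]}

def generation_metrics(x_true, x_pred):
    """Flatten all samples and time steps into one vector before scoring."""
    flat = regression_metrics(x_true.ravel(), x_pred.ravel())
    return {k: float(v) for k, v in flat.items()}

def bootstrap_ci(y_true, y_pred, metric="r2", n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap interval obtained by resampling samples with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats.append(characterization_metrics(y_true[idx], y_pred[idx])[metric])
    lo, hi = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)

# Synthetic check: predictions equal the truth plus small noise.
rng = np.random.default_rng(0)
y_true = rng.normal(loc=5.0, size=(200, 3))
y_pred = y_true + rng.normal(scale=0.1, size=y_true.shape)
char = characterization_metrics(y_true, y_pred)
gen = generation_metrics(y_true, y_pred)
r2_lo, r2_hi = bootstrap_ci(y_true, y_pred, metric="r2")
```

Averaging per-parameter metrics keeps a parameter with a large numeric range from dominating the score, while flattening treats every time step of every curve as one observation.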
For questions or support, contact the development team at Koexai S.r.l.