ASTRAI

Machine learning pipeline for astrophysical transient characterization and light curve generation.

ASTRAI supports two tasks:

Characterization: predict physical parameters (mass, radius, energy, ...) from bolometric light curves.
Generation: reconstruct bolometric light curves from physical parameters.

PNRR Project: developed at Koexai S.r.l. as part of the National Recovery and Resilience Plan (PNRR).

Installation

pip install -r requirements.txt

For GPU support, install PyTorch with CUDA following the official instructions.

Quick Start

Train (split pipeline, recommended)

Run preprocessing, characterizer, and generator training in sequence:

python main.py --config configs/default_split.yaml

This is equivalent to running the three stages separately:

python preprocess.py --config configs/default_split.yaml --out preprocessed
python train_characterizer.py --config configs/default_split.yaml --prep preprocessed
python train_generator.py --config configs/default_split.yaml --prep preprocessed
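The equivalence above can be sketched as a small driver that runs the three stage scripts in order and stops at the first failure. This is an illustration only, not the contents of main.py: the helper names `stage_commands` and `run_pipeline` are hypothetical, and only the script names and flags shown above are taken from this README.

```python
import subprocess
import sys


def stage_commands(config, prep_dir="preprocessed"):
    """Return the three stage commands in the order they must run."""
    py = sys.executable  # reuse the interpreter that runs this driver
    return [
        [py, "preprocess.py", "--config", config, "--out", prep_dir],
        [py, "train_characterizer.py", "--config", config, "--prep", prep_dir],
        [py, "train_generator.py", "--config", config, "--prep", prep_dir],
    ]


def run_pipeline(config, prep_dir="preprocessed"):
    """Run each stage in sequence; check=True raises on a non-zero exit code."""
    for cmd in stage_commands(config, prep_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("configs/default_split.yaml")` from the repository root would then mirror the three commands listed above.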

Train (unified model)

Single model with both branches trained jointly:

python train.py --config configs/default.yaml

Inference

# Split model
python inference_split.py \
    --exp_char experiments/characterizer/YYYYMMDD_HHMMSS \
    --exp_gen  experiments/generator/YYYYMMDD_HHMMSS

# Unified model
python inference.py --exp experiments/YYYYMMDD_HHMMSS

Save predictions to file:

python inference_split.py \
    --exp_char experiments/characterizer/YYYYMMDD_HHMMSS \
    --exp_gen  experiments/generator/YYYYMMDD_HHMMSS \
    --output predictions.parquet

Pipeline Details

Preprocessing (`preprocess.py`)

Fits scalers and PCA once on the full dataset, then creates K-Fold splits with LSST-augmented training data.

python preprocess.py --config configs/default_split.yaml --out preprocessed
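The order of operations described above (fit the transforms once, then split) can be sketched with NumPy alone. This is a minimal illustration on synthetic data, not the code in preprocess.py: the function names, the 200 x 50 array shape, and `n_splits=5` are assumptions, while 32 PCA components matches the default listed under Configuration. scikit-learn's `StandardScaler`, `PCA`, and `KFold` provide equivalent operations.

```python
import numpy as np


def fit_standardizer(x):
    """Per-column mean and standard deviation (zero variance mapped to 1)."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0.0] = 1.0
    return mean, std


def fit_pca(x_scaled, n_components):
    """Principal axes of already-centred data via SVD, shape (n_components, n_features)."""
    _, _, vt = np.linalg.svd(x_scaled, full_matrices=False)
    return vt[:n_components]


def kfold_indices(n_samples, n_splits, seed):
    """Yield (train_idx, test_idx) pairs for a shuffled K-Fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for k in range(n_splits):
        test_idx = folds[k]
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
        yield train_idx, test_idx


# Synthetic stand-in for the light curves: 200 curves, 50 time steps.
rng = np.random.default_rng(0)
x_raw = rng.normal(size=(200, 50))

# Scaler and PCA are fitted once, on all samples.
mean, std = fit_standardizer(x_raw)
components = fit_pca((x_raw - mean) / std, n_components=32)

# Each fold reuses the same global transforms.
for train_idx, test_idx in kfold_indices(len(x_raw), n_splits=5, seed=0):
    x_train_pca = ((x_raw[train_idx] - mean) / std) @ components.T
    x_test_pca = ((x_raw[test_idx] - mean) / std) @ components.T
```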

Output structure:

preprocessed/
  x_scaler.pkl, y_scaler.pkl, pca.pkl    # Global artifacts
  x_raw.npy, y_raw.npy                   # Raw data (log-transformed params)
  fold_1/
    x_train_clean_pca.npy                 # Clean training curves (PCA space)
    x_train_aug_pca.npy                   # LSST-augmented training curves
    x_test_pca.npy                        # Test curves (PCA space)
    x_test_clean.npy                      # Test curves (original space)
    y_train_scaled.npy, y_test_scaled.npy # Scaled parameters
    y_test.npy                            # Original test parameters
    train_idx.npy, test_idx.npy           # Fold indices
  fold_2/
    ...
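Given the layout above, one fold can be read back with plain NumPy. The helper name `load_fold` is hypothetical; the file names are the ones listed in the output structure.

```python
from pathlib import Path

import numpy as np

# File stems listed in the output structure above.
FOLD_ARRAYS = (
    "x_train_clean_pca", "x_train_aug_pca", "x_test_pca", "x_test_clean",
    "y_train_scaled", "y_test_scaled", "y_test", "train_idx", "test_idx",
)


def load_fold(prep_dir, fold):
    """Load every .npy array of one fold into a dict keyed by file stem."""
    fold_dir = Path(prep_dir) / f"fold_{fold}"
    return {name: np.load(fold_dir / f"{name}.npy") for name in FOLD_ARRAYS}
```

For example, `load_fold("preprocessed", 1)["x_train_aug_pca"]` would return the augmented training curves of the first fold.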

Characterizer Training (`train_characterizer.py`)

Trains a SplitMLPRegressor (one independent MLP per physical parameter) on PCA-compressed curves.

python train_characterizer.py --config configs/default_split.yaml --prep preprocessed
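The idea of one independent MLP per physical parameter can be illustrated with a framework-free forward pass. This is a NumPy sketch of the concept, not the repository's SplitMLPRegressor: the class name `SplitRegressorSketch`, the ReLU activation, and the default width and depth are all assumptions.

```python
import numpy as np


def init_mlp(rng, n_in, width, depth):
    """Random weights for an MLP with `depth` hidden layers and one scalar output."""
    sizes = [n_in] + [width] * depth + [1]
    return [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def mlp_forward(layers, x):
    """ReLU on hidden layers, linear output layer."""
    h = x
    for w, b in layers[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = layers[-1]
    return h @ w + b


class SplitRegressorSketch:
    """One independent MLP per target; outputs are concatenated column-wise."""

    def __init__(self, n_in, n_params, width=64, depth=2, seed=0):
        rng = np.random.default_rng(seed)
        # No weights are shared between heads.
        self.heads = [init_mlp(rng, n_in, width, depth) for _ in range(n_params)]

    def predict(self, x):
        return np.concatenate([mlp_forward(head, x) for head in self.heads], axis=1)
```

Because the heads share no weights, a change to one head affects only the column of the parameter it predicts.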

Generator Training (`train_generator.py`)

Trains an MLPWithResiduals to reconstruct PCA-compressed curves from physical parameters.

python train_generator.py --config configs/default_split.yaml --prep preprocessed
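A residual MLP mapping physical parameters to PCA coefficients might look like the following forward pass. The layer layout, ReLU activation, and names (`init_params`, `residual_mlp_forward`) are assumptions made for illustration; the actual MLPWithResiduals architecture is not described in this README.

```python
import numpy as np


def init_params(n_params, width, depth, n_out, seed=0):
    """Random weights: input projection, `depth` residual blocks, output projection."""
    rng = np.random.default_rng(seed)

    def dense(a, b):
        return rng.normal(scale=0.1, size=(a, b)), np.zeros(b)

    return {
        "input": dense(n_params, width),
        "blocks": [dense(width, width) for _ in range(depth)],
        "output": dense(width, n_out),
    }


def residual_mlp_forward(params, y):
    """Map physical parameters y (n_samples, n_params) to PCA coefficients."""
    w_in, b_in = params["input"]
    h = np.maximum(y @ w_in + b_in, 0.0)
    for w, b in params["blocks"]:
        # Skip connection: the block's output is added to its input.
        h = h + np.maximum(h @ w + b, 0.0)
    w_out, b_out = params["output"]
    return h @ w_out + b_out
```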

Inference on Real Supernovae

Single supernova (SN2018hna):

python infer_sn2018hna.py \
    --exp_char experiments/characterizer/YYYYMMDD_HHMMSS \
    --exp_gen  experiments/generator/YYYYMMDD_HHMMSS \
    --csv SN2018hna.csv \
    --output sn2018hna_inference.pdf

Batch inference on all supernovae in the bol/ directory:

python infer_bol_batch.py \
    --exp_char experiments/characterizer/YYYYMMDD_HHMMSS \
    --exp_gen  experiments/generator/YYYYMMDD_HHMMSS \
    --bol_dir bol \
    --output_dir plots/batch

Visualization

Per-timestep reconstruction error and best-sample overlay:

python plot_results.py \
    --exp_char experiments/characterizer/YYYYMMDD_HHMMSS \
    --exp_gen  experiments/generator/YYYYMMDD_HHMMSS \
    --fold 1 \
    --output_dir plots/

3-panel reconstruction view for the unified model (original vs augmented vs reconstructed):

python visualize_reconstruction.py \
    --exp experiments/YYYYMMDD_HHMMSS \
    --top 5

Options: --index N for a specific sample, --top N for the N best by characterization RMSE.
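The --top selection can be pictured as ranking samples by their per-sample RMSE over the predicted parameters. The function below is a hypothetical sketch of that ranking; whether the script computes the error in scaled or physical units is not stated here.

```python
import numpy as np


def top_n_by_rmse(y_true, y_pred, n):
    """Indices of the n samples with the lowest per-sample RMSE, best first."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=1))
    return np.argsort(rmse)[:n]
```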

Configuration

All hyperparameters are set via YAML config files in configs/.

`configs/default_split.yaml` (split pipeline)

| Section | Key Parameters |
| --- | --- |
| `data` | `format`, `n_days`, `n_params`, `param_names`, `samples_per_day` |
| `preprocessing` | `pca_components` (32), `n_splits` (K-Fold), `random_seed` |
| `augmentation` | `noise_std` (0.05) |
| `characterizer` | `model` (width, depth, dropout), `training` (batch_size, epochs, lr) |
| `generator` | `model` (width, depth, dropout), `training` (batch_size, epochs, lr) |
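A quick sanity check against the sections and keys listed above can catch an incomplete config before a long training run. The helper `missing_keys` is hypothetical and checks only for the presence of the top-level keys named in the table, not their values.

```python
# Sections and keys taken from the table above.
REQUIRED_KEYS = {
    "data": {"format", "n_days", "n_params", "param_names", "samples_per_day"},
    "preprocessing": {"pca_components", "n_splits", "random_seed"},
    "augmentation": {"noise_std"},
    "characterizer": {"model", "training"},
    "generator": {"model", "training"},
}


def missing_keys(config):
    """Return {section: sorted missing keys} for every incomplete section."""
    problems = {}
    for section, keys in REQUIRED_KEYS.items():
        absent = keys - set(config.get(section) or {})
        if absent:
            problems[section] = sorted(absent)
    return problems
```

Here `config` is assumed to be a plain dict, for example the result of loading the YAML file with PyYAML's `yaml.safe_load`. An empty return value means every listed key is present.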

`configs/default.yaml` (unified model)

| Section | Key Parameters |
| --- | --- |
| `data` | Same as above |
| `model` | `pca_components`, `width`, `depth`, `dropout` |
| `training` | `batch_size`, `epochs`, `learning_rate`, `n_splits` |
| `loss` | `alpha_char`, `alpha_gen` (loss weights) |

Data Formats

parquet: a single file with one curve column per time step, named "0", "1", ... up to the string form of n_days - 1, plus the parameter columns.
npy_csv: separate .npy for curves and .csv for parameters.

Set the format in the config under data.format.
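For the parquet layout, separating curve columns from parameter columns only requires the column names and n_days. The helper below is a hypothetical sketch of that split; it assumes every column that is not a curve column is a parameter column.

```python
def split_columns(columns, n_days):
    """Separate curve columns ("0" .. str(n_days - 1)) from parameter columns."""
    present = set(columns)
    curve_cols = [str(i) for i in range(n_days)]
    missing = [c for c in curve_cols if c not in present]
    if missing:
        raise ValueError(f"missing curve columns: {missing[:5]}")
    curve_set = set(curve_cols)
    param_cols = [c for c in columns if c not in curve_set]
    return curve_cols, param_cols
```

With pandas, the column list could come from `list(pd.read_parquet(path).columns)`.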

Experiment Tracking

Each training run creates a timestamped directory under experiments/:

experiments/
  characterizer/YYYYMMDD_HHMMSS/
    best_characterizer.pth       # Model weights (best fold by R2)
    best_char_x_scaler.pkl       # Feature scaler
    best_char_y_scaler.pkl       # Target scaler
    best_char_pca.pkl            # PCA transformer
    code.zip                     # Source code snapshot
    default_split.yaml           # Config used
  generator/YYYYMMDD_HHMMSS/
    ...
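Because the directory names are zero-padded timestamps, they sort chronologically as plain strings, which makes it easy to find the most recent run to pass to --exp_char or --exp_gen. The helpers below are hypothetical and rely only on the YYYYMMDD_HHMMSS naming shown above.

```python
from datetime import datetime
from pathlib import Path

STAMP = "%Y%m%d_%H%M%S"


def run_dir_name(now):
    """Format a datetime as YYYYMMDD_HHMMSS."""
    return now.strftime(STAMP)


def is_run_dir(path):
    """True if path is a directory whose name parses as a timestamp."""
    if not path.is_dir():
        return False
    try:
        datetime.strptime(path.name, STAMP)
    except ValueError:
        return False
    return True


def latest_run(root):
    """Most recent run directory under root, or None if there is none."""
    runs = sorted(p for p in Path(root).iterdir() if is_run_dir(p))
    return runs[-1] if runs else None
```

For example, `latest_run("experiments/characterizer")` would return the newest characterizer run.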

Metrics

Evaluation reports R2 (coefficient of determination), RMSE, RRMSE (relative RMSE), and MAE:

Characterization: per-parameter metrics averaged across physical quantities, with bootstrap confidence intervals.
Generation: metrics computed once over all time steps and samples, flattened together.
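The two aggregation modes can be sketched with NumPy as follows. The function names are hypothetical, the RRMSE normalisation (RMSE divided by the mean absolute target) is an assumption since several definitions are in use, and the percentile bootstrap is one common way to obtain confidence intervals, not necessarily the one used in this repository.

```python
import numpy as np


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rrmse(y_true, y_pred):
    # Assumption: RMSE normalised by the mean absolute target value.
    return rmse(y_true, y_pred) / float(np.mean(np.abs(y_true)))


def characterization_score(metric, y_true, y_pred):
    """Metric computed per parameter (column), then averaged."""
    return float(np.mean([metric(y_true[:, j], y_pred[:, j])
                          for j in range(y_true.shape[1])]))


def generation_score(metric, x_true, x_pred):
    """Metric computed once over all samples and time steps."""
    return metric(x_true.ravel(), x_pred.ravel())


def bootstrap_ci(metric, y_true, y_pred, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval from resampling samples with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        stats.append(metric(y_true[idx], y_pred[idx]))
    tail = (1.0 - level) / 2.0 * 100.0
    lo, hi = np.percentile(stats, [tail, 100.0 - tail])
    return float(lo), float(hi)
```

The two modes generally give different numbers: averaging per parameter weights each physical quantity equally, while flattening lets large-valued entries dominate.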

Support

For questions or support, contact the development team at Koexai S.r.l.
