phygen packages a collection of reference PDE solvers and exports their
trajectories to a consistent time-first HDF5 layout (T, *spatial_dims, C).
Configurations define which equation to solve, how to sample parameters, and
how the resulting data should be organised into train/validation/test splits.

- Config-driven solvers: pick from the bundled adapters (advection, Gray-Scott, turbulence, wave2d) or add your own via the Hydra config system.
- Rich hyper-parameters: control spatial resolution (`spatial_size`, `downsample`), temporal extent (`time_steps`, `time_horizon`, `dt_high`, `dt_low`), and solver batch sizes directly from the equation profile.
- Flexible parameter sampling: declare sweeps with `linspace`, `range`, `choices`, `values`, or random `uniform` draws (with optional seeds) to explore a grid of physical regimes.
- Multiple dataset splits: generate train/val/test (or arbitrary) splits in one run, each with its own number of trajectories, base seed, and metadata. Selected splits can feed the automatic statistics accumulator.
- Self-contained outputs: every run creates `data/<split>/*.hdf5` files plus a matching `stats.yaml`, ready for loading with `phyload` or custom pipelines.

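
To make the time-first layout `(T, *spatial_dims, C)` concrete, here is a minimal sketch that builds the shape of one trajectory and computes where a given time step starts in a row-major array. The helper names `trajectory_shape` and `frame_offset` and the example sizes are illustrative assumptions, not part of phygen.

```python
import math


def trajectory_shape(time_steps, spatial_dims, channels):
    """Return the time-first shape (T, *spatial_dims, C)."""
    return (time_steps, *spatial_dims, channels)


def frame_offset(shape, t):
    """Element offset of time step `t` in a row-major array of `shape`."""
    if not 0 <= t < shape[0]:
        raise IndexError(f"time index {t} out of range for T={shape[0]}")
    # Everything after the time axis belongs to a single frame.
    return t * math.prod(shape[1:])


# Illustrative sizes only: 100 steps on a 64x64 grid with 2 channels.
shape = trajectory_shape(time_steps=100, spatial_dims=(64, 64), channels=2)
print(shape)                   # (100, 64, 64, 2)
print(frame_offset(shape, 3))  # 24576
```

Because time is the leading axis, indexing a loaded trajectory with a single integer yields one full snapshot of shape `(*spatial_dims, C)`.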
Clone the repository and install in editable mode:

    git clone https://github.com/itsakk/PHYGEN.git
    cd PHYGEN
    pip install -e .

The only runtime dependencies are numpy, torch, torchdiffeq, h5py,
PyYAML, tqdm, and hydra-core (see pyproject.toml).

All user settings live in configs/ and are managed by Hydra.

- Root config (`configs/main.yaml`)
  - Chooses the default equation profile via the `defaults` list.
  - Sets global options such as the compute `device`, the target `output_root`, and the runtime split to generate (`runtime.split`).
- Equation profiles (`configs/equation/*.yaml`)
  - Describe solver-specific options inside `generator.options` (spatial resolution, time-stepping parameters, batch size, etc.).
  - Define the output block (`output.root`, `output.dataset_name`, `output.compression`, `output.dtype`).
  - Provide per-split settings, including `num_trajectories`, `base_seed`, and the parameter sweep declaration.

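
Dotted names such as `runtime.split` are paths into a nested configuration. The sketch below illustrates that idea with plain dictionaries: it applies a `key=value` override to a nested dict. It is not Hydra's implementation (Hydra also parses value types, which this sketch skips), and the `apply_override` helper and the example values are assumptions made for illustration.

```python
import copy


def apply_override(config, override):
    """Return a copy of `config` with one `dotted.key=value` override applied."""
    key, sep, value = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = key.split(".")
    for name in parents:
        # Walk down the nesting, creating intermediate levels if needed.
        node = node.setdefault(name, {})
    node[leaf] = value  # kept as a string; no type parsing in this sketch
    return updated


# Hypothetical root config: key names come from the text, values are made up.
config = {
    "device": "cpu",
    "output_root": "./outputs",
    "runtime": {"split": None},
}

overridden = apply_override(config, "runtime.split=train")
overridden = apply_override(overridden, "output_root=/tmp/phygen")
print(overridden["runtime"]["split"])
print(overridden["output_root"])
```

The original dictionary is left untouched, which mirrors the usual expectation that command-line overrides affect a single run rather than the files in `configs/`.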
Each split can specify a parameter_grid powered by
phygen.config.expand_parameter_grid. Supported axis specifications:

- `linspace: [start, stop, num]`
- `range: [start, stop, step]`
- `choices: [v1, v2, ...]`
- `values: [v1, v2, ...]`
- `uniform: {low: a, high: b, num: n, seed: optional}` (or the list form `[low, high, num, seed]`)

Axes are combined in a Cartesian product, yielding one HDF5 file per parameter
combination (with num_trajectories samples each). You can mix parameter_grid
with an explicit parameters list for bespoke cases.
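
The Cartesian-product expansion described above can be sketched in plain Python. This is an illustration of the idea, not the code behind `phygen.config.expand_parameter_grid`. It assumes that `linspace` includes both endpoints, that `range` excludes `stop`, and it handles only the dictionary form of `uniform`. The parameter names `nu` and `forcing` are hypothetical.

```python
import itertools
import math
import random


def expand_axis(spec):
    """Turn one axis specification, e.g. {"linspace": [0, 1, 3]}, into values."""
    [(kind, arg)] = spec.items()
    if kind == "linspace":
        start, stop, num = arg
        if num == 1:
            return [start]
        step = (stop - start) / (num - 1)
        return [start + i * step for i in range(num)]
    if kind == "range":
        start, stop, step = arg
        count = max(0, math.ceil((stop - start) / step))
        return [start + i * step for i in range(count)]
    if kind in ("choices", "values"):
        return list(arg)
    if kind == "uniform":
        # A fixed seed makes the random draws reproducible.
        rng = random.Random(arg.get("seed"))
        return [rng.uniform(arg["low"], arg["high"]) for _ in range(arg["num"])]
    raise ValueError(f"unknown axis kind: {kind!r}")


def expand_grid(grid):
    """Combine all axes in a Cartesian product, one dict per combination."""
    names = list(grid)
    axes = [expand_axis(grid[name]) for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*axes)]


# Hypothetical two-axis sweep.
grid = {
    "nu": {"linspace": [0.0, 1.0, 3]},
    "forcing": {"choices": ["weak", "strong"]},
}
combos = expand_grid(grid)
print(len(combos))  # 6
for combo in combos:
    print(combo)
```

With three `nu` values and two `forcing` values, the sweep yields six combinations, which under the rule above would correspond to six HDF5 files.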
With the configuration in place, launch generation using either the module entrypoint or the helper script:

    # Full control via Hydra overrides (runs in-place)
    python -m phygen.main equation=turbulence runtime.split=train output_root=/tmp/phygen
    # Convenience wrapper (writes to ./outputs by default)
    ./scripts/run_phygen.sh equation=turbulence runtime.split=train

Omit `runtime.split` to generate every split defined in the config. Outputs are
written under `<output_root>/<dataset_name>/data/{train,valid,test}` with the
matching `stats.yaml` at the dataset root.
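
As a small illustration of the output layout, the sketch below assembles the expected paths for a run. The `dataset_layout` helper is hypothetical, and the dataset name `turbulence` is an assumption (the actual name comes from `output.dataset_name` in the equation profile).

```python
from pathlib import PurePosixPath


def dataset_layout(output_root, dataset_name, splits=("train", "valid", "test")):
    """Build the expected stats file and per-split data directories."""
    dataset_root = PurePosixPath(output_root) / dataset_name
    return {
        "stats": dataset_root / "stats.yaml",
        "splits": {split: dataset_root / "data" / split for split in splits},
    }


# Assumed values for illustration only.
layout = dataset_layout("/tmp/phygen", "turbulence")
print(layout["stats"])
for split, directory in layout["splits"].items():
    print(split, directory)
```

On a real run, something like `pathlib.Path(directory).glob("*.hdf5")` could then enumerate the generated files in each split directory.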

- New solvers live under `src/phygen/solvers/`; subclass `BaseSolverAdapter` and register the adapter in `src/phygen/registry.py`.
- Reuse the `ParameterSet` and `SplitConfig` utility classes when wiring custom configurations.
- Use `scripts/run_phygen.sh` as the base for environment-specific launchers (cluster job scripts, cloud runners, etc.).

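
The subclass-and-register pattern mentioned above can be sketched generically. Everything in this block is a stand-in: `StandInSolverAdapter`, `register_solver`, `SOLVER_REGISTRY`, and the `solve` signature are assumptions made for illustration, and the real `BaseSolverAdapter` interface and `registry.py` contents may look quite different.

```python
from abc import ABC, abstractmethod


class StandInSolverAdapter(ABC):
    """Local stand-in for a solver base class (NOT phygen's BaseSolverAdapter)."""

    @abstractmethod
    def solve(self, parameters, time_steps):
        """Return a trajectory with `time_steps` entries."""


# Hypothetical registry mapping a config name to an adapter class.
SOLVER_REGISTRY = {}


def register_solver(name):
    """Class decorator that records an adapter under `name`."""
    def decorator(cls):
        if name in SOLVER_REGISTRY:
            raise ValueError(f"solver {name!r} is already registered")
        SOLVER_REGISTRY[name] = cls
        return cls
    return decorator


@register_solver("decay")
class DecayAdapter(StandInSolverAdapter):
    """Toy adapter: explicit Euler steps for du/dt = -rate * u."""

    def solve(self, parameters, time_steps):
        u, dt, rate = parameters["u0"], parameters["dt"], parameters["rate"]
        trajectory = [u]
        for _ in range(time_steps - 1):
            u = u + dt * (-rate * u)
            trajectory.append(u)
        return trajectory


adapter = SOLVER_REGISTRY["decay"]()
trajectory = adapter.solve({"u0": 1.0, "dt": 0.5, "rate": 1.0}, time_steps=3)
print(trajectory)  # [1.0, 0.5, 0.25]
```

A name-to-class registry of this kind lets a config value such as `equation=decay` select the adapter without the launcher importing each solver explicitly.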
Happy generating!