Minimal PyTorch implementation of Flow Matching with a lightweight DiT-based denoiser.
- Minimal and readable implementations
- Flow Matching training and sampling
- Equation-aligned code
- Pure PyTorch implementation
- Educational focus
nanoFM/
├── fm.py
└── model.py
Detailed explanations and mathematical derivations are available in the accompanying blog post:
The blog covers:
- Motivation and core terminology
- Velocity vs score
- Flow models and the goal of learning a velocity field
- Why direct likelihood-based training is expensive
- Flow matching, including conditional and marginal formulations
- Training and Euler-based inference
- Limitations of direct flow matching and ReFlow
- Implementation notes
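As a rough illustration of the "Training and Euler-based inference" topic, here is a minimal sketch of a conditional flow matching loss and an Euler sampler. It is not the code in fm.py: `TinyVelocityNet` is a hypothetical stand-in for the DiT denoiser, and the straight-line path with noise at t = 0 and data at t = 1 is an assumed convention that may differ from the one used in this repo.

```python
import torch
import torch.nn as nn


class TinyVelocityNet(nn.Module):
    """Hypothetical stand-in for the DiT denoiser: predicts a velocity v(x_t, t)."""

    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(), nn.Linear(hidden, dim)
        )

    def forward(self, x, t):
        # Append the scalar time to each sample as an extra input feature.
        return self.net(torch.cat([x, t[:, None]], dim=1))


def flow_matching_loss(model, x1):
    """Conditional flow matching loss on a batch of data samples x1.

    Assumed convention: x0 ~ N(0, I) at t = 0, data x1 at t = 1,
    x_t = (1 - t) * x0 + t * x1, so the target velocity is x1 - x0.
    """
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], device=x1.device)
    # Reshape t so it broadcasts over all non-batch dimensions of x1.
    t_b = t.view(-1, *([1] * (x1.dim() - 1)))
    xt = (1 - t_b) * x0 + t_b * x1
    target = x1 - x0
    return ((model(xt, t) - target) ** 2).mean()


@torch.no_grad()
def euler_sample(model, shape, num_steps=50):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = torch.randn(shape)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0],), i * dt)
        x = x + dt * model(x, t)
    return x
```

A training loop would call `flow_matching_loss` on each batch and backpropagate as usual. For a class-conditional model, the label would be passed to the model alongside `x` and `t`.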
Download the dataset (Hugging Face) into butterflies/:
from huggingface_hub import snapshot_download
snapshot_download(
repo_id="riteshrm/butterflies", repo_type="dataset", local_dir="butterflies"
)

The training scripts use torchvision.datasets.ImageFolder, so butterflies/ should be laid out like:
butterflies/
├── class_0/
├── class_1/
└── ...
Train and Sample:
python fm.py

Note: the scripts currently assume NUM_CLASSES = 5 and assert it matches the number of folders found under butterflies/.
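To catch a layout mismatch before starting a run, a check along these lines may help. This is a sketch using only the standard library, not the assertion from the scripts themselves. `check_layout` is a hypothetical helper name, and it assumes that every immediate subdirectory of butterflies/ is a class folder, which is how ImageFolder discovers classes.

```python
from pathlib import Path

NUM_CLASSES = 5  # value stated in the note above


def check_layout(root="butterflies", expected=NUM_CLASSES):
    """Return the sorted class folder names, or raise if the count is off."""
    # Only immediate subdirectories count as classes; loose files are ignored.
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    if len(classes) != expected:
        raise ValueError(
            f"expected {expected} class folders under {root!r}, "
            f"found {len(classes)}: {classes}"
        )
    return classes
```

Calling `check_layout()` from the repository root after the download returns the class names in sorted order, which is the same order ImageFolder uses when assigning label indices.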
During training, the script periodically:
- Saves sample grids as sample_epoch_*.png
- Saves checkpoints as dit_conditional_epoch_*.pth
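To pick up the most recent checkpoint after a run, something like the following sketch can be used. `latest_checkpoint` is a hypothetical helper, and it assumes that the `*` in the filename is an integer epoch number and that checkpoints are written to the directory you pass in. Neither is confirmed here, so check them against fm.py.

```python
import re
from pathlib import Path

# Assumed filename format: dit_conditional_epoch_<integer>.pth
_CKPT_RE = re.compile(r"dit_conditional_epoch_(\d+)\.pth$")


def latest_checkpoint(directory="."):
    """Return (epoch, path) for the highest-numbered checkpoint, or None."""
    best = None
    for path in Path(directory).glob("dit_conditional_epoch_*.pth"):
        match = _CKPT_RE.match(path.name)
        if match is None:
            continue  # skip files whose suffix is not a plain integer
        epoch = int(match.group(1))
        if best is None or epoch > best[0]:
            best = (epoch, path)
    return best
```

Comparing epochs as integers avoids the lexicographic trap where `epoch_9` would sort after `epoch_10`. How the file should then be loaded depends on what the script saves (a bare state dict or a larger dictionary), which this sketch does not assume.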
Most hyperparameters (image size, model size, batch size, number of steps, etc.) are defined at the top of each script.
The DiT backbone implementation in model.py is adapted from:
The DDPM and DDIM implementations were written from scratch with a focus on minimalism and readability.