k-diffusion

An implementation of Elucidating the Design Space of Diffusion-Based Generative Models (Karras et al., 2022) for PyTorch. The patching method in Improving Diffusion Model Efficiency Through Patching is implemented as well.

Training:

To train models:

$ ./train.py --config CONFIG_FILE --name RUN_NAME

For instance, to train a model on MNIST:

$ ./train.py --config configs/config_mnist.json --name RUN_NAME

The configuration file allows you to specify the dataset type. Currently supported types are "imagefolder" (finds all images in that folder and its subfolders, recursively), "cifar10" (CIFAR-10), "mnist" (MNIST), and "huggingface" (Hugging Face Datasets).
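For example, the dataset type is selected in the dataset section of the config. A minimal sketch with a placeholder path follows; consult the bundled files in configs/ for the exact schema:

    {
        "dataset": {
            "type": "imagefolder",
            "location": "/path/to/images"
        }
    }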

Multi-GPU and multi-node training is supported with Hugging Face Accelerate. You can configure Accelerate by running:

$ accelerate config

on all nodes, then running:

$ accelerate launch train.py --config CONFIG_FILE --name RUN_NAME

on all nodes.
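Accelerate's settings can also be passed directly as launch flags instead of running the interactive accelerate config. As a sketch, launching on the first of two 8-GPU nodes might look like this (the flags are standard Accelerate CLI options; the IP address and port are placeholders):

$ accelerate launch --multi_gpu --num_machines 2 --num_processes 16 \
    --machine_rank 0 --main_process_ip 192.168.0.1 --main_process_port 29500 \
    train.py --config CONFIG_FILE --name RUN_NAME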

Enhancements/additional features:

  • k-diffusion supports an experimental model output type, an isotropic Gaussian, which seems to have a lower gradient noise scale and to train faster than Karras et al. (2022) diffusion models.

  • k-diffusion has wrappers for v-diffusion-pytorch, OpenAI diffusion, and CompVis diffusion models, allowing them to be used with its samplers and ODE/SDE solvers (see the sketch after this list).

  • k-diffusion models support progressive growing.

  • k-diffusion implements DPM-Solver, which produces higher quality samples at the same number of function evaluations as Karras Algorithm 2, and supports adaptive step size control. It also implements a linear multistep sampler (comparable to PLMS).

  • k-diffusion supports CLIP guided sampling from unconditional diffusion models (see sample_clip_guided.py).

  • k-diffusion supports log likelihood calculation (not a variational lower bound) for native models and all wrapped models.

  • k-diffusion can calculate, during training, the FID and KID vs the training set.

  • k-diffusion can calculate, during training, the gradient noise scale (1 / SNR) from An Empirical Model of Large-Batch Training (https://arxiv.org/abs/1812.06162).
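
As a sketch of how the wrappers and samplers above fit together, the following uses names from this codebase (k_diffusion.external.CompVisDenoiser, k_diffusion.sampling.get_sigmas_karras, k_diffusion.sampling.sample_dpm_2); inner_model is a placeholder for a CompVis-style latent diffusion model you have loaded yourself, and the shapes and sigma range are illustrative:

    import torch
    import k_diffusion as K

    # inner_model: a CompVis-style eps-prediction model, loaded elsewhere
    model = K.external.CompVisDenoiser(inner_model)

    # Build a Karras et al. (2022) noise schedule and draw initial noise
    sigmas = K.sampling.get_sigmas_karras(n=50, sigma_min=0.03, sigma_max=14.6, device='cuda')
    x = torch.randn([4, 4, 64, 64], device='cuda') * sigmas[0]

    # Draw samples with one of the provided samplers
    samples = K.sampling.sample_dpm_2(model, x, sigmas)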

To do:

  • Anything except unconditional image diffusion models

  • Latent diffusion
