janosh/torch-mnf

Torch MNF

PyTorch implementation of Multiplicative Normalizing Flows [1].

With flow implementations courtesy of Andrej Karpathy.

Files of Interest

New here? Check out the example notebooks:

Interested in the implementation? See

MNF Results

MNIST

When an MNIST 9 is rotated through 180° in steps of 20°, the MNF LeNet (left) does not produce overconfident predictions on out-of-sample data, unlike the regular LeNet (right), which suggests it captures its own uncertainty well. The violin distributions in the top plot were generated by having the MNF LeNet predict each image 500 times. The predictions run in parallel, so this is fast. Both models were trained for 3 epochs on MNIST with Adam. The MNF model has 696,950 trainable parameters, the regular LeNet 258,582.

Figures: MNF LeNet (left), regular LeNet (right)
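The repeated-prediction idea described above can be sketched as follows. This is a minimal illustration, not the repository's code: the dropout network is only a stand-in for a model with stochastic forward passes, and the helper name sample_predictions, the layer sizes and the random input image are all assumptions made for the example. It also assumes the model draws independent noise for every row of a batch, which is what makes a single batched pass equivalent to many separate ones.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a stochastic network: dropout stays active in train() mode,
# so repeated forward passes on the same input give different outputs.
# This is NOT the MNF LeNet, only a placeholder with random forward passes.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 10),
)
model.train()  # keep the dropout noise switched on


def sample_predictions(model, image, n_samples=500):
    """Return an (n_samples, n_classes) tensor of softmax outputs for one image."""
    # Repeat the single image along the batch dimension so that all samples
    # are drawn in one forward pass instead of a Python loop.
    batch = image.unsqueeze(0).repeat(n_samples, 1, 1, 1)
    with torch.no_grad():
        return model(batch).softmax(dim=-1)


image = torch.rand(1, 28, 28)  # random placeholder for an MNIST digit
probs = sample_predictions(model, image)
print(probs.shape)       # one row of class probabilities per sample
print(probs.std(dim=0))  # per-class spread across the samples
```

The per-class spread across rows is the kind of quantity a violin plot visualizes: a wide distribution signals that the model is unsure about that class.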

Flow Results

Real Non-Volume Preserving Flows

Flow: [RNVP, RNVP, RNVP, RNVP, RNVP, RNVP, RNVP, RNVP, RNVP]

Final loss: 0.47

Trained for 1400 steps with Adam (lr=1e-4, wd=1e-5)

Parameters: 22,914

RNVP Point Flow RNVP x to 2 and z to x
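For readers unfamiliar with Real NVP [5], the sketch below shows one affine coupling layer for 2D points and stacks nine of them, mirroring the length of the Flow list above. The class name AffineCoupling, the hidden width and the alternating flip are illustrative assumptions, not the repository's implementation.

```python
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """One Real NVP coupling layer for 2D inputs (illustrative only)."""

    def __init__(self, hidden=32, flip=False):
        super().__init__()
        self.flip = flip  # selects which coordinate passes through unchanged
        # Small net mapping the unchanged coordinate to a log-scale and a shift.
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 2))

    def _split(self, x):
        a, b = x[:, :1], x[:, 1:]
        return (b, a) if self.flip else (a, b)

    def _merge(self, a, b):
        return torch.cat((b, a) if self.flip else (a, b), dim=1)

    def forward(self, x):
        a, b = self._split(x)
        log_s, t = self.net(a).chunk(2, dim=1)
        z_b = b * log_s.exp() + t   # transform one half, conditioned on the other
        log_det = log_s.sum(dim=1)  # log|det J| of the triangular Jacobian
        return self._merge(a, z_b), log_det

    def inverse(self, z):
        a, z_b = self._split(z)
        log_s, t = self.net(a).chunk(2, dim=1)
        b = (z_b - t) * (-log_s).exp()  # undo the shift, then the scale
        return self._merge(a, b)


torch.manual_seed(0)
# Nine layers; alternating flip so that both coordinates get transformed.
layers = [AffineCoupling(flip=bool(i % 2)) for i in range(9)]

x = torch.randn(5, 2)
with torch.no_grad():
    z, total_log_det = x, torch.zeros(5)
    for layer in layers:
        z, log_det = layer(z)
        total_log_det = total_log_det + log_det

    x_rec = z
    for layer in reversed(layers):
        x_rec = layer.inverse(x_rec)

# The round trip should reproduce the input up to floating point error.
print(torch.allclose(x, x_rec, atol=1e-4))
```

Because each layer leaves one coordinate untouched, its Jacobian is triangular and the log-determinant reduces to a sum of log-scales, which is what keeps the density evaluation cheap.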

Masked Autoregressive Flow

Flow: [MAF, MAF, MAF, MAF, MAF, MAF, MAF, MAF, MAF]

Final loss: 36.21

Trained for 1400 steps with Adam (lr=1e-4, wd=1e-5)

Parameters: 12,348

MAF Point Flow MAF x to 2 and z to x

Neural Spline Flow Autoregressive Layer

Flow: [ActNormFlow, Glow, NSF_AR, ActNormFlow, Glow, NSF_AR, ActNormFlow, Glow, NSF_AR]

Final loss: 19.13

Trained for 1400 steps with Adam (lr=1e-4, wd=1e-5)

Parameters: 3,012

NSF-AR Point Flow NSF-AR x to 2 and z to x

Neural Spline Flow Coupling Layer

Flow: [ActNormFlow, Glow, NSF_CL, ActNormFlow, Glow, NSF_CL, ActNormFlow, Glow, NSF_CL]

Final loss: 6.06

Trained for 1400 steps with Adam (lr=1e-4, wd=1e-5)

Parameters: 5,844

NSF-CL Point Flow NSF-CL x to 2 and z to x
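Parameter counts and optimizer settings like the ones quoted in this section are commonly obtained as sketched below. Whether the numbers above were produced exactly this way is an assumption, as is reading wd as Adam's weight_decay argument. The toy model and the helper name count_trainable_parameters are hypothetical.

```python
import torch
import torch.nn as nn


def count_trainable_parameters(model: nn.Module) -> int:
    """Sum the element counts of all parameters that receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Toy model, not one of the flows above.
toy = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
print(count_trainable_parameters(toy))  # 4*3 + 3 + 3*1 + 1 = 19

# Assumed mapping of the settings quoted above: lr -> lr, wd -> weight_decay.
optimizer = torch.optim.Adam(toy.parameters(), lr=1e-4, weight_decay=1e-5)
```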

References

  1. MNF: Multiplicative Normalizing Flows for Variational Bayesian Neural Networks | Christos Louizos, Max Welling (Mar 2017) | 1703.01961

  2. VI-NF: Variational Inference with Normalizing Flows | Danilo Rezende, Shakir Mohamed (May 2015) | 1505.05770

  3. MADE: Masked Autoencoder for Distribution Estimation | Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle (Jun 2015) | 1502.03509

  4. NICE: Non-linear Independent Components Estimation | Laurent Dinh, David Krueger, Yoshua Bengio (Oct 2014) | 1410.8516

  5. RNVP: Density estimation using Real NVP | Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio (May 2016) | 1605.08803

  6. MAF: Masked Autoregressive Flow for Density Estimation | George Papamakarios, Theo Pavlakou, Iain Murray (Jun 2018) | 1705.07057

  7. IAF: Improving Variational Inference with Inverse Autoregressive Flow | Diederik Kingma et al. (Jun 2016) | 1606.04934

  8. NSF: Neural Spline Flows | Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios (Jun 2019) | 1906.04032

Debugging Tips

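A quick way to check for infinite or NaN gradients after a backward pass is

for name, param in model.named_parameters():
    if param.grad is None:  # this parameter received no gradient in the last backward pass
        continue
    print(name, "all finite:", torch.isfinite(param.grad).all().item())
    print(name, "any NaN:", torch.isnan(param.grad).any().item())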

There's also torch.autograd.detect_anomaly(), used as a context manager:

with torch.autograd.detect_anomaly():
    x = torch.rand(10, 10, requires_grad=True)
    out = model(x)
    # backward() without arguments needs a scalar, so reduce the output first
    out.sum().backward()

and torch.autograd.set_detect_anomaly(True), which turns the same checks on globally. See here for an issue that used these tools.
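To see what anomaly detection reports, the self-contained sketch below provokes a NaN gradient on purpose. The expression is an assumption chosen for illustration (it is not taken from the repository): its forward value is a finite zero, but the backward pass of sqrt at 0 computes 0 / 0.

```python
import warnings

import torch

x = torch.zeros(1, requires_grad=True)

caught = None
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the notice that anomaly mode is slow
    with torch.autograd.detect_anomaly():
        y = (torch.sqrt(x) * 0).sum()  # forward result is a finite 0
        try:
            y.backward()  # the gradient of sqrt at 0 evaluates 0 / 0, i.e. NaN
        except RuntimeError as err:
            caught = err

# With anomaly detection on, the NaN surfaces as an exception that names
# the offending backward function instead of silently ending up in x.grad.
print(type(caught).__name__)
print(caught)
```

Anomaly detection slows training down noticeably, so it is best enabled only while hunting for the source of a NaN.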

Requirements

requirements.txt was created with pipreqs . (run from the project directory). Find new dependencies manually with pipreqs --diff requirements.txt.