# Anomaly detection using RealNVP normalising flows

Normalising flows define a bijective mapping between two distributions. We want to detect anomalous situations from images, but identifying outliers and estimating probabilities directly in image space is difficult because the distribution there is complex and multimodal. We therefore train a normalising flow to map that distribution to a simpler one, such as a Gaussian, under which densities are easy to evaluate and outliers are easy to identify.
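
To make the density claim concrete, here is the standard change-of-variables identity for a flow. The symbols are assumptions introduced for illustration: $f$ is the learned bijection, $x$ is the input vector, and $p_Z$ is the simple base density (a standard normal in this notebook).

$$
\log p_X(x) = \log p_Z\big(f(x)\big) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|
$$

Both terms on the right are cheap to evaluate for a RealNVP-style flow. One possible outlier rule (an assumption, not something specified here) is to flag inputs whose $\log p_X(x)$ falls below a threshold chosen on in-distribution data.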

The original RealNVP work proposes a convolutional variant of the normalising flow that accepts images directly. We instead use a variant that operates on 1-D feature vectors produced from each image by a ResNet encoder. This lets us 1) reduce the dimensionality of the latent space while keeping the more salient information, and 2) capitalise on the ResNet encoder's powerful mid-level features, learned by pre-training on a much larger dataset than ours (ImageNet).

In [18]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

import numpy as np
from tqdm import tqdm
from matplotlib import pyplot as plt

from models.realnvp import Flow, LatentMaskedAffineCoupling, NormalisingFlow, MLP
from datasets.datasets import GazeboSimDataset

## Training

Specify the hyperparameters.

In [10]:
n_bottleneck = 512 # ResNet-18 feature dimension (input to its final FC layer)
n_flows = 16
n_epochs = 10
batch_size = 64

enable_cuda = True
device = torch.device('cuda' if torch.cuda.is_available() and enable_cuda else 'cpu')

Initialise the ResNet-18 encoder.

In [11]:
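import torchvision.models as models
resnet18_encoder = models.resnet18(pretrained=True)  # newer torchvision: weights=models.ResNet18_Weights.DEFAULT
resnet18_encoder.fc = nn.Identity()  # drop the 1000-way classifier to expose the 512-d features
resnet18_encoder = resnet18_encoder.to(device).eval()  # same device as the inputs; inference mode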

Specify flow layers.

In [12]:
# Specify the hyperparameters
n_flows = 12  # NOTE: overrides the n_flows = 16 set in the hyperparameter cell above

# Specify the flow layers
b = torch.tensor(n_bottleneck // 2 * [0, 1] + n_bottleneck % 2 * [0])
flows = []
for i in range(n_flows):
    st_net = MLP(n_bottleneck, 1024, 3)
    if i % 2 == 0:
        flows += [LatentMaskedAffineCoupling(b, st_net)]
    else:
        flows += [LatentMaskedAffineCoupling(1 - b, st_net)]

# Standard normal prior, applied independently to each latent dimension
prior = torch.distributions.normal.Normal(loc=0.0, scale=1.0)

# Construct the normalising flow
nf = NormalisingFlow(flows, prior, device).to(device)
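
As a rough illustration of what each masked affine coupling layer does, here is a minimal sketch in plain Python (no PyTorch, so it runs without dependencies). It is not the `LatentMaskedAffineCoupling` implementation from `models.realnvp`, whose internals are not shown here. The names `make_mask`, `toy_st`, `coupling_forward` and `coupling_inverse` are hypothetical, and the convention that entries with mask value 1 pass through unchanged is an assumption.

```python
import math

def make_mask(n):
    """Alternating 0/1 mask, the same pattern as `b` in the cell above."""
    return [i % 2 for i in range(n)]

def toy_st(masked):
    """Hypothetical stand-in for the learned MLP: returns scale s and shift t.

    Any function works as long as it only sees the masked (unchanged) entries.
    """
    total = sum(masked)
    s = [math.tanh(0.1 * total + 0.01 * i) for i in range(len(masked))]
    t = [0.5 * total - 0.1 * i for i in range(len(masked))]
    return s, t

def coupling_forward(x, mask):
    """y = m*x + (1-m)*(x*exp(s) + t); log|det J| = sum((1-m)*s)."""
    masked = [m * v for m, v in zip(mask, x)]
    s, t = toy_st(masked)
    y = [m * v + (1 - m) * (v * math.exp(si) + ti)
         for m, v, si, ti in zip(mask, x, s, t)]
    log_det = sum((1 - m) * si for m, si in zip(mask, s))
    return y, log_det

def coupling_inverse(y, mask):
    """Exact inverse: the masked entries of y equal those of x, so s and t can be recomputed."""
    masked = [m * v for m, v in zip(mask, y)]
    s, t = toy_st(masked)
    return [m * v + (1 - m) * ((v - ti) * math.exp(-si))
            for m, v, si, ti in zip(mask, y, s, t)]

x = [0.3, -1.2, 0.8, 2.0, -0.5]
mask = make_mask(len(x))
y, log_det = coupling_forward(x, mask)
x_rec = coupling_inverse(y, mask)
```

Alternating between a mask and its complement from one layer to the next, as the loop above does with `b` and `1 - b`, means every dimension is transformed by some layer in the stack.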

Load the in-distribution dataset for training.

In [25]:
train_dataset = GazeboSimDataset('/home/joel/Downloads/images-arm/data/middle')
train_partition_len = int(np.floor(0.8 * len(train_dataset)))
val_partition_len = len(train_dataset) - train_partition_len
train_set, val_set = torch.utils.data.random_split(train_dataset, [train_partition_len, val_partition_len])

train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True, drop_last=False)

print("Loaded data.")
print("Train set: ", len(train_set))
print("Val set: ", len(val_set))

Loaded data.
Train set:  151
Val set:  38


Train the normalising flow on the encoded training images.

In [29]:
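optimizer = torch.optim.Adam(nf.parameters(), lr=1e-4, weight_decay=1e-4)

for epoch in range(n_epochs):
    progressbar = tqdm(enumerate(train_loader), total=len(train_loader))
    for batch_n, x in progressbar:
        # The dataset yields uint8 images (see the RuntimeError in this cell's output),
        # so cast to float. Scaling to [0, 1] is an assumption about the expected range.
        x = x.to(device).float() / 255.0
        optimizer.zero_grad()
        # Only the flow is optimised, so skip gradient tracking for the encoder.
        # The encoder must also live on `device`, e.g. resnet18_encoder.to(device).
        with torch.no_grad():
            z = resnet18_encoder(x)
        z, log_det = nf(z)
        log_prob_z = nf.get_prior_log_prob(z)
        # Negative log-likelihood: prior log-density plus log-determinant of the Jacobian
        loss = -log_prob_z.mean() - log_det.mean()
        loss.backward()
        optimizer.step()  # tqdm advances on iteration, so no manual update() is needed
    progressbar.close()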
    print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_n * len(x), len(train_loader.dataset),
                       100. * batch_n / len(train_loader),
                       loss.item()))
    
torch.save(nf.state_dict(), "../../saved_models/realnvp_16l_10p.ckpt")

  0%|          | 0/3 [00:00<?, ?it/s]


RuntimeError: Input type (torch.cuda.ByteTensor) and weight type (torch.FloatTensor) should be the same
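
The loss in the training cell is a negative log-likelihood. The sketch below spells it out in plain Python under two assumptions: that the prior is a standard normal and that the per-sample log-density is summed over latent dimensions (how `nf.get_prior_log_prob` reduces over dimensions is not shown in this notebook). The function names `standard_normal_log_prob` and `nll_loss` are hypothetical.

```python
import math

def standard_normal_log_prob(z):
    """Log-density of a vector under an isotropic standard normal, summed over dimensions."""
    return sum(-0.5 * (v * v + math.log(2.0 * math.pi)) for v in z)

def nll_loss(zs, log_dets):
    """Batch mean of -(log p_Z(z) + log|det J|), mirroring the loss in the training loop."""
    n = len(zs)
    mean_log_prob = sum(standard_normal_log_prob(z) for z in zs) / n
    mean_log_det = sum(log_dets) / n
    return -mean_log_prob - mean_log_det

# Latents near the origin score a lower loss than latents far from it.
typical = nll_loss([[0.0, 0.0]], [0.0])
unusual = nll_loss([[3.0, 3.0]], [0.0])
```

The same per-sample quantity could plausibly serve as an anomaly score at test time, with higher values indicating less likely inputs. That use is an assumption here, not something this notebook states.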

## Testing