<a href="https://colab.research.google.com/github/kluo9/Deap-Learning/blob/main/Anomaly_Detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook demonstrates unsupervised anomaly detection on images.
The goal is to train a model that determines whether a given image is similar to the training data.

The training set contains 100,000 human faces. The test set contains about 10,000 images from the same distribution as the training data (label 0) and about 10,000 images from a different distribution (anomalies, label 1).

The method is to train an autoencoder on the training data.
Ideally, the autoencoder achieves a small reconstruction error on the training set.
During inference, the reconstruction error serves as the anomaly score, which measures how abnormal an image is. An image from an unseen distribution should have a higher reconstruction error.
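As a minimal sketch of the scoring step described above, the snippet below computes a per-image mean squared reconstruction error with NumPy. The function name `anomaly_scores` and the toy arrays are assumptions made for illustration; in the notebook the reconstructions would come from the trained autoencoder, and the exact error measure used later may differ.

```python
import numpy as np

def anomaly_scores(images, reconstructions):
    """Per-image mean squared reconstruction error (hypothetical helper)."""
    images = np.asarray(images, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)
    # Flatten every axis except the batch axis, then average the squared error
    diff = (images - reconstructions).reshape(len(images), -1)
    return (diff ** 2).mean(axis=1)

# Toy stand-ins for a batch of 64x64 RGB images and their reconstructions
rng = np.random.default_rng(0)
batch = rng.random((4, 64, 64, 3))
recon = batch.copy()
recon[3] += 0.5  # pretend the model reconstructs the last image poorly

scores = anomaly_scores(batch, recon)
print(scores)  # the last image gets the largest score
```

A larger score means the image was harder to reconstruct, so it is ranked as more likely to be an anomaly.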

The evaluation metric is the ROC AUC score.
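To make the metric concrete, here is a small sketch that computes ROC AUC as the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal image, counting ties as one half. The helper `roc_auc` and the toy labels and scores are assumptions for illustration only. In practice a library routine such as `sklearn.metrics.roc_auc_score(y_true, y_score)` is the usual choice, and this pairwise version would use a lot of memory on the full test set.

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of anomaly and normal scores."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=np.float64)
    pos = scores[labels == 1]  # anomaly scores of true anomalies
    neg = scores[labels == 0]  # anomaly scores of normal images
    # Compare every anomaly score with every normal score
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy example: label 1 marks an anomaly, higher score means more anomalous
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of anomalies from normal images.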

In [15]:
# Training progress bar
!pip install -q qqdm

In [16]:
import random
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler, TensorDataset
import torchvision.transforms as transforms
import torch.nn.functional as F
from torch.autograd import Variable
import torchvision.models as models
from torch.optim import Adam, AdamW
from qqdm import qqdm, format_str
import pandas as pd
from google.colab import files

Download data

In [17]:
uploaded = files.upload()

Saving kaggle.json to kaggle (1).json


In [18]:
# Create the Kaggle config directory; -p avoids an error if it already exists
! mkdir -p ~/.kaggle
# Copy the API token into place and restrict its permissions
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json


In [19]:
! kaggle competitions download -c ml2022spring-hw8

ml2022spring-hw8.zip: Skipping, found more recently modified local copy (use --force to force download)


In [20]:
!unzip /content/ml2022spring-hw8.zip

Archive:  /content/ml2022spring-hw8.zip
replace data/testingset.npy? [y]es, [n]o, [A]ll, [N]one, [r]ename: A
  inflating: data/testingset.npy     
  inflating: data/trainingset.npy    


In [21]:
train = np.load('/content/data/trainingset.npy', allow_pickle=True)
test = np.load('/content/data/testingset.npy', allow_pickle=True)

print(train.shape)
print(test.shape)

(100000, 64, 64, 3)
(19636, 64, 64, 3)


Set the random seed to a certain value for reproducibility.

In [None]:
def same_seeds(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

same_seeds(48763)
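A quick way to see what seeding buys you is to reseed and confirm that the same random draws come back. The sketch below uses a reduced, hypothetical helper `seed_basic` that covers only Python's `random` module and NumPy so that it runs without a GPU or PyTorch; it is not a replacement for `same_seeds` above.

```python
import random
import numpy as np

def seed_basic(seed):
    # Hypothetical reduced version of same_seeds: Python and NumPy only
    random.seed(seed)
    np.random.seed(seed)

# Draw once after seeding
seed_basic(48763)
first = (random.random(), np.random.rand(3))

# Reseed with the same value and draw again
seed_basic(48763)
second = (random.random(), np.random.rand(3))

# Both draws should be identical
same = first[0] == second[0] and np.array_equal(first[1], second[1])
print(same)  # True
```

The same idea applies to `torch.manual_seed`, although some GPU operations can remain nondeterministic even with fixed seeds.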

# Autoencoder