# Variational Autoencoder: VAE

In this notebook we explore the use of VAEs for anomaly detection, following this procedure:

1. We train the model on the train set. We use a non-contaminated version of the public test set for validation (to decide when to stop early and avoid overfitting the training set).
2. We measure the performance of the model on the private test set.
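
The early-stopping rule in step 1 can be sketched as follows. This is a minimal illustration, not the logic inside the trainer used below: the `EarlyStopping` class, its `min_delta` parameter, and the toy validation losses are all hypothetical.

```python
class EarlyStopping:
    """Track the best validation loss and signal when training should stop."""

    def __init__(self, patience=100, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, val_loss, epoch):
        """Record one epoch; return True if training should stop now."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy validation curve (made-up numbers): improves, then plateaus.
val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss, epoch):
        stopped_at = epoch
        break
```

In practice one would also save the model weights whenever `best_epoch` is updated, so that the checkpoint with the lowest validation loss is the one evaluated on the private test set.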


TODO:
1. Check if the performance improves when using `QuantileTransformer` instead of `MinMaxScaler`.
2. Check if the performance improves when using only curves with more than 20 detections in both bands.
3. Check if the performance improves when more weight is given to the most important features of the supervised RF-Detector.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE
from sklearn.metrics import f1_score

import torch
import torch.nn as nn
from torch.nn import functional as F
from torch.utils import data
from torch.utils.data import DataLoader

from barbar import Bar

from preprocess_singlein import get_mnist
from vae.train import TrainerVAE  # the next cell instantiates TrainerVAE, not TrainerAE
from vae.test import eval  # note: this shadows Python's built-in eval()

In [None]:
class Args:
    batch_size = 200
    num_epochs = 350
    lr = 1e-4
    patience = 100          # early-stopping patience, in epochs
    lr_milestones = [250]   # epoch(s) at which the learning rate is stepped
    latent_dim = 32         # dimensionality of the latent space
    anormal_class = 5       # class treated as the anomaly

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

args = Args()  # hyperparameters for the training
dataloader_train, dataloader_val, dataloader_test = get_mnist(args)
vae = TrainerVAE(args, dataloader_train, dataloader_val, device)

## Training

In [None]:
vae.train()
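
The internals of `TrainerVAE` are not shown in this notebook. As a rough sketch, assuming it optimizes the standard VAE objective (a reconstruction term plus a KL divergence to a standard normal prior), one training loss could look like the code below. The function names `reparameterize` and `vae_loss`, the `beta` weight, and the choice of squared error for the reconstruction term are assumptions made for illustration.

```python
import torch
from torch.nn import functional as F


def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping gradients."""
    std = torch.exp(0.5 * log_var)
    return mu + std * torch.randn_like(std)


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Return (total, reconstruction, KL), each averaged over the batch."""
    # Squared error summed over all features of each sample.
    reconst = F.mse_loss(x_hat, x, reduction="none").flatten(1).sum(dim=1).mean()
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions.
    kl = (-0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(dim=1)).mean()
    return reconst + beta * kl, reconst, kl


# Shape check on random data (4 images of 28x28, latent size 32).
x = torch.rand(4, 1, 28, 28)
mu = torch.zeros(4, 32)
log_var = torch.zeros(4, 32)
z = reparameterize(mu, log_var)
total, reconst, kl = vae_loss(x, x.clone(), mu, log_var)

# With mu = 1 and log_var = 0, each latent dimension contributes 0.5 to the KL.
_, _, kl_shifted = vae_loss(x, x.clone(), torch.ones(2, 3), torch.zeros(2, 3))
```

With a perfect reconstruction and a posterior equal to the prior, both terms are zero, which gives a quick sanity check for any implementation of this loss.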

## Learning curve

In [None]:
def plot_loss(values, values_t, metric):
    plt.plot(np.arange(len(values)), values, c='k', label='train')
    plt.plot(np.arange(len(values_t)), values_t, c='b', label='test')
    plt.title('Variational Autoencoder {}'.format(metric))
    plt.ylabel(metric)
    plt.xlabel('Epoch')
    plt.legend(loc='best')
    plt.grid(True)

plot_loss(np.array(vae.reconst)/1000, np.array(vae.reconst_t), 'Reconstruction')

## Evaluation

In [None]:
# `get_ALeRCE_data` and `scaler` are not defined in this notebook, so we reuse
# the test loader returned by `get_mnist` above.
labels1, labels2, scores, latents = eval(vae.model, dataloader_test, device)

In [None]:
x_embedded = TSNE(n_components=2).fit_transform(latents)

In [None]:
plt.figure(figsize=(8,8))
cmap = plt.get_cmap('jet', 4)
plt.scatter(x_embedded[:, 0][labels2==0], x_embedded[:, 1][labels2==0],
            s=15, alpha=0.5, marker='.')
plt.scatter(x_embedded[:, 0][labels2!=0], x_embedded[:, 1][labels2!=0], 
            c=labels2[labels2!=0].reshape(-1,),
            s=150, cmap=cmap, marker='*')

plt.grid(True)

In [None]:
scores_in = scores[labels1==0]
scores_out = scores[labels1==1]

In [None]:
plt.hist(scores_in, bins=50, color='b', alpha=0.3, density=True, label='Inlier')
plt.hist(scores_out, bins=20, color='r', alpha=0.3, density=True, label='Outlier')
plt.legend(loc='best')  # without this call the labels above are never displayed
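
To turn the two score distributions into a single number, one option is to sweep a decision threshold and report the F1 score with `f1_score`, which is imported at the top of the notebook but not used. The sketch below assumes that higher scores mean "more anomalous" and that label 1 marks outliers. The helper `best_f1_threshold` is hypothetical, and it runs on synthetic scores rather than on the outputs above.

```python
import numpy as np
from sklearn.metrics import f1_score


def best_f1_threshold(scores, labels, n_thresholds=200):
    """Sweep thresholds over the score range; return (threshold, F1) with the best F1.

    Assumes labels are 0 (inlier) / 1 (outlier) and that a sample is
    flagged as an outlier when its score is >= threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    candidates = np.linspace(scores.min(), scores.max(), n_thresholds)
    f1s = [
        f1_score(labels, (scores >= t).astype(int), zero_division=0)
        for t in candidates
    ]
    best = int(np.argmax(f1s))
    return float(candidates[best]), float(f1s[best])


# Synthetic, well-separated scores (illustration only).
rng = np.random.default_rng(0)
demo_scores = np.concatenate([rng.normal(1.0, 0.3, 900), rng.normal(3.0, 0.5, 100)])
demo_labels = np.concatenate([np.zeros(900, dtype=int), np.ones(100, dtype=int)])
threshold, f1 = best_f1_threshold(demo_scores, demo_labels)
print(f"threshold={threshold:.3f}, F1={f1:.3f}")
```

With the notebook's variables the call would be `best_f1_threshold(scores, labels1)`. Note that a threshold tuned on labelled test data gives an optimistic F1; choosing it on a labelled validation set and then applying it to the private test set is the fairer protocol.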