# Anomaly Detection

In this notebook we'll introduce two types of anomaly detection and learn how to perform such analyses using Normalizing Flows. 

In [None]:
# standard python stuff
import os
import sys
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

# stuff for torch+nflows
import torch
from torch import nn
from tqdm import tqdm
from torch.nn.modules import Module

# optimizer from torch
from torch import optim

# base Flow to construct model
from nflows.flows.base import Flow
#base distribution to use
from nflows.distributions.normal import StandardNormal
# the MADE-based masked autoregressive layer (autoregressive, not a coupling layer)
from nflows.transforms.autoregressive import MaskedAffineAutoregressiveTransform
# a random permutation of the features, so successive layers see different orderings
from nflows.transforms.permutations import RandomPermutation
# this will combine modules
from nflows.transforms.base import CompositeTransform

# Anomaly Detection:

Anomaly detection is in some sense self-explanatory: given a dataset $X$ we want to find a subset $X'$ which is "anomalous" or different. The goal may be to flag unwanted outliers: spam, banking fraud, simply badly measured samples, etc. Or it may be to find gold: high-reward stocks or options, good fits for a sports team, etc.

The task defines whether the anomaly is good or bad. Additionally, it may define the type of anomaly we seek. There are roughly two types of anomalies: out-of-density and over-density. Usually, anomaly detection is an **unsupervised** task, where we do not have labels to train our models and simply try to understand the data and find anomalies within it. Some methods are **semi-supervised** because they use some noisy labels to get a better sense of what an anomaly is.

## Out-of-density

In out-of-density cases, we are looking for outliers: points that are far away from most of the data. In 1D, this is very easy to visualize:

In [None]:
N=10000
X1 = st.norm().rvs(int(0.99*N))
X2 = st.norm(loc=10,scale=0.1).rvs(int(0.01*N))
X=np.hstack([X1,X2])
plt.hist(X,histtype='step')
#plt.yscale('log')

You can see how there is a small subset of data which is far away from the rest. This would be our anomaly. Again, we do not have labels here. You might say that this is very easy: just look at the data and that's it. However, 1D is very misleading. As you increase the number of dimensions, you not only lose visualization but *every* point is in some sense far away from the rest. This is **the curse of dimensionality**.

However, we can use Normalizing Flows to our advantage here. We can simply define the outliers as the points with the lowest probability density under the learned model. Thus, our **anomaly score** is simply $\log p(x)$. We can use our anomaly score to select events in the usual way:

$\mathrm{Anomalous\ events}(\alpha) = \lbrace x \mid \log p(x) \leq \alpha \rbrace$

where $\alpha$ is a tunable threshold that sets how anomalous the selected events must be. Equivalently, $\alpha$ selects the events whose probability density is lower than or equal to $e^\alpha$.
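As a minimal sketch of this selection, the cell below applies the threshold to the 1D toy data from above. It assumes a stand-in density: `log_p_bulk` is a hypothetical helper that returns the standard normal log-density, used here only in place of a trained flow's log-probability. The value of `alpha` is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same 1D toy mixture as above: 99% bulk, 1% far-away cluster.
N = 10000
X1 = rng.normal(loc=0.0, scale=1.0, size=int(0.99 * N))
X2 = rng.normal(loc=10.0, scale=0.1, size=int(0.01 * N))
X = np.hstack([X1, X2])

def log_p_bulk(x):
    # Hypothetical stand-in for a trained flow: standard normal log-density.
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

# Anomaly score: log p(x). Low values mean "unlikely under the model".
score = log_p_bulk(X)

# Illustrative threshold: keep events with density at most e^alpha.
alpha = -10.0
selected = X[score <= alpha]

print(f"selected {len(selected)} of {len(X)} events")
```

With a real flow, `score` would instead come from the model's log-probability evaluated on the data, and `alpha` would typically be tuned, for example as a low quantile of the scores.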

## Exercise:

Given the 2D dataset:

In [None]:
N=50000
X1 = st.multivariate_normal(mean=[1.5,-1.5],cov=[[1,0.1],[0.1,0.5]]).rvs(int(0.999*N))
X2 = st.multivariate_normal(mean=[-1.,1.],cov=[[0.7,-0.2],[-0.2,1.2]]).rvs(int(0.001*N))
X=np.vstack([X1,X2])
Y=np.hstack([np.ones(len(X1)),-np.ones(len(X2))])
print(X.shape)
plt.scatter(X[:,0],X[:,1],c=Y,alpha=0.5)

Train a Normalizing Flow: this includes implementing a batch size to deal with the larger number of events (not done in the previous notebook, but it can be done simply by sampling a subset of X at each iteration) and, more importantly, **evaluating** whether the flow has trained successfully or not. An interesting question here is whether we need the flow to match the exact dataset or just the bulk of the dataset in order to find anomalous events. The degree of precision depends on the application.
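As a hint for the batching part only, here is a minimal sketch of sampling a random subset at each iteration and holding out a validation set for evaluation. The placeholder data, the 80/20 split, `batch_size` and `n_iterations` are all assumptions chosen for illustration; the actual training step is left as a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data with the same shape as the exercise dataset, (N, 2).
X = rng.normal(size=(50000, 2))

# Hold out 20% of the events to monitor the loss on unseen data.
perm = rng.permutation(len(X))
n_val = len(X) // 5
X_val, X_train = X[perm[:n_val]], X[perm[n_val:]]

batch_size = 512    # assumed value, tune as needed
n_iterations = 3    # tiny number, just to illustrate the loop

for it in range(n_iterations):
    # Draw a fresh random mini-batch of training events.
    idx = rng.choice(len(X_train), size=batch_size, replace=False)
    x_batch = X_train[idx]
    # A training step would go here: convert x_batch to a float32 torch
    # tensor, minimise the negative mean log-probability of the batch,
    # and periodically compute the same loss on X_val.
    print(it, x_batch.shape)
```

Comparing the training and validation losses over iterations is one simple way to judge whether the flow has converged without overfitting.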

Use the anomaly score to select "interesting" events. Produce summary plots. Some suggestions: a useful metric (which would not be available in real data) is the fraction of selected events that come from X2 as a function of $\alpha$. Another is a scatter plot as above, but where the color is the anomaly score.
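The first suggested metric can be sketched as follows. This is not a solution with a flow: as a loudly labelled assumption, the score here comes from a single Gaussian fitted to all of X, standing in for the trained model's log-probability. `x2_fraction` is a hypothetical helper name, and "fraction" is interpreted as the share of selected events that belong to X2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same 2D dataset as in the exercise, regenerated with NumPy.
N = 50000
X1 = rng.multivariate_normal([1.5, -1.5], [[1, 0.1], [0.1, 0.5]], size=int(0.999 * N))
X2 = rng.multivariate_normal([-1.0, 1.0], [[0.7, -0.2], [-0.2, 1.2]], size=int(0.001 * N))
X = np.vstack([X1, X2])
Y = np.hstack([np.ones(len(X1)), -np.ones(len(X2))])

# Stand-in score (assumption): log-density of one Gaussian fitted to X.
mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
diff = X - mu
maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
log_p = -0.5 * maha - 0.5 * np.log(np.linalg.det(2 * np.pi * cov))

def x2_fraction(log_p, Y, alphas):
    # For each threshold, the share of selected events with label -1 (X2).
    fractions = []
    for alpha in alphas:
        selected = log_p <= alpha
        if selected.sum() == 0:
            fractions.append(np.nan)
        else:
            fractions.append(np.mean(Y[selected] == -1))
    return np.array(fractions)

# Thresholds: a few low quantiles of the score, plus "select everything".
alphas = np.append(np.quantile(log_p, [0.001, 0.01, 0.1]), log_p.max())
fractions = x2_fraction(log_p, Y, alphas)
print(np.column_stack([alphas, fractions]))
```

Plotting `fractions` against `alphas`, and coloring the scatter plot with `c=log_p`, gives the two summary plots suggested above once `log_p` is replaced by the flow's output.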