## Unsupervised Anomaly Detection
```In this exercise you will use concepts you already know, and perhaps some you are about to meet, to find anomalies in a dataset of credit card transactions.
We will approach this problem the way one approaches real anomaly detection problems: your goal is to choose the 1,000 most anomalous samples from the dataset, that is, the samples you suspect to be anomalies. In real-life problems, those samples are handed to a human researcher for verification. Obviously, if you hand over a lot of regular samples, the researcher will get angry.```

```~Ittai Haran```

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

```Load the dataset. You can see that it is labeled: the labels are there so you can test yourself. Note that in real-life problems you won't have them. Normalize the dataset as you see fit.```

In [None]:
df = pd.read_csv('resources/creditcard.csv')
df.head()
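
One possible normalization is plain standardization of every feature. The sketch below uses a small toy frame rather than the real file, and it assumes the label column is called `Class`; that name, the feature names, and all values here are illustrative assumptions, so check `df.columns` before adapting it.

```python
import pandas as pd

# Toy stand-in for the real dataset; column names and values are assumptions.
toy = pd.DataFrame({
    "Amount": [1.0, 20.0, 300.0, 4.0],
    "V1": [-1.0, 0.5, 2.0, 0.0],
    "Class": [0, 0, 1, 0],
})

# Keep the labels aside: they are only for self-testing, never for "training".
labels = toy["Class"]
features = toy.drop(columns=["Class"])

# Standardize each feature to zero mean and unit (population) standard deviation.
normalized = (features - features.mean()) / features.std(ddof=0)
```

Standardization is only one choice; heavy-tailed features may call for a log transform or a robust scaler instead.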

```Your first task is to formulate a method for evaluating your anomaly grades. Write an evaluation method that will help you compare different ways of detecting anomalies. Note that this isn't a classification task, and keep your true goal in mind: marking the 1,000 most anomalous samples.```

In [None]:
def evaluate_method(y_true, grades):
    # y_true is the class: 0 for regular, 1 for anomaly
    # grades indicate how anomalous you think each sample is: the higher the grade, the more suspicious the sample
    return 
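
One possible way to fill in the evaluation, offered as a sketch rather than the intended answer: since only the 1,000 top-graded samples reach the researcher, you can measure what fraction of them are true anomalies (precision at k) and what fraction of all anomalies they capture (recall at k). The function names `precision_at_k` and `recall_at_k` are hypothetical.

```python
import numpy as np

def top_k_indices(grades, k):
    """Indices of the k highest grades (in no particular order)."""
    grades = np.asarray(grades)
    k = min(k, len(grades))
    return np.argpartition(-grades, k - 1)[:k]

def precision_at_k(y_true, grades, k=1000):
    """Fraction of the k top-graded samples that are true anomalies."""
    y_true = np.asarray(y_true)
    return y_true[top_k_indices(grades, k)].mean()

def recall_at_k(y_true, grades, k=1000):
    """Fraction of all true anomalies that appear among the k top-graded samples."""
    y_true = np.asarray(y_true)
    return y_true[top_k_indices(grades, k)].sum() / y_true.sum()

# Tiny usage example with made-up labels and grades.
y_example = [0, 0, 1, 0, 1]
grades_example = [0.1, 0.2, 0.9, 0.3, 0.8]
p2 = precision_at_k(y_example, grades_example, k=2)
p4 = precision_at_k(y_example, grades_example, k=4)
r2 = recall_at_k(y_example, grades_example, k=2)
```

Both numbers depend only on which samples land in the top k, which matches the goal better than accuracy over the whole dataset.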

```We can now examine different methods for anomaly detection. Evaluate each method and compare it to the others.```

```The first method we will try is to grade the samples by their distance from the 'mean sample', in units of standard deviation. You can also think of the features as independent Gaussian distributions and grade a sample by its distance from the Gaussian's mean, for every feature.```
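
A minimal sketch of this grade, assuming the features are stored in a numeric array: compute the per-feature z-score and take the Euclidean norm across features. Under the independent-Gaussian view, the squared grade equals, up to constants, the negative log-likelihood of the sample. The names `zscore_grades` and `X_demo` are illustrative, and the demo data is synthetic.

```python
import numpy as np

def zscore_grades(X):
    """Distance of each sample from the mean sample, in standard-deviation units."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features contribute nothing instead of dividing by zero
    z = (X - mean) / std
    return np.sqrt((z ** 2).sum(axis=1))

# Synthetic demo: 500 Gaussian samples, with sample 0 pushed far from the rest.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 5))
X_demo[0] = 10.0
zscore_demo_grades = zscore_grades(X_demo)
```

Note that the mean and standard deviation here are estimated on all samples, anomalies included.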

```What hidden assumption did you make during "training"? Which part of the data did you train on?```

```Try using PCA: project the dataset into a lower-dimensional space, and then use the "inverse" transformation (why ""?) to get approximated samples. Compare the samples you got to the samples you started with.```
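
The round trip can be sketched with scikit-learn's `PCA`, using the squared difference between each sample and its approximation as the grade. The helper name `pca_reconstruction_grades`, the number of components, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_reconstruction_grades(X, n_components):
    """Squared reconstruction error after projecting to n_components and back."""
    pca = PCA(n_components=n_components).fit(X)
    X_approx = pca.inverse_transform(pca.transform(X))
    return np.square(X - X_approx).sum(axis=1)

# Synthetic demo: 6-dimensional samples that lie close to a 2-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X_demo = latent @ mixing + 0.05 * rng.normal(size=(500, 6))
X_demo[0] += 3.0 * rng.normal(size=6)  # sample 0 is pushed off the subspace

pca_demo_grades = pca_reconstruction_grades(X_demo, n_components=2)
```

On real data the number of components is a knob worth tuning with your evaluation method.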

```Read about One-Class SVM. Use it to grade your samples. Notice that this algorithm is very slow compared to those you tried earlier. Consider training it on only a fraction of the samples.
Hint: you can use the decision function directly to get the distance of a sample from the decision boundary.```
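
A sketch of the hint using scikit-learn's `OneClassSVM`, fitted on a random subsample and then applied to every sample. The `nu` and `gamma` settings, the subsample size, and the synthetic data are arbitrary assumptions, not recommended values.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic demo: 2000 Gaussian samples, the first 5 shifted far away.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2000, 4))
X_demo[:5] += 8.0

# Fit on a random fraction of the samples to keep training time manageable.
train_idx = rng.choice(len(X_demo), size=300, replace=False)
svm_model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_demo[train_idx])

# decision_function is positive inside the learned region and negative outside,
# so negate it to make higher grades mean "more anomalous".
svm_grades = -svm_model.decision_function(X_demo)
```

Prediction is much cheaper than fitting, so grading the full dataset with a model trained on a fraction is usually feasible.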

```Now try clustering your data, and use the distance from the clusters (you will have to define it) to grade the samples. Think about changing your normalization method when clustering. Here, too, you might want to consider training on a fraction of the samples.```
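
One way to define the distance, sketched with scikit-learn's `KMeans`: grade each sample by its distance to the nearest cluster center. The choice of k-means, the number of clusters, the subsample size, and the synthetic blobs are all assumptions for illustration; other clustering algorithms and distance definitions are equally valid.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic demo: three tight blobs, plus sample 0 sitting between them.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
blobs = np.vstack([c + rng.normal(size=(300, 2)) for c in centers])
X_demo = np.vstack([[5.0, 5.0], blobs])

# Fit on a random fraction of the samples, then grade all of them.
train_idx = rng.choice(len(X_demo), size=200, replace=False)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_demo[train_idx])

# transform() returns the distance to every cluster center; keep the smallest.
cluster_grades = kmeans.transform(X_demo).min(axis=1)
```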

```Try combining the grades you got from the different methods into a single grade. Did you get a better detector? Why or why not?```
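
Grades from different methods live on different scales, so a simple starting point is to convert each set of grades to normalized ranks and average them. This is only one combination rule among many; the helper names `rank_normalize` and `combine_grades` are hypothetical, and the numbers below are made up.

```python
import numpy as np

def rank_normalize(grades):
    """Map grades to ranks in [0, 1]; the highest grade gets 1 (ties broken arbitrarily)."""
    grades = np.asarray(grades)
    ranks = grades.argsort().argsort()
    return ranks / (len(grades) - 1)

def combine_grades(*grade_arrays):
    """Average the rank-normalized grades of several detectors."""
    return np.mean([rank_normalize(g) for g in grade_arrays], axis=0)

# Two made-up detectors with very different scales.
grades_a = [0.1, 5.0, 0.3, 0.2]
grades_b = [100, 900, 50, 300]
combined = combine_grades(grades_a, grades_b)
```

Sample 1 is ranked highest by both detectors and therefore gets the top combined grade.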

```Now we will experiment with Deep Auto Encoders. The idea is to create a neural network that gets the samples as input and tries to predict the very same samples. The difficulty comes from the fact that the network gets narrower in the middle, creating an information bottleneck. The grade each sample gets is its reconstruction error: the difference between the output and the input. You can read more about Auto Encoders in the literature.```

In [None]:
from keras.layers import Dense, Input
from keras.models import Model
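
A minimal sketch of such a network with the Keras functional API, assuming Keras is installed. The layer widths, bottleneck size, number of epochs, and the synthetic data are arbitrary assumptions, and `build_autoencoder` is a hypothetical helper name.

```python
import numpy as np
from keras.layers import Dense, Input
from keras.models import Model

def build_autoencoder(n_features, bottleneck=4):
    """Symmetric dense autoencoder that narrows to `bottleneck` units."""
    inputs = Input(shape=(n_features,))
    hidden = Dense(16, activation="relu")(inputs)
    code = Dense(bottleneck, activation="relu")(hidden)   # information bottleneck
    hidden = Dense(16, activation="relu")(code)
    outputs = Dense(n_features, activation="linear")(hidden)
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

# Synthetic demo data standing in for the normalized features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 10)).astype("float32")

autoencoder = build_autoencoder(X_demo.shape[1])
# The input is also the target: the network learns to reproduce its input.
autoencoder.fit(X_demo, X_demo, epochs=5, batch_size=64, verbose=0)

# Grade = mean squared reconstruction error per sample.
reconstruction = autoencoder.predict(X_demo, verbose=0)
ae_grades = np.mean((X_demo - reconstruction) ** 2, axis=1)
```

The resulting `ae_grades` can be passed to your evaluation method like the grades of any other detector.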

```Try thinking of other methods to detect anomalies in your data, and find a way to get better results. If you want to know more about Auto Encoders, talk to your tutor about implementing a Variational Auto Encoder.```