# Anomaly detection

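This notebook implements anomaly detection with a Gaussian density model: it estimates the mean and variance of each feature, then selects the probability threshold that maximizes the F1 score on a labeled validation set.

The code is adapted from Andrew Ng's 'Machine Learning Specialization' course and was written to complete a homework assignment.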


# Outline
- [ 1 - Import Packages](#1)
- [ 2 - Load the Data](#2)
- [ 3 - Main Functions](#3)
- [ 4 - Train the Agent](#4)
- [ 5 - Visualize the results](#5)

<a name="1"></a>
## 1 - Import Packages

In [None]:
import numpy as np

<a name="3"></a>
## 3 - Main Functions

In [None]:
def estimate_gaussian(X): 
    """
    Calculates mean and variance of all features 
    in the dataset
    
    Args:
        X (ndarray): (m, n) Data matrix
    
    Returns:
        mu (ndarray): (n,) Mean of all features
        var (ndarray): (n,) Variance of all features
    """

    # Per-feature mean and population variance (np.var divides by m by default)
    mu = np.mean(X, axis=0)
    var = np.var(X, axis=0)

    return mu, var
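
As a quick sanity check on what `estimate_gaussian` computes, the sketch below applies the same two NumPy calls to a small made-up array (`X_toy` and the other `_toy` names are hypothetical, not the notebook's data). It also shows that `np.var` defaults to `ddof=0`, so it divides by m rather than m - 1.

```python
import numpy as np

# Toy data, made up for illustration: 4 examples, 2 features
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0],
                  [4.0, 40.0]])

mu_toy = np.mean(X_toy, axis=0)               # per-feature mean
var_toy = np.var(X_toy, axis=0)               # divides by m (ddof=0)
var_unbiased = np.var(X_toy, axis=0, ddof=1)  # divides by m - 1, for comparison

print(mu_toy)        # [ 2.5 25. ]
print(var_toy)       # [  1.25 125.  ]
```

If the sample-variance convention is preferred, passing `ddof=1` is the only change needed, though the density values and the selected threshold would shift slightly.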

In [None]:
def select_threshold(y_val, p_val): 
    """
    Finds the best threshold to use for selecting outliers 
    based on the results from a validation set (p_val) 
    and the ground truth (y_val)
    
    Args:
        y_val (ndarray): Ground truth labels on validation set (1 = anomaly, 0 = normal)
        p_val (ndarray): Probability density of each validation example
        
    Returns:
        epsilon (float): Threshold chosen 
        F1 (float):      F1 score by choosing epsilon as threshold
    """ 

    best_epsilon = 0
    best_F1 = 0
    p_min, p_max = np.min(p_val), np.max(p_val)
    step_size = (p_max - p_min) / 1000
    for epsilon in np.arange(p_min, p_max, step_size):
        predictions = (p_val < epsilon)  # True where the example is flagged as an anomaly
        fp = np.sum(predictions & (y_val == 0))
        fn = np.sum(~predictions & (y_val == 1))
        tp = np.sum(predictions & (y_val == 1))

        # With no true positives, precision, recall, or F1 becomes 0/0,
        # so this threshold cannot beat the current best; skip it.
        if tp == 0:
            continue

        prec = tp / (fp + tp)
        rec = tp / (tp + fn)
        F1 = 2 * prec * rec / (prec + rec)

        if F1 > best_F1:
            best_F1 = F1
            best_epsilon = epsilon

    return best_epsilon, best_F1
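
The training cell calls `multivariate_gaussian`, which is not defined in this notebook. The sketch below is one possible implementation, assuming the features are independent so that the covariance matrix is diagonal with `var` on its diagonal. The helper used in the original course material may differ.

```python
import numpy as np

def multivariate_gaussian(X, mu, var):
    """
    Sketch (assumption: diagonal covariance built from var).
    Returns the Gaussian density of each row of X.

    Args:
        X (ndarray): (m, n) Data matrix
        mu (ndarray): (n,) Mean of all features
        var (ndarray): (n,) Variance of all features

    Returns:
        p (ndarray): (m,) Density value for each example
    """
    X = np.atleast_2d(X)
    n = mu.shape[0]
    diff = X - mu
    # Work in log space for numerical stability, then exponentiate
    log_p = -0.5 * (n * np.log(2 * np.pi)
                    + np.sum(np.log(var))
                    + np.sum(diff ** 2 / var, axis=1))
    return np.exp(log_p)
```

With a diagonal covariance this equals the product of the per-feature univariate Gaussian densities, which matches the per-feature `mu` and `var` returned by `estimate_gaussian`.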

<a name="4"></a>
## 4 - Train the Agent

In [None]:
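# Assumes mu and var were returned by estimate_gaussian on the training data,
# X_val and y_val come from the data-loading step, and multivariate_gaussian
# has been defined or imported before this cell runs.
p_val = multivariate_gaussian(X_val, mu, var)
epsilon, F1 = select_threshold(y_val, p_val)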
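
To make the threshold step concrete, here is a small worked example with made-up numbers (`p_toy`, `y_toy`, and `eps_toy` are hypothetical, not outputs of this notebook). It computes precision, recall, and F1 for a single candidate threshold, as the loop in `select_threshold` does on each iteration, and then flags the examples whose density falls below it.

```python
import numpy as np

# Made-up densities and labels for 6 validation examples (1 = anomaly)
p_toy = np.array([0.01, 0.02, 0.50, 0.60, 0.70, 0.03])
y_toy = np.array([1, 1, 0, 0, 0, 0])
eps_toy = 0.05  # one candidate threshold

predictions = p_toy < eps_toy                 # True where flagged as anomaly
tp = np.sum(predictions & (y_toy == 1))       # 2 true positives
fp = np.sum(predictions & (y_toy == 0))       # 1 false positive
fn = np.sum(~predictions & (y_toy == 1))      # 0 false negatives

prec = tp / (tp + fp)                         # 2/3
rec = tp / (tp + fn)                          # 1.0
f1_toy = 2 * prec * rec / (prec + rec)        # 0.8

# Indices of the examples flagged as anomalies at this threshold
flagged = np.where(predictions)[0]            # [0 1 5]
```

The same `p < epsilon` comparison, applied to the training densities with the selected `epsilon`, would give the anomalies to highlight in the visualization step.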