# Anomaly Detection

- Anomaly detection, or outlier detection, is the task of identifying instances that do not match the expected pattern in the data. For example, in credit card fraud detection, we have to distinguish fraudulent transactions from normal ones.



- ![](https://res.cloudinary.com/engineering-com/image/upload/w_640,h_640,c_limit,q_auto,f_auto/anomaly_detection_example_ujoo9m.jpg)



# Density Estimation

- Density estimation is one of the techniques used in anomaly detection.
- Suppose we have a dataset $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$ and we want to estimate the probability $p(x)$ of a data point x.
- We choose a decision boundary: if the data point lies inside the boundary (its probability is high enough), it is normal; otherwise, it is an anomaly.



Let \( A(x) \) be a function that determines whether a data point is an anomaly, where \( x \) is the data point, \( p(x) \) is its estimated probability, and \( \epsilon \) is the threshold value.

Then,

\[ A(x) = 
  \begin{cases} 
   \text{"Anomaly"} & \text{if } p(x) < \epsilon \\
   \text{"Normal"} & \text{if } p(x) \geq \epsilon 
  \end{cases}
\]

This function \( A(x) \) takes a data point \( x \) as input and returns "Anomaly" if its probability \( p(x) \) is less than the threshold \( \epsilon \), and "Normal" otherwise.
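
The thresholding rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the quantity compared against the threshold is the estimated probability of the data point; the function name `classify` and the example values are hypothetical.

```python
def classify(p_x: float, epsilon: float) -> str:
    """Return "Anomaly" when the estimated probability p_x is below epsilon."""
    return "Anomaly" if p_x < epsilon else "Normal"


# Hypothetical probabilities and threshold, chosen only for illustration.
print(classify(0.001, epsilon=0.02))  # Anomaly
print(classify(0.30, epsilon=0.02))   # Normal
```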

# Use Cases of Anomaly Detection

- Fraud Detection
- Manufacturing defects 
- Health Monitoring
- Intrusion Detection
- System Health Monitoring

There are many approaches to anomaly detection; here I am focusing on anomaly detection based on the Gaussian distribution.

# Gaussian / Normal Distribution
- The probability of x is determined by a Gaussian with mean $\mu$ and variance $\sigma^2$.
- ![](https://cdn.clutchprep.com/core_topic_visuals/inline_images/5vZPHTfBTHyo0WkfhaGH_Screen%20Shot%202018-01-02%20at%205.37.48%20PM.png)
- If we plot f(x), we get a bell-shaped curve.

**Gaussian distribution examples for different values of mean and variance.**
![](https://michael-franke.github.io/intro-data-analysis/I2DA_files/figure-html/ch-app-01-normal-distribution-density-1.png)

# Single Variable Gaussian Distribution

## Parameter Estimation


Dataset: $X = \{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$


The equation for the mean is:

$$\mu = \frac{1}{m} \sum_{j=1}^m x^{(j)}$$

and for the variance:

$$\sigma^2 = \frac{1}{m} \sum_{j=1}^m (x^{(j)} - \mu)^2$$

The probability of a point $x$ is then:

$$ p(x ; \mu,\sigma ^2) = \frac{1}{\sqrt{2 \pi \sigma ^2}}\exp\left( - \frac{(x - \mu)^2}{2 \sigma ^2} \right)$$

where $\mu$ is the mean and $\sigma^2$ is the variance.

These formulas for $\mu$ and $\sigma^2$ are the maximum likelihood estimates of the Gaussian parameters.
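
The estimation formulas and the density can be sketched in plain Python. This is a minimal sketch using only the standard library; the function names `estimate_gaussian` and `gaussian_pdf` and the sample data are assumptions made for illustration.

```python
import math


def estimate_gaussian(xs):
    """Maximum likelihood estimates of the mean and variance (dividing by m)."""
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m
    return mu, sigma2


def gaussian_pdf(x, mu, sigma2):
    """Density of a Gaussian with mean mu and variance sigma2, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


# Hypothetical one-dimensional dataset.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma2 = estimate_gaussian(data)
print(mu, sigma2)  # 5.0 4.0
print(gaussian_pdf(5.0, mu, sigma2))   # density at the mean, the highest value
print(gaussian_pdf(12.0, mu, sigma2))  # far from the mean, much lower density
```

Note that the variance here divides by m, matching the formula above, rather than by m - 1 as the sample variance does.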

# Multivariate Gaussian Distribution

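- Multiple-feature density estimation:
- Training set: $m$ examples, each with $n$ features $x_1, x_2, \dots, x_n$.
- Each feature $x_i$ is modeled by its own Gaussian with parameters $\mu_i$ and $\sigma_i^2$.

We can calculate the probability of an example by multiplying the probabilities of its individual features: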

$$ p(x)=p(x_1 ; \mu_1,\sigma_1 ^2) \cdot p(x_2 ; \mu_2,\sigma_2 ^2) \cdot p(x_3 ; \mu_3,\sigma_3 ^2) \cdots p(x_n ; \mu_n,\sigma_n ^2)$$

We multiply the probabilities of the individual features because:
- **Density Estimation Methods**: The model used here assumes that the features are independent of each other, similar to how Naive Bayes treats features as independent given the class label.

- **Product Rule of Probabilities**: When events are independent, the joint probability of all of them occurring is the product of their individual probabilities.

- **Example**: Suppose the probability of x1 is 1/10 and the probability of x2 is 1/10. Then the joint probability of x1 and x2 is 1/10 * 1/10 = 1/100. The density estimation function does the same thing: multiplying the per-feature probabilities gives the joint probability of the whole example.
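
The same product can be worked through for Gaussian densities. As an illustration under assumed values (two independent features, each with $\mu_i = 0$ and $\sigma_i^2 = 1$, evaluated at the point $x = (0, 0)$):

$$ p(x) = p(x_1 ; 0, 1) \cdot p(x_2 ; 0, 1) = \frac{1}{\sqrt{2 \pi}} \cdot \frac{1}{\sqrt{2 \pi}} = \frac{1}{2 \pi} \approx 0.159 $$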

# Anomaly Detection Algorithm
- Choose n features $x_i$ that you think might be indicative of anomalous examples.
- Fit the parameters $\mu_1, \mu_2, \dots, \mu_n, \sigma_1^2, \sigma_2^2, \dots, \sigma_n^2$ on the training set, one mean and one variance per feature.
- Compute the joint probability of a data point `x = (x1, x2, ..., xn)` as the product of the individual probabilities of each feature:


$$ p(x) = \prod_{i=1}^{n}  \frac{1}{\sqrt{2 \pi \sigma_i ^2}}\exp\left( - \frac{(x_i - \mu_i)^2}{2 \sigma_i ^2} \right) $$

where the symbol `∏` represents the product of a sequence of expressions.

- Flag `x` as an anomaly if $p(x) < \epsilon$, where $\epsilon$ is the threshold value.
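
The steps of the algorithm can be put together in a short sketch. It uses only the Python standard library and assumes the data is a list of rows, one row per example; the names `fit`, `density`, and `is_anomaly`, the toy training set, and the threshold value are all hypothetical choices for illustration, not the linked notebook's code.

```python
import math


def fit(X):
    """Estimate one mean and one variance per feature from the rows of X."""
    m, n = len(X), len(X[0])
    mus = [sum(row[i] for row in X) / m for i in range(n)]
    sigma2s = [sum((row[i] - mus[i]) ** 2 for row in X) / m for i in range(n)]
    return mus, sigma2s


def density(x, mus, sigma2s):
    """Product of the per-feature Gaussian densities (features assumed independent)."""
    return math.prod(
        math.exp(-((xi - mu) ** 2) / (2 * s2)) / math.sqrt(2 * math.pi * s2)
        for xi, mu, s2 in zip(x, mus, sigma2s)
    )


def is_anomaly(x, mus, sigma2s, epsilon):
    """Flag x when its estimated density falls below the threshold epsilon."""
    return density(x, mus, sigma2s) < epsilon


# Toy training set with two features per example (hypothetical values).
X_train = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.0, 9.0], [2.0, 13.0]]
mus, sigma2s = fit(X_train)
epsilon = 0.01  # hypothetical threshold; in practice tuned on a validation set

print(is_anomaly([2.0, 11.0], mus, sigma2s, epsilon))  # False: close to the means
print(is_anomaly([9.0, 30.0], mus, sigma2s, epsilon))  # True: far from the means
```

With many features the product of densities can become very small, so summing log-densities is a common alternative; the sketch keeps the plain product to mirror the formula.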

# Evaluation of the Anomaly Detection System

- Evaluating an anomaly detection method requires some labeled data, including known anomalous examples, in the validation set.
- For evaluation, we split the data much as we do in supervised learning, i.e., into a training set and a cross-validation set.
- The training set commonly holds all the data in which we want to identify anomalies; it is used to fit the parameters.
- The validation set includes the labeled anomalous examples. On it, we can compute the following metrics:
  - True Positive, False Positive, False Negative, and True Negative (i.e., Confusion Matrix)
  - Precision, Recall, and F1 Score
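
These metrics can be computed directly from the validation labels and the model's predictions. The sketch below is a minimal standard-library version, assuming labels are encoded as 1 for anomaly and 0 for normal; the function names and the example labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN with 1 = anomaly and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1; each is 0.0 when its denominator is zero."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical validation labels and predictions.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 4)
print(precision_recall_f1(y_true, y_pred))
```

Because anomalies are rare, plain accuracy is usually uninformative here, which is why precision, recall, and F1 are preferred; the same function can be evaluated for several candidate thresholds to pick one.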

## When to Use Anomaly Detection

- Anomaly detection is useful when we have a large number of normal data points and only a very small number of anomalous ones.
- We use anomaly detection when we need to find outliers in unseen data, i.e., data that was not used during training.
- Anomaly detection is also beneficial when future anomalies might differ from those in the current data.

# Choosing Features for Anomaly Detection
- Non-Gaussian features: if a feature's distribution is not approximately Gaussian, we can apply a transformation such as log, square root, or cube root to bring it closer to Gaussian before fitting the model.
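
Such transformations take only a few lines. The sketch below assumes non-negative feature values; the function names, the `offset` parameter (added so that zero values do not break the logarithm), and the sample data are hypothetical.

```python
import math


def log_transform(xs, offset=1.0):
    """Apply log(x + offset); the offset keeps the argument positive when x is 0."""
    return [math.log(x + offset) for x in xs]


def power_transform(xs, power=0.5):
    """Apply x ** power: 0.5 is the square root, 1/3 the cube root."""
    return [x ** power for x in xs]


# Hypothetical right-skewed feature with one very large value.
skewed = [0.0, 1.0, 3.0, 7.0, 15.0, 127.0]
print(log_transform(skewed))
print(power_transform(skewed, power=0.5))
print(power_transform(skewed, power=1 / 3))
```

A histogram of the feature before and after the transformation is a practical way to check whether the result looks closer to a bell-shaped curve.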

# Practical Implementation of Anomaly Detection
A practical implementation is available in [this notebook](unsupervised_learning/anomaly_detection/notebooks/practical.ipynb).
