# Anomaly Detection and Normal Distribution

## Overview of Anomaly Detection

Anomaly detection is the process of identifying data points that do not conform to the expected pattern or distribution. These anomalies can represent:
- Fraudulent transactions
- Faulty equipment readings
- Network intrusions
- Outliers in general data

---

## Role of Normal Distribution in Anomaly Detection

The density-based approach described here assumes that the features of "normal" data follow a known distribution, most often a **Gaussian (normal) distribution**. It is applied in three steps:

### Step 1: Fit a Normal Distribution

For each feature in the dataset (e.g., CPU usage, memory consumption), we:
1. Estimate the **mean** ($\mu$) and **variance** ($\sigma^2$) of the feature.
2. Assume the data is drawn from a normal distribution, modeled as:

$$
p(x_i) = \frac{1}{\sqrt{2\pi\sigma_i^2}} e^{-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}}
$$

Where:
- $x_i$: Value of the $i$-th feature
- $\mu_i$: Mean of the $i$-th feature
- $\sigma_i^2$: Variance of the $i$-th feature
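
As a minimal sketch of this step, the snippet below estimates $\mu$ and $\sigma^2$ for a single feature and evaluates the density formula above. The function names and the CPU readings are hypothetical, chosen only for illustration, and the variance uses the maximum-likelihood form (dividing by the number of samples), which is an assumption since the text does not specify the estimator.

```python
import math

def fit_gaussian(values):
    """Estimate the mean and variance of one feature from normal-operation samples."""
    m = len(values)
    mu = sum(values) / m
    # Maximum-likelihood variance (divides by m); divide by m - 1 for the unbiased estimate.
    var = sum((v - mu) ** 2 for v in values) / m
    return mu, var

def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical CPU usage readings (percent) taken during normal operation.
cpu_samples = [42.0, 45.0, 39.0, 41.0, 43.0]
mu_cpu, var_cpu = fit_gaussian(cpu_samples)  # mean 42.0, variance 4.0

# Density at the mean is the peak of the curve for this feature.
peak = gaussian_pdf(mu_cpu, mu_cpu, var_cpu)
```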

---

### Step 2: Compute the Probability of a Data Point

To determine if a data point $x = (x_1, x_2, \dots, x_n)$ is anomalous:
- Compute the joint density $p(x)$, assuming the features are independent:

$$
p(x) = \prod_{i=1}^{n} p(x_i)
$$

Where $p(x_i)$ is the value of the fitted normal density for feature $i$ at $x_i$. Strictly speaking, this is a probability density rather than a probability, so it can exceed 1.
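
The product above can be sketched as follows. The parameter values are hypothetical, and the log-space variant is an addition not mentioned in the text: it computes the same quantity as a sum of logs, which avoids numerical underflow when many small densities are multiplied.

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def joint_density(point, params):
    """Product of per-feature densities; params is a list of (mu, var) pairs."""
    p = 1.0
    for x_i, (mu_i, var_i) in zip(point, params):
        p *= gaussian_pdf(x_i, mu_i, var_i)
    return p

def joint_log_density(point, params):
    """Log of the same product, computed as a sum to avoid underflow for large n."""
    return sum(
        -0.5 * math.log(2 * math.pi * var_i) - (x_i - mu_i) ** 2 / (2 * var_i)
        for x_i, (mu_i, var_i) in zip(point, params)
    )

# Hypothetical fitted parameters for two features, given as (mean, variance).
params = [(42.0, 4.0), (60.0, 9.0)]
point = (42.0, 60.0)

p = joint_density(point, params)
log_p = joint_log_density(point, params)
```

When working in log space, the comparison in the next step becomes $\log p(x) < \log \epsilon$.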

---

### Step 3: Define an Anomaly Threshold

- Set a threshold $\epsilon$ for the probability.
- If:

$$
p(x) < \epsilon
$$

classify the point as an **anomaly**; otherwise, classify it as **normal**.
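
The text does not say how $\epsilon$ is chosen. One common option, shown here purely as an assumption, is to try several candidate thresholds on a small labeled validation set and keep the one with the best F1 score. The densities, labels, and candidate values below are made up for illustration.

```python
def f1_score(preds, labels):
    """F1 score where True means 'anomaly' in both predictions and labels."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(densities, labels, candidates):
    """Return the candidate threshold with the highest F1 on the validation set."""
    best_eps, best_f1 = None, -1.0
    for eps in candidates:
        # The classification rule from the text: anomaly if p(x) < epsilon.
        preds = [p < eps for p in densities]
        score = f1_score(preds, labels)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

# Hypothetical validation densities p(x) and labels (True = known anomaly).
val_densities = [0.030, 0.025, 0.0004, 0.018, 0.00001, 0.002]
val_labels = [False, False, True, False, True, False]
candidates = [1e-5, 1e-4, 1e-3, 1e-2]

best_eps, best_f1 = select_epsilon(val_densities, val_labels, candidates)
```

F1 is used rather than accuracy because anomalies are usually rare, so a detector that never raises an alert can still score a high accuracy.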

---

## Why Normal Distribution?

The normal distribution is commonly used in anomaly detection because:
1. **Many natural phenomena approximately follow a normal distribution** (e.g., heights, weights, sensor readings).
2. It is mathematically convenient: each feature is described by just a mean and a variance, and densities are cheap to compute.
3. When features are approximately Gaussian, this method works well to model typical behaviour.

---

## Practical Example: Monitoring Server Metrics

Suppose you're monitoring server metrics for anomalies:
- **Feature 1**: CPU usage
- **Feature 2**: Memory usage

### Steps:
1. Collect "normal" data (e.g., server metrics during normal operation).
2. Fit a Gaussian distribution for each feature:
   - Calculate $\mu_{CPU}$, $\sigma_{CPU}^2$, $\mu_{Memory}$, and $\sigma_{Memory}^2$.
3. Compute $p(x)$ for new data points:

$$
p(x) = p(x_{CPU}) \cdot p(x_{Memory})
$$

4. Compare $p(x)$ with the threshold $\epsilon$:
   - If $p(x) < \epsilon$, raise an alert for an anomaly.
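
The steps above can be sketched end to end as follows. All readings, the threshold value, and the helper names are hypothetical; in particular, `EPSILON` is an illustrative number, not a recommended setting.

```python
import math
import statistics

def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Step 1: hypothetical metrics (percent) recorded during normal operation.
normal_cpu = [42.0, 45.0, 39.0, 41.0, 43.0]
normal_mem = [60.0, 63.0, 57.0, 60.0, 60.0]

# Step 2: fit one Gaussian per feature (pvariance is the population variance).
mu_cpu, var_cpu = statistics.mean(normal_cpu), statistics.pvariance(normal_cpu)
mu_mem, var_mem = statistics.mean(normal_mem), statistics.pvariance(normal_mem)

EPSILON = 1e-4  # illustrative threshold only

def check(cpu, mem):
    """Steps 3 and 4: compute p(x) and compare it with the threshold."""
    p = gaussian_pdf(cpu, mu_cpu, var_cpu) * gaussian_pdf(mem, mu_mem, var_mem)
    return p, p < EPSILON

p_ok, alert_ok = check(43.0, 61.0)    # close to the fitted means
p_bad, alert_bad = check(95.0, 62.0)  # CPU far outside the normal range
```

With these made-up readings, the point near the fitted means is not flagged, while the point with 95% CPU usage falls below the threshold and raises an alert.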

---

## Notes

Key points:

1. **Probability Threshold ($\epsilon$)**:
   - Picking the right threshold is critical.
   - Setting $\epsilon$ too high flags many normal points as anomalies (false positives); setting it too low misses real anomalies (false negatives).

2. **Multivariate Gaussian**:
   - If features are correlated, use a **multivariate Gaussian distribution** instead of assuming independence.
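
For reference, the standard multivariate Gaussian density for an $n$-dimensional point $x$ is given below. The symbols $\mu$ (the vector of feature means) and $\Sigma$ (the $n \times n$ covariance matrix) are introduced here and do not appear elsewhere in these notes.

$$
p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)
$$

When $\Sigma$ is diagonal with entries $\sigma_1^2, \dots, \sigma_n^2$, this expression reduces to the product of per-feature densities $\prod_{i=1}^{n} p(x_i)$, so the independent model is a special case.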

---

## Summary

- **Normal Distribution** helps model the expected behaviour of data.
- **Anomalies** are identified as data points that have a low probability under the assumed Gaussian distribution.
- This statistical approach is simple, efficient, and widely used in real-world anomaly detection systems.

