## Density Estimation

## Overview 

Density estimation is a technique used to estimate the probability distribution of a dataset. It involves estimating the underlying distribution function from a set of data points, which may or may not come from a known distribution.

Consider our training set: $\{\vec x^{(1)}, \vec x^{(2)}, \dots, \vec x^{(m)}\}$

Each example $\vec x^{(i)}$ has $n$ features.

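We model each feature $x_j$ with its own Gaussian and treat the features as independent, so the probability of an example is the product of the per-feature densities: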

$$
p(\vec x) = p(x_1; \mu_1, \sigma_1^2) \cdot p(x_2; \mu_2, \sigma_2^2) \cdots p(x_n; \mu_n, \sigma_n^2) = \prod_{j=1}^n p(x_j; \mu_j, \sigma_j^2)
$$
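A minimal standard-library sketch of this product. The helper names `gaussian` and `p_x` and the numbers are hypothetical; each factor is the normal density evaluated with that feature's own $\mu_j$ and $\sigma_j^2$.

```python
import math

def gaussian(x, mu, var):
    """Univariate normal density p(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def p_x(x, mu, var):
    """Product over features j of p(x_j; mu_j, var_j)."""
    return math.prod(gaussian(xj, mj, vj) for xj, mj, vj in zip(x, mu, var))

# Two features, each with mean 0 and variance 1, evaluated at the means:
# each factor is 1/sqrt(2*pi), so the product is 1/(2*pi).
print(p_x([0.0, 0.0], [0.0, 0.0], [1.0, 1.0]))  # about 0.159155
```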


**Definition found in books:**




## Parametric Density Estimation

In **parametric density estimation**, we assume that the data follows a certain distribution. For example, if we assume the data follows a **normal distribution**, we estimate the parameters (mean $\mu$ and variance $\sigma^2$) that define that distribution. The probability density function (PDF) for a normal distribution is given by:

$$
f(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
$$
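To see that this formula is a valid density, we can evaluate it on a fine grid and check numerically that it integrates to about 1 and peaks at $x = \mu$ with height $1/\sqrt{2\pi\sigma^2}$. This is a short NumPy sketch; the values of $\mu$ and $\sigma^2$, the grid range, and the step size are arbitrary illustrative choices.

```python
import numpy as np

def normal_pdf(x, mu, sigma2):
    """f(x | mu, sigma^2) for a scalar or NumPy array x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

mu, sigma2 = 2.0, 0.5
xs = np.linspace(mu - 10, mu + 10, 20001)        # grid covering many standard deviations
dx = xs[1] - xs[0]
area = np.sum(normal_pdf(xs, mu, sigma2)) * dx   # Riemann-sum approximation of the integral

print(round(area, 6))                            # close to 1.0
print(normal_pdf(mu, mu, sigma2))                # peak value 1/sqrt(2*pi*sigma^2)
```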

Where:
- $x$ is a data point.
- $\mu$ is the mean.
- $\sigma^2$ is the variance.

The parameters $\mu$ and $\sigma^2$ can be estimated from the data using methods like **maximum likelihood estimation (MLE)**.
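For a Gaussian, MLE has a closed form: $\hat\mu$ is the sample mean and $\hat\sigma^2$ is the average squared deviation from it, dividing by $m$ rather than $m-1$. The sketch below uses synthetic data (an assumption for illustration), computes both estimates, and checks that nudging either parameter away from the estimate lowers the log-likelihood. NumPy's `np.var` uses the $1/m$ convention by default (`ddof=0`).

```python
import numpy as np

def log_likelihood(x, mu, sigma2):
    """Sum over samples of log N(x_i | mu, sigma2)."""
    m = len(x)
    return -0.5 * m * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1000)    # synthetic 1-D sample

mu_hat = np.sum(x) / len(x)                      # MLE of the mean
sigma2_hat = np.sum((x - mu_hat) ** 2) / len(x)  # MLE of the variance (divide by m)

assert np.isclose(mu_hat, np.mean(x))
assert np.isclose(sigma2_hat, np.var(x))         # np.var defaults to ddof=0, i.e. 1/m

best = log_likelihood(x, mu_hat, sigma2_hat)
for d in (-0.1, 0.1):
    # Moving either parameter away from the MLE lowers the log-likelihood.
    assert log_likelihood(x, mu_hat + d, sigma2_hat) < best
    assert log_likelihood(x, mu_hat, sigma2_hat + d) < best
print(mu_hat, sigma2_hat)
```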

---


## The Final Algorithm

1. Choose $n$ features $x_j$ that you think might be indicative of anomalous examples.
2. Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$  
   where,
   $\mu_j = \frac{1}{m}\sum_{i=1}^m x_j^{(i)}$  
   $\sigma_j^2 = \frac{1}{m}\sum_{i=1}^m (x_j^{(i)} - \mu_j)^2$

3. Given a new example $\vec x$, compute $p(\vec x)$:  
   $p(\vec x) = \prod_{j=1}^n p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^n \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$

4. Flag $\vec x$ as an anomaly if $p(\vec x) < \epsilon$.
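The four steps map onto a few lines of NumPy. This is a minimal sketch, assuming the training examples are the rows of an `(m, n)` array. The function names `fit_gaussian` and `density`, the synthetic data, and the value of `epsilon` are illustrative assumptions, not taken from the source; in practice $\epsilon$ would be tuned on a labeled cross-validation set.

```python
import numpy as np

def fit_gaussian(X):
    """Step 2: per-feature estimates mu_j and sigma_j^2 (dividing by m)."""
    mu = X.mean(axis=0)
    var = ((X - mu) ** 2).mean(axis=0)
    return mu, var

def density(X, mu, var):
    """Step 3: p(x) = prod_j N(x_j; mu_j, sigma_j^2), one value per row of X."""
    per_feature = np.exp(-(X - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return per_feature.prod(axis=1)

# Step 1 (choosing features) is assumed done: 2 features per example.
rng = np.random.default_rng(42)
X_train = rng.normal(loc=[10.0, 3.0], scale=[1.0, 0.5], size=(5000, 2))

mu, var = fit_gaussian(X_train)

X_new = np.array([[10.1, 3.0],    # close to the bulk of the training data
                  [16.0, 0.5]])   # far from it on both features
p = density(X_new, mu, var)

epsilon = 1e-3                    # hypothetical threshold
is_anomaly = p < epsilon          # Step 4
print(p, is_anomaly)              # first example is normal, second is flagged
```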

## Real-Number Evaluation

When developing a learning algorithm, you make many decisions: which features to use, whether to change a feature in some way, and whether to increase or decrease $\epsilon$ or other parameters. Those decisions are much easier if you have a way of evaluating the learning algorithm.

*Real-number evaluation* means having a way to compute a single number that tells you whether a change made the system better or worse. For example, after adjusting $\epsilon$ or changing a feature, you recompute that number and compare, which makes quick iteration possible.

- In anomaly detection, even though most data is unlabeled, we can use a small number of labeled examples, namely anomalies (`y=1`) and normal examples (`y=0`), to compute this evaluation number.

## Training and Evaluation Setup
1. **Training Set:** You train the anomaly detection algorithm on mostly normal data (`y=0`). Training is unsupervised: the algorithm fits $p(\vec x)$ without using the labels.
2. **Cross-Validation and Test Sets:** You create a cross-validation set and a test set, both of which should have a mix of normal and anomalous examples.
   - **Example:** 
     - Training set: 6,000 normal engines
     - Cross-validation set: 2,000 normal engines + 10 anomalies
     - Test set: 2,000 normal engines + 10 anomalies
   - Use the cross-validation set to tune parameters such as $\epsilon$ and to compare feature choices; keep the test set aside so it gives an unbiased estimate of final performance.
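The engine example's three-way split can be sketched as follows. The feature arrays are synthetic placeholders (an assumption; the source gives only counts): 10,000 normal engines and 20 anomalous ones, with anomalies kept out of the training set and divided evenly between the cross-validation and test sets.

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(size=(10_000, 2))            # placeholder features, label y=0
anomalous = rng.normal(loc=5.0, size=(20, 2))    # placeholder features, label y=1

normal = normal[rng.permutation(len(normal))]    # shuffle before splitting
anomalous = anomalous[rng.permutation(len(anomalous))]

# Training set: normal examples only (labels are not used for fitting).
X_train = normal[:6000]

# Cross-validation and test sets: 2,000 normal + 10 anomalies each, with labels.
X_cv = np.vstack([normal[6000:8000], anomalous[:10]])
y_cv = np.concatenate([np.zeros(2000, dtype=int), np.ones(10, dtype=int)])

X_test = np.vstack([normal[8000:], anomalous[10:]])
y_test = np.concatenate([np.zeros(2000, dtype=int), np.ones(10, dtype=int)])

print(X_train.shape, X_cv.shape, X_test.shape)   # (6000, 2) (2010, 2) (2010, 2)
```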

## Tuning Parameters with Cross-Validation
- After training the model, you evaluate it on the cross-validation set to see how well it detects anomalies.
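- By adjusting $\epsilon$, you can fine-tune the system. Since an example is flagged when $p(\vec x) < \epsilon$, raising $\epsilon$ flags more examples (catching more anomalies but producing more false alarms), while lowering it flags fewer (fewer false alarms but more missed anomalies).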
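One common way to pick $\epsilon$ is to scan candidate thresholds between the smallest and largest $p(\vec x)$ on the cross-validation set and keep the one with the best F1 score. This is a sketch under that assumption; the scan range, the number of steps, the function name `select_epsilon`, and the toy data are all illustrative. `p_cv` holds the model's densities for the cross-validation examples and `y_cv` their labels.

```python
import numpy as np

def select_epsilon(p_cv, y_cv, steps=1000):
    """Return (best_epsilon, best_f1) over evenly spaced thresholds."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), steps):
        pred = p_cv < eps                      # flagged as anomaly (y=1)
        tp = np.sum(pred & (y_cv == 1))
        fp = np.sum(pred & (y_cv == 0))
        fn = np.sum(~pred & (y_cv == 1))
        if tp == 0:
            continue                           # precision and recall would be 0 or undefined
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:                       # keep the first threshold reaching the best F1
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Toy cross-validation data: anomalies (y=1) have much lower density.
p_cv = np.array([0.30, 0.25, 0.28, 0.22, 0.27, 0.001, 0.004])
y_cv = np.array([0, 0, 0, 0, 0, 1, 1])
eps, f1 = select_epsilon(p_cv, y_cv)
print(eps, f1)
```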

## Handling Imbalanced Data
- Anomaly detection usually involves imbalanced datasets, where anomalies (`y=1`) are much fewer than normal examples (`y=0`).
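- **Alternative Metrics:** Plain accuracy is misleading here: with 2,000 normal engines and 10 anomalies, a model that always predicts `y=0` is already about 99.5% accurate while catching no anomalies. Instead, use: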
  - **Precision:** The proportion of true positives among all flagged anomalies.
  - **Recall:** The proportion of true positives among all actual anomalies.
  - **F1 Score:** The harmonic mean of precision and recall, balancing the two metrics.
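The snippet below computes these metrics for two hypothetical models on a cross-validation set with 2,000 normal and 10 anomalous examples: one that never flags anything, and one that catches 8 of the 10 anomalies at the cost of 12 false alarms. The helper name `prf` and both sets of predictions are made up for illustration.

```python
def prf(y_true, y_pred):
    """Precision, recall and F1 for binary labels where 1 = anomaly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [0] * 2000 + [1] * 10

# Model A: always predicts "normal".
always_normal = [0] * 2010
accuracy = sum(t == p for t, p in zip(y_true, always_normal)) / len(y_true)
print(round(accuracy, 3), prf(y_true, always_normal))   # 0.995 (0.0, 0.0, 0.0)

# Model B: catches 8 of 10 anomalies but also flags 12 normal engines.
model_b = [1] * 12 + [0] * 1988 + [1] * 8 + [0] * 2
print(prf(y_true, model_b))   # precision 0.4, recall 0.8, F1 about 0.533
```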

## Alternative Approach for Small Datasets
- If the dataset is small, especially in terms of anomalies, you might not have enough data to split into a training set, cross-validation set, and a separate test set.
- In this case, use just the training set and cross-validation set.
   - Training set: 6,000 normal engines
   - Cross-validation set: 4,000 normal engines + all anomalies
   - The downside is that there is no separate test set. After tuning $\epsilon$ and the features on the cross-validation set, you lack an unbiased estimate of how well the model will perform on future data, and the cross-validation score is likely to be optimistic.
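With only two splits, the counts above look like this. The feature arrays are synthetic placeholders, and 20 anomalies is an assumed number (the source says only "all anomalies").

```python
import numpy as np

rng = np.random.default_rng(1)
normal = rng.normal(size=(10_000, 2))          # placeholder normal engines (y=0)
anomalous = rng.normal(loc=5.0, size=(20, 2))  # placeholder anomalies (y=1); count is assumed

normal = normal[rng.permutation(len(normal))]  # shuffle before splitting

X_train = normal[:6000]                        # fit mu_j and sigma_j^2 here
X_cv = np.vstack([normal[6000:], anomalous])   # tune epsilon and features here
y_cv = np.concatenate([np.zeros(4000, dtype=int), np.ones(len(anomalous), dtype=int)])

# No test set remains, so the cross-validation score is the only performance
# estimate, and it tends to be optimistic after repeated tuning on it.
print(X_train.shape, X_cv.shape, int(y_cv.sum()))   # (6000, 2) (4020, 2) 20
```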

## Model Evaluation Process
- After training on the training set, use the model to compute $p(\vec x)$ for each example in the cross-validation or test set.
- If $p(\vec x) < \epsilon$, the example is flagged as anomalous (`y=1`); if $p(\vec x) \ge \epsilon$, it is considered normal (`y=0`).
- You then compare the predictions with actual labels to evaluate the model's performance.
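The evaluation step can be sketched as: threshold the densities, then tally the four confusion-matrix counts from which precision, recall, and F1 are computed. The densities and labels below are made-up numbers, and the function name `evaluate` and the value of `epsilon` are hypothetical.

```python
import numpy as np

def evaluate(p, y, epsilon):
    """Predict y=1 when p(x) < epsilon, then count TP, FP, FN, TN against labels y."""
    y_pred = (p < epsilon).astype(int)
    tp = int(np.sum((y_pred == 1) & (y == 1)))
    fp = int(np.sum((y_pred == 1) & (y == 0)))
    fn = int(np.sum((y_pred == 0) & (y == 1)))
    tn = int(np.sum((y_pred == 0) & (y == 0)))
    return y_pred, {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

p = np.array([0.21, 0.18, 0.0004, 0.09, 0.002, 0.15])  # p(x) from the fitted model
y = np.array([0, 0, 1, 0, 0, 1])                        # true labels
y_pred, counts = evaluate(p, y, epsilon=0.01)
print(y_pred, counts)   # [0 0 1 0 1 0] {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 3}
```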

## Why Use Unsupervised Learning for Anomaly Detection?
- Even if you have some labeled anomalies, anomaly detection typically uses an unsupervised learning algorithm. This is because anomalies are rare, and there may not be enough labeled examples to train a supervised model.
- The approach mixes unsupervised and supervised techniques, where you fine-tune an unsupervised algorithm using a small number of labeled anomalies.

## Conclusion
By combining unsupervised learning with a small number of labeled anomalies, this method gives you a real-number evaluation metric, so the system can be tuned and evaluated quickly before it is used in real-world applications.