# 🔍 Local Outlier Factor (LOF) for Anomaly Detection

Hello guys 👋  

Today, we are going to continue our discussion on **Anomaly Detection**.  
In this video, we'll see how we can perform **anomaly detection** or **find important outliers** using **Local Outlier Factor (LOF)**.

---

## 🧩 Key Concepts

Before we dive into LOF, let's define two important terms:

- **<span style="color:#ff595e;">Local Outlier</span>**
- **<span style="color:#ffca3a;">Global Outlier</span>**

We need to clearly understand the difference between these two.

---

### 🔴 Global Outlier

A **global outlier** is a point that lies **far away** from all the clusters.  
It is **globally different** from the rest of the data.

Example:  
If a single point lies far outside all other groups, it is a **global outlier**.

---

### 🟠 Local Outlier

A **local outlier** lies **close to a cluster** but doesn't fit perfectly within it.  
It is "locally" inconsistent compared to its nearby data points.

Example:  
A point that sits near a cluster boundary, in a region noticeably sparser than the cluster itself.
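
To make the difference concrete, here is a small sketch with made-up data: a dense cluster, a sparse cluster, and two hand-placed points. The helper names `grid` and `nearest_distance` and all coordinates are assumptions chosen only for illustration.

```python
import numpy as np

def grid(origin, spacing, n=5):
    """Hypothetical helper: an n x n grid of points with the given spacing."""
    axis = np.arange(n) * spacing
    xx, yy = np.meshgrid(axis, axis)
    return np.column_stack([xx.ravel(), yy.ravel()]) + np.asarray(origin)

dense = grid((0.0, 0.0), 0.1)    # tightly packed cluster (spacing 0.1)
sparse = grid((5.0, 5.0), 1.0)   # loosely packed cluster (spacing 1.0)
clusters = np.vstack([dense, sparse])

local_outlier = np.array([0.9, 0.2])      # just outside the dense cluster
global_outlier = np.array([20.0, -10.0])  # far from everything

def nearest_distance(p, pts):
    """Distance from point p to the closest point in pts."""
    return np.linalg.norm(pts - p, axis=1).min()

d_local = nearest_distance(local_outlier, clusters)
d_global = nearest_distance(global_outlier, clusters)
print(f"local outlier:  {d_local:.2f} from its nearest point")
print(f"global outlier: {d_global:.2f} from its nearest point")
```

The global outlier is far from every point. The local outlier's gap is smaller than the normal spacing inside the sparse cluster, so it only looks unusual when compared with the dense cluster next to it.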

---

## ⚙️ Why Use Local Outlier Factor?

In previous videos, we used:

- **DBSCAN Clustering** 🌀  
- **Isolation Forest** 🌲  

Both can detect outliers effectively.  
However, to specifically detect **local outliers**, we use **LOF**, which focuses on **local density**.

---

## 🧠 Understanding the Concept

**Local Outlier Factor (LOF)** is based on the idea of **K-Nearest Neighbors (KNN)**.

It measures how isolated a data point is compared to its neighbors.

---

### 📘 Key Idea

For a given data point \( x \):

1. Find its **k nearest neighbors**.  
2. Compute the **local density** around \( x \).  
3. Compare the density of \( x \) with the densities of its neighbors.

If \( x \)'s density is **much lower** than that of its neighbors,  
then \( x \) is likely a **local outlier**.

---

### ⚗️ Mathematical Intuition

Let's say \( k = 5 \).  
We take 5 nearest neighbors for each point.

We calculate the **average distance** from the point to its \( k \)-neighbors:

$$
\text{AvgDist}(x) = \frac{1}{k} \sum_{i=1}^{k} d(x, x_i)
$$

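- If the **average distance is small**, the **local density** is **high**.  
- If the **average distance is large**, the **local density** is **low**.

In this simplified view, the local density can be taken as the inverse of the average distance, \( 1 / \text{AvgDist}(x) \). The full LOF definition replaces the plain distance with a *reachability distance*, but the intuition is the same.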
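
The average-distance idea can be sketched with scikit-learn's `NearestNeighbors`. The toy points below are made up, and taking the density as `1 / avg_dist` is a simplifying assumption rather than the exact LOF definition.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy data (made up for illustration): a tight group plus one distant point.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
    [2.0, 2.0],
])
k = 3

# Calling kneighbors() without arguments returns the neighbors of the
# fitted points themselves, excluding each point from its own neighbor list.
nn = NearestNeighbors(n_neighbors=k).fit(X)
dist, idx = nn.kneighbors()

avg_dist = dist.mean(axis=1)  # AvgDist(x) for every point
density = 1.0 / avg_dist      # simplified local density (assumption)

for point, a, d in zip(X, avg_dist, density):
    print(point, f"avg_dist={a:.3f}", f"density={d:.3f}")
```

The distant point `[2.0, 2.0]` ends up with the largest average distance and therefore the lowest density.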

---

Now, we compare densities:

$$
\text{LOF}(x) = \frac{\text{Average local density of neighbors}}{\text{Local density of point } x}
$$

If  

$$
\text{LOF}(x) > 1
$$  

then \( x \) lies in a sparser region than its neighbors and is a candidate **outlier**.  
The **higher** the value, the **stronger** the outlier.
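
As a quick worked example with made-up numbers (not taken from any dataset): suppose \( k = 3 \), the three neighbors of \( x \) have local densities 10, 8 and 12, and \( x \) itself has local density 2. Then

$$
\text{LOF}(x) = \frac{(10 + 8 + 12)/3}{2} = \frac{10}{2} = 5
$$

so \( x \) sits in a region about five times sparser than the regions around its neighbors.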

---

## 🧮 Intuitive Summary

| Condition | Meaning |
|-----------|---------|
| <span style="color:#2a9d8f;">LOF(x) ≈ 1</span> | Normal data point |
| <span style="color:#ffb703;">LOF(x) > 1</span> | Slightly anomalous |
| <span style="color:#d62828;">LOF(x) >> 1</span> | Strong outlier |
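
To look at actual scores next to this table, scikit-learn exposes the fitted LOF values through the `negative_outlier_factor_` attribute, which stores the LOF with its sign flipped. The sketch below negates it back. The grid data and the choice `n_neighbors=5` are assumptions made only for the illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Made-up data: a regular 5 x 5 grid plus one point far away from it.
axis = np.arange(5, dtype=float)
xx, yy = np.meshgrid(axis, axis)
grid_points = np.column_stack([xx.ravel(), yy.ravel()])
X = np.vstack([grid_points, [[10.0, 10.0]]])

lof = LocalOutlierFactor(n_neighbors=5)
labels = lof.fit_predict(X)              # -1 = outlier, 1 = normal
scores = -lof.negative_outlier_factor_   # flip the sign to get LOF(x)

center = 12  # index of the grid point (2, 2)
print("LOF of grid center:", round(scores[center], 3))
print("LOF of far point:  ", round(scores[-1], 3))
```

The grid center should score close to 1, while the far point scores well above 1 and is labeled `-1`.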

---

## 🔍 Concept of Local Density

<span style="color:#06d6a0;">Local Density</span> represents how closely packed the neighbors are around a point.

If a point has:
- **More neighbors nearby → High local density**
- **Fewer neighbors nearby → Low local density**

So, we compare:
$$
\text{Local Density of x} \quad \text{vs} \quad \text{Local Density of its Neighbors}
$$

If \( x \) has **significantly lower local density**, it's a **local outlier**.

---

## 💡 Algorithm Steps (Simplified)

1. Choose **k** (number of neighbors).  
2. For each point:
   - Find its **k nearest neighbors**.
   - Compute **average distance**.
   - Calculate **local density**.
3. Compare **densities**.
4. Compute **LOF score**.
5. Label points with **high LOF** as **outliers**.
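
The steps above can be sketched in plain NumPy. This is a minimal version of the simplified scheme in this section (density as the inverse of the average neighbor distance), not scikit-learn's exact algorithm. The function name `simple_lof`, the grid data, and the cut-off of 1.5 are all assumptions.

```python
import numpy as np

def simple_lof(X, k=5):
    """Simplified LOF score for every row of X.

    Assumes fewer than k + 1 identical points, so no average distance is zero.
    """
    X = np.asarray(X, dtype=float)
    # Step 2a: pairwise Euclidean distances; mask out each point's self-distance.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    nbr_idx = np.argsort(dist, axis=1)[:, :k]  # k nearest neighbors per point
    # Step 2b: average distance to those neighbors.
    avg_dist = np.take_along_axis(dist, nbr_idx, axis=1).mean(axis=1)
    # Step 2c: local density as the inverse of the average distance.
    density = 1.0 / avg_dist
    # Steps 3-4: mean density of the neighbors divided by the point's own density.
    return density[nbr_idx].mean(axis=1) / density

# Made-up data: a regular 5 x 5 grid plus one point far away from it.
axis = np.arange(5, dtype=float)
xx, yy = np.meshgrid(axis, axis)
X = np.vstack([np.column_stack([xx.ravel(), yy.ravel()]), [[10.0, 10.0]]])

scores = simple_lof(X, k=5)    # Step 1: k = 5
threshold = 1.5                # hypothetical cut-off for a "high" LOF
labels = np.where(scores > threshold, -1, 1)  # Step 5: -1 = outlier
print("indices flagged as outliers:", np.flatnonzero(labels == -1))
```

On this data only the far point, the last row, exceeds the cut-off. A different threshold or `k` will change which points get flagged.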

---

## 💻 Implementation Task (For You 🎯)

In **scikit-learn**, LOF is available as:

```python
from sklearn.neighbors import LocalOutlierFactor

# X is assumed to be a feature array of shape (n_samples, n_features)
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
y_pred = lof.fit_predict(X)
```

- `n_neighbors`: number of neighbors (k)
- `contamination`: expected proportion of outliers
- Output: `-1` → outlier, `1` → normal point

### Example Dataset

Try applying LOF on a dataset like `make_moons` or `make_circles`:

```python
from sklearn.datasets import make_moons
from sklearn.neighbors import LocalOutlierFactor
import matplotlib.pyplot as plt

# Two interleaving half-moons with a little Gaussian noise
X, _ = make_moons(n_samples=500, noise=0.05)

# Flag roughly 5% of the points as outliers
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
y_pred = lof.fit_predict(X)  # -1 = outlier, 1 = normal

# Reversed colormap so that -1 maps to red and +1 maps to blue
plt.scatter(X[:, 0], X[:, 1], c=y_pred, cmap='coolwarm_r')
plt.title("Local Outlier Factor (LOF) - Anomaly Detection")
plt.show()
```

You'll see red points (−1) → outliers,
and blue points (+1) → normal samples.