# **Problem Statement**  
## **30. Detect outliers in a dataset using Isolation Forest.**

Detect outliers (anomalies) in a dataset using the Isolation Forest algorithm.

The goal is to:
- Identify data points that deviate significantly from the majority
- Label them as inliers or outliers

### Constraints & Example Inputs/Outputs

### Constraints
- Dataset contains numerical features
- Outliers are rare compared to normal points
- No class labels required (unsupervised)

### Example Input:
```python
X = [[1], [2], [2], [3], [100]]

```

### Expected Output:
```
Outlier detected: 100
Labels: [1, 1, 1, 1, -1]
```

- `1` → Normal
- `-1` → Outlier

### Solution Approach

**Step 1: Understand Isolation Forest**
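- Each tree randomly selects a feature
- It then randomly selects a split value between that feature's minimum and maximum
- Outliers are isolated in fewer splits than normal points, so a short average path length signals an anomaly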
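
To make the isolation idea concrete, here is a minimal sketch in plain Python, not scikit-learn's implementation. It assumes one-dimensional data and counts how many random splits are needed before a target value sits alone in its partition. The helper names `isolation_depth` and `mean_depth` are hypothetical, chosen only for illustration.

```python
import random

def isolation_depth(values, target, rng):
    """Count random splits until `target` is alone (or only duplicates remain)."""
    current = list(values)
    depth = 0
    while len(current) > 1 and min(current) < max(current):
        # Pick a random split value between the partition's min and max
        split = rng.uniform(min(current), max(current))
        # Keep only the side of the split that contains the target
        if target < split:
            current = [v for v in current if v < split]
        else:
            current = [v for v in current if v >= split]
        depth += 1
    return depth

def mean_depth(values, target, trials=2000, seed=0):
    """Average isolation depth over many random trials."""
    rng = random.Random(seed)
    total = sum(isolation_depth(values, target, rng) for _ in range(trials))
    return total / trials

data = [1, 2, 2, 3, 100]
for point in (2, 100):
    print(point, round(mean_depth(data, point), 2))
```

The value 100 should report a much smaller average depth than the value 2, because almost any split between 1 and 100 separates it from the cluster immediately.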

**Step 2: Prepare Dataset**
- Convert the input into a 2-D NumPy array of shape `(n_samples, n_features)`
- Scale the data if necessary (optional)
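
A minimal sketch of the preparation step, assuming the input arrives as a plain Python list. The helper name `prepare_features` and its `scale` flag are hypothetical. Scaling is done by hand here to keep the example short. Isolation Forest generally does not need it, because its splits depend on where a value falls within a feature's range rather than on the feature's units.

```python
import numpy as np

def prepare_features(data, scale=False):
    """Return a 2-D float array of shape (n_samples, n_features)."""
    X = np.asarray(data, dtype=float)
    if X.ndim == 1:
        # A flat list of values becomes a single-feature column
        X = X.reshape(-1, 1)
    if scale:
        mean = X.mean(axis=0)
        std = X.std(axis=0)
        std[std == 0] = 1.0  # leave constant features unchanged
        X = (X - mean) / std
    return X

X = prepare_features([1, 2, 2, 3, 100])
print(X.shape)  # (5, 1)
```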

**Step 3: Apply Outlier Detection**
- Brute Force: Distance / statistical rules
- Optimized: Isolation Forest

**Step 4: Interpret Results**
- -1 → Outlier
- 1 → Normal data point
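
The label convention can be turned into something more readable with a boolean mask. This is a small sketch. The helper name `summarize_predictions` is hypothetical, and the labels are copied from the example output in the problem statement rather than produced by a model.

```python
import numpy as np

def summarize_predictions(X, predictions):
    """Split rows into outliers and inliers using the -1 / 1 convention."""
    X = np.asarray(X)
    predictions = np.asarray(predictions)
    is_outlier = predictions == -1
    return {
        "outlier_indices": np.flatnonzero(is_outlier).tolist(),
        "outlier_rows": X[is_outlier].tolist(),
        "n_inliers": int((~is_outlier).sum()),
    }

X = np.array([[1], [2], [2], [3], [100]])
labels = np.array([1, 1, 1, 1, -1])  # labels from the example output

summary = summarize_predictions(X, labels)
print(summary)
# {'outlier_indices': [4], 'outlier_rows': [[100]], 'n_inliers': 4}
```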

**Step 5: Validate with Test Cases**
- Known outliers
- Edge cases
- Multiple dimensions

### Solution Code

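```python
# Approach 1: Brute Force Approach (Statistical Z-Score)
# Logic:
# - Compute the mean and standard deviation of each feature
# - Points with |z| > threshold are flagged as outliers
import numpy as np

def detect_outliers_zscore(X, threshold=3):
    X = np.asarray(X, dtype=float)
    # axis=0 gives per-feature statistics for 2-D input;
    # for 1-D input it is the same as the global mean and std
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)
    # Guard against division by zero for constant features
    std = np.where(std == 0, 1.0, std)

    z_scores = (X - mean) / std
    outliers = np.abs(z_scores) > threshold

    return outliers
```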


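```python
# Brute Force Example
X = np.array([1, 2, 2, 3, 100])
outliers = detect_outliers_zscore(X)

outliers
```

```
array([False, False, False, False, False])
```

The z-score rule misses the obvious outlier here. The value 100 inflates both the mean and the standard deviation, so its own z-score is only about 2.0, which is below the threshold of 3.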
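
The all-`False` result is not a fluke of this dataset. With the population standard deviation (the default for `np.std`), no point in a sample of size n can have |z| larger than sqrt(n - 1), a result known as Samuelson's inequality. The short check below is a sketch that compares the largest z-score in the example with that bound.

```python
import numpy as np

values = np.array([1, 2, 2, 3, 100], dtype=float)
n = values.size

# Population z-scores, matching np.std's default (ddof=0)
z = (values - values.mean()) / values.std()
max_abs_z = np.abs(z).max()

# Upper bound on |z| for any sample of size n
bound = np.sqrt(n - 1)

print(f"largest |z| = {max_abs_z:.4f}, bound = {bound:.4f}")
```

With five points the bound is 2, so a threshold of 3 can never be exceeded. A lower threshold or a larger sample is needed for the z-score rule to flag anything.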

### Alternative Solution

```python
# Approach 2: Optimized Approach: Isolation Forest
import numpy as np
from sklearn.ensemble import IsolationForest

def isolation_forest_outlier_detection(X, contamination=0.2, random_state=42):
    # contamination is the expected fraction of outliers; it sets the score threshold
    model = IsolationForest(
        n_estimators=100,
        contamination=contamination,
        random_state=random_state
    )

    model.fit(X)
    predictions = model.predict(X)  # 1 = inlier, -1 = outlier

    return predictions

# Create Dataset
X = np.array([
    [1],
    [2],
    [2],
    [3],
    [100]
])
```


```python
# Run Isolation Forest
predictions = isolation_forest_outlier_detection(X)
predictions
```

```
array([ 1,  1,  1,  1, -1])
```
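
The labels hide how strongly each point was flagged. As a sketch of one way to look behind them, the block below refits the same model configuration and prints the raw scores from scikit-learn's `score_samples` and `decision_function` methods. Lower scores mean a point was easier to isolate, and a negative decision value corresponds to a `-1` label.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.array([[1], [2], [2], [3], [100]])

model = IsolationForest(n_estimators=100, contamination=0.2, random_state=42)
model.fit(X)

scores = model.score_samples(X)        # lower = more anomalous
margins = model.decision_function(X)   # scores shifted by the learned threshold

for value, score, margin in zip(X.ravel(), scores, margins):
    print(f"value={value:>3}  score={score:.3f}  margin={margin:.3f}")
```

The value 100 should have the lowest score by a clear gap, which is a more informative signal than the binary label alone.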

### Alternative Approaches

**Brute Force**
- Z-score
- IQR (Interquartile Range)
- Manual thresholding
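
As a sketch of the IQR rule from the list above: a point is flagged when it falls more than `k` interquartile ranges outside the middle half of the data. The function name `detect_outliers_iqr` is hypothetical, and `k=1.5` is the conventional multiplier rather than a value taken from this notebook.

```python
import numpy as np

def detect_outliers_iqr(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return (values < lower) | (values > upper)

mask = detect_outliers_iqr([1, 2, 2, 3, 100])
print(mask.tolist())  # [False, False, False, False, True]
```

Unlike the z-score rule, the quartiles are barely affected by the extreme value, so the IQR rule flags 100 even on this five-point sample.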

**Optimized**
- Isolation Forest ✅
- Local Outlier Factor (LOF)
- One-Class SVM
- Autoencoders (deep learning)
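
For comparison, here is a brief sketch of Local Outlier Factor on a small dataset with one extreme value, using scikit-learn's `LocalOutlierFactor`. The choice `n_neighbors=2` is an assumption made for this tiny sample, since the default of 20 exceeds the number of points. It is not a recommended setting.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

X = np.array([[10], [12], [11], [9], [100]])

# n_neighbors=2 is an illustrative choice for a five-point dataset
lof = LocalOutlierFactor(n_neighbors=2)
lof_labels = lof.fit_predict(X)  # same convention: 1 = inlier, -1 = outlier

print(lof_labels.tolist())
print(lof.negative_outlier_factor_)  # values near -1 are typical; far below -1 is anomalous
```

LOF compares each point's local density with that of its neighbors, so it flags the point at 100 because its nearest neighbors are far denser than it is.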

### Test Cases

```python
# Test Case 1: Single Obvious Outlier
X_test1 = np.array([[10], [12], [11], [9], [100]])

preds = isolation_forest_outlier_detection(X_test1, contamination=0.2)
preds
```

```
array([ 1,  1,  1,  1, -1])
```

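```python
# Test Case 2: No Outliers
X_test2 = np.array([[10], [11], [12], [13], [14]])

preds = isolation_forest_outlier_detection(X_test2, contamination=0.01)
preds
```

```
array([-1,  1,  1,  1,  1])
```

Even though the values are evenly spaced, one point is still flagged. A float `contamination` sets the decision threshold as a percentile of the training scores, so the lowest-scoring sample tends to fall below it no matter how ordinary it is. Read this case as a demonstration of that behavior, not as evidence that 10 is anomalous.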

```python
# Test Case 3: Multiple Outliers
X_test3 = np.array([[1], [2], [3], [100], [120]])

preds = isolation_forest_outlier_detection(X_test3, contamination=0.4)
preds
```

```
array([ 1,  1,  1, -1, -1])
```

```python
# Test Case 4: Multi-Dimensional Data
X_test4 = np.array([
    [1, 1],
    [2, 2],
    [3, 3],
    [100, 100]
])

preds = isolation_forest_outlier_detection(X_test4, contamination=0.25)
preds
```

```
array([ 1,  1,  1, -1])
```

## Complexity Analysis

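### Brute Force (Z-score)
- Time: O(n)
- Space: O(n) as written (the z-score array and the boolean mask); O(1) extra if each point is tested on the fly

### Isolation Forest
- Time: O(t × ψ × log ψ) to train, plus O(t × n × log ψ) to score all n samples
- Space: O(t × ψ)

Where:
- t = number of trees (`n_estimators`)
- n = number of samples
- ψ = number of samples drawn per tree (`max_samples`; scikit-learn's default is min(256, n))

When ψ = n, as in the five-point examples here, this reduces to O(t × n × log n) time and O(t × n) space.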
