# Edge Case Detection Notebook
Detect statistical outliers and anomalies, and classify edge cases in your dataset.

**Features:**
- Statistical outlier detection (IQR, Z-score, isolation forest)
- Pattern anomaly identification using clustering
- Data distribution analysis (KS tests, chi-square)
- Edge case classification and labeling
- Visualization of detected edge cases

---

*See shared_utilities.py for common functions.*

In [None]:
# Setup: Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import IsolationForest
from sklearn.cluster import DBSCAN

## 1. Load Your Data
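Load your dataset from a CSV file. Replace `your_data.csv` with the path to your file. The cells in this notebook assume numeric columns named `feature_column` and `another_column`; rename them to match your data.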

In [None]:
# Load data
df = pd.read_csv('your_data.csv')
df.head()
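If no CSV is at hand yet, the rest of the notebook can be tried on synthetic data. The following is a minimal sketch under assumptions: `demo_df`, the row count, and the planted extreme values are all hypothetical and chosen only so that each detector has something to find.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for your_data.csv: two numeric columns with the
# names the notebook's cells expect.
rng = np.random.default_rng(seed=0)
n_rows = 200
demo_df = pd.DataFrame({
    "feature_column": rng.normal(loc=50.0, scale=5.0, size=n_rows),
    "another_column": rng.normal(loc=0.0, scale=1.0, size=n_rows),
})

# Plant a few obvious extremes (illustrative values, not real data).
demo_df.loc[[0, 1, 2], "feature_column"] = [120.0, -40.0, 95.0]

# Basic sanity checks before running any detector.
required = ["feature_column", "another_column"]
missing = [c for c in required if c not in demo_df.columns]
if missing:
    raise ValueError(f"missing columns: {missing}")
print(demo_df[required].isna().sum())
```

To run the notebook on this frame, assign `df = demo_df.copy()` in place of the `pd.read_csv` call.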

## 2. Statistical Outlier Detection
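Use three methods to find outliers: IQR flags values more than 1.5 × IQR outside the quartiles, Z-score flags values more than 3 standard deviations from the mean, and Isolation Forest flags the points that are easiest to isolate with random splits.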

In [None]:
# IQR method: flag values more than 1.5 * IQR outside the quartiles
def detect_outliers_iqr(data, column):
    Q1 = data[column].quantile(0.25)  # first quartile
    Q3 = data[column].quantile(0.75)  # third quartile
    IQR = Q3 - Q1
    lower = Q1 - 1.5 * IQR  # lower fence
    upper = Q3 + 1.5 * IQR  # upper fence
    return data[(data[column] < lower) | (data[column] > upper)]

outliers_iqr = detect_outliers_iqr(df, 'feature_column')
outliers_iqr
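To make the fences concrete, here is a small worked example on seven made-up values. The helper `iqr_fences` and the `toy` frame are hypothetical names used only for this sketch; the arithmetic is the same as in `detect_outliers_iqr`.

```python
import pandas as pd

def iqr_fences(series, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Illustrative values: six ordinary readings and one extreme.
toy = pd.DataFrame({"feature_column": [10, 12, 11, 13, 12, 11, 100]})

# Q1 = 11, Q3 = 12.5, IQR = 1.5, so the fences are 8.75 and 14.75.
lower, upper = iqr_fences(toy["feature_column"])
toy_outliers = toy[(toy["feature_column"] < lower) | (toy["feature_column"] > upper)]

print(lower, upper)   # 8.75 14.75
print(toy_outliers)   # only the row holding 100
```

Raising `k` (for example to 3.0) widens the fences and flags only more extreme values.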

In [None]:
# Z-score method: flag values more than `threshold` standard deviations from the mean
def detect_outliers_zscore(data, column, threshold=3):
    mean = data[column].mean()
    std = data[column].std()  # sample standard deviation (ddof=1)
    z_scores = (data[column] - mean) / std
    return data[np.abs(z_scores) > threshold]

outliers_zscore = detect_outliers_zscore(df, 'feature_column')
outliers_zscore
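One caveat with the Z-score method: an extreme value inflates the mean and standard deviation it is measured against, and with n observations the largest possible |z| is (n − 1)/√n, so a threshold of 3 can never fire on very small samples. The sketch below contrasts the plain Z-score with the modified Z-score, a different technique based on the median and the median absolute deviation (MAD). The function name `modified_zscore` is hypothetical, and the constant 0.6745 and cutoff 3.5 are the commonly quoted conventions rather than anything taken from this notebook.

```python
import numpy as np
import pandas as pd

def modified_zscore(series):
    """Modified z-score: 0.6745 * (x - median) / MAD."""
    median = series.median()
    mad = (series - median).abs().median()
    if mad == 0:
        # More than half the values are identical; the score is undefined.
        return pd.Series(np.nan, index=series.index)
    return 0.6745 * (series - median) / mad

# Illustrative values: six ordinary readings and one extreme.
values = pd.Series([10, 12, 11, 13, 12, 11, 100], dtype=float)

# Plain z-score: with n = 7 the maximum |z| is 6 / sqrt(7), about 2.27,
# so the value 100 stays under a threshold of 3.
plain_z = (values - values.mean()) / values.std()
plain_flags = values[plain_z.abs() > 3]

# Modified z-score: median = 12 and MAD = 1, so 100 scores far above 3.5.
robust_z = modified_zscore(values)
robust_flags = values[robust_z.abs() > 3.5]

print("plain z-score flags:", plain_flags.tolist())
print("modified z-score flags:", robust_flags.tolist())
```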

In [None]:
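# Isolation Forest: contamination is the expected fraction of anomalies
iso = IsolationForest(contamination=0.05, random_state=42)  # fixed seed for repeatable results
df['anomaly'] = iso.fit_predict(df[['feature_column']])  # -1 = anomaly, 1 = normal
anomalies = df[df['anomaly'] == -1]
anomalies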
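Isolation Forest is not limited to a single column. The following sketch, on synthetic data, fits it on two features at once and also keeps the raw anomaly scores so that flagged rows can be ranked. The frame `points`, the planted extremes, and the seed values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic data: 200 ordinary points plus 3 planted extremes (hypothetical).
rng = np.random.default_rng(seed=0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
planted = np.array([[8.0, 8.0], [-8.0, 7.0], [9.0, -9.0]])
points = pd.DataFrame(
    np.vstack([inliers, planted]),
    columns=["feature_column", "another_column"],
)

# Drop rows with missing values first so fitting does not fail on NaN.
features = points.dropna()

forest = IsolationForest(contamination=0.05, random_state=42)
labels = pd.Series(forest.fit_predict(features), index=features.index)

# score_samples: the lower the score, the more anomalous the row.
scores = pd.Series(forest.score_samples(features), index=features.index)

flagged = features[labels == -1]
print(flagged.assign(score=scores[flagged.index]).sort_values("score"))
```

Because `contamination=0.05` fixes the share of rows reported as anomalous, some ordinary points are flagged alongside the planted ones; sorting by score helps separate the two.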

## 3. Pattern Anomaly Identification
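Use DBSCAN clustering to find unusual patterns. Points that DBSCAN cannot assign to any cluster receive the label `-1` (noise) and are candidates for review.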

In [None]:
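# DBSCAN clustering: eps is in the units of feature_column, so tune it to your data's scale
db = DBSCAN(eps=0.5, min_samples=5)
df['cluster'] = db.fit_predict(df[['feature_column']])  # label -1 = noise (no cluster)
sns.scatterplot(x='feature_column', y='another_column', hue='cluster', data=df)
plt.show()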
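Since `eps` is a distance, DBSCAN results depend on the scale of the input columns. One common approach, sketched here on synthetic data, is to standardize the features before clustering and then pull out the noise points. The two clusters, the stray points, and the name `pts` are made up for this example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Synthetic data: two tight clusters and two stray points (hypothetical).
rng = np.random.default_rng(seed=0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
strays = np.array([[2.5, -3.0], [-3.0, 6.0]])
pts = pd.DataFrame(
    np.vstack([cluster_a, cluster_b, strays]),
    columns=["feature_column", "another_column"],
)

# Standardize so eps means the same thing on both axes.
scaled = StandardScaler().fit_transform(pts)

# Cluster on both features; -1 marks points that belong to no cluster.
cluster_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(scaled)
pts["cluster"] = cluster_labels

noise_rows = pts[pts["cluster"] == -1]
print("clusters found:", sorted(set(cluster_labels) - {-1}))
print(noise_rows)
```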

## 4. Data Distribution Analysis
Visualize and test data distributions.

In [None]:
# Histogram and KDE
sns.histplot(df['feature_column'], kde=True)
plt.show()

In [None]:
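# Kolmogorov-Smirnov test against a normal distribution fitted to the data
from scipy.stats import kstest
values = df['feature_column'].dropna()
# Test against the normal CDF itself (not a random sample) so the result is
# reproducible; mean and std come from the same data, so the p-value is approximate.
stat, p_value = kstest(values, 'norm', args=(values.mean(), values.std()))
print('KS test p-value:', p_value)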
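The feature list also mentions chi-square tests, which suit categorical columns. A minimal sketch, assuming a hypothetical column `category_column` and a null hypothesis that all categories are equally likely:

```python
import pandas as pd
from scipy.stats import chisquare

# Hypothetical categorical data: 50 'a', 30 'b', 20 'c'.
cats = pd.DataFrame({"category_column": ["a"] * 50 + ["b"] * 30 + ["c"] * 20})

# Observed counts per category, in a stable order.
observed = cats["category_column"].value_counts().sort_index()

# With no f_exp argument, chisquare tests against equal expected counts.
chi2_stat, chi2_p = chisquare(observed.values)

print(observed.to_dict())
print("chi-square statistic:", chi2_stat)  # about 14.0
print("chi-square p-value:", chi2_p)       # roughly 0.0009
```

A small p-value suggests the category frequencies differ from the assumed uniform split. To test against other expected proportions, pass them as counts through the `f_exp` argument.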

## 5. Edge Case Classification
Label and document detected edge cases.

In [None]:
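# Label edge cases: rows flagged by Isolation Forest or with |z-score| > 3
# (requires the 'anomaly' column created in the Isolation Forest cell)
z_scores = (df['feature_column'] - df['feature_column'].mean()) / df['feature_column'].std()
df['edge_case'] = ((df['anomaly'] == -1) | (np.abs(z_scores) > 3)).astype(int)
df[df['edge_case'] == 1]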
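A single 0/1 column does not record which method flagged a row. One possible extension, sketched below, keeps one boolean column per detector and derives a readable label from them. The function `label_edge_cases`, the column names, and the flag values are hypothetical.

```python
import pandas as pd

def label_edge_cases(flags):
    """Join the names of the detectors that flagged each row."""
    def join_names(row):
        hits = [name for name, hit in row.items() if hit]
        return "+".join(hits) if hits else "normal"
    return flags.apply(join_names, axis=1)

# Illustrative flags for three rows; in the notebook these would come from
# the IQR, Z-score, and Isolation Forest cells.
flags = pd.DataFrame({
    "iqr": [False, True, True],
    "zscore": [False, False, True],
    "iforest": [False, False, True],
})

edge_labels = label_edge_cases(flags)
n_methods = flags.sum(axis=1)  # how many detectors agree on each row

print(edge_labels.tolist())  # ['normal', 'iqr', 'iqr+zscore+iforest']
print(n_methods.tolist())    # [0, 1, 3]
```

Rows flagged by several detectors are usually the first ones worth reviewing.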

## 6. Visualization of Detected Edge Cases
Plot edge cases for review.

In [None]:
# Visualize edge cases
sns.scatterplot(x='feature_column', y='another_column', hue='edge_case', data=df)
plt.title('Edge Case Visualization')
plt.show()

## 7. Documentation & Next Steps
Document findings and export edge cases for further analysis.

In [None]:
# Export edge cases
df[df['edge_case'] == 1].to_csv('detected_edge_cases.csv', index=False)