## Using AI for Anomaly Detection in Data Quality
**Description**: Implement an AI-based approach that flags anomalous records as potential data quality issues.

**Steps**:
1. Use an Anomaly Detection Algorithm:
    - Use sklearn's Isolation Forest for anomaly detection.

**Example data** (columns: age, income; `None` marks a missing income value):

data = np.array([[25, 50000], [30, 60000], [35, 75000], [40, None], [45, 100000]])

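2. Integrate with Great Expectations:
    - Generate alerts if anomalies are detected. The code in this section only prints an alert; passing the result to a Great Expectations validation is not shown.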

In [2]:
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Example data with missing values
data = np.array([[25, 50000], [30, 60000], [35, 75000], [40, None], [45, 100000]])

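# Convert to DataFrame for easier manipulation
df = pd.DataFrame(data, columns=["Age", "Income"])

# The None entry makes NumPy build an object array, so restore numeric dtypes
df["Age"] = df["Age"].astype(int)
df["Income"] = pd.to_numeric(df["Income"])  # None becomes NaN

# Handle missing values (simple median imputation). Assign the result back
# instead of calling fillna(inplace=True) on a column, which newer pandas warns about
df["Income"] = df["Income"].fillna(df["Income"].median())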

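# Initialize IsolationForest for anomaly detection.
# contamination=0.2 sets the expected share of anomalies (about 1 row in 5 here).
# No random_state is set, so the flagged row can differ between runs.
clf = IsolationForest(n_estimators=100, contamination=0.2)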

# Fit the model
clf.fit(df)

# Predict anomalies (1 = normal, -1 = anomaly)
df["Anomaly"] = clf.predict(df)

# Print results
print("Detected Anomalies:")
print(df)

# Trigger alert if anomalies are detected
if df["Anomaly"].eq(-1).any():  # If any anomaly is detected
    print("ALERT: Data quality issues detected due to anomalies.")
else:
    print("Data quality is good.")

Detected Anomalies:
  Age    Income  Anomaly
0  25   50000.0        1
1  30   60000.0        1
2  35   75000.0        1
3  40   67500.0        1
4  45  100000.0       -1
ALERT: Data quality issues detected due to anomalies.
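
The alert above only says that some row was flagged. As a minimal sketch of a reusable variant, the helper below also records how strongly each row was flagged. The function name `detect_anomalies`, the `AnomalyScore` column, and the fixed `random_state=0` are assumptions made for illustration and are not part of the original example. It relies on `IsolationForest.decision_function`, whose negative values correspond to rows that `predict` labels as -1.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def detect_anomalies(df, feature_cols, contamination=0.2, random_state=0):
    """Return a copy of df with Anomaly (1 / -1) and AnomalyScore columns.

    Hypothetical helper: the name, signature, and defaults are illustrative.
    """
    out = df.copy()

    # Median-impute missing values in a separate feature frame,
    # so the original values in `out` are left untouched
    features = out[feature_cols]
    features = features.fillna(features.median())

    clf = IsolationForest(
        n_estimators=100,
        contamination=contamination,
        random_state=random_state,  # fixed seed so repeated runs agree
    )
    out["Anomaly"] = clf.fit_predict(features)

    # Negative scores fall on the anomaly side of the fitted threshold
    out["AnomalyScore"] = clf.decision_function(features)
    return out


raw = pd.DataFrame(
    {
        "Age": [25, 30, 35, 40, 45],
        "Income": [50000, 60000, 75000, np.nan, 100000],
    }
)

result = detect_anomalies(raw, ["Age", "Income"])
flagged = result[result["Anomaly"] == -1]

if not flagged.empty:
    print("ALERT: Data quality issues detected due to anomalies.")
    print(flagged)
else:
    print("Data quality is good.")
```

Sorting `result` by `AnomalyScore` gives a rough ranking of which rows to inspect first. With only five rows the scores are not very informative, so treat this as a pattern to adapt rather than a tuned detector.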
