## Using AI for Anomaly Detection in Data Quality
**Description**: Implement an AI-based approach that flags anomalous records as part of a data quality check.

**Steps**:
1. Use an Anomaly Detection Algorithm:
    - Use sklearn's Isolation Forest for anomaly detection.
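
As a rough illustration of what the model returns, the sketch below fits `IsolationForest` on a small hypothetical sample (the extreme income row is invented for this example and is not part of the exercise data). It prints both the hard labels from `fit_predict` and the continuous scores from `decision_function`, which can help when ranking rows by how unusual they look.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical sample: four typical (age, income) rows plus one extreme income.
X = np.array([
    [25, 50_000],
    [30, 60_000],
    [35, 75_000],
    [45, 100_000],
    [40, 1_000_000],  # deliberately extreme value, for illustration only
], dtype=float)

model = IsolationForest(n_estimators=100, contamination=0.2, random_state=42)
labels = model.fit_predict(X)        # 1 = inlier, -1 = anomaly
scores = model.decision_function(X)  # lower = more anomalous; negative scores are flagged

for row, label, score in zip(X, labels, scores):
    print(row, label, round(float(score), 3))
```

The `contamination` value is an assumption about the share of anomalous rows; it sets the score threshold, so it is worth tuning against data whose quality is already known.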

**Example data:**

data = np.array([[25, 50000], [30, 60000], [35, 75000], [40, None], [45, 100000]])
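
One caveat about the array above: because `None` is not a number, NumPy stores the whole array with `dtype=object`, which scikit-learn estimators will not accept directly. A minimal sketch of one way to get numeric columns with `NaN` in place of `None`, assuming pandas is available (the variable names `raw` and `df` are chosen for this example):

```python
import numpy as np
import pandas as pd

raw = np.array([[25, 50000], [30, 60000], [35, 75000], [40, None], [45, 100000]])
print(raw.dtype)  # object, because None cannot be stored as a number

# Convert each column to a numeric dtype; None becomes NaN.
df = pd.DataFrame(raw, columns=["age", "income"]).apply(pd.to_numeric)
print(df.dtypes)
print(df["income"].isna().sum())  # count of missing income values
```

Writing `np.nan` instead of `None` when building the array sidesteps the conversion step.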

2. Integrate with Great Expectations:
    - Generate alerts if anomalies are detected:
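
Independently of Great Expectations, the alerting idea can be sketched with Python's standard `logging` module. The helper `alert_on_anomalies`, the logger name, and the hand-written labels below are hypothetical and only stand in for whatever alert channel is actually used.

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("data_quality")  # hypothetical logger name


def alert_on_anomalies(df: pd.DataFrame, label_col: str = "anomaly") -> pd.DataFrame:
    """Log a warning for rows labelled -1 and return them (hypothetical helper)."""
    flagged = df[df[label_col] == -1]
    if flagged.empty:
        logger.info("No anomalies detected in %d rows.", len(df))
    else:
        logger.warning(
            "%d of %d rows flagged as anomalous:\n%s",
            len(flagged), len(df), flagged.to_string(),
        )
    return flagged


# Hand-written labels for illustration only; in practice they come from the model.
sample = pd.DataFrame({
    "age": [25, 30, 45],
    "income": [50000, 60000, 100000],
    "anomaly": [1, 1, -1],
})
flagged = alert_on_anomalies(sample)
```

Returning the flagged rows keeps the helper easy to test and lets the caller decide whether to stop the pipeline or only report.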

import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
import great_expectations as ge

# Step 1: Prepare the data
data = np.array([
    [25, 50000],
    [30, 60000],
    [35, 75000],
    [40, np.nan],       # Contains a missing value
    [45, 100000]
])

# Convert to DataFrame
df = pd.DataFrame(data, columns=['age', 'income'])

# Step 2: Handle missing values by dropping incomplete rows
# (imputation is an alternative). .copy() returns an independent frame, so
# adding a column later does not trigger pandas' SettingWithCopyWarning.
df_cleaned = df.dropna().copy()
if df_cleaned.empty:
    print("Data Cleaning Error: No valid data available after dropping missing values.")

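# Step 3: Anomaly detection using Isolation Forest
# contamination=0.2 asks the model to flag roughly 20% of the rows.
try:
    if not df_cleaned.empty:
        # Fit on the feature columns only, so the label column is never used as input
        features = df_cleaned[['age', 'income']]
        model = IsolationForest(contamination=0.2, random_state=42)
        # fit_predict returns 1 for inliers and -1 for anomalies
        df_cleaned['anomaly'] = model.fit_predict(features)
    else:
        print("Skipping model training due to lack of data.")
except Exception as e:
    print(f"Anomaly Detection Error: {e}")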

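# Step 4: Integrate with Great Expectations for alerting
# Note: ge.from_pandas belongs to the legacy Great Expectations API, so this
# cell assumes a pre-1.0 release of the library.
try:
    if 'anomaly' not in df_cleaned.columns:
        print("No anomaly labels available; skipping alerting.")
    else:
        gdf = ge.from_pandas(df_cleaned)
        # Expect every row to be labelled as an inlier (1); add more expectations as needed
        result = gdf.expect_column_values_to_be_in_set('anomaly', [1])
        if not result.success:
            print("Anomalies Detected:")
            print(df_cleaned[df_cleaned['anomaly'] == -1])
        else:
            print("No anomalies detected.")
except Exception as e:
    print(f"Great Expectations Error: {e}")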
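
The solution above drops the row with the missing income before modelling, which means that row never shows up in the anomaly report even though a missing value is itself a data quality issue. A minimal sketch of reporting those rows separately, assuming the same `age` and `income` columns (the variable names are chosen for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 30, 35, 40, 45],
    "income": [50000, 60000, 75000, np.nan, 100000],
})

# Rows with at least one missing value are quality issues in their own right.
missing_rows = df[df.isna().any(axis=1)]
missing_per_column = df.isna().sum()

print(missing_per_column)
print(missing_rows)

# Only complete rows go on to the model.
complete_rows = df.dropna().copy()
```

Keeping `missing_rows` alongside the model's flagged rows gives a fuller picture than the model output alone.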
