## Using AI for Anomaly Detection in Data Quality
**Description**: Use a machine-learning model to flag anomalous records in a dataset and report them as a data quality check.

**Steps**:
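1. Use an anomaly detection algorithm:
    - Use scikit-learn's `IsolationForest` to score each record and flag the ones that are easiest to isolate from the rest.
2. Integrate with Great Expectations:
    - Generate alerts if anomalies are detected.

**Example data** (the fourth row is missing its `income` value; use `np.nan` rather than `None` so the array stays numeric instead of becoming an object array):

```python
data = np.array([[25, 50000], [30, 60000], [35, 75000], [40, np.nan], [45, 100000]])
```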
```python
import numpy as np
import pandas as pd
import great_expectations as gx
from sklearn.ensemble import IsolationForest

# Step 1: Create sample data (np.nan marks the missing income value)
data = np.array([[25, 50000], [30, 60000], [35, 75000], [40, np.nan], [45, 100000]])
df = pd.DataFrame(data, columns=["age", "income"])

# Handle missing values: fill income with the column median.
# Assign the result back; calling fillna(..., inplace=True) on a column
# selection is chained assignment and is unreliable in recent pandas versions.
df["income"] = df["income"].fillna(df["income"].median())

# Step 2: Anomaly detection using Isolation Forest.
# fit_predict returns 1 for inliers and -1 for anomalies. contamination=0.2
# makes the model label about 20% of rows as anomalies, so on this 5-row
# sample one row will normally be flagged.
model = IsolationForest(contamination=0.2, random_state=42)
df["anomaly"] = model.fit_predict(df[["age", "income"]])

# Step 3: Validate with Great Expectations (assumes the GX Core 1.x API).
# A Validator cannot be built from plain dicts: its batches must be real GX
# Batch objects, so obtain one from a pandas data source instead.
context = gx.get_context()  # ephemeral context if no GX project is found
data_source = context.data_sources.add_pandas(name="anomaly_source")
data_asset = data_source.add_dataframe_asset(name="anomaly_asset")
batch_definition = data_asset.add_batch_definition_whole_dataframe("anomaly_batch")
batch = batch_definition.get_batch(batch_parameters={"dataframe": df})

# Expect no anomalies (i.e., anomaly == 1 for all rows)
expectation = gx.expectations.ExpectColumnValuesToBeInSet(column="anomaly", value_set=[1])
result = batch.validate(expectation)

# Generate an alert if any anomalies were detected
print("\nValidation result:", "passed" if result.success else "failed")
if not result.success:
    print("ALERT: anomalies detected in the following rows:")
    print(df[df["anomaly"] == -1])
```
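Step 2 asks for alerts when anomalies are detected. Independently of Great Expectations, a minimal sketch using Python's standard `logging` module might look like the following. `alert_on_anomalies` and its `max_anomalies` tolerance are hypothetical names, and the labels in the usage example are hand-made for demonstration, not real model output.

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("data_quality")


def alert_on_anomalies(
    df: pd.DataFrame, label_column: str = "anomaly", max_anomalies: int = 0
) -> pd.DataFrame:
    """Hypothetical helper: log a warning when more than `max_anomalies` rows
    are labelled -1 (IsolationForest's anomaly label) and return those rows."""
    flagged = df[df[label_column] == -1]
    if len(flagged) > max_anomalies:
        logger.warning(
            "Data quality alert: %d anomalous row(s) detected:\n%s",
            len(flagged),
            flagged.to_string(),
        )
    else:
        logger.info("No alert: %d anomalous row(s) within tolerance.", len(flagged))
    return flagged


# Usage with hand-made labels in the same 1 / -1 format as IsolationForest
example = pd.DataFrame(
    {"age": [25, 30, 45], "income": [50000, 60000, 100000], "anomaly": [1, 1, -1]}
)
flagged = alert_on_anomalies(example)
```

Raising `max_anomalies` lets a pipeline tolerate the fixed share of rows that a `contamination`-based threshold always flags, so alerts fire only when the count is unusually high.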



