### Task 1: Automated Data Profiling

**Steps**:
1. Using Pandas-Profiling (now published as `ydata-profiling`)
    - Generate a profile report for an existing CSV file.
    - Customize the profile report to include correlations.
    - Profile a specific subset of columns.
2. Using Great Expectations
    - Create a basic expectation suite for your data.
    - Validate data against an expectation suite.
    - Add multiple expectations to a suite.

In [1]:
# Write your code from here

import pandas as pd
import numpy as np
import logging
import smtplib
from email.mime.text import MIMEText
from ydata_profiling import ProfileReport
from sklearn.ensemble import IsolationForest
# Great Expectations 1.x API: SimpleCheckpoint was removed in 1.0, and importing it
# raises ImportError, so validation below runs directly against a batch.
import great_expectations as gx

# -------- CONFIGURATION --------
# Placeholder values; load real credentials from environment variables or a secrets manager.
EMAIL_ALERT = "your_email@example.com"
SMTP_SERVER = "smtp.example.com"
SMTP_PORT = 587
SMTP_USER = "your_email@example.com"
SMTP_PASSWORD = "your_password"

# -------- LOGGING SETUP --------
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# -------- GENERATE SAMPLE DATA --------
np.random.seed(42)
df = pd.DataFrame({
    'id': range(1, 101),
    'age': np.random.normal(30, 10, 100).astype(int),
    'salary': np.random.normal(50000, 15000, 100).astype(int),
    'dept': np.random.choice(['HR', 'Engineering', 'Sales'], 100)
})

# -------- PANDAS PROFILING (ydata-profiling) --------
profile = ProfileReport(df, title="Sample Data Profile", explorative=True)
profile.to_file("profile_report.html")

# Profile a subset of columns with Pearson correlations enabled.
# Each correlation takes a settings dict, not a bare boolean.
subset = df[['age', 'salary']]
profile_subset = ProfileReport(
    subset,
    title="Subset Profile",
    correlations={"pearson": {"calculate": True}},
)
profile_subset.to_file("subset_profile.html")

# -------- GREAT EXPECTATIONS --------
# Ephemeral (in-memory) context, so re-running the cell leaves no stale state on disk
context = gx.get_context(mode="ephemeral")

# Register the in-memory DataFrame and fetch it as a single batch
datasource_name = "memory_datasource"
data_source = context.data_sources.add_pandas(datasource_name)
data_asset = data_source.add_dataframe_asset(name="in_memory_data")
batch_definition = data_asset.add_batch_definition_whole_dataframe("whole_dataframe")
batch = batch_definition.get_batch(batch_parameters={"dataframe": df})

# Create the suite and add multiple expectations
suite_name = "sample_suite"
suite = context.suites.add(gx.ExpectationSuite(name=suite_name))
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeBetween(column="age", min_value=0, max_value=120)
)
suite.add_expectation(gx.expectations.ExpectColumnValuesToBeUnique(column="id"))
suite.add_expectation(gx.expectations.ExpectColumnValuesToNotBeNull(column="salary"))

# Validate the batch against the suite
validation_result = batch.validate(suite)
results = {"success": validation_result.success}  # dict-style access is used by the alerting code below

# -------- ALERTING SYSTEM --------
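def send_email_alert(subject, body):
    msg = MIMEText(body)
    msg['Subject'] = subject
    msg['From'] = SMTP_USER
    msg['To'] = EMAIL_ALERT

    # A failed alert is logged rather than allowed to crash the monitoring run
    try:
        with smtplib.SMTP(SMTP_SERVER, SMTP_PORT, timeout=10) as server:
            server.starttls()
            server.login(SMTP_USER, SMTP_PASSWORD)
            server.send_message(msg)
        logging.info("Email alert sent.")
    except (smtplib.SMTPException, OSError) as exc:
        logging.error("Could not send email alert: %s", exc)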

if not results["success"]:
    logging.error("🚨 Data validation failed.")
    send_email_alert("🚨 Data Quality Alert", "Great Expectations found data issues.")
else:
    logging.info("✅ Data passed all expectations.")

# -------- AI: ISOLATION FOREST --------
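def run_isolation_forest(df, columns):
    # Keep only numeric features; rows with missing values are not scored
    X = df[columns].select_dtypes(include=['number']).dropna()
    if X.empty:
        return df.iloc[0:0]  # empty DataFrame, so callers can rely on .empty
    model = IsolationForest(contamination=0.05, random_state=42)
    preds = model.fit_predict(X)
    # Select by X's index labels so the result still lines up after dropna()
    return df.loc[X.index[preds == -1]]

# Exclude 'id': it is a row key, not a feature
anomalies = run_isolation_forest(df, ['age', 'salary'])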
if not anomalies.empty:
    logging.warning(f"⚠️ AI detected {len(anomalies)} anomalies.")
    anomalies.to_csv("anomalies.csv", index=False)
    send_email_alert("⚠️ AI Anomaly Alert", f"{len(anomalies)} anomalies detected.")

# -------- AI: RULE-BASED OUTLIERS --------
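def detect_outliers(column):
    # Z-score rule: flag values more than 3 standard deviations from the mean
    std = column.std()
    if std == 0 or pd.isna(std):
        return column.iloc[0:0]  # empty Series, so callers can rely on .empty
    z = (column - column.mean()) / std
    return column[z.abs() > 3]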

outlier_report = {}
for col in df.select_dtypes(include=['number']).columns:
    outliers = detect_outliers(df[col])
    if not outliers.empty:
        outlier_report[col] = len(outliers)

if outlier_report:
    logging.warning(f"📉 Rule-based outliers found: {outlier_report}")
    send_email_alert("📉 Rule-based Outliers", str(outlier_report))




### Task 2: Real-time Monitoring of Data Quality

**Steps**:
1. Setting up Alerts for Quality Drops
    - Use the `logging` library to set up a basic alert on failed expectations.
    - Implement alerts using email notifications.
    - Use a dashboard such as Grafana for visual alerts.
        - Note: this step assumes integration with an external monitoring system.
        - Setting up the alert involves creating a data source and an alert rule in Grafana.
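
The first two steps can be sketched with the standard library alone. This is a minimal illustration, assuming the validation outcome is available as a dict with a boolean `success` key. The helper names `build_alert_email` and `alert_on_failure`, the logger name, and the addresses are all hypothetical. Sending is delegated to an injectable `send` callable (for example a wrapper around `smtplib.SMTP.send_message`), so the logic can be exercised without an SMTP server.

```python
import logging
from email.mime.text import MIMEText

logger = logging.getLogger("data_quality")  # hypothetical logger name


def build_alert_email(subject, body, sender, recipient):
    """Build (but do not send) a plain-text alert email."""
    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    return msg


def alert_on_failure(result, send=None,
                     sender="alerts@example.com", recipient="team@example.com"):
    """Log the validation outcome and, on failure, hand an email to `send`.

    `result` is assumed to be a dict with a boolean "success" key.
    `send` is any callable that accepts the built message.
    Returns True if an alert was raised, False otherwise.
    """
    if result.get("success", False):
        logger.info("Data passed all expectations.")
        return False
    logger.error("Data validation failed.")
    if send is not None:
        send(build_alert_email("Data Quality Alert",
                               "One or more expectations failed.",
                               sender, recipient))
    return True


# Usage: collect messages in a list instead of sending them
outbox = []
alert_on_failure({"success": False}, send=outbox.append)
```

Keeping message construction separate from delivery means the same function can later be pointed at a real SMTP sender without changing the alert logic.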

In [None]:
# Write your code from here
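
For the Grafana step, one possible approach is to expose validation results as metrics that a Grafana data source can read. The sketch below assumes Prometheus is that data source and renders results in the Prometheus text exposition format. The function name `render_metrics` and the metric names `data_quality_success` and `data_quality_failed_expectations` are hypothetical, and how the text reaches Prometheus (an HTTP endpoint, a textfile collector, a push gateway) is left open.

```python
def render_metrics(suite_name, success, failed_expectations):
    """Render validation results in the Prometheus text exposition format."""
    labels = f'{{suite="{suite_name}"}}'
    lines = [
        "# HELP data_quality_success 1 if the last validation run passed, else 0.",
        "# TYPE data_quality_success gauge",
        f"data_quality_success{labels} {1 if success else 0}",
        "# HELP data_quality_failed_expectations Expectations that failed in the last run.",
        "# TYPE data_quality_failed_expectations gauge",
        f"data_quality_failed_expectations{labels} {failed_expectations}",
    ]
    return "\n".join(lines) + "\n"


# Usage: a failed run with two failing expectations
metrics_text = render_metrics("sample_suite", success=False, failed_expectations=2)
print(metrics_text)
```

Under these assumptions, a Grafana alert rule could fire whenever `data_quality_success` equals 0 for a suite.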

### Task 3: Using AI for Data Quality Monitoring

**Steps**:
1. Basic AI Models for Monitoring
    - Train a simple anomaly detection model using Isolation Forest.
    - Write a simple custom function that applies rule-based logic for outlier detection.
    - Create a monitoring function that uses a pre-trained machine learning model.

In [None]:
# Write your code from here
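
The second and third steps can be sketched without any third-party dependency. This is an illustration under stated assumptions, not a reference implementation: `iqr_outliers` applies Tukey's interquartile-range rule as the custom logic, and `monitor_batch` scores a batch with any object exposing `predict`. `RangeModel` is a hypothetical stand-in for a pre-trained model. It follows the scikit-learn outlier-detector convention of returning 1 for inliers and -1 for anomalies, so a fitted `IsolationForest` could be passed in its place.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


class RangeModel:
    """Hypothetical stand-in for a pre-trained model.

    predict() returns 1 for inliers and -1 for anomalies.
    """

    def __init__(self, low, high):
        self.low, self.high = low, high

    def predict(self, rows):
        return [1 if all(self.low <= x <= self.high for x in row) else -1
                for row in rows]


def monitor_batch(model, rows, max_anomaly_rate=0.05):
    """Score a batch with a pre-trained model and flag a breach of the anomaly-rate threshold."""
    preds = list(model.predict(rows))
    anomalies = [row for row, p in zip(rows, preds) if p == -1]
    rate = len(anomalies) / len(rows) if rows else 0.0
    return {"anomalies": anomalies, "anomaly_rate": rate, "alert": rate > max_anomaly_rate}


# Usage: one implausible age in an otherwise ordinary batch
ages = [25, 31, 28, 35, 29, 33, 27, 30, 32, 140]
print(iqr_outliers(ages))

model = RangeModel(low=0, high=120)
report = monitor_batch(model, [[a] for a in ages])
print(report["alert"], report["anomalies"])
```

Because `monitor_batch` relies only on `predict`, the model can be trained elsewhere and loaded at monitoring time. The 5% threshold is an arbitrary example value.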