## Automated Data Quality Monitoring
**Objective**: Use Great Expectations to perform data profiling and write validation rules.

1. Data Profiling with Great Expectations
### Profile a CSV dataset containing customer information to inspect distribution patterns of 'Age' and 'Income' columns.
- Create a data context and load the dataset with Great Expectations.
- Generate a data asset to inspect the summary statistics.
- View the generated expectation suite to analyze data distributions.
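
Before choosing bounds for 'Age' and 'Income', it helps to see what summary statistics look like for those columns. The following is a minimal sketch in plain Python, not Great Expectations itself. The inline sample data and the `profile_column` helper are hypothetical and only stand in for the real customer CSV.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for the customer CSV (assumption, not real data).
SAMPLE_CSV = """Name,Age,Income
Ana,34,52000
Ben,17,18000
Cleo,45,87000
Dev,29,41000
Eve,,63000
"""


def profile_column(rows, column):
    """Return basic summary statistics for one numeric column."""
    raw = [row[column] for row in rows]
    # Treat empty strings as missing values and skip them in the statistics.
    values = [float(v) for v in raw if v.strip() != ""]
    return {
        "count": len(values),
        "missing": len(raw) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column in ("Age", "Income"):
    print(column, profile_column(rows, column))
```

In this sample the minimum 'Age' is 17 and one value is missing, so a rule such as "Age between 18 and 100" would flag a row. Profiling first shows whether a bound is realistic or whether the data needs cleaning.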

In [2]:
import os
from great_expectations.core.batch import BatchRequest
from great_expectations.data_context import FileDataContext
from great_expectations.exceptions import GreatExpectationsError

# NOTE: this script appears to target the legacy (pre-1.0) Great Expectations
# API: named datasource, data connector, and context.run_checkpoint.

# === Configuration ===
GE_PATH = "great_expectations"  # Change if your context is elsewhere
CSV_PATH = "data/customers.csv"  # Path to your dataset
CHECKPOINT_NAME = "customer_checkpoint"
SUITE_NAME = "customer_suite"
DATA_ASSET_NAME = "customers.csv"  # Should match your inferred asset name
DATASOURCE_NAME = "my_datasource"
CONNECTOR_NAME = "default_inferred_data_connector_name"


# === Functions ===

def validate_file_exists(filepath):
    """Fail early with a clear message if the dataset is missing."""
    if not os.path.exists(filepath):
        raise FileNotFoundError(f"File not found: {filepath}")

def load_context(ge_path):
    """Load the file-backed data context rooted at ge_path."""
    try:
        # Pass the root directory by keyword so it is not mistaken for
        # another constructor argument.
        return FileDataContext(context_root_dir=ge_path)
    except Exception as e:
        raise RuntimeError(f"Failed to load Great Expectations context: {e}") from e

def get_validator(context):
    """Build a validator for the customer asset, bound to SUITE_NAME."""
    # Use a BatchRequest object rather than a plain dict.
    batch_request = BatchRequest(
        datasource_name=DATASOURCE_NAME,
        data_connector_name=CONNECTOR_NAME,
        data_asset_name=DATA_ASSET_NAME,
    )

    try:
        # The suite named SUITE_NAME must already exist in the context.
        return context.get_validator(
            batch_request=batch_request,
            expectation_suite_name=SUITE_NAME,
        )
    except Exception as e:
        raise RuntimeError(f"Failed to get validator: {e}") from e

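def define_expectations(validator):
    """Attach the example expectations to the suite and save it."""
    # The 'Date' column must exist and look like YYYY-MM-DD.
    validator.expect_column_to_exist("Date")
    validator.expect_column_values_to_match_regex("Date", r"^\d{4}-\d{2}-\d{2}$")
    # Range checks for the profiled numeric columns.
    validator.expect_column_values_to_be_between("Age", min_value=18, max_value=100)
    validator.expect_column_values_to_be_between("Income", min_value=10000, max_value=1000000)

    # Keep expectations that failed on the current batch; by default they
    # are discarded when the suite is saved.
    validator.save_expectation_suite(discard_failed_expectations=False)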

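def run_checkpoint(context):
    """Run the configured checkpoint and raise if any expectation fails."""
    try:
        result = context.run_checkpoint(checkpoint_name=CHECKPOINT_NAME)
    except GreatExpectationsError as e:
        raise RuntimeError(f"GE-specific error: {e}") from e
    except Exception as e:
        raise RuntimeError(f"General validation error: {e}") from e

    # Check the outcome outside the try block so that a failed validation
    # is not re-wrapped as a "General validation error".
    if not result["success"]:
        raise ValueError("Data validation failed!")
    print("✅ Validation passed.")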

# === Main Execution ===

if __name__ == "__main__":
    try:
        print("📦 Validating file...")
        validate_file_exists(CSV_PATH)

        print("🧠 Loading Great Expectations context...")
        context = load_context(GE_PATH)

        print("📊 Getting validator...")
        validator = get_validator(context)

        print("✅ Defining expectations...")
        define_expectations(validator)

        print("🚀 Running checkpoint...")
        run_checkpoint(context)

        print("📈 Building Data Docs...")
        context.build_data_docs()
        print(f"📂 Open Data Docs at: {GE_PATH}/uncommitted/data_docs/local_site/index.html")

    except Exception as e:
        print(f"❌ Error occurred: {e}")


📦 Validating file...
❌ Error occurred: File not found: data/customers.csv


2. Writing Validation Rules for Data Ingestion
### Write validation rules for a CSV file to ensure the 'Date' column follows a specific date format.
- Use `expect_column_values_to_match_regex` to enforce date format validation.
- Run the validation and interpret the output.
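
To see what a date-format regex accepts and rejects, the check can be sketched in plain Python without Great Expectations. The pattern below is the `YYYY-MM-DD` regex used earlier in this notebook. The `check_dates` and `is_real_date` helpers, the sample values, and the shape of the returned dictionary are assumptions for illustration; the dictionary only loosely mirrors the `success` and `unexpected_count` fields of a Great Expectations validation result.

```python
import re
from datetime import datetime

# Same pattern as in the earlier cell: four digits, two digits, two digits.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_dates(values):
    """Return a small report of the values that do not match DATE_PATTERN."""
    unexpected = [value for value in values if not DATE_PATTERN.match(value)]
    return {
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_list": unexpected,
    }


def is_real_date(value):
    """Return True only if the value parses as an actual calendar date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Hypothetical sample values (assumption, not taken from a real dataset).
sample = ["2024-01-15", "15/01/2024", "2024-1-5", "2024-13-45"]

report = check_dates(sample)
print(report)

# Values that pass the regex but are not valid calendar dates.
shape_only = [v for v in sample if DATE_PATTERN.match(v) and not is_real_date(v)]
print(shape_only)
```

When interpreting the output, a `success` of `False` together with the list of unexpected values points to the rows that need fixing. The regex checks only the shape of the string: `2024-13-45` matches the pattern even though it is not a real date, so a stricter parse may be worth adding when calendar validity matters.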

In [3]:
# write your code from here