## Automated Data Quality Monitoring
**Objective**: Use Great Expectations to perform data profiling and write validation rules.

1. Data Profiling with Great Expectations
### Profile a CSV dataset of customer information to inspect the distributions of the 'Age' and 'Income' columns.
- Create a data context and load the dataset with Great Expectations.
- Define a data asset for the dataset and inspect its summary statistics.
- View the generated expectation suite to analyze data distributions.
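
A profiler computes per-column summary statistics and turns them into candidate expectations. The standard-library sketch below illustrates that idea for 'Age' and 'Income' without using Great Expectations itself. The helper names `profile_column` and `candidate_expectations` are hypothetical, the rows are made-up illustration data (not taken from customers.csv), and the dicts only loosely mirror a serialized expectation (an expectation type plus its `kwargs`). Treat it as a sketch of the concept, not the library's profiler.

```python
import statistics

def profile_column(values):
    """Summarize a numeric column, treating None as a missing value (hypothetical helper)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }

def candidate_expectations(column, summary):
    """Turn observed statistics into expectation-like configs (plain dicts, not GX objects)."""
    non_null_fraction = summary["count"] / (summary["count"] + summary["missing"])
    return [
        {"expectation_type": "expect_column_values_to_be_between",
         "kwargs": {"column": column, "min_value": summary["min"], "max_value": summary["max"]}},
        {"expectation_type": "expect_column_values_to_not_be_null",
         "kwargs": {"column": column, "mostly": non_null_fraction}},
    ]

# Hypothetical rows for illustration only; they do not come from customers.csv
rows = [
    {"Age": 34, "Income": 52000},
    {"Age": 45, "Income": 61000},
    {"Age": None, "Income": 48000},
    {"Age": 29, "Income": 75000},
]

for col in ("Age", "Income"):
    summary = profile_column([row[col] for row in rows])
    print(col, summary)
    for exp in candidate_expectations(col, summary):
        print("   ", exp)
```

The observed minimum and maximum become the bounds of a range expectation, and the observed non-null fraction becomes a `mostly` threshold, so the generated rules describe the data as it looked when it was profiled.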


In [1]:
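import great_expectations as gx  # assumes GX Core 1.x; ge.from_pandas and Dataset.profile no longer exist
import pandas as pd

# Step 1: Create a data context (in memory; use gx.get_context(mode="file") to persist a project folder)
context = gx.get_context(mode="ephemeral")

# Step 2: Load the dataset (customers.csv must exist and contain numeric 'Age' and 'Income' columns)
df = pd.read_csv("customers.csv")

# Step 3: Register the DataFrame as a data asset and get a batch covering all rows
data_source = context.data_sources.add_pandas(name="customers_source")
data_asset = data_source.add_dataframe_asset(name="customers_asset")
batch_definition = data_asset.add_batch_definition_whole_dataframe("full_dataframe")
batch = batch_definition.get_batch(batch_parameters={"dataframe": df})

# Step 4: Inspect summary statistics (count, mean, std, min, quartiles, max) for Age and Income
profile_columns = ["Age", "Income"]
stats = df[profile_columns].describe()
print(stats)

# Step 5: Create and register (save) a suite derived from the observed value ranges
# (GX Core 1.x has no built-in profiler, so the expectations are generated explicitly here)
suite = context.suites.add(gx.ExpectationSuite(name="profiling_suite"))
for column in profile_columns:
    suite.add_expectation(
        gx.expectations.ExpectColumnValuesToBeBetween(
            column=column,
            min_value=float(stats.loc["min", column]),
            max_value=float(stats.loc["max", column]),
        )
    )

# Step 6: Validate the batch against the suite, then print the expectations for Age and Income
results = batch.validate(suite)
print("Suite passes on the profiled data:", results.success)
for expectation in suite.expectations:
    # getattr guards against table-level expectations, which have no 'column' field
    if getattr(expectation, "column", None) in profile_columns:
        print(expectation)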



2. Writing Validation Rules for Data Ingestion
### Write validation rules for a CSV file to ensure that the 'Date' column follows a specific date format (here, `YYYY-MM-DD`).
- Use `expect_column_values_to_match_regex` to enforce the date format.
- Run the validation and interpret the output.
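
A regex such as `^\d{4}-\d{2}-\d{2}$` checks only the *shape* of a value, so impossible dates like `2024-02-30` still match. The plain-Python sketch below contrasts that shape check with a stricter one that also requires the string to parse as a real calendar date via `datetime.strptime`. The helper names and sample values are hypothetical, chosen only for illustration.

```python
import re
from datetime import datetime

# Same pattern as the Date rule: four digits, dash, two digits, dash, two digits
DATE_SHAPE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def matches_format(value: str) -> bool:
    """Shape-only check, equivalent in spirit to the regex expectation."""
    return DATE_SHAPE.match(value) is not None

def is_valid_date(value: str) -> bool:
    """Shape check plus calendar check.

    strptime alone would accept '2024-3-5' (single-digit month and day),
    so the regex is kept to enforce zero-padding."""
    if not matches_format(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True

# Hypothetical sample values for illustration
for value in ["2024-03-15", "2024-02-30", "2024-13-01", "15/03/2024", "2024-3-5"]:
    print(f"{value:<12} shape_ok={matches_format(value)}  real_date={is_valid_date(value)}")
```

If passing the format rule must also guarantee a real date, the regex expectation alone is not enough; a calendar-aware check like the one above is needed in addition.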


In [2]:
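import great_expectations as gx  # assumes GX Core 1.x; great_expectations.dataset no longer exists
import pandas as pd
from pprint import pprint

# Reload the CSV, keeping 'Date' as raw text so the regex sees exactly what was ingested
df = pd.read_csv("customers.csv", dtype={"Date": str})

# Build a batch for the DataFrame (fresh in-memory context, so names don't clash with the first cell)
context = gx.get_context(mode="ephemeral")
batch = (
    context.data_sources.add_pandas(name="ingestion_source")
    .add_dataframe_asset(name="customers_ingest")
    .add_batch_definition_whole_dataframe("full_file")
    .get_batch(batch_parameters={"dataframe": df})
)

# Step 1: Add a validation rule for the 'Date' column using a regex (YYYY-MM-DD)
date_rule = gx.expectations.ExpectColumnValuesToMatchRegex(
    column="Date", regex=r"^\d{4}-\d{2}-\d{2}$"
)

# Step 2: Validate and print the overall outcome
result = batch.validate(date_rule)
print("Validation success:", result.success)

# Detailed output: unexpected_count, unexpected_percent, and a sample of offending values
pprint(result.result)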
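
When interpreting the output, the key fields are `success`, `unexpected_count`, `unexpected_percent`, and `partial_unexpected_list`. The sketch below approximates how a column-map expectation such as the regex rule derives them: missing values are skipped, the unexpected percentage is computed over non-null values, and `success` compares the matching fraction against the optional `mostly` threshold (default 1.0, meaning every non-null value must match). It is a simplified model with hypothetical names and data, written for illustration, not Great Expectations' implementation.

```python
import re

def summarize_map_expectation(values, predicate, mostly=1.0):
    """Simplified model of a column-map expectation result (not GX's implementation).

    None values count as missing and are skipped; unexpected_percent is taken over
    non-null values; success requires the matching fraction to be >= mostly."""
    non_null = [v for v in values if v is not None]
    unexpected = [v for v in non_null if not predicate(v)]
    if non_null:
        unexpected_percent = 100.0 * len(unexpected) / len(non_null)
        success = 1 - len(unexpected) / len(non_null) >= mostly
    else:
        # Nothing to check: treated as passing in this sketch
        unexpected_percent = 0.0
        success = True
    return {
        "success": success,
        "element_count": len(values),
        "missing_count": len(values) - len(non_null),
        "unexpected_count": len(unexpected),
        "unexpected_percent": unexpected_percent,
        "partial_unexpected_list": unexpected[:20],  # capped sample of offending values
    }

def is_date_shaped(value):
    """Hypothetical predicate using the same YYYY-MM-DD pattern as the rule above."""
    return re.match(r"^\d{4}-\d{2}-\d{2}$", value) is not None

# Hypothetical Date column with one malformed value and one missing value
dates = ["2024-01-05", "2024-01-06", "05/01/2024", None, "2024-01-08"]
print(summarize_map_expectation(dates, is_date_shaped))
print(summarize_map_expectation(dates, is_date_shaped, mostly=0.7))
```

With the default threshold, one malformed value out of four non-null values (25%) fails the rule. Passing `mostly` to `ExpectColumnValuesToMatchRegex` lets the rule tolerate a small share of malformed dates instead of failing the whole file.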

