## Automated Data Quality Monitoring
**Objective**: Use Great Expectations to perform data profiling and write validation rules.

1. Data Profiling with Great Expectations
### Profile a CSV dataset of customer information to inspect the distributions of the 'Age' and 'Income' columns.
- Load the dataset with Great Expectations through a Data Context.
- Register the CSV as a Data Asset and inspect its summary statistics.
- Review the resulting Expectation Suite to analyze the data distributions.

In [1]:
import great_expectations as gx
import pandas as pd

# Written for the GX 0.17/0.18 fluent API. There is no top-level `gx.DataContext`
# (hence the AttributeError); gx.get_context() is the supported entry point.
# GX 1.x renamed `context.sources` to `context.data_sources`, so this cell would need changes there.
csv_file_path = "customer_data.csv"  # Replace with the path to your CSV file

# 1. Create (or load) a Data Context
context = gx.get_context()

# Register a pandas Data Source and a CSV Data Asset for the file
datasource = context.sources.add_pandas(name="my_pandas_datasource")
data_asset = datasource.add_csv_asset(
    name="customer_data_asset", filepath_or_buffer=csv_file_path
)
batch_request = data_asset.build_batch_request()

# 2. Create the Expectation Suite explicitly, then request a Validator bound to the batch
suite_name = "profiling_expectation_suite"
context.add_or_update_expectation_suite(expectation_suite_name=suite_name)
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=suite_name,
)

print(f"Created validator for data asset: {data_asset.name}")
print(f"Using Expectation Suite: {suite_name}")
print(validator.head())  # first rows of the batch

# Summary statistics. The Validator has no profile() method, so pandas describe()
# supplies count, mean, std, min, quartiles and max for the two columns.
summary = pd.read_csv(csv_file_path)[["Age", "Income"]].describe()
print("\nSummary statistics for 'Age' and 'Income':")
print(summary)

# Record the observed ranges and quartiles as expectations in the suite
# (the +/-10% tolerance on quartiles is an arbitrary choice).
for column in ["Age", "Income"]:
    stats = summary[column]
    validator.expect_column_values_to_be_between(
        column=column, min_value=float(stats["min"]), max_value=float(stats["max"])
    )
    quartiles = [float(stats[q]) for q in ("25%", "50%", "75%")]
    validator.expect_column_quantile_values_to_be_between(
        column=column,
        quantile_ranges={
            "quantiles": [0.25, 0.5, 0.75],
            "value_ranges": [[q * 0.9, q * 1.1] for q in quartiles],
        },
    )

validator.save_expectation_suite(discard_failed_expectations=False)

# 3. Build Data Docs to browse the suite and its distribution expectations
context.build_data_docs()

print("\nTo view the Expectation Suite:")
print("- For a file-backed project, open great_expectations/uncommitted/data_docs/local_site/index.html in a browser, or call context.open_data_docs().")
print(f"- Find the suite '{suite_name}' and inspect the expectations for 'Age' and 'Income'.")
print("- Distribution-related expectations include expect_column_values_to_be_between, expect_column_quantile_values_to_be_between, expect_column_mean_to_be_between and expect_column_stdev_to_be_between.")
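To make concrete what profiling 'Age' and 'Income' summarizes, here is a standard-library-only sketch. It uses hypothetical sample rows in place of `customer_data.csv`, computes the count, missing count, min, max, mean and quartiles for each column, and then packs the Age quartiles into the `{"quantiles": ..., "value_ranges": ...}` shape that `expect_column_quantile_values_to_be_between` accepts. The +/-10% tolerance is an arbitrary assumption, and `statistics.quantiles` may interpolate differently from GX's own backend.

```python
import csv
import io
import statistics

# Hypothetical sample rows standing in for customer_data.csv (one Income cell is blank).
SAMPLE_CSV = """CustomerID,Age,Income
1,34,52000
2,45,61000
3,29,
4,52,88000
5,38,47000
6,61,93000
7,25,39000
"""


def profile_numeric(rows, column):
    """Summarize one numeric column, treating blank cells as missing."""
    raw = [row[column] for row in rows]
    values = [float(v) for v in raw if v.strip() != ""]
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(raw),
        "missing": len(raw) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "quartiles": [q1, median, q3],
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
profiles = {col: profile_numeric(rows, col) for col in ("Age", "Income")}

# Quantile bounds in the shape used by expect_column_quantile_values_to_be_between
# (the +/-10% band around each observed quartile is an arbitrary choice).
quantile_ranges = {
    "quantiles": [0.25, 0.5, 0.75],
    "value_ranges": [[q * 0.9, q * 1.1] for q in profiles["Age"]["quartiles"]],
}

for col, stats in profiles.items():
    print(col, stats)
print(quantile_ranges)
```

Widening or narrowing the band trades false alarms against sensitivity to real distribution drift.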

2. Writing Validation Rules for Data Ingestion
### Write validation rules for a CSV file to ensure the 'Date' column follows a specific date format.
- Use `expect_column_values_to_match_regex` to enforce the date format.
- Run the validation and interpret the output.
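Before writing the GX call, it helps to see what a regex date check does and does not catch. This stdlib sketch approximates the core logic of `expect_column_values_to_match_regex`: missing values are skipped, and the check passes when the share of matching non-missing values is at least `mostly`. The ISO `YYYY-MM-DD` target format, the helper names and the sample values are assumptions for illustration. It uses `re.search`, so the pattern is anchored with `^...$` to force a whole-value match.

```python
import re
from datetime import datetime

# Assumed target format: ISO 8601 calendar dates (YYYY-MM-DD).
DATE_REGEX = r"^\d{4}-\d{2}-\d{2}$"


def match_regex_report(values, regex, mostly=1.0):
    """Approximate expect_column_values_to_match_regex on a list of strings.

    None is treated as missing and excluded; success means the share of
    matching non-missing values is at least `mostly`.
    """
    pattern = re.compile(regex)
    nonmissing = [v for v in values if v is not None]
    unexpected = [v for v in nonmissing if not pattern.search(str(v))]
    n = len(nonmissing)
    share_bad = len(unexpected) / n if n else 0.0
    return {
        "success": (1 - share_bad) >= mostly,
        "unexpected_count": len(unexpected),
        "unexpected_percent": 100 * len(unexpected) / n if n else 0.0,
        "partial_unexpected_list": unexpected[:20],
    }


def invalid_dates(values, regex=DATE_REGEX, fmt="%Y-%m-%d"):
    """Values that fail the shape check or are not real calendar dates."""
    pattern = re.compile(regex)
    bad = []
    for v in values:
        if v is None:
            continue
        if not pattern.search(v):
            bad.append(v)
            continue
        try:
            datetime.strptime(v, fmt)  # rejects impossible dates such as Feb 30
        except ValueError:
            bad.append(v)
    return bad


dates = ["2024-01-15", "2024-02-29", "15/03/2024", "2024-3-7", None, "2024-02-30"]
strict = match_regex_report(dates, DATE_REGEX)
lenient = match_regex_report(dates, DATE_REGEX, mostly=0.5)
bad = invalid_dates(dates)
print(strict)
print(lenient["success"])
print(bad)
```

The regex only checks the shape of each value, so `2024-02-30` passes it and is caught only by the `strptime` step. In GX 0.x releases, `expect_column_values_to_match_strftime_format` is the built-in expectation aimed at that calendar-validity case.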

In [None]:
# write your code from here