## Automated Data Quality Monitoring
**Objective**: Use Great Expectations to perform data profiling and write validation rules.

1. Data Profiling with Great Expectations

### Profile a JSON dataset with product sales data to check for null values in the 'ProductID' and 'Price' fields.
- Create an expectation suite and connect it to the data context.
- Use the `expect_column_values_to_not_be_null` expectation to profile these fields.
- Review the summary to identify any unexpected null values.

In [None]:
# write your code from here

In [None]:
# Note: this cell relies on the legacy Pandas dataset API (`ge.from_pandas`),
# which is only available in older Great Expectations releases.
import great_expectations as ge
import pandas as pd
import json

# Step 1: Create sample JSON data
json_data = [
    {"ProductID": 101, "Price": 20.5},
    {"ProductID": 102, "Price": 35.0},
    {"ProductID": None, "Price": 15.0},     # Null ProductID
    {"ProductID": 104, "Price": None}       # Null Price
]

# Save as JSON file
json_file = "sales_data.json"
with open(json_file, "w") as f:
    json.dump(json_data, f)

# Load JSON into DataFrame
df = pd.read_json(json_file)

# Convert to GE DataFrame
ge_df = ge.from_pandas(df)

# Step 2: Create expectation suite
suite_name = "product_sales_suite"
context = ge.get_context()
context.create_expectation_suite(suite_name, overwrite_existing=True)

# Step 3: Add expectations to check for nulls
ge_df.expect_column_values_to_not_be_null("ProductID")
ge_df.expect_column_values_to_not_be_null("Price")

# Step 4: Validate and review results
# validate() with no suite argument runs the expectations recorded on ge_df above.
results = ge_df.validate()
print("\n✅ Validation Success:", results["success"])
print("🔍 Null Value Checks Summary:")
for res in results["results"]:
    print(f"- {res['expectation_config']['expectation_type']} on {res['expectation_config']['kwargs']['column']}: {'✅' if res['success'] else '❌'}")
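
As a cross-check on the null profiling above, here is a minimal sketch in plain Python of roughly what a not-null check reports per column. It has no Great Expectations dependency; the helper name `null_report` and its result keys are illustrative assumptions that only loosely mirror the fields in a validation result.

```python
from typing import Any, Dict, Iterable, List


def null_report(records: List[Dict[str, Any]], columns: Iterable[str]) -> Dict[str, Dict[str, Any]]:
    """Count missing (None) values per column in a list of dict records."""
    total = len(records)
    report: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        # An absent key is treated the same as an explicit None.
        missing_rows = [i for i, row in enumerate(records) if row.get(col) is None]
        report[col] = {
            "element_count": total,
            "unexpected_count": len(missing_rows),
            "unexpected_percent": 100.0 * len(missing_rows) / total if total else 0.0,
            "unexpected_rows": missing_rows,  # zero-based row positions
            "success": not missing_rows,
        }
    return report


# Same sample records as the sales data above.
sample = [
    {"ProductID": 101, "Price": 20.5},
    {"ProductID": 102, "Price": 35.0},
    {"ProductID": None, "Price": 15.0},
    {"ProductID": 104, "Price": None},
]

report = null_report(sample, ["ProductID", "Price"])
for col, stats in report.items():
    print(f"{col}: {stats['unexpected_count']} null(s) at rows {stats['unexpected_rows']}")
```

If the counts from this sketch and the Great Expectations summary disagree, the data loading step (for example, how `None` is converted on the way into the DataFrame) is a good first place to look.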


2. Writing Validation Rules for Data Ingestion

### Define validation rules for an API data source to confirm that the 'Status' field contains only the predefined statuses ('Active', 'Inactive').

- Apply `expect_column_values_to_be_in_set` to check field values during data ingestion.
- Execute the validation and review any mismatches.

In [None]:
# write your code from here

In [None]:
# Simulated API response data
api_data = [
    {"UserID": 1, "Status": "Active"},
    {"UserID": 2, "Status": "Inactive"},
    {"UserID": 3, "Status": "Pending"},     # Invalid status
    {"UserID": 4, "Status": "Active"}
]

# Convert API data to DataFrame
api_df = pd.DataFrame(api_data)

# Convert to GE DataFrame
ge_api_df = ge.from_pandas(api_df)

# Step 1: Create validation suite
api_suite_name = "status_validation_suite"
context.create_expectation_suite(api_suite_name, overwrite_existing=True)

# Step 2: Add rule for predefined values
valid_statuses = ["Active", "Inactive"]
ge_api_df.expect_column_values_to_be_in_set("Status", valid_statuses)

# Step 3: Run validation
# result_format="COMPLETE" makes each result include the full `unexpected_list` read below.
api_validation_results = ge_api_df.validate(result_format="COMPLETE")

# Output results
print("\n✅ API Validation Success:", api_validation_results["success"])
print("🔍 Status Value Validation Summary:")
for res in api_validation_results["results"]:
    print(f"- {res['expectation_config']['expectation_type']} on {res['expectation_config']['kwargs']['column']}: {'✅' if res['success'] else '❌'}")
    if not res["success"]:
        unexpected_values = res["result"]["unexpected_list"]
        print(f"  ❌ Unexpected values found: {unexpected_values}")
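
To show how such a rule might gate ingestion rather than only report on it, the following is a minimal sketch in plain Python that splits incoming records into accepted and rejected batches. The helper `partition_by_status` is hypothetical and not part of Great Expectations; rejecting rows with a missing status is a choice made by this sketch.

```python
from typing import Any, Dict, List, Set, Tuple

Record = Dict[str, Any]


def partition_by_status(records: List[Record], allowed: Set[str]) -> Tuple[List[Record], List[Record]]:
    """Split records into (accepted, rejected) based on the 'Status' field."""
    accepted: List[Record] = []
    rejected: List[Record] = []
    for row in records:
        # A missing or None status is not in the allowed set, so it is rejected.
        if row.get("Status") in allowed:
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


ALLOWED_STATUSES = {"Active", "Inactive"}

# Same simulated API payload as above.
api_rows = [
    {"UserID": 1, "Status": "Active"},
    {"UserID": 2, "Status": "Inactive"},
    {"UserID": 3, "Status": "Pending"},
    {"UserID": 4, "Status": "Active"},
]

accepted, rejected = partition_by_status(api_rows, ALLOWED_STATUSES)
print(f"Accepted {len(accepted)} record(s), rejected {len(rejected)}")
for row in rejected:
    print(f"  Rejected UserID {row['UserID']}: Status={row['Status']!r}")
```

Keeping the rejected rows, instead of silently dropping them, leaves a record of mismatches that can be reviewed or replayed once the set of allowed statuses is updated.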
