### Using Great Expectations for Automated Data Checks
**Objective**: Use Great Expectations to perform data validation steps on a dataset.

**Task 1**: Validate Column Existence

**Steps**:
- Load your dataset into a pandas DataFrame.
- Use Great Expectations to set up an expectation suite.
- Create an expectation to confirm that a specific column (e.g., customer_id) exists in your dataset.
- Run the expectation and observe the results.
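
The steps above can be sketched in plain pandas to show what a column-existence expectation reports. This is not the Great Expectations API: `check_column_exists` is a hypothetical helper, the result dictionary only loosely imitates a validation result, and the `customers` data is made up for illustration.

```python
import pandas as pd


def check_column_exists(df: pd.DataFrame, column: str) -> dict:
    """Hypothetical helper: report whether `column` is present in `df`."""
    return {
        "expectation_type": "expect_column_to_exist",
        "column": column,
        "success": column in df.columns,
    }


# Made-up sample data, for illustration only
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})

present = check_column_exists(customers, "customer_id")
missing = check_column_exists(customers, "email")

print("customer_id exists:", present["success"])
print("email exists:", missing["success"])
```

In Great Expectations itself, the corresponding expectation is `expect_column_to_exist`; how it is attached to a dataset depends on the installed version.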

In [None]:
# write your code from here

In [1]:
import json

import great_expectations as ge
import pandas as pd

# Note: this cell assumes a pre-1.0 Great Expectations release, where the
# legacy ge.from_pandas API is still available. The unused BaseDataContext
# import was removed because it is not importable in the installed version.

# Step 1: Create sample JSON data
json_data = [
    {"ProductID": 101, "Price": 20.5},
    {"ProductID": 102, "Price": 35.0},
    {"ProductID": None, "Price": 15.0},     # Null ProductID
    {"ProductID": 104, "Price": None}       # Null Price
]

# Save as JSON file
json_file = "sales_data.json"
with open(json_file, "w") as f:
    json.dump(json_data, f)

# Load JSON into DataFrame
df = pd.read_json(json_file)

# Convert to GE DataFrame
ge_df = ge.from_pandas(df)

# Step 2: Create expectation suite
suite_name = "product_sales_suite"
context = ge.get_context()
context.create_expectation_suite(suite_name, overwrite_existing=True)

# Step 3: Add expectations to check for nulls
ge_df.expect_column_values_to_not_be_null("ProductID")
ge_df.expect_column_values_to_not_be_null("Price")

# Step 4: Validate and review results
# validate() re-runs the expectations recorded on ge_df; the legacy method
# does not accept an expectation_suite_name argument.
results = ge_df.validate()
print("\n✅ Validation Success:", results["success"])
print("🔍 Null value Checks Summary:")
for res in results["results"]:
    print(f"- {res['expectation_config']['expectation_type']} on {res['expectation_config']['kwargs']['column']}: {'✅' if res['success'] else '❌'}")

**Task 2**: Validate Column Data Types

**Steps**:
- Using the same dataset setup, create an expectation to check that a numeric column (e.g., purchase_amount) contains only float values.
- Identify a numeric column in your dataset.
- Use Great Expectations to create and validate an expectation that checks the column's data type is correct.
- Run your expectation and check if it passes for your data.
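
A data-type check like the one described above can be sketched in plain pandas. This is an illustration of the idea, not the Great Expectations API: `check_column_is_float` is a hypothetical helper, and the `purchases` data is invented. The sketch inspects the column's dtype rather than each value.

```python
import pandas as pd
from pandas.api.types import is_float_dtype


def check_column_is_float(df: pd.DataFrame, column: str) -> dict:
    """Hypothetical helper: report whether `column` has a float dtype."""
    dtype = df[column].dtype
    return {
        "column": column,
        "observed_dtype": str(dtype),
        "success": is_float_dtype(dtype),
    }


# Made-up sample data, for illustration only
purchases = pd.DataFrame({
    "purchase_amount": [20.5, 35.0, 15.0],   # floats
    "quantity": [1, 2, 3],                   # integers
})

amount_check = check_column_is_float(purchases, "purchase_amount")
quantity_check = check_column_is_float(purchases, "quantity")

print(amount_check)
print(quantity_check)
```

One thing to watch for: pandas stores an integer column that contains missing values as floats by default, so a column such as `ProductID` with a null entry would pass a float check even though its values are meant to be integers.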

In [None]:
# write your code from here

**Task 3**: Validate Range of Values

**Steps**:
- Set an expectation using Great Expectations to ensure that the values in a column (e.g., age) are between 18 and 65.
- Identify a column in your dataset whose values should fall within a specific range.
- Implement a range-based expectation to check this column and validate your dataset.
- Observe and interpret the result of your expectation.
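
The range check described above can be sketched in plain pandas. This is not the Great Expectations API: `check_values_between` is a hypothetical helper, the `ages` data is invented, and both bounds are treated as inclusive, which is an assumption of this sketch. Missing values are skipped rather than counted as failures.

```python
import pandas as pd


def check_values_between(df: pd.DataFrame, column: str, low, high) -> dict:
    """Hypothetical helper: report values of `column` outside [low, high]."""
    values = df[column].dropna()             # ignore missing values
    in_range = values.between(low, high)     # inclusive on both ends
    return {
        "column": column,
        "success": bool(in_range.all()),
        "unexpected_list": values[~in_range].tolist(),
    }


# Made-up sample data, for illustration only
ages = pd.DataFrame({"age": [18, 30, 65, 17, 70]})

age_check = check_values_between(ages, "age", 18, 65)

print("All ages in range:", age_check["success"])
print("Out-of-range values:", age_check["unexpected_list"])
```

A failed check is most useful when it also lists the offending values, which is why the sketch returns them alongside the pass/fail flag.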

In [None]:
# write your code from here

In [2]:
# Imports and context are repeated here so the cell runs on its own,
# even if the previous cell failed or was skipped.
import great_expectations as ge
import pandas as pd

# Simulated API response data
api_data = [
    {"UserID": 1, "Status": "Active"},
    {"UserID": 2, "Status": "Inactive"},
    {"UserID": 3, "Status": "Pending"},     # Invalid status
    {"UserID": 4, "Status": "Active"}
]

# Convert API data to DataFrame
api_df = pd.DataFrame(api_data)

# Convert to GE DataFrame (legacy API, assumes a pre-1.0 release)
ge_api_df = ge.from_pandas(api_df)

# Step 1: Create validation suite
api_suite_name = "status_validation_suite"
context = ge.get_context()
context.create_expectation_suite(api_suite_name, overwrite_existing=True)

# Step 2: Add rule for predefined values
valid_statuses = ["Active", "Inactive"]
ge_api_df.expect_column_values_to_be_in_set("Status", valid_statuses)

# Step 3: Run validation
# validate() re-runs the expectations recorded on ge_api_df; the legacy
# method does not accept an expectation_suite_name argument.
api_validation_results = ge_api_df.validate()

# Output results
print("\n✅ API Validation Success:", api_validation_results["success"])
print("🔍 Status Value Validation Summary:")
for res in api_validation_results["results"]:
    print(f"- {res['expectation_config']['expectation_type']} on {res['expectation_config']['kwargs']['column']}: {'✅' if res['success'] else '❌'}")
    if not res["success"]:
        # The default (BASIC) result format reports a sample of failures
        # under "partial_unexpected_list" rather than "unexpected_list".
        unexpected_values = res["result"].get("partial_unexpected_list", [])
        print(f"  ❌ Unexpected values found: {unexpected_values}")