## Automate Data Quality Checks with Great Expectations
**Introduction**: In this activity, you will learn how to automate data quality checks using the Great Expectations framework. This includes setting up expectations, validating a dataset and generating validation reports, and scheduling recurring checks.

### Task 1: Setup and Initial Expectations

1. Objective: Set up Great Expectations and create initial expectations for a dataset.
2. Steps:
    - Install Great Expectations using pip.
    - Initialize a data context.
    - Create basic expectations on a sample dataset.
    - E.g., implement a basic setup with expectations for column presence and type.

In [3]:
%pip install great_expectations

Note: you may need to restart the kernel to use updated packages.


In [4]:
import pandas as pd
import great_expectations as gx
import io

# Our sample CSV data
csv_data = """Name,Email,Age
Alice,alice@example.com,30
Bob,bob@example.com,25
Charlie,charlie@example.com,
David,,40
Eve,eve.example.com,35
"""

# Load the data into a pandas DataFrame
df = pd.read_csv(io.StringIO(csv_data))

# Get the Great Expectations Data Context.
# With no project on disk, this returns an in-memory (ephemeral) context.
context = gx.get_context()

# Register a pandas Data Source, a DataFrame asset, and a whole-DataFrame batch definition.
# Assumes Great Expectations 1.x, where the fluent API lives under `context.data_sources`.
# Calling `context.add_pandas(...)` directly raises:
#   AttributeError: 'EphemeralDataContext' object has no attribute 'add_pandas'
datasource_name = "my_pandas_datasource"
data_source = context.data_sources.add_pandas(name=datasource_name)
data_asset = data_source.add_dataframe_asset(name="my_dataframe_asset")
batch_definition = data_asset.add_batch_definition_whole_dataframe("my_data_batch")

# Create a Batch of data from the in-memory DataFrame
batch = batch_definition.get_batch(batch_parameters={"dataframe": df})

# Create an Expectation Suite (a collection of expectations) and register it with the context
expectation_suite_name = "my_initial_expectation_suite"
suite = context.suites.add(gx.ExpectationSuite(name=expectation_suite_name))

# Add expectations to the suite

# Expect the 'Name', 'Email', and 'Age' columns to be present
for column_name in ["Name", "Email", "Age"]:
    suite.add_expectation(gx.expectations.ExpectColumnToExist(column=column_name))

# Expect the 'Age' column to be numeric. The missing value in Charlie's row makes
# pandas load the column as float64, so both integer and float dtypes are accepted.
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeInTypeList(
        column="Age", type_list=["int64", "float64"]
    )
)

# Save the Expectation Suite
suite.save()

print(f"Expectation suite '{expectation_suite_name}' created and saved.")

### Task 2: Validate Datasets and Generate Reports

1. Objective: Validate a dataset against defined expectations and generate a report.
2. Steps:
    - Execute the validation process on the dataset.
    - Review the validation results and generate a report.
    - E.g., validate completeness and consistency expectations, and view the results.


In [None]:
# Write your code from here

### Task 3: Advanced Expectations and Scheduling

1. Objective: Create advanced expectations for conditional checks and automate the validation.
2. Steps:
    - Define advanced expectations based on complex conditions.
    - Use scheduling tools to automate periodic checks.
    - E.g., define an expectation that customer IDs must be unique, and schedule a daily check.

In [None]:
# Write your code from here