### Using Great Expectations for Automated Data Checks
**Objective**: Use Great Expectations to perform data validation steps on a dataset.

**Task 1**: Validate Column Existence

**Steps**:
- Load your dataset using a Pandas DataFrame.
- Use Great Expectations to set up an expectation suite.
- Create an expectation to confirm that a specific column (e.g., `customer_id`) exists in your dataset.
- Run the expectation and observe the results.
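
As a reference point for the steps above, here is a minimal plain-Python sketch of what a column-existence check amounts to. It is not Great Expectations code: `column_exists` is a hypothetical helper, and the sample columns are illustrative.

```python
# Plain-Python illustration of a column-existence check.
# `column_exists` is a hypothetical helper, not a Great Expectations call.
data = {
    "customer_id": [1, 2, 3, 4, 5],
    "purchase_amount": [10.5, 20.3, 5.0, 15.7, 22.1],
}


def column_exists(table, column):
    """Return True when `column` is one of the table's column names."""
    return column in table


print(column_exists(data, "customer_id"))  # True
print(column_exists(data, "email"))        # False
```

The expectation you create should report success in the same cases where this helper returns `True`.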

In [4]:
# write your code from here

**Task 2**: Validate Column Data Types

**Steps**:
- Using the same dataset and setup as Task 1, check that a numeric column (e.g., `purchase_amount`) contains only float values.
- Identify a numeric column in your dataset.
- Use Great Expectations to create and validate an expectation that checks the column's data type.
- Run your expectation and check if it passes for your data.
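
The type check described above can be approximated in plain Python, which makes it easier to predict what the expectation should report. This is only a sketch: `check_column_type` is a hypothetical helper, the result keys are loosely modeled on a validation result, and with a pandas backend the library typically inspects the column's dtype rather than each Python object.

```python
# Plain-Python approximation of a column type check.
# `check_column_type` is a hypothetical helper, not a Great Expectations call.
def check_column_type(table, column, expected_type):
    """Collect the values in `column` that are not instances of `expected_type`."""
    unexpected = [v for v in table[column] if not isinstance(v, expected_type)]
    return {"success": not unexpected, "unexpected_list": unexpected}


clean = {"purchase_amount": [10.5, 20.3, 5.0, 15.7, 22.1]}
mixed = {"purchase_amount": [10.5, "n/a", 7]}

clean_result = check_column_type(clean, "purchase_amount", float)
mixed_result = check_column_type(mixed, "purchase_amount", float)

print(clean_result["success"])          # True
print(mixed_result["unexpected_list"])  # ['n/a', 7]
```

Note that the integer `7` counts as unexpected here, since a strict float check does not accept integers.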

In [5]:
# write your code from here

**Task 3**: Validate Range of Values

**Steps**:
- Set an expectation using Great Expectations to ensure that the values in a column (e.g., `age`) are between 18 and 65.
- Identify a column in your dataset where values fall within a specific range.
- Implement a range-based expectation to check this column and validate your dataset.
- Observe and interpret the result of your expectation.
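
To help interpret the result, the range check can be sketched in plain Python. `check_values_between` is a hypothetical helper rather than a Great Expectations call, and it assumes both bounds are inclusive, so a value of exactly 18 or 65 passes.

```python
# Plain-Python sketch of a range check with inclusive bounds (an assumption).
# `check_values_between` is a hypothetical helper, not a Great Expectations call.
def check_values_between(values, min_value, max_value):
    """Report which values fall outside the closed interval [min_value, max_value]."""
    unexpected = [v for v in values if not (min_value <= v <= max_value)]
    return {
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_list": unexpected,
    }


ages = [25, 30, 18, 45, 60]
result_ok = check_values_between(ages, 18, 65)
result_bad = check_values_between(ages + [17, 70], 18, 65)

print(result_ok["success"])           # True
print(result_bad["unexpected_list"])  # [17, 70]
```

A failing result is most useful when it lists the offending values, which is why the sketch returns them alongside the success flag.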

In [6]:
import pandas as pd
import great_expectations as ge

# Load your dataset using Pandas
data = {'customer_id': [1, 2, 3, 4, 5],
        'purchase_amount': [10.5, 20.3, 5.0, 15.7, 22.1],
        'age': [25, 30, 18, 45, 60],
        'product_type': ['A', 'B', 'A', 'C', 'B']}
df = pd.DataFrame(data)

# Create a Great Expectations context (in-memory when no project is configured)
context = ge.get_context()

# NOTE: the calls below assume the fluent datasource API of Great Expectations
# 0.17/0.18. The context has no add_pandas method of its own; datasources are
# registered through context.sources. Other releases use different entry points.
datasource_name = "my_pandas_datasource"
datasource = context.sources.add_or_update_pandas(name=datasource_name)

# Register the DataFrame as an asset and build a batch request for it
data_asset = datasource.add_dataframe_asset(name="my_data_asset")
batch_request = data_asset.build_batch_request(dataframe=df)

# Create the expectation suite, or update it if it already exists
expectation_suite_name = "my_expectation_suite"
context.add_or_update_expectation_suite(expectation_suite_name=expectation_suite_name)

# The validator ties the batch of data to the expectation suite
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=expectation_suite_name,
)
print(f"Validator ready for suite: {expectation_suite_name}")

print("\n--- Task 1: Validate Column Existence ---")
column_to_check = "customer_id"
expectation_result_column_exists = validator.expect_column_to_exist(column=column_to_check)
print(f"Expectation to check if column '{column_to_check}' exists: {expectation_result_column_exists.success}")

print("\n--- Task 2: Validate Column Data Types ---")
column_to_check_type = "purchase_amount"
expected_data_type = "float"
expectation_result_column_type = validator.expect_column_values_to_be_of_type(
    column=column_to_check_type, type_=expected_data_type
)
print(f"Expectation to check if column '{column_to_check_type}' is of type '{expected_data_type}': {expectation_result_column_type.success}")

print("\n--- Task 3: Validate Range of Values ---")
column_to_check_range = "age"
min_value = 18
max_value = 65
expectation_result_column_range = validator.expect_column_values_to_be_between(
    column=column_to_check_range, min_value=min_value, max_value=max_value
)
print(f"Expectation to check if values in column '{column_to_check_range}' are between {min_value} and {max_value}: {expectation_result_column_range.success}")

# Save the expectations (optional, but recommended for persistent validation)
validator.save_expectation_suite()