### Using Great Expectations for Automated Data Checks
**Objective**: Use Great Expectations to perform data validation steps on a dataset.

**Task 1**: Validate Column Existence

**Steps**:
- Load your dataset using a Pandas DataFrame.
- Use Great Expectations to set up an expectation suite.
- Create an expectation to confirm that a specific column (e.g., customer_id) exists in your dataset.
- Run the expectation and observe the results.

In [None]:
# write your code from here
import great_expectations as ge
import pandas as pd
import os

# Sample DataFrame creation
data = {
    "customer_id": [101, 102, 103],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35]
}
df = pd.DataFrame(data)

# Convert pandas DataFrame to a Great Expectations DataFrame
# (ge.from_pandas belongs to the legacy, pre-1.0 Great Expectations API)
ge_df = ge.from_pandas(df)

suite_name = "column_existence_suite"

project_root_dir = os.getcwd()
context = ge.get_context(context_root_dir=project_root_dir)

try:
    context.create_expectation_suite(expectation_suite_name=suite_name, overwrite_existing=True)
except Exception as e:
    print(f"Expectation suite creation error: {e}")

# Add expectation to check that the 'customer_id' column exists
ge_df.expect_column_to_exist("customer_id")

# Save the expectation suite
context.save_expectation_suite(ge_df.get_expectation_suite(), suite_name)

# Run validation manually using the DataFrame and the expectation suite
validation_results = ge_df.validate(expectation_suite=ge_df.get_expectation_suite())

print("Validation results:")
print(f"Success: {validation_results.success}")
for result in validation_results.results:
    if result.expectation_config.expectation_type == "expect_column_to_exist":
        print(f"Expectation: {result.expectation_config.expectation_type}")
        print(f"Column checked: {result.expectation_config.kwargs['column']}")
        print(f"Success: {result.success}")

context.build_data_docs()
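
As a cross-check that does not depend on Great Expectations, the column-existence test can be sketched in plain Python over the same `data` dictionary. The helper name `missing_columns` is hypothetical and is not part of any library; this only illustrates the idea behind the check, not how Great Expectations implements it.

```python
# Plain-Python stand-in for a column-existence check (not Great Expectations code).
data = {
    "customer_id": [101, 102, 103],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
}


def missing_columns(table, required):
    """Return the required column names that are absent from the table, in order."""
    return [name for name in required if name not in table]


missing = missing_columns(data, ["customer_id"])
print(f"Missing columns: {missing}")  # prints: Missing columns: []
print(f"Success: {not missing}")      # prints: Success: True
```

If the plain-Python result and the Great Expectations result disagree, the expectation definition is the first thing to re-check.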


**Task 2**: Validate Column Data Types

**Steps**:
- Using the same dataset setup, create an expectation to check that a numeric column (e.g., purchase_amount) contains only float values.
- Identify a numeric column in your dataset.
- Use Great Expectations to create and validate an expectation that checks the column's data type is correct.
- Run your expectation and check if it passes for your data.

In [None]:
# write your code from here
import great_expectations as ge
import pandas as pd
import os

data = {
    "purchase_amount": [23.5, 45.0, 15.75, 100.0],
    "customer_id": [101, 102, 103, 104]
}
df = pd.DataFrame(data)

ge_df = ge.from_pandas(df)

suite_name = "column_dtype_suite"

project_root_dir = os.getcwd()
context = ge.get_context(context_root_dir=project_root_dir)

try:
    context.create_expectation_suite(expectation_suite_name=suite_name, overwrite_existing=True)
except Exception as e:
    print(f"Expectation suite creation error: {e}")

# Create expectation to check purchase_amount column values are floats
ge_df.expect_column_values_to_be_of_type("purchase_amount", "float")

context.save_expectation_suite(ge_df.get_expectation_suite(), suite_name)

validation_results = ge_df.validate(expectation_suite=ge_df.get_expectation_suite())

print("Validation results:")
print(f"Success: {validation_results.success}")
for result in validation_results.results:
    if result.expectation_config.expectation_type == "expect_column_values_to_be_of_type":
        print(f"Expectation: {result.expectation_config.expectation_type}")
        print(f"Column: {result.expectation_config.kwargs['column']}")
        print(f"Expected type: {result.expectation_config.kwargs['type_']}")
        print(f"Success: {result.success}")

context.build_data_docs()
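
The same type check can be sketched in plain Python for comparison. This sketch inspects each value individually, whereas a check on a pandas column presumably looks at the column's dtype, so treat it as an approximation of the idea rather than the library's behavior. The helper name `non_matching_values` is hypothetical.

```python
# Plain-Python stand-in for a per-value type check (not Great Expectations code).
data = {
    "purchase_amount": [23.5, 45.0, 15.75, 100.0],
    "customer_id": [101, 102, 103, 104],
}


def non_matching_values(values, expected_type):
    """Return the values whose exact Python type differs from expected_type."""
    return [value for value in values if type(value) is not expected_type]


unexpected = non_matching_values(data["purchase_amount"], float)
print(f"Unexpected values: {unexpected}")  # prints: Unexpected values: []
print(f"Success: {not unexpected}")        # prints: Success: True
```

Running the helper on `customer_id` instead returns all four integers, which is what a failing type check would look like.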


**Task 3**: Validate Range of Values

**Steps**:
- Set an expectation using Great Expectations to ensure that the values in a column (e.g., age) are between 18 and 65.
- Identify a column in your dataset where values fall within a specific range.
- Implement a range-based expectation to check this column and validate your dataset.
- Observe and interpret the result of your expectation.

In [None]:
# write your code from here
import great_expectations as ge
import pandas as pd
import os

data = {
    "age": [22, 30, 45, 60, 17, 70],
    "customer_id": [101, 102, 103, 104, 105, 106]
}
df = pd.DataFrame(data)

ge_df = ge.from_pandas(df)

suite_name = "age_range_suite"

project_root_dir = os.getcwd()
context = ge.get_context(context_root_dir=project_root_dir)

try:
    context.create_expectation_suite(expectation_suite_name=suite_name, overwrite_existing=True)
except Exception as e:
    print(f"Expectation suite creation error: {e}")

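# Require every age to lie between 18 and 65 (bounds are inclusive by default).
# The sample data contains 17 and 70, so this expectation is expected to fail.
ge_df.expect_column_values_to_be_between("age", min_value=18, max_value=65)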

context.save_expectation_suite(ge_df.get_expectation_suite(), suite_name)

validation_results = ge_df.validate(expectation_suite=ge_df.get_expectation_suite())

print("Validation results:")
print(f"Success: {validation_results.success}")
for result in validation_results.results:
    if result.expectation_config.expectation_type == "expect_column_values_to_be_between":
        print(f"Expectation: {result.expectation_config.expectation_type}")
        print(f"Column: {result.expectation_config.kwargs['column']}")
        print(f"Range: {result.expectation_config.kwargs['min_value']} - {result.expectation_config.kwargs['max_value']}")
        print(f"Success: {result.success}")

context.build_data_docs()
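
To help interpret the result, the range check can be reproduced in plain Python to list the offending values. The helper name `out_of_range` is hypothetical, and the inclusive bounds are an assumption chosen to match the default behavior of a between-style check.

```python
# Plain-Python stand-in for an inclusive range check (not Great Expectations code).
data = {
    "age": [22, 30, 45, 60, 17, 70],
    "customer_id": [101, 102, 103, 104, 105, 106],
}


def out_of_range(values, min_value, max_value):
    """Return the values that fall outside the inclusive range [min_value, max_value]."""
    return [value for value in values if not (min_value <= value <= max_value)]


unexpected = out_of_range(data["age"], 18, 65)
print(f"Unexpected values: {unexpected}")  # prints: Unexpected values: [17, 70]
print(f"Success: {not unexpected}")        # prints: Success: False
```

Two of the six ages violate the range, so a validation that requires every value to pass should report failure for this dataset.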
