### Using Great Expectations for Automated Data Checks
**Objective**: Use Great Expectations to perform data validation steps on a dataset.

**Task 1**: Validate Column Existence

**Steps**:
- Load your dataset using a Pandas DataFrame.
- Use Great Expectations to set up an expectation suite.
- Create an expectation to confirm that a specific column (e.g., `customer_id`) exists in your dataset.
- Run the expectation and observe the results.

In [2]:
import pandas as pd
import great_expectations as gx

def load_data():
    """Safely load data and handle missing or empty cases."""
    try:
        # Sample dataset
        data = {
            'customer_id': [101, 102, 103],
            'name': ['Alice', 'Bob', 'Charlie'],
            'age': [25, 30, 35]
        }
        df = pd.DataFrame(data)
        if df.empty:
            raise ValueError("DataFrame is empty.")
        return df
    except Exception as e:
        print(f"Error loading data: {e}")
        return None

def validate_column_existence(df, column_name):
    """Run a one-expectation suite that checks the given column exists."""
    # Assumes GX Core 1.x, where ge.from_pandas() is unavailable and a
    # DataFrame is wrapped in a batch through a data context instead.
    # A fresh ephemeral context per call avoids name clashes on repeated runs.
    context = gx.get_context(mode="ephemeral")
    data_source = context.data_sources.add_pandas("pandas_source")
    data_asset = data_source.add_dataframe_asset(name="customers")
    batch_definition = data_asset.add_batch_definition_whole_dataframe("whole_dataframe")
    batch = batch_definition.get_batch(batch_parameters={"dataframe": df})

    # Set up the expectation suite and add the column-existence expectation.
    suite = context.suites.add(gx.ExpectationSuite(name="column_validation_suite"))
    suite.add_expectation(gx.expectations.ExpectColumnToExist(column=column_name))

    # A missing column is reported as success=False rather than raised.
    return batch.validate(suite)

def run_validation_pipeline():
    df = load_data()
    if df is not None:
        try:
            result = validate_column_existence(df, 'customer_id')
            print("Validation Results:")
            print(result)
        except Exception as e:
            print(f"Validation failed: {e}")
    else:
        print("No data to validate.")

def test_validation():
    """Basic unit test simulation."""
    df_good = load_data()
    assert df_good is not None, "Data should load properly."

    # Valid column: the suite should pass
    try:
        result = validate_column_existence(df_good, 'customer_id')
        assert result.success, "Column should exist and validation should pass."
        print("Test passed: customer_id column exists.")
    except Exception as e:
        print(f"Test failed: {e}")

    # Invalid column: the suite should fail
    try:
        result = validate_column_existence(df_good, 'nonexistent_column')
        assert not result.success, "Nonexistent column should fail validation."
        print("Test passed: missing column reported as a failed expectation.")
    except Exception as e:
        print(f"Test failed: {e}")

# Run the validation and tests
run_validation_pipeline()
test_validation()


**Task 2**: Validate Column Data Types

**Steps**:
- Using the same dataset setup, create an expectation to check that a numeric column (e.g., `purchase_amount`) contains only float values.
- Identify a numeric column in your dataset.
- Use Great Expectations to create and validate an expectation that checks the column's data type is correct.
- Run your expectation and check if it passes for your data.
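
Before writing the expectation, it can help to know which answer to expect. The following is a minimal pandas-only sketch of the "identify a numeric column" step. It is not the Great Expectations solution itself. The `purchase_amount` values and the helper names `float_columns` and `column_is_float` are assumptions made for illustration, since the sample dataset above has no such column.

```python
import pandas as pd

# Hypothetical data: purchase_amount is added here for illustration only.
df = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "age": [25, 30, 35],
    "purchase_amount": [19.99, 5.50, 120.00],
})

def float_columns(frame):
    """Return the names of columns stored with a floating-point dtype."""
    return [name for name, dtype in frame.dtypes.items()
            if pd.api.types.is_float_dtype(dtype)]

def column_is_float(frame, column):
    """True when the column exists and its dtype is a float type."""
    return column in frame.columns and pd.api.types.is_float_dtype(frame[column])

print(float_columns(df))                       # ['purchase_amount']
print(column_is_float(df, "purchase_amount"))  # True
print(column_is_float(df, "age"))              # False (stored as int64)
```

A type expectation written in Great Expectations should agree with these answers. In GX Core 1.x the corresponding class is `gx.expectations.ExpectColumnValuesToBeOfType`; check the documentation for your installed version before relying on a particular `type_` string.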

In [3]:
# write your code from here

**Task 3**: Validate Range of Values

**Steps**:
- Set an expectation using Great Expectations to ensure that the values in a column (e.g., `age`) are between 18 and 65.
- Identify a column in your dataset where values fall within a specific range.
- Implement a range-based expectation to check this column and validate your dataset.
- Observe and interpret the result of your expectation.
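
To interpret the result of a range expectation, it is useful to see which rows would violate it. The sketch below does this with plain pandas and is only a cross-check, not the Great Expectations implementation. The helper name `out_of_range` and the `df_bad` frame are hypothetical, and skipping missing values is an assumption of this sketch.

```python
import pandas as pd

# Sample ages from Task 1, plus a hypothetical frame with two violations.
df = pd.DataFrame({"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]})
df_bad = pd.DataFrame({"name": ["Dan", "Eve", "Frank"], "age": [17, 25, 70]})

def out_of_range(frame, column, min_value, max_value):
    """Return rows whose non-missing value lies outside [min_value, max_value]."""
    values = frame[column]
    # between() is inclusive on both ends; missing values are skipped here.
    violations = values.notna() & ~values.between(min_value, max_value)
    return frame[violations]

print(out_of_range(df, "age", 18, 65).empty)            # True: all ages in range
print(out_of_range(df_bad, "age", 18, 65)["age"].tolist())  # [17, 70]
```

A range expectation over the same column and bounds should pass for `df` and fail for `df_bad`, listing 17 and 70 among its unexpected values.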

In [4]:
# write your code from here