### Implementing Basic Data Validation with Great Expectations
**Description**: Set up a simple data validation with Great Expectations that checks a dataset for completeness, meaning that no column contains missing (null) values.

**Steps**:
1. Installation
2. Initialize Great Expectations
3. Create a Data Context in Python
4. Create an Expectation Suite
5. Load Sample Data and Validate Completeness
6. Run Validations

In [1]:
pip install great_expectations


Defaulting to user installation because normal site-packages is not writeable

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.
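
Because the Great Expectations API differs considerably between major releases, it can help to confirm which version the install step actually produced before writing validation code. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Great Expectations.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None when the package is missing from this environment.
print("great_expectations:", installed_version("great_expectations"))
```

If this prints `None`, the kernel probably needs a restart (as the pip note suggests) or the package was installed into a different environment.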


In [2]:
import json

import great_expectations as gx
import pandas as pd

# Sample data: a simulated dataset with one missing value in each of
# "age", "cholesterol", and "diagnosis"
df = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "age": [25, 38, None, 45, 60],
    "cholesterol": [180, 190, 210, None, 230],
    "diagnosis": ["diabetes", "none", "hypertension", "diabetes", None]
})

# Steps 2-3: Create an in-memory (ephemeral) Data Context; nothing is written to disk
context = gx.get_context(mode="ephemeral")

# Register a pandas data source, a DataFrame asset, and a batch definition that
# covers the whole DataFrame. This uses the fluent API of Great Expectations 1.x.
# The legacy block-config call context.add_datasource(class_name="Datasource", ...)
# failed here with "DataContextError: Datasource is not a FluentDatasource".
data_source = context.data_sources.add_pandas(name="pandas_datasource")
data_asset = data_source.add_dataframe_asset(name="healthcare_data")
batch_definition = data_asset.add_batch_definition_whole_dataframe("whole_dataframe")

# Step 4: Create the Expectation Suite
suite = context.suites.add(gx.ExpectationSuite(name="completeness_suite"))

# Step 5: Add a completeness (non-null) expectation for every column
for column in ["patient_id", "age", "cholesterol", "diagnosis"]:
    suite.add_expectation(gx.expectations.ExpectColumnValuesToNotBeNull(column=column))

# Step 6: Bind the suite to the data and run the validation
validation_definition = context.validation_definitions.add(
    gx.ValidationDefinition(
        name="completeness_validation", data=batch_definition, suite=suite
    )
)
results = validation_definition.run(batch_parameters={"dataframe": df})

# Display results; the overall run fails because three columns contain a null
print("Validation passed:", results.success)
print(json.dumps(results.to_json_dict(), indent=2, default=str))
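
As a cross-check that does not depend on Great Expectations at all, the same completeness rule can be computed by hand. The sketch below uses plain Python on the notebook's sample data; the function `null_report` and the keys `null_count`, `null_percent`, and `complete` are hypothetical names chosen for illustration, not Great Expectations output fields.

```python
# Same sample data as the notebook, held as plain lists (None marks a missing value).
data = {
    "patient_id": [1, 2, 3, 4, 5],
    "age": [25, 38, None, 45, 60],
    "cholesterol": [180, 190, 210, None, 230],
    "diagnosis": ["diabetes", "none", "hypertension", "diabetes", None],
}


def null_report(columns):
    """Return per-column null counts, null percentages, and a completeness flag."""
    report = {}
    for name, values in columns.items():
        missing = sum(value is None for value in values)
        report[name] = {
            "null_count": missing,
            "null_percent": 100.0 * missing / len(values) if values else 0.0,
            "complete": missing == 0,
        }
    return report


report = null_report(data)
for name, stats in report.items():
    print(name, stats)
```

Only `patient_id` is complete in this data, so a not-null expectation on each of the other three columns should be reported as failed by any correct validation run.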