### GX Core Workflow - Pandas and Ephemeral Data Context
<img src="demo1 hierarchy.png"  width="400" height="400" style="align:right; margin-left:10px;" />

- Import Required Libraries
- Create GX Context (Ephemeral)
- Create Data Source for Pandas DataFrame
- Create Data Asset
- Create Batch Definition
- Create Expectations
- Create Expectation Suite and Add Expectations
- Create Validation Definition (Expectation Suite, Batch Definition)
- Load Dataset into Pandas DataFrame
- Define Batch Parameters
- Run Validation Definition (Batch Parameters)
- Print Result

### Import Libraries

In [1]:
%pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.



In [2]:
%pip install great_expectations

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [3]:
import sys
import great_expectations as gx
import pandas as pd

# great_expectations only supports Python 3.9-3.12
print("System Version -", sys.version)
print("GX Version -", gx.__version__)
print("Pandas Version -", pd.__version__)

System Version - 3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 13:17:27) [MSC v.1929 64 bit (AMD64)]
GX Version - 1.5.1
Pandas Version - 2.1.4
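The version comment in the cell above can be turned into an explicit guard, so an unsupported interpreter is flagged before any GX object is created. This is a minimal stdlib sketch: the 3.9-3.12 range is taken from that comment (check the GX release notes for your version), and `is_supported_python` is a hypothetical helper, not part of GX.

```python
import sys

# Range taken from the notebook's comment; treat it as an assumption.
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 12)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version lies in the supported range."""
    major_minor = (version_info[0], version_info[1])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if not is_supported_python():
    print(f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
          "is outside the 3.9-3.12 range noted for great_expectations.")
```

Comparing `(major, minor)` tuples keeps the check independent of the patch level, so 3.12.7 passes while 3.13.0 does not.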


### Creating GX Context 

In [4]:
context = gx.get_context(mode="ephemeral")


### Creating DataSource

In [5]:
data_source_name = 'temperature_data'
data_source = context.data_sources.add_pandas(name=data_source_name)


### Creating DataAsset


In [6]:
data_asset_name = 'temperature_entity_asset'
data_asset = data_source.add_dataframe_asset(name=data_asset_name)


### Creating BatchDefinition

In [7]:
batch_definition_name = 'full_batch'
batch_definition = data_asset.add_batch_definition_whole_dataframe(batch_definition_name)


### Creating Expectations

In [8]:
# Every Temperature value must lie between 10 and 45 (bounds are inclusive by default)
expectation_temp = gx.expectations.ExpectColumnValuesToBeBetween(
    column="Temperature", min_value=10, max_value=45
)
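To make the temperature check concrete, here is a rough, stdlib-only approximation of what `ExpectColumnValuesToBeBetween` reports. It is a sketch, not GX's implementation: `check_values_between` is a hypothetical helper, and it assumes GX's defaults (inclusive bounds because `strict_min`/`strict_max` are False, missing values excluded from the test, and a single out-of-range value failing the check).

```python
import math


def check_values_between(values, min_value, max_value):
    """Approximate the counts ExpectColumnValuesToBeBetween reports.

    None/NaN entries are counted as missing and not tested against the bounds.
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    present = [v for v in values if not is_missing(v)]
    # Inclusive bounds: 10 and 45 themselves are accepted.
    unexpected = [v for v in present if not (min_value <= v <= max_value)]
    nonmissing = len(present)
    return {
        "element_count": len(values),
        "missing_count": len(values) - nonmissing,
        "unexpected_count": len(unexpected),
        # Percent of the non-missing values that fall outside the range.
        "unexpected_percent": 100.0 * len(unexpected) / nonmissing if nonmissing else 0.0,
        # Any out-of-range value fails the check.
        "success": len(unexpected) == 0,
    }


# The first five temperatures from data_df.head() in this notebook.
print(check_values_between([28, 30, 32, 31, 33], min_value=10, max_value=45))
```

The key names mirror fields in the validation output printed at the end of the notebook. GX itself computes them through its own metrics, so treat this only as a mental model.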


In [9]:
# Every distinct City value must be either "Mumbai" or "Delhi"
expectation_city = gx.expectations.ExpectColumnDistinctValuesToBeInSet(
    column="City", value_set=["Mumbai", "Delhi"]
)
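`ExpectColumnDistinctValuesToBeInSet` is a one-way subset test: every distinct value in the column must belong to `value_set`, but not every member of `value_set` has to appear. The stdlib sketch below illustrates that set logic. `check_distinct_values_in_set` is a hypothetical helper, and skipping `None` is an assumption about how missing values are treated.

```python
def check_distinct_values_in_set(values, value_set):
    """Pass when the column's distinct values are a subset of value_set."""
    # Assumption: missing values (None) are ignored.
    observed = {v for v in values if v is not None}
    unexpected = observed - set(value_set)
    return {
        "observed_values": sorted(observed),
        "unexpected_values": sorted(unexpected),
        "success": not unexpected,
    }


# Passes: "Delhi" is allowed but does not have to appear.
print(check_distinct_values_in_set(["Mumbai", "Mumbai"], ["Mumbai", "Delhi"]))
# Fails: "Pune" is a made-up value outside the allowed set.
print(check_distinct_values_in_set(["Mumbai", "Pune"], ["Mumbai", "Delhi"]))
```

If both cities must actually be present, `ExpectColumnDistinctValuesToContainSet` or `ExpectColumnDistinctValuesToEqualSet` are the stricter GX variants.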


### Creating ExpectationSuite and adding Expectations

In [10]:
expectation_suite_name = "temperature_data_suite"
expectation_suite_ref = gx.ExpectationSuite(name=expectation_suite_name)
expectation_suite = context.suites.add(expectation_suite_ref)


In [11]:
# Add both Expectations to the Expectation Suite
expectation_suite.add_expectation(expectation_temp)
expectation_suite.add_expectation(expectation_city)


ExpectColumnDistinctValuesToBeInSet(id='07f04538-cd43-4faf-b46d-0c45dd9076ca', meta=None, notes=None, result_format=<ResultFormat.BASIC: 'BASIC'>, description=None, catch_exceptions=False, rendered_content=None, windows=None, batch_id=None, column='City', row_condition=None, condition_parser=None, value_set=['Mumbai', 'Delhi'])

### Creating Validation Definition

In [12]:
validation_def_name = "temperature_data_validation"
validation_definition_ref = gx.ValidationDefinition(
    data=batch_definition,
    suite=expectation_suite,
    name=validation_def_name,
)


In [13]:
validation_definition = context.validation_definitions.add(validation_definition_ref)
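The Validation Definition just registered pairs the Batch Definition with the Expectation Suite. The data itself only arrives later, when `run()` is called with Batch Parameters. The toy model below sketches that division of labour in plain Python. `ToySuite`, `ToyValidationDefinition` and the lambda checks are hypothetical stand-ins, **not** the GX API, and a "table" here is just a dict of column lists instead of a DataFrame.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins, NOT the GX API: a check takes a table
# (column name -> list of values) and returns True if it passes.
Check = Callable[[Dict[str, list]], bool]


@dataclass
class ToySuite:
    name: str
    checks: List[Check] = field(default_factory=list)


@dataclass
class ToyValidationDefinition:
    name: str
    suite: ToySuite

    def run(self, batch_parameters: Dict[str, Dict[str, list]]) -> Dict[str, object]:
        # As with a whole-dataframe batch, the data is supplied only at run time.
        table = batch_parameters["dataframe"]
        results = [check(table) for check in self.suite.checks]
        # The overall result succeeds only if every individual check succeeds.
        return {"success": all(results), "results": results}


toy_suite = ToySuite(
    name="temperature_data_suite",
    checks=[
        lambda t: all(10 <= x <= 45 for x in t["Temperature"]),
        lambda t: set(t["City"]) <= {"Mumbai", "Delhi"},
    ],
)
toy_validation = ToyValidationDefinition(name="temperature_data_validation", suite=toy_suite)

sample = {"City": ["Mumbai"] * 5, "Temperature": [28, 30, 32, 31, 33]}
print(toy_validation.run({"dataframe": sample}))
# {'success': True, 'results': [True, True]}
```

Because the data is bound only inside `run()`, the same definition can validate any number of DataFrames. The top-level `"success"` in the real GX output follows the same all-must-pass rule.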


### Reading Data into a Pandas DataFrame

In [14]:
data_df = pd.read_csv('temperature.csv')
data_df.head()


| Unnamed: 0 | Date | City | Temperature |
|---|---|---|---|
| 0 | 8/1/2024 | Mumbai | 28 |
| 1 | 8/2/2024 | Mumbai | 30 |
| 2 | 8/3/2024 | Mumbai | 32 |
| 3 | 8/4/2024 | Mumbai | 31 |
| 4 | 8/5/2024 | Mumbai | 33 |
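The `Unnamed: 0` column usually means the CSV was written together with its row index, for example by `DataFrame.to_csv()` with its default `index=True`. That is an assumption about how `temperature.csv` was produced. Passing `index_col=0` to `pd.read_csv` uses that column as the index instead of loading it as data. The sketch below demonstrates this on a small in-memory CSV. The text is hypothetical and shaped after the rows shown above.

```python
import io

import pandas as pd

# Hypothetical CSV text: the first, header-less column is a saved row index.
csv_text = """,Date,City,Temperature
0,8/1/2024,Mumbai,28
1,8/2/2024,Mumbai,30
"""

# Default read: the header-less first column is named "Unnamed: 0".
default_df = pd.read_csv(io.StringIO(csv_text))
print(list(default_df.columns))  # ['Unnamed: 0', 'Date', 'City', 'Temperature']

# index_col=0 turns that column into the index instead of a data column.
indexed_df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed_df.columns))  # ['Date', 'City', 'Temperature']
```

The extra column does not affect the two Expectations here, because they only inspect `Temperature` and `City`. It would, however, show up in any table-level checks such as column-count Expectations.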


### Creating Batch Parameters and Running the Validation

In [15]:
batch_parameters = {"dataframe": data_df}


In [16]:
validation_result = validation_definition.run(batch_parameters=batch_parameters)



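After `run()` returns, the full JSON report can get long. A small helper can condense it to one row per Expectation. This sketch works on the plain dictionary form of the result, the structure `print(validation_result)` displays, and it assumes that dictionary can be obtained with `validation_result.to_json_dict()` (verify against your GX version). `summarize_results` is a hypothetical helper.

```python
def summarize_results(result_dict):
    """Return one (type, column, success, unexpected_count) row per Expectation."""
    rows = []
    for item in result_dict.get("results", []):
        config = item.get("expectation_config", {})
        kwargs = config.get("kwargs", {})
        rows.append((
            config.get("type"),
            kwargs.get("column"),
            item.get("success"),
            # Set-based Expectations may not report unexpected_count; this yields None then.
            item.get("result", {}).get("unexpected_count"),
        ))
    return rows


# A trimmed copy of the first entry in the printed validation result.
example = {
    "success": True,
    "results": [{
        "success": True,
        "expectation_config": {
            "type": "expect_column_values_to_be_between",
            "kwargs": {"column": "Temperature", "min_value": 10.0, "max_value": 45.0},
        },
        "result": {"element_count": 62, "unexpected_count": 0},
    }],
}
for row in summarize_results(example):
    print(row)
# ('expect_column_values_to_be_between', 'Temperature', True, 0)
```

Using `.get()` throughout keeps the helper tolerant of Expectations whose `result` block has different keys.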
### Displaying Validation Result

In [17]:
print(validation_result)


{
  "success": true,
  "results": [
    {
      "success": true,
      "expectation_config": {
        "type": "expect_column_values_to_be_between",
        "kwargs": {
          "batch_id": "temperature_data-temperature_entity_asset",
          "column": "Temperature",
          "min_value": 10.0,
          "max_value": 45.0
        },
        "meta": {},
        "id": "895df0a0-46dc-418e-92a2-df60c6f9d1be"
      },
      "result": {
        "element_count": 62,
        "unexpected_count": 0,
        "unexpected_percent": 0.0,
        "partial_unexpected_list": [],
        "missing_count": 0,
        "missing_percent": 0.0,
        "unexpected_percent_total": 0.0,
        "unexpected_percent_nonmissing": 0.0,
        "partial_unexpected_counts": [],
        "partial_unexpected_index_list": []
      },
      "meta": {},
      "exception_info": {
        "raised_exception": false,
        "exception_traceback": null,
        "exception_message": null
      }
    },
    {
      "success"