# Cookbook 3: Validate data with GX Core and GX Cloud

This cookbook showcases a data validation workflow characteristic of vetting existing data in an organization's data stores. It could be representative of two groups within an organization enforcing a publisher-subscriber data contract, or users ensuring that data meets the requirements for its intended use.

[Cookbook 1](Cookbook_1_Validate_data_during_ingestion_happy_path.ipynb) and [Cookbook 2](Cookbook_2_Validate_data_during_ingestion_take_action_on_failures.ipynb) explored GX Core workflows that were run within a data pipeline, orchestrated by Airflow. This cookbook introduces [GX Cloud](https://greatexpectations.io/gx-cloud) as an additional tool to store and visualize data validation results and features a hybrid workflow using GX Core, GX Cloud, and Airflow.

This cookbook builds on [Cookbook 1: Validate data during ingestion (happy path)](Cookbook_1_Validate_data_during_ingestion_happy_path.ipynb) and [Cookbook 2: Validate data during ingestion (take action on failures)](Cookbook_2_Validate_data_during_ingestion_take_action_on_failures.ipynb) and focuses on how data validation failures can be programmatically handled in the pipeline based on GX Validation Results. It assumes basic familiarity with GX Core workflows; for a step-by-step explanation of the GX data validation workflow, refer to those two cookbooks.

## Imports

The GX Core content of this cookbook uses the `great_expectations` library.

The `tutorial_code` module contains helper functions used within this notebook and the associated Airflow pipeline.

The `airflow_dags` submodule is included so that you can inspect the code used in the related Airflow DAG directly from this notebook.

In [None]:
import os

import great_expectations as gx
import great_expectations.expectations as gxe
import IPython
import pandas as pd

import tutorial_code as tutorial
import airflow_dags.cookbook2_validate_and_handle_invalid_data as dag

## The GX data quality platform

The Great Expectations data quality platform consists of:
* [GX Cloud](https://greatexpectations.io/gx-cloud), a fully managed SaaS solution with a web portal, and
* [GX Core](https://github.com/great-expectations/great_expectations), the open source Python framework.

GX Cloud and GX Core can each be used on its own, for a cloud-only or a programmatic-only approach ([Cookbook 1](Cookbook_1_Validate_data_during_ingestion_happy_path.ipynb) and [Cookbook 2](Cookbook_2_Validate_data_during_ingestion_take_action_on_failures.ipynb) are examples of Core-only workflows). Used *together*, GX Cloud serves as a single source of truth for defining and applying data quality, while GX Core enables flexible integration of data validation into existing data stacks. The combination lets you define, monitor, and manage data quality using UI-based workflows, programmatic workflows, or hybrid workflows.

The diagram below depicts some, but not all, of the ways you might opt to use the platform:

In [None]:
IPython.display.Image("img/diagrams/gx_cloud_core_architecture.png", width=900)

## Cookbook workflow

In this cookbook, you will use GX Core, GX Cloud, and Airflow to define data quality for sample data, run data validation, and explore the results of data validation.
1. Define your Data Asset and Expectations programmatically with GX Core
2. Store the GX workflow configuration in your GX Cloud organization
3. Trigger data validation from an Airflow pipeline
4. Explore data validation results in GX Cloud


![Cookbook 3 workflow](https://placehold.co/600x300 "Cookbook 3 workflow")

## Verify GX Cloud credentials are defined

This cookbook persists Validation Results to GX Cloud and requires valid credentials for a GX Cloud organization.

In [None]:
tutorial.cloud.check_for_gx_cloud_credentials_exist()

```{warning} GX Cloud credential error
If `tutorial.cloud.check_for_gx_cloud_credentials_exist()` raises a `ValueError` indicating that `GX_CLOUD_ORGANIZATION_ID` or `GX_CLOUD_ACCESS_TOKEN` is undefined, ensure that you provided your GX Cloud organization ID and access token when starting Docker Compose.
```
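For readers who want to see what such a check involves, here is a minimal sketch that uses only the standard library. It is not the implementation in `tutorial_code`; the function names `missing_gx_cloud_credentials` and `require_gx_cloud_credentials` are hypothetical and chosen for illustration. The sketch assumes that a credential counts as undefined when its environment variable is unset or empty.

```python
import os

# Environment variables named in the warning above.
REQUIRED_VARS = ("GX_CLOUD_ORGANIZATION_ID", "GX_CLOUD_ACCESS_TOKEN")


def missing_gx_cloud_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper: `env` defaults to the process environment, but any
    mapping can be passed in, which makes the check easy to try out.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def require_gx_cloud_credentials(env=None):
    """Raise ValueError naming each undefined credential (hypothetical helper)."""
    missing = missing_gx_cloud_credentials(env)
    if missing:
        raise ValueError(
            "Undefined GX Cloud credentials: " + ", ".join(missing)
        )
```

Calling `require_gx_cloud_credentials()` with no arguments inspects the current process environment, so it can be run in a notebook cell to see which variable, if any, was not passed through from Docker Compose.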

## Examine source data

In [None]:
pd.read_sql_query(
    "select * from customer_profile limit 5", con=tutorial.db.get_cloud_postgres_engine()
)

## Determine Expectations/data tests

## GX validation workflow

### Ephemeral and Cloud Data Contexts

All GX workflows start with the creation of a Data Context. A Data Context is the Python object that serves as the entry point to the GX Core Python library; it also manages the settings and metadata for your GX workflow.

* An **Ephemeral Data Context** stores the configuration of your GX workflow in memory.

  ```
  context = gx.get_context(mode="ephemeral")
  ```

* A **Cloud Data Context** stores the configuration of your GX workflow in GX Cloud. When creating a Cloud Data Context, you need to provide credentials for the specific GX Cloud organization that you want to use.

  ```
  context = gx.get_context(
      mode="cloud",
      cloud_organization_id="<my-gx-cloud-org-id>",
      cloud_access_token="<my-gx-cloud-access-token>"
  )
  ```

The `gx.get_context()` call auto-discovers your GX Cloud organization ID and access token if they are available as the `GX_CLOUD_ORGANIZATION_ID` and `GX_CLOUD_ACCESS_TOKEN` environment variables, respectively.

In [None]:
context = gx.get_context()

if (os.getenv("GX_CLOUD_ORGANIZATION_ID", None) is not None) and (os.getenv("GX_CLOUD_ACCESS_TOKEN", None) is not None):
    assert isinstance(context, gx.data_context.CloudDataContext)
    print("GX Cloud credentials found, created CloudDataContext.")