# Validation Playground

**Watch** a [short tutorial video](https://greatexpectations.io/videos/getting_started/integrate_expectations) or **read** [the written tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data)

#### This notebook assumes that you created at least one expectation suite in your project.
#### Here you will learn how to validate data in a SQL database against an expectation suite.


We'd love it if you **reach out for help on** the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack)

In [1]:
import json
import great_expectations as ge
import great_expectations.jupyter_ux
from great_expectations.datasource.types import BatchKwargs
import datetime

2021-01-05T14:21:14-0600 - INFO - Great Expectations logging enabled at 20 level by JupyterUX module.


## 1. Get a DataContext
This represents your **project** that you just created using `great_expectations init`.

In [2]:
context = ge.data_context.DataContext()

## 2. Choose an Expectation Suite

List expectation suites that you created in your project

In [3]:
context.list_expectation_suite_names()

['orders.count', 'taxi.demo']

In [4]:
expectation_suite_name = "orders.count" # TODO: set to a name from the list above

## 3. Load a batch of data you want to validate

To learn more about `get_batch`, see [this tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#load-a-batch-of-data-to-validate)


In [5]:
# list datasources of the type SqlAlchemyDatasource in your project
# [datasource['name'] for datasource in context.list_datasources() if datasource['class_name'] == 'SqlAlchemyDatasource']
[datasource['name'] for datasource in context.list_datasources()]

['my_snowflake_db']

In [6]:
datasource_name = "my_snowflake_db" # TODO: set to a datasource name from above

In [7]:
# If you would like to validate an entire table or view in your database's default schema:
# batch_kwargs = {'table': "orders", 'datasource': datasource_name}

# If you would like to validate an entire table or view from a non-default schema in your database:
# batch_kwargs = {'table': "yellow_tripdata_staging", "schema": "ge_tutorials", 'datasource': datasource_name}


# Put restrictions in the query: filters and limits go on the query line below

# If you would like to validate the result set of a query:
batch_kwargs = {'query': "SELECT * FROM orders WHERE COUNTRY_CODE = 'CO'", 
                'datasource': datasource_name,
                'include_config': True}

batch = context.get_batch(batch_kwargs, expectation_suite_name)

2021-01-05T14:21:20-0600 - INFO - Creating temporary table ge_tmp_52de3645
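
The three `batch_kwargs` shapes in the cell above (table in the default schema, table in another schema, query result set) could be produced by one small function. This is only a sketch: `build_batch_kwargs` is a hypothetical helper, not part of Great Expectations, and it uses only the keys that already appear in this notebook (`table`, `schema`, `query`, `datasource`). It leaves out `include_config`.

```python
def build_batch_kwargs(datasource, table=None, schema=None, query=None):
    """Hypothetical helper: build a batch_kwargs dict for a SQL datasource.

    Exactly one of `table` or `query` must be given; `schema` only
    makes sense together with `table`.
    """
    if (table is None) == (query is None):
        raise ValueError("Provide exactly one of 'table' or 'query'.")
    if query is not None and schema is not None:
        raise ValueError("'schema' only applies when 'table' is given.")

    batch_kwargs = {"datasource": datasource}
    if query is not None:
        batch_kwargs["query"] = query
    else:
        batch_kwargs["table"] = table
        if schema is not None:
            batch_kwargs["schema"] = schema
    return batch_kwargs


# The same three shapes as the options in the cell above
default_schema = build_batch_kwargs("my_snowflake_db", table="orders")
other_schema = build_batch_kwargs(
    "my_snowflake_db", table="yellow_tripdata_staging", schema="ge_tutorials"
)
filtered = build_batch_kwargs(
    "my_snowflake_db", query="SELECT * FROM orders WHERE COUNTRY_CODE = 'CO'"
)
```

Rejecting ambiguous combinations up front keeps a typo such as passing both `table` and `query` from silently validating the wrong data.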


## 4. Validate the batch with Validation Operators

`Validation Operators` provide a convenient way to bundle the validation of
multiple expectation suites and the actions that should be taken after validation.

When deploying Great Expectations in a **real data pipeline, you will typically discover these needs**:

* validating a group of batches that are logically related
* validating a batch against several expectation suites such as using a tiered pattern like `warning` and `failure`
* doing something with the validation results (e.g., saving them for a later review, sending notifications in case of failures, etc.).

[Read more about Validation Operators in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#save-validation-results)
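
To make the tiered `warning` / `failure` pattern concrete, here is a minimal plain-Python sketch of the decision a pipeline might take once both suites have been validated. `decide_pipeline_action` and the action names `"stop"`, `"notify"` and `"continue"` are assumptions made for illustration; they are not part of Great Expectations. The only thing borrowed from the library is the top-level boolean `success` key that validation results carry.

```python
def decide_pipeline_action(results_by_tier):
    """Hypothetical helper: map per-tier validation outcomes to a pipeline action.

    `results_by_tier` maps a tier name ("failure" or "warning") to a dict
    with a boolean "success" key. A missing tier is treated as passing.
    """
    passed = {"success": True}
    if not results_by_tier.get("failure", passed)["success"]:
        return "stop"      # the critical suite failed: halt the pipeline
    if not results_by_tier.get("warning", passed)["success"]:
        return "notify"    # only the warning suite failed: continue, but alert
    return "continue"      # both tiers passed


# Example: the critical suite passes, the warning suite does not
action = decide_pipeline_action(
    {"failure": {"success": True}, "warning": {"success": False}}
)
```

Checking the `failure` tier first means a critical problem always wins over a warning, whatever order the suites were run in.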

In [8]:
# This is an example of invoking a validation operator that is configured by default in the great_expectations.yml file

"""
Create a run_id. The run_id must be of type RunIdentifier, with optional run_name and run_time instantiation
arguments (or a dictionary with these keys). The run_name can be any string (this could come from your pipeline
runner, e.g. Airflow run id). The run_time can be either a dateutil parsable string or a datetime object.
Note - any provided datetime will be assumed to be a UTC time. If no instantiation arguments are given, run_name will
be None and run_time will default to the current UTC datetime.
"""

run_id = {
  "run_name": "some_string_that_uniquely_identifies_this_run",  # insert your own run_name here
  "run_time": datetime.datetime.now(datetime.timezone.utc)
}

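# Note: run_id is defined above but is not passed to the operator here, so
# Great Expectations generates a run_name from the current UTC timestamp.
# Pass run_id=run_id to use your own identifier instead.
results = context.run_validation_operator(
            "action_list_operator",
            assets_to_validate=[batch]) \
        .list_validation_results()[0] \
        .to_json_dict()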



2021-01-05T14:22:28-0600 - INFO - Setting run_name to: 20210105T202228.472302Z
2021-01-05T14:22:28-0600 - INFO - 	1 expectation(s) included in expectation_suite.


In [9]:
results

{'results': [{'success': False,
   'exception_info': {'raised_exception': False,
    'exception_message': None,
    'exception_traceback': None},
   'result': {'observed_value': {'self': 126740295, 'other': 357158048}},
   'meta': {},
   'expectation_config': {'expectation_type': 'expect_table_row_count_to_equal_other_table',
    'meta': {},
    'kwargs': {'other_table_name': 'orders',
     'result_format': {'result_format': 'SUMMARY'}}}}],
 'success': False,
 'meta': {'great_expectations_version': '0.13.4',
  'expectation_suite_name': 'orders.count',
  'run_id': {'run_time': '2021-01-05T20:22:28.472302+00:00',
   'run_name': '20210105T202228.472302Z'},
  'batch_kwargs': {'query': "SELECT * FROM orders WHERE COUNTRY_CODE = 'CO'",
   'datasource': 'my_snowflake_db',
   'include_config': True},
  'batch_markers': {'ge_load_time': '20210105T202120.261752Z'},
  'batch_parameters': None,
  'validation_time': '20210105T202228.473707Z'},
 'statistics': {'evaluated_expectations': 1,
  'success

**Fields to extract**

- Context
  - DT (Timestamp)
  - Table name
  - Country Code (String)
- Expectations result
  - success (boolean)
  - name (string)
  - results (json)

In [10]:
# Expectations result
success = results['success']
# Take the expectation type from the first (and here only) entry in results['results']
name_expectation = results['results'][0]['expectation_config']['expectation_type']
result_json = results
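
The fields listed under **Fields to extract** can all be read from a validation result dict shaped like the one printed in cell 9. The sketch below is one possible way to do it, under several assumptions: `extract_fields` is a hypothetical helper, the timestamp is taken from `meta.run_id.run_time`, and the table name and country code are recovered from the SQL text with simple regular expressions that only handle queries of the form used in this notebook. The `sample` dict is a trimmed copy of the output shown above.

```python
import json
import re


def extract_fields(validation_result):
    """Hypothetical helper: flatten one validation result into the listed fields."""
    meta = validation_result["meta"]
    batch_kwargs = meta.get("batch_kwargs") or {}
    query = batch_kwargs.get("query", "")

    # Assumption: use the 'table' key if present, else the first name after FROM
    table_name = batch_kwargs.get("table")
    if table_name is None:
        match = re.search(r"\bFROM\s+([\w.]+)", query, flags=re.IGNORECASE)
        table_name = match.group(1) if match else None

    # Assumption: the country filter is written as COUNTRY_CODE = '<code>'
    match = re.search(r"COUNTRY_CODE\s*=\s*'([^']+)'", query, flags=re.IGNORECASE)
    country_code = match.group(1) if match else None

    first_result = validation_result["results"][0]
    return {
        "dt": meta["run_id"]["run_time"],
        "table_name": table_name,
        "country_code": country_code,
        "success": validation_result["success"],
        "name": first_result["expectation_config"]["expectation_type"],
        "results": json.dumps(validation_result["results"]),
    }


# Trimmed copy of the validation result printed in cell 9
sample = {
    "results": [{
        "success": False,
        "result": {"observed_value": {"self": 126740295, "other": 357158048}},
        "expectation_config": {
            "expectation_type": "expect_table_row_count_to_equal_other_table",
            "kwargs": {"other_table_name": "orders"},
        },
    }],
    "success": False,
    "meta": {
        "expectation_suite_name": "orders.count",
        "run_id": {"run_time": "2021-01-05T20:22:28.472302+00:00",
                   "run_name": "20210105T202228.472302Z"},
        "batch_kwargs": {"query": "SELECT * FROM orders WHERE COUNTRY_CODE = 'CO'",
                         "datasource": "my_snowflake_db"},
    },
}

fields = extract_fields(sample)
```

Parsing SQL with regular expressions is fragile; if the table and country code are already known where the query is built, passing them along explicitly would be more robust than recovering them from the query string.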

### Results

In [None]:
results['results']

In [None]:
results['success']

## 5. View the Validation Results in Data Docs

Let's now build and look at your Data Docs. These will now include a **data quality report** built from the `ValidationResults` you just created, which helps you communicate about your data with both machines and humans.

[Read more about Data Docs in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#view-the-validation-results-in-data-docs)

In [None]:
context.open_data_docs()

## Congratulations! You ran Validations!

## Next steps:

### 1. Read about the typical workflow with Great Expectations:

[typical workflow](https://docs.greatexpectations.io/en/latest/getting_started/typical_workflow.html?utm_source=notebook&utm_medium=validate_data#view-the-validation-results-in-data-docs)

### 2. Explore the documentation & community

You are now among the elite data professionals who know how to build robust descriptions of your data and protections for pipelines and machine learning models. Join the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack) to see how others are wielding these superpowers.