# Validation Playground

**Watch** a [short tutorial video](https://greatexpectations.io/videos/getting_started/integrate_expectations) or **read** [the written tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data)

#### This notebook assumes that you have created at least one expectation suite in your project.
#### Here you will learn how to validate data in a file and in a SQL database table against expectation suites.


We'd love it if you **reach out for help on** the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack)

In [1]:
import json
import great_expectations as ge
import great_expectations.jupyter_ux
from great_expectations.datasource.types import BatchKwargs
from datetime import datetime

2020-02-27T15:11:46-0800 - INFO - Great Expectations logging enabled at INFO level by JupyterUX module.


## 1. Get a DataContext
This represents your **project** that you just created using `great_expectations init`.

In [2]:
context = ge.data_context.DataContext()

## 2. Choose an Expectation Suite


In [3]:
# list expectation suites that you created in your project

for expectation_suite_id in context.list_expectation_suites():
    print(expectation_suite_id.expectation_suite_name)

state_abbreviations_file.critical
count_providers_by_state.critical
npi_small_file.critical
npi_small_db_table.critical


In [10]:
expectation_suite_name_file = "npi_small_file.critical" # TODO: set to a name from the list above

## 3. Load a batch of data you want to validate

To learn more about `get_batch`, see [this tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#load-a-batch-of-data-to-validate)


In [11]:
# list all datasources configured in your project
[datasource['name'] for datasource in context.list_datasources()]

['datawarehouse', 'input_files']

In [12]:
datasource_name_file = "input_files"  # TODO: set to a datasource name from above

In [14]:
batch_kwargs_file = {"path": '/Users/eugenemandel/projects/ge_tutorials/data/npi_small.csv', 'datasource': datasource_name_file}
batch_file = context.get_batch(batch_kwargs_file, expectation_suite_name_file)
batch_file.head()

Unnamed: 0,NPI,Entity_Type_Code,Organization_Name,Last_Name,First_Name,State,Taxonomy_Code
0,1457900839,2.0,TEXAS CLINIC OF CHIROPRACTIC,,,TX,111N00000X
1,1255519047,1.0,,BRYANT-JONES,MARIA,FL,261QH0700X
2,1366091746,1.0,,JONES,EBONY,DC,3747P1801X
3,1275182651,1.0,,ORNELAS,LUPE,CA,101YA0400X
4,1194371344,1.0,,WINTERS,STACY,MD,363L00000X


In [17]:
expectation_suite_name_db = "npi_small_db_table.critical"
datasource_name_db = "datawarehouse"


# If you would like to validate an entire table or view in your database's default schema:
batch_kwargs_db = {'table': "npi_small", 'datasource': datasource_name_db}

# If you would like to validate an entire table or view from a non-default schema in your database:
# batch_kwargs_db = {'table': "YOUR_TABLE", "schema": "YOUR_SCHEMA", 'datasource': datasource_name_db}

# If you would like to validate the result set of a query:
# batch_kwargs_db = {'query': 'SELECT YOUR_ROWS FROM YOUR_TABLE', 'datasource': datasource_name_db}



batch_db = context.get_batch(batch_kwargs_db, expectation_suite_name_db)
batch_db.head()

2020-02-27T15:17:22-0800 - INFO - 	3 expectation(s) included in expectation_suite.


Unnamed: 0,npi,entity_type_code,organization_name,last_name,first_name,state,taxonomy_code
0,1457900839,2.0,TEXAS CLINIC OF CHIROPRACTIC,,,TX,111N00000X
1,1255519047,1.0,,BRYANT-JONES,MARIA,FL,261QH0700X
2,1366091746,1.0,,JONES,EBONY,DC,3747P1801X
3,1275182651,1.0,,ORNELAS,LUPE,CA,101YA0400X
4,1194371344,1.0,,WINTERS,STACY,MD,363L00000X
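
The three database `batch_kwargs` shapes shown in the comments of the cell above (a table in the default schema, a table in a named schema, and a query result) differ only in a few keys. As a rough sketch, they could be built with a small helper. `make_db_batch_kwargs` is a hypothetical name introduced here for illustration and is not part of Great Expectations; the keys are the ones used in this notebook, and the rule that `schema` only goes with `table` is an assumption of this sketch.

```python
def make_db_batch_kwargs(datasource, table=None, schema=None, query=None):
    """Hypothetical helper (not a Great Expectations API): build a batch_kwargs dict."""
    # Exactly one of `table` or `query` selects the data to validate.
    if (table is None) == (query is None):
        raise ValueError("Provide exactly one of 'table' or 'query'.")
    # Assumption of this sketch: a schema is only combined with a table name.
    if query is not None and schema is not None:
        raise ValueError("'schema' is only used together with 'table' in this sketch.")

    batch_kwargs = {"datasource": datasource}
    if query is not None:
        batch_kwargs["query"] = query
    else:
        batch_kwargs["table"] = table
        if schema is not None:
            batch_kwargs["schema"] = schema
    return batch_kwargs


default_schema_kwargs = make_db_batch_kwargs("datawarehouse", table="npi_small")
named_schema_kwargs = make_db_batch_kwargs(
    "datawarehouse", table="YOUR_TABLE", schema="YOUR_SCHEMA"
)
query_kwargs = make_db_batch_kwargs(
    "datawarehouse", query="SELECT YOUR_ROWS FROM YOUR_TABLE"
)
```

Any of these dictionaries would then be passed to `context.get_batch` together with an expectation suite name, as in the cell above.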


## 4. Validate the batch

[Read more about the validate method in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#validate-the-batch)


In [18]:
# validation_result = batch_file.validate()

# if validation_result["success"]:
#     print("This data meets all expectations in {}".format(expectation_suite_name_file))
# else:
#     print("This data does not meet some expectations in {}".format(expectation_suite_name_file))

## 4.a. OPTIONAL: Review the JSON validation results

Don't worry - this blob of JSON is meant for machines. Review it here, or skip this step and see the same results rendered in Data Docs!

In [None]:
#validation_result
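
If you do want to read the JSON programmatically, a short loop over the individual results is usually enough. The sketch below runs on a hand-written sample rather than on real output. The key names (`results`, `expectation_config`, `expectation_type`, `kwargs`) are assumptions about the layout of a validation result, and `summarize_failures` is a hypothetical helper, so check them against the object your version of Great Expectations returns.

```python
# Hand-written sample for illustration only; a real validation result carries more fields.
sample_validation_result = {
    "success": False,
    "results": [
        {
            "success": True,
            "expectation_config": {
                "expectation_type": "expect_column_to_exist",
                "kwargs": {"column": "NPI"},
            },
        },
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_not_be_null",
                "kwargs": {"column": "State"},
            },
        },
    ],
}


def summarize_failures(validation_result):
    """Return (expectation_type, column) pairs for every failed expectation."""
    failures = []
    for result in validation_result["results"]:
        if not result["success"]:
            config = result["expectation_config"]
            # Table-level expectations have no column, so fall back to None.
            failures.append((config["expectation_type"], config["kwargs"].get("column")))
    return failures


failed = summarize_failures(sample_validation_result)
for expectation_type, column in failed:
    print("FAILED: {} (column: {})".format(expectation_type, column))
```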

## 5. Validation Operators

The `validate` method evaluates one batch of data against one expectation suite and returns a dictionary of validation results. This is sufficient when you explore your data and get to know Great Expectations.
When deploying Great Expectations in a **real data pipeline, you will typically discover additional needs**:

* validating a group of batches that are logically related
* validating a batch against several expectation suites such as using a tiered pattern like `warning` and `failure`
* doing something with the validation results (e.g., saving them for a later review, sending notifications in case of failures, etc.).

`Validation Operators` provide a convenient abstraction for bundling both the validation of multiple expectation suites and the actions that should be taken after the validation.

[Read more about Validation Operators in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#save-validation-results)
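
To make the idea concrete, here is a plain-Python sketch of the pattern: validate each asset, then hand every result to a list of actions. It does not use or reproduce the Great Expectations implementation. `run_action_list`, the two actions, and the toy validation function are all hypothetical names invented for this illustration.

```python
def run_action_list(assets_to_validate, validate, actions, run_id):
    """Validate every asset, then pass each result to every action in order."""
    results = {}
    for asset_name, asset in assets_to_validate.items():
        result = validate(asset)
        for action in actions:
            action(run_id, asset_name, result)
        results[asset_name] = result
    return results


stored_results = {}   # stands in for a validation result store
notifications = []    # stands in for a notification channel


def store_result(run_id, asset_name, result):
    """Action: save the result so it can be reviewed later."""
    stored_results[(run_id, asset_name)] = result


def notify_on_failure(run_id, asset_name, result):
    """Action: record a message only when validation failed."""
    if not result["success"]:
        notifications.append("{} failed in run {}".format(asset_name, run_id))


def validate_no_missing_npi(rows):
    """Toy validation: every row must have a non-null NPI."""
    missing = [row for row in rows if row.get("NPI") is None]
    return {"success": not missing, "unexpected_count": len(missing)}


toy_assets = {
    "file_batch": [{"NPI": 1457900839}, {"NPI": 1255519047}],
    "db_batch": [{"NPI": 1366091746}, {"NPI": None}],
}

toy_results = run_action_list(
    toy_assets,
    validate_no_missing_npi,
    actions=[store_result, notify_on_failure],
    run_id="example-run",
)
```

Keeping the actions in a list means that adding a new side effect, such as rebuilding documentation, does not require changing the validation loop.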

In [20]:
# This is an example of invoking a validation operator that is configured by default in the great_expectations.yml file

# Generate a run id: a timestamp or another meaningful string that will help you refer to these validation results later.
# We recommend that run ids be chronologically sortable.
# Let's make a simple sortable timestamp. Note that this could come from your pipeline runner (e.g., Airflow run id).
run_id = datetime.utcnow().isoformat().replace(":", "") + "Z"

results = context.run_validation_operator(
    "action_list_operator", 
    assets_to_validate=[batch_file, batch_db], 
    run_id=run_id)



2020-02-27T15:19:26-0800 - INFO - 	6 expectation(s) included in expectation_suite.
2020-02-27T15:19:27-0800 - INFO - 	3 expectation(s) included in expectation_suite.
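
A note on the "chronologically sortable" recommendation for run ids: `datetime.isoformat()` omits the fractional part when the microsecond value is exactly zero, so the resulting strings can differ in length and, in that rare case, sort out of order. The sketch below compares the notebook's expression with a fixed-width variant that uses `timespec="microseconds"` (available since Python 3.6). The function names are hypothetical and the timestamps are made up for the demonstration.

```python
from datetime import datetime


def notebook_run_id(moment):
    """The expression used in the cell above, applied to a given moment."""
    return moment.isoformat().replace(":", "") + "Z"


def fixed_width_run_id(moment):
    """Variant that always includes microseconds, so every id has the same length."""
    return moment.isoformat(timespec="microseconds").replace(":", "") + "Z"


earlier = datetime(2020, 2, 27, 23, 19, 26)          # microsecond == 0
later = datetime(2020, 2, 27, 23, 19, 26, 500000)    # half a second later

# With the notebook's expression, the earlier id ends in "26Z" and the later one
# continues with "26.500000Z"; "." sorts before "Z", so the order is reversed.
notebook_order_is_chronological = notebook_run_id(earlier) < notebook_run_id(later)

# With fixed-width ids, string order matches chronological order.
fixed_order_is_chronological = fixed_width_run_id(earlier) < fixed_width_run_id(later)

print(notebook_order_is_chronological, fixed_order_is_chronological)
```

In practice the zero-microsecond case is rare, so the notebook's expression is fine for exploration; the fixed-width form simply removes the edge case.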


## 6. View the Validation Results in Data Docs

Let's now build and look at your Data Docs. These will now include a **data quality report** built from the `ValidationResults` you just created, which helps you communicate about your data with both machines and humans.

[Read more about Data Docs in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#view-the-validation-results-in-data-docs)

In [None]:
context.open_data_docs()

## Congratulations! You ran Validations!

## Next steps:

### 1. Read about the typical workflow with Great Expectations:

[typical workflow](https://docs.greatexpectations.io/en/latest/getting_started/typical_workflow.html?utm_source=notebook&utm_medium=validate_data#view-the-validation-results-in-data-docs)

### 2. Explore the documentation & community

You are now among the elite data professionals who know how to build robust descriptions of your data and protections for pipelines and machine learning models. Join the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack) to see how others are wielding these superpowers.