# Validation Playground

**Watch** a [short tutorial video](https://greatexpectations.io/videos/getting_started/integrate_expectations) or **read** [the written tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data).

#### This notebook assumes that you created at least one expectation suite in your project.
#### Here you will learn how to validate data loaded into a Pandas DataFrame against an expectation suite.


We'd love it if you **reach out for help on** the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack)

In [37]:
import json
import great_expectations as ge
import great_expectations.jupyter_ux
from great_expectations.datasource.types import BatchKwargs
from datetime import datetime

## 1. Get a DataContext
This represents your **project** that you just created using `great_expectations init`.

In [38]:
context = ge.data_context.DataContext()

## 2. Choose an Expectation Suite


In [39]:
# list expectation suites that you created in your project

for expectation_suite_id in context.list_expectation_suites():
    print(expectation_suite_id.expectation_suite_name)

name
npi
.ipynb_checkpoints.name-checkpoint
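The listing above includes `.ipynb_checkpoints.name-checkpoint`, which looks like a Jupyter checkpoint copy of the `name` suite sitting in the expectations directory rather than a suite you authored. The `FileNotFoundError` in section 5 refers to a path derived from that name. A minimal sketch for hiding such entries, assuming checkpoint copies always start with `.ipynb_checkpoints`; `user_suite_names` is a hypothetical helper, not part of Great Expectations:

```python
# Hypothetical helper (not a Great Expectations API): skip suite names that
# appear to come from Jupyter's .ipynb_checkpoints folder.
def user_suite_names(names):
    """Return the suite names that do not start with '.ipynb_checkpoints'."""
    return [name for name in names if not name.startswith(".ipynb_checkpoints")]


# The names printed above, copied as plain strings for illustration.
listed = ["name", "npi", ".ipynb_checkpoints.name-checkpoint"]
print(user_suite_names(listed))
```

In the notebook you would pass it `[s.expectation_suite_name for s in context.list_expectation_suites()]`, the same attribute the loop above prints.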


In [63]:
expectation_suite_name = 'name' # TODO: set to a name from the list above

## 3. Load a batch of data you want to validate

To learn more about `get_batch`, see [this tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#load-a-batch-of-data-to-validate).


In [64]:
# list datasources of the type PandasDatasource in your project
[datasource['name'] for datasource in context.list_datasources() if datasource['class_name'] == 'PandasDatasource']

['files_datasource']

In [65]:
datasource_name = 'files_datasource' # TODO: set to a datasource name from above

In [69]:
# To validate a file on the filesystem:
# batch_kwargs = {'path': "YOUR_FILE_PATH", 'datasource': datasource_name}

# If you have already loaded the data into a Pandas DataFrame:
import pandas as pd
df = pd.read_csv('../../../npidata_pfile_20190902-20190908.csv', low_memory=False)
batch_kwargs = {'dataset': df, 'datasource': datasource_name}


batch = context.get_batch(batch_kwargs, expectation_suite_name)
batch.head()

| Unnamed: 0 | NPI | Entity Type Code | Replacement NPI | Employer Identification Number (EIN) | Provider Organization Name (Legal Business Name) | Provider Last Name (Legal Name) | Provider First Name | Provider Middle Name | Provider Name Prefix Text | Provider Name Suffix Text | ... | Healthcare Provider Taxonomy Group_6 | Healthcare Provider Taxonomy Group_7 | Healthcare Provider Taxonomy Group_8 | Healthcare Provider Taxonomy Group_9 | Healthcare Provider Taxonomy Group_10 | Healthcare Provider Taxonomy Group_11 | Healthcare Provider Taxonomy Group_12 | Healthcare Provider Taxonomy Group_13 | Healthcare Provider Taxonomy Group_14 | Healthcare Provider Taxonomy Group_15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1154982635 | 2.0 |  | `<UNAVAIL>` | AGAPE BEHAVIORAL HEALTH, PLLC |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | 1285282392 | 1.0 |  |  |  | YIM | JONATHAN |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | 1093363103 | 1.0 |  |  |  | ANDERSON | AARON |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | 1720636830 | 1.0 |  |  |  | GROSS | ERIN | E |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 4 | 1548818651 | 1.0 |  |  |  | REILLY | CASEY |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
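The two `batch_kwargs` shapes in the cell above differ only in how the data is supplied: by `path` for a file the datasource reads itself, or as an in-memory `dataset`. A minimal sketch that makes the choice explicit; `make_batch_kwargs` and the example path are hypothetical, and the returned dict is what you would pass to `context.get_batch(batch_kwargs, expectation_suite_name)` as above:

```python
# Hypothetical helper (not a Great Expectations API): build batch_kwargs in
# one of the two shapes used above, requiring exactly one data source.
def make_batch_kwargs(datasource_name, path=None, dataset=None):
    """Return {'path': ..., 'datasource': ...} or {'dataset': ..., 'datasource': ...}."""
    if (path is None) == (dataset is None):
        raise ValueError("pass exactly one of path or dataset")
    if path is not None:
        return {"path": path, "datasource": datasource_name}
    return {"dataset": dataset, "datasource": datasource_name}


# 'data/npi.csv' is a made-up path for illustration.
print(make_batch_kwargs("files_datasource", path="data/npi.csv"))
```

Raising on ambiguous input avoids silently building a dict that carries both keys.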


## 4. Validate the batch

[Read more about the validate method in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#validate-the-batch)


In [70]:
validation_result = batch.validate()

if validation_result["success"]:
    print("This data meets all expectations in {}".format(expectation_suite_name))
else:
    print("This data does not meet some expectations in {}".format(expectation_suite_name))

2020-02-29T12:45:03+0000 - INFO - 	6 expectation(s) included in expectation_suite.
This data meets all expectations in name
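Beyond the overall `success` flag, it is often useful to list which expectations did not pass and whether they raised an exception instead of evaluating. A minimal sketch, assuming the result can be read like the JSON shown in section 4.a below (the cell above already indexes it with `["success"]`). A plain dict stands in for the real result object: its first entry is copied from 4.a, and the second is a hypothetical passing entry added for illustration.

```python
# Sketch: list unsuccessful expectations in a validation result shaped like
# the JSON in section 4.a. A plain dict stands in for the real result object.
def summarize_failures(validation_result):
    """Return (expectation_type, exception_message_or_None) for each failed result."""
    failures = []
    for result in validation_result["results"]:
        if result["success"]:
            continue
        config = result["expectation_config"]
        # exception_info records whether the expectation raised instead of evaluating
        info = result.get("exception_info") or {}
        message = info.get("exception_message") if info.get("raised_exception") else None
        failures.append((config["expectation_type"], message))
    return failures


sample = {
    "success": False,
    "results": [
        # Copied from the first entry in section 4.a (meta omitted).
        {
            "exception_info": {
                "raised_exception": True,
                "exception_message": "TypeError: expect_column_to_exist() got an "
                "unexpected keyword argument 'column_list'",
            },
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_to_exist",
                "kwargs": {"column_list": ["NPI"]},
            },
        },
        # Hypothetical passing entry, for illustration only.
        {
            "exception_info": {"raised_exception": False},
            "success": True,
            "expectation_config": {
                "expectation_type": "expect_table_row_count_to_be_between",
                "kwargs": {"min_value": 1},
            },
        },
    ],
}

for expectation_type, message in summarize_failures(sample):
    print(expectation_type, "->", message or "did not pass")
```

Separating "raised an exception" from "evaluated and failed" matters here: the failure in 4.a comes from a misconfigured expectation (`column_list` passed to `expect_column_to_exist`), not from the data itself.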


## 4.a. OPTIONAL: Review the JSON validation results

Don't worry: this blob of JSON is meant for machines. You can skip this step; the same results are rendered for humans in Data Docs in section 6.

In [50]:
validation_result

{
  "evaluation_parameters": {},
  "success": false,
  "statistics": {
    "evaluated_expectations": 2,
    "successful_expectations": 1,
    "unsuccessful_expectations": 1,
    "success_percent": 50.0
  },
  "results": [
    {
      "exception_info": {
        "raised_exception": true,
        "exception_message": "TypeError: expect_column_to_exist() got an unexpected keyword argument 'column_list'",
        "exception_traceback": "Traceback (most recent call last):\n  File \"/opt/conda/lib/python3.7/site-packages/great_expectations/data_asset/data_asset.py\", line 225, in wrapper\n    return_obj = func(self, **evaluation_args)\nTypeError: expect_column_to_exist() got an unexpected keyword argument 'column_list'\n"
      },
      "success": false,
      "expectation_config": {
        "expectation_type": "expect_column_to_exist",
        "kwargs": {
          "column_list": [
            "NPI"
          ]
        },
        "meta": {
          "SampleExpectationsDatasetProfiler": {
  

## 5. Validation Operators

The `validate` method evaluates one batch of data against one expectation suite and returns a dictionary of validation results. This is sufficient when you explore your data and get to know Great Expectations.
When deploying Great Expectations in a **real data pipeline, you will typically discover additional needs**:

* validating a group of batches that are logically related
* validating a batch against several expectation suites, for example using a tiered pattern with `warning` and `failure` suites
* doing something with the validation results (e.g., saving them for a later review, sending notifications in case of failures, etc.).

`Validation Operators` provide a convenient abstraction that bundles the validation of multiple batches and expectation suites with the actions to take after validation.

[Read more about Validation Operators in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#save-validation-results)
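The tiered `warning`/`failure` pattern from the list above boils down to a small decision rule. A minimal plain-Python sketch, assuming a pipeline step receives one success flag per suite (each could come from a validation result's `success` field, as in section 4). The function name and its `stop`/`warn`/`continue` outcomes are hypothetical and not Great Expectations settings; in a real deployment, the notification side would be handled by the Validation Operator's configured actions.

```python
# Hypothetical sketch of a tiered "warning"/"failure" policy.
# Suite names follow the list above; the outcome labels are illustrative.
def pipeline_decision(suite_success):
    """Map {suite_name: success} to 'stop', 'warn', or 'continue'.

    A suite missing from the dict is treated as passing.
    """
    if not suite_success.get("failure", True):
        return "stop"      # a failure-tier suite did not pass: halt the pipeline
    if not suite_success.get("warning", True):
        return "warn"      # only the warning-tier suite did not pass: notify, keep going
    return "continue"


print(pipeline_decision({"warning": False, "failure": True}))
```

Checking the `failure` tier first ensures a failing critical suite always stops the run, even when the warning suite also failed.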

In [48]:
# This is an example of invoking a validation operator that is configured by default in the great_expectations.yml file

# Generate a run id: a timestamp or a meaningful string that will help you refer to these validation results. We recommend making run ids chronologically sortable.
# Here we make a simple sortable timestamp. This could also come from your pipeline runner (e.g., an Airflow run id).
run_id = datetime.utcnow().isoformat().replace(":", "") + "Z"

results = context.run_validation_operator(
    "action_list_operator", 
    assets_to_validate=[batch], 
    run_id=run_id)



2020-02-29T12:32:30+0000 - INFO - 	2 expectation(s) included in expectation_suite.
2020-02-29T12:32:30+0000 - ERROR - Error running action with name store_evaluation_params
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/great_expectations/validation_operators/validation_operators.py", line 171, in _run_actions
    data_asset=batch
  File "/opt/conda/lib/python3.7/site-packages/great_expectations/validation_operators/actions.py", line 32, in run
    return self._run(validation_result_suite, validation_result_suite_identifier, data_asset, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/great_expectations/validation_operators/actions.py", line 184, in _run
    self.data_context.store_evaluation_parameters(validation_result_suite)
  File "/opt/conda/lib/python3.7/site-packages/great_expectations/data_context/data_context.py", line 773, in store_evaluation_parameters
    self._compile_evaluation_parameter_dependencies()
  File "/opt/conda/lib/pyth

FileNotFoundError: [Errno 2] No such file or directory: '/home/jovyan/work/great_expectations/expectations/ipynb_checkpoints/name-checkpoint.json'
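The run id in section 5 removes the colons from an ISO 8601 UTC timestamp (presumably so it is safe to use in file and folder names) and appends `Z` to mark UTC. Because the fields run from year down to second, these ids sort as text in chronological order. A small check of that transformation, using two fixed timestamps taken from the log lines above instead of `utcnow()`; `sortable_run_id` is just a name for the expression in the cell:

```python
from datetime import datetime


def sortable_run_id(ts):
    """Same transformation as the cell above: ISO 8601 without colons, plus 'Z'."""
    return ts.isoformat().replace(":", "") + "Z"


earlier = sortable_run_id(datetime(2020, 2, 29, 12, 32, 30))
later = sortable_run_id(datetime(2020, 2, 29, 12, 45, 3))
print(earlier)  # 2020-02-29T123230Z
print(sorted([later, earlier]) == [earlier, later])  # True
```

One caveat: `isoformat()` omits the fractional part when microseconds are zero, so two ids from the same second can sort in the wrong order. Calling `isoformat(timespec="microseconds")` keeps every id the same width.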

## 6. View the Validation Results in Data Docs

Let's now build and open your Data Docs. They will include a **data quality report** built from the `ValidationResults` you just created, which helps you communicate about your data with both machines and humans.

[Read more about Data Docs in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#view-the-validation-results-in-data-docs)

In [49]:
context.open_data_docs()

## Congratulations! You ran Validations!

## Next steps:

### 1. Read about the typical workflow with Great Expectations:

[typical workflow](https://docs.greatexpectations.io/en/latest/getting_started/typical_workflow.html?utm_source=notebook&utm_medium=validate_data#view-the-validation-results-in-data-docs)

### 2. Explore the documentation & community

You are now among the elite data professionals who know how to build robust descriptions of your data and protections for pipelines and machine learning models. Join the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack) to see how others are wielding these superpowers.