# Validation Playground

**Watch** a [short tutorial video](https://docs.greatexpectations.io/en/latest/getting_started/pipeline_integration.html?utm_source=notebook&utm_medium=integrate_validation#video) or **read** [the written tutorial](https://docs.greatexpectations.io/en/latest/getting_started/pipeline_integration.html?utm_source=notebook&utm_medium=integrate_validation)

We'd love it if you **reach out for help on** the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack)

In [None]:
import json
import great_expectations as ge
from great_expectations.profile import ColumnsExistProfiler
import great_expectations.jupyter_ux
from great_expectations.datasource.types import BatchKwargs
from datetime import datetime

## 1. Get a DataContext
This represents your project that you just created using `great_expectations init`. [Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#get-datacontext-object)

In [None]:
context = ge.data_context.DataContext()

## 2. List the csvs in your folder

The `DataContext` will now introspect your pandas `Datasource` and list the csvs it finds. [Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#data-assets)

In [None]:
ge.jupyter_ux.list_available_data_asset_names(context)

## 3. Pick a csv and set the expectation suite name

Internally, Great Expectations represents csvs and dataframes as `DataAsset`s and uses this notion to link them to `Expectation Suites`. To learn more about `DataAssets` and how their names are built, see [the reference](https://docs.greatexpectations.io/en/latest/reference/data_context_reference.html#data-asset-names).

In [None]:
data_asset_name = "YOUR_CSV_FILENAME_ABOVE" # TODO: replace with your value!
normalized_data_asset_name = context.normalize_data_asset_name(data_asset_name)
normalized_data_asset_name

We recommend naming your first expectation suite for a table `warning`. Later, as you identify some of the expectations that you add to this suite as critical, you can move these expectations into another suite and call it `failure`. [Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/pipeline_integration.html?utm_source=notebook&utm_medium=integrate_validation#choose-data-asset-and-expectation-suite)

In [None]:
expectation_suite_name = "warning" # TODO: replace with your value!

**Notes**

- In a real pipeline you wouldn't create expectations, but we need some to play with validation.
- For this tutorial, `overwrite_existing` is set to `True`, so if you have already created expectations for this table, you can skip this step.

In [None]:
context.create_expectation_suite(data_asset_name=normalized_data_asset_name, expectation_suite_name=expectation_suite_name, overwrite_existing=True);

## 4. Load a batch of data you want to use to create `Expectations`

To learn more about `get_batch` with other data types (such as existing pandas dataframes, SQL tables or Spark), see [this tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#get-batch)

In [None]:
batch_kwargs = context.yield_batch_kwargs(data_asset_name)
batch = context.get_batch(normalized_data_asset_name, expectation_suite_name, batch_kwargs)
batch.head()

**Note**: In a real pipeline you wouldn't create `Expectations`. However, to **play with validation you must have an expectation suite**, so we will create a **very basic suite and save it**. To create a more interesting suite, open the `create_expectations.ipynb` notebook.

In [None]:
ColumnsExistProfiler().profile(batch)
batch.save_expectation_suite()
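
As a rough illustration of what such a very basic suite contains, the sketch below builds one by hand in plain Python, with one `expect_column_to_exist` entry per column. The helper `build_columns_exist_suite`, the column names, and the exact dictionary layout are assumptions made for illustration; the suite that the profiler saves may carry additional fields.

```python
from typing import Dict, List


def build_columns_exist_suite(suite_name: str, columns: List[str]) -> Dict:
    """Build a plain-dict suite with one column-existence expectation per column.

    Hypothetical helper for illustration only; it is not part of Great Expectations.
    """
    return {
        "expectation_suite_name": suite_name,
        "expectations": [
            {"expectation_type": "expect_column_to_exist", "kwargs": {"column": name}}
            for name in columns
        ],
    }


# Made-up column names standing in for the header of your csv.
example_suite = build_columns_exist_suite("warning", ["id", "name", "signup_date"])
for expectation in example_suite["expectations"]:
    print(expectation["expectation_type"], expectation["kwargs"]["column"])
```

A suite like this only checks that the columns are present, which is why it is enough to exercise validation but not to protect a real pipeline.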

## 5. Get a pipeline run id

Generate a run id: a timestamp or another meaningful string that will help you refer to validation results. We recommend that run ids be chronologically sortable.
[Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/pipeline_integration.html?utm_source=notebook&utm_medium=integrate_validation#set-a-run-id)

In [None]:
# Let's make a simple sortable timestamp. Note this could come from your pipeline runner.
run_id = datetime.utcnow().isoformat().replace(":", "") + "Z"
run_id
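
To see why chronological sortability is useful, here is a small standalone sketch. It assumes a hypothetical helper, `make_run_id`, that formats a UTC time with a fixed-width pattern so that sorting the strings also sorts the runs by time. This is one possible convention, not a format that Great Expectations requires.

```python
from datetime import datetime, timedelta, timezone


def make_run_id(moment: datetime) -> str:
    """Format a timezone-aware datetime as a fixed-width UTC string."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S.%fZ")


start = datetime(2020, 1, 31, 23, 59, 59, tzinfo=timezone.utc)
# Create the ids out of order on purpose: 2 hours later, now, 1 hour later.
example_ids = [make_run_id(start + timedelta(hours=offset)) for offset in (2, 0, 1)]

# Plain string sorting recovers chronological order.
for example_id in sorted(example_ids):
    print(example_id)
```

The fixed-width pattern is a deliberate choice: `isoformat()` omits the microseconds when they are zero, so the length of its output can vary from one run to the next.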

## 6. Validate the batch

This is the "workhorse" of Great Expectations. Call it in your pipeline code after loading data and just before passing it to your computation.

[Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/pipeline_integration.html?utm_source=notebook&utm_medium=integrate_validation#validate)



In [None]:
validation_result = batch.validate(run_id=run_id)

if validation_result["success"]:
    print("This data meets all expectations for {}".format(str(data_asset_name)))
else:
    print("This data is not a valid batch of {}".format(str(data_asset_name)))

## OPTIONAL: Review the JSON validation results

Don't worry, this blob of JSON is meant for machines. You can skip this step and view the same results in Data Docs later. If you'd like to learn more about validation results, you can [read more in the docs](https://docs.greatexpectations.io/en/latest/getting_started/pipeline_integration.html?utm_source=notebook&utm_medium=integrate_validation#review-validation-results)

In [None]:
# print(json.dumps(validation_result, indent=4))
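
If you do want to look inside the results programmatically, the sketch below walks a small hand-written dictionary shaped like a validation result. The sample data and the helper `summarize_failures` are made up for illustration, and the nested keys used here (`results`, `expectation_config`, `expectation_type`, `kwargs`) are assumptions about the layout. Print your own `validation_result` to confirm the fields before relying on them.

```python
from typing import Dict, List

# Hand-written sample; not real output from Great Expectations.
sample_result = {
    "success": False,
    "results": [
        {
            "success": True,
            "expectation_config": {
                "expectation_type": "expect_column_to_exist",
                "kwargs": {"column": "id"},
            },
        },
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_to_exist",
                "kwargs": {"column": "signup_date"},
            },
        },
    ],
}


def summarize_failures(result: Dict) -> List[str]:
    """Return one readable line per expectation that did not pass."""
    lines = []
    for item in result.get("results", []):
        if not item.get("success", False):
            config = item.get("expectation_config", {})
            lines.append("{} {}".format(config.get("expectation_type"), config.get("kwargs")))
    return lines


for line in summarize_failures(sample_result):
    print(line)
```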

## 7. Validation Operators

The `validate` method evaluates one batch of data against one expectation suite and returns a dictionary of validation results. This is sufficient when you explore your data and get to know Great Expectations.
When deploying Great Expectations in a **real data pipeline, you will typically discover additional needs**:

* validating a group of batches that are logically related
* validating a batch against several expectation suites, for example using a tiered pattern such as `warning` and `failure`
* doing something with the validation results (e.g., saving them for a later review, sending notifications in case of failures, etc.).

`Validation Operators` provide a convenient abstraction for bundling both the validation of multiple expectation suites and the actions that should be taken after the validation.

[Read more about Validation Operators](https://docs.greatexpectations.io/en/latest/features/validation_operators_and_actions.html?utm_source=notebook&utm_medium=integrate_validation)
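
The idea of bundling suites and actions can be sketched in plain Python. Everything below is hypothetical: `run_tiered_validation`, the two action functions, and the sample results are illustrative stand-ins, not the Great Expectations operator API. The sketch takes one result per suite, runs every action on every result, and treats only a failed `failure` suite as a reason to stop the pipeline.

```python
from typing import Callable, Dict, List

SuiteResults = Dict[str, Dict]        # suite name -> validation result dict
Action = Callable[[str, Dict], None]  # called with (suite name, result)


def run_tiered_validation(results_by_suite: SuiteResults, actions: List[Action]) -> Dict:
    """Apply every action to every suite result and summarize the outcome."""
    for suite_name, result in results_by_suite.items():
        for action in actions:
            action(suite_name, result)
    failed = [name for name, result in results_by_suite.items() if not result["success"]]
    return {
        # Only the critical "failure" suite decides whether the pipeline continues.
        "should_continue": "failure" not in failed,
        "failed_suites": failed,
    }


stored: List[str] = []
notifications: List[str] = []


def store_result(suite_name: str, result: Dict) -> None:
    """Stand-in for saving a result for later review."""
    stored.append(suite_name)


def notify_on_failure(suite_name: str, result: Dict) -> None:
    """Stand-in for sending a notification when a suite fails."""
    if not result["success"]:
        notifications.append("suite '{}' failed".format(suite_name))


summary = run_tiered_validation(
    {"warning": {"success": False}, "failure": {"success": True}},
    [store_result, notify_on_failure],
)
print(summary)
print(notifications)
```

In this sample the `warning` suite fails and triggers a notification, but the pipeline is still allowed to continue because the `failure` suite passed.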


In [None]:
# This is an example of invoking a validation operator that is configured by default in the great_expectations.yml file

results = context.run_validation_operator(
    assets_to_validate=[batch],
    run_id=run_id,
    validation_operator_name="action_list_operator",
)

## 8. View the Validation Results in Data Docs

Let's now build and look at your Data Docs. These will now include a **data quality report** built from the `ValidationResults` you just created that helps you communicate about your data with both machines and humans.

In [None]:
context.open_data_docs()

## Congratulations! You ran Validations!

## Next steps:

### 1. Author more interesting Expectations

Here we used some **extremely basic** `Expectations`. To really harness the power of Great Expectations you can author much more interesting and specific `Expectations` to protect your data pipelines and defeat pipeline debt. Go to [create_expectations.ipynb](create_expectations.ipynb) to see how!

### 2. Explore the documentation & community

You are now among the elite data professionals who know how to build robust descriptions of your data and protections for pipelines and machine learning models. Join the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack) to see how others are wielding these superpowers.