# Validation Playground

**Watch** a [short tutorial video](https://greatexpectations.io/videos/getting_started/integrate_expectations) or **read** [the written tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data).

We'd love it if you **reach out for help on** the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack)

In [None]:
import json
import great_expectations as ge
from great_expectations.profile import ColumnsExistProfiler
import great_expectations.jupyter_ux
from great_expectations.datasource.types import BatchKwargs
from datetime import datetime

## 1. Get a DataContext
This represents your **project** that you just created using `great_expectations init`. [Read more in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#get-a-datacontext-object)

In [None]:
context = ge.data_context.DataContext()

## 2. List the CSVs in your folder

The `DataContext` will now introspect your PySpark `Datasource` and list the CSVs it finds. [Read more in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#list-data-assets)

In [None]:
ge.jupyter_ux.list_available_data_asset_names(context)

## 3. Pick a CSV and the expectation suite

Internally, Great Expectations represents CSVs and dataframes as `DataAsset`s and uses this notion to link them to `Expectation Suites`. [Read more about the validate method in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#pick-a-data-asset-and-expectation-suite)


In [None]:
data_asset_name = "ONE_OF_THE_CSV_DATA_ASSET_NAMES_FROM_ABOVE" # TODO: replace with your value!
normalized_data_asset_name = context.normalize_data_asset_name(data_asset_name)
normalized_data_asset_name

We recommend naming your first expectation suite for a table `warning`. Later, as you identify some of its expectations as critical, you can move them into another suite called `failure`. [Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/pipeline_integration.html?utm_source=notebook&utm_medium=integrate_validation#choose-data-asset-and-expectation-suite)

In [None]:
expectation_suite_name = "warning" # TODO: replace with your value!

#### 3.a. If you don't have an expectation suite, let's create a simple one

You need expectations to validate your data. Expectations are grouped into Expectation Suites. 

If you don't have an expectation suite for this data asset, the notebook's next cell will create a suite of very basic expectations, so that you have some expectations to play with. The expectation suite will have `expect_column_to_exist` expectations for each column.

If you have already created an expectation suite for this data asset, you can skip the next cell (executing it will do nothing).

To create a more interesting suite, open the [create_expectations.ipynb](create_expectations.ipynb) notebook.



In [None]:
try:
    # If the suite already exists, there is nothing to do.
    context.get_expectation_suite(normalized_data_asset_name, expectation_suite_name)
except ge.exceptions.DataContextError:
    # No suite yet: create one and fill it with expect_column_to_exist expectations.
    context.create_expectation_suite(
        data_asset_name=normalized_data_asset_name,
        expectation_suite_name=expectation_suite_name,
        overwrite_existing=True,
    )
    batch_kwargs = context.yield_batch_kwargs(data_asset_name)
    batch = context.get_batch(normalized_data_asset_name, expectation_suite_name, batch_kwargs)
    ColumnsExistProfiler().profile(batch)
    batch.save_expectation_suite()
    expectation_suite = context.get_expectation_suite(normalized_data_asset_name, expectation_suite_name)
    context.build_data_docs()
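
To make the result of the cell above concrete, here is a minimal sketch, in plain Python and without calling Great Expectations, of the kind of suite content described earlier: one `expect_column_to_exist` entry per column. The helper name `columns_exist_expectations`, the column names, and the exact dictionary layout are illustrative assumptions; the suite that the profiler saves may carry additional fields.

```python
def columns_exist_expectations(column_names):
    """Build one expect_column_to_exist entry per column (illustrative layout)."""
    return [
        {"expectation_type": "expect_column_to_exist", "kwargs": {"column": name}}
        for name in column_names
    ]


# Hypothetical column names, standing in for the columns of your CSV.
sketch_expectations = columns_exist_expectations(["id", "name", "signup_date"])
for expectation in sketch_expectations:
    print(expectation["expectation_type"], expectation["kwargs"])
```

A suite like this only checks that the columns are present; it says nothing about their contents.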


## 4. Load a batch of data you want to validate

To learn more about `get_batch` with other data types (such as existing pandas dataframes, SQL tables or Spark), see [this tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#load-a-batch-of-data-to-validate).


In [None]:
batch_kwargs = context.yield_batch_kwargs(data_asset_name)
batch = context.get_batch(normalized_data_asset_name, expectation_suite_name, batch_kwargs)
batch.head()

## 5. Get a pipeline run id

Generate a run id: a timestamp or another meaningful string that helps you refer to the validation results later. We recommend making run ids chronologically sortable.
[Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/pipeline_integration.html?utm_source=notebook&utm_medium=validate_data#set-a-run-id)

In [None]:
# Let's make a simple sortable timestamp. Note this could come from your pipeline runner.
run_id = datetime.utcnow().isoformat().replace(":", "") + "Z"
run_id
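
Why sortability matters: if run ids sort as strings in the same order as the moments they describe, listing results by run id also lists them by time. The sketch below checks that property with fixed, made-up timestamps. It assumes a hypothetical helper `make_run_id` and passes `timespec="microseconds"` so that every id has the same width; without it, `isoformat()` drops the fractional part when the microsecond field is zero, and such an id can sort after later ids from the same second.

```python
from datetime import datetime, timedelta


def make_run_id(moment):
    """Format a datetime as a fixed-width, string-sortable run id."""
    return moment.isoformat(timespec="microseconds").replace(":", "") + "Z"


# Fixed, made-up UTC moments (deliberately out of order).
start = datetime(2020, 1, 1, 9, 30, 0)
moments = [start + timedelta(hours=5), start, start + timedelta(milliseconds=500)]

sketch_run_ids = [make_run_id(m) for m in moments]

# Sorting the strings gives the same order as sorting the datetimes.
ids_sorted_as_text = sorted(sketch_run_ids)
ids_sorted_by_time = [make_run_id(m) for m in sorted(moments)]
print(ids_sorted_as_text == ids_sorted_by_time)
```

If your pipeline runner already provides a sortable identifier, using that directly is likely the simpler choice.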

## 6. Validate the batch

The `validate` method is the "workhorse" of Great Expectations. Call it in your pipeline code after loading the data and just before passing it to your computation.

[Read more about the validate method in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#validate-the-batch)


In [None]:
validation_result = batch.validate(run_id=run_id)

if validation_result["success"]:
    print("This data meets all expectations for {}".format(str(data_asset_name)))
else:
    print("This data is not a valid batch of {}".format(str(data_asset_name)))
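
In a pipeline you usually want a failed validation to stop the run rather than print a message. A minimal sketch of such a gate follows; it relies only on the `success` key used above. The names `DataValidationError` and `require_valid` are hypothetical, and the sample dictionaries stand in for a real validation result.

```python
class DataValidationError(Exception):
    """Raised when a batch does not meet its expectations (hypothetical name)."""


def require_valid(result, asset_name):
    """Return the result unchanged if it succeeded, otherwise raise."""
    if not result["success"]:
        raise DataValidationError(
            "Batch of {} failed validation".format(asset_name)
        )
    return result


# Stand-ins for real validation results.
require_valid({"success": True}, "my_asset")  # passes silently

try:
    require_valid({"success": False}, "my_asset")
except DataValidationError as error:
    gate_message = str(error)
    print(gate_message)
```

Whether to raise, log, or quarantine the batch is a design decision for your pipeline; raising is simply the most conservative default.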

#### 6.a. OPTIONAL: Review the JSON validation results

Don't worry: this blob of JSON is meant for machines. You can skip this step and view the same results in Data Docs later!

In [None]:
# print(json.dumps(validation_result, indent=4))
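
If you do want to look inside the JSON, the entries under `results` are usually the interesting part. The sketch below pulls the failed ones out of a hand-written sample. The layout of the sample (`results`, `success`, `expectation_config`, `expectation_type`, `kwargs`) is an assumption about the result format; check it against the output of the cell above before relying on it. The helper name `failed_expectations` is hypothetical.

```python
def failed_expectations(result):
    """Return (expectation_type, kwargs) for every failed entry in the result."""
    return [
        (entry["expectation_config"]["expectation_type"],
         entry["expectation_config"]["kwargs"])
        for entry in result.get("results", [])
        if not entry["success"]
    ]


# Hand-written sample; the layout is an assumption, not real output.
sample_result = {
    "success": False,
    "results": [
        {"success": True,
         "expectation_config": {"expectation_type": "expect_column_to_exist",
                                "kwargs": {"column": "id"}}},
        {"success": False,
         "expectation_config": {"expectation_type": "expect_column_to_exist",
                                "kwargs": {"column": "email"}}},
    ],
}

sample_failures = failed_expectations(sample_result)
for expectation_type, kwargs in sample_failures:
    print(expectation_type, kwargs)
```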

## 7. Validation Operators

The `validate` method evaluates one batch of data against one expectation suite and returns a dictionary of validation results. This is sufficient when you explore your data and get to know Great Expectations.
When deploying Great Expectations in a **real data pipeline, you will typically discover additional needs**:

* validating a group of batches that are logically related
* validating a batch against several expectation suites, for example in a tiered pattern such as `warning` and `failure`
* doing something with the validation results (e.g., saving them for a later review, sending notifications in case of failures, etc.).

`Validation Operators` provide a convenient abstraction for bundling both the validation of multiple expectation suites and the actions that should be taken after the validation.
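
To illustrate the idea (not the library's implementation), here is a plain-Python sketch of an operator that validates one batch against several suites and then runs a list of actions on the collected results. Everything in it is hypothetical: `run_operator`, the `validate` callable, the toy suites, and the action functions are stand-ins and are not part of the Great Expectations API.

```python
def run_operator(batch, suites, validate, actions):
    """Validate `batch` against each suite, then pass the results to each action."""
    results = {name: validate(batch, suite) for name, suite in suites.items()}
    for action in actions:
        action(results)
    return results


# Toy stand-ins: a "suite" is a list of required columns, a "batch" is a dict.
def toy_validate(batch, required_columns):
    missing = [col for col in required_columns if col not in batch]
    return {"success": not missing, "missing": missing}


stored_results = []  # pretend validation store
notifications = []   # pretend notification channel


def store_action(results):
    stored_results.append(results)


def notify_on_failure_action(results):
    for suite_name, result in results.items():
        if not result["success"]:
            notifications.append("{} suite failed".format(suite_name))


operator_results = run_operator(
    batch={"id": [1, 2], "name": ["a", "b"]},
    suites={"failure": ["id"], "warning": ["id", "email"]},
    validate=toy_validate,
    actions=[store_action, notify_on_failure_action],
)
print(notifications)
```

Keeping the actions as a plain list mirrors the bundling described above: the validation logic stays the same while the follow-up steps can be added or removed independently.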

[Read more about Validation Operators in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#save-validation-results)

In [None]:
# This is an example of invoking a validation operator that is configured by default in the great_expectations.yml file

results = context.run_validation_operator(
    assets_to_validate=[batch],
    run_id=run_id,
    validation_operator_name="action_list_operator",
)

## 8. View the Validation Results in Data Docs

Let's now build and look at your Data Docs. These will now include a **data quality report** built from the `ValidationResults` you just created, which helps you communicate about your data with both machines and humans.

[Read more about Data Docs in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/validate_data.html?utm_source=notebook&utm_medium=validate_data#view-the-validation-results-in-data-docs)

In [None]:
context.open_data_docs()

## Congratulations! You ran Validations!

## Next steps:

### 1. Author more interesting Expectations

Here we used some **extremely basic** `Expectations`. To really harness the power of Great Expectations you can author much more interesting and specific `Expectations` to protect your data pipelines and defeat pipeline debt. Go to [create_expectations.ipynb](create_expectations.ipynb) to see how!

### 2. Explore the documentation & community

You are now among the elite data professionals who know how to build robust descriptions of your data and protections for pipelines and machine learning models. Join the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack) to see how others are wielding these superpowers.