# Author Expectations

Watch a [short tutorial video](https://greatexpectations.io/videos/getting_started/create_expectations?utm_source=notebook&utm_medium=create_expectations) or read [the written tutorial](https://docs.greatexpectations.io/en/latest/tutorials/create_expectations.html?utm_source=notebook&utm_medium=create_expectations).

We'd love it if you **reach out for help on** the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack).

In [1]:
import json
import os
import great_expectations as ge
import great_expectations.jupyter_ux
import pandas as pd

2020-02-05T10:53:42-0500 - INFO - Great Expectations logging enabled at INFO level by JupyterUX module.


## 1. Get a DataContext
This represents your **project** that you just created using `great_expectations init`. [Read more in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#get-a-datacontext-object)

In [2]:
context = ge.data_context.DataContext()

2020-02-05T10:53:42-0500 - INFO - Using project config: /Users/sam/code/demo_pipeline/great_expectations/great_expectations.yml


## 2. List the CSVs in your folder

The `DataContext` will now introspect your pandas `Datasource` and list the CSVs it finds. [Read more in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#list-data-assets)

In [3]:
great_expectations.jupyter_ux.list_available_data_asset_names(context)

Inspecting your data sources. This may take a moment...
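Conceptually, listing the assets of a directory-backed pandas datasource amounts to scanning a folder for CSV files and naming each asset after its file, minus the extension. That mapping (for example, `abbr-name-list.csv` becoming the asset `abbr-name-list`) is an assumption for illustration. The stdlib sketch below is not Great Expectations' implementation, and the helper `list_csv_assets` is hypothetical.

```python
import os
import tempfile


def list_csv_assets(directory):
    """Return sorted asset names for the CSV files directly inside `directory`.

    Hypothetical helper: each asset is named after its file, minus the extension.
    """
    names = []
    for entry in os.listdir(directory):
        path = os.path.join(directory, entry)
        base, ext = os.path.splitext(entry)
        if os.path.isfile(path) and ext.lower() == ".csv":
            names.append(base)
    return sorted(names)


# Usage with a throwaway directory standing in for your data folder.
with tempfile.TemporaryDirectory() as data_dir:
    for filename in ["abbr-name-list.csv", "npi_small_2019.csv", "notes.txt"]:
        with open(os.path.join(data_dir, filename), "w") as f:
            f.write("a,b\n1,2\n")
    assets = list_csv_assets(data_dir)

print(assets)  # ['abbr-name-list', 'npi_small_2019']
```

Non-CSV files such as `notes.txt` are skipped, so only tabular files show up as candidate assets.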


## 3. Pick a CSV and set the expectation suite name

Internally, Great Expectations represents CSVs and dataframes as `DataAsset`s and uses this notion to link them to `Expectation Suites`. [Read more in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#pick-a-data-asset-and-set-the-expectation-suite-name)


In [17]:
data_asset_name = "abbr-name-list" # TODO: replace with your value!
normalized_data_asset_name = context.normalize_data_asset_name(data_asset_name)
normalized_data_asset_name

NormalizedDataAssetName(datasource='data__dir', generator='default', generator_asset='abbr-name-list')
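The normalized name is a triple of datasource, generator, and generator asset. In the suite output in step 4, the same triple appears joined with `/` as `data__dir/default/abbr-name-list`. A small sketch of that two-way representation follows. The `AssetName` tuple is a hypothetical stand-in, not Great Expectations' own `NormalizedDataAssetName` class.

```python
from typing import NamedTuple


class AssetName(NamedTuple):
    """Hypothetical stand-in for the (datasource, generator, generator_asset) triple."""
    datasource: str
    generator: str
    generator_asset: str

    def to_path(self) -> str:
        # Join the three parts with "/", matching the string form in the suite output.
        return "/".join(self)

    @classmethod
    def from_path(cls, path: str) -> "AssetName":
        # Split a fully qualified "datasource/generator/asset" string back into parts.
        datasource, generator, asset = path.split("/")
        return cls(datasource, generator, asset)


name = AssetName("data__dir", "default", "abbr-name-list")
print(name.to_path())  # data__dir/default/abbr-name-list
```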

We recommend naming your first expectation suite for a table `warning`. Later, as you identify expectations in this suite that are critical, you can move them into another suite named `failure`.

In [18]:
expectation_suite_name = "warning" # TODO: replace with your value!
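Here is the `warning`/`failure` workflow in concrete terms. Each suite holds an `expectations` list whose entries look like the ones printed in step 7, so promoting a critical expectation means moving its entry from one list to the other. The sketch below works on plain dictionaries. The `promote` helper is hypothetical; in practice you would edit suites through Great Expectations rather than by hand.

```python
import copy


def promote(warning_suite, failure_suite, expectation_type, column):
    """Move matching expectations from the warning suite to the failure suite.

    Hypothetical helper operating on plain dicts shaped like saved suites.
    Returns new dicts and leaves the inputs untouched.
    """
    warning_suite = copy.deepcopy(warning_suite)
    failure_suite = copy.deepcopy(failure_suite)
    kept = []
    for exp in warning_suite["expectations"]:
        if exp["expectation_type"] == expectation_type and exp["kwargs"].get("column") == column:
            failure_suite["expectations"].append(exp)
        else:
            kept.append(exp)
    warning_suite["expectations"] = kept
    return warning_suite, failure_suite


warning_demo = {"expectations": [
    {"expectation_type": "expect_column_values_to_be_unique", "kwargs": {"column": "name"}},
    {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "abbreviation"}},
]}
failure_demo = {"expectations": []}

warning_demo, failure_demo = promote(
    warning_demo, failure_demo, "expect_column_values_to_be_unique", "name"
)
print(len(warning_demo["expectations"]), len(failure_demo["expectations"]))  # 1 1
```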

## 4. Create a new empty expectation suite

In [19]:
context.create_expectation_suite(data_asset_name=data_asset_name, expectation_suite_name=expectation_suite_name, overwrite_existing=True)

{'data_asset_name': 'data__dir/default/abbr-name-list',
 'meta': {'great_expectations.__version__': '0.8.7'},
 'expectations': []}

## 5. Load a batch of data you want to use to create `Expectations`

To learn more about `get_batch` with other data types (such as existing pandas dataframes, SQL tables, or Spark), see [this tutorial](https://docs.greatexpectations.io/en/latest/tutorials/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#load-a-batch-of-data-to-create-expectations).

In [20]:
batch_kwargs = context.yield_batch_kwargs(data_asset_name)

Load a batch of data and take a peek at the first few rows.

In [21]:
batch = context.get_batch(data_asset_name, expectation_suite_name, batch_kwargs)
batch.head()

| Unnamed: 0 | name | abbreviation |
|---|---|---|
| 0 | Alabama | AL |
| 1 | Alaska | AK |
| 2 | American Samoa | AS |
| 3 | Arizona | AZ |
| 4 | Arkansas | AR |


#### Optionally, customize and review batch options

`BatchKwargs` are flexible; to learn more, [read the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#load-a-batch-of-data-to-create-expectations).

Here are the batch kwargs used to load your batch:

In [9]:
batch.batch_kwargs

{'path': '/Users/sam/code/demo_pipeline/great_expectations/../data/npi_small_2019.csv',
 'partition_id': 'npi_small_2019',
 'reader_options': {'sep': None, 'engine': 'python'}}
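The `reader_options` are passed through to the pandas reader. With `sep=None`, pandas' Python parsing engine detects the delimiter using Python's built-in `csv.Sniffer`. The stdlib sketch below mirrors that idea so you can see what these kwargs control. The `load_rows` helper and the sample file are hypothetical, and the real datasource returns a pandas-backed batch rather than a list of dicts.

```python
import csv
import os
import tempfile


def load_rows(batch_kwargs):
    """Read the CSV named by batch_kwargs["path"] into a list of dicts.

    Sketch only: it honours reader_options["sep"], and when sep is None it
    detects the delimiter with csv.Sniffer, as pandas does for
    read_csv(sep=None, engine="python").
    """
    sep = batch_kwargs.get("reader_options", {}).get("sep")
    with open(batch_kwargs["path"], newline="") as f:
        sample = f.read(4096)
        f.seek(0)
        if sep is None:
            sep = csv.Sniffer().sniff(sample).delimiter
        return list(csv.DictReader(f, delimiter=sep))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "abbr-name-list.csv")
    with open(path, "w", newline="") as f:
        # Semicolon-separated on purpose, to exercise delimiter sniffing.
        f.write("name;abbreviation\nAlabama;AL\nAlaska;AK\n")
    example_kwargs = {
        "path": path,
        "partition_id": "abbr-name-list",
        "reader_options": {"sep": None, "engine": "python"},
    }
    loaded_rows = load_rows(example_kwargs)

print(loaded_rows[0])  # {'name': 'Alabama', 'abbreviation': 'AL'}
```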

In [39]:
# The datasource can add and store additional identifying information so that
# you can track a batch through your pipeline.
batch.batch_id

{'timestamp': 1580850319.201453,
 'fingerprint': '4162c5c618a552f5649f82dfb677eaae'}
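The `fingerprint` above is a 32-character hex string, the length of an MD5 digest. One plausible way to produce such an identifier is to hash a canonical JSON serialization of the batch kwargs, so that the same kwargs always yield the same fingerprint. This is an assumption for illustration, not necessarily how the datasource computes it, and `make_batch_id` is hypothetical.

```python
import hashlib
import json
import time


def make_batch_id(batch_kwargs):
    """Hypothetical: build a {'timestamp', 'fingerprint'} record for a batch.

    The fingerprint is the MD5 hex digest of the kwargs serialized with sorted
    keys, so logically identical kwargs hash identically regardless of key order.
    """
    canonical = json.dumps(batch_kwargs, sort_keys=True)
    return {
        "timestamp": time.time(),
        "fingerprint": hashlib.md5(canonical.encode("utf-8")).hexdigest(),
    }


kwargs_a = {"path": "data/npi_small_2019.csv", "partition_id": "npi_small_2019"}
kwargs_b = {"partition_id": "npi_small_2019", "path": "data/npi_small_2019.csv"}
id_a, id_b = make_batch_id(kwargs_a), make_batch_id(kwargs_b)
print(id_a["fingerprint"] == id_b["fingerprint"])  # True
```

Hashing kwargs identifies how a batch was requested, not what the file contained. To detect changed data at the same path, you would hash the file contents instead.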

## 6. Author Expectations

With a batch, you can add expectations by calling specific expectation methods. They all begin with `.expect_`, which makes them easy to find with autocompletion.

See available expectations in the [expectation glossary](https://docs.greatexpectations.io/en/latest/glossary.html?utm_source=notebook&utm_medium=create_expectations).
You can also see available expectations by hovering over data elements in the HTML page generated by profiling your dataset.

Below is an example expectation that checks that none of the values in the batch's first column are null.

[Read more in the tutorial](https://docs.greatexpectations.io/en/latest/tutorials/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#author-expectations)

In [None]:
# Check that the batch's first column contains no null values
column_name = batch.get_table_columns()[0]
batch.expect_column_values_to_not_be_null(column_name)
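A column-map expectation such as `expect_column_values_to_not_be_null` evaluates a condition row by row and succeeds when no row fails it. The sketch below shows that logic over a list of row dicts, without Great Expectations. `not_null_check` and its `unexpected_indices` key are hypothetical simplifications. The real method returns a richer result and supports options such as `mostly`.

```python
def not_null_check(rows, column):
    """Hypothetical sketch: succeed only if every row has a non-null value in `column`."""
    values = [row.get(column) for row in rows]
    unexpected = [i for i, v in enumerate(values) if v is None]
    return {
        "success": not unexpected,
        "result": {
            "element_count": len(values),
            "unexpected_count": len(unexpected),
            "unexpected_indices": unexpected,  # made-up key for this sketch
        },
    }


sample_rows = [{"name": "Alabama"}, {"name": None}, {"name": "Arizona"}]
check = not_null_check(sample_rows, "name")
print(check["success"], check["result"]["unexpected_count"])  # False 1
```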

In [22]:
batch.get_table_columns()

['name', 'abbreviation']

Add more expectations here. **Hint:** type `batch.expect_` and press Tab to see all the expectations in Jupyter's autocomplete!

In [24]:
batch.expect_column_values_to_be_unique('name')

{'success': True,
 'result': {'element_count': 59,
  'missing_count': 0,
  'missing_percent': 0.0,
  'unexpected_count': 0,
  'unexpected_percent': 0.0,
  'unexpected_percent_nonmissing': 0.0,
  'partial_unexpected_list': []}}
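The result above reports counts and percentages. The sketch below shows how such fields can be derived for a uniqueness check, assuming (as in Great Expectations' uniqueness check) that missing values are excluded before testing and that every occurrence of a repeated value counts as unexpected. It reports percentages on a 0 to 100 scale; that scale is an assumption, since the output above shows only zeros and does not settle it. `uniqueness_summary` is hypothetical.

```python
from collections import Counter


def uniqueness_summary(values, partial_limit=20):
    """Hypothetical sketch of a uniqueness check's result fields.

    Missing values (None) are counted separately and excluded from the test;
    every occurrence of a value that appears more than once is unexpected.
    """
    element_count = len(values)
    present = [v for v in values if v is not None]
    missing_count = element_count - len(present)
    counts = Counter(present)
    unexpected = [v for v in present if counts[v] > 1]

    def pct(part, whole):
        # Percentage on a 0-100 scale; 0.0 when there is nothing to divide by.
        return 100.0 * part / whole if whole else 0.0

    return {
        "success": not unexpected,
        "result": {
            "element_count": element_count,
            "missing_count": missing_count,
            "missing_percent": pct(missing_count, element_count),
            "unexpected_count": len(unexpected),
            "unexpected_percent": pct(len(unexpected), element_count),
            "unexpected_percent_nonmissing": pct(len(unexpected), len(present)),
            "partial_unexpected_list": unexpected[:partial_limit],
        },
    }


summary = uniqueness_summary(["Alabama", "Alaska", None, "Alaska"])
print(summary["result"]["unexpected_count"])  # 2
```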

## 7. Review and save your Expectations

Each expectation you run on this batch is recorded in its suite, and those that succeed (evaluate to `True`) are kept automatically. Let's view all the expectations you created in machine-readable JSON.

In [13]:
batch.get_expectation_suite()

2020-02-05T10:55:21-0500 - INFO - 	1 expectation(s) included in expectation_suite. result_format settings filtered.


{'data_asset_name': 'data__dir/default/npi_small_2019',
 'meta': {'great_expectations.__version__': '0.8.7'},
 'expectations': [{'expectation_type': 'expect_column_values_to_be_unique',
   'kwargs': {'column': 'NPI'}}],
 'data_asset_type': 'Dataset'}

If you decide not to save some expectations that you created, use the [`remove_expectation` method](https://docs.greatexpectations.io/en/latest/module_docs/data_asset_module.html?highlight=remove_expectation&utm_source=notebook&utm_medium=create_expectations#great_expectations.data_asset.data_asset.DataAsset.remove_expectation). You can also choose to keep, rather than filter out, expectations that were `False` on this batch.
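Conceptually, filtering keeps only expectations that did not fail on this batch, unless you opt to keep failed ones too. In the 0.8-era API we believe this is governed by a `discard_failed_expectations` argument on `get_expectation_suite` and `save_expectation_suite`; check the API reference linked above before relying on it. The plain-Python sketch below uses a hypothetical `filter_failed` helper and a made-up `passed` flag.

```python
def filter_failed(expectations, keep_failed=False):
    """Hypothetical sketch: drop expectations whose last run on this batch failed.

    Each record may carry a "passed" flag (a made-up key for this sketch);
    records without the flag are kept, as are all records when keep_failed=True.
    """
    if keep_failed:
        return list(expectations)
    return [exp for exp in expectations if exp.get("passed", True)]


recorded = [
    {"expectation_type": "expect_column_values_to_be_unique",
     "kwargs": {"column": "name"}, "passed": True},
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "abbreviation"}, "passed": False},
]
print([e["kwargs"]["column"] for e in filter_failed(recorded)])  # ['name']
```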


The following method will save the expectation suite as a JSON file in the `great_expectations/expectations` directory of your project:
    

In [15]:
batch.save_expectation_suite()

2020-02-05T10:55:28-0500 - INFO - 	1 expectation(s) included in expectation_suite. result_format settings filtered.
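The saved suite is an ordinary JSON file, so you can inspect it or keep it under version control. The stdlib sketch below writes a suite dict and reads it back. The nested `datasource/generator/asset/suite.json` layout is an assumption about how 0.8-era projects organize `great_expectations/expectations`, so check your own project directory. `save_suite_json` is hypothetical.

```python
import json
import os
import tempfile


def save_suite_json(root, asset_path, suite_name, suite):
    """Hypothetical: write `suite` to <root>/<asset_path>/<suite_name>.json and return the path."""
    directory = os.path.join(root, *asset_path.split("/"))
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, suite_name + ".json")
    with open(path, "w") as f:
        json.dump(suite, f, indent=2)
    return path


suite_doc = {
    "data_asset_name": "data__dir/default/abbr-name-list",
    "expectations": [
        {"expectation_type": "expect_column_values_to_be_unique", "kwargs": {"column": "name"}}
    ],
}
with tempfile.TemporaryDirectory() as project:
    expectations_dir = os.path.join(project, "great_expectations", "expectations")
    saved_path = save_suite_json(expectations_dir, suite_doc["data_asset_name"], "warning", suite_doc)
    with open(saved_path) as f:
        reloaded = json.load(f)
    relative = os.path.relpath(saved_path, expectations_dir)

print(relative)  # data__dir/default/abbr-name-list/warning.json on POSIX systems
```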


## 8. View the Expectations in Data Docs

Now build and open your Data Docs. They include an **Expectation Suite Overview** built from the expectations you just created, which helps you communicate about your data with both machines and humans.

In [16]:
context.build_data_docs()
context.open_data_docs()

## Congratulations! You created and saved Expectations

## Next steps:

### 1. Play with Validation

Validation is the process of checking whether new batches of this data meet your expectations before your pipeline processes them. Go to [validation_playground.ipynb](validation_playground.ipynb) to see how!
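At its core, validation re-runs each expectation in a saved suite against a new batch and reports a result per expectation, plus an overall verdict. The toy sketch below shows that loop by dispatching each `expectation_type` to a small checking function. The `CHECKS` table and `validate` helper are hypothetical, support only two expectation types, and are not the Great Expectations validation API.

```python
def _unique(rows, column):
    # Ignore missing values, then require all remaining values to be distinct.
    values = [r.get(column) for r in rows if r.get(column) is not None]
    return len(values) == len(set(values))


def _not_null(rows, column):
    return all(r.get(column) is not None for r in rows)


# Hypothetical dispatch table from expectation_type to a checking function.
CHECKS = {
    "expect_column_values_to_be_unique": _unique,
    "expect_column_values_to_not_be_null": _not_null,
}


def validate(rows, suite):
    """Run every expectation in `suite` against `rows`; succeed only if all pass."""
    results = []
    for exp in suite["expectations"]:
        check = CHECKS[exp["expectation_type"]]
        results.append({
            "expectation_type": exp["expectation_type"],
            "success": check(rows, **exp["kwargs"]),
        })
    return {"success": all(r["success"] for r in results), "results": results}


saved_suite = {"expectations": [
    {"expectation_type": "expect_column_values_to_be_unique", "kwargs": {"column": "name"}},
    {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "abbreviation"}},
]}
new_batch = [
    {"name": "Alabama", "abbreviation": "AL"},
    {"name": "Alabama", "abbreviation": None},
]
report = validate(new_batch, saved_suite)
print(report["success"])  # False
```

Storing expectations as data (type plus kwargs) is what lets the same suite be replayed against every new batch.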


### 2. Explore the documentation & community

You are now among the elite data professionals who know how to build robust descriptions of your data and protections for pipelines and machine learning models. Join the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack) to see how others are wielding these superpowers.