In [None]:
import json
import os
import great_expectations as ge
import great_expectations.jupyter_ux
import pandas as pd

# Author Expectations



Watch a [short tutorial video](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#video) or read [the written tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations).

We'd love it if you **reach out for help on** the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack)


## 1. Get a DataContext.
This represents your project that you set up using `great_expectations init`. [Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#get-datacontext-object)


In [1]:
context = ge.data_context.DataContext()


## 2. List data assets in your project

[Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#data-assets)


In [None]:
great_expectations.jupyter_ux.list_available_data_asset_names(context)

## 3. Pick a data asset and set the expectation suite name

Your `data_asset_name` consists of three components: a datasource, a generator, and a generator_asset. You can usually provide only some of them, as long as the name is not ambiguous. See [more in the reference](https://docs.greatexpectations.io/en/latest/reference/data_context_reference.html#data-asset-names).
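As a rough illustration of the three-part naming, the sketch below splits a fully qualified name into its components. It is not the library's `normalize_data_asset_name` implementation: the `/` delimiter, the `AssetName` tuple, the `split_asset_name` helper, and the example names are all assumptions made for illustration. The real method also resolves partial names against your configured datasources, which this sketch does not attempt.

```python
from collections import namedtuple

# Hypothetical stand-in for a normalized data asset name.
AssetName = namedtuple("AssetName", ["datasource", "generator", "generator_asset"])


def split_asset_name(name, delimiter="/"):
    """Split a fully qualified name into its three components.

    Assumes '/' separates the parts. Partial names raise ValueError here,
    whereas the real library would try to resolve them from the project.
    """
    parts = name.split(delimiter)
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected datasource/generator/generator_asset, got {name!r}"
        )
    return AssetName(*parts)


name = split_asset_name("files_datasource/default/npidata_pfile")
print(name.datasource, name.generator, name.generator_asset)
```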

In [None]:
data_asset_name = "npidata_pfile" # TODO: replace with your value!
data_asset_name = context.normalize_data_asset_name(data_asset_name)

We recommend you name your first expectation suite for a given data asset `warning`. Later, as you identify some of the expectations that you add to this suite as critical, you can move these expectations into another suite and call it `failure`.
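One way to picture the difference between the two suites is by how a pipeline might react when each one fails. The sketch below uses plain dictionaries with a `success` key as stand-ins for validation results. The `handle_result` helper and its policy (log for `warning`, stop for `failure`) are hypothetical and not part of Great Expectations.

```python
import logging

logger = logging.getLogger("pipeline")


def handle_result(suite_name, result):
    """React to a validation result according to the suite's severity.

    `result` is a stand-in dict with a boolean "success" key (an assumption
    for this sketch). Returns True when the pipeline may continue.
    """
    if result["success"]:
        return True
    if suite_name == "failure":
        # Critical expectations: stop the pipeline.
        raise RuntimeError("critical expectations failed; halting pipeline")
    # Any other suite (for example "warning"): log and carry on.
    logger.warning("suite %r had failing expectations", suite_name)
    return True


handle_result("warning", {"success": False})  # logs a warning and continues
```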

In [None]:
expectation_suite_name = "warning" # TODO: replace with your value!

## 4. Create the new expectation suite

In [None]:
context.create_expectation_suite(data_asset_name=data_asset_name, expectation_suite_name=expectation_suite_name)

## 5. Load a batch of data from the data asset you want to validate

Learn about `get_batch` in [this tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#get-batch)

BatchKwargs define the specific batch of data a datasource should fetch. Depending on the datasource, batch kwargs might include a filepath, table name, SQL query, or even an existing DataFrame loaded outside of Great Expectations. Choose the best batch_kwargs for your situation.

In [None]:
# If you're working with data loaded from files, and GE listed and profiled your files correctly:
batch_kwargs = context.yield_batch_kwargs(data_asset_name)

# If you would like to validate data in a database, using an entire table or view:
# batch_kwargs = {'table': 'name_of_table_to_validate'}  # Add a 'schema' key if you need to specify that explicitly

# If you would like to validate data in a database, using a query to construct a temporary table:
# batch_kwargs = {'query': 'SELECT YOUR_ROWS FROM YOUR_TABLE'}

# If you would like to control reading of data outside of Great Expectations, and provide a pre-built Dataframe:
# df = spark.read.csv(...)
# df = pd.read_csv(...)
# batch_kwargs = {'dataset': df}

In [None]:
batch = context.get_batch(data_asset_name, 
                          expectation_suite_name,
                          batch_kwargs)
batch.head()

#### Optionally, customize and review batch options
BatchKwargs are extremely flexible and allow you to specify additional information to use when building the batch, such as filetypes, delimiters, headers, or other parameters. You can add additional batch_kwargs when you build the kwargs or when you call `get_batch`.
[Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#reader-options)


In [None]:
# See the batch kwargs used to load your batch
batch.batch_kwargs

In [None]:
# The datasource can add and store additional identifying information to ensure you can track a batch through
# your pipeline
batch.batch_id

## 6. Author Expectations

With a batch, you can add expectations by calling specific expectations.

See available expectations in the [expectation glossary](https://docs.greatexpectations.io/en/latest/glossary.html?utm_source=notebook&utm_medium=create_expectations).
You can also see available expectations by hovering over data elements in the HTML page generated by profiling your dataset.

Below is an example expectation that checks that the values in the batch's first column are not null.

[Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#create-expectations)
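To make the idea of a not-null expectation concrete, here is a plain-Python sketch of the kind of check it performs. It is a simplified model, not the library's implementation: the `not_null_check` helper, the shape of the returned dictionary, and the sample values are assumptions, and only `None` is treated as null. The `mostly` argument mirrors the library's option for tolerating a fraction of unexpected values.

```python
def not_null_check(values, mostly=1.0):
    """Count missing values and decide whether the check passes.

    `mostly` is the fraction of values that must be non-null for success.
    """
    element_count = len(values)
    unexpected_count = sum(1 for v in values if v is None)
    if element_count == 0:
        fraction_ok = 1.0  # this sketch treats an empty column as passing
    else:
        fraction_ok = (element_count - unexpected_count) / element_count
    return {
        "success": fraction_ok >= mostly,
        "element_count": element_count,
        "unexpected_count": unexpected_count,
    }


column = ["1003000126", None, "1003000142", "1003000423"]
print(not_null_check(column))               # fails: one value is missing
print(not_null_check(column, mostly=0.75))  # passes: 3 of 4 values are present
```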

In [None]:
column_name = batch.get_table_columns()[0]
batch.expect_column_values_to_not_be_null(column_name)

Add more expectations here. **Hint:** type `batch.expect_` and press Tab to use Jupyter's autocomplete and see all the available expectations!

In [None]:
batch.expect_

## 7. Review and save your Expectations

Expectations that are `True` on this data batch are added automatically. To view all the expectations you added so far about this data asset, run the cell below.

In [None]:
batch.get_expectation_suite()

    
    
If you decide not to save some of the expectations you created, use the [`remove_expectation` method](https://docs.greatexpectations.io/en/latest/module_docs/data_asset_module.html?highlight=remove_expectation&utm_source=notebook&utm_medium=create_expectations#great_expectations.data_asset.data_asset.DataAsset.remove_expectation). You can also choose to keep expectations that were `False` on this batch instead of filtering them out.


The following method will save the expectation suite as a JSON file in the `great_expectations/expectations` directory of your project:
    

In [None]:
batch.save_expectation_suite()
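Because the saved suite is plain JSON, you can inspect it with the standard library. The sketch below parses an illustrative suite and lists the expectation types it contains. The sample contents, including the `NPI` column and the asset name, are made up for this example, and the exact fields written by your version of Great Expectations may differ.

```python
import json

# Illustrative suite contents (an assumption, not real output from the library).
suite_json = """
{
  "data_asset_name": "files_datasource/default/npidata_pfile",
  "expectation_suite_name": "warning",
  "expectations": [
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "NPI"}}
  ]
}
"""

suite = json.loads(suite_json)

# Collect (expectation type, column) pairs for a quick overview.
summary = [
    (e["expectation_type"], e["kwargs"].get("column"))
    for e in suite["expectations"]
]
for expectation_type, column in summary:
    print(f"{expectation_type} on column {column}")
```

To inspect a real suite, you would read the JSON file from your project's expectations directory instead of the inline string.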

## Congratulations! You created and saved expectations for at least one of your data assets.

## Next steps:

### 1. Data Docs
Jump back to the command line and run `great_expectations build-docs` to see your Data Docs. These are created from the expectations you just made and help you understand and communicate about your data.
### 2. Validation
Validation is the process of checking whether new batches of this data meet your expectations before they are processed by your pipeline.
### Go to [integrate_validation_into_pipeline.ipynb](integrate_validation_into_pipeline.ipynb) to see how!
