In [None]:
import json
import os
import great_expectations as ge
import great_expectations.jupyter_ux
import pandas as pd

# Author Expectations



Watch a [short tutorial video](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#video) or read [the written tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations).

We'd love it if you **reach out for help on** the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack)


## 1. Get a DataContext.
This represents your project that you set up using `great_expectations init`. [Read more in the tutorial](https://great-expectations.readthedocs.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#get-datacontext-object)


In [None]:
context = ge.data_context.DataContext()

## 2. List data assets in your project

[Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#data-assets)


In [None]:
great_expectations.jupyter_ux.list_available_data_asset_names(context)

## 3. Pick a data asset & set the expectation suite name
We recommend you name your first expectation suite for a given data asset `warning`. Later, as you identify some of the expectations that you add to this suite as critical, you can move these expectations into another suite and call it `failure`.
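
To illustrate the `warning` / `failure` split, here is a minimal, library-free sketch. It models a suite as a plain dictionary with a list of expectation entries; the dictionary layout, the column names, and the `promote` helper are assumptions made for illustration and are not part of the Great Expectations API.

```python
# Hypothetical model of a suite as a plain dict (not the Great Expectations API).
warning_suite = {
    "expectation_suite_name": "warning",
    "expectations": [
        {"expectation_type": "expect_column_values_to_not_be_null",
         "kwargs": {"column": "user_id"}},
        {"expectation_type": "expect_column_values_to_be_unique",
         "kwargs": {"column": "user_id"}},
        {"expectation_type": "expect_column_values_to_be_between",
         "kwargs": {"column": "age", "min_value": 0, "max_value": 120}},
    ],
}


def promote(suite, is_critical, new_name="failure"):
    """Split `suite` into (remaining, promoted) using the `is_critical` predicate."""
    critical = [e for e in suite["expectations"] if is_critical(e)]
    remaining = [e for e in suite["expectations"] if not is_critical(e)]
    return (
        {"expectation_suite_name": suite["expectation_suite_name"],
         "expectations": remaining},
        {"expectation_suite_name": new_name, "expectations": critical},
    )


# Treat the not-null check as critical and move it into the `failure` suite.
warning_suite, failure_suite = promote(
    warning_suite,
    lambda e: e["expectation_type"] == "expect_column_values_to_not_be_null",
)
print(len(warning_suite["expectations"]), len(failure_suite["expectations"]))
```

Keeping the two suites separate lets a pipeline log violations of `warning` while stopping on violations of `failure`.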

In [None]:
data_asset_name = "REPLACE ME!" # TODO: replace with your value!
expectation_suite_name = "warning" # TODO: replace with your value!

## 4. Create the new expectation suite

In [None]:
context.create_expectation_suite(data_asset_name=data_asset_name, expectation_suite_name=expectation_suite_name)

## 5. Load a batch of data from the data asset you want to validate

Learn about `get_batch` in [this tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#get-batch)

__Quick Guide:__

##### If you want to validate data in Pandas Dataframes or in Spark Dataframes:

* A. If GE listed and profiled your files correctly:

```python
data_asset_name = "CHOOSE FROM THE LIST ABOVE"
batch = context.get_batch(data_asset_name,
                          expectation_suite_name,
                          context.yield_batch_kwargs(data_asset_name))
```
* B. Otherwise (you want to control the logic of reading the data):

```python
# Load the data into a dataframe yourself, e.g.:
#   df = pd.read_csv(...)
#   df = SparkDFDataset(spark.read.csv(...))
df = pd.read_csv("PATH TO YOUR FILE")
# Choosing a new name here creates a new data asset.
data_asset_name = "COME UP WITH A NAME"
batch = context.get_batch(data_asset_name,
                          expectation_suite_name,
                          df)
```


##### If you want to validate data in a database:

* A. To validate an existing table:

```python
data_asset_name = 'CHOOSE THE NAME OF YOUR TABLE FROM THE LIST OF DATA ASSETS ABOVE'
# Arguments are passed positionally: a positional argument after a keyword argument is a syntax error.
batch = context.get_batch(data_asset_name,
                          expectation_suite_name,
                          BatchKwargs(table=data_asset_name))
```

* B. To validate a query result set:

```python
data_asset_name = 'NAME YOUR QUERY (E.G., daily_users_query) - THIS WILL CREATE A NEW DATA ASSET'
# Arguments are passed positionally: a positional argument after a keyword argument is a syntax error.
batch = context.get_batch(data_asset_name,
                          expectation_suite_name,
                          BatchKwargs(query='SQL FOR YOUR QUERY'))
```





In [None]:
# COPY THE APPROPRIATE CODE SNIPPET FROM THE CELL ABOVE
batch = context.get_batch(data_asset_name, 
                          expectation_suite_name,
                          context.yield_batch_kwargs(data_asset_name))
batch.head()

#### Optionally, customize batch options (for example, for CSVs: delimiters, headers, etc.) in `get_batch`
The cell below shows which batch of data was loaded and with which options.
[Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#reader-options)


In [None]:
batch._batch_kwargs

## 6. Author Expectations

[Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#create-expectations)

See available expectations in the [expectation glossary](https://docs.greatexpectations.io/en/latest/glossary.html?utm_source=notebook&utm_medium=create_expectations).
You can also see available expectations by hovering over data elements in the HTML page generated by profiling your dataset.

Here is an example expectation that checks that none of the values in the batch's first column are null.

In [None]:
column_name = batch.get_table_columns()[0]
batch.expect_column_values_to_not_be_null(column_name)
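
To make concrete what a not-null check computes, here is a rough, library-free sketch over a list of row dictionaries. The `check_not_null` helper and the keys of the dictionary it returns are assumptions for illustration; the result object that Great Expectations actually returns is richer and may differ.

```python
def check_not_null(rows, column):
    """Count missing values in `column` and report whether the check passed.

    Hypothetical helper for illustration; not the Great Expectations API.
    """
    values = [row.get(column) for row in rows]
    unexpected = sum(1 for v in values if v is None)
    return {
        "success": unexpected == 0,
        "element_count": len(values),
        "unexpected_count": unexpected,
    }


rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": None},
]

# Mirror the cell above: pick the first column, then run the check on it.
first_column = list(rows[0].keys())[0]
result = check_not_null(rows, first_column)
name_result = check_not_null(rows, "name")
print(result["success"], name_result["success"])
```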

Add more expectations here. **Hint:** type `batch.expect_` and press Tab to use Jupyter's autocomplete to see all the available expectations!

In [None]:
batch.expect_

## 7. Review and save your Expectations

Expectations that are `True` on this data batch are added to the suite automatically. To view all the expectations you have added so far for this data asset, run the cell below.

In [None]:
batch.get_expectation_suite()

    
    
If you decide not to save some of the expectations that you created, use the [`remove_expectation` method](https://docs.greatexpectations.io/en/latest/module_docs/data_asset_module.html?highlight=remove_expectation&utm_source=notebook&utm_medium=create_expectations#great_expectations.data_asset.data_asset.DataAsset.remove_expectation).


The following method will save the expectation suite as a JSON file in the `great_expectations/expectations` directory of your project:
    

In [None]:
batch.save_expectation_suite()
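
For orientation, the sketch below writes and reloads a suite-shaped JSON document using only the standard library. The keys shown, the `orders` asset name, and the `warning.json` file name are assumptions for illustration; inspect the file that `save_expectation_suite` actually writes for the authoritative layout.

```python
import json
import os
import tempfile

# Assumed, simplified shape of a saved suite (illustration only).
suite = {
    "data_asset_name": "orders",  # hypothetical asset name
    "expectation_suite_name": "warning",
    "expectations": [
        {"expectation_type": "expect_column_values_to_not_be_null",
         "kwargs": {"column": "order_id"}},
    ],
}

with tempfile.TemporaryDirectory() as project_root:
    # Mimic the great_expectations/expectations directory inside a temp folder.
    expectations_dir = os.path.join(project_root, "great_expectations", "expectations")
    os.makedirs(expectations_dir)
    path = os.path.join(expectations_dir, "warning.json")  # hypothetical file name

    with open(path, "w") as f:
        json.dump(suite, f, indent=2)

    # Reading the file back yields the same structure, so it can be reviewed or versioned.
    with open(path) as f:
        reloaded = json.load(f)

print(reloaded["expectations"][0]["expectation_type"])
```

Because the suite is plain JSON, it can be committed to version control and reviewed like any other project file.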

## Congratulations! You created and saved expectations for at least one of your data assets.

## Next steps:

### 1. Data Docs
Jump back to the command line and run `great_expectations build-docs` to see your Data Docs. These are created from the expectations you just made and help you understand and communicate about your data.
### 2. Validation
Validation is the process of checking whether new batches of this data meet your expectations before they are processed by your pipeline.
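
As a conceptual sketch of that process, the library-free example below evaluates a list of saved expectation entries against a new batch and reports an overall result that a pipeline could use as a gate. The `CHECKS` registry, the `validate` helper, and the row data are hypothetical; the linked notebook shows how to do this with Great Expectations itself.

```python
def not_null(rows, column):
    """True if no row has a missing value in `column`."""
    return all(row.get(column) is not None for row in rows)


def values_between(rows, column, min_value, max_value):
    """True if every value in `column` lies within [min_value, max_value]."""
    return all(min_value <= row[column] <= max_value for row in rows)


# Hypothetical registry mapping expectation names to plain-Python checks.
CHECKS = {
    "expect_column_values_to_not_be_null": not_null,
    "expect_column_values_to_be_between": values_between,
}


def validate(rows, expectations):
    """Run every expectation against `rows` and summarize the outcome."""
    results = [
        {"expectation_type": e["expectation_type"],
         "success": CHECKS[e["expectation_type"]](rows, **e["kwargs"])}
        for e in expectations
    ]
    return {"success": all(r["success"] for r in results), "results": results}


expectations = [
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "user_id"}},
    {"expectation_type": "expect_column_values_to_be_between",
     "kwargs": {"column": "age", "min_value": 0, "max_value": 120}},
]

good_batch = [{"user_id": 1, "age": 30}, {"user_id": 2, "age": 41}]
bad_batch = [{"user_id": None, "age": 30}, {"user_id": 3, "age": 150}]

good_report = validate(good_batch, expectations)
bad_report = validate(bad_batch, expectations)
print(good_report["success"], bad_report["success"])
```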
### Go to [integrate_validation_into_pipeline.ipynb](integrate_validation_into_pipeline.ipynb) to see how!
