In [None]:
import json
import os
import great_expectations as ge
import great_expectations.jupyter_ux
import pandas as pd

# Author Expectations



[**Watch a short tutorial video**](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#video)

[**Read more in the tutorial**](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations)

**Reach out for help on** [**Great Expectations Slack**](https://greatexpectations.io/slack)


### Get a DataContext object
[Read more in the tutorial](https://great-expectations.readthedocs.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#get-datacontext-object)




In [None]:
context = ge.data_context.DataContext()

### List data assets in your project

[Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#data-assets)


In [None]:
great_expectations.jupyter_ux.list_available_data_asset_names(context)

### Load a batch of data from the data asset you want to validate

Learn about `get_batch` in [this tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#get-batch)

__Quick Guide:__

##### If you want to validate data in pandas DataFrames or Spark DataFrames:

* A. If GE listed and profiled your files correctly:

```
data_asset_name = "CHOOSE FROM THE LIST ABOVE"
batch = context.get_batch(data_asset_name,
                          expectation_suite_name)
```
* B. Otherwise (you want to control the logic of reading the data):

```
# Load the data into a dataframe yourself, e.g., df = SparkDFDataset(spark.read.csv... or pd.read_csv(...
data_asset_name = "COME UP WITH A NAME - THIS WILL CREATE A NEW DATA ASSET"
batch = context.get_batch(data_asset_name,
                          expectation_suite_name,
                          df)
```


##### If you want to validate data in a database:

* A. To validate an existing table:

```
data_asset_name = 'CHOOSE THE NAME OF YOUR TABLE FROM THE LIST OF DATA ASSETS ABOVE'
df = context.get_batch(data_asset_name,
                       expectation_suite_name='my_suite')
```

* B. To validate a query result set:

```
data_asset_name = 'NAME YOUR QUERY (E.G., daily_users_query) - THIS WILL CREATE A NEW DATA ASSET'
df = context.get_batch(data_asset_name,
                       expectation_suite_name='my_suite',
                       query='SQL FOR YOUR QUERY')
```





In [None]:
df = context.get_batch(COPY THE APPROPRIATE CODE SNIPPET FROM THE CELL ABOVE)
df.head()


#### Optionally, customize how your data is read (e.g., separators, headers) by setting reader options in `get_batch`

[Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#reader-options)
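
Reader options are parsing settings such as the separator and the header row. As a rough, standalone illustration of why they matter, the sketch below calls `pd.read_csv` directly rather than going through `get_batch`. The sample data and column names are made up for this example, and how the options are forwarded by `get_batch` is covered in the linked tutorial, not here.

```python
import io

import pandas as pd

# Hypothetical sample: semicolon-separated values with no header row.
raw = "1;alice;3.5\n2;bob;4.0\n"

# Default options: comma separator, first row treated as the header.
# The whole first line becomes a single column name.
naive = pd.read_csv(io.StringIO(raw))

# Explicit options: semicolon separator, no header row, names supplied by us.
parsed = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    header=None,
    names=["id", "name", "score"],  # illustrative column names
)

print(naive.shape)
print(parsed.shape)
```

With the defaults, the file is read as one column and one data row. With the explicit options, it is read as two rows and three columns.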



In [None]:
# this is how you can see which data batch was loaded
df._batch_kwargs

## Author Expectations

[Read more in the tutorial](https://docs.greatexpectations.io/en/latest/getting_started/create_expectations.html?utm_source=notebook&utm_medium=create_expectations#create-expectations)

See available expectations in the [expectation glossary](https://docs.greatexpectations.io/en/latest/glossary.html?utm_source=notebook&utm_medium=create_expectations)


In [None]:
# Example: expect the first column of the batch to contain no null values

column_name = df.get_table_columns()[0]
df.expect_column_values_to_not_be_null(column_name)
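
To build intuition for what a not-null expectation checks, here is a minimal sketch in plain pandas. It is not how Great Expectations implements the expectation: `check_not_null` is a hypothetical helper, the result keys are chosen for illustration, and the sample data is invented.

```python
import pandas as pd


def check_not_null(frame, column):
    """Hypothetical helper: report whether `column` has no missing values."""
    values = frame[column]
    unexpected = int(values.isnull().sum())  # number of missing entries
    return {
        "success": unexpected == 0,
        "element_count": int(len(values)),
        "unexpected_count": unexpected,
    }


# Invented sample data: "id" is complete, "email" has one missing value.
sample = pd.DataFrame(
    {"id": [1, 2, 3], "email": ["a@x.org", None, "c@x.org"]}
)

id_result = check_not_null(sample, "id")
email_result = check_not_null(sample, "email")

print(id_result)
print(email_result)
```

In this sketch, the check on `id` succeeds and the check on `email` fails with one unexpected value.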


In [None]:
# add more expectations here


In [None]:
# add more expectations here


In [None]:
# add more expectations here

### Review the expectations

Each expectation that held true for this data batch was added to the expectation suite. To view all the expectations you have added so far for this data asset, run:

In [None]:
df.get_expectation_suite()

    
    
If you decide not to save some of the expectations you created, use the [`remove_expectation` method](https://docs.greatexpectations.io/en/latest/module_docs/data_asset_module.html?highlight=remove_expectation&utm_source=notebook&utm_medium=create_expectations#great_expectations.data_asset.data_asset.DataAsset.remove_expectation).


The following call saves the expectation suite as a JSON file in the `great_expectations/expectations` directory of your project:
    

In [None]:
df.save_expectation_suite() 
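
Because the suite is stored as JSON, it can be inspected with the standard `json` module. The sketch below parses an inline stand-in rather than a real file. The key names (`expectations`, `expectation_type`, `kwargs`) and all values are assumptions made for illustration, so check them against the file actually written to your project.

```python
import json

# Hypothetical suite contents, standing in for a saved suite file.
# The structure and key names are assumptions, not a documented schema.
suite_json = """
{
  "data_asset_name": "my_data_asset",
  "expectation_suite_name": "my_suite",
  "expectations": [
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "id"}},
    {"expectation_type": "expect_column_values_to_be_unique",
     "kwargs": {"column": "id"}}
  ]
}
"""

suite = json.loads(suite_json)

# Collect the type of each stored expectation, in file order.
expectation_types = [e["expectation_type"] for e in suite["expectations"]]

# Collect the distinct columns the expectations refer to.
columns = sorted({e["kwargs"]["column"] for e in suite["expectations"]})

print(expectation_types)
print(columns)
```

To inspect a real suite, replace `json.loads(suite_json)` with `json.load` on the file opened from your project's expectations directory.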

### You have created and saved expectations for at least one of your data assets.

### Next, we will show you how to set up validation: the process of checking whether new files of this type conform to your expectations before your pipeline's code processes them.

### Go to [integrate_validation_into_pipeline.ipynb](integrate_validation_into_pipeline.ipynb) to proceed.


