# Using Great Expectations for Model Development

As your data products and models are developed, you can encode assumptions about input and output datasets as **expectations**.

Using that workflow provides the following benefits:

1. These are machine-verifiable and can be used to monitor data flowing through your pipelines.
2. These eliminate poisonous implicit assumptions that force data engineers to redo work and waste time, for example "How do we define visits?"
3. These **will eventually** be easy to edit.
4. These **will eventually** be easy to reason about visually.

In [1]:
import json
import os

import great_expectations as ge
import pandas as pd

## Initialize a DataContext

A Great Expectations `DataContext` represents the collection of data asset specifications in this project.

You'll need:
- the directory where you ran `great_expectations init` (where the `.great_expectations.yml` file is).
- dbt profile and target information in the datasources section of your great_expectations configuration.
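If the notebook sits at an unknown depth below the project root, the directory can be located by walking up the tree until the config file is found. This is a minimal sketch, assuming the file name given above; `find_project_root` is a hypothetical helper written for illustration and is not part of Great Expectations.

```python
import os


def find_project_root(start_dir, marker=".great_expectations.yml"):
    """Walk upward from start_dir until a directory containing `marker` is found.

    Hypothetical helper for illustration; returns None if no ancestor has the file.
    """
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isfile(os.path.join(current, marker)):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent
```

The returned path could then be passed to `ge.data_context.DataContext(...)` in place of a hard-coded relative path such as `'../../'`.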

In [2]:
context = ge.data_context.DataContext('../../')

## Get a Dataset

Using the data context, provide the name of the datasource configured in your project config ("local-data" in this case) and the name of the data asset to connect to ("Titanic.csv" in this case).

In [6]:
df = context.get_data_asset("local-data", "Titanic.csv")

In [7]:
df.get_expectations_config()

	0 failing expectations
	0 result_format kwargs
	0 include_configs kwargs
	0 catch_exceptions kwargs
If you wish to change this behavior, please set discard_failed_expectations, discard_result_format_kwargs, discard_include_configs_kwargs, and discard_catch_exceptions_kwargs appropirately.


{'data_asset_name': 'Titanic.csv',
 'meta': {'great_expectations.__version__': '0.5.1__develop__sch_internal'},
 'expectations': [],
 'data_asset_type': 'Dataset'}

## Declare Expectations

In [8]:
df.head()

| Unnamed: 0.1 | Unnamed: 0 | Name | PClass | Age | Sex | Survived | SexCode |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Allen, Miss Elisabeth Walton | 1st | 29.0 | female | 1 | 1 |
| 1 | 2 | Allison, Miss Helen Loraine | 1st | 2.0 | female | 0 | 1 |
| 2 | 3 | Allison, Mr Hudson Joshua Creighton | 1st | 30.0 | male | 0 | 0 |
| 3 | 4 | Allison, Mrs Hudson JC (Bessie Waldo Daniels) | 1st | 25.0 | female | 0 | 1 |
| 4 | 5 | Allison, Master Hudson Trevor | 1st | 0.92 | male | 1 | 0 |


In [17]:
df.expect_column_values_to_be_in_set('Sex', ['female', 'male'], include_config=True)
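To make concrete what this expectation asserts, here is a plain-pandas approximation of a values-in-set check. It is a sketch, not the library's implementation: `check_values_in_set`, the keys of the returned dictionary, and the choice to ignore missing values are all assumptions made for illustration.

```python
import pandas as pd


def check_values_in_set(frame, column, allowed):
    """Plain-pandas approximation of a values-in-set check (not the library's code)."""
    values = frame[column].dropna()  # assumption: missing values are ignored
    unexpected = values[~values.isin(allowed)]
    return {
        "success": bool(unexpected.empty),
        "unexpected_count": int(unexpected.size),
        "unexpected_values": unexpected.tolist(),
    }


# Small hand-made frame; the last row deliberately violates the assumption.
sample = pd.DataFrame({"Sex": ["female", "male", "female", "unknown"]})
result = check_values_in_set(sample, "Sex", ["female", "male"])
print(result)
```

The point of the sketch is that an expectation is both a statement about the data and a check that can be re-run on every new batch.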

In [11]:
df.save_expectations_config()

	0 failing expectations
	1 result_format kwargs
	0 include_configs kwargs
	0 catch_exceptions kwargs
If you wish to change this behavior, please set discard_failed_expectations, discard_result_format_kwargs, discard_include_configs_kwargs, and discard_catch_exceptions_kwargs appropirately.


### The expectation collection for this dataset is saved as a JSON file in the great_expectations/data_asset_configurations folder in the current project - let's commit it.
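Because the saved configuration is ordinary JSON, it can be inspected or diffed before committing. The sketch below round-trips the configuration dictionary shown earlier through a temporary directory. The file name `Titanic.csv.json` and the temporary directory are assumptions for illustration and may not match the layout the library actually writes.

```python
import json
import os
import tempfile

# Configuration dictionary as printed earlier in this notebook.
config = {
    "data_asset_name": "Titanic.csv",
    "meta": {"great_expectations.__version__": "0.5.1__develop__sch_internal"},
    "expectations": [],
    "data_asset_type": "Dataset",
}

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "Titanic.csv.json")  # hypothetical file name

# Sorted keys and fixed indentation keep version-control diffs small and stable.
with open(path, "w") as f:
    json.dump(config, f, indent=2, sort_keys=True)

with open(path) as f:
    reloaded = json.load(f)
```

Writing with sorted keys is a design choice for readable diffs, not something the original workflow requires.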