# Scaffold a new Expectation Suite (Experimental)
This process saves you from writing boilerplate when authoring a suite: you select the columns you care about, and a profiler writes candidate expectations for you to adjust.

**Expectation Suite Name**: `taxi1.demo`

We'd love it if you'd **reach out to us on** the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack)!

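```python
import datetime
import great_expectations as ge
# Importing jupyter_ux enables notebook-friendly logging (see the INFO line in the output).
import great_expectations.jupyter_ux
from great_expectations.profile import BasicSuiteBuilderProfiler
from great_expectations.data_context.types.resource_identifiers import (
    ValidationResultIdentifier,
)

# Load the Great Expectations project found in (or above) the current directory.
context = ge.data_context.DataContext()

# Create an empty suite, replacing any existing suite with the same name.
expectation_suite_name = "taxi1.demo"
suite = context.create_expectation_suite(
    expectation_suite_name, overwrite_existing=True
)

# Describe the data to load: a local CSV file read with the "read_csv"
# reader method through the "taxi.data" datasource.
batch_kwargs = {
    "path": "D:\\GE_Handson\\great_expectations\\data_taxi\\yellow_tripdata_2020-01 (1).csv",
    "datasource": "taxi.data",
    "reader_method": "read_csv",
    "data_asset_name": "yellow_tripdata_2020-01 (1)",
}
# Load the data as a batch tied to the suite and preview the first rows.
batch = context.get_batch(batch_kwargs, suite)
batch.head()
```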

```
2020-09-27T22:00:39+0530 - INFO - Great Expectations logging enabled at 20 level by JupyterUX module.
```

| Unnamed: 0 | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 2020-01-01 00:28:15 | 2020-01-01 00:33:03 | 1.0 | 1.2 | 1.0 | N | 238 | 239 | 1.0 | 6.0 | 3.0 | 0.5 | 1.47 | 0.0 | 0.3 | 11.27 | 2.5 |
| 1 | 1.0 | 2020-01-01 00:35:39 | 2020-01-01 00:43:04 | 1.0 | 1.2 | 1.0 | N | 239 | 238 | 1.0 | 7.0 | 3.0 | 0.5 | 1.5 | 0.0 | 0.3 | 12.3 | 2.5 |
| 2 | 1.0 | 2020-01-01 00:47:41 | 2020-01-01 00:53:52 | 1.0 | 0.6 | 1.0 | N | 238 | 238 | 1.0 | 6.0 | 3.0 | 0.5 | 1.0 | 0.0 | 0.3 | 10.8 | 2.5 |
| 3 | 1.0 | 2020-01-01 00:55:23 | 2020-01-01 01:00:14 | 1.0 | 0.8 | 1.0 | N | 238 | 151 | 1.0 | 5.5 | 0.5 | 0.5 | 1.36 | 0.0 | 0.3 | 8.16 | 0.0 |
| 4 | 2.0 | 2020-01-01 00:01:58 | 2020-01-01 00:04:16 | 1.0 | 0.0 | 1.0 | N | 193 | 193 | 2.0 | 3.5 | 0.5 | 0.5 | 0.0 | 0.0 | 0.3 | 4.8 | 0.0 |
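The `batch_kwargs` in the first cell hard-code a Windows path with doubled backslashes and repeat the file name in `data_asset_name`. A minimal sketch, assuming you want to derive both from a single raw-string path: `make_batch_kwargs` is a hypothetical helper, not part of Great Expectations, and it only builds the same dictionary that the cell passes to `context.get_batch`.

```python
from pathlib import PureWindowsPath


def make_batch_kwargs(path: str, datasource: str) -> dict:
    """Hypothetical helper: build batch_kwargs for a CSV file.

    PureWindowsPath parses a Windows path on any OS without touching the
    filesystem; .stem drops the ".csv" suffix to give the asset name.
    """
    csv_path = PureWindowsPath(path)
    return {
        "path": str(csv_path),
        "datasource": datasource,
        "reader_method": "read_csv",
        "data_asset_name": csv_path.stem,
    }


# A raw string avoids escaping every backslash by hand.
batch_kwargs = make_batch_kwargs(
    r"D:\GE_Handson\great_expectations\data_taxi\yellow_tripdata_2020-01 (1).csv",
    datasource="taxi.data",
)
print(batch_kwargs["data_asset_name"])  # yellow_tripdata_2020-01 (1)
```

Deriving `data_asset_name` from the path keeps the two values from drifting apart if you later point the notebook at a different month's file.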


## Select the columns on which you would like to scaffold expectations

Great Expectations will choose which expectations might make sense for a column based on the **data type** and **cardinality** of the data in each selected column.
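To make **cardinality** concrete, here is a toy sketch that buckets a column by how many distinct non-null values it has. The bucket names and the `few_threshold` cutoff are invented for illustration; they are not the rules `BasicSuiteBuilderProfiler` actually applies.

```python
def cardinality_bucket(values, few_threshold=10):
    """Toy classifier with hypothetical thresholds: label a column by distinct-value count."""
    non_null = [v for v in values if v is not None]
    n_unique = len(set(non_null))
    if n_unique == 0:
        return "none"
    if n_unique == 1:
        return "one"
    if n_unique == len(non_null):
        return "unique"  # every value differs, e.g. timestamps or IDs
    if n_unique <= few_threshold:
        return "few"  # a value-set check such as expect_column_values_to_be_in_set might fit
    return "many"  # a range check on min/max might fit better


# payment_type from the five preview rows above
payment_type = [1.0, 1.0, 1.0, 1.0, 2.0]
print(cardinality_bucket(payment_type))  # few
```

A five-row preview is far too small to judge cardinality; the profiler looks at the whole batch.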

Uncomment the columns that matter to you. To toggle comments on several lines at
once, select them and use the Jupyter keyboard shortcut: **Linux/Windows**:
`Ctrl-/`, **macOS**: `Cmd-/`

```python
included_columns = [
    # 'VendorID',
    # 'tpep_pickup_datetime',
    # 'tpep_dropoff_datetime',
    'passenger_count',
    # 'trip_distance',
    # 'RatecodeID',
    # 'store_and_fwd_flag',
    # 'PULocationID',
    # 'DOLocationID',
    'payment_type',
    # 'fare_amount',
    # 'extra',
    # 'mta_tax',
    # 'tip_amount',
    # 'tolls_amount',
    # 'improvement_surcharge',
    # 'total_amount',
    # 'congestion_surcharge'
]
```
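Uncommenting works, but a typo in a column name is easy to miss. A minimal sketch of a guard, assuming you copy the column names from the `head()` output above: `check_columns` and `ALL_COLUMNS` are hypothetical names, not Great Expectations API. Because this batch is pandas-backed, `list(batch.columns)` should give you the same list without copying it by hand.

```python
ALL_COLUMNS = [
    "Unnamed: 0", "VendorID", "tpep_pickup_datetime", "tpep_dropoff_datetime",
    "passenger_count", "trip_distance", "RatecodeID", "store_and_fwd_flag",
    "PULocationID", "DOLocationID", "payment_type", "fare_amount", "extra",
    "mta_tax", "tip_amount", "tolls_amount", "improvement_surcharge",
    "total_amount", "congestion_surcharge",
]


def check_columns(selected, available):
    """Hypothetical guard: fail fast if a selected column does not exist."""
    available_set = set(available)
    missing = [column for column in selected if column not in available_set]
    if missing:
        raise ValueError(f"Unknown column(s): {missing}")
    return list(selected)


included_columns = check_columns(["passenger_count", "payment_type"], ALL_COLUMNS)
```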

## Run the scaffolder

The suites generated here are **not meant to be production suites**; they are **scaffolds to build upon**.

**To get to a production grade suite, you will definitely want to [edit this
suite](https://docs.greatexpectations.io/en/latest/how_to_guides/creating_and_editing_expectations/how_to_edit_an_expectation_suite_using_a_disposable_notebook.html)
after scaffolding gets you close to what you want.**

The scaffolder is highly configurable depending on your goals. You can include or exclude
columns, and include or exclude expectation types (when applicable). [The
Expectation Glossary](https://docs.greatexpectations.io/en/latest/reference/glossary_of_expectations.html?utm_source=notebook&utm_medium=scaffold_expectations)
lists the available expectation types.
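The four keys of `scaffold_config` come straight from the scaffolder cell. A small sketch, assuming you want to catch contradictory settings before calling the profiler: `build_scaffold_config` is a hypothetical helper that drops unused keys (as commenting them out does) and rejects a column or expectation listed as both included and excluded. `expect_column_mean_to_be_between` is a real expectation type from the glossary, used here only as an example.

```python
def build_scaffold_config(included_columns=None, excluded_columns=None,
                          included_expectations=None, excluded_expectations=None):
    """Hypothetical helper: assemble a scaffold_config dict, omitting empty keys."""
    # Reject anything that is both included and excluded.
    for kind, included, excluded in (
        ("column", included_columns, excluded_columns),
        ("expectation", included_expectations, excluded_expectations),
    ):
        overlap = set(included or []) & set(excluded or [])
        if overlap:
            raise ValueError(f"{kind}(s) both included and excluded: {sorted(overlap)}")

    candidates = {
        "included_columns": included_columns,
        "excluded_columns": excluded_columns,
        "included_expectations": included_expectations,
        "excluded_expectations": excluded_expectations,
    }
    # Expectation names must be plain strings, as the notebook comment notes.
    for key in ("included_expectations", "excluded_expectations"):
        if any(not isinstance(name, str) for name in candidates[key] or []):
            raise TypeError(f"{key} must contain expectation names as strings")

    # Keep only the keys that were actually set.
    return {key: value for key, value in candidates.items() if value}


scaffold_config = build_scaffold_config(
    included_columns=["passenger_count", "payment_type"],
    excluded_expectations=["expect_column_mean_to_be_between"],
)
print(scaffold_config)
```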

```python
# Wipe the suite clean to prevent unwanted expectations in the batch
suite = context.create_expectation_suite(expectation_suite_name, overwrite_existing=True)
batch = context.get_batch(batch_kwargs, suite)

# In the scaffold_config, included or excluded expectation names should be strings.
scaffold_config = {
    "included_columns": included_columns,
    # "excluded_columns": [],
    # "included_expectations": [],
    # "excluded_expectations": [],
}
# Profile the batch: returns the scaffolded suite and the validation results
# produced while profiling.
suite, evr = BasicSuiteBuilderProfiler().profile(batch, profiler_configuration=scaffold_config)
```

```
2020-09-27T22:23:57+0530 - INFO - 	9 expectation(s) included in expectation_suite.
```


## Save & review the scaffolded Expectation Suite

Save the scaffolded Expectation Suite as a JSON file in the
`great_expectations/expectations` directory of your project, validate the batch
against it, and rebuild the Data Docs site so you can review the scaffolded suite.

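```python
# Write the suite to the project's expectations store as JSON.
context.save_expectation_suite(suite, expectation_suite_name)

# Validate the batch against the suite with the "action_list_operator" validation operator.
results = context.run_validation_operator("action_list_operator", assets_to_validate=[batch])
validation_result_identifier = results.list_validation_result_identifiers()[0]

# Rebuild Data Docs and open the page for this validation result.
context.build_data_docs()
context.open_data_docs(validation_result_identifier)
```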

```
2020-09-27T22:28:55+0530 - INFO - Setting run_name to: 20200927T165855.664769Z
2020-09-27T22:28:56+0530 - INFO - 	9 expectation(s) included in expectation_suite.
```
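Before or alongside Data Docs, a quick text summary of what the scaffolder produced can help. A minimal sketch, assuming the saved suite JSON has an `expectations` list whose entries carry `expectation_type` and `kwargs`: `summarize_suite` is a hypothetical helper, and the sample suite below is invented for illustration, not this notebook's actual output. In practice you would `json.load` the file that the save step wrote under `great_expectations/expectations`.

```python
import json
from collections import defaultdict


def summarize_suite(suite_dict):
    """Hypothetical helper: map each column (or "<table>") to its expectation types."""
    summary = defaultdict(list)
    for expectation in suite_dict.get("expectations", []):
        # Table-level expectations have no "column" kwarg.
        column = expectation.get("kwargs", {}).get("column", "<table>")
        summary[column].append(expectation["expectation_type"])
    return dict(summary)


# Invented sample in the suite JSON layout; not the real output of this notebook.
sample_suite = json.loads("""
{
  "expectation_suite_name": "taxi1.demo",
  "expectations": [
    {"expectation_type": "expect_table_row_count_to_be_between",
     "kwargs": {"min_value": 1, "max_value": 10}},
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "passenger_count"}},
    {"expectation_type": "expect_column_values_to_be_in_set",
     "kwargs": {"column": "payment_type", "value_set": [1.0, 2.0]}}
  ]
}
""")

for column, types in summarize_suite(sample_suite).items():
    print(f"{column}: {', '.join(types)}")
```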


## Next steps
After you review this scaffolded Expectation Suite in Data Docs, edit it to make
finer-grained adjustments to the expectations. You can do this by running
`great_expectations suite edit taxi1.demo`.