# Edit Your Expectation Suite
Use this notebook to recreate and modify your expectation suite:

**Expectation Suite Name**: `taxi1.demo`

We'd love it if you **reach out to us on** the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack).

In [1]:
import datetime
import great_expectations as ge
import great_expectations.jupyter_ux
from great_expectations.data_context.types.resource_identifiers import (
    ValidationResultIdentifier,
)

context = ge.data_context.DataContext()

# Feel free to change the name of your suite here. Renaming this will not
# remove the other one.
expectation_suite_name = "taxi1.demo"
suite = context.get_expectation_suite(expectation_suite_name)
suite.expectations = []

batch_kwargs = {
    "data_asset_name": "yellow_tripdata_2020-01 (1)",
    "datasource": "taxi.data",
    "path": "D:\\GE_Handson\\great_expectations\\data_taxi\\yellow_tripdata_2020-01 (1).csv",
    "reader_method": "read_csv",
}
batch = context.get_batch(batch_kwargs, suite)
batch.head()

2020-09-27T22:45:55+0530 - INFO - Great Expectations logging enabled at 20 level by JupyterUX module.


| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 2020-01-01 00:28:15 | 2020-01-01 00:33:03 | 1.0 | 1.2 | 1.0 | N | 238 | 239 | 1.0 | 6.0 | 3.0 | 0.5 | 1.47 | 0.0 | 0.3 | 11.27 | 2.5 |
| 1 | 1.0 | 2020-01-01 00:35:39 | 2020-01-01 00:43:04 | 1.0 | 1.2 | 1.0 | N | 239 | 238 | 1.0 | 7.0 | 3.0 | 0.5 | 1.5 | 0.0 | 0.3 | 12.3 | 2.5 |
| 2 | 1.0 | 2020-01-01 00:47:41 | 2020-01-01 00:53:52 | 1.0 | 0.6 | 1.0 | N | 238 | 238 | 1.0 | 6.0 | 3.0 | 0.5 | 1.0 | 0.0 | 0.3 | 10.8 | 2.5 |
| 3 | 1.0 | 2020-01-01 00:55:23 | 2020-01-01 01:00:14 | 1.0 | 0.8 | 1.0 | N | 238 | 151 | 1.0 | 5.5 | 0.5 | 0.5 | 1.36 | 0.0 | 0.3 | 8.16 | 0.0 |
| 4 | 2.0 | 2020-01-01 00:01:58 | 2020-01-01 00:04:16 | 1.0 | 0.0 | 1.0 | N | 193 | 193 | 2.0 | 3.5 | 0.5 | 0.5 | 0.0 | 0.0 | 0.3 | 4.8 | 0.0 |


## Create & Edit Expectations

Add expectations by calling specific expectation methods on the `batch` object. They all begin with `.expect_`, which makes them easy to find with tab autocompletion.

You can see all the available expectations in the **[expectation glossary](https://docs.greatexpectations.io/en/latest/reference/glossary_of_expectations.html?utm_source=notebook&utm_medium=create_expectations)**.

### Table Expectation(s)

In [None]:
batch.expect_table_row_count_to_be_between(max_value=7045508, min_value=5764507)
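
The bounds in this cell match the batch's row count (6,405,008, the `element_count` reported in the results further down) scaled down and up by 10% and truncated to integers. That the bounds were derived this way is an assumption inferred from the numbers; the sketch below only checks the arithmetic, and the variable names are illustrative.

```python
# Sketch: reproduce the row-count bounds as -10% / +10% of the observed row count.
# The 10% margin is an assumption inferred from the values in the cell above.
observed_rows = 6405008  # element_count reported in this notebook's results

# Integer arithmetic avoids floating-point rounding surprises.
min_value = observed_rows * 9 // 10
max_value = observed_rows * 11 // 10

print(min_value, max_value)
```

If your data volume varies more than this from month to month, widening the margin is the simplest adjustment.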

In [None]:
batch.expect_table_column_count_to_equal(value=18)

In [None]:
batch.expect_table_columns_to_match_ordered_list(
    column_list=[
        "VendorID",
        "tpep_pickup_datetime",
        "tpep_dropoff_datetime",
        "passenger_count",
        "trip_distance",
        "RatecodeID",
        "store_and_fwd_flag",
        "PULocationID",
        "DOLocationID",
        "payment_type",
        "fare_amount",
        "extra",
        "mta_tax",
        "tip_amount",
        "tolls_amount",
        "improvement_surcharge",
        "total_amount",
        "congestion_surcharge",
    ]
)

### Column Expectation(s)

#### `passenger_count`

In [2]:
batch.expect_column_values_to_not_be_null(column="passenger_count")

{
  "result": {
    "element_count": 6405008,
    "unexpected_count": 65441,
    "unexpected_percent": 1.021716132126611,
    "partial_unexpected_list": []
  },
  "meta": {},
  "exception_info": null,
  "success": false
}
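
The `unexpected_percent` in this result is the share of rows that violated the expectation. As a quick check, it can be recomputed from the two counts in the same result; this is plain arithmetic, not a Great Expectations call.

```python
# Sketch: recompute `unexpected_percent` from the counts in the result above.
element_count = 6405008    # total rows in the batch
unexpected_count = 65441   # rows where passenger_count is null

unexpected_percent = 100.0 * unexpected_count / element_count
print(unexpected_percent)
```

About 1% of rows have no `passenger_count`, which is why this expectation fails unless you relax it.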

In [3]:
batch.expect_column_distinct_values_to_be_in_set(
    column="passenger_count",
    value_set=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
)

{
  "result": {
    "observed_value": [
      0.0,
      1.0,
      2.0,
      3.0,
      4.0,
      5.0,
      6.0,
      7.0,
      8.0,
      9.0
    ],
    "element_count": 6405008,
    "missing_count": 65441,
    "missing_percent": 1.0217161321266108
  },
  "meta": {},
  "exception_info": null,
  "success": false
}
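
This expectation fails because the observed distinct values are not a subset of `value_set`. A minimal sketch with plain Python sets, using the values from the result above, shows which ones are responsible:

```python
# Sketch: find which observed distinct values fall outside the allowed set.
observed_values = {0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0}
value_set = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0}

# Values present in the data but missing from the allowed set.
unexpected_values = sorted(observed_values - value_set)
success = not unexpected_values

print(unexpected_values, success)
```

Whether 0, 7, 8, and 9 passengers are data errors or legitimate trips is a judgment call for your dataset; either extend `value_set` or keep the expectation as a flag for those rows.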

In [4]:
batch.expect_column_values_to_be_between(
    column="passenger_count", min_value=1, max_value=6
)

{
  "result": {
    "element_count": 6405008,
    "missing_count": 65441,
    "missing_percent": 1.021716132126611,
    "unexpected_count": 114367,
    "unexpected_percent": 1.7855871530527363,
    "unexpected_percent_nonmissing": 1.8040191072986531,
    "partial_unexpected_list": [
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0,
      0.0
    ]
  },
  "meta": {},
  "exception_info": null,
  "success": false
}
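
The result reports two percentages that differ only in their denominator: `unexpected_percent` divides by all rows, while `unexpected_percent_nonmissing` divides by the rows that have a value. The sketch below recomputes both from the counts above as a plain arithmetic check.

```python
# Sketch: recompute both percentages from the counts in the result above.
element_count = 6405008    # total rows in the batch
missing_count = 65441      # rows where passenger_count is null
unexpected_count = 114367  # non-null rows outside the range [1, 6]

nonmissing_count = element_count - missing_count

unexpected_percent = 100.0 * unexpected_count / element_count
unexpected_percent_nonmissing = 100.0 * unexpected_count / nonmissing_count

print(unexpected_percent, unexpected_percent_nonmissing)
```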

#### `payment_type`

In [5]:
batch.expect_column_values_to_not_be_null(column="payment_type")

{
  "result": {
    "element_count": 6405008,
    "unexpected_count": 65441,
    "unexpected_percent": 1.021716132126611,
    "partial_unexpected_list": []
  },
  "meta": {},
  "exception_info": null,
  "success": false
}

In [6]:
batch.expect_column_distinct_values_to_be_in_set(
    column="payment_type", value_set=[1.0, 2.0, 3.0, 4.0, 5.0]
)

{
  "result": {
    "observed_value": [
      1.0,
      2.0,
      3.0,
      4.0,
      5.0
    ],
    "element_count": 6405008,
    "missing_count": 65441,
    "missing_percent": 1.0217161321266108
  },
  "meta": {},
  "exception_info": null,
  "success": true
}

In [None]:
batch.expect_column_kl_divergence_to_be_less_than(
    column="payment_type",
    partition_object={
        "values": [1.0, 2.0, 3.0, 4.0, 5.0],
        "weights": [
            0.7405706099485975,
            0.25141054586220163,
            0.005169122749235082,
            0.002849563700486169,
            1.577394796837071e-07,
        ],
    },
    threshold=0.6,
)
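
To get a feel for the `threshold`, it can help to compute a KL divergence by hand. The sketch below assumes the standard definition, KL(observed || expected) in nats, applied to category frequencies; the direction and the handling of zero weights inside Great Expectations are assumptions here, and `observed_weights` is made-up illustrative data, not taken from the taxi file.

```python
import math


def kl_divergence(observed, expected):
    """Return KL(observed || expected) in nats for two discrete distributions."""
    return sum(
        p * math.log(p / q)
        for p, q in zip(observed, expected)
        if p > 0  # terms with p == 0 contribute nothing to the sum
    )


# Expected weights copied from the partition object above.
expected_weights = [
    0.7405706099485975,
    0.25141054586220163,
    0.005169122749235082,
    0.002849563700486169,
    1.577394796837071e-07,
]

# Hypothetical observed frequencies for a new batch (illustrative only).
observed_weights = [0.70, 0.29, 0.006, 0.004, 0.0]

divergence = kl_divergence(observed_weights, expected_weights)
print(divergence, divergence < 0.6)
```

A shift of a few percentage points between the two largest categories yields a divergence far below 0.6, so under these assumptions the threshold in the cell above is quite permissive.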

## Save & Review Your Expectations

Let's save the expectation suite as a JSON file in the `great_expectations/expectations` directory of your project.
If you decide not to save some of the expectations you created, use the [`remove_expectation` method](https://docs.greatexpectations.io/en/latest/module_docs/data_asset_module.html?highlight=remove_expectation&utm_source=notebook&utm_medium=edit_expectations#great_expectations.data_asset.data_asset.DataAsset.remove_expectation).

Let's now rebuild your Data Docs, which help you communicate about your data with both machines and humans.

In [7]:
batch.save_expectation_suite(discard_failed_expectations=False)

results = context.run_validation_operator("action_list_operator", assets_to_validate=[batch])
validation_result_identifier = results.list_validation_result_identifiers()[0]
context.build_data_docs()
context.open_data_docs(validation_result_identifier)

2020-09-27T22:57:01+0530 - INFO - 	5 expectation(s) included in expectation_suite. result_format settings filtered.
2020-09-27T22:57:01+0530 - INFO - Setting run_name to: 20200927T172701.914099Z
2020-09-27T22:57:01+0530 - INFO - 	5 expectation(s) included in expectation_suite.
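
The log reports 5 expectations because `discard_failed_expectations=False` keeps every expectation that was run, including the four that failed on this batch. The sketch below models that filtering with plain Python; `expectations_to_save` is a hypothetical helper written for illustration, not part of Great Expectations, and the tuples mirror cells `In [2]` to `In [6]` above.

```python
# Sketch: which expectations survive saving, based on each one's last result.
# Each tuple is (expectation, column, success), taken from the cells above.
last_results = [
    ("expect_column_values_to_not_be_null", "passenger_count", False),
    ("expect_column_distinct_values_to_be_in_set", "passenger_count", False),
    ("expect_column_values_to_be_between", "passenger_count", False),
    ("expect_column_values_to_not_be_null", "payment_type", False),
    ("expect_column_distinct_values_to_be_in_set", "payment_type", True),
]


def expectations_to_save(results, discard_failed_expectations):
    """Hypothetical helper: keep everything, or only expectations that passed."""
    if discard_failed_expectations:
        return [r for r in results if r[2]]
    return list(results)


kept_all = expectations_to_save(last_results, discard_failed_expectations=False)
kept_passing = expectations_to_save(last_results, discard_failed_expectations=True)

print(len(kept_all), len(kept_passing))
```

Keeping failed expectations is useful while you are still tuning a suite; discarding them would leave only the `payment_type` value-set check in this run.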
