# Scaffold a new Expectation Suite (Experimental)
This process spares you from writing boilerplate when authoring a suite: you select the columns and other factors you care about, and a profiler writes candidate expectations for you to adjust.

**Expectation Suite Name**: `falab_suite`

We'd love it if you'd **reach out to us on** the [**Great Expectations Slack Channel**](https://greatexpectations.io/slack)!

In [1]:
import great_expectations as ge
import os
import pandas as pd
import sys
from ruamel import yaml
from great_expectations.checkpoint import LegacyCheckpoint
from great_expectations.profile.user_configurable_profiler import UserConfigurableProfiler
from great_expectations.data_context.types.resource_identifiers import ValidationResultIdentifier
from great_expectations.core.batch import BatchRequest, RuntimeBatchRequest

context = ge.data_context.DataContext()

expectation_suite_name = "falab_suite"
gcp_project='testfalab'
dataset='testfalab_table'
credential_path='../../credentials/credentials.json'

# Connection settings: datasource and table to profile
ge_datasource='bigquery'
table='data_test'

column_name='fare_amount'

# Wipe the suite clean to prevent unwanted expectations in the batch
suite = context.create_expectation_suite(expectation_suite_name, overwrite_existing=True)

batch_kwargs = {
    "datasource": ge_datasource,
    # Bare table name; the BigQuery project and dataset are assumed to come
    # from the connection string of the `bigquery` datasource
    "table": table,
}
batch = context.get_batch(batch_kwargs, suite)
batch.head()

DatasourceInitializationError: Cannot initialize datasource bigquery, error: [Errno 2] No such file or directory: './credentials/credentials.json'
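The error above says the `bigquery` datasource could not find its credentials file at `./credentials/credentials.json`, a path resolved relative to the notebook's working directory. Note that the `credential_path` variable defined in the first cell is never passed to the datasource. The following is a minimal sketch for checking the key file before creating the `DataContext`. It assumes the datasource uses the pybigquery SQLAlchemy dialect (connection URLs of the form `bigquery://project/dataset`) and can fall back to Google's Application Default Credentials through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The helper name `prepare_bigquery_connection` is hypothetical. If `great_expectations.yml` hard-codes a credentials path, that entry has to be corrected there instead.

```python
import os
from pathlib import Path


def prepare_bigquery_connection(project: str, dataset: str, credential_path: str) -> str:
    """Hypothetical helper: verify the key file and build a pybigquery URL.

    Assumes the datasource uses the `bigquery://project/dataset` URL format
    and Application Default Credentials via GOOGLE_APPLICATION_CREDENTIALS.
    """
    # Resolve against the current working directory, as relative file opens do.
    key_file = Path(credential_path).expanduser().resolve()
    if not key_file.is_file():
        raise FileNotFoundError(
            f"No credentials at {key_file} (working directory: {Path.cwd()})"
        )
    # Google client libraries read this variable to locate a service-account key.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(key_file)
    return f"bigquery://{project}/{dataset}"
```

Call it with `gcp_project`, `dataset`, and `credential_path` from the first cell before `ge.data_context.DataContext()` runs, and compare the returned URL with the connection string in your datasource configuration.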

## Select the columns on which you would like to scaffold expectations and those which you would like to ignore.

Great Expectations will choose which expectations might make sense for a column based on the **data type** and **cardinality** of the data in each selected column.

Comment out the columns that you want to include; every column left uncommented in `ignored_columns` is skipped by the profiler. You can select multiple lines and
toggle them all at once with a Jupyter keyboard shortcut: **Linux/Windows**:
`Ctrl-/`, **macOS**: `Cmd-/`

In [None]:
ignored_columns = [
    'key',
    # 'fare_amount',
    'pickup_datetime',
    'pickup_longitude',
    # 'pickup_latitude',
    'dropoff_longitude',
    'dropoff_latitude',
    'passenger_count'
]
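If you would rather list the columns to keep than the ones to skip, you can derive the same `ignored_columns` list as a complement. This is a small sketch. The names `all_columns` and `columns_to_profile` are hypothetical, and the column list is copied from the cell above rather than read from the table.

```python
# Every column in the table, in the order used in the cell above.
all_columns = [
    "key",
    "fare_amount",
    "pickup_datetime",
    "pickup_longitude",
    "pickup_latitude",
    "dropoff_longitude",
    "dropoff_latitude",
    "passenger_count",
]

# Columns the profiler should scaffold expectations for.
columns_to_profile = {"fare_amount", "pickup_latitude"}

# Catch typos: a misspelled name would otherwise just be silently ignored.
assert columns_to_profile <= set(all_columns), "unknown column in columns_to_profile"

# Everything else is ignored; list order follows all_columns.
ignored_columns = [c for c in all_columns if c not in columns_to_profile]
```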

## Run the scaffolder

The suites generated here are **not meant to be production suites** - they are **scaffolds to build upon**.

**To get to a production grade suite, you will definitely want to [edit this
suite](https://docs.greatexpectations.io/en/latest/guides/how_to_guides/creating_and_editing_expectations/how_to_edit_an_expectation_suite_using_a_disposable_notebook.html?utm_source=notebook&utm_medium=scaffold_expectations)
after scaffolding gets you close to what you want.**

The profiler is highly configurable depending on your goals.
You can ignore columns or exclude certain expectations, specify a threshold for creating value set expectations, or even specify semantic types for a given column.
You can find more information about [how to configure this profiler, including a list of the expectations that it uses, here.](https://docs.greatexpectations.io/en/latest/guides/how_to_guides/creating_and_editing_expectations/how_to_create_an_expectation_suite_with_the_user_configurable_profiler.html)
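To illustrate those options, you could collect the keyword arguments in a dict and unpack them into the constructor used below. The specific choices here are hypothetical. `excluded_expectations` takes expectation type names such as `expect_column_quantile_values_to_be_between`, and `semantic_types_dict` maps a semantic type (for example `"numeric"`) to a list of column names. The overlap check at the end is an extra sanity check of this sketch, not something the profiler is documented to require.

```python
# Hypothetical profiler settings; pass them with
# UserConfigurableProfiler(profile_dataset=batch, **profiler_kwargs).
profiler_kwargs = {
    "ignored_columns": ["key", "pickup_datetime", "passenger_count"],
    # Expectation types the profiler should not generate.
    "excluded_expectations": [
        "expect_column_quantile_values_to_be_between",
        "expect_column_mean_to_be_between",
    ],
    # Semantic type -> columns it applies to.
    "semantic_types_dict": {
        "numeric": ["fare_amount", "pickup_latitude"],
    },
}

# Sanity check for this sketch: a column should not be both ignored and typed.
typed_columns = {
    column
    for columns in profiler_kwargs["semantic_types_dict"].values()
    for column in columns
}
overlap = typed_columns & set(profiler_kwargs["ignored_columns"])
assert not overlap, f"columns both ignored and typed: {sorted(overlap)}"
```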



In [None]:
profiler = UserConfigurableProfiler(
    profile_dataset=batch,
    ignored_columns=ignored_columns,
    excluded_expectations=None,
    not_null_only=False,
    primary_or_compound_key=False,
    semantic_types_dict=None,
    table_expectations_only=False,
    value_set_threshold="MANY",
)

suite = profiler.build_suite()
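Before saving, it can help to see what the scaffolder produced. This minimal sketch groups expectation types by column. It relies on each entry in `suite.expectations` being an `ExpectationConfiguration` with an `expectation_type` and a `kwargs` dict, where column-level expectations carry a `"column"` key. The helper name `expectations_by_column` is hypothetical.

```python
from collections import defaultdict


def expectations_by_column(expectations):
    """Map each column name to the expectation types scaffolded for it.

    Table-level and multi-column expectations have no "column" kwarg and
    are grouped under None.
    """
    grouped = defaultdict(list)
    for expectation in expectations:
        grouped[expectation.kwargs.get("column")].append(expectation.expectation_type)
    return dict(grouped)
```

In the notebook you would call `expectations_by_column(suite.expectations)` and check that every column you meant to include appears as a key.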

## Save & review the scaffolded Expectation Suite

Let's save the scaffolded Expectation Suite as a JSON file in the
`great_expectations/expectations` directory of your project and rebuild the Data
Docs site to make it easy to review the scaffolded suite.
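To inspect the saved file directly, here is a sketch that assumes the default filesystem expectations store, where dots in a suite name become subdirectories (so `taxi.demo2` would be stored as `expectations/taxi/demo2.json`). The helpers `suite_json_path` and `expectation_types_in_file` are hypothetical, and `context_root` stands for your `great_expectations` directory.

```python
import json
from pathlib import Path


def suite_json_path(context_root, suite_name):
    """Hypothetical helper: path of a suite's JSON file, assuming the default
    expectations store layout (dots in the suite name become directories)."""
    *dirs, leaf = suite_name.split(".")
    return Path(context_root, "expectations", *dirs, f"{leaf}.json")


def expectation_types_in_file(path):
    """Read a saved suite and list the expectation types it contains."""
    with open(path) as f:
        suite_dict = json.load(f)
    return [e["expectation_type"] for e in suite_dict.get("expectations", [])]
```

For example, `expectation_types_in_file(suite_json_path("great_expectations", expectation_suite_name))`, run from the project root, lists what the scaffolder saved.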

In [None]:
context.save_expectation_suite(suite, expectation_suite_name)

results = LegacyCheckpoint(
    name="_temp_checkpoint",
    data_context=context,
    batches=[
        {
            "batch_kwargs": batch_kwargs,
            "expectation_suite_names": [expectation_suite_name],
        }
    ]
).run()
validation_result_identifier = results.list_validation_result_identifiers()[0]
context.build_data_docs()
context.open_data_docs(validation_result_identifier)

## Next steps
After you review this scaffolded Expectation Suite in Data Docs, edit the suite to make
finer-grained adjustments to the expectations.
You can do this by running `great_expectations suite edit falab_suite`.