# Welcome to GX Core 1.0!


### Install the Great Expectations package

Start by installing the Great Expectations package using a pip command. If you have specific dependencies, you can specify them here. Learn more about additional dependencies in our [docs](https://docs.greatexpectations.io/docs/core/set_up_a_gx_environment/install_additional_dependencies). In this example, we'll be using Postgres.

In [None]:
!pip install 'great_expectations[sqlalchemy]'

Import the `great_expectations` and `expectations` modules and instantiate your Data Context. In 1.0, Expectations are top-level classes namespaced to `gxe`. A Data Context defines the storage location for metadata, such as your configurations for Data Sources, Expectation Suites, Checkpoints, and Data Docs. It also contains your Validation Results and the metrics associated with them, and it provides access to those objects in Python, along with other helper functions for the GX Python API. Learn more [here](https://docs.greatexpectations.io/docs/core/set_up_a_gx_environment/create_a_data_context).

In [None]:
import great_expectations as gx
import great_expectations.expectations as gxe

context = gx.get_context()

### Connect to Data

Create your Data Source and Data Asset. In this example, we'll use a publicly available Postgres data source that GX has set up for all to use and test with. It contains New York City (NYC) taxi data from January 2019. The [NYC Taxi data](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page) is a popular open data set that contains records of completed taxi cab trips in NYC, including information such as pickup and drop-off times, the number of passengers, the fare collected, and so on.

In [None]:
ds = context.data_sources.add_postgres("Data Source", connection_string="postgresql+psycopg2://example_user:workshop_example_password@postgres.workshops.greatexpectations.io/gx_example_db")
asset = ds.add_table_asset(name="Data Asset", table_name="nyc_taxi_data")

A new concept (that we'll touch on in greater detail later with partitioning) is the "Batch Definition", which describes what data will be validated in each run. In this example, we'll use our whole asset with the "whole table" Batch Definition. We'll use our Batch Definition to get a concrete Batch we can validate expectations against.

In [None]:
bd = asset.add_batch_definition_whole_table("Batch Definition")

### Create an Expectation 

Let's create our first GX 1.0 Expectation!
Expectation classes are exposed directly in GX 1.0 and are statically typed using [Pydantic](https://docs.pydantic.dev/latest/). IntelliSense will show valid arguments and their types. In this example, we'll create an Expectation that says that our `passenger_count` column should never be greater than 4, since taxis in NYC only have 4 seats.

In [None]:
expectation = gxe.ExpectColumnMaxToBeBetween(column="passenger_count", max_value=4)

### Run a validation

Now that we have completed the configuration, we can validate our data! All we have to do is get a specific Batch of data from our Batch Definition and then call the Batch's `validate()` method, passing in our newly created Expectation.

In [None]:
batch = bd.get_batch()
print(batch.validate(expectation))

### Edit the Expectation 

You'll see that while our validation executed as expected, the expectation itself failed. This is because some taxis in NYC are minivans and can hold up to 6 passengers. In fact, the results show that there are observed values of 6. Fortunately, editing the expectation is quite simple. After we do so, we can re-run the validation and see that it is now passing.

In [None]:
expectation.max_value = 6
print(batch.validate(expectation))
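
To make the failure and the fix concrete, here is a plain-Python sketch of the kind of check a "column max is between" Expectation performs. It is a toy stand-in, not GX's implementation: the function name `check_column_max` and the passenger counts are made up for illustration, and the bounds are assumed to be inclusive.

```python
from typing import Optional, Sequence


def check_column_max(
    values: Sequence[float],
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> dict:
    """Toy 'column max is between' check (hypothetical helper, not part of GX)."""
    observed = max(values)
    # A bound of None means "no limit on that side"; bounds are inclusive here.
    above_min = min_value is None or observed >= min_value
    below_max = max_value is None or observed <= max_value
    return {"success": above_min and below_max, "observed_value": observed}


# Made-up passenger counts for illustration; not the real taxi table.
passenger_counts = [1, 2, 1, 6, 3, 4]

first_try = check_column_max(passenger_counts, max_value=4)
after_edit = check_column_max(passenger_counts, max_value=6)

print(first_try)   # the observed maximum of 6 exceeds the limit of 4
print(after_edit)  # raising the limit to 6 makes the same data pass
```

The data never changes between the two calls; only the limit does, which mirrors editing `max_value` on the Expectation and validating the same Batch again.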

### Create an Expectation Suite

So far we have run a single Expectation on a batch of data. However, you will often want to run a Suite of Expectations. Doing so is simple. Create an Expectation Suite and then add the Expectations you wish to run. Finally, run the `validate` method on your batch, passing in the Suite instead of the single Expectation.

In [None]:
suite = context.suites.add(gx.ExpectationSuite(name="Expectation Suite"))
suite.add_expectation(expectation)
suite.add_expectation(gxe.ExpectColumnMinToBeBetween(column="passenger_count", min_value=1))

print(batch.validate(suite))
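
As a rough mental model, a Suite can be thought of as a named collection of checks that all run against the same data. The sketch below is plain Python rather than GX code: `run_suite` and the sample values are hypothetical, and it assumes that the overall result succeeds only when every individual check succeeds.

```python
from typing import Callable, Dict, Sequence

# A check takes the column values and reports whether they pass.
Check = Callable[[Sequence[int]], bool]


def run_suite(values: Sequence[int], checks: Dict[str, Check]) -> dict:
    """Run every check on the same values and aggregate the outcomes."""
    results = {name: check(values) for name, check in checks.items()}
    # Assumption: the suite passes only if all of its checks pass.
    return {"success": all(results.values()), "results": results}


# Made-up passenger counts, including a 0 that violates the minimum.
passenger_counts = [0, 1, 2, 6]

checks: Dict[str, Check] = {
    "max is at most 6": lambda v: max(v) <= 6,
    "min is at least 1": lambda v: min(v) >= 1,
}

suite_result = run_suite(passenger_counts, checks)
print(suite_result)
```

Here the maximum check passes but the minimum check fails, so the aggregated result is unsuccessful while the per-check results still show which check caused the failure.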

### Create a Validation Definition

Validation Definitions are a new, greatly simplified concept in GX 1.0. They provide an explicit way to tie data, via a Batch Definition, to the Expectations to run on that data. They are at the center of the reworked Checkpoints API. Creating a Validation Definition is simple: give it a name, link it to the Batch Definition defined above, and pass in the Expectation Suite we just created.

In [None]:
vd = gx.ValidationDefinition(
    name="Validation Definition",
    data=bd,
    suite=suite
)

You can simply run the validation definition as shown below, or you can use it in a Checkpoint, which we will see next.

In [None]:
print(vd.run())

### Create and run a Checkpoint

Validation Definitions can be tied to actions, such as sending Slack notifications and building Data Docs, via Checkpoints. Checkpoints also provide an interface to run multiple Validation Definitions together. In this example, we'll use the Validation Definition created above and trigger the `UpdateDataDocsAction`.

In [None]:
from great_expectations.checkpoint.actions import UpdateDataDocsAction

checkpoint = context.checkpoints.add(gx.Checkpoint(
    name="Checkpoint",
    validation_definitions=[
        vd
    ],
    actions=[
        UpdateDataDocsAction(name="update_data_docs")
    ]
))

We can now run the Checkpoint and open the data docs to see our results.

In [None]:
checkpoint.run()
context.open_data_docs()

### Create monthly batch definitions

Earlier, we created a "whole table" Batch Definition. Now we'll create a monthly Batch Definition, which splits the Data Asset into one Batch per month based on a date column. Because this Batch Definition is new, we'll create a new Validation Definition for it and then run it. By default, a run validates the latest Batch, that is, the most recent month of data available in the Data Asset. Checkpoints that use this Validation Definition behave the same way.

In [None]:
monthlybd = asset.add_batch_definition_monthly("Monthly BD", column="pickup")
vd = gx.ValidationDefinition(name="Monthly VD", data=monthlybd, suite=suite)

print(vd.run())

Alternatively, you can specify which month of data you wish to use by including `batch_parameters` in the `run` method.

In [None]:
print(vd.run(batch_parameters={"year": 2019, "month": 1}))
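
The idea behind monthly batching can be sketched in plain Python. This is only a conceptual model, not how GX queries the database: `split_by_month`, `pick_batch`, and the sample rows are hypothetical, and unlike the real `batch_parameters`, this toy version expects both `year` and `month` whenever either is given.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
MonthKey = Tuple[int, int]  # (year, month)


def split_by_month(rows: List[Row], column: str) -> Dict[MonthKey, List[Row]]:
    """Group rows into one batch per (year, month) of the given datetime column."""
    batches: Dict[MonthKey, List[Row]] = defaultdict(list)
    for row in rows:
        timestamp = row[column]
        batches[(timestamp.year, timestamp.month)].append(row)
    return dict(batches)


def pick_batch(
    batches: Dict[MonthKey, List[Row]],
    year: Optional[int] = None,
    month: Optional[int] = None,
) -> List[Row]:
    """Return the requested month, or the latest month when none is requested."""
    if year is None and month is None:
        return batches[max(batches)]  # tuples compare by year, then month
    return batches[(year, month)]


# Made-up rows for illustration; not the real taxi table.
rows = [
    {"pickup": datetime(2019, 1, 5, 8, 30), "passenger_count": 1},
    {"pickup": datetime(2019, 1, 20, 17, 0), "passenger_count": 2},
    {"pickup": datetime(2019, 2, 3, 9, 15), "passenger_count": 6},
]

batches = split_by_month(rows, column="pickup")
latest = pick_batch(batches)                       # defaults to February 2019
january = pick_batch(batches, year=2019, month=1)  # explicit month

print(len(latest), len(january))
```

Selecting a batch with no parameters corresponds to the default run above, and passing `year` and `month` corresponds to supplying `batch_parameters`.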