![Neptune + Great Expectations](https://neptune.ai/wp-content/uploads/2024/06/GreatExpectations.svg)

# Neptune + Great Expectations

<a target="_blank" href="https://colab.research.google.com/github/neptune-ai/examples/blob/main/integrations-and-supported-tools/great-expectations/notebooks/Neptune_Great_Expectations.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a><a target="_blank" href="https://github.com/neptune-ai/examples/blob/main/integrations-and-supported-tools/great-expectations/">
  <img alt="Open in GitHub" src="https://img.shields.io/badge/Open_in_GitHub-blue?logo=github&labelColor=black">
</a><a target="_blank" href="https://app.neptune.ai/o/showcase/org/great-expectations/runs/details?viewId=9c54e2be-0bd3-40cb-8868-08092ce30caf&detailsTab=dashboard&dashboardId=GX-metadata-9c54e2cf-4533-4b64-92a3-ae49ea174815&shortId=GX-5&type=run">  
  <img alt="Explore in Neptune" src="https://neptune.ai/wp-content/uploads/2024/01/neptune-badge.svg">
</a><a target="_blank" href="https://docs.neptune.ai/integrations/great_expectations/">
  <img alt="View tutorial in docs" src="https://neptune.ai/wp-content/uploads/2024/01/docs-badge-2.svg">
</a>

## Introduction

[Great Expectations (GX) Core](https://greatexpectations.io/gx-core) is an open-source tool to help you validate, document, and monitor your data.

This guide will show you how to:

* Log GX Core's configurations to Neptune,
* Log machine-readable validation results to Neptune,
* Upload GX Core's interactive human-readable HTML reports to Neptune.

## Before you start

This notebook example lets you try out Neptune as an anonymous user, with zero setup.

If you want to see the example logged to your own workspace instead:

  1. Create a Neptune account. [Register &rarr;](https://neptune.ai/register)
  1. Create a Neptune project that you will use for tracking metadata. For instructions, see [Creating a project](https://docs.neptune.ai/setup/creating_project) in the Neptune docs.

## Install Neptune and GX Core

In [None]:
! pip install -qU neptune great_expectations "numpy<2.0"

## Import libraries

In [None]:
import neptune
import great_expectations as gx
import pandas as pd

from neptune.utils import stringify_unsupported

## Start a Neptune run

To create a new run for tracking the metadata, you tell Neptune who you are (`api_token`) and where to send the data (`project`).

You can use the default code cell below to create an anonymous run in the public project [common/great-expectations](https://app.neptune.ai/o/common/org/great-expectations).

**Note**: Public projects are cleaned regularly, so anonymous runs are only stored temporarily.

### Log to your own project instead

Replace the `neptune.init_run()` call in the code cell at the end of this section with the following:

```python
import neptune
from getpass import getpass

run = neptune.init_run(
    project="workspace-name/project-name",  # Replace with your workspace and project names
    api_token=getpass("Enter your Neptune API token: "),
    tags=["notebook"],  # (Optional) Replace with your own tags
)
```

To find your API token and full project name:

1. [Log in to Neptune](https://app.neptune.ai/).
1. In the bottom-left corner, expand your user menu and select **Get your API token**.
1. To copy the project path, in the top-right corner, open the settings menu and select **Details & privacy**.

For more help, see [Setting Neptune credentials](https://docs.neptune.ai/setup/setting_credentials) in the Neptune docs.
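
As an alternative to typing the token on every run, Neptune can also read credentials from the `NEPTUNE_API_TOKEN` and `NEPTUNE_PROJECT` environment variables. The sketch below only checks that both are set before a run is started. The helper name `neptune_credentials_from_env` and the placeholder values are assumptions made for illustration, not part of the Neptune API.

```python
import os


def neptune_credentials_from_env(env=None):
    """Return (project, api_token) from the environment, or raise if either is missing."""
    env = os.environ if env is None else env
    names = ("NEPTUNE_PROJECT", "NEPTUNE_API_TOKEN")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["NEPTUNE_PROJECT"], env["NEPTUNE_API_TOKEN"]


# Made-up environment with placeholder values (not real credentials)
fake_env = {
    "NEPTUNE_PROJECT": "workspace-name/project-name",
    "NEPTUNE_API_TOKEN": "placeholder-token",
}
project, api_token = neptune_credentials_from_env(fake_env)
print(project)
```

When both variables are set in the real environment, `neptune.init_run()` can be called without the `project` and `api_token` arguments.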

In [None]:
run = neptune.init_run(
    api_token=neptune.ANONYMOUS_API_TOKEN,
    project="common/great-expectations",
    tags=["notebook"],  # (optional) replace with your own
)

**To view the newly created run and its metadata in the Neptune app, use the link that appeared in the cell output.**

## Read data

In [None]:
df = pd.read_csv(
    "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
)

## Create a GX Data Context

In [None]:
context = gx.get_context(mode="file")

### Upload Context configuration to Neptune

In [None]:
run["gx/context/config"] = context.get_config().to_json_dict()

The above code cell logs the GX data context configuration to the `gx/context/config` namespace in the Neptune run.
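
To picture how a nested dictionary ends up under that namespace, the following sketch lists the field paths that a dictionary would map to when each nested key becomes one more path segment. It is a plain-Python illustration, not Neptune's implementation; the function `field_paths` and the keys in `example_config` are made up.

```python
def field_paths(value, prefix):
    """Yield (path, leaf) pairs for a nested dict, joining keys with '/'."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from field_paths(child, f"{prefix}/{key}")
    else:
        yield prefix, value


# Made-up stand-in for a configuration dictionary
example_config = {
    "version": 1,
    "section": {"option": "value", "flag": True},
}

paths = list(field_paths(example_config, "gx/context/config"))
for path, leaf in paths:
    print(path, "=", leaf)
```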

### Connect context to data source

In [None]:
data_source = context.data_sources.add_pandas("pandas")
data_asset = data_source.add_dataframe_asset(name="pd dataframe asset")

## Create Batch

In [None]:
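# Define a batch that covers the whole dataframe
batch_definition = data_asset.add_batch_definition_whole_dataframe("batch-def")

# Look the same batch definition up by name (useful when it was created in an earlier session)
batch_definition = (
    context.data_sources.get("pandas")
    .get_asset("pd dataframe asset")
    .get_batch_definition("batch-def")
)

# The dataframe itself is passed in at runtime as a batch parameter
batch_parameters = {"dataframe": df}

batch = batch_definition.get_batch(batch_parameters=batch_parameters)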

## Create Expectation Suite

In [None]:
suite = gx.ExpectationSuite(name="expectation_suite")
suite = context.suites.add(suite)

suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeBetween(
        column="passenger_count", min_value=1, max_value=6
    )
)
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeBetween(column="fare_amount", min_value=0)
)
suite.add_expectation(gx.expectations.ExpectColumnValuesToNotBeNull(column="pickup_datetime"))

### Log Expectations to Neptune

In [None]:
run["gx/meta"] = suite.meta

In [None]:
run["gx/expectations/expectations_suite_name"] = suite.name

for idx, expectation in enumerate(suite.to_json_dict()["expectations"]):
    run["gx/expectations"][idx] = expectation

The above code cell does two things:
* logs the name of the Expectation Suite to the `gx/expectations/expectations_suite_name` field of the run,
* creates a numbered folder for each expectation in the `gx/expectations` namespace.
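
For intuition, the three expectations in the suite can be read as simple row-level checks. The sketch below expresses them in plain Python on a few made-up rows. It assumes inclusive bounds and does not reproduce how GX handles missing values or reports results; `value_between`, `rows`, and `checks` are illustrative names only.

```python
def value_between(value, min_value=None, max_value=None):
    """Inclusive range check; a missing bound is treated as unbounded."""
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    return True


# Made-up rows for illustration only (not taken from the taxi dataset)
rows = [
    {"passenger_count": 1, "fare_amount": 7.5, "pickup_datetime": "2019-01-15 03:36:12"},
    {"passenger_count": 0, "fare_amount": 12.0, "pickup_datetime": "2019-01-25 18:20:32"},
    {"passenger_count": 2, "fare_amount": -3.0, "pickup_datetime": None},
]

checks = {
    "passenger_count in [1, 6]": lambda r: value_between(r["passenger_count"], 1, 6),
    "fare_amount >= 0": lambda r: value_between(r["fare_amount"], min_value=0),
    "pickup_datetime not null": lambda r: r["pickup_datetime"] is not None,
}

# Map each check to the indices of the rows that violate it
failures = {
    name: [i for i, row in enumerate(rows) if not check(row)]
    for name, check in checks.items()
}
print(failures)
```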

## Create a Validation Definition

In [None]:
definition_name = "validation_definition"
validation_definition = gx.ValidationDefinition(
    data=batch_definition, suite=suite, name=definition_name
)

## Create Checkpoint

In [None]:
checkpoint_name = "my_checkpoint"

actions = [
    gx.checkpoint.UpdateDataDocsAction(name="update_all_data_docs"),
]

checkpoint = gx.Checkpoint(
    name=checkpoint_name,
    validation_definitions=[validation_definition],
    actions=actions,
    result_format={"result_format": "COMPLETE"},
)

context.validation_definitions.add(validation_definition)

context.checkpoints.add(checkpoint)

## Run Validations

In [None]:
results = checkpoint.run(batch_parameters=batch_parameters)

### Log Validation results to Neptune

By saving GX Core's results to Neptune as a dictionary, you can access them programmatically and use them in your CI/CD pipelines.

In [None]:
run["gx/validations/success"] = results.describe_dict()["success"]

In [None]:
run["gx/validations/json"] = results.describe_dict()["validation_results"][0]

In [None]:
for idx, result in enumerate(results.describe_dict()["validation_results"][0]["expectations"]):
    run["gx/validations/json/results"][idx] = stringify_unsupported(result)

Since the results contain lists, we use [`stringify_unsupported()`](https://docs.neptune.ai/api/utils/#stringify_unsupported) to convert them to strings.
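
One way to use such results in a CI/CD job is to turn the overall `success` flag into a process exit code and list the expectations that failed. The sketch below works on a made-up dictionary with the same nesting as `results.describe_dict()` in the cells above (`success`, `validation_results`, `expectations`). The `expectation_type` key and both helper functions are assumptions for illustration.

```python
def failed_expectations(results_dict):
    """Collect the expectation entries whose `success` flag is false."""
    failed = []
    for validation in results_dict.get("validation_results", []):
        for expectation in validation.get("expectations", []):
            if not expectation.get("success", False):
                failed.append(expectation)
    return failed


def exit_code(results_dict):
    """Return 0 when the overall `success` flag is true, 1 otherwise."""
    return 0 if results_dict.get("success") else 1


# Made-up results with the same nesting as the cells above
example = {
    "success": False,
    "validation_results": [
        {
            "expectations": [
                {"expectation_type": "expect_column_values_to_not_be_null", "success": True},
                {"expectation_type": "expect_column_values_to_be_between", "success": False},
            ]
        }
    ],
}

for item in failed_expectations(example):
    print("FAILED:", item.get("expectation_type", "<unknown>"))
print("exit code:", exit_code(example))
```

In a pipeline script, the value returned by `exit_code()` could be passed to `sys.exit()` so that a failed validation stops the job.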

## Upload HTML reports to Neptune

Using Neptune's HTML previewer, you can view and interact with GX Core's rich HTML reports in Neptune.

### Get the `local_site_path` of the Data Context

In [None]:
from great_expectations.data_context import EphemeralDataContext
import os

if isinstance(context, EphemeralDataContext):
    context = context.convert_to_file_context()

# The "local_site" entry is a file:// URL; [7:] strips the leading "file://" prefix
local_site_path = os.path.dirname(context.build_data_docs()["local_site"])[7:]
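
The `[7:]` slice above drops the first seven characters, which matches the length of a `file://` prefix. A more explicit way to do the same conversion is sketched below with the standard library. The function name `local_site_dir` and the example URL are hypothetical, and the sketch assumes a POSIX-style path (on Windows, `urllib.request.url2pathname` would be the more appropriate tool).

```python
import posixpath
from urllib.parse import unquote, urlparse


def local_site_dir(index_url):
    """Return the directory part of a file:// URL as a plain path string."""
    parsed = urlparse(index_url)
    if parsed.scheme != "file":
        raise ValueError(f"Expected a file:// URL, got: {index_url!r}")
    # unquote() turns percent-escapes such as %20 back into characters
    return posixpath.dirname(unquote(parsed.path))


# Hypothetical URL in the shape that the slicing approach expects
example_url = "file:///tmp/gx/uncommitted/data_docs/local_site/index.html"
site_dir = local_site_dir(example_url)
print(site_dir)
```

Unlike the slice, this version fails loudly if the value is not a `file://` URL and decodes percent-escaped characters in the path.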

### Log Expectations reports to Neptune

In [None]:
run["gx/expectations/reports"].upload_files(os.path.join(local_site_path, "expectations"))

### Log Validations reports to Neptune

In [None]:
run["gx/validations/reports"].upload_files(os.path.join(local_site_path, "validations"))

## Cleanup

### Stop Neptune run

Once you are done logging, stop tracking the run.

In [None]:
run.stop()

### (Optional) Delete the FileSystem Data Context

In [None]:
import shutil

shutil.rmtree("gx")
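
`shutil.rmtree()` raises an error when the target folder does not exist, for example if the cleanup cell is run twice. A minimal sketch of a more forgiving variant follows. The helper `remove_dir_if_present` is a made-up name, and the demo runs on a temporary folder rather than the real `gx` directory.

```python
import shutil
import tempfile
from pathlib import Path


def remove_dir_if_present(path):
    """Delete a directory tree; return True if something was removed."""
    target = Path(path)
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


# Demonstrated on a throwaway directory rather than the real "gx" folder
with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp) / "gx"
    scratch.mkdir()
    first = remove_dir_if_present(scratch)   # removes the folder
    second = remove_dir_if_present(scratch)  # nothing left to remove

print(first, second)
```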

## Analyze the logged metadata in the Neptune app
 
Explore the run in the Neptune app, or check out this [example dashboard](https://app.neptune.ai/o/showcase/org/great-expectations/runs/details?viewId=9c54e2be-0bd3-40cb-8868-08092ce30caf&detailsTab=dashboard&dashboardId=GX-metadata-9c54e2cf-4533-4b64-92a3-ae49ea174815&shortId=GX-5&type=run).