[FEATURE] Ported "How to configure a Validation Result store in Amazon S3" from RTD to Docusaurus. (#3026)

* Ported "How to configure a Validation Result store in Amazon S3" from RTD to Docusaurus.
alexsherstinsky committed Jul 13, 2021
1 parent 3b20e6f commit dfa1b7c
Showing 2 changed files with 97 additions and 2 deletions.
@@ -2,4 +2,99 @@
title: How to configure a Validation Result store in Amazon S3
---

This article is a stub.
import Prerequisites from '../../connecting_to_your_data/components/prerequisites.jsx'

By default, Validation results are stored in JSON format in the ``uncommitted/validations/`` subdirectory of your ``great_expectations/`` folder. Because Validation results may include examples of data (which could be sensitive or regulated), they should not be committed to a source control system. This guide will help you configure a new storage location for Validation results in Amazon S3.

<Prerequisites>

- Configured a [Data Context](../../../tutorials/getting-started/initialize-a-data-context.md).
- Configured an [Expectation Suite](../../../tutorials/getting-started/create-your-first-expectations.md).
- Configured a [Checkpoint](../../../tutorials/getting-started/validate-your-data.md).
- Installed [boto3](https://github.com/boto/boto3) in your local environment.
- Identified the S3 bucket and prefix where Validation results will be stored.

</Prerequisites>

Steps
-----

1. **Configure** [boto3](https://github.com/boto/boto3) **to connect to the Amazon S3 bucket where Validation results will be stored.**

Instructions on how to set up [boto3](https://github.com/boto/boto3) with AWS can be found at boto3's [documentation site](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html).
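Once boto3 is configured, a quick reachability check can catch credential or permission problems early. The sketch below is one possible way to do that, not part of Great Expectations: ``bucket_is_reachable`` is a hypothetical helper that takes an S3 client (for example, one created with ``boto3.client("s3")``) and calls its ``head_bucket`` method.

```python
def bucket_is_reachable(s3_client, bucket_name):
    """Return True if head_bucket succeeds for bucket_name, otherwise False.

    s3_client is expected to behave like boto3.client("s3").
    This helper is hypothetical and only illustrates the check.
    """
    try:
        # head_bucket raises when the bucket is missing, access is denied,
        # or no credentials can be found.
        s3_client.head_bucket(Bucket=bucket_name)
        return True
    except Exception as error:  # deliberately broad for a diagnostic sketch
        print(f"Cannot reach bucket {bucket_name!r}: {error}")
        return False


# Typical use, assuming boto3 is installed and credentials are configured:
#   import boto3
#   bucket_is_reachable(boto3.client("s3"), "<your_s3_bucket_name>")
```

The client is passed in rather than created inside the function so that the check can also be exercised with a stand-in object.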

2. **Identify your Data Context Validations Store**

Look for the following section in your Data Context's ``great_expectations.yml`` file:

```yaml
validations_store_name: validations_store

stores:
    validations_store:
        class_name: ValidationsStore
        store_backend:
            class_name: TupleFilesystemStoreBackend
            base_directory: uncommitted/validations/
```
The configuration file tells Great Expectations to look for Validations in a store called ``validations_store``. It also creates a ``ValidationsStore`` called ``validations_store`` that is backed by a Filesystem and will store validations under the ``base_directory`` ``uncommitted/validations`` (the default).

3. **Update your configuration file to include a new store for Validation results on S3.**

In the example below, the new store's name is set to ``validations_S3_store``, but it can be any name you like. We also need to make some changes to the ``store_backend`` settings. The ``class_name`` will be set to ``TupleS3StoreBackend``, ``bucket`` will be set to the name of your S3 bucket, and ``prefix`` will be set to the folder in your S3 bucket where Validation results will be located.

:::caution

If you are also storing Expectations in S3 ([How to configure an Expectation store to use Amazon S3](./how-to-configure-an-expectation-store-in-amazon-s3)), or Data Docs in S3 ([How to host and share Data Docs on Amazon S3](../configuring-data-docs/how-to-host-and-share-data-docs-on-amazon-s3)), then please ensure that the ``prefix`` values are disjoint and that neither is a substring of the other.

:::

```yaml
validations_store_name: validations_S3_store

stores:
    validations_S3_store:
        class_name: ValidationsStore
        store_backend:
            class_name: TupleS3StoreBackend
            bucket: '<your_s3_bucket_name>'
            prefix: '<your_s3_bucket_folder_name>'
```
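The caution about disjoint ``prefix`` values can be checked mechanically before you commit to a layout. The following is a minimal sketch, assuming all stores share one bucket; the helper names and the store names other than ``validations_S3_store`` are hypothetical and only serve the illustration.

```python
def prefixes_overlap(prefix_a, prefix_b):
    """Return True if either prefix is a substring of the other."""
    a = prefix_a.strip("/")
    b = prefix_b.strip("/")
    return a in b or b in a


def find_overlapping_prefixes(prefixes_by_store):
    """Return pairs of store names whose prefixes overlap, in sorted order."""
    names = sorted(prefixes_by_store)
    conflicts = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if prefixes_overlap(prefixes_by_store[first], prefixes_by_store[second]):
                conflicts.append((first, second))
    return conflicts


# Hypothetical layout: the Data Docs prefix contains the Validations prefix.
example_prefixes = {
    "validations_S3_store": "validations",
    "expectations_S3_store": "expectations",
    "s3_site": "validations/docs",
}
print(find_overlapping_prefixes(example_prefixes))
# prints [('s3_site', 'validations_S3_store')]
```

An empty prefix is treated as overlapping with every other prefix, which matches the substring rule stated in the caution.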

4. **Copy existing Validation results to the S3 bucket**. (This step is optional).

One way to copy Validation results into Amazon S3 is by using the ``aws s3 sync`` command. As mentioned earlier, the ``base_directory`` is set to ``uncommitted/validations/`` by default. In the example below, two Validation results, ``val1`` and ``val2``, are copied to Amazon S3. Your output should look something like this:

```bash
aws s3 sync '<base_directory>' s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'
upload: uncommitted/validations/val1/val1.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/val1/val1.json
upload: uncommitted/validations/val2/val2.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/val2/val2.json
```
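If you prefer to script the copy in Python, the same idea can be sketched as follows. This is an illustration rather than a replacement for ``aws s3 sync``: ``plan_uploads`` and ``upload_all`` are hypothetical helpers, and the plan keeps each file's path relative to ``base_directory``, as ``aws s3 sync`` does. The upload step assumes an S3 client such as ``boto3.client("s3")``.

```python
from pathlib import Path


def plan_uploads(base_directory, prefix):
    """Map every JSON file under base_directory to the S3 key it would receive."""
    base = Path(base_directory)
    clean_prefix = prefix.strip("/")
    plan = []
    for path in sorted(base.rglob("*.json")):
        # Keep the path relative to base_directory so the folder layout survives.
        relative = path.relative_to(base).as_posix()
        plan.append((path, f"{clean_prefix}/{relative}"))
    return plan


def upload_all(s3_client, bucket_name, plan):
    """Upload each planned file; s3_client is expected to be boto3.client('s3')."""
    for path, key in plan:
        s3_client.upload_file(str(path), bucket_name, key)


# Typical use, assuming boto3 is installed and credentials are configured:
#   import boto3
#   plan = plan_uploads("uncommitted/validations/", "<your_s3_bucket_folder_name>")
#   upload_all(boto3.client("s3"), "<your_s3_bucket_name>", plan)
```

Building the plan separately from the upload makes it possible to review the destination keys before anything is written to the bucket.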

5. **Confirm that the new Validations store has been added by running** ``great_expectations --v3-api store list``.

Notice that the output contains two Validations stores: the original ``validations_store`` on the local filesystem and the ``validations_S3_store`` we just configured. This is fine, because Great Expectations will look for Validation results in the S3 bucket as long as the ``validations_store_name`` variable is set to ``validations_S3_store``.

```bash
great_expectations --v3-api store list

 - name: validations_store
   class_name: ValidationsStore
   store_backend:
     class_name: TupleFilesystemStoreBackend
     base_directory: uncommitted/validations/

 - name: validations_S3_store
   class_name: ValidationsStore
   store_backend:
     class_name: TupleS3StoreBackend
     bucket: '<your_s3_bucket_name>'
     prefix: '<your_s3_bucket_folder_name>'
```
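With two stores defined, ``validations_store_name`` decides which one is active. The sketch below illustrates that lookup on a plain dictionary that mirrors the YAML configuration. It is an assumption-laden illustration, not how Great Expectations resolves stores internally: ``active_validations_backend`` is a hypothetical helper, and loading ``great_expectations.yml`` into a dictionary is left out.

```python
def active_validations_backend(config):
    """Return the store_backend settings of the store named by validations_store_name."""
    store_name = config["validations_store_name"]
    stores = config.get("stores", {})
    if store_name not in stores:
        raise KeyError(
            f"validations_store_name {store_name!r} is not defined under stores"
        )
    return stores[store_name]["store_backend"]


# Dictionary form of the configuration shown in this guide.
config = {
    "validations_store_name": "validations_S3_store",
    "stores": {
        "validations_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "uncommitted/validations/",
            },
        },
        "validations_S3_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "<your_s3_bucket_name>",
                "prefix": "<your_s3_bucket_folder_name>",
            },
        },
    },
}

backend = active_validations_backend(config)
print(backend["class_name"])  # prints TupleS3StoreBackend
```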

6. **Confirm that the Validations store has been correctly configured.**

Run a [Checkpoint](../../../tutorials/getting-started/validate-your-data.md) to store results in the new Validations store on S3, then visualize the results by rebuilding [Data Docs](../../../tutorials/getting-started/check-out-data-docs.md).
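After the Checkpoint run, you can also check directly that result files arrived under the configured prefix. The following is a minimal sketch, assuming an S3 client such as ``boto3.client("s3")``; ``list_validation_keys`` is a hypothetical helper, and it reads only the first page of results returned by ``list_objects_v2``.

```python
def list_validation_keys(s3_client, bucket_name, prefix):
    """Return keys of JSON objects under prefix (first page of results only)."""
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=prefix)
    # "Contents" is absent from the response when nothing matches the prefix.
    return [
        item["Key"]
        for item in response.get("Contents", [])
        if item["Key"].endswith(".json")
    ]


# Typical use, assuming boto3 is installed and credentials are configured:
#   import boto3
#   keys = list_validation_keys(
#       boto3.client("s3"), "<your_s3_bucket_name>", "<your_s3_bucket_folder_name>"
#   )
#   print(keys)
```

An empty list after a successful Checkpoint run suggests that ``validations_store_name`` still points at the filesystem store.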


If it would be useful to you, please comment with a +1 and feel free to add any suggestions or questions below. Also, please reach out to us on [Slack](https://greatexpectations.io/slack) if you would like to learn more, or have any questions.
@@ -3,7 +3,7 @@ title: How to configure a Validation Result store in Azure blob storage
---
import Prerequisites from '../../connecting_to_your_data/components/prerequisites.jsx'

-By default, Validations are stored in JSON format in the ``uncommitted/validations/`` subdirectory of your ``great_expectations/`` folder. Since Validations may include examples of data (which could be sensitive or regulated) they should not be committed to a source control system. This guide will help you configure a new storage location for Validations in a Azure Blob Storage.
+By default, Validations are stored in JSON format in the ``uncommitted/validations/`` subdirectory of your ``great_expectations/`` folder. Since Validations may include examples of data (which could be sensitive or regulated) they should not be committed to a source control system. This guide will help you configure a new storage location for Validations in Azure Blob Storage.

<Prerequisites>
