[DOCS] Docs/doc 95/doc 129/technical term tags for setup pages (#4392)
* - Added technical term tags to setup overview doc
- Added technical term tags to all docs in /docs/guides/setup/configuring_data_docs

* - Updated technical term link and import relative paths
- Added technical term tags to all docs in /docs/guides/setup/configuring_data_contexts

* - Updated the TechnicalTag tag to no longer need "relative" directions. (Existing tags with "relative" will simply disregard that parameter.)
- Updates to technical term tags in both documents.
- Updates to standardize formatting in both documents (indentation was confusing the markdown renderer's unordered lists.)

* - Added technical term tags.
- Edited "Validations" to be "Validation Results"; minor corrections to title casing of technical terms.

* - Updates to formatting and terminology usage for all validation result store how-to guides.

* - More updates to formatting for all validation result store how-to guides.

* - Added technical term tags to the edited documents.

* - standardized formatting of all how to configure an expectation store guides to match documentation standards.

* - added technical term tags to document.

* - adds technical term tags to the edited documents.

* - final pass, adds a few missed technical terms; corrects a few formatting issues.

* Update docs/guides/setup/configuring_data_contexts/how_to_configure_credentials.md

Co-authored-by: Anthony Burdi <anthony@superconductive.com>

* Update docs/guides/setup/configuring_data_docs/how_to_host_and_share_data_docs_on_azure_blob_storage.md

Co-authored-by: Anthony Burdi <anthony@superconductive.com>

* Apply suggestions from code review

Co-authored-by: Anthony Burdi <anthony@superconductive.com>

* - Updates according to review on GitHub.

Co-authored-by: Anthony Burdi <anthony@superconductive.com>
Rachel-Reverie and anthonyburdi committed Mar 16, 2022
1 parent 435c02b commit d929ef9
Showing 20 changed files with 986 additions and 969 deletions.
@@ -2,14 +2,15 @@
title: How to configure a new Data Context with the CLI
---
import Prerequisites from '../../connecting_to_your_data/components/prerequisites.jsx'
import TechnicalTag from '/docs/term_tags/_tag.mdx';

<Prerequisites>

- Configured a [Data Context](../../../tutorials/getting_started/initialize_a_data_context.md)

</Prerequisites>

We recommend that you create new <TechnicalTag relative="../../../" tag="data_context" text="Data Contexts" /> by using the ``great_expectations init`` command in the directory where you want to deploy Great Expectations.

```bash
great_expectations init
```

@@ -4,6 +4,7 @@ title: How to configure credentials
import Prerequisites from '../../connecting_to_your_data/components/prerequisites.jsx'
import Tabs from '@theme/Tabs'
import TabItem from '@theme/TabItem'
import TechnicalTag from '/docs/term_tags/_tag.mdx';

This guide will explain how to configure your ``great_expectations.yml`` project config to populate credentials from either a YAML file or a secret manager.

@@ -21,47 +22,55 @@ If your Great Expectations deployment is in an environment without a file system

<Prerequisites></Prerequisites>

## Steps

### 1. Save credentials and config

Decide where you would like to save the desired credentials or config values - in a YAML file, environment variables, or a combination - then save the values.

In most cases, we suggest using a config variables YAML file. YAML files make variables more visible, easily editable, and allow for modularization (e.g. one file for dev, another for prod).

:::note

- In the ``great_expectations.yml`` config file, environment variables take precedence over variables defined in a config variables YAML
- Environment variable substitution is supported in both the ``great_expectations.yml`` and ``config_variables.yml`` config files.

:::

If using a YAML file, save desired credentials or config values to ``great_expectations/uncommitted/config_variables.yml`` or another YAML file of your choosing:

```yaml file=../../../../tests/integration/docusaurus/setup/configuring_data_contexts/how_to_configure_credentials.py#L9-L15
```
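For orientation, such a file might look like the sketch below. The key names and values here are hypothetical illustrations, not taken from the tested script this guide includes:

```yaml
# Hypothetical example of great_expectations/uncommitted/config_variables.yml
my_postgres_db_yaml_creds:
  drivername: postgresql
  host: localhost
  port: 5432
  username: postgres
  password: ${POSTGRES_PASSWORD}  # populated from an environment variable
  database: test_db
```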

:::note

- If you wish to store values that include the dollar sign character ``$``, escape them with a backslash ``\`` so substitution is not attempted. For example, for the Postgres credentials above you could set ``password: pa\$sword`` if your password is ``pa$sword``. Say that 5 times fast, and also please choose a more secure password!
- When you save values via the <TechnicalTag relative="../../../" tag="cli" text="CLI" />, they are automatically escaped if they contain the ``$`` character.
- You can also have multiple substitutions for the same item, e.g. ``database_string: ${USER}:${PASSWORD}@${HOST}:${PORT}/${DATABASE}``

:::

If using environment variables, set values by entering ``export ENV_VAR_NAME=env_var_value`` in the terminal or adding the commands to your ``~/.bashrc`` file:

```bash file=../../../../tests/integration/docusaurus/setup/configuring_data_contexts/how_to_configure_credentials.py#L19-L25
```

### 2. Set ``config_variables_file_path``

If using a YAML file, set the ``config_variables_file_path`` key in your ``great_expectations.yml`` or leave the default.

```yaml file=../../../../tests/integration/docusaurus/setup/configuring_data_contexts/how_to_configure_credentials.py#L29
```
### 3. Replace credentials with placeholders

Replace credentials or other values in your ``great_expectations.yml`` with ``${}``-wrapped variable names (i.e. ``${ENVIRONMENT_VARIABLE}`` or ``${YAML_KEY}``).

```yaml file=../../../../tests/integration/docusaurus/setup/configuring_data_contexts/how_to_configure_credentials.py#L33-L59
```
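The resolution of these ``${}`` placeholders can be sketched in plain Python. This is an illustrative toy, not Great Expectations' actual implementation; ``substitute_variables`` and the ``GE_*`` names are made up. It follows the rules stated above: environment variables take precedence over config-file values, and backslash-escaped dollar signs are left literal.

```python
import os
import re

def substitute_variables(value, config_variables):
    """Resolve ${VAR} placeholders: environment variables win over
    config-file values; a backslash-escaped \\$ is left as a literal $."""
    def resolve(match):
        name = match.group(1)
        if name in os.environ:  # env var takes precedence
            return os.environ[name]
        return str(config_variables.get(name, match.group(0)))

    # the negative lookbehind skips backslash-escaped \${...}
    result = re.sub(r"(?<!\\)\$\{(\w+)\}", resolve, value)
    return result.replace("\\$", "$")  # un-escape literal dollar signs

config_vars = {"GE_USER": "alice", "GE_PASSWORD": "from_file"}
os.environ["GE_PASSWORD"] = "from_env"

print(substitute_variables("${GE_USER}:${GE_PASSWORD}@db", config_vars))
# -> alice:from_env@db  (env value wins for GE_PASSWORD)
print(substitute_variables(r"pa\$sword", config_vars))
# -> pa$sword  (escaped dollar sign is not substituted)
```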


## Additional Notes

- The default ``config_variables.yml`` file located at ``great_expectations/uncommitted/config_variables.yml`` applies to deployments created using ``great_expectations init``.
- To view the full script used in this page, see it on GitHub: [how_to_configure_credentials.py](https://github.com/great-expectations/great_expectations/tree/develop/tests/integration/docusaurus/setup/configuring_data_contexts/how_to_configure_credentials.py)
@@ -2,8 +2,9 @@
title: How to configure DataContext components using test_yaml_config
---
import Prerequisites from '../../connecting_to_your_data/components/prerequisites.jsx'
import TechnicalTag from '/docs/term_tags/_tag.mdx';

``test_yaml_config`` is a convenience method for configuring the moving parts of a Great Expectations deployment. It allows you to quickly test out configs for <TechnicalTag relative="../../../" tag="datasource" text="Datasources" /> and <TechnicalTag relative="../../../" tag="store" text="Stores" />. For many deployments of Great Expectations, these components (plus <TechnicalTag relative="../../../" tag="expectation" text="Expectations" />) are the only ones you'll need.

<Prerequisites>

@@ -60,7 +61,7 @@ Steps

1. confirming that the connection works,
2. gathering a list of available DataAssets (e.g. tables in SQL; files or folders in a filesystem), and
3. verifying that it can successfully fetch at least one <TechnicalTag relative="../../../" tag="batch" text="Batch" /> from the source.

The output will look something like this:

@@ -121,11 +122,11 @@ Steps

4. **Iterate as necessary.**

From here, iterate by editing your config and re-running ``test_yaml_config``, adding config blocks for additional introspection, <TechnicalTag relative="../../../" tag="data_asset" text="Data Assets" />, sampling, etc.

5. **(Optional:) Test additional methods.**

Note that when ``test_yaml_config`` runs successfully, it adds the specified Datasource to your `DataContext`. This means that you can also test other methods, such as ``context.get_validator``, right within your notebook:

```python
validator = context.get_validator(
    # …
)
```

@@ -2,8 +2,9 @@
title: How to instantiate a Data Context without a yml file
---
import Prerequisites from '../../connecting_to_your_data/components/prerequisites.jsx'
import TechnicalTag from '/docs/term_tags/_tag.mdx';

This guide will help you instantiate a <TechnicalTag tag="data_context" text="Data Context" /> without a yml file, i.e., configure a Data Context in code. If you are working in an environment without easy access to a local filesystem (e.g. AWS Spark EMR, Databricks, etc.) you may wish to configure your Data Context in code, within your notebook or workflow tool (e.g. Airflow DAG node).

<Prerequisites>

@@ -13,23 +14,21 @@ This guide will help you instantiate a Data Context without a yml file, aka conf
- See also our companion video for this guide: [Data Contexts In Code](https://youtu.be/4VMOYpjHNhM).
:::

## Steps

### 1. **Create a DataContextConfig**

The `DataContextConfig` holds all of the associated configuration parameters to build a Data Context. There are defaults set for you to minimize configuration in typical cases, but please note that every parameter is configurable and all defaults are overridable. Also note that `DatasourceConfig` also has defaults which can be overridden.

Here we will show a few examples of common configurations, using the ``store_backend_defaults`` parameter. Note that you can use the existing API without defaults by omitting that parameter, and you can override all of the parameters as shown in the last example. A parameter set in ``DataContextConfig`` will override a parameter set in ``store_backend_defaults`` if both are used.

The following ``store_backend_defaults`` are currently available:

- `S3StoreBackendDefaults`
- `GCSStoreBackendDefaults`
- `DatabaseStoreBackendDefaults`
- `FilesystemStoreBackendDefaults`

The following example shows a Data Context configuration with an SQLAlchemy <TechnicalTag relative="../../../" tag="datasource" text="Datasource" /> and an AWS S3 bucket for all metadata <TechnicalTag relative="../../../" tag="store" text="Stores" />, using default prefixes. Note that you can still substitute environment variables as in the YAML based configuration to keep sensitive credentials out of your code.

```python
from great_expectations.data_context.types.base import DataContextConfig, DatasourceConfig, S3StoreBackendDefaults
# …
data_context_config = DataContextConfig(
    # …
)
```

The following example shows a Data Context configuration with a Pandas datasource and local filesystem defaults for metadata stores. Imports are omitted in the following examples. You may add an optional `root_directory` parameter to set the base location for the Store Backends.

```python
from great_expectations.data_context.types.base import DataContextConfig, DatasourceConfig, FilesystemStoreBackendDefaults
# …
data_context_config = DataContextConfig(
    # …
)
```

The following example shows a Data Context configuration with an SQLAlchemy datasource and two GCS buckets for metadata Stores, using some custom and some default prefixes. Note that you can still substitute environment variables as in the YAML based configuration to keep sensitive credentials out of your code. `default_bucket_name` and `default_project_name` set the default values for all stores that are not specified individually.

The resulting `DataContextConfig` from the following example creates an <TechnicalTag tag="expectation_store" text="Expectations Store" /> and <TechnicalTag relative="../../../" tag="data_docs" text="Data Docs" /> using the `my_default_bucket` and `my_default_project` parameters since their bucket and project is not specified explicitly. The <TechnicalTag tag="validation_result_store" text="Validation Results Store" /> is created using the explicitly specified `my_validations_bucket` and `my_validations_project`. Further, the prefixes are set for the Expectations Store and Validation Results Store, while Data Docs use the default `data_docs` prefix.

```python
data_context_config = DataContextConfig(
    # …
)
```
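The fallback behavior described above can be sketched in plain Python. This is an illustrative toy, not Great Expectations' implementation; ``resolve_store_backend`` is a hypothetical helper that shows how a store without an explicit bucket/project falls back to `default_bucket_name` / `default_project_name`:

```python
def resolve_store_backend(overrides, default_bucket, default_project):
    """Apply the defaulting rule: explicit per-store settings win,
    otherwise fall back to the shared defaults."""
    return {
        "bucket": overrides.get("bucket", default_bucket),
        "project": overrides.get("project", default_project),
    }

# No explicit settings: picks up the defaults.
expectations_store = resolve_store_backend(
    {}, "my_default_bucket", "my_default_project"
)
# Explicit settings: overrides the defaults.
validations_store = resolve_store_backend(
    {"bucket": "my_validations_bucket", "project": "my_validations_project"},
    "my_default_bucket",
    "my_default_project",
)

print(expectations_store)  # {'bucket': 'my_default_bucket', 'project': 'my_default_project'}
print(validations_store)   # {'bucket': 'my_validations_bucket', 'project': 'my_validations_project'}
```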

The following example sets overrides for many of the parameters available to you when creating a `DataContextConfig` and a Datasource.

```python
data_context_config = DataContextConfig(
    # …
)

context = BaseDataContext(project_config=data_context_config)
```

### 3. Use this BaseDataContext instance as your DataContext

If you are using Airflow, you may wish to pass this Data Context to your GreatExpectationsOperator as a parameter. See the following guide for more details:

- [Deploying Great Expectations with Airflow](../../../../docs/intro.md)


Additional resources
