Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'develop' into m/dx-47/drop-tables-from-validator
* develop:
  - Consolidate Cloud tutorials (#7395)
  - [DOCS] DOC-473 Adds guide "How to set up GX to work with SQL databases" (#7409)
  - [CONTRIB] Limit results for two expectations (#7403)
  - [MAINTENANCE] Split up map_metric_provider.py (#7402)
  - [DOCS] DOC-473: Adds shared components for fluent and state management updates (#7404)
  - [FEATURE] add optional `id` to Fluent Datasources and DataAsset schemas (#7334)
  - [Contrib] Adding support for date for the row condition parser (#7359)
  - [MAINTENANCE] Test against minimum SQLAlchemy versions (#7396)
  - [BUGFIX] Fixing typographical errors and argument omissions (#7398)
Showing 206 changed files with 6,192 additions and 4,063 deletions.
@@ -0,0 +1,167 @@
# This file is responsible for configuring the `sqlalchemy-compatibility` pipeline (https://dev.azure.com/great-expectations/great_expectations/_build)
#
# The pipeline is run under the following conditions:
#   - On the develop branch whenever a commit is made to an open PR
#
# In this pipeline we run tests against several databases using the latest patch
# version of the currently supported sqlalchemy versions, e.g. 1.3.x, 1.4.x, 2.0.x,
# where x is the latest patch. This helps ensure we stay compatible with all
# previously supported versions as we make changes to support later versions.

trigger:
  branches:
    include:
      - develop

resources:
  containers:
    - container: postgres
      image: postgres:11
      ports:
        - 5432:5432
      env:
        POSTGRES_DB: "test_ci"
        POSTGRES_HOST_AUTH_METHOD: "trust"

variables:
  GE_USAGE_STATISTICS_URL: "https://qa.stats.greatexpectations.io/great_expectations/v1/usage_statistics"

stages:
  - stage: scope_check
    pool:
      vmImage: 'ubuntu-latest'
    jobs:
      - job: changes
        steps:
          - task: ChangedFiles@1
            name: CheckChanges
            inputs:
              verbose: true
              rules: |
                [GXChanged]
                great_expectations/**/*.py
                pyproject.toml
                setup.cfg
                setup.py
                MANIFEST.in
                tests/**
                /*.txt
                /*.yml
                requirements*
                reqs/*.txt
                ci/**/*.yml
                assets/scripts/**
                scripts/*.py
                scripts/*.sh

  - stage: lint
    dependsOn: scope_check
    pool:
      vmImage: 'ubuntu-latest'

    jobs:
      - job: lint
        condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GXChanged'], true)
        steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.7'
            displayName: 'Use Python 3.7'

          - script: |
              pip install $(grep -E '^(black|invoke|ruff)' reqs/requirements-dev-contrib.txt)
              EXIT_STATUS=0
              invoke fmt --check || EXIT_STATUS=$?
              invoke lint || EXIT_STATUS=$?
              exit $EXIT_STATUS

  - stage: import_ge
    dependsOn: scope_check
    pool:
      vmImage: 'ubuntu-latest'

    jobs:
      - job: import_ge
        steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.7'
            displayName: 'Use Python 3.7'

          - script: |
              pip install .
            displayName: 'Install GX and required dependencies (i.e. not sqlalchemy)'
          - script: |
              python -c "import great_expectations as gx; print('Successfully imported GX Version:', gx.__version__)"
            displayName: 'Import Great Expectations'

  - stage: sqlalchemy_compatibility
    dependsOn: [scope_check, lint, import_ge]
    pool:
      vmImage: 'ubuntu-latest'

    jobs:
      - job: sqlalchemy_compatibility_postgres
        timeoutInMinutes: 90
        condition: eq(stageDependencies.scope_check.changes.outputs['CheckChanges.GXChanged'], true)
        strategy:
          # This matrix is intended to run against the latest patch versions of
          # the sqlalchemy minor versions that we support
          # (versions as semver major.minor.patch).
          matrix:
            # Uncomment if we need 1.3.x verification.
            # sqlalchemy_1_3_x:
            #   sqlalchemy_base_version: '1.3.0'
            sqlalchemy_1_4_x:
              sqlalchemy_base_version: '1.4.0'
            # Uncomment when we are compatible with 2.0.x.
            # sqlalchemy_2_0_x:
            #   sqlalchemy_base_version: '2.0.0'

        services:
          postgres: postgres

        steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.7'
            displayName: 'Use Python 3.7'

          - script: |
              cp constraints-dev.txt constraints-dev-temp.txt
              echo "SQLAlchemy~=$(sqlalchemy_base_version)" >> constraints-dev-temp.txt
              pip install --constraint constraints-dev-temp.txt ".[dev]" pytest-azurepipelines
            displayName: 'Install dependencies using SQLAlchemy base version $(sqlalchemy_base_version)'
          - script: |
              # Run pytest
              pytest \
                --postgresql \
                --ignore 'tests/cli' \
                --ignore 'tests/integration/usage_statistics' \
                --napoleon-docstrings \
                --junitxml=junit/test-results.xml \
                --cov=. \
                --cov-report=xml \
                --cov-report=html \
                -m 'not unit and not e2e'
            displayName: 'pytest'
            env:
              GE_USAGE_STATISTICS_URL: ${{ variables.GE_USAGE_STATISTICS_URL }}
              SQLALCHEMY_WARN_20: true
          - task: PublishTestResults@2
            condition: succeededOrFailed()
            inputs:
              testResultsFiles: '**/test-*.xml'
              testRunTitle: 'Publish test results for Python $(python.version)'

          - task: PublishCodeCoverageResults@1
            inputs:
              codeCoverageTool: Cobertura
              summaryFileLocation: '$(System.DefaultWorkingDirectory)/**/coverage.xml'
              reportDirectory: '$(System.DefaultWorkingDirectory)/**/htmlcov'
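The `SQLAlchemy~=$(sqlalchemy_base_version)` constraint written into the temporary constraints file relies on pip's compatible-release operator (`~=`), which selects the newest patch release within the pinned minor series. A minimal pure-Python sketch of that rule for three-part versions (ignoring pre-release and post-release tags, which pip also handles):

```python
def is_compatible_release(version: str, base: str) -> bool:
    """Rough model of pip's '~=' operator for three-part versions:
    'X.Y.Z ~= A.B.C' holds when X.Y equals A.B and X.Y.Z >= A.B.C."""
    v = tuple(int(part) for part in version.split("."))
    b = tuple(int(part) for part in base.split("."))
    return v[:2] == b[:2] and v >= b

# 'SQLAlchemy~=1.4.0' admits any 1.4.x patch release...
print(is_compatible_release("1.4.46", "1.4.0"))  # True
# ...but excludes other minor/major series.
print(is_compatible_release("2.0.4", "1.4.0"))   # False
```

This is why the matrix only needs to name a base version per minor series: pip resolves each entry to the latest available patch at install time.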
@@ -1,5 +1,5 @@
 export default {
-  release_version: 'great_expectations, version 0.16.1',
-  min_python: 'Python 3.7',
-  max_python: 'Python 3.10'
+  release_version: 'great_expectations, version 0.15.50',
+  min_python: '3.7',
+  max_python: '3.10'
 }
94 changes: 94 additions & 0 deletions
docs/docusaurus/docs/components/_templates/_template_basic_install_and_setup.md
@@ -0,0 +1,94 @@
---
[//]: # (TODO: title: How to set up GX to work with general $TODO$)
tag: [how-to, setup]
[//]: # (TODO: keywords: [Great Expectations, SQL, $TODO$])
---

[//]: # (TODO: # How to set up Great Expectations to work with general $TODO$)

import TechnicalTag from '/docs/term_tags/_tag.mdx';
import Prerequisites from '/docs/components/_prerequisites.jsx'

<!-- ## Introduction -->
import IntroInstallPythonGxAndDependencies from '/docs/components/setup/installation/_intro_python_environment_with_dependencies.mdx'

<!-- ## Prerequisites -->

<!-- ### 1. Check your Python version -->
import PythonCheckVersion from '/docs/components/setup/python_environment/_python_check_version.mdx'

<!-- ### 2. Create a Python virtual environment -->
import PythonCreateVenv from '/docs/components/setup/python_environment/_python_create_venv.md'
import TipPythonOrPython3Executable from '/docs/components/setup/python_environment/_tip_python_or_python3_executable.md'

<!-- ### 3. Install GX with optional dependencies for ??? -->
import InstallDependencies from '/docs/components/setup/dependencies/_sql_install_dependencies.mdx'

<!-- ### 4. Verify that GX has been installed correctly -->
import GxVerifyInstallation from '/docs/components/setup/_gx_verify_installation.md'

<!-- ### 5. Initialize a Data Context to store your credentials -->
import InitializeDataContextFromCli from '/docs/components/setup/data_context/_filesystem_data_context_initialize_with_cli.md'
import VerifyDataContextInitializedFromCli from '/docs/components/setup/data_context/_filesystem_data_context_verify_initialization_from_cli.md'

<!-- ### 6. Configure the `config_variables.yml` file with your credentials -->
[//]: # (TODO: import ConfigureCredentialsInDataContext from '/docs/components/setup/dependencies/_postgresql_configure_credentials_in_config_variables_yml.md')

<!-- ## Next steps -->
[//]: # (TODO: import FurtherConfiguration from '/docs/components/setup/next_steps/_links_after_installing_gx.md')

## Introduction

[//]: # (TODO: <IntroInstallPythonGxAndDependencies dependencies="$TODO$" />)

## Prerequisites

<Prerequisites requirePython = {true} requireInstallation = {false} requireDataContext = {false} requireSourceData = {null} requireDatasource = {false} requireExpectationSuite = {false}>

- The ability to install Python modules with pip
-
- A passion for data quality

</Prerequisites>

## Steps

### 1. Check your Python version

<PythonCheckVersion />

<TipPythonOrPython3Executable />

### 2. Create a Python virtual environment

<PythonCreateVenv />

[//]: # (TODO: ### 3. Install GX with optional dependencies for $TODO$)

[//]: # (TODO: <InstallDependencies install_key="sqlalchemy" database_name="SQL"/>)

### 4. Verify that GX has been installed correctly

<GxVerifyInstallation />

[//]: # (TODO: ### 5. Initialize a Data Context to store your PostgreSQL credentials)

<InitializeDataContextFromCli />

:::info Verifying the Data Context initialized successfully

<VerifyDataContextInitializedFromCli />

:::

[//]: # (TODO: ### 6. Configure the `config_variables.yml` file with your PostgreSQL credentials)

<ConfigureCredentialsInDataContext />

## Next steps

<FurtherConfiguration />
12 changes: 12 additions & 0 deletions
...aurus/docs/components/connect_to_data/cloud/_abs_batching_regex_explaination.md
@@ -0,0 +1,12 @@
Your Data Asset will connect to all files that match the regex that you provide. Each matched file will become a Batch inside your Data Asset.

For example, let's say that your Azure Blob Storage container has the following files:

- "yellow_tripdata_sample_2021-11.csv"
- "yellow_tripdata_sample_2021-12.csv"
- "yellow_tripdata_sample_2023-01.csv"

If you define a Data Asset using the full file name with no regex groups, such as `"yellow_tripdata_sample_2023-01\.csv"`, your Data Asset will contain only one Batch, which will correspond to that file.

However, if you define a partial file name with a regex group, such as `"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"`, your Data Asset will contain 3 Batches, one corresponding to each matched file. You can then use the keys `year` and `month` to indicate exactly which file you want to request from the available Batches.
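The matching behavior described above can be checked directly with Python's `re` module. A small sketch using the example filenames from this explanation:

```python
import re

files = [
    "yellow_tripdata_sample_2021-11.csv",
    "yellow_tripdata_sample_2021-12.csv",
    "yellow_tripdata_sample_2023-01.csv",
]

# Full file name, no groups: exactly one file matches -> one Batch.
single = [f for f in files if re.fullmatch(r"yellow_tripdata_sample_2023-01\.csv", f)]
print(len(single))  # 1

# Partial name with named groups: all three files match -> three Batches,
# each identifiable by its 'year' and 'month' keys.
pattern = re.compile(r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv")
batches = [m.groupdict() for f in files if (m := pattern.fullmatch(f))]
print(batches[0])  # {'year': '2021', 'month': '11'}
```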
14 changes: 14 additions & 0 deletions
...us/docs/components/connect_to_data/cloud/_abs_fluent_data_asset_config_keys.mdx
@@ -0,0 +1,14 @@
import CodeBlock from '@theme/CodeBlock';

To specify the data to connect to, you will need the following elements:

- `name`: A name by which you can reference the Data Asset in the future.
- `batching_regex`: A regular expression that indicates which files to treat as Batches in your Data Asset and how to identify them.
- `container`: The name of your Azure Blob Storage container.
- `name_starts_with`: A string indicating what part of the `batching_regex` to truncate from the final Batch names.

```python title="Python code"
asset_name = "MyTaxiDataAsset"
batching_regex = r"data/taxi_yellow_tripdata_samples/yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"
container = "superconductive-public"
name_starts_with = "data/taxi_yellow_tripdata_samples/"
```
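As a quick sanity check, the `batching_regex` and `name_starts_with` values above can be tested against a blob path of the shape the asset expects. The blob name below is an illustrative assumption, not a real listing from the container:

```python
import re

batching_regex = r"data/taxi_yellow_tripdata_samples/yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"
name_starts_with = "data/taxi_yellow_tripdata_samples/"

# Hypothetical blob path matching the documented naming scheme.
blob = "data/taxi_yellow_tripdata_samples/yellow_tripdata_sample_2021-11.csv"

# The prefix filter and the regex agree on this blob...
print(blob.startswith(name_starts_with))  # True

# ...and the named groups identify the Batch.
match = re.fullmatch(batching_regex, blob)
print(match.group("year"), match.group("month"))  # 2021 11
```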
13 changes: 13 additions & 0 deletions
...cusaurus/docs/components/connect_to_data/cloud/_batching_regex_explaination.mdx
@@ -0,0 +1,13 @@
Your Data Asset will connect to all files that match the regex that you provide. Each matched file will become a Batch inside your Data Asset.

For example:

<p>Let's say that your {props.storage_location_type} has the following files:</p>

- "yellow_tripdata_sample_2021-11.csv"
- "yellow_tripdata_sample_2021-12.csv"
- "yellow_tripdata_sample_2023-01.csv"

If you define a Data Asset using the full file name with no regex groups, such as `"yellow_tripdata_sample_2023-01\.csv"`, your Data Asset will contain only one Batch, which will correspond to that file.

However, if you define a partial file name with a regex group, such as `"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"`, your Data Asset will contain 3 Batches, one corresponding to each matched file. You can then use the keys `year` and `month` to indicate exactly which file you want to request from the available Batches.
1 change: 1 addition & 0 deletions
...us/docs/components/connect_to_data/filesystem/_defining_multiple_data_assets.md
@@ -0,0 +1 @@
Your Datasource can contain multiple Data Assets. If you have additional files to connect to, you can provide different `name` and `batching_regex` parameters to create additional Data Assets for those files in your Datasource. You can even include the same files in multiple Data Assets, if a given file matches the `batching_regex` of more than one Data Asset.
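The overlap case can be seen with two regexes over the same files. The asset patterns below are illustrative, not taken from a real configuration:

```python
import re

# Two hypothetical Data Asset batching regexes over the same Datasource:
# one broad asset covering every monthly file, one narrow asset for 2021 only.
monthly_regex = re.compile(r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv")
year_2021_regex = re.compile(r"yellow_tripdata_sample_2021-(?P<month>\d{2})\.csv")

# This file matches both patterns, so it would appear in both Data Assets.
filename = "yellow_tripdata_sample_2021-11.csv"
print(bool(monthly_regex.fullmatch(filename)))    # True
print(bool(year_2021_regex.fullmatch(filename)))  # True

# A 2023 file matches only the broader asset.
other = "yellow_tripdata_sample_2023-01.csv"
print(bool(monthly_regex.fullmatch(other)))    # True
print(bool(year_2021_regex.fullmatch(other)))  # False
```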
7 changes: 7 additions & 0 deletions
...s/connect_to_data/filesystem/_info_filesystem_datasource_relative_base_paths.md
@@ -0,0 +1,7 @@
:::info Using relative paths as the `base_path` of a Filesystem Datasource

If you are using a Filesystem Data Context, you can provide a `base_path` that is relative to the folder containing your Data Context.

However, an in-memory Ephemeral Data Context doesn't exist in a folder. When using an Ephemeral Data Context, relative paths are therefore resolved against the folder in which your Python code is being executed.

:::
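The two resolution behaviors can be sketched with `pathlib`. The helper below is an illustration of the rule described above, not GX's actual implementation, and the context path is hypothetical:

```python
from pathlib import Path
from typing import Optional

def resolve_base_path(base_path: str, context_root: Optional[str] = None) -> Path:
    """Illustrative sketch: a Filesystem Data Context resolves a relative
    base_path against the folder containing the context, while an Ephemeral
    (in-memory) context has no folder and falls back to the current working
    directory of the running Python process."""
    p = Path(base_path)
    if p.is_absolute():
        return p
    root = Path(context_root) if context_root is not None else Path.cwd()
    return root / p

# Filesystem Data Context: relative to the context's folder (hypothetical path).
print(resolve_base_path("data", "/projects/my_gx_project"))

# Ephemeral Data Context: relative to wherever the script is executed.
print(resolve_base_path("data") == Path.cwd() / "data")  # True
```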