Merge branch 'develop' into pr/7339
* develop: (96 commits)
  [DOCS] Updates the "Interactive Mode" guide for creating Expectations (great-expectations#7624)
  [MAINTENANCE] Utilize `NotImported` for SQLAlchemy, Google Cloud Services, Azure Blob Storage, and Spark import usage (great-expectations#7617)
  [BUGFIX] MapCondition Memory Inefficiencies in Spark (great-expectations#7626)
  [DOCS] Update overview.md (great-expectations#7627)
  [MAINTENANCE] Update `teams.yml` (great-expectations#7623)
  [BUGFIX] fix marshmallow schema for SQLAlchemy `connect_args` passthrough (great-expectations#7614)
  [RELEASE] 0.16.7 (great-expectations#7622)
  [DOCS] Corrects Heading Issue in How to host and share Data Docs on Azure Blob Storage (great-expectations#7620)
  [DOCS] Corrects Step Numbering in How to instantiate a specific Filesystem Data Context (great-expectations#7612)
  [BUGFIX] Remove spark from bic Expectations since it never worked for them (great-expectations#7619)
  [MAINTENANCE] Fluent Datasources: Eliminate redundant Datasource name and DataAsset name from dictionary and JSON configuration (great-expectations#7573)
  [DOCS] Add scripts under test for "How to create and edit Expectations with instant feedback from a sample Batch of data" (great-expectations#7615)
  [MAINTENANCE] Explicitly test relevant modules in Sqlalchemy compatibility pipeline (great-expectations#7613)
  [MAINTENANCE] Deprecate TableExpectation in favor of BatchExpectation (great-expectations#7610)
  [MAINTENANCE] Deprecate ColumnExpectation in favor of ColumnAggregateExpectation (great-expectations#7609)
  [DOCS] Correct expectation documentation for expect_column_max_to_be_between (great-expectations#7597)
  [BUGFIX] Misc gallery bugfixes (great-expectations#7611)
  [MAINTENANCE] SqlAlchemy 2 Compatibility - `engine.execute()` (great-expectations#7469)
  [BUGFIX] `dataset_name` made optional parameter for Expectations (great-expectations#7603)
  [FEATURE] Added AssumeRole Feature (great-expectations#7547)
  ...
Will Shin committed Apr 14, 2023
2 parents 510e49d + d7398f7 commit 30442b9
Showing 557 changed files with 9,030 additions and 6,381 deletions.
5 changes: 4 additions & 1 deletion .git-blame-ignore-revs
@@ -6,6 +6,9 @@
# https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html#avoiding-ruining-git-blame
# Apply noqa markers for all TCH001 violations
f5e7df1846102d9a62cc9b9110387925ffae60cc
-# Apply noqa markes for all PTH (use-pathlib) violations
+# Apply noqa markers for all PTH (use-pathlib) violations
# https://github.com/great-expectations/great_expectations/pull/7290
597b2b625569b6f5f110f8230ac26ab405167da6
# Apply noqa markers for TID251 (sqlalchemy) violations
# https://github.com/great-expectations/great_expectations/pull/7564
e55b3484a86f654e8b819041dd6cc73730e01a8f
58 changes: 26 additions & 32 deletions .github/teams.yml
@@ -2,15 +2,9 @@
# To add an additional team, simply add a top-level key with a list of users.
# NOTE - this should be kept in sync with the GX org's teams

platform:
- '@NathanFarmer' # Nathan Farmer
- '@alexsherstinsky' # Alex Sherstinsky
- '@cdkini' # Chetan Kini
- '@billdirks' # Bill Dirks
- '@Kilo59' # Gabriel Gore

dx:
- '@Shinnnyshinshin' # Will Shin
- '@alexsherstinsky' # Alex Sherstinsky
- '@anthonyburdi' # Anthony Burdi
- '@kenwade4' # Ken Wade

@@ -20,35 +14,35 @@ devrel:
- '@kyleaton' # Kyle Eaton
- '@rdodev' # Ruben Orduz
- '@talagluck' # Tal Gluck
- '@tjholsman' # TJ Holsman

cloud:
- '@roblim' # Rob Lim
- '@rreinoldsc' # Robby Reinold
- '@joshua-stauffer' # Josh Stauffer
- '@dctalbot' # David Talbot
- '@wookasz' # Łukasz Lempart
- '@josectobar' # José Tobar
- '@elenajdanova' # Elena Jdanova
core:
- '@DrewHoo' # Drew Hoover
- '@lockettks' # Kim Mathieu
- '@superengi' # Saahir Foux

# Aggregates a few different teams
core-team:
# Mario
- '@NathanFarmer' # Nathan Farmer
- '@alexsherstinsky' # Alex Sherstinsky
- '@cdkini' # Chetan Kini
- '@billdirks' # Bill Dirks
- '@Kilo59' # Gabriel Gore
# Luigi
- '@NathanFarmer' # Nathan Farmer
- '@Shinnnyshinshin' # Will Shin
- '@anthonyburdi' # Anthony Burdi
- '@kenwade4' # Ken Wade
# Misc
- '@abegong' # Abe Gong
- '@jcampbell' # James Campbell
- '@donaldheppner' # Don Heppner
- '@Super-Tanner' # Tanner Beam
- '@abegong' # Abe Gong
- '@alexsherstinsky' # Alex Sherstinsky
- '@allensallinger' # Allen Sallinger
- '@anthonyburdi' # Anthony Burdi
- '@billdirks' # Bill Dirks
- '@cdkini' # Chetan Kini
- '@dctalbot' # David Talbot
- '@donaldheppner' # Don Heppner
- '@elenajdanova' # Elena Jdanova
- '@jcampbell' # James Campbell
- '@josectobar' # José Tobar
- '@joshua-stauffer' # Josh Stauffer
- '@jshaikGX' # Javed Shaik
- '@kenwade4' # Ken Wade
- '@lockettks' # Kim Mathieu
- '@roblim' # Rob Lim
- '@rreinoldsc' # Robby Reinold
- '@sujensen' # Susan Jensen
- '@tyler-hoffman' # Tyler Hoffman
- '@wookasz' # Łukasz Lempart

bot:
- '@dependabot'
- '@dependabot[bot]'
2 changes: 1 addition & 1 deletion CODEOWNERS
@@ -1,5 +1,5 @@
# Each line is a file pattern followed by one or more owners.

-sidebars.js @donaldheppner @Rachel-Reverie
+sidebars.js @donaldheppner @Rachel-Reverie @tjholsman

great_expectations/core/usage_statistics/schemas.py @tannerbeam
2 changes: 1 addition & 1 deletion CODE_OF_CONDUCT.md
@@ -67,7 +67,7 @@ Expectations project, both in-person and virtual.
## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported by contacting Kyle Eaton on the Great Expectations Developer Relations team at kyle@superconductive.com. All
+reported by contacting Josh Zheng on the Great Expectations Developer Relations team at josh.zheng@greatexpectations.io. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The Great Expectations core team is
obligated to maintain confidentiality with regard to the reporter of an incident.
66 changes: 0 additions & 66 deletions SLACK_GUIDELINES.md

This file was deleted.

182 changes: 182 additions & 0 deletions assets/scripts/gx_cloud/experimental/onboarding_script.py
@@ -0,0 +1,182 @@
import pprint

import great_expectations as gx
from great_expectations.checkpoint import Checkpoint
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.core.expectation_suite import ExpectationSuite
from great_expectations.data_context import CloudDataContext
from great_expectations.datasource import BaseDatasource
from great_expectations.exceptions import StoreBackendError
from great_expectations.validator.validator import Validator

import pandas as pd


# Create a GX Data Context
# Make sure GX_CLOUD_ACCESS_TOKEN and GX_CLOUD_ORGANIZATION_ID
# are set in your environment or config_variables.yml
context: CloudDataContext = gx.get_context(
cloud_mode=True,
)

# Set variables for creating a Datasource
datasource_name = None
data_connector_name = (
"default_runtime_data_connector_name" # Optional: Set your own data_connector_name
)
assert datasource_name, "Please set datasource_name."

# Set variable for creating an Expectation Suite
expectation_suite_name = None
assert expectation_suite_name, "Please set expectation_suite_name."

# Set variables for connecting a Validator to a Data Asset, along with a Batch of data
data_asset_name = None
assert data_asset_name, "Please set data_asset_name."
path_to_validator_batch = None # e.g. "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
assert (
path_to_validator_batch
), "Please set path_to_validator_batch. This can be a local filepath or a remote URL."

# Set variable for creating a Checkpoint
checkpoint_name = None
assert checkpoint_name, "Please set checkpoint_name."

# Set variable to get a Batch of data to validate against the new Checkpoint
path_to_batch_to_validate = None # e.g. "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
assert (
path_to_batch_to_validate
), "Please set path_to_batch_to_validate. This can be a local filepath or a remote URL."

# Create Datasource
# For simplicity, this script creates a Datasource with a PandasExecutionEngine and a RuntimeDataConnector
try:
datasource: BaseDatasource = context.get_datasource(datasource_name=datasource_name)
except ValueError:
datasource_yaml = f"""
name: {datasource_name}
class_name: Datasource
execution_engine:
class_name: PandasExecutionEngine
data_connectors:
{data_connector_name}:
class_name: RuntimeDataConnector
batch_identifiers:
- path
"""
# Test your configuration:
datasource: BaseDatasource = context.test_yaml_config(datasource_yaml)

# Save your Datasource:
datasource: BaseDatasource = context.add_or_update_datasource(datasource=datasource)

print(f"\n{20*'='}\nDatasource Config\n{20*'='}\n")
pprint.pprint(datasource.config)

# Create a new Expectation Suite
try:
expectation_suite: ExpectationSuite = context.get_expectation_suite(
expectation_suite_name=expectation_suite_name
)
expectation_suite_ge_cloud_id = expectation_suite.ge_cloud_id
except StoreBackendError:
expectation_suite: ExpectationSuite = context.add_or_update_expectation_suite(
expectation_suite_name=expectation_suite_name
)
expectation_suite_ge_cloud_id = expectation_suite.ge_cloud_id

# Connect a Batch of data to a Validator to add Expectations interactively
batch_df: pd.DataFrame = pd.read_csv(path_to_validator_batch)

batch_request = RuntimeBatchRequest(
runtime_parameters={"batch_data": batch_df},
batch_identifiers={"path": path_to_validator_batch},
datasource_name=datasource_name,
data_connector_name=data_connector_name,
data_asset_name=data_asset_name,
)
validator: Validator = context.get_validator(
expectation_suite_name=expectation_suite_name, batch_request=batch_request
)

# Add Expectations interactively using tab completion
validator.expect_column_to_exist(column="")

# Save Expectation Suite
validator.save_expectation_suite(discard_failed_expectations=False)
expectation_suite: ExpectationSuite = context.get_expectation_suite(
expectation_suite_name=expectation_suite_name
)
print(f"\n{20*'='}\nExpectation Suite\n{20*'='}\n")
pprint.pprint(expectation_suite)

# Create a new Checkpoint
try:
checkpoint: Checkpoint = context.get_checkpoint(checkpoint_name)
checkpoint_id = checkpoint.ge_cloud_id
except StoreBackendError:
checkpoint_config = {
"name": checkpoint_name,
"validations": [
{
"expectation_suite_name": expectation_suite_name,
"expectation_suite_ge_cloud_id": expectation_suite_ge_cloud_id,
"batch_request": {
"datasource_name": datasource_name,
"data_connector_name": data_connector_name,
"data_asset_name": data_asset_name,
},
}
],
"config_version": 1,
"class_name": "Checkpoint",
}

context.add_or_update_checkpoint(**checkpoint_config)
checkpoint: Checkpoint = context.get_checkpoint(checkpoint_name)
checkpoint_id = checkpoint.ge_cloud_id

print(f"\n{20*'='}\nCheckpoint Config\n{20*'='}\n")
pprint.pprint(checkpoint)

# Get a Checkpoint snippet to use in a CI script
run_checkpoint_snippet = f"""\
import pprint
import great_expectations as gx
import pandas as pd

# Requires GX_CLOUD_ACCESS_TOKEN and GX_CLOUD_ORGANIZATION_ID in the environment
context = gx.get_context(cloud_mode=True)

path_to_batch_to_validate = None
assert path_to_batch_to_validate is not None, "Please set path_to_batch_to_validate. This can be a local filepath or a remote URL."
validation_df = pd.read_csv(path_to_batch_to_validate)
result = context.run_checkpoint(
ge_cloud_id="{checkpoint_id}",
batch_request={{
"runtime_parameters": {{
"batch_data": validation_df
}},
"batch_identifiers": {{
"path": path_to_batch_to_validate
}},
}}
)
pprint.pprint(result)
"""

print(f"\n{20*'='}\nCheckpoint Snippet\n{20*'='}\n")
print(run_checkpoint_snippet)

# Run the Checkpoint:
validation_df: pd.DataFrame = pd.read_csv(path_to_batch_to_validate)

result = context.run_checkpoint(
ge_cloud_id=checkpoint_id,
batch_request={
"runtime_parameters": {"batch_data": validation_df},
"batch_identifiers": {"path": path_to_batch_to_validate},
},
)

print(f"\n{20*'='}\nValidation Result\n{20*'='}\n")
pprint.pprint(result)
6 changes: 3 additions & 3 deletions ci/azure-pipelines-cloud-integration.yml
@@ -22,7 +22,7 @@ stages:
versionSpec: '$(python.version)'
displayName: 'Use Python $(python.version)'

-    - bash: python -m pip install --upgrade pip==20.2.4
+    - bash: python -m pip install --upgrade pip
displayName: 'Update pip'

# includes explicit install of chardet, which was causing errors in pipeline
@@ -84,7 +84,7 @@ stages:
versionSpec: '$(python.version)'
displayName: 'Use Python $(python.version)'

-    - bash: python -m pip install --upgrade pip==20.2.4
+    - bash: python -m pip install --upgrade pip
displayName: 'Update pip'

# includes explicit install of chardet, which was causing errors in pipeline
@@ -126,7 +126,7 @@ stages:
versionSpec: '$(python.version)'
displayName: 'Use Python $(python.version)'

-    - bash: python -m pip install --upgrade pip==20.2.4
+    - bash: python -m pip install --upgrade pip
displayName: 'Update pip'

# includes explicit install of grpcio-status and chardet, which was causing errors in pipeline
4 changes: 2 additions & 2 deletions ci/azure-pipelines-contrib.yml
@@ -65,7 +65,7 @@ stages:
versionSpec: '$(python.version)'
displayName: 'Use Python $(python.version)'

-    - bash: python -m pip install --upgrade pip==20.2.4
+    - bash: python -m pip install --upgrade pip
displayName: 'Update pip'

- job: deploy_experimental
@@ -78,7 +78,7 @@
versionSpec: '$(python.version)'
displayName: 'Use Python $(python.version)'

-    - bash: python -m pip install --upgrade pip==20.2.4
+    - bash: python -m pip install --upgrade pip
displayName: 'Update pip'

- script: |
