
Commit

Merge branch 'develop' into cloud-actions
kwcanuck committed May 17, 2024
2 parents 825605c + b7c4291 commit 7f61b00
Showing 210 changed files with 580 additions and 3,173 deletions.
2 changes: 1 addition & 1 deletion docs/docusaurus/docs/components/_data.jsx
@@ -1,5 +1,5 @@
 export default {
-  release_version: 'great_expectations, version 1.0.0a3',
+  release_version: 'great_expectations, version 1.0.0a4',
   min_python: '3.8',
   max_python: '3.11'
 }
73 changes: 73 additions & 0 deletions docs/docusaurus/docs/oss/changelog.md
@@ -11,6 +11,79 @@ slug: /core/changelog
- Deprecation warnings are accompanied by a moniker (as a code comment) indicating when they were deprecated. For example: `# deprecated-v0.13`
- Changes to methods and parameters due to deprecation are also noted in the relevant docstrings.

### 1.0.0a4
* [FEATURE] Remove ExpectationSuite.execution_engine_type ([#9841](https://github.com/great-expectations/great_expectations/pull/9841))
* [FEATURE] Directory Asset BatchDefinition API ([#9874](https://github.com/great-expectations/great_expectations/pull/9874))
* [FEATURE] DirectoryAsset BatchDefinition API ([#9888](https://github.com/great-expectations/great_expectations/pull/9888))
* [FEATURE] update slack renderer to new design ([#9919](https://github.com/great-expectations/great_expectations/pull/9919))
* [BUGFIX] Make column_index optional ([#9860](https://github.com/great-expectations/great_expectations/pull/9860))
* [BUGFIX] fix sqlalchemy import ([#9872](https://github.com/great-expectations/great_expectations/pull/9872))
* [BUGFIX] Ensure that `SlackNotificationAction` renders properly ([#9885](https://github.com/great-expectations/great_expectations/pull/9885))
* [BUGFIX] Patch issue with `SlackNotificationAction` header rendering ([#9903](https://github.com/great-expectations/great_expectations/pull/9903))
* [DOCS] Remove Instances of Test Connection from the GX Cloud Docs ([#9815](https://github.com/great-expectations/great_expectations/pull/9815))
* [DOCS] Remove Query Asset Content from GX Cloud Docs ([#9802](https://github.com/great-expectations/great_expectations/pull/9802))
* [DOCS] added discourse to OSS support ([#9847](https://github.com/great-expectations/great_expectations/pull/9847))
* [DOCS] Update get support ([#9852](https://github.com/great-expectations/great_expectations/pull/9852))
* [DOCS] Minor Updates to GX Cloud Expectations Topics ([#9884](https://github.com/great-expectations/great_expectations/pull/9884))
* [DOCS] Gx 1.0 Introductory content initial reorganization (take 2) ([#9869](https://github.com/great-expectations/great_expectations/pull/9869))
* [DOCS] Minor GX Cloud Docs Fixes ([#9892](https://github.com/great-expectations/great_expectations/pull/9892))
* [DOCS] Revises the GX component overview for GX 1.0 ([#9896](https://github.com/great-expectations/great_expectations/pull/9896))
* [DOCS] Change the texts of the "Was this Helpful?" widget ([#9905](https://github.com/great-expectations/great_expectations/pull/9905))
* [DOCS] Updates to About Great Expectations and Community Resources (OSS) ([#9912](https://github.com/great-expectations/great_expectations/pull/9912))
* [DOCS] Update 0.18 changelog ([#9914](https://github.com/great-expectations/great_expectations/pull/9914))
* [DOCS] Minor Edits to Connect GX Cloud to PostgreSQL (GX Cloud) ([#9927](https://github.com/great-expectations/great_expectations/pull/9927))
* [DOCS] Revise "Try GX" for GX 1.0 ([#9897](https://github.com/great-expectations/great_expectations/pull/9897))
* [DOCS] reorganizes content under the 1.0 Set up a GX environment topic ([#9930](https://github.com/great-expectations/great_expectations/pull/9930))
* [MAINTENANCE] Ruff 0.4.2 ([#9833](https://github.com/great-expectations/great_expectations/pull/9833))
* [MAINTENANCE] Enable SIM110 ([#9836](https://github.com/great-expectations/great_expectations/pull/9836))
* [MAINTENANCE] Enable SIM211 ([#9832](https://github.com/great-expectations/great_expectations/pull/9832))
* [MAINTENANCE] Enable SIM300 ([#9834](https://github.com/great-expectations/great_expectations/pull/9834))
* [MAINTENANCE] Enable SIM201 ([#9835](https://github.com/great-expectations/great_expectations/pull/9835))
* [MAINTENANCE] Delete dataset directory. ([#9842](https://github.com/great-expectations/great_expectations/pull/9842))
* [MAINTENANCE] Finish removing data asset top level package ([#9843](https://github.com/great-expectations/great_expectations/pull/9843))
* [MAINTENANCE] Make `SerializableDataContext.create` private ([#9853](https://github.com/great-expectations/great_expectations/pull/9853))
* [MAINTENANCE] mypy 1.10 ([#9857](https://github.com/great-expectations/great_expectations/pull/9857))
* [MAINTENANCE] Make `ExpectationSuite` importable from the top level GX namespace ([#9854](https://github.com/great-expectations/great_expectations/pull/9854))
* [MAINTENANCE] Remove block style datasource and batch from public api ([#9858](https://github.com/great-expectations/great_expectations/pull/9858))
* [MAINTENANCE] Remove LegacyDatasource ([#9848](https://github.com/great-expectations/great_expectations/pull/9848))
* [MAINTENANCE] set marker tests to not fail fast ([#9862](https://github.com/great-expectations/great_expectations/pull/9862))
* [MAINTENANCE] Ensure Spark can start ([#9866](https://github.com/great-expectations/great_expectations/pull/9866))
* [MAINTENANCE] Remove test_yaml_config and all integration tests that … ([#9861](https://github.com/great-expectations/great_expectations/pull/9861))
* [MAINTENANCE] Actually remove LegacyDatasource ([#9867](https://github.com/great-expectations/great_expectations/pull/9867))
* [MAINTENANCE] Remove `DataAssistants` ([#9859](https://github.com/great-expectations/great_expectations/pull/9859))
* [MAINTENANCE] Remove public decorator from anything BatchRequest related ([#9871](https://github.com/great-expectations/great_expectations/pull/9871))
* [MAINTENANCE] Skip unsupported time metric (1.0) ([#9856](https://github.com/great-expectations/great_expectations/pull/9856))
* [MAINTENANCE] enable tests ([#9865](https://github.com/great-expectations/great_expectations/pull/9865))
* [MAINTENANCE] Integration test around pandas ABS partitioning ([#9837](https://github.com/great-expectations/great_expectations/pull/9837))
* [MAINTENANCE] Integration tests around s3 batches ([#9846](https://github.com/great-expectations/great_expectations/pull/9846))
* [MAINTENANCE] GCS Integration tests around partitioning ([#9839](https://github.com/great-expectations/great_expectations/pull/9839))
* [MAINTENANCE] Update context factories to delete by name ([#9870](https://github.com/great-expectations/great_expectations/pull/9870))
* [MAINTENANCE] `ExpectationSuite` API cleanup ([#9875](https://github.com/great-expectations/great_expectations/pull/9875))
* [MAINTENANCE] Remove yaml config validator again ([#9877](https://github.com/great-expectations/great_expectations/pull/9877))
* [MAINTENANCE] Remove dataconnector tests that reference block style D… ([#9879](https://github.com/great-expectations/great_expectations/pull/9879))
* [MAINTENANCE] Remove some references to block style datasource ([#9868](https://github.com/great-expectations/great_expectations/pull/9868))
* [MAINTENANCE] Remove URN support ([#9886](https://github.com/great-expectations/great_expectations/pull/9886))
* [MAINTENANCE] Rename core partitioners ([#9894](https://github.com/great-expectations/great_expectations/pull/9894))
* [MAINTENANCE] Remove legacy `GeCloudStoreBackend` ([#9893](https://github.com/great-expectations/great_expectations/pull/9893))
* [MAINTENANCE] FileDataAsset BatchDefinition API accepts either `str` or `re.Pattern` ([#9895](https://github.com/great-expectations/great_expectations/pull/9895))
* [MAINTENANCE] Instrument validation workflows ([#9889](https://github.com/great-expectations/great_expectations/pull/9889))
* [MAINTENANCE] Remove remaining references to block style datasources ([#9881](https://github.com/great-expectations/great_expectations/pull/9881))
* [MAINTENANCE] Refactor legacy `anonymous_usage_statistics` into new top-level fields ([#9891](https://github.com/great-expectations/great_expectations/pull/9891))
* [MAINTENANCE] Remove suite CRUD from data_context ([#9890](https://github.com/great-expectations/great_expectations/pull/9890))
* [MAINTENANCE] Ensure that actions have names ([#9902](https://github.com/great-expectations/great_expectations/pull/9902))
* [MAINTENANCE] Remove simple sqlalchemy datasource ([#9900](https://github.com/great-expectations/great_expectations/pull/9900))
* [MAINTENANCE] Remove references to suite crud ([#9907](https://github.com/great-expectations/great_expectations/pull/9907))
* [MAINTENANCE] Improve error message around instantiating and saving s… ([#9908](https://github.com/great-expectations/great_expectations/pull/9908))
* [MAINTENANCE] Remove BaseDatasource ([#9901](https://github.com/great-expectations/great_expectations/pull/9901))
* [MAINTENANCE] Remove DatasourceConfig ([#9916](https://github.com/great-expectations/great_expectations/pull/9916))
* [MAINTENANCE] Ruff `0.4.4` ([#9918](https://github.com/great-expectations/great_expectations/pull/9918))
* [MAINTENANCE] Update packaging pipeline to work on 1.0 ([#9922](https://github.com/great-expectations/great_expectations/pull/9922))
* [MAINTENANCE] Remove legacy DataConnectors ([#9923](https://github.com/great-expectations/great_expectations/pull/9923))
* [MAINTENANCE] Remove batch kwargs ([#9932](https://github.com/great-expectations/great_expectations/pull/9932))
* [MAINTENANCE] Move `convert_to_json_serializable` to top-level utils package ([#9933](https://github.com/great-expectations/great_expectations/pull/9933))
* [MAINTENANCE] Ban future use of `convert_to_json_serializable` ([#9935](https://github.com/great-expectations/great_expectations/pull/9935))
* [MAINTENANCE] Remove batching regex from FilePathDataConnector ([#9898](https://github.com/great-expectations/great_expectations/pull/9898))

### 1.0.0a3
* [FEATURE] Add Regex Partitioner ([#9792](https://github.com/great-expectations/great_expectations/pull/9792))
* [FEATURE] Fluent BatchDefinition API for Pandas Assets ([#9820](https://github.com/great-expectations/great_expectations/pull/9820))
docs/docusaurus/docs/oss/guides/connecting_to_your_data/fluent/database/gcp_deployment_patterns_file_gcs.py
@@ -255,14 +255,15 @@
 # </snippet>

 # <snippet name="docs/docusaurus/docs/oss/guides/connecting_to_your_data/fluent/database/gcp_deployment_patterns_file_gcs.py asset">
-data_asset = datasource.add_csv_asset(
-    name="csv_taxi_gcs_asset", batching_regex=batching_regex, gcs_prefix=prefix
-)
+data_asset = datasource.add_csv_asset(name="csv_taxi_gcs_asset", gcs_prefix=prefix)
 # </snippet>

 # <snippet name="docs/docusaurus/docs/oss/guides/connecting_to_your_data/fluent/database/gcp_deployment_patterns_file_gcs.py batch_request">
-batch_request = data_asset.build_batch_request(
-    options={
+batch_definition = data_asset.add_batch_definition_monthly(
+    name="Monthly Taxi Data", regex=batching_regex
+)
+batch_request = batch_definition.build_batch_request(
+    batch_parameters={
         "month": "03",
     }
 )
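The migration in this hunk (and the ones that follow) moves the `batching_regex` off `add_csv_asset` and onto a named batch definition, but in both APIs the regex's named groups are what become batch parameters. A minimal stdlib sketch of that mapping, with illustrative function names that are not part of the GX API:

```python
import re

# The batching regex used in these snippets: each named group ("year",
# "month") becomes a batch parameter key.
BATCHING_REGEX = re.compile(
    r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"
)


def batch_parameters_for(filename):
    """Return the batch parameters a file name resolves to, or None."""
    match = BATCHING_REGEX.match(filename)
    return match.groupdict() if match else None


print(batch_parameters_for("yellow_tripdata_sample_2019-03.csv"))
# → {'year': '2019', 'month': '03'}
```

This is why `build_batch_request(batch_parameters={"month": "03"})` can select a single file: the requested parameters are matched against the captures the regex extracts from each file name.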
docs/docusaurus/docs/oss/guides/connecting_to_your_data/fluent/filesystem/how_to_connect_to_data_on_azure_blob_storage_using_pandas.py
@@ -42,7 +42,6 @@
 # <snippet name="docs/docusaurus/docs/oss/guides/connecting_to_your_data/fluent/filesystem/how_to_connect_to_data_on_azure_blob_storage_using_pandas.py add_asset">
 data_asset = datasource.add_csv_asset(
     name=asset_name,
-    batching_regex=batching_regex,
     abs_container=abs_container,
     abs_name_starts_with=abs_name_starts_with,
 )
@@ -52,10 +51,13 @@

 assert datasource.get_asset_names() == {"my_taxi_data_asset"}

-my_batch_request = data_asset.build_batch_request({"year": "2019", "month": "03"})
-batches = data_asset.get_batch_list_from_batch_request(my_batch_request)
-assert len(batches) == 1
-assert set(batches[0].columns()) == {
+batch_definition = data_asset.add_batch_definition_monthly(
+    name="Monthly Taxi Data", regex=batching_regex
+)
+batch_parameters = {"year": "2019", "month": "03"}
+batch = batch_definition.get_batch(batch_parameters=batch_parameters)
+
+assert set(batch.columns()) == {
     "vendor_id",
     "pickup_datetime",
     "dropoff_datetime",
@@ -32,16 +32,19 @@
 asset_name = "my_taxi_data_asset"
 gcs_prefix = "data/taxi_yellow_tripdata_samples/"
 batching_regex = r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"
-data_asset = datasource.add_csv_asset(
-    name=asset_name, batching_regex=batching_regex, gcs_prefix=gcs_prefix
-)
+data_asset = datasource.add_csv_asset(name=asset_name, gcs_prefix=gcs_prefix)
 # </snippet>

 assert data_asset

 assert datasource.get_asset_names() == {"my_taxi_data_asset"}

-my_batch_request = data_asset.build_batch_request({"year": "2019", "month": "03"})
+my_batch_definition = data_asset.add_batch_definition_monthly(
+    name="Monthly Taxi Data", regex=batching_regex
+)
+my_batch_request = my_batch_definition.build_batch_request(
+    {"year": "2019", "month": "03"}
+)
 batches = data_asset.get_batch_list_from_batch_request(my_batch_request)
 assert len(batches) == 1
 assert set(batches[0].columns()) == {
@@ -32,19 +32,19 @@
 asset_name = "my_taxi_data_asset"
 s3_prefix = "data/taxi_yellow_tripdata_samples/"
 batching_regex = r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"
-data_asset = datasource.add_csv_asset(
-    name=asset_name, batching_regex=batching_regex, s3_prefix=s3_prefix
+data_asset = datasource.add_csv_asset(name=asset_name, s3_prefix=s3_prefix)
+batch_definition = data_asset.add_batch_definition_monthly(
+    "Monthly Taxi Data", regex=batching_regex
 )
 # </snippet>

 assert data_asset

 assert datasource.get_asset_names() == {"my_taxi_data_asset"}

-my_batch_request = data_asset.build_batch_request({"year": "2019", "month": "03"})
-batches = data_asset.get_batch_list_from_batch_request(my_batch_request)
-assert len(batches) == 1
-assert set(batches[0].columns()) == {
+batch_parameters = {"year": "2019", "month": "03"}
+batch = batch_definition.get_batch(batch_parameters=batch_parameters)
+assert set(batch.columns()) == {
     "vendor_id",
     "pickup_datetime",
     "dropoff_datetime",
docs/docusaurus/docs/oss/guides/connecting_to_your_data/fluent/filesystem/how_to_connect_to_one_or_more_files_using_spark.py
@@ -45,17 +45,20 @@

 # Python
 # <snippet name="docs/docusaurus/docs/oss/guides/connecting_to_your_data/fluent/filesystem/how_to_connect_to_one_or_more_files_using_spark.py add_asset">
-datasource.add_csv_asset(
-    name=asset_name, batching_regex=batching_regex, header=True, infer_schema=True
-)
+datasource.add_csv_asset(name=asset_name, header=True, infer_schema=True)
 # </snippet>

 assert datasource.get_asset_names() == {"my_taxi_data_asset"}

 my_asset = datasource.get_asset(asset_name)
 assert my_asset

-my_batch_request = my_asset.build_batch_request({"year": "2019", "month": "03"})
+my_batch_definition = my_asset.add_batch_definition_monthly(
+    name="my_monthly_batch_definition", regex=batching_regex
+)
+my_batch_request = my_batch_definition.build_batch_request(
+    batch_parameters={"year": "2019", "month": "03"}
+)
 batches = my_asset.get_batch_list_from_batch_request(my_batch_request)
 assert len(batches) == 1
 assert set(batches[0].columns()) == {
11 changes: 8 additions & 3 deletions docs/docusaurus/docs/snippets/aws_cloud_storage_pandas.py
@@ -229,17 +229,22 @@
 # <snippet name="docs/docusaurus/docs/snippets/aws_cloud_storage_pandas.py get_pandas_s3_asset">
 asset = datasource.add_csv_asset(
     name="csv_taxi_s3_asset",
-    batching_regex=r".*_(?P<year>\d{4})\.csv",
 )
+batch_definition = asset.add_batch_definition_yearly(
+    name="Yearly Taxi Data",
+    regex=r".*_(?P<year>\d{4})\.csv",
+)
 # </snippet>

 # <snippet name="docs/docusaurus/docs/snippets/aws_cloud_storage_pandas.py get_batch_request">
-request = asset.build_batch_request({"year": "2021"})
+batch_parameters = {"year": "2021"}
+request = batch_definition.build_batch_request(batch_parameters=batch_parameters)
 # </snippet>


 # <snippet name="docs/docusaurus/docs/snippets/aws_cloud_storage_pandas.py get_batch_list">
-batches = asset.get_batch_list_from_batch_request(request)
+batch_parameters = {"year": "2021"}
+batch = batch_definition.get_batch(batch_parameters=batch_parameters)
 # </snippet>

 config = context.fluent_datasources["s3_datasource"].yaml()
11 changes: 8 additions & 3 deletions docs/docusaurus/docs/snippets/batch_request.py
@@ -27,11 +27,16 @@
 # The batching_regex should max file names in the data_directory
 asset = datasource.add_csv_asset(
     name="csv_asset",
-    batching_regex=r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2}).csv",
-    order_by=["year", "month"],
 )

-batch_request = asset.build_batch_request(options={"year": "2019", "month": "02"})
+batch_definition = asset.add_batch_definition_monthly(
+    name="monthly_batch_definition",
+    regex=r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2}).csv",
+)
+batch_request = batch_definition.build_batch_request(
+    batch_parameters={"year": "2019", "month": "02"}
+)
 # </snippet>

 assert batch_request.datasource_name == "my_pandas_datasource"
@@ -43,4 +48,4 @@
 print(options)
 # </snippet>

-assert set(options) == {"year", "month", "path"}
+assert set(options) == {"path"}
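The hunk above shows the asset's own batch parameter keys shrinking from `{"year", "month", "path"}` to `{"path"}`: once the regex lives on the batch definition rather than the asset, the year/month keys belong to the definition, and selecting a batch filters the asset's files by the requested parameters. A hypothetical stand-in for that selection step, using only the stdlib (illustrative only, not the GX implementation):

```python
import re

# The definition's regex and a pretend file listing for the asset.
REGEX = re.compile(r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv")

FILES = [
    "yellow_tripdata_sample_2019-01.csv",
    "yellow_tripdata_sample_2019-02.csv",
    "yellow_tripdata_sample_2020-01.csv",
]


def get_batch(batch_parameters):
    """Return the files whose regex captures match every requested parameter."""
    selected = []
    for name in FILES:
        m = REGEX.match(name)
        if m and all(m.group(k) == v for k, v in batch_parameters.items()):
            selected.append(name)
    return selected


print(get_batch({"year": "2019", "month": "02"}))
# → ['yellow_tripdata_sample_2019-02.csv']
```

Passing only a subset of parameters (e.g. just `{"year": "2019"}`) matches every file for that year, which is the behavior the monthly/yearly batch definitions in these diffs rely on.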
@@ -29,8 +29,11 @@

 my_asset = my_datasource.add_csv_asset(
     name="my_asset",
-    batching_regex=r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2}).csv",
-    order_by=["year", "month"],
 )
+my_batch_definition = my_asset.add_batch_definition_monthly(
+    name="my_batch_definition",
+    regex=r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2}).csv",
+    sort_ascending=True,
+)

 import pandas as pd
@@ -51,14 +54,16 @@

 # Python
 # <snippet name="docs/docusaurus/docs/snippets/get_existing_data_asset_from_existing_datasource_pandas_filesystem_example.py my_batch_parameters">
-print(my_asset.get_batch_parameters_keys())
+print(my_asset.get_batch_parameters_keys(partitioner=my_batch_definition.partitioner))
 # </snippet>

-assert my_asset.get_batch_parameters_keys() == ("year", "month", "path")
+assert my_asset.get_batch_parameters_keys(
+    partitioner=my_batch_definition.partitioner
+) == ("path", "year", "month")

 # Python
 # <snippet name="docs/docusaurus/docs/snippets/get_existing_data_asset_from_existing_datasource_pandas_filesystem_example.py my_batch_request">
-my_batch_request = my_asset.build_batch_request()
+my_batch_request = my_batch_definition.build_batch_request()
 # </snippet>

 assert my_batch_request.datasource_name == "my_datasource"
@@ -3,7 +3,7 @@
 import pathlib
 from abc import ABC
 from functools import singledispatchmethod
-from typing import TYPE_CHECKING, Generic, Optional, Pattern
+from typing import TYPE_CHECKING, Generic, Optional

 from great_expectations import exceptions as gx_exceptions
 from great_expectations._docs_decorators import public_api
@@ -17,7 +17,7 @@
     ColumnPartitionerYearly,
 )
 from great_expectations.datasource.fluent import BatchRequest
-from great_expectations.datasource.fluent.constants import _DATA_CONNECTOR_NAME, MATCH_ALL_PATTERN
+from great_expectations.datasource.fluent.constants import _DATA_CONNECTOR_NAME
 from great_expectations.datasource.fluent.data_asset.path.dataframe_partitioners import (
     DataframePartitioner,
     DataframePartitionerDaily,
@@ -41,10 +41,6 @@ class DirectoryDataAsset(PathDataAsset[DatasourceT, ColumnPartitioner], Generic[
     """Base class for PathDataAssets which batch by combining the contents of a directory."""

     data_directory: pathlib.Path
-    # todo: remove. this is included to allow for an incremental refactor.
-    batching_regex: Pattern = (  # must use typing.Pattern for pydantic < v1.10
-        MATCH_ALL_PATTERN
-    )

     @public_api
     def add_batch_definition_daily(self, name: str, column: str) -> BatchDefinition: