[FEATURE] ValidationConfigStore #9523

Merged
merged 39 commits into from
Feb 29, 2024
44172f8 start impl (cdkini, Feb 23, 2024)
2b92cd4 update init (cdkini, Feb 23, 2024)
008bd50 remove type ignore (cdkini, Feb 23, 2024)
31aafb3 patch tests (cdkini, Feb 23, 2024)
be70160 patch test (cdkini, Feb 26, 2024)
3e9a833 Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, Feb 27, 2024)
295eba0 start tests (cdkini, Feb 27, 2024)
fe3ebfe more tests (cdkini, Feb 27, 2024)
ddddc8d simplify store (cdkini, Feb 27, 2024)
9fa6eba get tests passing (cdkini, Feb 27, 2024)
0997f8b plug in store to factory (cdkini, Feb 27, 2024)
d62cd3b remove debug call (cdkini, Feb 27, 2024)
8f088d6 remove extraneous init (cdkini, Feb 27, 2024)
3771806 misc updates (cdkini, Feb 27, 2024)
764ad93 from __future__ (cdkini, Feb 27, 2024)
537def9 Update great_expectations/data_context/data_context/abstract_data_con… (cdkini, Feb 27, 2024)
3eba6a4 remove forward refs (cdkini, Feb 27, 2024)
61a1c23 mypy (cdkini, Feb 27, 2024)
76700d8 patch spark test (cdkini, Feb 27, 2024)
dffbbe1 Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, Feb 28, 2024)
b7d1072 add id (cdkini, Feb 28, 2024)
a216c92 fix unit tests (cdkini, Feb 28, 2024)
2c766f9 patch cloud test (cdkini, Feb 28, 2024)
07acaea update validation_config_store prop (cdkini, Feb 28, 2024)
f137989 add property (cdkini, Feb 28, 2024)
8a948fb misc updates (cdkini, Feb 28, 2024)
ec7c5f8 Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, Feb 29, 2024)
947d901 add default stores (cdkini, Feb 29, 2024)
72bdf51 update store backend defaults and tests (cdkini, Feb 29, 2024)
6882737 mypy (cdkini, Feb 29, 2024)
deb1076 ephemeral defaults (cdkini, Feb 29, 2024)
695cc17 update templates (cdkini, Feb 29, 2024)
182edab patch s3 tests (cdkini, Feb 29, 2024)
a65ce0b patch cloud tests (cdkini, Feb 29, 2024)
c533188 patch cloud tests (cdkini, Feb 29, 2024)
f987bef fix docs tests (cdkini, Feb 29, 2024)
22e5d55 Merge branch 'develop' of https://github.com/great-expectations/great… (cdkini, Feb 29, 2024)
1c256ca update validator (cdkini, Feb 29, 2024)
a82f189 update docs tests (cdkini, Feb 29, 2024)
@@ -40,6 +40,7 @@
    "evaluation_parameter_store",
    "validations_store",
    "profiler_store",
    "validation_config_store",
]
for store in pop_stores:
    stores.pop(store)
@@ -124,6 +125,7 @@
    "expectations_store",
    "expectations_GCS_store",
    "profiler_store",
    "validation_config_store",
]
for store in pop_stores:
    stores.pop(store)
@@ -41,6 +41,7 @@
    "evaluation_parameter_store",
    "validations_store",
    "profiler_store",
    "validation_config_store",
]
for store in pop_stores:
    stores.pop(store)
@@ -126,6 +127,7 @@
    "expectations_store",
    "expectations_GCS_store",
    "profiler_store",
    "validation_config_store",
]
for store in pop_stores:
    stores.pop(store)
2 changes: 2 additions & 0 deletions docs/docusaurus/docs/snippets/aws_cloud_storage_pandas.py
@@ -31,6 +31,7 @@
    "evaluation_parameter_store",
    "validations_store",
    "profiler_store",
    "validation_config_store",
]
for store in pop_stores:
    stores.pop(store)
@@ -110,6 +111,7 @@
    "expectations_store",
    "expectations_S3_store",
    "profiler_store",
    "validation_config_store",
]
for store in pop_stores:
    stores.pop(store)
@@ -32,6 +32,7 @@
    "evaluation_parameter_store",
    "validations_store",
    "profiler_store",
    "validation_config_store",
]
for store in pop_stores:
    stores.pop(store)
@@ -111,6 +112,7 @@
    "expectations_store",
    "expectations_S3_store",
    "profiler_store",
    "validation_config_store",
]
for store in pop_stores:
    stores.pop(store)
2 changes: 1 addition & 1 deletion great_expectations/core/expectation_suite.py
@@ -1207,4 +1207,4 @@ def _convert_uuids_to_str(self, data, **kwargs):
        return data


expectationSuiteSchema = ExpectationSuiteSchema()
expectationSuiteSchema: ExpectationSuiteSchema = ExpectationSuiteSchema()
Contributor: nit: This type can be inferred and is the same as the constructor, so you don't need this annotation.
Member Author: Weirdly enough, I got a mypy error without it (when using it in the custom Pydantic encoder below).

10 changes: 8 additions & 2 deletions great_expectations/core/factory/validation_factory.py
@@ -1,15 +1,21 @@
from __future__ import annotations

from typing import TYPE_CHECKING

from great_expectations._docs_decorators import public_api
from great_expectations.compatibility.typing_extensions import override
from great_expectations.core.factory.factory import Factory
from great_expectations.core.validation_config import ValidationConfig

if TYPE_CHECKING:
    from great_expectations.data_context.store.validation_config_store import (
        ValidationConfigStore,
    )


# TODO: Add analytics as needed
class ValidationFactory(Factory[ValidationConfig]):
    def __init__(self, store) -> None:
        # TODO: Update type hints when new ValidationConfigStore is implemented
    def __init__(self, store: ValidationConfigStore) -> None:
        self._store = store

    @public_api
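The `if TYPE_CHECKING:` block above lets `ValidationFactory` annotate `store` as `ValidationConfigStore` without importing the store module at runtime. A common reason for this pattern is avoiding runtime import cycles; whether that was the exact motivation here is an assumption. The minimal sketch below shows the mechanism in isolation: the module path `my_project.stores` does not exist and `ValidationFactorySketch` is hypothetical, yet the code runs, because the guarded import is only read by static type checkers and `from __future__ import annotations` keeps the annotation as a plain string.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers (mypy, pyright), never at runtime.
    # The module path is hypothetical; because this block never executes,
    # the import can neither fail nor create a runtime import cycle.
    from my_project.stores import ValidationConfigStore


class ValidationFactorySketch:
    """Hypothetical factory that needs the store type only for annotations."""

    def __init__(self, store: ValidationConfigStore) -> None:
        # With postponed evaluation, this annotation is stored as the string
        # "ValidationConfigStore" and is never resolved at runtime.
        self._store = store


# Annotations are not enforced at runtime, so any object is accepted here.
factory = ValidationFactorySketch(store=object())
print(ValidationFactorySketch.__init__.__annotations__["store"])  # ValidationConfigStore
```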
32 changes: 24 additions & 8 deletions great_expectations/core/validation_config.py
@@ -1,15 +1,14 @@
from __future__ import annotations

from typing import TYPE_CHECKING
from typing import Union

from great_expectations._docs_decorators import public_api
from great_expectations.compatibility.pydantic import BaseModel

if TYPE_CHECKING:
    from great_expectations.core.batch_config import BatchConfig
    from great_expectations.core.expectation_suite import ExpectationSuite

# from great_expectations.datasource.fluent.interfaces import DataAsset
from great_expectations.compatibility.pydantic import BaseModel, validator
from great_expectations.core.batch_config import BatchConfig  # noqa: TCH001
from great_expectations.core.expectation_suite import (
    ExpectationSuite,
    expectationSuiteSchema,
)


class ValidationConfig(BaseModel):
@@ -23,9 +22,26 @@ class ValidationConfig(BaseModel):

    """

    class Config:
        arbitrary_types_allowed = True
        json_encoders = {
            ExpectationSuite: lambda v: expectationSuiteSchema.dump(v),
        }
Review thread on lines +26 to +29:
Contributor: nit: I think we should add a comment in the code explaining why this is needed.
Member Author: Using Marshmallow and Pydantic together results in something like this. We need custom encoders/decoders, but this should work as intended.

    name: str
    data: BatchConfig  # TODO: Should support a union of Asset | BatchConfig
    suite: ExpectationSuite
    id: Union[str, None] = None

    @validator("suite", pre=True)
    def _validate_suite(cls, v):
        if isinstance(v, dict):
            return ExpectationSuite(**expectationSuiteSchema.load(v))
        elif isinstance(v, ExpectationSuite):
            return v
        raise ValueError(
            "Suite must be a dictionary (if being deserialized) or an ExpectationSuite object."
        )
Contributor (on `_validate_suite`): Same as above: I think it's worth a docstring explaining that this is only needed because the suite isn't a Pydantic model.

    @public_api
    def run(self):
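`ValidationConfig` is a Pydantic model, but its `suite` field holds an `ExpectationSuite` that is serialized through Marshmallow (`expectationSuiteSchema`). That is why the model needs a `json_encoders` entry for `.json()` and a `pre=True` validator that turns a dict back into a suite on the way in. The sketch below reproduces the same round trip using only the standard library, swapping Pydantic for a dataclass (`__post_init__` plays the pre-validator) and Marshmallow for plain functions (`json.dumps(default=...)` plays the encoder). `Suite`, `dump_suite`, `load_suite`, and `ValidationConfigSketch` are hypothetical stand-ins, not Great Expectations APIs.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any, Optional, Union


@dataclass
class Suite:
    """Hypothetical stand-in for ExpectationSuite, which is not a Pydantic model."""

    name: str
    expectations: list = field(default_factory=list)


def dump_suite(suite: Suite) -> dict:
    # Plays the role of expectationSuiteSchema.dump(v) in json_encoders.
    return {"name": suite.name, "expectations": list(suite.expectations)}


def load_suite(data: dict) -> Suite:
    # Plays the role of ExpectationSuite(**expectationSuiteSchema.load(v)).
    return Suite(**data)


def _encode(obj: Any) -> Any:
    # Analogue of json_encoders = {ExpectationSuite: ...}: json.dumps calls
    # this hook only for objects it cannot serialize natively.
    if isinstance(obj, Suite):
        return dump_suite(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


@dataclass
class ValidationConfigSketch:
    name: str
    suite: Union[Suite, dict]
    id: Optional[str] = None

    def __post_init__(self) -> None:
        # Analogue of @validator("suite", pre=True): build a Suite from a dict
        # (deserialization), pass an existing Suite through, reject anything else.
        if isinstance(self.suite, dict):
            self.suite = load_suite(self.suite)
        elif not isinstance(self.suite, Suite):
            raise ValueError(
                "Suite must be a dictionary (if being deserialized) or a Suite object."
            )

    def json(self) -> str:
        payload = {"name": self.name, "suite": self.suite, "id": self.id}
        return json.dumps(payload, default=_encode)

    @classmethod
    def parse_raw(cls, raw: str) -> ValidationConfigSketch:
        return cls(**json.loads(raw))


original = ValidationConfigSketch(name="my_validation", suite=Suite(name="my_suite"))
restored = ValidationConfigSketch.parse_raw(original.json())
print(restored == original)  # True
```

Keeping the encoder and the pre-validator next to each other makes the asymmetry explicit: serialization needs a hook because the JSON encoder does not know the foreign type, and deserialization needs a coercion step because the parsed JSON is only a dict.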
1 change: 1 addition & 0 deletions great_expectations/data_context/cloud_constants.py
@@ -31,3 +31,4 @@ class GXCloudRESTResource(str, Enum):
    PROFILER = "profiler"
    RENDERED_DATA_DOC = "rendered_data_doc"
    VALIDATION_RESULT = "validation_result"
    VALIDATION_CONFIG = "validation_config"
@@ -148,6 +148,9 @@
        DataDocsSiteConfigTypedDict,
        StoreConfigTypedDict,
    )
    from great_expectations.data_context.store.validation_config_store import (
        ValidationConfigStore,
    )
    from great_expectations.data_context.store.validations_store import ValidationsStore
    from great_expectations.data_context.types.resource_identifiers import (
        GXCloudIdentifier,
@@ -331,8 +334,9 @@ def _init_factories(self) -> None:
            context=self,
        )

        # TODO: Update to follow existing pattern once new ValidationConfigStore is implemented
        self._validations: ValidationFactory | None = None
        self._validations: ValidationFactory = ValidationFactory(
            store=self.validation_config_store
        )

    def _init_analytics(self) -> None:
        init_analytics(
@@ -617,6 +621,13 @@ def validations_store_name(self, value: str) -> None:
    def validations_store(self) -> ValidationsStore:
        return self.stores[self.validations_store_name]

    @property
    def validation_config_store(self) -> ValidationConfigStore:
        # Purposely not exposing validation_config_store_name as a user-configurable property
        return self.stores[
            DataContextConfigDefaults.DEFAULT_VALIDATION_CONFIG_STORE_NAME.value
        ]

    @property
    def checkpoint_store_name(self) -> Optional[str]:
        from great_expectations.data_context.store.checkpoint_store import (
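The `validation_config_store` property above reads the store from `self.stores` under a fixed default name rather than a user-configurable `validation_config_store_name`, and `_init_factories` now constructs `ValidationFactory` eagerly from that property. The sketch below mirrors this wiring with hypothetical `ContextSketch`, `FactorySketch`, and `DefaultsSketch` classes and a plain dict of stores. The string value `"validation_config_store"` for the default name is an assumption, since the diff shows only the constant `DEFAULT_VALIDATION_CONFIG_STORE_NAME`.

```python
from __future__ import annotations

from enum import Enum


class DefaultsSketch(Enum):
    # Assumed value; the diff shows only the member name.
    DEFAULT_VALIDATION_CONFIG_STORE_NAME = "validation_config_store"


class FactorySketch:
    """Hypothetical stand-in for ValidationFactory."""

    def __init__(self, store: dict) -> None:
        self._store = store


class ContextSketch:
    """Hypothetical stand-in for the data context."""

    def __init__(self, stores: dict) -> None:
        self.stores = stores
        # Mirrors _init_factories: the factory is built eagerly from the
        # property, so in this sketch a missing store raises KeyError here.
        self._validations = FactorySketch(store=self.validation_config_store)

    @property
    def validation_config_store(self) -> dict:
        # No *_store_name setting: the lookup key is fixed to the default.
        return self.stores[DefaultsSketch.DEFAULT_VALIDATION_CONFIG_STORE_NAME.value]


store: dict = {}
context = ContextSketch(stores={"validation_config_store": store})
print(context._validations._store is store)  # True
```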
@@ -48,6 +48,7 @@ class SerializableDataContext(AbstractDataContext):
        DataContextConfigDefaults.EXPECTATIONS_BASE_DIRECTORY.value,
        DataContextConfigDefaults.PLUGINS_BASE_DIRECTORY.value,
        DataContextConfigDefaults.PROFILERS_BASE_DIRECTORY.value,
        DataContextConfigDefaults.VALIDATION_CONFIGS_BASE_DIRECTORY.value,
        GX_UNCOMMITTED_DIR,
    ]
    GX_DIR: ClassVar[str] = "gx"
Member Author (on the VALIDATION_CONFIGS_BASE_DIRECTORY line): Updates to enable proper filesystem scaffolding.
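`SerializableDataContext` keeps a list of base directories for a file-backed project, and the author notes that the new `VALIDATION_CONFIGS_BASE_DIRECTORY` entry is there for filesystem scaffolding. Assuming scaffolding means creating each listed directory under the `gx` project directory, it amounts to something like the `pathlib` sketch below. The directory names and the `scaffold` helper are hypothetical placeholders, not the actual values of the `DataContextConfigDefaults` members or a Great Expectations function.

```python
import tempfile
from pathlib import Path

GX_DIR = "gx"  # value taken from the diff above

# Hypothetical names standing in for the DataContextConfigDefaults values.
BASE_DIRECTORIES = [
    "expectations",
    "plugins",
    "profilers",
    "validation_configs",
    "uncommitted",
]


def scaffold(project_root: Path) -> Path:
    """Create every base directory under <project_root>/gx and return that path."""
    gx_root = project_root / GX_DIR
    for name in BASE_DIRECTORIES:
        # exist_ok=True makes re-running the scaffolding idempotent.
        (gx_root / name).mkdir(parents=True, exist_ok=True)
    return gx_root


with tempfile.TemporaryDirectory() as tmp:
    gx_root = scaffold(Path(tmp))
    print(sorted(p.name for p in gx_root.iterdir()))
```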
1 change: 1 addition & 0 deletions great_expectations/data_context/store/__init__.py
@@ -30,3 +30,4 @@
from .profiler_store import ProfilerStore # isort:skip
from .data_context_store import DataContextStore # isort:skip
from .data_asset_store import DataAssetStore # isort:skip
from .validation_config_store import ValidationConfigStore # isort:skip
@@ -133,6 +133,7 @@ class GXCloudStoreBackend(StoreBackend, metaclass=ABCMeta):
        GXCloudRESTResource.PROFILER: "profiler",
        GXCloudRESTResource.RENDERED_DATA_DOC: "rendered_data_doc",
        GXCloudRESTResource.VALIDATION_RESULT: "result",
        GXCloudRESTResource.VALIDATION_CONFIG: "validation_config",
    }
Member Author (on the VALIDATION_CONFIG entry): Updates to enable proper Cloud integration (purely mocking for now).
Contributor: It seems like the enum could be restructured. If we have to list every member here and define a new value for each one, that suggests our enum abstraction is wrong.

    ALLOWED_SET_KWARGS_BY_RESOURCE_TYPE: Dict[GXCloudRESTResource, Set[str]] = {
@@ -157,6 +158,7 @@ class GXCloudStoreBackend(StoreBackend, metaclass=ABCMeta):
            GXCloudRESTResource.PROFILER: "profilers",
            GXCloudRESTResource.RENDERED_DATA_DOC: "rendered_data_docs",
            GXCloudRESTResource.VALIDATION_RESULT: "validation_results",
            GXCloudRESTResource.VALIDATION_CONFIG: "validation_configs",
        }
    )
Contributor (on the VALIDATION_CONFIG entry): Similarly here.

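The reviewer's point is that every new `GXCloudRESTResource` member must also be registered in several parallel lookup dicts. One hypothetical alternative, sketched here and not what this PR implements, is to attach the per-resource strings to the enum members themselves, so a new resource is declared in one place. The strings come from the diff; `CloudResourceSketch` and the `resource_name` and `plural` attribute names are assumptions. The custom `__new__` that sets `_value_` is the standard `enum` technique for members that carry extra attributes.

```python
from enum import Enum


class CloudResourceSketch(str, Enum):
    """Hypothetical enum that carries its own lookup strings."""

    def __new__(cls, value: str, resource_name: str, plural: str):
        member = str.__new__(cls, value)
        member._value_ = value  # lookups like CloudResourceSketch("...") use this
        member.resource_name = resource_name  # replaces the singular-name dict
        member.plural = plural  # replaces the plural URL-segment dict
        return member

    # (enum value, resource name, plural segment); strings taken from the diff
    VALIDATION_RESULT = ("validation_result", "result", "validation_results")
    VALIDATION_CONFIG = ("validation_config", "validation_config", "validation_configs")


print(CloudResourceSketch.VALIDATION_CONFIG.plural)  # validation_configs
print(CloudResourceSketch("validation_result").resource_name)  # result
```

Because the class still mixes in `str`, members keep comparing equal to their plain string values, so existing code that passes `"validation_config"` around would continue to work.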
39 changes: 39 additions & 0 deletions great_expectations/data_context/store/validation_config_store.py
@@ -0,0 +1,39 @@
from __future__ import annotations

from great_expectations.compatibility.typing_extensions import override
from great_expectations.core.data_context_key import StringKey
from great_expectations.core.validation_config import ValidationConfig
from great_expectations.data_context.cloud_constants import GXCloudRESTResource
from great_expectations.data_context.store.store import Store
from great_expectations.data_context.types.resource_identifiers import (
    GXCloudIdentifier,
)


class ValidationConfigStore(Store):
    _key_class = StringKey

    def get_key(
        self, name: str, id: str | None = None
    ) -> GXCloudIdentifier | StringKey:
        """Given a name and optional ID, build the correct key for use in the ValidationConfigStore."""
        if self.cloud_mode:
            return GXCloudIdentifier(
                resource_type=GXCloudRESTResource.VALIDATION_CONFIG,
                id=id,
                resource_name=name,
            )
        return StringKey(key=name)

    @override
    def serialize(self, value):
        if self.cloud_mode:
            data = value.dict()
            data["suite"] = data["suite"].to_json_dict()
            return data

        return value.json()

    @override
    def deserialize(self, value):
        return ValidationConfig.parse_raw(value)

Review comments on this file:
Member Author (on `get_key`): Implementing the bare minimum required by the factory. Tests should reflect this as well.
Contributor (on `if self.cloud_mode:` in `get_key`): Not for this PR, but we need to get away from these cloud mode checks.
Member Author: I agree. It's poorly encapsulated, but I couldn't think of a better way to get both file and Cloud working.
Member Author (on `data["suite"] = data["suite"].to_json_dict()`): Another necessity due to Marshmallow.
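To follow the key and serialization logic of `ValidationConfigStore` in isolation, here is a stdlib-only sketch that mirrors its shape: `get_key` returns a cloud identifier when `cloud_mode` is set and a plain string key otherwise, and `serialize` returns a dict in cloud mode but a JSON string for the filesystem. `StringKeySketch`, `CloudIdSketch`, `ValidationConfigStoreSketch`, and the dict-backed `_backend` are hypothetical stand-ins, not Great Expectations classes, and a plain dict replaces the `ValidationConfig` model.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class StringKeySketch:
    """Hypothetical stand-in for StringKey (filesystem key)."""

    key: str


@dataclass(frozen=True)
class CloudIdSketch:
    """Hypothetical stand-in for GXCloudIdentifier."""

    resource_type: str
    id: Optional[str]
    resource_name: str


class ValidationConfigStoreSketch:
    def __init__(self, cloud_mode: bool = False) -> None:
        self.cloud_mode = cloud_mode
        self._backend: dict = {}  # hypothetical in-memory backend

    def get_key(
        self, name: str, id: Optional[str] = None
    ) -> Union[CloudIdSketch, StringKeySketch]:
        # Same branching as ValidationConfigStore.get_key in the diff above.
        if self.cloud_mode:
            return CloudIdSketch(
                resource_type="validation_config", id=id, resource_name=name
            )
        return StringKeySketch(key=name)

    def serialize(self, value: dict) -> Union[dict, str]:
        # Cloud payloads stay dicts; filesystem payloads become JSON text.
        return dict(value) if self.cloud_mode else json.dumps(value)

    def deserialize(self, value: Union[dict, str]) -> dict:
        # Simplification: accepts a dict or a JSON string, whereas the diff's
        # deserialize always calls ValidationConfig.parse_raw.
        return value if isinstance(value, dict) else json.loads(value)

    def set(self, key: Union[CloudIdSketch, StringKeySketch], value: dict) -> None:
        self._backend[key] = self.serialize(value)

    def get(self, key: Union[CloudIdSketch, StringKeySketch]) -> dict:
        return self.deserialize(self._backend[key])


store = ValidationConfigStoreSketch()
key = store.get_key(name="my_validation")
store.set(key, {"name": "my_validation", "suite": {"name": "my_suite"}, "id": None})
print(key, store.get(key)["suite"]["name"])  # StringKeySketch(key='my_validation') my_suite
```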
8 changes: 8 additions & 0 deletions great_expectations/data_context/templates.py
@@ -104,6 +104,13 @@ def dump(self, data, stream=None, **kw):
PROFILER_STORE_STRING = yaml.dump(
    {"profiler_store": DataContextConfigDefaults.DEFAULT_STORES.value["profiler_store"]}
).replace("\n", "\n ")[:-2]
VALIDATION_CONFIG_STORE_STRING = yaml.dump(
    {
        "validation_config_store": DataContextConfigDefaults.DEFAULT_STORES.value[
            "validation_config_store"
        ]
    }
).replace("\n", "\n ")[:-2]

PROJECT_OPTIONAL_CONFIG_COMMENT = (
    CONFIG_VARIABLES_INTRO
@@ -130,6 +137,7 @@ def dump(self, data, stream=None, **kw):
{EVALUATION_PARAMETER_STORE_STRING}
{CHECKPOINT_STORE_STRING}
{PROFILER_STORE_STRING}
{VALIDATION_CONFIG_STORE_STRING}
expectations_store_name: expectations_store
validations_store_name: validations_store
evaluation_parameter_store_name: evaluation_parameter_store
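Each `*_STORE_STRING` is a YAML dump of one default store that gets pasted under `stores:` in the project template. The `.replace("\n", ...)` call indents every line after the first so the nested keys line up under the placeholder's position, and `[:-2]` trims the indentation that the replace leaves dangling after the final newline. The sketch below assumes a two-space indent (matching the two characters `[:-2]` removes) and uses a hand-written YAML string so it runs without a YAML library; the `class_name` entry and the template text are illustrative, not the actual default store config or template.

```python
# Illustrative YAML, shaped like what yaml.dump emits for a one-entry mapping.
dumped = "validation_config_store:\n  class_name: ValidationConfigStore\n"

INDENT = "  "  # assumed two-space indent, matching the two characters [:-2] trims

# Indent every line after the first, then drop the indent left after the final newline.
store_string = dumped.replace("\n", "\n" + INDENT)[: -len(INDENT)]

template = f"""stores:
  {store_string}
expectations_store_name: expectations_store
"""
print(template)
```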