New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] ValidationConfigStore
#9523
Conversation
✅ Deploy Preview for niobium-lead-7998 ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
…_expectations into m/v1-178/validation_config_store
@@ -48,6 +48,7 @@ class SerializableDataContext(AbstractDataContext): | |||
DataContextConfigDefaults.EXPECTATIONS_BASE_DIRECTORY.value, | |||
DataContextConfigDefaults.PLUGINS_BASE_DIRECTORY.value, | |||
DataContextConfigDefaults.PROFILERS_BASE_DIRECTORY.value, | |||
DataContextConfigDefaults.VALIDATION_CONFIGS_BASE_DIRECTORY.value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates to enable proper filesystem scaffolding
@@ -133,6 +133,7 @@ class GXCloudStoreBackend(StoreBackend, metaclass=ABCMeta): | |||
GXCloudRESTResource.PROFILER: "profiler", | |||
GXCloudRESTResource.RENDERED_DATA_DOC: "rendered_data_doc", | |||
GXCloudRESTResource.VALIDATION_RESULT: "result", | |||
GXCloudRESTResource.VALIDATION_CONFIG: "validation_config", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates to enable proper Cloud integration (purely mocking for now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the enum can be modified/restructured if we have to list all these enums and define a new value for them. This makes it appear that our enum/abstraction is wrong.
runtime_environment=runtime_environment, | ||
) | ||
|
||
def get_key( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementing the bare minimum required in the factory. Tests should reflect this as well
arbitrary_types_allowed = True | ||
json_encoders = { | ||
ExpectationSuite: lambda v: expectationSuiteSchema.dump(v), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concurrently using Marshmallow and Pydantic results in something like this. We need custom encoders/decoders but this should work as intended
@property | ||
def validation_config_store_name(self) -> str: | ||
return DataContextConfigDefaults.DEFAULT_VALIDATION_CONFIG_STORE_NAME.value | ||
|
||
def validaiton_config_store(self) -> ValidationConfigStore: | ||
return self.stores[self.validation_config_store_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't sure about this but deciding to not make validation_config_store_name
a context variable. I've wavered on it but I'm now confident that store naming just causes more harm than benefit.
An argument against this is that it is a deviation from the existing pattern and perhaps consistency is more important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm usually not pro-deviation, but I think in this case it's probably reasonable. It'd require minimal changes if we changed our mind later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know this key will always be present?
What is the harm you see from making this a context variable? By context variable, do you mean a top level property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By adding it as a context variable, its another field that a user can modify.
These top level store names are dangerous in that they can enable the following:
context.checkpoint_store_name = "my_nonexistent_store"
...
context.checkpoint_store # Raises an error
context.checkpoints.add(...) # Raises an error
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We allow customization of the names but don't automatically update the stores
dict, resulting in the possibility of orphaning a store by losing its name.
Seems relatively unlikely since we don't advertise this much in docs/public API.
We should always have a ValidationConfigStore
but the name may vary.
def serialize(self, value): | ||
if self.cloud_mode: | ||
data = value.dict() | ||
data["suite"] = data["suite"].to_json_dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another necessity due to Marshmallow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. I don't have the best context with respect to stores. I've asked some questions inline.
great_expectations/data_context/data_context/abstract_data_context.py
Outdated
Show resolved
Hide resolved
great_expectations/data_context/data_context/abstract_data_context.py
Outdated
Show resolved
Hide resolved
@property | ||
def validation_config_store_name(self) -> str: | ||
return DataContextConfigDefaults.DEFAULT_VALIDATION_CONFIG_STORE_NAME.value | ||
|
||
def validaiton_config_store(self) -> ValidationConfigStore: | ||
return self.stores[self.validation_config_store_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know this key will always be present?
What is the harm you see from making this a context variable? By context variable, do you mean a top level property?
self, name: str, id: str | None = None | ||
) -> GXCloudIdentifier | StringKey: | ||
"""Given a name and optional ID, build the correct key for use in the ValidationConfigStore.""" | ||
if self.cloud_mode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR but we need to get away from these cloud mode checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. It's poorly encapsulated but I couldn't think of a better way to get both file and cloud working
…text.py Co-authored-by: Bill Dirks <bill@greatexpectations.io>
…_expectations into m/v1-178/validation_config_store
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good. I've asked 1 question and added a couple of nits that don't need to be solved now. There is also the outstanding property comment you've made.
@@ -1207,4 +1207,4 @@ def _convert_uuids_to_str(self, data, **kwargs): | |||
return data | |||
|
|||
|
|||
expectationSuiteSchema = ExpectationSuiteSchema() | |||
expectationSuiteSchema: ExpectationSuiteSchema = ExpectationSuiteSchema() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This type can be inferred and is the same as the constructor so you don't need this annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weirdly enough I got a mypy error without this (when using it in the custom Pydantic encoder below)
def _validate_suite(cls, v): | ||
if isinstance(v, dict): | ||
return ExpectationSuite(**expectationSuiteSchema.load(v)) | ||
return v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check if v is an expectationSuiteSchema and raise if it's not in the fall through case? Or does pydantic do this for us since it won't be the right type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pydantic validation will happen after this return - I've updated this to be a bit more explicit but it should work as-is
@@ -133,6 +133,7 @@ class GXCloudStoreBackend(StoreBackend, metaclass=ABCMeta): | |||
GXCloudRESTResource.PROFILER: "profiler", | |||
GXCloudRESTResource.RENDERED_DATA_DOC: "rendered_data_doc", | |||
GXCloudRESTResource.VALIDATION_RESULT: "result", | |||
GXCloudRESTResource.VALIDATION_CONFIG: "validation_config", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the enum can be modified/restructured if we have to list all these enums and define a new value for them. This makes it appear that our enum/abstraction is wrong.
@@ -157,6 +158,7 @@ class GXCloudStoreBackend(StoreBackend, metaclass=ABCMeta): | |||
GXCloudRESTResource.PROFILER: "profilers", | |||
GXCloudRESTResource.RENDERED_DATA_DOC: "rendered_data_docs", | |||
GXCloudRESTResource.VALIDATION_RESULT: "validation_results", | |||
GXCloudRESTResource.VALIDATION_CONFIG: "validation_configs", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here.
…_expectations into m/v1-178/validation_config_store
…_expectations into m/v1-178/validation_config_store
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some nits, but this LGTM
@property | ||
def validation_config_store_name(self) -> str: | ||
return DataContextConfigDefaults.DEFAULT_VALIDATION_CONFIG_STORE_NAME.value | ||
|
||
def validaiton_config_store(self) -> ValidationConfigStore: | ||
return self.stores[self.validation_config_store_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm usually not pro-deviation, but I think in this case it's probably reasonable. It'd require minimal changes if we changed our mind later.
class Config: | ||
arbitrary_types_allowed = True | ||
json_encoders = { | ||
ExpectationSuite: lambda v: expectationSuiteSchema.dump(v), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we should add a comment on the why here in the code.
id: Union[str, None] = None | ||
|
||
@validator("suite", pre=True) | ||
def _validate_suite(cls, v): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above - I think it's worth a docstring explaining this is just because the suite isn't pydantic.
) | ||
|
||
|
||
@pytest.mark.parametrize("store_fixture", ["ephemeral_store", "file_backed_store"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use strings for the stores and then call getfixturevalue? Can't you just reference the fixture directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the fixture won't be properly instantiated unless you do it this way.
Fixtures only get loaded by pytest if you include them in the test constructor.
Although this is a common complaint because it prevents you from using fixtures as part of parametrize
.
There are packages out there like pytest-lazy-fixture
that essentially do this work for you... but you still have to reference it by the string.
https://github.com/tvorog/pytest-lazy-fixture
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
) | ||
if config["class_name"] not in configured_stores: | ||
# Create ephemeral store config | ||
store_configs[name] = {"class_name": config["class_name"]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please write a warning if we use the default store.
|
||
@pytest.mark.parametrize("store_fixture", ["ephemeral_store", "file_backed_store"]) | ||
@pytest.mark.unit | ||
def test_add(request, store_fixture: str, validation_config: ValidationConfig): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_add(request, store_fixture: str, validation_config: ValidationConfig): | |
def test_add(request: pytest.FixtureRequest, store_fixture: str, validation_config: ValidationConfig): |
Annoting request
will developers understand what's going on.
https://docs.pytest.org/en/stable/reference/reference.html#request
invoke lint
(usesblack
+ruff
)For more information about contributing, see Contribute.
After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!