New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] V1 Checkpoint #9590
[FEATURE] V1 Checkpoint #9590
Conversation
✅ Deploy Preview for niobium-lead-7998 canceled.
|
…_expectations into f/v1-225/v1_checkpoint
…_expectations into f/v1-225/v1_checkpoint
…_expectations into f/v1-225/v1_checkpoint
@@ -1110,6 +1111,16 @@ def render(self) -> None: | |||
) | |||
) | |||
|
|||
def serialize(self) -> _IdentifierBundle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved any encoding logic to the actual class being serialized (move from _encode_suite
in validation config to serialize
as instance method on suite).
I don't love exposing these publicly but how do we get around this? Do we want custom serializer classes to encapsulate this logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only serializes this object when it is used in another object right? I would call this serialize_identifier_bundle()
or just identifier_bundler()
. This name should be uniform across any object that needs to do this. I think this public since whatever object serializes this expects this method to exist. We could formalize it as an internal protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree for suites and validation definitions but renderers and batch definitions have different outputs (_EncodedValidationData
and class_name/module_name
, respectively).
from typing import Union | ||
|
||
from great_expectations.compatibility.pydantic import ( | ||
BaseModel, | ||
) | ||
|
||
|
||
class _IdentifierBundle(BaseModel): | ||
name: str | ||
id: Union[str, None] | ||
|
||
|
||
class _EncodedValidationData(BaseModel): | ||
datasource: _IdentifierBundle | ||
asset: _IdentifierBundle | ||
batch_definition: _IdentifierBundle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shared space with intermediate serialization models
…_expectations into f/v1-225/v1_checkpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this!
I've only looked at the first 2 files. I need to run an errand and I'll look more when I return.
I've left a few comments, mostly around naming.
""" | ||
|
||
name: str | ||
validations: List[ValidationConfig] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not a list of ValidationDefinitions
instead of config objects? That's what this shows: https://greatexpectations.atlassian.net/wiki/spaces/SUP/pages/917471267/Validation+Workflows
Or does the ValidationConfig need to be renamed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think ValidationConfig does need to be renamed. That can be done in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be renamed!
f"Unable to retrieve validation config {id_bundle} from store" | ||
) | ||
|
||
if validation_config: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this if
? With the try/except above, it looks like this will need to be defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did it to appease mypy
but updated to be a bit clearer
@@ -47,3 +48,22 @@ def build_batch_request( | |||
|
|||
def save(self) -> None: | |||
self.data_asset._save_batch_config(self) | |||
|
|||
def serialize(self) -> _EncodedValidationData: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect a method called serialize
on an object to serialize the object but this looks like it serializes the validation definition? Could you update the name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to serialize_validation_definition
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look at tests after my next meeting. Another couple name comments. Thanks!
@@ -1110,6 +1111,16 @@ def render(self) -> None: | |||
) | |||
) | |||
|
|||
def serialize(self) -> _IdentifierBundle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only serializes this object when it is used in another object right? I would call this serialize_identifier_bundle()
or just identifier_bundler()
. This name should be uniform across any object that needs to do this. I think this public since whatever object serializes this expects this method to exist. We could formalize it as an internal protocol.
@@ -302,3 +264,16 @@ def _get_expectation_suite_and_validation_result_ids( | |||
run_id=run_id, | |||
) | |||
return expectation_suite_identifier, validation_result_id | |||
|
|||
def serialize(self) -> _IdentifierBundle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as before. I would rename to identifer_bundle()
.
self, serialized_checkpoint: str, expected_error: str | ||
): | ||
with pytest.raises(ValidationError) as e: | ||
Checkpoint.parse_obj(serialized_checkpoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between parse_raw
used above and parse_obj
used here? I was expected the same deserialization call in both places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parse_raw
is on a string and parse_obj
is on a dict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
The V1 Checkpoint is a significant simplification of the v0.18 model. It now only takes two primary input params (validation definitions and actions) and utilizes Pydantic.
The complexity within this PR stems from the need for custom serialization logic; due to the shared usage of models like batch definitions and expectation suites, we need to ensure we denormalize our data to obtain a single source of truth about an object's state. We do this utilizing our new pattern of serializing directly into identifiers (which can be used to index into the appropriate store to retrieve the object in its accurate state).
invoke lint
(usesruff format
+ruff check
)For more information about contributing, see Contribute.
After you submit your PR, keep the page open and monitor the statuses of the various checks made by our continuous integration process at the bottom of the page. Please fix any issues that come up and reach out on Slack if you need help. Thanks for contributing!