New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUGFIX] Fix copying of CheckpointConfig for substitution and printing purposes #3759
Conversation
✔️ Deploy Preview for niobium-lead-7998 ready! 🔨 Explore the source changes: e769faa 🔍 Inspect the deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/61a7a481dd5b66000704eb93 😎 Browse the preview: https://deploy-preview-3759--niobium-lead-7998.netlify.app |
HOWDY! This is your friendly 🤖 CHANGELOG bot 🤖Please don't forget to add a clear and succinct description of your change under the Develop header in ✨ Thank you! ✨ |
@@ -377,13 +327,13 @@ def run( | |||
"template_name": template_name, | |||
"run_name_template": run_name_template, | |||
"expectation_suite_name": expectation_suite_name, | |||
"expectation_suite_ge_cloud_id": expectation_suite_ge_cloud_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this line removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it was just moved to the bottom (new line 336)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@donaldheppner I simply moved it to a "more logical place" congruent with the order these arguments were supplied on input to the method.
@@ -11,6 +10,9 @@ | |||
from great_expectations.data_context.types.base import ( | |||
CheckpointConfig, | |||
CheckpointConfigSchema, | |||
delete_runtime_parameters_batch_data_references_from_config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could/should these not be methods on the config object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@donaldheppner In fact, even better: in the next commit, they will be moved to batch.py
core module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @alexsherstinsky, if that is the case, could I ask that individual tests for the methods be added in the next commit (or is it PR?)
@@ -207,26 +207,11 @@ def send_email( | |||
# TODO: <Alex>A common utility function should be factored out from DataContext.get_batch_list() for any purpose.</Alex> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this TODO still valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@donaldheppner Unfortunately, this comment is still valid -- we have not had a chance to do the work. I would like to leave it for now and remove only after we had a chance to go through the backlog formally. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few questions!
if (batch_data is not None) and ( | ||
self.batch_request.get("runtime_parameters") is not None | ||
|
||
def delete_runtime_parameters_batch_data_references_from_config( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤
val["batch_request"]["runtime_parameters"].pop("batch_data") | ||
|
||
|
||
def restore_runtime_parameters_batch_data_references_into_config( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤
def __deepcopy__(self, memo): | ||
batch_data_references: Tuple[ | ||
Optional[Any], Optional[List[Any]] | ||
] = get_runtime_parameters_batch_data_references_from_config(config=self) | ||
delete_runtime_parameters_batch_data_references_from_config(config=self) | ||
|
||
cls = self.__class__ | ||
result = cls.__new__(cls) | ||
result._commented_map = CommentedMap() | ||
|
||
memo[id(self)] = result | ||
for key, value in self.to_json_dict().items(): | ||
# noinspection PyArgumentList | ||
value_copy = copy.deepcopy(value, memo) | ||
setattr(result, key, value_copy) | ||
|
||
restore_runtime_parameters_batch_data_references_into_config( | ||
config=self, batch_data_references=batch_data_references | ||
) | ||
restore_runtime_parameters_batch_data_references_into_config( | ||
config=result, batch_data_references=batch_data_references | ||
) | ||
|
||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a custom __deepcopy__
not be avoided here? Why isn't this something we can do where required, like we are doing in other places using these new functions in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NathanFarmer This custom __deepcoppy__
enables us to simplify / delete a lot of code. It fits, because it "knows" how to handle the copying for the (complex) CheckpointConfig
object.
@@ -2337,6 +2337,7 @@ def test_newstyle_checkpoint_config_substitution_nested( | |||
CheckpointConfig( | |||
name="my_nested_checkpoint", | |||
config_version=1, | |||
template_name="my_nested_checkpoint_template_3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a new test to be added for the bug you referenced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NathanFarmer This test has effectively been added by repairing some existing tests. In addition, the functionality will be fully manifested in a near-future PR that focuses on checkpoint.run
payload instrumentation, which is how the bug was detected in the first place (the batch_data
values were empty in the runtime_parameter
dictionary). Thanks.
|
||
return json_dict | ||
def to_json_dict(self) -> dict: | ||
return checkpointResultSchema.dump(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me aside from resolving Don and Nathan's comments :)
@@ -377,13 +327,13 @@ def run( | |||
"template_name": template_name, | |||
"run_name_template": run_name_template, | |||
"expectation_suite_name": expectation_suite_name, | |||
"expectation_suite_ge_cloud_id": expectation_suite_ge_cloud_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it was just moved to the bottom (new line 336)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for this work @alexsherstinsky. Looks good on my end too, with one clarification question
@@ -361,8 +361,8 @@ def _validate_runtime_batch_request_specific_init_parameters( | |||
): | |||
if not (runtime_parameters and (isinstance(runtime_parameters, dict))): | |||
raise TypeError( | |||
f"""The type for runtime_parameters must be a dict object. | |||
The type given is "{str(type(runtime_parameters))}", which is illegal.""" | |||
f"""The runtime_parameters must be a non-empty dict object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very helpful error message. Thank you for this
val["batch_request"]["runtime_parameters"].pop("batch_data") | ||
|
||
|
||
def restore_runtime_parameters_batch_data_references_into_config( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤
if (batch_data is not None) and ( | ||
self.batch_request.get("runtime_parameters") is not None | ||
|
||
def delete_runtime_parameters_batch_data_references_from_config( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤
@@ -11,6 +10,9 @@ | |||
from great_expectations.data_context.types.base import ( | |||
CheckpointConfig, | |||
CheckpointConfigSchema, | |||
delete_runtime_parameters_batch_data_references_from_config, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @alexsherstinsky, if that is the case, could I ask that individual tests for the methods be added in the next commit (or is it PR?)
effective_batch_request: dict = dict( | ||
**runtime_config_batch_request, **validation_batch_request | ||
) | ||
if "runtime_parameters" in effective_batch_request or isinstance( | ||
validation_batch_request, RuntimeBatchRequest | ||
): | ||
batch_request_class = RuntimeBatchRequest | ||
else: | ||
batch_request_class = BatchRequest | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better!
…NT-22/alexsherstinsky/instrument_runtime_data_connector_for_usage_statistics-2021_11_23-10-overload_deepcopy_and_dump-2021_11_29-11
Tests have been added for new utilities. In addition, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a thing of beauty. Just one question about a TODO
.
config=data, batch_data_references=batch_data_references | ||
) | ||
return data | ||
|
||
|
||
class CheckpointConfig(BaseYamlConfig): | ||
# TODO: <Alex>ALEX (does not work yet)</Alex> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NathanFarmer We still need those comments -- the idea was good, but it did not work, and we ought to figure out how to make it work -- or discuss it and conclude that the current approach is the best that can be done. Thanks!
…trument_runtime_data_connector_for_usage_statistics-2021_11_23-10-overload_deepcopy_and_dump-2021_11_29-11
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a wonderful refactor 😄 LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful work 🙇🏼 thank you @alexsherstinsky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! <3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
] = validations_batch_data_list[idx] | ||
|
||
|
||
def standardize_batch_request_display_ordering( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -40,6 +41,9 @@ def anonymize_batch_request(self, *args, **kwargs) -> Dict[str, List[str]]: | |||
anonymized_batch_request_dict: Optional[ | |||
Union[str, dict] | |||
] = self._anonymize_batch_request_properties(source=batch_request_dict) | |||
anonymized_batch_request_dict = standardize_batch_request_display_ordering( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -59,3 +74,771 @@ def test_serialization_of_spark_df(spark_session): | |||
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}) | |||
sdf = spark_session.createDataFrame(df) | |||
assert convert_to_json_serializable(sdf) == {"a": [1, 2, 3], "b": [4, 5, 6]} | |||
|
|||
|
|||
def test_batch_request_deepcopy(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding these tests!
Scope
This work fixes a bug in
CheckpointConfig
serialization, specifically the loss ofbatch_data
as the side effect.The implementation consists of:
CheckpointConfigSchema.dump()
,CheckpointConfig.__deepcopy__()
, andCheckpointConfig.__repr__()
andCheckpointResult.__repr__()
methodsin the way that:
batch_data
values from theCheckpointConfiguration
and saving them.batch_data
values from theCheckpointConfiguration
and cloning/serializing the resulting "safe" configuration object.batch_data
values into the cloned and/or original configuration object.Additional clean up consists of:
get_runtime_batch_request()
Checkpoint utility more elegant.As the result of the above fixes, the impacted modules became simpler, and the amount of code has been reduced.
Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [FEATURE], [DOCS], or [MAINTENANCE]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our contributor checklist.
Changes proposed in this pull request:
After submitting your PR, CI checks will run and @tiny-tim-bot will check for your CLA signature.
For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.
In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in github issues or slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g.
closes #123
).Previous Design Review notes:
Definition of Done
Please delete options that are not relevant.
Thank you for submitting!