[FEATURE] FDS persist `DataAsset` to YAML file immediately on creation #7705
@@ -398,7 +398,7 @@ class Datasource(
     assets: MutableSequence[_DataAssetT] = []

     # private attrs
-    _data_context: GXDataContext = pydantic.PrivateAttr()
+    _data_context: Optional[GXDataContext] = pydantic.PrivateAttr(None)
Updated the type here to reflect that this isn't guaranteed to be set, and changed the default value to `None` instead of leaving the attribute undefined.
Generally it is set, but we've experienced some bugs due to it not being set, and we also have some unit tests where it isn't set.
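To illustrate the difference between an undefined attribute and one that defaults to `None`, here is a minimal sketch. It uses plain Python classes as a stand-in for pydantic's `PrivateAttr`, so it is an analogy rather than the actual `Datasource` code; the class names and the `describe` helper are hypothetical.

```python
from typing import Any, Optional


class UndefinedAttr:
    # Annotation only: no attribute exists until something assigns it,
    # so reading it on a fresh instance raises AttributeError.
    _data_context: Optional[Any]


class DefaultsToNone:
    # Class-level default: the attribute always exists and starts as None,
    # so callers can test `is None` instead of catching AttributeError.
    _data_context: Optional[Any] = None


def describe(obj: Any) -> str:
    """Report the state of `_data_context` without assuming it was ever set."""
    try:
        ctx = obj._data_context
    except AttributeError:
        return "undefined"
    return "unset (None)" if ctx is None else "set"
```

With a `None` default, the type checker can also see that the value may be missing, which is what prompts the `Optional[...]` annotation in the diff.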
Thanks for doing this. Added 2 small comments.
@@ -545,7 +545,8 @@ def _validate_asset_name(asset_name: Optional[str] = None) -> str:

     def _get_validator(self, asset: _PandasDataAsset) -> Validator:
         batch_request: BatchRequest = asset.build_batch_request()
-        return self._data_context.get_validator(batch_request=batch_request)
+        # TODO: raise error if `_data_context` not set
+        return self._data_context.get_validator(batch_request=batch_request)  # type: ignore[union-attr] # self._data_context must be set
Maybe `_data_context` should be a property, or we could have a private `get_validator` method on this datasource that raises an error for us in a consistent way.
That sounds like a good idea, but I'd like to defer it to a separate PR.
This PR doesn't make this particular error any more or less likely; it just helps mypy better understand the issue.
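For reference, the property idea discussed in this thread could look roughly like the sketch below. This is not code from the PR: `SketchDatasource`, `DataContextNotSetError`, and the `data_context` property are hypothetical names, and the context object is typed as `Any` so the example stays self-contained.

```python
from typing import Any, Optional


class DataContextNotSetError(RuntimeError):
    """Hypothetical error raised when a datasource has no context attached."""


class SketchDatasource:
    """Hypothetical stand-in for a datasource holding an optional context."""

    def __init__(self, name: str, data_context: Optional[Any] = None) -> None:
        self.name = name
        self._data_context = data_context

    @property
    def data_context(self) -> Any:
        # Single place that checks for None, so every caller gets the same
        # error and the type checker no longer sees an Optional value.
        if self._data_context is None:
            raise DataContextNotSetError(
                f"Datasource '{self.name}' has no data context attached"
            )
        return self._data_context

    def get_validator(self, batch_request: Any) -> Any:
        # Goes through the property, so no `type: ignore` is needed here.
        return self.data_context.get_validator(batch_request=batch_request)
```

Routing every access through one property would make the `# type: ignore[union-attr]` comment in the diff unnecessary, at the cost of a slightly larger change than this PR intends.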
Looks good. Should this work also include going through the various parts of the codebase (in particular the new fluent tests, such as the fixtures for FDS Checkpoint tests) and removing `context._save_context_project_config()` where it is no longer necessary? Thanks.
@alexsherstinsky I don't see this method being called unnecessarily in our tests.
LGTM!
* develop:
  [MAINTENANCE] add warning messages when using CLI to edit an expectaiton suite if fluent datasources are present (#7714)
  [MAINTENANCE] fix get available data assets names for fds (#7723)
  [MAINTENANCE] Minor stylistic cleanup (#7732)
  [DOCS] Update for fluent datasources: Create a new Checkpoint (#7729)
  [MAINTENANCE] Iterate over the regex_pattern characters too in (#7720)
  [MAINTENANCE] Add CLI warnings when adding a checkpoint with fluent datasources (#7685)
  [DOCS] Update batch glossary docs. (#7726)
  [BUGFIX] Add missing pyspark reference (#7684)
  [FEATURE] FDS persist `DataAsset` to YAML file immediately on creation (#7705)
Changes proposed in this pull request:
- `context._save_project_config()` after adding a fluent `DataAsset` (`great_expectations.yml`)
- `context._save_project_config()` after `.delete_asset()` is called.
- `Datasource._data_context` value to be `None` instead of undefined.
- `TextClause`
- `PandasDatasource.dict()` to prevent serialization of `#ephemeral_pandas_asset` named assets
- `GxSerializationWarning` instead of erroring if asset is unserializable.

Note
This should only affect `FileDataContexts`. Update for `CloudDataContext` will follow.

Definition of Done