Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] FDS persist DataAsset to YAML file immediately on creation #7705

Merged
merged 27 commits into from Apr 25, 2023

Conversation

Kilo59
Copy link
Member

@Kilo59 Kilo59 commented Apr 24, 2023

Changes proposed in this pull request:

  • Call context._save_project_config() after adding a fluent DataAsset
    • This saves the current asset to the great_expectations.yml
  • Call context._save_project_config() after .delete_asset() is called.
  • Changed default Datasource._data_context value to be None instead of undefined.
  • Add custom json encoder for TextClause
  • override PandasDatasource.dict() to prevent serialization of #ephemeral_pandas_asset named assets
  • raise a GxSerializationWarning instead of erroring if asset is unserializable.

Note

This should only affect FileDataContexts.
Update for CloudDataContext will follow.

Definition of Done

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

@Kilo59 Kilo59 self-assigned this Apr 24, 2023
@netlify
Copy link

netlify bot commented Apr 24, 2023

Deploy Preview for niobium-lead-7998 canceled.

Name Link
🔨 Latest commit 637d6e6
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/644814094d20ad00083c5ac3

@github-actions github-actions bot added the core label Apr 24, 2023
@Kilo59 Kilo59 changed the title [FEATURE] Persist asset on creation [FEATURE] FDS persist DataAsset immediately on creation Apr 24, 2023
@ghost
Copy link

ghost commented Apr 24, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

@Kilo59 Kilo59 force-pushed the f/lakitu-104/persist-asset-on-add branch from b6ccbb4 to 581f889 Compare April 24, 2023 13:10
@Kilo59 Kilo59 marked this pull request as ready for review April 24, 2023 13:31
tests/datasource/fluent/test_contexts.py Outdated Show resolved Hide resolved
@Kilo59 Kilo59 changed the title [FEATURE] FDS persist DataAsset immediately on creation [FEATURE] FDS persist DataAsset to YAML file immediately on creation Apr 24, 2023
@@ -398,7 +398,7 @@ class Datasource(
assets: MutableSequence[_DataAssetT] = []

# private attrs
_data_context: GXDataContext = pydantic.PrivateAttr()
_data_context: Optional[GXDataContext] = pydantic.PrivateAttr(None)
Copy link
Member Author

@Kilo59 Kilo59 Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the type here to reflect that this isn't guaranteed to be set, and the default value to be None instead of the attribute being undefined.

Generally, it is set but we've experienced some bugs due to it not being set and we also have some unit-tests where it isn't set.

@Kilo59 Kilo59 enabled auto-merge (squash) April 24, 2023 14:09
Copy link
Contributor

@billdirks billdirks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. Added 2 small comments.

great_expectations/datasource/fluent/interfaces.py Outdated Show resolved Hide resolved
@@ -545,7 +545,8 @@ def _validate_asset_name(asset_name: Optional[str] = None) -> str:

def _get_validator(self, asset: _PandasDataAsset) -> Validator:
batch_request: BatchRequest = asset.build_batch_request()
return self._data_context.get_validator(batch_request=batch_request)
# TODO: raise error if `_data_context` not set
return self._data_context.get_validator(batch_request=batch_request) # type: ignore[union-attr] # self._data_context must be set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe _data_context should be a property or we have a private get_validator method on this datasource that will raise an error for us in a consistent way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good idea but I'd like to defer implementing that in a different PR.
This PR doesn't make this particular error any more or less likely, it just makes mypy better understand this issue.

@Kilo59 Kilo59 disabled auto-merge April 24, 2023 18:02
Copy link
Contributor

@alexsherstinsky alexsherstinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Should going through the various parts of the codebase (in particular new fluent tests, such as the fixtures for FDS Checkpoint tests) and removing context. _save_context_project_config() -- where it is no longer necessary -- be part of this work? Thanks.

@Kilo59
Copy link
Member Author

Kilo59 commented Apr 25, 2023

@alexsherstinsky I don't see this method being called unnecessarily in our tests.

Copy link
Contributor

@alexsherstinsky alexsherstinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Kilo59 Kilo59 enabled auto-merge (squash) April 25, 2023 17:06
@Kilo59 Kilo59 merged commit 59f5133 into develop Apr 25, 2023
47 checks passed
@Kilo59 Kilo59 deleted the f/lakitu-104/persist-asset-on-add branch April 25, 2023 19:03
Shinnnyshinshin added a commit that referenced this pull request Apr 26, 2023
* develop:
  [MAINTENANCE] add warning messages when using CLI to edit an expectaiton suite if fluent datasources are present (#7714)
  [MAINTENANCE] fix get available data assets names for fds (#7723)
  [MAINTENANCE] Minor stylistic cleanup (#7732)
  [DOCS] Update for fluent datasources: Create a new Checkpoint (#7729)
  [MAINTENANCE] Iterate over the regex_pattern characters too in  (#7720)
  [MAINTENANCE] Add CLI warnings when adding a checkpoint with fluent datasources (#7685)
  [DOCS] Update batch glossary docs. (#7726)
  [BUGFIX] Add missing pyspark reference (#7684)
  [FEATURE] FDS persist `DataAsset` to YAML file immediately on creation  (#7705)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants