[FEATURE] Add DataContext.update_datasource CRUD method #5417

Merged

cdkini (Member) commented Jun 30, 2022

Changes proposed in this pull request:

  • New update_datasource method to enable updating of existing datasources from notebooks
  • Updated tests for DatasourceStore

Definition of Done

Please delete options that are not relevant.

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

Thank you for submitting!

netlify bot commented Jun 30, 2022

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit b0c9e4b
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/62bf3387508fee000835b1c0
😎 Deploy Preview https://deploy-preview-5417--niobium-lead-7998.netlify.app

cdkini self-assigned this Jun 30, 2022
@@ -1913,6 +1913,22 @@ def _instantiate_datasource_from_config(
        )
        return datasource

    def update_datasource(
Member

It feels like there is more to do here, and maybe we don't want to pass a config object here but a datasource one. If someone grabs a context with a datasource and makes changes, how do they save it? Other methods work to maintain the cached datasource collection; is that same effort needed here?

Member Author

I agree! I've modified the signature to take in an actual datasource.

Additionally, I've made it such that BaseDataContext doesn't persist by default and DataContext does. The cache has also been updated.

Let me know if this is closer to what you're envisioning.
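The behaviour described in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this PR: `Datasource`, `InMemoryStore`, `BaseContext`, `FileBackedContext`, and the `save_changes` flag are hypothetical stand-ins for the real classes and parameters.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Datasource:
    """Hypothetical stand-in for a datasource object (not the real class)."""
    config: dict


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a DatasourceStore backed by a plain dict."""
    saved: Dict[str, dict] = field(default_factory=dict)

    def set(self, name: str, config: dict) -> None:
        self.saved[name] = dict(config)


class BaseContext:
    """Updates the in-memory cache; persists only when asked to."""

    def __init__(self, store: InMemoryStore) -> None:
        self.store = store
        self.cached_datasources: Dict[str, Datasource] = {}

    def update_datasource(
        self, datasource_name: str, datasource: Datasource, save_changes: bool = False
    ) -> None:
        # Always refresh the cache so later lookups see the updated object.
        self.cached_datasources[datasource_name] = datasource
        if save_changes:
            self.store.set(datasource_name, datasource.config)


class FileBackedContext(BaseContext):
    """Same behaviour, but persists by default (assumed flag name)."""

    def update_datasource(
        self, datasource_name: str, datasource: Datasource, save_changes: bool = True
    ) -> None:
        super().update_datasource(datasource_name, datasource, save_changes=save_changes)
```

With this shape, a caller who edits a datasource in a notebook would pass the modified object back to the context, and only the persisting subclass writes it to the store unless the flag is overridden.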

billdirks (Contributor) left a comment

LGTM!

        Args:
            datasource_name: The name of the Datasource to update.
            datasource_config: The config object to persist using the DatasourceStore.
        """
Contributor

Do you want to add a Raises section here or do you not want to bake that into the API?

Member Author

That's a good question! I'm not sure what the best practice is here. I've leaned towards only adding Raises if an exception is raised within the top-level scope of the function body.

Any preference?

Contributor

There is a small comment about this in the Google Python Style Guide:
https://google.github.io/styleguide/pyguide.html#doc-function-raises

My reading of it is: if you want to codify that raising this exception is part of this function's API and that callers should be handling it, then document it; otherwise don't. I'm not sure where/how this is planned to be used.
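For illustration, a Google-style docstring that does treat the exception as part of the API might look like the sketch below. The function `update_config` and the exception `DatasourceNotFoundError` are hypothetical names invented for this example; they are not taken from the PR.

```python
class DatasourceNotFoundError(KeyError):
    """Hypothetical exception type used only for this illustration."""


def update_config(configs: dict, name: str, new_config: dict) -> None:
    """Replace the stored config for an existing entry.

    Args:
        configs: Mapping of names to config dicts.
        name: The name of the entry to update.
        new_config: The config to store.

    Raises:
        DatasourceNotFoundError: If no entry called ``name`` exists. Callers
            are expected to handle this, so it is documented as part of the API.
    """
    if name not in configs:
        raise DatasourceNotFoundError(name)
    configs[name] = new_config
```

If the exception were only an incidental detail of a lower-level call, the same reading of the guide would suggest leaving the Raises section out.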

            datasource_name=datasource_name, datasource_config=updated_datasource_config
        )

        key: DataContextVariableKey = DataContextVariableKey(
Contributor

My preference would be to remove the annotations on key and actual_config, since you instantiate/cast them on the same line and mypy will infer the types.

Member Author

I agree! We've been leaning towards extra verbosity but this is quite obvious when we see the constructor call in the same line. Updating now.
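A small sketch of the point being agreed on here, using hypothetical stand-in types (`VariableKey` and `Config` are invented for this example and only mimic the shape of `DataContextVariableKey` and `DatasourceConfig`):

```python
from dataclasses import dataclass
from typing import Any, Dict, cast


@dataclass(frozen=True)
class VariableKey:
    """Hypothetical key type standing in for DataContextVariableKey."""
    resource_name: str


@dataclass
class Config:
    """Hypothetical config type standing in for DatasourceConfig."""
    class_name: str


# An untyped backing store, similar in spirit to a dict-backed store.
store: Dict[VariableKey, Any] = {VariableKey("my_ds"): Config("Datasource")}

# Redundant: the constructor call and the cast already state the type.
key_verbose: VariableKey = VariableKey(resource_name="my_ds")
config_verbose: Config = cast(Config, store[key_verbose])

# Equivalent: a type checker such as mypy infers VariableKey and Config here.
key = VariableKey(resource_name="my_ds")
config = cast(Config, store[key])
```

Both forms type-check the same way; the second simply avoids repeating the type name on one line.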

        resource_name=datasource_name,
    )
    actual_config: DatasourceConfig = cast(
        DatasourceConfig, datasource_store_with_single_datasource.get(key=key)
Contributor

Just looked and the backing store for this config is a dict and not typed 😢

Member Author

Ugh yeah our typing needs some work - lots of bags of data that could use structure. We've been incrementally adding dataclasses to provide some of that type safety but it's a work in progress.
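As a rough illustration of the kind of structure a dataclass adds over a plain dict, consider the sketch below. `SimpleDatasourceConfig` and `from_raw` are hypothetical, and the field names are chosen only for the example; they are not claimed to match the project's own config classes.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class SimpleDatasourceConfig:
    """Hypothetical typed config; field names are illustrative only."""
    name: str
    class_name: str


def from_raw(raw: Dict[str, Any]) -> SimpleDatasourceConfig:
    # Raises KeyError immediately if a required field is missing,
    # instead of letting a malformed dict travel through the code base.
    return SimpleDatasourceConfig(name=raw["name"], class_name=raw["class_name"])


raw_config = {"name": "my_ds", "class_name": "Datasource"}
typed_config = from_raw(raw_config)

# asdict converts back to a plain dict when a store expects one.
round_tripped = asdict(typed_config)
```

Attribute access on `typed_config` can then be checked statically, whereas key lookups on `raw_config` cannot.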

cdkini enabled auto-merge (squash) July 1, 2022 18:04
cdkini merged commit 5f80232 into develop Jul 1, 2022
cdkini deleted the feature/great-888/great-1037/DataContext.update_datasource branch July 1, 2022 19:01
    def update_datasource(
        self,
        datasource_name: str,
        datasource: Union[LegacyDatasource, BaseDatasource],
Contributor

datasource has a name property so passing datasource_name now seems redundant.
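The redundancy can be shown with a minimal sketch. `NamedDatasource` and both helper functions are hypothetical and exist only to contrast the two signatures; the only thing taken from the review is that the datasource exposes a name.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NamedDatasource:
    """Hypothetical datasource exposing a name attribute."""
    name: str


def update_with_name(
    cache: Dict[str, NamedDatasource], datasource_name: str, datasource: NamedDatasource
) -> None:
    # Two sources of truth: nothing stops a caller passing a mismatched name.
    cache[datasource_name] = datasource


def update(cache: Dict[str, NamedDatasource], datasource: NamedDatasource) -> None:
    # Single source of truth: the key always matches datasource.name.
    cache[datasource.name] = datasource
```

With the first form, a call such as `update_with_name(cache, "wrong", NamedDatasource("right"))` silently stores the object under a key that disagrees with its own name; the second form cannot express that mistake.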

    def update_datasource(
        self,
        datasource_name: str,
        datasource: Union[LegacyDatasource, BaseDatasource],
Contributor

Similar comment here about datasource_name.

3 participants