Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MAINTENANCE] Improve get validator functionality #4661

Merged
merged 14 commits into from Apr 6, 2022

Conversation

abegong
Copy link
Member

@abegong abegong commented Apr 4, 2022

Changes proposed in this pull request:

  • Makes it possible to use context.get_validator without specifying an ExpectationSuite (by name, id, or otherwise.) This behavior was already supported by Validator.__init__. It just hadn't been enabled for DataContext.get_validator.
  • Makes it possible to pass a single Batch to context.get_validator

Both of these additions will improve usability of the top-level API in DataContext.

Each is tested with a single unit test.

@netlify
Copy link

netlify bot commented Apr 4, 2022

Deploy Preview for niobium-lead-7998 ready!

Name Link
🔨 Latest commit 69e1fd7
🔍 Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/624d0a89ea88d60009394e12
😎 Deploy Preview https://deploy-preview-4661--niobium-lead-7998.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

assert my_validator.expectation_suite_name == "default"

def test_get_validator_with_batch(
empty_data_context, tmp_path_factory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmp_path_factory is session-scoped, which can cause state to bleed into other tests when the whole suite is run. I think we've been making a move to tmp_path for that reason.

@@ -1788,6 +1788,112 @@ def test_get_validator_with_attach_expectation_suite(
)
assert my_validator.expectation_suite_name == "A_expectation_suite"

def test_get_validator_without_expectation_suite(
empty_data_context_stats_enabled, tmp_path_factory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want stats enabled here? I think we should refrain from using this fixture here since we don't test analytics.

def test_get_validator_without_expectation_suite(
empty_data_context_stats_enabled, tmp_path_factory
):
context: DataContext = empty_data_context_stats_enabled
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using a context and updating the filesystem, could we instantiate a BaseDataContext like we do here: https://github.com/great-expectations/great_expectations/blob/develop/tests/expectations/test_expectation_arguments.py#L39-L86

An approach like this would remove the need for I/O and improve performance. We might need to do some mocking though happy to dig in further as needed.

"alphanumeric": "some_file",
},
)
assert type(my_validator.get_expectation_suite()) == ExpectationSuite
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick - Could we use isinstance just to be consistent? This method accounts for things like inheritance hierarchies and is a bit safer.

def test_get_validator_with_batch(
empty_data_context, tmp_path_factory
):
context = empty_data_context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same general note about context - I think we could probably use BaseDataContext and mock any I/O.

@@ -1759,6 +1759,7 @@ def get_validator(
data_connector_name: Optional[str] = None,
data_asset_name: Optional[str] = None,
*,
batch: Optional[Batch] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abegong I believe that this should be batch_list: List[Batch] -- do you agree? At least some way to accept list of Batch objects, even if we must support the passing of a single Batch object. Thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For usability sake I think it's important to take a single Batch as an argument.

We could also add the option to pass a batch_list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abegong I agree -- and, if possible, would prefer for both to be supported in the same pull request. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made this change

Copy link
Contributor

@alexsherstinsky alexsherstinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abegong Small change requested (happy to discuss) -- I really like what this PR will provide (I believe that we even had a JIRA item for it in the Backlog)! Thanks!

@abegong
Copy link
Member Author

abegong commented Apr 6, 2022

@cdkini , @alexsherstinsky , I've responded to all comments.

…hub.com:great-expectations/great_expectations into maintenance/improve-get-validator-functionality
Copy link
Contributor

@alexsherstinsky alexsherstinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@abegong abegong merged commit c1c0dc0 into develop Apr 6, 2022
@abegong abegong deleted the maintenance/improve-get-validator-functionality branch April 6, 2022 11:52
@fjork3 fjork3 changed the title Maintenance/improve get validator functionality [MAINTENANCE] Improve get validator functionality Apr 7, 2022
Shinnnyshinshin pushed a commit that referenced this pull request Apr 7, 2022
…ckpoint' of https://github.com/great-expectations/great_expectations into feature/CLOUD-839/GREAT-767/runtime-data-connector-assets

* 'DOCS/DOC-212/how_to_validate_data_with_an_in_memory_checkpoint' of https://github.com/great-expectations/great_expectations: (22 commits)
  -correction to line reference
  -corrections to python version of the config reference script
  - formatted sample scripts with black - updated line numbers in document
  -updated line numbers for script references
  - Added example scripts with asserts for testing for use as referenced files in documentation. - Added example scripts to test_script_runner.py
  Remove bootstrap tests that are no longer needed (#4818)
  - correct typo in URL in script docstring (#4817)
  - Update how-to guide content - Update old documentation to point to new how-to guide.
  Maintenance/check for mostly equals 1 in renderers (#4815)
  Update tutorial_review.md (#4611)
  Update README.md (#4595)
  fix IPython deprecation warning (#4301)
  [FEATURE] Add support for returning fully-qualified parameters names/values from RuleOutput object (#4773)
  [MAINTENANCE] Remove unused bootstrap methods that were migrated to ML Flow (#4742)
  fix: update module_name in NoteBookConfigSchema from v2 path to v3 (#4589)
  [HACKATHON] ExpectColumnValuesToBeValidTcpPort (#4634)
  [FEATURE] Introducing RuleState class and RuleOutput class for Rule-Based Profiler in support of richer use cases (such as DataAssistant). (#4704)
  revert to not raising datasource errors on data context init (#4732)
  Add checks for mostly=1.0 for all renderers (#4736)
  Maintenance/improve get validator functionality (#4661)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants