New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MAINTENANCE] Improve get validator functionality #4661
[MAINTENANCE] Improve get validator functionality #4661
Conversation
…an ExpectationSuite when calling DataContext.get_validator
✅ Deploy Preview for niobium-lead-7998 ready!
To edit notification comments on pull requests, go to your Netlify site settings. |
assert my_validator.expectation_suite_name == "default" | ||
|
||
def test_get_validator_with_batch( | ||
empty_data_context, tmp_path_factory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tmp_path_factory
is session-scoped, which can cause state to bleed into other tests when the whole suite is run. I think we've been making a move to tmp_path
for that reason.
@@ -1788,6 +1788,112 @@ def test_get_validator_with_attach_expectation_suite( | |||
) | |||
assert my_validator.expectation_suite_name == "A_expectation_suite" | |||
|
|||
def test_get_validator_without_expectation_suite( | |||
empty_data_context_stats_enabled, tmp_path_factory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want stats enabled here? I think we should refrain from using this fixture here since we don't test analytics.
def test_get_validator_without_expectation_suite( | ||
empty_data_context_stats_enabled, tmp_path_factory | ||
): | ||
context: DataContext = empty_data_context_stats_enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using a context and updating the filesystem, could we instantiate a BaseDataContext
like we do here: https://github.com/great-expectations/great_expectations/blob/develop/tests/expectations/test_expectation_arguments.py#L39-L86
An approach like this would remove the need for I/O and improve performance. We might need to do some mocking though happy to dig in further as needed.
"alphanumeric": "some_file", | ||
}, | ||
) | ||
assert type(my_validator.get_expectation_suite()) == ExpectationSuite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick - Could we use isinstance
just to be consistent? This method accounts for things like inheritance hierarchies and is a bit safer.
def test_get_validator_with_batch( | ||
empty_data_context, tmp_path_factory | ||
): | ||
context = empty_data_context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same general note about context - I think we could probably use BaseDataContext
and mock any I/O.
@@ -1759,6 +1759,7 @@ def get_validator( | |||
data_connector_name: Optional[str] = None, | |||
data_asset_name: Optional[str] = None, | |||
*, | |||
batch: Optional[Batch] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abegong I believe that this should be batch_list: List[Batch]
-- do you agree? At least some way to accept list of Batch
objects, even if we must support the passing of a single Batch
object. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For usability sake I think it's important to take a single Batch
as an argument.
We could also add the option to pass a batch_list
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abegong I agree -- and, if possible, would prefer for both to be supported in the same pull request. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made this change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abegong Small change requested (happy to discuss) -- I really like what this PR will provide (I believe that we even had a JIRA item for it in the Backlog)! Thanks!
@cdkini , @alexsherstinsky , I've responded to all comments. |
…hub.com:great-expectations/great_expectations into maintenance/improve-get-validator-functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…ckpoint' of https://github.com/great-expectations/great_expectations into feature/CLOUD-839/GREAT-767/runtime-data-connector-assets * 'DOCS/DOC-212/how_to_validate_data_with_an_in_memory_checkpoint' of https://github.com/great-expectations/great_expectations: (22 commits) -correction to line reference -corrections to python version of the config reference script - formatted sample scripts with black - updated line numbers in document -updated line numbers for script references - Added example scripts with asserts for testing for use as referenced files in documentation. - Added example scripts to test_script_runner.py Remove bootstrap tests that are no longer needed (#4818) - correct typo in URL in script docstring (#4817) - Update how-to guide content - Update old documentation to point to new how-to guide. Maintenance/check for mostly equals 1 in renderers (#4815) Update tutorial_review.md (#4611) Update README.md (#4595) fix IPython deprecation warning (#4301) [FEATURE] Add support for returning fully-qualified parameters names/values from RuleOutput object (#4773) [MAINTENANCE] Remove unused bootstrap methods that were migrated to ML Flow (#4742) fix: update module_name in NoteBookConfigSchema from v2 path to v3 (#4589) [HACKATHON] ExpectColumnValuesToBeValidTcpPort (#4634) [FEATURE] Introducing RuleState class and RuleOutput class for Rule-Based Profiler in support of richer use cases (such as DataAssistant). (#4704) revert to not raising datasource errors on data context init (#4732) Add checks for mostly=1.0 for all renderers (#4736) Maintenance/improve get validator functionality (#4661) ...
Changes proposed in this pull request:
context.get_validator
without specifying anExpectationSuite
(by name, id, or otherwise.) This behavior was already supported byValidator.__init__
. It just hadn't been enabled forDataContext.get_validator
.Batch
tocontext.get_validator
Both of these additions will improve usability of the top-level API in DataContext.
Each is tested with a single unit test.