-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Enables Regex-Based Column Map Expectations #4315
[FEATURE] Enables Regex-Based Column Map Expectations #4315
Conversation
✅ Deploy Preview for niobium-lead-7998 ready! 🔨 Explore the source changes: 05fa40a 🔍 Inspect the deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/623903bd2f948300078c2c06 😎 Browse the preview: https://deploy-preview-4315--niobium-lead-7998.netlify.app |
…_expectations into feature/regex-based-column-map-expectation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
approving sidebar
examples/expectations/regex_based_column_map_expectation_template.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick, unsolicited review. 😄
return True | ||
|
||
# Here your regex is used to create a custom metric for this expectation | ||
map_metric = ColumnMapRegexExpectation._register_metric( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this pattern been arch reviewed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is to enable a very streamlined development process for ColumnMapRegexExpectations
, specifically by not needing to declare a separate Metric
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern works in the sense of "executes (and makes the Expectation executable) without raising errors, failing tests, or producing immediately obvious side effects."
But I wrote it on a weekend without talking to anyone else. I'm not up to speed on the finer points of how our Expectation, Metric, and Renderer registries work, and I may be missing recent additions to our Code Standards.
So before committing to this pattern, I was hoping to get some other eyes on it.
(On the plus side, this is an internal method, so changing it wouldn't be a breaking API change.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. That's really interesting. I don't think we've covered this in arch review, and it looks like we may not get a chance to for a few days. If we want to get this merged in more quickly, maybe we can get some thoughts from @alexsherstinsky ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a chance to review with @alexsherstinsky earlier today; after some experimentation and research post-discussion we're safe to move forward with an updated version of this pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome! Love the functionality and the new renderers - this is really cool. A couple of questions, and seconding Abe's feedback as well.
...rus/expectations/creating_custom_expectations/expect_column_values_to_only_contain_vowels.py
Show resolved
Hide resolved
```python file=../../../../examples/expectations/regex_based_column_map_expectation_template.py#L24-L26 | ||
``` | ||
|
||
with something like this: | ||
|
||
```python file=../../../../tests/integration/docusaurus/expectations/creating_custom_expectations/expect_column_values_to_only_contain_vowels.py#L16-L18 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These code snippets are doing the thing where the first line it outdented, and the new ones are indented. Do we have a way around this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After much experimentation, yet to find a way to do this that isn't hard-coding the blocks into the doc
return True | ||
|
||
# Here your regex is used to create a custom metric for this expectation | ||
map_metric = ColumnMapRegexExpectation._register_metric( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this pattern?
great_expectations/expectations/regex_based_column_map_expectation.py
Outdated
Show resolved
Hide resolved
great_expectations/expectations/regex_based_column_map_expectation.py
Outdated
Show resolved
Hide resolved
…github.com/great-expectations/great_expectations into feature/regex-based-column-map-expectation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates look good! Just a couple of questions. Also, the netlify build seems to have failed, so I can't see the rendered doc site.
great_expectations/expectations/regex_based_column_map_expectation.py
Outdated
Show resolved
Hide resolved
...rus/expectations/creating_custom_expectations/expect_column_values_to_only_contain_vowels.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just one additional question which could use a tiny update, but I don't want to gate further here.
This is all that you need to define for now. The `ColumnMapRegexExpectation` class has built-in logic to handle all the machinery of data validation, including standard parameters like `mostly`, generation of Validation Results, etc. | ||
|
||
<details> | ||
<summary>Other parameters</summary> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does using this ColumnMapRegexExpectation
class allow for specifying your own Success Keys? If so, you may want to mention this or have a throwaway example here (e.g. you could have an expectation around phone numbers which lets you pass in a flag to include or exclude parentheses and hyphens).
…ps://github.com/great-expectations/great_expectations into MAINTENANCE/GREAT-575/war-on-test-script-runner * 'MAINTENANCE/GREAT-575/war-on-test-script-runner' of https://github.com/great-expectations/great_expectations: [FEATURE / DOCS] Enables Regex-Based Column Map Expectations (#4315) MAINTENANCE] Clean Up Types and Rely on "to_json_dict()" where appropriate (#4489) Add missing word and fix wrong dataset reference (#4478) [MAINTENANCE] Update expectation filenames to match snake_case of their defined Expectations (#4484)
Changes proposed in this pull request:
ColumnMapRegexExpectation
with supporting Metric