Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Enables Regex-Based Column Map Expectations #4315

Merged
merged 29 commits into from
Mar 21, 2022

Conversation

abegong
Copy link
Member

@abegong abegong commented Mar 2, 2022

Changes proposed in this pull request:

  • Creates ColumnMapRegexExpectation with supporting Metric
  • Creates doc, example script, and template to support above

@netlify
Copy link

netlify bot commented Mar 2, 2022

✅ Deploy Preview for niobium-lead-7998 ready!

🔨 Explore the source changes: 05fa40a

🔍 Inspect the deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/623903bd2f948300078c2c06

😎 Browse the preview: https://deploy-preview-4315--niobium-lead-7998.netlify.app

@austiezr austiezr marked this pull request as ready for review March 16, 2022 20:35
@austiezr austiezr changed the title Feature/RegexBasedColumnMapExpectations [FEATURE / DOCS] Enables Regex-Based Column Map Expectations Mar 16, 2022
Copy link
Member

@donaldheppner donaldheppner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving sidebar

Copy link
Member Author

@abegong abegong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick, unsolicited review. 😄

return True

# Here your regex is used to create a custom metric for this expectation
map_metric = ColumnMapRegexExpectation._register_metric(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this pattern been arch reviewed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this pattern?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to enable a very streamlined development process for ColumnMapRegexExpectations, specifically by not needing to declare a separate Metric class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern works in the sense of "executes (and makes the Expectation executable) without raising errors, failing tests, or producing immediately obvious side effects."

But I wrote it on a weekend without talking to anyone else. I'm not up to speed on the finer points of how our Expectation, Metric, and Renderer registries work, and I may be missing recent additions to our Code Standards.

So before committing to this pattern, I was hoping to get some other eyes on it.

(On the plus side, this is an internal method, so changing it wouldn't be a breaking API change.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. That's really interesting. I don't think we've covered this in arch review, and it looks like we may not get a chance to for a few days. If we want to get this merged in more quickly, maybe we can get some thoughts from @alexsherstinsky ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a chance to review with @alexsherstinsky earlier today; after some experimentation and research post-discussion we're safe to move forward with an updated version of this pattern.

Copy link
Contributor

@talagluck talagluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Love the functionality and the new renderers - this is really cool. A couple of questions, and seconding Abe's feedback as well.

Comment on lines 184 to 189
```python file=../../../../examples/expectations/regex_based_column_map_expectation_template.py#L24-L26
```

with something like this:

```python file=../../../../tests/integration/docusaurus/expectations/creating_custom_expectations/expect_column_values_to_only_contain_vowels.py#L16-L18
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These code snippets are doing the thing where the first line it outdented, and the new ones are indented. Do we have a way around this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After much experimentation, yet to find a way to do this that isn't hard-coding the blocks into the doc

return True

# Here your regex is used to create a custom metric for this expectation
map_metric = ColumnMapRegexExpectation._register_metric(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this pattern?

@austiezr austiezr requested a review from talagluck March 18, 2022 21:29
Copy link
Contributor

@talagluck talagluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates look good! Just a couple of questions. Also, the netlify build seems to have failed, so I can't see the rendered doc site.

Copy link
Contributor

@talagluck talagluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just one additional question which could use a tiny update, but I don't want to gate further here.

This is all that you need to define for now. The `ColumnMapRegexExpectation` class has built-in logic to handle all the machinery of data validation, including standard parameters like `mostly`, generation of Validation Results, etc.

<details>
<summary>Other parameters</summary>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does using this ColumnMapRegexExpectation class allow for specifying your own Success Keys? If so, you may want to mention this or have a throwaway example here (e.g. you could have an expectation around phone numbers which lets you pass in a flag to include or exclude parentheses and hyphens).

@austiezr austiezr enabled auto-merge (squash) March 21, 2022 23:01
@austiezr austiezr merged commit 7f467f2 into develop Mar 21, 2022
@austiezr austiezr deleted the feature/regex-based-column-map-expectation branch March 21, 2022 23:11
Shinnnyshinshin pushed a commit that referenced this pull request Mar 22, 2022
…ps://github.com/great-expectations/great_expectations into MAINTENANCE/GREAT-575/war-on-test-script-runner

* 'MAINTENANCE/GREAT-575/war-on-test-script-runner' of https://github.com/great-expectations/great_expectations:
  [FEATURE / DOCS] Enables Regex-Based Column Map Expectations (#4315)
  MAINTENANCE] Clean Up Types and Rely on "to_json_dict()" where appropriate (#4489)
  Add missing word and fix wrong dataset reference (#4478)
  [MAINTENANCE] Update expectation filenames to match snake_case of their defined Expectations (#4484)
@fjork3 fjork3 changed the title [FEATURE / DOCS] Enables Regex-Based Column Map Expectations [FEATURE] Enables Regex-Based Column Map Expectations Mar 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants