Treatment of missing/NULL as unexpected in expectation expect_column_values_to_match_regex #5027

Closed
bluinchiostro opened this issue May 5, 2022 · 4 comments
Assignees
Labels
community devrel This item is being addressed by the Developer Relations Team

Comments

@bluinchiostro

The expectation expect_column_values_to_match_regex checks that the values in a column match the regex passed as an argument; values that are not strings are converted to strings before the check runs.
However, if the column contains missing values, they are not counted against the expectation, so the validation result can still report a 100% match. It is understandable that where the data is not there, it cannot conflict with the regex. But I would like to open a discussion on validity: a 100% result suggests that all rows conform to the regex, when in fact only the data actually available (the non-missing values) satisfies the expectation.

A further development could add an optional argument to the expectation that makes it treat missing values as unexpected.

The alternative is to do the calculation manually after running the checkpoint, subtracting the number of unexpected values and the number of missing values from the number of rows, but repeating this across many batches costs a lot of time and resources.
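
For illustration, the manual calculation could look roughly like the sketch below. It assumes the counts are available as a plain dictionary with the keys `element_count`, `missing_count` and `unexpected_count`; the function name `match_rates` is hypothetical and not part of Great Expectations.

```python
from typing import Dict, Optional


def match_rates(result: Dict[str, int]) -> Dict[str, Optional[float]]:
    """Compute the match rate two ways from per-column counts.

    Assumed keys (an assumption of this sketch):
      element_count    - total number of rows
      missing_count    - rows where the value is missing/NULL
      unexpected_count - non-missing rows that do not match the regex
    """
    total = result["element_count"]
    missing = result["missing_count"]
    unexpected = result["unexpected_count"]

    # Rows that are present AND match the regex.
    matched = total - missing - unexpected
    nonmissing = total - missing

    return {
        # Missing values ignored: the rate over available data only.
        "over_nonmissing": matched / nonmissing if nonmissing else None,
        # Missing values treated as unexpected: the rate over all rows.
        "over_all_rows": matched / total if total else None,
    }


# Hypothetical counts: 10 rows, 4 missing, every non-missing value matches.
rates = match_rates(
    {"element_count": 10, "missing_count": 4, "unexpected_count": 0}
)
```

With these made-up counts, the rate over non-missing values is 100%, while the rate over all rows is only 60%, which is the gap described above.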

@austiezr austiezr added community devrel This item is being addressed by the Developer Relations Team labels May 5, 2022
@rishabhsahrawat

"It is understandable that where the data is not there, it cannot conflict with the regex"
Is that true specifically for Great Expectations, or in general?

@github-actions
Contributor

Is this issue still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale issues and PRs label Jul 27, 2022
@github-actions github-actions bot removed the stale Stale issues and PRs label Jul 28, 2022
@bluinchiostro
Author

Hey @rishabhsahrawat, it's a general consideration, but great_expectations assumes that if a field is null, the expectation can't be evaluated for it.

@AFineDayFor
Contributor

LGTM @bluinchiostro, the team shared the same conclusion in #5232 as well.
