Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise constraint errors together + misc. #807

Merged
merged 11 commits into from
May 26, 2022
Merged

Conversation

fealho
Copy link
Member

@fealho fealho commented May 19, 2022

Resolve #801.

Note: this PR also fixes several typos and minor fixes.

@codecov-commenter
Copy link

codecov-commenter commented May 19, 2022

Codecov Report

Merging #807 (0d14a47) into master (3632d4c) will decrease coverage by 0.06%.
The diff coverage is 69.23%.

@@            Coverage Diff             @@
##           master     #807      +/-   ##
==========================================
- Coverage   68.05%   67.98%   -0.07%     
==========================================
  Files          38       38              
  Lines        2883     2899      +16     
==========================================
+ Hits         1962     1971       +9     
- Misses        921      928       +7     
Impacted Files Coverage Δ
sdv/metadata/table.py 75.47% <37.50%> (-1.34%) ⬇️
sdv/constraints/base.py 96.13% <81.25%> (-1.46%) ⬇️
sdv/constraints/errors.py 100.00% <100.00%> (ø)
sdv/constraints/tabular.py 98.59% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3632d4c...0d14a47. Read the comment docs.

@@ -236,6 +262,8 @@ def _validate_constraint_columns(self, table_data):
table_data (pandas.DataFrame):
Table data.
"""
self._validate_data_on_constraint(table_data)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two methods: _validate_data_on_constraint and _validate_constraint_columns should be merged in the future, when the fit_columns_model logic gets dropped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep these separate for a couple of reasons:

  1. We may end up keeping the columns model
  2. These are actually two different types of validation. One is checking that the data being transformed actually adheres to the constraint. The other is checking if any columns in the constraint are missing and need to be sampled. Let's keep the logic separate. Maybe changing the name of _validate_constraint_columns to _generate_missing_columns would help or something like that

@@ -19,7 +19,9 @@
on the other columns of the table.
* Between: Ensure that the value in one column is always between the values
of two other columns/scalars.
* Rounding: Round a column based on the specified number of digits.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to my changes, we just forgot to add these here when creating the constraints.

@fealho
Copy link
Member Author

fealho commented May 24, 2022

@amontanez24 @npatki What is the expected behavior of a constraint when handling_strategy=reject_sampling? Right now it doesn't go through validation, so if no rows can be sampled it simply returns and empty dataframe.

@npatki
Copy link
Contributor

npatki commented May 24, 2022

@amontanez24 @npatki What is the expected behavior of a constraint when handling_strategy=reject_sampling? Right now it doesn't go through validation, so if no rows can be sampled it simply returns and empty dataframe.

The expected behavior is to handle this the same as conditional sampling with reject sampling. See #809. It would be nice to make these changes in tandem with the constraint changes.

gc = GaussianCopula(constraints=constraints)

err_msg = re.escape(
"\nunsupported operand type(s) for -: 'str' and 'str'"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR only implements the "pretty print" for the general case where the data doesn't conform with constraint.is_valid(). If some other error shows up, it will just be printed as usual.

@fealho fealho force-pushed the issue-801-constraints-errors branch from 31f5623 to 6820ece Compare May 24, 2022 17:44
@fealho fealho marked this pull request as ready for review May 24, 2022 18:39
@fealho fealho requested a review from a team as a code owner May 24, 2022 18:39
@fealho fealho requested review from amontanez24 and pvk-developer and removed request for a team May 24, 2022 18:39
sdv/constraints/base.py Outdated Show resolved Hide resolved
@@ -236,6 +262,8 @@ def _validate_constraint_columns(self, table_data):
table_data (pandas.DataFrame):
Table data.
"""
self._validate_data_on_constraint(table_data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep these separate for a couple of reasons:

  1. We may end up keeping the columns model
  2. These are actually two different types of validation. One is checking that the data being transformed actually adheres to the constraint. The other is checking if any columns in the constraint are missing and need to be sampled. Let's keep the logic separate. Maybe changing the name of _validate_constraint_columns to _generate_missing_columns would help or something like that

sdv/constraints/base.py Show resolved Hide resolved
sdv/constraints/tabular.py Show resolved Hide resolved
sdv/metadata/table.py Show resolved Hide resolved
@fealho fealho requested a review from amontanez24 May 24, 2022 21:17
@@ -582,93 +581,6 @@ def test__transform_constraints_drops_columns(self):
}, index=[0, 1, 2])
assert result.equals(expected_result)

def test__validate_data_on_constraints(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep the tests for this method but just move them to the appropriate place

Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is almost ready, but we should keep the tests for the _validate_on_constraint method

@fealho fealho requested a review from amontanez24 May 25, 2022 02:52
@fealho
Copy link
Member Author

fealho commented May 25, 2022

I think this is almost ready, but we should keep the tests for the _validate_on_constraint method

@amontanez24 The old tests didn't really match the new code, so I rewrote them a bit.

Copy link
Member

@pvk-developer pvk-developer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just minor question about the Errors or Error

@@ -3,3 +3,7 @@

class MissingConstraintColumnError(Exception):
"""Error to use when constraint is provided a table with missing columns."""


class MultipleConstraintsErrors(Exception):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be Error instaed of Errors ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Errors is clearer, since it is a list of multiple errors

Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@fealho fealho merged commit 04cfd65 into master May 26, 2022
@fealho fealho deleted the issue-801-constraints-errors branch May 26, 2022 04:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve error message for invalid constraints
5 participants