Remove handling_strategy parameter #833

Closed
amontanez24 opened this issue Jun 5, 2022 · 0 comments · Fixed by #843
Labels
feature request Request for a new feature
Comments

@amontanez24
Contributor

Problem Description

As a user, it can be confusing to know which handling_strategy to use for a constraint. On top of that, if the transform strategy fails, reject_sampling could still work, so it should be the automatic fallback.

Expected behavior

  • Remove the handling_strategy parameter from all constraints
  • All constraints should attempt the transform strategy first. If that fails because of a MissingConstraintColumnError, the constraint should leave the data unchanged and fall back on reject sampling.
    • Raise the following warning if transforming fails:
    Warning: <constraint name> cannot be transformed because columns [<names>] are not found. Using the reject sampling approach instead.
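
A minimal sketch of the fallback described above, for discussion only. `SketchConstraint` and its attributes are hypothetical stand-ins rather than the existing constraint classes, the error class is redefined locally so the snippet runs on its own, and a plain dict of columns stands in for the table:

```python
import warnings


class MissingConstraintColumnError(Exception):
    """Local stand-in for the error named in this issue."""

    def __init__(self, missing_columns):
        super().__init__(f'Missing columns: {missing_columns}')
        self.missing_columns = missing_columns


class SketchConstraint:
    """Hypothetical constraint used only to illustrate the fallback."""

    def __init__(self, constraint_columns):
        self.constraint_columns = tuple(constraint_columns)

    def _transform(self, table_data):
        # Fail if any column this constraint needs is absent.
        missing = [c for c in self.constraint_columns if c not in table_data]
        if missing:
            raise MissingConstraintColumnError(missing)
        # Placeholder: a real constraint would transform the columns here.
        return dict(table_data)

    def transform(self, table_data):
        try:
            return self._transform(table_data)
        except MissingConstraintColumnError as error:
            # Do nothing to the data; reject sampling enforces the constraint later.
            warnings.warn(
                f'{type(self).__name__} cannot be transformed because columns '
                f'{error.missing_columns} are not found. '
                'Using the reject sampling approach instead.'
            )
            return table_data
```

With this shape, callers never choose a strategy: `transform` either returns transformed data or returns the input untouched along with the warning.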
    

Additional context

  • This change will need to be addressed in a couple of other places besides the constraints themselves.
    • In metadata.table.py, there is a method called _prepare_constraints that orders the constraints and raises an error if any constraints touch the same columns. Instead of raising, those constraints should be set to use reject_sampling by skipping their transformations, while still running their is_valid check on the data being transformed. This is tricky to handle: if two constraints touch the same columns, the one that transforms needs to go last, so we will need a way to skip transformations for certain constraints or to set them to use the identity method.
    • In metadata.table.py, there is a method called _transform_constraints. The on_missing_column parameter should be removed from this method, which should always drop (i.e. default to reject sampling).
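
One possible way to plan the ordering for overlapping constraints, sketched under assumptions. The function name `plan_constraints`, the `(name, columns)` input shape, and the rule that the first constraint to claim a column is the one that transforms are all hypothetical; the issue does not specify which overlapping constraint should keep its transformation:

```python
def plan_constraints(constraints):
    """Decide which constraints transform and in what order they run.

    ``constraints`` is a list of ``(name, columns)`` tuples. Returns a list of
    ``(name, will_transform)`` tuples with reject-sampling-only constraints
    first and transforming constraints last.
    """
    claimed = set()
    transforming = []
    reject_only = []
    for name, columns in constraints:
        columns = set(columns)
        if claimed & columns:
            # Overlaps an earlier constraint: skip its transformation and
            # rely on is_valid plus reject sampling instead.
            reject_only.append((name, False))
        else:
            claimed |= columns
            transforming.append((name, True))
    return reject_only + transforming


plan = plan_constraints([
    ('A', ['x', 'y']),
    ('B', ['y', 'z']),
    ('C', ['w']),
])
print(plan)  # [('B', False), ('A', True), ('C', True)]
```

Placing the skipped constraints first keeps the transforming constraint last for any shared columns, which matches the ordering concern raised above.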

3 participants