Add Non-negative and Positive constraints across multiple columns #409

Closed
dyuliu opened this issue Apr 20, 2021 · 3 comments · Fixed by #488

@dyuliu
Contributor

dyuliu commented Apr 20, 2021

Problem Description

When handling single-table data, it is very common to see constraints that require the values of some numeric columns to be positive (>0) and the values of other numeric columns to be non-negative (>=0).

Expected behavior

I would expect a constraint class along these lines to be integrated into the library:

from sdv.constraints import Positives

multiple_columns_constraint = Positives(
    columns=['col1', 'col2', 'col3'],
    handling_strategy='reject_sampling'
)

Or maybe a more general design like this:

from sdv.constraints import GreatThan

multiple_columns_constraint = GreatThan(
    columns=['col1', 'col2', 'col3'],
    constant=0,
    handling_strategy='reject_sampling'
)
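
To make the intended semantics concrete, here is a minimal plain-Python sketch of the check such a constraint would apply to each row. It is not SDV code: the helper name `satisfies_lower_bound` and its `strict` flag are hypothetical and exist only for illustration.

```python
from typing import Mapping, Sequence


def satisfies_lower_bound(
    row: Mapping[str, float],
    columns: Sequence[str],
    constant: float = 0,
    strict: bool = True,
) -> bool:
    """Return True if every listed column in `row` exceeds `constant`.

    strict=True checks value > constant (the "positive" case when constant=0);
    strict=False checks value >= constant (the "non-negative" case).
    """
    if strict:
        return all(row[c] > constant for c in columns)
    return all(row[c] >= constant for c in columns)


# Illustrative data only.
rows = [
    {"col1": 1.0, "col2": 2.0, "col3": 0.5},
    {"col1": 0.0, "col2": 3.0, "col3": 1.0},
    {"col1": -1.0, "col2": 3.0, "col3": 1.0},
]
cols = ["col1", "col2", "col3"]

positive_rows = [r for r in rows if satisfies_lower_bound(r, cols)]
non_negative_rows = [r for r in rows if satisfies_lower_bound(r, cols, strict=False)]
```

With `constant=0`, the strict form corresponds to the positive case and the non-strict form to the non-negative case, so both requests reduce to one comparison with a flag.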
@dyuliu added the feature request and pending review labels on Apr 20, 2021
@kveerama
Contributor

Thanks @dyuliu. Both are good options.

@npatki and I were discussing that we can have a general design, and perhaps expose multiple constraints with specific names like ispositive and is-nonnegative, where the underlying call uses the general-purpose greaterthan. That may give better usability and interpretability. [minor item]

@npatki
Contributor

npatki commented Apr 21, 2021

+1 We'll at least need to support greater_than, greater_than_or_equal_to, less_than, and less_than_or_equal_to as base cases to cover all possibilities. Doing this will allow us to layer additional functionality for ease-of-use. We can prioritize based on usage but I assume we'll at least want to have:

  • is_positive for >0
  • is_non_negative for >=0
  • between(X,Y) for >X and <Y (see Add Between values constraint #367)
  • between_including(X,Y) for >=X and <=Y (may want different phrasing for this)
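
The layering described above could look roughly like the following sketch: four base comparisons, with the convenience helpers defined purely in terms of them. This is plain Python for illustration, not the SDV implementation; the function names mirror the ones proposed in this comment, and the `compare` helper and `BASE_COMPARISONS` table are assumptions.

```python
import operator

# The four base cases, keyed by the names proposed in this thread.
BASE_COMPARISONS = {
    "greater_than": operator.gt,
    "greater_than_or_equal_to": operator.ge,
    "less_than": operator.lt,
    "less_than_or_equal_to": operator.le,
}


def compare(value, bound, kind):
    """Apply one of the base comparisons to a single value."""
    return BASE_COMPARISONS[kind](value, bound)


def is_positive(value):
    return compare(value, 0, "greater_than")


def is_non_negative(value):
    return compare(value, 0, "greater_than_or_equal_to")


def between(value, low, high):
    # Exclusive on both ends: low < value < high.
    return compare(value, low, "greater_than") and compare(value, high, "less_than")


def between_including(value, low, high):
    # Inclusive on both ends: low <= value <= high.
    return compare(value, low, "greater_than_or_equal_to") and compare(
        value, high, "less_than_or_equal_to"
    )
```

Keeping the named helpers as thin wrappers means any change to how a base comparison is enforced applies to every convenience constraint at once.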

@Timbimjim

Hey, having the above-mentioned features would be very valuable. Datasets very often contain columns that, from an engineering point of view, must always be >0 or greater than some chosen number.

I am currently working around this by generating a lot of synth data and then keeping only the rows whose values are all > 0.

I am not 100% sure if this approach will keep the overall quality of the synth data intact?
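
The generate-then-filter workaround described above is a form of rejection sampling, and it can be sketched as below. The `sample_row` function is a hypothetical stand-in for a fitted synthetic-data model (it just draws from two Gaussians) and is not an SDV call. Tracking the acceptance rate gives one rough signal for the quality question: filtering keeps the model's distribution restricted to the valid region, so a high acceptance rate means little was discarded, while a low rate suggests the model places much of its mass on invalid values and the kept rows deserve a closer look.

```python
import random


def sample_row(rng):
    # Hypothetical stand-in for a synthetic-data model; not an SDV API.
    return {"col1": rng.gauss(1.0, 1.0), "col2": rng.gauss(2.0, 1.0)}


def reject_sample(n_rows, columns, rng, max_tries=100_000):
    """Draw rows until n_rows of them have all listed columns > 0.

    Returns the kept rows and the number of draws used, so the caller
    can compute how often the model produced a valid row.
    """
    kept, tries = [], 0
    while len(kept) < n_rows and tries < max_tries:
        row = sample_row(rng)
        tries += 1
        if all(row[c] > 0 for c in columns):
            kept.append(row)
    return kept, tries


rng = random.Random(0)  # fixed seed so the run is repeatable
rows, tries = reject_sample(1000, ["col1", "col2"], rng)
acceptance_rate = len(rows) / tries
```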

@csala closed this as completed in #488 on Jul 2, 2021
@csala added this to the 0.11.0 milestone on Jul 2, 2021
@npatki changed the title from "Non-negative and Positive constraints across multiple columns" to "Add Non-negative and Positive constraints across multiple columns" on Jul 12, 2021
6 participants