Should `ConstraintsNotMetError` be a Warning instead? #595

npatki · 2021-09-17T20:07:29Z

Problem Description

Starting from v0.12.0, the SDV only allows you to fit a model with constraints if all of the input data matches the constraints.

constraint = GreaterThan(low='effective_date', high='due_date')
model = CopulaGAN(constraints=[constraint])

model.fit(my_data)
# Crash (ConstraintsNotMetError) if effective_date > due_date for any single row of my dataset

Expected behavior

It's useful to know whether the input data passes the constraints, but should this really be a hard requirement that all the rows need to pass the constraint?

My expectation: Give me a Warning but continue fitting the data.

The warning can be descriptive. For example, tell me how many rows aren't passing, or which rows they are.
It's ok if the SDV drops the offending rows before modeling.
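
The expected behavior described here could look roughly like the following. This is only a sketch using pandas and the standard `warnings` module, not SDV code: `drop_violations` is a hypothetical helper, the sample data is made up, and treating `effective_date == due_date` as valid is an assumption about what the constraint should accept.

```python
import warnings

import pandas as pd


def drop_violations(data, low, high):
    """Warn about rows where data[low] > data[high] and return the rest.

    Hypothetical helper for illustration; not part of the SDV API.
    """
    valid = data[low] <= data[high]
    offending = data.index[~valid].tolist()
    if offending:
        warnings.warn(
            f"{len(offending)} of {len(data)} rows violate {low} <= {high} "
            f"and will be dropped: index {offending}"
        )
    return data[valid]


# Made-up data: the second row has effective_date after due_date.
my_data = pd.DataFrame({
    "effective_date": pd.to_datetime(["2021-01-01", "2021-03-01", "2021-05-01"]),
    "due_date": pd.to_datetime(["2021-02-01", "2021-02-15", "2021-06-01"]),
})

clean_data = drop_violations(my_data, low="effective_date", high="due_date")
print(clean_data)
```

The returned frame could then be passed to `model.fit` in place of the original data. Note that rows with missing dates also fail the comparison and would be dropped by this sketch.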

Additional context

There may be legitimate reasons why a few rows of the input data don't match the constraints: some rows in the dataset were manually overridden exceptions, there was a bug in my application, the rows were generated by some legacy system, etc.

In any case, the only recourse I have now is to manually identify & delete the offending rows.


kveerama · 2021-09-17T20:17:34Z

@npatki it could be interesting to see if we can provide this information before the fit is done. That way the user has the option to make some choices in the fit method. Those options could be to drop or fix the offending rows, or possibly to remove the constraint.

npatki · 2021-09-17T21:17:38Z

I like that idea. As a default, it might still make sense to crash so I can figure out what to do with it. I think those are the only 2 options available:

Drop the offending rows. Or rather, have SDV do it for me. Maybe ignore_violations makes more sense
Just remove the constraint. I'd just have to manually get rid of it next time I instantiate the model.

kvrameshreddy · 2021-10-28T10:40:29Z

Hi @npatki,
The function "_validate_data_on_constraints" in the table.py file validates the constraints over the entire sample data and throws an error if any record fails. Could it instead fit the model when at least a few rows of the sample satisfy the constraints, and raise an error otherwise?
I am trying to change it like this, but I am not sure whether this will work for all the constraints. Please review it and suggest a workaround if one is available.

Thank you.

npatki · 2021-11-01T16:57:34Z

Hi @kvrameshreddy, it would be helpful if you could describe your use case & the constraint you want to add in more detail.

The intended use for constraints is for strict rules that all of the rows must follow. For various reasons, the input data may have a few exceptions to the rule, which is why I filed this issue.

However, if the real data has many exceptions to the rule, I am curious why it's considered a strict rule you want to add? The goal is to emulate the input data. If a majority of the input data isn't following the rule, is this something you want the synthetic data to do?

kilickursat · 2021-11-01T17:23:36Z

I got the following error after using ColumnFormula for UniqueConstraints. I have also attached part of the data.

npatki · 2021-11-01T17:48:47Z

@kilickursat You may want to try inputting only 1 constraint at a time to see which one is causing the ConstraintsNotMetError. This error indicates that the input data does not follow the rule you specified.

Is it expected that the input has some violations? For now, the easiest workaround will be to delete the offending row(s) in the input data before passing it into the fit function.
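
One way to narrow this down without refitting is to test each rule directly against the input with pandas. The sketch below is an illustration under assumptions, not SDV functionality: the `checks` dictionary, its rule names, and the sample columns are all made up, and each rule is written by hand as a boolean mask.

```python
import pandas as pd

# Made-up input data for illustration.
data = pd.DataFrame({
    "low": [1, 5, 3],
    "high": [2, 4, 6],
    "total": [3, 9, 10],
})

# Hypothetical hand-written versions of the rules, one boolean mask each
# (True means the row satisfies the rule).
checks = {
    "low <= high": data["low"] <= data["high"],
    "total == low + high": data["total"] == data["low"] + data["high"],
}

# Report the offending row indices per rule.
violations = {
    name: data.index[~mask].tolist() for name, mask in checks.items()
}
for name, rows in violations.items():
    print(f"{name}: {len(rows)} violating row(s) {rows}")

# Workaround: keep only the rows that satisfy every rule, then fit on those.
all_valid = pd.concat(checks, axis=1).all(axis=1)
clean_data = data[all_valid]
```

Checking the rules one at a time shows which rule each bad row breaks, which helps decide whether to delete the row or drop the constraint.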

kvrameshreddy · 2021-11-01T18:09:50Z

@npatki, my use case is the same: if any single record in the input data does not satisfy the constraints, it results in an error. Can we handle this internally by dropping those records and then fitting the model?

kilickursat · 2021-11-02T02:58:23Z

I applied only 1 constraint at a time, but each of them gave the same error. I don't understand why the data doesn't follow the specified rules.

npatki · 2021-11-02T15:33:58Z

@kilickursat can you check this manually in your input data? It's possible there may be 1-2 rows that are wrong for some reason -- could be due to rounding, misc errors, etc.
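
A manual check for a formula-style rule might look like the sketch below. It assumes the rule is `Compressive_Strength = Effective_AxialStress - Effective_LateralStress` and uses made-up values; the tolerance is NumPy's default for `np.isclose`, which may or may not suit the real data.

```python
import numpy as np
import pandas as pd

# Made-up values: row 1 differs only by floating-point rounding,
# row 2 is a genuine violation.
df = pd.DataFrame({
    "Effective_AxialStress": [300.0, 0.3, 520.0],
    "Effective_LateralStress": [100.0, 0.1, 200.0],
    "Compressive_Strength": [200.0, 0.2, 325.0],
})

expected = df["Effective_AxialStress"] - df["Effective_LateralStress"]

# Exact equality can fail purely because of floating-point rounding.
exact_mismatch = df["Compressive_Strength"] != expected

# np.isclose tolerates tiny differences; what remains is a real violation.
close = np.isclose(df["Compressive_Strength"], expected)
real_mismatch = df[~close]
print(real_mismatch)
```

If the exact comparison flags rows that the tolerant comparison accepts, rounding the affected column before fitting is one possible fix.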

One other thing to check: Make sure that you're using the latest version of SDV (v0.12.1). The earlier version had an issue when constraints were applied to columns that could have NaN values.

kilickursat · 2021-11-03T06:19:15Z

@npatki thanks for all your effort. I have checked the file and converted it from Excel to CSV. But in this case I got a different error, "MissingConstraintColumnError".

import numpy as np
import pandas as pd

from sdv.tabular import CopulaGAN
from sdv.constraints import ColumnFormula

df = pd.read_csv('soil.csv')


# According to Mohr-Coulomb, the following functions were created using UniqueConstraints of SDV GAN
def Compressive_Strength(df):
    return df["Effective_AxialStress"] - df["Effective_LateralStress"]

Compressive_Strength_constraint = ColumnFormula(
    column="Compressive_Strength",
    formula=Compressive_Strength,
    handling_strategy="transform"
)


def Effective_AxialStress(df):
    return df["Effective_LateralStress"] + df["Compressive_Strength"]

Effective_AxialStress_constraint = ColumnFormula(
    column="Effective_AxialStress",
    formula=Effective_AxialStress,
    handling_strategy="transform"
)

constraint = [Compressive_Strength_constraint,
              Effective_AxialStress_constraint]


model = CopulaGAN(constraints=constraint)
model.fit(df)

MissingConstraintColumnError

npatki · 2021-11-04T15:02:37Z

@kilickursat It seems like you have a circular dependency in your formulas:

To compute the Compressive_Strength, you need to know the Effective_AxialStress
To compute the Effective_AxialStress, you need to know the Compressive_Strength

The SDV cannot handle circular dependencies. Is there a way to formulate your constraints in a different way?
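
To see the circular dependency concretely, the two formulas can be written as a small dependency map and searched for a cycle. The sketch below is plain Python and does not reflect how SDV itself detects the problem; `find_cycle` and the `formulas` mapping are hypothetical names introduced for illustration.

```python
def find_cycle(dependencies):
    """Return one cycle as a list of column names, or None if there is none.

    `dependencies` maps each computed column to the columns its formula reads.
    """
    visiting, done = set(), set()

    def visit(column, path):
        if column in done:
            return None
        if column in visiting:
            # We came back to a column that is still being resolved.
            return path[path.index(column):] + [column]
        visiting.add(column)
        for needed in dependencies.get(column, ()):
            cycle = visit(needed, path + [column])
            if cycle:
                return cycle
        visiting.discard(column)
        done.add(column)
        return None

    for column in dependencies:
        cycle = visit(column, [])
        if cycle:
            return cycle
    return None


# The two formulas from the snippet above, expressed as column dependencies.
formulas = {
    "Compressive_Strength": ["Effective_AxialStress", "Effective_LateralStress"],
    "Effective_AxialStress": ["Effective_LateralStress", "Compressive_Strength"],
}
print(find_cycle(formulas))
```

Keeping only one of the two formulas, so that one column is computed and the other two are modeled directly, leaves the map without a cycle.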

kilickursat · 2021-11-04T15:29:01Z

Thanks again. Effective stress and compressive strength are known. I will try to apply different ways to find those parameters. But they link to each other.

npatki · 2022-06-10T18:32:49Z

Update: We are defining constraints as business rules that are true of every row in your dataset. Models cannot be expected to properly learn constraints if the real data does not exhibit the same properties.

For a more user-friendly API we will:

Print out the offending rows whenever there's a ConstraintsNotMetError so that users can easily debug the issue -- see Improve error message for invalid constraints #801
Add documentation about what you can do if you encounter this (remove the constraint or remove the offending rows)

npatki added the feature request and needs discussion labels Sep 17, 2021

kveerama added the feature:constraints label Sep 19, 2021

npatki mentioned this issue Nov 1, 2021

How physics based loss function applied to synthetic data generation? #618

Closed

katxiao added the work planned label and removed the feature request and needs discussion labels Nov 12, 2021

npatki closed this as completed Jun 10, 2022
