Ensure values fall within the specified range #423

dyuliu · 2021-05-06T19:43:38Z

This is a follow-up question to the following issue:
#200 (comment)

My situation:
I have many columns with the following properties:

they are non-negative
they contain many 0 values
they sometimes have large positive values

My synthetic model often generates negative values for these columns.

My questions:
(1) I am not able to replicate the approach of using CopulaGAN with the "gamma" distribution to control the value range. (#200 (comment))

(2) The two solutions suggested here are difficult for me.
The first solution is very slow because too many of my columns have this issue.
The second solution is not clear to me. @csala, can you give me a detailed example?

MLjungg · 2021-05-07T15:52:37Z

I have also experienced problems with setting "gamma" manually to ensure the distribution falls in the positive range. An alternative is to set the distribution to "semi_bounded", which allows other distributions to be selected that are defined in the positive range – that has worked for me.

npatki · 2021-05-20T04:48:38Z

I'd also recommend learning different distributions if possible. That will ensure your correlations remain undisturbed.

(2) The two solutions suggested here are difficult for me.
The second solution is not clear to me. @csala, can you give me a detailed example?

In the second solution, you write a CustomConstraint that strategically uses transform and/or reverse_transform to make the data fit your constraints.

One possible way to do this would be to "fix" any data that doesn't conform to the constraint.

# retroactively fix the output data to conform to your constraint
def reverse_transform(self, table_data):
    # every value in col_A that is less than 0 is set to 0; the rest of the data is left alone
    table_data.loc[table_data['col_A'] < 0, 'col_A'] = 0
    # hand the fixed table back to the caller
    return table_data

I'm not exactly sure what Carles' suggestion was, but I assume it's a similar strategy: by transforming the input data and/or fixing the output data, you ensure the constraints are met.
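
As a rough sketch of the "transform the input and fix the output" idea, the pair of functions below compresses the large positive values with a log transform before fitting and undoes it after sampling. The column name `col_A` and the choice of `log1p`/`expm1` are assumptions made for illustration, not something taken from this thread or from SDV. How the two functions get attached to a CustomConstraint depends on the SDV version and is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical column name, used only for illustration.
COLUMN = 'col_A'


def transform(table_data):
    """Compress large non-negative values with log(1 + x) before the model is fit."""
    table_data = table_data.copy()
    table_data[COLUMN] = np.log1p(table_data[COLUMN])
    return table_data


def reverse_transform(table_data):
    """Undo the log transform on sampled data, then clip what is still negative."""
    table_data = table_data.copy()
    # expm1 maps a negative sample to a value in (-1, 0), so a final clip is still needed.
    table_data[COLUMN] = np.expm1(table_data[COLUMN]).clip(lower=0)
    return table_data


# Made-up values: the round trip leaves valid real data unchanged.
real = pd.DataFrame({COLUMN: [0.0, 0.0, 3.0, 1200.0]})
round_trip = reverse_transform(transform(real))

# Made-up "sampled" values in the transformed space, including a negative one.
sampled = pd.DataFrame({COLUMN: [-0.5, 0.0, 2.0]})
fixed = reverse_transform(sampled)
```

The log transform only shrinks the range the model has to learn; it does not by itself rule out negative samples, which is why the clip stays in `reverse_transform`.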

The issue with all of these approaches is that the non-linear transformations they rely on can distort the correlations in your data.
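
To make the correlation concern concrete, here is a small self-contained sketch with made-up numbers. Two columns start out perfectly linearly related, and clipping the negatives in one of them to 0 (the same kind of fix as setting negative values to 0) weakens the Pearson correlation. The helper `pearson` and the column values are hypothetical and exist only for this illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values: col_b is an exact linear function of col_a.
col_a = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
col_b = [2.0 * a + 1.0 for a in col_a]

# Clip the negative values in col_a to 0 and leave col_b untouched.
col_a_clipped = [max(a, 0.0) for a in col_a]

before = pearson(col_a, col_b)          # 1.0 up to floating-point error
after = pearson(col_a_clipped, col_b)   # about 0.89
```

The more values a fix has to move, the larger this effect tends to be, which is one reason to prefer a distribution that fits the column in the first place.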

npatki · 2021-07-07T23:04:49Z

Closed in #492 -- by default, the models will respect the min and max values observed in the real data, although you have the ability to reset this as needed.
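
As a rough, library-independent illustration of what "respecting the observed min and max" can mean, the sketch below records the range of a real column and pulls sampled values back into it. This is not SDV's implementation; the function names and numbers are hypothetical.

```python
def learn_bounds(real_values):
    """Record the smallest and largest value seen in the real data."""
    return min(real_values), max(real_values)


def clip_to_bounds(sampled_values, bounds):
    """Pull every sampled value back into the learned [low, high] range."""
    low, high = bounds
    return [min(max(value, low), high) for value in sampled_values]


# Made-up numbers for illustration only.
real_column = [0.0, 0.0, 4.5, 120.0]
sampled_column = [-2.3, 0.7, 55.0, 310.0]

bounds = learn_bounds(real_column)
clipped_column = clip_to_bounds(sampled_column, bounds)
```

For a non-negative column whose real data contains a 0, the learned lower bound is 0, so negative samples can no longer appear in the output.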

dyuliu added the pending review and question labels May 6, 2021

npatki closed this as completed Jul 7, 2021

npatki added the feature request label and removed the pending review and question labels Jul 7, 2021

npatki assigned amontanez24 Jul 7, 2021

npatki added this to the 0.11.0 milestone Jul 7, 2021
