Ensure values fall within the specified range #423

Closed
dyuliu opened this issue May 6, 2021 · 3 comments

dyuliu (Contributor) commented May 6, 2021

This is a follow-up to the discussion in the following issue:
#200 (comment)

My situation:
I have many columns with the following properties:

  • they are non-negative
  • they contain many zeros
  • they occasionally contain large positive values

My synthetic model often generates negative values for these columns.

My questions:
(1) I am not able to replicate the suggestion of using CopulaGAN with the "gamma" distribution to control the value range. (#200 (comment))

(2) The two solutions proposed there are difficult for me.
The first solution is very slow because too many of my columns have this issue.
The second solution is not clear to me. @csala, can you give me a detailed example?

dyuliu added the "pending review" and "question" labels May 6, 2021
MLjungg commented May 7, 2021

I have also experienced problems with setting "gamma" manually to ensure the distribution falls in the positive range. An alternative is to set the distribution to "semi_bounded", which allows other distributions to be selected that are defined in the positive range – that has worked for me.
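
To illustrate why a distribution defined only on the positive range avoids the problem, here is a minimal sketch using only Python's standard `random` module. It is not SDV code; the parameter values and variable names are assumptions chosen for the illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# A Gaussian is defined on the whole real line, so it can emit negatives.
gaussian_samples = [rng.gauss(1.0, 2.0) for _ in range(1000)]

# A gamma distribution is only defined for positive values.
gamma_samples = [rng.gammavariate(0.5, 2.0) for _ in range(1000)]

# Count how many samples from each distribution fall below zero.
negative_gaussian = sum(1 for v in gaussian_samples if v < 0)
negative_gamma = sum(1 for v in gamma_samples if v < 0)

print("negative Gaussian samples:", negative_gaussian)
print("negative gamma samples:", negative_gamma)
```

The gamma samples never go below zero, while a noticeable share of the Gaussian samples do.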

npatki (Contributor) commented May 20, 2021

I'd also recommend learning different distributions if possible. That will ensure your correlations remain undisturbed.

> (2) two solutions here are difficult for me.
> The second solution is not clear to me, @csala, can you give me the detailed example?

In the second solution, you write a CustomConstraint that strategically transforms and/or reverse-transforms the data to fit your constraints.

One possible way to do this would be to "fix" any data that doesn't conform to the constraint.

# Retroactively fix the output data so that it conforms to your constraint.
def reverse_transform(self, table_data):
    # Set every value in col_A that is less than 0 to 0; leave the rest of the data alone.
    table_data.loc[table_data['col_A'] < 0, 'col_A'] = 0
    # Return the corrected table so the caller receives the fixed data.
    return table_data

I'm not exactly sure what Carles' suggestion was, but I assume it's a similar strategy whereby you ensure the constraints are met by transforming the input data, fixing the output data, or both.
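
As a rough sketch of combining both steps, the pair of functions below transforms the input into log space and then fixes the output on the way back. This is plain Python rather than SDV's constraint API; the function names mirror the ones above, but the log transform and the sample numbers are assumptions made for the example.

```python
import math


def transform(values):
    """Map non-negative values to log space before fitting (log1p keeps 0 at 0)."""
    return [math.log1p(v) for v in values]


def reverse_transform(values):
    """Map sampled values back to the original scale and clamp anything below 0."""
    return [max(math.expm1(v), 0.0) for v in values]


real = [0.0, 0.0, 3.0, 250.0]
encoded = transform(real)

# Stand-in for what a model might sample in log space, including a negative value.
sampled = [-0.4, 0.0, 1.2, 5.5]
decoded = reverse_transform(sampled)

print(decoded)
```

The log transform compresses the large positive values, and the clamp in `reverse_transform` catches the samples that still land below zero.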

The drawback of all these approaches is that they apply non-linear transformations, which can distort the correlations between columns.
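
A small, self-contained illustration of that effect, using made-up numbers and a hand-written Pearson correlation (the helper `pearson` and the column values are assumptions for this example, not SDV code):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


col_a = [-2.0, -1.0, 0.0, 1.0, 2.0]
col_b = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Clip the negative values of col_a to 0, as the constraint above would.
clipped_a = [max(v, 0.0) for v in col_a]

before = pearson(col_a, col_b)
after = pearson(clipped_a, col_b)

print("correlation before clipping:", before)
print("correlation after clipping:", after)
```

In this toy case the two columns start out perfectly correlated, and clipping one of them lowers the correlation to roughly 0.88.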

npatki (Contributor) commented Jul 7, 2021

Closed in #492 -- by default, the models will respect the min and max values observed in the real data, although you have the ability to reset this as needed.
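
Conceptually, respecting the observed range amounts to something like the sketch below. This is only an illustration of the idea in plain Python, not the implementation from #492; `fit_bounds` and `enforce_bounds` are hypothetical helper names.

```python
def fit_bounds(real_values):
    """Record the minimum and maximum observed in the real data."""
    return min(real_values), max(real_values)


def enforce_bounds(synthetic_values, bounds):
    """Clamp each synthetic value into the observed [low, high] range."""
    low, high = bounds
    return [min(max(v, low), high) for v in synthetic_values]


real = [0.0, 0.0, 4.0, 120.0]
bounds = fit_bounds(real)

# Stand-in for raw model output that strays outside the observed range.
synthetic = [-3.2, 0.0, 57.1, 400.0]
bounded = enforce_bounds(synthetic, bounds)

print(bounded)  # [0.0, 0.0, 57.1, 120.0]
```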

npatki closed this as completed Jul 7, 2021
npatki added the "feature request" label and removed the "pending review" and "question" labels Jul 7, 2021
npatki added this to the 0.11.0 milestone Jul 7, 2021