This is a follow-up question to the following issue: #200 (comment)
My situation:
I have many columns:
they are non-negative
they contain many 0 values
they sometimes have large positive values
My synthetic model often generates negative values for these columns.
My questions:
(1) I am not able to replicate the approach of using CopulaGAN with the "gamma" distribution to control the value range. (#200 (comment))
(2) The two solutions suggested there are difficult for me.
The first solution is very slow because too many of my columns have this issue.
The second solution is not clear to me. @csala, can you give me a detailed example?
I have also experienced problems with setting "gamma" manually to ensure the distribution falls in the positive range. An alternative is to set the distribution to "semi_bounded", which allows other distributions to be selected that are defined in the positive range – that has worked for me.
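With many affected columns, the per-column distribution mapping can be built programmatically instead of by hand. The following is only a sketch: the helper `semi_bounded_fields` and the sample data are hypothetical, and passing the result through a `field_distributions` keyword is an assumption about the model's constructor that should be checked against the installed SDV version.

```python
import pandas as pd


def semi_bounded_fields(data: pd.DataFrame) -> dict:
    """Map every numeric column whose observed minimum is >= 0 to 'semi_bounded'.

    Hypothetical helper, not part of SDV.
    """
    numeric = data.select_dtypes(include="number")
    return {
        col: "semi_bounded"
        for col in numeric.columns
        if numeric[col].min() >= 0
    }


# Made-up example data: col_A is non-negative, col_B is not, label is not numeric.
real = pd.DataFrame({
    "col_A": [0.0, 0.0, 3.5, 120.0],
    "col_B": [-1.0, 0.5, 2.0, 4.0],
    "label": ["a", "b", "c", "d"],
})

field_distributions = semi_bounded_fields(real)
print(field_distributions)

# Assumed usage (keyword name not verified here):
# model = CopulaGAN(field_distributions=field_distributions)
```

Only columns whose real data never goes below 0 are selected, so columns that legitimately contain negative values keep the model's default distribution.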
I'd also recommend learning different distributions if possible. That will ensure your correlations remain undisturbed.
(2) The two solutions suggested there are difficult for me.
The second solution is not clear to me. @csala, can you give me a detailed example?
In the second solution, you write a CustomConstraint that strategically transforms and/or reverse-transforms the data to fit your constraints.
One possible way to do this would be to "fix" any data that doesn't conform to the constraint.
    # retroactively fix the output data to conform to your constraint
    def reverse_transform(self, table_data):
        # every value in Col A that is less than 0 gets fixed to be 0; leave rest of the data alone
        table_data.loc[table_data['col_A'] < 0, 'col_A'] = 0
        return table_data
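The same fix can be applied to many columns at once with plain pandas, outside of any constraint class, which may help when a per-column constraint is too slow. This is a minimal sketch: the function `clip_non_negative` and the column names are made up for illustration and are not part of SDV.

```python
import pandas as pd


def clip_non_negative(table_data: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of table_data with negative values in `columns` set to 0."""
    fixed = table_data.copy()
    # clip(lower=0) replaces every value below 0 with 0 and leaves the rest alone
    fixed[columns] = fixed[columns].clip(lower=0)
    return fixed


# Stand-in for the sampled synthetic data
synthetic = pd.DataFrame({
    "col_A": [-2.3, 0.0, 15.0],
    "col_B": [4.0, -0.1, 0.0],
    "col_C": [-1.0, 2.0, 3.0],  # allowed to be negative, so it is not clipped
})

fixed = clip_non_negative(synthetic, ["col_A", "col_B"])
print(fixed)
```

Because the clipping is vectorized over all listed columns, it runs once on the sampled output rather than once per column.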
I'm not exactly sure what Carles' suggestion was, but I assume it's a similar strategy where by either transforming the input data and/or fixing the output data, you will ensure constraints are met.
The issue with any of these approaches is that you're affecting any correlations by using non-linear transformations.
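To see the effect on correlations concretely, here is a small illustration with made-up numbers: two perfectly correlated columns stop being perfectly correlated once one of them is clipped at 0.

```python
import pandas as pd

# Made-up synthetic output in which col_A is exactly 2 * col_B
synthetic = pd.DataFrame({
    "col_A": [-4.0, -2.0, 0.0, 2.0, 4.0],
    "col_B": [-2.0, -1.0, 0.0, 1.0, 2.0],
})

# Pearson correlation before the fix: the relationship is exactly linear
before = synthetic["col_A"].corr(synthetic["col_B"])

# Clip col_A at 0, as the reverse_transform fix does
clipped = synthetic.assign(col_A=synthetic["col_A"].clip(lower=0))
after = clipped["col_A"].corr(clipped["col_B"])

print(f"before clipping: {before:.3f}")  # 1.000
print(f"after clipping:  {after:.3f}")   # roughly 0.884
```

How large the shift is in practice depends on how many generated values actually fall below 0.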
Closed in #492 -- by default, the models will respect the min and max values observed in the real data, although you have the ability to reset this as needed.