Expose log_frequency parameter for conditional sampling #20

Merged
merged 3 commits into from Jan 16, 2020

kevinykuo
Contributor

This change adds a log_frequency parameter to the fit() method of CTGANSynthesizer so that users can choose whether the log frequency of categorical levels is used for conditional sampling. The parameter defaults to True, which is the current behavior, so existing code is not affected. A new unit test is included to verify the expected behavior with the flag on and off.
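To illustrate the difference the flag makes, here is a minimal sketch in plain Python. It is not the ctgan implementation: the helper name `category_sampling_probs` is hypothetical, and the `log(count + 1)` transform is an assumption about how a log frequency would be computed.

```python
import math


def category_sampling_probs(counts, log_frequency=True):
    """Return sampling probabilities for the levels of one categorical column.

    Hypothetical helper for illustration only; not part of the ctgan API.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    if log_frequency:
        # Assumed transform: log(count + 1), which damps dominant levels
        # so that rare levels are sampled more often.
        weights = [math.log(c + 1) for c in counts]
    else:
        # Use the raw counts, i.e. the true data frequency.
        weights = [float(c) for c in counts]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one level must have a nonzero count")
    # Normalize so the weights sum to 1.
    return [w / total for w in weights]


# A heavily imbalanced column: 990 rows of one level, 10 of the other.
counts = [990, 10]
true_probs = category_sampling_probs(counts, log_frequency=False)
log_probs = category_sampling_probs(counts, log_frequency=True)
```

With raw counts the rare level is sampled in proportion to its share of the data; with the log transform it receives a noticeably larger share. Per the description above, a caller would opt out of the log behavior by passing log_frequency=False to fit().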

Closes #16.

codecov-io commented Jan 10, 2020

Codecov Report

Merging #20 into master will increase coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #20      +/-   ##
==========================================
+ Coverage   80.59%   80.63%   +0.03%     
==========================================
  Files           9        9              
  Lines         500      501       +1     
==========================================
+ Hits          403      404       +1     
  Misses         97       97
Impacted Files          Coverage Δ
ctgan/conditional.py    94.93% <100%> (+0.06%) ⬆️
ctgan/synthesizer.py    92.8% <100%> (ø) ⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c31d3ef...06a7033.

Contributor

csala left a comment

LGTM! @leix28 ?

@kevinykuo
Contributor Author

@leix28 could you please take a look? Thanks!

Successfully merging this pull request may close these issues.

Consider adding option to sample from true data frequency instead of logged frequency
4 participants