Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Link to R package #11

Closed
kevinykuo opened this issue Dec 9, 2019 · 7 comments
Closed

Link to R package #11

kevinykuo opened this issue Dec 9, 2019 · 7 comments

Comments

@kevinykuo
Copy link
Contributor

I put together an R package to help broaden the reach of y'all's excellent work at https://github.com/kasaai/ctgan, which also powers a short (insurance) industry-specific application paper. Wanted to see if you'd like to link to the package in the README for the useRs that stumble upon the repo :)

@leix28 @csala

@csala
Copy link
Contributor

csala commented Dec 19, 2019

Great contribution @kevinykuo !

I was just reviewing your project before adding the link to the README and noticed that you are using an old version of the code.

Yesterday we released a new version, with a cleaner Python API and broader support for discrete variables, so I think it would be better if you replaced the current installation and the code copy with a simple ctgan==0.2.0 install from PyPI (use plain pip instead of conda). If you do this you should be able to also remove all the other dependencies (numpy, scipy...), as they will be pulled alongside ctgan by pip.

Also notice that we moved epochs from __init__ to fit, as you suggested, but we kept batch_size in the __init__, as this is something that gets stored in the instance and used later on when sampling. So maybe it would be a good idea if you mirrored the same API structure, so both implementations are consistent with each other?

Also, keep an eye on the releases, as other things will be added soon as well (like the save/load functionality)

@kevinykuo
Copy link
Contributor Author

Yep I have been watching the repo, and will refactor accordingly. There were some hacks I had to make to get it to work with R. Ideally we can just grab ctgan from pip to lessen maintenance burden :)

I'll keep an eye out on new features, and ensure consistency as much as possible to reduce friction for teams working with different APIs. Note that the R package currently supports save/load already, and we need to have a custom implementation there since we also serialize R-specific metadata. This is partly because we also made the design decision to infer discrete columns from column types, which is more Rtistic (by convention, categorical variables are factors or characters.)

@kevinykuo
Copy link
Contributor Author

@csala I was able to cleanly remove the inline hacks and depend directly on the new ctgan release! (see PR kasaai/ctgan#7) :)

The only issue left is a cosmetic one, which I've filed here #14

@kevinykuo
Copy link
Contributor Author

Actually, a slightly less cosmetic issue here: #16 :P

@kevinykuo
Copy link
Contributor Author

@csala do you mind cutting a patch/minor release incorporating the last couple PRs so we can streamline the dependencies for the R package and get it on CRAN? Planning to tell people about SDV/CTGAN at the RStudio developer conference next week :)

@csala
Copy link
Contributor

csala commented Jan 24, 2020

Sure, we'll go for it today @kevinykuo

@csala I was able to cleanly remove the inline hacks and depend directly on the new ctgan release! (see PR kasaai/ctgan#7) :)

The only issue left is a cosmetic one, which I've filed here #14

Great news btw! Thanks for taking care of it!

@csala
Copy link
Contributor

csala commented Jan 27, 2020

Release v0.2.1 is out @kevinykuo !

The link to R package has also been added to the readme, so closing this.

@csala csala closed this as completed Jan 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants