Doubts on the usage of conditional sampling #322
Comments
Hi @tonydp03, nice to meet you. From looking at the raw CSV of the input data, it seems that there is a leading space before every value. So in this case, the value you're conditioning on should include that leading space.
BTW, if your project allows for it, I would recommend accessing the CTGAN model through the SDV library. The SDV is a publicly available Python SDK that allows you to generate synthetic data using a variety of synthesizers such as CTGAN. It also provides convenient wrappers for data pre- and post-processing, should you want to modify that. And you can use conditional sampling with it too. Some resources:
Hi @npatki, thanks for your answer. I simply assumed the test dataset could be used "out-of-the-box" and didn't notice the leading space at the beginning of the column value. I will give it another try, for sure. Thanks for the resources too. For the moment, we were just testing the usage of CTGAN to generate synthetic data, as we were positively impressed by the results shown in the paper. In parallel, we're also testing the usage of the SDV library, as it seems an interesting tool.
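For anyone hitting the same problem, the leading space can be checked before conditioning. Here is a minimal sketch using only Python's standard `csv` module. The three-row `RAW` excerpt and the `column_values` helper are made up for illustration; they are not part of CTGAN or the actual adult.csv file.

```python
import csv
import io

# Hypothetical excerpt imitating a CSV that puts a space after every comma.
RAW = "age, sex, income\n39, Male, <=50K\n28, Female, <=50K\n37, Female, >50K\n"


def column_values(text, column, skipinitialspace=False):
    """Return the sorted distinct values of one column, exactly as parsed."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=skipinitialspace)
    header = next(reader)
    # Strip header names so the column can be looked up either way.
    idx = [name.strip() for name in header].index(column)
    return sorted({row[idx] for row in reader})


as_stored = column_values(RAW, "sex")                       # values keep the space
cleaned = column_values(RAW, "sex", skipinitialspace=True)  # space dropped on parse

print(as_stored)  # [' Female', ' Male']
print(cleaned)    # ['Female', 'Male']
```

With the values as stored, "Female" is not one of the categories, while " Female" is. Either condition on the value with the space, or clean the data before training so that the plain value matches.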
One more thing: is it correct that, in the main.py, the function
Hi @tonydp03, my apologies for getting this reply to you so late. The current recommended approach is to use CTGAN via the SDV library as described above. I can answer your usage questions and help you troubleshoot any issues with your project. Unfortunately, I'm unable to go through any detailed lines of code with you. Please also note that some code in the repo may be deprecated or unsupported, so I would always recommend the docs for the latest supported usage. Thanks, and please feel free to file a new issue with additional questions or feature requests.
Environment details
If you are already running CTGAN, please indicate the following details about the environment in
which you are running it:
Problem description
I'm trying to generate samples from the example dataset adult.csv, conditioned on the column "sex" with the value "Female". However, it doesn't seem to work.
What I already tried
I tried to put the value between ".." or '...', and tried other categories/values, but the result doesn't change. The one-hot vector that is generated contains only zeros.
The command is the following:
(note that the number of epochs is very low just for testing the command and reproducing the error). The traceback is the following:
Any hint? Am I using it wrong?
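An all-zero one-hot vector like the one described above is what an exact string comparison produces when the requested value matches none of the stored categories. The sketch below is a simplified illustration of that effect only. The `one_hot` helper is hypothetical and is not CTGAN's actual implementation, and the category list assumes values stored with a leading space.

```python
def one_hot(value, categories):
    """Return a one-hot list for `value`; all zeros if it matches no category."""
    return [1 if value == category else 0 for category in categories]


# Assumed categories, stored with a leading space as in the raw CSV.
categories = [" Female", " Male"]

no_match = one_hot("Female", categories)   # no exact match, so no position is set
match = one_hot(" Female", categories)     # exact match on the first category

print(no_match)  # [0, 0]
print(match)     # [1, 0]
```

Quoting the value differently does not help here, because the comparison fails on the missing space rather than on the quotes.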