Need Help on benchmark function for real data #23

Closed
ArupDukeBanerjee opened this issue Mar 20, 2020 · 4 comments
Labels: question (General question about the software)

Comments

ArupDukeBanerjee commented Mar 20, 2020

The benchmark function requires a my_synthesizer_function that takes the real data and the categorical and ordinal features as input and returns the synthesized data. The documentation is not detailed enough for a novice like me, so I am having trouble implementing it. Moreover, the benchmark function appears to take its data from a set of predefined default datasets, each with its own metadata file stored on a server in JSON format. That prevents me from benchmarking on my own data, because I don't have metadata ready for my datasets; there are quite a few of them and they are large.

Any more detailed documentation on how to use the benchmark function would be helpful.
Thanks a lot for such a beautiful package.
I am new to this domain.
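
For reference, the interface described above can be sketched roughly as follows. This is only a minimal sketch, assuming the real data arrives as a 2D NumPy array and the categorical and ordinal features are given as lists of column indices; the function body is a hypothetical naive baseline written for illustration, not one of the synthesizers shipped with SDGym.

```python
import numpy as np


def my_synthesizer_function(real_data, categorical_columns, ordinal_columns):
    """Hypothetical baseline: resample every column independently.

    Assumptions (not confirmed by this thread): real_data is a 2D NumPy
    array, and the two column arguments are lists of column indices.
    This naive baseline treats all columns the same way, so it accepts
    the column lists only to match the expected signature.
    """
    rng = np.random.default_rng(0)
    n_rows, n_cols = real_data.shape
    synthetic = np.empty_like(real_data)
    for col in range(n_cols):
        # Drawing observed values with replacement keeps each column's
        # marginal distribution but discards correlations between columns.
        synthetic[:, col] = rng.choice(real_data[:, col], size=n_rows, replace=True)
    return synthetic


# Tiny made-up table: one continuous, one categorical, one ordinal column.
real = np.array([[0.5, 1.0, 0.0], [1.5, 0.0, 2.0], [2.5, 1.0, 1.0]])
synthetic = my_synthesizer_function(real, categorical_columns=[1], ordinal_columns=[2])
```

The output has the same shape as the input, and every synthetic value is one that was observed in the corresponding real column.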

csala (Contributor) commented Mar 20, 2020

Hi @ArupDukeBanerjee, at the moment SDGym is not intended to be used with your own datasets. Its purpose is only to evaluate and compare the performance of data synthesis methods over a set of well-known datasets.

For the scenario that you mention, we are working on a separate package called SDMetrics, which will be made public in the coming days.

csala self-assigned this Mar 20, 2020
csala added the question (General question about the software) label Mar 20, 2020

ArupDukeBanerjee (Author) commented

Hi @csala,
I just wanted to know one thing about this package: can it be used only for generating data from my own real data? Since, as you said, benchmarking on custom data is not available yet, can I use the different generators on my own datasets in the meantime? Thanks a lot in advance!

Thanks,
Arup

csala (Contributor) commented Jun 23, 2020

@ArupDukeBanerjee Yes, SDGym synthesizers can be used for modeling and sampling your own data, but this is only a side effect of all the synthesizers here being implemented with a uniform API.
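
To illustrate what a uniform API buys you, here is a minimal sketch. It assumes a fit/sample style interface; the class names, the helper, and the method signatures are all hypothetical and written only for illustration, so they should not be read as SDGym's actual classes.

```python
import numpy as np


class RowBootstrapSynthesizer:
    """Hypothetical synthesizer: resamples whole rows of the real data."""

    def fit(self, data):
        self._data = np.asarray(data, dtype=float)
        return self

    def sample(self, n, seed=0):
        rng = np.random.default_rng(seed)
        # Pick n row indices with replacement and return those rows.
        idx = rng.integers(0, len(self._data), size=n)
        return self._data[idx]


class GaussianSynthesizer:
    """Hypothetical synthesizer: independent normal distribution per column."""

    def fit(self, data):
        data = np.asarray(data, dtype=float)
        self._mean = data.mean(axis=0)
        self._std = data.std(axis=0)
        return self

    def sample(self, n, seed=0):
        rng = np.random.default_rng(seed)
        # The per-column mean and std broadcast across the n sampled rows.
        return rng.normal(self._mean, self._std, size=(n, len(self._mean)))


def model_and_sample(synthesizer, data, n):
    """Works with any object exposing fit(data) and sample(n)."""
    return synthesizer.fit(data).sample(n)


# Made-up data; either synthesizer can be swapped in without other changes.
data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
outputs = [model_and_sample(s, data, 5)
           for s in (RowBootstrapSynthesizer(), GaussianSynthesizer())]
```

Because both classes expose the same two methods, the calling code does not need to know which model it is driving, which is what makes the synthesizers usable on arbitrary data as well as in a benchmark.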

I would instead recommend using the CTGAN package, which is simpler to use and will give you better results in the long term: it is actively maintained with ease of use and sampling quality in mind, whereas SDGym's goal is only to provide a benchmark.

csala closed this as completed Jun 23, 2020

ArupDukeBanerjee (Author) commented Jun 23, 2020 via email
