Need Help on benchmark function for real data #23
Comments
Hi @ArupDukeBanerjee, at the moment SDGym is not intended to be used with your own dataset, but only to evaluate and compare the performance of data synthesis methods over a set of well-known datasets. For the scenario that you mention, we are working on a separate package called SDMetrics that will be made public in the coming days.
Hi @csala, thanks,
@ArupDukeBanerjee Yes, SDGym synthesizers can be used for modeling and sampling your own data, but this is just a side effect of having all the synthesizers here implemented with a uniform API. I would rather recommend that you use the CTGAN package, which is simpler to use and will give you better results in the long term: it is an actively maintained package built with ease of use and sampling quality in mind, while SDGym's goal is only to provide a benchmark.
Hi Carles,
Thanks a lot for replying. I take your point on benchmarking, and CTGAN
is a great package, but when my data has missing values, it throws errors.
Handling missing values is part of realistic data generation: I intend to
generate realistic missing values in my synthetic data, which I believe
the CTGAN package does not support. It would be great if you could let me
know how missing data can be handled.
Thanks a lot!
Regards,
Arup
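One common workaround for the missing-value problem described above is to make the missingness itself part of the data: fill the gaps, append a 0/1 indicator column per original column, fit the synthesizer on the result, and turn the sampled indicators back into NaNs. The following is only a sketch under stated assumptions, not something CTGAN or SDGym is confirmed to provide: `encode_missing` and `decode_missing` are hypothetical helper names, the data is assumed to be a fully numeric float array, and every column is assumed to have at least one observed value.

```python
import numpy as np


def encode_missing(data):
    """Hypothetical helper: replace NaNs and append 0/1 missingness flags."""
    mask = np.isnan(data)
    # Fill each column's NaNs with the mean of that column's observed values.
    col_means = np.nanmean(data, axis=0)
    filled = np.where(mask, col_means, data)
    # Append one indicator column per original column (1.0 = was missing).
    return np.hstack([filled, mask.astype(float)])


def decode_missing(synthetic, n_original_columns):
    """Hypothetical helper: put NaNs back where the sampled flag is 'on'."""
    values = synthetic[:, :n_original_columns].copy()
    indicators = synthetic[:, n_original_columns:]
    values[indicators >= 0.5] = np.nan
    return values
```

With this layout, the appended indicator columns would be declared as categorical when fitting a synthesizer, so that the sampled data carries a missingness pattern that `decode_missing` can restore.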
On Tue, Jun 23, 2020 at 2:18 PM Carles Sala wrote:
@ArupDukeBanerjee Yes, SDGym synthesizers can be used for modeling and
sampling your own data, but this is just a side effect of having all the
synthesizers here implemented with a uniform API.
I would rather recommend that you use the CTGAN package, which is simpler
to use and will give you better results in the long term: it is an
actively maintained package built with ease of use and sampling quality in
mind, while SDGym's goal is only to provide a benchmark.
The benchmark function requires a my_synthesizer_function that takes the real data plus the categorical and ordinal features as input and outputs the synthesized data. However, the documentation provided is not sufficient for a novice like me, so I am having trouble implementing it. Moreover, the benchmark function appears to take its data from the predefined default datasets, each of which has its own metadata file stored on a server in JSON format. This prevents me from benchmarking on my own data, as I don't have metadata ready for my datasets; there are quite a few of them and they are large.
Any detailed documentation on how to use this benchmark function more effectively would be helpful.
Thanks a lot for such a beautiful package.
I am new to this domain.
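For readers in the same position, a synthesizer function of the shape described above might look like the sketch below. It assumes that the real data arrives as a fully numeric 2D `numpy.ndarray`, that the categorical and ordinal features are passed as lists of column indexes, and that the output must have the same layout as the input; these details are assumptions and should be checked against the SDGym documentation. The model itself is a deliberately naive baseline that samples every column independently, not a method taken from SDGym.

```python
import numpy as np


def my_synthesizer_function(real_data, categorical_columns, ordinal_columns):
    """Toy baseline: sample every column independently of the others.

    real_data: 2D numpy.ndarray (rows x columns), assumed fully numeric.
    categorical_columns / ordinal_columns: lists of column indexes (assumed).
    Returns a 2D numpy.ndarray with the same shape as real_data.
    """
    rng = np.random.default_rng(0)
    n_rows, n_cols = real_data.shape
    discrete = set(categorical_columns) | set(ordinal_columns)
    synthesized = np.empty((n_rows, n_cols), dtype=float)

    for col in range(n_cols):
        values = real_data[:, col].astype(float)
        if col in discrete:
            # Discrete column: resample observed values, which roughly
            # preserves the frequency of each category or level.
            synthesized[:, col] = rng.choice(values, size=n_rows, replace=True)
        else:
            # Continuous column: draw from a normal fitted to the column.
            synthesized[:, col] = rng.normal(values.mean(), values.std(), size=n_rows)

    return synthesized
```

Because the function only needs an array and two index lists, it can be tried directly on a small array of your own, for example `my_synthesizer_function(data, [0], [2])`, before it is handed to any benchmark.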