load_dataset fails for #253
Comments
Hi @samgalen, the SDGym documentation website contains a reference to all the features that we support. The get_available_datasets function is listed but load_dataset is not, meaning it's not currently a supported feature. Note that we are in the process of cleaning up our library, so older, unsupported features may still be present in the code. We ask that you please bear with us as we clean up our repo! BTW, I'm curious about your use case. We found that loading datasets ad hoc was not a frequently used feature, as most of our users come directly to benchmark synthesizers. If this would be helpful to you, we could track it as a feature request.
Hi @npatki - Thanks for the response. My use case is that I'm trying to replicate prior work which uses the If there's a way to see that easily in the current version of SDGym, that would be ideal.
No problem! SDGym uses the SDV library for a majority of the predefined synthesizers. It also reads from the same demo datasets. So one option is to pull directly from SDV instead of SDGym. It should be automatically installed if you already have SDGym.

```python
from sdv.datasets.demo import get_available_demos
from sdv.datasets.demo import download_demo

# get a table of all demos
# this should have the same datasets as what SDGym returns
all_demos = get_available_demos(modality='single_table')

# select a particular dataset name to download
data, metadata = download_demo(
    modality='single_table',
    dataset_name='fake_hotel_guests'
)
```

For more resources see:
Let me know if you have any more Qs!
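
For the replication use case mentioned above, the same `download_demo` call could be wrapped in a small loop to fetch several datasets at once. This is only a sketch: `load_demo_datasets` is a hypothetical helper, not part of SDV or SDGym, and the names in the usage comment (`adult`, `census`, `credit`) are assumed to appear in the table returned by `get_available_demos`.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def load_demo_datasets(
    names: Iterable[str],
    downloader: Optional[Callable[..., Tuple[object, object]]] = None,
    modality: str = "single_table",
) -> Dict[str, Tuple[object, object]]:
    """Hypothetical helper: return {name: (data, metadata)} for each name."""
    if downloader is None:
        # Imported lazily so the helper can be defined without SDV installed.
        from sdv.datasets.demo import download_demo
        downloader = download_demo

    loaded = {}
    for name in names:
        # Same keyword arguments as the single-dataset call shown above.
        loaded[name] = downloader(modality=modality, dataset_name=name)
    return loaded


# Example (requires SDV and network access; dataset names are assumptions):
# datasets = load_demo_datasets(["adult", "census", "credit"])
# data, metadata = datasets["adult"]
```

The `downloader` argument exists only so the loop can be exercised without network access; in normal use it can be left at its default.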
Hi @samgalen, I'm closing this issue since it has been inactive for some time and we've answered the original question. I've filed a separate feature request in #261 to allow downloading and inspecting datasets prior to running them in the benchmark. I've also copied over the workaround where you can access the datasets directly from the SDV library. Feel free to reply if there is more to discuss and we can always reopen the issue. Alternatively, we can continue the conversation in the new feature request.
Environment Details
Please indicate the following details about the environment in which you found the bug:
Error Description
My understanding is that calling the `load_dataset` method should download the demo datasets from your AWS bucket. However, when I run this command, it produces an error that appears to be related to the credentials or lack thereof.
Steps to reproduce
I am including the traceback for the census dataset; however, I've also tested it with adult and credit, as well as a few others.
Edit: I should also mention that `get_available_datasets` does work and produces output.