As a user, it would be useful to have access to demo data that I could use to try out the SDV in a sandbox-like environment.
We want functionality similar to the current demo module, allowing users to see the available demo datasets and download them.
Acceptance criteria
Add a file called demo.py to the datasets module
Add a function called download_demo
The function should have the following parameters:
  (required) modality: One of 'single_table', 'multi_table', or 'sequential'.
  (required) dataset_name: A string with the name of a dataset.
  output_folder_name: The name of the local folder where the metadata and data should be stored.
    (default) None: Do not save the data locally. Just load it as Python objects.
    '': Create a subfolder in the desired location to store the data. Note: only store metadata_v1.json, not metadata_v0.json.
Returns a tuple of (data, metadata):
  data:
    (single table and sequential) A pandas DataFrame object
    (multi table) A dictionary mapping table name (string) to a pandas DataFrame object
  metadata:
    (single table and sequential) A SingleTableMetadata object
    (multi table) A MultiTableMetadata object
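The return-shape and saving rules above can be sketched roughly as follows, assuming the dataset has already been fetched into a dict mapping file names to raw bytes and the CSVs parsed into tables. The helpers shape_demo_data and save_demo_files are hypothetical names for illustration, not SDV API; the real function would also convert the parsed metadata_v1.json into a SingleTableMetadata or MultiTableMetadata object.

```python
import os

VALID_MODALITIES = ('single_table', 'multi_table', 'sequential')


def shape_demo_data(modality, tables):
    """Return data in the shape the acceptance criteria describe.

    `tables` maps table name -> table object (e.g. a pandas DataFrame).
    Hypothetical helper, not part of the SDV API.
    """
    if modality not in VALID_MODALITIES:
        raise ValueError(f"Invalid modality '{modality}'. Use one of {VALID_MODALITIES}.")
    if modality == 'multi_table':
        # Multi table: dictionary of table name -> DataFrame.
        return dict(tables)
    # Single table and sequential: exactly one DataFrame is expected.
    if len(tables) != 1:
        raise ValueError(f"Expected exactly one table for '{modality}', got {len(tables)}.")
    return next(iter(tables.values()))


def save_demo_files(files, output_folder_name):
    """Optionally write raw files (name -> bytes) into a new local folder.

    None means "do not save". metadata_v0.json is always skipped.
    Returns the list of file names that were written.
    """
    if output_folder_name is None:
        return []
    os.makedirs(output_folder_name)  # raises FileExistsError if it already exists
    written = []
    for name, raw in files.items():
        base = os.path.basename(name)  # drop any folder prefix from archive paths
        if not base or base == 'metadata_v0.json':
            continue  # skip directory entries and the old metadata format
        with open(os.path.join(output_folder_name, base), 'wb') as handle:
            handle.write(raw)
        written.append(base)
    return written
```

Keeping the shaping and the saving separate means output_folder_name=None never touches the filesystem.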
Errors
  If dataset_name isn't provided: Error: Missing required parameter 'dataset_name'.
  If the dataset name exists in our bucket but under a different modality: Error: Dataset name '<name>' is a <modality> dataset. Use 'load_<modality>_demo' to load this dataset.
    For example: Error: Dataset name 'heart_rate' is a sequential dataset. Use 'load_sequential_demo' to load this dataset.
  If the dataset name doesn't exist in our bucket: Error: Invalid dataset name 'dataset_1'. Use 'list_available_demos' to get a list of demo datasets.
  If the output folder already exists: Error: Folder 'my_datasets/student_placements/' already exists. Please specify a different name or use 'load_from_csv' to load from an existing folder.
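One way the checks above could run before anything is downloaded is sketched below. It is a minimal sketch, assuming a hypothetical available mapping of modality to a set of dataset names (standing in for a listing of the demo bucket) and assuming ValueError is the exception type; the message strings are copied from the list above.

```python
import os


def validate_demo_request(modality, dataset_name, output_folder_name, available):
    """Raise the errors listed in the acceptance criteria.

    `available` is a hypothetical mapping of modality -> set of dataset
    names, standing in for a listing of the demo bucket's folders.
    """
    if not dataset_name:
        raise ValueError("Missing required parameter 'dataset_name'.")

    if dataset_name not in available.get(modality, set()):
        # The name may still exist under a different modality folder.
        for other, names in available.items():
            if other != modality and dataset_name in names:
                raise ValueError(
                    f"Dataset name '{dataset_name}' is a {other} dataset. "
                    f"Use 'load_{other}_demo' to load this dataset."
                )
        raise ValueError(
            f"Invalid dataset name '{dataset_name}'. "
            "Use 'list_available_demos' to get a list of demo datasets."
        )

    # Only check the folder when the caller asked to save locally.
    if output_folder_name is not None and os.path.exists(output_folder_name):
        raise ValueError(
            f"Folder '{output_folder_name}' already exists. Please specify a "
            "different name or use 'load_from_csv' to load from an existing folder."
        )
```

Validating first avoids a partial download when the request is going to fail anyway.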
We should stop using the bucket that currently serves the demo data and switch to the new demo data bucket instead. The new bucket has a dedicated folder for each modality, which should make downloading the correct data easier.
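With one folder per modality, listing the datasets for a modality reduces to filtering object keys by prefix. The sketch below assumes a hypothetical key layout of '<modality>/<dataset_name>.zip'; the actual layout of the new bucket is not specified here. With boto3, the keys could come from the 'Key' fields in the 'Contents' of list_objects_v2(Bucket=..., Prefix=f'{modality}/').

```python
def list_demo_datasets(keys, modality):
    """Return sorted dataset names stored under '<modality>/' in the bucket.

    Assumes a hypothetical key layout of '<modality>/<dataset_name>.zip'.
    """
    prefix = f'{modality}/'
    suffix = '.zip'
    names = set()
    for key in keys:
        if key.startswith(prefix) and key.endswith(suffix):
            name = key[len(prefix):-len(suffix)]
            if name and '/' not in name:  # ignore nested or empty entries
                names.add(name)
    return sorted(names)
```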
We also no longer want to save a CSV every time and then load the data from it. Instead, we should download the data directly from the bucket and build the DataFrame from the bytes that S3 returns.
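Reading the DataFrame straight from the returned bytes can be done entirely in memory with io.BytesIO. The sketch below assumes, as a hypothetical, that each dataset is stored as one zip archive containing its CSVs and metadata_v1.json; with boto3, raw would be boto3.client('s3').get_object(Bucket=bucket, Key=key)['Body'].read(), where bucket and key are placeholders.

```python
import io
import json
import zipfile

import pandas as pd


def read_demo_archive(raw):
    """Parse a zipped demo dataset held in memory, without writing to disk.

    Returns (tables, metadata): tables maps each CSV file's stem to a
    DataFrame, and metadata is the parsed metadata_v1.json dict
    (metadata_v0.json is ignored).
    """
    tables = {}
    metadata = None
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        for name in archive.namelist():
            base = name.rsplit('/', 1)[-1]  # drop any folder prefix inside the zip
            if base.endswith('.csv'):
                tables[base[:-len('.csv')]] = pd.read_csv(io.BytesIO(archive.read(name)))
            elif base == 'metadata_v1.json':
                metadata = json.loads(archive.read(name))
    return tables, metadata
```

Only the bytes already in memory are parsed, so nothing is written to disk unless output_folder_name asks for it.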
@npatki I believe the error messages are outdated now, and the error conditions probably need some changes.
In the case where the dataset name exists in our bucket but under a different modality, I think it would be odd to raise an error telling users to change the modality they pass. We should either raise a warning and return the dataset anyway, or just fail and say that it may belong to a different modality. Confirming that the dataset is in a different folder but then not returning it seems strange.
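The first option (warn and return the dataset anyway) could look roughly like this. It is a sketch only: resolve_modality, the available mapping of modality to dataset names, and the warning text are all hypothetical; the loader would then fetch from the folder of whatever modality is returned.

```python
import warnings


def resolve_modality(dataset_name, modality, available):
    """Return the modality folder the dataset actually lives under.

    Sketch of the "warn and return it anyway" option. `available` is a
    hypothetical mapping of modality -> set of dataset names.
    """
    if dataset_name in available.get(modality, set()):
        return modality
    for other, names in available.items():
        if dataset_name in names:
            # Found elsewhere: warn instead of failing, then load from there.
            warnings.warn(
                f"Dataset name '{dataset_name}' is a {other} dataset, not a "
                f"{modality} dataset. Loading it as {other}."
            )
            return other
    raise ValueError(f"Invalid dataset name '{dataset_name}'.")
```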
If the dataset name doesn't exist in our bucket: Error: Invalid dataset name 'dataset_1'. Use 'list_available_demos' to get a list of demo datasets.
In this case, list_available_demos should be get_available_demos.
If there is already a folder that exists Error: Folder 'my_datasets/student_placements/' already exists. Please specify a different name or use 'load_from_csv' to load from an existing folder.
In this case, load_from_csv should be load_csvs.