"One or several metadata. were found, but not in the same directory or in a parent directory" #5193

Closed
lambda-science opened this issue Nov 2, 2022 · 5 comments

Comments

@lambda-science

Describe the bug

When I load my own dataset, I get an error.
Here is my dataset link: https://huggingface.co/datasets/corentinm7/MyoQuant-SDH-Data
And here is the error I get after loading it with:

from datasets import load_dataset

load_dataset("corentinm7/MyoQuant-SDH-Data")
Downloading readme: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.34k/3.34k [00:00<00:00, 4.45MB/s]
Using custom data configuration SDH_16k-53e7301a92ab0025
Downloading and preparing dataset None/SDH_16k to /home/corentin/.cache/huggingface/datasets/corentinm7___imagefolder/SDH_16k-53e7301a92ab0025/0.0.0/37fbb85cc714a338bea574ac6c7d0b5be5aff46c1862c1989b20e0771199e93f...
Downloading data: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.28M/3.28M [00:00<00:00, 4.31MB/s]
Downloading data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.75s/it]
Downloading data: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.13G/1.13G [00:15<00:00, 74.3MB/s]
Downloading data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:16<00:00, 16.09s/it]
Extracting data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:13<00:00, 13.16s/it]
Traceback (most recent call last):                                                                                                                                                                                                                                
  File "<stdin>", line 1, in <module>
  File "/home/corentin/code-project/hugging_face_play/.venv/lib/python3.10/site-packages/datasets/load.py", line 1742, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/corentin/code-project/hugging_face_play/.venv/lib/python3.10/site-packages/datasets/builder.py", line 814, in download_and_prepare
    self._download_and_prepare(
  File "/home/corentin/code-project/hugging_face_play/.venv/lib/python3.10/site-packages/datasets/builder.py", line 1423, in _download_and_prepare
    super()._download_and_prepare(
  File "/home/corentin/code-project/hugging_face_play/.venv/lib/python3.10/site-packages/datasets/builder.py", line 905, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/corentin/code-project/hugging_face_play/.venv/lib/python3.10/site-packages/datasets/builder.py", line 1374, in _prepare_split
    for key, record in logging.tqdm(
  File "/home/corentin/code-project/hugging_face_play/.venv/lib/python3.10/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/corentin/code-project/hugging_face_play/.venv/lib/python3.10/site-packages/datasets/packaged_modules/folder_based_builder/folder_based_builder.py", line 394, in _generate_examples
    raise ValueError(
ValueError: One or several metadata. were found, but not in the same directory or in a parent directory of /home/corentin/.cache/huggingface/datasets/downloads/extracted/60c4aa8d4da3065bb3d310de4373dffd73bd4dc331aedcb4ee867febe4fdb7cd/validation/sick/2_CG_SDH_TAM_Bin1cKO_ko_pla_4_1640.tif.

However, the test command works fine:

datasets-cli test hugging_face_play/ds_test/SDH_16k.py --save_info --all_configs --force_redownload

Using custom data configuration SDH_16k
Testing builder 'SDH_16k' (1/1)
Downloading and preparing dataset sdh_16k/SDH_16k to /home/corentin/.cache/huggingface/datasets/sdh_16k/SDH_16k/1.0.0/21b584239a638aeeda33cba1ac2ca4869d48e4b4f20fb22274d5a5ddc487659d...
Downloading data: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.13G/1.13G [00:14<00:00, 76.5MB/s]
Downloading data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:15<00:00, 15.66s/it]
Downloading data: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.28M/3.28M [00:02<00:00, 1.44MB/s]
Downloading data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.21s/it]
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 11586.48it/s]
Extracting data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:13<00:00, 13.42s/it]
Dataset sdh_16k downloaded and prepared to /home/corentin/.cache/huggingface/datasets/sdh_16k/SDH_16k/1.0.0/21b584239a638aeeda33cba1ac2ca4869d48e4b4f20fb22274d5a5ddc487659d. Subsequent calls will reuse this data.                                              
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 605.27it/s]
Dataset card saved at hugging_face_play/ds_test/README.md
Test successful.

Steps to reproduce the bug

Simply run in Python:

from datasets import load_dataset

load_dataset("corentinm7/MyoQuant-SDH-Data")

Expected behavior

Since the test command worked, this error should not appear.

Environment info

  • datasets version: 2.6.1
  • Platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.31
  • Python version: 3.10.6
  • PyArrow version: 10.0.0
  • Pandas version: 1.5.1
@lambda-science
Author

Also, unrelated but still: https://huggingface.co/docs/datasets/image_dataset#generate-the-dataset
The docs say: "If your loading script passed the test, you should now have a dataset_infos.json file in your dataset folder."
That is no longer the case, since this information is now written to the README.md. This was confusing to me.

@lambda-science
Author

And here is my data loader script: https://huggingface.co/datasets/corentinm7/MyoQuant-SDH-Data/blob/main/SDH_16k.py
The script downloads two files: one archive that contains the images for all splits, and one metadata.jsonl that records which image goes into which split.
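
As a rough illustration of that layout (this is not the actual SDH_16k.py script), a metadata.jsonl file could be grouped by split as sketched below. The sample rows and the "file_name", "split" and "label" keys are assumptions made for the example; the real file may use different keys.

```python
import json
from collections import defaultdict

# Hypothetical metadata.jsonl content; the real file's keys may differ.
SAMPLE_METADATA = """\
{"file_name": "img_001.tif", "split": "train", "label": "control"}
{"file_name": "img_002.tif", "split": "validation", "label": "sick"}
{"file_name": "img_003.tif", "split": "train", "label": "sick"}
"""


def group_by_split(jsonl_text: str) -> dict:
    """Map each split name to the list of image file names assigned to it."""
    groups = defaultdict(list)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)  # one JSON object per line
        groups[row["split"]].append(row["file_name"])
    return dict(groups)


splits = group_by_split(SAMPLE_METADATA)
print(splits)
```

A loading script could then use such a mapping to decide which images from the single archive belong to each split.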

@polinaeterna
Contributor

Hi @lambda-science! It seems that your repo is recognized as the packaged module ImageFolder rather than as a dataset with a custom loading script, because the loader looks for a script that has the same name as the dataset repo. Please try renaming your script to MyoQuant-SDH-Data.py; that should help.
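
The naming rule described above can be sketched as a small check. This is only an illustration of the rule as stated in this comment, not the library's actual resolution code; the helper names and the file listings are hypothetical.

```python
from typing import Iterable


def expected_script_name(repo_id: str) -> str:
    """Return the script filename the loader is described as looking for."""
    # "corentinm7/MyoQuant-SDH-Data" -> "MyoQuant-SDH-Data"
    repo_name = repo_id.split("/")[-1]
    return f"{repo_name}.py"


def has_matching_script(repo_id: str, filenames: Iterable[str]) -> bool:
    """True if the repo contains a script named after the repo itself."""
    return expected_script_name(repo_id) in set(filenames)


repo_id = "corentinm7/MyoQuant-SDH-Data"
# Hypothetical file listings, for illustration only.
before_rename = ["README.md", "SDH_16k.py", "metadata.jsonl"]
after_rename = ["README.md", "MyoQuant-SDH-Data.py", "metadata.jsonl"]

print(expected_script_name(repo_id))
print(has_matching_script(repo_id, before_rename))
print(has_matching_script(repo_id, after_rename))
```

With the script named SDH_16k.py the check fails, which matches the observed fallback to ImageFolder; after the rename it passes.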

@lambda-science
Author

Hi!

Thank you for your answer. That was... embarrassingly easy. Sorry for opening this issue; everything is fixed now!

Have a nice day! :)

@polinaeterna
Contributor

@lambda-science that's not embarrassing at all! It's actually not clear from the documentation that the script should have the same name as the repo, so thank you for the issue. We'll add this information to the docs :)
