error downloading file from sub-directory in zenodo repository #336

Closed
djmannion opened this issue Jan 28, 2023 · 3 comments · Fixed by #340
Labels
bug Report a problem that needs to be fixed

Comments

@djmannion

Description of the problem:

Thanks very much for this very useful project!

I'm trying to download a file from the zenodo repository at doi:10.5281/zenodo.7352326 (link). The file name is GreatApeDictionary-v1.1.zip and it seems to be in a Wild-Minds sub-directory in the repository. I receive an error that the file is "not found in data archive" (see below for full error message).

Looking in the pooch/downloaders.py code (v1.6.0), it seems that the file information returned by the zenodo API has the filename as Wild-Minds/GreatApeDictionary-v1.1.zip - but the file name that is passed to the downloader is the last item of a split("/") operation and so loses the Wild-Minds part. If I replace the line file_name=parsed_url["path"].split("/")[-1], with file_name="/".join(parsed_url["path"].split("/")[1:]), then it seems to work fine.
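The difference between the two expressions can be checked in isolation. This is a minimal sketch, assuming the parsed path for this URL is /Wild-Minds/GreatApeDictionary-v1.1.zip (a hand-written value, not output captured from pooch):

```python
# Assumed value of parsed_url["path"] for the DOI URL in this report.
path = "/Wild-Minds/GreatApeDictionary-v1.1.zip"

# v1.6.0 expression: keep only the last component.
last_component = path.split("/")[-1]

# Replacement expression: keep everything after the leading slash.
full_name = "/".join(path.split("/")[1:])

print(last_component)  # GreatApeDictionary-v1.1.zip
print(full_name)       # Wild-Minds/GreatApeDictionary-v1.1.zip
```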

I noticed that the code has changed in the current git version, so I tried that also (v1.6.0.post16+gf20fe3a). I get a different error: ValueError: Archive with doi:10.5281/zenodo.7347607/Wild-Minds not found (see https://doi.org/10.5281/zenodo.7347607/Wild-Minds). Is the DOI correct? - so the sub-directory seems to be being included as part of the DOI.

Apologies if I have just missed something - thanks for any assistance.

Full code that generated the error

import pooch

raw_data_path = pooch.retrieve(
    url="doi:10.5281/zenodo.7347607/Wild-Minds/GreatApeDictionary-v1.1.zip",
    known_hash="md5:8c40d1bce25619548f4daa16d63f36a3",
)

Full error message

Downloading data from 'doi:10.5281/zenodo.7347607/Wild-Minds/GreatApeDictionary-v1.1.zip' to file '/home/damien/.cache/pooch/66afcc98f960ee876a64b8ab89dbec3e-GreatApeDictionary-v1.1.zip'.
Traceback (most recent call last):
  File "/home/damien/venv_study/ape_gestures/code/ape_gestures/ape_gestures/data.py", line 15, in <module>
    get_raw_data_path()
  File "/home/damien/venv_study/ape_gestures/code/ape_gestures/ape_gestures/data.py", line 6, in get_raw_data_path
    raw_data_path = pooch.retrieve(
  File "/home/damien/venv/ape_gestures/lib/python3.9/site-packages/pooch/core.py", line 240, in retrieve
    stream_download(url, full_path, known_hash, downloader, pooch=None)
  File "/home/damien/venv/ape_gestures/lib/python3.9/site-packages/pooch/core.py", line 772, in stream_download
    downloader(url, tmp, pooch)
  File "/home/damien/venv/ape_gestures/lib/python3.9/site-packages/pooch/downloaders.py", line 573, in __call__
    download_url = converters[repository](
  File "/home/damien/venv/ape_gestures/lib/python3.9/site-packages/pooch/downloaders.py", line 633, in zenodo_download_url
    raise ValueError(
ValueError: File 'GreatApeDictionary-v1.1.zip' not found in data archive https://zenodo.org/record/7352326 (doi:10.5281/zenodo.7347607).

System information

  • Operating system: Linux
  • Python installation (Anaconda, system, ETS): System
  • Version of Python: 3.9.16
  • Version of this package: 1.6.0
@djmannion djmannion added the bug Report a problem that needs to be fixed label Jan 28, 2023
@santisoler
Member

Hi @djmannion! Thanks for opening this issue.

This is definitely a bug, although one we hadn't anticipated. Zenodo doesn't allow you to upload directories with files inside, only files (in the strict sense of the word). This is why we usually upload zip files if we want our content to have some hierarchy. So the problem is that, for some reason, it allowed a file to be uploaded with a forward slash in its name: Wild-Minds/GreatApeDictionary-v1.1.zip doesn't stand for a zip file inside a directory here; it's the full name of the file.

The JSON that Zenodo offers through its API was helpful to identify this: https://zenodo.org/api/records/7352326 (check under files; the key is the name of that file).
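To illustrate why the lookup fails, here is a minimal sketch using a hand-written, trimmed stand-in for that JSON. Only the key value is taken from the real record; the list-of-dicts shape and the find_file helper are assumptions for illustration, not Pooch code.

```python
# Trimmed, hand-written stand-in for the API response (assumed shape).
record = {"files": [{"key": "Wild-Minds/GreatApeDictionary-v1.1.zip"}]}


def find_file(record, file_name):
    """Return the file entry whose key equals file_name, or None."""
    for entry in record["files"]:
        if entry["key"] == file_name:
            return entry
    return None


# The bare name is not a key in the record, so the lookup comes back empty.
print(find_file(record, "GreatApeDictionary-v1.1.zip"))
# The full key, slash included, matches.
print(find_file(record, "Wild-Minds/GreatApeDictionary-v1.1.zip"))
```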

This behaviour is tricky to support. In the current release of Pooch (v1.6.0) we only support downloading through DOI from Zenodo and figshare. In both cases, the URL follows a prefix/suffix/filename scheme. But in #318 support for Dataverse was included. Their DOIs can be prefix/another-prefix/suffix, like 10.11588/data/TKCFEF, so the URL becomes prefix/another-prefix/suffix/filename. In both cases, the URL was parsed by splitting it on the slashes: the last portion was the filename, the rest was part of the DOI.

In your case, the filename actually contains a slash, which breaks the previous workflow: now the URL can follow prefix/another-prefix/suffix/file/name.
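The ambiguity can be made concrete by listing every place the URL could be cut. This is only a sketch: candidate_splits, resolve, and the in-memory archives table are hypothetical names for illustration, and this is not necessarily how the PR handles it.

```python
def candidate_splits(url):
    """Yield every (doi, file_name) reading of a 'doi:' URL, shortest DOI first."""
    parts = url[len("doi:"):].split("/")
    # A DOI needs at least a prefix and a suffix, so the first cut is after two parts.
    for i in range(2, len(parts)):
        yield "/".join(parts[:i]), "/".join(parts[i:])


def resolve(url, archives):
    """Pick the first reading whose DOI is known and lists that file name."""
    for doi, file_name in candidate_splits(url):
        if file_name in archives.get(doi, ()):
            return doi, file_name
    raise ValueError(f"No archive matches any reading of {url!r}")


# Hypothetical stand-in for asking a repository which files an archive holds.
archives = {
    "10.5281/zenodo.7347607": {"Wild-Minds/GreatApeDictionary-v1.1.zip"},
}
url = "doi:10.5281/zenodo.7347607/Wild-Minds/GreatApeDictionary-v1.1.zip"

for doi, file_name in candidate_splits(url):
    print(doi, "|", file_name)
print(resolve(url, archives))
```

The second reading printed by the loop, with Wild-Minds attached to the DOI, is the one behind the "Archive with doi:10.5281/zenodo.7347607/Wild-Minds not found" error reported for the git version.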

I've opened a PR to add support to filenames with slashes. I'll ping you there!

@djmannion
Author

Hi @santisoler! Thanks very much for looking into it.

That's very interesting about the forward-slash in the filename - I hadn't considered that. Looking at a few other zenodo repositories, it seems to be encountered quite frequently (e.g., https://zenodo.org/record/7524976, https://zenodo.org/record/7517342, https://zenodo.org/record/7581649). A common element seems to be github integration - it seems like data that is formed from a github release has the github username then a forward-slash prepended to the release filename.

@santisoler
Member

I see. Well, in that case we should be sure to fully support it. And it makes sense that these files aren't uploaded through the webapp, but through API interaction.
