Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug: update Zenodo downloader for new API #373

Closed
wants to merge 4 commits into from
Closed

Conversation

santisoler
Copy link
Member

Update the Zenodo downloader to work with the new Zenodo API. Use the filename key for getting the filenames of all files in a repository. Update the generated download url based on heuristics: the link present in the API reference doesn't work at the moment). Update the method that populates the registry: use the new filename key and specify that the checksums are now md5.

Relevant issues/PRs:

Fixes #371

Update the Zenodo downloader to work with the new Zenodo API. Use the
`filename` key for getting the filenames of all files in a repository.
Update the generated download url based on heuristics: the link present
in the API reference doesn't work at the moment). Update the method that
populates the registry: use the new `filename` key and specify that the
checksums are now md5.
@santisoler
Copy link
Member Author

@khaeru, I think this is ready. All tests are passing now, although I had to rerun a few because Zenodo had some downtime or returned bad requests while running the tests.

Would you mind taking a look at the PR?

@ericpre
Copy link

ericpre commented Oct 17, 2023

Just tested, it does fix #371! :)

@khaeru
Copy link

khaeru commented Oct 18, 2023

Hi @santisoler —thanks for the confirmation and quick action here. I will check the code shortly.

However, I think it is probably worth getting in touch with Zenodo support and explicitly asking them whether the API change is intentional and permanent (case 1 vs. case 2 in the issue I filed). I'm worried about something like this:

  • The API change was actually unintentional; Zenodo staff/devs are not currently aware of it.
  • You release a new version of pooch with this change.
  • Zenodo eventually become aware and they choose to "fix" the API by restoring the old (pre-migration) JSON structures. (They might do this if, for instance, other code besides pooch has been broken by the change, leading to complaints from multiple angles and them feeling this to be the best fix.)
  • Pooch will again be broken, because it now expects the "new" (actually "broken") JSON structures.

Then you'd have to revert the fix or something like that, and again make a new release. Better to confirm the situation first, right?

@dokempf
Copy link
Contributor

dokempf commented Oct 18, 2023

@khaeru This is part of a long term transition plan to the new InvenioRDM system: https://blog.zenodo.org/2022/12/07/2022-12-07-zenodo-on-inveniordm/ that was also announced in a blog post: https://blog.zenodo.org/2023/09/25/2023-09-25-migration-postponed/

@ericpre
Copy link

ericpre commented Oct 18, 2023

Thank you @dokempf for sharing the links. It seems that the API used in this PR is documented at: https://inveniordm.docs.cern.ch/reference/rest_api_drafts_records/#download-a-record-file

@khaeru, would that be enough?

Considering that this API changes is breaking any code using pooch to retrieve zenodo files, I imagine that it will be break many codes and test suites and that it would be appreciated if a fix could be released quickly! Thank you! 😃

@khaeru
Copy link

khaeru commented Oct 18, 2023

Hi @dokempf —per the first sentence at #371, I'm well aware of the migration, although you see that I only link to the one blog post and two help pages that were linked by Zenodo themselves from their header banner. Those omit points and details that are included in the two older posts you link. It doesn't seem like Zenodo have provided all the info relevant to the migration in any single place.

In particular at https://blog.zenodo.org/2022/12/07/2022-12-07-zenodo-on-inveniordm/ the "FAQs" section only increases confusion:

Will your APIs be backward compatibile? [sp]
Yes, we do our outmost to ensure backward compatibility of our APIs, and your integration will continue to work on the new platform as well.

Will I be required to update my API integration?
After Zenodo on InvenioRDM has been launched (Autumn 2023), we will deprecate some of our existing APIs. We will provide a migration period of 1-year for the transition. New features, will only be available on the new API.

Will feature X still work?
Yes, we will provide backward compatiblity [sp] for all existing features.

It seems these do not actually describe what has been done.

Where do I find documentation of your new REST API?
You can find the documentation for our new REST API in the InvenioRDM documentation.

As @ericpre notes, this appears to describe the JSON that the API is now serving. However, scroll to the footer of any page at https://zenodo.org, and click the "REST API" link. This appears to be something different; I guess they have just forgotten to update this?

Again, I would think it is an easy, no-regret step to simply contact Zenodo and ask them what the current status and plans are. But I'm not a maintainer of pooch, so if you all would like to go ahead based on the current guesses, please do.

@santisoler
Copy link
Member Author

santisoler commented Oct 18, 2023

I've just sent a message to Zenodo explaining the issue of the new API breaking backward compatibility and asking if these changes are planned to be permanent or this is just a bug that was created during the migration process. I'm looking forward to their reply and we'll come back to you all with updates as soon as I receive it.

For now, I'm holding this PR to be merged until further notice.

Thanks @ericpre , @dokempf and @khaeru for joining the conversation!

@ericpre
Copy link

ericpre commented Oct 19, 2023

Thank you @santisoler, is there an issue that we can track somewhere?
It would be useful to have a idea of timeline, because this is quite disruptive and we have no way to mitigate the issue by pinning dependencies, etc.

@ericpre ericpre mentioned this pull request Oct 19, 2023
12 tasks
@santisoler
Copy link
Member Author

Thank you @santisoler, is there an issue that we can track somewhere? It would be useful to have a idea of timeline, because this is quite disruptive and we have no way to mitigate the issue by pinning dependencies, etc.

I sent them the message through their Contact form and left my email so they can reply me. I also invited them join the conversation here or in #371.

I noticed some people started opening issues on Zenodo's Github about similar issues with the new API (zenodo/zenodo#2487, zenodo/zenodo#2486). I felt inclined to use their contact form since I don't see they are too responsive on Issues.

@ericpre
Copy link

ericpre commented Oct 19, 2023

Considering the scope of the contact form, it doesn't give me great confidence because it would see a lot of traffic and the chance that it get lost/slow to reply is not negligible - it does have a bug category, though!

I would still suggest to merge and make a release because:

  1. the fix is trivial and ready to go,
  2. worse case scenario, zenodo comes back and says something is wrong, then it can be changed again, not a big deal, as this can be a trivial fix too
  3. Even if they revert it, the new key was already existing before (if I recall correctly), so it should still work
  4. fix broken test suite and people's code

I am to make the case that merging this PR is highly beneficial with minimal because it fixes test suite and people's code -remember that there are no mitigating solution other than changing to other the pooch httpdownloader or stop using pooch, which both would require significantly more work.

@santisoler
Copy link
Member Author

santisoler commented Oct 20, 2023

Hi @ericpre. I get you point and it's a priority for us to fix the Zenodo downloader as soon as possible without compromising the user experience or making decisions that will bite us in the future.

We noticed that Zenodo published another blogpost today describing some other issues they experienced during the migration: https://blog.zenodo.org/2023/10/19/2023-10-19-upgrade-issues/
Apparently they were having issues with upload/download speeds, but they haven't commented on issues about the API. So I think your point is well taken: maybe restoring backward compatibility it's not a top priority at the moment for them so we probably need to fix our downloader instead of waiting for them to make a decision.

Considering this, we just had a discussion in our weekly meeting (you can see the meeting notes) and agreed to wait until Monday 23rd for a reply from Zenodo. If we don't receive a reply by the morning of Tuesday 24th (my morning in PDT) I'll merge a bugfix and make a new release that includes it so we can continue using Pooch for downloading files from Zenodo.

Regarding the bugfix, we decided to take another approach. We want the downloader to be able to interact with both APIs (the "legacy" and the "new" one) and automatically decide which one it should use. We basically need to check if the "key" key is present in each file and decide from that weather to use the old or the new API. This way, we won't be breaking backward compatibility within Pooch, and still be able to interact with the new Zenodo API.

I'll do my best to hit the deadline. I'd love to hear your inputs on this (cc @dokempf @khaeru).

@leouieda
Copy link
Member

Hi all, just to say that I agree with @santisoler's approach here. I've use their contact form in the past and they were very good at answering so it's very possible that they'll respond (unless they got swamped with bug reports like this).

@santisoler
Copy link
Member Author

santisoler commented Oct 20, 2023

As @leouieda pointed out, Zenodo replies quickly! I received the following reply from Jose:

Dear Santiago,

Thank you for your report and sorry for the troubles this is causing you.
We will try to address this issues ASAP. Please, be patience with us as we have a long queue of support this week. We will come back to you as soon as we have news.

Best regards,
Jose

Considering this reply, I'm sticking with the previous plan of waiting for a reply with further details on the API until next Monday, or release a Pooch version with a bugfix by next Tuesday.

@santisoler
Copy link
Member Author

I've received another reply from Zenodo:

Dear Santiago,

Sorry for any inconvenience you may have experienced.

We have deployed some fixes regarding compatibility with the old REST API format.

May I ask you to try again using the API and see if the examples you mentioned are working properly? Do not hesitate to come back to us if you are still experiencing problems with your publishing workflow.

Please let me know if there is anything else I can help you with.

Best regards,

Zenodo Support

After checking out how the new API is working I've noticed that:

  • The key for the filename is still "filename", while the old one was "key".
  • The "checksum" only contains the checksum, without the "md5:" heading piece.
  • The download link under the "self" key is not downloading the file (as it was in the previous API), but now the link in the "download" key works fine for downloading the file. Although I've noticed that in some repositories the link in "download" is not working.

Considering this, I'm planning to merge #375 and make a new release, so we can add support for the new API but still supporting the old one in case Zenodo make changes to fulfill backward compatibility.

@santisoler
Copy link
Member Author

santisoler commented Oct 24, 2023

I'm closing this in favor of #375

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KeyError: 'key' in ZenodoRepository.download_url() after Zenodo migration
5 participants