Unreliable DOI resolution for some DOIs #947

Closed
Tracked by #1106 ...
choldgraf opened this issue Mar 2, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@choldgraf
Member

Description

It seems like DOI resolution is occasionally flaky and returns different results depending on the run. I noticed that locally my build would sometimes hang on building a page with many DOIs, sometimes not. In the built page, a subset of DOIs would have their metadata added as a reference, and others wouldn't be.

I'm not sure how to reproduce this reliably, but as an example, here are two builds from a recent experiment I was running using the MyST CLI (from this czi report repository).

Here's one run that builds the repository with the action logs linked below:

https://github.com/2i2c-org/report-czi-2021/actions/runs/8118256669/job/22192166630#step:6:19

And a relevant excerpt:

year2.md:20 No bibtex available from doi.org for doi:10.5281/zenodo.7025288
   To resolve this error, visit https://doi.org/10.5281/zenodo.7025288 and add citation info to .bib file
📖 Built year2.md in 1.05 min.
⚠️  year2.md:74 No bibtex available from doi.org for doi:10.5281/zenodo.7287626
   To resolve this error, visit https://doi.org/10.5281/zenodo.7287626 and add citation info to .bib file
🪄  Linked 5 DOIs in 1.07 min for year3.md
⚠️  year3.md:76 No bibtex available from doi.org for doi:10.5281/zenodo.7667299
   To resolve this error, visit https://doi.org/10.5281/zenodo.7667299 and add citation info to .bib file
⚠️  year3.md:143 No bibtex available from doi.org for doi:10.5281/zenodo.8228653
   To resolve this error, visit https://doi.org/10.5281/zenodo.8228653 and add citation info to .bib file
📖 Built year3.md in 1.07 min.
⚠️  year3.md:80 No bibtex available from doi.org for doi:10.5281/zenodo.10081003
   To resolve this error, visit https://doi.org/10.5281/zenodo.10081003 and add citation info to .bib file

And from another run with more-or-less the same content, we get a different subset of DOI resolution errors:

https://github.com/2i2c-org/report-czi-2021/actions/runs/8118323947/job/22192365850

⚠️  year2.md:61 No bibtex available from doi.org for doi:10.5281/zenodo.7025288
   To resolve this error, visit https://doi.org/10.5281/zenodo.7025288 and add citation info to .bib file
⚠️  year2.md:74 No bibtex available from doi.org for doi:10.5281/zenodo.7287626
📖 Built year2.md in 1.05 min.
   To resolve this error, visit https://doi.org/10.5281/zenodo.7287626 and add citation info to .bib file
🪄  Linked 3 DOIs in 1.07 min for year3.md
⚠️  year3.md:71 No bibtex available from doi.org for doi:10.5281/zenodo.8184298
   To resolve this error, visit https://doi.org/10.5281/zenodo.8184298 and add citation info to .bib file
⚠️  year3.md:73 No bibtex available from doi.org for doi:10.5281/zenodo.7662828
   To resolve this error, visit https://doi.org/10.5281/zenodo.7662828 and add citation info to .bib file
⚠️  year3.md:142 No bibtex available from doi.org for doi:10.5281/zenodo.10075621
   To resolve this error, visit https://doi.org/10.5281/zenodo.10075621 and add citation info to .bib file
⚠️  year3.md:143 No bibtex available from doi.org for doi:10.5281/zenodo.8228653
   To resolve this error, visit https://doi.org/10.5281/zenodo.8228653 and add citation info to .bib file
⚠️  year3.md:74 No bibtex available from doi.org for doi:10.5281/zenodo.7892224
   To resolve this error, visit https://doi.org/10.5281/zenodo.7892224 and add citation info to .bib file

Proposed solution

Perhaps this could be fixed with either:

  • A more helpful error message to help understand what's going on
  • Some kind of DOI retrying mechanism in case it's just a flaky API thing
  • Documentation about this behavior to understand when it is likely to happen
  • The ability to turn off DOI resolution if this is flaky enough to not be reliable, but people still want to use DOIs (ref Add option to *not* build citations from links to dois #196)
  • Some other way to more reliably scrape this data

choldgraf added the bug label on Mar 2, 2024
@rowanc1
Member

rowanc1 commented Mar 11, 2024

If it is taking over a minute to resolve requests to doi.org I suspect that their service was down or degraded. Some of those links are now working and respond fast, for example:

curl -L -H 'Accept: application/x-bibtex' https://doi.org/10.5281/zenodo.8184298
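
For anyone who wants to script this check rather than run curl by hand, here is a minimal Python sketch of the same content-negotiated request. The function names `build_doi_request` and `fetch_bibtex` and the 10-second timeout are illustrative assumptions, not part of the MyST CLI.

```python
import urllib.parse
import urllib.request

BIBTEX = "application/x-bibtex"  # same media type as the curl command above


def build_doi_request(doi: str, accept: str = BIBTEX) -> urllib.request.Request:
    """Build a content-negotiated request for a DOI (no network access here)."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch_bibtex(doi: str, timeout: float = 10.0) -> str:
    """Send the request. urllib follows redirects by default, like `curl -L`."""
    with urllib.request.urlopen(build_doi_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_bibtex("10.5281/zenodo.8184298")` needs network access and raises an `OSError` subclass (for example `urllib.error.URLError`) if doi.org is unreachable or slow.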

We are caching the requests to DOI (as of a few weeks ago), so these are faster for local builds, but that is not persistent across CI builds. We don't currently retry in the build, but could add that logic. doi.org is the canonical way to access this information, so I don't think we should change that.

Have you come across this consistently, or was it a point-in-time issue caused by the DOI service being down?

Probably the best way forward is to add retry with backoff and raise a more sensible error if links take more than a few seconds to resolve (this should be on the order of milliseconds...).
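
A rough sketch of what retry with backoff plus a slow-response warning could look like, in Python for illustration. Everything here is an assumption rather than the actual MyST code: the names `backoff_delays` and `fetch_with_backoff`, the 0.5 s base delay, and the 5 s "slow" threshold. The `fetch` callable is supplied by the caller and is expected to raise an `OSError` subclass (timeouts and connection errors) on failure.

```python
import time
from typing import Callable, List


def backoff_delays(retries: int, base: float = 0.5, factor: float = 2.0) -> List[float]:
    """Exponential backoff schedule: base, base*factor, base*factor**2, ..."""
    return [base * factor ** i for i in range(retries)]


def fetch_with_backoff(
    fetch: Callable[[str], str],
    doi: str,
    retries: int = 2,
    slow_after: float = 5.0,  # hypothetical threshold for "too slow"
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
    warn: Callable[[str], None] = print,
) -> str:
    """Call fetch(doi), retrying on OSError with increasing delays."""
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        started = clock()
        try:
            result = fetch(doi)
        except OSError:
            if attempt == retries:
                raise  # out of retries: surface the last error to the caller
            sleep(delays[attempt])
            continue
        elapsed = clock() - started
        if elapsed > slow_after:
            warn(f"doi:{doi} took {elapsed:.1f}s to resolve; doi.org may be degraded")
        return result
    raise ValueError("retries must be >= 0")
```

Passing `sleep`, `clock`, and `warn` as parameters keeps the sketch testable without real waiting or network access.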

@choldgraf
Member Author

I don't yet have the data to know whether this happens consistently across days; so far I've only seen it across a span of hours. So maybe we can use this issue to see if others chime in with "me too" or not.

Maybe an easy thing would be something like:

  1. Try to resolve a DOI
  2. If it times out, try again two more times (or if we can differentiate between "time out" and "not a correct DOI" we might have two different outcomes)
  3. If it still doesn't work, give a message like "We couldn't resolve this DOI, double check that it is correct, or see to determine if the DOI service has an outage"
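
The three steps above could be sketched roughly as follows, in Python for illustration. This assumes the caller's `fetch` function can tell the two failure modes apart by raising a hypothetical `DoiNotFound` when the resolver answers but does not know the DOI, and the built-in `TimeoutError` when the request times out. The message wording is also illustrative.

```python
from typing import Callable, Tuple


class DoiNotFound(Exception):
    """Hypothetical error: the resolver answered, but does not know this DOI."""


def resolve_or_explain(
    doi: str, fetch: Callable[[str], str], attempts: int = 3
) -> Tuple[bool, str]:
    """Return (True, citation) on success, or (False, message) on failure."""
    for _ in range(attempts):
        try:
            return True, fetch(doi)
        except DoiNotFound:
            # Retrying will not help if the DOI itself is wrong.
            return False, f"doi:{doi} was not found; double check that it is correct."
        except TimeoutError:
            continue  # possibly a transient outage: try again
    return False, (
        f"Could not resolve doi:{doi} after {attempts} attempts; "
        "the DOI service may be degraded or down."
    )
```

With `attempts=3` this matches "try once, then two more times", and only the timeout path consumes retries.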

@rowanc1
Member

rowanc1 commented Apr 11, 2024

There are multiple improvements that we have made to the DOI resolution:

This means we now try doi.org at least twice, with two different content-negotiation strategies, cache successful results, and have updated the error messages (as well as how we store citation data internally, which is now CSL rather than BibTeX strings).
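
To illustrate the idea, not the actual implementation: a Python sketch that tries two `Accept` types in turn and caches only successful results. The thread does not name the two strategies, so trying CSL JSON first and BibTeX second is an assumption, as are the function name `resolve_citation` and the caller-supplied `fetch(doi, accept)` callable.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

CSL_JSON = "application/vnd.citationstyles.csl+json"
BIBTEX = "application/x-bibtex"


def resolve_citation(
    doi: str,
    fetch: Callable[[str, str], str],
    cache: Dict[str, Tuple[str, str]],
    accept_types: Sequence[str] = (CSL_JSON, BIBTEX),  # assumed order
) -> Optional[Tuple[str, str]]:
    """Return (accept_type, body) for the first strategy that works, else None."""
    if doi in cache:
        return cache[doi]
    for accept in accept_types:
        try:
            body = fetch(doi, accept)
        except OSError:
            continue  # this strategy failed: fall through to the next one
        if body:
            cache[doi] = (accept, body)  # only successful results are cached
            return cache[doi]
    return None
```

Because failures are never written to the cache, a DOI that could not be resolved during an outage is tried again on the next build.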

I am going to close this specific issue, feel free to reopen if these DOIs remain troublesome. 🤞
