Support retrieval of Metrologia entries from IOP #2

Closed
ronaldtse opened this issue Sep 21, 2020 · 25 comments
@ronaldtse
Contributor

No description provided.

@ronaldtse ronaldtse transferred this issue from relaton/relaton-data-bipm Sep 21, 2020
@ronaldtse ronaldtse changed the title Consider supporting Metrologia from IOP Support retrieval of Metrologia entries from IOP Jan 3, 2021
@ronaldtse ronaldtse added the enhancement New feature or request label Jan 3, 2021
@ronaldtse
Contributor Author

e.g. DOI https://doi.org/10.1088/1681-7575/aa7b3f

[9] Flowers-Jacobs N-E, Pollarolo A, Coakley J J, Fox A E, Rogalla H, Tew W L and Benz S P 2017 A Boltzmann constant determination based on Johnson noise thermometry Metrologia 54, 730-737 (8 pp)

Gets forwarded to https://iopscience.iop.org/article/10.1088/1681-7575/aa7b3f

[Screenshot]

@ronaldtse
Contributor Author

We have many citations to Metrologia in Metanorma due to handling of BIPM documents. We need to support citation of Metrologia articles. If necessary we will need to update the Relaton BIPM bibdata model (or establish a new one for academic articles).

@andrew2net
Contributor

@ronaldtse I'm unable to find a Metrologia article index or any search form. Do you have an idea of how to search for Metrologia articles?

@ronaldtse
Contributor Author

ronaldtse commented Jan 6, 2021

I think we should have a syntax that fetches per:

  • "volume"
  • "issue"
  • "page number" to identify the article

The main Metrologia page is this:
https://iopscience.iop.org/journal/0026-1394
[Screenshot]

Volumes:
[Screenshot]

Issues in that Volume:
https://iopscience.iop.org/volume/0026-1394/29
[Screenshot]

The first issue in that list: https://iopscience.iop.org/issue/0026-1394/29/6

[Screenshot]

The first paper/article in that issue:
https://iopscience.iop.org/article/10.1088/0026-1394/29/6/001
[Screenshot]

Notice the citation writes this:
"E C Morris 1993 Metrologia 29 373"

I believe we can have two types of searches for this article:

  1. Citation locator string: "Metrologia 29 6 373" or "Metrologia 29 373" => this article.
  2. Cite by DOI: either one should lead to this article

For citing the Issue, we can do:

  1. Citation locator string: "Metrologia 29 6" => this Issue.
  2. Cite by DOI: either one should lead to this Issue

For citing the Volume, we can do:

  1. Citation locator string: "Metrologia 29" => this Volume.
  2. Cite by DOI: either one should lead to this Volume

For citing the Series, we can do:

  1. Citation locator string: "Metrologia" => the full series.
  2. Cite by DOI: either one should lead to this series
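
The locator strings proposed above could be parsed roughly as follows. This is only a sketch: `parse_locator` and the hash keys are hypothetical names, not part of any Relaton gem. Note that the two-number form is ambiguous under this proposal ("Metrologia 29 6" is an issue, "Metrologia 29 373" is an article), so the sketch reports it as `:issue_or_article` and leaves the decision to a later lookup.

```ruby
# Hypothetical parser for locator strings such as "Metrologia 29 6 373".
# Returns a hash describing which level of the hierarchy is addressed.
def parse_locator(str)
  series, *numbers = str.strip.split(/\s+/)
  raise ArgumentError, "unsupported series: #{series.inspect}" unless series == "Metrologia"
  raise ArgumentError, "too many parts: #{str.inspect}" if numbers.size > 3
  unless numbers.all? { |n| n.match?(/\A\d+\z/) }
    raise ArgumentError, "non-numeric part in: #{str.inspect}"
  end

  volume, second, third = numbers
  if third
    # volume + issue + first page identifies one article
    { series: series, volume: volume, issue: second, page: third, level: :article }
  elsif second
    # Ambiguous: the second number may be an issue or a first page.
    { series: series, volume: volume, issue_or_page: second, level: :issue_or_article }
  elsif volume
    { series: series, volume: volume, level: :volume }
  else
    { series: series, level: :series }
  end
end

p parse_locator("Metrologia 29 6 373")
p parse_locator("Metrologia 29 373")
```

Resolving the ambiguous case would need extra information, for example checking whether the volume actually has an issue with that number.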

@andrew2net
Contributor

@ronaldtse I can guess how to map an article to BibModel but I have no idea how to map Issue, Volume, and Series. Do you have a suggestion?

andrew2net added a commit that referenced this issue Jan 8, 2021
@opoudjis

Volume = series/number
Issue = series/partnumber
Series = series/title

@andrew2net
Contributor

@opoudjis as I understand it, Ronald means for Volume, Issue, and Series to be separate documents. My question is: what data can we map from the Volume, Issue, and Series pages to the BibliographicItem model?

@ronaldtse
Contributor Author

ronaldtse commented Jan 14, 2021

@andrew2net reports he has encountered rate-limiting via Captcha after several fetches. This is not appropriate for users who compile documents. I don't know whether it is surmountable using User-Agent (please try).

Will also seek advice from BIPM.

EDIT: have sought advice. Pending reply.

@andrew2net
Contributor

@ronaldtse I've tried using a random User-Agent, but it seems that iopscience.iop.org allows only 6 requests per minute. After 2 minutes it starts redirecting to a captcha.
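
If slow fetching is acceptable, a client-side throttle could keep requests under the observed limit. The sketch below spaces calls at least 10 seconds apart (6 per minute). The `Throttle` class is hypothetical, the limit is only the figure observed above, and there is no guarantee that staying under it avoids the captcha.

```ruby
# Hypothetical throttle: runs blocks no more often than `per_minute` times a minute.
# The clock and sleeper are injectable so the behaviour can be checked without waiting.
class Throttle
  def initialize(per_minute:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @interval = 60.0 / per_minute
    @clock = clock
    @sleeper = sleeper
    @last = nil
  end

  # Waits until the minimum interval since the previous call has passed,
  # then yields to the block and returns its result.
  def call
    if @last
      wait = @interval - (@clock.call - @last)
      @sleeper.call(wait) if wait > 0
    end
    @last = @clock.call
    yield
  end
end

# Usage (each block would perform one HTTP fetch):
# throttle = Throttle.new(per_minute: 6)
# throttle.call { fetch_page(url) }   # fetch_page is a placeholder
```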

@ronaldtse
Contributor Author

Got it. Let’s wait for BIPM’s response.

@ronaldtse
Contributor Author

The BIPM team has inquired with IOPP (the publisher) and they recommended the following:

Our first recommendation is that they use the CrossRef API. It contains all the article metadata (including the references), and if they just need metadata about our articles then that API should cover everything they need.

It’s a very well documented API. The starting point to the documentation is https://www.crossref.org/education/retrieve-metadata/.

Can you help implement the connection to CrossRef? Thanks.

@andrew2net
Contributor

@ronaldtse yes, I can. They ask for an email address in HTTP requests so that they can contact us in case our script causes problems. Requests without an email address won't be routed to the more reliable servers. Do you have an email address for this purpose?

@ronaldtse
Contributor Author

Let me ask them. It would be strange to use our email address when users (not us) are doing the requests.

@ronaldtse
Contributor Author

@andrew2net it seems that the email address is optional?

[Screenshot]

Let's implement without the email first. Later on we can make a config option with Relaton CLI so users can set their own email address for CrossRef.

@andrew2net
Contributor

andrew2net commented Jan 26, 2021

@ronaldtse yes, it's optional, but without an email address it will be slower: https://github.com/CrossRef/rest-api-doc#good-manners--more-reliable-service

@andrew2net
Contributor

@ronaldtse here is the API status page: https://status.crossref.org/#system-metrics. You can see that the "Polite API" average response time is about 1s, while the "Public API" average response time is about 7s.
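
For reference, a request is treated as "polite" when it carries a contact address, either as a `mailto` query parameter or as a `mailto:` entry in the User-Agent header. A minimal sketch of the header variant follows; the `polite_request` helper and the `relaton-sketch/0.0` agent string are made up for illustration, and the request is only built here, not sent.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds a GET request whose User-Agent carries an
# optional contact address. Without `mailto` the request stays anonymous.
def polite_request(uri, mailto: nil)
  request = Net::HTTP::Get.new(uri)
  agent = "relaton-sketch/0.0"
  agent += " (mailto:#{mailto})" if mailto
  request["User-Agent"] = agent
  request
end

uri = URI("https://api.crossref.org/works/10.1088/1681-7575/aa7b3f")
request = polite_request(uri, mailto: "user@example.org")
puts request["User-Agent"]

# Sending it would look like this (network access required):
# response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(request) }
```

Making `mailto` optional matches the idea of letting users supply their own address through a configuration option later.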

@ronaldtse
Contributor Author

7s!??!?!?!? Why don't we just use a random email address based on the IP address.

https://www.ipify.org :

require "net/http"
ip = Net::HTTP.get(URI("https://api.ipify.org"))
puts "My public IP Address is: " + ip

Then sha256 it and truncate to 16 for the name. We can use relaton.org for the domain to indicate it is a Relaton request.

i.e. "fa9514ae...@relaton.org".
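
A minimal sketch of that derivation, assuming "truncate to 16" means the first 16 hexadecimal characters of the digest. The `pseudonymous_mailto` name is hypothetical, and the IP address in the usage line is a documentation-range placeholder rather than a real lookup.

```ruby
require "digest"

# Hypothetical helper: derives a stable, non-identifying address from an IP.
# The local part is the first 16 hex characters of the SHA-256 digest.
def pseudonymous_mailto(ip, domain: "relaton.org")
  local_part = Digest::SHA256.hexdigest(ip)[0, 16]
  "#{local_part}@#{domain}"
end

puts pseudonymous_mailto("203.0.113.7")
```

The same IP always yields the same address, so the service can still tell requesters apart, but nobody could actually be reached at that address.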

@andrew2net
Contributor

andrew2net commented Jan 27, 2021

Anyway, the API is too slow. Only the "OpenURL" and paid "Plus" services have an acceptable response time. I'll investigate OpenURL.
Also, I haven't been able to find a way to search for volumes, issues, and articles. It seems they have only journals and articles in the DB.
If we aren't successful with Crossref, we can make a relaton-data-bipm-iop repository on GitHub and slowly fetch documents from iopscience.iop.org. What do you think?

@ronaldtse
Contributor Author

I've sent this to BIPM, let's see what their response is.


We’re now experimenting with the CrossRef API, but it’s not ideal:

  1. The CrossRef API only accepts a “fuzzy” search with limited filtering options.

There is no mechanism to obtain exactly the Metrologia article unless the author provides the full title and authorship information. It is nearly impossible to locate a particular article with confidence.

  2. It’s very slow. For normal requests, it takes up to 7 seconds (or more with filters). Even if we use the “polite API”, where an email is provided, it takes nearly 2 seconds per request. They have a “plus API” that is on average 0.5 seconds, but it requires payment from the user.

Here’s a real example from the Candela definition MEP:
https://www.bipm.org/utils/en/pdf/si-mep/SI-App2-candela.pdf

[Screenshot of the reference]

NOTE: this reference actually has the wrong title. The correct title is "Predictable Quantum Efficient Detector II: Characterization and confirmed responsivity", and the error affects the resulting search. This is why auto-fetch is important: it mitigates authoring errors.

The metadata attributes available here are: author, title, year, issue and page numbers. The intention with auto-fetching is to allow the author to enter minimal identifiable input (i.e. enough information to find this unique reference).

e.g.
journal name: Metrologia
issue number: 50
page number start: 395

However, the CrossRef API does not provide enough parameters to locate this information. In particular, CrossRef does not support search/filtering by volumes, issues, or page numbers.

In order to use the CrossRef API, the author will be forced to provide the full title and some authorship information:
journal name: Metrologia
author name: at least one author
full title: Predictable Quantum Efficient Detector II: Characterization results

Here are two attempts to find out if it works.

Attempt 1 with author given title

The best effort in finding this article in the CrossRef API is the following command:

curl "https://api.crossref.org/works?query.bibliographic=Predictable%20Quantum%20Efficient%20Detector%20II%3A%20Characterization%20results&query.author=M%C3%BCller&query.container-title=Metrologia&filter=issn:0026-1394,prefix:10.1088"

This means, "find items that match the following criteria":

  • the bibliographic information contains "Predictable Quantum Efficient Detector II: Characterization results" (CrossRef does not support searching by title)
  • the author is "Müller"
  • the container is "Metrologia", which has the ISSN 0026-1394
  • the DOI prefix is 10.1088, which belongs to the publisher IOP

It returns 20 results, of which the desired article is the 3rd. This query took 7 seconds.

=> Not possible to find article

Attempt 2 with corrected title

Since the first attempt failed, I searched online and found the correct title, which is "Predictable quantum efficient detector: II. Characterization and confirmed responsivity".

Now we refine the command to:

curl "https://api.crossref.org/works?query.bibliographic=Predictable%20Quantum%20Efficient%20Detector%20II%3A%20Characterization%20and%20confirmed%20responsivity&query.author=M%C3%BCller&query.container-title=Metrologia&filter=issn:0026-1394,prefix:10.1088"

Now it returns 7 results, of which the desired article is the 1st. This query took 10 seconds.

=> Works when author and title information are fully accurate.
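
For anyone reproducing the two attempts from Ruby instead of curl, the query URL can be assembled as sketched below. `crossref_works_uri` is a hypothetical helper; the parameter names and filter syntax are the ones used in the curl commands above. `URI.encode_www_form` encodes spaces as `+` and the filter's `:` and `,` as percent escapes, which is an equivalent encoding of the same query.

```ruby
require "uri"

# Hypothetical helper: builds the /works query used in the two attempts above.
def crossref_works_uri(title:, author:, container:, issn:, prefix:)
  query = URI.encode_www_form(
    "query.bibliographic" => title,
    "query.author" => author,
    "query.container-title" => container,
    "filter" => "issn:#{issn},prefix:#{prefix}"
  )
  URI("https://api.crossref.org/works?#{query}")
end

uri = crossref_works_uri(
  title: "Predictable Quantum Efficient Detector II: Characterization and confirmed responsivity",
  author: "Müller",
  container: "Metrologia",
  issn: "0026-1394",
  prefix: "10.1088"
)
puts uri
```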

Conclusion

The CrossRef API cannot locate a unique article with certainty because it only supports fuzzy search, and does not support searching by volume, issue or page numbers.

It can locate an article only if the article title and authorship information given are fully accurate, and it returns ambiguous results when the title contains words that are also used in another article’s title. For example, these two citations return ambiguous results, even though their volumes, issues and years are very different:

M G Cox, The evaluation of key comparison data, Metrologia, 39, 6, 589-595, 2002.

M G Cox, The evaluation of key comparison data: determining the largest consistent subset, Metrologia, 44, 3, 2007.

(both from the Kilogram definition MEP)
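
One possible mitigation, not something proposed in the message above, is to narrow the fuzzy results locally. Crossref work records carry `volume`, `issue` and `page` fields as strings, so a client could post-filter on them. In the sketch below, `narrow_by_locator` is a hypothetical helper and the candidate hashes are hand-written stand-ins built from the two citations above, not real API output. This does nothing for the response time, and it only helps if the desired record is among the returned candidates.

```ruby
# Hypothetical post-filter: keeps only candidates whose volume (and, when
# given, issue and first page) match the locator supplied by the author.
def narrow_by_locator(items, volume:, issue: nil, first_page: nil)
  items.select do |item|
    next false unless item["volume"] == volume.to_s
    next false if issue && item["issue"] != issue.to_s
    next false if first_page && item["page"].to_s.split("-").first != first_page.to_s

    true
  end
end

# Stand-in records shaped like Crossref work items (illustrative only).
candidates = [
  { "title" => ["The evaluation of key comparison data"],
    "volume" => "39", "issue" => "6", "page" => "589-595" },
  { "title" => ["The evaluation of key comparison data: determining the largest consistent subset"],
    "volume" => "44", "issue" => "3" }
]

matches = narrow_by_locator(candidates, volume: 39, issue: 6, first_page: 589)
puts matches.map { |item| item["title"].first }
```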

@ronaldtse
Contributor Author

ronaldtse commented Jan 28, 2021

In any case, I do think that we should support CrossRef separately in, say, relaton-crossref. There is also a Ruby client gem for CrossRef: https://github.com/sckott/serrano

What do you think?

@andrew2net
Contributor

@ronaldtse yes, I used the serrano gem.
Which documents do you propose to fetch from CrossRef?
I think we can use CrossRef, but it is sometimes too slow.

@andrew2net
Contributor

@ronaldtse since we have the relaton-doi gem, which fetches documents from crossref.org, can we close this issue?

@ronaldtse
Contributor Author

@andrew2net we now have the full data set of Metrologia from BIPM. I will create a new issue and will close this one.

@ronaldtse
Contributor Author

Closing in favour of #28.
