
Talk to the CrossRef API/Solr #39

Closed
jure opened this issue Sep 8, 2014 · 5 comments

@jure
Contributor

jure commented Sep 8, 2014

https://github.com/articlemetrics/alm-report/blob/master/lib/solr_request.rb needs to be able to talk to both the PLOS API and the CrossRef API.

An example search across all indexed fields that returns journal articles looks like this for PLOS:

http://api.plos.org/search?facet=false&fl=id,pmid,publication_date,received_date,accepted_date,title,cross_published_journal_name,author_display,editor_display,article_type,affiliate,subject,financial_disclosure&fq=!article_type_facet:%22Issue%20Image%22&hl=false&q=everything:cancer&rows=25&wt=json

And like this for CrossRef's API:

http://api.crossref.org/works?query=biology&filter=type:journal-article
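A minimal sketch of how one request layer could produce both of the URLs above from the same search term, assuming the base URLs and parameter names shown in the two examples. The class and method names (`PlosSearch`, `CrossRefSearch`, `build_url`) are hypothetical and not taken from solr_request.rb (which is Ruby); passing `rows` to CrossRef is also an assumption, since the CrossRef example above does not use it.

```python
from urllib.parse import urlencode


# Hypothetical adapters: one per backend, both exposing build_url(term, rows).
class PlosSearch:
    BASE = "http://api.plos.org/search"
    # Field list copied from the PLOS example request above.
    FIELDS = ("id,pmid,publication_date,received_date,accepted_date,title,"
              "cross_published_journal_name,author_display,editor_display,"
              "article_type,affiliate,subject,financial_disclosure")

    def build_url(self, term, rows=25):
        params = {
            "facet": "false",
            "fl": self.FIELDS,
            # Exclude issue images so only articles come back.
            "fq": '!article_type_facet:"Issue Image"',
            "hl": "false",
            # "everything" searches all indexed text of the article.
            "q": f"everything:{term}",
            "rows": rows,
            "wt": "json",
        }
        return f"{self.BASE}?{urlencode(params)}"


class CrossRefSearch:
    BASE = "http://api.crossref.org/works"

    def build_url(self, term, rows=25):
        params = {
            # Free-text query over all metadata.
            "query": term,
            # Restrict results to journal articles.
            "filter": "type:journal-article",
            # Assumed page-size parameter.
            "rows": rows,
        }
        return f"{self.BASE}?{urlencode(params)}"


for backend in (PlosSearch(), CrossRefSearch()):
    print(backend.build_url("cancer"))
```

`urlencode` percent-encodes characters such as `:` and `"`; both servers decode them back, so the requests are equivalent to the hand-written URLs.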

PLOS allows us to filter by these fields:

| PLOS Search Field Name | Description | Note |
| --- | --- | --- |
| id | DOI (Digital Object Identifier) | Extended for partial documents |
| everything | All text in the article | Includes Meta information |
| title | Article Title | |
| title_display | Article Title | For display purposes only |
| alternate_title | Alternative Title | |
| author | Author | Can have multiple values |
| author_display | Author | For display purposes only |
| author_without_collab_display | Author | For display purposes only. All the authors except for collaborative authors |
| author_collab_only_display | Author | For display purposes only. Collaborative authors only |
| abstract | Abstract section | |
| abstract_primary_display | Abstract section | For display purposes only. Primary abstract only |
| introduction | Introduction section | |
| materials_and_methods | Materials and Methods section | |
| results_and_discussion | Results and discussion section | |
| conclusions | Conclusions section | |
| supporting_information | Supporting Information section | |
| reference | Reference section | Can have multiple values |
| body | Most sections of the article | Without Abstract or References |
| publication_date | Publication Date | Requires start and end date |
| received_date | Received Date | Requires start and end date |
| accepted_date | Accepted Date | Requires start and end date |
| journal | Full Journal Name | |
| volume | Volume | |
| issue | Issue | |
| article_type | Article Type | |
| subject | Subject Category | Can have multiple values. The most recent subject categories are listed at the PLOS ONE taxonomy. |
| subject_level_1 | Subject Category | Can have multiple values. Contains only the top level subjects. The most recent subject categories are listed at the PLOS ONE taxonomy. |
| eissn | electronic ISSN | |
| pissn | print ISSN | |
| elocation_id | Electronic Location | Used by PubMed Central |
| journal_id_pmc | Journal ID at PMC | Used by PubMed Central |
| journal_id_nlm_ta | Journal ID at NLM | Used by the National Library of Medicine |
| journal_id_publisher | Publisher of this Journal | Short identifier |
| publisher | Publisher of this Article | Full name |
| pagecount | Total number of pages | Not all articles have page count |
| editor | Editor | Can have multiple values |
| editor_display | Editor | For display purposes only. |
| affiliate | Affiliate | Can have multiple values |
| author_notes | Author Notes | |
| competing_interest | Competing Interest Statement | |
| financial_disclosure | Financial Disclosure Statement | |
| counter_total_all | Total views, all time | |
| counter_total_month | Total views, last 30 days | |
| timestamp | Time of last index | |
| copyright | copyright-statement | copyright information |
| figure_table_caption | Figure and table captions | |
| **PLOS-specific indexes for articles that appear in multiple journals** | | |
| cross_published_journal_name | Cross Published Journal Name | |
| cross_published_journal_key | Cross Published Journal Key | |
| cross_published_journal_eissn | Cross Published Journal EISSN | |
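On the PLOS side, fielded searches are ordinary Solr clauses of the form `field:value`, and the date fields marked "Requires start and end date" map naturally onto Solr's inclusive range syntax `[start TO end]`. The sketch below assumes that, plus ISO-8601 UTC timestamps for the date values; the helper names and the example author are hypothetical.

```python
# Hypothetical helpers for building Solr query clauses against the PLOS fields.

def field_clause(field, value):
    """Return a Solr clause matching `value` as a phrase in `field`."""
    # Escape backslashes and double quotes so the value stays one phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def date_range_clause(field, start, end):
    """Return an inclusive Solr range clause; both ends are required."""
    if not start or not end:
        raise ValueError(f"{field} requires both a start and an end date")
    return f"{field}:[{start} TO {end}]"


def and_clauses(*clauses):
    """Combine clauses so that every one of them must match."""
    return " AND ".join(clauses)


q = and_clauses(
    field_clause("author", "Jane Doe"),
    date_range_clause("publication_date",
                      "2014-01-01T00:00:00Z", "2014-06-30T23:59:59Z"),
)
print(q)
# author:"Jane Doe" AND publication_date:[2014-01-01T00:00:00Z TO 2014-06-30T23:59:59Z]
```

The resulting string would go into the `q` parameter of the PLOS search request.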

And CrossRef allows us to filter by these fields:

| filter | possible values | description |
| --- | --- | --- |
| has-funder | | metadata which includes one or more funder entries |
| funder | {funder_id} | metadata which includes the {funder_id} in FundRef data |
| prefix | {owner_prefix} | metadata belonging to a DOI owner prefix {owner_prefix} (e.g. 10.1016) |
| member | {member_id} | metadata belonging to a CrossRef member |
| from-index-date | {date} | metadata indexed since (inclusive) {date} |
| until-index-date | {date} | metadata indexed before (inclusive) {date} |
| from-deposit-date | {date} | metadata last (re)deposited since (inclusive) {date} |
| until-deposit-date | {date} | metadata last (re)deposited before (inclusive) {date} |
| from-update-date | {date} | Metadata updated since (inclusive) {date}. Currently the same as from-deposit-date. |
| until-update-date | {date} | Metadata updated before (inclusive) {date}. Currently the same as until-deposit-date. |
| from-first-deposit-date | {date} | metadata first deposited since (inclusive) {date} [^*] |
| until-first-deposit-date | {date} | metadata first deposited before (inclusive) {date} [^*] |
| from-pub-date | {date} | metadata where published date is since (inclusive) {date} |
| until-pub-date | {date} | metadata where published date is before (inclusive) {date} |
| has-license | | metadata that includes any `<license_ref>` elements. |
| license.url | {url} | metadata where `<license_ref>` value equals {url} |
| license.version | {string} | metadata where the `<license_ref>`'s applies_to attribute is {string} |
| license.delay | {integer} | metadata where the difference between publication date and the `<license_ref>`'s start_date attribute is <= {integer} (in days) |
| has-full-text | | metadata that includes any full text `<resource>` elements. |
| full-text.version | {string} | metadata where the `<resource>` element's content_version attribute is {string}. |
| full-text.type | {mime_type} | metadata where the `<resource>` element's content_type attribute is {mime_type} (e.g. application/pdf). |
| public-references | | metadata where publishers allow references to be distributed publicly. [^*] |
| has-references | | metadata for works that have a list of references |
| has-archive | | metadata which includes the name of an archive partner |
| archive | {string} | metadata where the value of the archive partner is {string} |
| has-orcid | | metadata which includes one or more ORCIDs |
| orcid | {orcid} | metadata where the `<orcid>` element's value = {orcid} |
| issn | {issn} | metadata where the record has an ISSN = {issn}. Format is xxxx-xxxx. |
| type | {type} | metadata records whose type = {type}. Type must be an ID value from the list of types returned by the /types resource |
| directory | {directory} | metadata records whose article or serial are mentioned in the given {directory}. Currently the only supported value is doaj. |
| doi | {doi} | metadata describing the DOI {doi} |
| updates | {doi} | metadata for records that represent editorial updates to the DOI {doi} |
| is-update | | metadata for records that represent editorial updates |
| has-update-policy | | metadata for records that include a link to an editorial update policy |
| container-title | | metadata for records whose publication title is an exact match |
| publisher-name | | metadata for records with an exactly matching publisher name |
| category-name | | metadata for records with an exactly matching category label |
| type-name | | metadata for records with an exactly matching type label |
| award.number | {award_number} | metadata for records with a matching award number. Optionally combine with award.funder |
| award.funder | {funder doi or id} | metadata for records with an award with a matching funder. Optionally combine with award.number |
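CrossRef takes all of these through the single `filter` parameter seen in the example request (`filter=type:journal-article`), with several `name:value` pairs separated by commas. The sketch below assumes that comma-separated form, YYYY-MM-DD dates, and that the valueless `has-*` style filters are sent as `name:true`; the table above does not state that last point, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Filters listed without a value in the table above. Sending them as
# name:true is an assumption, not something the table states.
FLAG_FILTERS = {"has-funder", "has-license", "has-full-text", "has-references",
                "has-archive", "has-orcid", "is-update", "has-update-policy",
                "public-references"}


def filter_param(filters):
    """Join (name, value) pairs into one comma-separated filter string."""
    parts = []
    for name, value in filters:
        if name in FLAG_FILTERS and value is None:
            value = "true"
        if value is None:
            raise ValueError(f"filter {name!r} needs a value")
        parts.append(f"{name}:{value}")
    return ",".join(parts)


def works_url(query, filters):
    """Build a /works URL combining a free-text query with filters."""
    params = {"query": query, "filter": filter_param(filters)}
    return "http://api.crossref.org/works?" + urlencode(params)


print(filter_param([("type", "journal-article"),
                    ("from-pub-date", "2014-01-01"),
                    ("has-orcid", None)]))
# type:journal-article,from-pub-date:2014-01-01,has-orcid:true
```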

I'll update this issue as I go along.

jure self-assigned this Sep 8, 2014
jure added this to the Iteration 2 milestone Sep 8, 2014
@kjw

kjw commented Sep 8, 2014

@jure Hi, can you elaborate on your tweet? Are you looking for more parity between the PLoS and CrossRef APIs?

Note that you can query on all metadata via the query parameter:

api.crossref.org/works?query=fish

This could act as a catch-all for filter fields supported by PLoS but not by CrossRef.
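One way to use that catch-all when translating a PLOS-style search: fields with an obvious CrossRef filter (DOI, ISSN, publication date range) become filters, and everything else is folded into the free-text `query`. The mapping below is a hypothetical illustration based only on the two field tables above; note that the fallback drops the field name, so an author search becomes a search over all metadata.

```python
from urllib.parse import urlencode

# Hypothetical mapping from PLOS search fields to CrossRef filters.
# Only fields with an obvious equivalent are listed.
PLOS_TO_CROSSREF = {"id": "doi", "eissn": "issn", "pissn": "issn"}


def to_crossref(plos_fields, pub_date_range=None):
    """Translate {plos_field: value} into CrossRef /works parameters.

    Fields with a CrossRef filter become filters; everything else falls
    back to the free-text `query` parameter, which searches all metadata.
    """
    filters = ["type:journal-article"]
    free_text = []
    for field, value in plos_fields.items():
        target = PLOS_TO_CROSSREF.get(field)
        if target:
            filters.append(f"{target}:{value}")
        else:
            # No field-level search on the CrossRef side: keep the term only.
            free_text.append(value)
    if pub_date_range:
        # PLOS needs both ends of the range; CrossRef splits it in two filters.
        start, end = pub_date_range
        filters += [f"from-pub-date:{start}", f"until-pub-date:{end}"]
    params = {"filter": ",".join(filters)}
    if free_text:
        params["query"] = " ".join(free_text)
    return params


params = to_crossref({"author": "Smith", "eissn": "1932-6203"},
                     pub_date_range=("2014-01-01", "2014-06-30"))
print("http://api.crossref.org/works?" + urlencode(params))
```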

Can add author as a textual filter field. What others?

@jure
Contributor Author

jure commented Sep 8, 2014

Hi @kjw, how nice of you to drop by :) Let's start the party then! It looks like it's not possible to search specifically in the title or author field, which would be a nice addition. There are basically two use cases:

  • give me all papers with author x
  • give me all papers with title x

Both of those fields are present in your schema and already indexed appropriately (the text and text_name field types should work fine).
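Until field-level search exists, a client could approximate the two use cases by running the catch-all `query` and then keeping only items whose title or author actually matches. This sketch assumes the `/works` JSON shape (`message.items`, `title` as a list of strings, `author` as a list of objects with `given`/`family`); the helper names and sample data are made up. It only filters the page of results already fetched, so totals and paging would be wrong, which is why server-side author and title search is preferable.

```python
# Hypothetical workaround: fetch with the catch-all `query` parameter,
# then keep only items whose title or author actually matches.

def title_matches(item, term):
    # The title is a list of strings; do a case-insensitive substring match.
    return any(term.lower() in t.lower() for t in item.get("title", []))


def author_matches(item, name):
    # Each author is a dict; compare against the family name only.
    return any(name.lower() == a.get("family", "").lower()
               for a in item.get("author", []))


def filter_items(response, title=None, author=None):
    items = response["message"]["items"]
    if title:
        items = [i for i in items if title_matches(i, title)]
    if author:
        items = [i for i in items if author_matches(i, author)]
    return items


# Abbreviated, made-up response in the shape of a /works result.
sample = {"message": {"items": [
    {"title": ["Fish population dynamics"],
     "author": [{"given": "A.", "family": "Smith"}]},
    {"title": ["Smith-Waterman alignment"],
     "author": [{"given": "B.", "family": "Jones"}]},
]}}

print([i["title"][0] for i in filter_items(sample, author="Smith")])
# ['Fish population dynamics']
```

A plain `query=Smith` would match both sample items; only the author check separates the paper by Smith from the paper that merely mentions Smith in its title.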

Yes, I'm looking for more parity between these two APIs, so that we can replace the PLOS API powering http://almreports.plos.org/ with CrossRef's search.

@kjw, would it help if I let you know once I've gone through both APIs and figured out what would be crucial to add on your side to make the switch painless?

@jure
Contributor Author

jure commented Sep 8, 2014

@kjw: The goal isn't to make the APIs exactly the same, which wouldn't make sense, but looking at the differences between the two, you might agree that some features would be beneficial to have on your side as well.

@kjw

kjw commented Sep 8, 2014

@jure No problem at all. Always looking for feedback on the CrossRef API.

Yes, if possible. The best path for me would be a list of suggested changes in an issue on the http://github.com/CrossRef/rest-api-doc repo.

@jure
Contributor Author

jure commented Sep 8, 2014

Great, I'll post there at https://github.com/CrossRef/rest-api-doc today or tomorrow morning, once I've figured out exactly what would be good to add. Thanks for your help!


2 participants