This notebook shows how cross-references can be explored in Python.

As a preparatory step you need to install "crossrefapi":

> pip install crossrefapi

A good introductory demo is https://pypi.org/project/crossrefapi/1.0.3/; here we focus on the main functionality.


To start with we set up the environment.

In [14]:
from crossref.restful import Works
works = Works()

Choose a DOI and retrieve the article information.

Here you may substitute a DOI of your own.

In [15]:
DOI = '10.1017/jfm.2017.541'
dct = works.doi(DOI)

A readable way to display the result is as formatted JSON.

In [12]:
import json
print(json.dumps(dct, sort_keys=True, indent=4))

{
    "DOI": "10.1017/jfm.2017.541",
    "ISSN": [
        "0022-1120",
        "1469-7645"
    ],
    "URL": "http://dx.doi.org/10.1017/jfm.2017.541",
    "abstract": "<jats:p>The pressure-driven growth model is used to determine the shape of a foam front propagating into an oil reservoir. It is shown that the front, idealised as a curve separating surfactant solution downstream from gas upstream, can be subdivided into two regions: a lower region (approximately parabolic in shape and consisting primarily of material points which have been on the foam front continuously since time zero) and an upper region (consisting of material points which have been newly injected onto the foam front from the top boundary). Various conjectures are presented for the shape of the upper region. A formulation which assumes that the bottom of the upper region is oriented in the same direction as the top of the lower region is shown to fail, as (despite the orientations being aligned) there is a mismatch

For example, we may want to look up some author information:

In [19]:
authors = dct.get('author')
print(len(authors))
authors[0]



4


{'ORCID': 'http://orcid.org/0000-0001-5236-1850',
 'authenticated-orcid': False,
 'given': 'P.',
 'family': 'Grassia',
 'sequence': 'first',
 'affiliation': []}
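
As a small sketch, an author entry can be turned into a display name. The helper name `format_author` is hypothetical, and the code assumes that the 'given' and 'family' keys are optional; the sample entry is copied from the output above.

```python
def format_author(author):
    """Join the optional 'given' and 'family' parts of an author entry."""
    parts = [author.get('given'), author.get('family')]
    return ' '.join(p for p in parts if p)

# Sample entry copied from the output above.
sample_author = {
    'ORCID': 'http://orcid.org/0000-0001-5236-1850',
    'authenticated-orcid': False,
    'given': 'P.',
    'family': 'Grassia',
    'sequence': 'first',
    'affiliation': [],
}
print(format_author(sample_author))
```

With the live record, `[format_author(a) for a in dct.get('author', [])]` would list all four names.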

Get the abstract.

In [22]:
abstract = dct.get('abstract')
print(abstract)

<jats:p>The pressure-driven growth model is used to determine the shape of a foam front propagating into an oil reservoir. It is shown that the front, idealised as a curve separating surfactant solution downstream from gas upstream, can be subdivided into two regions: a lower region (approximately parabolic in shape and consisting primarily of material points which have been on the foam front continuously since time zero) and an upper region (consisting of material points which have been newly injected onto the foam front from the top boundary). Various conjectures are presented for the shape of the upper region. A formulation which assumes that the bottom of the upper region is oriented in the same direction as the top of the lower region is shown to fail, as (despite the orientations being aligned) there is a mismatch in location: the upper and lower regions fail to intersect. Alternative formulations are developed which allow the upper region to curve sufficiently so as to intersect
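
The abstract is delivered with JATS markup such as `<jats:p>`. A rough sketch for getting plain text is to drop everything that looks like a tag. The helper name `strip_jats` is hypothetical, and a regular expression is only an approximation of real XML parsing, so spacing around inline markup may come out slightly off.

```python
import html
import re

def strip_jats(text):
    """Remove tag-like markup, decode entities, and collapse whitespace."""
    without_tags = re.sub(r'<[^>]+>', ' ', text)
    return ' '.join(html.unescape(without_tags).split())

# First sentence of the abstract shown above, closed with a tag for the example.
sample = ('<jats:p>The pressure-driven growth model is used to determine the '
          'shape of a foam front propagating into an oil reservoir.</jats:p>')
print(strip_jats(sample))
```

On the live record this would be called as `strip_jats(dct.get('abstract', ''))`.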

Get the references.

In [28]:
references = dct.get('reference')
len(references)

29

In [29]:
references[4]

{'key': 'S0022112017005419_r3',
 'doi-asserted-by': 'publisher',
 'DOI': '10.1016/j.cis.2012.07.002'}

In [30]:
references[4].get('DOI')

'10.1016/j.cis.2012.07.002'

Looping over the references we can display the DOI of each reference.

In [38]:
for k in range(0, len(references)):
    ref_doi = references[k].get('DOI')
    print(k, ': ', ref_doi)

0 :  None
1 :  10.2118/164891-PA
2 :  None
3 :  10.1016/j.colsurfa.2017.03.059
4 :  10.1016/j.cis.2012.07.002
5 :  None
6 :  None
7 :  10.2118/166244-PA
8 :  10.2118/39102-PA
9 :  10.2118/169104-PA
10 :  10.1017/jfm.2014.287
11 :  None
12 :  10.2118/165282-PA
13 :  None
14 :  None
15 :  10.1021/ba-1994-0242.ch001
16 :  None
17 :  None
18 :  10.1016/j.colsurfa.2016.07.064
19 :  10.1021/acs.iecr.6b01424
20 :  10.1007/978-3-662-05441-3
21 :  10.2118/88811-PA
22 :  10.1006/jcph.1999.6345
23 :  10.1016/j.jcis.2015.10.017
24 :  10.1007/b98879
25 :  10.1016/j.colsurfa.2014.12.023
26 :  None
27 :  10.1137/S1064827500373413
28 :  10.1016/j.colsurfa.2015.06.023


Recall that, for a known DOI, a PDF file can be downloaded via Sci-Hub:

In [56]:
from scidownl import scihub_download

# paper = "https://doi.org/10.1017/jfm.2017.541"
doi = "10.1017/jfm.2017.541"
out = "./paper/"
# scihub_download(doi, paper_type="doi")
scihub_download(doi, out=out)

[INFO] | 2022/09/03 17:40:48 | Choose scihub url [0]: http://sci-hub.se
[INFO] | 2022/09/03 17:40:49 | <- Request: scihub_url=http://sci-hub.se, source=DoiSource[type=doi, id=10.1017/jfm.2017.541]
[INFO] | 2022/09/03 17:40:49 | -> Response: status_code=200, content_length=7091
[INFO] | 2022/09/03 17:40:49 | * Extracted information: {'url': 'https://zero.sci-hub.se/6541/767bbda4d809eb8d01cbceed9673512f/grassia2017.pdf', 'title': 'Foam front advance during improved oil recovery  similarity solutions at early times near the top of the front. Journal of Fluid Mechanics, 828, 527–572'}



[INFO] | 2022/09/03 17:40:50 | ↓ Successfully download the url to: ./paper/Foam front advance during improved oil recovery  similarity solutions at early times near the top of the front. Journal of Fluid Mechanics, 828, 527–572 (1).pdf



Now we can download all pdf files where the reference contains a DOI.
We have to weed out the cases where a reference does not have a DOI.

In [55]:
for k, ref in enumerate(references):
    ref_doi = ref.get('DOI')
    if ref_doi:  # references without a DOI yield None and are skipped
        print(k, ref_doi)
        # scihub_download(ref_doi, out=out)
    else:
        print("---------------------------")

---------------------------
1 10.2118/164891-PA
---------------------------
3 10.1016/j.colsurfa.2017.03.059
4 10.1016/j.cis.2012.07.002
---------------------------
---------------------------
7 10.2118/166244-PA
8 10.2118/39102-PA
9 10.2118/169104-PA
10 10.1017/jfm.2014.287
---------------------------
12 10.2118/165282-PA
---------------------------
---------------------------
15 10.1021/ba-1994-0242.ch001
---------------------------
---------------------------
18 10.1016/j.colsurfa.2016.07.064
19 10.1021/acs.iecr.6b01424
20 10.1007/978-3-662-05441-3
21 10.2118/88811-PA
22 10.1006/jcph.1999.6345
23 10.1016/j.jcis.2015.10.017
24 10.1007/b98879
25 10.1016/j.colsurfa.2014.12.023
---------------------------
27 10.1137/S1064827500373413
28 10.1016/j.colsurfa.2015.06.023
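
If the DOIs are needed later, it may be handier to collect them than to print them. The following is a minimal sketch; the helper name `split_by_doi` is hypothetical, and the first two sample entries use made-up keys, while the third is copied from `references[4]` above.

```python
def split_by_doi(references):
    """Return (dois, missing): the DOIs found and the indices of entries without one."""
    dois, missing = [], []
    for k, ref in enumerate(references):
        ref_doi = ref.get('DOI')
        if ref_doi:
            dois.append(ref_doi)
        else:
            missing.append(k)
    return dois, missing

# Shortened sample in the shape of the Crossref 'reference' list.
sample_refs = [
    {'key': 'example_r0'},                               # made-up key, no DOI
    {'key': 'example_r1', 'DOI': '10.2118/164891-PA'},   # made-up key
    {'key': 'S0022112017005419_r3',
     'doi-asserted-by': 'publisher',
     'DOI': '10.1016/j.cis.2012.07.002'},
]
dois, missing = split_by_doi(sample_refs)
print(len(dois), 'with DOI;', 'without DOI at index', missing)
```

The `missing` indices point at the references that would have to be looked up by other means.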


A further question is how to access the "cross references" in the other direction, that is, the DOIs of articles that cite an article with a given DOI.


According to Crossref, the REST API can report the cited-by count for a DOI, but at the moment it offers no way to retrieve the cited-by metadata for works, as members can only retrieve cited-by metadata for their own content.
More information can be found here:
https://www.crossref.org/services/cited-by/
https://github.com/CrossRef/rest-api-doc/issues/374

Not from Crossref, but from @opencitations there is COCI, the OpenCitations Index of Crossref open DOI-to-DOI references, cf. this blogpost https://opencitations.wordpress.com/2018/07/12/coci/ which allows such queries on the open part of the Crossref citation data.

OpenCitations seems to be the better tool for this.
https://github.com/opencitations
https://opencitations.net/index/coci/api/v1#/citations/{doi}


The API is queried with HTTP requests, which can be sent from Python.
https://opencitations.net/index/coci/api/v1/references/10.1017/jfm.2017.541
https://opencitations.net/index/coci/api/v1/citations/10.1017/jfm.2017.541
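
Both URLs follow the same pattern, so they can be wrapped in two small helpers. This is only a sketch: the names `coci_url` and `coci_fetch` are hypothetical, the token is assumed to be optional, and the timeout value is an arbitrary choice.

```python
COCI_BASE = "https://opencitations.net/index/coci/api/v1"

def coci_url(operation, doi):
    """Build the COCI URL for the 'citations' or 'references' of a DOI."""
    if operation not in ("citations", "references"):
        raise ValueError(f"unsupported operation: {operation}")
    return f"{COCI_BASE}/{operation}/{doi}"

def coci_fetch(operation, doi, token=None, timeout=30):
    """Send the request and return the decoded JSON list of citation records."""
    import requests  # imported here so coci_url works without the dependency
    headers = {"authorization": token} if token else {}
    response = requests.get(coci_url(operation, doi), headers=headers, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()

print(coci_url("citations", "10.1017/jfm.2017.541"))
```

A call such as `coci_fetch("references", "10.1017/jfm.2017.541")` would then perform the actual request.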

In [69]:
import requests
# from requests import get

DOI = "10.1017/jfm.2017.541"


API_CALL_CIT= "https://opencitations.net/index/coci/api/v1/citations/"
API_CALL = API_CALL_CIT + DOI
# Read the FAQs and get your own token here: https://opencitations.net/accesstoken
HTTP_HEADERS = {"authorization": "YOUR-OPENCITATIONS-ACCESS-TOKEN"}

response = requests.get(API_CALL, headers = HTTP_HEADERS)
response_dict = json.loads(response.text)

# print(json.dumps(response_dict, sort_keys=True, indent=4))


Figuring out the format of the response.

In [78]:
response_dict[0]

{'oci': '0200101040036142519143618020001086301010601086307-0200100010736191522370200010737050401',
 'cited': '10.1017/jfm.2017.541',
 'author_sc': 'no',
 'timespan': 'P0Y4M',
 'creation': '2018-01',
 'citing': '10.1140/epje/i2018-11618-7',
 'journal_sc': 'no'}
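
The 'timespan' value looks like an ISO 8601 duration (here 'P0Y4M', i.e. four months between the cited and the citing publication). A sketch for converting it to months follows; the helper name `timespan_to_months` is hypothetical, and the handling of an optional leading minus sign and of a day part is an assumption about values not shown in this notebook.

```python
import re

# Optional sign, then optional year, month, and day parts.
_TIMESPAN = re.compile(r'^(-)?P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$')

def timespan_to_months(timespan):
    """Convert a duration such as 'P0Y4M' to whole months (days are ignored)."""
    match = _TIMESPAN.match(timespan)
    if not match:
        raise ValueError(f"unrecognised timespan: {timespan!r}")
    sign, years, months, _days = match.groups()
    total = int(years or 0) * 12 + int(months or 0)
    return -total if sign else total

print(timespan_to_months('P0Y4M'))
```

Applied to each record, e.g. `timespan_to_months(r['timespan'])`, this gives a number that can be sorted or averaged.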

So the citing DOI is accessible under the key "citing".

In [80]:
response_dict[0].get('citing')

'10.1140/epje/i2018-11618-7'

In [85]:
for k in range(0, len(response_dict)):
    doi = response_dict[k].get('citing')
    if(doi):
        print(k, doi)
        # scihub_download(doi, out=out)
    else:
        print("---------------------------")

0 10.1140/epje/i2018-11618-7
1 10.1098/rspa.2018.0290
2 10.1098/rspa.2019.0637
3 10.1103/physrevfluids.5.083604
4 10.1098/rspa.2020.0573
5 10.1098/rspa.2020.0691


It works for both citations and references.


In [65]:
API_CALL_REF = "https://opencitations.net/index/coci/api/v1/references/"
API_CALL = API_CALL_REF + DOI
response = requests.get(API_CALL, headers = HTTP_HEADERS)
response_dict = json.loads(response.text)

print(json.dumps(response_dict, sort_keys=True, indent=4))

[
    {
        "author_sc": "no",
        "cited": "10.1006/jcph.1999.6345",
        "citing": "10.1017/jfm.2017.541",
        "creation": "2017-09-05",
        "journal_sc": "no",
        "oci": "0200100010736191522370200010737050401-02001000006361912251737010909093706030405",
        "timespan": "P17Y10M"
    },
    {
        "author_sc": "no",
        "cited": "10.1007/978-3-662-05441-3",
        "citing": "10.1017/jfm.2017.541",
        "creation": "2017-09-05",
        "journal_sc": "no",
        "oci": "0200100010736191522370200010737050401-02001000007360907086303630606026300050404016303",
        "timespan": "P13Y"
    },
    {
        "author_sc": "no",
        "cited": "10.1007/b98879",
        "citing": "10.1017/jfm.2017.541",
        "creation": "2017-09-05",
        "journal_sc": "no",
        "oci": "0200100010736191522370200010737050401-0200100000736110908080709",
        "timespan": "P14Y"
    },
    {
        "author_sc": "no",
        "cited": "10.1016/j.cis.2012.07.0

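When DOIs from Crossref and from COCI are compared, note that DOIs are case-insensitive: COCI returned '10.1103/physrevfluids.5.083604' in lower case, while the Crossref reference list contains mixed case such as '10.2118/164891-PA'. A sketch of a normalised comparison follows; the helper name `normalize_doi` is hypothetical, and the two lists are small hand-picked subsets of the outputs above, so the differences they show are illustrative only.

```python
def normalize_doi(doi):
    """Lower-case a DOI and drop a leading resolver URL, if present."""
    doi = doi.strip().lower()
    for prefix in ('https://doi.org/', 'http://doi.org/',
                   'https://dx.doi.org/', 'http://dx.doi.org/'):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi

# Hand-picked subsets of the DOIs printed earlier (not the full lists).
crossref_dois = ['10.2118/164891-PA', '10.1006/jcph.1999.6345', '10.1007/b98879']
coci_dois = ['10.1006/jcph.1999.6345', '10.1007/978-3-662-05441-3', '10.1007/b98879']

crossref_set = {normalize_doi(d) for d in crossref_dois}
coci_set = {normalize_doi(d) for d in coci_dois}

print('in both:      ', sorted(crossref_set & coci_set))
print('Crossref only:', sorted(crossref_set - coci_set))
print('COCI only:    ', sorted(coci_set - crossref_set))
```
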
Here are some possibilities for queries with "crossrefapi":

In [41]:
works = Works()
works.query('design thinking').url

'https://api.crossref.org/works?query=design+thinking'

In [42]:
works.query('design thinking').filter(from_online_pub_date='2020').count()

177241

In [46]:
wq=works.query('design thinking').filter(from_online_pub_date='2020')
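
As far as the crossrefapi README describes, such a query object can be iterated and fetches its results page by page. With 177241 hits it makes sense to look at only the first few. The sketch below works on any iterable of records; the helper name `first_titles` is hypothetical, the stand-in records and their titles are made up, and it assumes that 'title' is a list of strings.

```python
from itertools import islice

def first_titles(results, n=5):
    """Return the first title of up to n work records from any iterable."""
    titles = []
    for item in islice(results, n):
        title_list = item.get('title') or ['(no title)']
        titles.append(title_list[0])
    return titles

# Stand-in records with made-up content, used instead of a live query.
fake_results = [
    {'title': ['First example']},
    {'title': []},
    {'DOI': '10.0000/example'},
]
print(first_titles(fake_results, n=2))
```

With the live query this would be called as `first_titles(wq, n=5)`.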


Other demos
* https://pypi.org/project/crossrefapi/1.0.3/
* https://github.com/CrossRef/rest-api-doc/blob/master/demos/crossref-api-demo.ipynb
