# DOI Negotiation

## 1 Introduction

DOIs provide a persistent link to content. They identify many types of work, from journal articles to research data sets. Typically, someone interacting with DOIs will be a researcher, who will resolve DOIs found in scholarly references to content using a DOI resolver. Such researchers may not even realise they are using DOIs and a DOI resolver since they may follow links with embedded DOIs.

Yet DOIs can provide more than a permanent, indirect link to content. DOI registration agencies such as CrossRef, DataCite and mEDRA collect bibliographic metadata about the works they link to. This metadata can be retrieved from a DOI resolver too, using content negotiation to request a particular representation of the metadata.

For some DOIs content negotiation can be used to retrieve different representations of a work. For example, some DataCite DOIs identify data sets that may be available in a number of data formats and container formats.



## 2 Redirection

The DOI resolver at doi.org will normally redirect a user to the resource location of a DOI. For example, the DOI "10.1126/science.169.3946.635" redirects to a landing page describing the article, "The Structure of Ordinary Water". Content negotiated requests to doi.org that ask for a content type which isn't "text/html" will be redirected to a metadata service hosted by the DOI's registration agency. CrossRef, DataCite and mEDRA support content negotiated DOIs via https://data.crossref.org, https://data.datacite.org and http://data.medra.org respectively.

<div class="alert alert-warning" role="alert" style="margin: 10px">

       GET "Accept: text/html"
https://doi.org/10.1126/science.169.3946.635<br>

                   |<br>
                   |<br>
                   |<br>
                   V<br>
<br>
       Publisher landing page
https://www.sciencemag.org/content/169/3946/635
</div>

Normal browser requests or explicit requests for text/html redirect to the content's landing page.

<div class="alert alert-warning" role="alert" style="margin: 10px">

             GET "Accept: application/rdf+xml"
https://doi.org/10.1126/science.169.3946.635<br>

                   |<br>
                   |<br>
                   |<br>
                   V<br>
<br>
CrossRef metadata service
http://data.crossref.org/10.1126/science.169.3946.635
</div>

Requests for a data type redirect to a registration agency's metadata service.
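
The redirection can be observed directly by asking the HTTP client not to follow redirects and reading the `Location` header of the first response. The following is a minimal sketch: `first_redirect` is a hypothetical helper, not part of any library, and it takes the HTTP function as an argument so that `requests.get` (or any callable with the same keyword interface) can be passed in.

```python
def first_redirect(url, accept, http_get):
    """Return (status code, Location header) of the first response.

    http_get is assumed to accept the same keyword arguments as requests.get;
    redirects are deliberately not followed so the target can be inspected.
    """
    response = http_get(url, headers={"Accept": accept}, allow_redirects=False)
    return response.status_code, response.headers.get("Location")
```

With the `requests` library imported, calling `first_redirect("https://doi.org/10.1126/science.169.3946.635", "application/rdf+xml", requests.get)` should show where the resolver sends a metadata request. The actual target may differ from the address in the diagram above.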

## 3 What is Content Negotiation?

Content negotiation allows a user to request a particular representation of a web resource. DOI resolvers use content negotiation to provide different representations of metadata associated with DOIs.

A content negotiated request to a DOI resolver is much like a standard HTTP request, except server-driven negotiation will take place based on the list of acceptable content types a client provides.

### 3.1 The Accept Header

Making a content negotiated request requires an HTTP header, "Accept". This header lists the content types that are acceptable to the client (those that it knows how to parse), each with an optional "quality" value ("q") indicating its relative suitability. For example, a client that wishes to receive citeproc JSON if it is available, but which can also handle RDF XML if citeproc JSON is unavailable, would make a request with an Accept header listing both "application/vnd.citationstyles.csl+json" and "application/rdf+xml":

In [None]:
import requests

url = "https://doi.org/10.1126/science.169.3946.635"  # DOI resolver URL
# Acceptable response types: CSL JSON preferred (q=1.0), RDF XML as a fallback (q=0.5)
headers = {'Accept': 'application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0'}
r = requests.get(url, headers=headers)  # GET with the Accept header; redirects are followed
print(r.status_code)
r.json()  # Parse the response body as JSON
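
Writing the Accept header by hand is error-prone once several types and "q" values are involved. As a sketch only, the header can be assembled from a mapping of content type to "q" value. `build_accept_header` is a hypothetical helper introduced here for illustration.

```python
def build_accept_header(preferences):
    """Build an Accept header value from a {content_type: q} mapping.

    Types are listed from most to least preferred. A q value of 1.0 is the
    default in HTTP, so it is left implicit.
    """
    ordered = sorted(preferences.items(), key=lambda item: item[1], reverse=True)
    parts = []
    for content_type, q in ordered:
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"q value out of range for {content_type}: {q}")
        parts.append(content_type if q == 1.0 else f"{content_type};q={q:g}")
    return ", ".join(parts)


header = build_accept_header({
    "application/rdf+xml": 0.5,
    "application/vnd.citationstyles.csl+json": 1.0,
})
print(header)
```

The resulting string can be passed as the value of `'Accept'` in the `headers` dictionary used above.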


## JSON 
When exchanging data between a browser and a server, the data can only be text.

JSON is text, and we can convert any JavaScript object into JSON, and send JSON to the server.

We can also convert any JSON received from the server into JavaScript objects.

This way we can work with the data as JavaScript objects, with no complicated parsing and translations.

Several Python libraries can handle JSON objects and files, so the information is easy to parse. From the previous request, we can take the response in JSON format and store it in a variable:

In [None]:
import requests
import json

url = "https://doi.org/10.1126/science.169.3946.635"  # DOI resolver URL
# Acceptable response types: CSL JSON preferred, RDF XML as a fallback
headers = {'Accept': 'application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0'}
r = requests.get(url, headers=headers)  # GET with the Accept header
print(r.status_code)
data = json.loads(r.text)  # data is now a Python dict parsed from the JSON text
print(data)

In order to know the different elements in the JSON, we can run a loop:

In [None]:
for elem in data:
    print(elem)

For getting the value:

In [None]:
data['URL']

And combine both:

In [None]:
for elem in data:
    print(elem,":", data[elem])

### 3.2 Response Codes

| Code | Meaning |
|------|---------|
| 200 | The request was OK. |
| 204 | The request was OK but there was no metadata available. |
| 404 | The DOI requested doesn't exist. |
| 406 | Can't serve any requested content type. |

Individual RA metadata services may use additional response codes, but they will always use the codes above in the cases described.
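
In a script it helps to turn the status code into a readable message before trying to parse the body. The sketch below simply encodes the four codes listed above; `describe_status` is a hypothetical helper, and any other code is reported as unlisted rather than interpreted.

```python
# Meanings of the response codes listed in this section
MEANINGS = {
    200: "The request was OK.",
    204: "The request was OK but there was no metadata available.",
    404: "The DOI requested doesn't exist.",
    406: "Can't serve any requested content type.",
}


def describe_status(status_code):
    """Return the documented meaning of a code, or a note that it is unlisted."""
    return MEANINGS.get(status_code, f"Unlisted response code: {status_code}")


print(describe_status(406))
```

After a request, `describe_status(r.status_code)` can be printed before calling `r.json()`, which would fail on an empty 204 response.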

If a DOI supports several of the content types specified by the client, the content type with the highest "q" value (or, if no "q" values are specified, the one that appears first in the "Accept" header) will be returned.
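
The selection rule above can be sketched in a few lines. This is an illustration of the rule as stated, not the code any registration agency actually runs: `parse_accept` and `choose_content_type` are hypothetical names, wildcards such as `*/*` are not handled, and parameters other than "q" are ignored.

```python
def parse_accept(header):
    """Return a list of (content_type, q) pairs in the order they appear."""
    entries = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        content_type = fields[0]
        if not content_type:
            continue
        q = 1.0  # HTTP default when no q value is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        entries.append((content_type, q))
    return entries


def choose_content_type(header, supported):
    """Pick the supported type with the highest q; ties go to the first listed."""
    candidates = [(ct, q) for ct, q in parse_accept(header) if ct in supported and q > 0]
    if not candidates:
        return None  # corresponds to a 406 response
    # max() returns the first maximal element, which preserves header order on ties
    return max(candidates, key=lambda pair: pair[1])[0]


supported = {"application/rdf+xml", "application/vnd.citationstyles.csl+json"}
print(choose_content_type(
    "application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0",
    supported,
))
```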



## 4 Formatted Citations

CrossRef, DataCite and mEDRA support formatted citations via the text/x-bibliography content type. These are the output of the Citation Style Language processor, citeproc-js. The content type can take two additional parameters to customise its response format. A "style" can be chosen from the list of style names found in the CSL style repository, and a "locale" from the list of locale names found in the CSL locales repository. Many styles are supported, including common styles such as apa and harvard3:

In [None]:
import requests

url = "https://doi.org/10.1126/science.169.3946.635"  # DOI resolver URL
headers = {'Accept': 'text/x-bibliography; style=apa'}  # Formatted citation in APA style
r = requests.get(url, headers=headers)  # GET with the Accept header
print(r.status_code)
print(r.text)
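
To switch between citation styles without retyping the header, the Accept value can be assembled from its parts. This is a minimal sketch: `bibliography_accept` is a hypothetical helper, and the second parameter is assumed to be named `locale`, following the CrossRef content negotiation documentation.

```python
def bibliography_accept(style=None, locale=None):
    """Build an Accept value for text/x-bibliography with optional parameters."""
    parts = ["text/x-bibliography"]
    if style:
        parts.append(f"style={style}")
    if locale:
        # Assumption: the locale parameter is spelled "locale"
        parts.append(f"locale={locale}")
    return "; ".join(parts)


print(bibliography_accept(style="apa"))
print(bibliography_accept(style="harvard3", locale="fr-FR"))
```

The returned string is meant to be used as the `'Accept'` value in the `headers` dictionary of the request above.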

### Let's try with a DOI at Zenodo

In [None]:
url = "https://doi.org/10.5281/zenodo.842715"  # DOI resolver URL for a Zenodo record
# Acceptable response types: CSL JSON preferred, RDF XML as a fallback
headers = {'Accept': 'application/rdf+xml;q=0.5, application/vnd.citationstyles.csl+json;q=1.0'}
r = requests.get(url, headers=headers)  # GET with the Accept header
print(r.status_code)
print(r.text)

## Exercise 1
Show title and description

# Resolving PIDs
With Handle

By default, the Handle proxy redirects you to the address stored in the URL field of the PID:

In [None]:
import requests

url = "http://hdl.handle.net/1895.22/1013" #PID solver URL
r = requests.get(url) #GET
print(r.status_code)
print(r.text)

The Handle System proxy offers several options, documented at:

http://www.handle.net/proxy_servlet.html

For example, we can tell the server not to redirect to URL field:

In [None]:
import requests

url = "http://hdl.handle.net/1895.22/1013?noredirect"  # PID URL with ?noredirect
r = requests.get(url)  # GET without redirection to the URL field
print(r.status_code)
print(r.text)

### Query Parameters

The proxy server REST API is CORS-compliant; JSONP callbacks are also supported through a "callback" query parameter.

The presence of the "pretty" query parameter instructs the server to pretty-print the JSON output.

The "auth" query parameter instructs the proxy server to bypass its cache and query a primary handle server directly for the newest handle data.

The "cert" query parameter instructs the proxy server to request an authenticated response from the source handle server. Not generally needed by end users.

The "type" and "index" query parameters allow the resolution response to be restricted to specific types and indexes of interest. Multiple "type" and "index" parameters are allowed, and values are returned which match any of the specified types or indexes.

For example, http://hdl.handle.net/api/handles/4263537/4000?type=URL&type=EMAIL&callback=processResponse yields the response:

```JSON
processResponse({
   "responseCode":1,
   "handle":"4263537/4000",
   "values":[
      {
         "index":1,
         "type":"URL",
         "data":{ "format":"string", "value":"http://www.handle.net/index.html" },
         "ttl":86400,
         "timestamp":"2001-11-21T16:21:35Z"
      },
      {
         "index":2,
         "type":"EMAIL",
         "data":{ "format":"string", "value":"hdladmin@cnri.reston.va.us" },
         "ttl":86400,
         "timestamp":"2000-04-10T22:41:46Z"
      }
   ]
});
```
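
Query strings with repeated "type" or "index" parameters are easy to mistype, so it can be convenient to build them programmatically. The following is a sketch under stated assumptions: `handle_api_url` is a hypothetical helper, and the "pretty" and "auth" flags are emitted as bare parameter names because the text above only says that their presence matters.

```python
from urllib.parse import urlencode

HANDLE_API = "http://hdl.handle.net/api/handles/"


def handle_api_url(handle, types=(), indexes=(), pretty=False, auth=False, callback=None):
    """Build a Handle proxy REST API URL with the query parameters described above."""
    query = [("type", t) for t in types] + [("index", i) for i in indexes]
    if callback:
        query.append(("callback", callback))
    # Flags whose presence alone changes the server's behaviour
    flags = [name for name, enabled in (("pretty", pretty), ("auth", auth)) if enabled]
    query_string = "&".join(filter(None, [urlencode(query), "&".join(flags)]))
    return HANDLE_API + handle + ("?" + query_string if query_string else "")


print(handle_api_url("4263537/4000", types=["URL", "EMAIL"], callback="processResponse"))
```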

In [None]:
import requests

# Handle API URL restricted to URL and EMAIL values, wrapped in a JSONP callback
url = "http://hdl.handle.net/api/handles/4263537/4000?type=URL&type=EMAIL&callback=processResponse"
r = requests.get(url)  # GET; no request body is sent, so no Content-Type header is needed
print(r.text)
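
A JSONP response is not valid JSON until the callback wrapper is removed. The sketch below strips the wrapper and then collects the values of a given type. Both function names are hypothetical, and the sample string is an abridged copy of the response shown earlier, not a live result.

```python
import json
import re


def unwrap_jsonp(text):
    """Parse JSON that may be wrapped in a JSONP callback such as name({...});"""
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", text, flags=re.DOTALL)
    payload = match.group(1) if match else text  # plain JSON passes through unchanged
    return json.loads(payload)


def values_of_type(record, value_type):
    """Return the data values of all handle entries with the given type."""
    return [
        entry["data"]["value"]
        for entry in record.get("values", [])
        if entry.get("type") == value_type
    ]


# Abridged from the example response in this section
sample = (
    'processResponse({"responseCode":1,"handle":"4263537/4000","values":['
    '{"index":1,"type":"URL","data":{"format":"string",'
    '"value":"http://www.handle.net/index.html"},"ttl":86400,'
    '"timestamp":"2001-11-21T16:21:35Z"}]});'
)
record = unwrap_jsonp(sample)
print(values_of_type(record, "URL"))
```

If the "callback" parameter is omitted from the request, the body is plain JSON and `r.json()` is enough.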


# Exercise 2

* 1: Try to find *TWO* different repositories that manage PIDs or DOIs (e.g. figshare.com, DataONE)
* 2: Check an example file or the documentation to get the identifier of a resource
* 3: Resolve the identifier using the Python `requests` library
* 4: Show relevant information, such as author, title, description...
* 5: Is there any difference in how the different repositories manage DOIs/PIDs?