# Creating test DOIs with the DataCite API

```{admonition} TODO
- [ ] Get section reviewed by a beta reader
- [ ] Add "where next?" links
```

This notebook shows how to use the Python `faker` package to randomly generate DOI metadata for testing purposes, how to create DOIs in the DataCite test service using that metadata, and how to update them afterwards.

## Setting up

Because we are making changes, we will need to **authenticate** ourselves (prove that we are permitted to make these changes) by providing a Repository ID and password. Rather than saving them in this notebook directly (which would reveal them to _anyone_ with access to the notebook), we instead store them in a separate file called `.env`. `.env` files are a common way of doing this and are supported in Python by the [`python-dotenv`](https://saurabh-kumar.com/python-dotenv/) library. You can specify your own ID and password by creating a file called `.env` in the same folder as this notebook, with contents similar to the following:

```sh
REPO_ID=<repository ID here>
REPO_PW=<repository password here>
```

We load these into Python as follows:

```python
from dotenv import load_dotenv
from os import getenv

load_dotenv()
REPO_ID = getenv('REPO_ID')
REPO_PW = getenv('REPO_PW')
```
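`getenv` returns `None` for a variable that isn't set, so a typo in `.env` would only show up later as a confusing authentication failure. A minimal fail-fast check might look like the sketch below. `require_env` is a hypothetical helper for illustration, not part of `python-dotenv`.

```python
import os


def require_env(*names):
    """Return the values of the given environment variables as a tuple.

    Hypothetical helper: raises RuntimeError naming every variable that is
    unset or empty, instead of letting a None slip through to the API calls.
    """
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError("Missing settings (check your .env file): " + ", ".join(missing))
    return tuple(os.environ[name] for name in names)


# Usage, after load_dotenv() has run:
# REPO_ID, REPO_PW = require_env('REPO_ID', 'REPO_PW')
```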

We access the DataCite API using Hypertext Transfer Protocol (HTTP), the same protocol that your web browser uses to load this page! There are a number of Python libraries that make this easier, and here we will use one of the most popular, [Requests](https://requests.readthedocs.io/en/latest/). This is the same library used by the example Python code in the [DataCite REST API reference](https://support.datacite.org/reference/introduction).

To avoid having to specify the ID and password with every request we make to the API, we create a Session object holding persistent parameters that will be added to every request we make:

```python
import requests

s = requests.Session()
s.auth = (REPO_ID, REPO_PW)
```
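Setting `s.auth` to a `(username, password)` tuple tells Requests to use HTTP Basic authentication, which sends an `Authorization` header with every request made through the session. The standard-library sketch below shows how such a header value is formed, following RFC 7617. `basic_auth_header` is a hypothetical name used only for illustration; Requests builds the header for you. It also shows why the API should only be used over HTTPS: the credentials are merely Base64-encoded, not encrypted.

```python
import base64


def basic_auth_header(user_id, password):
    """Return an HTTP Basic 'Authorization' header value for the given credentials.

    Illustration only: Requests computes an equivalent header automatically
    when `s.auth` is set, so the notebook never needs to call this.
    """
    credentials = f"{user_id}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# The example credentials from RFC 7617:
print(basic_auth_header("Aladdin", "open sesame"))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```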

## Using the API

Now we're ready to make our first API call! We do this by sending an HTTP `GET` request to a specific URL, which includes the name of the API server (`api.test.datacite.org` for testing), the name of the "endpoint" (`/clients`) and our repository ID converted to lower case. This returns some useful information about the Repository, including the list of available prefixes.

_N.B. DataCite used to refer to Repositories as "clients" and this terminology is still present in the API for backward compatibility with older code._

```python
response = s.get(f'https://api.test.datacite.org/clients/{REPO_ID.lower()}')
response.raise_for_status()  # stop with an error if the request failed
client_info = response.json()

print("Available prefixes:")
print("\n".join(p['id'] for p in client_info['data']['relationships']['prefixes']['data']))
```

```
Available prefixes:
10.80604
```

OK! Our access works: we've successfully fetched the list of prefixes that we're allowed to use with this Repository. Usually, as here, there is only one, but occasionally you may find several.
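If your Repository does have several prefixes, it's safer to choose one deliberately than to take whichever happens to come first. Below is a small sketch, assuming the same response structure as the cell above. `prefixes_from_client_info` and `choose_prefix` are hypothetical helpers, and the `sample` dictionary is a hand-made stand-in containing only the keys this notebook reads, not a real API response.

```python
def prefixes_from_client_info(client_info):
    """Return the prefix IDs listed in a /clients API response (already parsed from JSON)."""
    return [p["id"] for p in client_info["data"]["relationships"]["prefixes"]["data"]]


def choose_prefix(client_info, preferred=None):
    """Pick a prefix: the preferred one if given, otherwise the only one available.

    Raises ValueError rather than guessing when the choice is ambiguous.
    """
    prefixes = prefixes_from_client_info(client_info)
    if preferred is not None:
        if preferred not in prefixes:
            raise ValueError(f"{preferred} is not one of {prefixes}")
        return preferred
    if len(prefixes) != 1:
        raise ValueError(f"Expected exactly one prefix, found {prefixes}; pass preferred=...")
    return prefixes[0]


# Hand-made stand-in with only the keys used above (not real API output):
sample = {"data": {"relationships": {"prefixes": {"data": [{"id": "10.80604"}]}}}}
print(choose_prefix(sample))  # 10.80604
```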

## Creating new DOIs

For convenience, we'll define a few more parameters that we can reuse later, so that we don't have to repeat ourselves (and risk making a mistake):

```python
DOI_COUNT = 5  # how many test DOIs to create
PREFIX = client_info['data']['relationships']['prefixes']['data'][0]['id']  # first (usually only) prefix
PUBLISHER = "University of Poppleton"
PUB_YEAR = 2021
```

We're creating test DOIs, so the metadata for these DOIs could be any nonsense we choose. However, it'll be easier to spot problems if we choose "real-looking" data. Let's use a Python library called [Faker](https://faker.readthedocs.io/en/master/), which provides utilities for randomly generating many types of text, such as names, addresses, sentences and paragraphs.

```python
from faker import Faker
```

Next we'll define a function that fills in a template to generate a minimal set of valid metadata, given only a prefix, publisher and publication year. Breaking your code up into reusable pieces like this is good practice: it minimises the chance of errors from duplicated code, and gives you smaller units that are easier to understand at a glance.

We will send this data to DataCite in JSON format, but for convenience we can write it out in plain Python code as a dictionary and let Requests convert it to JSON behind the scenes. The structure of this data is as shown in the [Add a new DOI](https://support.datacite.org/reference/post_dois) reference documentation.

- Setting `event` to `"publish"` will immediately publish the DOI in the Findable state if there are no errors in the metadata; it can instead be set to `"register"` for a Registered DOI (resolvable but not returned in search results), or left out entirely to create a Draft DOI (only visible in Fabrica). The `"hide"` event is used to move an existing Findable DOI back to the Registered state
- Setting `prefix` and not specifying the DOI will cause the API to generate a random DOI suffix for us, of the form `XXXX-XXXX`
- There are many more attributes that can be specified (see the [DataCite Schema](http://schema.datacite.org/) for a full list); this is the minimal set required to publish a DOI
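Because the `event` attribute decides which state a new DOI ends up in, it can be handy to set it explicitly instead of hard-coding `"publish"`. This is a minimal sketch, assuming DataCite's documented behaviour that omitting `event` creates a Draft, `"register"` creates a Registered DOI and `"publish"` a Findable one. `with_state` and `STATE_EVENTS` are hypothetical names; the function works on a payload dictionary like the one built in the next cell and returns a modified copy.

```python
import copy

# Assumed mapping from target DOI state to the `event` value sent to the API.
# None means "omit the event attribute", which leaves a new DOI as a Draft.
STATE_EVENTS = {"draft": None, "registered": "register", "findable": "publish"}


def with_state(payload, state):
    """Return a copy of a DOI payload whose `event` attribute targets the given state."""
    new_payload = copy.deepcopy(payload)  # leave the caller's dictionary untouched
    attributes = new_payload["data"]["attributes"]
    event = STATE_EVENTS[state]
    if event is None:
        attributes.pop("event", None)
    else:
        attributes["event"] = event
    return new_payload


base = {"data": {"type": "dois", "attributes": {"event": "publish", "prefix": "10.80604"}}}
print(with_state(base, "draft")["data"]["attributes"])  # {'prefix': '10.80604'}
```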

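```python
def make_fake_doi(prefix, publisher, pub_year, fake=Faker()):
    """Return a minimal DOI payload filled with random, realistic-looking metadata.

    The default Faker instance is created once and shared between calls.
    """
    return {
      "data": {
        "type": "dois",
        "attributes": {
          "event": "publish",  # create and publish the DOI straight away
          "prefix": prefix,  # no "doi" given, so the API generates a suffix
          "creators": [{"name": fake.name()} for _ in range(3)],  # three random authors
          "titles": [{"title": fake.sentence()[:-1]}],  # random sentence minus its final full stop
          "publisher": publisher,
          "publicationYear": pub_year,
          "types": {
            "resourceType": "Report",
            "resourceTypeGeneral": "Text",
          },
          "url": fake.url(schemes=('https',)),  # random https landing-page URL
          "schemaVersion": "http://datacite.org/schema/kernel-4",
        }
      }
    }
```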
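Mistakes in a payload only surface when the API rejects it, so it can be worth a quick local check before sending anything. This is a rough sketch: `REQUIRED_ATTRIBUTES` is an assumption copied from the attributes that `make_fake_doi` fills in, not an authoritative list from the DataCite Schema, and `missing_attributes` is a hypothetical helper.

```python
# Assumed list, copied from the attributes that make_fake_doi() fills in.
REQUIRED_ATTRIBUTES = ("creators", "titles", "publisher", "publicationYear", "types", "url")


def missing_attributes(payload, required=REQUIRED_ATTRIBUTES):
    """Return the required attribute names that are absent or empty in a DOI payload."""
    attributes = payload.get("data", {}).get("attributes", {})
    # `in` compares by equality, so empty strings, lists and dicts all count as missing
    return [name for name in required if attributes.get(name) in (None, "", [], {})]


incomplete = {"data": {"type": "dois", "attributes": {"prefix": "10.80604", "titles": []}}}
print(missing_attributes(incomplete))
# ['creators', 'titles', 'publisher', 'publicationYear', 'types', 'url']
```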

Now we're ready to make some DOIs! To create a new DOI we send another HTTP request, this time a `POST` request to the `/dois` endpoint, with the metadata in JSON format. The response comes back as JSON too, which we need to convert to a Python dictionary to inspect. This is such a common pattern that Requests handles both conversions for us: passing `json=params` encodes the dictionary as JSON, and `response.json()` decodes the reply. Since we didn't explicitly set a suffix, the API will have generated one for us, and we can pull the full DOI from the `id` field in the returned data; we collect these new DOIs in the `test_dois` list.

```python
test_dois = []
for _ in range(DOI_COUNT):
    params = make_fake_doi(PREFIX, PUBLISHER, PUB_YEAR)
    response = s.post('https://api.test.datacite.org/dois', json=params)
    response.raise_for_status()  # raise an exception if the API rejected the request
    doi_info = response.json()
    new_doi = doi_info['data']['id']  # full DOI: "<prefix>/<generated suffix>"
    test_dois.append(new_doi)
```
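`raise_for_status()` only reports the HTTP status code, which says little about *which* part of the metadata was rejected. DataCite's REST API follows JSON:API conventions, so an error response will usually carry an `errors` list in its body. The sketch below assumes each entry may have `source` and `title` fields and tolerates them being absent; `describe_errors` is a hypothetical helper that takes the already-parsed JSON body, and the `example` body is hand-made, not captured from the API.

```python
def describe_errors(body):
    """Summarise a JSON:API-style error body as a list of readable lines.

    Assumes entries shaped like {"source": ..., "title": ...}; missing keys are tolerated.
    """
    errors = body.get("errors") if isinstance(body, dict) else None
    if not errors:
        return ["(no error details in response body)"]
    lines = []
    for error in errors:
        source = error.get("source", "?")
        title = error.get("title", "unknown error")
        lines.append(f"{source}: {title}")
    return lines


# Hand-made example body (not captured from the API):
example = {"errors": [{"source": "url", "title": "example message"}]}
print("\n".join(describe_errors(example)))  # url: example message
```

In the creation loop, one option would be to replace the `raise_for_status()` line with something like `if not response.ok: raise RuntimeError("\n".join(describe_errors(response.json())))`, bearing in mind that `response.json()` will itself fail if an error body is not JSON.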

This hasn't produced any output, but it hasn't raised any errors either: `raise_for_status()` would have stopped the loop if any request had failed, so all the DOIs should have been created. To be sure, let's fetch them back from the API with a `GET` request to `/dois/<doi>` for each one, and display some of their metadata.

```python
for doi in test_dois:
    response = s.get(f'https://api.test.datacite.org/dois/{doi}')
    response.raise_for_status()
    doi_data = response.json()['data']
    print(doi_data['id'],
          doi_data['attributes']['url'],
          doi_data['attributes']['titles'][0]['title'])
```

```
10.80604/01b6-3f48 https://www.ortega.com/ Research land give threat population
10.80604/t38p-x492 https://diaz.net/ Picture within challenge themselves
10.80604/yg56-7v87 https://www.harris.com/ Realize the anything movie
10.80604/7tke-wf62 https://adams.info/ View citizen public answer take
10.80604/cr5f-xd90 https://www.ramirez.com/ Order radio drug establish
```

Yes, that looks promising. It's also possible to check them by logging into [Fabrica Test](https://doi.test.datacite.org) on the web.

## Updating DOIs

The final thing we'll do is update existing DOIs. Here we'll simply give each DOI another fake URL, but in practice we would usually know what changes we want to make to a set of DOIs, and specify them in one of two ways:

1. Create a spreadsheet listing the new metadata for each DOI we want to update, and read it into Python using a library like Pandas before making the changes:
    - `updates = pandas.read_excel("DOI_updates.xlsx")`
2. Specify a transformation directly using Python, such as replacing `pure.poppleton.ac.uk` with `openaccess.poppleton.ac.uk` in every URL:
    - `new_url = old_url.replace("pure.poppleton.ac.uk", "openaccess.poppleton.ac.uk")`
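The second approach can be made a little more careful: a plain `str.replace` would also change the hostname wherever else it appears in the URL, for instance in a query string. The sketch below uses the standard library's `urllib.parse` to touch only the host part, and builds an update payload in the same shape the update cell below uses. `rehost` and `url_update_payload` are hypothetical names, and the Poppleton URL path is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def rehost(url, old_host, new_host):
    """Swap the host of `url` from old_host to new_host, keeping scheme, port, path and query.

    URLs whose host is not exactly old_host are returned unchanged.
    Note: any user:password@ part of the URL is dropped.
    """
    parts = urlsplit(url)
    if parts.hostname != old_host.lower():  # .hostname is always lower-cased
        return url
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


def url_update_payload(new_url):
    """Payload for a PUT to /dois/<prefix>/<suffix> that changes only the URL."""
    return {"data": {"attributes": {"url": new_url}}}


old = "https://pure.poppleton.ac.uk/portal/en/publications/123?ref=pure.poppleton.ac.uk"
print(rehost(old, "pure.poppleton.ac.uk", "openaccess.poppleton.ac.uk"))
# https://openaccess.poppleton.ac.uk/portal/en/publications/123?ref=pure.poppleton.ac.uk
```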
    
Updating an existing DOI requires a `PUT` request to the endpoint for that particular DOI, `/dois/<prefix>/<suffix>` (where `<prefix>` and `<suffix>` are replaced with the actual prefix and suffix of the DOI). Here we send only the attribute we want to change, the `url`:

```python
fake = Faker()

# Give each test DOI a new random URL
for doi in test_dois:
    new_url = fake.url(schemes=('https',))
    print("Updating", doi, "to", new_url)
    update_params = {'data': {'attributes': {'url': new_url}}}
    response = s.put(f'https://api.test.datacite.org/dois/{doi}', json=update_params)
    response.raise_for_status()

# Read the DOIs back to confirm the change
for doi in test_dois:
    response = s.get(f'https://api.test.datacite.org/dois/{doi}')
    response.raise_for_status()
    doi_data = response.json()['data']
    print(doi_data['id'], "now points to", doi_data['attributes']['url'])
```

```
Updating 10.80604/apkn-8a80 to https://www.jones.com/
Updating 10.80604/8c3v-4n24 to https://www.george.biz/
Updating 10.80604/5hqm-p271 to https://www.mitchell-harper.com/
Updating 10.80604/qqnn-ng49 to https://petty.info/
Updating 10.80604/0ph3-sf17 to https://spencer.biz/
10.80604/apkn-8a80 now points to https://www.jones.com/
10.80604/8c3v-4n24 now points to https://www.george.biz/
10.80604/5hqm-p271 now points to https://www.mitchell-harper.com/
10.80604/qqnn-ng49 now points to https://petty.info/
10.80604/0ph3-sf17 now points to https://spencer.biz/
```
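The update cell builds the endpoint URL by pasting the DOI straight into an f-string. That works for auto-generated suffixes like the ones above, but DOI suffixes may in general contain characters such as `#` or `?` that have a special meaning in URLs. A cautious sketch that splits a DOI into prefix and suffix and percent-encodes the suffix follows; `split_doi` and `doi_endpoint` are hypothetical helpers, and whether the API expects exactly this encoding (including keeping any `/` inside a suffix as-is) is an assumption worth checking against the test service first.

```python
from urllib.parse import quote

API_BASE = "https://api.test.datacite.org"


def split_doi(doi):
    """Split '10.80604/abcd-1234' into ('10.80604', 'abcd-1234')."""
    prefix, sep, suffix = doi.partition("/")
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"Not a DOI of the form <prefix>/<suffix>: {doi!r}")
    return prefix, suffix


def doi_endpoint(doi, base=API_BASE):
    """URL of the /dois/<prefix>/<suffix> endpoint, with the suffix percent-encoded.

    Slashes inside the suffix are deliberately left unencoded (an assumption).
    """
    prefix, suffix = split_doi(doi)
    return f"{base}/dois/{prefix}/{quote(suffix, safe='/')}"


print(doi_endpoint("10.80604/apkn-8a80"))  # https://api.test.datacite.org/dois/10.80604/apkn-8a80
```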


## What's next?

To learn more about the DataCite API in detail, take a look at {doc}`../resources`.