# Working with OAI-PMH

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
provides an XML-based API. It supports the actions commonly needed for
"metadata harvesting": querying a repository, exporting or gathering
records in standard metadata formats, and grouping content into
collections, or sets.
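
Under the hood, each OAI-PMH action is an HTTP request whose `verb` argument names the operation. The following is a minimal sketch of how such a request URL is put together, using only the standard library. The base URL and the `build_request_url` helper are hypothetical and are shown for illustration only; they are not part of any library.

```python
from urllib.parse import urlencode

# Hypothetical base URL, used only for illustration.
BASE_URL = "https://example.org/oai"


def build_request_url(base_url, verb, **params):
    """Return an OAI-PMH GET request URL for the given verb and arguments."""
    query = {"verb": verb, **params}
    return f"{base_url}?{urlencode(query)}"


# ListRecords requires a metadataPrefix; oai_dc (Dublin Core) is the
# format every OAI-PMH repository must support.
url = build_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
# https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

A client library builds and sends requests like this one, then parses the XML that comes back.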

This notebook explores OAI-PMH with the help of a Python library
called [oaipmh-scythe](https://afuetterer.github.io/oaipmh-scythe/latest/).

In [None]:
# uncomment the following if you need to install
#!python -m pip install oaipmh-scythe

Import the module:

In [None]:
from oaipmh_scythe import Scythe

## Query an OAI-PMH endpoint

As seen below, the endpoint is opened with the `Scythe` class, used here as a context manager. The code then loops through the returned records and prints each record's identifier:

In [None]:
# Open the endpoint as a context manager
with Scythe("http://jajohnst.si676.si.umich.edu/omeka-s/oai") as scythe:
    # ListRecords: iterate over the records the repository exposes
    records = scythe.list_records()
    for record in records:
        print(record.header.identifier)


## Identify the endpoint

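The `identify()` method mirrors the `Identify` verb and returns information about the host repository. This cell reuses the `scythe` object created above: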

In [None]:
repository = scythe.identify()
for info in repository:
    print(info)

## Request metadata formats

Determine which metadata formats can be requested.
The `list_metadata_formats()` method mirrors the `ListMetadataFormats` verb in OAI-PMH:

In [None]:
metadata_formats = scythe.list_metadata_formats()
for metadata_format in metadata_formats:
    print(metadata_format.metadataPrefix)

## Request a single record

To request a single record, pass its OAI identifier to `get_record()`, which mirrors the `GetRecord` verb:

In [None]:
record = scythe.get_record("oai:jajohnst.si676.si.umich.edu:284")

for data in record:
    print(data)
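
The identifier used above appears to follow the optional `oai:<namespace>:<local-id>` convention from the OAI identifier guidelines. Assuming a repository follows that convention (not all do), the parts can be separated with a small helper. The `split_oai_identifier` function below is a hypothetical illustration and is not part of oaipmh-scythe.

```python
def split_oai_identifier(identifier):
    """Split 'oai:<namespace>:<local-id>' into its namespace and local parts."""
    # Split at most twice so that any colons in the local part are preserved.
    scheme, namespace, local_id = identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError(f"not an oai-identifier: {identifier!r}")
    return namespace, local_id


namespace, local_id = split_oai_identifier("oai:jajohnst.si676.si.umich.edu:284")
print(namespace)  # jajohnst.si676.si.umich.edu
print(local_id)   # 284
```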

## Retrieving XML data

OAI-PMH responses are transmitted as XML.
As the examples above show, the oaipmh-scythe library exposes the responses
as Python objects.
However, it is also possible to request the full XML responses
by using an iterator class called `OAIResponseIterator`:

In [None]:
from oaipmh_scythe.iterator import OAIResponseIterator

scythe = Scythe("http://jajohnst.si676.si.umich.edu/omeka-s/oai", iterator=OAIResponseIterator)
responses = scythe.list_records()
next(responses)
# <OAIResponse ListRecords>

The raw XML of a response can then be saved to a local file:

In [None]:
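# .raw is a text string, so open the file in text mode and write it directly.
# Each next() call advances the iterator, so this writes the response after the one shown above.
with open("oai-xml-responses.xml", "w", encoding="utf-8") as f:
    f.write(next(responses).raw)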
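
Once the XML is available, it can be inspected with Python's standard library. The sketch below parses a small, made-up `ListRecords` response (the sample XML and its `example.org` identifiers are illustrative, not real harvested data) and pulls out the record identifiers. The only detail taken from the protocol itself is the OAI-PMH 2.0 namespace.

```python
import xml.etree.ElementTree as ET

# Made-up sample response, trimmed to two record headers for illustration.
SAMPLE_XML = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>
"""

# OAI-PMH elements live in this namespace, so lookups must be qualified.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

root = ET.fromstring(SAMPLE_XML.strip())
identifiers = [
    element.text
    for element in root.iterfind(".//oai:header/oai:identifier", NS)
]
print(identifiers)
# ['oai:example.org:1', 'oai:example.org:2']
```

For a file saved to disk, `ET.parse("oai-xml-responses.xml").getroot()` could take the place of `ET.fromstring(...)`, assuming the file contains a single complete response.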