Polymatheia supports accessing metadata records via the OAI-PMH protocol, a generic protocol designed for harvesting large amounts of metadata from an archive. It was initially intended primarily as an archive-to-archive metadata exchange protocol and, as a result, provides only very limited facilities for filtering data on the server side. Instead, the standard pattern is to download complete data sets from the server and then filter and transform them locally.
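The download-then-filter pattern can be sketched without any network access. The records below are hypothetical plain dictionaries standing in for harvested metadata; the field names `title` and `year` and the helper `filter_records` are assumptions made for illustration, not the structure or API that Polymatheia provides.

```python
def filter_records(records, predicate):
    """Yield only those records for which predicate(record) is true."""
    for record in records:
        if predicate(record):
            yield record


# Hypothetical stand-in for records that have already been downloaded.
harvested = [
    {'title': 'Zeitschrift A', 'year': 1890},
    {'title': 'Zeitschrift B', 'year': 1925},
    {'title': 'Zeitschrift C', 'year': 1931},
]

# Filter locally, after the complete data set has been fetched.
after_1900 = list(filter_records(harvested, lambda record: record['year'] > 1900))
titles = [record['title'] for record in after_1900]
print(titles)
```

Because the filter is a generator, it can also be chained onto a lazily evaluated source without holding every record in memory at once.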
OAI-PMH supports server-side filtering via the concept of a set. To find the available sets, use the polymatheia.data.reader.OAISetReader:
python
from polymatheia.data.reader import OAISetReader
reader = OAISetReader('http://www.digizeitschriften.de/oai2/')
for setSpec in reader:
    print(setSpec)
An OAI-PMH server can provide the same metadata in different formats. To find out which formats the server provides, use the polymatheia.data.reader.OAIMetadataFormatReader:
python
from polymatheia.data.reader import OAIMetadataFormatReader
reader = OAIMetadataFormatReader('http://www.digizeitschriften.de/oai2/')
# Use a name other than "format" to avoid shadowing the Python built-in
for metadata_format in reader:
    print(metadata_format)
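At the protocol level, the list of formats comes from the OAI-PMH `ListMetadataFormats` verb, whose XML response contains one `metadataFormat` element per format. The sketch below parses a hand-written sample response using only the standard library. The sample is not actual output from the server above, and the code is not a description of how Polymatheia parses responses internally.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 response documents.
OAI_NS = {'oai': 'http://www.openarchives.org/OAI/2.0/'}

# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>mets</metadataPrefix>
      <schema>http://www.loc.gov/standards/mets/mets.xsd</schema>
      <metadataNamespace>http://www.loc.gov/METS/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""

root = ET.fromstring(SAMPLE_RESPONSE.strip())

# Collect the prefix of every advertised metadata format.
prefixes = [
    element.text
    for element in root.findall('.//oai:metadataFormat/oai:metadataPrefix', OAI_NS)
]
print(prefixes)
```

The `metadataPrefix` values are the identifiers that a harvesting request later uses to select a format.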
To retrieve all records, use the polymatheia.data.reader.OAIRecordReader:
python
from polymatheia.data.reader import OAIRecordReader
reader = OAIRecordReader('http://www.digizeitschriften.de/oai2/')
for record in reader:
    print(record)
Warning
This will retrieve ALL records provided by the OAI-PMH server, which can take a significant amount of time.
To retrieve only the first n records, pass the max_records parameter:
python
from polymatheia.data.reader import OAIRecordReader
reader = OAIRecordReader('http://www.digizeitschriften.de/oai2/', max_records=10)
for record in reader:
    print(record)
Note
Whether running this code repeatedly returns the same records depends on the OAI-PMH server implementation. Polymatheia cannot guarantee any order.
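Conceptually, limiting a harvest to the first n records amounts to truncating a lazy iterator. The sketch below illustrates that idea with `itertools.islice` over a hypothetical generator, `fake_records`, that stands in for a reader. It is an illustration of the general technique only and makes no claim about how `max_records` is implemented.

```python
from itertools import islice


def fake_records():
    """Hypothetical stand-in for a reader: lazily yields numbered records."""
    number = 0
    while True:
        number += 1
        yield {'identifier': f'record-{number}'}


# Take only the first three items; the generator is never exhausted.
first_three = list(islice(fake_records(), 3))
identifiers = [record['identifier'] for record in first_three]
print(identifiers)
```

The stand-in generator always yields in the same order, which a real OAI-PMH server is not obliged to do.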
To retrieve only those records that are in a set, provide the set specifier via the set_spec parameter:
python
from polymatheia.data.reader import OAIRecordReader
reader = OAIRecordReader('http://www.digizeitschriften.de/oai2/', max_records=10, set_spec='EU')
for record in reader:
    print(record)
To retrieve the records in a specific metadata format, provide the format identifier via the metadata_prefix parameter:
python
from polymatheia.data.reader import OAIRecordReader
reader = OAIRecordReader('http://www.digizeitschriften.de/oai2/', max_records=10, metadata_prefix='mets')
for record in reader:
    print(record)
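In the OAI-PMH protocol itself, set and format selection are expressed as the `set` and `metadataPrefix` query arguments of a `ListRecords` request. The sketch below only builds such a request URL. The helper name `build_list_records_url` and its `oai_dc` default are assumptions made for illustration, and the way Polymatheia constructs its requests internally may differ.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix='oai_dc', set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {'verb': 'ListRecords', 'metadataPrefix': metadata_prefix}
    # The "set" argument is optional and restricts harvesting to one set.
    if set_spec is not None:
        params['set'] = set_spec
    return f'{base_url}?{urlencode(params)}'


url = build_list_records_url(
    'http://www.digizeitschriften.de/oai2/',
    metadata_prefix='mets',
    set_spec='EU',
)
print(url)
```

Nothing is sent over the network here; the URL is only constructed, which makes the mapping between the reader parameters and the protocol arguments easy to inspect.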