# OAI-PMH

The protocol documentation can be found here:

https://www.openarchives.org/OAI/openarchivesprotocol.html

### Libraries required for this exercise

```python
import xml.etree.ElementTree as ET
import requests
```

### Identify

This verb is used to retrieve information about a repository. Some of the information returned is required as part of the OAI-PMH. Repositories may also employ the Identify verb to return additional descriptive information.

```python
import xml.etree.ElementTree as ET
import requests

oai = requests.get('https://zenodo.org/oai2d?verb=Identify')  # Request to the server

# Build the XML tree (parsing the raw bytes lets the parser honour the declared encoding)
xmlTree = ET.ElementTree(ET.fromstring(oai.content))
root = xmlTree.getroot()

# Walk every element; elem.text may be None, so fall back to an empty string
for elem in xmlTree.iter():
    print(elem.tag + ": " + (elem.text or ""))

# './/' searches the whole tree; a leading '//' only works with a FutureWarning
elementos = xmlTree.findall('.//{http://www.openarchives.org/OAI/2.0/}baseURL')
for e in elementos:
    print("BaseURL:", e.text)
```

```
{http://www.openarchives.org/OAI/2.0/}OAI-PMH:

{http://www.openarchives.org/OAI/2.0/}responseDate: 2017-12-14T08:00:18Z
{http://www.openarchives.org/OAI/2.0/}request: https://zenodo.org/oai2d
{http://www.openarchives.org/OAI/2.0/}Identify:

{http://www.openarchives.org/OAI/2.0/}repositoryName: Zenodo
{http://www.openarchives.org/OAI/2.0/}baseURL: https://zenodo.org/oai2d
{http://www.openarchives.org/OAI/2.0/}protocolVersion: 2.0
{http://www.openarchives.org/OAI/2.0/}adminEmail: info@zenodo.org
{http://www.openarchives.org/OAI/2.0/}earliestDatestamp: 2013-05-06T23:27:15Z
{http://www.openarchives.org/OAI/2.0/}deletedRecord: no
{http://www.openarchives.org/OAI/2.0/}granularity: YYYY-MM-DDThh:mm:ssZ
BaseURL: https://zenodo.org/oai2d
```
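Writing the full `{http://www.openarchives.org/OAI/2.0/}` prefix in every tag is verbose and error-prone. ElementTree's `find` and `findtext` accept a `namespaces` dictionary that maps a short prefix to the namespace URI. The sketch below reads the Identify fields that way; the `parse_identify` function, the `oai` prefix and the inline sample response (trimmed from the output above) are illustrative assumptions, not part of the original notebook.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI map; "oai" is just a local alias chosen for readability
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_identify(xml_bytes):
    """Return the main Identify fields as a {tag name: text} dict."""
    root = ET.fromstring(xml_bytes)
    identify = root.find("oai:Identify", NS)
    if identify is None:  # e.g. the repository answered with an <error> element
        return {}
    fields = ("repositoryName", "baseURL", "protocolVersion",
              "adminEmail", "earliestDatestamp", "deletedRecord", "granularity")
    return {name: identify.findtext("oai:" + name, namespaces=NS) for name in fields}


# Hypothetical, trimmed-down Identify response modelled on the output above
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <request verb="Identify">https://zenodo.org/oai2d</request>
  <Identify>
    <repositoryName>Zenodo</repositoryName>
    <baseURL>https://zenodo.org/oai2d</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>info@zenodo.org</adminEmail>
    <earliestDatestamp>2013-05-06T23:27:15Z</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  </Identify>
</OAI-PMH>"""

info = parse_identify(SAMPLE)
print(info["repositoryName"], info["granularity"])

# Against the live endpoint (network access required):
# import requests
# info = parse_identify(requests.get("https://zenodo.org/oai2d", params={"verb": "Identify"}).content)
```

The same `NS` dictionary can be reused for every verb, since all OAI-PMH envelopes share this namespace.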




### List metadata formats

ListMetadataFormats

This verb is used to retrieve the metadata formats available from a repository. An optional argument restricts the request to the formats available for a specific item.


```python
import xml.etree.ElementTree as ET
import requests

oai = requests.get('https://zenodo.org/oai2d?verb=ListMetadataFormats')  # Request to the server

# Build the XML tree
xmlTree = ET.ElementTree(ET.fromstring(oai.content))

for elem in xmlTree.iter():
    print(elem.tag + ": " + (elem.text or ""))
```

```
{http://www.openarchives.org/OAI/2.0/}OAI-PMH:

{http://www.openarchives.org/OAI/2.0/}responseDate: 2017-12-14T08:05:11Z
{http://www.openarchives.org/OAI/2.0/}request: https://zenodo.org/oai2d
{http://www.openarchives.org/OAI/2.0/}ListMetadataFormats:

{http://www.openarchives.org/OAI/2.0/}metadataFormat:

{http://www.openarchives.org/OAI/2.0/}metadataPrefix: oai_dc
{http://www.openarchives.org/OAI/2.0/}schema: http://www.openarchives.org/OAI/2.0/oai_dc.xsd
{http://www.openarchives.org/OAI/2.0/}metadataNamespace: http://www.openarchives.org/OAI/2.0/oai_dc/
{http://www.openarchives.org/OAI/2.0/}metadataFormat:

{http://www.openarchives.org/OAI/2.0/}metadataPrefix: oai_datacite3
{http://www.openarchives.org/OAI/2.0/}schema: http://schema.datacite.org/meta/kernel-3/metadata.xsd
{http://www.openarchives.org/OAI/2.0/}metadataNamespace: http://datacite.org/schema/kernel-3
{http://www.openarchives.org/OAI/2.0/}metadataFormat:

{http://www.openarchives.org/OAI/2.0/
```

We can keep only the relevant information:

```python
for elem in xmlTree.iter():
    if elem.tag in ('{http://www.openarchives.org/OAI/2.0/}metadataPrefix',
                    '{http://www.openarchives.org/OAI/2.0/}schema'):
        print(elem.tag + ": " + elem.text)
```

```
{http://www.openarchives.org/OAI/2.0/}metadataPrefix: oai_dc
{http://www.openarchives.org/OAI/2.0/}schema: http://www.openarchives.org/OAI/2.0/oai_dc.xsd
{http://www.openarchives.org/OAI/2.0/}metadataPrefix: oai_datacite3
{http://www.openarchives.org/OAI/2.0/}schema: http://schema.datacite.org/meta/kernel-3/metadata.xsd
{http://www.openarchives.org/OAI/2.0/}metadataPrefix: marc21
{http://www.openarchives.org/OAI/2.0/}schema: http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd
{http://www.openarchives.org/OAI/2.0/}metadataPrefix: datacite
{http://www.openarchives.org/OAI/2.0/}schema: http://schema.datacite.org/meta/kernel-3/metadata.xsd
{http://www.openarchives.org/OAI/2.0/}metadataPrefix: marcxml
{http://www.openarchives.org/OAI/2.0/}schema: http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd
{http://www.openarchives.org/OAI/2.0/}metadataPrefix: datacite3
{http://www.openarchives.org/OAI/2.0/}schema: http://schema.datacite.org/meta/kernel-3/metadata.xsd
{http://www.openar
```

Or, more "elegantly", with `findall` and `find`:

```python
# './/' searches the whole tree (a leading '//' only works with a FutureWarning)
for e in xmlTree.findall('.//{http://www.openarchives.org/OAI/2.0/}metadataFormat'):
    metadataPrefix = e.find('{http://www.openarchives.org/OAI/2.0/}metadataPrefix').text
    schema = e.find('{http://www.openarchives.org/OAI/2.0/}schema').text
    print(metadataPrefix, ':', schema)
```

```
oai_dc : http://www.openarchives.org/OAI/2.0/oai_dc.xsd
oai_datacite3 : http://schema.datacite.org/meta/kernel-3/metadata.xsd
marc21 : http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd
datacite : http://schema.datacite.org/meta/kernel-3/metadata.xsd
marcxml : http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd
datacite3 : http://schema.datacite.org/meta/kernel-3/metadata.xsd
oai_datacite : http://schema.datacite.org/meta/kernel-3/metadata.xsd
```
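The verb description mentions an optional argument that restricts the answer to the formats of one item; in OAI-PMH that argument is `identifier`. The sketch below builds such a request and parses the `(metadataPrefix, schema)` pairs with a namespace map. The helper names `list_formats_url` and `metadata_formats`, and the inline sample response, are hypothetical; only the verb, argument and element names come from the protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "https://zenodo.org/oai2d"


def list_formats_url(identifier=None):
    """Build a ListMetadataFormats request, optionally restricted to one item."""
    params = {"verb": "ListMetadataFormats"}
    if identifier is not None:
        params["identifier"] = identifier  # the optional argument of this verb
    return BASE_URL + "?" + urlencode(params)  # urlencode escapes ':' as %3A


def metadata_formats(xml_bytes):
    """Return (metadataPrefix, schema) pairs from a ListMetadataFormats response."""
    root = ET.fromstring(xml_bytes)
    return [(fmt.findtext("oai:metadataPrefix", namespaces=NS),
             fmt.findtext("oai:schema", namespaces=NS))
            for fmt in root.iterfind(".//oai:metadataFormat", NS)]


# Hypothetical two-format response, trimmed from the output above
SAMPLE = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_datacite</metadataPrefix>
      <schema>http://schema.datacite.org/meta/kernel-3/metadata.xsd</schema>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

print(list_formats_url("oai:zenodo.org:164231"))
for prefix, schema in metadata_formats(SAMPLE):
    print(prefix, ":", schema)
```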




We can inspect the last schema in the list (the DataCite kernel-3 schema used by `oai_datacite`):

```python
oai_datacite = requests.get('http://schema.datacite.org/meta/kernel-3/metadata.xsd')  # Request to the server
print(oai_datacite.text)

# Build the XML tree of the XSD schema
xmlTree = ET.ElementTree(ET.fromstring(oai_datacite.content))

for elem in xmlTree.iter():
    print(elem.tag)
```

```
<?xml version="1.0" encoding="UTF-8"?>
<!-- Revision history
     2010-08-26   Complete revision according to new common specification by the metadata work group after review. AJH, DTIC
	 2010-11-17 Revised to current state of kernel review, FZ, TIB 
	 2011-01-17 Complete revsion after community review. FZ, TIB
	 2011-03-17 Release of v2.1: added a namespace; mandatory properties got minLength; changes in the definitions of relationTypes
	 IsDocumentedBy/Documents and isCompiledBy/Compiles; changes type of property "Date" from xs:date to xs:string. FZ, TIB
	 2011-06-27 v2.2: namespace: kernel-2.2, additions to controlled lists "resourceType", "contributorType", "relatedIdentifierType", and "descriptionType". Removal of intermediate include-files.
     2013-05 v3.0: namespace: kernel-3.0; delete LastMetadataUpdate & MetadateVersionNumber; additions to controlled lists "contributorType", "dateType", "descriptionType", "relationType", "relatedIdentifierType" & "resourceType"; deletion of 
```

Looking at the schema definition, we can see that the metadata element names are stored in the `name` attribute of the `element` (`xs:element`) tags. We can get a list of what we expect to find:

```python
namespaces = {'xs': 'http://www.w3.org/2001/XMLSchema'}
# './/' searches the whole tree (a leading '//' only works with a FutureWarning)
for tags in xmlTree.findall('.//xs:element', namespaces):
    print('Metadato: ', tags.attrib['name'])
```

```
Metadato:  resource
Metadato:  identifier
Metadato:  creators
Metadato:  creator
Metadato:  creatorName
Metadato:  nameIdentifier
Metadato:  affiliation
Metadato:  titles
Metadato:  title
Metadato:  publisher
Metadato:  publicationYear
Metadato:  subjects
Metadato:  subject
Metadato:  contributors
Metadato:  contributor
Metadato:  contributorName
Metadato:  nameIdentifier
Metadato:  affiliation
Metadato:  dates
Metadato:  date
Metadato:  language
Metadato:  resourceType
Metadato:  alternateIdentifiers
Metadato:  alternateIdentifier
Metadato:  relatedIdentifiers
Metadato:  relatedIdentifier
Metadato:  sizes
Metadato:  size
Metadato:  formats
Metadato:  format
Metadato:  version
Metadato:  rightsList
Metadato:  rights
Metadato:  descriptions
Metadato:  description
Metadato:  br
Metadato:  geoLocations
Metadato:  geoLocation
Metadato:  geoLocationPoint
Metadato:  geoLocationBox
Metadato:  geoLocationPlace
```




### ListIdentifiers

This verb is an abbreviated form of ListRecords, retrieving only headers rather than records. Optional arguments permit selective harvesting of headers based on set membership and/or datestamp. Depending on the repository's support for deletions, a returned header may have a status attribute of "deleted" if a record matching the arguments specified in the request has been deleted.

It returns the record headers, which carry the essential information about each record.

It requires the metadata format prefix (`metadataPrefix`).

```python
import xml.etree.ElementTree as ET
import requests

# prefix: oai_datacite
oai = requests.get('https://zenodo.org/oai2d?verb=ListIdentifiers&metadataPrefix=oai_datacite')
xmlTree = ET.ElementTree(ET.fromstring(oai.content))

for elem in xmlTree.iter():
    print(elem.tag + ": " + (elem.text or ""))
```

```
{http://www.openarchives.org/OAI/2.0/}OAI-PMH:

{http://www.openarchives.org/OAI/2.0/}responseDate: 2017-12-14T08:34:44Z
{http://www.openarchives.org/OAI/2.0/}request: https://zenodo.org/oai2d
{http://www.openarchives.org/OAI/2.0/}ListIdentifiers:

{http://www.openarchives.org/OAI/2.0/}header:

{http://www.openarchives.org/OAI/2.0/}identifier: oai:zenodo.org:164231
{http://www.openarchives.org/OAI/2.0/}datestamp: 2017-05-30T03:54:31Z
{http://www.openarchives.org/OAI/2.0/}setSpec: user-biosyslit
{http://www.openarchives.org/OAI/2.0/}header:

{http://www.openarchives.org/OAI/2.0/}identifier: oai:zenodo.org:252141
{http://www.openarchives.org/OAI/2.0/}datestamp: 2017-05-30T05:01:49Z
{http://www.openarchives.org/OAI/2.0/}setSpec: user-biosyslit
{http://www.openarchives.org/OAI/2.0/}header:

{http://www.openarchives.org/OAI/2.0/}identifier: oai:zenodo.org:252977
{http://www.openarchives.org/OAI/2.0/}datestamp: 2017-05-30T05:01:47Z
{http://www.openarchives.org/OA
```
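Instead of printing every element, the headers can be collected as structured data. The sketch below, a hypothetical `parse_headers` helper run on a made-up sample, reads each header's identifier, datestamp and `setSpec` values and flags the `status="deleted"` attribute mentioned in the verb description. It also reads the `resumptionToken` element, which OAI-PMH uses to split long lists into chunks: a missing or empty token means the list is complete.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_headers(xml_bytes):
    """Return (headers, resumption_token) from a ListIdentifiers response."""
    root = ET.fromstring(xml_bytes)
    headers = []
    for h in root.iterfind(".//oai:header", NS):
        headers.append({
            "identifier": h.findtext("oai:identifier", namespaces=NS),
            "datestamp": h.findtext("oai:datestamp", namespaces=NS),
            "sets": [s.text for s in h.findall("oai:setSpec", NS)],
            # Deleted records keep their header but carry status="deleted"
            "deleted": h.get("status") == "deleted",
        })
    # None (no element) or "" (empty element) means there is nothing more to fetch
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return headers, token


# Hypothetical response: one live header, one deleted header and a made-up token
SAMPLE = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:zenodo.org:164231</identifier>
      <datestamp>2017-05-30T03:54:31Z</datestamp>
      <setSpec>user-biosyslit</setSpec>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:1</identifier>
      <datestamp>2017-06-01T00:00:00Z</datestamp>
    </header>
    <resumptionToken>hypothetical-token</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

headers, token = parse_headers(SAMPLE)
for h in headers:
    print(h["identifier"], h["datestamp"], h["sets"], "deleted" if h["deleted"] else "")
print("resumptionToken:", token)
```

To fetch the next chunk, the protocol expects a request with `verb=ListIdentifiers&resumptionToken=<token>`; the token is an exclusive argument, so `metadataPrefix` is not repeated.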

### ListRecords

Lists the records.

This verb is used to harvest records from a repository. Optional arguments permit selective harvesting of records based on set membership and/or datestamp. Depending on the repository's support for deletions, a returned header may have a status attribute of "deleted" if a record matching the arguments specified in the request has been deleted. No metadata will be present for records with deleted status.

```python
import xml.etree.ElementTree as ET
import requests

oai = requests.get('https://zenodo.org/oai2d?verb=ListRecords&metadataPrefix=oai_datacite')
xmlTree = ET.ElementTree(ET.fromstring(oai.content))
for elem in xmlTree.iter():
    print(elem.tag)
    print(elem.text)
```

```
{http://www.openarchives.org/OAI/2.0/}OAI-PMH


{http://www.openarchives.org/OAI/2.0/}responseDate
2017-12-14T08:39:09Z
{http://www.openarchives.org/OAI/2.0/}request
https://zenodo.org/oai2d
{http://www.openarchives.org/OAI/2.0/}ListRecords


{http://www.openarchives.org/OAI/2.0/}record


{http://www.openarchives.org/OAI/2.0/}header


{http://www.openarchives.org/OAI/2.0/}identifier
oai:zenodo.org:164231
{http://www.openarchives.org/OAI/2.0/}datestamp
2017-05-30T03:54:31Z
{http://www.openarchives.org/OAI/2.0/}setSpec
user-biosyslit
{http://www.openarchives.org/OAI/2.0/}metadata


{http://schema.datacite.org/oai/oai-1.0/}oai_datacite


{http://schema.datacite.org/oai/oai-1.0/}isReferenceQuality
true
{http://schema.datacite.org/oai/oai-1.0/}schemaVersion
3.1
{http://schema.datacite.org/oai/oai-1.0/}datacentreSymbol
CERN.ZENODO
{http://schema.datacite.org/oai/oai-1.0/}payload


{http://datacite.org/schema/kernel-3}resource
```

              


Show only the identifier and the keywords (`subject`).
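A minimal sketch of this task, assuming the DataCite kernel-3 namespace seen in the ListRecords output above (other repositories or prefixes may use a different kernel version). The `identifiers_and_subjects` helper and the simplified sample record are hypothetical; a real Zenodo record wraps `<resource>` inside `oai_datacite/payload`, which the descendant search `.//` skips over anyway.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    # DataCite namespace seen in the ListRecords output (kernel-3)
    "datacite": "http://datacite.org/schema/kernel-3",
}


def identifiers_and_subjects(xml_bytes):
    """Return (OAI identifier, [subjects]) for every record in a ListRecords response."""
    root = ET.fromstring(xml_bytes)
    result = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        subjects = [s.text.strip() for s in record.iterfind(".//datacite:subject", NS) if s.text]
        result.append((identifier, subjects))
    return result


# Hypothetical, simplified record (subject values are made up)
SAMPLE = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:zenodo.org:164231</identifier>
        <datestamp>2017-05-30T03:54:31Z</datestamp>
      </header>
      <metadata>
        <resource xmlns="http://datacite.org/schema/kernel-3">
          <subjects>
            <subject>Biodiversity</subject>
            <subject>Taxonomy</subject>
          </subjects>
        </resource>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

for identifier, subjects in identifiers_and_subjects(SAMPLE):
    print(identifier, subjects)
```

With live data, pass `oai.content` from the ListRecords request instead of `SAMPLE`.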

Records between two dates (`from` and `until` arguments):

```python
import xml.etree.ElementTree as ET
import requests

# Filter by datestamp; OAI-PMH expects bare UTC dates (YYYY-MM-DD), without quotes
params = {
    'verb': 'ListRecords',
    'metadataPrefix': 'oai_datacite',
    'from': '2017-01-01',
    'until': '2017-12-31',
}
oai = requests.get('https://zenodo.org/oai2d', params=params)
xmlTree = ET.ElementTree(ET.fromstring(oai.content))
for elem in xmlTree.iter():
    print(elem.tag)
    print(elem.text)
```

```
{http://www.openarchives.org/OAI/2.0/}OAI-PMH


{http://www.openarchives.org/OAI/2.0/}responseDate
2017-12-14T08:39:36Z
{http://www.openarchives.org/OAI/2.0/}request
https://zenodo.org/oai2d
{http://www.openarchives.org/OAI/2.0/}ListRecords


{http://www.openarchives.org/OAI/2.0/}record


{http://www.openarchives.org/OAI/2.0/}header


{http://www.openarchives.org/OAI/2.0/}identifier
oai:zenodo.org:164231
{http://www.openarchives.org/OAI/2.0/}datestamp
2017-05-30T03:54:31Z
{http://www.openarchives.org/OAI/2.0/}setSpec
user-biosyslit
{http://www.openarchives.org/OAI/2.0/}metadata


{http://schema.datacite.org/oai/oai-1.0/}oai_datacite


{http://schema.datacite.org/oai/oai-1.0/}isReferenceQuality
true
{http://schema.datacite.org/oai/oai-1.0/}schemaVersion
3.1
{http://schema.datacite.org/oai/oai-1.0/}datacentreSymbol
CERN.ZENODO
{http://schema.datacite.org/oai/oai-1.0/}payload


{http://datacite.org/schema/kernel-3}resource
```
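The date filter can also be wrapped in a small URL builder. In this sketch the hypothetical `list_records_url` helper takes `datetime.date` objects, so the values always come out as plain `YYYY-MM-DD` strings; quotes are not part of the OAI-PMH date syntax. Because `from` is a reserved word in Python, the arguments are named `from_date`/`until_date` and mapped onto the protocol names inside a dictionary. Day granularity is the level every OAI-PMH repository must accept; finer values must match the `granularity` reported by Identify.

```python
from datetime import date
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None, until_date=None):
    """Build a ListRecords URL with optional from/until bounds (day granularity)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date.isoformat()    # date -> 'YYYY-MM-DD', no quotes
    if until_date is not None:
        params["until"] = until_date.isoformat()
    return base_url + "?" + urlencode(params)


url = list_records_url("https://zenodo.org/oai2d", "oai_datacite",
                       date(2017, 1, 1), date(2017, 12, 31))
print(url)
```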

              


Get the DOI of a resource. Hint: look at the attributes (`identifierType="DOI"`).

```python
import xml.etree.ElementTree as ET
import requests

# Filter by datestamp; OAI-PMH expects bare UTC dates (YYYY-MM-DD), without quotes
params = {
    'verb': 'ListRecords',
    'metadataPrefix': 'oai_datacite',
    'from': '2017-01-01',
    'until': '2017-12-31',
}
oai = requests.get('https://zenodo.org/oai2d', params=params)
xmlTree = ET.ElementTree(ET.fromstring(oai.content))
for elem in xmlTree.iter():
    print(elem.tag)
    print(elem.text)
```

```
{http://www.openarchives.org/OAI/2.0/}OAI-PMH


{http://www.openarchives.org/OAI/2.0/}responseDate
2017-12-14T08:43:21Z
{http://www.openarchives.org/OAI/2.0/}request
https://zenodo.org/oai2d
{http://www.openarchives.org/OAI/2.0/}ListRecords


{http://www.openarchives.org/OAI/2.0/}record


{http://www.openarchives.org/OAI/2.0/}header


{http://www.openarchives.org/OAI/2.0/}identifier
oai:zenodo.org:164231
{http://www.openarchives.org/OAI/2.0/}datestamp
2017-05-30T03:54:31Z
{http://www.openarchives.org/OAI/2.0/}setSpec
user-biosyslit
{http://www.openarchives.org/OAI/2.0/}metadata


{http://schema.datacite.org/oai/oai-1.0/}oai_datacite


{http://schema.datacite.org/oai/oai-1.0/}isReferenceQuality
true
{http://schema.datacite.org/oai/oai-1.0/}schemaVersion
3.1
{http://schema.datacite.org/oai/oai-1.0/}datacentreSymbol
CERN.ZENODO
{http://schema.datacite.org/oai/oai-1.0/}payload


{http://datacite.org/schema/kernel-3}resource
```
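Following the hint, the DOI sits in the DataCite `<identifier>` element whose `identifierType` attribute is `DOI`. ElementTree supports `[@attr='value']` predicates, so one path expression selects it. Note that the OAI header also has an `<identifier>`, but in the OAI namespace, so using the DataCite prefix keeps the two apart. The `extract_dois` helper and the sample record (with a made-up DOI) are hypothetical; the namespace is the kernel-3 one from the output above.

```python
import xml.etree.ElementTree as ET

# Only the DataCite namespace is needed: the OAI <identifier> lives in another namespace
DATACITE = {"datacite": "http://datacite.org/schema/kernel-3"}


def extract_dois(xml_bytes):
    """Return the text of every DataCite <identifier identifierType="DOI">."""
    root = ET.fromstring(xml_bytes)
    path = ".//datacite:identifier[@identifierType='DOI']"
    return [el.text.strip() for el in root.iterfind(path, DATACITE) if el.text]


# Hypothetical record with a made-up DOI
SAMPLE = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:zenodo.org:164231</identifier></header>
      <metadata>
        <resource xmlns="http://datacite.org/schema/kernel-3">
          <identifier identifierType="DOI">10.5072/example.1</identifier>
        </resource>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

for doi in extract_dois(SAMPLE):
    print(doi, "->", "https://doi.org/" + doi)
```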

              


### GetRecord

This verb is used to retrieve an individual metadata record from a repository. Required arguments specify the identifier of the item from which the record is requested and the format of the metadata that should be included in the record. Depending on the level at which a repository tracks deletions, a header with a "deleted" value for the status attribute may be returned, in case the metadata format specified by the metadataPrefix is no longer available from the repository or from the specified item.

Get the digital object of that resource.

```python
import xml.etree.ElementTree as ET
import requests

# Example with the oai_dc format: https://zenodo.org/oai2d?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:zenodo.org:252670
oai = requests.get('https://zenodo.org/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:zenodo.org:252977')
xmlTree = ET.ElementTree(ET.fromstring(oai.content))
for elem in xmlTree.iter():
    print(elem.tag)
    print(elem.text)
```

```
{http://www.openarchives.org/OAI/2.0/}OAI-PMH


{http://www.openarchives.org/OAI/2.0/}responseDate
2017-12-14T08:59:39Z
{http://www.openarchives.org/OAI/2.0/}request
https://zenodo.org/oai2d
{http://www.openarchives.org/OAI/2.0/}GetRecord


{http://www.openarchives.org/OAI/2.0/}record


{http://www.openarchives.org/OAI/2.0/}header


{http://www.openarchives.org/OAI/2.0/}identifier
oai:zenodo.org:252977
{http://www.openarchives.org/OAI/2.0/}datestamp
2017-09-08T08:01:40Z
{http://www.openarchives.org/OAI/2.0/}setSpec
user-biosyslit
{http://www.openarchives.org/OAI/2.0/}metadata


{http://schema.datacite.org/oai/oai-1.0/}oai_datacite


{http://schema.datacite.org/oai/oai-1.0/}isReferenceQuality
true
{http://schema.datacite.org/oai/oai-1.0/}schemaVersion
3.1
{http://schema.datacite.org/oai/oai-1.0/}datacentreSymbol
CERN.ZENODO
{http://schema.datacite.org/oai/oai-1.0/}payload


{http://datacite.org/schema/kernel-3}resource


{h
```
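When a request cannot be served (for example, an unknown `identifier` or a `metadataPrefix` the item does not support), an OAI-PMH repository replies with an `<error>` element carrying a `code` such as `idDoesNotExist` or `cannotDisseminateFormat`, instead of the `<GetRecord>` element. Printing every tag, as in the cell above, would hide this. A minimal check, sketched with a hypothetical `oai_errors` helper and a made-up error reply:

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def oai_errors(xml_bytes):
    """Return (code, message) for each <error> child of the OAI-PMH root element."""
    root = ET.fromstring(xml_bytes)
    return [(e.get("code"), (e.text or "").strip()) for e in root.findall("oai:error", NS)]


# Hypothetical reply to a GetRecord request with an unknown identifier
SAMPLE = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <request verb="GetRecord">https://zenodo.org/oai2d</request>
  <error code="idDoesNotExist">No matching identifier</error>
</OAI-PMH>"""

for code, message in oai_errors(SAMPLE):
    print("OAI-PMH error:", code, "-", message)
```

Calling such a check before walking the tree makes a failed request visible immediately.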

```python
import requests

# Resolve the DOI with content negotiation, asking for RDF/XML instead of the landing page.
# Content negotiation is a plain GET with an Accept header.
headers = {'Accept': 'application/rdf+xml;q=0.5'}  # Type of response accepted
r = requests.get('https://dx.doi.org/10.5281/zenodo.252363', headers=headers)  # GET with headers
print(r.text)
```

```
<?xml version='1.0' encoding='utf-8' ?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:schema='http://schema.org/'>
<rdf:Description rdf:about='http://treatment.plazi.org/id/03F926428F14B14F30BBFB1A56F23C5B'>
<schema:citation>
<schema:ImageObject rdf:about='https://doi.org/10.5281/zenodo.252363'>
<schema:author>
<schema:Person rdf:nodeID='b2'>
<schema:familyName>Sääksjärvi</schema:familyName>

<schema:givenName>Ilari E.</schema:givenName>

<schema:name>Ilari E. Sääksjärvi</schema:name>

</schema:Person>

</schema:author>
 <schema:author>
<schema:Person rdf:nodeID='b0'>
<schema:familyName>Veijalainen</schema:familyName>

<schema:givenName>Anu</schema:givenName>

<schema:name>Anu Veijalainen</schema:name>

</schema:Person>

</schema:author>
 <schema:author>
<schema:Person rdf:nodeID='b1'>
<schema:familyName>Broad</schema:familyName>

<schema:givenName>Gavin R.</schema:givenName>

<schema:name>Gavin R. Broad</schema:name>

</schema:Person>

</schema:author>

<schem
```
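This last step is no longer OAI-PMH: the DOI resolver returns an RDF/XML description of the object. Because that response is ordinary XML, the same ElementTree technique can extract fields from it, for instance the author names. The sketch assumes the serialization shown above (`schema:author/schema:Person/schema:name`); a different RDF/XML layout of the same data would need a real RDF parser. The `author_names` helper is hypothetical and the sample is a trimmed copy of the response above.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "schema": "http://schema.org/",
}


def author_names(rdf_bytes):
    """Return schema:name of every schema:author/schema:Person in the RDF/XML document."""
    root = ET.fromstring(rdf_bytes)
    return [person.findtext("schema:name", namespaces=NS)
            for person in root.iterfind(".//schema:author/schema:Person", NS)]


# Trimmed copy of the response above, encoded to bytes so the UTF-8 declaration applies
SAMPLE = """<?xml version='1.0' encoding='utf-8' ?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:schema='http://schema.org/'>
<rdf:Description rdf:about='http://treatment.plazi.org/id/03F926428F14B14F30BBFB1A56F23C5B'>
<schema:citation>
<schema:ImageObject rdf:about='https://doi.org/10.5281/zenodo.252363'>
<schema:author><schema:Person rdf:nodeID='b2'><schema:name>Ilari E. Sääksjärvi</schema:name></schema:Person></schema:author>
<schema:author><schema:Person rdf:nodeID='b0'><schema:name>Anu Veijalainen</schema:name></schema:Person></schema:author>
</schema:ImageObject>
</schema:citation>
</rdf:Description>
</rdf:RDF>""".encode("utf-8")

print(author_names(SAMPLE))

# With the live response: author_names(r.content)
```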

# Exercise

* 1: Find at least two digital repositories that support OAI-PMH.
* 2: Use the protocol verbs to find the resources on a specific topic (you can search the description or the keywords).
* 4: Get a list of the titles, identifiers (DOI or any other) and the address (URL) of each resource.
* 5: What problems did you run into? Did you have to handle more than one metadata format? What are the limitations of the OAI-PMH protocol? How would you improve it?