
OAI-ORE support #999

kaplun opened this Issue · 1 comment

2 participants


Originally on 2012-04-10

Open Archives Initiative Object Reuse and Exchange defines standards for the description and exchange of aggregations of Web resources [...]

In Invenio aggregation might come from different sources:

  • the DB:
    • collections aggregate records
    • records link to documents
    • documents link to revisions of documents
    • revisions of documents link to formats of documents
  • MARC 76x-78x fields. At CERN, for example, these are used to link a record to its official publication, a conference to its talks, or a talk to its contribution and proceedings; they can also link a photo to a photoshoot. In OpenAIREplus, datasets will be linked to publications.
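As a rough illustration of the second source, the sketch below picks out the 76x-78x fields from a record. The record layout (a list of `(tag, value)` pairs) and the function name `linking_fields` are assumptions made for this example, not Invenio's actual record API.

```python
def linking_fields(fields):
    """Return the fields whose MARC tag starts with 76, 77 or 78.

    `fields` is assumed to be a list of (tag, value) pairs; this layout
    is hypothetical and only serves the example.
    """
    return [(tag, value) for tag, value in fields
            if tag[:2] in ("76", "77", "78")]


# Example record with made-up values.
record = [
    ("245", "Some title"),
    ("773", "Host item (e.g. proceedings)"),
    ("787", "Other relationship"),
    ("856", "http://foo/record/1/files/a.pdf"),
]
links = linking_fields(record)
```

Each selected field would then become a candidate relationship between the record and another resource.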

Implementation details will be added as comments to this ticket.


Originally on 2013-03-08

OAI-ORE prototype notes

Use cases of OAI-ORE:

  • Data exchange (primary):
    • CDS, Inspire, ADS, arXiv
    • OpenAIREplus Orphan Repository, OpenAIREplus repository.
  • Visualisation (secondary):
    • Enhanced publications: Browsing the archives via Firefox plugin (small showcase).

Candidate aggregation examples:

  • General examples:
    • Collection aggregating:
      • Collections
      • Records
      • Feed
    • Record aggregating:
      • Metadata record (perhaps via OAI-PMH)
      • Authors
      • Documents (PDFs, Images, Videos, Audio)
      • Bibliographic descriptions: BibTeX, MARC, MARCXML
      • Comments
      • External links
      • Similar to relationship: DOI, arXiv id
      • See also
    • Record citations (this could also just be added as a relationship to other resources)
      • Records
    • Record translations
      • Records
    • Documents aggregating:
      • Revisions
    • Revisions aggregating:
      • Formats
  • Specific examples:
    • Logs
    • Login information?
    • Photo shoot aggregating:
      • Photos (isn’t this the same as a record aggregating documents?)
    • Conference aggregating:
      • Contributions
      • Proceeding
      • Notes
      • Posters
      • Talks
      • Slides
    • Book aggregating:
      • Chapters
    • Periodical
      • Journals
        • Volumes -> Issues -> Record
    • OpenAIRE:
      • Funding scheme aggregating
        • Projects
          • Records (data, publications)
      • Publications aggregating
        • Data
        • Project(s)
        • Funding scheme
      • Data aggregating
        • Publications
        • Project(s)
        • Funding scheme
  • See also videos
  • Similar records

Abstract data model

  • Resource: anything of interest - resources are identified by HTTP URIs
    • Information resource: any kind of document, image, video, etc. that returns information when you access its URI (i.e. the web as we know it).
    • Non-information resource: the HTTP URI doesn’t return information; it is just a name for a “real-world” object.
  • Aggregation: a set of resources (a non-information resource).
  • Aggregated resource: a resource in an aggregation (which can itself be an aggregation). Important: anything that should be in an aggregation must have a URL (e.g. project, funding scheme, etc.).
  • Resource map: a description of one aggregation (i.e. an information resource)
  • Proxy: used for ordering
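The abstract data model above could be sketched in Python roughly as follows. All class and field names are illustrative assumptions, not an existing Invenio API, and the `position` field on `Proxy` is a simplification invented for this example to show where ordering information would live.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    """Anything of interest, identified by an HTTP URI."""
    uri: str
    # True for documents, images, videos; False for names of real-world objects.
    is_information_resource: bool = True


@dataclass
class Aggregation(Resource):
    """A set of resources; itself a non-information resource."""
    is_information_resource: bool = False
    aggregated: List[Resource] = field(default_factory=list)

    def add(self, resource: Resource) -> None:
        # Anything placed in an aggregation must have an HTTP URI.
        if not resource.uri.startswith(("http://", "https://")):
            raise ValueError("aggregated resources need an HTTP URI")
        self.aggregated.append(resource)


@dataclass
class ResourceMap(Resource):
    """An information resource describing exactly one aggregation."""
    describes: Optional[Aggregation] = None


@dataclass
class Proxy:
    """Stands for a resource in the context of one aggregation."""
    proxy_for: Resource
    proxy_in: Aggregation
    position: int = 0  # hypothetical field used here to express ordering


agg = Aggregation(uri="http://foo/aggregation/a")
agg.add(Resource(uri="http://foo/record/1"))
rem = ResourceMap(uri="http://foo/aggregation/a.rdf", describes=agg)
```

Because an `Aggregation` is itself a `Resource`, aggregations can be nested, which matches the note that an aggregated resource can be an aggregation.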

HTTP implementation

  • Each URI defined in resource maps must resolve
  • Separate resource maps
    • Model 1:
      • http://foo/aggregation/a (aggregation - redirects with 303 via content negotiation)
      • http://foo/aggregation/a.html (resource)
      • http://foo/aggregation/a.rdf (resource map)
      • http://foo/aggregation/a.atom (resource map)
    • Model 2:
      • http://foo/aggregation/a.rdf#aggregation (aggregation)
      • http://foo/aggregation/a.html (resource)
      • http://foo/aggregation/a.rdf (resource map)
    • Pros:
      • Clear standalone resource map
    • Cons:
      • Redirects will degrade harvester performance
  • Embedded resource map via RDFa:
    • Model 3 (without redirect):
      • http://foo/aggregation/a.html#aggregation (aggregation)
      • http://foo/aggregation/a.html (resource map + resource)
    • Model 4 (with redirect):
      • http://foo/aggregation/a (aggregation)
      • http://foo/aggregation/a.html (resource + resource map)
    • Pros:
      • Resource map is embedded in splash page (no redirects needed)
    • Cons:
      • Size of HTML (perhaps with gzip compression it’s negligible).
      • Load issues during harvesting, depending on the size of the HTML
  • Resource Map discovery:
    • Generate site map xml
    • Generate atom feed
    • Via OAI-PMH (could possibly avoid redirects from aggregation to resource map)
    • Insert link-tag in HTML
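To make Model 1 more concrete, here is a minimal sketch of the redirect decision: a request for the aggregation URI is answered with a 303 pointing at the representation that matches the `Accept` header. The function name `redirect_for` and the suffix table are assumptions for illustration; the parsing ignores q-values and wildcards, so it is a simplification of real content negotiation.

```python
# Media type -> URL suffix, following the Model 1 URLs listed above.
REPRESENTATIONS = {
    "application/rdf+xml": ".rdf",    # resource map
    "application/atom+xml": ".atom",  # resource map
    "text/html": ".html",             # splash page
}


def redirect_for(aggregation_uri, accept_header=""):
    """Return (status, location) for a request to the aggregation URI.

    Picks the first media type in the Accept header that we can serve
    and falls back to the HTML page otherwise.
    """
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        suffix = REPRESENTATIONS.get(media_type)
        if suffix:
            return 303, aggregation_uri + suffix
    return 303, aggregation_uri + ".html"


status, location = redirect_for("http://foo/aggregation/a",
                                "application/rdf+xml")
```

Every harvested aggregation costs one extra round trip under this model, which is the performance concern listed under the cons of separate resource maps.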


  • Inclusion of other relationships and metadata:
    • How much should be included (see 4.5 Relationships to other Resources and Types)? E.g. citation links, translations.
  • Exporting very large aggregations
  • HTTP implementation of OAI-ORE incompatible with Invenio URL scheme?
  • Efficiency of protocol
    • Redirects in resource map discovery
    • One aggregation per resource map (means the number of HTTP requests grows with the number of records harvested).
  • Enforcing structural constraints of aggregation graph

Relation with OAI-PMH

  • OAI-PMH can be used to support resource map discovery
  • OAI-ORE can be used to include a link to an OAI-PMH metadata record

Integration in Invenio

  • URL Scheme + Data model
    • Anything that needs to be referenced from an aggregation needs an HTTP URI (there are ways to express relationships with other entities though).
    • The data model and URL scheme are tightly connected.
  • Resource Map generation framework:
    • Mapping of Invenio data to resource maps
    • Module for mapping anything in Invenio to the OAI-ORE data model
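A resource map generator could look roughly like the sketch below, which serialises one aggregation as RDF/XML using only the standard library. The function name `build_resource_map` is an assumption, and the output is deliberately incomplete: metadata that the ORE specification expects on a resource map (such as creator and modification date) is left out to keep the example short.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE_NS = "http://www.openarchives.org/ore/terms/"


def build_resource_map(rem_uri, aggregation_uri, aggregated_uris):
    """Serialise a minimal resource map for one aggregation as RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("ore", ORE_NS)
    root = ET.Element("{%s}RDF" % RDF_NS)

    # The resource map describes the aggregation.
    rem = ET.SubElement(root, "{%s}Description" % RDF_NS,
                        {"{%s}about" % RDF_NS: rem_uri})
    ET.SubElement(rem, "{%s}describes" % ORE_NS,
                  {"{%s}resource" % RDF_NS: aggregation_uri})

    # The aggregation aggregates each resource; every URI must resolve.
    agg = ET.SubElement(root, "{%s}Description" % RDF_NS,
                        {"{%s}about" % RDF_NS: aggregation_uri})
    for uri in aggregated_uris:
        ET.SubElement(agg, "{%s}aggregates" % ORE_NS,
                      {"{%s}resource" % RDF_NS: uri})

    return ET.tostring(root, encoding="unicode")


xml_text = build_resource_map(
    "http://foo/aggregation/a.rdf",
    "http://foo/aggregation/a",
    ["http://foo/record/1", "http://foo/record/2"],
)
```

In Invenio, the list of aggregated URIs would come from whatever mapping layer translates collections, records, documents and revisions into aggregations.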