Convert Combine Archive (OMEX) to RO Bundle
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
src
.gitignore
.travis.yml
LICENSE.md
README.md
pom.xml

README.md

ro-combine-archive

Build Status doi:10.5281/zenodo.10439

Convert Combine Archive (OMEX) to/from Research Object Bundle.

(c) University of Manchester 2014 http://www.mygrid.org.uk/

License: MIT License

This tool enrich/convert OMEX Combine Archives so that they are also valid RO Bundles.

This is achieved by parsing the OMEX manifest and creating the equivalent RO Bundle manifest (or vice versa), using the RO Bundle API.

Note that the actual implementation for parsing and generating the OMEX manifest is now a standard part of the RO Bundle API. For details, see the Java package org.purl.wf4ever.robundle.manifest.combine

Authors

Slides

Slides

Slides 2014-06-13

Mechanism: Parsing OMEX

OMEX Combine Archives have similar mechanism of describing the bundled resources in a manifest.

An OMEX archive can also be a valid RO Bundle, and an RO Bundle can also be an OMEX archive, simply by having both manifests included in the ZIP archive. (as long as you don't then modify the archive without updating both manifests!)

The conversion therefore parses the OMEX manifest using an XSD Schema, and creates equivalent entries in the RO Bundle manifest, copying over the format information to either mediatype or conformsTo (depending if it is an absolute URI).

When parsing the OMEX manifest.xml, location is interpreted as an unescaped path within the bundle (relative to the manifest.xml) Absolute URIs in location are supported, recorded as an uri aggregation in the RO bundle.

The base URI for files in the archive is generated using the app:// URI scheme according to the RO bundle specifications for absolute URIs. So this means URIs in files are parsed with a base URI like: app://5226267e-75b1-48c4-b9da-ea587023adda/manifest.xml - thus also allowing for slash-based URI references like / and /metadata.rdf.

In OMEX, metadata is always given in metadata.* files. We'll parse metadata.rdf, and look for any subjects that are aggregated by the archive (or is the archive itself), and then add metadata.rdf to annotations in the RO bundle manifest for each described file. (TODO: Support additional metadata files)

The metadata file is also examined for dcterms:created, dcterms:modified and dcterms:modified annotations, which are propagated into the RO Bundle manifest. The timestamp of affected files in the ZIP archive will also be set to the specified modified time from this metadata, if it exists.

Note that the RO Bundle mimetype file is not added, as an OMEX archive (and its specialization) can be viewed as a more application-specific specialization of RO Bundles, as allowed for in the RO bundle container specification. OMEX archives however do not currently have such a specific mimetype, just an extension varying with the application type (e.g. .sbex for SBML archive).

Building

Prerequisites:

To build, simply do mvn clean install:

stain@biggie:~/src/ro-combine-archive$ mvn clean install
[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building Combine Archive / RO interoperability 0.1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
...
[INFO] --- maven-install-plugin:2.4:install (default-install) @ ro-combine-archive ---
[INFO] Installing /home/stain/src/ro-combine-archive/target/ro-combine-archive-0.1.0-SNAPSHOT.jar to /home/stain/.m2/repository/org/researchobject/ro-combine-archive/ro-combine-archive/0.1.0-SNAPSHOT/ro-combine-archive-0.1.0-SNAPSHOT.jar
[INFO] Installing /home/stain/src/ro-combine-archive/pom.xml to /home/stain/.m2/repository/org/researchobject/ro-combine-archive/ro-combine-archive/0.1.0-SNAPSHOT/ro-combine-archive-0.1.0-SNAPSHOT.pom
[INFO] Installing /home/stain/src/ro-combine-archive/target/ro-combine-archive-0.1.0-SNAPSHOT-sources.jar to /home/stain/.m2/repository/org/researchobject/ro-combine-archive/ro-combine-archive/0.1.0-SNAPSHOT/ro-combine-archive-0.1.0-SNAPSHOT-sources.jar
[INFO] Installing /home/stain/src/ro-combine-archive/target/ro-combine-archive-0.1.0-SNAPSHOT-test-sources.jar to /home/stain/.m2/repository/org/researchobject/ro-combine-archive/ro-combine-archive/0.1.0-SNAPSHOT/ro-combine-archive-0.1.0-SNAPSHOT-test-sources.jar
[INFO] Installing /home/stain/src/ro-combine-archive/target/ro-combine-archive-0.1.0-SNAPSHOT-tests.jar to /home/stain/.m2/repository/org/researchobject/ro-combine-archive/ro-combine-archive/0.1.0-SNAPSHOT/ro-combine-archive-0.1.0-SNAPSHOT-tests.jar
[INFO] Installing /home/stain/src/ro-combine-archive/target/ro-combine-archive-0.1.0-SNAPSHOT-standalone.jar to /home/stain/.m2/repository/org/researchobject/ro-combine-archive/ro-combine-archive/0.1.0-SNAPSHOT/ro-combine-archive-0.1.0-SNAPSHOT-standalone.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 45.570s
[INFO] Finished at: Fri Apr 25 19:51:41 BST 2014
[INFO] Final Memory: 41M/303M
[INFO] ------------------------------------------------------------------------

Downloading

Alternatively you can download the ro-combine-archive 0.1.0 standalone.

Note that this download might not reflect the latest features as when building from source.

Executing

To execute from the self-contained JAR, try:

java -jar target/ro-combine-archive-0.1.0-SNAPSHOT-standalone.jar [omex-file ...]

Note that the given OMEX Combine Archive will be updated in-place to also include .ro/manifest.json, making it an RO bundle.

Example

This examples copies the example Boris.omex file to target (because we'll change it), then executes the ro-combine-archive converter:

stain@biggie:~/src/ro-combine-archive$ cp src/test/resources/Boris.omex target

stain@biggie:~/src/ro-combine-archive$ java -jar target/ro-combine-archive-0.1.0-SNAPSHOT-standalone.jar target/Boris.omex 
log4j:WARN No appenders could be found for logger (org.apache.jena.riot.stream.JenaIOEnvironment).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Apr 25, 2014 7:56:26 PM org.purl.wf4ever.robundle.utils.RDFUtils literalAsFileTime
INFO: Literal not an XSDDateTime, but: class java.lang.String 2013-05-28T17:50:43.999+01:00

Inspecting the modified archive will reveal the added .ro/manifest.json:

stain@biggie:~/src/ro-combine-archive$ cd target/
stain@biggie:~/src/ro-combine-archive/target$ unzip Boris.omex
Archive:  Boris.omex
  inflating: BorisEJB.xml            
  inflating: manifest.xml            
  inflating: metadata.rdf            
   creating: paper/
  inflating: paper/Kholodenko2000.pdf  
   creating: .ro/
  inflating: .ro/manifest.json       

Inspecting .ro/manifest.json:

    {
      "@context" : [ "https://w3id.org/bundle/context" ],
      "id" : "/",
      "manifest" : [ "/.ro/manifest.json", "/manifest.xml" ],
      "createdOn" : "2013-05-28T16:50:43.999Z",
      "aggregates" : [ {
        "file" : "/paper/Kholodenko2000.pdf",
        "folder" : "/paper/",
        "mediatype" : "application/pdf",
        "createdOn" : "2013-04-05T08:16:08Z",
        "proxy" : "urn:uuid:ca91dfa3-729b-4494-a059-b73b9b3c4261"
      }, {
        "file" : "/metadata.rdf",
        "folder" : "/",
        "createdOn" : "2014-04-25T18:17:00Z",
        "conformsTo" : "http://identifiers.org/combine.specifications/omex-metadata",
        "proxy" : "urn:uuid:850812ed-88d5-4999-9675-8257355f6e3e"
      }, {
        "file" : "/BorisEJB.xml",
        "folder" : "/",
        "createdOn" : "2012-10-29T10:58:38Z",
        "conformsTo" : "http://identifiers.org/combine.specifications/sbml",
        "proxy" : "urn:uuid:054c9840-0283-4a3c-8319-b0e24b6593b4"
      }, {
        "uri" : "http://www.ebi.ac.uk/biomodels-main/BIOMD0000000010",
        "conformsTo" : "http://identifiers.org/combine.specifications/sbml"
      } ],
      "annotations" : [ {
        "about" : "/",
        "content" : "/metadata.rdf"
      } ]
    }