# Transforming metadata with RDFLib 

I've been asked to think about how to get the Research Data Curation projects currently held in our digital collections into various other repositories and aggregators. Some of these are relatively easy to accomplish via "triggers", while others have a richer schema that differs significantly from our ontology.

This notebook will experiment with ways to use the [Python library RDFLib](http://rdflib.readthedocs.io/en/stable/gettingstarted.html) (and a [json-ld extension](https://github.com/RDFLib/rdflib-jsonld) for it) to transform our data into the different target forms. The most common transform would likely be to serialize our data as JSON and/or JSON-LD, then apply some simple transformations to the elements. The first example will be to align with the forthcoming BIOCaddie schema.

In our actual production environment, since we are a Hydra/Ruby shop, this will likely take place more with the RDF.rb gem, but since I am stronger with Python, it made more sense to play around using that. 

Let's first load up the rdflib libraries we'll need:

In [97]:
from rdflib import Graph, plugin
from rdflib.serializer import Serializer

## Import the data
For the source data, I am working with data from our staging server, which is locked behind a login. So, the normal way of rdflib parsing a URL will throw 404/file not found errors. Luckily we can save the data from the DAMS as a Turtle (.ttl) file. Then we tell rdflib to parse this graph data as Turtle:

In [98]:
g = Graph().parse("datamares.ttl", format="turtle")

## Output (or serialize) the data

RDF data can typically be output, or serialized, in many different formats: RDF/XML, N-Triples, and JSON-LD, to name a few. The list of serializations will depend on the parser of the system you're working with, but rdflib will allow us to serialize the data ourselves. So let's serialize the existing Turtle data into JSON-LD:

In [99]:
print(g.serialize(format='json-ld', indent=4))

b'[\n    {\n        "@id": "_:ub30bL54C40",\n        "@type": [\n            "http://library.ucsd.edu/ontology/dams#Relationship"\n        ],\n        "http://library.ucsd.edu/ontology/dams#personalName": [\n            {\n                "@id": "http://library.ucsd.edu/ark:/20775/bd9879500h"\n            }\n        ],\n        "http://library.ucsd.edu/ontology/dams#role": [\n            {\n                "@id": "http://library.ucsd.edu/ark:/20775/bb5086960s"\n            }\n        ]\n    },\n    {\n        "@id": "_:ub30bL7C40",\n        "@type": [\n            "http://library.ucsd.edu/ontology/dams#Copyright"\n        ],\n        "http://library.ucsd.edu/ontology/dams#copyrightJurisdiction": [\n            {\n                "@value": "US"\n            }\n        ],\n        "http://library.ucsd.edu/ontology/dams#copyrightNote": [\n            {\n                "@value": "Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work be

## Interpreting and transforming the data

That's kind of ugly with all the newline characters, which show up as literal `\n` because the serialized result is printed as a bytes object. We could work within Python for other solutions, but we don't have to. We can export the above data as JSON using `rdfpipe`, the rdflib command-line tool. For reference, I ran the following:

`$ rdfpipe -i turtle -o json-ld datamares.ttl > datamares.json`
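For anyone who would rather stay in Python, here is a minimal sketch of one way to clean up that output. It assumes the serializer handed back bytes, as in the cell above. The variable `raw` is a hypothetical stand-in holding one node copied from that output, not the full graph.

```python
import json

# Stand-in for the bytes returned by g.serialize(format='json-ld') above;
# the single node is copied from the output shown earlier.
raw = (
    b'[\n    {\n        "@id": "_:ub30bL54C40",\n        "@type": [\n'
    b'            "http://library.ucsd.edu/ontology/dams#Relationship"\n'
    b'        ]\n    }\n]'
)

# Decoding turns the bytes into a str, so print() renders real line breaks
# instead of the escaped "\n" sequences seen in the bytes repr.
text = raw.decode("utf-8")
print(text)

# Once decoded, the JSON-LD is ordinary JSON and loads into lists and dicts.
nodes = json.loads(text)
print(nodes[0]["@type"][0])
```

This only changes how the result is displayed and parsed. The graph data itself is untouched.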

The cool thing about having data in JSON-LD format is that it is still JSON. This means we can use the many tools that developers use on JSON data. One such tool is `jq`, which can query and transform JSON pretty easily. Let's load the exported JSON file from above:

In [100]:
DATA="datamares.json"
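As an aside, because JSON-LD is still JSON, the same file can also be explored with nothing but Python's standard `json` module. The sketch below is only an illustration: `sample` is a hypothetical inline stand-in for the exported file, and it assumes the file is a top-level array whose items keep their nodes under `@graph`. The helper name `nodes_by_type` is made up for this example.

```python
import json
from collections import defaultdict

# Hypothetical stand-in for the contents of datamares.json: a top-level
# array whose items hold their nodes under "@graph" (an assumed shape).
sample = """
[{"@graph": [
  {"@id": "_:ub1bL58C40",
   "@type": ["http://library.ucsd.edu/ontology/dams#Relationship"],
   "http://library.ucsd.edu/ontology/dams#role":
     [{"@id": "http://library.ucsd.edu/ark:/20775/bb1673895k"}]}
]}]
"""


def nodes_by_type(doc):
    """Group node @id values under each @type IRI found in the document."""
    grouped = defaultdict(list)
    for item in doc:
        # Fall back to the item itself when there is no "@graph" wrapper.
        for node in item.get("@graph", [item]):
            for type_iri in node.get("@type", []):
                grouped[type_iri].append(node.get("@id"))
    return dict(grouped)


# For the real file, json.load() on an open file handle would replace this.
doc = json.loads(sample)
for type_iri, ids in nodes_by_type(doc).items():
    print(type_iri, ids)
```

Grouping by `@type` is just one convenient first look at the data. Any other key in a node can be read the same way.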

Now we can take a look at everything with the `.` operator from `jq`:

In [101]:
!cat $DATA | jq '.'

[
  {
    "@graph": [
      {
        "@id": "_:ub1bL58C40",
        "@type": [
          "http://library.ucsd.edu/ontology/dams#Relationship"
        ],
        "http://library.ucsd.edu/ontology/dams#personalName": [
          {
            "@id": "http://library.ucsd.edu/ark:/20775/bd9879500h"
          }
        ],
        "http://library.ucsd.edu/ontology/dams#role": [
          {
            "@id": "http://library.ucsd.edu/ark:/20775/bb1673895k"
          }
        ]
      },
      {
        "@id": "_:ub1