# Transforming metadata with RDFLib 

I've been asked to think about how to get the Research Data Curation projects in our digital collections into various other repositories and aggregators. Some of these are relatively easy to accomplish via "triggers", while others have a richer schema that differs significantly from our ontology.

This notebook experiments with ways to use the [Python library RDFLib](http://rdflib.readthedocs.io/en/stable/gettingstarted.html) (and its [JSON-LD extension](https://github.com/RDFLib/rdflib-jsonld)) to transform our data into the different target forms. The most common approach will likely be to serialize our data as JSON and/or JSON-LD, then apply some simple transformations to the elements. The first example will align our data with the forthcoming BIOCaddie schema.

In our actual production environment, since we are a Hydra/Ruby shop, this will more likely be done with the RDF.rb gem. But since I am stronger in Python, it made more sense to experiment with that here.

Let's first load up the rdflib libraries we'll need:

```python
from rdflib import Graph, plugin
from rdflib.serializer import Serializer
import json
```

## Import the data
For the source data, I am working with data from our staging server, which is locked behind a login. So rdflib's usual approach of parsing a graph straight from its URL fails with 404/file-not-found errors. Luckily, we can save the data from the DAMS as a Turtle (.ttl) file and tell rdflib to parse that file as Turtle:

```python
g = Graph().parse("datamares.ttl", format="turtle")
```

## Output (or serialize) the data

RDF data can typically be output, or serialized, in many different formats: RDF/XML, N-Triples, and JSON-LD, to name a few. Which serializations are available depends on the system you're working with, but rdflib lets us serialize the data ourselves. So let's serialize the existing Turtle data into JSON-LD.

One caveat, though. Our Turtle file declares namespace prefixes at the top for all the vocabularies used in the data. JSON-LD needs that information too, and keeps it in its `@context`. Following the guidance in the rdflib-jsonld documentation, we can build the context as a Python dict that maps each prefix to its namespace IRI:

```python
context = {"mads": "http://www.loc.gov/mads/rdf/v1#", "damsid": "http://library.ucsd.edu/ark:/20775/", "owl": "http://www.w3.org/2002/07/owl#", "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "dams": "http://library.ucsd.edu/ontology/dams#"}
```
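The context is essentially a prefix map: a compact key such as `dams:copyrightJurisdiction` stands for the full IRI formed by appending the part after the colon to its prefix's namespace IRI. The hypothetical helper `expand` below sketches only that simple prefix expansion (not the full JSON-LD expansion algorithm), using the same prefixes as the context above:

```python
# Same prefixes as the notebook's context dict
prefix_map = {
    "mads": "http://www.loc.gov/mads/rdf/v1#",
    "damsid": "http://library.ucsd.edu/ark:/20775/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dams": "http://library.ucsd.edu/ontology/dams#",
}

def expand(term, ctx):
    """Expand a compact 'prefix:suffix' term to a full IRI using ctx.

    Terms with no colon, an unknown prefix, or a blank-node id such as
    '_:b0' are returned unchanged. Simple prefix mappings only.
    """
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in ctx:
        return ctx[prefix] + suffix
    return term

print(expand("dams:copyrightJurisdiction", prefix_map))
# http://library.ucsd.edu/ontology/dams#copyrightJurisdiction
```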

Now we can serialize the data as JSON-LD, passing the context variable as a parameter:

```python
jason = g.serialize(format='json-ld', context=context, indent=4)  # bytes in rdflib 4.x
print(jason)
```

```
b'{\n    "@context": {\n        "dams": "http://library.ucsd.edu/ontology/dams#",\n        "damsid": "http://library.ucsd.edu/ark:/20775/",\n        "mads": "http://www.loc.gov/mads/rdf/v1#",\n        "owl": "http://www.w3.org/2002/07/owl#",\n        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n    },\n    "@graph": [\n        {\n            "@id": "_:ub12bL7C40",\n            "@type": "dams:Copyright",\n            "dams:copyrightJurisdiction": "US",\n            "dams:copyrightNote": "Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by \\"fair use\\" requires written permission of the UC Regents. Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.",\n            "dams:copyrightPurposeNote": "Use: This work is availabl
```

```python
# encoding='utf-8' makes serialize() return bytes in any rdflib version; json.loads accepts bytes
data = json.loads(g.serialize(format='json-ld', context=context, encoding='utf-8'))
```
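Once the JSON-LD is a Python dict, interpreting it is ordinary dict and list work. As a sketch, the snippet below counts the graph's nodes by `@type`. `sample_doc` is a hand-made stand-in holding just the one node visible in the serialized output. The hypothetical helpers assume that a compacted document may hold either an `@graph` array or a single top-level node, and that `@type` may be a string or a list, both of which JSON-LD allows.

```python
from collections import Counter

# Hand-made stand-in for the parsed JSON-LD: only the first node from the
# serialized output, trimmed to a few properties
sample_doc = {
    "@context": {"dams": "http://library.ucsd.edu/ontology/dams#"},
    "@graph": [
        {
            "@id": "_:ub12bL7C40",
            "@type": "dams:Copyright",
            "dams:copyrightJurisdiction": "US",
        }
    ],
}

def graph_nodes(doc):
    """Return the top-level nodes, whether or not they sit in an @graph array."""
    nodes = doc.get("@graph", [doc])
    return nodes if isinstance(nodes, list) else [nodes]

def node_types(node):
    """Return a node's @type values as a list (JSON-LD allows a string or a list)."""
    types = node.get("@type", [])
    return types if isinstance(types, list) else [types]

type_counts = Counter(t for node in graph_nodes(sample_doc) for t in node_types(node))
print(type_counts)
```

With the real data, you would pass the dict returned by `json.loads()` instead of `sample_doc`.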

## Interpreting and transforming the data

That output is hard to read: it is a raw bytes string with all the newline characters escaped. We could clean it up within Python, but we don't have to. Instead, we can export the data as JSON-LD with `rdfpipe`, the command-line tool that ships with rdflib. For reference, I ran the following:

`$ rdfpipe -i turtle -o json-ld datamares.ttl > datamares.json`
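For comparison, staying in Python is not much harder. The escaped output comes from printing a bytes object: rdflib 4.x's `serialize()` returns bytes, and `print()` shows their repr. Below is a sketch of a hypothetical helper, `pretty_jsonld`, that decodes the bytes and re-indents the JSON with the standard library; `raw` is a hand-made stand-in for the serializer's output.

```python
import json

def pretty_jsonld(raw, indent=4):
    """Return readable JSON text from serializer output (bytes or str)."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    # Round-trip through json to get real newlines and consistent indentation
    return json.dumps(json.loads(raw), indent=indent, sort_keys=True, ensure_ascii=False)

# Stand-in for the bytes returned by g.serialize(format='json-ld', ...) in rdflib 4.x
raw = b'{"@id": "_:ub12bL7C40", "@type": "dams:Copyright", "dams:copyrightJurisdiction": "US"}'
text = pretty_jsonld(raw)
print(text)
```

With the notebook's graph you would pass the result of `g.serialize(format='json-ld', context=context)` as `raw`. rdflib's `serialize()` also accepts a `destination` argument, so the JSON-LD could be written to a file directly from Python instead of via `rdfpipe`.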

The cool thing about having data in JSON-LD format is that it is still JSON, so we can use the many tools developers already use on JSON data. One such tool is `jq`, which makes querying and transforming JSON fairly easy. Let's store the path to the exported JSON file in a variable:

```python
DATA="datamares.json"
```

Now we can look at everything with `jq`'s identity filter `.`, which outputs its input unchanged (pretty-printed by default):

```
# In IPython, ! runs a shell command and $DATA is replaced with the Python variable's value
!cat $DATA | jq '.'
```
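The selecting and reshaping that `jq` does can also be done in plain Python, which is one way to approach the BIOCaddie alignment. The sketch below is roughly equivalent to a jq filter such as `[.["@graph"][] | select(.["@type"] == "dams:Copyright") | {identifier: .["@id"], jurisdiction: .["dams:copyrightJurisdiction"]}]`, except that jq would emit `null` for missing keys where this code omits them. It assumes a compacted document that uses the `dams:` prefix, as produced by serializing with the context. `FIELD_MAP` and its target field names are hypothetical placeholders, not the actual BIOCaddie schema.

```python
# Hypothetical mapping from DAMS keys to target field names; the target
# names are placeholders, NOT taken from the BIOCaddie schema
FIELD_MAP = {
    "@id": "identifier",
    "dams:copyrightJurisdiction": "jurisdiction",
    "dams:copyrightNote": "rightsNote",
}

def graph_nodes(doc):
    """Return the top-level nodes, whether or not they sit in an @graph array."""
    nodes = doc.get("@graph", [doc])
    return nodes if isinstance(nodes, list) else [nodes]

def has_type(node, rdf_type):
    """True if rdf_type is among the node's @type values (string or list)."""
    types = node.get("@type", [])
    return rdf_type in (types if isinstance(types, list) else [types])

def transform(doc, rdf_type, field_map):
    """Keep nodes of rdf_type and rename their mapped keys; unmapped keys are dropped."""
    return [
        {field_map[key]: value for key, value in node.items() if key in field_map}
        for node in graph_nodes(doc)
        if has_type(node, rdf_type)
    ]

sample_doc = {
    "@graph": [
        {"@id": "_:ub12bL7C40", "@type": "dams:Copyright",
         "dams:copyrightJurisdiction": "US"},
        # A second, hypothetical node of another type, to show the filtering
        {"@id": "_:example2", "@type": "dams:Example"},
    ]
}

print(transform(sample_doc, "dams:Copyright", FIELD_MAP))
# [{'identifier': '_:ub12bL7C40', 'jurisdiction': 'US'}]
```

Keeping the mapping in a plain dict means each new target schema only needs a new `FIELD_MAP`, not new code.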