# Metadata access with JSON-LD example

This is a demonstration of how you can use JSON-LD to standardise the presentation of metadata independently of how the data is serialised in a JSON document.

For an introduction to JSON-LD, see https://json-ld.org. If you thought the Semantic Web was just some academic fantasy, check out how search engines use JSON-LD to extract metadata from the web and build their knowledge graphs: https://developers.google.com/search/docs/guides/intro-structured-data. JSON-LD essentially gives you a mechanism to assign semantics to concepts in a JSON document. We can use ontologies to define those semantics precisely.

Let’s take “Organ” as a concept. We know that this is an important concept in HCA. Through all the revisions of the HCA schema, the concept of organ has remained the same, even though where it sits in the JSON hierarchy, or which schema it is defined in, has changed. JSON-LD gives us a mechanism to describe a concept like “Organ” independently of how we choose to represent it in a JSON document. Another way to think about it is that we can assign stable IDs to fields in the schema: the field names may change while the concept ID stays the same.

Here are some examples. We use the reserved JSON-LD keyword @context to define a context for our JSON. The context is used to map fields to ontology terms. Here we say that organ maps to the UBERON ontology concept for organ. Here’s a simplified bit of sample JSON.

In [None]:
sample_v1 = {
    "@context": {
        "organ": "http://purl.obolibrary.org/obo/UBERON_0000062"
    },
    "biomaterial_id": "Specimen_PBMC2",
    "ncbi_taxon_id": 9606,
    "organ": "blood"
}
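
To make the role of the context more concrete, here is a minimal sketch in plain Python of roughly what a JSON-LD processor does with a term mapping like the one above. The helper `expand_top_level_terms` is hypothetical and written only for illustration. It assumes a flat document with simple string mappings and ignores nesting, keyword aliases, @vocab and the rest of the JSON-LD expansion algorithm.

```python
def expand_top_level_terms(doc):
    """Return {IRI: value} for top-level fields that the @context maps to an IRI.

    Simplified illustration only, not a JSON-LD processor.
    """
    context = doc.get("@context", {})
    expanded = {}
    for field, value in doc.items():
        if field == "@context":
            continue
        iri = context.get(field)
        # Fields without a mapping carry no semantics and are skipped,
        # much as a JSON-LD processor drops them when no @vocab is set.
        if isinstance(iri, str) and not iri.startswith("@"):
            expanded[iri] = value
    return expanded


sample = {
    "@context": {"organ": "http://purl.obolibrary.org/obo/UBERON_0000062"},
    "biomaterial_id": "Specimen_PBMC2",
    "ncbi_taxon_id": 9606,
    "organ": "blood",
}

print(expand_top_level_terms(sample))
# {'http://purl.obolibrary.org/obo/UBERON_0000062': 'blood'}
```

Only organ survives, because it is the only field the context gives a meaning to. The other fields would need their own mappings before they could be found by concept.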


Because JSON-LD is a serialisation of RDF, I can use SPARQL, the standard RDF query language, to extract the value from this JSON document based on the ontology term rather than the field name. Here’s my SPARQL query:

In [None]:
get_organ_query = "SELECT ?organ WHERE { ?s <http://purl.obolibrary.org/obo/UBERON_0000062> ?organ}"
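
The same query shape works for any concept, so it can be generated from the concept IRI. The following is a small sketch under that assumption. The helper `concept_query` and the constant `ORGAN` are hypothetical names introduced here, and the character check is a basic guard rather than full IRI validation.

```python
def concept_query(concept_iri, var="value"):
    """Build a SPARQL SELECT that returns every value attached to concept_iri."""
    # Reject characters that cannot appear inside <...> in a SPARQL query.
    if any(ch in concept_iri for ch in '<>"{}|^`\\ '):
        raise ValueError("not a valid IRI: %r" % concept_iri)
    return "SELECT ?%s WHERE { ?s <%s> ?%s }" % (var, concept_iri, var)


ORGAN = "http://purl.obolibrary.org/obo/UBERON_0000062"
organ_query = concept_query(ORGAN, var="organ")

print(organ_query)
# SELECT ?organ WHERE { ?s <http://purl.obolibrary.org/obo/UBERON_0000062> ?organ }
```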


We use the RDFLib package to load the JSON-LD and query it.

In [None]:
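import json

from rdflib import ConjunctiveGraph
from rdflib.plugin import register, Parser

# Register the JSON-LD parser provided by the rdflib-jsonld plugin.
# RDFLib 6.0 and later ship a built-in JSON-LD parser, so on those versions this line can be dropped.
register('application/ld+json', Parser, 'rdflib_jsonld.parser', 'JsonLDParser')

graph = ConjunctiveGraph()

# Serialise the dict to a JSON string and parse it as JSON-LD.
graph.parse(data=json.dumps(sample_v1), format="json-ld")

# Run the SPARQL query against the resulting graph.
qres = graph.query(get_organ_query)

# Each row holds the bindings for one match; here that is the single ?organ value.
for row in qres:
    print("%s" % row)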

This returns “blood” as a value. 

Now let’s consider a new version of the sample where we have moved the field. Here we’ve nested it and put the value in a field called text, much as it is in the HCA schema.

In [None]:
sample_v2 = {
    "@context": {
        "organ": "@nest",
        "text": "http://purl.obolibrary.org/obo/UBERON_0000062"
    },
    "biomaterial_id": "Specimen_PBMC2",
    "ncbi_taxon_id": 9606,
    "organ": {
        "text": "blood"
    }
}

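Now the structure of the document has changed, and we’ve updated the context to reflect that: organ is aliased to the @nest keyword, which marks it as a purely structural container, and text now carries the mapping to the UBERON concept. However, I can execute exactly the same query on this document by asking for the “organ” concept.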

In [None]:
# Start from a fresh graph so that only sample_v2 is queried.
graph = ConjunctiveGraph()

graph.parse(data=json.dumps(sample_v2), format="json-ld")

# Reuse the query defined earlier, unchanged.
qres = graph.query(get_organ_query)

for row in qres:
    print("%s" % row)

This returns “blood” as before. This shows how JSON-LD can provide a consistent way to access concepts in our metadata schema without consumers having to be concerned with how it is structured in JSON. We can rearrange the JSON whilst preserving the semantics.
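
To see why the same lookup works on both layouts, it can help to imitate the behaviour without an RDF library. The sketch below is a deliberately simplified stand-in for a JSON-LD processor, not a replacement for RDFLib. The function `find_by_concept` is a hypothetical helper, and it assumes that contexts contain only plain string mappings and that containers are aliased to @nest, as in the two samples above.

```python
def find_by_concept(doc, concept_iri, context=None):
    """Collect the values of fields whose context mapping equals concept_iri.

    Fields aliased to "@nest" are treated as transparent containers and
    searched recursively. Everything else in JSON-LD is ignored.
    """
    if context is None:
        context = doc.get("@context", {})
    found = []
    for field, value in doc.items():
        if field == "@context":
            continue
        mapping = context.get(field)
        if mapping == concept_iri:
            found.append(value)
        elif mapping == "@nest" and isinstance(value, dict):
            # Look inside the structural container using the same context.
            found.extend(find_by_concept(value, concept_iri, context))
    return found


ORGAN = "http://purl.obolibrary.org/obo/UBERON_0000062"

flat_sample = {
    "@context": {"organ": ORGAN},
    "biomaterial_id": "Specimen_PBMC2",
    "organ": "blood",
}

nested_sample = {
    "@context": {"organ": "@nest", "text": ORGAN},
    "biomaterial_id": "Specimen_PBMC2",
    "organ": {"text": "blood"},
}

print(find_by_concept(flat_sample, ORGAN))    # ['blood']
print(find_by_concept(nested_sample, ORGAN))  # ['blood']
```

The caller only ever names the concept IRI. The knowledge of where the value lives stays in the context that travels with each document.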