# Human Cell Atlas SPARQL

## Download and start the Blazegraph store

Get the graph store from https://www.blazegraph.com/download/. Save the jar to the directory containing this notebook.

You can then start the graph store in a terminal using the command below.

In [11]:
#java -server -Xmx4g -jar blazegraph.jar

## Add some ontologies to graph store

What is the minimum set of ontologies needed to work with the metadata? What are the risks of adding more ontologies to the store?

In [7]:
from pymantic import sparql
import os

server = sparql.SPARQLServer('http://127.0.0.1:9999/blazegraph/sparql')
dir_path = os.getcwd() # We need the absolute path to load local files into the store

In [9]:

# Download the EFO OWL

!wget http://www.ebi.ac.uk/efo/efo.owl -O efo.owl


# Load the OWL to the graph store
server.update("load <file://{}/{}>".format(dir_path, "efo.owl"))

# Query a single triple to confirm that the load succeeded
# (without a limit, the query would return every triple in the store)
result = server.query('select * where { ?s ?p ?o } limit 1')
for b in result['results']['bindings']:
    print('s:', b['s']['value'])
    print('o:', b['o'])
    print('p:', b['p'])

--2018-03-21 21:21:52--  http://www.ebi.ac.uk/efo/efo.owl
Resolving www.ebi.ac.uk (www.ebi.ac.uk)... 193.62.193.80
Connecting to www.ebi.ac.uk (www.ebi.ac.uk)|193.62.193.80|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.ebi.ac.uk/efo/efo.owl [following]
--2018-03-21 21:21:53--  https://www.ebi.ac.uk/efo/efo.owl
Connecting to www.ebi.ac.uk (www.ebi.ac.uk)|193.62.193.80|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 54055133 (52M)
Saving to: ‘efo.owl’


2018-03-21 21:22:08 (3.65 MB/s) - ‘efo.owl’ saved [54055133/54055133]

('s:', u'http://purl.obolibrary.org/obo/BTO_0000551')
('o:', {u'type': u'literal', u'value': u'Cancer cell of the major organ of respiration the lung.'})
('p:', {u'type': u'uri', u'value': u'http://purl.obolibrary.org/obo/IAO_0000115'})


In [25]:
# load other ontologies?
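
The open question above could be approached by reusing the same `load` pattern for each additional ontology. The following is only a sketch: the helper `build_load_statement` and the `extra_ontologies` mapping are hypothetical names, and UBERON is chosen here as an assumption because the organ terms returned later in this notebook are UBERON URIs. The download and `server.update` steps are left as comments so the cell runs without a network connection or a running store.

```python
def build_load_statement(directory, file_name):
    """Return a SPARQL LOAD statement for a local file.

    `directory` is expected to be an absolute path (starting with "/"),
    so the resulting IRI has the form file:///abs/path/file_name.
    """
    return "load <file://{}/{}>".format(directory, file_name)


# Hypothetical list of extra ontologies (file name -> download URL).
# UBERON is an assumption, picked because organ terms are UBERON URIs.
extra_ontologies = {
    "uberon.owl": "http://purl.obolibrary.org/obo/uberon.owl",
}

for file_name, url in extra_ontologies.items():
    # Step 1 (not run here): download `url` to `file_name`, as done for EFO.
    # Step 2 (not run here): server.update(statement)
    statement = build_load_statement("/path/to/notebook", file_name)
    print(statement)
```

In the notebook itself, `dir_path` would replace the placeholder directory. Each extra ontology increases load time and store size, which is one of the risks the question above points at.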


## Download some existing metadata examples

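An example biomaterial bundle is available in the metadata-schema repository: https://github.com/HumanCellAtlas/metadata-schema/blob/master/examples/bundles/v5/GlioblastomaSS2/quakeGlioblastoma1_biomaterial_bundle_0.json

For this notebook, we'll get a bundle from the Data Storage Service (DSS).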

In [17]:
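import requests

DSS_URL = "https://dss.dev.data.humancellatlas.org/v1"
bundle_uuid = "4be0071d-b36e-4414-a7ee-7b879f60be7c"

# Fetch the bundle manifest from the AWS replica of the DSS
r = requests.get("{}/bundles/{}?replica=aws".format(DSS_URL, bundle_uuid))
r.raise_for_status()  # Fail early on an HTTP error instead of parsing an error body
bundle = r.json()
print(bundle['bundle']['uuid'])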

4be0071d-b36e-4414-a7ee-7b879f60be7c


## Modify the metadata to provide JSON-LD @context

Perhaps flatten the resulting files or bundle JSON? http://json-ld.org/spec/latest/json-ld/#flattened-document-form
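
As a rough illustration of what providing an `@context` can mean, the sketch below attaches a JSON-LD `@vocab` to a metadata document so that plain keys such as `biomaterial_id` expand to IRIs under one namespace. The helper name `add_context` and the example document are hypothetical, and the namespace is assumed to be the `hca:` prefix used by the query later in this notebook. This is not necessarily how `bundle_to_rdf` works internally.

```python
import copy

# Assumed vocabulary namespace, taken from the hca: prefix in the query below.
HCA_VOCAB = "http://rdf.data.humancellatlas.org/"


def add_context(document, vocab=HCA_VOCAB):
    """Return a copy of `document` with a minimal JSON-LD @context.

    With "@vocab" set, a key such as "biomaterial_id" expands to
    <vocab>biomaterial_id when the document is processed as JSON-LD.
    """
    doc = copy.deepcopy(document)  # leave the caller's document untouched
    doc["@context"] = {"@vocab": vocab}
    return doc


# Hypothetical, heavily trimmed metadata document.
example = {"biomaterial_core": {"biomaterial_id": "GSM2243439"}}
with_context = add_context(example)
print(with_context["@context"])
```

A JSON-LD processor could then convert the annotated document to RDF, after which flattening becomes a separate, optional step.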

In [18]:
import bundle_to_rdf
file_name = bundle_to_rdf.bundle_to_rdf(bundle)
print(file_name)

application/json; dcp-type="metadata/project"
application/json; dcp-type="metadata/biomaterial"
application/json; dcp-type="metadata/file"
application/json; dcp-type="metadata/process"
application/json; dcp-type="metadata/protocol"
application/json; dcp-type="metadata/links"
Wrote file: 4be0071d-b36e-4414-a7ee-7b879f60be7c.ttl
4be0071d-b36e-4414-a7ee-7b879f60be7c.ttl


## Load RDF into Graph Store

In [19]:
server.update("load <file://{}/{}>".format(dir_path, file_name))

({'content-length': '452',
  'content-type': 'text/html; charset=UTF-8',
  'server': 'Jetty(9.2.z-SNAPSHOT)',
  'status': '200'},
 '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"><html><head><meta http-equiv="Content-Type" content="text&#47;html;charset=UTF-8"><title>blazegraph&trade; by SYSTAP</title\n></head\n><body<p>totalElapsed=7ms, elapsed=7ms, connFlush=0ms, batchResolve=0, whereClause=0ms, deleteClause=0ms, insertClause=0ms</p\n><hr><p>COMMIT: totalElapsed=106ms, commitTime=1521702165291, mutationCount=222</p\n></html\n>')

## Demonstrate getting back bundle data

In [20]:
query = """# Get organ type for bundles

PREFIX hca:<http://rdf.data.humancellatlas.org/>
PREFIX bundle:<https://schema.humancellatlas.org/bundle/5.1.0/>

SELECT distinct ?bioid ?name ?tissue ?uberon WHERE {
  ?bundle a bundle:biomaterial ;
          hca:biomaterials
            [hca:content
             [hca:organ
               [hca:text ?tissue ;
                 hca:ontology ?uberon]]] .
  ?bundle a bundle:biomaterial ;
          hca:biomaterials
            [hca:content
              [hca:biomaterial_core
               [hca:biomaterial_id ?bioid ; hca:biomaterial_name ?name]]]

}

"""
result = server.query(query)
print(result['results']['bindings'])

[{u'uberon': {u'type': u'uri', u'value': u'http://purl.obolibrary.org/obo/UBERON_0000955'}, u'tissue': {u'type': u'literal', u'value': u'brain'}, u'bioid': {u'type': u'literal', u'value': u'GSM2243439'}, u'name': {u'type': u'literal', u'value': u'Single cell from Tumor,1001000173.G8'}}]
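
The raw bindings are verbose because every value is wrapped in a `type`/`value` dictionary, as the SPARQL JSON results format prescribes. A small helper can reduce each binding to a flat row. The function name `bindings_to_rows` is hypothetical, and the sample input below simply repeats the result printed above so the cell runs without the store.

```python
def bindings_to_rows(result):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    rows = []
    for binding in result["results"]["bindings"]:
        # Keep only the lexical value and drop the type information.
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows


# Sample copied from the query output above.
sample = {
    "results": {
        "bindings": [
            {
                "uberon": {"type": "uri", "value": "http://purl.obolibrary.org/obo/UBERON_0000955"},
                "tissue": {"type": "literal", "value": "brain"},
                "bioid": {"type": "literal", "value": "GSM2243439"},
                "name": {"type": "literal", "value": "Single cell from Tumor,1001000173.G8"},
            }
        ]
    }
}

rows = bindings_to_rows(sample)
print(rows[0]["bioid"], rows[0]["tissue"])
```

Dropping the type information is a deliberate simplification; keep it if the distinction between URIs and literals matters downstream.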


## Use ontological reasoning to verify tissue location

Tissue sites and organs are hierarchically related. Use the ontology to check whether the local tissue location is part of the stated organ, or to discover the possible "facets" for a given organ.
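
One way to approach the first check is a SPARQL `ASK` query with a property path. The sketch below only builds the query string. It assumes the anatomy ontology (for example UBERON) has been loaded into the store and that part-of relations are encoded as `rdfs:subClassOf` axioms pointing at `owl:someValuesFrom` restrictions. Note the limitation: the path follows any existential restriction, not only part-of, so it over-approximates. The function name and the tissue URI (believed to be cerebral cortex) are illustrative assumptions.

```python
def part_of_ask_query(tissue_uri, organ_uri):
    """Build an ASK query testing whether `tissue_uri` reaches `organ_uri`
    through subclass links and existential restrictions."""
    template = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

ASK {{
  <{tissue}> (rdfs:subClassOf | (rdfs:subClassOf / owl:someValuesFrom))* <{organ}> .
}}
"""
    return template.format(tissue=tissue_uri, organ=organ_uri)


# Organ URI taken from the query result above (brain).
ORGAN_URI = "http://purl.obolibrary.org/obo/UBERON_0000955"
# Illustrative tissue URI (assumption: cerebral cortex).
TISSUE_URI = "http://purl.obolibrary.org/obo/UBERON_0000956"

ask_query = part_of_ask_query(TISSUE_URI, ORGAN_URI)
print(ask_query)
# Not run here: server.query(ask_query) against the running store.
```

A stricter variant would match the restriction explicitly and require `owl:onProperty` to be the part-of property, at the cost of losing the simple transitive path.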

## Render into a pageable table

A table that shows a selected set of attributes for files in the store.
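
Paging can be pushed down to the store with `LIMIT` and `OFFSET`. The sketch below appends those clauses to an existing `SELECT` query. The function name `paged_query` and its defaults are hypothetical. An `ORDER BY` is included because, without a stable ordering, consecutive pages are not guaranteed to be consistent.

```python
def paged_query(select_query, page, page_size=25, order_by="?bioid"):
    """Append ORDER BY / LIMIT / OFFSET to a SELECT query.

    `page` is zero-based, so page 0 returns the first `page_size` rows.
    """
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size must be >= 1")
    offset = page * page_size
    return "{}\nORDER BY {}\nLIMIT {} OFFSET {}".format(
        select_query.rstrip(), order_by, page_size, offset
    )


base = "SELECT ?bioid ?name WHERE { ?s ?p ?bioid ; ?q ?name }"
third_page = paged_query(base, page=2, page_size=10)
print(third_page)
```

Each page of bindings can then be rendered as table rows by whatever front end displays the table.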

## Move the above process to a Lambda

With a standing graph service, set up a Lambda function that indexes bundles into the graph store. The pattern follows the [dss-azul-indexer](https://github.com/BD2KGenomics/dss-azul-indexer), which subscribes to the blue box (the DSS) for changes.
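
A minimal sketch of such a handler follows. It makes several assumptions that are not confirmed by this notebook: that the notification arrives as an HTTP-style event whose body carries the bundle UUID under `match.bundle_uuid`, and that fetching, converting, and loading are supplied as functions. All names here (`extract_bundle_uuid`, `make_handler`, and the three injected callables) are hypothetical. Only the `handler(event, context)` signature is the standard Python Lambda entry point.

```python
import json


def extract_bundle_uuid(event):
    """Pull the bundle UUID out of a notification event (assumed shape)."""
    body = event.get("body", event)
    if isinstance(body, str):
        body = json.loads(body)  # HTTP-style events carry the body as a string
    return body["match"]["bundle_uuid"]


def make_handler(fetch_bundle, to_rdf, load_rdf):
    """Build a Lambda handler from three injected steps.

    fetch_bundle(uuid) -> bundle JSON   (e.g. the DSS request shown earlier)
    to_rdf(bundle)     -> RDF file name (e.g. bundle_to_rdf.bundle_to_rdf)
    load_rdf(name)     -> None          (e.g. a SPARQL load into the store)
    """
    def handler(event, context):
        bundle_uuid = extract_bundle_uuid(event)
        bundle = fetch_bundle(bundle_uuid)
        file_name = to_rdf(bundle)
        load_rdf(file_name)
        return {"statusCode": 200, "body": json.dumps({"indexed": bundle_uuid})}

    return handler


# Dry run with stand-in functions instead of the real services.
loaded = []
handler = make_handler(
    fetch_bundle=lambda uuid: {"bundle": {"uuid": uuid}},
    to_rdf=lambda bundle: bundle["bundle"]["uuid"] + ".ttl",
    load_rdf=loaded.append,
)
event = {"body": json.dumps({"match": {"bundle_uuid": "4be0071d-b36e-4414-a7ee-7b879f60be7c"}})}
response = handler(event, None)
print(response["statusCode"], loaded)
```

Injecting the three steps keeps the handler testable without network access. In a deployed function, the graph store would need to be reachable from the Lambda, and loading via a local `file://` path would likely need to be replaced by posting the RDF to the store's endpoint.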