# Querying catalogs of registries with the Linked Data Platform

We have integrated metadata about registries from four data sources by aligning them to a common schema that is being proposed by the team working on EJP WP11 task 1.1. The registry data currently comes from the ERDRI, Orphanet and RD-Connect catalogs of registries and from the RD-Connect catalog of biobanks. The data has been harmonised using an RDF model based on the DCAT vocabulary, so that we can query across the metadata using SPARQL. It has been loaded into a Linked Data Platform (LDP) instance running at http://training.fairdata.solutions/DAV/home/EJP_HACK/Jupp/, where the LDP model is used to represent individual datasets as containers that can be navigated using linked data principles (i.e. using stable URIs to access the data and a follow-your-nose approach to finding linked datasets).
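Because each catalog is an LDP container that in turn contains one container per registry, the containment hierarchy can itself be queried over SPARQL. The sketch below uses a hypothetical helper, `containment_count_query`, to build a query counting the child containers of every catalog container under the root. It assumes, as the queries in this notebook do, that the root's `ldp:contains` triples are in the default graph and that each catalog container's own `ldp:contains` triples are stored in the named graph of that container.

```python
import re

ROOT = "http://training.fairdata.solutions/DAV/home/EJP_HACK/Jupp/"

# Characters that may not appear inside a SPARQL IRIREF (<...>)
_IRI_FORBIDDEN = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def containment_count_query(root: str) -> str:
    """Hypothetical helper: count the containers held by each catalog
    container that is directly contained by the LDP root container `root`."""
    if _IRI_FORBIDDEN.search(root):
        raise ValueError(f"not usable as a SPARQL IRI: {root!r}")
    # Doubled braces {{ }} produce literal braces in the f-string output.
    return f"""
PREFIX ldp: <http://www.w3.org/ns/ldp#>
SELECT ?catalog_container (COUNT(DISTINCT ?child) AS ?n_children)
WHERE {{
  <{root}> ldp:contains ?catalog_container .
  GRAPH ?catalog_container {{
    ?catalog_container ldp:contains ?child .
  }}
}}
GROUP BY ?catalog_container
"""


print(containment_count_query(ROOT))
```

The resulting string can be run like the other queries in this notebook, e.g. by passing it to `sparql.setQuery`.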

In this notebook you will see how we can query the LDP data to answer some use cases relating to the discovery of patient registries across catalogs. First we'll connect to the SPARQL endpoint where the data resides.
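The notebook uses the `SPARQLWrapper` library for this. For readers who prefer to avoid the dependency, the sketch below shows roughly what such a client sends: under the SPARQL 1.1 Protocol a query can be sent as an HTTP GET with the URL-encoded query text in the `query` parameter, and JSON results are requested with the `Accept: application/sparql-results+json` header. It assumes the endpoint at http://training.fairdata.solutions/sparql accepts GET requests; `build_sparql_request` and `run_sparql` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://training.fairdata.solutions/sparql"


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )


def run_sparql(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send the query and decode the JSON results document (needs network access)."""
    req = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

The dictionary returned by `run_sparql` has the same `results["results"]["bindings"]` layout that `SPARQLWrapper` produces with the JSON return format, so the printing loops in this notebook would work on it unchanged.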

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Public SPARQL endpoint on the same server as the LDP instance
endpoint = 'http://training.fairdata.solutions/sparql'
sparql = SPARQLWrapper(endpoint)

# Namespace prefixes prepended to every query in this notebook
prefixes = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX fo: <http://www.w3.org/1999/XSL/Format#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX iao: <http://purl.obolibrary.org/obo/iao.owl#>
PREFIX schema: <http://schema.org/>
PREFIX sc: <http://purl.org/science/owl/sciencecommons/>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX ncit: <http://purl.obolibrary.org/obo/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX ldp: <http://www.w3.org/ns/ldp#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ejp: <http://purl.org/ejp-rd/vocabulary/>
PREFIX orphanet: <http://www.orpha.net/ORDO/Orphanet_>
"""
```

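Each query below returns a document in the SPARQL 1.1 Query Results JSON Format: `results["results"]["bindings"]` is a list with one dict per solution, mapping each variable name to a term object whose `value` key holds the lexical value. A variable left unbound in a solution (for example under `OPTIONAL`) is simply absent from that dict, so indexing it directly raises `KeyError`. The hypothetical helper `rows` below flattens the bindings into tuples and substitutes a default for unbound variables.

```python
from typing import Iterator, Optional, Tuple


def rows(results: dict, *variables: str,
         default: Optional[str] = None) -> Iterator[Tuple[Optional[str], ...]]:
    """Yield one tuple of plain string values per solution in a SPARQL JSON result.

    Variables missing from a solution (unbound) are replaced by `default`.
    """
    for binding in results["results"]["bindings"]:
        yield tuple(binding[v]["value"] if v in binding else default
                    for v in variables)
```

For example, iterating over `rows(results, "catalog_title", "reg_title", "country")` yields the same three values that the printing loops in this notebook extract from each binding.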

## Use case scenario 1

A user of the virtual platform (VP) wants to discover all patient registries for a particular disease in their country. In this example we filter on country only and retrieve the catalog title and registry title of all registries and biobanks whose publisher is located in France.


```python
uc1_query = prefixes + """
SELECT DISTINCT ?catalog_title ?reg_title ?country
WHERE {
  # The root LDP container holds one container per catalog
  <http://training.fairdata.solutions/DAV/home/EJP_HACK/Jupp/> ldp:contains ?catalog_container .

  # Each catalog graph names the catalog's datasets and its child containers
  GRAPH ?catalog_container {
    ?catalog a ejp:CatalogOfRegistries .
    ?catalog dct:title ?catalog_title .
    ?catalog dcat:dataset ?registry .
    ?catalog_container ldp:contains ?registry_container
  }
  # Each registry graph describes one patient registry or biobank
  GRAPH ?registry_container {
    ?registry a ?registry_type .
    FILTER (?registry_type IN (ejp:BiobankDataset, ejp:PatientRegistryDataset))
    ?registry dct:title ?reg_title .
    ?registry dcat:theme ?disease .
    ?registry dct:publisher [ dct:spatial [ ejp:country ?country] ]
    # Exact, case-sensitive string match on the country name
    FILTER (?country = "France")
  }
}
"""

# Run the query and print one line per matching registry
sparql.setQuery(uc1_query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print("{}   {}  {}".format(result['catalog_title']['value'], result['reg_title']['value'], result['country']['value']))
```

```
European Directory of Registries   FPHR  France
RD-Connect Registry & Biobank Finder   The global aHUS Registry  France
RD-Connect Registry & Biobank Finder   French national registry for thrombotic microangiopathy  France
RD-Connect Registry & Biobank Finder   French registry of atypical child hemolytic uremic syndrome  France
RD-Connect Registry & Biobank Finder   French cystinosis registry  France
RD-Connect Registry & Biobank Finder   DM Scope (Myotonic dystrophy patient registry in France)  France
RD-Connect Registry & Biobank Finder   Leukofrance database and biobank  France
RD-Connect Registry & Biobank Finder   RaDiCo-COBBALT - French national nohort on Bardet-Biedl and Alström syndromes  France
RD-Connect Registry & Biobank Finder   RaDiCo-PP - French cohort for clinical, genetic and socio-economic study of Periodic Paralysis  France
RD-Connect Registry & Biobank Finder   RaDiCo-MPS - French national cohort on mucopolysaccharidosis in the era of specific therapeutics.  France
```


## Use case scenario 2

Find all registries (and biobanks) that deal with rare pulmonary hypertension, i.e. whose `dcat:theme` is the Orphanet class `orphanet:71198`.
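In the query below the prefix declaration `PREFIX orphanet: <http://www.orpha.net/ORDO/Orphanet_>` expands `orphanet:71198` to the full IRI http://www.orpha.net/ORDO/Orphanet_71198. A VP user is more likely to type a code such as `ORPHA:71198` or just `71198`; the hypothetical helper below normalises those forms into the ORDO IRI used in this data, assuming every code is a purely numeric Orphanet identifier.

```python
import re

ORDO_BASE = "http://www.orpha.net/ORDO/Orphanet_"


def orphanet_iri(code: str) -> str:
    """Turn 'ORPHA:71198', 'Orphanet:71198', 'Orphanet_71198' or '71198' into an ORDO IRI."""
    match = re.fullmatch(r"(?:ORPHA:|Orphanet[:_])?(\d+)", code.strip(),
                         flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised Orphanet code: {code!r}")
    return ORDO_BASE + match.group(1)
```

The result can be written into a query as a bracketed IRI, e.g. `f"FILTER (?disease = <{orphanet_iri('ORPHA:71198')}>)"`, which is equivalent to the `orphanet:71198` form used below.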


```python
uc2_query = prefixes + """
SELECT DISTINCT ?catalog_title ?reg_title ?country
WHERE {
  <http://training.fairdata.solutions/DAV/home/EJP_HACK/Jupp/> ldp:contains ?catalog_container .

  GRAPH ?catalog_container {
    ?catalog a ejp:CatalogOfRegistries .
    ?catalog dct:title ?catalog_title .
    ?catalog dcat:dataset ?registry .
    ?catalog_container ldp:contains ?registry_container
  }
  GRAPH ?registry_container {
    ?registry a ?registry_type .
    FILTER (?registry_type IN (ejp:BiobankDataset, ejp:PatientRegistryDataset))
    ?registry dct:title ?reg_title .
    ?registry dcat:theme ?disease .
    ?registry dct:publisher [ dct:spatial [ ejp:country ?country] ]
    # orphanet:71198 expands to <http://www.orpha.net/ORDO/Orphanet_71198>
    FILTER (?disease = orphanet:71198)
  }
}
"""

# Run the query and print one line per matching registry
sparql.setQuery(uc2_query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print("{}   {}  {}".format(result['catalog_title']['value'], result['reg_title']['value'], result['country']['value']))
```

```
European Directory of Registries   FPHR  France
Orphanet   French registry of rare pulmonary hypertension (HTAP)  FRANCE
Orphanet   REHIPED - Spanish Registry for Pediatric Pulmonary Hypertension  SPAIN
```
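To summarise how many matching registries each catalog contributes, the bindings can be tallied on the client with `collections.Counter`. In the sketch below, `registries_per_catalog` is a hypothetical name; because the queries select titles rather than registry IRIs, it counts distinct (catalog title, registry title) pairs, which is an assumption that two different registries in one catalog never share a title.

```python
from collections import Counter


def registries_per_catalog(results: dict) -> Counter:
    """Count distinct registry titles per catalog title in a SPARQL JSON result."""
    pairs = {
        (b["catalog_title"]["value"], b["reg_title"]["value"])
        for b in results["results"]["bindings"]
    }
    return Counter(catalog for catalog, _ in pairs)
```

Applied to the use case 2 results printed above, this would report one registry for the European Directory of Registries and two for Orphanet.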
