# SPARQL Playground

<a href="https://githubtocolab.com/gleanerio/archetype/blob/master/networks/commons/sparql.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.png" alt="Open in Colab"/></a>


## requirements.txt

In [None]:
!pip install -q minio
!pip install -q kglab

## imports

In [4]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)  # suppress pandas FutureWarning
import kglab
from minio import Minio
from rdflib import Graph, plugin
import plotly.express as px
import pandas as pd
from urllib.request import urlopen
import os

In [5]:
def ensure_directory_exists(path):
    if not os.path.exists(path):
        os.makedirs(path)

def popper(data):
    """Convert N-Quads bytes to N-Triples text by dropping each line's graph term."""
    lines = data.decode().split('\n')  # split input into separate lines
    modified_lines = []

    for line in lines:
        # normalize schema.org IRIs to the https form used by the namespaces and queries
        newline = line.replace("http://schema.org", "https://schema.org")
        segments = newline.split(' ')

        if len(segments) > 3:
            segments.pop()   # drop the trailing '.'
            segments.pop()   # drop the graph term (assumes every line is a quad)
            new_line = ' '.join(segments) + ' .'
            modified_lines.append(new_line)

    return '\n'.join(modified_lines)

def publicurls(client, bucket, prefix):
    urls = []
    objects = client.list_objects(bucket, prefix=prefix, recursive=True)
    for obj in objects:
        result = client.stat_object(bucket, obj.object_name)

        if result.size > 0:  # TODO: size is only a stand-in check; it does not show whether the object is publicly readable
            url = client.presigned_get_object(bucket, obj.object_name)
            # print(f"Public URL for object: {url}")
            urls.append(url)

    return urls
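
`popper` assumes every input line is a quad that ends in a graph term and a period, so a plain triple passed through it would lose its object. A minimal sketch of a stricter variant follows. The name `strip_graph_term` and the regular expression are illustrative assumptions, not part of the notebook, and this is not a full N-Quads parser.

```python
import re

# Matches a line whose last term before the final '.' is an IRI or a blank node.
_TRAILING_TERM = re.compile(
    r'^(?P<rest>.*\S)\s+(?P<last><[^<>"\s]*>|_:\S+)\s*\.\s*$'
)

def strip_graph_term(line):
    """Return the line as a triple, dropping a fourth (graph) term if present."""
    match = _TRAILING_TERM.match(line)
    if match:
        rest = match.group("rest")
        # Subject and predicate never contain whitespace, so a third chunk in
        # `rest` means the matched term was a graph label, not the object.
        if len(rest.split(None, 2)) == 3:
            return rest + " ."
    return line
```

Lines that are already triples come back unchanged, so the function can be applied to mixed input.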


In [16]:
# Optional: check how many GPUs kglab can see, if you want to confirm a GPU is in use
# gc = kglab.get_gpu_count()
# print(gc)

In [6]:
# To list the objects in a public S3 bucket as URLs, use something like this

client = Minio("ossapi.oceaninfohub.org:80",  secure=False) # Create client with anonymous access.
urls = publicurls(client, "public", "graph")
for u in urls:
    print(u)

http://ossapi.oceaninfohub.org/public/graphs/summonedafricaioc_v1_release.nq
http://ossapi.oceaninfohub.org/public/graphs/summonedaquadocs_v1_release.nq
http://ossapi.oceaninfohub.org/public/graphs/summonedcioos_v1_release.nq
http://ossapi.oceaninfohub.org/public/graphs/summonededmerp_v1_release.nq
http://ossapi.oceaninfohub.org/public/graphs/summonededmo_v1_release.nq
http://ossapi.oceaninfohub.org/public/graphs/summonedemodnet_v1_release.nq
http://ossapi.oceaninfohub.org/public/graphs/summonedinanodc_v1_release.nq
http://ossapi.oceaninfohub.org/public/graphs/summonedinvemardocuments_v1_release.nq
http://ossapi.oceaninfohub.org/public/graphs/summonedinvemarexperts_v1_release.nq
http://ossapi.oceaninfohub.org/public/graphs/summonedinvemarinstitutions_v1_release.nq
http://ossapi.oceaninfohub.org/public/graphs/summonedinvemartraining_v1_release.nq
http://ossapi.oceaninfohub.org/public/graphs/summonedinvemarvessels_v1_release.nq
http://ossapi.oceaninfohub.org/public/graphs/summonedmarinet

## URLs

At this point we have the URLs. We can either load all of them in a loop or pick one manually. The code in this notebook can serve as a basis for either approach.
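
The loop-loading option could look roughly like the sketch below. `fetch_bytes` and `load_many` are hypothetical helpers, not part of the notebook; the converter is passed in so that `popper` (or any other bytes-to-text function) can be used, and failed downloads are collected rather than raised.

```python
from urllib.request import urlopen

def fetch_bytes(url, timeout=30):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()

def load_many(urls, convert, fetch=fetch_bytes):
    """Fetch every URL, convert each payload to text, and join the results.

    Returns (combined_text, failures), where failures is a list of
    (url, error message) pairs for sources that could not be processed.
    """
    parts, failures = [], []
    for url in urls:
        try:
            parts.append(convert(fetch(url)))
        except Exception as exc:  # keep going if one source is unavailable
            failures.append((url, str(exc)))
    return "\n".join(parts), failures
```

In the notebook this would be called as `text, failed = load_many(urls, popper)`, after which `text` can be parsed the same way as `rp`.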


In [8]:
dgurl = "http://ossapi.oceaninfohub.org/public/graphs/summonedcioos_v1_release.nq"
# df = urlopen(dgurl)
dg = urlopen(dgurl).read()
rp = popper(dg)

In [9]:
namespaces = {
    "sh":   "http://www.w3.org/ns/shacl#" ,
    "schema": "https://schema.org/"
}

kg = kglab.KnowledgeGraph(
    name = "Schema.org based datagraph",
    base_uri = "https://example.org/id/",
    namespaces = namespaces,
)

try:
    g = Graph().parse(data=rp, format='nt')
    r = g.serialize(format='nt')
    kg.load_rdf_text(r)
except Exception as e:
    print("Exception: {}\n --".format(str(e)))
    raise e

print("Graph loaded with {} triples".format(len(g)))

Graph loaded with 145779 triples


In [10]:
sparql = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>


SELECT ?p (COUNT(?p) as ?count)
WHERE
{
  ?s ?p ?o .
}
GROUP BY ?p ORDER BY DESC(?count)
"""

pdf = kg.query_as_df(sparql)
# df = pdf   # .to_pandas()  #  breaks with papermill for reasons unknown at this time if to_pandas() is used, needed in my kglab conda env


In [11]:
pdf.head()

Unnamed: 0,p,count
0,rdf:type,25425
1,schema:keywords,19507
2,schema:name,14406
3,schema:url,8505
4,schema:description,6168
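
As a cross-check on the query above, the same tally can be approximated in plain Python from N-Triples text such as `rp`. This is a minimal sketch: `predicate_counts` is a hypothetical helper. Because it counts raw lines, it may differ from the SPARQL result when the input has duplicate triples, since an RDF graph stores each triple once. It also reports full IRIs rather than prefixed names.

```python
from collections import Counter

def predicate_counts(ntriples_text):
    """Count how often each predicate occurs in N-Triples text, most frequent first."""
    counts = Counter()
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)  # subject, predicate, remainder
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts.most_common()
```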


In [12]:
sparql = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX schema: <https://schema.org/>

SELECT ?s ?desc ?name
WHERE
{
  ?s rdf:type ?type .
  FILTER ( ?type IN (schema:ResearchProject, schema:Project, schema:Organization,
    schema:Dataset, schema:CreativeWork, schema:Person, schema:Map, schema:Course,
    schema:CourseInstance, schema:Event, schema:Vehicle) )
  ?s schema:description ?desc .
  ?s schema:name ?name .
}
"""

pdf = kg.query_as_df(sparql)
# df = pdf   # .to_pandas()  #  breaks with papermill for reasons unknown at this time if to_pandas() is used, needed in my kglab conda env


In [13]:
pdf.head(20)

Unnamed: 0,s,desc,name
0,<https://catalogue.cioos.ca/dataset/c279e486-b...,The Sea-Bird SBE 63 Dissolved Oxygen Sensor 63...,Juan de Fuca Strait Oxygen Sensor Deployed 201...
1,<https://catalogue.cioos.ca/dataset/c279e486-b...,The Sea-Bird SBE 63 Dissolved Oxygen Sensor 63...,Juan de Fuca Strait Capteur d'Oxygène déployé ...
2,<https://catalogue.cioos.ca/dataset/c279e486-b...,Ce Sea-Bird SBE 63 Dissolved Oxygen Sensor 630...,Juan de Fuca Strait Oxygen Sensor Deployed 201...
3,<https://catalogue.cioos.ca/dataset/c279e486-b...,Ce Sea-Bird SBE 63 Dissolved Oxygen Sensor 630...,Juan de Fuca Strait Capteur d'Oxygène déployé ...
4,<https://catalogue.cioos.ca/dataset/ca-cioos_f...,Sofar Spotter 2 deployments on the south side ...,Température des vagues et de la surface de la ...
5,<https://catalogue.cioos.ca/dataset/ca-cioos_f...,Sofar Spotter 2 deployments on the south side ...,Wave and Sea Surface Temperature for Sable Isl...
6,<https://catalogue.cioos.ca/dataset/ca-cioos_f...,Déploiements de Sofar Spotter 2 du côté sud de...,Température des vagues et de la surface de la ...
7,<https://catalogue.cioos.ca/dataset/ca-cioos_f...,Déploiements de Sofar Spotter 2 du côté sud de...,Wave and Sea Surface Temperature for Sable Isl...
8,<https://catalogue.cioos.ca/dataset/a99dc2e5-2...,Municipal potable water intake sites from unde...,REKEAU Project - Drinking water intake sites (...
9,<https://catalogue.cioos.ca/dataset/c73dcc83-0...,The WET Labs ECO FLRT 3905 was deployed on 202...,Douglas Channel Fluorometer Deployed 2022-07-14


In [14]:
rq_pcount = """SELECT ?p (COUNT(?p) as ?pCount)
WHERE
{
  ?s ?p ?o .
}
GROUP BY ?p
ORDER BY DESC(?pCount)
"""

pdf = kg.query_as_df(rq_pcount)
pdf.head()

Unnamed: 0,p,pCount
0,rdf:type,25425
1,schema:keywords,19507
2,schema:name,14406
3,schema:url,8505
4,schema:description,6168


In [15]:
rq_desc = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?name ?desc
WHERE
{
  ?s <https://schema.org/name> ?name .
  ?s rdf:type <https://schema.org/Dataset> .
  ?s <https://schema.org/description> ?desc .
}
LIMIT 200
"""

pdf = kg.query_as_df(rq_desc)
pdf.head(10)

Unnamed: 0,s,name,desc
0,<https://catalogue.cioos.ca/dataset/c279e486-b...,Juan de Fuca Strait Oxygen Sensor Deployed 201...,The Sea-Bird SBE 63 Dissolved Oxygen Sensor 63...
1,<https://catalogue.cioos.ca/dataset/c279e486-b...,Juan de Fuca Strait Capteur d'Oxygène déployé ...,The Sea-Bird SBE 63 Dissolved Oxygen Sensor 63...
2,<https://catalogue.cioos.ca/dataset/c279e486-b...,Juan de Fuca Strait Oxygen Sensor Deployed 201...,Ce Sea-Bird SBE 63 Dissolved Oxygen Sensor 630...
3,<https://catalogue.cioos.ca/dataset/c279e486-b...,Juan de Fuca Strait Capteur d'Oxygène déployé ...,Ce Sea-Bird SBE 63 Dissolved Oxygen Sensor 630...
4,<https://catalogue.cioos.ca/dataset/ca-cioos_f...,Température des vagues et de la surface de la ...,Sofar Spotter 2 deployments on the south side ...
5,<https://catalogue.cioos.ca/dataset/ca-cioos_f...,Wave and Sea Surface Temperature for Sable Isl...,Sofar Spotter 2 deployments on the south side ...
6,<https://catalogue.cioos.ca/dataset/ca-cioos_f...,Température des vagues et de la surface de la ...,Déploiements de Sofar Spotter 2 du côté sud de...
7,<https://catalogue.cioos.ca/dataset/ca-cioos_f...,Wave and Sea Surface Temperature for Sable Isl...,Déploiements de Sofar Spotter 2 du côté sud de...
8,<https://catalogue.cioos.ca/dataset/a99dc2e5-2...,REKEAU Project - Drinking water intake sites (...,Municipal potable water intake sites from unde...
9,<https://catalogue.cioos.ca/dataset/c73dcc83-0...,Douglas Channel Fluorometer Deployed 2022-07-14,The WET Labs ECO FLRT 3905 was deployed on 202...
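
Several subjects above appear more than once, apparently because the query pairs every `schema:name` with every `schema:description`, so a record with English and French values yields four rows. A minimal sketch of collapsing such rows per subject in plain Python follows. `group_by_subject` is a hypothetical helper and assumes rows are dicts with `s`, `name`, and `desc` keys, for example from `pdf.to_dict('records')` when `pdf` is a pandas DataFrame.

```python
def group_by_subject(rows):
    """Collapse rows to one entry per subject, keeping distinct names and descriptions."""
    grouped = {}
    for row in rows:
        entry = grouped.setdefault(row["s"], {"names": [], "descs": []})
        if row["name"] not in entry["names"]:
            entry["names"].append(row["name"])
        if row["desc"] not in entry["descs"]:
            entry["descs"].append(row["desc"])
    return grouped
```

The result keeps subjects in first-seen order, with each subject's names and descriptions listed once.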
