<a href="https://colab.research.google.com/github/momo54/Sage-Jupy/blob/main/Sage_Jupy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Running SaGe in a Jupyter Notebook

SaGe is a SPARQL query engine for public Linked Data providers that implements Web preemption. It consists of a smart SaGe client and a SaGe SPARQL query server that hosts RDF datasets (stored in PostgreSQL or HDT). The web server suspends a SPARQL query after a fixed time quantum and resumes it upon client request. Using Web preemption, SaGe ensures stable response times for query execution and completeness of results under high load.

The complete approach and experimental results are described in a research paper accepted at The Web Conference 2019: Thomas Minier, Hala Skaf-Molli and Pascal Molli. "SaGe: Web Preemption for Public SPARQL Query services" in Proceedings of the 2019 World Wide Web Conference (WWW'19), San Francisco, USA, May 13-17, 2019.

We welcome feedback, comments and questions on our mailing list or on our issue tracker on GitHub.

## Installation

We install SaGe with the HDT backend only. Other backends can store and update data, but they are not directly supported in a Jupyter Notebook.

In [None]:
!python --version

Python 3.7.10


In [None]:
!pip install sage-engine
!pip install pybind11
!pip install hdt


Collecting hdt
  Using cached https://files.pythonhosted.org/packages/51/82/41f1e4a131881da64a1ab2c4675dd93020a1a7109be08a2eb790cb6b92c6/hdt-2.3.tar.gz
Collecting pybind11==2.2.4
  Using cached https://files.pythonhosted.org/packages/f2/7c/e71995e59e108799800cb0fce6c4b4927914d7eada0723dd20bae3b51786/pybind11-2.2.4-py2.py3-none-any.whl
Building wheels for collected packages: hdt
  Building wheel for hdt (setup.py) ... done
  Created wheel for hdt: filename=hdt-2.3-cp37-cp37m-linux_x86_64.whl size=5268840 sha256=9752e9098d221cd504e2c81878c2dabdea1e286d7453a087fb5cef805d2a0f1e
  Stored in directory: /root/.cache/pip/wheels/c6/64/28/ee2f54a78b64368f3e633637a0707549ba7a6e1c30079d966b
Successfully built hdt
Installing collected packages: pybind11, hdt
  Found existing installation: pybind11 2.6.2
    Uninstalling pybind11-2.6.2:
      Successfully uninstalled pybind11-2.6.2
Successfully installed hdt-2.3 pybind11-2.2.4
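
Before going further, it can help to confirm that the packages are importable from the notebook kernel. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the top-level module names (`sage`, `hdt`, `pybind11`) are assumptions about how the installed packages are imported.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed top-level module names for sage-engine, hdt and pybind11.
missing = missing_modules(["sage", "hdt", "pybind11"])
print("missing modules:", missing or "none")
```

An empty list suggests the installation cell completed; a non-empty one points at the package to reinstall.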


## Configuration



We need a dataset, and the server must be configured to use it.


*   config.yaml is a simple configuration file for SaGe:

    1.   the quantum (the quota key) is fixed to 75 ms
    2.   max_results is 2000


*   swdf-2012-11-28.hdt is the Semantic Web Dog Food dataset in the HDT format. SaGe can use an HDT file, a PostgreSQL backend or a SQLite backend. HDT is convenient when running in a Jupyter Notebook.




In [None]:
!wget http://gaia.infor.uva.es/hdt/swdf-2012-11-28.hdt.gz
!gunzip -f swdf-2012-11-28.hdt.gz
## just a config.yaml on my gdrive.
!wget -q "https://drive.google.com/uc?id=1wrg-vO8DNe5Cf7WWn5mmf4GsaF325Y8u&authuser=0&export=download" -O config.yaml
!cat config.yaml

--2021-04-19 18:56:10--  http://gaia.infor.uva.es/hdt/swdf-2012-11-28.hdt.gz
Resolving gaia.infor.uva.es (gaia.infor.uva.es)... 157.88.123.104
Connecting to gaia.infor.uva.es (gaia.infor.uva.es)|157.88.123.104|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2403825 (2.3M) [application/x-gzip]
Saving to: ‘swdf-2012-11-28.hdt.gz’


2021-04-19 18:56:12 (1.92 MB/s) - ‘swdf-2012-11-28.hdt.gz’ saved [2403825/2403825]

name: SaGe Test server
maintainer: Chuck Norris
quota: 75
max_results: 2000
default_graph_uri: http://localhost:8000/sparql/part
graphs:
-
  name: swdf
  uri: http://example.org/swdf
  description: DBPedia
  backend: hdt-file
  file: swdf-2012-11-28.hdt
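
If the Google Drive link is unavailable, the same file can be produced locally. The sketch below simply reproduces the keys printed above. The function `render_config` and its parameters are hypothetical, and the only value changed is the description, which here names the dataset actually loaded.

```python
# Template that mirrors the config.yaml printed above, key for key.
CONFIG_TEMPLATE = """\
name: SaGe Test server
maintainer: Chuck Norris
quota: {quota}
max_results: {max_results}
default_graph_uri: http://localhost:8000/sparql/part
graphs:
-
  name: swdf
  uri: http://example.org/swdf
  description: Semantic Web Dog Food
  backend: hdt-file
  file: {hdt_file}
"""


def render_config(hdt_file="swdf-2012-11-28.hdt", quota=75, max_results=2000):
    """Return the text of a config.yaml for a single HDT-backed graph."""
    return CONFIG_TEMPLATE.format(
        hdt_file=hdt_file, quota=quota, max_results=max_results
    )


print(render_config())
```

Writing the result to disk is then one line, for example `open("config.yaml", "w").write(render_config())`.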
 


## Starting the server

The SaGe server is started on port 8000 with 2 workers, a quantum of 75 ms and a maximum page size of 2000 results.

In [None]:
%%bash --bg --out script_out
sage config.yaml -p 8000 -w 2 -h "0.0.0.0" > server_out

Starting job # 0 in a separate thread.


In [None]:
## print server output
!tail server_out

Test that the SaGe server is running. You should see "The SaGe SPARQL query server is running!"

---



In [None]:
## just testing the server is running...
!curl http://0.0.0.0:8000

"The SaGe SPARQL query server is running!"
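
Because the server is launched in the background, the check above can run before the workers are ready. A small polling loop is one way to wait for it. This is a sketch using only the standard library. The names `wait_for_server` and `http_probe` and the timeout values are hypothetical, and the `probe` parameter exists only so the loop can be exercised without a live server.

```python
import time
import urllib.error
import urllib.request


def http_probe(url):
    """Return True if a GET on `url` answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=2) as response:
        return response.status == 200


def wait_for_server(url, timeout=30.0, interval=0.5, probe=http_probe):
    """Poll `url` until `probe` succeeds or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if probe(url):
                return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused or similar: the server is not up yet
        time.sleep(interval)
    return False
```

In the notebook this would be called as `wait_for_server("http://0.0.0.0:8000")` right after the cell that starts the server.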

## Running queries

As a web server, SaGe can be queried from any language.
Below, we show how to do that in Python (as we are in a Jupyter Notebook). We also provide a JS client and a Jena client.

First, call the SaGe server for a single quantum. The server interrupts the query once the quantum is exhausted or the maximum number of results is reached.

In [None]:
import requests
from json import dumps

# the SPARQL query to run
query = 'select * where {?s ?p ?o}'

entrypoint = 'http://0.0.0.0:8000/sparql'
default_graph_uri = 'http://example.org/swdf'
headers = {"accept": "text/html",
           "content-type": "application/json",
           "next": None}
payload = {"query": query,
           "defaultGraph": default_graph_uri}
has_next = True
count = 0
nbResults = 0
nbCalls = 0
limit = 10  # number of bindings to print

# call the server
response = requests.post(entrypoint, headers=headers, data=dumps(payload))

# the results
json_response = response.json()
nbResults += len(json_response['bindings'])
print(f'got:{nbResults}')

# print some results
for bindings in json_response['bindings']:
    print(str(bindings))
    count += 1
    if count >= limit:
        break

# the link to continue the execution
has_next = json_response['next']
payload["next"] = json_response["next"]
nbCalls += 1

print(f'and the next link is {json_response["next"]}')

got:2000
{'?s': '_:b1', '?p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#_1', '?o': 'http://data.semanticweb.org/person/barry-norton'}
{'?s': '_:b1', '?p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#_2', '?o': 'http://data.semanticweb.org/person/reto-krummenacher'}
{'?s': '_:b1', '?p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', '?o': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq'}
{'?s': '_:b10', '?p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#_1', '?o': 'http://data.semanticweb.org/person/robert-isele'}
{'?s': '_:b10', '?p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#_2', '?o': 'http://data.semanticweb.org/person/anja-jentzsch'}
{'?s': '_:b10', '?p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#_3', '?o': 'http://data.semanticweb.org/person/christian-bizer'}
{'?s': '_:b10', '?p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', '?o': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq'}
{'?s': '_:b11', '?p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#_1', '?o': 'http://da

We can decode the value of the next link.
As you can see, the next link contains the state of the suspended query.

In [None]:
from sage.http_server.utils import decode_saved_plan, encode_saved_plan
from sage.query_engine.protobuf.iterators_pb2 import (RootTree,SavedProjectionIterator,SavedScanIterator)
next_link=json_response["next"]
print(f'the next link {next_link} contains')
if next_link is not None:
  saved_plan = next_link
  plan = decode_saved_plan(saved_plan)
  root = RootTree()
  root.ParseFromString(plan)
  print(root)

the next link EksSSQolCgI/cxICP3AaAj9vIhdodHRwOi8vZXhhbXBsZS5vcmcvc3dkZiIEMjAwMCoaMjAyMS0wNC0xOVQxODo1Njo1Ny41MTk3MzY= contains
proj_source {
  scan_source {
    pattern {
      subject: "?s"
      predicate: "?p"
      object: "?o"
      graph: "http://example.org/swdf"
    }
    last_read: "2000"
    timestamp: "2021-04-19T18:56:57.519736"
  }
}
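
The decoded plan above suggests that, for a simple scan, the saved state is little more than the triple pattern and a `last_read` position. The toy model below illustrates that idea of suspending and resuming a scan. It is not SaGe's implementation: `run_quantum` and its parameters are hypothetical, and an in-memory list stands in for the HDT file.

```python
import time


def run_quantum(triples, last_read=0, quantum_ms=75, max_results=2000):
    """Scan `triples` from `last_read` until the quantum or the page is exhausted.

    Returns (page, state): `state` is the position to resume from,
    or None once the scan is complete.
    """
    deadline = time.monotonic() + quantum_ms / 1000
    page = []
    position = last_read
    while position < len(triples):
        if len(page) >= max_results or time.monotonic() >= deadline:
            return page, position  # suspended: hand the position to the client
        page.append(triples[position])
        position += 1
    return page, None  # nothing left to read


# Toy dataset of 5000 triples; a generous quantum so only the page size matters.
triples = [(f"s{i}", "p", "o") for i in range(5000)]
state, calls, total = 0, 0, 0
while state is not None:
    page, state = run_quantum(triples, state, quantum_ms=1000, max_results=2000)
    calls += 1
    total += len(page)
print(calls, total)
```

The server keeps no per-query memory between calls in this model. Everything needed to continue travels with the client, which is the role the next link plays in the notebook.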



Sending the next link back to the server restarts the query from where it was stopped. Basically, it works as next/next/next until no more results are available.

In [None]:
if has_next :
  response = requests.post(entrypoint, headers=headers, data=dumps(payload))
  json_response = response.json()
  has_next = json_response['next']
  payload["next"] = json_response["next"]
  nbResults += len(json_response['bindings'])
  nbCalls += 1
  count=0
  for bindings in json_response['bindings']:
    print(str(bindings))
    count += 1
    if count >= limit:
      break


{'?s': '_:genid164', '?p': 'http://www.w3.org/2002/12/cal/ical#tzid', '?o': '"GMT+1"'}
{'?s': '_:genid164', '?p': 'http://www.w3.org/2002/12/cal/ical#tzid', '?o': '"GMT+8"'}
{'?s': '_:genid164', '?p': 'http://www.w3.org/2003/01/geo/wgs84_pos#lat', '?o': '"48.957911"'}
{'?s': '_:genid164', '?p': 'http://www.w3.org/2003/01/geo/wgs84_pos#long', '?o': '"2.339648"'}
{'?s': '_:genid165', '?p': 'http://data.semanticweb.org/ns/swc/ontology#isRoleAt', '?o': 'http://data.semanticweb.org/conference/eswc/2007'}
{'?s': '_:genid165', '?p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#_1', '?o': 'http://data.semanticweb.org/person/danushka-bollegala'}
{'?s': '_:genid165', '?p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#_2', '?o': 'http://data.semanticweb.org/person/yutaka-matsuo'}
{'?s': '_:genid165', '?p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#_3', '?o': 'http://data.semanticweb.org/person/mitsuru-ishizuka'}
{'?s': '_:genid165', '?p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', '?o': '

If we decode the next link again, we can see that last_read has advanced from 2000 to 4000:

In [None]:
from sage.http_server.utils import decode_saved_plan, encode_saved_plan
from sage.query_engine.protobuf.iterators_pb2 import (RootTree,SavedProjectionIterator,SavedScanIterator)
next_link=json_response["next"]
print(f'the next link {next_link} contains')
if next_link is not None:
  saved_plan = next_link
  plan = decode_saved_plan(saved_plan)
  root = RootTree()
  root.ParseFromString(plan)
  print(root)

the next link EksSSQolCgI/cxICP3AaAj9vIhdodHRwOi8vZXhhbXBsZS5vcmcvc3dkZiIENDAwMCoaMjAyMS0wNC0xOVQxODo1Njo1Ny41MTk3MzY= contains
proj_source {
  scan_source {
    pattern {
      subject: "?s"
      predicate: "?p"
      object: "?o"
      graph: "http://example.org/swdf"
    }
    last_read: "4000"
    timestamp: "2021-04-19T18:56:57.519736"
  }
}
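
The two next links printed so far can also be compared without the SaGe helpers. The sketch below assumes that a next link is plain base64 over the serialized plan, which the readable strings inside the decoded bytes suggest. It uses only the standard library and is not the `sage` API. The two tokens are copied from the outputs above.

```python
import base64

# Next links copied from the two outputs above.
FIRST = ("EksSSQolCgI/cxICP3AaAj9vIhdodHRwOi8vZXhhbXBsZS5vcmcvc3dkZiIE"
         "MjAwMCoaMjAyMS0wNC0xOVQxODo1Njo1Ny41MTk3MzY=")
SECOND = ("EksSSQolCgI/cxICP3AaAj9vIhdodHRwOi8vZXhhbXBsZS5vcmcvc3dkZiIE"
          "NDAwMCoaMjAyMS0wNC0xOVQxODo1Njo1Ny41MTk3MzY=")

first = base64.b64decode(FIRST)
second = base64.b64decode(SECOND)

# The graph URI and the last_read value appear as plain bytes in the plan.
print(b"http://example.org/swdf" in first)
print(b"2000" in first, b"4000" in second)

# Byte positions at which the two saved plans differ.
diff = [i for i, (a, b) in enumerate(zip(first, second)) if a != b]
print(diff)
```

The two plans differ only where `last_read` is stored, so each call moves a cursor forward and leaves the rest of the plan untouched.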



Now we iterate until the end.

In [None]:
while has_next :
  response = requests.post(entrypoint, headers=headers, data=dumps(payload))
  json_response = response.json()
  has_next = json_response['next']
  payload["next"] = json_response["next"]
  nbResults += len(json_response['bindings'])
  nbCalls += 1

## print some bindings...
count=0
for bindings in json_response['bindings']:
  print(str(bindings))
  count += 1
  if count >= limit:
    break

print(f'got {nbResults} results')
print(f'made {nbCalls} calls')

{'?s': 'http://data.semanticweb.org/person/bernhard-thalheim', '?p': 'http://xmlns.com/foaf/0.1/homepage', '?o': 'http://www.is.informatik.uni-kiel.de/~thalheim'}
{'?s': 'http://data.semanticweb.org/person/bernhard-thalheim', '?p': 'http://xmlns.com/foaf/0.1/name', '?o': '"Bernhard Thalheim"'}
{'?s': 'http://data.semanticweb.org/person/bill-mcdaniel', '?p': 'http://data.semanticweb.org/ns/swc/ontology#affiliation', '?o': '"Digital Enterprise Research Institute, National University of Ireland, Galway"'}
{'?s': 'http://data.semanticweb.org/person/bill-mcdaniel', '?p': 'http://data.semanticweb.org/ns/swc/ontology#holdsRole', '?o': '_:genid234'}
{'?s': 'http://data.semanticweb.org/person/bill-mcdaniel', '?p': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', '?o': 'http://xmlns.com/foaf/0.1/Person'}
{'?s': 'http://data.semanticweb.org/person/bill-mcdaniel', '?p': 'http://www.w3.org/2000/01/rdf-schema#label', '?o': '"Bill McDaniel"'}
{'?s': 'http://data.semanticweb.org/person/bill-mcdaniel
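
The next/next/next loop used throughout this notebook can be wrapped in a generator so that callers only see pages of bindings. This is a sketch, not part of the SaGe client. `iter_pages` is a hypothetical helper, and `post` is an assumed callable that sends the JSON payload and returns the decoded response, so the example runs here against a small fake server.

```python
from json import dumps, loads


def iter_pages(post, query, default_graph):
    """Yield one list of bindings per call until no next link is returned."""
    payload = {"query": query, "defaultGraph": default_graph}
    while True:
        body = post(dumps(payload))
        yield body["bindings"]
        if not body.get("next"):
            return
        payload["next"] = body["next"]  # resume where the server stopped


def fake_post(data):
    """Stand-in for the server: 5 results in total, 2 per page."""
    request = loads(data)
    start = int(request.get("next") or 0)
    rows = list(range(start, min(start + 2, 5)))
    end = start + len(rows)
    return {"bindings": rows, "next": str(end) if end < 5 else None}


pages = list(iter_pages(fake_post, "select * where {?s ?p ?o}", "http://example.org/swdf"))
print(len(pages), sum(len(page) for page in pages))
```

Against the real server, `post` could be `lambda data: requests.post(entrypoint, headers=headers, data=data).json()`, reusing the variables defined earlier in the notebook.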