# SPARQL Endpoint Connector

This workbook connects to the _DBpedia_ endpoint through SPARQL and is the attempt to create an output list of all wars in order by the time they were fought. The eventual goa is to also analyze the circumstances, social, economic, natural and otherwise, surrounding these events to build a predictive framework. 

In [12]:
# Imports
import numpy as np
from numpy import random as rnd
from matplotlib import pyplot as plt
import sys,os,time,datetime,warnings,math,itertools

import pandas as pd

from plotly import express as px

from gastrodon import RemoteEndpoint,QName,ttl,URIRef,inline

# Config
%load_ext autotime
sys.path.append('../..')

The autotime extension is already loaded. To reload it, use:
  %reload_ext autotime
time: 415 µs (started: 2023-02-12 22:57:31 +01:00)


In [16]:
prefixes = inline('''
    @prefix : <http://dbpedia.org/resource/> .
    @prefix on: <http://dbpedia.org/ontology/> .
    @prefix pr: <http://dbpedia.org/property/> .
''').graph

time: 700 µs (started: 2023-02-12 23:01:56 +01:00)


In [17]:
endpoint = RemoteEndpoint(
    'http://dbpedia.org/sparql/',
    default_graph='http://dbpedia.org',
    prefixes=prefixes,
    base_uri='http://dbpedia.org/resources/'
)

time: 287 µs (started: 2023-02-12 23:01:57 +01:00)


In [18]:
count = endpoint.select('''
    SELECT  (COUNT(*) AS ?count) {?s ?p ?o}
''')

time: 12.1 s (started: 2023-02-12 23:02:41 +01:00)


In [19]:
count

Unnamed: 0,count
0,1141462733


time: 9.12 ms (started: 2023-02-12 23:02:55 +01:00)


In [20]:
predicates = endpoint.select('''
    SELECT ?p (SUM(1) as ?count)
    {?s ?p ?o .}
    GROUP BY ?p
    ORDER BY DESC(?count)
''')

time: 32.1 s (started: 2023-02-12 23:31:57 +01:00)


In [21]:
predicates

Unnamed: 0_level_0,count
p,Unnamed: 1_level_1
on:wikiPageWikiLink,254069466
rdf:type,146346566
rdfs:label,60428538
owl:sameAs,52572903
rdfs:comment,46587083
...,...
http://purl.org/dc/terms/title,1
http://purl.org/dc/terms/creator,1
pr:party8name,1
pr:rushLeader,1


time: 6.76 ms (started: 2023-02-12 23:32:34 +01:00)


In [28]:
subjects = endpoint.select('''
    SELECT ?s (SUM(1) as ?count)
    {?s ?p ?o .}
    GROUP BY ?s
    ORDER BY DESC(?count)
''').reset_index()

time: 877 ms (started: 2023-02-12 23:37:30 +01:00)


In [48]:
wars = endpoint.select('''
    SELECT ?s ?p ?o
    WHERE {
        ?s ?p ?o .
        FILTER regex(?s,'^.*War.*$')
    }
    GROUP BY ?s
    ORDER BY DESC(?count)
''').reset_index()

time: 34.7 s (started: 2023-02-12 23:49:35 +01:00)


In [50]:
wars['p'].unique()

array(['rdf:type'], dtype=object)

time: 4.67 ms (started: 2023-02-12 23:51:45 +01:00)
