## Olympics and Austrian athletes

Contents
* Combines all files into one large triple store
* Lists the name of every gold medallist
* Lists the names of every athlete with at least one medal alongside their total number of medals (sorted by the number of medals)
* What else can we ask?

### Uncomment to install the dependencies if they are missing

In [1]:
# import sys
# !{sys.executable} -m pip install rdflib pandas

In [2]:
from rdflib import Graph
import pandas as pd

### Import ttl files

Select the source folder for the Turtle files:
* `"ttl"` loads the files generated with OpenRefine
* `"tarql"` loads the files generated with tarql

In [3]:
# source = "ttl"
source = "tarql"
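The cells below read four files from the folder chosen by `source` but always read `NOC_Regions.ttl` and `Medals.ttl` from `ttl/`. A small helper makes that mapping explicit. It is a sketch that assumes the folder layout implied by those cells; the names `turtle_paths`, `FROM_SOURCE` and `TTL_ONLY` are hypothetical and not part of the notebook.

```python
# Loaded from whichever folder `source` selects (as in the cells below)
FROM_SOURCE = ["Athletes.ttl", "Games.ttl", "Events.ttl", "Instance.ttl"]
# Always loaded from the ttl folder (as in the cells below)
TTL_ONLY = ["NOC_Regions.ttl", "Medals.ttl"]


def turtle_paths(source):
    """Return the Turtle file paths to load for the given source folder."""
    if source not in ("ttl", "tarql"):
        raise ValueError(f"unknown source folder: {source!r}")
    return [f"{source}/{name}" for name in FROM_SOURCE] + [
        f"ttl/{name}" for name in TTL_ONLY
    ]


print(turtle_paths("tarql"))
```

With such a helper, the loading cells could collapse into a loop like `for path in turtle_paths(source): g.parse(path, format="turtle")`. The file order does not change the final graph, because each parse only adds triples to the same set.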

In [4]:
g = Graph()
g.parse(source + "/Athletes.ttl", format="turtle")
print(len(g))

15317


In [5]:
# Always loaded from the ttl folder, regardless of `source`
g.parse("ttl/NOC_Regions.ttl", format="turtle")
print(len(g))

15547


In [6]:
g.parse(source + "/Games.ttl", format="turtle")
print(len(g))

15832


In [7]:
g.parse(source + "/Events.ttl", format="turtle")
print(len(g))

17143


In [8]:
# Always loaded from the ttl folder, regardless of `source`
g.parse("ttl/Medals.ttl", format="turtle")
print(len(g))

17146


In [9]:
g.parse(source + "/Instance.ttl", format="turtle")
print(len(g))

32570


### Lists all Austrians who won a gold medal

In [10]:
result = g.query("""
    PREFIX ex:   <http://example.org/ontology/olympics/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT DISTINCT ?name
    WHERE {
      ?instance ex:athlete ?athlete ;
                ex:medal   "Gold"@en .
      ?athlete  rdfs:label ?name .
    }
""")

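pd.set_option('display.max_rows', 500)
# Build a DataFrame from the result rows; result.vars supplies the column names
df = pd.DataFrame(result, columns=result.vars)
df.index += 1  # number the rows from 1 instead of 0
df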

|    | name |
|---:|------|
| 1 | Doris Neuner |
| 2 | Franz Klammer |
| 3 | Anton Engelbert "Toni" Sailer |
| 4 | Thomas Schroll |
| 5 | Gregor Schlierenzauer |
| 6 | Andrea Fischbacher |
| 7 | Trude Jochum-Beiser |
| 8 | Kurt Oppelt |
| 9 | Josef Feistmantl |
| 10 | Julia Dujmovits |


### Lists the names of every athlete with at least one medal alongside their total number of medals (sorted by the number of medals)

In [11]:
result = g.query("""
PREFIX ex:   <http://example.org/ontology/olympics/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?name (COUNT(?medal) AS ?noOfMedals)
WHERE {
  ?instance ex:athlete ?athlete ;
            ex:medal   ?medal .
  ?athlete  rdfs:label ?name .
}
# Group by the athlete IRI as well, so two athletes sharing a name are not merged
GROUP BY ?athlete ?name
ORDER BY DESC(?noOfMedals)
""")

print("Total ", len(result))

df = pd.DataFrame(result, columns=result.vars)
df.index += 1  # number the rows from 1 instead of 0
df

Total  313


|    | name | noOfMedals |
|---:|------|-----------:|
| 1 | Felix Gottwald | 7 |
| 2 | Klaus Sulzenbacher | 4 |
| 3 | Gregor Schlierenzauer | 4 |
| 4 | Hermann Maier | 4 |
| 5 | Marlies Schild (-Raich) | 4 |
| 6 | Martin Höllwarth | 4 |
| 7 | Stephan Eberharter | 4 |
| 8 | Benjamin Raich | 4 |
| 9 | Mario Stecher | 4 |
| 10 | Thomas Morgenstern | 4 |
