# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained previously.

An exploratory workload usually starts with a vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
    
    

In [3]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-c41ceeb172-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings']) == 0:
            print("Empty")
            return 0

        # Print one list of (variable, value) pairs per result row
        for bindings in json_results['results']['bindings']:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(json_results['results']['bindings'])

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
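`run_query` only prints each row and returns the row count. If later steps need the values themselves, a small flattening helper could be used. The following is a minimal sketch, not part of the original setup: `bindings_to_rows` is a hypothetical name, and the sample dictionary is hand-written in the shape of the SPARQL 1.1 JSON results format.

```python
from typing import Any, Dict, List


def bindings_to_rows(json_results: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    bindings = json_results.get("results", {}).get("bindings", [])
    return [{var: cell["value"] for var, cell in row.items()} for row in bindings]


# Hand-written sample in the SPARQL JSON results shape; the single row
# mirrors the "record held" line printed later in this notebook.
sample = {
    "head": {"vars": ["p", "pname"]},
    "results": {
        "bindings": [
            {
                "p": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P1000"},
                "pname": {"type": "literal", "value": "record held"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```

The helper works on the dictionary returned by `results.convert()`, so it needs no access to the endpoint.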


# Sport Workflow Series ("World Records explorative search") 

Consider the following exploratory information need:

> compile a list of athletes that held world records across some disciplines

## Useful URIs for the current workflow


The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P106`    | profession    | predicate | 
| `wdt:P279`    | subclass      | predicate |
| `wdt:P27`     | nationality   | predicate |
| `wd:Q1189`    | Usain Bolt    | node      |
| `wd:Q766`     | Jamaica       | node |
| `wd:Q688615`  | World Record  | node |
| `wd:Q542`     | athletics     | node |



Also consider

```
?a wdt:P106/wdt:P279 wd:Q2066131
```

is the BGP to retrieve all instances of **athlete**

```
?a wdt:P279 wd:Q688615
```

is the BGP to retrieve the types of **world records**
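The path `wdt:P106/wdt:P279` follows exactly one `subclass` step after `profession`, so it only reaches professions that are direct subclasses of `wd:Q2066131`. SPARQL 1.1 property paths also allow `wdt:P279*` (zero or more steps). The sketch below illustrates the difference on a tiny in-memory hierarchy. The class names and edges are hypothetical and not taken from Wikidata, and whether the deeper path changes the results on this endpoint has not been checked here.

```python
from typing import Dict, Set

# Hypothetical subclass edges (child -> parents); NOT taken from Wikidata.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "sprinter": {"athletics competitor"},
    "athletics competitor": {"athlete"},
    "swimmer": {"athlete"},
}


def one_hop(occupation: str, target: str) -> bool:
    """Mimics `wdt:P106/wdt:P279 target`: exactly one subclass step."""
    return target in SUBCLASS_OF.get(occupation, set())


def zero_or_more_hops(occupation: str, target: str) -> bool:
    """Mimics `wdt:P106/wdt:P279* target`: any number of subclass steps."""
    seen, stack = set(), [occupation]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(SUBCLASS_OF.get(node, set()))
    return False


for occ in ("swimmer", "sprinter", "athlete"):
    print(occ, one_hop(occ, "athlete"), zero_or_more_hops(occ, "athlete"))
```

In this toy hierarchy the one-hop check misses `sprinter`, which is two steps away from `athlete`, while the transitive check finds it.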


## Workload Goals

1. Identify the BGP for holding a world record

2. Identify the BGP for the types of competitions in which a world record is classified

3. How many world records are held by Italian athletes

4. Compare number of world records held across different dimensions
 
   4.1 In which specific sport does France hold the most world records
   
   4.2 How many world records are held across genders
   
   4.3 Which sport has the highest number of world records


1) Here we look for the property that links a person to a record they held.

In [12]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE{
    wd:Q1189 ?p ?o .
    ?p <http://schema.org/name> ?pname 
    FILTER REGEX(?pname, 'record', 'i')
    }
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1000'), ('pname', 'record held')]


1

For example, here are the records held by Usain Bolt.

In [8]:
queryString = """
SELECT DISTINCT ?o ?oname
WHERE{
    wd:Q1189 wdt:P1000 ?o .
    ?o <http://schema.org/name> ?oname 
    }
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('o', 'http://www.wikidata.org/entity/Q1066353'), ('oname', "Men's 100 metres world record progression")]
[('o', 'http://www.wikidata.org/entity/Q1053804'), ('oname', "Men's 4 × 100 metres relay world record progression")]
[('o', 'http://www.wikidata.org/entity/Q1187829'), ('oname', "men's 200 metres world record progression")]


3

2) Here are all the types of world records.

In [13]:
queryString = """
SELECT DISTINCT ?a ?aname
WHERE{
    ?a wdt:P279 wd:Q688615 .
    ?a <http://schema.org/name> ?aname 
    }
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('a', 'http://www.wikidata.org/entity/Q1066353'), ('aname', "Men's 100 metres world record progression")]
[('a', 'http://www.wikidata.org/entity/Q3240929'), ('aname', 'largest known prime number')]
[('a', 'http://www.wikidata.org/entity/Q24033838'), ('aname', "women's world record")]
[('a', 'http://www.wikidata.org/entity/Q24033834'), ('aname', "men's world record")]
[('a', 'http://www.wikidata.org/entity/Q24255295'), ('aname', 'junior world record')]
[('a', 'http://www.wikidata.org/entity/Q23580887'), ('aname', 'highest temperature recorded on Earth')]
[('a', 'http://www.wikidata.org/entity/Q16883666'), ('aname', 'speed record')]
[('a', 'http://www.wikidata.org/entity/Q69907823'), ('aname', 'world best time')]
[('a', 'http://www.wikidata.org/entity/Q3422413'), ('aname', "Women's 100 metres hurdles world record progression")]
[('a', 'http://www.wikidata.org/entity/Q1136298'), ('aname', "Men's 110 metres hurdles world record progression")]
[('a', 'http://www.wikidata.org/entity

13

In [14]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE{
    ?a ?p wd:Q688615 .
    ?p <http://schema.org/name> ?pname 
    }
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1552'), ('pname', 'has quality')]
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('pname', 'award received')]
[('p', 'http://www.wikidata.org/prop/direct/P1889'), ('pname', 'different from')]
[('p', 'http://www.wikidata.org/prop/direct/P264'), ('pname', 'record label')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('pname', 'subclass of')]
[('p', 'http://www.wikidata.org/prop/direct/P301'), ('pname', "category's main topic")]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P360'), ('pname', 'is a list of')]
[('p', 'http://www.wikidata.org/prop/direct/P460'), ('pname', 'said to be the same as')]
[('p', 'http://www.wikidata.org/prop/direct/P793'), ('pname', 'significant event')]
[('p', 'http://www.wikidata.org/prop/direct/P921'), ('pname', 'main subject')]


11

In [15]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE{
    wd:Q688615 ?p ?a .
    ?p <http://schema.org/name> ?pname 
    }
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('pname', 'subclass of')]
[('p', 'http://www.wikidata.org/prop/direct/P2910'), ('pname', 'icon')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P3417'), ('pname', 'Quora topic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3553'), ('pname', 'Zhihu topic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P373'), ('pname', 'Commons category')]
[('p', 'http://www.wikidata.org/prop/direct/P460'), ('pname', 'said to be the same as')]
[('p', 'http://www.wikidata.org/prop/direct/P646'), ('pname', 'Freebase ID')]
[('p', 'http://www.wikidata.org/prop/direct/P8408'), ('pname', 'KBpedia ID')]
[('p', 'http://www.wikidata.org/prop/direct/P910'), ('pname', "topic's main category")]


10
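The three discovery queries so far share one shape: fix an entity in subject or object position, list the properties around it, and optionally filter the property name. A small query builder could capture that pattern. This is a sketch under assumptions: `property_discovery_query` and its parameters are hypothetical names, not part of the notebook setup.

```python
from typing import Optional


def property_discovery_query(entity: str, direction: str = "out",
                             keyword: Optional[str] = None, limit: int = 50) -> str:
    """Build a SELECT listing the named properties around `entity`.

    direction="out" puts the entity in subject position, "in" in object position.
    """
    if direction == "out":
        pattern = f"{entity} ?p ?x ."
    elif direction == "in":
        pattern = f"?x ?p {entity} ."
    else:
        raise ValueError("direction must be 'out' or 'in'")

    name_filter = ""
    if keyword is not None:
        # The keyword is placed inside a single-quoted SPARQL literal,
        # so reject characters that would terminate or escape it.
        if "'" in keyword or "\\" in keyword:
            raise ValueError("keyword must not contain quotes or backslashes")
        name_filter = f"\n    FILTER REGEX(?pname, '{keyword}', 'i')"

    return (
        "SELECT DISTINCT ?p ?pname\n"
        "WHERE{\n"
        f"    {pattern}\n"
        f"    ?p <http://schema.org/name> ?pname .{name_filter}\n"
        "    }\n"
        f"LIMIT {limit}"
    )


print(property_discovery_query("wd:Q1189", keyword="record"))
print(property_discovery_query("wd:Q688615", direction="in"))
```

The returned string could be passed to `run_query` from the setup cell, which prepends the prefixes.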

3) First of all, we search for all the Italian athletes.

In [20]:
queryString = """
SELECT DISTINCT ?a ?aname ?nationality
WHERE{
     ?a wdt:P106/wdt:P279 wd:Q2066131 .
     ?a <http://schema.org/name> ?aname .
     ?a wdt:P27 ?n .
     ?n <http://schema.org/name> ?nationality 
     FILTER REGEX(?nationality, 'italy|italian', 'i')
    }
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('a', 'http://www.wikidata.org/entity/Q96754211'), ('aname', 'Cristina Tartarone'), ('nationality', 'Italy')]
[('a', 'http://www.wikidata.org/entity/Q16531085'), ('aname', 'Barbara Masi'), ('nationality', 'Italy')]
[('a', 'http://www.wikidata.org/entity/Q3845091'), ('aname', 'Manuela Manetta'), ('nationality', 'Italy')]
[('a', 'http://www.wikidata.org/entity/Q80350481'), ('aname', 'Andrea Capella'), ('nationality', 'Italy')]
[('a', 'http://www.wikidata.org/entity/Q3703498'), ('aname', 'Davide Bianchetti'), ('nationality', 'Italy')]
[('a', 'http://www.wikidata.org/entity/Q50044720'), ('aname', 'Yuri Farneti'), ('nationality', 'Italy')]
[('a', 'http://www.wikidata.org/entity/Q80354752'), ('aname', 'Francesco Busi'), ('nationality', 'Italy')]
[('a', 'http://www.wikidata.org/entity/Q477499'), ('aname', 'Amr Swelim'), ('nationality', 'Italy')]
[('a', 'http://www.wikidata.org/entity/Q80350503'), ('aname', 'Josè Facchini'), ('nationality', 'Italy')]
[('a', 'http://www.wikidata.org/en

20

Now let's check the records held by these athletes.

In [26]:
queryString = """
SELECT DISTINCT ?a ?aname ?nationality ?recordname
WHERE{
     ?a wdt:P106/wdt:P279 wd:Q2066131 .
     ?a <http://schema.org/name> ?aname .
     ?a wdt:P27 ?n .
     ?a wdt:P1000 ?record .
     ?record <http://schema.org/name> ?recordname .
     ?n <http://schema.org/name> ?nationality 
     FILTER REGEX(?nationality, 'italy|italian', 'i')
    }
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('a', 'http://www.wikidata.org/entity/Q17609944'), ('aname', 'Angela Ramello'), ('nationality', 'Kingdom of Italy'), ('recordname', 'Italian records in athletics')]
[('a', 'http://www.wikidata.org/entity/Q5442'), ('aname', 'Sara Simeoni'), ('nationality', 'Italy'), ('recordname', 'Italian records in athletics')]
[('a', 'http://www.wikidata.org/entity/Q5442'), ('aname', 'Sara Simeoni'), ('nationality', 'Italy'), ('recordname', "Men's high jump world record progression")]
[('a', 'http://www.wikidata.org/entity/Q5444'), ('aname', 'Pietro Mennea'), ('nationality', 'Italy'), ('recordname', "men's 200 metres world record progression")]
[('a', 'http://www.wikidata.org/entity/Q5444'), ('aname', 'Pietro Mennea'), ('nationality', 'Italy'), ('recordname', "Men's 200 metres European record progression")]
[('a', 'http://www.wikidata.org/entity/Q25366209'), ('aname', 'Marcell Jacobs'), ('nationality', 'Italy'), ('recordname', 'Italian records in athletics')]
[('a', 'http://www.wikidata.org/

11

In [27]:
queryString = """
SELECT DISTINCT ?a ?aname ?nationality ?recordname
WHERE{
     ?a wdt:P106/wdt:P279 wd:Q2066131 .
     ?a <http://schema.org/name> ?aname .
     ?a wdt:P27 ?n .
     ?a wdt:P1000 ?record .
     ?record wdt:P31 wd:Q688615 .
     ?record <http://schema.org/name> ?recordname .
     ?n <http://schema.org/name> ?nationality 
     FILTER REGEX(?nationality, 'italy|italian', 'i')
    }
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('a', 'http://www.wikidata.org/entity/Q1343248'), ('aname', 'Salvatore Morale'), ('nationality', 'Italy'), ('recordname', "Men's 400 metres hurdles world record progression")]


1

As we can see, only one Italian athlete is returned once the record must be an instance of World Record (`wd:Q688615`).
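The `wdt:P31` restriction is strict. The unrestricted query above lists Pietro Mennea with "men's 200 metres world record progression", yet he does not appear in the restricted result, and the type exploration in step 2 shows that some record progressions are linked to `wd:Q688615` through `wdt:P279` instead. A variant that accepts either link might look like the sketch below. Treating `wdt:P31|wdt:P279` as the right reading of "is a world record" is an assumption, and the query has not been run here, so no result is claimed.

```python
# Assumption: a record counts as a world record if it is either an instance
# (wdt:P31) or a subclass (wdt:P279) of wd:Q688615. The notebook's own query
# uses wdt:P31 only.
queryString = """
SELECT DISTINCT ?a ?aname ?nationality ?recordname
WHERE{
     ?a wdt:P106/wdt:P279 wd:Q2066131 .
     ?a <http://schema.org/name> ?aname .
     ?a wdt:P27 ?n .
     ?a wdt:P1000 ?record .
     ?record wdt:P31|wdt:P279 wd:Q688615 .
     ?record <http://schema.org/name> ?recordname .
     ?n <http://schema.org/name> ?nationality
     FILTER REGEX(?nationality, 'italy|italian', 'i')
    }
LIMIT 20
"""

print(queryString)
# run_query(queryString)  # run_query is defined in the setup cell
```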

4.1) We retrieve the world records of French athletes.

In [30]:
queryString = """
SELECT ?a ?aname ?nationality ?recordname
WHERE{
     ?a wdt:P106/wdt:P279 wd:Q2066131 .
     ?a <http://schema.org/name> ?aname .
     ?a wdt:P27 ?n .
     ?a wdt:P1000 ?record .
     ?record <http://schema.org/name> ?recordname .
     ?n <http://schema.org/name> ?nationality 
     FILTER REGEX(?nationality, 'fr|france', 'i')
    }
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('a', 'http://www.wikidata.org/entity/Q3195752'), ('aname', 'Kevin Mayer'), ('nationality', 'France'), ('recordname', 'decathlon world record progression')]
[('a', 'http://www.wikidata.org/entity/Q937648'), ('aname', 'Jimmy Vicaut'), ('nationality', 'France'), ('recordname', '100 metres')]
[('a', 'http://www.wikidata.org/entity/Q64864060'), ('aname', 'Clarisse Crémer'), ('nationality', 'France'), ('recordname', 'Around the world sailing record')]
[('a', 'http://www.wikidata.org/entity/Q433694'), ('aname', 'Marie Collonvillé'), ('nationality', 'France'), ('recordname', 'decathlon world record progression')]
[('a', 'http://www.wikidata.org/entity/Q1081443'), ('aname', 'Christian Plaziat'), ('nationality', 'France'), ('recordname', "Women's heptathlon world record progression")]
[('a', 'http://www.wikidata.org/entity/Q65065390'), ('aname', 'Sasha Zhoya'), ('nationality', 'France'), ('recordname', 'list of world junior records in athletics')]
[('a', 'http://www.wikidata.org/entity

7

In [50]:
queryString = """
SELECT DISTINCT ?a ?aname ?nationality ?n
WHERE{
     ?a wdt:P106/wdt:P279 wd:Q2066131 .
     ?a <http://schema.org/name> ?aname .
     ?a wdt:P27 ?n .
     ?n <http://schema.org/name> ?nationality 
     FILTER REGEX(?nationality, 'france|french', 'i')
    }
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('a', 'http://www.wikidata.org/entity/Q2935075'), ('aname', 'Camille Serme'), ('nationality', 'France'), ('n', 'http://www.wikidata.org/entity/Q142')]
[('a', 'http://www.wikidata.org/entity/Q908632'), ('aname', 'Grégory Gaultier'), ('nationality', 'France'), ('n', 'http://www.wikidata.org/entity/Q142')]
[('a', 'http://www.wikidata.org/entity/Q16745166'), ('aname', 'Christophe André'), ('nationality', 'France'), ('n', 'http://www.wikidata.org/entity/Q142')]
[('a', 'http://www.wikidata.org/entity/Q1067657'), ('aname', 'Thierry Lincou'), ('nationality', 'France'), ('n', 'http://www.wikidata.org/entity/Q142')]
[('a', 'http://www.wikidata.org/entity/Q3388824'), ('aname', 'Grégoire Marche'), ('nationality', 'France'), ('n', 'http://www.wikidata.org/entity/Q142')]
[('a', 'http://www.wikidata.org/entity/Q51115788'), ('aname', 'Vincent Droesbeke'), ('nationality', 'France'), ('n', 'http://www.wikidata.org/entity/Q142')]
[('a', 'http://www.wikidata.org/entity/Q3155042'), ('aname', 'Isab

20
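The queries in this step filter nationality labels with two different patterns, `'fr|france'` and `'france|french'`. A quick offline check shows why the first one is risky: the bare alternative `fr` matches any label containing those two letters. The sketch assumes that Python's `re` behaves like SPARQL's `REGEX` for such simple patterns. The label list is illustrative, and only "France" appears in the notebook output.

```python
import re

# Illustrative country labels; only "France" appears in the notebook output.
labels = ["France", "South Africa", "French Polynesia", "Italy"]


def matches(pattern: str, label: str) -> bool:
    """Case-insensitive substring match, like REGEX(?x, pattern, 'i')."""
    return re.search(pattern, label, flags=re.IGNORECASE) is not None


for pattern in ("fr|france", "france|french"):
    print(pattern, [label for label in labels if matches(pattern, label)])
```

Matching on the country IRI (`wd:Q142`, returned as `?n` by the query above) avoids this kind of accidental match altogether.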

In [44]:
queryString = """
SELECT ?a ?aname ?recordname ?record
WHERE{
     ?a wdt:P106/wdt:P279 wd:Q2066131 .
     ?a <http://schema.org/name> ?aname .
     ?a wdt:P27 wd:Q142 .
     ?a wdt:P1000 ?record .
     ?record wdt:P31 wd:Q688615 .
     ?record <http://schema.org/name> ?recordname 
    }
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('a', 'http://www.wikidata.org/entity/Q3195752'), ('aname', 'Kevin Mayer'), ('recordname', 'decathlon world record progression'), ('record', 'http://www.wikidata.org/entity/Q1139195')]
[('a', 'http://www.wikidata.org/entity/Q433694'), ('aname', 'Marie Collonvillé'), ('recordname', 'decathlon world record progression'), ('record', 'http://www.wikidata.org/entity/Q1139195')]
[('a', 'http://www.wikidata.org/entity/Q1081443'), ('aname', 'Christian Plaziat'), ('recordname', "Women's heptathlon world record progression"), ('record', 'http://www.wikidata.org/entity/Q83573')]


3

We can see that France has 2 world records in 'decathlon world record progression' (Q1139195).

In [46]:
queryString = """
SELECT (COUNT(?aname) AS ?numberofrecords) ?recordname 
WHERE{
     ?a wdt:P106/wdt:P279 wd:Q2066131 .
     ?a <http://schema.org/name> ?aname .
     ?a wdt:P27 wd:Q142 .
     ?a wdt:P1000 ?record .
     ?record wdt:P31 wd:Q688615 .
     ?record <http://schema.org/name> ?recordname 
    }
GROUP BY ?recordname
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('numberofrecords', '2'), ('recordname', 'decathlon world record progression')]
[('numberofrecords', '1'), ('recordname', "Women's heptathlon world record progression")]


2
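The grouping step can be reproduced offline from the three rows printed by the previous query, which makes it easy to see what `COUNT(?aname)` with `GROUP BY ?recordname` is counting: one unit per (athlete, record) row. A minimal sketch using `collections.Counter`:

```python
from collections import Counter

# (athlete, record) pairs copied from the output of the previous query.
rows = [
    ("Kevin Mayer", "decathlon world record progression"),
    ("Marie Collonvillé", "decathlon world record progression"),
    ("Christian Plaziat", "Women's heptathlon world record progression"),
]

# Count rows per record name, as GROUP BY ?recordname does
per_record = Counter(record for _, record in rows)
for record, n in per_record.most_common():
    print(n, record)
```

The counts are numbers of record holders per record type, not numbers of distinct records.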

4.2) We have to retrieve all the athletes with a world record.

In [52]:
queryString = """
SELECT DISTINCT ?aname ?recordname
WHERE{
     ?a wdt:P106/wdt:P279 wd:Q2066131 .
     ?a wdt:P1000 ?record .
     ?a <http://schema.org/name> ?aname .
     ?record <http://schema.org/name> ?recordname
     
    }
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('aname', 'Cate Campbell'), ('recordname', 'World record progression 4 × 100 metres freestyle relay')]
[('aname', 'Kang Chae-young'), ('recordname', 'list of Olympic records in archery')]
[('aname', 'Pavel Maslák'), ('recordname', 'Czech records in athletics')]
[('aname', 'Mykola Avilov'), ('recordname', 'decathlon world record progression')]
[('aname', 'Rafer Johnson'), ('recordname', 'decathlon world record progression')]
[('aname', 'David Hemery'), ('recordname', "Men's 400 metres hurdles world record progression")]
[('aname', 'Caitlyn Jenner'), ('recordname', 'decathlon world record progression')]
[('aname', 'Daley Thompson'), ('recordname', 'decathlon world record progression')]
[('aname', 'Julien Saelens'), ('recordname', 'Belgian records in athletics')]
[('aname', 'Colin Jackson'), ('recordname', '60 metres hurdles world record progression')]
[('aname', 'Akilles Järvinen'), ('recordname', 'decathlon world record progression')]
[('aname', 'Yang Chuan-kwang'), ('recordnam

20

Now we have to find the predicate about the gender of an athlete.

In [48]:
queryString = """
SELECT  ?p ?pname
WHERE{
     wd:Q1189 ?p ?o .
     ?p <http://schema.org/name> ?pname .
     FILTER REGEX(?pname, 'gender')
    }
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P21'), ('pname', 'sex or gender')]


1

In [55]:
queryString = """
SELECT  (COUNT(?aname) AS ?howmanyrecors) ?gendername
WHERE{
     ?a wdt:P106/wdt:P279 wd:Q2066131 .
     ?a wdt:P1000 ?record .
     ?a <http://schema.org/name> ?aname .
     ?record <http://schema.org/name> ?recordname .
     ?a wdt:P21 ?gender .
     ?gender <http://schema.org/name> ?gendername
     
    }
GROUP BY ?gendername
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('howmanyrecors', '1'), ('gendername', 'transgender male')]
[('howmanyrecors', '1'), ('gendername', 'intersex')]
[('howmanyrecors', '1'), ('gendername', 'transgender female')]
[('howmanyrecors', '153'), ('gendername', 'male')]
[('howmanyrecors', '38'), ('gendername', 'female')]


5
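To compare genders, the raw counts can be turned into shares. The sketch below uses the five counts printed above. Note that the query as written counts every (athlete, record) row and does not restrict `?record` to instances of `wd:Q688615`, so the figures cover all records held, not only world records.

```python
# Counts copied from the output of the query above.
counts = {
    "transgender male": 1,
    "intersex": 1,
    "transgender female": 1,
    "male": 153,
    "female": 38,
}

total = sum(counts.values())
shares = {gender: n / total for gender, n in counts.items()}

# Largest share first
for gender, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gender}: {counts[gender]} rows ({share:.1%})")
```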

4.3) We count the world records held by athletes, grouped by record type, and look for the highest count.

In [60]:
queryString = """
SELECT  (COUNT(?aname) AS ?howmanyrecors) ?recordname
WHERE{
     ?a wdt:P106/wdt:P279 wd:Q2066131 .
     ?a wdt:P1000 ?record .
     ?a <http://schema.org/name> ?aname .
     ?record <http://schema.org/name> ?recordname .
     
     
    }
GROUP BY ?recordname
ORDER BY DESC(COUNT(?aname))
LIMIT 1
"""

print("Results")
run_query(queryString)

Results
[('howmanyrecors', '30'), ('recordname', 'Marathon world record progression')]


1
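`ORDER BY DESC(...)` with `LIMIT 1` returns a single row even when several groups share the highest count. One way to guard against that is to fetch the grouped counts without the limit and pick the maximum client-side. In the sketch below, `top_with_ties` is a hypothetical helper. Only the Marathon figure comes from the output above, and the other entries are made up to illustrate a tie.

```python
from typing import Dict, List, Tuple


def top_with_ties(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return every (name, count) pair that reaches the maximum count."""
    if not counts:
        return []
    best = max(counts.values())
    return sorted((name, n) for name, n in counts.items() if n == best)


# Only the Marathon value is taken from the notebook output; the two
# "hypothetical" entries are invented to show how a tie is reported.
counts = {
    "Marathon world record progression": 30,
    "hypothetical record A": 30,
    "hypothetical record B": 12,
}

print(top_with_ties(counts))
```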