# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```

is the BGP that returns the human-readable name of a property or a class in Wikidata.
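
As a small illustration, the same pattern can be attached to any variable of a query. The sketch below only builds the pattern as a string; `label_pattern` and the default `name` suffix are hypothetical names chosen for this example, not part of the notebook.

```python
def label_pattern(var: str, suffix: str = "name") -> str:
    """Return the triple pattern that binds ?<var><suffix> to the schema.org name of ?<var>."""
    var = var.lstrip("?")  # accept both "p" and "?p"
    return f"?{var} <http://schema.org/name> ?{var}{suffix} ."


print(label_pattern("p"))
```

The returned string can be pasted into the `WHERE` clause of a query next to the pattern that binds the variable itself.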

In [2]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-bc16e27120-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString,verbose = True):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings'])==0:
            print("Empty")
            return []
        array = []
        for bindings in json_results['results']['bindings']:
            app =  [ (var, value['value'])  for var, value in bindings.items() ] 
            if verbose:
                print( app)
            array.append(app)
        if verbose:
            print(len(array))
        return array

    except Exception as e:
        print("The operation failed", e)
        return []  # return an empty list so callers always receive a list

# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)

# Sport Workflow Series ("Running explorative search") 

Consider the following exploratory information need:

> Investigate association football players and find the main BGPs related to this sport. Compare the players' awards and their participation in competitions.

## Useful URIs for the current workflow
The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P106`    | profession    | predicate | 
| `wdt:P279`    | subclass      | predicate |
| `wdt:P27`     | nationality   | predicate |
| `wd:Q5`       | Human         | node      |
| `wd:Q9124`    | Eliud Kipchoge| node      |
| `wd:Q853003`  | athletics at the 2008 Summer Olympics – men's 200 metres | node |





Also consider

```
wd:Q9124 ?p ?obj .
```

is the BGP to retrieve all **properties of Eliud Kipchoge**


Please note that whenever you return a resource, you should return both its IRI and its label. In particular, when a task requires you to identify a BGP, the result set must always be a list of IRI-label pairs.
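
To make the expected shape concrete, here is a minimal sketch that turns rows in the format printed by `run_query` (a list of `(variable, value)` tuples per row) into IRI-label pairs. The helper name `to_pairs` is hypothetical; the sample row reuses one of the property rows printed in this notebook.

```python
def to_pairs(rows, iri_var, label_var):
    """Turn run_query-style rows into a list of (IRI, label) pairs."""
    pairs = []
    for row in rows:
        bindings = dict(row)  # each row is a list of (variable, value) tuples
        # Skip rows where either variable is unbound.
        if iri_var in bindings and label_var in bindings:
            pairs.append((bindings[iri_var], bindings[label_var]))
    return pairs


# One row in the shape that run_query prints.
rows = [[("p", "http://www.wikidata.org/prop/direct/P31"), ("name", "instance of")]]
print(to_pairs(rows, "p", "name"))
```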


The workload should:


1. Identify the BGP for runner

2. Identify the BGP for long distance running

3. Return the disciplines of long-distance running (e.g. marathon) (the result set must be a list of pairs of discipline IRI and label).

4. Consider marathons run in Europe. Return the number of different runners who won a marathon, grouped by their country of citizenship (the result set must be a list of triples of country IRI, label and #different runners).

5. Identify the BGP for Olympic Games

6. Consider only the Summer Olympic Games of 2004, 2008, 2012 and 2016. Return the number of marathon runners who participated in all these editions, grouped by their country of citizenship (if a runner participated in more than one edition of the games, count him/her only once) (the result set must be a list of country IRI, label and #marathon runners).

7. For each discipline of long-distance running, find who holds the world record (the result set must be a list of elements with discipline IRI, label and athlete IRI and label).

## Task 1
Identify the BGP for runner

In [15]:
queryString = """
SELECT DISTINCT ?instanceof ?instancename ?subclassof ?subclassname
WHERE {
   wd:Q9124 wdt:P106 ?instanceof .
   ?instanceof sc:name ?instancename.
   ?instanceof wdt:P279 ?subclassof .
   ?subclassof sc:name ?subclassname .
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('instanceof', 'http://www.wikidata.org/entity/Q13382460'), ('instancename', 'marathon runner'), ('subclassof', 'http://www.wikidata.org/entity/Q4439155'), ('subclassname', 'long-distance runner')]
[('instanceof', 'http://www.wikidata.org/entity/Q11513337'), ('instancename', 'athletics competitor'), ('subclassof', 'http://www.wikidata.org/entity/Q2066131'), ('subclassname', 'athlete')]
[('instanceof', 'http://www.wikidata.org/entity/Q4439155'), ('instancename', 'long-distance runner'), ('subclassof', 'http://www.wikidata.org/entity/Q12803959'), ('subclassname', 'runner')]
3


Final query for this task

In [9]:
queryString = """
SELECT DISTINCT ?p ?name 
WHERE {
   wd:Q12803959 ?p ?obj .
   ?p sc:name ?name 
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1014'), ('name', 'Art & Architecture Thesaurus ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1036'), ('name', 'Dewey Decimal Classification')]
[('p', 'http://www.wikidata.org/prop/direct/P1296'), ('name', 'Gran Enciclopèdia Catalana ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1889'), ('name', 'different from')]
[('p', 'http://www.wikidata.org/prop/direct/P227'), ('name', 'GND ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2347'), ('name', 'YSO ID')]
[('p', 'http://www.wikidata.org/prop/direct/P244'), ('name', 'Library of Congress authority ID')]
[('p', 'http://www.wikidata.org/prop/direct/P268'), ('name', 'Bibliothèque nationale de France ID')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('name', 'subclass of')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('name', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P373'), ('name', 'Commons category')]
[('p', 'http://www.wikidata.org/prop/direct

## Task 2
Identify the BGP for long distance running

In [34]:
queryString = """
SELECT DISTINCT ?instanceof ?instancename ?field ?fieldname
WHERE {
   wd:Q9124 wdt:P106 ?instanceof .
   ?instanceof sc:name ?instancename.
   ?instanceof wdt:P425 ?field .
   ?field sc:name ?fieldname .
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('instanceof', 'http://www.wikidata.org/entity/Q13382460'), ('instancename', 'marathon runner'), ('field', 'http://www.wikidata.org/entity/Q40244'), ('fieldname', 'marathon')]
[('instanceof', 'http://www.wikidata.org/entity/Q11513337'), ('instancename', 'athletics competitor'), ('field', 'http://www.wikidata.org/entity/Q542'), ('fieldname', 'athletics')]
[('instanceof', 'http://www.wikidata.org/entity/Q4439155'), ('instancename', 'long-distance runner'), ('field', 'http://www.wikidata.org/entity/Q917206'), ('fieldname', 'long-distance running')]
3


Final query for this task

In [35]:
queryString = """
SELECT DISTINCT ?p ?name 
WHERE {
     wd:Q917206 ?p ?obj.
   ?p sc:name ?name .
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1417'), ('name', 'Encyclopædia Britannica Online ID')]
[('p', 'http://www.wikidata.org/prop/direct/P227'), ('name', 'GND ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2347'), ('name', 'YSO ID')]
[('p', 'http://www.wikidata.org/prop/direct/P244'), ('name', 'Library of Congress authority ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2581'), ('name', 'BabelNet ID')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('name', 'subclass of')]
[('p', 'http://www.wikidata.org/prop/direct/P3095'), ('name', 'practiced by')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('name', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P3417'), ('name', 'Quora topic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3553'), ('name', 'Zhihu topic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P361'), ('name', 'part of')]
[('p', 'http://www.wikidata.org/prop/direct/P373'), ('name', 'Commons category')]
[('p', 'http://www.wik

## Task 3
Return the disciplines of long-distance running (e.g. marathon) (the result set must be a list of pairs of discipline IRI and label).

In [38]:
queryString = """
SELECT DISTINCT ?disciplines ?name 
WHERE {
    ?disciplines wdt:P279 wd:Q917206 .
   ?disciplines sc:name ?name .
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('disciplines', 'http://www.wikidata.org/entity/Q26844379'), ('name', 'one-hour run')]
[('disciplines', 'http://www.wikidata.org/entity/Q2815830'), ('name', '2 miles run')]
[('disciplines', 'http://www.wikidata.org/entity/Q215677'), ('name', 'half marathon')]
[('disciplines', 'http://www.wikidata.org/entity/Q163892'), ('name', '10,000 metres')]
[('disciplines', 'http://www.wikidata.org/entity/Q240500'), ('name', '5000 metres')]
[('disciplines', 'http://www.wikidata.org/entity/Q40244'), ('name', 'marathon')]
[('disciplines', 'http://www.wikidata.org/entity/Q2774730'), ('name', '10K run')]
[('disciplines', 'http://www.wikidata.org/entity/Q500050'), ('name', 'cross country running')]
[('disciplines', 'http://www.wikidata.org/entity/Q26303'), ('name', 'ultramarathon')]
[('disciplines', 'http://www.wikidata.org/entity/Q2164200'), ('name', 'one hour run')]
[('disciplines', 'http://www.wikidata.org/entity/Q19767716'), ('name', '15K run')]
[('disciplines', 'http://www.wikidata.org/ent

Final query for this task

In [39]:
queryString = """
SELECT DISTINCT ?disciplines ?name 
WHERE {
    ?disciplines wdt:P279 wd:Q917206 .
   ?disciplines sc:name ?name .
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('disciplines', 'http://www.wikidata.org/entity/Q26844379'), ('name', 'one-hour run')]
[('disciplines', 'http://www.wikidata.org/entity/Q2815830'), ('name', '2 miles run')]
[('disciplines', 'http://www.wikidata.org/entity/Q215677'), ('name', 'half marathon')]
[('disciplines', 'http://www.wikidata.org/entity/Q163892'), ('name', '10,000 metres')]
[('disciplines', 'http://www.wikidata.org/entity/Q240500'), ('name', '5000 metres')]
[('disciplines', 'http://www.wikidata.org/entity/Q40244'), ('name', 'marathon')]
[('disciplines', 'http://www.wikidata.org/entity/Q2774730'), ('name', '10K run')]
[('disciplines', 'http://www.wikidata.org/entity/Q500050'), ('name', 'cross country running')]
[('disciplines', 'http://www.wikidata.org/entity/Q26303'), ('name', 'ultramarathon')]
[('disciplines', 'http://www.wikidata.org/entity/Q2164200'), ('name', 'one hour run')]
[('disciplines', 'http://www.wikidata.org/entity/Q19767716'), ('name', '15K run')]
[('disciplines', 'http://www.wikidata.org/ent

## Task 4
Consider marathons run in Europe. Return the number of different runners who won a marathon, grouped by their country of citizenship (the result set must be a list of triples of country IRI, label and #different runners).

In [23]:
queryString = """
SELECT DISTINCT ?marathons ?name ?country ?countryname
WHERE {
    ?marathons wdt:P279 wd:Q40244 .
   ?marathons sc:name ?name .
   ?marathons wdt:P17 ?country .
   ?country sc:name ?countryname .
   ?country wdt:P30 ?continent .
   ?continent sc:name ?continentname .
   FILTER(STR(?continentname)= "Europe")
   ?person wdt:P1344 ?marathons .
   ?person sc:name ?personname
}
LIMIT 50
"""

print("Results")
x=run_query(queryString)

Results
[('marathons', 'http://www.wikidata.org/entity/Q161222'), ('name', 'Berlin Marathon'), ('country', 'http://www.wikidata.org/entity/Q183'), ('countryname', 'Germany')]
[('marathons', 'http://www.wikidata.org/entity/Q528634'), ('name', 'Frankfurt Marathon'), ('country', 'http://www.wikidata.org/entity/Q183'), ('countryname', 'Germany')]
[('marathons', 'http://www.wikidata.org/entity/Q1071285'), ('name', 'Paris Marathon'), ('country', 'http://www.wikidata.org/entity/Q142'), ('countryname', 'France')]
[('marathons', 'http://www.wikidata.org/entity/Q578794'), ('name', 'London Marathon'), ('country', 'http://www.wikidata.org/entity/Q145'), ('countryname', 'United Kingdom')]
[('marathons', 'http://www.wikidata.org/entity/Q748757'), ('name', 'Rome Marathon'), ('country', 'http://www.wikidata.org/entity/Q38'), ('countryname', 'Italy')]
5


Final query for this task

In [69]:
queryString = """
SELECT DISTINCT ?country ?countryname (count(?person) AS ?personcount)
WHERE {
    ?marathons wdt:P279 wd:Q40244 .
   ?marathons sc:name ?name .
   ?marathons wdt:P17 ?country .
   ?country sc:name ?countryname .
   ?country wdt:P30 ?continent .
   ?continent sc:name ?continentname .
   FILTER(STR(?continentname)= "Europe")
   ?person wdt:P1344 ?marathons .
   ?person sc:name ?personname
} GROUP BY ?country ?countryname
LIMIT 50
"""

print("Results")
x=run_query(queryString)

Results
[('country', 'http://www.wikidata.org/entity/Q38'), ('countryname', 'Italy'), ('personcount', '2')]
[('country', 'http://www.wikidata.org/entity/Q183'), ('countryname', 'Germany'), ('personcount', '3')]
[('country', 'http://www.wikidata.org/entity/Q142'), ('countryname', 'France'), ('personcount', '1')]
[('country', 'http://www.wikidata.org/entity/Q145'), ('countryname', 'United Kingdom'), ('personcount', '3')]
4
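
The task asks for the number of *different* runners, whereas `count(?person)` counts one per solution row, so a runner matched by several rows may be counted more than once. In SPARQL this is usually handled with `COUNT(DISTINCT ?person)`. The same idea can be sketched on the client side, assuming rows in the `run_query` format that carry a group variable and an item variable. The function name and the `example.org` IRIs below are hypothetical and are not data from the endpoint.

```python
from collections import defaultdict


def count_distinct_per_group(rows, group_var, item_var):
    """Count the distinct item_var values seen for each group_var value."""
    seen = defaultdict(set)
    for row in rows:
        bindings = dict(row)  # each row is a list of (variable, value) tuples
        seen[bindings[group_var]].add(bindings[item_var])
    return {group: len(items) for group, items in seen.items()}


# Hypothetical rows: runner-1 appears twice for the same country.
sample = [
    [("country", "http://example.org/country-A"), ("person", "http://example.org/runner-1")],
    [("country", "http://example.org/country-A"), ("person", "http://example.org/runner-1")],
    [("country", "http://example.org/country-A"), ("person", "http://example.org/runner-2")],
    [("country", "http://example.org/country-B"), ("person", "http://example.org/runner-3")],
]
print(count_distinct_per_group(sample, "country", "person"))
```

Storing the items in a set per group is what removes the duplicates; a plain counter would reproduce the row count instead.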


## Task 5
Identify the BGP for Olympic Games

In [77]:
queryString = """
SELECT DISTINCT ?partof ?partname ?subclassof ?subclassname ?partof1 ?partname1
WHERE {
   wd:Q917206 wdt:P361 ?partof .
   ?partof sc:name ?partname.
   ?partof wdt:P279 ?subclassof .
   ?subclassof sc:name ?subclassname .
   ?subclassof wdt:P361 ?partof1 . 
   ?partof1 sc:name ?partname1

}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('partof', 'http://www.wikidata.org/entity/Q542'), ('partname', 'athletics'), ('subclassof', 'http://www.wikidata.org/entity/Q212434'), ('subclassname', 'Olympic sport'), ('partof1', 'http://www.wikidata.org/entity/Q5389'), ('partname1', 'Olympic Games')]
1


Final query for this task

In [78]:
queryString = """
SELECT DISTINCT ?p ?name 
WHERE {
     wd:Q5389 ?p ?obj.
   ?p sc:name ?name .
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1051'), ('name', 'PSH ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1151'), ('name', "topic's main Wikimedia portal")]
[('p', 'http://www.wikidata.org/prop/direct/P1225'), ('name', 'U.S. National Archives Identifier')]
[('p', 'http://www.wikidata.org/prop/direct/P1245'), ('name', 'OmegaWiki Defined Meaning')]
[('p', 'http://www.wikidata.org/prop/direct/P1296'), ('name', 'Gran Enciclopèdia Catalana ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('name', 'described by source')]
[('p', 'http://www.wikidata.org/prop/direct/P1417'), ('name', 'Encyclopædia Britannica Online ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1424'), ('name', "topic's main template")]
[('p', 'http://www.wikidata.org/prop/direct/P1546'), ('name', 'motto')]
[('p', 'http://www.wikidata.org/prop/direct/P163'), ('name', 'flag')]
[('p', 'http://www.wikidata.org/prop/direct/P18'), ('name', 'image')]
[('p', 'http://www.wikidata.org/prop/direct/P1807

## Task 6
Consider only the Summer Olympic Games of 2004, 2008, 2012 and 2016. Return the number of marathon runners who participated in all these editions, grouped by their country of citizenship (if a runner participated in more than one edition of the games, count him/her only once) (the result set must be a list of country IRI, label and #marathon runners).

In [20]:
queryString = """
SELECT DISTINCT ?person ?name ?pname ?year
WHERE {
    ?person wdt:P106 wd:Q13382460 .
    ?person sc:name ?name .
    ?person wdt:P1344 ?participated .
    ?participated sc:name ?pname .
    ?participated wdt:P585 ?date .
    FILTER (REGEX(STR(?pname), 'Summer Olympics'))
    BIND(STRBEFORE( STR(?date), "-" )  AS ?year)
    FILTER(STR(?year) = '2004' || STR(?year) = '2008' || STR(?year) = '2012' ||STR(?year) = '2016')
   
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('person', 'http://www.wikidata.org/entity/Q1701191'), ('name', 'John Nada Saya'), ('pname', '2004 Summer Olympics'), ('year', '2004')]
[('person', 'http://www.wikidata.org/entity/Q16210841'), ('name', 'Elva Dryer'), ('pname', '2004 Summer Olympics'), ('year', '2004')]
[('person', 'http://www.wikidata.org/entity/Q275831'), ('name', 'Christelle Daunay'), ('pname', '2008 Summer Olympics'), ('year', '2008')]
[('person', 'http://www.wikidata.org/entity/Q275831'), ('name', 'Christelle Daunay'), ('pname', 'athletics at the 2016 Summer Olympics'), ('year', '2016')]
[('person', 'http://www.wikidata.org/entity/Q3575181'), ('name', 'Zemzem Ahmed'), ('pname', '2008 Summer Olympics'), ('year', '2008')]
[('person', 'http://www.wikidata.org/entity/Q20988053'), ('name', 'Gulzhanat Zhanatbek'), ('pname', 'athletics at the 2016 Summer Olympics'), ('year', '2016')]
[('person', 'http://www.wikidata.org/entity/Q435784'), ('name', 'Stine Larsen'), ('pname', '2004 Summer Olympics'), ('year', '2004'

Final query for this task

In [21]:
queryString = """
SELECT DISTINCT ?country ?countryname (count(?person) AS ?personcount)
WHERE {
    ?person wdt:P106 wd:Q13382460 .
    ?person sc:name ?name .
    ?person wdt:P1344 ?participated .
    ?participated sc:name ?pname .
    ?participated wdt:P585 ?date .
    FILTER (REGEX(STR(?pname), 'Summer Olympics'))
    BIND(STRBEFORE( STR(?date), "-" )  AS ?year)
    FILTER(STR(?year) = '2004' || STR(?year) = '2008' || STR(?year) = '2012' ||STR(?year) = '2016')
    ?person wdt:P27 ?country .
    ?country sc:name ?countryname
   
} GROUP BY ?country ?countryname
LIMIT 50
"""

print("Results")
x=run_query(queryString)

Results
[('country', 'http://www.wikidata.org/entity/Q232'), ('countryname', 'Kazakhstan'), ('personcount', '3')]
[('country', 'http://www.wikidata.org/entity/Q215'), ('countryname', 'Slovenia'), ('personcount', '4')]
[('country', 'http://www.wikidata.org/entity/Q974'), ('countryname', 'Democratic Republic of the Congo'), ('personcount', '2')]
[('country', 'http://www.wikidata.org/entity/Q967'), ('countryname', 'Burundi'), ('personcount', '4')]
[('country', 'http://www.wikidata.org/entity/Q865'), ('countryname', 'Taiwan'), ('personcount', '3')]
[('country', 'http://www.wikidata.org/entity/Q218'), ('countryname', 'Romania'), ('personcount', '5')]
[('country', 'http://www.wikidata.org/entity/Q963'), ('countryname', 'Botswana'), ('personcount', '2')]
[('country', 'http://www.wikidata.org/entity/Q28'), ('countryname', 'Hungary'), ('personcount', '5')]
[('country', 'http://www.wikidata.org/entity/Q929'), ('countryname', 'Central African Republic'), ('personcount', '1')]
[('country', 'http:/
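
The `||` filter keeps every runner who took part in at least one of the four editions. If the task is read as requiring participation in all four (an assumption about the intended meaning), the rows of the exploratory query for this task, which carry `person` and `year`, could be post-processed as in the sketch below. The function name and the `example.org` IRIs are hypothetical.

```python
from collections import defaultdict

EDITIONS = {"2004", "2008", "2012", "2016"}


def runners_in_all_editions(rows, required=EDITIONS):
    """Return the set of persons whose rows cover every year in `required`."""
    years = defaultdict(set)
    for row in rows:
        bindings = dict(row)  # each row is a list of (variable, value) tuples
        years[bindings["person"]].add(bindings["year"])
    # Keep a person only if the required years are a subset of the years seen.
    return {person for person, seen in years.items() if set(required) <= seen}


# Hypothetical rows: runner-1 took part in all four editions, runner-2 in two.
sample = [
    [("person", "http://example.org/runner-1"), ("year", y)] for y in sorted(EDITIONS)
] + [
    [("person", "http://example.org/runner-2"), ("year", "2008")],
    [("person", "http://example.org/runner-2"), ("year", "2016")],
]
print(sorted(runners_in_all_editions(sample)))
```

Because the result is a set of persons, each runner is considered only once regardless of how many rows matched.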

## Task 7
For each discipline of long-distance running, find who holds the world record (the result set must be a list of elements with discipline IRI, label and athlete IRI and label).

In [83]:
queryString = """
SELECT DISTINCT ?disciplines ?name 
WHERE {
    ?disciplines wdt:P279 wd:Q917206 .
   ?disciplines sc:name ?name .
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('disciplines', 'http://www.wikidata.org/entity/Q26844379'), ('name', 'one-hour run')]
[('disciplines', 'http://www.wikidata.org/entity/Q2815830'), ('name', '2 miles run')]
[('disciplines', 'http://www.wikidata.org/entity/Q215677'), ('name', 'half marathon')]
[('disciplines', 'http://www.wikidata.org/entity/Q163892'), ('name', '10,000 metres')]
[('disciplines', 'http://www.wikidata.org/entity/Q240500'), ('name', '5000 metres')]
[('disciplines', 'http://www.wikidata.org/entity/Q40244'), ('name', 'marathon')]
[('disciplines', 'http://www.wikidata.org/entity/Q2774730'), ('name', '10K run')]
[('disciplines', 'http://www.wikidata.org/entity/Q500050'), ('name', 'cross country running')]
[('disciplines', 'http://www.wikidata.org/entity/Q26303'), ('name', 'ultramarathon')]
[('disciplines', 'http://www.wikidata.org/entity/Q2164200'), ('name', 'one hour run')]
[('disciplines', 'http://www.wikidata.org/entity/Q19767716'), ('name', '15K run')]
[('disciplines', 'http://www.wikidata.org/ent

Final query for this task

In [24]:
queryString = """
SELECT DISTINCT ?disciplines ?name ?recordholder ?holdername
WHERE {
    ?disciplines wdt:P279 wd:Q917206 .
   ?disciplines sc:name ?name .
   ?disciplines wdt:P1000 ?recordholder .
   ?recordholder sc:name ?holdername
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('disciplines', 'http://www.wikidata.org/entity/Q240500'), ('name', '5000 metres'), ('recordholder', 'http://www.wikidata.org/entity/Q9119'), ('holdername', 'Kenenisa Bekele Beyecha')]
[('disciplines', 'http://www.wikidata.org/entity/Q40244'), ('name', 'marathon'), ('recordholder', 'http://www.wikidata.org/entity/Q9124'), ('holdername', 'Eliud Kipchoge')]
[('disciplines', 'http://www.wikidata.org/entity/Q40244'), ('name', 'marathon'), ('recordholder', 'http://www.wikidata.org/entity/Q63223'), ('holdername', 'Dennis Kipruto Kimetto')]
3
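
The final query can return several athletes for the same discipline, as the two marathon rows show. A minimal sketch for grouping the holders per discipline is given below; the helper name `holders_by_discipline` is hypothetical, and the rows are copied from the results printed above.

```python
from collections import defaultdict


def holders_by_discipline(rows):
    """Map each discipline label to the list of record-holder labels."""
    grouped = defaultdict(list)
    for row in rows:
        bindings = dict(row)  # each row is a list of (variable, value) tuples
        grouped[bindings["name"]].append(bindings["holdername"])
    return dict(grouped)


# Rows copied from the results of the final query.
rows = [
    [("disciplines", "http://www.wikidata.org/entity/Q240500"), ("name", "5000 metres"),
     ("recordholder", "http://www.wikidata.org/entity/Q9119"), ("holdername", "Kenenisa Bekele Beyecha")],
    [("disciplines", "http://www.wikidata.org/entity/Q40244"), ("name", "marathon"),
     ("recordholder", "http://www.wikidata.org/entity/Q9124"), ("holdername", "Eliud Kipchoge")],
    [("disciplines", "http://www.wikidata.org/entity/Q40244"), ("name", "marathon"),
     ("recordholder", "http://www.wikidata.org/entity/Q63223"), ("holdername", "Dennis Kipruto Kimetto")],
]
for discipline, holders in holders_by_discipline(rows).items():
    print(discipline, "->", ", ".join(holders))
```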
