# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP that returns the human-readable name of a property or a class in Wikidata.

In [16]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-236d32b515-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# SELECT and CONSTRUCT queries
def run_query(queryString, verbose=True):
    """Run a query and return its rows as lists of (variable, value) tuples."""
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings']) == 0:
            print("Empty")
            return []
        array = []
        for bindings in json_results['results']['bindings']:
            # One row: a list of (variable name, value) pairs
            app = [(var, value['value']) for var, value in bindings.items()]
            if verbose:
                print(app)
            array.append(app)
        if verbose:
            print(len(array))
        return array

    except Exception as e:
        print("The operation failed", e)
        # Return an empty list so callers always receive a list
        return []
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
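
To make the return shape of `run_query` concrete, the sketch below applies the same flattening step to a hand-written response in the SPARQL JSON results format, so it runs without contacting the endpoint. The function name `flatten_bindings` and the sample payload are illustrative assumptions, not part of the notebook setup.

```python
def flatten_bindings(json_results):
    """Turn a SPARQL JSON result into a list of rows of (variable, value) tuples."""
    rows = []
    for bindings in json_results["results"]["bindings"]:
        rows.append([(var, value["value"]) for var, value in bindings.items()])
    return rows


# Hypothetical payload shaped like the response to a SELECT query
sample = {
    "head": {"vars": ["inventor", "name"]},
    "results": {
        "bindings": [
            {
                "inventor": {"type": "uri", "value": "http://www.wikidata.org/entity/Q80"},
                "name": {"type": "literal", "value": "Tim Berners-Lee"},
            }
        ]
    },
}

rows = flatten_bindings(sample)
print(rows)
```

Each row keeps the variable name next to its value, which is why the printed results in the tasks below appear as lists of `('variable', 'value')` tuples.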

# History Workflow Series ("World Wide Web") 

Consider the following exploratory information need:

> Investigate the origins of the World Wide Web and related academic activities and scientists.

## Useful URIs for the current workflow
The following are given:

| IRI         | Description        | Role      |
| ----------- | ------------------ | --------- |
| `wdt:P1647` | subproperty of     | predicate |
| `wdt:P31`   | instance of        | predicate |
| `wdt:P106`  | occupation         | predicate |
| `wdt:P279`  | subclass of        | predicate |
| `wdt:P27`   | nationality        | predicate |
| `wdt:P3342` | significant person | predicate |
| `wd:Q5`     | Human              | node      |
| `wd:Q466`   | World Wide Web     | node      |





Also consider

```
wd:Q466 ?p ?obj .
```

is the BGP to retrieve all **properties of World Wide Web**
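
As a minimal sketch, the BGP above can be combined with the `sc:name` label pattern to list the properties of any entity together with their names. The helper name `properties_query` and its `limit` parameter are assumptions made for illustration; the `wd:` and `sc:` prefixes are expected to come from `prefixString` when the string is passed to `run_query`.

```python
def properties_query(entity, limit=20):
    """Build a SELECT query listing the properties of `entity` with their labels."""
    return f"""
SELECT DISTINCT ?p ?name
WHERE {{
   {entity} ?p ?obj .
   ?p sc:name ?name .
}}
LIMIT {limit}
"""


# Hypothetical usage: the resulting string could be passed to run_query
query = properties_query("wd:Q466", limit=100)
print(query)
```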

Please note that when you return a resource, you should return both its IRI and its label. In particular, when a task requires you to identify a BGP, the result set must always be a list of IRI-label pairs.

The workload should:

1. Find the inventors of World Wide Web (return IRI and name).

2. Identify the BGP for hypertext system.

3. Find all the hypertext systems created before 1980 (return the IRI and name of the system and the inception date).

4. Identify the BGP for computer scientist.

5. Find how many computer scientists there are for each continent (consider their citizenship). Return the IRI and name of the continent and the number of scientists for each continent. 

6. Find all the computer scientists who taught at the University of Cambridge (return IRI and name).

7. Find all the computer scientists who wrote at least 5 books (please consider only the instances of book, exclude "literary work" or other type of work related to books). Return the IRI and name of the computer scientist and the number of books. 

## Task 1
Find the inventors of World Wide Web (return IRI and name).

In [7]:
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   # bind something
   wd:Q466 ?p ?obj .
   # get the label
   ?p sc:name ?name.
}
LIMIT 100
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1051'), ('name', 'PSH ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1225'), ('name', 'U.S. National Archives Identifier')]
[('p', 'http://www.wikidata.org/prop/direct/P1417'), ('name', 'Encyclopædia Britannica Online ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1430'), ('name', 'OpenPlaques subject ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1482'), ('name', 'Stack Exchange tag')]
[('p', 'http://www.wikidata.org/prop/direct/P1542'), ('name', 'has effect')]
[('p', 'http://www.wikidata.org/prop/direct/P1813'), ('name', 'short name')]
[('p', 'http://www.wikidata.org/prop/direct/P1889'), ('name', 'different from')]
[('p', 'http://www.wikidata.org/prop/direct/P2004'), ('name', 'NALT ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2163'), ('name', 'FAST ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2184'), ('name', 'history of topic')]
[('p', 'http://www.wikidata.org/prop/direct/P227'), ('name', 'GND ID')]
[('p', 

Final query for this task

In [17]:
queryString = """
SELECT DISTINCT ?inventor ?name
WHERE {
   wd:Q466 wdt:P61 ?inventor .
   ?inventor sc:name ?name.
   
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('inventor', 'http://www.wikidata.org/entity/Q80'), ('name', 'Tim Berners-Lee')]
[('inventor', 'http://www.wikidata.org/entity/Q92749'), ('name', 'Robert Cailliau')]
2


## Task 2
Identify the BGP for hypertext system.

In [10]:
queryString = """
SELECT DISTINCT ?obj ?name
WHERE {
   # bind something
   wd:Q466 wdt:P31 ?obj .
   # get the label
   ?obj sc:name ?name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q121182'), ('name', 'information system')]
[('obj', 'http://www.wikidata.org/entity/Q65966993'), ('name', 'hypertext system')]
2


Final query for this task

In [25]:
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   wd:Q65966993 ?p ?obj .
   # get the label
   ?p sc:name ?name.
   
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('name', 'subclass of')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('name', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P3847'), ('name', 'Open Library subject ID')]
3


## Task 3
Find all the hypertext systems created before 1980 (return the IRI and name of the system and the inception date).

In [23]:
queryString = """
SELECT DISTINCT ?obj ?name ?born
WHERE {
   ?obj wdt:P31 wd:Q65966993  .
   ?obj sc:name ?name.
   ?obj wdt:P571 ?born   
}
LIMIT 50
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q466'), ('name', 'World Wide Web'), ('born', '1989-03-12T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q466'), ('name', 'World Wide Web'), ('born', '1990-12-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q370979'), ('name', 'Amigaguide'), ('born', '1991-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q2385520'), ('name', 'ENQUIRE'), ('born', '1980-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q2385520'), ('name', 'ENQUIRE'), ('born', '1984-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q4994212'), ('name', 'Hypertext Editing System'), ('born', '1967-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q5448331'), ('name', 'File Retrieval and Editing System'), ('born', '1968-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q785345'), ('name', 'Project Xanadu'), ('born', '1960-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q1799609'), ('name', 'HyTime')

Final query for this task

In [22]:
queryString = """
SELECT DISTINCT ?obj ?name ?year
WHERE {
   ?obj wdt:P31 wd:Q65966993  .
   ?obj sc:name ?name.
   ?obj wdt:P571 ?born
   BIND(STRBEFORE( STR(?born), "-" )  AS ?year)
   FILTER(str(?year) < "1980")
    
}
LIMIT 50
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q4994212'), ('name', 'Hypertext Editing System'), ('year', '1967')]
[('obj', 'http://www.wikidata.org/entity/Q5448331'), ('name', 'File Retrieval and Editing System'), ('year', '1968')]
[('obj', 'http://www.wikidata.org/entity/Q785345'), ('name', 'Project Xanadu'), ('year', '1960')]
[('obj', 'http://www.wikidata.org/entity/Q471'), ('name', 'Memex'), ('year', '1939')]
[('obj', 'http://www.wikidata.org/entity/Q1050365'), ('name', 'oNLine System'), ('year', '1962')]
[('obj', 'http://www.wikidata.org/entity/Q8063246'), ('name', 'ZOG'), ('year', '1977')]
[('obj', 'http://www.wikidata.org/entity/Q3625272'), ('name', 'Aspen Movie Map'), ('year', '1978')]
[('obj', 'http://www.wikidata.org/entity/Q5358242'), ('name', 'Electronic Document System'), ('year', '1978')]
8
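
The final query compares the year as a string, which works here because every inception year in the data has four digits. As a cross-check, the sketch below applies the same cutoff in Python to a few rows copied from the exploratory output above. The helper name `inception_year` is an illustrative assumption, and it assumes positive (CE) years.

```python
def inception_year(timestamp):
    """Extract the year from a dateTime string such as '1967-01-01T00:00:00Z'.

    Assumes a positive (CE) year, so the text before the first '-' is the year.
    """
    return int(timestamp.split("-")[0])


# (name, inception) pairs taken from the exploratory results
rows = [
    ("World Wide Web", "1989-03-12T00:00:00Z"),
    ("ENQUIRE", "1980-01-01T00:00:00Z"),
    ("Hypertext Editing System", "1967-01-01T00:00:00Z"),
    ("Project Xanadu", "1960-01-01T00:00:00Z"),
]

# Keep only the systems whose inception year is strictly before 1980
before_1980 = [
    (name, inception_year(ts)) for name, ts in rows if inception_year(ts) < 1980
]
print(before_1980)
```

ENQUIRE is excluded because its earliest recorded inception is exactly 1980, which matches its absence from the final result.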


## Task 4
Identify the BGP for computer scientist.

In [88]:
queryString = """
SELECT DISTINCT ?person ?name ?obj ?occupation
WHERE {
    ?person wdt:P31 wd:Q5 .
    ?person wdt:P106 ?obj .
    ?person sc:name ?name .
    ?obj sc:name ?occupation .
    FILTER(STR(?occupation) = "computer scientist")
   
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('person', 'http://www.wikidata.org/entity/Q8190500'), ('name', 'Seinosuke Toda'), ('obj', 'http://www.wikidata.org/entity/Q82594'), ('occupation', 'computer scientist')]
[('person', 'http://www.wikidata.org/entity/Q603999'), ('name', 'Kevin Poulsen'), ('obj', 'http://www.wikidata.org/entity/Q82594'), ('occupation', 'computer scientist')]
[('person', 'http://www.wikidata.org/entity/Q11109023'), ('name', 'Michael Barr'), ('obj', 'http://www.wikidata.org/entity/Q82594'), ('occupation', 'computer scientist')]
[('person', 'http://www.wikidata.org/entity/Q20731777'), ('name', 'Valeria de Paiva'), ('obj', 'http://www.wikidata.org/entity/Q82594'), ('occupation', 'computer scientist')]
[('person', 'http://www.wikidata.org/entity/Q24901829'), ('name', 'Ernesto Damiani'), ('obj', 'http://www.wikidata.org/entity/Q82594'), ('occupation', 'computer scientist')]
[('person', 'http://www.wikidata.org/entity/Q61856956'), ('name', 'Richard Garner'), ('obj', 'http://www.wikidata.org/entity/Q8259

Final query for this task

In [90]:
queryString = """
SELECT DISTINCT ?p ?name 
WHERE {
    wd:Q82594 ?p ?obj .
    ?p sc:name ?name
   
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1014'), ('name', 'Art & Architecture Thesaurus ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1036'), ('name', 'Dewey Decimal Classification')]
[('p', 'http://www.wikidata.org/prop/direct/P1245'), ('name', 'OmegaWiki Defined Meaning')]
[('p', 'http://www.wikidata.org/prop/direct/P1296'), ('name', 'Gran Enciclopèdia Catalana ID')]
[('p', 'http://www.wikidata.org/prop/direct/P18'), ('name', 'image')]
[('p', 'http://www.wikidata.org/prop/direct/P1889'), ('name', 'different from')]
[('p', 'http://www.wikidata.org/prop/direct/P227'), ('name', 'GND ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2283'), ('name', 'uses')]
[('p', 'http://www.wikidata.org/prop/direct/P2354'), ('name', 'has list')]
[('p', 'http://www.wikidata.org/prop/direct/P244'), ('name', 'Library of Congress authority ID')]
[('p', 'http://www.wikidata.org/prop/direct/P268'), ('name', 'Bibliothèque nationale de France ID')]
[('p', 'http://www.wikidata.org/prop/direct

## Task 5
Find how many computer scientists there are for each continent (consider their citizenship). Return the IRI and name of the continent and the number of scientists for each continent.

In [20]:
queryString = """
SELECT DISTINCT  ?person ?name ?occupation ?continent ?continentname
WHERE {
    ?person wdt:P31 wd:Q5 .
    ?person wdt:P106 ?obj .
    ?person sc:name ?name .
    ?obj sc:name ?occupation .
    FILTER(STR(?occupation) = "computer scientist")
    ?person wdt:P27 ?nation .
    ?nation sc:name ?country .
    ?nation wdt:P30 ?continent .
    ?continent sc:name ?continentname
   
} LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('person', 'http://www.wikidata.org/entity/Q8190500'), ('name', 'Seinosuke Toda'), ('occupation', 'computer scientist'), ('continent', 'http://www.wikidata.org/entity/Q48'), ('continentname', 'Asia')]
[('person', 'http://www.wikidata.org/entity/Q20731777'), ('name', 'Valeria de Paiva'), ('occupation', 'computer scientist'), ('continent', 'http://www.wikidata.org/entity/Q46'), ('continentname', 'Europe')]
[('person', 'http://www.wikidata.org/entity/Q20731777'), ('name', 'Valeria de Paiva'), ('occupation', 'computer scientist'), ('continent', 'http://www.wikidata.org/entity/Q18'), ('continentname', 'South America')]
[('person', 'http://www.wikidata.org/entity/Q11109023'), ('name', 'Michael Barr'), ('occupation', 'computer scientist'), ('continent', 'http://www.wikidata.org/entity/Q49'), ('continentname', 'North America')]
[('person', 'http://www.wikidata.org/entity/Q603999'), ('name', 'Kevin Poulsen'), ('occupation', 'computer scientist'), ('continent', 'http://www.wikidata.org/

Final query for this task

In [22]:
queryString = """
SELECT DISTINCT ?continent ?continentname (count(?name) AS ?namecount)
WHERE {
    ?person wdt:P106 wd:Q82594 .
    ?person sc:name ?name .
    ?person wdt:P27 ?nation .
    ?nation sc:name ?country .
    ?nation wdt:P30 ?continent .
    ?continent sc:name ?continentname
} GROUP BY ?continentname ?continent
"""

print("Results")
x=run_query(queryString)



Results
[('continent', 'http://www.wikidata.org/entity/Q5401'), ('continentname', 'Eurasia'), ('namecount', '23')]
[('continent', 'http://www.wikidata.org/entity/Q46'), ('continentname', 'Europe'), ('namecount', '3443')]
[('continent', 'http://www.wikidata.org/entity/Q538'), ('continentname', 'Insular Oceania'), ('namecount', '2479')]
[('continent', 'http://www.wikidata.org/entity/Q15'), ('continentname', 'Africa'), ('namecount', '76')]
[('continent', 'http://www.wikidata.org/entity/Q48'), ('continentname', 'Asia'), ('namecount', '908')]
[('continent', 'http://www.wikidata.org/entity/Q18'), ('continentname', 'South America'), ('namecount', '92')]
[('continent', 'http://www.wikidata.org/entity/Q49'), ('continentname', 'North America'), ('namecount', '2782')]
7
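
Because the final query counts names rather than distinct people, a scientist who holds citizenship of two countries on the same continent may be counted twice there. The sketch below shows the deduplication idea in Python on (person, continent) pairs. The first three rows come from the exploratory output above, while the last row is a made-up duplicate added only for illustration. In SPARQL, the analogous change would be to count with `COUNT(DISTINCT ?person)`.

```python
from collections import Counter

# (person, continent) pairs; the last row is a hypothetical duplicate
rows = [
    ("Q8190500", "Asia"),
    ("Q20731777", "Europe"),
    ("Q20731777", "South America"),
    ("Q20731777", "Europe"),
]

# Deduplicate the pairs first, then count people per continent
per_continent = Counter(continent for _, continent in set(rows))
print(dict(per_continent))
```

A person with citizenships on two different continents is still counted once on each of them, which matches the task's instruction to consider citizenship.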


## Task 6
Find all the computer scientists who taught at the University of Cambridge (return IRI and name).

In [28]:
queryString = """
SELECT DISTINCT ?person ?name ?uni ?uniname
WHERE {
    ?person wdt:P106 wd:Q82594 .
    ?person sc:name ?name .
    ?person wdt:P69 ?uni .
    ?uni sc:name ?uniname
   
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('person', 'http://www.wikidata.org/entity/Q62857'), ('name', 'Maurice Wilkes'), ('uni', 'http://www.wikidata.org/entity/Q6411521'), ('uniname', 'King Edward VI College Stourbridge')]
[('person', 'http://www.wikidata.org/entity/Q16221584'), ('name', 'Yoky Matsuoka'), ('uni', 'http://www.wikidata.org/entity/Q2919229'), ('uniname', 'IMG Academy Bollettieri Tennis Program')]
[('person', 'http://www.wikidata.org/entity/Q9912'), ('name', 'Annie Antón'), ('uni', 'http://www.wikidata.org/entity/Q4894781'), ('uniname', 'Berry College')]
[('person', 'http://www.wikidata.org/entity/Q16221606'), ('name', 'Jim Spohrer'), ('uni', 'http://www.wikidata.org/entity/Q5645975'), ('uniname', 'Hampden Academy')]
[('person', 'http://www.wikidata.org/entity/Q16215695'), ('name', 'Sachin Bansal'), ('uni', 'http://www.wikidata.org/entity/Q7587006'), ('uniname', "St. Anne's Convent School")]
[('person', 'http://www.wikidata.org/entity/Q26207354'), ('name', 'Lum Zhaveli'), ('uni', 'http://www.wikidata.o

Final query for this task

In [26]:
queryString = """
SELECT DISTINCT ?person ?name
WHERE {
    ?person wdt:P106 wd:Q82594 .
    ?person sc:name ?name .
    ?person wdt:P69 ?uni .
    ?uni sc:name ?uniname
    FILTER(STR(?uniname) = "University of Cambridge")
   
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('person', 'http://www.wikidata.org/entity/Q20731777'), ('name', 'Valeria de Paiva')]
[('person', 'http://www.wikidata.org/entity/Q61856956'), ('name', 'Richard Garner')]
[('person', 'http://www.wikidata.org/entity/Q42222436'), ('name', 'Diego López-de-Ipiña')]
[('person', 'http://www.wikidata.org/entity/Q178577'), ('name', 'Norbert Wiener')]
[('person', 'http://www.wikidata.org/entity/Q468042'), ('name', 'Jessie MacWilliams')]
[('person', 'http://www.wikidata.org/entity/Q3655362'), ('name', 'Don Syme')]
[('person', 'http://www.wikidata.org/entity/Q92844'), ('name', 'David Marr')]
[('person', 'http://www.wikidata.org/entity/Q335027'), ('name', 'Seymour Papert')]
[('person', 'http://www.wikidata.org/entity/Q92694'), ('name', 'Andy Hopper')]
[('person', 'http://www.wikidata.org/entity/Q92776'), ('name', 'Steve Furber')]
[('person', 'http://www.wikidata.org/entity/Q92944'), ('name', 'David Wheeler')]
[('person', 'http://www.wikidata.org/entity/Q92774'), ('name', 'Stephen R. Bourn

## Task 7
Find all the computer scientists who wrote at least 5 books (please consider only the instances of book, exclude "literary work" or other type of work related to books). Return the IRI and name of the computer scientist and the number of books.

In [89]:
queryString = """
SELECT DISTINCT  ?authorname (count(?bookname) AS ?bookcount)
WHERE {
    ?book wdt:P31 wd:Q571 .
    ?book sc:name ?bookname .
    ?book wdt:P50 ?author .
    ?author sc:name ?authorname .
    ?author wdt:P106 ?field .
    ?field sc:name ?occupation
    FILTER REGEX(?occupation, "computer scientist")
   
} GROUP BY ?authorname 
LIMIT 100
"""

print("Results")
x=run_query(queryString)

Results
[('authorname', 'Gabriel Weinberg'), ('bookcount', '1')]
[('authorname', 'Ron Rivest'), ('bookcount', '1')]
[('authorname', 'Evgeny Morozov'), ('bookcount', '1')]
[('authorname', 'Michael Färber'), ('bookcount', '1')]
[('authorname', 'Ralph Kimball'), ('bookcount', '1')]
[('authorname', 'David J. C. MacKay'), ('bookcount', '1')]
[('authorname', 'John Edward Hopcroft'), ('bookcount', '1')]
[('authorname', 'Gerard Salton'), ('bookcount', '1')]
[('authorname', 'Peter Dayan'), ('bookcount', '1')]
[('authorname', 'John Cowan'), ('bookcount', '1')]
[('authorname', 'John Ousterhout'), ('bookcount', '1')]
[('authorname', 'Edward Tufte'), ('bookcount', '2')]
[('authorname', 'Gayle Laakmann McDowell'), ('bookcount', '1')]
[('authorname', 'Jeffrey David Ullman'), ('bookcount', '1')]
[('authorname', 'Clifford Stein'), ('bookcount', '1')]
[('authorname', 'Martin Davis'), ('bookcount', '1')]
[('authorname', 'John Lions'), ('bookcount', '1')]
[('authorname', 'Jean Gallier'), ('bookcount', '1'

Final query for this task

In [87]:
queryString = """
SELECT DISTINCT ?author ?authorname (count(?bookname) AS ?bookcount)
WHERE {
    ?book wdt:P31 wd:Q571 .
    ?book sc:name ?bookname .
    ?book wdt:P50 ?author .
    ?author sc:name ?authorname .
    ?author wdt:P106 ?field .
    ?field sc:name ?occupation .
    FILTER REGEX(?occupation, "computer scientist")
} GROUP BY ?authorname ?author
HAVING(count(?bookname) >= 2)
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('author', 'http://www.wikidata.org/entity/Q1293908'), ('authorname', 'Edward Tufte'), ('bookcount', '2')]
[('author', 'http://www.wikidata.org/entity/Q92927'), ('authorname', 'Noga Alon'), ('bookcount', '2')]
[('author', 'http://www.wikidata.org/entity/Q315577'), ('authorname', 'Hideo Kojima'), ('bookcount', '2')]
[('author', 'http://www.wikidata.org/entity/Q93043'), ('authorname', 'Jonathan Bowen'), ('bookcount', '2')]
4
