# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain knowledge graph (KG): [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP that returns the human-readable name of a property or a class in Wikidata.
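
As an illustration of how this BGP can be used on its own, the sketch below wraps it in a complete `SELECT` query for a single IRI. The helper name `label_query` is hypothetical and is not part of the notebook; the function only builds a query string and does not contact any endpoint.

```python
def label_query(iri):
    """Build a query that returns the schema.org name of one Wikidata IRI.

    Hypothetical helper: it only formats a string, it does not run the query.
    """
    if not iri.startswith("http://www.wikidata.org/"):
        raise ValueError("expected a Wikidata IRI")
    # Doubled braces are literal braces inside an f-string.
    return f"SELECT ?name WHERE {{ <{iri}> <http://schema.org/name> ?name . }}"


query = label_query("http://www.wikidata.org/prop/direct/P61")
print(query)
```

The resulting string can be passed to a query runner such as the `run_query` function defined in the setup cell.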

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-history2-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString,verbose = True):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings'])==0:
            print("Empty")
            return []
        array = []
        for bindings in json_results['results']['bindings']:
            app =  [ (var, value['value'])  for var, value in bindings.items() ] 
            if verbose:
                print( app)
            array.append(app)
        if verbose:
            print(len(array))
        return array

    except Exception as e:
        # Return an empty list so callers always receive the same type
        print("The operation failed", e)
        return []
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
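
The flattening step inside `run_query` can be isolated as a pure function, which makes it possible to check the row format without contacting the endpoint. The sketch below is an illustration only: `flatten_bindings` and the `sample` payload are hypothetical, and the payload is hand-written to follow the SPARQL JSON results layout.

```python
def flatten_bindings(json_results):
    """Turn a SPARQL JSON result into the list-of-tuples rows used in this notebook."""
    rows = []
    for binding in json_results["results"]["bindings"]:
        # Each binding maps a variable name to a dict holding its type and value.
        rows.append([(var, value["value"]) for var, value in binding.items()])
    return rows


# Hand-written payload in the SPARQL JSON results layout (illustrative only).
sample = {
    "head": {"vars": ["obj", "name"]},
    "results": {
        "bindings": [
            {
                "obj": {"type": "uri", "value": "http://www.wikidata.org/entity/Q80"},
                "name": {"type": "literal", "value": "Tim Berners-Lee"},
            }
        ]
    },
}

rows = flatten_bindings(sample)
print(rows)
```

Each row is a list of `(variable, value)` tuples, so `dict(row)` gives direct access to a value by variable name.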

# History Workflow Series ("World Wide Web") 

Consider the following exploratory information need:

> Investigate the origins of the World Wide Web and related academic activities and scientists.

## Useful URIs for the current workflow
The following are given:

| IRI         | Description        | Role      |
| ----------- | ------------------ | --------- |
| `wdt:P1647` | subproperty        | predicate |
| `wdt:P31`   | instance of        | predicate |
| `wdt:P106`  | occupation         | predicate |
| `wdt:P279`  | subclass           | predicate |
| `wdt:P27`   | nationality        | predicate |
| `wdt:P3342` | significant person | predicate |
| `wd:Q5`     | human              | node      |
| `wd:Q466`   | World Wide Web     | node      |





Also consider

```
wd:Q466 ?p ?obj .
```

is the BGP that retrieves all **properties of World Wide Web**.

Please note that whenever you return a resource, you should return both its IRI and its label. In particular, when a task requires you to identify a BGP, the result set must always be a list of IRI-label pairs.

The workload should:

1. Find the inventors of World Wide Web (return IRI and name).

2. Identify the BGP for hypertext system.

3. Find all the hypertext systems created before 1980 (return the IRI and name of the system and the inception date).

4. Identify the BGP for computer scientist.

5. Find how many computer scientists there are for each continent (consider their citizenship). Return the IRI and name of the continent and the number of scientists for each continent.

6. Find all the computer scientists who taught at the University of Cambridge (return IRI and name).

7. Find all the computer scientists who wrote at least 5 books (please consider only the instances of book, exclude "literary work" or other types of work related to books). Return the IRI and name of the computer scientist and the number of books.

## Task 1

In [9]:
queryString = """
SELECT DISTINCT ?prop ?name
WHERE { 
    wd:Q466 ?prop ?obj .
    ?prop sc:name ?name.
    FILTER(!isLiteral(?obj)).
} 
"""

print("Results")
x=run_query(queryString)

Results
[('prop', 'http://www.wikidata.org/prop/direct/P1482'), ('name', 'Stack Exchange tag')]
[('prop', 'http://www.wikidata.org/prop/direct/P1542'), ('name', 'has effect')]
[('prop', 'http://www.wikidata.org/prop/direct/P1889'), ('name', 'different from')]
[('prop', 'http://www.wikidata.org/prop/direct/P2184'), ('name', 'history of topic')]
[('prop', 'http://www.wikidata.org/prop/direct/P279'), ('name', 'subclass of')]
[('prop', 'http://www.wikidata.org/prop/direct/P2959'), ('name', 'permanent duplicated item')]
[('prop', 'http://www.wikidata.org/prop/direct/P31'), ('name', 'instance of')]
[('prop', 'http://www.wikidata.org/prop/direct/P5008'), ('name', 'on focus list of Wikimedia project')]
[('prop', 'http://www.wikidata.org/prop/direct/P527'), ('name', 'has part')]
[('prop', 'http://www.wikidata.org/prop/direct/P61'), ('name', 'discoverer or inventor')]
[('prop', 'http://www.wikidata.org/prop/direct/P737'), ('name', 'influenced by')]
[('prop', 'http://www.wikidata.org/prop/direct/

In [10]:
queryString = """
SELECT DISTINCT ?obj ?name
WHERE { 
    wd:Q466 wdt:P61 ?obj .
    ?obj sc:name ?name.
} 
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q80'), ('name', 'Tim Berners-Lee')]
[('obj', 'http://www.wikidata.org/entity/Q92749'), ('name', 'Robert Cailliau')]
2


## Task 2

In [11]:
queryString = """
SELECT DISTINCT ?obj ?name
WHERE { 
    wd:Q466 wdt:P31 ?obj .
    ?obj sc:name ?name.
} 
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q121182'), ('name', 'information system')]
[('obj', 'http://www.wikidata.org/entity/Q65966993'), ('name', 'hypertext system')]
2


## Task 3

In [3]:
queryString = """
SELECT DISTINCT ?obj ?name
WHERE { 
    ?obj wdt:P31 wd:Q65966993 .
    ?obj sc:name ?name.
    FILTER(!isLiteral(?obj)).
} 
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q466'), ('name', 'World Wide Web')]
[('obj', 'http://www.wikidata.org/entity/Q370979'), ('name', 'Amigaguide')]
[('obj', 'http://www.wikidata.org/entity/Q2385520'), ('name', 'ENQUIRE')]
[('obj', 'http://www.wikidata.org/entity/Q4994212'), ('name', 'Hypertext Editing System')]
[('obj', 'http://www.wikidata.org/entity/Q5448331'), ('name', 'File Retrieval and Editing System')]
[('obj', 'http://www.wikidata.org/entity/Q785345'), ('name', 'Project Xanadu')]
[('obj', 'http://www.wikidata.org/entity/Q1799609'), ('name', 'HyTime')]
[('obj', 'http://www.wikidata.org/entity/Q7742259'), ('name', 'The Interactive Encyclopedia System')]
[('obj', 'http://www.wikidata.org/entity/Q74587961'), ('name', 'MaxThink')]
[('obj', 'http://www.wikidata.org/entity/Q74590695'), ('name', 'HyperRez')]
[('obj', 'http://www.wikidata.org/entity/Q74667091'), ('name', 'HOUDINI')]
[('obj', 'http://www.wikidata.org/entity/Q66561170'), ('name', 'Visual Knowledge Builder')]


In [5]:
queryString = """
SELECT DISTINCT ?p ?name
WHERE { 
    ?obj wdt:P31 wd:Q65966993 ;
        ?p ?x.
    ?p sc:name ?name.
    FILTER(isLiteral(?x)).
} 
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P2093'), ('name', 'author name string')]
[('p', 'http://www.wikidata.org/prop/direct/P4107'), ('name', 'Framalibre ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1051'), ('name', 'PSH ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1195'), ('name', 'file extension')]
[('p', 'http://www.wikidata.org/prop/direct/P1225'), ('name', 'U.S. National Archives Identifier')]
[('p', 'http://www.wikidata.org/prop/direct/P1417'), ('name', 'Encyclopædia Britannica Online ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1430'), ('name', 'OpenPlaques subject ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1641'), ('name', 'port')]
[('p', 'http://www.wikidata.org/prop/direct/P1813'), ('name', 'short name')]
[('p', 'http://www.wikidata.org/prop/direct/P1814'), ('name', 'name in kana')]
[('p', 'http://www.wikidata.org/prop/direct/P2002'), ('name', 'Twitter username')]
[('p', 'http://www.wikidata.org/prop/direct/P2004'), ('name', 'NALT ID')]


In [12]:
queryString = """
SELECT DISTINCT ?obj ?name ?x
WHERE { 
    ?obj wdt:P31 wd:Q65966993 ;
        wdt:P571 ?x.
    ?obj sc:name ?name.
    FILTER(?x < "1980-01-01"^^xsd:date).
} 
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q2385520'), ('name', 'ENQUIRE'), ('x', '1980-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q4994212'), ('name', 'Hypertext Editing System'), ('x', '1967-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q5448331'), ('name', 'File Retrieval and Editing System'), ('x', '1968-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q785345'), ('name', 'Project Xanadu'), ('x', '1960-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q471'), ('name', 'Memex'), ('x', '1939-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q1050365'), ('name', 'oNLine System'), ('x', '1962-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q8063246'), ('name', 'ZOG'), ('x', '1977-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q3625272'), ('name', 'Aspen Movie Map'), ('x', '1978-01-01T00:00:00Z')]
[('obj', 'http://www.wikidata.org/entity/Q5358242'), ('name', 'Electronic Document System'), ('x', 
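
Note that the output above lists ENQUIRE with the inception `1980-01-01T00:00:00Z`, which is not before 1980. A plausible cause, stated here as an assumption, is that the stored values are `xsd:dateTime` while the filter compares them with an `xsd:date` literal. Comparing on the year avoids the ambiguity, either in SPARQL with `FILTER(YEAR(?x) < 1980)` or on the client side. The sketch below shows the client-side variant; `before_year` is a hypothetical helper and the sample rows are copied from the output above.

```python
from datetime import datetime


def before_year(rows, date_var, year):
    """Keep only the rows whose timestamp falls strictly before the given year."""
    kept = []
    for row in rows:
        value = dict(row)[date_var]
        # Timestamps in the output look like '1980-01-01T00:00:00Z'.
        inception = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        if inception.year < year:
            kept.append(row)
    return kept


sample_rows = [
    [("obj", "http://www.wikidata.org/entity/Q2385520"), ("name", "ENQUIRE"),
     ("x", "1980-01-01T00:00:00Z")],
    [("obj", "http://www.wikidata.org/entity/Q4994212"), ("name", "Hypertext Editing System"),
     ("x", "1967-01-01T00:00:00Z")],
]

filtered = before_year(sample_rows, "x", 1980)
print([dict(row)["name"] for row in filtered])
```

This parsing assumes four-digit positive years in the exact format shown; dates outside that range would need different handling.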

## Task 4

In [14]:
queryString = """
SELECT DISTINCT ?prop ?name
WHERE { 
    wd:Q80 ?prop ?obj .
    ?prop sc:name ?name.
    FILTER(!isLiteral(?obj)).
} 
"""

print("Results")
x=run_query(queryString)

Results
[('prop', 'http://www.wikidata.org/prop/direct/P101'), ('name', 'field of work')]
[('prop', 'http://www.wikidata.org/prop/direct/P103'), ('name', 'native language')]
[('prop', 'http://www.wikidata.org/prop/direct/P106'), ('name', 'occupation')]
[('prop', 'http://www.wikidata.org/prop/direct/P108'), ('name', 'employer')]
[('prop', 'http://www.wikidata.org/prop/direct/P1343'), ('name', 'described by source')]
[('prop', 'http://www.wikidata.org/prop/direct/P1344'), ('name', 'participant in')]
[('prop', 'http://www.wikidata.org/prop/direct/P140'), ('name', 'religion')]
[('prop', 'http://www.wikidata.org/prop/direct/P1412'), ('name', 'languages spoken, written or signed')]
[('prop', 'http://www.wikidata.org/prop/direct/P166'), ('name', 'award received')]
[('prop', 'http://www.wikidata.org/prop/direct/P172'), ('name', 'ethnic group')]
[('prop', 'http://www.wikidata.org/prop/direct/P18'), ('name', 'image')]
[('prop', 'http://www.wikidata.org/prop/direct/P19'), ('name', 'place of birth

In [15]:
queryString = """
SELECT DISTINCT ?obj ?name
WHERE { 
    wd:Q80 wdt:P106 ?obj .
    ?obj sc:name ?name.
} 
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q1622272'), ('name', 'university teacher')]
[('obj', 'http://www.wikidata.org/entity/Q169470'), ('name', 'physicist')]
[('obj', 'http://www.wikidata.org/entity/Q205375'), ('name', 'inventor')]
[('obj', 'http://www.wikidata.org/entity/Q81096'), ('name', 'engineer')]
[('obj', 'http://www.wikidata.org/entity/Q82594'), ('name', 'computer scientist')]
[('obj', 'http://www.wikidata.org/entity/Q5482740'), ('name', 'programmer')]
[('obj', 'http://www.wikidata.org/entity/Q6859454'), ('name', 'web developer')]
7


## Task 5

In [45]:
queryString = """
SELECT DISTINCT ?prop ?name
WHERE { 
    ?people wdt:P106 wd:Q82594 ;
        ?prop ?obj.
    ?prop sc:name ?name.
    FILTER(REGEX(?name,"izen")).
} 
"""

print("Results")
x=run_query(queryString)

Results
[('prop', 'http://www.wikidata.org/prop/direct/P27'), ('name', 'country of citizenship')]
1


In [50]:
queryString = """
SELECT DISTINCT ?prop ?name
WHERE { 
    ?people wdt:P106 wd:Q82594 ;
        wdt:P27 ?obj.
    ?obj ?prop ?x.
    ?prop sc:name ?name.
    FILTER(REGEX(?name,"continent"))
} 
"""

print("Results")
x=run_query(queryString)

Results
[('prop', 'http://www.wikidata.org/prop/direct/P30'), ('name', 'continent')]
1


In [52]:
queryString = """
SELECT ?x ?continent (COUNT(DISTINCT ?people) AS ?comp_scientist)
WHERE { 
    ?people wdt:P106 wd:Q82594 ;
        wdt:P27 ?obj.
    ?obj wdt:P30 ?x.
    ?x sc:name ?continent.
} 
GROUP BY ?x ?continent
"""

print("Results")
x=run_query(queryString)

Results
[('x', 'http://www.wikidata.org/entity/Q46'), ('continent', 'Europe'), ('comp_scientist', '3237')]
[('x', 'http://www.wikidata.org/entity/Q538'), ('continent', 'Insular Oceania'), ('comp_scientist', '2476')]
[('x', 'http://www.wikidata.org/entity/Q15'), ('continent', 'Africa'), ('comp_scientist', '76')]
[('x', 'http://www.wikidata.org/entity/Q48'), ('continent', 'Asia'), ('comp_scientist', '846')]
[('x', 'http://www.wikidata.org/entity/Q5401'), ('continent', 'Eurasia'), ('comp_scientist', '22')]
[('x', 'http://www.wikidata.org/entity/Q18'), ('continent', 'South America'), ('comp_scientist', '92')]
[('x', 'http://www.wikidata.org/entity/Q49'), ('continent', 'North America'), ('comp_scientist', '2770')]
7
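
The counts come back as strings and in no particular order, and a scientist whose citizenships span several continents is counted once for each of them. To read the result as a ranking, the counts can be converted to integers and sorted on the client side. The sketch below is a minimal illustration; `rank_by_count` is a hypothetical helper and the sample rows are a subset of the output above.

```python
def rank_by_count(rows, label_var, count_var):
    """Return (label, count) pairs sorted by count, largest first."""
    table = []
    for row in rows:
        fields = dict(row)
        # Counts arrive as strings, so convert before sorting.
        table.append((fields[label_var], int(fields[count_var])))
    return sorted(table, key=lambda item: item[1], reverse=True)


sample_rows = [
    [("x", "http://www.wikidata.org/entity/Q46"), ("continent", "Europe"), ("comp_scientist", "3237")],
    [("x", "http://www.wikidata.org/entity/Q15"), ("continent", "Africa"), ("comp_scientist", "76")],
    [("x", "http://www.wikidata.org/entity/Q49"), ("continent", "North America"), ("comp_scientist", "2770")],
]

ranking = rank_by_count(sample_rows, "continent", "comp_scientist")
print(ranking)
```

The same ordering could be requested from the endpoint with an `ORDER BY DESC(?comp_scientist)` clause.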


## Task 6

In [27]:
queryString = """
SELECT DISTINCT ?prop ?name
WHERE { 
    ?people wdt:P106 wd:Q82594 ;
        ?prop ?obj.
    ?prop sc:name ?name.
    ?obj sc:name ?n.
    FILTER(REGEX(?n,"Cambridge"))
} 
"""

print("Results")
x=run_query(queryString)

Results
[('prop', 'http://www.wikidata.org/prop/direct/P551'), ('name', 'residence')]
[('prop', 'http://www.wikidata.org/prop/direct/P69'), ('name', 'educated at')]
[('prop', 'http://www.wikidata.org/prop/direct/P20'), ('name', 'place of death')]
[('prop', 'http://www.wikidata.org/prop/direct/P108'), ('name', 'employer')]
[('prop', 'http://www.wikidata.org/prop/direct/P19'), ('name', 'place of birth')]
[('prop', 'http://www.wikidata.org/prop/direct/P485'), ('name', 'archives at')]
[('prop', 'http://www.wikidata.org/prop/direct/P937'), ('name', 'work location')]
[('prop', 'http://www.wikidata.org/prop/direct/P166'), ('name', 'award received')]
8


In [28]:
queryString = """
SELECT DISTINCT ?obj ?n
WHERE { 
    ?people wdt:P106 wd:Q82594 ;
        wdt:P108 ?obj.
    ?obj sc:name ?n.
    FILTER(REGEX(?n,"Cambridge"))
} 
LIMIT 10
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q35794'), ('n', 'University of Cambridge')]
1


In [31]:
queryString = """
SELECT DISTINCT ?people ?name
WHERE { 
    ?people wdt:P106 wd:Q82594 ;
        wdt:P108 wd:Q35794.
    ?people sc:name ?name.
}
"""

print("Results")
x=run_query(queryString)

Results
[('people', 'http://www.wikidata.org/entity/Q92944'), ('name', 'David Wheeler')]
[('people', 'http://www.wikidata.org/entity/Q7251'), ('name', 'Alan Turing')]
[('people', 'http://www.wikidata.org/entity/Q365578'), ('name', 'Nathan Myhrvold')]
[('people', 'http://www.wikidata.org/entity/Q446862'), ('name', 'David J. C. MacKay')]
[('people', 'http://www.wikidata.org/entity/Q92431724'), ('name', 'Andrew Pitts')]
[('people', 'http://www.wikidata.org/entity/Q46633'), ('name', 'Charles Babbage')]
[('people', 'http://www.wikidata.org/entity/Q7259'), ('name', 'Ada Lovelace')]
[('people', 'http://www.wikidata.org/entity/Q62857'), ('name', 'Maurice Wilkes')]
[('people', 'http://www.wikidata.org/entity/Q7176624'), ('name', 'Peter Robinson')]
[('people', 'http://www.wikidata.org/entity/Q451770'), ('name', 'Douglas Hartree')]
[('people', 'http://www.wikidata.org/entity/Q6135125'), ('name', 'James H. Davenport')]
[('people', 'http://www.wikidata.org/entity/Q4707402'), ('name', 'Alan Mycroft'

## Task 7

In [33]:
queryString = """
SELECT DISTINCT ?prop ?name
WHERE { 
    ?people wdt:P108 wd:Q35794.
    ?obj ?prop ?people .
    ?prop sc:name ?name.
} 
"""

print("Results")
x=run_query(queryString)

Results
[('prop', 'http://www.wikidata.org/prop/direct/P1136'), ('name', 'solved by')]
[('prop', 'http://www.wikidata.org/prop/direct/P2438'), ('name', 'narrator')]
[('prop', 'http://www.wikidata.org/prop/direct/P3938'), ('name', 'named by')]
[('prop', 'http://www.wikidata.org/prop/direct/P1318'), ('name', 'proved by')]
[('prop', 'http://www.wikidata.org/prop/direct/P823'), ('name', 'speaker')]
[('prop', 'http://www.wikidata.org/prop/direct/P101'), ('name', 'field of work')]
[('prop', 'http://www.wikidata.org/prop/direct/P1037'), ('name', 'director / manager')]
[('prop', 'http://www.wikidata.org/prop/direct/P1038'), ('name', 'relative')]
[('prop', 'http://www.wikidata.org/prop/direct/P1066'), ('name', 'student of')]
[('prop', 'http://www.wikidata.org/prop/direct/P108'), ('name', 'employer')]
[('prop', 'http://www.wikidata.org/prop/direct/P112'), ('name', 'founded by')]
[('prop', 'http://www.wikidata.org/prop/direct/P123'), ('name', 'publisher')]
[('prop', 'http://www.wikidata.org/prop/

In [35]:
## P50 is "author": list the types of the works authored by these people
queryString = """
SELECT DISTINCT ?x ?name
WHERE { 
    ?people wdt:P108 wd:Q35794.
    ?obj wdt:P50 ?people .
    ?obj wdt:P31 ?x.
    ?x sc:name ?name.
} 
"""

print("Results")
x=run_query(queryString)

Results
[('x', 'http://www.wikidata.org/entity/Q815382'), ('name', 'meta-analysis')]
[('x', 'http://www.wikidata.org/entity/Q7318358'), ('name', 'review article')]
[('x', 'http://www.wikidata.org/entity/Q871232'), ('name', 'editorial')]
[('x', 'http://www.wikidata.org/entity/Q58901591'), ('name', 'comparative study')]
[('x', 'http://www.wikidata.org/entity/Q187685'), ('name', 'doctoral thesis')]
[('x', 'http://www.wikidata.org/entity/Q1348305'), ('name', 'erratum')]
[('x', 'http://www.wikidata.org/entity/Q58898636'), ('name', 'evaluation study')]
[('x', 'http://www.wikidata.org/entity/Q13442814'), ('name', 'scholarly article')]
[('x', 'http://www.wikidata.org/entity/Q7725634'), ('name', 'literary work')]
[('x', 'http://www.wikidata.org/entity/Q69488'), ('name', 'MDMA')]
[('x', 'http://www.wikidata.org/entity/Q1504425'), ('name', 'systematic review')]
[('x', 'http://www.wikidata.org/entity/Q2782326'), ('name', 'case report')]
[('x', 'http://www.wikidata.org/entity/Q58900768'), ('name', 

In [40]:
## count the books (wd:Q571) authored (P50) by each person, keeping those with at least 5
queryString = """
SELECT DISTINCT ?people ?name (COUNT(DISTINCT ?book) AS ?books)
WHERE { 
    ?people wdt:P108 wd:Q35794.
    ?book wdt:P50 ?people .
    ?book wdt:P31 wd:Q571.
    ?people sc:name ?name.
}
GROUP BY ?people ?name
HAVING (COUNT(DISTINCT ?book) > 4 )
"""

print("Results")
x=run_query(queryString)

Results
[('people', 'http://www.wikidata.org/entity/Q7830608'), ('name', 'Toyin Falola'), ('books', '10')]
[('people', 'http://www.wikidata.org/entity/Q61682'), ('name', 'Nikolaus Pevsner'), ('books', '15')]
[('people', 'http://www.wikidata.org/entity/Q556140'), ('name', 'Ian Hodder'), ('books', '5')]
3
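
The three authors above are selected through their employer (`wdt:P108 wd:Q35794`), not through their occupation, so the result does not yet answer Task 7. A variant closer to the task would constrain `?people` with the computer scientist BGP found in Task 4. The sketch below only builds such a query string; it has not been run against the endpoint, so no results are shown. The variable name `task7_query` is hypothetical, and `wd:Q571` is the book class used in the cell above.

```python
# Hedged variant of the Task 7 query: restrict authors by occupation
# (computer scientist, wd:Q82594) instead of by employer.
task7_query = """
SELECT ?people ?name (COUNT(DISTINCT ?book) AS ?books)
WHERE {
    ?people wdt:P106 wd:Q82594 .
    ?book wdt:P50 ?people ;
          wdt:P31 wd:Q571 .
    ?people sc:name ?name .
}
GROUP BY ?people ?name
HAVING (COUNT(DISTINCT ?book) >= 5)
"""

print(task7_query)
```

In the notebook, the string can be passed to `run_query(task7_query)`, which prepends the prefixes declared in the setup cell.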


## Task 7

In [41]:
## sanity check: count the people employed by the University of Cambridge
queryString = """
SELECT DISTINCT COUNT(*)
WHERE { 
    ?people wdt:P108 wd:Q35794.
}
"""

print("Results")
x=run_query(queryString)

Results
[('callret-0', '5053')]
1
