# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained by the previous ones.

An exploratory workload usually starts with a vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
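
Since rule 2 requires every class and property to be discovered through a query, the name BGP is a natural starting point for a keyword search. The following is a minimal sketch of that idea, not a prescribed method: the helper `names_matching_query` and its `limit` parameter are hypothetical names introduced here, and the use of `CONTAINS` with a `LIMIT` is only one possible way to filter names. The block only builds the query string; it can then be passed to the `run_query` function from the setup cell.

```python
def names_matching_query(keyword, limit=20):
    """Build a SELECT query listing entities whose name contains `keyword`.

    Hypothetical helper: it only assembles the query text, it does not run it.
    """
    # Escape backslashes and double quotes so the keyword stays a valid literal.
    safe = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT DISTINCT ?p ?name
WHERE {{
    ?p <http://schema.org/name> ?name .
    FILTER(CONTAINS(LCASE(STR(?name)), LCASE("{safe}")))
}}
LIMIT {int(limit)}
"""


# Example: look for properties or classes whose name mentions "director".
print(names_matching_query("director"))
```

A scan over all names can be slow on a KG of this size, which is why the sketch keeps a small `LIMIT` by default.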

In [2]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-NOTEBOOK_CODE_HERE-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# SELECT and CONSTRUCT queries
def run_query(queryString):
    """Run a query, print one line per result row, and return the row count.

    Returns 0 when there are no results and None when the query fails.
    """
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        bindings_list = json_results['results']['bindings']
        if len(bindings_list) == 0:
            print("Empty")
            return 0

        # One printed line per row: a list of (variable, value) pairs
        for bindings in bindings_list:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(bindings_list)

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
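
The two helpers above rely on the shape of the SPARQL JSON results format: SELECT answers carry their rows under `results.bindings`, while ASK answers carry a single `boolean` field. The sketch below illustrates this on in-memory payloads, so it runs without contacting the endpoint. The function names `bindings_to_rows` and `ask_answer` are hypothetical, the SELECT payload reuses the first row printed later in this notebook, and the ASK payload is purely illustrative.

```python
# Sample payloads shaped like SPARQL JSON results (no endpoint needed).
select_payload = {
    "head": {"vars": ["p", "obj"]},
    "results": {"bindings": [
        {
            "p": {"type": "uri",
                  "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
            "obj": {"type": "uri",
                    "value": "http://wikiba.se/ontology#Item"},
        },
    ]},
}
ask_payload = {"head": {}, "boolean": True}  # illustrative ASK response


def bindings_to_rows(payload):
    """Flatten SELECT bindings into plain {variable: value} dictionaries."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def ask_answer(payload):
    """Return the truth value carried by an ASK response."""
    return payload["boolean"]


rows = bindings_to_rows(select_payload)
print(rows[0]["obj"])
print(ask_answer(ask_payload))
```

Returning rows as dictionaries, rather than only printing them, makes it easier to post-process results in Python (for instance, to intersect two sets of workers).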

# Movie Workflow Series ("Directors explorative search") 

Consider the following exploratory information need:

> Investigate the common aspects of the movies directed by Woody Allen or Quentin Tarantino. We want to know which people worked for both directors (with some numerical analyses), how the budgets of their movies differ, and who won more Academy Awards.

## Useful URIs for the current workflow
The following are given:

| IRI           | Description        | Role      |
| ------------- | ------------------ | --------- |
| `wdt:P1647`   | subproperty        | predicate |
| `wdt:P31`     | instance of        | predicate |
| `wdt:P106`    | profession         | predicate |
| `wdt:P279`    | subclass           | predicate |
| `wdt:P27`     | nationality        | predicate |
| `wdt:P3342`   | Significant person | predicate |
| `wd:Q5`       | Human              | node      |
| `wd:Q2526255` | Director           | node      |
| `wd:Q25089`   | Woody Allen        | node      |
| `wd:Q3772`    | Quentin Tarantino  | node      |





Also consider

```
wd:Q25089 ?p ?obj .
```

is the BGP to retrieve all **properties of Woody Allen**.
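
The raw property IRIs returned by this BGP are hard to read, so it can help to join it with the name BGP given in the general instructions. The sketch below builds such a query as a string. It assumes that names are attached to the same property IRIs that appear in predicate position, and that the `sc:` prefix comes from the setup cell; `named_properties_query` and its `limit` parameter are hypothetical names.

```python
def named_properties_query(entity="wd:Q25089", limit=50):
    """Build a query listing the properties of `entity` with readable names.

    Hypothetical helper: it relies on the `wd:` and `sc:` prefixes that the
    setup cell prepends to every query.
    """
    return f"""
SELECT DISTINCT ?p ?name
WHERE {{
    {entity} ?p ?obj .
    ?p sc:name ?name .
}}
LIMIT {int(limit)}
"""


# Default entity is Woody Allen (wd:Q25089), as given in the table of IRIs.
print(named_properties_query())
```

The resulting string can be passed to `run_query`. Swapping the entity for `wd:Q3772` gives the same overview for Quentin Tarantino.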


The workload should:


1. Identify the BGP for films

2. Identify the BGP for directors

3. Identify the BGP for workers in a film

4. Compare the workers amongst the films directed by the two directors

5. Return some numerical comparisons between the two directors (e.g., how many workers in Tarantino's movies also worked in Allen's films? Which film has the highest number of shared actors? Which actor has been cast most often by both directors?)

6. Is the maximum budget of a Tarantino movie higher than the maximum budget of an Allen movie?

7. Who has films with more Academy Award nominations, and who won more Academy Awards (counting awards won by his films, not only personal awards)?

    7.1 Find the BGP for Academy Awards 

    7.2 Find the related subproperties

    7.3 Find how they are related to the directors
    
    7.4 Are there alternative queries to get the same result?
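
As an illustration of steps 3 to 5, the sketch below builds a counting query for the people who worked on films of both directors. It is only a template under stated assumptions: the property that links a film to its director and the property that links a film to a worker are not given in this notebook, so they are parameters that must first be discovered with a SPARQL query. The values `wdt:P_DIRECTOR` and `wdt:P_WORKER` are placeholders, not real Wikidata properties, and `shared_workers_query` is a hypothetical name.

```python
def shared_workers_query(director_prop, worker_prop,
                         director_a="wd:Q25089", director_b="wd:Q3772"):
    """Build a query counting people who worked on films of both directors.

    `director_prop` and `worker_prop` must be properties discovered earlier
    in the workload; this function does not know them.
    """
    return f"""
SELECT (COUNT(DISTINCT ?worker) AS ?shared)
WHERE {{
    ?filmA {director_prop} {director_a} ;
           {worker_prop} ?worker .
    ?filmB {director_prop} {director_b} ;
           {worker_prop} ?worker .
}}
"""


# Placeholder property names: replace them with the IRIs found in steps 1-3.
query = shared_workers_query("wdt:P_DIRECTOR", "wdt:P_WORKER")
print(query)
```

Keeping the properties as parameters makes the same template reusable for different worker roles (for example cast members versus other crew), once the corresponding properties have been identified.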



In [1]:
# start your workflow here

In [3]:
queryString = """
SELECT *
WHERE { 

wd:Q25089 ?p ?obj .
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), ('obj', 'http://wikiba.se/ontology#Item')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P1006'), ('obj', 'http://data.bibliotheken.nl/id/thes/p06917654X')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P1015'), ('obj', 'https://livedata.bibsys.no/authority/90085344')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P1150'), ('obj', 'http://rvk.uni-regensburg.de/nt/HU+3027')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P1741'), ('obj', 'http://data.beeldengeluid.nl/gtaa/76199')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P214'), ('obj', 'http://viaf.org/viaf/59077912')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P2163'), ('obj', 'http://id.worldcat.org/fast/1897483')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P227'), ('obj', 'https://d-nb.info/gnd/118502077')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P244'), ('obj', 'https://id.loc.go

334