# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing it has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP that returns the human-readable name of a property or a class in Wikidata.
    
    

In [2]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-NOTEBOOK_CODE_HERE-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        bindings_list = json_results['results']['bindings']
        if len(bindings_list) == 0:
            print("Empty")
            return 0

        # Print each result row as a list of (variable, value) pairs
        for bindings in bindings_list:
            print([(var, value['value']) for var, value in bindings.items()])

        # Return the number of result rows
        return len(bindings_list)

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
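
Unlike `run_query`, `run_ask_query` returns the converted response instead of printing it. As a minimal sketch, assuming the endpoint answers ASK queries in the standard SPARQL JSON results format (a top-level `boolean` field), the hypothetical helper `ask_to_bool` below turns that return value into a plain `True`/`False`. It maps the `None` that `run_ask_query` returns after a failed request to `None` as well:

```python
def ask_to_bool(ask_result):
    """Extract the truth value from the dict returned by run_ask_query.

    Hypothetical helper, not part of the original setup. Returns None when
    the query failed (run_ask_query returned None) or when the response
    has no boolean 'boolean' field.
    """
    if not isinstance(ask_result, dict):
        return None
    value = ask_result.get("boolean")
    return value if isinstance(value, bool) else None


# Offline illustration with hand-written responses (no endpoint needed)
print(ask_to_bool({"head": {}, "boolean": True}))
print(ask_to_bool(None))
```

In the notebook this could be used as `ask_to_bool(run_ask_query("ASK { wd:Q835650 wdt:P31 wd:Q11424 }"))`, which checks whether Tremors is recorded as an instance of Film using only the IRIs given for the workflow.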

## Workflow 5


Consider the following exploratory scenario:


> We are interested in western horror movies, a subgenre that mixes two main genres: Horror and Western. We want to investigate which are the principal movies in the intersection of these two genres.



## Useful URIs for the current workflow
The following are given:

| IRI          | Description | Role      |
| ------------ | ----------- | --------- |
| `wdt:P1647`  | subproperty | predicate |
| `wdt:P31`    | instance of | predicate |
| `wdt:P106`   | profession  | predicate |
| `wdt:P279`   | subclass    | predicate |
| `wdt:P136`   | genre       | predicate |
| `wd:Q11424`  | Film        | node      |
| `wd:Q835650` | Tremors     | node      |



Also consider

```
wd:Q835650 ?p ?obj .
```

is the BGP to retrieve all **properties of Tremors**
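
The two BGPs given so far can be combined so that each property of Tremors comes back together with its human-readable name. The following is a sketch rather than a reference solution: the function name `named_properties_query` and the variable names are illustrative, and it assumes that property IRIs carry `sc:name` on this endpoint, as the general instructions state.

```python
def named_properties_query(entity="wd:Q835650", limit=None):
    """Build a SELECT query listing the properties of `entity` with their names.

    Illustrative helper. The returned string is meant to be passed to
    run_query, which prepends the wd:, wdt: and sc: prefixes.
    """
    lines = [
        "SELECT DISTINCT ?p ?name",
        "WHERE {",
        f"  {entity} ?p ?obj .",   # all properties of the entity
        "  ?p sc:name ?name .",    # human-readable name of each property
        "}",
        "ORDER BY ?name",
    ]
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


queryString = named_properties_query(limit=50)
print(queryString)
```

Properties without an `sc:name` triple are dropped by this join. Wrapping the second pattern in `OPTIONAL { ... }` would keep them in the result with an unbound `?name`.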



The workload should


1. Return the titles of the western-horror movies (with data like year and director) in Wikidata

2. Investigate the number of films directed by the directors of the selected western-horror movies, and then get information about the three most productive directors (films, most frequently cast actors in their movies, total revenues if available)

3. Get all the movies in the "Tremors" franchise and check if there are actors in the Tremors movies who also acted in any of the previously selected movies

4. Return how many of the actors who are members of the cast of the Tremors franchise have a [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2
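
For the first task, one possible starting point uses only the IRIs listed in the table above. The sketch below builds a query for films that have two given genres at once. `films_with_both_genres_query` is a hypothetical helper, and the genre IRIs are deliberately left as parameters because, per the rules, they have to be discovered with a query first (for example by inspecting the `wdt:P136` values of Tremors). It also assumes that films carry `sc:name`, which the instructions only state for properties and classes.

```python
def films_with_both_genres_query(genre_a, genre_b):
    """Build a SELECT query for films tagged with both genre_a and genre_b.

    Hypothetical helper. genre_a and genre_b are prefixed IRIs such as
    'wd:Q...' that must have been found by an earlier query.
    """
    lines = [
        "SELECT DISTINCT ?film ?title",
        "WHERE {",
        "  ?film wdt:P31 wd:Q11424 ;",      # instance of Film
        f"        wdt:P136 {genre_a} ;",    # first genre
        f"        wdt:P136 {genre_b} ;",    # second genre
        "        sc:name ?title .",         # assumed: films expose sc:name
        "}",
        "ORDER BY ?title",
    ]
    return "\n".join(lines)


# Illustration only: two wdt:P136 values of Tremors from its property listing.
# Whether they are the genres of interest still has to be checked via sc:name.
queryString = films_with_both_genres_query("wd:Q1065444", "wd:Q157443")
print(queryString)
```

This pattern only matches films typed directly as `wd:Q11424` and tagged with exactly the given genre nodes. Films typed with a subclass of Film, or tagged only with a combined subgenre node, would need an additional exploratory query, for instance one over `wdt:P279`.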

In [1]:
# start your workflow here

In [3]:
queryString = """
SELECT *
WHERE { 

wd:Q835650 ?p ?obj .

} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), ('obj', 'http://wikiba.se/ontology#Item')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P3212'), ('obj', 'urn:isan:0000-0000-7ADA-0000-8-0000-0000-D')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P4276'), ('obj', 'https://data.cinematheque.qc.ca/data/Work56604')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P646'), ('obj', 'http://g.co/kg/m/027ywq')]
[('p', 'http://www.wikidata.org/prop/direct/P1040'), ('obj', 'http://www.wikidata.org/entity/Q1293442')]
[('p', 'http://www.wikidata.org/prop/direct/P1237'), ('obj', 'tremors')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('obj', 'm/tremors')]
[('p', 'http://www.wikidata.org/prop/direct/P1265'), ('obj', '5674')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('obj', 'http://www.wikidata.org/entity/Q1065444')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('obj', 'http://www.wikidata.org/entity/Q157443')]
[('p', 'http://www.

117