# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```

is the BGP returning a human-readable name of a property or a class in Wikidata.

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-NOTEBOOK_CODE_HERE-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
       results = sparql.query()
       json_results = results.convert()
       if len(json_results['results']['bindings'])==0:
          print("Empty")
          return 0
    
       for bindings in json_results['results']['bindings']:
          print( [ (var, value['value'])  for var, value in bindings.items() ] )

       return len(json_results['results']['bindings'])

    except Exception as e :
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)

# Movie Workflow Series ("The Batman movies explorative search") 

Consider the following exploratory scenario:


> We are interested in movies about Batman. We want to investigate the differences between the various series of films produced in different decades.


## Useful URIs for the current workflow
The following are given:

| IRI            | Description             | Role      |
| -------------- | ----------------------- | --------- |
| `wdt:P1647`    | subproperty             | predicate |
| `wdt:P31`      | instance of             | predicate |
| `wdt:P279`     | subclass                | predicate |
| `wdt:P4969`    | derivative work         | predicate |
| `wd:Q2695156`  | Batman                  | node      |
| `wd:Q25191`    | Christopher Nolan       | node      |
| `wd:Q12859908` | The Dark Knight Trilogy | node      |



Also consider

```
wd:Q25191 ?p ?obj .
```

is the BGP to retrieve all **properties of Christopher Nolan**.
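The two BGPs given so far (properties of a node, and the human-readable name of a property) can be combined into one discovery query. The following is a minimal sketch, not part of the assignment: the helper name `property_discovery_query` and its parameters are hypothetical, and the pattern is inserted into the query without escaping, so it is only suitable for simple patterns.

```python
def property_discovery_query(entity, name_pattern=None):
    """Build a SELECT query listing the named properties of `entity`.

    `entity` is a prefixed IRI such as "wd:Q25191"; `name_pattern` is an
    optional regular expression matched case-insensitively against the
    property name. Both names are illustrative, not part of the notebook.
    """
    lines = [
        "SELECT DISTINCT ?p ?pname",
        "WHERE {",
        f"    {entity} ?p ?obj .",
        "    ?p sc:name ?pname .",
    ]
    if name_pattern is not None:
        lines.append(f'    FILTER(REGEX(?pname, "{name_pattern}", "i"))')
    lines.append("}")
    return "\n".join(lines)


# Build (but do not run) a query for the date-like properties of Christopher Nolan.
query = property_discovery_query("wd:Q25191", "date")
print(query)
```

The resulting string relies on the `sc:` and `wd:` prefixes, so it is meant to be passed to the `run_query` function from the setup cell, which prepends them.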

The workload should


1. Investigate the works (aka derivative works) related to Batman and identify the movies. Return the movies along with the year of production and the director.

2. Return the main Batman movie series produced in the last four decades and compare them in terms of length, number of actors involved and costs.

3. Investigate which workers (writers, actors, etc.) have had a role in the most Batman movies so far.

4. Compare the ratings of the single movies and of the series. Identify the movie with the highest rating from the critics and the "best" series overall.

5. Return how many actors who are members of the cast of the "Dark Knight Trilogy" by Christopher Nolan have a [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2.

In [2]:
# start your workflow here

### Task 1. Investigate the works (aka derivative works) related to Batman and identify the movies. Return the movies along with the year of production and the director.

In [3]:
queryString = """
SELECT DISTINCT ?p ?pname 
WHERE { 

    wd:Q2695156 wdt:P4969 ?derivw .
    ?derivw wdt:P31 ?p.
    ?p sc:name ?pname
    FILTER(REGEX(?pname, ".*[Ff]ilm.*") || REGEX(?pname, ".*[Mm]ovie.*"))
}
"""
print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q11424'), ('pname', 'film')]
[('p', 'http://www.wikidata.org/entity/Q63998451'), ('pname', 'superhero film character')]
[('p', 'http://www.wikidata.org/entity/Q18011172'), ('pname', 'film project')]


3
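A note on name filters of this kind: a character class lists single characters, so no `|` is needed inside the brackets. A class written as `[F|f]` also accepts a literal pipe, while `[Ff]` or the case-insensitive flag expresses the intent directly. The sketch below checks this with Python's `re` module, used here only as a stand-in: SPARQL's `REGEX` follows the XPath/XQuery regular expression syntax, which is assumed to behave the same way for this simple case.

```python
import re

# A class with "|" inside the brackets, and the intended class without it.
with_pipe = re.compile(r"[F|f]ilm")
intended = re.compile(r"[Ff]ilm")

# Compare both patterns on two real labels and one artificial string.
for label in ["film", "Film project", "|ilm"]:
    print(label, bool(with_pipe.search(label)), bool(intended.search(label)))

# Case-insensitive matching avoids spelling out both cases at all.
case_insensitive = re.compile(r"film", re.IGNORECASE)
print(bool(case_insensitive.search("superhero FILM character")))
```

In SPARQL the case-insensitive variant corresponds to passing `"i"` as the third argument of `REGEX`, as the later cells of this notebook do.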

In [4]:
queryString = """
SELECT DISTINCT ?derivw ?dwname
WHERE { 

    wd:Q2695156 wdt:P4969 ?derivw .
    ?derivw wdt:P31 ?p.
    ?p sc:name ?pname. FILTER(REGEX(?pname, "^film$", "i"))
    ?derivw sc:name ?dwname .
}
"""
print("Results")
run_query(queryString)

Results
[('derivw', 'http://www.wikidata.org/entity/Q116852'), ('dwname', 'Batman')]
[('derivw', 'http://www.wikidata.org/entity/Q221345'), ('dwname', 'Batman Forever')]
[('derivw', 'http://www.wikidata.org/entity/Q21095079'), ('dwname', 'Batman: Bad Blood')]
[('derivw', 'http://www.wikidata.org/entity/Q189054'), ('dwname', 'Batman Returns')]
[('derivw', 'http://www.wikidata.org/entity/Q163872'), ('dwname', 'The Dark Knight')]
[('derivw', 'http://www.wikidata.org/entity/Q166262'), ('dwname', 'Batman Begins')]
[('derivw', 'http://www.wikidata.org/entity/Q189330'), ('dwname', 'The Dark Knight Rises')]
[('derivw', 'http://www.wikidata.org/entity/Q276523'), ('dwname', 'Batman & Robin')]


8

In [5]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE { 
    wd:Q276523 ?p ?o.
    ?p sc:name ?pname .
    FILTER (REGEX(?pname, ".*[Yy]ear.*") || REGEX(?pname, ".*[Dd]ate.*") || REGEX(?pname, ".*[Dd]ir.*"))
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P3340'), ('pname', 'Kvikmyndir film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P344'), ('pname', 'director of photography')]
[('p', 'http://www.wikidata.org/prop/direct/P57'), ('pname', 'director')]
[('p', 'http://www.wikidata.org/prop/direct/P577'), ('pname', 'publication date')]


4

In [6]:
queryString = """
SELECT DISTINCT ?fname (GROUP_CONCAT(DISTINCT ?f; SEPARATOR=", ")) 
                            (GROUP_CONCAT(DISTINCT ?dirname; SEPARATOR=", ")) 
                            (GROUP_CONCAT(DISTINCT ?date; SEPARATOR=", "))
WHERE { 

    wd:Q2695156 wdt:P4969 ?f.
    ?f wdt:P31 ?p .
    ?p sc:name ?pname. FILTER(REGEX(?pname, "^film$", "i"))
    ?f sc:name ?fname .
    ?f wdt:P57 ?director .
    ?director sc:name ?dirname .
    ?f wdt:P577 ?date .
    
}GROUP BY ?fname
"""
print("Results")
run_query(queryString)

Results
[('fname', 'Batman Returns'), ('callret-1', 'http://www.wikidata.org/entity/Q189054'), ('callret-2', 'Tim Burton'), ('callret-3', '1992-06-16 00:00:00Z, 1992-06-19 00:00:00Z, 1992-07-16 00:00:00Z, 1992-07-31 00:00:00Z')]
[('fname', 'The Dark Knight Rises'), ('callret-1', 'http://www.wikidata.org/entity/Q189330'), ('callret-2', 'Christopher Nolan'), ('callret-3', '2012-07-20 00:00:00Z, 2012-07-25 00:00:00Z, 2012-07-26 00:00:00Z')]
[('fname', 'Batman Forever'), ('callret-1', 'http://www.wikidata.org/entity/Q221345'), ('callret-2', 'Joel Schumacher'), ('callret-3', '1995-06-16 00:00:00Z, 1995-07-14 00:00:00Z, 1995-08-03 00:00:00Z')]
[('fname', 'Batman Begins'), ('callret-1', 'http://www.wikidata.org/entity/Q166262'), ('callret-2', 'Christopher Nolan'), ('callret-3', '2005-06-15 00:00:00Z, 2005-06-16 00:00:00Z, 2005-07-27 00:00:00Z')]
[('fname', 'Batman'), ('callret-1', 'http://www.wikidata.org/entity/Q116852'), ('callret-2', 'Tim Burton'), ('callret-3', '1989-06-23 00:00:00Z, 1989

8
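Each film comes back with several publication dates, presumably one per release, while the task asks for a single year. One possible reading, sketched below, is to keep the earliest year. The helper name `earliest_year` is hypothetical, and the date format is assumed to be the one printed in the output above.

```python
from datetime import datetime


def earliest_year(concatenated_dates, separator=", "):
    """Return the year of the earliest date in a GROUP_CONCAT string."""
    years = [
        datetime.strptime(item.strip(), "%Y-%m-%d %H:%M:%SZ").year
        for item in concatenated_dates.split(separator)
    ]
    return min(years)


# Value copied from the "Batman Returns" row of the output above.
batman_returns = (
    "1992-06-16 00:00:00Z, 1992-06-19 00:00:00Z, "
    "1992-07-16 00:00:00Z, 1992-07-31 00:00:00Z"
)
print(earliest_year(batman_returns))  # 1992
```

The same reduction could probably be done inside the query with `MIN(YEAR(?date))` in place of the `GROUP_CONCAT` over `?date`; that variant has not been run against this endpoint.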

### Task 2. Return the main Batman movie series produced in the last four decades and compare them in terms of length, number of actors involved and costs.

In [7]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE { 
    wd:Q116852 ?p ?o.
    ?p sc:name ?pname .
    FILTER (REGEX(?pname, ".*[Ss]erie.*"))
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pname', 'part of the series')]


1

In [8]:
queryString = """
SELECT  DISTINCT ?sname (GROUP_CONCAT(DISTINCT ?fname; SEPARATOR=", ")) 
WHERE { 
    wd:Q2695156 wdt:P4969 ?f.
    ?f wdt:P179 ?serie.
    ?serie sc:name ?sname.
    ?f sc:name ?fname .
    ?f wdt:P577 ?date . FILTER (YEAR(?date) >= 1982)
}GROUP BY ?sname
"""
print("Results")
run_query(queryString)

Results
[('sname', 'Batman: Arkham'), ('callret-1', 'Batman: Arkham Asylum, Batman: Arkham City, Batman: Arkham Knight, Batman: Arkham Origins, Batman: Arkham Origins Blackgate')]
[('sname', 'Batman'), ('callret-1', 'Batman, Batman & Robin, Batman Forever, Batman Returns')]
[('sname', 'DC Universe Animated Original Movies'), ('callret-1', 'Batman: Bad Blood')]
[('sname', 'The Dark Knight Trilogy'), ('callret-1', 'Batman Begins, The Dark Knight, The Dark Knight Rises')]
[('sname', 'Batman in film'), ('callret-1', 'Batman, Batman & Robin, Batman Begins, Batman Forever, Batman Returns, The Dark Knight, The Dark Knight Rises')]


5

In [9]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE { 
    wd:Q276523 ?p ?o.
    ?p sc:name ?pname .
} 
"""
print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P6359'), ('pname', 'Crew United title ID')]
[('p', 'http://www.wikidata.org/prop/direct/P8918'), ('pname', 'Max Movie film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P9146'), ('pname', 'CITWF title ID')]
[('p', 'http://www.wikidata.org/prop/direct/P6932'), ('pname', 'RogerEbert.com film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P9203'), ('pname', 'CineFAN.ro title ID')]
[('p', 'http://www.wikidata.org/prop/direct/P9204'), ('pname', 'CinemaRX title ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1040'), ('pname', 'film editor')]
[('p', 'http://www.wikidata.org/prop/direct/P1237'), ('pname', 'Box Office Mojo film ID (former scheme)')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pname', 'Rotten Tomatoes ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1265'), ('pname', 'AlloCiné film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('pname', 'genre')]
[('p', 'http://www.wikidata.org/prop/direct/P14

136

In [10]:
queryString = """
SELECT  DISTINCT ?sname (GROUP_CONCAT(DISTINCT ?fname; SEPARATOR=", ")) 
                        (SUM(?duration))
                        (COUNT(?actors))
                        (SUM(?cost))
WHERE { 
    wd:Q2695156 wdt:P4969 ?f.
    ?f wdt:P179 ?serie.
    ?serie sc:name ?sname.
    ?f sc:name ?fname .
    ?f wdt:P577 ?date . FILTER (YEAR(?date) >= 1982)
    OPTIONAL{?f wdt:P2047 ?duration.}
    OPTIONAL{?f wdt:P161 ?actors.}
    OPTIONAL{?f wdt:P2130 ?cost.}
}GROUP BY ?sname
"""
print("Results")
run_query(queryString)

Results
[('sname', 'Batman: Arkham'), ('callret-1', 'Batman: Arkham Asylum, Batman: Arkham City, Batman: Arkham Knight, Batman: Arkham Origins, Batman: Arkham Origins Blackgate'), ('callret-3', '0')]
[('sname', 'Batman'), ('callret-1', 'Batman, Batman & Robin, Batman Forever, Batman Returns'), ('callret-2', '40188'), ('callret-3', '324'), ('callret-4', '23885000000')]
[('sname', 'DC Universe Animated Original Movies'), ('callret-1', 'Batman: Bad Blood'), ('callret-2', '144'), ('callret-3', '0')]
[('sname', 'The Dark Knight Trilogy'), ('callret-1', 'Batman Begins, The Dark Knight, The Dark Knight Rises'), ('callret-2', '55716'), ('callret-3', '362'), ('callret-4', '73060000000')]
[('sname', 'Batman in film'), ('callret-1', 'Batman, Batman & Robin, Batman Begins, Batman Forever, Batman Returns, The Dark Knight, The Dark Knight Rises'), ('callret-2', '95904'), ('callret-3', '686'), ('callret-4', '96945000000')]


5
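The totals in the output above (for example a summed duration of 40188 for the four films of the "Batman" series) look too large to be plain per-series sums. A likely cause is that the query joins several multi-valued properties (dates, actors, and so on), so every film contributes one row per combination of values and each duration or cost is added many times. The sketch below reproduces the effect in plain Python; all values in it are made up for illustration.

```python
from itertools import product

# Hypothetical film: every value below is invented for illustration.
film = {
    "dates": ["1989-06-23", "1989-08-11", "1989-10-26"],
    "durations": [126],
    "actors": ["actor A", "actor B", "actor C", "actor D"],
    "costs": [35000000],
}

# A join over several multi-valued properties yields one row per combination.
rows = list(
    product(film["dates"], film["durations"], film["actors"], film["costs"])
)

# Aggregating over the joined rows counts each value once per combination.
naive_duration = sum(duration for _, duration, _, _ in rows)
naive_actor_count = len(rows)

# Aggregating each property on its own avoids the multiplication.
true_duration = sum(film["durations"])
true_actor_count = len(set(film["actors"]))

print(len(rows), naive_duration, naive_actor_count)
print(true_duration, true_actor_count)
```

In SPARQL the usual remedies are `COUNT(DISTINCT ?actors)` for the actor count and one subquery per summed property, so that durations and costs are aggregated before being joined with the other values.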

### Task 3. Investigate which workers (writers, actors, etc.) have had a role in the most Batman movies so far.

In [24]:
queryString = """
SELECT ?bname count(?b)
WHERE { 
    wd:Q2695156 wdt:P4969 ?derivw .
    ?derivw wdt:P31 ?p.
    ?p sc:name ?pname. FILTER(REGEX(?pname, "^film$", "i"))
    ?derivw sc:name ?dwname.
    ?derivw ?a ?b.
    ?b wdt:P31 wd:Q5.
    ?b sc:name ?bname
}GROUP BY ?bname
ORDER BY DESC (count(?b))
LIMIT 10
"""
print("Results")
run_query(queryString)

Results
[('bname', 'Christopher Nolan'), ('callret-1', '9')]
[('bname', 'Benjamin Melniker'), ('callret-1', '7')]
[('bname', 'Bob Kane'), ('callret-1', '6')]
[('bname', 'Hans Zimmer'), ('callret-1', '6')]
[('bname', 'Tim Burton'), ('callret-1', '5')]
[('bname', 'Pat Hingle'), ('callret-1', '4')]
[('bname', 'Michael Gough'), ('callret-1', '4')]
[('bname', 'Bill Finger'), ('callret-1', '4')]
[('bname', 'Morgan Freeman'), ('callret-1', '3')]
[('bname', 'Cillian Murphy'), ('callret-1', '3')]


10
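The count above is taken over rows of the pattern `?derivw ?a ?b`, so it appears to count (film, property) pairs rather than films: a person who directed, wrote and produced the same film would be counted three times for it. That would explain a count of 9 for Christopher Nolan. The sketch below contrasts the two ways of counting on hypothetical rows with placeholder names.

```python
from collections import defaultdict

# Hypothetical (film, property, person) rows; all names are placeholders.
rows = [
    ("Film 1", "director", "Person X"),
    ("Film 1", "screenwriter", "Person X"),
    ("Film 1", "producer", "Person X"),
    ("Film 1", "cast member", "Person Y"),
    ("Film 2", "cast member", "Person Y"),
    ("Film 3", "cast member", "Person Y"),
]

roles = defaultdict(int)   # number of rows (roles) per person
films = defaultdict(set)   # distinct films per person
for film, _prop, person in rows:
    roles[person] += 1
    films[person].add(film)

film_counts = {person: len(titles) for person, titles in films.items()}
print(dict(roles))
print(film_counts)
```

Both people have three roles, but only one of them appears in three films. In the query, counting films rather than roles would correspond to `COUNT(DISTINCT ?derivw)`.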

### Task 4. Compare the ratings of the single movies and of the series. Identify the movie with the highest rating from the critics and the "best" series overall.

In [27]:
queryString = """
SELECT DISTINCT ?pname
WHERE { 
    wd:Q2695156 wdt:P4969 ?derivw .
    ?derivw wdt:P31 ?p.
    ?p sc:name ?pname.
}
"""
print("Results")
run_query(queryString)

Results
[('pname', 'comics character')]
[('pname', 'extraterrestrials in fiction')]
[('pname', 'video game character')]
[('pname', 'fictional character')]
[('pname', 'film')]
[('pname', 'fictional human')]
[('pname', 'animated character')]
[('pname', 'television character')]
[('pname', 'television series')]
[('pname', 'animated series')]
[('pname', 'video game')]
[('pname', 'superhero film character')]
[('pname', 'fictional detective')]
[('pname', 'film project')]


14

In [32]:
queryString = """
SELECT DISTINCT ?a ?aname
WHERE { 
    wd:Q2695156 wdt:P4969 ?derivw .
    ?derivw wdt:P31 ?p.
    ?p sc:name ?pname. FILTER(REGEX(?pname, "^film$", "i"))
    ?derivw sc:name ?dwname.
    ?derivw ?a ?b.
    ?a sc:name ?aname.
    FILTER(REGEX(?aname, ".*rating.*", "i"))
}
"""
print("Results")
run_query(queryString)

Results
[('a', 'http://www.wikidata.org/prop/direct/P1657'), ('aname', 'MPAA film rating')]
[('a', 'http://www.wikidata.org/prop/direct/P1981'), ('aname', 'FSK film rating')]
[('a', 'http://www.wikidata.org/prop/direct/P2363'), ('aname', 'NMHH film rating')]
[('a', 'http://www.wikidata.org/prop/direct/P2629'), ('aname', 'BBFC rating')]
[('a', 'http://www.wikidata.org/prop/direct/P2747'), ('aname', 'Filmiroda rating')]
[('a', 'http://www.wikidata.org/prop/direct/P2758'), ('aname', 'CNC film rating (France)')]
[('a', 'http://www.wikidata.org/prop/direct/P3306'), ('aname', 'ICAA rating')]
[('a', 'http://www.wikidata.org/prop/direct/P3402'), ('aname', 'CNC film rating (Romania)')]
[('a', 'http://www.wikidata.org/prop/direct/P3650'), ('aname', 'JMK film rating')]
[('a', 'http://www.wikidata.org/prop/direct/P3834'), ('aname', 'RTC film rating')]
[('a', 'http://www.wikidata.org/prop/direct/P5970'), ('aname', 'Medierådet rating')]
[('a', 'http://www.wikidata.org/prop/direct/P2756'), ('aname', 

14

In [37]:
queryString = """
SELECT ?dwname SUM(?rating)
WHERE { 
    wd:Q2695156 wdt:P4969 ?derivw .
    ?derivw wdt:P31 ?p.
    ?p sc:name ?pname. FILTER(REGEX(?pname, "^film$", "i"))
    ?derivw sc:name ?dwname.
    ?derivw wdt:P1657 ?rating.
} GROUP BY ?dwname
"""
print("Results")
run_query(queryString)

Results
[('dwname', 'Batman Forever'), ('callret-1', 'iri_id_0_with_no_name_entry')]
[('dwname', 'The Dark Knight Rises'), ('callret-1', 'iri_id_0_with_no_name_entry')]
[('dwname', 'The Dark Knight'), ('callret-1', 'iri_id_0_with_no_name_entry')]


3
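The `iri_id_0_with_no_name_entry` values suggest that `SUM` was applied to something that is not a number. Age ratings such as the MPAA rating are presumably stored as items (IRIs) denoting a category, so they can be listed or counted but not summed or ranked like critic scores. A small check of the binding type before aggregating can catch this early. The sketch below assumes the standard SPARQL JSON results layout, where each value carries a `type` and a `value` field. The helper name and both sample bindings are hypothetical.

```python
def is_numeric_literal(binding_value):
    """Return True if a SPARQL JSON binding value holds a parseable number."""
    if binding_value.get("type") == "uri":
        return False
    try:
        float(binding_value["value"])
    except (KeyError, ValueError):
        return False
    return True


# Hypothetical bindings in the SPARQL JSON results layout (placeholder IRI).
rating_as_item = {"type": "uri", "value": "http://www.wikidata.org/entity/Q0000"}
score_as_number = {"type": "literal", "value": "8.5"}

print(is_numeric_literal(rating_as_item))   # False
print(is_numeric_literal(score_as_number))  # True
```

A reasonable next step in the workload would be another discovery query over the film properties, this time looking for names that mention scores or reviews rather than ratings.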