# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [WikiData](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. You do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP that returns the human-readable name of a property or a class in Wikidata.

In [2]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-8b1cee9b39-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
       results = sparql.query()
       json_results = results.convert()
       if len(json_results['results']['bindings'])==0:
          print("Empty")
          return 0
    
       for bindings in json_results['results']['bindings']:
          print( [ (var, value['value'])  for var, value in bindings.items() ] )

       return len(json_results['results']['bindings'])

    except Exception as e :
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
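
The `bindings` list that `run_query` iterates over follows the SPARQL JSON results layout: each binding maps a variable name to a dictionary holding a `value` key. A minimal sketch of that flattening step on a hand-written sample response (the helper name `flatten_bindings` and the sample dictionary are illustrative and are not part of the notebook):

```python
# Hand-written sample in the SPARQL JSON results layout (not real endpoint output).
sample_results = {
    "head": {"vars": ["worksname", "works"]},
    "results": {
        "bindings": [
            {
                "worksname": {"type": "literal", "value": "Batman Begins"},
                "works": {"type": "uri", "value": "http://www.wikidata.org/entity/Q166262"},
            }
        ]
    },
}


def flatten_bindings(json_results):
    """Turn each binding into a list of (variable, value) pairs, as run_query prints them."""
    return [
        [(var, cell["value"]) for var, cell in binding.items()]
        for binding in json_results["results"]["bindings"]
    ]


rows = flatten_bindings(sample_results)
for row in rows:
    print(row)
```

Returning the rows instead of only printing them would make it possible to post-process the results in later cells.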

# Movie Workflow Series ("The Batman movies explorative search") 

Consider the following exploratory scenario:


> We are interested in movies about Batman. We want to investigate the differences between the various series of films produced in different decades.


## Useful URIs for the current workflow
The following are given:

| IRI            | Description             | Role      |
| -------------- | ----------------------- | --------- |
| `wdt:P1647`    | subproperty             | predicate |
| `wdt:P31`      | instance of             | predicate |
| `wdt:P279`     | subclass                | predicate |
| `wdt:P4969`    | derivative work         | predicate |
| `wd:Q2695156`  | Batman                  | node      |
| `wd:Q25191`    | Christopher Nolan       | node      |
| `wd:Q12859908` | The Dark Knight Trilogy | node      |



Also consider

```
wd:Q25191 ?p ?obj .
```

is the BGP to retrieve all **properties of Christopher Nolan**.

The workload should


1. Investigate the works (i.e. derivative works) related to Batman and identify the movies. Return the movies along with the year of production and the director.

2. Return the main Batman movie series produced in the last four decades and compare them in terms of length, number of actors involved, and costs.

3. Investigate which workers (writers, actors, etc.) have had a role in more than one Batman movie so far.

4. Compare the ratings of the single movies and of the series. Identify the movie with the highest rating from the critics and the "best" series overall.

5. Return how many actors who are members of the cast of the "Dark Knight Trilogy" by Christopher Nolan have a [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2.

In [1]:
# start your workflow here

## Task 1: First of all, we search for the works derived from Batman.

In [10]:
queryString = """
SELECT ?worksname
WHERE { 

    wd:Q2695156 wdt:P4969 ?works .
    ?works <http://schema.org/name> ?worksname .
} 
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('worksname', 'Batman: Arkham Origins')]
[('worksname', 'Batman: Arkham Origins Blackgate')]
[('worksname', 'Batman: Arkham Origins')]
[('worksname', 'Batman: Arkham Knight')]
[('worksname', 'Batman: The Animated Series')]
[('worksname', 'Nite Owl')]
[('worksname', 'Batman: Arkham City')]
[('worksname', 'Batman Beyond')]
[('worksname', 'The Batman')]
[('worksname', 'Batman')]
[('worksname', 'Batman Forever')]
[('worksname', 'Bruce Wayne')]
[('worksname', 'Batman: Bad Blood')]
[('worksname', 'Batman Returns')]
[('worksname', 'Batman')]
[('worksname', 'The Dark Knight')]
[('worksname', 'Batman: Arkham Asylum')]
[('worksname', 'Batman Begins')]
[('worksname', 'The Dark Knight Rises')]
[('worksname', 'Batman & Robin')]
[('worksname', 'The Batman')]
[('worksname', 'Batman: Dark Tomorrow')]
[('worksname', 'Batman (Earth-Two)')]
[('worksname', 'Batman')]
[('worksname', 'Batman of Zur-En-Arrh')]
[('worksname', 'Bruce Wayne')]
[('worksname', 'Batzarro')]
[('worksname', 'Batman')]
[('wo

32

Now let's see if these are movies or not.

In [13]:
queryString = """
SELECT DISTINCT ?fname ?f
WHERE { 

    wd:Q2695156 wdt:P4969 ?works .
    ?works wdt:P31 ?f .
    ?f <http://schema.org/name> ?fname 
} 
LIMIT 50
"""
print("Results")
run_query(queryString)

Results
[('fname', 'comics character'), ('f', 'http://www.wikidata.org/entity/Q1114461')]
[('fname', 'extraterrestrials in fiction'), ('f', 'http://www.wikidata.org/entity/Q1307329')]
[('fname', 'video game character'), ('f', 'http://www.wikidata.org/entity/Q1569167')]
[('fname', 'fictional character'), ('f', 'http://www.wikidata.org/entity/Q95074')]
[('fname', 'film'), ('f', 'http://www.wikidata.org/entity/Q11424')]
[('fname', 'fictional human'), ('f', 'http://www.wikidata.org/entity/Q15632617')]
[('fname', 'animated character'), ('f', 'http://www.wikidata.org/entity/Q15711870')]
[('fname', 'television character'), ('f', 'http://www.wikidata.org/entity/Q15773317')]
[('fname', 'television series'), ('f', 'http://www.wikidata.org/entity/Q5398426')]
[('fname', 'animated series'), ('f', 'http://www.wikidata.org/entity/Q581714')]
[('fname', 'video game'), ('f', 'http://www.wikidata.org/entity/Q7889')]
[('fname', 'superhero film character'), ('f', 'http://www.wikidata.org/entity/Q63998451')

14

So now we know that the derivative works of Batman that are movies are the ones that are instances of film (Q11424).

In [15]:
queryString = """
SELECT ?worksname ?works
WHERE { 

    wd:Q2695156 wdt:P4969 ?works .
    ?works wdt:P31 wd:Q11424 .
    ?works <http://schema.org/name> ?worksname .
} 
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('worksname', 'Batman'), ('works', 'http://www.wikidata.org/entity/Q116852')]
[('worksname', 'Batman Forever'), ('works', 'http://www.wikidata.org/entity/Q221345')]
[('worksname', 'Batman: Bad Blood'), ('works', 'http://www.wikidata.org/entity/Q21095079')]
[('worksname', 'Batman Returns'), ('works', 'http://www.wikidata.org/entity/Q189054')]
[('worksname', 'The Dark Knight'), ('works', 'http://www.wikidata.org/entity/Q163872')]
[('worksname', 'Batman Begins'), ('works', 'http://www.wikidata.org/entity/Q166262')]
[('worksname', 'The Dark Knight Rises'), ('works', 'http://www.wikidata.org/entity/Q189330')]
[('worksname', 'Batman & Robin'), ('works', 'http://www.wikidata.org/entity/Q276523')]


8

We have to identify the predicates to get the year of publication of the film and the director.

In [201]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE { 

    wd:Q166262 ?p ?o .
    ?p <http://schema.org/name> ?pname .
    FILTER REGEX(?pname, 'publ|date|director|cost|lenght|durat|series', 'i')
} 
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pname', 'part of the series')]
[('p', 'http://www.wikidata.org/prop/direct/P2047'), ('pname', 'duration')]
[('p', 'http://www.wikidata.org/prop/direct/P2130'), ('pname', 'cost')]
[('p', 'http://www.wikidata.org/prop/direct/P2515'), ('pname', 'costume designer')]
[('p', 'http://www.wikidata.org/prop/direct/P344'), ('pname', 'director of photography')]
[('p', 'http://www.wikidata.org/prop/direct/P57'), ('pname', 'director')]
[('p', 'http://www.wikidata.org/prop/direct/P577'), ('pname', 'publication date')]


7
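
The lookup pattern used above (list the predicates of one entity and filter their names with a regular expression) recurs in the rest of the workflow. A small sketch of a helper that builds such a query; the function name `property_search_query` and its parameters are assumptions made for illustration, and the resulting string is meant to be passed to `run_query`:

```python
def property_search_query(entity, keywords, limit=50):
    """Build a query listing the predicates of `entity` whose name matches any keyword."""
    pattern = "|".join(keywords)  # keywords are joined into one regex alternation
    return f"""
SELECT DISTINCT ?p ?pname
WHERE {{
    {entity} ?p ?o .
    ?p <http://schema.org/name> ?pname .
    FILTER REGEX(?pname, '{pattern}', 'i')
}}
LIMIT {limit}
"""


# Example: predicates of Batman Begins related to dates, directors, costs and duration.
queryString = property_search_query("wd:Q166262", ["date", "director", "cost", "durat"])
print(queryString)
```

Keeping the keywords in a list makes it easier to spot a misspelled keyword, which would otherwise silently match nothing.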

We have now found all the properties we need.

In [113]:
queryString = """
SELECT DISTINCT ?worksname ?works ?dname ?date
WHERE { 

    wd:Q2695156 wdt:P4969 ?works .
    ?works wdt:P31 wd:Q11424 .
    ?works <http://schema.org/name> ?worksname .
    ?works wdt:P57 ?d .
    ?works wdt:P577 ?date .
    ?d <http://schema.org/name> ?dname .
    
} 
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('worksname', 'Batman'), ('works', 'http://www.wikidata.org/entity/Q116852'), ('dname', 'Tim Burton'), ('date', '1989-10-26T00:00:00Z')]
[('worksname', 'Batman'), ('works', 'http://www.wikidata.org/entity/Q116852'), ('dname', 'Tim Burton'), ('date', '1989-10-20T00:00:00Z')]
[('worksname', 'Batman'), ('works', 'http://www.wikidata.org/entity/Q116852'), ('dname', 'Tim Burton'), ('date', '1989-09-13T00:00:00Z')]
[('worksname', 'Batman'), ('works', 'http://www.wikidata.org/entity/Q116852'), ('dname', 'Tim Burton'), ('date', '1989-06-23T00:00:00Z')]
[('worksname', 'Batman Forever'), ('works', 'http://www.wikidata.org/entity/Q221345'), ('dname', 'Joel Schumacher'), ('date', '1995-08-03T00:00:00Z')]
[('worksname', 'Batman Forever'), ('works', 'http://www.wikidata.org/entity/Q221345'), ('dname', 'Joel Schumacher'), ('date', '1995-07-14T00:00:00Z')]
[('worksname', 'Batman Forever'), ('works', 'http://www.wikidata.org/entity/Q221345'), ('dname', 'Joel Schumacher'), ('date', '1995-06-16T

26

As we can see, each film has more than one date, probably because it was released at different times in different regions.
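
To keep a single year per film, one option is to keep only the earliest release date. A minimal sketch on a few complete rows copied from the output above (the helper name `earliest_release` is hypothetical):

```python
# A few (title, director, date) rows copied from the result above.
rows = [
    ("Batman", "Tim Burton", "1989-10-26T00:00:00Z"),
    ("Batman", "Tim Burton", "1989-10-20T00:00:00Z"),
    ("Batman", "Tim Burton", "1989-09-13T00:00:00Z"),
    ("Batman", "Tim Burton", "1989-06-23T00:00:00Z"),
    ("Batman Forever", "Joel Schumacher", "1995-08-03T00:00:00Z"),
    ("Batman Forever", "Joel Schumacher", "1995-07-14T00:00:00Z"),
]


def earliest_release(rows):
    """Map (title, director) to the earliest date; ISO 8601 strings sort chronologically."""
    earliest = {}
    for title, director, date in rows:
        key = (title, director)
        if key not in earliest or date < earliest[key]:
            earliest[key] = date
    return earliest


for (title, director), date in earliest_release(rows).items():
    print(title, director, date[:4])  # the first four characters are the year
```

The same effect can be obtained inside SPARQL with `MIN(?date)` and a `GROUP BY` on the film. Grouping on the film IRI rather than on the title avoids merging different films that share a name.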

## Task 2: First of all we retrieve the movies produced after 1982.

In [45]:
queryString = """
SELECT ?movie ?date
WHERE { 

    wd:Q2695156 wdt:P4969 ?works .
    ?works wdt:P31 wd:Q11424 .
    ?works <http://schema.org/name> ?movie .
    ?works wdt:P577 ?date 
    FILTER (?date >= "1982-01-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
} 
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('movie', 'Batman'), ('date', '1989-09-13T00:00:00Z')]
[('movie', 'Batman'), ('date', '1989-10-26T00:00:00Z')]
[('movie', 'Batman'), ('date', '1989-06-23T00:00:00Z')]
[('movie', 'Batman'), ('date', '1989-10-20T00:00:00Z')]
[('movie', 'Batman Forever'), ('date', '1995-07-14T00:00:00Z')]
[('movie', 'Batman Forever'), ('date', '1995-08-03T00:00:00Z')]
[('movie', 'Batman Forever'), ('date', '1995-06-16T00:00:00Z')]
[('movie', 'Batman: Bad Blood'), ('date', '2016-01-01T00:00:00Z')]
[('movie', 'Batman: Bad Blood'), ('date', '2016-08-04T00:00:00Z')]
[('movie', 'Batman Returns'), ('date', '1992-06-19T00:00:00Z')]
[('movie', 'Batman Returns'), ('date', '1992-07-31T00:00:00Z')]
[('movie', 'Batman Returns'), ('date', '1992-07-16T00:00:00Z')]
[('movie', 'Batman Returns'), ('date', '1992-06-16T00:00:00Z')]
[('movie', 'The Dark Knight'), ('date', '2008-08-13T00:00:00Z')]
[('movie', 'The Dark Knight'), ('date', '2008-07-18T00:00:00Z')]
[('movie', 'The Dark Knight'), ('date', '2008-08-21T00:0

26

We need to find the actors that participate in a movie, so let's find the associated predicate.

In [87]:
queryString = """
SELECT DISTINCT ?pname ?p 
WHERE { 
    wd:Q116852 ?p ?o .
    ?p <http://schema.org/name> ?pname 
    FILTER REGEX(?pname, 'cast|act', 'i')
} 
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('pname', 'cast member'), ('p', 'http://www.wikidata.org/prop/direct/P161')]
[('pname', 'characters'), ('p', 'http://www.wikidata.org/prop/direct/P674')]


2

The predicate called 'cast member' (P161) is the one that relates the principal actors to a movie. We have already retrieved the predicates that we need for the other information.

In [107]:
queryString = """
SELECT DISTINCT ?moviename ?date ?duration ?cost (COUNT(?actname) AS ?howmanyactors)
WHERE { 

    wd:Q2695156 wdt:P4969 ?movie .
    ?movie wdt:P31 wd:Q11424 .
    ?movie <http://schema.org/name> ?moviename .
    ?movie wdt:P577 ?date .
    ?movie wdt:P2047 ?duration .
    ?movie wdt:P2130 ?cost .
    ?movie wdt:P161 ?act .
    ?act <http://schema.org/name> ?actname .
    FILTER (?date >= "1982-01-01T00:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
} 
GROUP BY ?moviename ?cost ?date ?duration
ORDER BY DESC (?cost)
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('moviename', 'The Dark Knight Rises'), ('date', '2012-07-25T00:00:00Z'), ('duration', '164'), ('cost', '250000000'), ('howmanyactors', '49')]
[('moviename', 'The Dark Knight Rises'), ('date', '2012-07-26T00:00:00Z'), ('duration', '164'), ('cost', '250000000'), ('howmanyactors', '49')]
[('moviename', 'The Dark Knight Rises'), ('date', '2012-07-20T00:00:00Z'), ('duration', '164'), ('cost', '250000000'), ('howmanyactors', '49')]
[('moviename', 'The Dark Knight'), ('date', '2008-07-25T00:00:00Z'), ('duration', '153'), ('cost', '185000000'), ('howmanyactors', '29')]
[('moviename', 'The Dark Knight'), ('date', '2008-08-13T00:00:00Z'), ('duration', '153'), ('cost', '185000000'), ('howmanyactors', '29')]
[('moviename', 'The Dark Knight'), ('date', '2008-07-18T00:00:00Z'), ('duration', '153'), ('cost', '185000000'), ('howmanyactors', '29')]
[('moviename', 'The Dark Knight'), ('date', '2008-08-21T00:00:00Z'), ('duration', '153'), ('cost', '185000000'), ('howmanyactors', '29')]
[('movie

20

## Task 3: We need all the workers related to a film.

In [165]:
queryString = """
SELECT DISTINCT ?pname ?p
WHERE { 
    wd:Q221345 ?p ?o .
    ?p <http://schema.org/name> ?pname
    FILTER REGEX (?pname, 'film|prod|creat|direct|screen|compose', 'i')
} 


"""

print("Results")
run_query(queryString)

Results
[('pname', 'film editor'), ('p', 'http://www.wikidata.org/prop/direct/P1040')]
[('pname', 'Box Office Mojo film ID (former scheme)'), ('p', 'http://www.wikidata.org/prop/direct/P1237')]
[('pname', 'AlloCiné film ID'), ('p', 'http://www.wikidata.org/prop/direct/P1265')]
[('pname', 'producer'), ('p', 'http://www.wikidata.org/prop/direct/P162')]
[('pname', 'MPAA film rating'), ('p', 'http://www.wikidata.org/prop/direct/P1657')]
[('pname', 'DNF film ID'), ('p', 'http://www.wikidata.org/prop/direct/P1804')]
[('pname', 'MovieMeter film ID'), ('p', 'http://www.wikidata.org/prop/direct/P1970')]
[('pname', 'FSK film rating'), ('p', 'http://www.wikidata.org/prop/direct/P1981')]
[('pname', 'Swedish Film Database film ID'), ('p', 'http://www.wikidata.org/prop/direct/P2334')]
[('pname', 'Allcinema film ID'), ('p', 'http://www.wikidata.org/prop/direct/P2465')]
[('pname', 'KINENOTE film ID'), ('p', 'http://www.wikidata.org/prop/direct/P2508')]
[('pname', 'Movie Walker film ID'), ('p', 'http:/

44

From the previous list of predicates we can pick the ones that connect the most important workers to a movie.

In [164]:
queryString = """
SELECT DISTINCT ?title ?filmeditorname ?producername ?directorofphotographyname ?directorname ?composername ?screenwritername
WHERE {
    wd:Q2695156 wdt:P4969 ?movie .
    ?movie wdt:P31 wd:Q11424 .
    ?movie <http://schema.org/name> ?title .
    OPTIONAL{
    ?movie wdt:P1040 ?filmeditor .
    ?movie wdt:P162 ?producer .
    ?movie wdt:P344 ?directorofphotography .
    ?movie wdt:P57 ?director .
    ?movie wdt:P58 ?screenwriter .
    ?movie wdt:P86 ?composer .
    ?filmeditor <http://schema.org/name> ?filmeditorname .
    ?producer <http://schema.org/name> ?producername .
    ?directorofphotography <http://schema.org/name> ?directorofphotographyname .
    ?director <http://schema.org/name> ?directorname .
    ?composer <http://schema.org/name> ?composername .
    ?screenwriter <http://schema.org/name> ?screenwritername
    }   
} 

LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('title', 'Batman'), ('filmeditorname', 'Ray Lovejoy'), ('producername', 'Peter Guber'), ('directorofphotographyname', 'Roger Pratt'), ('directorname', 'Tim Burton'), ('composername', 'Danny Elfman'), ('screenwritername', 'Bill Finger')]
[('title', 'Batman'), ('filmeditorname', 'Ray Lovejoy'), ('producername', 'Peter Guber'), ('directorofphotographyname', 'Roger Pratt'), ('directorname', 'Tim Burton'), ('composername', 'Danny Elfman'), ('screenwritername', 'Sam Hamm')]
[('title', 'Batman'), ('filmeditorname', 'Ray Lovejoy'), ('producername', 'Peter Guber'), ('directorofphotographyname', 'Roger Pratt'), ('directorname', 'Tim Burton'), ('composername', 'Danny Elfman'), ('screenwritername', 'Warren Skaaren')]
[('title', 'Batman'), ('filmeditorname', 'Ray Lovejoy'), ('producername', 'Jon Peters'), ('directorofphotographyname', 'Roger Pratt'), ('directorname', 'Tim Burton'), ('composername', 'Danny Elfman'), ('screenwritername', 'Bill Finger')]
[('title', 'Batman'), ('filmeditornam

10

Now we can run a bunch of queries to see whether someone worked on more than one film.

In [162]:
queryString = """
SELECT DISTINCT (COUNT(?title) AS ?howmanyfilms) ?filmeditorname
WHERE {
    wd:Q2695156 wdt:P4969 ?movie .
    ?movie wdt:P31 wd:Q11424 .
    ?movie <http://schema.org/name> ?title .
    OPTIONAL{
    ?movie wdt:P1040 ?filmeditor .
    ?filmeditor <http://schema.org/name> ?filmeditorname .
    }
} 
GROUP BY ?filmeditorname
HAVING (COUNT(?title)>=2)
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('howmanyfilms', '3'), ('filmeditorname', 'Lee Smith')]


1

In [161]:
queryString = """
SELECT DISTINCT (COUNT(?title) AS ?howmanyfilms) ?producername
WHERE {
    wd:Q2695156 wdt:P4969 ?movie .
    ?movie wdt:P31 wd:Q11424 .
    ?movie <http://schema.org/name> ?title .
    OPTIONAL{
    ?movie wdt:P162 ?producer .
    ?producer <http://schema.org/name> ?producername .
    }
} 
GROUP BY ?producername
HAVING (COUNT(?title)>=2)
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('howmanyfilms', '3'), ('producername', 'Charles Roven')]
[('howmanyfilms', '3'), ('producername', 'Emma Thomas')]
[('howmanyfilms', '2'), ('producername', 'Tim Burton')]
[('howmanyfilms', '2'), ('producername', 'Christopher Nolan')]


4

In [160]:
queryString = """
SELECT DISTINCT (COUNT(?title) AS ?howmanyfilms) ?directorofphotographyname
WHERE {
    wd:Q2695156 wdt:P4969 ?movie .
    ?movie wdt:P31 wd:Q11424 .
    ?movie <http://schema.org/name> ?title .
    OPTIONAL{
    ?movie wdt:P344 ?directorofphotography .
    ?directorofphotography <http://schema.org/name> ?directorofphotographyname .
    }
} 
GROUP BY ?directorofphotographyname
HAVING (COUNT(?title)>=2)
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('howmanyfilms', '2'), ('directorofphotographyname', 'Stephen Goldblatt')]
[('howmanyfilms', '3'), ('directorofphotographyname', 'Wally Pfister')]


2

In [159]:
queryString = """
SELECT DISTINCT (COUNT(?title) AS ?howmanyfilms) ?directorname
WHERE {
    wd:Q2695156 wdt:P4969 ?movie .
    ?movie wdt:P31 wd:Q11424 .
    ?movie <http://schema.org/name> ?title .
    OPTIONAL{
    ?movie wdt:P57 ?director .
    ?director <http://schema.org/name> ?directorname .
    }
} 
GROUP BY ?directorname
HAVING (COUNT(?title)>=2)
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('howmanyfilms', '2'), ('directorname', 'Tim Burton')]
[('howmanyfilms', '2'), ('directorname', 'Joel Schumacher')]
[('howmanyfilms', '3'), ('directorname', 'Christopher Nolan')]


3

In [158]:
queryString = """
SELECT DISTINCT (COUNT(?title) AS ?howmanyfilms) ?screenwritername
WHERE {
    wd:Q2695156 wdt:P4969 ?movie .
    ?movie wdt:P31 wd:Q11424 .
    ?movie <http://schema.org/name> ?title .
    OPTIONAL{
    ?movie wdt:P58 ?screenwriter .
    ?screenwriter <http://schema.org/name> ?screenwritername
    }
} 
GROUP BY ?screenwritername
HAVING (COUNT(?title)>=2)
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('howmanyfilms', '3'), ('screenwritername', 'Bob Kane')]
[('howmanyfilms', '3'), ('screenwritername', 'David S. Goyer')]
[('howmanyfilms', '2'), ('screenwritername', 'Jonathan Nolan')]
[('howmanyfilms', '3'), ('screenwritername', 'Bill Finger')]
[('howmanyfilms', '3'), ('screenwritername', 'Christopher Nolan')]
[('howmanyfilms', '2'), ('screenwritername', 'Akiva Goldsman')]


6

In [157]:
queryString = """
SELECT DISTINCT (COUNT(?title) AS ?howmanyfilms) ?composername
WHERE {
    wd:Q2695156 wdt:P4969 ?movie .
    ?movie wdt:P31 wd:Q11424 .
    ?movie <http://schema.org/name> ?title .
    OPTIONAL{
    ?movie wdt:P86 ?composer .
    ?composer <http://schema.org/name> ?composername .
    }
} 
GROUP BY ?composername
HAVING (COUNT(?title)>=2)
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('howmanyfilms', '2'), ('composername', 'Elliot Goldenthal')]
[('howmanyfilms', '2'), ('composername', 'Danny Elfman')]
[('howmanyfilms', '2'), ('composername', 'James Newton Howard')]
[('howmanyfilms', '3'), ('composername', 'Hans Zimmer')]


4
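
The six queries above differ only in the predicate and in the variable names, so they could also be generated from one template. A sketch under that assumption; the dictionary `ROLE_PREDICATES` and the function `repeated_workers_query` are illustrative names, and the aggregate is written in the standard `(COUNT(...) AS ?x)` form:

```python
# Role name -> predicate, all taken from the queries above.
ROLE_PREDICATES = {
    "filmeditor": "wdt:P1040",
    "producer": "wdt:P162",
    "directorofphotography": "wdt:P344",
    "director": "wdt:P57",
    "screenwriter": "wdt:P58",
    "composer": "wdt:P86",
}


def repeated_workers_query(role, predicate, min_films=2):
    """Build the query counting the Batman films per person for one role."""
    return f"""
SELECT DISTINCT (COUNT(?title) AS ?howmanyfilms) ?{role}name
WHERE {{
    wd:Q2695156 wdt:P4969 ?movie .
    ?movie wdt:P31 wd:Q11424 .
    ?movie <http://schema.org/name> ?title .
    OPTIONAL {{
        ?movie {predicate} ?{role} .
        ?{role} <http://schema.org/name> ?{role}name .
    }}
}}
GROUP BY ?{role}name
HAVING (COUNT(?title) >= {min_films})
LIMIT 50
"""


queries = {role: repeated_workers_query(role, pred) for role, pred in ROLE_PREDICATES.items()}
print(queries["composer"])
```

In the notebook, each generated string would then be passed to `run_query`.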

## Task 4: We have to find the predicates about the critic score.

In [183]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE {
    ?a wdt:P31 wd:Q11424 .
    ?a <http://schema.org/name> ?aname .
    ?a ?p ?o .
    ?p <http://schema.org/name> ?pname .
    FILTER REGEX(?pname, 'score|critic|rating', 'i')
} 

LIMIT 200
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1657'), ('pname', 'MPAA film rating')]
[('p', 'http://www.wikidata.org/prop/direct/P1712'), ('pname', 'Metacritic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1981'), ('pname', 'FSK film rating')]
[('p', 'http://www.wikidata.org/prop/direct/P2363'), ('pname', 'NMHH film rating')]
[('p', 'http://www.wikidata.org/prop/direct/P2629'), ('pname', 'BBFC rating')]
[('p', 'http://www.wikidata.org/prop/direct/P2758'), ('pname', 'CNC film rating (France)')]
[('p', 'http://www.wikidata.org/prop/direct/P3306'), ('pname', 'ICAA rating')]
[('p', 'http://www.wikidata.org/prop/direct/P3402'), ('pname', 'CNC film rating (Romania)')]
[('p', 'http://www.wikidata.org/prop/direct/P3650'), ('pname', 'JMK film rating')]
[('p', 'http://www.wikidata.org/prop/direct/P3834'), ('pname', 'RTC film rating')]
[('p', 'http://www.wikidata.org/prop/direct/P5970'), ('pname', 'Medierådet rating')]
[('p', 'http://www.wikidata.org/prop/direct/P2684'), ('pname', 'Ki

32

Here we can see that there are many candidate predicates for our purpose, but I think that the best are 'rating' (P4271) and 'review score' (P444).
After some attempts, I found that 'review score' is the best.

In [198]:
queryString = """
SELECT DISTINCT  ?aname ?score
WHERE {
    ?a wdt:P31 wd:Q11424 .
    ?a <http://schema.org/name> ?aname .
    ?a wdt:P444 ?score .
    FILTER REGEX(?score, '%$', 'i')
} 
ORDER BY DESC(?score)
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('aname', 'Citizen Kane'), ('score', '99%')]
[('aname', 'Brazil'), ('score', '98%')]
[('aname', 'Hamilton'), ('score', '98%')]
[('aname', 'Black Panther'), ('score', '96%')]
[('aname', 'Nightcrawler'), ('score', '95%')]
[('aname', 'The Shawshank Redemption'), ('score', '95%')]
[('aname', 'Yojimbo'), ('score', '95%')]
[('aname', 'Monty Python’s Life of Brian'), ('score', '95%')]
[('aname', 'Lost in Translation'), ('score', '95%')]
[('aname', 'A Short Film About Love'), ('score', '95%')]
[('aname', 'Visions of Light'), ('score', '95%')]
[('aname', 'Iron Man'), ('score', '94%')]
[('aname', 'Avengers: Endgame'), ('score', '94%')]
[('aname', 'Juno'), ('score', '94%')]
[('aname', 'Logan'), ('score', '93%')]
[('aname', 'Thor: Ragnarok'), ('score', '93%')]
[('aname', 'Fantastic Mr. Fox'), ('score', '93%')]
[('aname', 'A Better Tomorrow'), ('score', '93%')]
[('aname', 'Dillinger'), ('score', '93%')]
[('aname', 'Captain Phillips'), ('score', '93%')]
[('aname', 'Disobedience'), ('score',

50

Here we have all the series with their average score.

In [7]:
queryString = """
SELECT DISTINCT  AVG(xsd:integer(SUBSTR(?score, 1,2))) as ?score ?seriename ?serie
WHERE {
    ?a wdt:P31 wd:Q11424 .
    ?a <http://schema.org/name> ?aname .
    ?a wdt:P444 ?score .
    ?a wdt:P179 ?serie .
    ?serie <http://schema.org/name> ?seriename .
    FILTER REGEX(?score, '%$', 'i')
} 
GROUP BY ?seriename ?serie
ORDER BY DESC(?score)
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('score', '96'), ('seriename', 'Black Panther'), ('serie', 'http://www.wikidata.org/entity/Q76127371')]
[('score', '94'), ('seriename', 'Avengers'), ('serie', 'http://www.wikidata.org/entity/Q20021634')]
[('score', '94'), ('seriename', 'Iron Man'), ('serie', 'http://www.wikidata.org/entity/Q16642701')]
[('score', '93'), ('seriename', 'X-Men'), ('serie', 'http://www.wikidata.org/entity/Q2006869')]
[('score', '93'), ('seriename', 'Wolverine trilogy'), ('serie', 'http://www.wikidata.org/entity/Q18286530')]
[('score', '91'), ('seriename', 'Star Wars'), ('serie', 'http://www.wikidata.org/entity/Q22092344')]
[('score', '91'), ('seriename', 'Star Wars sequel trilogy'), ('serie', 'http://www.wikidata.org/entity/Q6586871')]
[('score', '91'), ('seriename', 'Marvel Cinematic Universe Phase Three'), ('serie', 'http://www.wikidata.org/entity/Q51963836')]
[('score', '90'), ('seriename', 'Captain America'), ('serie', 'http://www.wikidata.org/entity/Q17385223')]
[('score', '89'), ('seriename'

30

The score is stored as a string and not as a simple integer or double, so the query above converts the first two characters of each percentage score to an integer before computing the average per series.
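
Because the review scores are strings such as `'95%'`, ordering and averaging them needs a numeric conversion. A short sketch of that conversion in Python, on hand-picked values (the helper `parse_percent` is hypothetical). Note that string ordering places `'99%'` above `'100%'`, and that taking only the first two characters of `'100%'` yields 10:

```python
def parse_percent(score):
    """Convert a string such as '95%' to the integer 95."""
    if not score.endswith("%"):
        raise ValueError(f"not a percentage score: {score!r}")
    return int(score[:-1])


scores = ["99%", "100%", "98%"]  # illustrative values, not a query result

by_string = sorted(scores, reverse=True)                     # lexicographic order
by_number = sorted(scores, key=parse_percent, reverse=True)  # numeric order
average = sum(parse_percent(s) for s in scores) / len(scores)

print(by_string)
print(by_number)
print(average)
```

Stripping the trailing `%` instead of cutting a fixed number of characters keeps one-digit and three-digit percentages correct.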

## Task 5: Here we retrieve all the cast members of 'The Dark Knight Trilogy'.

In [221]:
queryString = """
SELECT DISTINCT ?actorname ?actor
WHERE {
   wd:Q12859908 wdt:P161 ?actor .
   ?actor <http://schema.org/name> ?actorname
}
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('actorname', 'Christian Bale'), ('actor', 'http://www.wikidata.org/entity/Q45772')]
[('actorname', 'Liam Neeson'), ('actor', 'http://www.wikidata.org/entity/Q58444')]
[('actorname', 'Katie Holmes'), ('actor', 'http://www.wikidata.org/entity/Q174346')]
[('actorname', 'Maggie Gyllenhaal'), ('actor', 'http://www.wikidata.org/entity/Q202381')]
[('actorname', 'Gary Oldman'), ('actor', 'http://www.wikidata.org/entity/Q83492')]
[('actorname', 'Tom Hardy'), ('actor', 'http://www.wikidata.org/entity/Q208026')]
[('actorname', 'Morgan Freeman'), ('actor', 'http://www.wikidata.org/entity/Q48337')]
[('actorname', 'Joseph Gordon-Levitt'), ('actor', 'http://www.wikidata.org/entity/Q177311')]
[('actorname', 'Anne Hathaway'), ('actor', 'http://www.wikidata.org/entity/Q36301')]
[('actorname', 'Marion Cotillard'), ('actor', 'http://www.wikidata.org/entity/Q8927')]
[('actorname', 'Heath Ledger'), ('actor', 'http://www.wikidata.org/entity/Q40572')]
[('actorname', 'Michael Caine'), ('actor', 'http

14

Let's find Kevin Bacon.

In [247]:
queryString = """
SELECT DISTINCT ?actor ?actorname
WHERE {
    ?movie wdt:P161 ?actor .
    ?actor <http://schema.org/name> ?actorname .
    FILTER REGEX(?actorname, "Kevin Bacon", 'i')
}
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q3454165'), ('actorname', 'Kevin Bacon')]


1

In [252]:
queryString = """
SELECT DISTINCT ?actorname
WHERE {
    wd:Q12859908 wdt:P161 ?actor .
    ?actor <http://schema.org/name> ?actorname .
    ?movies  wdt:P161 ?actor .
    ?movies wdt:P161 ?actor1 .
    ?movies1 wdt:P161 ?actor1 .
    ?movies2 wdt:P161 ?actor1 .
    ?movies2 wdt:P161 ?actor2 .
    FILTER(?actor2 = wd:Q3454165)
}
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('actorname', 'Katie Holmes')]
[('actorname', 'Morgan Freeman')]
[('actorname', 'Cillian Murphy')]
[('actorname', 'Gary Oldman')]
[('actorname', 'Tom Hardy')]
[('actorname', 'Christian Bale')]
[('actorname', 'Maggie Gyllenhaal')]
[('actorname', 'Aaron Eckhart')]
[('actorname', 'Michael Caine')]
[('actorname', 'Marion Cotillard')]
[('actorname', 'Heath Ledger')]
[('actorname', 'Liam Neeson')]
[('actorname', 'Joseph Gordon-Levitt')]
[('actorname', 'Anne Hathaway')]


14
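
The last query matches every cast member connected to Kevin Bacon through one intermediate co-star. Since nothing forces the intermediate actor to differ from the cast member, it also matches actors who appeared with him directly (Bacon number 1). Selecting a Bacon number exactly equal to 2 additionally requires excluding distance 1. A sketch of that distinction as a breadth-first search over a toy film-to-cast mapping; the films and actor labels below are invented placeholders, not Wikidata results:

```python
from collections import deque

# Toy data: film -> cast. All names are placeholders.
films = {
    "film_a": {"bacon", "actor_1"},
    "film_b": {"actor_1", "actor_2"},
    "film_c": {"bacon", "actor_3"},
    "film_d": {"actor_3", "actor_2"},
    "film_e": {"actor_2", "actor_4"},
}


def bacon_numbers(films, source="bacon"):
    """Breadth-first search over the co-star graph; returns actor -> distance from source."""
    costars = {}
    for cast in films.values():
        for actor in cast:
            costars.setdefault(actor, set()).update(cast - {actor})
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for other in costars.get(current, ()):
            if other not in distance:
                distance[other] = distance[current] + 1
                queue.append(other)
    return distance


numbers = bacon_numbers(films)
trilogy_cast = {"actor_1", "actor_2", "actor_4"}  # placeholder cast
exactly_two = sorted(a for a in trilogy_cast if numbers.get(a) == 2)
print(exactly_two, len(exactly_two))
```

In SPARQL, the same exclusion could be expressed with a `FILTER NOT EXISTS` pattern that rules out a film shared with `wd:Q3454165`, and the final number obtained with `COUNT(DISTINCT ?actor)`.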