# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained previously.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-638f3972bf-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings']) == 0:
            print("Empty")
            return 0

        # print one list of (variable, value) pairs per result row
        for bindings in json_results['results']['bindings']:
            print([(var, value['value']) for var, value in bindings.items()])

        # number of rows returned
        return len(json_results['results']['bindings'])

    except Exception as e:
        print("The operation failed", e)

# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        return sparql.query().convert()

    except Exception as e:
        print("The operation failed", e)
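
`run_query` only prints each row, so its results cannot be reused in later cells. Below is a minimal sketch of a helper that turns the JSON payload into plain dictionaries. The name `bindings_to_rows` and the sample payload are hypothetical and not part of the original setup. The helper only assumes the same layout that `run_query` already indexes (`results`, then `bindings`, then one `value` per variable).

```python
def bindings_to_rows(json_results):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    rows = []
    for binding in json_results["results"]["bindings"]:
        # Each binding maps a variable name to a dict that carries a 'value' key.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hypothetical payload, shaped like the endpoint's answer to a film query.
sample = {
    "head": {"vars": ["film", "filmName"]},
    "results": {
        "bindings": [
            {
                "film": {"type": "uri", "value": "http://www.wikidata.org/entity/Q284184"},
                "filmName": {"type": "literal", "value": "Ra.One"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```

With such a helper, the rows could be filtered or counted in Python without re-running the query.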

# Movie Workflow Series ("Directors explorative search") 

Consider the following exploratory information need:

> investigate the results concerning the common aspects between movies directed by Woody Allen or Quentin Tarantino. We want to know the people that worked for both directors with some numerical analyses, what are the differences in terms of budget for their movies, who won more Academy Awards. 

## Useful URIs for the current workflow
The following are given:

| IRI            | Description        | Role      |
| -------------- | ------------------ | --------- |
| `wdt:P1647`    | subproperty        | predicate |
| `wdt:P31`      | instance of        | predicate |
| `wdt:P106`     | profession         | predicate |
| `wdt:P279`     | subclass           | predicate |
| `wdt:P27`      | nationality        | predicate |
| `wdt:P3342`    | Significant person | predicate |
| `wd:Q5`        | Human              | node      |
| `wd:Q2526255`  | Director           | node      |
| `wd:Q25089`    | Woody Allen        | node      |
| `wd:Q3772`     | Quentin Tarantino  | node      |





Also consider

```
wd:Q25089 ?p ?obj .
```

is the BGP to retrieve all **properties of Woody Allen**.


The workload should:


1. Identify the BGP for films

2. Identify the BGP for directors

3. Identify the BGP for workers in a film

4. Compare the workers amongst the films directed by the two directors

5. Return some numerical comparison between the two directors (e.g., how many workers in Tarantino's movies also worked in Allen's films? Which film has the highest number of shared actors? Who is the actor most used by both directors?)

6. Is the maximum budget of a Tarantino movie higher than the maximum budget of an Allen movie?

7. Who has films with more Academy Award nominations, and who won more Academy Awards (counting awards won by his films, not only personal awards)?

    7.1 Find the BGP for Academy Awards 

    7.2 Find the related subproperties

    7.3 Find how they are related to the directors
    
    7.4 Are there alternative queries to get the same result?



## Task 1. Identify the BGP for films

List all the properties of Woody Allen

In [2]:
queryString = """
select distinct ?p ?pName ?o ?oName where {
    wd:Q25089 ?p ?o.
    
    ?p <http://schema.org/name> ?pName .
    optional { ?o <http://schema.org/name> ?oName . }
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P138'), ('pName', 'named after'), ('o', 'http://www.wikidata.org/entity/Q349357'), ('oName', 'Woody Herman')]
[('p', 'http://www.wikidata.org/prop/direct/P26'), ('pName', 'spouse'), ('o', 'http://www.wikidata.org/entity/Q1140914'), ('oName', 'Soon-Yi Previn')]
[('p', 'http://www.wikidata.org/prop/direct/P40'), ('pName', 'child'), ('o', 'http://www.wikidata.org/entity/Q22673707'), ('oName', 'Bechet Dumaine Allen')]
[('p', 'http://www.wikidata.org/prop/direct/P40'), ('pName', 'child'), ('o', 'http://www.wikidata.org/entity/Q22673708'), ('oName', 'Manzie Tio Allen')]
[('p', 'http://www.wikidata.org/prop/direct/P19'), ('pName', 'place of birth'), ('o', 'http://www.wikidata.org/entity/Q18426'), ('oName', 'The Bronx')]
[('p', 'http://www.wikidata.org/prop/direct/P451'), ('pName', 'unmarried partner'), ('o', 'http://www.wikidata.org/entity/Q102642'), ('oName', 'Diane Keaton')]
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('pName', 'aw

306

Starting from Woody Allen, find the properties whose name contains "film".

In [3]:
queryString = """
select distinct ?p ?pName where { 
    wd:Q25089 ?p ?o .
    ?p <http://schema.org/name> ?pName .
    
    filter(regex(?pName, "film"))
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1283'), ('pName', 'filmography')]


1
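
The `regex` filter performs an unanchored, case-sensitive match unless the `"i"` flag is passed as a third argument, so a property whose name starts with a capital "Film" would be missed. The sketch below mimics that rule with Python's `re` module on made-up property names. It is only an offline illustration for a plain literal pattern, not a query against the endpoint.

```python
import re

# Hypothetical property labels, used only to illustrate the matching rule.
names = ["filmography", "Film award", "place of birth", "short film"]

# regex(?pName, "film"): substring match, case-sensitive
case_sensitive = [n for n in names if re.search("film", n)]

# regex(?pName, "film", "i"): same match, ignoring case
case_insensitive = [n for n in names if re.search("film", n, re.IGNORECASE)]

print(case_sensitive)    # ['filmography', 'short film']
print(case_insensitive)  # ['filmography', 'Film award', 'short film']
```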

I found out that the property **P1283** refers to the filmography of a director. This means that from Woody Allen we can retrieve his filmography.

In [4]:
queryString = """
select ?filmography ?fName where { 
    wd:Q25089 wdt:P1283 ?filmography .
    
    ?filmography <http://schema.org/name> ?fName .
} 
"""

print("Results")
run_query(queryString)

Results
[('filmography', 'http://www.wikidata.org/entity/Q2455835'), ('fName', 'Woody Allen filmography')]


1

From the filmography of Woody Allen, search for connected entities whose name contains "film".

In [5]:
queryString = """
select ?p ?pName ?o ?oName where { 
    wd:Q2455835 ?p ?o .
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
    
    filter(regex(?oName, "film"))
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q1371849'), ('oName', 'filmography')]
[('p', 'http://www.wikidata.org/prop/direct/P360'), ('pName', 'is a list of'), ('o', 'http://www.wikidata.org/entity/Q11424'), ('oName', 'film')]


2

Finally, we find that "filmography" has a property named "is a list of" that points to the entity "film". I can use this entity to look for films.

So **Q11424** is the entity for film, and the BGP to retrieve all films is:
```
?film wdt:P31 wd:Q11424
```

In [6]:
queryString = """
select distinct ?film ?filmName where { 
    ?film wdt:P31 wd:Q11424 .
    
    ?film <http://schema.org/name> ?filmName .
} limit 10
"""

print("Results")
run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q8178307'), ('filmName', '15 Park Avenue')]
[('film', 'http://www.wikidata.org/entity/Q8197668'), ('filmName', 'Idiot Love')]
[('film', 'http://www.wikidata.org/entity/Q319263'), ('filmName', 'Doctor in the House')]
[('film', 'http://www.wikidata.org/entity/Q8452990'), ('filmName', "Sunday League - Pepik Hnatek's Final Match")]
[('film', 'http://www.wikidata.org/entity/Q284184'), ('filmName', 'Ra.One')]
[('film', 'http://www.wikidata.org/entity/Q13637926'), ('filmName', 'Indra')]
[('film', 'http://www.wikidata.org/entity/Q13638153'), ('filmName', 'Inner Senses')]
[('film', 'http://www.wikidata.org/entity/Q13645807'), ('filmName', 'Madhumasam')]
[('film', 'http://www.wikidata.org/entity/Q13647258'), ('filmName', 'Marupakkam')]
[('film', 'http://www.wikidata.org/entity/Q13677566'), ('filmName', 'Timberjack')]


10

## Task 2. Identify the BGP for directors

To find the "director" property, I can look for properties attached to any film whose name contains "direct".

In [7]:
queryString = """
select distinct ?p ?pName where { 
    ?film wdt:P31 wd:Q11424 ;
          ?p ?obj .
    
    ?p <http://schema.org/name> ?pName .
    
    filter(regex(?pName, "direct"))
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P344'), ('pName', 'director of photography')]
[('p', 'http://www.wikidata.org/prop/direct/P57'), ('pName', 'director')]
[('p', 'http://www.wikidata.org/prop/direct/P3174'), ('pName', 'art director')]
[('p', 'http://www.wikidata.org/prop/direct/P5126'), ('pName', 'assistant director')]
[('p', 'http://www.wikidata.org/prop/direct/P1037'), ('pName', 'director / manager')]


5

So **P57** is the property for "director". To find all film directors, we can use the following BGP:

```
?film wdt:P31 wd:Q11424 ;
      wdt:P57 ?director .
```

In [8]:
queryString = """
select distinct ?director ?directorName where { 
    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director .
    
    ?director <http://schema.org/name> ?directorName .
} limit 10
"""

print("Results")
run_query(queryString)

Results
[('director', 'http://www.wikidata.org/entity/Q1202851'), ('directorName', 'Kihachi Okamoto')]
[('director', 'http://www.wikidata.org/entity/Q1357018'), ('directorName', 'Raoul Peck')]
[('director', 'http://www.wikidata.org/entity/Q15042966'), ('directorName', 'Jan Prušinovský')]
[('director', 'http://www.wikidata.org/entity/Q3053027'), ('directorName', 'Emmett J. Flynn')]
[('director', 'http://www.wikidata.org/entity/Q505780'), ('directorName', 'Andrew L. Stone')]
[('director', 'http://www.wikidata.org/entity/Q550854'), ('directorName', 'Jorge Brum do Canto')]
[('director', 'http://www.wikidata.org/entity/Q5642632'), ('directorName', 'Takeshi Furusawa')]
[('director', 'http://www.wikidata.org/entity/Q7820336'), ('directorName', 'Tomoyuki Furumaya')]
[('director', 'http://www.wikidata.org/entity/Q1698561'), ('directorName', 'Johannes Wahlström')]
[('director', 'http://www.wikidata.org/entity/Q3120961'), ('directorName', 'Gurmeet Singh')]


10

## Task 3. Identify the BGP for workers in a film

Starting from the films, I can list the properties that connect films to humans.

In [9]:
queryString = """
select distinct ?p ?pName where {
    ?film wdt:P31 wd:Q11424 ;
          ?p ?worker .
    ?worker wdt:P31 wd:Q5 .
    
    ?p <http://schema.org/name> ?pName .
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1809'), ('pName', 'choreographer')]
[('p', 'http://www.wikidata.org/prop/direct/P1040'), ('pName', 'film editor')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('p', 'http://www.wikidata.org/prop/direct/P162'), ('pName', 'producer')]
[('p', 'http://www.wikidata.org/prop/direct/P175'), ('pName', 'performer')]
[('p', 'http://www.wikidata.org/prop/direct/P2515'), ('pName', 'costume designer')]
[('p', 'http://www.wikidata.org/prop/direct/P344'), ('pName', 'director of photography')]
[('p', 'http://www.wikidata.org/prop/direct/P57'), ('pName', 'director')]
[('p', 'http://www.wikidata.org/prop/direct/P58'), ('pName', 'screenwriter')]
[('p', 'http://www.wikidata.org/prop/direct/P676'), ('pName', 'lyrics by')]
[('p', 'http://www.wikidata.org/prop/direct/P86'), ('pName', 'composer')]
[('p', 'http://www.wikidata.org/prop/direct/P2554'), ('pName', 'production designer')]
[('p', 'http://www.wikidata.org/prop/dir

79

Let's take the "director" (**P57**) and "cast member" (**P161**) properties and look at what is attached to them.

In [10]:
queryString = """
select distinct ?work ?p ?pName ?o ?oName where {
    values ?work {wd:P57 wd:P161}
    
    ?work ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    optional { ?o <http://schema.org/name> ?oName . }
} order by ?work
"""

print("Results")
run_query(queryString)

Results
[('work', 'http://www.wikidata.org/entity/P161'), ('p', 'http://www.wikidata.org/prop/direct/P1629'), ('pName', 'Wikidata item of this property'), ('o', 'http://www.wikidata.org/entity/Q337084'), ('oName', 'drag queen')]
[('work', 'http://www.wikidata.org/entity/P161'), ('p', 'http://www.wikidata.org/prop/direct/P1629'), ('pName', 'Wikidata item of this property'), ('o', 'http://www.wikidata.org/entity/Q622807'), ('oName', 'seiyū')]
[('work', 'http://www.wikidata.org/entity/P161'), ('p', 'http://www.wikidata.org/prop/direct/P1629'), ('pName', 'Wikidata item of this property'), ('o', 'http://www.wikidata.org/entity/Q713200'), ('oName', 'performing artist')]
[('work', 'http://www.wikidata.org/entity/P161'), ('p', 'http://www.wikidata.org/prop/direct/P1629'), ('pName', 'Wikidata item of this property'), ('o', 'http://www.wikidata.org/entity/Q27658988'), ('oName', 'reality television participant')]
[('work', 'http://www.wikidata.org/entity/P161'), ('p', 'http://www.wikidata.org/pro

64

I can see that the entities related to the properties "director" and "cast member" include "director" (**Q3455803**) and "film actor" (**Q10800557**), among many others. I take these two as examples and look at their properties.

In [11]:
queryString = """
select distinct ?work ?p ?pName ?o ?oName where {
    values ?work {wd:Q10800557 wd:Q3455803}
    
    ?work ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    optional { ?o <http://schema.org/name> ?oName . }
} order by ?work
"""

print("Results")
run_query(queryString)

Results
[('work', 'http://www.wikidata.org/entity/Q10800557'), ('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q4220920'), ('oName', 'filmmaking occupation')]
[('work', 'http://www.wikidata.org/entity/Q10800557'), ('p', 'http://www.wikidata.org/prop/direct/P910'), ('pName', "topic's main category"), ('o', 'http://www.wikidata.org/entity/Q5479723'), ('oName', 'Category:Film actors')]
[('work', 'http://www.wikidata.org/entity/Q10800557'), ('p', 'http://www.wikidata.org/prop/direct/P1889'), ('pName', 'different from'), ('o', 'http://www.wikidata.org/entity/Q10798782'), ('oName', 'television actor')]
[('work', 'http://www.wikidata.org/entity/Q10800557'), ('p', 'http://www.wikidata.org/prop/direct/P279'), ('pName', 'subclass of'), ('o', 'http://www.wikidata.org/entity/Q33999'), ('oName', 'actor')]
[('work', 'http://www.wikidata.org/entity/Q10800557'), ('p', 'http://www.wikidata.org/prop/direct/P1687'), ('pName', 'Wikidata pro

42

Next, I look at the properties of the classes these two occupations are instances of: filmmaking occupation (**Q4220920**) and profession (**Q28640**).

In [12]:
queryString = """
select distinct ?work ?p ?pName ?o ?oName where {
    values ?work {wd:Q4220920 wd:Q28640}
    
    ?work ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
} order by ?work
"""

print("Results")
run_query(queryString)

Results
[('work', 'http://www.wikidata.org/entity/Q28640'), ('p', 'http://www.wikidata.org/prop/direct/P279'), ('pName', 'subclass of'), ('o', 'http://www.wikidata.org/entity/Q192581'), ('oName', 'job')]
[('work', 'http://www.wikidata.org/entity/Q28640'), ('p', 'http://www.wikidata.org/prop/direct/P1889'), ('pName', 'different from'), ('o', 'http://www.wikidata.org/entity/Q828803'), ('oName', 'job title')]
[('work', 'http://www.wikidata.org/entity/Q28640'), ('p', 'http://www.wikidata.org/prop/direct/P1343'), ('pName', 'described by source'), ('o', 'http://www.wikidata.org/entity/Q101314624'), ('oName', 'Lean Logic: A Dictionary for the Future and How to Survive It')]
[('work', 'http://www.wikidata.org/entity/Q28640'), ('p', 'http://www.wikidata.org/prop/direct/P1889'), ('pName', 'different from'), ('o', 'http://www.wikidata.org/entity/Q12737077'), ('oName', 'occupation')]
[('work', 'http://www.wikidata.org/entity/Q28640'), ('p', 'http://www.wikidata.org/prop/direct/P279'), ('pName', 's

22

The related classes job (**Q192581**), occupation (**Q12737077**), specialty (**Q1047113**) and profession (**Q28640**, as seen before) have the following properties:

In [13]:
queryString = """
select distinct ?work ?p ?pName ?o ?oName where {
    values ?work {wd:Q192581 wd:Q12737077 wd:Q1047113 wd:Q28640}
    
    ?work ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
} order by ?work
"""

print("Results")
run_query(queryString)

Results
[('work', 'http://www.wikidata.org/entity/Q1047113'), ('p', 'http://www.wikidata.org/prop/direct/P1382'), ('pName', 'partially coincident with'), ('o', 'http://www.wikidata.org/entity/Q644238'), ('oName', 'expertise')]
[('work', 'http://www.wikidata.org/entity/Q1047113'), ('p', 'http://www.wikidata.org/prop/direct/P279'), ('pName', 'subclass of'), ('o', 'http://www.wikidata.org/entity/Q9081'), ('oName', 'knowledge')]
[('work', 'http://www.wikidata.org/entity/Q1047113'), ('p', 'http://www.wikidata.org/prop/direct/P460'), ('pName', 'said to be the same as'), ('o', 'http://www.wikidata.org/entity/Q11862829'), ('oName', 'academic discipline')]
[('work', 'http://www.wikidata.org/entity/Q1047113'), ('p', 'http://www.wikidata.org/prop/direct/P460'), ('pName', 'said to be the same as'), ('o', 'http://www.wikidata.org/entity/Q2465832'), ('oName', 'branch of science')]
[('work', 'http://www.wikidata.org/entity/Q1047113'), ('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'insta

45

Finally, I find that there is a path from an occupation to "profession", identified by **Q28640**.
In particular, director is an instance of profession, while film actor is an instance of filmmaking occupation, which is a subclass of profession.
The path is `occupation -> instance of -> subclass of* -> profession`.
I also need to exclude some properties, such as "characters" (**P674**), whose values are fictional persons rather than workers.
The properties that I'll remove are P674, P453, P2354, P859, P136, P825, P155.

Looking at the properties of the property "director", for example, I can see that it is an instance of "Wikidata property for items about works" (**Q18618644**). This would help retrieve the workers in a cleaner way. Unfortunately, I cannot use a BGP of the following type:

```
?film wdt:P31 wd:Q11424 ;
      ?p ?worker .
?worker wdt:P31 wd:Q5 .
?p wdt:P31 wd:Q18618644 .
```

And so, I must use the following BGP for workers in a film:

```
?film wdt:P31 wd:Q11424 ;
      ?p ?worker .
?worker wdt:P31 wd:Q5 ;
    wdt:P106/wdt:P31/wdt:P279* wd:Q28640 .

filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
```
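
To make the property path concrete, the sketch below evaluates `wdt:P106/wdt:P31/wdt:P279*` by hand on a tiny in-memory graph. The graph is a hand-copied toy subset: the Q-identifiers come from the results discussed above, while `worker` is a made-up placeholder person. It only illustrates how the path is traversed, not how the endpoint evaluates it.

```python
# Toy graph: (subject, property) -> set of objects.
EDGES = {
    ("worker", "P106"): {"Q10800557"},    # placeholder person, occupation: film actor
    ("Q10800557", "P31"): {"Q4220920"},   # film actor, instance of: filmmaking occupation
    ("Q4220920", "P279"): {"Q28640"},     # filmmaking occupation, subclass of: profession
    ("Q3455803", "P31"): {"Q28640"},      # director, instance of: profession
}


def step(nodes, prop):
    """Follow one property from every node in `nodes`."""
    out = set()
    for node in nodes:
        out |= EDGES.get((node, prop), set())
    return out


def closure(nodes, prop):
    """Follow `prop` zero or more times (the `*` operator)."""
    seen = set(nodes)
    frontier = set(nodes)
    while frontier:
        frontier = step(frontier, prop) - seen
        seen |= frontier
    return seen


def has_profession(person):
    """Evaluate: person P106 / P31 / P279* Q28640."""
    occupations = step({person}, "P106")
    classes = step(occupations, "P31")
    return "Q28640" in closure(classes, "P279")


print(has_profession("worker"))  # True
```

Because `*` also allows zero steps, an occupation that is directly an instance of profession (such as director) satisfies the path as well.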

The following query shows 10 workers in the films of Quentin Tarantino.

In [14]:
queryString = """
select distinct ?worker ?pName ?workerName where {
    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 wd:Q3772;
          ?p ?worker .
    ?worker wdt:P31 wd:Q5 ;
            wdt:P106/wdt:P31/wdt:P279* wd:Q28640 .
    
    ?p <http://schema.org/name> ?pName .
    ?worker <http://schema.org/name> ?workerName .
    
    filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
} limit 10
"""

print("Results")
run_query(queryString)

Results
[('worker', 'http://www.wikidata.org/entity/Q172678'), ('pName', 'narrator'), ('workerName', 'Samuel L. Jackson')]
[('worker', 'http://www.wikidata.org/entity/Q47284'), ('pName', 'film editor'), ('workerName', 'Robert Rodriguez')]
[('worker', 'http://www.wikidata.org/entity/Q57147'), ('pName', 'cast member'), ('workerName', 'Michael Fassbender')]
[('worker', 'http://www.wikidata.org/entity/Q351732'), ('pName', 'cast member'), ('workerName', 'Sonny Chiba')]
[('worker', 'http://www.wikidata.org/entity/Q234094'), ('pName', 'cast member'), ('workerName', 'Chiaki Kuriyama')]
[('worker', 'http://www.wikidata.org/entity/Q1985488'), ('pName', 'cast member'), ('workerName', 'Nick Offerman')]
[('worker', 'http://www.wikidata.org/entity/Q104061'), ('pName', 'cast member'), ('workerName', 'Steve Buscemi')]
[('worker', 'http://www.wikidata.org/entity/Q5896455'), ('pName', 'cast member'), ('workerName', 'Slim Khezri')]
[('worker', 'http://www.wikidata.org/entity/Q188375'), ('pName', 'cast me

10

We can also compare the number of workers in the two directors' films with and without the property path to profession, to see whether anything changes.

In [15]:
queryString = """
select count(distinct ?worker) where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          ?p ?worker .
    ?worker wdt:P31 wd:Q5 .
}
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '1213')]


1

In [16]:
queryString = """
select count(distinct ?worker) where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          ?p ?worker .
    ?worker wdt:P31 wd:Q5 .
    
    filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
}
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '1187')]


1

In [17]:
queryString = """
select count(distinct ?worker) where {
    values ?director {wd:Q25089 wd:Q3772}
    
    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          ?p ?worker .
    ?worker wdt:P31 wd:Q5 ;
        wdt:P106/wdt:P31/wdt:P279* wd:Q28640 .
    
    filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
}
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '1184')]


1

So almost nothing changes between the BGP with the "profession" path and the one without it (1184 versus 1187 workers). I can therefore drop the path from the BGP, which also has a huge impact on performance.

The final BGP for workers is:
```
?film wdt:P31 wd:Q11424 ;
      ?p ?worker .
?worker wdt:P31 wd:Q5 .

filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
```
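
The choice to drop the profession path rests on the three counts returned by the counting queries above. A quick check of the arithmetic, using only those numbers:

```python
# Distinct workers returned by the three counting queries.
all_humans = 1213             # no filter
after_property_filter = 1187  # excluded properties removed
after_profession_path = 1184  # excluded properties removed + profession path

removed_by_properties = all_humans - after_property_filter
removed_by_path = after_property_filter - after_profession_path
share_removed_by_path = removed_by_path / after_property_filter

print(removed_by_properties)           # 26
print(removed_by_path)                 # 3
print(f"{share_removed_by_path:.2%}")  # 0.25%
```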

## Task 4. Compare the workers amongst the films directed by the two directors

List films directed by Quentin Tarantino and Woody Allen

In [18]:
queryString = """
select distinct ?film ?filmName ?director ?directorName where {
    values ?director {wd:Q25089 wd:Q3772}
    
    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director .
    
    ?director <http://schema.org/name> ?directorName .
    ?film <http://schema.org/name> ?filmName .
} order by ?director
"""

print("Results")
run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q3986392'), ('filmName', 'The Concert for New York City'), ('director', 'http://www.wikidata.org/entity/Q25089'), ('directorName', 'Woody Allen')]
[('film', 'http://www.wikidata.org/entity/Q206124'), ('filmName', 'Midnight in Paris'), ('director', 'http://www.wikidata.org/entity/Q25089'), ('directorName', 'Woody Allen')]
[('film', 'http://www.wikidata.org/entity/Q682262'), ('filmName', 'Alice'), ('director', 'http://www.wikidata.org/entity/Q25089'), ('directorName', 'Woody Allen')]
[('film', 'http://www.wikidata.org/entity/Q971865'), ('filmName', "What's Up, Tiger Lily?"), ('director', 'http://www.wikidata.org/entity/Q25089'), ('directorName', 'Woody Allen')]
[('film', 'http://www.wikidata.org/entity/Q1004531'), ('filmName', 'Bullets Over Broadway'), ('director', 'http://www.wikidata.org/entity/Q25089'), ('directorName', 'Woody Allen')]
[('film', 'http://www.wikidata.org/entity/Q1354109'), ('filmName', 'Mighty Aphrodite'), ('director', 

63

List the workers for "Django Unchained" (only to check if there are some known workers like Ennio Morricone or DiCaprio)

In [19]:
queryString = """
select distinct ?worker ?workerName where {
    wd:Q571032 ?p ?worker .
    ?worker wdt:P31 wd:Q5 .
    
    ?p <http://schema.org/name> ?pName .
    ?worker <http://schema.org/name> ?workerName .
    
    filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
} order by ?workerName
"""

print("Results")
run_query(queryString)

Results
[('worker', 'http://www.wikidata.org/entity/Q231128'), ('workerName', 'Amber Tamblyn')]
[('worker', 'http://www.wikidata.org/entity/Q4817144'), ('workerName', 'Ato Essandoh')]
[('worker', 'http://www.wikidata.org/entity/Q888311'), ('workerName', 'Bob Weinstein')]
[('worker', 'http://www.wikidata.org/entity/Q357001'), ('workerName', 'Bruce Dern')]
[('worker', 'http://www.wikidata.org/entity/Q76819'), ('workerName', 'Christoph Waltz')]
[('worker', 'http://www.wikidata.org/entity/Q2912708'), ('workerName', 'Cooper Huckabee')]
[('worker', 'http://www.wikidata.org/entity/Q23010323'), ('workerName', 'Dana Gourrier')]
[('worker', 'http://www.wikidata.org/entity/Q18049086'), ('workerName', 'Danièle Watts')]
[('worker', 'http://www.wikidata.org/entity/Q5240051'), ('workerName', 'David Steen')]
[('worker', 'http://www.wikidata.org/entity/Q1278871'), ('workerName', 'Dennis Christopher')]
[('worker', 'http://www.wikidata.org/entity/Q309788'), ('workerName', 'Don Johnson')]
[('worker', 'htt

62

## Task 5. Return some numerical comparison between the two directors

### How many people worked on Tarantino's and Allen's films?

Count all the workers for each director's films

In [20]:
queryString = """
select ?director (count(distinct ?worker) as ?numWorkers) where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          ?p ?worker .
    ?worker wdt:P31 wd:Q5 .
    
    filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
} group by ?director
"""

print("Results")
run_query(queryString)

Results
[('director', 'http://www.wikidata.org/entity/Q25089'), ('numWorkers', '916')]
[('director', 'http://www.wikidata.org/entity/Q3772'), ('numWorkers', '285')]


2

We can also see that Woody Allen has directed more films than Quentin Tarantino, so it is helpful to look at the average number of workers per film.

In [21]:
queryString = """
select ?director (avg(?numWorkers) as ?avgWorkers) where {
    {
        select ?director ?film (count(distinct ?worker) as ?numWorkers) where {
            values ?director {wd:Q25089 wd:Q3772}

            ?film wdt:P31 wd:Q11424 ;
                  wdt:P57 ?director ;
                  ?p ?worker .
            ?worker wdt:P31 wd:Q5 .
            
            filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
        } group by ?director ?film
    }
    
} group by ?director
"""

print("Results")
run_query(queryString)

Results
[('director', 'http://www.wikidata.org/entity/Q25089'), ('avgWorkers', '26.764705882352941')]
[('director', 'http://www.wikidata.org/entity/Q3772'), ('avgWorkers', '33.916666666666667')]


2
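
The nested query first counts the distinct workers of each film and then averages those counts per director. The same two-step aggregation can be sketched in plain Python on made-up rows. The identifiers `A`, `B`, `f1`, `w1` and so on are placeholders, not Wikidata data.

```python
from collections import defaultdict
from statistics import mean

# (director, film, worker) rows; the repeated row mimics a worker
# linked to the same film through two different properties.
rows = [
    ("A", "f1", "w1"), ("A", "f1", "w2"), ("A", "f1", "w2"),
    ("A", "f2", "w1"), ("A", "f2", "w3"), ("A", "f2", "w4"), ("A", "f2", "w5"),
    ("B", "f3", "w1"), ("B", "f3", "w6"), ("B", "f3", "w7"),
]

# Inner query: count(distinct ?worker) grouped by director and film.
workers_per_film = defaultdict(set)
for director, film, worker in rows:
    workers_per_film[(director, film)].add(worker)

# Outer query: avg(?numWorkers) grouped by director.
counts_per_director = defaultdict(list)
for (director, _film), workers in workers_per_film.items():
    counts_per_director[director].append(len(workers))

avg_workers = {d: mean(c) for d, c in counts_per_director.items()}
print(avg_workers)
```

Here director `A` has films with 2 and 4 distinct workers, so both directors end up with an average of 3 even though `A` employs more people overall.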

### How many workers in Tarantino's movies also worked in Allen's films?

First, let's take a look at which people worked for both directors.

In [22]:
queryString = """
select ?worker ?workerName where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          ?p ?worker .
    ?worker wdt:P31 wd:Q5 .
    
    ?worker <http://schema.org/name> ?workerName .
    
    filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
} group by ?worker ?workerName
having (count(distinct ?director) > 1)
"""

print("Results")
run_query(queryString)

Results
[('worker', 'http://www.wikidata.org/entity/Q346285'), ('workerName', 'David Arnold')]
[('worker', 'http://www.wikidata.org/entity/Q185051'), ('workerName', 'Christopher Walken')]
[('worker', 'http://www.wikidata.org/entity/Q26806'), ('workerName', 'Danny DeVito')]
[('worker', 'http://www.wikidata.org/entity/Q125017'), ('workerName', 'Uma Thurman')]
[('worker', 'http://www.wikidata.org/entity/Q202148'), ('workerName', 'Burt Reynolds')]
[('worker', 'http://www.wikidata.org/entity/Q231096'), ('workerName', 'Léa Seydoux')]
[('worker', 'http://www.wikidata.org/entity/Q185724'), ('workerName', 'Mike Myers')]
[('worker', 'http://www.wikidata.org/entity/Q104061'), ('workerName', 'Steve Buscemi')]
[('worker', 'http://www.wikidata.org/entity/Q207596'), ('workerName', 'Daryl Hannah')]
[('worker', 'http://www.wikidata.org/entity/Q38111'), ('workerName', 'Leonardo DiCaprio')]
[('worker', 'http://www.wikidata.org/entity/Q76819'), ('workerName', 'Christoph Waltz')]
[('worker', 'http://www.wi

14

**14 people worked for both directors.**
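
The `group by ... having (count(distinct ?director) > 1)` clause keeps exactly the workers linked to both directors. In plain Python this amounts to collecting the distinct directors of each worker and keeping the groups with more than one. The rows below are a toy sample with placeholder worker ids, not the query output.

```python
from collections import defaultdict

ALLEN, TARANTINO = "Q25089", "Q3772"

# (director, worker) pairs with placeholder worker ids.
pairs = [
    (ALLEN, "w1"), (ALLEN, "w2"), (ALLEN, "w2"),
    (TARANTINO, "w2"), (TARANTINO, "w3"),
]

# group by ?worker, collecting the distinct directors of each worker
directors_of = defaultdict(set)
for director, worker in pairs:
    directors_of[worker].add(director)

# having (count(distinct ?director) > 1)
shared = sorted(w for w, ds in directors_of.items() if len(ds) > 1)
print(shared)  # ['w2']
```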

### Who is the most used actor by both the directors?

The BGP for actors in a film is (using cast member property):
    
```
?film wdt:P31 wd:Q11424 ;
      wdt:P161 ?actor .
```

The most used actor by **Woody Allen** is:

In [23]:
queryString = """     
select ?actor ?actorName (count(distinct ?film) as ?numFilms) where {
    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 wd:Q25089 ;
          wdt:P161 ?actor .
        
    ?actor <http://schema.org/name> ?actorName .
} group by ?actor ?actorName
order by desc(?numFilms)
limit 1
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q25089'), ('actorName', 'Woody Allen'), ('numFilms', '30')]


1

The director himself is part of the cast, so he must be excluded.

In [24]:
queryString = """
select ?actor ?actorName (count(distinct ?film) as ?numFilms) where {
    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 wd:Q25089 ;
          wdt:P161 ?actor .
        
    ?actor <http://schema.org/name> ?actorName .
    
    filter (?actor != wd:Q25089) .
} group by ?actor ?actorName
order by desc(?numFilms)
limit 1
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q202725'), ('actorName', 'Mia Farrow'), ('numFilms', '13')]


1
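
The effect of the extra `filter` can be reproduced offline with a counter over (film, actor) pairs. This is only a sketch on placeholder ids (`d` for the director, `a1` and `a2` for actors), not the data returned by the endpoint.

```python
from collections import Counter

DIRECTOR = "d"  # placeholder id for the director

# (film, cast member) pairs; the director also appears in every cast.
cast = [
    ("f1", "d"), ("f1", "a1"),
    ("f2", "d"), ("f2", "a1"), ("f2", "a2"),
    ("f3", "d"), ("f3", "a2"), ("f3", "a1"),
]

# count(distinct ?film) per actor, with filter(?actor != director)
films_per_actor = Counter(
    actor for _film, actor in set(cast) if actor != DIRECTOR
)
top_actor, num_films = films_per_actor.most_common(1)[0]
print(top_actor, num_films)  # a1 3
```

Without the `if actor != DIRECTOR` condition, the director would top the ranking, which is what happened in the first version of the query.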

And for **Quentin Tarantino**:

In [25]:
queryString = """
select ?actor ?actorName (count(distinct ?film) as ?numFilms) where {
    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 wd:Q3772 ;
          wdt:P161 ?actor .
        
    ?actor <http://schema.org/name> ?actorName .
    
    filter (?actor != wd:Q3772) .
} group by ?actor ?actorName
order by desc(?numFilms)
limit 1
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q220584'), ('actorName', 'Michael Madsen'), ('numFilms', '7')]


1

And the most used actor in general is:

In [26]:
queryString = """
select ?actor ?actorName (count(distinct ?film) as ?numFilms) where {
    values ?director {wd:Q25089 wd:Q3772}
    
    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          wdt:P161 ?actor .
        
    ?actor <http://schema.org/name> ?actorName .
    
    filter (?actor != ?director) .
} group by ?actor ?actorName
order by desc(?numFilms)
limit 1
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q202725'), ('actorName', 'Mia Farrow'), ('numFilms', '13')]


1

### What is the film with the highest number of shared actors

In [27]:
queryString = """
select ?film ?filmName (count(distinct ?sharedActor) as ?numSharedActors) (group_concat(distinct ?actorName; separator=", ") as ?sharedActorsName) where {
    values ?director {wd:Q25089 wd:Q3772}
    
    {
        select distinct ?sharedActor where {
            values ?director {wd:Q25089 wd:Q3772}

            ?film wdt:P31 wd:Q11424 ;
                  wdt:P57 ?director ;
                  wdt:P161 ?sharedActor .
        } group by ?sharedActor
        having (count(distinct ?director) > 1)
    } .
    
    ?film wdt:P31 wd:Q11424 ;
      wdt:P57 ?director ;
      wdt:P161 ?sharedActor .
    
    ?sharedActor <http://schema.org/name> ?actorName .
    ?film <http://schema.org/name> ?filmName .
} group by ?film ?filmName
order by desc(?numSharedActors)
limit 1
"""

print("Results")
run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q104123'), ('filmName', 'Pulp Fiction'), ('numSharedActors', '5'), ('sharedActorsName', 'Christopher Walken, Rosanna Arquette, Steve Buscemi, Tim Roth, Uma Thurman')]


1
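The subquery keeps only the actors who appear under both directors (`having (count(distinct ?director) > 1)`), and the outer query then counts them per film. The same grouping logic can be sketched in plain Python. The cast rows below are a hypothetical toy sample, not data from the endpoint, and the helper names are illustrative.

```python
from collections import defaultdict

# Hypothetical (director, film, actor) rows, for illustration only.
rows = [
    ("Tarantino", "film_T1", "actor_a"),
    ("Tarantino", "film_T1", "actor_b"),
    ("Tarantino", "film_T2", "actor_a"),
    ("Allen", "film_A1", "actor_a"),
    ("Allen", "film_A1", "actor_c"),
    ("Allen", "film_A2", "actor_b"),
]

def shared_actors(rows):
    """Actors credited under more than one director (the HAVING step)."""
    directors_by_actor = defaultdict(set)
    for director, _film, actor in rows:
        directors_by_actor[actor].add(director)
    return {a for a, ds in directors_by_actor.items() if len(ds) > 1}

def film_with_most_shared_actors(rows):
    """Film with the largest number of distinct shared actors."""
    shared = shared_actors(rows)
    cast = defaultdict(set)
    for _director, film, actor in rows:
        if actor in shared:
            cast[film].add(actor)
    return max(cast.items(), key=lambda item: len(item[1]))

film, actors = film_with_most_shared_actors(rows)
print(film, sorted(actors))
```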

### What is the average age of the workers for the two directors?

**P569** is the property for date of birth (found in properties of Woody Allen in the first query)

In [28]:
queryString = """
select ?directorName (avg(?workerAge) as ?avgAge) where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          ?p ?worker .
    ?worker wdt:P31 wd:Q5 ;
            wdt:P569 ?birthDate .
    
    bind(year(now()) - year(?birthDate) as ?workerAge) .
    
    ?director <http://schema.org/name> ?directorName .
    
    filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
} group by ?directorName
"""

print("Results")
run_query(queryString)

Results
[('directorName', 'Quentin Tarantino'), ('avgAge', '59.589569160997732')]
[('directorName', 'Woody Allen'), ('avgAge', '76.134196185286104')]


2

### Who are the youngest workers? (first five)

In [29]:
queryString = """
select distinct ?directorName ?worker ?workerName ?workerAge where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          ?p ?worker .
    ?worker wdt:P31 wd:Q5 ;
            wdt:P569 ?birthDate .
    
    bind(year(now()) - year(?birthDate) as ?workerAge) .
    
    ?director <http://schema.org/name> ?directorName .
    ?worker <http://schema.org/name> ?workerName .
    
    filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
} order by asc(?workerAge)
limit 5
"""

print("Results")
run_query(queryString)

Results
[('directorName', 'Quentin Tarantino'), ('worker', 'http://www.wikidata.org/entity/Q62023911'), ('workerName', 'Julia Butters'), ('workerAge', '13')]
[('directorName', 'Woody Allen'), ('worker', 'http://www.wikidata.org/entity/Q7610664'), ('workerName', 'Stephen Tenenbaum'), ('workerAge', '22')]
[('directorName', 'Quentin Tarantino'), ('worker', 'http://www.wikidata.org/entity/Q23010323'), ('workerName', 'Dana Gourrier'), ('workerAge', '22')]
[('directorName', 'Quentin Tarantino'), ('worker', 'http://www.wikidata.org/entity/Q3059246'), ('workerName', 'Ethan Maniquis'), ('workerAge', '22')]
[('directorName', 'Woody Allen'), ('worker', 'http://www.wikidata.org/entity/Q2078767'), ('workerName', 'Gary Weis'), ('workerAge', '22')]


5

### Which are the oldest workers? (first five)

In [30]:
queryString = """
select distinct ?directorName ?worker ?workerName ?workerAge where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          ?p ?worker .
    ?worker wdt:P31 wd:Q5 ;
            wdt:P569 ?birthDate .
    
    bind(year(now()) - year(?birthDate) as ?workerAge) .
    
    ?director <http://schema.org/name> ?directorName .
    ?worker <http://schema.org/name> ?workerName .
    
    filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
} order by desc(?workerAge)
limit 5
"""

print("Results")
run_query(queryString)

Results
[('directorName', 'Woody Allen'), ('worker', 'http://www.wikidata.org/entity/Q1339'), ('workerName', 'Johann Sebastian Bach'), ('workerAge', '337')]
[('directorName', 'Woody Allen'), ('worker', 'http://www.wikidata.org/entity/Q7312'), ('workerName', 'Franz Schubert'), ('workerAge', '225')]
[('directorName', 'Woody Allen'), ('worker', 'http://www.wikidata.org/entity/Q46096'), ('workerName', 'Felix Mendelssohn'), ('workerAge', '213')]
[('directorName', 'Woody Allen'), ('worker', 'http://www.wikidata.org/entity/Q991'), ('workerName', 'Fyodor Dostoyevsky'), ('workerAge', '201')]
[('directorName', 'Woody Allen'), ('worker', 'http://www.wikidata.org/entity/Q2846583'), ('workerName', 'Sir Andrew Clark, 1st Baronet'), ('workerAge', '196')]


5
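Bach shows up at 337 because `year(now()) - year(?birthDate)` measures the years elapsed since birth, whether or not the person is still alive, and it ignores month and day. Below is a minimal Python sketch of an age that stops at the date of death when one is known. `parse_wikidata_date` and `age_in_years` are hypothetical helpers, and the dates are made up purely to exercise them.

```python
from datetime import date, datetime
from typing import Optional

def parse_wikidata_date(value: str) -> date:
    """Parse a timestamp such as '2021-08-19T00:00:00Z' into a date."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").date()

def age_in_years(birth: date, death: Optional[date], today: date) -> int:
    """Completed years between birth and death (or today if still alive)."""
    end = death if death is not None else today
    before_birthday = (end.month, end.day) < (birth.month, birth.day)
    return end.year - birth.year - int(before_birthday)

# Hypothetical dates, used only to exercise the helpers.
today = date(2022, 1, 15)
born = parse_wikidata_date("1950-06-30T00:00:00Z")
died = parse_wikidata_date("2010-03-01T00:00:00Z")

print(age_in_years(born, None, today))  # age of a living person on the reference date
print(age_in_years(born, died, today))  # age frozen at the date of death
```

Passing the reference date explicitly, instead of calling `date.today()` inside the helper, keeps the result reproducible.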

What is the role of Johann Sebastian Bach and in which films?

In [31]:
queryString = """
select distinct ?film ?filmName ?directorName ?pName ?workerName where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          ?p wd:Q1339 .
        
    ?director <http://schema.org/name> ?directorName .
    ?film <http://schema.org/name> ?filmName .
    ?p <http://schema.org/name> ?pName .
}
"""

print("Results")
run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q1354121'), ('filmName', 'Melinda and Melinda'), ('directorName', 'Woody Allen'), ('pName', 'composer')]
[('film', 'http://www.wikidata.org/entity/Q845057'), ('filmName', 'Hannah and Her Sisters'), ('directorName', 'Woody Allen'), ('pName', 'composer')]


2

### Which workers are dead?

First, I need the property for the date of death. I can retrieve it from Bach's properties.

In [32]:
queryString = """
select distinct ?p ?pName where { 
    wd:Q1339 ?p ?o .
    ?p <http://schema.org/name> ?pName .
    
    filter(regex(?pName, "death"))
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1196'), ('pName', 'manner of death')]
[('p', 'http://www.wikidata.org/prop/direct/P20'), ('pName', 'place of death')]
[('p', 'http://www.wikidata.org/prop/direct/P509'), ('pName', 'cause of death')]
[('p', 'http://www.wikidata.org/prop/direct/P570'), ('pName', 'date of death')]


4

In [33]:
queryString = """
select distinct ?worker ?workerName ?deathDate ?mannerOfDeath where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          ?p ?worker .
    ?worker wdt:P31 wd:Q5 ;
            wdt:P570 ?deathDate .
    
    optional {
        ?worker wdt:P1196 ?mannerOfDeathE .
        ?mannerOfDeathE <http://schema.org/name> ?mannerOfDeath .
    }
        
    ?director <http://schema.org/name> ?directorName .
    ?worker <http://schema.org/name> ?workerName .
    
    filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
} order by desc(?deathDate)
"""

print("Results")
run_query(queryString)

Results
[('worker', 'http://www.wikidata.org/entity/Q351732'), ('workerName', 'Sonny Chiba'), ('deathDate', '2021-08-19T00:00:00Z'), ('mannerOfDeath', 'natural causes')]
[('worker', 'http://www.wikidata.org/entity/Q230188'), ('workerName', 'Olympia Dukakis'), ('deathDate', '2021-05-01T00:00:00Z')]
[('worker', 'http://www.wikidata.org/entity/Q945188'), ('workerName', 'Monte Hellman'), ('deathDate', '2021-04-20T00:00:00Z'), ('mannerOfDeath', 'accident')]
[('worker', 'http://www.wikidata.org/entity/Q230131'), ('workerName', 'Cloris Leachman'), ('deathDate', '2021-01-27T00:00:00Z'), ('mannerOfDeath', 'natural causes')]
[('worker', 'http://www.wikidata.org/entity/Q2543651'), ('workerName', 'Walter Bernstein'), ('deathDate', '2021-01-23T00:00:00Z'), ('mannerOfDeath', 'natural causes')]
[('worker', 'http://www.wikidata.org/entity/Q1366768'), ('workerName', 'Tom Lister, Jr.'), ('deathDate', '2020-12-10T00:00:00Z'), ('mannerOfDeath', 'natural causes')]
[('worker', 'http://www.wikidata.org/entit

286

### Which film has the most dead actors?

In [34]:
queryString = """
select distinct ?filmName ?directorName (count(distinct ?actor) as ?numDeads) where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          wdt:P161 ?actor .
    ?actor wdt:P31 wd:Q5 ;
           wdt:P570 ?deathDate .
        
    ?director <http://schema.org/name> ?directorName .
    ?film <http://schema.org/name> ?filmName .
} group by ?filmName ?directorName
order by desc(?numDeads)
limit 1
"""

print("Results")
run_query(queryString)

Results
[('filmName', 'Zelig'), ('directorName', 'Woody Allen'), ('numDeads', '33')]


1

### What is the average age of the living workers for the two directors?

In [35]:
queryString = """
select ?directorName (avg(?workerAge) as ?avgAge) where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          ?p ?worker .
    ?worker wdt:P31 wd:Q5 ;
            wdt:P569 ?birthDate .
    
    bind(year(now()) - year(?birthDate) as ?workerAge) .
    
    ?director <http://schema.org/name> ?directorName .
    
    filter not exists { ?worker wdt:P570 ?deathDate } .
    filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
} group by ?directorName
"""

print("Results")
run_query(queryString)

Results
[('directorName', 'Quentin Tarantino'), ('avgAge', '57.396464646464646')]
[('directorName', 'Woody Allen'), ('avgAge', '67.927484333034915')]


2

### Who are the oldest living workers? (first five)

In [36]:
queryString = """
select distinct ?directorName ?worker ?workerName ?workerAge where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          ?p ?worker .
    ?worker wdt:P31 wd:Q5 ;
            wdt:P569 ?birthDate .
    
    bind(year(now()) - year(?birthDate) as ?workerAge) .
    
    ?director <http://schema.org/name> ?directorName .
    ?worker <http://schema.org/name> ?workerName .
    
    filter not exists { ?worker wdt:P570 ?deathDate } .
    filter(?p not in (wdt:P674, wdt:P453, wdt:P2354, wdt:P859, wdt:P136, wdt:P825, wdt:P155)) .
} order by desc(?workerAge)
limit 5
"""

print("Results")
run_query(queryString)

Results
[('directorName', 'Woody Allen'), ('worker', 'http://www.wikidata.org/entity/Q5998104'), ('workerName', 'Maricel Álvarez'), ('workerAge', '121')]
[('directorName', 'Woody Allen'), ('worker', 'http://www.wikidata.org/entity/Q22563868'), ('workerName', 'Sam Gray'), ('workerAge', '99')]
[('directorName', 'Woody Allen'), ('worker', 'http://www.wikidata.org/entity/Q5498657'), ('workerName', 'Frederick Rolf'), ('workerAge', '96')]
[('directorName', 'Woody Allen'), ('worker', 'http://www.wikidata.org/entity/Q1209703'), ('workerName', 'Dick Hyman'), ('workerAge', '95')]
[('directorName', 'Quentin Tarantino'), ('worker', 'http://www.wikidata.org/entity/Q943390'), ('workerName', 'Clu Gulager'), ('workerAge', '94')]


5
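`filter not exists { ?worker wdt:P570 ?deathDate }` treats every worker without a recorded date of death as alive, so an age of 121 may point to a missing date of death or a wrong date of birth rather than to a real age. A minimal sketch of a client-side sanity check follows. The cut-off of 110 years is an arbitrary assumption, and `split_by_plausibility` is a hypothetical helper.

```python
# Rows copied from the output above: (workerName, workerAge as returned).
rows = [
    ("Maricel Álvarez", "121"),
    ("Sam Gray", "99"),
    ("Frederick Rolf", "96"),
    ("Dick Hyman", "95"),
    ("Clu Gulager", "94"),
]

MAX_PLAUSIBLE_AGE = 110  # assumption: an arbitrary cut-off, not a Wikidata rule

def split_by_plausibility(rows, limit=MAX_PLAUSIBLE_AGE):
    """Separate names with a plausible age from names that need a manual check."""
    plausible, suspicious = [], []
    for name, age in rows:
        # Ages arrive as strings in the JSON bindings, so convert before comparing.
        (plausible if int(age) <= limit else suspicious).append(name)
    return plausible, suspicious

plausible, suspicious = split_by_plausibility(rows)
print("plausible:", plausible)
print("to check:", suspicious)
```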

### Country of birth of the actors

**P19** is the property for birth place. Let's see the birth places for some actors.

In [37]:
queryString = """
select distinct ?actor ?actorName ?birthPlace ?birthPlaceName where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          wdt:P161 ?actor .
    ?actor wdt:P31 wd:Q5 ;
           wdt:P19 ?birthPlace .
        
    ?birthPlace <http://schema.org/name> ?birthPlaceName .
    ?actor <http://schema.org/name> ?actorName .
} limit 5
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q2915471'), ('actorName', 'Deborah Rush'), ('birthPlace', 'http://www.wikidata.org/entity/Q3017853'), ('birthPlaceName', 'Chatham Borough')]
[('actor', 'http://www.wikidata.org/entity/Q3547009'), ('actorName', 'Titos Vandis'), ('birthPlace', 'http://www.wikidata.org/entity/Q17780761'), ('birthPlaceName', 'Neo Faliro')]
[('actor', 'http://www.wikidata.org/entity/Q185724'), ('actorName', 'Mike Myers'), ('birthPlace', 'http://www.wikidata.org/entity/Q1025401'), ('birthPlaceName', 'Scarborough')]
[('actor', 'http://www.wikidata.org/entity/Q21500776'), ('actorName', 'John Sehil'), ('birthPlace', 'http://www.wikidata.org/entity/Q132790'), ('birthPlaceName', 'Biarritz')]
[('actor', 'http://www.wikidata.org/entity/Q1985488'), ('actorName', 'Nick Offerman'), ('birthPlace', 'http://www.wikidata.org/entity/Q40345'), ('birthPlaceName', 'Joliet')]


5

Let's see the properties for Scarborough to find the property for country.

In [38]:
queryString = """
select distinct ?p ?pName ?o ?oName where {
    wd:Q1025401 ?p ?o .
        
    ?p <http://schema.org/name> ?pName .
    optional { ?o <http://schema.org/name> ?oName . }
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q4286337'), ('oName', 'city district')]
[('p', 'http://www.wikidata.org/prop/direct/P17'), ('pName', 'country'), ('o', 'http://www.wikidata.org/entity/Q16'), ('oName', 'Canada')]
[('p', 'http://www.wikidata.org/prop/direct/P131'), ('pName', 'located in the administrative territorial entity'), ('o', 'http://www.wikidata.org/entity/Q172'), ('oName', 'Toronto')]
[('p', 'http://www.wikidata.org/prop/direct/P361'), ('pName', 'part of'), ('o', 'http://www.wikidata.org/entity/Q172'), ('oName', 'Toronto')]
[('p', 'http://www.wikidata.org/prop/direct/P421'), ('pName', 'located in time zone'), ('o', 'http://www.wikidata.org/entity/Q941023'), ('oName', 'Eastern Time Zone')]
[('p', 'http://www.wikidata.org/prop/direct/P131'), ('pName', 'located in the administrative territorial entity'), ('o', 'http://www.wikidata.org/entity/Q1904'), ('oName', 'Ontario')]
[('p', 'http://www.wi

32

**P17** is the property for country. The BGP for the country of birth is:

```
?human wdt:P19/wdt:P17 ?birthCountry .
```
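The `/` operator is a SPARQL sequence path: it follows the first property and then the second one from whatever the first step reached. The sketch below walks such a path over two triples taken from the results above (Mike Myers, Scarborough, Canada). `follow` and `sequence_path` are hypothetical helpers written only for this illustration.

```python
# Toy triples taken from the results above.
triples = {
    ("Q185724", "P19", "Q1025401"),  # Mike Myers -> place of birth -> Scarborough
    ("Q1025401", "P17", "Q16"),      # Scarborough -> country -> Canada
}

def follow(subjects, prop):
    """All objects reachable from any of the subjects through one property."""
    return {o for s, p, o in triples if p == prop and s in subjects}

def sequence_path(start, props):
    """Follow the properties one after the other, like P19/P17."""
    current = {start}
    for prop in props:
        current = follow(current, prop)
    return current

print(sequence_path("Q185724", ["P19", "P17"]))
```

Asking for `P17` alone from Mike Myers returns nothing in this toy data, which is why the intermediate birth place step is needed.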

In [39]:
queryString = """
select distinct ?actor ?actorName ?birthPlace ?birthPlaceName where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          wdt:P161 ?actor .
    ?actor wdt:P31 wd:Q5 ;
           wdt:P19/wdt:P17 ?birthPlace .
        
    ?birthPlace <http://schema.org/name> ?birthPlaceName .
    ?actor <http://schema.org/name> ?actorName .
} limit 5
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q448005'), ('actorName', 'David Ortiz'), ('birthPlace', 'http://www.wikidata.org/entity/Q786'), ('birthPlaceName', 'Dominican Republic')]
[('actor', 'http://www.wikidata.org/entity/Q3292235'), ('actorName', 'Marie-Sohna Condé'), ('birthPlace', 'http://www.wikidata.org/entity/Q1008'), ('birthPlaceName', 'Ivory Coast')]
[('actor', 'http://www.wikidata.org/entity/Q318991'), ('actorName', 'Gad Elmaleh'), ('birthPlace', 'http://www.wikidata.org/entity/Q1028'), ('birthPlaceName', 'Morocco')]
[('actor', 'http://www.wikidata.org/entity/Q19629645'), ('actorName', 'Sammi Rotibi'), ('birthPlace', 'http://www.wikidata.org/entity/Q1033'), ('birthPlaceName', 'Nigeria')]
[('actor', 'http://www.wikidata.org/entity/Q1986254'), ('actorName', 'Sonia Rolland'), ('birthPlace', 'http://www.wikidata.org/entity/Q1037'), ('birthPlaceName', 'Rwanda')]


5

In [40]:
queryString = """
select distinct ?directorName ?birthPlace ?birthPlaceName (count(distinct ?actor) as ?numActors) where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          wdt:P161 ?actor .
    ?actor wdt:P31 wd:Q5 ;
           wdt:P19/wdt:P17 ?birthPlace .
        
    ?birthPlace <http://schema.org/name> ?birthPlaceName .
    ?actor <http://schema.org/name> ?actorName .
    ?director <http://schema.org/name> ?directorName .
} group by ?directorName ?birthPlace ?birthPlaceName
order by asc(?directorName) desc(?numActors)
"""

print("Results")
run_query(queryString)

Results
[('directorName', 'Quentin Tarantino'), ('birthPlace', 'http://www.wikidata.org/entity/Q30'), ('birthPlaceName', 'United States of America'), ('numActors', '175')]
[('directorName', 'Quentin Tarantino'), ('birthPlace', 'http://www.wikidata.org/entity/Q183'), ('birthPlaceName', 'Germany'), ('numActors', '25')]
[('directorName', 'Quentin Tarantino'), ('birthPlace', 'http://www.wikidata.org/entity/Q145'), ('birthPlaceName', 'United Kingdom'), ('numActors', '7')]
[('directorName', 'Quentin Tarantino'), ('birthPlace', 'http://www.wikidata.org/entity/Q17'), ('birthPlaceName', 'Japan'), ('numActors', '7')]
[('directorName', 'Quentin Tarantino'), ('birthPlace', 'http://www.wikidata.org/entity/Q142'), ('birthPlaceName', 'France'), ('numActors', '5')]
[('directorName', 'Quentin Tarantino'), ('birthPlace', 'http://www.wikidata.org/entity/Q408'), ('birthPlaceName', 'Australia'), ('numActors', '4')]
[('directorName', 'Quentin Tarantino'), ('birthPlace', 'http://www.wikidata.org/entity/Q38')

68

## Task 6. Is the maximum budget of a Tarantino movie higher than the maximum budget of an Allen movie?

Get the film properties whose names contain "cost" or "budget".

In [41]:
queryString = """
select distinct ?p ?pName where { 
    ?film wdt:P31 wd:Q11424 ;
          ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    filter regex(?pName, "cost|budget", "i") .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P2130'), ('pName', 'cost')]
[('p', 'http://www.wikidata.org/prop/direct/P2515'), ('pName', 'costume designer')]
[('p', 'http://www.wikidata.org/prop/direct/P2769'), ('pName', 'budget')]


3

The property cost is **P2130**. The property budget is **P2769**.
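The pattern `cost|budget` also matches "costume designer", because it is a substring match. One way to discard such false positives is to post-filter the labels on the Python side with word boundaries. This is only a sketch: whether the endpoint's own regex dialect accepts `\b` is not verified in this notebook, so the filter is applied after the query.

```python
import re

# Property labels returned by the query above.
labels = ["cost", "costume designer", "budget"]

# \b is a word boundary in Python's re module, so "costume" no longer matches.
whole_word = re.compile(r"\b(cost|budget)\b", re.IGNORECASE)

matches = [label for label in labels if whole_word.search(label)]
print(matches)
```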

In [42]:
queryString = """
select ?filmName ?cost ?budget ?directorName where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director .
    optional { ?film wdt:P2130 ?cost } .
    optional { ?film wdt:P2769 ?budget } .
    
    ?film <http://schema.org/name> ?filmName .
    ?director <http://schema.org/name> ?directorName .
}
"""

print("Results")
run_query(queryString)

Results
[('filmName', 'The Hateful Eight'), ('cost', '44000000'), ('directorName', 'Quentin Tarantino')]
[('filmName', 'Pulp Fiction'), ('cost', '8000000'), ('directorName', 'Quentin Tarantino')]
[('filmName', 'Midnight in Paris'), ('cost', '17000000'), ('directorName', 'Woody Allen')]
[('filmName', 'Match Point'), ('cost', '15000000'), ('directorName', 'Woody Allen')]
[('filmName', 'Bananas'), ('cost', '2000000'), ('directorName', 'Woody Allen')]
[('filmName', 'You Will Meet a Tall Dark Stranger'), ('cost', '22000000'), ('directorName', 'Woody Allen')]
[('filmName', 'Sin City'), ('cost', '40000000'), ('directorName', 'Quentin Tarantino')]
[('filmName', 'Kill Bill Volume 1'), ('cost', '30000000'), ('directorName', 'Quentin Tarantino')]
[('filmName', 'Anything Else'), ('cost', '18000000'), ('directorName', 'Woody Allen')]
[('filmName', 'Jackie Brown'), ('cost', '12000000'), ('directorName', 'Quentin Tarantino')]
[('filmName', 'Inglourious Basterds'), ('cost', '70000000'), ('directorName

63

As we can see, Wonder Wheel by Woody Allen is the only film with the budget property, so the budgets of the two directors' films cannot be compared directly. The next query therefore treats the cost of a film (**P2130**) as its maximum budget. However, as the previous query shows, only some films have the cost property, so the result of the ASK query may be incorrect.

In [43]:
queryString = """
ask where {
    {
        select (max(?costQuentin) as ?maxCostQuentin) where {
            ?film wdt:P31 wd:Q11424 ;
                  wdt:P57 wd:Q3772 ;
                  wdt:P2130 ?costQuentin .
        }
    } .
    {
        select (max(?costWoody) as ?maxCostWoody) where {
            ?film wdt:P31 wd:Q11424 ;
              wdt:P57 wd:Q25089 ;
              wdt:P2130 ?costWoody .
        }
    } .
    filter (?maxCostQuentin > ?maxCostWoody)
}
"""

print("Results")
run_ask_query(queryString)

Results


{'head': {'link': []}, 'boolean': True}
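The same comparison can be cross-checked on the client side. The sketch below uses only the complete rows visible in the truncated output of the previous query, so its maxima describe that sample and not the full result set. `max_cost_by_director` is a hypothetical helper.

```python
# Rows copied from the visible, untruncated part of the previous output.
rows = [
    ("The Hateful Eight", "44000000", "Quentin Tarantino"),
    ("Pulp Fiction", "8000000", "Quentin Tarantino"),
    ("Midnight in Paris", "17000000", "Woody Allen"),
    ("Match Point", "15000000", "Woody Allen"),
    ("Bananas", "2000000", "Woody Allen"),
    ("You Will Meet a Tall Dark Stranger", "22000000", "Woody Allen"),
    ("Sin City", "40000000", "Quentin Tarantino"),
    ("Kill Bill Volume 1", "30000000", "Quentin Tarantino"),
    ("Anything Else", "18000000", "Woody Allen"),
    ("Jackie Brown", "12000000", "Quentin Tarantino"),
]

def max_cost_by_director(rows):
    """Highest cost per director, comparing the values as numbers."""
    best = {}
    for _film, cost, director in rows:
        # Bindings carry numbers as strings; as strings, "8000000" > "44000000".
        value = int(cost)
        best[director] = max(best.get(director, value), value)
    return best

best = max_cost_by_director(rows)
print(best)
print(best["Quentin Tarantino"] > best["Woody Allen"])
```

On this sample the comparison agrees with the `True` returned by the ASK query.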

## Task 7. Whose films have more Academy Award nominations, and who has won more Academy Awards (counting the awards of his films, not only personal awards)?

First of all, looking at the properties for Woody Allen, I can see that **P1411** is the property for an award nomination and **P166** is the property for a received award. **Q103360** is the Academy Award for Best Director. Let's see the properties of this entity.

In [44]:
queryString = """
select distinct ?p ?pName ?o ?oName where { 
    wd:Q103360 ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    optional { ?o <http://schema.org/name> ?oName . }
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q96474679'), ('oName', 'award for best direction')]
[('p', 'http://www.wikidata.org/prop/direct/P1346'), ('pName', 'winner'), ('o', 'http://www.wikidata.org/entity/Q160726'), ('oName', 'Ang Lee')]
[('p', 'http://www.wikidata.org/prop/direct/P1346'), ('pName', 'winner'), ('o', 'http://www.wikidata.org/entity/Q55215'), ('oName', 'Alejandro González Iñárritu')]
[('p', 'http://www.wikidata.org/prop/direct/P361'), ('pName', 'part of'), ('o', 'http://www.wikidata.org/entity/Q19020'), ('oName', 'Academy Awards')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q19020'), ('oName', 'Academy Awards')]
[('p', 'http://www.wikidata.org/prop/direct/P17'), ('pName', 'country'), ('o', 'http://www.wikidata.org/entity/Q30'), ('oName', 'United States of America')]
[('p', 'http://www.wikidata.org/prop/direct/P1346'), (

20

So, **Q19020** is the entity for Academy Awards. The BGP for Academy Awards is

```
?award wdt:P31? wd:Q19020 .
```
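The `?` after `wdt:P31` makes the step optional (a zero-or-one path), so the pattern matches both `wd:Q19020` itself and anything that is a direct instance of it. Here is a small sketch of that semantics over one triple taken from the results above. `matches_zero_or_one` is a hypothetical helper.

```python
# Toy triple from the results above: the Academy Award for Best Director
# (Q103360) is an instance of (P31) Academy Awards (Q19020).
triples = {("Q103360", "P31", "Q19020")}

def matches_zero_or_one(subject, prop, target):
    """True if subject is the target itself (zero steps) or reaches it in one step."""
    return subject == target or (subject, prop, target) in triples

print(matches_zero_or_one("Q19020", "P31", "Q19020"))   # zero steps
print(matches_zero_or_one("Q103360", "P31", "Q19020"))  # one step
print(matches_zero_or_one("Q11424", "P31", "Q19020"))   # film class: no match
```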

Let's also see the properties of all the Academy Awards.

In [45]:
queryString = """
select distinct ?p ?pName where {
    ?award wdt:P31? wd:Q19020 ;
           ?p ?o.
    
    ?p <http://schema.org/name> ?pName .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1407'), ('pName', 'MusicBrainz series ID')]
[('p', 'http://www.wikidata.org/prop/direct/P101'), ('pName', 'field of work')]
[('p', 'http://www.wikidata.org/prop/direct/P1225'), ('pName', 'U.S. National Archives Identifier')]
[('p', 'http://www.wikidata.org/prop/direct/P1296'), ('pName', 'Gran Enciclopèdia Catalana ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('pName', 'described by source')]
[('p', 'http://www.wikidata.org/prop/direct/P1346'), ('pName', 'winner')]
[('p', 'http://www.wikidata.org/prop/direct/P138'), ('pName', 'named after')]
[('p', 'http://www.wikidata.org/prop/direct/P1417'), ('pName', 'Encyclopædia Britannica Online ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1424'), ('pName', "topic's main template")]
[('p', 'http://www.wikidata.org/prop/direct/P154'), ('pName', 'logo image')]
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('pName', 'award received')]
[('p', 'http://www.wikidata.org/prop/d

82

The Academy Awards that Allen has won or been nominated for:

In [46]:
queryString = """
select distinct ?award ?pName ?awardName where {
    wd:Q25089 ?p ?award .
    ?award wdt:P31? wd:Q19020 .
    
    ?award <http://schema.org/name> ?awardName .
    ?p <http://schema.org/name> ?pName .
}
"""

print("Results")
run_query(queryString)

Results
[('award', 'http://www.wikidata.org/entity/Q103360'), ('pName', 'nominated for'), ('awardName', 'Academy Award for Best Director')]
[('award', 'http://www.wikidata.org/entity/Q103360'), ('pName', 'award received'), ('awardName', 'Academy Award for Best Director')]
[('award', 'http://www.wikidata.org/entity/Q41417'), ('pName', 'nominated for'), ('awardName', 'Academy Award for Best Writing, Original Screenplay')]
[('award', 'http://www.wikidata.org/entity/Q41417'), ('pName', 'award received'), ('awardName', 'Academy Award for Best Writing, Original Screenplay')]
[('award', 'http://www.wikidata.org/entity/Q103916'), ('pName', 'nominated for'), ('awardName', 'Academy Award for Best Actor')]


5

The nominations for each director, both personal and for the films they directed, are:

In [47]:
queryString = """
# DISTINCT is deliberately omitted: the same award can be nominated for several films, and each nomination must be counted

select ?director ?directorName (count(?nominationAward) as ?numAwards) where {
    {
        select  ?nominationAward ?director where {
            values ?director {wd:Q25089 wd:Q3772}

            ?film wdt:P31 wd:Q11424 ;
                  wdt:P57 ?director ;
                  wdt:P1411 ?nominationAward .
            ?nominationAward wdt:P31? wd:Q19020 .
        }
    }
    union
    {
        select ?nominationAward ?director where {
            values ?director {wd:Q25089 wd:Q3772}

            ?director wdt:P1411 ?nominationAward .
            ?nominationAward wdt:P31? wd:Q19020 .
        }
    }
    
    ?director <http://schema.org/name> ?directorName .
} group by ?director ?directorName
"""

print("Results")
run_query(queryString)

Results
[('director', 'http://www.wikidata.org/entity/Q3772'), ('directorName', 'Quentin Tarantino'), ('numAwards', '26')]
[('director', 'http://www.wikidata.org/entity/Q25089'), ('directorName', 'Woody Allen'), ('numAwards', '55')]


2

Of course, Woody Allen has directed many more films than Tarantino. We can also consider the average number of nominations per film for the two directors.

In [48]:
queryString = """
select ?director ?directorName (avg(?numNominations) as ?avgAwards) where {
    {
        select ?film ?director (count(?nominationAward) as ?numNominations) where {
            values ?director {wd:Q25089 wd:Q3772}

            ?film wdt:P31 wd:Q11424 ;
                  wdt:P57 ?director ;
                  wdt:P1411 ?nominationAward .
            ?nominationAward wdt:P31? wd:Q19020 .
        } group by ?film ?director
    } .
    
    ?director <http://schema.org/name> ?directorName .
} group by ?director ?directorName
"""

print("Results")
run_query(queryString)

Results
[('director', 'http://www.wikidata.org/entity/Q25089'), ('directorName', 'Woody Allen'), ('avgAwards', '2.736842105263158')]
[('director', 'http://www.wikidata.org/entity/Q3772'), ('directorName', 'Quentin Tarantino'), ('avgAwards', '4.8')]


2
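The inner subquery requires a `wdt:P1411` triple, so films without any Academy Award nomination never enter the average: the figures above are averages per nominated film, not per film directed. The sketch below contrasts the two definitions on hypothetical toy data. The directors "A" and "B", the film names, and both helper functions are made up for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical (director, film) pairs, one per nomination triple.
nominations = [
    ("A", "film_1"), ("A", "film_1"), ("A", "film_2"),
    ("B", "film_3"), ("B", "film_3"), ("B", "film_3"), ("B", "film_3"),
]
# Hypothetical full filmographies; film_4 has no nomination at all.
films = {"A": ["film_1", "film_2", "film_4"], "B": ["film_3"]}

# Count per (director, film), mirroring the inner GROUP BY ?film ?director.
per_film = Counter(nominations)

def avg_over_nominated(director):
    """Average over films with at least one nomination (what the query does)."""
    return mean(n for (d, _f), n in per_film.items() if d == director)

def avg_over_all_films(director):
    """Average over every film directed, counting zero for the others."""
    return mean(per_film.get((director, f), 0) for f in films[director])

print(avg_over_nominated("A"), avg_over_all_films("A"))
print(avg_over_nominated("B"), avg_over_all_films("B"))
```

A director with many films that were never nominated looks better under the first definition than under the second, which matters when comparing filmographies of very different sizes.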

The film with the most Academy Award nominations is:

In [49]:
queryString = """
select ?film ?filmName ?directorName (count(?nominationAward) as ?numAwards) where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          wdt:P1411 ?nominationAward .
    ?nominationAward wdt:P31? wd:Q19020 .
    
    ?director <http://schema.org/name> ?directorName .
    ?film <http://schema.org/name> ?filmName .
} group by ?film ?filmName ?directorName
order by desc(?numAwards)
limit 1
"""

print("Results")
run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q153723'), ('filmName', 'Inglourious Basterds'), ('directorName', 'Quentin Tarantino'), ('numAwards', '8')]


1

As for the awards won:

In [50]:
queryString = """
# DISTINCT is deliberately omitted: the same award can be won by several films, and each win must be counted

select ?director ?directorName (count(?winAward) as ?numAwards) where {
    {
        select  ?winAward ?director where {
            values ?director {wd:Q25089 wd:Q3772}

            ?film wdt:P31 wd:Q11424 ;
                  wdt:P57 ?director ;
                  wdt:P166 ?winAward .
            ?winAward wdt:P31? wd:Q19020 .
        }
    }
    union
    {
        select ?winAward ?director where {
            values ?director {wd:Q25089 wd:Q3772}

            ?director wdt:P166 ?winAward .
            ?winAward wdt:P31? wd:Q19020 .
        }
    }
    
    ?director <http://schema.org/name> ?directorName .
} group by ?director ?directorName
"""

print("Results")
run_query(queryString)

Results
[('director', 'http://www.wikidata.org/entity/Q3772'), ('directorName', 'Quentin Tarantino'), ('numAwards', '8')]
[('director', 'http://www.wikidata.org/entity/Q25089'), ('directorName', 'Woody Allen'), ('numAwards', '14')]


2

In [51]:
queryString = """
select ?director ?directorName (avg(?numWins) as ?avgAwards) where {
    {
        select ?film ?director (count(?winAward) as ?numWins) where {
            values ?director {wd:Q25089 wd:Q3772}

            ?film wdt:P31 wd:Q11424 ;
                  wdt:P57 ?director ;
                  wdt:P166 ?winAward .
            ?winAward wdt:P31? wd:Q19020 .
        } group by ?film ?director
    } .
    
    ?director <http://schema.org/name> ?directorName .
} group by ?director ?directorName
"""

print("Results")
run_query(queryString)

Results
[('director', 'http://www.wikidata.org/entity/Q25089'), ('directorName', 'Woody Allen'), ('avgAwards', '1.714285714285714')]
[('director', 'http://www.wikidata.org/entity/Q3772'), ('directorName', 'Quentin Tarantino'), ('avgAwards', '1.4')]


2

In [52]:
queryString = """
select ?film ?filmName ?directorName (count(?winAward) as ?numAwards) where {
    values ?director {wd:Q25089 wd:Q3772}

    ?film wdt:P31 wd:Q11424 ;
          wdt:P57 ?director ;
          wdt:P166 ?winAward .
    ?winAward wdt:P31? wd:Q19020 .
    
    ?director <http://schema.org/name> ?directorName .
    ?film <http://schema.org/name> ?filmName .
} group by ?film ?filmName ?directorName
order by desc(?numAwards)
limit 1
"""

print("Results")
run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q233464'), ('filmName', 'Annie Hall'), ('directorName', 'Woody Allen'), ('numAwards', '4')]


1

#### Recap

Woody Allen has more nominations and has won more Academy Awards than Tarantino in absolute terms (55 nominations against 26 and 14 awards against 8, counting both personal awards and those of the films directed). The film with the most nominations is "Inglourious Basterds" by Quentin Tarantino (8). However, Tarantino's films have a higher average number of nominations than Allen's (4.8 against about 2.7 per nominated film).

As for the awards received, Allen again comes out ahead of Tarantino, both in absolute terms and on average (about 1.7 against 1.4 awards per award-winning film). The film that received the most awards is "Annie Hall" by Woody Allen (4).