# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
    is the BGP returning a human-readable name of a property or a class in Wikidata.
    
    

In [36]:
## SETUP used later

import sys
import os
import json
import pandas as pd
sys.path.insert(1, '/locale/data/jupyter/prando/wd-project/src')
import gt_modules.evaluation as evaluation
from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-batman_g_t-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString,verbose = True):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://gracevirtuoso.dei.unipd.it/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings'])==0:
            print("Empty")
            return []
        array = []
        for bindings in json_results['results']['bindings']:
            app =  [ (var, value['value'])  for var, value in bindings.items() ] 
            if verbose:
                print( app)
            array.append(app)
        if verbose:
            print(len(array))
        return array

    except Exception as e :
        print("The operation failed", e)
        # return an empty list so that callers can still iterate over the result
        return []
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://gracevirtuoso.dei.unipd.it/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
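
Each row returned by `run_query` is a list of `(variable, value)` tuples, which is awkward to index by position. The following is a minimal sketch of a helper that turns such rows into dictionaries keyed by variable name. The name `rows_to_dicts` is hypothetical and is not used elsewhere in this notebook; the sample rows only mimic the shape of the output.

```python
def rows_to_dicts(rows):
    """Convert run_query-style rows into dictionaries keyed by variable name."""
    # `rows or []` also covers the case where a failed query returned None.
    return [dict(row) for row in (rows or [])]


# Sample rows in the same shape that run_query returns.
sample_rows = [
    [("film", "http://www.wikidata.org/entity/Q116852"), ("name", "Batman")],
    [("film", "http://www.wikidata.org/entity/Q163872"), ("name", "The Dark Knight")],
]

records = rows_to_dicts(sample_rows)
for record in records:
    print(record["name"], record["film"])
```

Accessing `record["name"]` instead of `row[1][1]` keeps later cells readable when the order of the selected variables changes.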

# Movie Workflow Series ("The Batman movies explorative search") 

Consider the following exploratory scenario:


> We are interested in movies about Batman. We want to investigate the differences between the various series of films produced in different decades.


## Useful URIs for the current workflow
The following are given:

| IRI            | Description             | Role      |
| -------------- | ----------------------- | --------- |
| `wdt:P1647`    | subproperty             | predicate |
| `wdt:P31`      | instance of             | predicate |
| `wdt:P279`     | subclass                | predicate |
| `wdt:P4969`    | derivative work         | predicate |
| `wd:Q2695156`  | Batman                  | node      |
| `wd:Q25191`    | Christopher Nolan       | node      |
| `wd:Q12859908` | The Dark Knight Trilogy | node      |



Also consider

```
wd:Q25191 ?p ?obj .
```

is the BGP to retrieve all **properties of Christopher Nolan**.

The workload should:


1. Investigate the works (aka derivative works) related to Batman and identify the movies. Return the movies along with the year of production and the director.

2. Return the main Batman movie series produced in the last four decades and compare them in terms of length, number of actors involved, and costs.

3. Investigate which workers (writers, actors, etc.) have had a role in the most Batman movies so far.

4. Compare the ratings of the single movies and of the series. Identify the movie with the highest rating from the critics and the "best" series overall.

5. Return how many actors who are members of the cast of the "Dark Knight Trilogy" by Christopher Nolan have a [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2.

In [5]:
## startup the evaluation
# setup the file and create the empty json
ipname = "batman.ipynb"
pt = os.getcwd()+os.sep+ipname
evaluation.setup(pt)

The index of this workflow is: 5_1


## Task 1
Investigate the works (aka derivative works) related to Batman and identify the movies. Return the movies along with the year of production and the director.

In [6]:
# show all the derivative works related to the batman
queryString = """
SELECT DISTINCT ?obj ?pName
WHERE { 
    wd:Q2695156 wdt:P4969 ?obj .
    ?obj <http://schema.org/name> ?pName .

}
"""

print("Results")
x = run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q1339570'), ('pName', 'Batman Beyond')]
[('obj', 'http://www.wikidata.org/entity/Q635933'), ('pName', 'The Batman')]
[('obj', 'http://www.wikidata.org/entity/Q673517'), ('pName', 'Batman: The Animated Series')]
[('obj', 'http://www.wikidata.org/entity/Q958989'), ('pName', 'Nite Owl')]
[('obj', 'http://www.wikidata.org/entity/Q221345'), ('pName', 'Batman Forever')]
[('obj', 'http://www.wikidata.org/entity/Q94496874'), ('pName', 'Bruce Wayne')]
[('obj', 'http://www.wikidata.org/entity/Q288811'), ('pName', 'Batman: Arkham City')]
[('obj', 'http://www.wikidata.org/entity/Q21095079'), ('pName', 'Batman: Bad Blood')]
[('obj', 'http://www.wikidata.org/entity/Q189054'), ('pName', 'Batman Returns')]
[('obj', 'http://www.wikidata.org/entity/Q10826261'), ('pName', 'Batman: Arkham Origins')]
[('obj', 'http://www.wikidata.org/entity/Q10856847'), ('pName', 'Batman: Arkham Origins Blackgate')]
[('obj', 'http://www.wikidata.org/entity/Q15091564'), ('pNa

To identify the movies, I must first find which class represents films.

In [7]:
# find the classes of which these derivative works are instances
queryString = """
SELECT DISTINCT ?inst ?iName
WHERE { 
    wd:Q2695156 wdt:P4969 ?obj .
    ?obj wdt:P31 ?inst.
    ?inst <http://schema.org/name> ?iName .

}
"""

print("Results")
x = run_query(queryString)

Results
[('inst', 'http://www.wikidata.org/entity/Q11424'), ('iName', 'film')]
[('inst', 'http://www.wikidata.org/entity/Q15632617'), ('iName', 'fictional human')]
[('inst', 'http://www.wikidata.org/entity/Q15711870'), ('iName', 'animated character')]
[('inst', 'http://www.wikidata.org/entity/Q15773317'), ('iName', 'television character')]
[('inst', 'http://www.wikidata.org/entity/Q5398426'), ('iName', 'television series')]
[('inst', 'http://www.wikidata.org/entity/Q581714'), ('iName', 'animated series')]
[('inst', 'http://www.wikidata.org/entity/Q7889'), ('iName', 'video game')]
[('inst', 'http://www.wikidata.org/entity/Q1114461'), ('iName', 'comics character')]
[('inst', 'http://www.wikidata.org/entity/Q1307329'), ('iName', 'extraterrestrials in fiction')]
[('inst', 'http://www.wikidata.org/entity/Q1569167'), ('iName', 'video game character')]
[('inst', 'http://www.wikidata.org/entity/Q95074'), ('iName', 'fictional character')]
[('inst', 'http://www.wikidata.org/entity/Q63998451'), (

`wd:Q11424` is the class of films. I can now list all the Batman films.

In [8]:
# show the batman movies
queryString = """
SELECT DISTINCT ?film ?name
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424.
    ?film <http://schema.org/name> ?name .

}
"""

print("Results")
x = run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q221345'), ('name', 'Batman Forever')]
[('film', 'http://www.wikidata.org/entity/Q21095079'), ('name', 'Batman: Bad Blood')]
[('film', 'http://www.wikidata.org/entity/Q189054'), ('name', 'Batman Returns')]
[('film', 'http://www.wikidata.org/entity/Q116852'), ('name', 'Batman')]
[('film', 'http://www.wikidata.org/entity/Q163872'), ('name', 'The Dark Knight')]
[('film', 'http://www.wikidata.org/entity/Q166262'), ('name', 'Batman Begins')]
[('film', 'http://www.wikidata.org/entity/Q189330'), ('name', 'The Dark Knight Rises')]
[('film', 'http://www.wikidata.org/entity/Q276523'), ('name', 'Batman & Robin')]
8


To return the movies with the year of production and the director, I have to find these two properties.

In [9]:
# find the year of production
queryString = """
SELECT DISTINCT ?prop ?name
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424;
          ?prop ?o.
    ?prop sc:name ?name.
    FILTER(isLiteral(?o)).
} 
ORDER BY DESC (?name)
"""

print("Results")
x = run_query(queryString)

Results
[('prop', 'http://www.wikidata.org/prop/direct/P2529'), ('name', 'ČSFD film ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P1476'), ('name', 'title')]
[('prop', 'http://www.wikidata.org/prop/direct/P577'), ('name', 'publication date')]
[('prop', 'http://www.wikidata.org/prop/direct/P2754'), ('name', 'production date')]
[('prop', 'http://www.wikidata.org/prop/direct/P6398'), ('name', 'iTunes movie ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P2755'), ('name', 'exploitation visa number')]
[('prop', 'http://www.wikidata.org/prop/direct/P3143'), ('name', 'elFilm film ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P3135'), ('name', 'elCinema film ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P2047'), ('name', 'duration')]
[('prop', 'http://www.wikidata.org/prop/direct/P2130'), ('name', 'cost')]
[('prop', 'http://www.wikidata.org/prop/direct/P1638'), ('name', 'codename')]
[('prop', 'http://www.wikidata.org/prop/direct/P4786'), ('name', 'cinematografo film I

- Production date: `wdt:P2754`
- Publication date: `wdt:P577`

In [10]:
# print the year of production
queryString = """
SELECT DISTINCT ?film ?name ?date
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424;
          wdt:P2754 ?date.
    ?film sc:name ?name.
}
"""

print("Results")
x = run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q116852'), ('name', 'Batman'), ('date', '1989-01-01T00:00:00Z')]
1


Since the production date exists only for one element, I rewrite the query using the publication date.

In [11]:
# print the year of publication
queryString = """
SELECT DISTINCT ?film ?name ?date
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424;
          wdt:P577 ?date.
    ?film sc:name ?name.
}
"""

print("Results")
x = run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q221345'), ('name', 'Batman Forever'), ('date', '1995-06-16T00:00:00Z')]
[('film', 'http://www.wikidata.org/entity/Q221345'), ('name', 'Batman Forever'), ('date', '1995-07-14T00:00:00Z')]
[('film', 'http://www.wikidata.org/entity/Q221345'), ('name', 'Batman Forever'), ('date', '1995-08-03T00:00:00Z')]
[('film', 'http://www.wikidata.org/entity/Q21095079'), ('name', 'Batman: Bad Blood'), ('date', '2016-01-01T00:00:00Z')]
[('film', 'http://www.wikidata.org/entity/Q21095079'), ('name', 'Batman: Bad Blood'), ('date', '2016-08-04T00:00:00Z')]
[('film', 'http://www.wikidata.org/entity/Q189054'), ('name', 'Batman Returns'), ('date', '1992-06-16T00:00:00Z')]
[('film', 'http://www.wikidata.org/entity/Q189054'), ('name', 'Batman Returns'), ('date', '1992-06-19T00:00:00Z')]
[('film', 'http://www.wikidata.org/entity/Q189054'), ('name', 'Batman Returns'), ('date', '1992-07-16T00:00:00Z')]
[('film', 'http://www.wikidata.org/entity/Q189054'), ('name', 

Each film has several publication dates, most likely because it was released on different dates in different countries. I take the earliest publication date and extract its year.
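
The same reduction can be sketched in plain Python: group the dates by film, take the minimum, and keep the year. The rows below are copied from the publication-date output above; the function name `first_publication_year` is a hypothetical helper introduced only for this illustration.

```python
from collections import defaultdict
from datetime import datetime

# (film id, publication date) pairs copied from the output above.
rows = [
    ("Q221345", "1995-06-16T00:00:00Z"),
    ("Q221345", "1995-07-14T00:00:00Z"),
    ("Q221345", "1995-08-03T00:00:00Z"),
    ("Q21095079", "2016-01-01T00:00:00Z"),
    ("Q21095079", "2016-08-04T00:00:00Z"),
]


def first_publication_year(rows):
    """Return the year of the earliest publication date of each film."""
    dates_by_film = defaultdict(list)
    for film, stamp in rows:
        dates_by_film[film].append(datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ"))
    return {film: min(dates).year for film, dates in dates_by_film.items()}


years = first_publication_year(rows)
print(years)
```

This mirrors what `min(?date)` combined with `GROUP BY ?film ?name` does on the SPARQL side.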

In [12]:
# show the minimum year of publication (the first publication ever)
queryString = """
SELECT ?film ?name (year(xsd:dateTime(min(?date))) as ?year)
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424;
          wdt:P577 ?date.
    ?film sc:name ?name.
}
GROUP BY ?film ?name
"""

print("Results")
x = run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q116852'), ('name', 'Batman'), ('year', '1989')]
[('film', 'http://www.wikidata.org/entity/Q221345'), ('name', 'Batman Forever'), ('year', '1995')]
[('film', 'http://www.wikidata.org/entity/Q166262'), ('name', 'Batman Begins'), ('year', '2005')]
[('film', 'http://www.wikidata.org/entity/Q276523'), ('name', 'Batman & Robin'), ('year', '1997')]
[('film', 'http://www.wikidata.org/entity/Q189054'), ('name', 'Batman Returns'), ('year', '1992')]
[('film', 'http://www.wikidata.org/entity/Q189330'), ('name', 'The Dark Knight Rises'), ('year', '2012')]
[('film', 'http://www.wikidata.org/entity/Q21095079'), ('name', 'Batman: Bad Blood'), ('year', '2016')]
[('film', 'http://www.wikidata.org/entity/Q163872'), ('name', 'The Dark Knight'), ('year', '2008')]
8


Find the directors of the films. Start from the films and look at their properties to find the director.

In [13]:
# find the property director
queryString = """
SELECT DISTINCT ?prop ?name
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424;
          ?prop ?obj.
    ?prop sc:name ?name.
    FILTER(REGEX(?name,"director")).
}
"""

print("Results")
x = run_query(queryString)

Results
[('prop', 'http://www.wikidata.org/prop/direct/P344'), ('name', 'director of photography')]
[('prop', 'http://www.wikidata.org/prop/direct/P57'), ('name', 'director')]
2


`wdt:P57` is the property that gives the director of the film.<br>
Put all the information together and show all the Batman films with the year of publication (because the production date is rarely available) and the director.

In [14]:
# show all together
queryString = """
SELECT ?name (year(xsd:dateTime(min(?date))) as ?year) ?director
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424;
          wdt:P57 ?dir;
          wdt:P577 ?date.
    ?film sc:name ?name.
    ?dir sc:name ?director.
}
GROUP BY ?name ?director
"""

print("Results")
x = run_query(queryString)

Results
[('name', 'The Dark Knight Rises'), ('year', '2012'), ('director', 'Christopher Nolan')]
[('name', 'Batman: Bad Blood'), ('year', '2016'), ('director', 'Jay Oliva')]
[('name', 'The Dark Knight'), ('year', '2008'), ('director', 'Christopher Nolan')]
[('name', 'Batman Forever'), ('year', '1995'), ('director', 'Joel Schumacher')]
[('name', 'Batman Returns'), ('year', '1992'), ('director', 'Tim Burton')]
[('name', 'Batman'), ('year', '1989'), ('director', 'Tim Burton')]
[('name', 'Batman Begins'), ('year', '2005'), ('director', 'Christopher Nolan')]
[('name', 'Batman & Robin'), ('year', '1997'), ('director', 'Joel Schumacher')]
8


In [17]:
# show all together with the respective URI for the films and the directors, for evaluation purpose
queryString = """
SELECT ?film ?name ?dir ?director (year(xsd:dateTime(min(?date))) as ?year) 
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424;
          wdt:P57 ?dir;
          wdt:P577 ?date.
    ?film sc:name ?name.
    ?dir sc:name ?director.
}
GROUP BY ?film ?name ?dir ?director
"""

print("Results")
x = run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q276523'), ('name', 'Batman & Robin'), ('dir', 'http://www.wikidata.org/entity/Q295207'), ('director', 'Joel Schumacher'), ('year', '1997')]
[('film', 'http://www.wikidata.org/entity/Q21095079'), ('name', 'Batman: Bad Blood'), ('dir', 'http://www.wikidata.org/entity/Q11866244'), ('director', 'Jay Oliva'), ('year', '2016')]
[('film', 'http://www.wikidata.org/entity/Q221345'), ('name', 'Batman Forever'), ('dir', 'http://www.wikidata.org/entity/Q295207'), ('director', 'Joel Schumacher'), ('year', '1995')]
[('film', 'http://www.wikidata.org/entity/Q189054'), ('name', 'Batman Returns'), ('dir', 'http://www.wikidata.org/entity/Q56008'), ('director', 'Tim Burton'), ('year', '1992')]
[('film', 'http://www.wikidata.org/entity/Q116852'), ('name', 'Batman'), ('dir', 'http://www.wikidata.org/entity/Q56008'), ('director', 'Tim Burton'), ('year', '1989')]
[('film', 'http://www.wikidata.org/entity/Q166262'), ('name', 'Batman Begins'), ('dir', 'http://

In [18]:
## more elements mixed between URI and literals
objs=[]
for i in x:
    f_uri = i[0][1]
    f_name = i[1][1]
    dir_uri = i[2][1]
    dir_name = i[3][1]
    year = i[4][1]
    # add the year element
    obj = {}
    obj["refers_to"] = f_uri
    obj["refers_to_name"] = f_name
    obj["check"] = "value"
    obj["value"] = year
    objs.append(obj)
    # add the director element
    obj = {}
    obj["refers_to"] = f_uri
    obj["refers_to_name"] = f_name
    obj["check"] = ["uri","name"]
    obj["uri"] = dir_uri
    obj["name"] = dir_name
    objs.append(obj)
evaluation.add_result(evaluation.get_index_workflow(pt),"1", evaluation.TYPE_REFERRED ,["uri","name","value"], objs, elements_per_tuple = 2)

The index of this workflow is: 5_1
The path is /locale/data/jupyter/prando/wd-project/2021/ground_truths/gt_json/workflow5_1.json
JSON object updated


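## Task 2
Return the main Batman movie series produced in the last four decades and compare them in terms of length, number of actors involved, and costs.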

In [19]:
# find something that tells me that a film is part of a series
queryString = """
SELECT DISTINCT ?prop ?name
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424.
    ?film ?prop ?series.
    ?prop <http://schema.org/name> ?name .

}
"""

print("Results")
x = run_query(queryString)

Results
[('prop', 'http://www.wikidata.org/prop/direct/P1040'), ('name', 'film editor')]
[('prop', 'http://www.wikidata.org/prop/direct/P1237'), ('name', 'Box Office Mojo film ID (former scheme)')]
[('prop', 'http://www.wikidata.org/prop/direct/P1258'), ('name', 'Rotten Tomatoes ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P1265'), ('name', 'AlloCiné film ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P1273'), ('name', 'CANTIC ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P136'), ('name', 'genre')]
[('prop', 'http://www.wikidata.org/prop/direct/P1411'), ('name', 'nominated for')]
[('prop', 'http://www.wikidata.org/prop/direct/P1417'), ('name', 'Encyclopædia Britannica Online ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P1434'), ('name', 'takes place in fictional universe')]
[('prop', 'http://www.wikidata.org/prop/direct/P144'), ('name', 'based on')]
[('prop', 'http://www.wikidata.org/prop/direct/P1476'), ('name', 'title')]
[('prop', 'http://www.wikidata.org

`wdt:P179` tells me that the film is part of a series. Now, have a look at these series.

In [20]:
# show all the movie series
queryString = """
SELECT DISTINCT ?series ?name
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424.
    ?film wdt:P179 ?series.
    ?series <http://schema.org/name> ?name .

}
"""

print("Results")
x = run_query(queryString)

Results
[('series', 'http://www.wikidata.org/entity/Q18914861'), ('name', 'Batman')]
[('series', 'http://www.wikidata.org/entity/Q2111133'), ('name', 'Batman in film')]
[('series', 'http://www.wikidata.org/entity/Q2405799'), ('name', 'DC Universe Animated Original Movies')]
[('series', 'http://www.wikidata.org/entity/Q12859908'), ('name', 'The Dark Knight Trilogy')]
4


These are all the series that contain Batman movies.

In [21]:
# find the date of production or publication
queryString = """
SELECT DISTINCT ?prop ?name
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424.
    ?film wdt:P179 ?series.
    ?series ?prop ?o.
    ?prop sc:name ?name.

}
"""

print("Results")
x = run_query(queryString)

Results
[('prop', 'http://www.wikidata.org/prop/direct/P1040'), ('name', 'film editor')]
[('prop', 'http://www.wikidata.org/prop/direct/P1424'), ('name', "topic's main template")]
[('prop', 'http://www.wikidata.org/prop/direct/P1434'), ('name', 'takes place in fictional universe')]
[('prop', 'http://www.wikidata.org/prop/direct/P1476'), ('name', 'title')]
[('prop', 'http://www.wikidata.org/prop/direct/P155'), ('name', 'follows')]
[('prop', 'http://www.wikidata.org/prop/direct/P156'), ('name', 'followed by')]
[('prop', 'http://www.wikidata.org/prop/direct/P161'), ('name', 'cast member')]
[('prop', 'http://www.wikidata.org/prop/direct/P162'), ('name', 'producer')]
[('prop', 'http://www.wikidata.org/prop/direct/P2354'), ('name', 'has list')]
[('prop', 'http://www.wikidata.org/prop/direct/P272'), ('name', 'production company')]
[('prop', 'http://www.wikidata.org/prop/direct/P31'), ('name', 'instance of')]
[('prop', 'http://www.wikidata.org/prop/direct/P3417'), ('name', 'Quora topic ID')]
[

Some useful properties:
- `wdt:P2354` has list
- `wdt:P527` has part
- `wdt:P577` publication date
- `wdt:P580` start time

In [22]:
# show publication date and start time
queryString = """
SELECT DISTINCT ?series ?name ?date ?start
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424.
    ?film wdt:P179 ?series.
    OPTIONAL{?series wdt:P577 ?date.}
    OPTIONAL{?series wdt:P580 ?start.}
    ?series sc:name ?name.
}
"""

print("Results")
x = run_query(queryString)

Results
[('series', 'http://www.wikidata.org/entity/Q12859908'), ('name', 'The Dark Knight Trilogy'), ('date', '2005-01-01T00:00:00Z'), ('start', '2005-01-01T00:00:00Z')]
[('series', 'http://www.wikidata.org/entity/Q18914861'), ('name', 'Batman'), ('date', '1989-01-01T00:00:00Z')]
[('series', 'http://www.wikidata.org/entity/Q2111133'), ('name', 'Batman in film'), ('date', '1940-01-01T00:00:00Z')]
[('series', 'http://www.wikidata.org/entity/Q2405799'), ('name', 'DC Universe Animated Original Movies'), ('date', '2007-01-01T00:00:00Z')]
4


All of these series have a publication date, but only one has a start time, so we filter on the publication date. Since we are interested only in the series of the last four decades, we discard "Batman in film", which dates back to 1940.
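
As a small illustration of how "the last four decades" becomes a year threshold, the sketch below derives the cutoff and applies it to the series listed above. The reference year 2022 is an assumption, chosen because it yields the cutoff of 1982 that appears in the next query.

```python
REFERENCE_YEAR = 2022  # assumption: chosen so the cutoff matches the 1982 used in the notebook
DECADES = 4

cutoff_year = REFERENCE_YEAR - 10 * DECADES

# (series name, publication date) pairs copied from the output above.
series = [
    ("The Dark Knight Trilogy", "2005-01-01T00:00:00Z"),
    ("Batman", "1989-01-01T00:00:00Z"),
    ("Batman in film", "1940-01-01T00:00:00Z"),
    ("DC Universe Animated Original Movies", "2007-01-01T00:00:00Z"),
]

# Keep only the series whose publication year is later than the cutoff.
recent = [name for name, date in series if int(date[:4]) > cutoff_year]
print(cutoff_year, recent)
```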

In [23]:
# keep only the series published in the last four decades
queryString = """
SELECT DISTINCT ?series ?name ?date ?start
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424.
    ?film wdt:P179 ?series.
    ?series wdt:P577 ?date.
    BIND(year(xsd:dateTime(?date)) AS ?year).
    FILTER(?year > 1982).
    ?series sc:name ?name.
}
"""

print("Results")
x = run_query(queryString)

Results
[('series', 'http://www.wikidata.org/entity/Q18914861'), ('name', 'Batman'), ('date', '1989-01-01T00:00:00Z')]
[('series', 'http://www.wikidata.org/entity/Q2405799'), ('name', 'DC Universe Animated Original Movies'), ('date', '2007-01-01T00:00:00Z')]
[('series', 'http://www.wikidata.org/entity/Q12859908'), ('name', 'The Dark Knight Trilogy'), ('date', '2005-01-01T00:00:00Z')]
3


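At this point we have to find, for each series, the number of films, the number of actors, the total cost (the sum of the costs of the single films), and the total length. From a previous query we have already found:

- `wdt:P161` : cast member
- `wdt:P2047` : duration
- `wdt:P2130` : cost

We can now answer the second task. Note that the query below joins every film with all of its cast members, so `sum(?duration)` and `sum(?cost)` add each film's value once per cast member; the totals are therefore inflated and should not be read as actual durations or amounts. Films that lack a cast member, a duration, or a cost are dropped by the join, which is why a series can disappear from the result.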
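
Joining the cast with the duration in a single group has a side effect that is easy to reproduce offline: each film row is repeated once per cast member, so a plain sum counts the same duration several times. The sketch below uses hypothetical rows (not taken from Wikidata) to contrast the naive sum with a per-film sum.

```python
# Hypothetical joined rows: (film, cast member, duration in minutes).
# Each film appears once per cast member, as in a join on wdt:P161.
joined_rows = [
    ("film_a", "actor_1", 120),
    ("film_a", "actor_2", 120),
    ("film_a", "actor_3", 120),
    ("film_b", "actor_1", 100),
    ("film_b", "actor_4", 100),
]

# Naive sum over the joined rows: every duration is counted once per cast member.
naive_total = sum(duration for _, _, duration in joined_rows)

# Keep a single duration per film before summing.
duration_per_film = {film: duration for film, _, duration in joined_rows}
true_total = sum(duration_per_film.values())

# Counting distinct actors is not affected by the repetition.
distinct_actors = len({actor for _, actor, _ in joined_rows})

print(naive_total, true_total, distinct_actors)
```

In SPARQL, one way to obtain the per-film behaviour would be to aggregate the cast in a subquery per film and sum durations and costs outside it; this is a suggestion, not something the notebook does.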

In [26]:
# show all together filtering for series of the last 4 decades
queryString = """
SELECT DISTINCT ?series ?name (COUNT(DISTINCT ?film) AS ?num_film) (COUNT(DISTINCT ?cast) AS ?actors) (sum(?duration) AS ?tot_len) (sum(?cost) AS ?tot_cost) 
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424.
    ?film wdt:P179 ?series.
    ?series wdt:P577 ?date.
    BIND(year(xsd:dateTime(?date)) AS ?year).
    FILTER(?year > 1982).
    
    #get the properties of the film
    ?film wdt:P161 ?cast;
          wdt:P2047 ?duration;
          wdt:P2130 ?cost.
    
    ?series sc:name ?name.
}
GROUP BY ?series ?name
"""

print("Results")
x = run_query(queryString)

Results
[('series', 'http://www.wikidata.org/entity/Q12859908'), ('name', 'The Dark Knight Trilogy'), ('num_film', '3'), ('actors', '95'), ('tot_len', '17093'), ('tot_cost', '22565000000')]
[('series', 'http://www.wikidata.org/entity/Q18914861'), ('name', 'Batman'), ('num_film', '3'), ('actors', '65'), ('tot_len', '8860'), ('tot_cost', '7215000000')]
2


In [27]:
## more literals associated to the same element
objs = []
for i in x:
    f_uri = i[0][1]
    f_name = i[1][1]
    films = i[2][1]
    actors = i[3][1]
    tot_len = i[4][1]
    cost = i[5][1]
    for val in [films,actors,tot_len,cost]:
        obj = {}
        obj["refers_to"] = f_uri
        obj["refers_to_name"] = f_name
        obj["check"] = "value"
        obj["value"]= val
        objs.append(obj)
evaluation.add_result(evaluation.get_index_workflow(pt),"2", evaluation.TYPE_REFERRED ,"value", objs, elements_per_tuple = 4)

The index of this workflow is: 5_1
The path is /locale/data/jupyter/prando/wd-project/2021/ground_truths/gt_json/workflow5_1.json
JSON object updated


## Task 3
Investigate which workers (writers, actors, etc.) have had a role in the most Batman movies so far.


In [28]:
# show the workers who worked on the most films
queryString = """
SELECT DISTINCT ?worker ?name (COUNT(DISTINCT ?film) AS ?films)
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424.
    
    #get the properties of the film
    ?film ?prop ?worker.
    ?worker wdt:P31 wd:Q5.
    
    ?worker sc:name ?name.
}
GROUP BY ?worker ?name
ORDER BY DESC (?films)
LIMIT 10
"""

print("Results")
x = run_query(queryString)

Results
[('worker', 'http://www.wikidata.org/entity/Q2740528'), ('name', 'Benjamin Melniker'), ('films', '6')]
[('worker', 'http://www.wikidata.org/entity/Q313048'), ('name', 'Bob Kane'), ('films', '5')]
[('worker', 'http://www.wikidata.org/entity/Q299302'), ('name', 'Michael Gough'), ('films', '4')]
[('worker', 'http://www.wikidata.org/entity/Q165283'), ('name', 'Pat Hingle'), ('films', '4')]
[('worker', 'http://www.wikidata.org/entity/Q464282'), ('name', 'Bill Finger'), ('films', '4')]
[('worker', 'http://www.wikidata.org/entity/Q123351'), ('name', 'Michael Caine'), ('films', '3')]
[('worker', 'http://www.wikidata.org/entity/Q56008'), ('name', 'Tim Burton'), ('films', '3')]
[('worker', 'http://www.wikidata.org/entity/Q352010'), ('name', 'David S. Goyer'), ('films', '3')]
[('worker', 'http://www.wikidata.org/entity/Q3336708'), ('name', 'Nathan Crowley'), ('films', '3')]
[('worker', 'http://www.wikidata.org/entity/Q25191'), ('name', 'Christopher Nolan'), ('films', '3')]
10
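
The ranking relies on `COUNT(DISTINCT ?film)`, so a person linked to the same film through several properties is counted once for that film. A minimal Python sketch of the same grouping, using hypothetical triples rather than Wikidata data:

```python
from collections import defaultdict

# Hypothetical (film, property, worker) triples; one worker may be linked to the
# same film through several properties (for example director and screenwriter).
triples = [
    ("film_a", "director", "worker_1"),
    ("film_a", "screenwriter", "worker_1"),
    ("film_b", "director", "worker_1"),
    ("film_a", "cast member", "worker_2"),
    ("film_b", "cast member", "worker_3"),
    ("film_c", "cast member", "worker_3"),
    ("film_c", "producer", "worker_3"),
]

# Collect the set of distinct films for each worker.
films_by_worker = defaultdict(set)
for film, _prop, worker in triples:
    films_by_worker[worker].add(film)

# Sort by number of films (descending), then by worker id for a stable order.
ranking = sorted(
    ((worker, len(films)) for worker, films in films_by_worker.items()),
    key=lambda item: (-item[1], item[0]),
)
print(ranking)
```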


In [30]:
## single literal associated to a URI
objs = []
for i in x:
    f_uri = i[0][1]
    f_name = i[1][1]
    val = i[2][1]
    obj = {}
    obj["refers_to"] = f_uri
    obj["refers_to_name"] = f_name
    obj["check"] = "value"
    obj["value"]= val
    objs.append(obj)
evaluation.add_result(evaluation.get_index_workflow(pt),"3", evaluation.TYPE_REFERRED ,"value", objs)

The index of this workflow is: 5_1
The path is /locale/data/jupyter/prando/wd-project/2021/ground_truths/gt_json/workflow5_1.json
JSON object updated


## Task 4
Compare the ratings of the single movies and of the series. Identify the movie with the highest rating from the critics and the "best" series overall.

In [31]:
### Find the properties of films whose name contains "score"
####
queryString = """
SELECT DISTINCT ?prop ?pName
WHERE { 
    ?film wdt:P31 wd:Q11424.
    
    #get the properties of the film
    ?film ?prop ?obj.
    ?prop sc:name ?pName.
    FILTER(REGEX(?pName,"score")).
}

"""

print("Results")
x = run_query(queryString)

Results
[('prop', 'http://www.wikidata.org/prop/direct/P444'), ('pName', 'review score')]
[('prop', 'http://www.wikidata.org/prop/direct/P447'), ('pName', 'review score by')]
2


In [32]:
### On the public Wikidata endpoint this query returns exactly what we want, but the data is missing here
####
queryString = """
SELECT DISTINCT ?film  ?obj
WHERE { 
    wd:Q2695156 wdt:P4969 ?film .
    ?film wdt:P31 wd:Q11424.
    
    #get the properties of the film
    ?film wdt:P444 ?obj.
    
}

"""

print("Results")
x = run_query(queryString)

Results
Empty

The local endpoint holds no `wdt:P444` (review score) values for the Batman films, so the ratings cannot be compared with the data available here.


## Task 5
Return how many actors who are members of the cast of the "Dark Knight Trilogy" by Christopher Nolan have a Kevin Bacon number equal to 2.

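The Bacon number of an actor is the number of degrees of separation he or she has from Kevin Bacon: actors who appeared in a film with him have number 1, and actors who appeared with one of those, but never with Bacon himself, have number 2. So, first of all, I need to retrieve Kevin Bacon.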
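
The definition amounts to a shortest-path distance in the co-star graph, which can be sketched with a breadth-first search. The casts below are hypothetical and not taken from Wikidata, and `bacon_numbers` is an illustrative helper, not part of the notebook.

```python
from collections import deque


def bacon_numbers(casts, source):
    """Breadth-first search over the co-star graph induced by film casts."""
    # Two actors are neighbours if they share at least one film.
    neighbours = {}
    for cast in casts.values():
        for actor in cast:
            neighbours.setdefault(actor, set()).update(a for a in cast if a != actor)

    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in distance:
                distance[other] = distance[current] + 1
                queue.append(other)
    return distance


# Hypothetical casts, not taken from Wikidata.
casts = {
    "film_1": {"Kevin Bacon", "actor_a"},
    "film_2": {"actor_a", "actor_b"},
    "film_3": {"actor_b", "actor_c"},
    "film_4": {"Kevin Bacon", "actor_d", "actor_a"},
}

numbers = bacon_numbers(casts, "Kevin Bacon")
with_number_two = sorted(a for a, n in numbers.items() if n == 2)
print(with_number_two)
```

For a fixed distance of 2, a SPARQL query does not need a full search: it is enough to require a shared film with an intermediate actor and to exclude any shared film with Kevin Bacon.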

***Dark Knight Trilogy (wd:Q12859908)*** 

In [27]:
# find Kevin Bacon
queryString = """
SELECT DISTINCT ?person ?personName  WHERE { 

    # Retrieve humans
    ?person wdt:P31 wd:Q5 .
    
    # This returns the labels
    ?person <http://schema.org/name> ?personName .

    # Since Kevin Bacon is an actor, he probably acted in a film.
    FILTER EXISTS{
        ?film   wdt:P31   wd:Q11424 ;
                wdt:P161  ?person   .             
    }
    
    # I use a regex to search for a surname that contains the word "Bacon"
    FILTER(REGEX(?personName, "Bacon"))
    
}
"""

print("Results")
x = run_query(queryString)

Results
[('person', 'http://www.wikidata.org/entity/Q3102228'), ('personName', 'Georges Baconnet')]
[('person', 'http://www.wikidata.org/entity/Q3116093'), ('personName', 'Irving Bacon')]
[('person', 'http://www.wikidata.org/entity/Q503597'), ('personName', 'James Bacon')]
[('person', 'http://www.wikidata.org/entity/Q3454165'), ('personName', 'Kevin Bacon')]
[('person', 'http://www.wikidata.org/entity/Q3491343'), ('personName', 'Sosie Bacon')]
[('person', 'http://www.wikidata.org/entity/Q3992438'), ('personName', 'Tom Bacon')]
[('person', 'http://www.wikidata.org/entity/Q706678'), ('personName', 'Lloyd Bacon')]
[('person', 'http://www.wikidata.org/entity/Q65116263'), ('personName', 'Marco Bacon')]
[('person', 'http://www.wikidata.org/entity/Q5216474'), ('personName', 'Daniel Bacon')]
[('person', 'http://www.wikidata.org/entity/Q16031668'), ('personName', 'Frank Bacon')]
[('person', 'http://www.wikidata.org/entity/Q21483293'), ('personName', 'Georges Baconnier')]
[('person', 'http://www

I have ***Kevin Bacon (wd:Q3454165)***. Now I can retrieve all the cast members of the ***Dark Knight Trilogy*** with a Kevin Bacon number equal to 2.

In [37]:
# actors of the Dark Knight Trilogy with a Kevin Bacon number equal to 2
queryString = """
SELECT DISTINCT ?actor ?actorName WHERE { 

    ## Retrieve actors of the Dark Knight Trilogy
    wd:Q12859908 wdt:P527 ?f.
    ?f wdt:P161 ?actor .
    
    # Ensure that the actor worked together with another actor "in the middle"
    ?filmMiddle wdt:P161 ?actor       ;
                wdt:P161 ?actorMiddle .
    
    # Ensure that the actor "in the middle" worked with Kevin Bacon
    ?film       wdt:P161 ?actorMiddle  ;
                wdt:P161 wd:Q3454165 .
    
    # Ensure that the "first" actor and Kevin Bacon did not work together
    FILTER NOT EXISTS{
        ?film3 wdt:P161 ?actor      ;
               wdt:P161 wd:Q3454165 .
    }
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
LIMIT 100
"""

print("Results")
x = run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q202381'), ('actorName', 'Maggie Gyllenhaal')]
[('actor', 'http://www.wikidata.org/entity/Q729618'), ('actorName', 'Nicky Katt')]
[('actor', 'http://www.wikidata.org/entity/Q318249'), ('actorName', 'Matthew Modine')]
[('actor', 'http://www.wikidata.org/entity/Q6166387'), ('actorName', 'Jay Benedict')]
[('actor', 'http://www.wikidata.org/entity/Q117415'), ('actorName', 'Josh Stewart')]
[('actor', 'http://www.wikidata.org/entity/Q16149374'), ('actorName', 'Christine Adams')]
[('actor', 'http://www.wikidata.org/entity/Q737369'), ('actorName', 'Tomas Arana')]
[('actor', 'http://www.wikidata.org/entity/Q220335'), ('actorName', 'William Fichtner')]
[('actor', 'http://www.wikidata.org/entity/Q40572'), ('actorName', 'Heath Ledger')]
[('actor', 'http://www.wikidata.org/entity/Q225933'), ('actorName', 'Nestor Carbonell')]
[('actor', 'http://www.wikidata.org/entity/Q207969'), ('actorName', 'Eric Roberts')]
[('actor', 'http://www.wikidata.org/enti

In [38]:
obj = [{"uri":r[0][1],"name":r[1][1]} for r in x]
evaluation.add_result(evaluation.get_index_workflow(pt),"5", evaluation.TYPE_SET ,"uri", obj)

The index of this workflow is: 5_1
The path is /locale/data/jupyter/prando/wd-project/2021/ground_truths/gt_json/workflow5_1.json
JSON object updated


In [39]:
# number of actors of the Dark Knight Trilogy with a Kevin Bacon number equal to 2
queryString = """
SELECT (COUNT(DISTINCT ?actor) AS ?bacon_2_actors) WHERE { 

    ## Retrieve actors of the Dark Knight Trilogy
    wd:Q12859908 wdt:P527 ?f.
    ?f wdt:P161 ?actor .
    
    # Ensure that the actor worked together with another actor "in the middle"
    ?filmMiddle wdt:P161 ?actor       ;
                wdt:P161 ?actorMiddle .
    
    # Ensure that the actor "in the middle" worked with Kevin Bacon
    ?film       wdt:P161 ?actorMiddle  ;
                wdt:P161 wd:Q3454165 .
    
    # Ensure that the "first" actor and Kevin Bacon did not work together
    FILTER NOT EXISTS{
        ?film3 wdt:P161 ?actor      ;
               wdt:P161 wd:Q3454165 .
    }
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
"""

print("Results")
x = run_query(queryString)



Results
[('bacon_2_actors', '87')]
1
