# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [WikiData](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained previously.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP that returns the human-readable name of a property or a class in Wikidata.
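As a minimal sketch, the label pattern above can be generated by a small helper and spliced into any query string. The helper name `label_pattern` is hypothetical and is not part of the notebook.

```python
def label_pattern(resource_var: str, label_var: str) -> str:
    """Return the triple pattern that binds the schema.org name of a resource."""
    return f"?{resource_var} <http://schema.org/name> ?{label_var} ."


# Splice the pattern into a query that lists the properties of Without a Trace.
query = f"""
SELECT DISTINCT ?p ?pName WHERE {{
    wd:Q826477 ?p ?o .
    {label_pattern("p", "pName")}
}}
"""
print(query)
```

The resulting string has the same shape as the queries passed to `run_query` in the rest of the notebook.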
    
    

In [24]:
## SETUP used later
import sys
import os
import json
import pandas as pd
sys.path.insert(1, '/locale/data/jupyter/prando/notebook/sparqlthesis/')
import modules.evaluation as evaluation
from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-movie6-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString,verbose = True):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://gracevirtuoso.dei.unipd.it/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings'])==0:
            print("Empty")
            return []
        array = []
        for bindings in json_results['results']['bindings']:
            app =  [ (var, value['value'])  for var, value in bindings.items() ] 
            if verbose:
                print( app)
            array.append(app)
        if verbose:
            print(len(array))
        return array

    except Exception as e :
        print("The operation failed", e)
        # Return an empty list so that callers iterating over the result do not fail on None
        return []
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://gracevirtuoso.dei.unipd.it/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
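`run_ask_query` returns the converted JSON document rather than a plain truth value. In the SPARQL JSON results format, an ASK response carries its answer in the `boolean` field, so a small helper can extract it. The sketch below assumes that shape; `ask_result_to_bool` is a hypothetical name and the sample response is illustrative, not taken from the endpoint.

```python
def ask_result_to_bool(result):
    """Extract the truth value from a converted SPARQL ASK response.

    Returns None when the request failed (run_ask_query returned None)
    or when the response does not look like an ASK result.
    """
    if not isinstance(result, dict) or "boolean" not in result:
        return None
    return bool(result["boolean"])


# Illustrative response in the SPARQL JSON results format.
sample = {"head": {}, "boolean": True}
print(ask_result_to_bool(sample))
```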

# Movie Workflow Series ("TV series Without a Trace exploratory search") 


Consider the following exploratory scenario:


> We are interested in the TV series "Without a Trace" and we want to investigate the main aspects related to the actors and directors involved in the production, know the number of seasons, and check which episodes had the greatest success/impact.


## Useful URIs for the current workflow
The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P106`    | profession    | predicate | 
| `wdt:P279`    | subclass      | predicate |
| `wdt:P4969`    | derivative work      | predicate |
| `wd:Q826477` | Without a Trace    | node |
| `wd:Q733960` | Cold Case       | node |



Also consider

```
wd:Q826477 ?p ?obj .
```

is the BGP to retrieve all **properties of Without a Trace**.

Please note that whenever you return a resource, you should return both its IRI and its label. In particular, when a task requires you to identify a BGP, the result set must always be a list of (IRI, label) pairs.
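As a rough sketch of the IRI and label requirement, the rows returned by `run_query` (each row is a list of `(variable, value)` tuples) can be reduced to pairs as follows. The helper name `iri_label_pairs` is hypothetical, and the sample row only mirrors the shape that `run_query` produces.

```python
def iri_label_pairs(rows, iri_var, label_var):
    """Turn run_query-style rows into a list of (IRI, label) pairs."""
    pairs = []
    for row in rows:
        record = dict(row)  # look values up by variable name, not by position
        pairs.append((record[iri_var], record[label_var]))
    return pairs


# Sample row in the shape produced by run_query.
rows = [
    [("o", "http://www.wikidata.org/entity/Q5398426"), ("pName", "television series")],
]
print(iri_label_pairs(rows, "o", "pName"))
```

Looking values up by variable name avoids relying on the order in which the endpoint serializes the bindings.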



The workload should

1. Identify the BGP for television series

2. Return the number of seasons and episodes per season of the tv series (the result set must be triples of season IRI, label and #episodes).

3. Get the number of episodes in which the cast members played a role. Who are the most present actors? (the result set must be a list of triples actor/actress IRI, label and #episodes)

4. Check who is the actor who acted in more films while working on "Without a Trace" (the result set must be a list of triples actor/actress IRI, label and #films).

5. Compare Without a Trace with the TV series "Cold Case" in terms of number of seasons, episodes and cast members (the result set must be two elements -one for each TV series- of TV series IRI, label, #seasons, #episodes and #cast members).

6. Return the actors who are members of the cast of Without a Trace and have a [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2 (the result set must be a list of couples actor/actress IRI and label).

7. Consider the actors who are members of the cast of Cold Case. Among the TV series in which these actors acted, return only those that received more than 2 awards (the result set must be triples of TV series IRI, label, #awards won).

In [25]:
## startup the evaluation
# setup the file and create the empty json
ipname = "m6.ipynb"
pt = os.getcwd()+os.sep+ipname
evaluation.setup(pt)

The index of this workflow is: 2_3


### Task 1

In [26]:
# literal properties of Without a Trace 
queryString = """
SELECT DISTINCT ?p ?pName WHERE { 

    # Connecting Without a Trace to something
    wd:Q826477 ?p  ?o.

    # This returns the labels
    ?p <http://schema.org/name> ?pName .

    # Only data properties
    FILTER(isLiteral(?o))
}
"""

print("Results")
x = run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('pName', 'number of episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pName', 'Rotten Tomatoes ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1267'), ('pName', 'AlloCiné series ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1476'), ('pName', 'title')]
[('p', 'http://www.wikidata.org/prop/direct/P1562'), ('pName', 'AllMovie title ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1712'), ('pName', 'Metacritic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2047'), ('pName', 'duration')]
[('p', 'http://www.wikidata.org/prop/direct/P2437'), ('pName', 'number of seasons')]
[('p', 'http://www.wikidata.org/prop/direct/P2529'), ('pName', 'ČSFD film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2581'), ('pName', 'BabelNet ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2603'), ('pName', 'Kinopoisk film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2638'), ('pName', 'TV.com ID')]
[('p', 'http:

In [27]:
# use instance of
queryString = """
SELECT DISTINCT ?o ?pName WHERE { 

    # Connecting Without a Trace to something
    wd:Q826477 wdt:P31 ?o.

    # This returns the labels
    ?o <http://schema.org/name> ?pName .
}
"""

print("Results")
x = run_query(queryString)

Results
[('o', 'http://www.wikidata.org/entity/Q5398426'), ('pName', 'television series')]
1


In [28]:
### insert the result of TASK 1 in the file
og_uri = "http://www.wikidata.org/entity/Q5398426"
og_name = "television series"
obj = {"uri":og_uri,"name":og_name}
evaluation.add_result(evaluation.get_index_workflow(pt),"1", evaluation.TYPE_SINGLE ,"uri", [obj] ,"all")

The index of this workflow is: 2_3
The path is /locale/data/jupyter/prando/notebook/2022/results/workflow2_3.json
JSON object updated


### Task 2 : Return the number of seasons and episodes per season of the tv series.

I'm interested in the TV series ***"Without a Trace" (wd:Q826477)***, so as a starting point I show all the data properties of this TV series.

I discovered the two properties: ***number of episodes (wdt:P1113)*** and ***number of seasons (wdt:P2437)***. 

I try to use them on ***"Without a Trace" (wd:Q826477)***.

In [29]:
# find seasons and episodes
queryString = """
SELECT ?numEpisodes ?numSeasons WHERE { 

    # Retrieve Without a Trace numEpisodes and numSeasons
    wd:Q826477  wdt:P1113 ?numEpisodes ;
                wdt:P2437 ?numSeasons  .
}
"""

print("Results")
x = run_query(queryString)

Results
[('numEpisodes', '160'), ('numSeasons', '7')]
1


Now I have to discover how many episodes there are for each season. To do this I show all the object properties of ***"Without a Trace" (wd:Q826477)***.

In [30]:
# find object properties of Without a Trace
queryString = """
SELECT DISTINCT ?p ?pName WHERE { 

    # Connecting Without a Trace to something
    wd:Q826477 ?p  ?o.

    # This returns the labels
    ?p <http://schema.org/name> ?pName .

    # Exclude data properties
    FILTER(!isLiteral(?o))
}
"""

print("Results")
x = run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('pName', 'genre')]
[('p', 'http://www.wikidata.org/prop/direct/P154'), ('pName', 'logo image')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('p', 'http://www.wikidata.org/prop/direct/P170'), ('pName', 'creator')]
[('p', 'http://www.wikidata.org/prop/direct/P1811'), ('pName', 'list of episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P2061'), ('pName', 'aspect ratio')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P364'), ('pName', 'original language of film or TV show')]
[('p', 'http://www.wikidata.org/prop/direct/P449'), ('pName', 'original broadcaster')]
[('p', 'http://www.wikidata.org/prop/direct/P495'), ('pName', 'country of origin')]
[('p', 'http://www.wikidata.org/prop/direct/P527'), ('pName', 'has part')]
[('p', 'http://www.wikidata.org/prop/direct/P674'), ('pName', 'characters')]
[('p', 'http://www.wikid

I try to use another property discovered before: ***has part (wdt:P527)***.

In [31]:
# has part
queryString = """
SELECT DISTINCT ?part ?partName WHERE { 

    # Connecting Without a Trace to something using property hasPart
    wd:Q826477 wdt:P527 ?part .

    # This returns the labels
    ?part <http://schema.org/name> ?partName .
}
"""

print("Results")
x = run_query(queryString)

Results
[('part', 'http://www.wikidata.org/entity/Q1120248'), ('partName', 'Without a Trace, season 1')]
[('part', 'http://www.wikidata.org/entity/Q3729810'), ('partName', 'Without a Trace, season 4')]
[('part', 'http://www.wikidata.org/entity/Q3729811'), ('partName', 'Without a Trace, season 5')]
[('part', 'http://www.wikidata.org/entity/Q3729812'), ('partName', 'Without a Trace, season 2')]
[('part', 'http://www.wikidata.org/entity/Q3729815'), ('partName', 'Without a Trace, season 6')]
[('part', 'http://www.wikidata.org/entity/Q3729816'), ('partName', 'Without a Trace, season 7')]
[('part', 'http://www.wikidata.org/entity/Q3729817'), ('partName', 'Without a Trace, season 3')]
7
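The seasons come back ordered by IRI rather than by season number. A possible way to reorder them on the Python side, assuming the labels keep the "..., season N" form seen in the output, is sketched below. The helper names are hypothetical and only three of the seven rows are reproduced.

```python
import re


def season_number(label):
    """Parse the trailing season number from a label such as '..., season 4'."""
    match = re.search(r"season (\d+)$", label)
    return int(match.group(1)) if match else None


def by_season(row):
    """Sort key: season number, with unparsable labels pushed to the end."""
    number = season_number(dict(row)["partName"])
    return number if number is not None else float("inf")


rows = [
    [("part", "http://www.wikidata.org/entity/Q1120248"), ("partName", "Without a Trace, season 1")],
    [("part", "http://www.wikidata.org/entity/Q3729810"), ("partName", "Without a Trace, season 4")],
    [("part", "http://www.wikidata.org/entity/Q3729812"), ("partName", "Without a Trace, season 2")],
]
ordered = sorted(rows, key=by_season)
for row in ordered:
    print(dict(row)["partName"])
```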


It is possible to retrieve all the different seasons of the TV series. 
I show the properties of one of them ( ***Without a Trace, season 1 (wd:Q2715578)*** )

In [32]:
# properties of the first season 
queryString = """
SELECT DISTINCT ?p ?pName WHERE { 

    # Connecting Without a Trace S1 to something
    wd:Q2715578 ?p ?o.

    # This returns the labels
    ?p <http://schema.org/name> ?pName .
}
"""

print("Results")
x = run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('pName', 'number of episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pName', 'Rotten Tomatoes ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1712'), ('pName', 'Metacritic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pName', 'part of the series')]
[('p', 'http://www.wikidata.org/prop/direct/P2529'), ('pName', 'ČSFD film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2581'), ('pName', 'BabelNet ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2638'), ('pName', 'TV.com ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2704'), ('pName', 'EIDR content ID')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P3302'), ('pName', 'Open Media Database film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P364'), ('pName', 'original language of film or TV show')]
[('p', 'http://www.wikidata.org/prop/direct/P437'), ('pName'

I have the property ***number of episodes (wdt:P1113)***. I try to use it on ***Without a Trace, season 1 (wd:Q2715578)***. I can finally retrieve the number of episodes per season of  ***"Without a Trace" (wd:Q826477)***.

In [33]:
# use number of episodes 
queryString = """
SELECT DISTINCT ?part ?partName ?numEpisodes WHERE { 

    # Retrieve Without a Trace seasons
    wd:Q826477 wdt:P527 ?part .
    
    # Retrieve number of episodes of each Without a Trace season
    ?part wdt:P1113 ?numEpisodes.

    # This returns the labels
    ?part <http://schema.org/name> ?partName .
}
"""

print("Results")
x = run_query(queryString)

Results
[('part', 'http://www.wikidata.org/entity/Q1120248'), ('partName', 'Without a Trace, season 1'), ('numEpisodes', '23')]
[('part', 'http://www.wikidata.org/entity/Q3729810'), ('partName', 'Without a Trace, season 4'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q3729811'), ('partName', 'Without a Trace, season 5'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q3729812'), ('partName', 'Without a Trace, season 2'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q3729815'), ('partName', 'Without a Trace, season 6'), ('numEpisodes', '18')]
[('part', 'http://www.wikidata.org/entity/Q3729816'), ('partName', 'Without a Trace, season 7'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q3729817'), ('partName', 'Without a Trace, season 3'), ('numEpisodes', '23')]
7
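A quick consistency check, sketched under the assumption that the per-season counts above are complete: their sum should match the series-level value of ***number of episodes (wdt:P1113)*** retrieved earlier (160). The rows below are rebuilt from the output above in the shape returned by `run_query`.

```python
# (season entity id, number of episodes) copied from the result above.
season_counts = [
    ("Q1120248", 23), ("Q3729810", 24), ("Q3729811", 24), ("Q3729812", 24),
    ("Q3729815", 18), ("Q3729816", 24), ("Q3729817", 23),
]

# Rebuild rows in the run_query shape; values arrive as strings.
rows = [
    [("part", f"http://www.wikidata.org/entity/{qid}"), ("numEpisodes", str(count))]
    for qid, count in season_counts
]

series_total = 160  # value of wdt:P1113 on wd:Q826477 from the earlier query
total = sum(int(dict(row)["numEpisodes"]) for row in rows)
print(total, total == series_total)
```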


In [34]:
## single literal associated to an URI
objs = []
for i in x:
    f_uri = i[0][1]
    f_name = i[1][1]
    val = i[2][1]
    obj = {}
    obj["refers_to"] = f_uri
    obj["refers_to_name"] = f_name
    obj["check"] = "value"
    obj["value"]= val
    objs.append(obj)
evaluation.add_result(evaluation.get_index_workflow(pt),"2", evaluation.TYPE_REFERRED ,"value", objs)

The index of this workflow is: 2_3
The path is /locale/data/jupyter/prando/notebook/2022/results/workflow2_3.json
JSON object updated
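The cell above reads each row by position (`i[0][1]`, `i[1][1]`, `i[2][1]`), which relies on the bindings arriving in SELECT order. A sketch of an alternative that looks values up by variable name is shown below. The helper `to_referred_objs` is hypothetical; its output would be passed to `evaluation.add_result` exactly as in the cell above.

```python
def to_referred_objs(rows, uri_var, name_var, value_var):
    """Build the list of 'referred value' objects from run_query-style rows."""
    objs = []
    for row in rows:
        record = dict(row)  # variable name -> value
        objs.append({
            "refers_to": record[uri_var],
            "refers_to_name": record[name_var],
            "check": "value",
            "value": record[value_var],
        })
    return objs


# One row copied from the Task 2 result.
rows = [[
    ("part", "http://www.wikidata.org/entity/Q1120248"),
    ("partName", "Without a Trace, season 1"),
    ("numEpisodes", "23"),
]]
objs = to_referred_objs(rows, "part", "partName", "numEpisodes")
print(objs)
```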


### Task 3 : Get the number of episodes in which the cast members played a role. Who are the most present actors?

From a previous query, I notice that each season also has the property ***has part (wdt:P527)***.

I want to see what is connected to ***Without a Trace, season 1 (wd:Q1120248)*** through this property.

In [35]:
# episodes of the first season
queryString = """
SELECT ?episode ?episodeName WHERE { 

    # Retrieve episodes of Without a Trace S1
    wd:Q1120248 wdt:P527 ?episode .
    
    # This returns the labels
    ?episode <http://schema.org/name> ?episodeName .
}
"""

print("Results")
x = run_query(queryString)

Results
[('episode', 'http://www.wikidata.org/entity/Q52667904'), ('episodeName', 'Maple Street')]
[('episode', 'http://www.wikidata.org/entity/Q52667908'), ('episodeName', 'Underground Railroad')]
[('episode', 'http://www.wikidata.org/entity/Q52667910'), ('episodeName', 'Hang On to Me')]
[('episode', 'http://www.wikidata.org/entity/Q52667894'), ('episodeName', 'Snatch Back')]
[('episode', 'http://www.wikidata.org/entity/Q52667898'), ('episodeName', 'Little Big Man')]
[('episode', 'http://www.wikidata.org/entity/Q52667900'), ('episodeName', 'In Extremis')]
[('episode', 'http://www.wikidata.org/entity/Q52667911'), ('episodeName', 'The Friendly Skies')]
[('episode', 'http://www.wikidata.org/entity/Q52667916'), ('episodeName', 'There Goes the Bride')]
[('episode', 'http://www.wikidata.org/entity/Q52667920'), ('episodeName', 'Clare de Lune')]
[('episode', 'http://www.wikidata.org/entity/Q52667923'), ('episodeName', 'Kam Li')]
[('episode', 'http://www.wikidata.org/entity/Q52667926'), ('epis

Using the property ***has part (wdt:P527)*** on a single season, I can retrieve all the episodes of that season.

I have to discover the cast members, so I try to list all the properties of a specific episode: ***Pilot (wd:Q39069555)***

In [36]:
# find the cast members
queryString = """
SELECT DISTINCT ?p ?pName ?o ?oName WHERE { 

    # Connecting Pilot to something
    wd:Q39069555 ?p ?o.

    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
}
ORDER BY ?pName
"""

print("Results")
x = run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q551608'), ('oName', 'Enrique Murciano')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q355163'), ('oName', 'Bruce Davison')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q1125651'), ('oName', 'Thom Barry')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q139341'), ('oName', 'Zach Grenier')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q430872'), ('oName', 'Marianne Jean-Baptiste')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q308124'), ('oName', 'Anthony LaPaglia')]
[('p', 'http://www.wikidata.org/prop/direct/P161')

I discovered that the property ***cast member (wdt:P161)*** can be used to retrieve all the actors that participated in a specific episode.

Now I can count the number of episodes in which the cast members played a role, and show the most present actors.

In [37]:
# count the presence of the actors in the TV series
queryString = """
SELECT ?actor ?actorName (COUNT(DISTINCT ?episode) AS ?numEpisodes) WHERE { 

    # Retrieve Without a Trace episodes 
    wd:Q826477 wdt:P527/wdt:P527 ?episode .
    
    # Retrieve cast members
    ?episode wdt:P161 ?actor .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
GROUP BY ?actor ?actorName
ORDER BY DESC(?numEpisodes)
LIMIT 6
"""

print("Results")
x = run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q551608'), ('actorName', 'Enrique Murciano'), ('numEpisodes', '160')]
[('actor', 'http://www.wikidata.org/entity/Q503040'), ('actorName', 'Eric Close'), ('numEpisodes', '160')]
[('actor', 'http://www.wikidata.org/entity/Q430872'), ('actorName', 'Marianne Jean-Baptiste'), ('numEpisodes', '160')]
[('actor', 'http://www.wikidata.org/entity/Q308124'), ('actorName', 'Anthony LaPaglia'), ('numEpisodes', '160')]
[('actor', 'http://www.wikidata.org/entity/Q235075'), ('actorName', 'Poppy Montgomery'), ('numEpisodes', '160')]
[('actor', 'http://www.wikidata.org/entity/Q18618690'), ('actorName', 'John Livingston'), ('numEpisodes', '1')]
6
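The `GROUP BY` with `COUNT(DISTINCT ?episode)` used above can be mirrored in plain Python, which may help when checking the aggregation on a small sample. This is only a sketch: the function name and the toy `(actor, episode)` pairs are hypothetical.

```python
from collections import Counter


def episodes_per_actor(pairs):
    """Count distinct episodes per actor, most present actors first."""
    distinct = set(pairs)  # plays the role of DISTINCT
    counts = Counter(actor for actor, _episode in distinct)
    return counts.most_common()  # plays the role of ORDER BY DESC(count)


# Toy (actor, episode) pairs, including one duplicate.
pairs = [
    ("actor_a", "ep1"), ("actor_a", "ep2"), ("actor_a", "ep3"),
    ("actor_a", "ep1"),
    ("actor_b", "ep1"),
]
print(episodes_per_actor(pairs))
```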


In [38]:
## single literal associated to an URI
objs = []
for i in x:
    f_uri = i[0][1]
    f_name = i[1][1]
    val = i[2][1]
    obj = {}
    obj["refers_to"] = f_uri
    obj["refers_to_name"] = f_name
    obj["check"] = "value"
    obj["value"]= val
    objs.append(obj)
evaluation.add_result(evaluation.get_index_workflow(pt),"3", evaluation.TYPE_REFERRED ,"value", objs)

The index of this workflow is: 2_3
The path is /locale/data/jupyter/prando/notebook/2022/results/workflow2_3.json
JSON object updated


### Task 4: Check who is the actor who acted in more films while working on "Without a Trace" 

To check if an actor acted in a film while working on ***"Without a Trace" (wd:Q826477)***, I need to know when Without a Trace started and when it ended.

I can rely on these two properties discovered in a previous query: 
* ***start time (wdt:P580)*** 
* ***end time (wdt:P582)***

In [39]:
# start and end
queryString = """
SELECT ?startTime ?endTime WHERE { 

    # Retrieving Without a Trace startTime and endTime
    wd:Q826477  wdt:P580  ?startTime ;
                wdt:P582  ?endTime   . 
}
"""

print("Results")
x = run_query(queryString)

Results
[('startTime', '2002-09-26T00:00:00Z'), ('endTime', '2009-05-19T00:00:00Z')]
1


Hence, I need to check whether an actor worked in a film between ***"2002-09-26"*** and ***"2009-05-19"***. To do this, I have to understand how actors and films are connected.

First, I try to retrieve all the object properties of a specific actor: ***Anthony LaPaglia (wd:Q308124)***.

In [40]:
# see properties and objects of Anthony LaPaglia
queryString = """
SELECT ?p ?pName ?o ?oName WHERE { 

    # Connecting Anthony LaPaglia to something
    wd:Q308124  ?p  ?o .
    
    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
}
ORDER BY ?pName
"""

print("Results")
x = run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('pName', 'award received'), ('o', 'http://www.wikidata.org/entity/Q1131356'), ('oName', 'Theatre World Award')]
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('pName', 'award received'), ('o', 'http://www.wikidata.org/entity/Q1044427'), ('oName', 'Primetime Emmy Award')]
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('pName', 'award received'), ('o', 'http://www.wikidata.org/entity/Q1445521'), ('oName', 'Tony Award for Best Actor in a Play')]
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('pName', 'award received'), ('o', 'http://www.wikidata.org/entity/Q530923'), ('oName', 'Primetime Emmy Award for Outstanding Guest Actor in a Comedy Series')]
[('p', 'http://www.wikidata.org/prop/direct/P27'), ('pName', 'country of citizenship'), ('o', 'http://www.wikidata.org/entity/Q408'), ('oName', 'Australia')]
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('pName', 'described by source'), ('o', 'http://www.wikid

Maybe there are connections in the opposite direction: ***?s ?p wd:Q308124***.

In [41]:
# opposite direction
queryString = """
SELECT ?s ?sName ?p ?pName WHERE { 

    # Connecting something to Anthony LaPaglia
    ?s ?p wd:Q308124 .
    
    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    ?s <http://schema.org/name> ?sName .
}
ORDER BY ?pName
LIMIT 20
"""

print("Results")
x = run_query(queryString)

Results
[('s', 'http://www.wikidata.org/entity/Q16954197'), ('sName', 'The Code'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q1570302'), ('sName', 'Lansky'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q21511509'), ('sName', 'Annabelle: Creation'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q1352085'), ('sName', 'Analyze That'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q3805682'), ('sName', 'Jack the Dog'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q826477'), ('sName', 'Without a Trace'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q255376'), ('sName', '

I can use the property ***cast member (wdt:P161)*** as before, but I need to understand how to select only films and not other TV series.

In [42]:
queryString = """
SELECT DISTINCT ?class ?className WHERE { 

    # Retrieve class of films
    ?s wdt:P161 wd:Q308124 ;
        wdt:P31 ?class .
    
    # This returns the labels
    ?class <http://schema.org/name> ?className .
}
"""

print("Results")
x = run_query(queryString)

Results
[('class', 'http://www.wikidata.org/entity/Q11424'), ('className', 'film')]
[('class', 'http://www.wikidata.org/entity/Q21191270'), ('className', 'television series episode')]
[('class', 'http://www.wikidata.org/entity/Q506240'), ('className', 'television film')]
[('class', 'http://www.wikidata.org/entity/Q5398426'), ('className', 'television series')]
[('class', 'http://www.wikidata.org/entity/Q653916'), ('className', 'television pilot')]
5


I retrieved ***film (wd:Q11424)***. Now, I can retrieve only the films of a specific actor : ***Anthony LaPaglia (wd:Q308124)***.

In [43]:
queryString = """
SELECT ?film ?filmName WHERE { 

    # Retrieve films in which Anthony LaPaglia acted
    ?film  wdt:P161 wd:Q308124 ;
           wdt:P31  wd:Q11424  .
    
    # This returns the labels
    ?film <http://schema.org/name> ?filmName .
}
ORDER BY ?filmName
"""

print("Results")
x = run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q578700'), ('filmName', '29th Street')]
[('film', 'http://www.wikidata.org/entity/Q18150326'), ('filmName', 'A Good Marriage')]
[('film', 'http://www.wikidata.org/entity/Q21869637'), ('filmName', 'A Month of Sundays')]
[('film', 'http://www.wikidata.org/entity/Q1352085'), ('filmName', 'Analyze That')]
[('film', 'http://www.wikidata.org/entity/Q21511509'), ('filmName', 'Annabelle: Creation')]
[('film', 'http://www.wikidata.org/entity/Q1125349'), ('filmName', 'Autumn in New York')]
[('film', 'http://www.wikidata.org/entity/Q2597292'), ('filmName', 'Balibo')]
[('film', 'http://www.wikidata.org/entity/Q4409907'), ('filmName', "Betsy's Wedding")]
[('film', 'http://www.wikidata.org/entity/Q16248717'), ('filmName', 'Big Stone Gap')]
[('film', 'http://www.wikidata.org/entity/Q4968055'), ('filmName', 'Brilliant Lies')]
[('film', 'http://www.wikidata.org/entity/Q1167468'), ('filmName', 'Chameleon')]
[('film', 'http://www.wikidata.org/entity/Q1107

I have to retrieve the publication date of a film. I check if there is a property of films that contains the word "date".

In [44]:
queryString = """
SELECT DISTINCT ?p ?pName WHERE { 

    # Connect the films in which Anthony LaPaglia acted to something
    ?film  wdt:P161 wd:Q308124 ;
            ?p ?o;
           wdt:P31  wd:Q11424  .
    
    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    
    # I use a regex to search a property that contains the word "date"
    FILTER(REGEX(?pName, "date"))
}
"""

print("Results")
x = run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P577'), ('pName', 'publication date')]
[('p', 'http://www.wikidata.org/prop/direct/P1191'), ('pName', 'date of first performance')]
2
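Note that `REGEX` is case-sensitive unless a flag such as `"i"` is passed as its third argument, so a label spelled "Date" would be missed by the filter above. The same behaviour can be reproduced in Python as a rough sketch. The helper name is hypothetical, and the label "Release Date" is an invented example that does not come from the endpoint.

```python
import re


def properties_matching(names, pattern, flags=0):
    """Keep the property names in which the pattern occurs."""
    return [name for name in names if re.search(pattern, name, flags)]


names = [
    "publication date",           # from the result above
    "date of first performance",  # from the result above
    "title",                      # from an earlier result
    "Release Date",               # hypothetical label, for illustration only
]
print(properties_matching(names, "date"))
print(properties_matching(names, "date", re.IGNORECASE))
```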


So to retrieve the publication date of a ***film (wd:Q11424)***, I can use the property ***publication date (wdt:P577)***.

Now I can finally answer the initial question: which actor acted in the most films while working on ***"Without a Trace" (wd:Q826477)***?

In [46]:
queryString = """
SELECT ?actor ?actorName (COUNT(DISTINCT ?film) AS ?numFilms) WHERE { 

    # Retrieve all the Without a Trace cast members
    ?season wdt:P179 wd:Q826477 .        # BGP to retrieve all the seasons
    ?season wdt:P527 ?episode .          # BGP to retrieve all the episodes of a season
    ?episode wdt:P161 ?actor .           # BGP to retrieve all cast members that participated in the episode
    
    # Retrieve start and end time
    wd:Q826477  wdt:P580  ?startTime ;
                wdt:P582  ?endTime   . 
    
    # Retrieve films in which actor of Without a Trace acted
    ?film  wdt:P31   wd:Q11424         ;
           wdt:P161  ?actor            ;
           wdt:P577  ?publicationDate  .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
    
    # I want only films that were published while the actor was working on "Without a Trace".
    FILTER (?publicationDate > ?startTime && ?publicationDate < ?endTime)
}
GROUP BY ?actor ?actorName
ORDER BY DESC(?numFilms)
LIMIT 10
"""

print("Results")
x = run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q355163'), ('actorName', 'Bruce Davison'), ('numFilms', '9')]
[('actor', 'http://www.wikidata.org/entity/Q373989'), ('actorName', 'David Paymer'), ('numFilms', '8')]
[('actor', 'http://www.wikidata.org/entity/Q706513'), ('actorName', 'Charles S. Dutton'), ('numFilms', '7')]
[('actor', 'http://www.wikidata.org/entity/Q308124'), ('actorName', 'Anthony LaPaglia'), ('numFilms', '7')]
[('actor', 'http://www.wikidata.org/entity/Q551608'), ('actorName', 'Enrique Murciano'), ('numFilms', '4')]
[('actor', 'http://www.wikidata.org/entity/Q139341'), ('actorName', 'Zach Grenier'), ('numFilms', '4')]
[('actor', 'http://www.wikidata.org/entity/Q3265830'), ('actorName', 'Lucille Soong'), ('numFilms', '3')]
[('actor', 'http://www.wikidata.org/entity/Q238501'), ('actorName', 'Alex Veadov'), ('numFilms', '3')]
[('actor', 'http://www.wikidata.org/entity/Q18618690'), ('actorName', 'John Livingston'), ('numFilms', '2')]
[('actor', 'http://www.wikidata.org/

In [47]:
## single literal associated to an URI
objs = []
for i in x:
    f_uri = i[0][1]
    f_name = i[1][1]
    val = i[2][1]
    obj = {}
    obj["refers_to"] = f_uri
    obj["refers_to_name"] = f_name
    obj["check"] = "value"
    obj["value"]= val
    objs.append(obj)
evaluation.add_result(evaluation.get_index_workflow(pt),"4", evaluation.TYPE_REFERRED ,"value", objs)

The index of this workflow is: 2_3
The path is /locale/data/jupyter/prando/notebook/2022/results/workflow2_3.json
JSON object updated


The actor who acted in the most films while working on "Without a Trace" is ***Bruce Davison (wd:Q355163)*** with 9 films.
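The date window applied by the `FILTER` in the query above can be sketched in Python as follows. The start and end values are the ones returned for ***start time (wdt:P580)*** and ***end time (wdt:P582)***. The helper names and the two probe dates are hypothetical.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # shape of the values returned by the endpoint

START = "2002-09-26T00:00:00Z"  # start time of Without a Trace
END = "2009-05-19T00:00:00Z"    # end time of Without a Trace


def parse_timestamp(value):
    """Parse a timestamp such as '2002-09-26T00:00:00Z'."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def released_during(publication, start=START, end=END):
    """True when the publication date lies strictly inside the window."""
    return parse_timestamp(start) < parse_timestamp(publication) < parse_timestamp(end)


print(released_during("2005-01-01T00:00:00Z"))  # hypothetical date inside the window
print(released_during("2010-01-01T00:00:00Z"))  # hypothetical date after the end
```

Both bounds are exclusive, matching the strict comparisons in the query.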

### Task 5 : Compare Without a Trace with the TV series "Cold Case" in terms of number of seasons, episodes and cast members.

To compare the number of seasons and episodes I can rely on the same properties used in Task 2.

In [84]:
queryString = """
SELECT ?tv_series ?name ?numEpisodes ?numSeasons WHERE { 
    
    VALUES ?tv_series { wd:Q826477 wd:Q733960 }
    
    # Retrieve numEpisodes and numSeasons
    ?tv_series  wdt:P1113 ?numEpisodes ;
                wdt:P2437 ?numSeasons .
    ?tv_series sc:name ?name
}
GROUP BY ?tv_series ?name ?numEpisodes ?numSeasons
"""

print("Results")
x = run_query(queryString)

Results
[('tv_series', 'http://www.wikidata.org/entity/Q733960'), ('name', 'Cold Case'), ('numEpisodes', '156'), ('numSeasons', '7')]
[('tv_series', 'http://www.wikidata.org/entity/Q826477'), ('name', 'Without a Trace'), ('numEpisodes', '160'), ('numSeasons', '7')]
2


I can also retrieve the actors who participated in the highest number of episodes of ***Cold Case (wd:Q733960)*** using the same query as in Task 3.

In [85]:
queryString = """
SELECT ?actor ?actorName (COUNT(DISTINCT ?episode) AS ?numEpisodes) WHERE { 

    # Retrieve Cold Case episodes 
    wd:Q733960 wdt:P527{2} ?episode .
    
    # Retrieve cast members
    ?episode wdt:P161 ?actor .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
GROUP BY ?actor ?actorName
ORDER BY DESC(?numEpisodes)
LIMIT 10
"""

print("Results")
x = run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q1700068'), ('actorName', 'John Finn'), ('numEpisodes', '146')]
[('actor', 'http://www.wikidata.org/entity/Q232968'), ('actorName', 'Kathryn Morris'), ('numEpisodes', '146')]
[('actor', 'http://www.wikidata.org/entity/Q123836'), ('actorName', 'Danny Pino'), ('numEpisodes', '142')]
[('actor', 'http://www.wikidata.org/entity/Q1125651'), ('actorName', 'Thom Barry'), ('numEpisodes', '142')]
[('actor', 'http://www.wikidata.org/entity/Q1338682'), ('actorName', 'Jeremy Ratchford'), ('numEpisodes', '141')]
[('actor', 'http://www.wikidata.org/entity/Q241160'), ('actorName', 'Tracie Thoms'), ('numEpisodes', '95')]
[('actor', 'http://www.wikidata.org/entity/Q262241'), ('actorName', 'Nicki Aycox'), ('numEpisodes', '12')]
[('actor', 'http://www.wikidata.org/entity/Q1703231'), ('actorName', 'Jonathan LaPaglia'), ('numEpisodes', '10')]
[('actor', 'http://www.wikidata.org/entity/Q597515'), ('actorName', 'Josh Hopkins'), ('numEpisodes', '9')]
[('actor'

I want to check which TV series has the larger cast: ***Cold Case (wd:Q733960)*** or ***"Without a Trace" (wd:Q826477)***.

In [90]:
queryString = """
SELECT ?tv_series ?name (COUNT(DISTINCT ?actor) AS ?numActors) WHERE { 
    
    VALUES ?tv_series { wd:Q826477 wd:Q733960 }

    # Retrieve episodes 
    ?tv_series wdt:P527{2} ?episode .
    
    # Retrieve cast members
    ?episode wdt:P161 ?actor .
    
    # This returns the labels
    ?tv_series <http://schema.org/name> ?name .
}
GROUP BY ?tv_series ?name
"""

print("Results")
x = run_query(queryString)

Results
[('tv_series', 'http://www.wikidata.org/entity/Q826477'), ('name', 'Without a Trace'), ('numActors', '30')]
[('tv_series', 'http://www.wikidata.org/entity/Q733960'), ('name', 'Cold Case'), ('numActors', '674')]
2
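
The two rows can also be compared programmatically. This is a minimal sketch, assuming `run_query` returns a list of rows where each row is a list of `(variable, value)` tuples, as the printed output above suggests. `rows_to_dicts` and `largest_cast` are hypothetical helper names, and the rows are hard-coded from the output above rather than fetched from the endpoint.

```python
def rows_to_dicts(rows):
    """Turn each row of (variable, value) tuples into a dict."""
    return [dict(row) for row in rows]


def largest_cast(rows):
    """Return the row with the highest numActors, compared as an integer."""
    # Values come back as strings, so cast them before comparing.
    return max(rows_to_dicts(rows), key=lambda r: int(r["numActors"]))


# Rows copied from the output above (assumed shape of run_query's return value).
rows = [
    [("tv_series", "http://www.wikidata.org/entity/Q826477"),
     ("name", "Without a Trace"), ("numActors", "30")],
    [("tv_series", "http://www.wikidata.org/entity/Q733960"),
     ("name", "Cold Case"), ("numActors", "674")],
]

winner = largest_cast(rows)
print(winner["name"], winner["numActors"])
```

Casting to `int` matters because the endpoint returns counts as strings, and string comparison would rank `'9'` above `'674'`.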


### Task 6 : Return how many of the actors who are members of the cast of the TV series have [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2

The Bacon number of an actor is the number of degrees of separation he or she has from Kevin Bacon. So, first of all, I need to retrieve Kevin Bacon.

To do this, I can use a ***REGEX*** on the surname, which is connected to a person through the property ***family name (wdt:P734)*** discovered in a previous query.

In [92]:
queryString = """
SELECT DISTINCT ?person ?personName ?personSurname WHERE { 

    # Retrieve surname of a person using the property family name
    ?person wdt:P734 ?surname .
    
    # This returns the labels
    ?person <http://schema.org/name> ?personName .
    ?surname <http://schema.org/name> ?personSurname .

    # Since Kevin Bacon is an actor, he probably acted in a film.
    FILTER EXISTS{
        ?film   wdt:P31   wd:Q11424 ;
                wdt:P161  ?person   .             
    }
    
    # I use a regex to search for a surname that contains the word "Bacon"
    FILTER(REGEX(?personSurname, "Bacon"))
    
}
LIMIT 10
"""

print("Results")
x = run_query(queryString)

Results
[('person', 'http://www.wikidata.org/entity/Q3102228'), ('personName', 'Georges Baconnet'), ('personSurname', 'Baconnet')]
[('person', 'http://www.wikidata.org/entity/Q3116093'), ('personName', 'Irving Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q503597'), ('personName', 'James Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q3454165'), ('personName', 'Kevin Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q3491343'), ('personName', 'Sosie Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q3992438'), ('personName', 'Tom Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q706678'), ('personName', 'Lloyd Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q65116263'), ('personName', 'Marco Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q5216474'), ('personNa
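
The result list includes "Georges Baconnet" because SPARQL `REGEX` matches the pattern anywhere in the string unless it is anchored. Python's `re.search` behaves the same way, so the effect can be illustrated offline. This is a minimal sketch using two surnames copied from the output above. Anchoring with `^...$` is one possible refinement, not what the query above does.

```python
import re

# Surnames copied from the output above.
surnames = ["Baconnet", "Bacon"]

# Unanchored pattern: matches any surname containing "Bacon".
unanchored = [s for s in surnames if re.search("Bacon", s)]

# Anchored pattern: matches only the exact surname "Bacon".
anchored = [s for s in surnames if re.search("^Bacon$", s)]

print(unanchored)  # ['Baconnet', 'Bacon']
print(anchored)    # ['Bacon']
```

In SPARQL, the anchored variant would read `FILTER(REGEX(?personSurname, "^Bacon$"))`.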

I have found ***Kevin Bacon (wd:Q3454165)***. Now I can retrieve all the cast members of ***"Without a Trace" (wd:Q826477)*** with a Kevin Bacon number equal to 2.

First, I start with the cast members that have a Kevin Bacon number equal to 1.

In [93]:
queryString = """
SELECT ?actor ?actorName ?film ?filmName WHERE { 

    # Retrieve Without a Trace actors
    wd:Q826477 wdt:P161 ?actor .
    
    # Ensure that the actor and Kevin Bacon worked together
    ?film wdt:P161 ?actor      ;
          wdt:P161 wd:Q3454165 .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
    ?film <http://schema.org/name> ?filmName .
}
"""

print("Results")
x = run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q308124'), ('actorName', 'Anthony LaPaglia'), ('film', 'http://www.wikidata.org/entity/Q127760'), ('filmName', 'He Said, She Said')]
1


Now I want only the cast members of ***"Without a Trace" (wd:Q826477)*** that have a Kevin Bacon number equal to 2.

In [94]:
queryString = """
SELECT DISTINCT ?actor ?actorName WHERE { 

    # Retrieve Without a Trace actors
    wd:Q826477 wdt:P161 ?actor .
    
    # Ensure that the actor worked together with another actor "in the middle"
    ?filmMiddle wdt:P161 ?actor       ;
                wdt:P161 ?actorMiddle .
    
    # Ensure that the actor "in the middle" worked with Kevin Bacon
    ?film       wdt:P161 ?actorMiddle  ;
                wdt:P161 wd:Q3454165 .
    
    # Ensure that the "first" actor and Kevin Bacon did not work together
    FILTER NOT EXISTS{
        ?film3 wdt:P161 ?actor      ;
               wdt:P161 wd:Q3454165 .
    }
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
LIMIT 30
"""

print("Results")
x = run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q235075'), ('actorName', 'Poppy Montgomery')]
1


In [95]:
queryString = """
SELECT COUNT(DISTINCT ?actor) WHERE { 

    # Retrieve Without a Trace actors
    wd:Q826477 wdt:P161 ?actor .
    
    # Ensure that the actor worked together with another actor "in the middle"
    ?filmMiddle wdt:P161 ?actor       ;
                wdt:P161 ?actorMiddle .
    
    # Ensure that the actor "in the middle" worked with Kevin Bacon
    ?film       wdt:P161 ?actorMiddle  ;
                wdt:P161 wd:Q3454165 .
    
    # Ensure that the "first" actor and Kevin Bacon did not work together
    FILTER NOT EXISTS{
        ?film3 wdt:P161 ?actor      ;
               wdt:P161 wd:Q3454165 .
    }
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
"""

print("Results")
x = run_query(queryString)


Results
[('callret-0', '1')]
1


There is only one actor who participated in ***"Without a Trace" (wd:Q826477)*** with a Kevin Bacon number equal to 2.

I can also show how a cast member of ***"Without a Trace" (wd:Q826477)***  with Kevin Bacon Number equal to 2 is connected to ***Kevin Bacon (wd:Q3454165)***.

In [97]:
queryString = """
SELECT DISTINCT ?actorName ?filmMiddleName ?actorMiddleName ?filmName WHERE { 

    # Retrieve Without a Trace actors
    wd:Q826477 wdt:P161 ?actor .
    
    # Ensure that the actor worked together with another actor "in the middle"
    ?filmMiddle wdt:P161 ?actor       ;
                wdt:P161 ?actorMiddle .
    
    # Ensure that the actor "in the middle" worked with Kevin Bacon
    ?film       wdt:P161 ?actorMiddle  ;
                wdt:P161 wd:Q3454165 .
    
    # Ensure that the "first" actor and Kevin Bacon did not work together
    FILTER NOT EXISTS{
        ?film3 wdt:P161 ?actor      ;
               wdt:P161 wd:Q3454165 .
    }
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
    ?film <http://schema.org/name> ?filmName .
    ?actorMiddle <http://schema.org/name> ?actorMiddleName .
    ?filmMiddle <http://schema.org/name> ?filmMiddleName .
}
LIMIT 1
"""

print("Results")
x = run_query(queryString)

Results
[('actorName', 'Poppy Montgomery'), ('filmMiddleName', 'Dead Man on Campus'), ('actorMiddleName', 'Linda Cardellini'), ('filmName', 'Super')]
1
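
The chain above corresponds to a shortest path in the co-star graph, so the Bacon numbers can also be checked offline with a breadth-first search. This is a minimal sketch, assuming the `(film, actor)` pairs have already been collected (for instance from `?film wdt:P161 ?actor` results). `bacon_numbers` is a hypothetical helper, and the toy pairs contain only the names that appear in this notebook's outputs, not the full Wikidata cast lists.

```python
from collections import defaultdict, deque


def bacon_numbers(cast_pairs, source):
    """Breadth-first search over co-stars; returns {actor: degrees from source}."""
    films_of = defaultdict(set)  # actor -> films the actor appears in
    cast_of = defaultdict(set)   # film -> actors appearing in the film
    for film, actor in cast_pairs:
        films_of[actor].add(film)
        cast_of[film].add(actor)

    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for film in films_of[current]:
            for costar in cast_of[film]:
                # The first visit is always along a shortest path.
                if costar not in distance:
                    distance[costar] = distance[current] + 1
                    queue.append(costar)
    return distance


# Toy data: only the films and actors shown in the outputs of this task.
pairs = [
    ("He Said, She Said", "Anthony LaPaglia"),
    ("He Said, She Said", "Kevin Bacon"),
    ("Dead Man on Campus", "Poppy Montgomery"),
    ("Dead Man on Campus", "Linda Cardellini"),
    ("Super", "Linda Cardellini"),
    ("Super", "Kevin Bacon"),
]

numbers = bacon_numbers(pairs, "Kevin Bacon")
print(numbers["Anthony LaPaglia"])  # 1
print(numbers["Poppy Montgomery"])  # 2
```

Because breadth-first search records only the first (shortest) distance for each actor, it plays the same role as the `FILTER NOT EXISTS` clause in the queries above, which excludes actors who already have a Bacon number of 1.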


## Task 7

In [108]:
queryString = """
SELECT ?s ?name (COUNT(DISTINCT ?aw) AS ?awards) WHERE { 
    
    # Retrieve episodes 
    wd:Q733960  wdt:P527{2} ?episode .
    
    # Retrieve cast members
    ?episode wdt:P161 ?actor .
    
    # Retrieve television series (wd:Q5398426) in which these actors appear, together with the awards received (wdt:P166)
    ?s wdt:P161 ?actor ;
        wdt:P166 ?aw;
        wdt:P31 wd:Q5398426 .
    
    # This returns the labels
    ?s sc:name ?name .
}
GROUP BY ?s ?name
ORDER BY DESC(?awards)
"""

print("Results")
x = run_query(queryString)

Results
[('s', 'http://www.wikidata.org/entity/Q1079'), ('name', 'Breaking Bad'), ('awards', '21')]
[('s', 'http://www.wikidata.org/entity/Q56153643'), ('name', 'Watchmen'), ('awards', '20')]
[('s', 'http://www.wikidata.org/entity/Q1132439'), ('name', 'The Practice'), ('awards', '12')]
[('s', 'http://www.wikidata.org/entity/Q16756'), ('name', 'Modern Family'), ('awards', '11')]
[('s', 'http://www.wikidata.org/entity/Q8539'), ('name', 'The Big Bang Theory'), ('awards', '10')]
[('s', 'http://www.wikidata.org/entity/Q1136370'), ('name', 'General Hospital'), ('awards', '10')]
[('s', 'http://www.wikidata.org/entity/Q1145764'), ('name', 'Guiding Light'), ('awards', '9')]
[('s', 'http://www.wikidata.org/entity/Q244803'), ('name', 'Ally McBeal'), ('awards', '9')]
[('s', 'http://www.wikidata.org/entity/Q30599007'), ('name', 'Succession'), ('awards', '8')]
[('s', 'http://www.wikidata.org/entity/Q1030713'), ('name', 'Another World'), ('awards', '8')]
[('s', 'http://www.wikidata.org/entity/Q438406