# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained by the previous ones.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP that returns the human-readable name of a property or a class in Wikidata.
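
Since this pattern is appended to almost every query, it could be generated by a small helper. The sketch below is only an illustration: `label_pattern` is a hypothetical name, not a function of the notebook or of SPARQLWrapper.

```python
def label_pattern(var, label_var):
    """Build the triple pattern that binds the human-readable name of `var`.

    Hypothetical helper, shown only to illustrate the BGP above.
    """
    return f"?{var} <http://schema.org/name> ?{label_var} ."


# Reproduces the BGP shown above
print(label_pattern("p", "name"))
```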
    
    

In [2]:
## SETUP used later
import sys
import os
import json
import pandas as pd
sys.path.insert(1, '../../../../src/')
import gt_modules.evaluation as evaluation
from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-movie4-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString,verbose = True):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://gracevirtuoso.dei.unipd.it/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings'])==0:
            print("Empty")
            return []
        array = []
        for bindings in json_results['results']['bindings']:
            app =  [ (var, value['value'])  for var, value in bindings.items() ] 
            if verbose:
                print( app)
            array.append(app)
        if verbose:
            print(len(array))
        return array

    except Exception as e :
        print("The operation failed", e)
        # Return an empty list so that callers can still iterate over the result
        return []

# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://gracevirtuoso.dei.unipd.it/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
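
`run_query` returns a list of rows, where each row is a list of `(variable, value)` tuples. A minimal sketch of how such rows could be turned into dictionaries, so that values are looked up by variable name instead of by position. `rows_to_dicts` is a hypothetical helper, not part of the notebook; the sample row is copied from one of the outputs printed in this notebook.

```python
def rows_to_dicts(rows):
    """Turn each row, a list of (variable, value) tuples, into a dict."""
    return [dict(row) for row in rows]


# Row copied from the output of the "instance of" query on wd:Q147235
rows = [
    [("o", "http://www.wikidata.org/entity/Q5398426"), ("pName", "television series")],
]

for record in rows_to_dicts(rows):
    print(record["o"], record["pName"])
```

Looking values up by name may avoid depending on the order in which the endpoint lists the variables of each binding.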

# Movie Workflow Series ("Tv series HIMYM explorative search") 


Consider the following exploratory scenario:


> we are interested in the TV series "How I met your mother" and we want to investigate the main aspects related to the actors and directors involved in the production, know the number of seasons and check which episodes had the greatest success/impact.


## Useful URIs for the current workflow
The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P106`    | profession    | predicate | 
| `wdt:P279`    | subclass      | predicate |
| `wdt:P4969`    | derivative work      | predicate |
| `wd:Q147235` | How I met your mother        | node |
| `wd:Q23831` | The Office (US)        | node |



Also consider

```
wd:Q23831 ?p ?obj .
```

is the BGP to retrieve all **properties of The Office (US)**.

Please consider that when you return a resource, you should return both the IRI and the label of the resource. In particular, when the task requires you to identify a BGP, the result set must always be a list of IRI-label pairs.

The workload should

1. Identify the BGP for tv series

2. Return the number of seasons and episodes per season of the tv series (the result set must be triples of season IRI, label and #episodes).

3. Get the number of episodes in which the cast members played a role. Who are the most present actors? (the result set must be a list of triples actor/actress IRI, label and #episodes)

4. Check which actor acted in the most films while working on "How I met your mother" (the result set must be a list of triples actor/actress IRI, label and #films).

5. Compare HIMYM with the tv series "The Office (US)" in terms of number of seasons, episodes and cast members (the result set must be two elements -one for each tv series- of tv series IRI, label, #seasons, #episodes and #cast members).

6. Return how many of the actors who are members of the cast of the tv series have a [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2 (the result set must be a list of pairs actor/actress IRI and label).

7. Consider the actors who are members of the cast of HIMYM. Amongst the tv series in which these actors acted, return only those which received more than 2 awards (the result set must be triples of tv series IRI, label, #awards won).

In [4]:
## startup the evaluation
# setup the file and create the empty json
ipname = "m4.ipynb"
pt = os.getcwd()+os.sep+ipname
evaluation.setup(pt)

The index of this workflow is: 2_4


### Task 1

In [5]:
# properties of HIMYM
queryString = """
SELECT DISTINCT ?p ?pName WHERE { 

    # Connecting HIMYM to something
    wd:Q147235 ?p  ?o.

    # This returns the labels
    ?p <http://schema.org/name> ?pName .

    # Only data properties
    FILTER(isLiteral(?o))
}
"""

print("Results")
x = run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('pName', 'number of episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pName', 'Rotten Tomatoes ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1267'), ('pName', 'AlloCiné series ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1476'), ('pName', 'title')]
[('p', 'http://www.wikidata.org/prop/direct/P1562'), ('pName', 'AllMovie title ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1712'), ('pName', 'Metacritic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1874'), ('pName', 'Netflix ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1922'), ('pName', 'first line')]
[('p', 'http://www.wikidata.org/prop/direct/P2002'), ('pName', 'Twitter username')]
[('p', 'http://www.wikidata.org/prop/direct/P2047'), ('pName', 'duration')]
[('p', 'http://www.wikidata.org/prop/direct/P214'), ('pName', 'VIAF ID')]
[('p', 'http://www.wikidata.org/prop/direct/P227'), ('pName', 'GND ID')]
[('p', 'http://www.wikidata.org/

In [6]:
# instance of
queryString = """
SELECT DISTINCT ?o ?pName WHERE { 

    # Connecting HIMYM to something
    wd:Q147235 wdt:P31  ?o.

    # This returns the labels
    ?o <http://schema.org/name> ?pName .
}
"""

print("Results")
x = run_query(queryString)

Results
[('o', 'http://www.wikidata.org/entity/Q5398426'), ('pName', 'television series')]
1


In [7]:
### insert the result of TASK 1 in the file
og_uri = "http://www.wikidata.org/entity/Q5398426"
og_name = "television series"
obj = {"uri":og_uri,"name":og_name}
evaluation.add_result(evaluation.get_index_workflow(pt),"1", evaluation.TYPE_SINGLE ,"uri", [obj] ,"all")

The index of this workflow is: 2_4
The path is /locale/data/jupyter/prando/esw/tracks/2022/ground_truths/gt_json/workflow2_4.json
JSON object updated


### Task 2 : Return the number of seasons and episodes per season of the tv series.

I'm interested in the TV series ***"How I met your mother" (wd:Q147235)***, so as a starting point I show all the data properties of this TV series.

I discovered the two properties: ***number of episodes (wdt:P1113)*** and ***number of seasons (wdt:P2437)***. 

I try to use them on ***"How I met your mother" (wd:Q147235)***.

In [8]:
# use number of episodes and seasons
queryString = """
SELECT ?numEpisodes ?numSeasons WHERE { 

    # Retrieve HIMYM numEpisodes and numSeasons
    wd:Q147235  wdt:P1113 ?numEpisodes ;
                wdt:P2437 ?numSeasons  .
}
"""

print("Results")
x = run_query(queryString)

Results
[('numEpisodes', '208'), ('numSeasons', '9')]
1


Now I have to discover how many episodes there are for each season. To do this I show all the object properties of ***"How I met your mother" (wd:Q147235)***.

In [9]:
# get the object properties
queryString = """
SELECT DISTINCT ?p ?pName WHERE { 

    # Connecting HIMYM to something
    wd:Q147235 ?p  ?o.

    # This returns the labels
    ?p <http://schema.org/name> ?pName .

    # Exclude data properties
    FILTER(!isLiteral(?o))
}
"""

print("Results")
x = run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('pName', 'genre')]
[('p', 'http://www.wikidata.org/prop/direct/P138'), ('pName', 'named after')]
[('p', 'http://www.wikidata.org/prop/direct/P1411'), ('pName', 'nominated for')]
[('p', 'http://www.wikidata.org/prop/direct/P1424'), ('pName', "topic's main template")]
[('p', 'http://www.wikidata.org/prop/direct/P154'), ('pName', 'logo image')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('p', 'http://www.wikidata.org/prop/direct/P162'), ('pName', 'producer')]
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('pName', 'award received')]
[('p', 'http://www.wikidata.org/prop/direct/P170'), ('pName', 'creator')]
[('p', 'http://www.wikidata.org/prop/direct/P18'), ('pName', 'image')]
[('p', 'http://www.wikidata.org/prop/direct/P1811'), ('pName', 'list of episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P1881'), ('pName', 'list of characters')]
[('p', 'http://www.wikidata.org/prop/direct/P2

I try to use another property discovered before: ***has part (wdt:P527)***.

In [10]:
# use has part
queryString = """
SELECT DISTINCT ?part ?partName WHERE { 

    # Connecting HIMYM to something using property hasPart
    wd:Q147235 wdt:P527 ?part .

    # This returns the labels
    ?part <http://schema.org/name> ?partName .
}
"""

print("Results")
x = run_query(queryString)

Results
[('part', 'http://www.wikidata.org/entity/Q2715578'), ('partName', 'How I Met Your Mother, season 1')]
[('part', 'http://www.wikidata.org/entity/Q338715'), ('partName', 'How I Met Your Mother, season 8')]
[('part', 'http://www.wikidata.org/entity/Q2567330'), ('partName', 'How I Met Your Mother, season 4')]
[('part', 'http://www.wikidata.org/entity/Q2555117'), ('partName', 'How I Met Your Mother, season 3')]
[('part', 'http://www.wikidata.org/entity/Q13567027'), ('partName', 'How I Met Your Mother, season 9')]
[('part', 'http://www.wikidata.org/entity/Q2438066'), ('partName', 'How I Met Your Mother, season 6')]
[('part', 'http://www.wikidata.org/entity/Q582332'), ('partName', 'How I Met Your Mother, season 5')]
[('part', 'http://www.wikidata.org/entity/Q3468515'), ('partName', 'How I Met Your Mother, season 2')]
[('part', 'http://www.wikidata.org/entity/Q2472427'), ('partName', 'How I Met Your Mother, season 7')]
9


It is possible to retrieve all the different seasons of the TV series. 
I show the properties of one of them ( ***How I Met Your Mother, season 1 (wd:Q2715578)*** )

In [11]:
# properties of the first season of HIMYM
queryString = """
SELECT DISTINCT ?p ?pName WHERE { 

    # Connecting HIMYM S1 to something
    wd:Q2715578 ?p ?o.

    # This returns the labels
    ?p <http://schema.org/name> ?pName .
}
"""

print("Results")
x = run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('pName', 'number of episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pName', 'Rotten Tomatoes ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1712'), ('pName', 'Metacritic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pName', 'part of the series')]
[('p', 'http://www.wikidata.org/prop/direct/P2529'), ('pName', 'ČSFD film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2581'), ('pName', 'BabelNet ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2638'), ('pName', 'TV.com ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2704'), ('pName', 'EIDR content ID')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P3302'), ('pName', 'Open Media Database film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P364'), ('pName', 'original language of film or TV show')]
[('p', 'http://www.wikidata.org/prop/direct/P437'), ('pName'

I have the property ***number of episodes (wdt:P1113)***. I try to use it on ***How I Met Your Mother, season 1 (wd:Q2715578)***. I can finally retrieve the number of episodes per season of  ***"How I met your mother" (wd:Q147235)***.

In [12]:
# final result
queryString = """
SELECT DISTINCT ?part ?partName ?numEpisodes WHERE { 

    # Retrieve HIMYM seasons
    wd:Q147235 wdt:P527 ?part .
    
    # Retrieve number of episodes of each HIMYM season
    ?part wdt:P1113 ?numEpisodes.

    # This returns the labels
    ?part <http://schema.org/name> ?partName .
}
"""

print("Results")
x = run_query(queryString)

Results
[('part', 'http://www.wikidata.org/entity/Q2715578'), ('partName', 'How I Met Your Mother, season 1'), ('numEpisodes', '22')]
[('part', 'http://www.wikidata.org/entity/Q338715'), ('partName', 'How I Met Your Mother, season 8'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q2567330'), ('partName', 'How I Met Your Mother, season 4'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q2555117'), ('partName', 'How I Met Your Mother, season 3'), ('numEpisodes', '20')]
[('part', 'http://www.wikidata.org/entity/Q13567027'), ('partName', 'How I Met Your Mother, season 9'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q2438066'), ('partName', 'How I Met Your Mother, season 6'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q582332'), ('partName', 'How I Met Your Mother, season 5'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q3468515'), ('partName', 'How I Met Your Mother, season 2'), ('numEpi

In [13]:
## single literal associated to an URI
objs = []
for i in x:
    f_uri = i[0][1]
    f_name = i[1][1]
    val = i[2][1]
    obj = {}
    obj["refers_to"] = f_uri
    obj["refers_to_name"] = f_name
    obj["check"] = "value"
    obj["value"]= val
    objs.append(obj)
evaluation.add_result(evaluation.get_index_workflow(pt),"2", evaluation.TYPE_REFERRED ,"value", objs)

The index of this workflow is: 2_4
The path is /locale/data/jupyter/prando/esw/tracks/2022/ground_truths/gt_json/workflow2_4.json
JSON object updated
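
The loop in the cell above builds one dictionary per `(IRI, label, value)` row, and the same loop is repeated for Tasks 3 and 4. A sketch of how it could be factored into a function; `rows_to_referred` is a hypothetical name, and the function deliberately does not call `evaluation.add_result`.

```python
def rows_to_referred(rows):
    """Convert (IRI, label, value) rows into the dictionaries built in the cell above."""
    objs = []
    for row in rows:
        objs.append({
            "refers_to": row[0][1],
            "refers_to_name": row[1][1],
            "check": "value",
            "value": row[2][1],
        })
    return objs


# Two rows copied from the Task 2 output
rows = [
    [("part", "http://www.wikidata.org/entity/Q2715578"),
     ("partName", "How I Met Your Mother, season 1"),
     ("numEpisodes", "22")],
    [("part", "http://www.wikidata.org/entity/Q338715"),
     ("partName", "How I Met Your Mother, season 8"),
     ("numEpisodes", "24")],
]

objs = rows_to_referred(rows)
print(objs[0])
```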


### Task 3 : Get the number of episodes in which the cast members played a role. Who are the most present actors?

From a previous query, I notice that each season also has the property ***has part (wdt:P527)***.

I want to see what is connected to ***How I Met Your Mother, season 1 (wd:Q2715578)*** through this property.

In [14]:
# episodes of first season
queryString = """
SELECT ?episode ?episodeName WHERE { 

    # Retrieve episodes of HIMYM S1
    wd:Q2715578 wdt:P527 ?episode .
    
    # This returns the labels
    ?episode <http://schema.org/name> ?episodeName .
}
"""

print("Results")
x = run_query(queryString)

Results
[('episode', 'http://www.wikidata.org/entity/Q11696021'), ('episodeName', 'Nothing Good Happens After 2 A.M.')]
[('episode', 'http://www.wikidata.org/entity/Q1327587'), ('episodeName', 'Okay Awesome')]
[('episode', 'http://www.wikidata.org/entity/Q3480575'), ('episodeName', 'Return of the Shirt')]
[('episode', 'http://www.wikidata.org/entity/Q467447'), ('episodeName', 'Pilot')]
[('episode', 'http://www.wikidata.org/entity/Q468587'), ('episodeName', 'Purple Giraffe')]
[('episode', 'http://www.wikidata.org/entity/Q471448'), ('episodeName', 'Sweet Taste of Liberty')]
[('episode', 'http://www.wikidata.org/entity/Q4809956'), ('episodeName', 'Game Night')]
[('episode', 'http://www.wikidata.org/entity/Q4817584'), ('episodeName', 'Drumroll, Please')]
[('episode', 'http://www.wikidata.org/entity/Q4818636'), ('episodeName', 'Life Among the Gorillas')]
[('episode', 'http://www.wikidata.org/entity/Q4818989'), ('episodeName', 'Cupcake')]
[('episode', 'http://www.wikidata.org/entity/Q4819005

Using the property ***has part (wdt:P527)*** on a single season, I can retrieve all the episodes of that season.
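
Since the same property links the series to its seasons and each season to its episodes, following it twice from the series reaches the episodes. In SPARQL this corresponds to a sequence property path such as `wdt:P527/wdt:P527`. A minimal sketch of the idea on a toy graph: the edges are copied from the outputs above, while `HAS_PART` and `two_hops` are hypothetical names.

```python
# Toy graph with a few "has part" (wdt:P527) edges copied from the outputs above
HAS_PART = {
    "wd:Q147235": ["wd:Q2715578"],                 # HIMYM -> season 1
    "wd:Q2715578": ["wd:Q467447", "wd:Q468587"],   # season 1 -> Pilot, Purple Giraffe
}


def two_hops(graph, start):
    """Return the nodes reached by following the edge relation exactly twice."""
    reached = []
    for middle in graph.get(start, []):
        reached.extend(graph.get(middle, []))
    return reached


print(two_hops(HAS_PART, "wd:Q147235"))
```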

I have to discover the cast members, so I try to list all the properties of a specific episode: ***The Pineapple Incident (wd:Q7757165)***

In [15]:
# properties of  The Pineapple Incident, an episode of the first season
queryString = """
SELECT DISTINCT ?p ?pName ?o ?oName WHERE { 

    # Connecting The Pineapple Incident to something
    wd:Q7757165 ?p ?o.

    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
}
ORDER BY ?pName
"""

print("Results")
x = run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q199927'), ('oName', 'Alyson Hannigan')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q202304'), ('oName', 'Jason Segel')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q200566'), ('oName', 'Cobie Smulders')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q333544'), ('oName', 'Bob Saget')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q485310'), ('oName', 'Neil Patrick Harris')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q242949'), ('oName', 'Danica McKellar')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pNa

I discovered that the property ***cast member (wdt:P161)*** can be used to retrieve all the actors that participated in a specific episode.

Now I can count the number of episodes in which the cast members played a role, and show the most present actors.

In [16]:
# count the presence of the actors in the episodes
queryString = """
SELECT ?actor ?actorName (COUNT(DISTINCT ?episode) AS ?numEpisodes) WHERE { 

    # Retrieve HIMYM episodes 
    wd:Q147235 wdt:P527/wdt:P527 ?episode .
    
    # Retrieve cast members
    ?episode wdt:P161 ?actor .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
GROUP BY ?actor ?actorName
ORDER BY DESC(?numEpisodes)
LIMIT 6
"""

print("Results")
x = run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q202304'), ('actorName', 'Jason Segel'), ('numEpisodes', '145')]
[('actor', 'http://www.wikidata.org/entity/Q223455'), ('actorName', 'Josh Radnor'), ('numEpisodes', '145')]
[('actor', 'http://www.wikidata.org/entity/Q200566'), ('actorName', 'Cobie Smulders'), ('numEpisodes', '145')]
[('actor', 'http://www.wikidata.org/entity/Q485310'), ('actorName', 'Neil Patrick Harris'), ('numEpisodes', '145')]
[('actor', 'http://www.wikidata.org/entity/Q199927'), ('actorName', 'Alyson Hannigan'), ('numEpisodes', '143')]
[('actor', 'http://www.wikidata.org/entity/Q333544'), ('actorName', 'Bob Saget'), ('numEpisodes', '142')]
6
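
The combination of `COUNT(DISTINCT ?episode)` and `GROUP BY` in the query above can be mirrored in plain Python to make the semantics explicit. This is only a sketch: the `(episode, actor)` pairs are hypothetical placeholders, and `count_distinct_episodes` is not part of the notebook.

```python
from collections import defaultdict


def count_distinct_episodes(pairs):
    """Count, for each actor, the number of distinct episodes they appear in."""
    episodes_by_actor = defaultdict(set)
    for episode, actor in pairs:
        episodes_by_actor[actor].add(episode)
    counts = {actor: len(eps) for actor, eps in episodes_by_actor.items()}
    # Sort by descending count, like ORDER BY DESC(?numEpisodes)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (episode, actor) pairs; the duplicated pair is counted once
pairs = [
    ("episode_1", "actor_a"),
    ("episode_1", "actor_a"),
    ("episode_2", "actor_a"),
    ("episode_1", "actor_b"),
]

print(count_distinct_episodes(pairs))
```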


In [17]:
## single literal associated to an URI
objs = []
for i in x:
    f_uri = i[0][1]
    f_name = i[1][1]
    val = i[2][1]
    obj = {}
    obj["refers_to"] = f_uri
    obj["refers_to_name"] = f_name
    obj["check"] = "value"
    obj["value"]= val
    objs.append(obj)
evaluation.add_result(evaluation.get_index_workflow(pt),"3", evaluation.TYPE_REFERRED ,"value", objs)

The index of this workflow is: 2_4
The path is /locale/data/jupyter/prando/esw/tracks/2022/ground_truths/gt_json/workflow2_4.json
JSON object updated


### Task 4: Check which actor acted in the most films while working on "How I met your mother"

To check if an actor acted in a film while working on ***"How I met your mother" (wd:Q147235)***, I need to know when HIMYM started and when it ended.

I can rely on these two properties discovered in a previous query: 
* ***start time (wdt:P580)*** 
* ***end time (wdt:P582)***

In [18]:
# use start and end time
queryString = """
SELECT ?startTime ?endTime WHERE { 

    # Retrieving HIMYM startTime and endTime
    wd:Q147235  wdt:P580  ?startTime ;
                wdt:P582  ?endTime   . 
}
"""

print("Results")
x = run_query(queryString)

Results
[('startTime', '2005-09-19T00:00:00Z'), ('endTime', '2014-03-31T00:00:00Z')]
1


Hence, I need to check if an actor worked in a film between ***"2005-09-19"*** and ***"2014-03-31"***. To do this, I have to understand how actors and films are connected.
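
The date check itself can be sketched in Python. The start and end timestamps are the ones returned by the query above; the publication dates passed to the function are hypothetical values used only to exercise the check, and `published_during_run` is a hypothetical name.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# Values returned by the query above
START = datetime.strptime("2005-09-19T00:00:00Z", TIMESTAMP_FORMAT)
END = datetime.strptime("2014-03-31T00:00:00Z", TIMESTAMP_FORMAT)


def published_during_run(publication_date):
    """Return True if the timestamp lies strictly between START and END."""
    published = datetime.strptime(publication_date, TIMESTAMP_FORMAT)
    return START < published < END


# Hypothetical publication dates, used only to exercise the check
print(published_during_run("2010-01-01T00:00:00Z"))
print(published_during_run("2018-01-01T00:00:00Z"))
```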

First, I try to retrieve all the object properties of a specific actor: ***Cobie Smulders (wd:Q200566)***.

In [19]:
# retrieve all the object properties of a specific actor : Cobie Smulders 
queryString = """
SELECT ?p ?pName ?o ?oName WHERE { 

    # Connecting Cobie Smulders to something
    wd:Q200566  ?p  ?o .
    
    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
}
ORDER BY ?pName
"""

print("Results")
x = run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P27'), ('pName', 'country of citizenship'), ('o', 'http://www.wikidata.org/entity/Q16'), ('oName', 'Canada')]
[('p', 'http://www.wikidata.org/prop/direct/P734'), ('pName', 'family name'), ('o', 'http://www.wikidata.org/entity/Q2018583'), ('oName', 'Smulders')]
[('p', 'http://www.wikidata.org/prop/direct/P735'), ('pName', 'given name'), ('o', 'http://www.wikidata.org/entity/Q325872'), ('oName', 'Maria')]
[('p', 'http://www.wikidata.org/prop/direct/P735'), ('pName', 'given name'), ('o', 'http://www.wikidata.org/entity/Q15208593'), ('oName', 'Francisca')]
[('p', 'http://www.wikidata.org/prop/direct/P735'), ('pName', 'given name'), ('o', 'http://www.wikidata.org/entity/Q6119607'), ('oName', 'Jacoba')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q5'), ('oName', 'human')]
[('p', 'http://www.wikidata.org/prop/direct/P1412'), ('pName', 'languages spoken, written or signed'

Maybe there are connections in the opposite direction: ***?s ?p wd:Q200566***.

In [20]:
# look at the opposite direction
queryString = """
SELECT ?s ?sName ?p ?pName WHERE { 

    # Connecting something to Cobie Smulders
    ?s ?p wd:Q200566 .
    
    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    ?s <http://schema.org/name> ?sName .
}
ORDER BY ?pName
LIMIT 20
"""

print("Results")
x = run_query(queryString)

Results
[('s', 'http://www.wikidata.org/entity/Q21866873'), ('sName', 'The Intervention'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q537911'), ('sName', 'Agents of S.H.I.E.L.D.'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q51963292'), ('sName', 'Marvel Cinematic Universe Phase One'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q63405798'), ('sName', 'The Infinity Saga'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q65070140'), ('sName', 'Stumptown'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q11696021'), ('sName', 'Nothing Good Happens After 2 A.M.'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')

I can use the property ***cast member (wdt:P161)*** as before, but I need to understand how to select only films and not other TV series.

I use the property ***instance of (wdt:P31)*** on ***Avengers: Infinity War (wd:Q23780914)***.

In [21]:
# instance of a film
queryString = """
SELECT ?class ?className WHERE { 

    # Retrieve class of Avengers Infinity War
    wd:Q23780914 wdt:P31 ?class .
    
    # This returns the labels
    ?class <http://schema.org/name> ?className .
}
"""

print("Results")
x = run_query(queryString)

Results
[('class', 'http://www.wikidata.org/entity/Q11424'), ('className', 'film')]
1


I retrieved ***film (wd:Q11424)***. Now, I can retrieve only the films of a specific actor : ***Cobie Smulders (wd:Q200566)***.

In [22]:
# retrieve only the films of a specific actor : Cobie Smulders
queryString = """
SELECT ?film ?filmName WHERE { 

    # Retrieve films in which Cobie Smulders acted
    ?film  wdt:P161 wd:Q200566 ;
           wdt:P31  wd:Q11424  .
    
    # This returns the labels
    ?film <http://schema.org/name> ?filmName .
}
ORDER BY ?filmName
"""

print("Results")
x = run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q14171368'), ('filmName', 'Avengers: Age of Ultron')]
[('film', 'http://www.wikidata.org/entity/Q23781155'), ('filmName', 'Avengers: Endgame')]
[('film', 'http://www.wikidata.org/entity/Q23780914'), ('filmName', 'Avengers: Infinity War')]
[('film', 'http://www.wikidata.org/entity/Q1765358'), ('filmName', 'Captain America: The Winter Soldier')]
[('film', 'http://www.wikidata.org/entity/Q7729669'), ('filmName', 'Delivery Man')]
[('film', 'http://www.wikidata.org/entity/Q3012583'), ('filmName', 'Grassroots')]
[('film', 'http://www.wikidata.org/entity/Q21168538'), ('filmName', 'Jack Reacher: Never Go Back')]
[('film', 'http://www.wikidata.org/entity/Q27663881'), ('filmName', 'Killing Gunther')]
[('film', 'http://www.wikidata.org/entity/Q27888468'), ('filmName', 'Lennon or McCartney')]
[('film', 'http://www.wikidata.org/entity/Q18703883'), ('filmName', 'Results')]
[('film', 'http://www.wikidata.org/entity/Q1767513'), ('filmName', 'Safe Haven

I have to retrieve the publication date of a film. I check if there is a property of ***Avengers: Infinity War (wd:Q23780914)*** that contains the word "date".

In [23]:
# find publication date of a film
queryString = """
SELECT DISTINCT ?p ?pName WHERE { 

    # Connect something to Avengers Infinity War
    wd:Q23780914 ?p ?o.
    
    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    
    # I use a regex to search a property that contains the word "date"
    FILTER(REGEX(?pName, "date"))
}
"""

print("Results")
x = run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P577'), ('pName', 'publication date')]
1
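
Note that `REGEX` without flags is case-sensitive, so a label that capitalises the word would not match. The sketch below illustrates the difference with Python's `re` module. The label "publication date" comes from the output above, while the other labels are hypothetical examples and `matching` is a hypothetical helper.

```python
import re


def matching(labels, pattern, flags=0):
    """Return the labels in which `pattern` is found."""
    return [label for label in labels if re.search(pattern, label, flags)]


# "publication date" is taken from the output above; the others are hypothetical
labels = ["publication date", "Date of birth", "title"]

# Case-sensitive match, analogous to REGEX(?pName, "date")
print(matching(labels, "date"))

# Case-insensitive match, analogous to REGEX(?pName, "date", "i")
print(matching(labels, "date", re.IGNORECASE))
```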


So to retrieve the publication date of a ***film (wd:Q11424)***, I can use the property ***publication date (wdt:P577)***.

Now I can finally answer the initial question: which actor acted in the most films while working on ***"How I met your mother" (wd:Q147235)***.

In [24]:
# find the final result 
queryString = """
SELECT ?actor ?actorName (COUNT(DISTINCT ?film) AS ?numFilms) WHERE { 

    # Retrieve all the HIMYM cast members
    wd:Q147235 wdt:P161 ?actor .
    
    # Retrieve films in which actor of HIMYM acted
    ?film  wdt:P31   wd:Q11424         ;
           wdt:P161  ?actor            ;
           wdt:P577  ?publicationDate  .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
    
    # Retrieving HIMYM startTime and endTime
    wd:Q147235  wdt:P580  ?startTime ;
                wdt:P582  ?endTime   . 
    
    # I want only films that were published while the actor was working on "How I met your mother".
    FILTER (?publicationDate > ?startTime && ?publicationDate < ?endTime)
}
GROUP BY ?actor ?actorName
ORDER BY DESC(?numFilms)
LIMIT 10
"""

print("Results")
x = run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q192165'), ('actorName', 'Danny Glover'), ('numFilms', '40')]
[('actor', 'http://www.wikidata.org/entity/Q229669'), ('actorName', 'Malin Åkerman'), ('numFilms', '21')]
[('actor', 'http://www.wikidata.org/entity/Q469579'), ('actorName', 'Mircea Monroe'), ('numFilms', '19')]
[('actor', 'http://www.wikidata.org/entity/Q236189'), ('actorName', 'Judy Greer'), ('numFilms', '18')]
[('actor', 'http://www.wikidata.org/entity/Q1319539'), ('actorName', 'Thomas Lennon'), ('numFilms', '18')]
[('actor', 'http://www.wikidata.org/entity/Q1319744'), ('actorName', 'Will Forte'), ('numFilms', '17')]
[('actor', 'http://www.wikidata.org/entity/Q1189470'), ('actorName', 'Jimmi Simpson'), ('numFilms', '16')]
[('actor', 'http://www.wikidata.org/entity/Q566037'), ('actorName', 'Scoot McNairy'), ('numFilms', '16')]
[('actor', 'http://www.wikidata.org/entity/Q530646'), ('actorName', 'Ray Wise'), ('numFilms', '16')]
[('actor', 'http://www.wikidata.org/entity/Q716

In [25]:
## single literal associated to an URI
objs = []
for i in x:
    f_uri = i[0][1]
    f_name = i[1][1]
    val = i[2][1]
    obj = {}
    obj["refers_to"] = f_uri
    obj["refers_to_name"] = f_name
    obj["check"] = "value"
    obj["value"]= val
    objs.append(obj)
evaluation.add_result(evaluation.get_index_workflow(pt),"4", evaluation.TYPE_REFERRED ,"value", objs)

The index of this workflow is: 2_4
The path is /locale/data/jupyter/prando/esw/tracks/2022/ground_truths/gt_json/workflow2_4.json
JSON object updated


The actor who acted in the most films while working on "How I met your mother" is ***Danny Glover (wd:Q192165)***, with 40 films.

### Task 5 
Compare HIMYM with the tv series "The Office (US)" in terms of number of seasons, episodes and cast members (the result set must be two elements -one for each tv series- of tv series IRI, label, #seasons, #episodes and #cast members).

To compare the number of seasons and episodes I can rely on the same query used in Task 2.

In [26]:
queryString = """
SELECT ?tv_series ?name ?numEpisodes ?numSeasons WHERE { 
    
    VALUES ?tv_series { wd:Q147235 wd:Q23831 }
    
    # Retrieve numEpisodes and numSeasons
    ?tv_series  wdt:P1113 ?numEpisodes ;
                wdt:P2437 ?numSeasons .
    ?tv_series sc:name ?name .
}
"""

print("Results")
x = run_query(queryString)

Results
[('tv_series', 'http://www.wikidata.org/entity/Q23831'), ('name', 'The Office'), ('numEpisodes', '201'), ('numSeasons', '9')]
[('tv_series', 'http://www.wikidata.org/entity/Q147235'), ('name', 'How I Met Your Mother'), ('numEpisodes', '208'), ('numSeasons', '9')]
2


I can also retrieve the actors who participated in the highest number of episodes of ***The Office (US) (wd:Q23831)*** using the same query used in Task 3.

In [27]:
queryString = """
SELECT ?actor ?actorName (COUNT(DISTINCT ?episode) AS ?numEpisodes) WHERE { 

    # Retrieve The Office episodes 
    wd:Q23831 wdt:P527{2} ?episode .
    
    # Retrieve cast members
    ?episode wdt:P161 ?actor .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
GROUP BY ?actor ?actorName
ORDER BY DESC(?numEpisodes)
LIMIT 10
"""

print("Results")
x = run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q313039'), ('actorName', 'John Krasinski'), ('numEpisodes', '1')]
[('actor', 'http://www.wikidata.org/entity/Q72077'), ('actorName', 'Ellie Kemper'), ('numEpisodes', '1')]
[('actor', 'http://www.wikidata.org/entity/Q238877'), ('actorName', 'Jenna Fischer'), ('numEpisodes', '1')]
[('actor', 'http://www.wikidata.org/entity/Q926912'), ('actorName', 'Craig Robinson'), ('numEpisodes', '1')]
[('actor', 'http://www.wikidata.org/entity/Q254766'), ('actorName', 'Catherine Tate'), ('numEpisodes', '1')]
[('actor', 'http://www.wikidata.org/entity/Q349548'), ('actorName', 'Rainn Wilson'), ('numEpisodes', '1')]
[('actor', 'http://www.wikidata.org/entity/Q328790'), ('actorName', 'Ed Helms'), ('numEpisodes', '1')]
7


The results are quite strange, because I have only one episode for each actor. 
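
The symptom can be made explicit with a quick check on the returned rows. A minimal sketch, assuming rows shaped like the output of `run_query`: the names and counts are copied from the output above (IRIs omitted for brevity), and `count_histogram` is a hypothetical helper.

```python
from collections import Counter


def count_histogram(rows, count_var="numEpisodes"):
    """Map each distinct count to the number of rows that carry it."""
    return Counter(int(dict(row)[count_var]) for row in rows)


# Names and counts copied from the output above (IRIs omitted for brevity)
rows = [
    [("actorName", name), ("numEpisodes", "1")]
    for name in ["John Krasinski", "Ellie Kemper", "Jenna Fischer", "Craig Robinson",
                 "Catherine Tate", "Rainn Wilson", "Ed Helms"]
]

histogram = count_histogram(rows)
print(histogram)
```

A histogram with a single bucket at 1 suggests that the cast information is attached to very few episodes, rather than that each actor really appeared only once.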

I want to check if there is any problem with the first part of the query.

In [28]:
queryString = """
SELECT ?episode ?episodeName WHERE { 

    # Retrieve The Office episodes 
    wd:Q23831 wdt:P527{2} ?episode .
    
    # This returns the labels
    ?episode <http://schema.org/name> ?episodeName .
}
LIMIT 10
"""

print("Results")
x = run_query(queryString)

Results
[('episode', 'http://www.wikidata.org/entity/Q4838397'), ('episodeName', 'Baby Shower')]
[('episode', 'http://www.wikidata.org/entity/Q5001718'), ('episodeName', 'Business Ethics')]
[('episode', 'http://www.wikidata.org/entity/Q5185169'), ('episodeName', 'Crime Aid')]
[('episode', 'http://www.wikidata.org/entity/Q5017095'), ('episodeName', 'Cafe Disco')]
[('episode', 'http://www.wikidata.org/entity/Q5050849'), ('episodeName', 'Casual Friday')]
[('episode', 'http://www.wikidata.org/entity/Q5155530'), ('episodeName', 'Company Picnic')]
[('episode', 'http://www.wikidata.org/entity/Q5178024'), ('episodeName', 'Couples Discount')]
[('episode', 'http://www.wikidata.org/entity/Q6927074'), ('episodeName', 'Moving On')]
[('episode', 'http://www.wikidata.org/entity/Q7914333'), ('episodeName', 'Vandalism')]
[('episode', 'http://www.wikidata.org/entity/Q5111408'), ('episodeName', 'Christmas Party')]
10


The episodes are retrieved correctly.

Maybe there is a problem with the property ***cast member (wdt:P161)***. I show all the properties of a single episode: ***Ultimatum (wd:Q7880294)***

In [29]:
queryString = """
SELECT ?p ?pName ?o ?oName WHERE { 

    # Connecting Ultimatum to something
    wd:Q7880294  ?p  ?o .
    
    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
}
ORDER BY ?pName
"""

print("Results")
x = run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P495'), ('pName', 'country of origin'), ('o', 'http://www.wikidata.org/entity/Q30'), ('oName', 'United States of America')]
[('p', 'http://www.wikidata.org/prop/direct/P57'), ('pName', 'director'), ('o', 'http://www.wikidata.org/entity/Q5239168'), ('oName', 'David Rogers')]
[('p', 'http://www.wikidata.org/prop/direct/P750'), ('pName', 'distributed by'), ('o', 'http://www.wikidata.org/entity/Q5371838'), ('oName', 'Vudu')]
[('p', 'http://www.wikidata.org/prop/direct/P437'), ('pName', 'distribution format'), ('o', 'http://www.wikidata.org/entity/Q723685'), ('oName', 'video on demand')]
[('p', 'http://www.wikidata.org/prop/direct/P156'), ('pName', 'followed by'), ('o', 'http://www.wikidata.org/entity/Q7763280'), ('oName', 'The Seminar')]
[('p', 'http://www.wikidata.org/prop/direct/P155'), ('pName', 'follows'), ('o', 'http://www.wikidata.org/entity/Q5128465'), ('oName', 'Classy Christmas')]
[('p', 'http://www.wikidata.org/prop/direct/P31')

OK, so the episodes of ***The Office (US) (wd:Q23831)*** do not have the property ***cast member (wdt:P161)***, unlike the HIMYM episodes.
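The same check can be done programmatically instead of by reading the printed rows. The sketch below is only an illustration: it assumes that `run_query` returns one list of `(variable, value)` tuples per row (as the printed output suggests), and the helper name `properties_in` is hypothetical. The sample rows are copied from the output above.

```python
# Rows in the shape assumed for the value returned by run_query:
# one list of (variable, value) tuples per result row.
rows = [
    [("p", "http://www.wikidata.org/prop/direct/P495"), ("pName", "country of origin")],
    [("p", "http://www.wikidata.org/prop/direct/P57"), ("pName", "director")],
    [("p", "http://www.wikidata.org/prop/direct/P156"), ("pName", "followed by")],
]

def properties_in(rows, var="p"):
    """Collect the distinct values bound to `var` across all rows (hypothetical helper)."""
    return {value for row in rows for name, value in row if name == var}

CAST_MEMBER = "http://www.wikidata.org/prop/direct/P161"

# True only if some row binds ?p to the cast member property.
has_cast = CAST_MEMBER in properties_in(rows)
print(has_cast)  # False
```

Note that the query only returns properties whose object has a `sc:name`, so a property pointing to a plain literal would not show up in such a check.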

The only way to retrieve the actors of the TV series is to use ***cast member (wdt:P161)*** directly on the TV series ( ***The Office (US) (wd:Q23831)*** ).

In [30]:
queryString = """
SELECT DISTINCT ?actor ?actorName WHERE { 

    # Retrieve The Office cast members directly from the TV series
    wd:Q23831 wdt:P161 ?actor .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
"""

print("Results")
x = run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q1050211'), ('actorName', 'Leslie David Baker')]
[('actor', 'http://www.wikidata.org/entity/Q1139248'), ('actorName', 'Oscar Nunez')]
[('actor', 'http://www.wikidata.org/entity/Q216221'), ('actorName', 'Steve Carell')]
[('actor', 'http://www.wikidata.org/entity/Q2238008'), ('actorName', 'Creed Bratton')]
[('actor', 'http://www.wikidata.org/entity/Q231203'), ('actorName', 'Amy Ryan')]
[('actor', 'http://www.wikidata.org/entity/Q238877'), ('actorName', 'Jenna Fischer')]
[('actor', 'http://www.wikidata.org/entity/Q254766'), ('actorName', 'Catherine Tate')]
[('actor', 'http://www.wikidata.org/entity/Q2669971'), ('actorName', 'Angela Kinsey')]
[('actor', 'http://www.wikidata.org/entity/Q2671438'), ('actorName', 'Paul Lieberstein')]
[('actor', 'http://www.wikidata.org/entity/Q269901'), ('actorName', 'Melora Hardin')]
[('actor', 'http://www.wikidata.org/entity/Q2924850'), ('actorName', 'Brian Baumgartner')]
[('actor', 'http://www.wikidata.org

I want to check which TV series has the larger cast: ***The Office (US) (wd:Q23831)*** or ***"How I met your mother" (wd:Q147235)***.

In [31]:
queryString = """
SELECT ?tv_series ?name ?numSeasons ?numEpisodes (COUNT(DISTINCT ?actor) AS ?numActors) WHERE { 
    
    VALUES ?tv_series { wd:Q147235 wd:Q23831 }

    # Retrieve actors
    ?tv_series wdt:P161 ?actor ;
                wdt:P1113 ?numEpisodes ;
                wdt:P2437 ?numSeasons .
        
    ?tv_series sc:name ?name.
}
GROUP BY ?tv_series ?name ?numSeasons  ?numEpisodes 
"""

print("Results")
x = run_query(queryString)

Results
[('tv_series', 'http://www.wikidata.org/entity/Q147235'), ('name', 'How I Met Your Mother'), ('numSeasons', '9'), ('numEpisodes', '208'), ('numActors', '480')]
[('tv_series', 'http://www.wikidata.org/entity/Q23831'), ('name', 'The Office'), ('numSeasons', '9'), ('numEpisodes', '201'), ('numActors', '25')]
2


In [32]:
## more literals associated to the same element
objs = []
for i in x:
    f_uri = i[0][1]
    f_name = i[1][1]
    seasons = i[2][1]
    episodes = i[3][1]
    actors = i[4][1]
    for val in [seasons,episodes,actors]:
        obj = {}
        obj["refers_to"] = f_uri
        obj["refers_to_name"] = f_name
        obj["check"] = "value"
        obj["value"]= val
        objs.append(obj)
evaluation.add_result(evaluation.get_index_workflow(pt),"6", evaluation.TYPE_REFERRED ,"value", objs, elements_per_tuple = 4)

The index of this workflow is: 2_4
The path is /locale/data/jupyter/prando/esw/tracks/2022/ground_truths/gt_json/workflow2_4.json
JSON object updated



Both ***"How I met your mother" (wd:Q147235)*** and ***The Office (US) (wd:Q23831)*** have nine seasons, but HIMYM has 208 episodes while "The Office" has 201 episodes.

In ***"How I met your mother" (wd:Q147235)*** there are 480 actors, while in ***The Office (US) (wd:Q23831)*** only 25. 

Moreover, it is not possible to determine which actors appear most often in ***The Office (US) (wd:Q23831)***, since the episodes of this TV series do not have the property ***cast member (wdt:P161)***.

Finally, I discovered that there are no common actors between ***"How I met your mother" (wd:Q147235)*** and ***The Office (US) (wd:Q23831)***.

### Task 7 : Return how many of the actors who are members of the cast of the tv series have [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2

The Bacon number of an actor is the number of degrees of separation he or she has from Kevin Bacon. So, first of all, I need to retrieve Kevin Bacon.

To do this, I can use a ***REGEX*** on the surname, which is connected through the property ***family name (wdt:P734)*** discovered in a previous query.
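A small note on the pattern itself: an unanchored regular expression matches any surname that merely contains the given text. The sketch below illustrates this with Python's `re` module on a made-up list of surnames (the list is hypothetical, and SPARQL's `REGEX` follows the XPath/XQuery regex rules, which may differ from Python's in some details).

```python
import re

# Hypothetical surnames, used only to illustrate the matching behaviour.
surnames = ["Bacon", "Baconnet", "bacon", "McBacon"]

# Unanchored search: matches any surname containing "Bacon" (case-sensitive).
loose = [s for s in surnames if re.search("Bacon", s)]

# Anchored pattern: matches only the exact surname.
exact = [s for s in surnames if re.search("^Bacon$", s)]

print(loose)  # ['Bacon', 'Baconnet', 'McBacon']
print(exact)  # ['Bacon']
```

For an exploratory search the loose form is usually fine, since the extra matches are easy to discard by eye. If a stricter match were wanted, the anchors `^` and `$` can be used inside the SPARQL `REGEX` pattern as well.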

In [35]:
# find Kevin Bacon
queryString = """
SELECT DISTINCT ?person ?personName ?personSurname WHERE { 

    # Retrieve surname of a person using the property family name
    ?person wdt:P734 ?surname .
    
    # This returns the labels
    ?person <http://schema.org/name> ?personName .
    ?surname <http://schema.org/name> ?personSurname .

    # Since Kevin Bacon is an actor, he probably acted in a film.
    FILTER EXISTS{
        ?film   wdt:P31   wd:Q11424 ;
                wdt:P161  ?person   .             
    }
    
    # I use a regex to search for a surname that contains the word "Bacon"
    FILTER(REGEX(?personSurname, "Bacon"))
    
}
LIMIT 10
"""

print("Results")
x = run_query(queryString)

Results
[('person', 'http://www.wikidata.org/entity/Q3116093'), ('personName', 'Irving Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q3454165'), ('personName', 'Kevin Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q3102228'), ('personName', 'Georges Baconnet'), ('personSurname', 'Baconnet')]
[('person', 'http://www.wikidata.org/entity/Q503597'), ('personName', 'James Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q3992438'), ('personName', 'Tom Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q3491343'), ('personName', 'Sosie Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q706678'), ('personName', 'Lloyd Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q65116263'), ('personName', 'Marco Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q5216474'), ('personNa

I have ***Kevin Bacon (wd:Q3454165)***. Now I can retrieve all the cast members of ***"How I met your mother" (wd:Q147235)*** with Kevin Bacon number equal to 2.

First, I start with cast members that have a Kevin Bacon Number equal to 1.

In [36]:
# find actors with Kevin Bacon Number equal to 1
queryString = """
SELECT ?actor ?actorName ?film ?filmName WHERE { 

    # Retrieve HIMYM actors
    wd:Q147235 wdt:P161 ?actor .
    
    # Ensure that the actor and Kevin Bacon worked together
    ?film wdt:P161 ?actor      ;
          wdt:P161 wd:Q3454165 .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
    ?film <http://schema.org/name> ?filmName .
}
"""

print("Results")
x = run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q234715'), ('actorName', 'Jamie-Lynn Sigler'), ('film', 'http://www.wikidata.org/entity/Q16167570'), ('filmName', 'Skum Rocks!')]
[('actor', 'http://www.wikidata.org/entity/Q433355'), ('actorName', 'Patricia Belcher'), ('film', 'http://www.wikidata.org/entity/Q514348'), ('filmName', 'Flatliners')]
[('actor', 'http://www.wikidata.org/entity/Q5357354'), ('actorName', 'Todd Stashwick'), ('film', 'http://www.wikidata.org/entity/Q370893'), ('filmName', 'The Air I Breathe')]
[('actor', 'http://www.wikidata.org/entity/Q199929'), ('actorName', 'Jennifer Morrison'), ('film', 'http://www.wikidata.org/entity/Q17508638'), ('filmName', 'The Darkness')]
[('actor', 'http://www.wikidata.org/entity/Q199929'), ('actorName', 'Jennifer Morrison'), ('film', 'http://www.wikidata.org/entity/Q975358'), ('filmName', 'Stir of Echoes')]
[('actor', 'http://www.wikidata.org/entity/Q329744'), ('actorName', 'Martin Short'), ('film', 'http://www.wikidata.org/entity/Q

Now I want only cast members of ***"How I met your mother" (wd:Q147235)*** that have a Kevin Bacon Number equal to 2.
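The idea behind "Bacon number equal to 2" can be sketched with plain Python sets before writing it in SPARQL: take the co-stars of Kevin Bacon's co-stars, then remove everyone who worked with him directly. The data below is entirely made up (work and actor names are hypothetical) and `co_stars` is an illustrative helper, not part of the notebook.

```python
# Hypothetical toy data: work -> set of cast members (all names are made up).
casts = {
    "work_A": {"bacon", "alice"},
    "work_B": {"alice", "bob"},
    "work_C": {"bacon", "carol"},
    "work_D": {"dave", "erin"},
    "series": {"bob", "carol", "dave"},
}

def co_stars(person, casts):
    """Everyone who shares at least one work with `person` (person excluded)."""
    result = set()
    for members in casts.values():
        if person in members:
            result |= members
    result.discard(person)
    return result

series_cast = casts["series"]

# Direct co-stars of "bacon": these have Bacon number 1.
distance_1 = co_stars("bacon", casts)

# Co-stars of the direct co-stars: everyone reachable in two hops.
two_hops = set().union(*(co_stars(p, casts) for p in distance_1))

# Series cast reachable in two hops but not in one; the subtraction
# plays the role of a NOT EXISTS filter on a shared work with "bacon".
bacon_2 = (series_cast & two_hops) - distance_1 - {"bacon"}
print(sorted(bacon_2))  # ['bob', 'dave']
```

In this toy example `dave` qualifies only through the series itself, because `carol` (Bacon number 1) is in the same cast. The same can happen in a graph pattern where the intermediate work is left unconstrained.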

In [37]:
# find actors with Kevin Bacon Number equal to 2
queryString = """
SELECT DISTINCT ?actor ?actorName WHERE { 

    # Retrieve HIMYM actors
    wd:Q147235 wdt:P161 ?actor .
    
    # Ensure that the actor worked together with another actor "in the middle"
    ?filmMiddle wdt:P161 ?actor       ;
                wdt:P161 ?actorMiddle .
    
    # Ensure that the actor "in the middle" worked with Kevin Bacon
    ?film       wdt:P161 ?actorMiddle  ;
                wdt:P161 wd:Q3454165 .
    
    # Ensure that the "first" actor and Kevin Bacon did not work together
    FILTER NOT EXISTS{
        ?film3 wdt:P161 ?actor      ;
               wdt:P161 wd:Q3454165 .
    }
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
LIMIT 30
"""

print("Results")
x = run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q44442'), ('actorName', 'Dan Castellaneta')]
[('actor', 'http://www.wikidata.org/entity/Q1319744'), ('actorName', 'Will Forte')]
[('actor', 'http://www.wikidata.org/entity/Q236189'), ('actorName', 'Judy Greer')]
[('actor', 'http://www.wikidata.org/entity/Q4993338'), ('actorName', 'Ezra Buzzington')]
[('actor', 'http://www.wikidata.org/entity/Q236766'), ('actorName', 'Maggie Wheeler')]
[('actor', 'http://www.wikidata.org/entity/Q231006'), ('actorName', 'Jayma Mays')]
[('actor', 'http://www.wikidata.org/entity/Q229349'), ('actorName', 'Heather Morris')]
[('actor', 'http://www.wikidata.org/entity/Q171567'), ('actorName', 'Laura Prepon')]
[('actor', 'http://www.wikidata.org/entity/Q220536'), ('actorName', 'Kal Penn')]
[('actor', 'http://www.wikidata.org/entity/Q471018'), ('actorName', 'Ernie Hudson')]
[('actor', 'http://www.wikidata.org/entity/Q594265'), ('actorName', 'Neil Jackson')]
[('actor', 'http://www.wikidata.org/entity/Q202304'), (

In [33]:
queryString = """
SELECT COUNT(DISTINCT ?actor) WHERE { 

    # Retrieve HIMYM actors
    wd:Q147235 wdt:P161 ?actor .
    
    # Ensure that the actor worked together with another actor "in the middle"
    ?filmMiddle wdt:P161 ?actor       ;
                wdt:P161 ?actorMiddle .
    
    # Ensure that the actor "in the middle" worked with Kevin Bacon
    ?film       wdt:P161 ?actorMiddle  ;
                wdt:P161 wd:Q3454165 .
    
    # Ensure that the "first" actor and Kevin Bacon did not work together
    FILTER NOT EXISTS{
        ?film3 wdt:P161 ?actor      ;
               wdt:P161 wd:Q3454165 .
        #FILTER(?film3 != ?film)
    }
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
"""

print("Results")
x = run_query(queryString)


Results
[('callret-0', '464')]
1


There are 464 actors who participated in ***"How I met your mother" (wd:Q147235)*** with a Kevin Bacon number equal to 2.
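More generally, the Bacon number is a shortest-path distance in the co-star graph, so it can be computed for any value of N with a breadth-first search. The following is a minimal sketch on made-up data (all work and actor names are hypothetical, as is the function `bacon_numbers`); it is not how the count above was obtained, only an illustration of the definition.

```python
from collections import deque

# Hypothetical toy data: work -> set of cast members (all names are made up).
casts = {
    "work_A": {"bacon", "alice"},
    "work_B": {"alice", "bob"},
    "work_C": {"bacon", "carol"},
    "work_D": {"dave", "erin"},
    "series": {"bob", "carol", "dave"},
}

def bacon_numbers(casts, source):
    """Breadth-first search over the co-star graph, returning actor -> distance."""
    # Build the adjacency: two actors are linked if they share a work.
    neighbours = {}
    for members in casts.values():
        for member in members:
            neighbours.setdefault(member, set()).update(members - {member})

    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in distance:  # first visit gives the shortest distance
                distance[other] = distance[current] + 1
                queue.append(other)
    return distance

numbers = bacon_numbers(casts, "bacon")

# Count the series cast members whose distance from "bacon" is exactly 2.
count = sum(1 for actor in casts["series"] if numbers.get(actor) == 2)
print(count)  # 2
```

Actors that cannot be reached at all are simply absent from the returned dictionary, which is why `numbers.get(actor)` is used for the comparison.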

In [35]:
queryString = """
SELECT DISTINCT ?actor ?actorName WHERE { 

    # Retrieve HIMYM actors
    wd:Q147235 wdt:P161 ?actor .
    
    # Ensure that the actor worked together with another actor "in the middle"
    ?filmMiddle wdt:P161 ?actor       ;
                wdt:P161 ?actorMiddle .
    
    # Ensure that the actor "in the middle" worked with Kevin Bacon
    ?film       wdt:P161 ?actorMiddle  ;
                wdt:P161 wd:Q3454165 .
    
    # Ensure that the "first" actor and Kevin Bacon did not work together
    FILTER NOT EXISTS{
        ?film3 wdt:P161 ?actor      ;
               wdt:P161 wd:Q3454165 .
        #FILTER(?film3 != ?film)
    }
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
"""

print("Results")
x = run_query(queryString)


Results
[('actor', 'http://www.wikidata.org/entity/Q44442'), ('actorName', 'Dan Castellaneta')]
[('actor', 'http://www.wikidata.org/entity/Q1319744'), ('actorName', 'Will Forte')]
[('actor', 'http://www.wikidata.org/entity/Q236189'), ('actorName', 'Judy Greer')]
[('actor', 'http://www.wikidata.org/entity/Q4993338'), ('actorName', 'Ezra Buzzington')]
[('actor', 'http://www.wikidata.org/entity/Q236766'), ('actorName', 'Maggie Wheeler')]
[('actor', 'http://www.wikidata.org/entity/Q231006'), ('actorName', 'Jayma Mays')]
[('actor', 'http://www.wikidata.org/entity/Q229349'), ('actorName', 'Heather Morris')]
[('actor', 'http://www.wikidata.org/entity/Q171567'), ('actorName', 'Laura Prepon')]
[('actor', 'http://www.wikidata.org/entity/Q220536'), ('actorName', 'Kal Penn')]
[('actor', 'http://www.wikidata.org/entity/Q471018'), ('actorName', 'Ernie Hudson')]
[('actor', 'http://www.wikidata.org/entity/Q594265'), ('actorName', 'Neil Jackson')]
[('actor', 'http://www.wikidata.org/entity/Q202304'), (

In [36]:
obj = [{"uri":r[0][1],"name":r[1][1]} for r in x]
evaluation.add_result(evaluation.get_index_workflow(pt),"7", evaluation.TYPE_SET ,"uri", obj)

The index of this workflow is: 2_4
The path is /locale/data/jupyter/prando/esw/tracks/2022/ground_truths/gt_json/workflow2_4.json
JSON object updated
