# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page)

An exploratory workload is composed of a set of queries, where each query builds on the information obtained by the previous ones.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the python notebook
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook
3. Do not delete useless queries. Keep everything that is syntactically valid.

Also consider

```
?p <http://schema.org/name> ?name .
```

is the BGP returning a human-readable name of a property or a class in Wikidata.

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-8f65f028f0-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# SELECT queries: print every result row and return the number of rows
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        json_results = sparql.query().convert()
        bindings = json_results['results']['bindings']
        if len(bindings) == 0:
            print("Empty")
            return 0

        for row in bindings:
            print([(var, value['value']) for var, value in row.items()])

        return len(bindings)

    except Exception as e:
        print("The operation failed", e)
        return None

# ASK queries: return the converted JSON response (its 'boolean' field holds the answer)
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        return sparql.query().convert()

    except Exception as e:
        print("The operation failed", e)
        return None
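`run_query` only prints each row. Later steps need to post-process results in Python (sorting, summing, comparing). For that, it can help to turn the JSON response into plain dictionaries first.

The sketch below is minimal and assumes the standard SPARQL 1.1 JSON results layout that `run_query` already indexes (`['results']['bindings']`, then `value['value']`). `bindings_to_rows` is a hypothetical helper, not part of the original notebook. It runs offline on a hand-written sample shaped like the response of In [4].

```python
def bindings_to_rows(json_results):
    """Flatten a SPARQL 1.1 JSON results document into a list of {variable: value} dicts.

    Variables left unbound in a row (e.g. by OPTIONAL) are simply absent from that row's dict.
    """
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in json_results["results"]["bindings"]
    ]

# Hand-written response in the shape returned by sparql.query().convert()
sample = {
    "head": {"vars": ["numEpisodes", "numSeasons"]},
    "results": {"bindings": [
        {"numEpisodes": {"type": "literal", "value": "208"},
         "numSeasons": {"type": "literal", "value": "9"}},
    ]},
}

rows = bindings_to_rows(sample)
print(rows)  # [{'numEpisodes': '208', 'numSeasons': '9'}]
```

Values stay strings, as in the printed output. Convert them with `int(...)` where arithmetic is needed.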

# Movie Workflow Series ("TV series explorative search")

## Workflow 4


Consider the following exploratory scenario:


> We are interested in the TV series "How I Met Your Mother" and want to investigate the main aspects related to the actors and directors involved in the production, know the number of seasons, and check which episodes had the highest success/impact.


## Useful URIs for the current workflow
The following are given:

| IRI          | Description           | Role      |
| ------------ | --------------------- | --------- |
| `wdt:P1647`  | subproperty of        | predicate |
| `wdt:P31`    | instance of           | predicate |
| `wdt:P106`   | occupation            | predicate |
| `wdt:P279`   | subclass of           | predicate |
| `wdt:P4969`  | derivative work       | predicate |
| `wd:Q147235` | How I Met Your Mother | node      |
| `wd:Q23831`  | The Office (US)       | node      |



Also consider

```
wd:Q23831 ?p ?obj .
```

is the BGP to retrieve all **properties of The Office (US)**
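The two BGPs above combine into the generic exploration query used repeatedly in this notebook: list the named properties of one entity.

`properties_query` is a hypothetical helper, not part of the original notebook, that generates this query for any entity. Its optional filter mirrors the `isLiteral` filters used later. It only builds the query text, which would then be passed to `run_query`.

```python
def properties_query(entity, literals=None):
    """Build the exploration query: DISTINCT properties of `entity` with their names.

    literals=True keeps only data properties (literal values), literals=False only
    object properties, None keeps both.
    """
    if literals is True:
        value_filter = "FILTER(isLiteral(?o))"
    elif literals is False:
        value_filter = "FILTER(!isLiteral(?o))"
    else:
        value_filter = ""
    return f"""
SELECT DISTINCT ?p ?pName WHERE {{
    {entity} ?p ?o .
    ?p <http://schema.org/name> ?pName .
    {value_filter}
}}
"""

q = properties_query("wd:Q23831", literals=True)
print(q)
# run_query(q)  # needs the notebook's endpoint and prefixString
```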

The workload should


1. Return the number of seasons and the number of episodes per season of the TV series.

2. Get the number of episodes in which each cast member played a role. Who are the most present actors?

3. Check which actor acted in the most films while working on "How I Met Your Mother" and which actor participated in the most films after the end of the TV series.

4. Compare HIMYM with the TV series "The Office (US)" in terms of number of seasons, episodes, and cast members.

5. Return how many of the cast members of the TV series have a [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2.
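Task 5 relies on the Bacon number:

- Kevin Bacon has number 0.
- An actor whose lowest-numbered co-star has number N has number N+1.

This is a shortest-path distance in the co-star graph, which breadth-first search computes. The sketch below runs on a hypothetical in-memory graph: the names A, B, C, D are placeholders, not Wikidata results. Building the real graph from Wikidata (for example via cast member `wdt:P161` on films) is left to the SPARQL part of the workflow.

```python
from collections import deque

def bacon_numbers(costars, source="Kevin Bacon"):
    """Breadth-first search over an undirected co-star graph.

    costars maps each actor to the set of actors they appeared with in some film.
    Returns {actor: Bacon number} for every actor reachable from `source`.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for other in costars.get(actor, ()):
            if other not in distance:  # first visit in BFS = shortest distance
                distance[other] = distance[actor] + 1
                queue.append(other)
    return distance

# Hypothetical toy graph (placeholders, not real data)
toy = {
    "Kevin Bacon": {"A"},
    "A": {"Kevin Bacon", "B"},
    "B": {"A", "C"},
    "C": {"B"},
    "D": set(),
}
numbers = bacon_numbers(toy)
print(numbers)  # {'Kevin Bacon': 0, 'A': 1, 'B': 2, 'C': 3}
print(sorted(a for a, n in numbers.items() if n == 2))  # ['B']
```

Actors not connected to Kevin Bacon (like `D`) are simply absent from the result, i.e. they have no finite Bacon number.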

#### Example of query

In [2]:
queryString = """
SELECT * WHERE { 
    wd:Q23831 ?p ?obj .
}
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), ('obj', 'http://wikiba.se/ontology#Item')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P214'), ('obj', 'http://viaf.org/viaf/207954525')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P244'), ('obj', 'https://id.loc.gov/authorities/names/no2006017037')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P2581'), ('obj', 'http://babelnet.org/rdf/s02352929n')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P646'), ('obj', 'http://g.co/kg/m/08jgk1')]
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('obj', '201')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('obj', 'tv/the_office')]
[('p', 'http://www.wikidata.org/prop/direct/P1267'), ('obj', '199')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('obj', 'http://www.wikidata.org/entity/Q21188110')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('obj', 'http://www.wikidata.org/entity/Q459435')]


10

### My Workflow

### Task 1 : Return the number of seasons and episodes per season of the TV series.

I'm interested in the TV series ***"How I met your mother" (wd:Q147235)***, so as a starting point I list all the data properties (properties whose values are literals) of this TV series.

In [3]:
queryString = """
SELECT DISTINCT ?p ?pName WHERE { 

    # Connecting HIMYM to something
    wd:Q147235 ?p  ?o.

    # This returns the labels
    ?p <http://schema.org/name> ?pName .

    # Only data properties
    FILTER(isLiteral(?o))
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('pName', 'number of episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pName', 'Rotten Tomatoes ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1267'), ('pName', 'AlloCiné series ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1476'), ('pName', 'title')]
[('p', 'http://www.wikidata.org/prop/direct/P1562'), ('pName', 'AllMovie title ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1712'), ('pName', 'Metacritic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1874'), ('pName', 'Netflix ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1922'), ('pName', 'first line')]
[('p', 'http://www.wikidata.org/prop/direct/P2002'), ('pName', 'Twitter username')]
[('p', 'http://www.wikidata.org/prop/direct/P2047'), ('pName', 'duration')]
[('p', 'http://www.wikidata.org/prop/direct/P214'), ('pName', 'VIAF ID')]
[('p', 'http://www.wikidata.org/prop/direct/P227'), ('pName', 'GND ID')]
[('p', 'http://www.wikidata.org/

55

I discovered two relevant properties: ***number of episodes (wdt:P1113)*** and ***number of seasons (wdt:P2437)***.

I try them on ***"How I met your mother" (wd:Q147235)***.

In [4]:
queryString = """
SELECT ?numEpisodes ?numSeasons WHERE { 

    # Retrieve HIMYM numEpisodes and numSeasons
    wd:Q147235  wdt:P1113 ?numEpisodes ;
                wdt:P2437 ?numSeasons  .
}
"""

print("Results")
run_query(queryString)

Results
[('numEpisodes', '208'), ('numSeasons', '9')]


1

Now I have to find out how many episodes each season has. To do this, I list all the object properties (properties whose values are not literals) of ***"How I met your mother" (wd:Q147235)***.

In [5]:
queryString = """
SELECT DISTINCT ?p ?pName WHERE { 

    # Connecting HIMYM to something
    wd:Q147235 ?p  ?o.

    # This returns the labels
    ?p <http://schema.org/name> ?pName .

    # Exclude data properties
    FILTER(!isLiteral(?o))
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P2438'), ('pName', 'narrator')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('pName', 'genre')]
[('p', 'http://www.wikidata.org/prop/direct/P138'), ('pName', 'named after')]
[('p', 'http://www.wikidata.org/prop/direct/P1411'), ('pName', 'nominated for')]
[('p', 'http://www.wikidata.org/prop/direct/P1424'), ('pName', "topic's main template")]
[('p', 'http://www.wikidata.org/prop/direct/P154'), ('pName', 'logo image')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('p', 'http://www.wikidata.org/prop/direct/P162'), ('pName', 'producer')]
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('pName', 'award received')]
[('p', 'http://www.wikidata.org/prop/direct/P170'), ('pName', 'creator')]
[('p', 'http://www.wikidata.org/prop/direct/P18'), ('pName', 'image')]
[('p', 'http://www.wikidata.org/prop/direct/P1811'), ('pName', 'list of episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P1881'), ('p

33

I want to see what is connected to ***"How I met your mother" (wd:Q147235)*** through the property ***list of episodes (wdt:P1811)***.

In [6]:
queryString = """
SELECT DISTINCT ?list ?listName WHERE { 

    # Connecting HIMYM to something using property list of episodes
    wd:Q147235 wdt:P1811  ?list.

    # This returns the labels
    ?list <http://schema.org/name> ?listName .
}
"""

print("Results")
run_query(queryString)

Results
[('list', 'http://www.wikidata.org/entity/Q785891'), ('listName', 'list of How I Met Your Mother episodes')]


1

I try to see what is connected to ***list of How I Met Your Mother episodes (wd:Q785891)***.

In [7]:
queryString = """
SELECT DISTINCT ?p ?pName ?o ?oName WHERE { 

    # Connecting list of HIMYM episodes to something
    wd:Q785891 ?p ?o.

    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    OPTIONAL{ ?o <http://schema.org/name> ?oName .}
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q13406463'), ('oName', 'Wikimedia list article')]
[('p', 'http://www.wikidata.org/prop/direct/P360'), ('pName', 'is a list of'), ('o', 'http://www.wikidata.org/entity/Q21191270'), ('oName', 'television series episode')]
[('p', 'http://www.wikidata.org/prop/direct/P1754'), ('pName', 'category related to list'), ('o', 'http://www.wikidata.org/entity/Q8526738'), ('oName', 'Category:How I Met Your Mother episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P3921'), ('pName', 'Wikidata SPARQL query equivalent'), ('o', '?st ps:P179 wd:Q147235. ?item wdt:P31 wd:Q21191270; p:P179 ?st OPTIONAL{?st pq:P1545 ?value}')]


4

There is no useful information here to retrieve the number of episodes of each season.

I try another property discovered before: ***has part (wdt:P527)***.

In [8]:
queryString = """
SELECT DISTINCT ?part ?partName WHERE { 

    # Connecting HIMYM to something using property hasPart
    wd:Q147235 wdt:P527 ?part .

    # This returns the labels
    ?part <http://schema.org/name> ?partName .
}
"""

print("Results")
run_query(queryString)

Results
[('part', 'http://www.wikidata.org/entity/Q2438066'), ('partName', 'How I Met Your Mother, season 6')]
[('part', 'http://www.wikidata.org/entity/Q2715578'), ('partName', 'How I Met Your Mother, season 1')]
[('part', 'http://www.wikidata.org/entity/Q13567027'), ('partName', 'How I Met Your Mother, season 9')]
[('part', 'http://www.wikidata.org/entity/Q2567330'), ('partName', 'How I Met Your Mother, season 4')]
[('part', 'http://www.wikidata.org/entity/Q338715'), ('partName', 'How I Met Your Mother, season 8')]
[('part', 'http://www.wikidata.org/entity/Q582332'), ('partName', 'How I Met Your Mother, season 5')]
[('part', 'http://www.wikidata.org/entity/Q3468515'), ('partName', 'How I Met Your Mother, season 2')]
[('part', 'http://www.wikidata.org/entity/Q2555117'), ('partName', 'How I Met Your Mother, season 3')]
[('part', 'http://www.wikidata.org/entity/Q2472427'), ('partName', 'How I Met Your Mother, season 7')]


9

It is possible to retrieve all the different seasons of the TV series. I show the properties of one of them (***How I Met Your Mother, season 1 (wd:Q2715578)***).

In [9]:
queryString = """
SELECT DISTINCT ?p ?pName WHERE { 

    # Connecting HIMYM S1 to something
    wd:Q2715578 ?p ?o.

    # This returns the labels
    ?p <http://schema.org/name> ?pName .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('pName', 'number of episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pName', 'Rotten Tomatoes ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1712'), ('pName', 'Metacritic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pName', 'part of the series')]
[('p', 'http://www.wikidata.org/prop/direct/P2529'), ('pName', 'ČSFD film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2581'), ('pName', 'BabelNet ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2638'), ('pName', 'TV.com ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2704'), ('pName', 'EIDR content ID')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P3302'), ('pName', 'Open Media Database film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P364'), ('pName', 'original language of film or TV show')]
[('p', 'http://www.wikidata.org/prop/direct/P437'), ('pName'

19

I have the property ***number of episodes (wdt:P1113)***. I try to use it on ***How I Met Your Mother, season 1 (wd:Q2715578)***.

In [10]:
queryString = """
SELECT ?numEpisodes WHERE { 

    # Retrieve number of episodes of HIMYM S1
    wd:Q2715578 wdt:P1113 ?numEpisodes.
}
"""

print("Results")
run_query(queryString)

Results
[('numEpisodes', '22')]


1

I can finally retrieve the number of episodes per season of  ***"How I met your mother" (wd:Q147235)***.

In [11]:
queryString = """
SELECT DISTINCT ?part ?partName ?numEpisodes WHERE { 

    # Retrieve HIMYM seasons
    wd:Q147235 wdt:P527 ?part .
    
    # Retrieve number of episodes of each HIMYM season
    ?part wdt:P1113 ?numEpisodes.

    # This returns the labels
    ?part <http://schema.org/name> ?partName .
}
"""

print("Results")
run_query(queryString)

Results
[('part', 'http://www.wikidata.org/entity/Q2438066'), ('partName', 'How I Met Your Mother, season 6'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q2715578'), ('partName', 'How I Met Your Mother, season 1'), ('numEpisodes', '22')]
[('part', 'http://www.wikidata.org/entity/Q13567027'), ('partName', 'How I Met Your Mother, season 9'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q2567330'), ('partName', 'How I Met Your Mother, season 4'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q338715'), ('partName', 'How I Met Your Mother, season 8'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q582332'), ('partName', 'How I Met Your Mother, season 5'), ('numEpisodes', '24')]
[('part', 'http://www.wikidata.org/entity/Q3468515'), ('partName', 'How I Met Your Mother, season 2'), ('numEpisodes', '22')]
[('part', 'http://www.wikidata.org/entity/Q2555117'), ('partName', 'How I Met Your Mother, season 3'), ('numEpi

9

##### END TASK 1
I can finally answer the initial question. ***"How I met your mother" (wd:Q147235)*** has 9 seasons, each with 20, 22, or 24 episodes.
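The per-season rows come back in no particular order. The sketch below post-processes them: it sorts them by season number and sums the counts. With all nine rows, that sum could be checked against the series-level number of episodes (`wdt:P1113`, 208).

The helpers are hypothetical. They assume the label format `How I Met Your Mother, season N` seen in the output. The rows are copied from the In [11] output; the truncated rows are left out.

```python
import re

def season_number(label):
    """Extract N from a label such as 'How I Met Your Mother, season N'."""
    match = re.search(r"season (\d+)$", label)
    if match is None:
        raise ValueError(f"no season number in {label!r}")
    return int(match.group(1))

def episodes_by_season(rows):
    """rows: (season label, episode count as string) pairs -> [(N, count)] sorted by N."""
    return sorted((season_number(label), int(count)) for label, count in rows)

# Visible rows from the In [11] output (truncated rows omitted)
rows = [
    ("How I Met Your Mother, season 6", "24"),
    ("How I Met Your Mother, season 1", "22"),
    ("How I Met Your Mother, season 9", "24"),
    ("How I Met Your Mother, season 4", "24"),
    ("How I Met Your Mother, season 8", "24"),
    ("How I Met Your Mother, season 5", "24"),
    ("How I Met Your Mother, season 2", "22"),
]
per_season = episodes_by_season(rows)
print(per_season)  # [(1, 22), (2, 22), (4, 24), (5, 24), (6, 24), (8, 24), (9, 24)]
print(sum(count for _, count in per_season))  # 164
```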

### Task 2 : Get the number of episodes in which the cast members played a role. Who are the most present actors?

From a previous query, I noticed that each season also has the property ***has part (wdt:P527)***.

I want to see what is connected to ***How I Met Your Mother, season 1 (wd:Q2715578)*** through this property.

In [12]:
queryString = """
SELECT ?episode ?episodeName WHERE { 

    # Retrieve episodes of HIMYM S1
    wd:Q2715578 wdt:P527 ?episode .
    
    # This returns the labels
    ?episode <http://schema.org/name> ?episodeName .
}
"""

print("Results")
run_query(queryString)

Results
[('episode', 'http://www.wikidata.org/entity/Q11696021'), ('episodeName', 'Nothing Good Happens After 2 A.M.')]
[('episode', 'http://www.wikidata.org/entity/Q1327587'), ('episodeName', 'Okay Awesome')]
[('episode', 'http://www.wikidata.org/entity/Q3480575'), ('episodeName', 'Return of the Shirt')]
[('episode', 'http://www.wikidata.org/entity/Q467447'), ('episodeName', 'Pilot')]
[('episode', 'http://www.wikidata.org/entity/Q468587'), ('episodeName', 'Purple Giraffe')]
[('episode', 'http://www.wikidata.org/entity/Q471448'), ('episodeName', 'Sweet Taste of Liberty')]
[('episode', 'http://www.wikidata.org/entity/Q4809956'), ('episodeName', 'Game Night')]
[('episode', 'http://www.wikidata.org/entity/Q4817584'), ('episodeName', 'Drumroll, Please')]
[('episode', 'http://www.wikidata.org/entity/Q4818636'), ('episodeName', 'Life Among the Gorillas')]
[('episode', 'http://www.wikidata.org/entity/Q4818989'), ('episodeName', 'Cupcake')]
[('episode', 'http://www.wikidata.org/entity/Q4819005

22

Using the property ***has part (wdt:P527)*** on a single season, I can retrieve all the episodes of that season.

I have to discover the cast members, so I try to list all the properties of a specific episode: ***The Pineapple Incident (wd:Q7757165)***

In [13]:
queryString = """
SELECT DISTINCT ?p ?pName ?o ?oName WHERE { 

    # Connecting The Pineapple Incident to something
    wd:Q7757165 ?p ?o.

    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
}
ORDER BY ?pName
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q200566'), ('oName', 'Cobie Smulders')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q485310'), ('oName', 'Neil Patrick Harris')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q223455'), ('oName', 'Josh Radnor')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q199927'), ('oName', 'Alyson Hannigan')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q202304'), ('oName', 'Jason Segel')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member'), ('o', 'http://www.wikidata.org/entity/Q333544'), ('oName', 'Bob Saget')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName',

16

I discovered that the property ***cast member (wdt:P161)*** can be used to retrieve all the actors who participated in a specific episode.

Now I can count the number of episodes in which the cast members played a role, and show the most present actors.
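The aggregation query below walks two ***has part (wdt:P527)*** hops (series → season → episode). It then counts distinct episodes per cast member.

To make the semantics concrete, here is a pure-Python sketch of the same computation over a hypothetical toy list of triples. The identifiers are placeholders. It illustrates the pattern; it is not how the endpoint evaluates the query.

```python
from collections import defaultdict

def episodes_per_actor(triples, series, has_part="wdt:P527", cast="wdt:P161"):
    """Mimic `series has_part/has_part ?episode . ?episode cast ?actor`
    followed by COUNT(DISTINCT ?episode) grouped by ?actor."""
    children = defaultdict(set)
    for s, p, o in triples:
        if p == has_part:
            children[s].add(o)

    # Two hops: series -> season -> episode
    episodes = {ep for season in children[series] for ep in children[season]}

    counts = defaultdict(set)
    for s, p, o in triples:
        if p == cast and s in episodes:
            counts[o].add(s)  # a set, so each episode counts once per actor
    return sorted(((actor, len(eps)) for actor, eps in counts.items()),
                  key=lambda pair: (-pair[1], pair[0]))

# Hypothetical toy triples (placeholders, not Wikidata data)
toy = [
    ("series", "wdt:P527", "s1"), ("series", "wdt:P527", "s2"),
    ("s1", "wdt:P527", "e1"), ("s1", "wdt:P527", "e2"), ("s2", "wdt:P527", "e3"),
    ("e1", "wdt:P161", "actorA"), ("e2", "wdt:P161", "actorA"),
    ("e3", "wdt:P161", "actorA"), ("e3", "wdt:P161", "actorB"),
    ("series", "wdt:P161", "actorC"),  # series-level cast, not reached by the two-hop path
]
print(episodes_per_actor(toy, "series"))  # [('actorA', 3), ('actorB', 1)]
```

Note that cast members attached only to the series (like `actorC`) are not counted, because the count is driven by episode-level `wdt:P161` statements.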

In [14]:
queryString = """
SELECT ?actor ?actorName (COUNT(DISTINCT ?episode) AS ?numEpisodes) WHERE { 

    # Retrieve HIMYM episodes 
    wd:Q147235 wdt:P527{2} ?episode .
    
    # Retrieve cast members
    ?episode wdt:P161 ?actor .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
GROUP BY ?actor ?actorName
ORDER BY DESC(?numEpisodes)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q200566'), ('actorName', 'Cobie Smulders'), ('numEpisodes', '145')]
[('actor', 'http://www.wikidata.org/entity/Q485310'), ('actorName', 'Neil Patrick Harris'), ('numEpisodes', '145')]
[('actor', 'http://www.wikidata.org/entity/Q223455'), ('actorName', 'Josh Radnor'), ('numEpisodes', '145')]
[('actor', 'http://www.wikidata.org/entity/Q202304'), ('actorName', 'Jason Segel'), ('numEpisodes', '145')]
[('actor', 'http://www.wikidata.org/entity/Q199927'), ('actorName', 'Alyson Hannigan'), ('numEpisodes', '143')]
[('actor', 'http://www.wikidata.org/entity/Q333544'), ('actorName', 'Bob Saget'), ('numEpisodes', '142')]
[('actor', 'http://www.wikidata.org/entity/Q297128'), ('actorName', 'David Henrie'), ('numEpisodes', '48')]
[('actor', 'http://www.wikidata.org/entity/Q229914'), ('actorName', 'Lyndsy Fonseca'), ('numEpisodes', '48')]
[('actor', 'http://www.wikidata.org/entity/Q16149506'), ('actorName', 'Charlene Amoia'), ('numEpisodes', '17')]
[

10

##### END TASK 2 
With this final query I was able to get the number of episodes in which the cast members acted. 

Moreover, the most present actors are ***Cobie Smulders, Neil Patrick Harris, Josh Radnor*** and ***Jason Segel***.
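Four actors share the top count, so a fixed `LIMIT` could cut a tie in half at its boundary. When picking "the most present" actors, it is safer to keep everyone who reaches the maximum.

Below is a minimal sketch over the rows printed by In [14]. `top_tied` is a hypothetical helper, not part of the original notebook.

```python
def top_tied(rows):
    """Return every name whose count equals the maximum count (ties included)."""
    best = max(count for _, count in rows)
    return [name for name, count in rows if count == best]

# (actorName, numEpisodes) pairs copied from the In [14] output
rows = [
    ("Cobie Smulders", 145), ("Neil Patrick Harris", 145),
    ("Josh Radnor", 145), ("Jason Segel", 145),
    ("Alyson Hannigan", 143), ("Bob Saget", 142),
    ("David Henrie", 48), ("Lyndsy Fonseca", 48),
    ("Charlene Amoia", 17),
]
print(top_tied(rows))
# ['Cobie Smulders', 'Neil Patrick Harris', 'Josh Radnor', 'Jason Segel']
```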

### Task 3 : Check which actor acted in the most films while working on "How I met your mother" and which actor participated in the most films after the end of the TV series.

To check if an actor acted in a film while working on ***"How I met your mother" (wd:Q147235)***, I need to know when HIMYM started and when it ended.

I can rely on these two properties discovered in a previous query: 
* ***start time (wdt:P580)*** 
* ***end time (wdt:P582)***

In [15]:
queryString = """
SELECT ?startTime ?endTime WHERE { 

    # Retrieving HIMYM startTime and endTime
    wd:Q147235  wdt:P580  ?startTime ;
                wdt:P582  ?endTime   . 
}
"""

print("Results")
run_query(queryString)

Results
[('startTime', '2005-09-19T00:00:00Z'), ('endTime', '2014-03-31T00:00:00Z')]


1

Hence, I need to check whether a film featuring the actor was published between ***"2005-09-19"*** and ***"2014-03-31"***. To do this, I have to understand how actors and films are connected.
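The two timestamps define an open interval. Before writing the SPARQL filter, the membership test can be sketched in Python.

This assumes the `YYYY-MM-DDThh:mm:ssZ` form printed by In [15]. It uses strict comparisons, so a film published on either boundary date is excluded. The helper names are hypothetical.

```python
from datetime import datetime

# Format of the values printed above, e.g. '2005-09-19T00:00:00Z'
WIKIDATA_TIME = "%Y-%m-%dT%H:%M:%SZ"

def parse_time(value):
    return datetime.strptime(value, WIKIDATA_TIME)

def during(date, start, end):
    """True if `date` lies strictly between `start` and `end`."""
    return parse_time(start) < parse_time(date) < parse_time(end)

start, end = "2005-09-19T00:00:00Z", "2014-03-31T00:00:00Z"
print(during("2012-01-01T00:00:00Z", start, end))  # True
print(during("2014-03-31T00:00:00Z", start, end))  # False: the end date itself is excluded
```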

First, I try to retrieve all the object properties of a specific actor: ***Cobie Smulders (wd:Q200566)***.

In [16]:
queryString = """
SELECT ?p ?pName ?o ?oName WHERE { 

    # Connecting Cobie Smulders to something
    wd:Q200566  ?p  ?o .
    
    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
}
ORDER BY ?pName
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P27'), ('pName', 'country of citizenship'), ('o', 'http://www.wikidata.org/entity/Q16'), ('oName', 'Canada')]
[('p', 'http://www.wikidata.org/prop/direct/P734'), ('pName', 'family name'), ('o', 'http://www.wikidata.org/entity/Q2018583'), ('oName', 'Smulders')]
[('p', 'http://www.wikidata.org/prop/direct/P735'), ('pName', 'given name'), ('o', 'http://www.wikidata.org/entity/Q15208593'), ('oName', 'Francisca')]
[('p', 'http://www.wikidata.org/prop/direct/P735'), ('pName', 'given name'), ('o', 'http://www.wikidata.org/entity/Q6119607'), ('oName', 'Jacoba')]
[('p', 'http://www.wikidata.org/prop/direct/P735'), ('pName', 'given name'), ('o', 'http://www.wikidata.org/entity/Q325872'), ('oName', 'Maria')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q5'), ('oName', 'human')]
[('p', 'http://www.wikidata.org/prop/direct/P1412'), ('pName', 'languages spoken, written or signed'

15

Maybe there are connections in the opposite direction: ***?s ?p wd:Q200566***.

In [17]:
queryString = """
SELECT ?s ?sName ?p ?pName WHERE { 

    # Connecting something to Cobie Smulders
    ?s ?p wd:Q200566 .
    
    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    ?s <http://schema.org/name> ?sName .
}
ORDER BY ?pName
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('s', 'http://www.wikidata.org/entity/Q65070140'), ('sName', 'Stumptown'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q51963292'), ('sName', 'Marvel Cinematic Universe Phase One'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q63405798'), ('sName', 'The Infinity Saga'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q147235'), ('sName', 'How I Met Your Mother'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q5264968'), ('sName', 'Desperation Day'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wikidata.org/entity/Q5521981'), ('sName', 'Garbage Island'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('s', 'http://www.wi

20

I can use the property ***cast member (wdt:P161)*** as before, but I need to understand how to select only films and not other TV series.

I use the property ***instance of (wdt:P31)*** on ***Avengers: Infinity War (wd:Q23780914)***.

In [18]:
queryString = """
SELECT ?class ?className WHERE { 

    # Retrieve class of Avengers Infinity War
    wd:Q23780914 wdt:P31 ?class .
    
    # This returns the labels
    ?class <http://schema.org/name> ?className .
}
"""

print("Results")
run_query(queryString)

Results
[('class', 'http://www.wikidata.org/entity/Q11424'), ('className', 'film')]


1

I retrieved ***film (wd:Q11424)***. Now I can retrieve only the films of a specific actor: ***Cobie Smulders (wd:Q200566)***.

In [19]:
queryString = """
SELECT ?film ?filmName WHERE { 

    # Retrieve films in which Cobie Smulders acted
    ?film  wdt:P161 wd:Q200566 ;
           wdt:P31  wd:Q11424  .
    
    # This returns the labels
    ?film <http://schema.org/name> ?filmName .
}
ORDER BY ?filmName
"""

print("Results")
run_query(queryString)

Results
[('film', 'http://www.wikidata.org/entity/Q14171368'), ('filmName', 'Avengers: Age of Ultron')]
[('film', 'http://www.wikidata.org/entity/Q23781155'), ('filmName', 'Avengers: Endgame')]
[('film', 'http://www.wikidata.org/entity/Q23780914'), ('filmName', 'Avengers: Infinity War')]
[('film', 'http://www.wikidata.org/entity/Q1765358'), ('filmName', 'Captain America: The Winter Soldier')]
[('film', 'http://www.wikidata.org/entity/Q7729669'), ('filmName', 'Delivery Man')]
[('film', 'http://www.wikidata.org/entity/Q3012583'), ('filmName', 'Grassroots')]
[('film', 'http://www.wikidata.org/entity/Q21168538'), ('filmName', 'Jack Reacher: Never Go Back')]
[('film', 'http://www.wikidata.org/entity/Q27663881'), ('filmName', 'Killing Gunther')]
[('film', 'http://www.wikidata.org/entity/Q27888468'), ('filmName', 'Lennon or McCartney')]
[('film', 'http://www.wikidata.org/entity/Q18703883'), ('filmName', 'Results')]
[('film', 'http://www.wikidata.org/entity/Q1767513'), ('filmName', 'Safe Haven

19

I have to retrieve the publication date of a film. I check whether ***Avengers: Infinity War (wd:Q23780914)*** has a property whose name contains the word "date".

In [20]:
queryString = """
SELECT DISTINCT ?p ?pName WHERE { 

    # Connect something to Avengers Infinity War
    wd:Q23780914 ?p ?o.
    
    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    
    # I use a regex to search a property that contains the word "date"
    FILTER(REGEX(?pName, "date"))
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P577'), ('pName', 'publication date')]


1

So to retrieve the publication date of a ***film (wd:Q11424)***, I can use the property ***publication date (wdt:P577)***.

Now I can finally answer the first part of the question: which actor acted in the most films while working on ***"How I met your mother" (wd:Q147235)***?

In [21]:
queryString = """
SELECT ?actor ?actorName (COUNT(DISTINCT ?film) AS ?numFilms) WHERE { 

    # Retrieve all the HIMYM cast members
    wd:Q147235 wdt:P161 ?actor .
    
    # Retrieve films in which actor of HIMYM acted
    ?film  wdt:P31   wd:Q11424         ;
           wdt:P161  ?actor            ;
           wdt:P577  ?publicationDate  .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
    
    # I want only films that were published while the actor was working on "How I met your mother".
    FILTER (?publicationDate > "2005-09-19"^^xsd:date && ?publicationDate < "2014-03-31"^^xsd:date)
}
GROUP BY ?actor ?actorName
ORDER BY DESC(?numFilms)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q192165'), ('actorName', 'Danny Glover'), ('numFilms', '40')]
[('actor', 'http://www.wikidata.org/entity/Q229669'), ('actorName', 'Malin Åkerman'), ('numFilms', '21')]
[('actor', 'http://www.wikidata.org/entity/Q469579'), ('actorName', 'Mircea Monroe'), ('numFilms', '19')]
[('actor', 'http://www.wikidata.org/entity/Q1319539'), ('actorName', 'Thomas Lennon'), ('numFilms', '18')]
[('actor', 'http://www.wikidata.org/entity/Q236189'), ('actorName', 'Judy Greer'), ('numFilms', '18')]
[('actor', 'http://www.wikidata.org/entity/Q1319744'), ('actorName', 'Will Forte'), ('numFilms', '17')]
[('actor', 'http://www.wikidata.org/entity/Q566037'), ('actorName', 'Scoot McNairy'), ('numFilms', '16')]
[('actor', 'http://www.wikidata.org/entity/Q1189470'), ('actorName', 'Jimmi Simpson'), ('numFilms', '16')]
[('actor', 'http://www.wikidata.org/entity/Q530646'), ('actorName', 'Ray Wise'), ('numFilms', '16')]
[('actor', 'http://www.wikidata.org/entity/Q716

10

Which actor participated in the most films after the end of the TV series?

I can reuse the previous query, considering only films published after ***"2014-03-31"*** (the end date of ***"How I met your mother" (wd:Q147235)***).
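In [21] and In [22] differ only in their date filter, so both can be generated from one template. `films_per_actor_query` is a hypothetical helper, not part of the original notebook, and it only builds the query text.

Like In [21] and In [22], it writes `xsd:date` without declaring the `xsd:` prefix in `prefixString`. It therefore assumes, as those queries evidently do, that the endpoint resolves that prefix on its own.

```python
def films_per_actor_query(series="wd:Q147235", after=None, before=None, limit=10):
    """Build the films-per-cast-member query; `after`/`before` are optional
    strict xsd:date bounds on the film's publication date."""
    conditions = []
    if after is not None:
        conditions.append(f'?publicationDate > "{after}"^^xsd:date')
    if before is not None:
        conditions.append(f'?publicationDate < "{before}"^^xsd:date')
    date_filter = f"FILTER ({' && '.join(conditions)})" if conditions else ""
    return f"""
SELECT ?actor ?actorName (COUNT(DISTINCT ?film) AS ?numFilms) WHERE {{
    {series} wdt:P161 ?actor .
    ?film wdt:P31 wd:Q11424 ;
          wdt:P161 ?actor ;
          wdt:P577 ?publicationDate .
    ?actor <http://schema.org/name> ?actorName .
    {date_filter}
}}
GROUP BY ?actor ?actorName
ORDER BY DESC(?numFilms)
LIMIT {limit}
"""

during_query = films_per_actor_query(after="2005-09-19", before="2014-03-31")
after_query = films_per_actor_query(after="2014-03-31")
print(after_query)
# run_query(after_query)  # needs the notebook's endpoint
```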

In [22]:
queryString = """
SELECT ?actor ?actorName (COUNT(DISTINCT ?film) AS ?numFilms) WHERE { 

    # Retrieve all the HIMYM cast members
    wd:Q147235 wdt:P161 ?actor .
    
    # Retrieve films in which actor of HIMYM acted
    ?film  wdt:P31   wd:Q11424         ;
           wdt:P161  ?actor            ;
           wdt:P577  ?publicationDate  .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
    
    # I want only films that were published after the end of "How I met your mother".
    FILTER (?publicationDate > "2014-03-31"^^xsd:date )
}
GROUP BY ?actor ?actorName
ORDER BY DESC(?numFilms)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q192165'), ('actorName', 'Danny Glover'), ('numFilms', '27')]
[('actor', 'http://www.wikidata.org/entity/Q236189'), ('actorName', 'Judy Greer'), ('numFilms', '20')]
[('actor', 'http://www.wikidata.org/entity/Q6382703'), ('actorName', 'Keegan-Michael Key'), ('numFilms', '17')]
[('actor', 'http://www.wikidata.org/entity/Q1319744'), ('actorName', 'Will Forte'), ('numFilms', '15')]
[('actor', 'http://www.wikidata.org/entity/Q362616'), ('actorName', 'Jon Bernthal'), ('numFilms', '15')]
[('actor', 'http://www.wikidata.org/entity/Q23547'), ('actorName', 'Bryan Cranston'), ('numFilms', '14')]
[('actor', 'http://www.wikidata.org/entity/Q566037'), ('actorName', 'Scoot McNairy'), ('numFilms', '13')]
[('actor', 'http://www.wikidata.org/entity/Q311271'), ('actorName', 'John Lithgow'), ('numFilms', '12')]
[('actor', 'http://www.wikidata.org/entity/Q200566'), ('actorName', 'Cobie Smulders'), ('numFilms', '11')]
[('actor', 'http://www.wikidata.org/ent

10

##### END TASK 3
The actor who acted in the most films while working on "How I met your mother" is ***Danny Glover (wd:Q192165)***, with 40 films.

The actor who participated in the most films after the end of the TV series is again ***Danny Glover (wd:Q192165)***, with 27 films.

### Task 4 : Compare HIMYM with the TV series "The Office (US)" in terms of number of seasons, episodes, and cast members.

To compare the number of seasons and episodes I can rely on the same query used in Task 1.

In [23]:
queryString = """
SELECT ?numEpisodesHIMYM ?numEpisodesTheOffice ?numSeasonsHIMYM ?numSeasonsTheOffice WHERE { 

    # Retrieve HIMYM numEpisodes and numSeasons
    wd:Q147235  wdt:P1113 ?numEpisodesHIMYM ;
                wdt:P2437 ?numSeasonsHIMYM  .
    
    # Retrieve The Office numEpisodes and numSeasons
    wd:Q23831   wdt:P1113 ?numEpisodesTheOffice ;
                wdt:P2437 ?numSeasonsTheOffice  . 
}
"""

print("Results")
run_query(queryString)

Results
[('numEpisodesHIMYM', '208'), ('numEpisodesTheOffice', '201'), ('numSeasonsHIMYM', '9'), ('numSeasonsTheOffice', '9')]


1
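The single row returned by In [23] already holds the comparison. The sketch below just makes the differences explicit. `compare` is a hypothetical helper working on values copied from that output.

```python
def compare(a, b, keys):
    """Return {key: (a value, b value, difference)} for numeric string values."""
    return {k: (int(a[k]), int(b[k]), int(a[k]) - int(b[k])) for k in keys}

# Values copied from the In [23] output
himym = {"episodes": "208", "seasons": "9"}
the_office = {"episodes": "201", "seasons": "9"}
print(compare(himym, the_office, ["episodes", "seasons"]))
# {'episodes': (208, 201, 7), 'seasons': (9, 9, 0)}
```

Both series have 9 seasons; HIMYM has 7 more episodes.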

I can also retrieve which actors participated in the highest number of episodes of ***The Office (US) (wd:Q23831)***, using the same query as in Task 2.

In [24]:
queryString = """
SELECT ?actor ?actorName (COUNT(DISTINCT ?episode) AS ?numEpisodes) WHERE { 

    # Retrieve The Office episodes 
    wd:Q23831 wdt:P527{2} ?episode .
    
    # Retrieve cast members
    ?episode wdt:P161 ?actor .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
GROUP BY ?actor ?actorName
ORDER BY DESC(?numEpisodes)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q349548'), ('actorName', 'Rainn Wilson'), ('numEpisodes', '1')]
[('actor', 'http://www.wikidata.org/entity/Q72077'), ('actorName', 'Ellie Kemper'), ('numEpisodes', '1')]
[('actor', 'http://www.wikidata.org/entity/Q238877'), ('actorName', 'Jenna Fischer'), ('numEpisodes', '1')]
[('actor', 'http://www.wikidata.org/entity/Q926912'), ('actorName', 'Craig Robinson'), ('numEpisodes', '1')]
[('actor', 'http://www.wikidata.org/entity/Q328790'), ('actorName', 'Ed Helms'), ('numEpisodes', '1')]
[('actor', 'http://www.wikidata.org/entity/Q254766'), ('actorName', 'Catherine Tate'), ('numEpisodes', '1')]
[('actor', 'http://www.wikidata.org/entity/Q313039'), ('actorName', 'John Krasinski'), ('numEpisodes', '1')]


7

The results look suspicious: every actor appears in only one episode.

I want to check if there is any problem with the first part of the query.
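Since `run_query` only prints the bindings, checks like this one are done by eye. A hypothetical pair of helpers (not in the original notebook) could turn the JSON returned by SPARQLWrapper's `convert()` into plain dictionaries and flag a result in which every count equals 1. The sample reuses two of the rows printed above, laid out in the SPARQL 1.1 JSON results format.

```python
# Hypothetical helpers (not in the original notebook).

def bindings_to_rows(json_results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts.
    Variables left unbound in a solution are absent from its dict."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in json_results["results"]["bindings"]
    ]

def all_counts_equal_one(rows, count_var):
    """True if every row reports a count of exactly 1, a hint that the
    counted pattern matched only one item per group."""
    return bool(rows) and all(int(row[count_var]) == 1 for row in rows)

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
sample = {
    "head": {"vars": ["actor", "actorName", "numEpisodes"]},
    "results": {"bindings": [
        {"actor": {"type": "uri", "value": "http://www.wikidata.org/entity/Q349548"},
         "actorName": {"type": "literal", "value": "Rainn Wilson"},
         "numEpisodes": {"type": "literal", "datatype": XSD_INTEGER, "value": "1"}},
        {"actor": {"type": "uri", "value": "http://www.wikidata.org/entity/Q72077"},
         "actorName": {"type": "literal", "value": "Ellie Kemper"},
         "numEpisodes": {"type": "literal", "datatype": XSD_INTEGER, "value": "1"}},
    ]},
}

rows = bindings_to_rows(sample)
print(all_counts_equal_one(rows, "numEpisodes"))  # True
```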

In [25]:
queryString = """
SELECT ?episode ?episodeName WHERE { 

    # Retrieve The Office episodes 
    wd:Q23831 wdt:P527{2} ?episode .
    
    # This returns the labels
    ?episode <http://schema.org/name> ?episodeName .
}
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('episode', 'http://www.wikidata.org/entity/Q50379836'), ('episodeName', 'Classy Christmas (part 1)')]
[('episode', 'http://www.wikidata.org/entity/Q50379837'), ('episodeName', 'Classy Christmas (part 2)')]
[('episode', 'http://www.wikidata.org/entity/Q5099551'), ('episodeName', 'China')]
[('episode', 'http://www.wikidata.org/entity/Q5178024'), ('episodeName', 'Couples Discount')]
[('episode', 'http://www.wikidata.org/entity/Q6927074'), ('episodeName', 'Moving On')]
[('episode', 'http://www.wikidata.org/entity/Q7880294'), ('episodeName', 'Ultimatum')]
[('episode', 'http://www.wikidata.org/entity/Q7914333'), ('episodeName', 'Vandalism')]
[('episode', 'http://www.wikidata.org/entity/Q4838397'), ('episodeName', 'Baby Shower')]
[('episode', 'http://www.wikidata.org/entity/Q5001718'), ('episodeName', 'Business Ethics')]
[('episode', 'http://www.wikidata.org/entity/Q5185169'), ('episodeName', 'Crime Aid')]


10

The episodes are retrieved correctly.

Perhaps the problem lies with the property ***cast member (wdt:P161)***. To check, I list all the properties of a single episode, ***Ultimatum (wd:Q7880294)***.

In [26]:
queryString = """
SELECT ?p ?pName ?o ?oName WHERE { 

    # Every (property, value) pair attached to Ultimatum
    wd:Q7880294  ?p  ?o .
    
    # This returns the labels
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
}
ORDER BY ?pName
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P495'), ('pName', 'country of origin'), ('o', 'http://www.wikidata.org/entity/Q30'), ('oName', 'United States of America')]
[('p', 'http://www.wikidata.org/prop/direct/P57'), ('pName', 'director'), ('o', 'http://www.wikidata.org/entity/Q5239168'), ('oName', 'David Rogers')]
[('p', 'http://www.wikidata.org/prop/direct/P750'), ('pName', 'distributed by'), ('o', 'http://www.wikidata.org/entity/Q5371838'), ('oName', 'Vudu')]
[('p', 'http://www.wikidata.org/prop/direct/P437'), ('pName', 'distribution format'), ('o', 'http://www.wikidata.org/entity/Q723685'), ('oName', 'video on demand')]
[('p', 'http://www.wikidata.org/prop/direct/P156'), ('pName', 'followed by'), ('o', 'http://www.wikidata.org/entity/Q7763280'), ('oName', 'The Seminar')]
[('p', 'http://www.wikidata.org/prop/direct/P155'), ('pName', 'follows'), ('o', 'http://www.wikidata.org/entity/Q5128465'), ('oName', 'Classy Christmas')]
[('p', 'http://www.wikidata.org/prop/direct/P31')

12

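Ultimatum has no ***cast member (wdt:P161)*** statement: since the results are ordered by property name, it would appear before *country of origin*. Together with the previous result, this suggests that, unlike the HIMYM episodes, the episodes of ***The Office (US) (wd:Q23831)*** are almost never annotated with ***cast member (wdt:P161)***.

The only way to retrieve the actors of the TV series is therefore to use ***cast member (wdt:P161)*** directly on the TV series (***The Office (US) (wd:Q23831)***).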
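To measure how sparse the episode-level annotations are, a query along these lines could count how many episodes carry at least one cast member statement. This is an untested sketch wrapped in a hypothetical helper, `cast_coverage_query`; it writes the two `wdt:P527` hops as the standard sequence path `wdt:P527/wdt:P527`, assumed to reach the same episodes as `wdt:P527{2}` above.

```python
# Hypothetical helper (not run in this notebook): how many episodes of a
# series have at least one cast member (wdt:P161) statement?

def cast_coverage_query(series_qid):
    return f"""
SELECT (COUNT(DISTINCT ?episode) AS ?numEpisodes)
       (COUNT(DISTINCT ?episodeWithCast) AS ?numEpisodesWithCast) WHERE {{

    # series -> seasons -> episodes (two hops of wdt:P527)
    wd:{series_qid} wdt:P527/wdt:P527 ?episode .

    # ?episodeWithCast is bound only when the episode has a cast member
    OPTIONAL {{
        ?episode wdt:P161 ?anyActor .
        BIND(?episode AS ?episodeWithCast)
    }}
}}
"""

office_query = cast_coverage_query("Q23831")   # The Office (US)
print(office_query)
# run_query(cast_coverage_query("Q147235"))    # HIMYM, for comparison
```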

In [27]:
queryString = """
SELECT DISTINCT ?actor ?actorName WHERE { 

    # Retrieve The Office cast members (attached to the series itself)
    wd:Q23831 wdt:P161 ?actor .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q1050211'), ('actorName', 'Leslie David Baker')]
[('actor', 'http://www.wikidata.org/entity/Q1139248'), ('actorName', 'Oscar Nunez')]
[('actor', 'http://www.wikidata.org/entity/Q216221'), ('actorName', 'Steve Carell')]
[('actor', 'http://www.wikidata.org/entity/Q2238008'), ('actorName', 'Creed Bratton')]
[('actor', 'http://www.wikidata.org/entity/Q231203'), ('actorName', 'Amy Ryan')]
[('actor', 'http://www.wikidata.org/entity/Q238877'), ('actorName', 'Jenna Fischer')]
[('actor', 'http://www.wikidata.org/entity/Q254766'), ('actorName', 'Catherine Tate')]
[('actor', 'http://www.wikidata.org/entity/Q2669971'), ('actorName', 'Angela Kinsey')]
[('actor', 'http://www.wikidata.org/entity/Q2671438'), ('actorName', 'Paul Lieberstein')]
[('actor', 'http://www.wikidata.org/entity/Q269901'), ('actorName', 'Melora Hardin')]
[('actor', 'http://www.wikidata.org/entity/Q2924850'), ('actorName', 'Brian Baumgartner')]
[('actor', 'http://www.wikidata.org

25

I want to check which of the two TV series, ***The Office (US) (wd:Q23831)*** or ***"How I met your mother" (wd:Q147235)***, has the larger cast.

In [28]:
queryString = """
SELECT (COUNT(DISTINCT ?actorHIMYM) AS ?numActorHIMYM) (COUNT(DISTINCT ?actorTheOffice) AS ?numActorTheOffice) WHERE { 

    # Retrieve HIMYM actors
    wd:Q147235  wdt:P161 ?actorHIMYM .
    
    # Retrieve The Office actors
    wd:Q23831   wdt:P161 ?actorTheOffice . 
}
"""

print("Results")
run_query(queryString)

Results
[('numActorHIMYM', '480'), ('numActorTheOffice', '25')]


1

I can also check whether any actors participated in both ***The Office (US) (wd:Q23831)*** and ***"How I met your mother" (wd:Q147235)***.

In [29]:
queryString = """
SELECT ?actor ?actorName WHERE { 

    # Retrieve HIMYM actors
    wd:Q147235  wdt:P161 ?actor .
    
    # Retrieve The Office actors
    wd:Q23831   wdt:P161 ?actor .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
"""

print("Results")
run_query(queryString)

Results
Empty


0

There are no actors in common between the two TV series.
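In [28] and In [29] differ in one detail. In [28] uses two different variables, so its two patterns form a cross product (480 × 25 = 12,000 solutions for the real data), and the `DISTINCT` inside each `COUNT` is what brings the counts back to 480 and 25. In [29] reuses `?actor`, which turns the join into an intersection. A toy Python analogue, with hypothetical IDs:

```python
from itertools import product

# Toy cast lists (hypothetical IDs, not Wikidata data).
cast_a = ["a1", "a2", "a3"]
cast_b = ["b1", "b2"]

# In [28]: no shared variable -> every pairing of the two result sets.
pairs = list(product(cast_a, cast_b))
print(len(pairs))                   # 6 rows
print(len({a for a, _ in pairs}))   # like COUNT(DISTINCT ?actorHIMYM): 3
print(len({b for _, b in pairs}))   # like COUNT(DISTINCT ?actorTheOffice): 2

# In [29]: both patterns bind ?actor -> only shared actors survive.
print(set(cast_a) & set(cast_b))    # set(), i.e. "Empty"
```

Without `DISTINCT`, In [28] would report 12,000 for both counts.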

##### END TASK 4:
Both ***"How I met your mother" (wd:Q147235)*** and ***The Office (US) (wd:Q23831)*** have nine season but HIMYM has 208 episodes while "The Office" has 201 episodes.

In ***"How I met your mother" (wd:Q147235)*** there are 480 actors, while in ***The Office (US) (wd:Q23831)*** only 25. 

Moreover, it is not possibile to determine who are the most present actors in ***The Office (US) (wd:Q23831)***, since the episodes of this TV series do not have the property ***cast member (wdt:P161)***.

Finally, I discovered that there are no common actors between ***"How I met your mother" (wd:Q147235)*** and ***The Office (US) (wd:Q23831)***.

### Task 5 : Return how many of the actors who are members of the cast of the TV series have a [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2

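The Bacon number of an actor is the number of degrees of separation between him or her and Kevin Bacon: Kevin Bacon himself has Bacon number 0, the actors who worked with him have Bacon number 1, and in general an actor whose lowest Bacon number among co-stars is N has Bacon number N+1. So, first of all, I need to retrieve Kevin Bacon.

To do this, I can use a ***REGEX*** on the surname, reached through the property ***family name (wdt:P734)*** discovered in a previous query.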

In [30]:
queryString = """
SELECT DISTINCT ?person ?personName ?personSurname WHERE { 

    # Retrieve surname of a person using the property family name
    ?person wdt:P734 ?surname .
    
    # This returns the labels
    ?person <http://schema.org/name> ?personName .
    ?surname <http://schema.org/name> ?personSurname .

    # Since Kevin Bacon is an actor, he probably acted in a film.
    FILTER EXISTS{
        ?film   wdt:P31   wd:Q11424 ;
                wdt:P161  ?person   .             
    }
    
    # I use a regex to search for a surname that contains the word "Bacon"
    FILTER(REGEX(?personSurname, "Bacon"))
    
}
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('person', 'http://www.wikidata.org/entity/Q3102228'), ('personName', 'Georges Baconnet'), ('personSurname', 'Baconnet')]
[('person', 'http://www.wikidata.org/entity/Q3116093'), ('personName', 'Irving Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q503597'), ('personName', 'James Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q3454165'), ('personName', 'Kevin Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q3491343'), ('personName', 'Sosie Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q3992438'), ('personName', 'Tom Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q706678'), ('personName', 'Lloyd Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q65116263'), ('personName', 'Marco Bacon'), ('personSurname', 'Bacon')]
[('person', 'http://www.wikidata.org/entity/Q5216474'), ('personNa

15
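Georges Baconnet appears because SPARQL `REGEX` matches the pattern anywhere in the string unless it is anchored, much like Python's `re.search`. A minimal Python illustration follows (the surname `McBacon` is made up); in the query, the corresponding exact match would be `FILTER(REGEX(?personSurname, "^Bacon$"))`, a variant not tested here.

```python
import re

surnames = ["Baconnet", "Bacon", "McBacon"]

# Unanchored, like FILTER(REGEX(?personSurname, "Bacon")): substring match.
print([s for s in surnames if re.search("Bacon", s)])
# ['Baconnet', 'Bacon', 'McBacon']

# Anchored at both ends: only the exact surname matches.
print([s for s in surnames if re.search("^Bacon$", s)])
# ['Bacon']
```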

I found ***Kevin Bacon (wd:Q3454165)***. Now I can retrieve all the cast members of ***"How I met your mother" (wd:Q147235)*** with a Kevin Bacon number equal to 2.

First, I retrieve the cast members with a Kevin Bacon number equal to 1, i.e., those who share at least one work with Kevin Bacon.

In [31]:
queryString = """
SELECT ?actor ?actorName ?film ?filmName WHERE { 

    # Retrieve HIMYM actors
    wd:Q147235 wdt:P161 ?actor .
    
    # Ensure that the actor and Kevin Bacon worked together
    ?film wdt:P161 ?actor      ;
          wdt:P161 wd:Q3454165 .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
    ?film <http://schema.org/name> ?filmName .
}
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q269891'), ('actorName', 'Julianna Guill'), ('film', 'http://www.wikidata.org/entity/Q519490'), ('filmName', 'Crazy, Stupid, Love.')]
[('actor', 'http://www.wikidata.org/entity/Q312705'), ('actorName', 'John Cho'), ('film', 'http://www.wikidata.org/entity/Q370893'), ('filmName', 'The Air I Breathe')]
[('actor', 'http://www.wikidata.org/entity/Q234715'), ('actorName', 'Jamie-Lynn Sigler'), ('film', 'http://www.wikidata.org/entity/Q16167570'), ('filmName', 'Skum Rocks!')]
[('actor', 'http://www.wikidata.org/entity/Q311271'), ('actorName', 'John Lithgow'), ('film', 'http://www.wikidata.org/entity/Q627533'), ('filmName', 'Footloose')]
[('actor', 'http://www.wikidata.org/entity/Q234137'), ('actorName', 'Megan Mullally'), ('film', 'http://www.wikidata.org/entity/Q1476932'), ('filmName', 'Queens Logic')]
[('actor', 'http://www.wikidata.org/entity/Q530646'), ('actorName', 'Ray Wise'), ('film', 'http://www.wikidata.org/entity/Q223596'), ('filmN

17

Now I retrieve only the cast members of ***"How I met your mother" (wd:Q147235)*** with a Kevin Bacon number equal to 2: actors who worked with someone who worked with Kevin Bacon, but never with Kevin Bacon himself.
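In set terms, the next query takes the co-stars of Kevin Bacon (the candidates for `?actorMiddle`), collects their co-stars, removes Bacon and everyone who worked with him directly (the `FILTER NOT EXISTS`), and keeps only the series cast. A toy sketch with hypothetical names, not Wikidata data:

```python
# Toy data (hypothetical): work -> set of cast members.
works = {
    "film1": {"Kevin Bacon", "Ann"},
    "film2": {"Ann", "Bob"},
    "film3": {"Bob", "Cid"},
}
series_cast = {"Bob", "Cid"}   # stands in for the HIMYM cast
BACON = "Kevin Bacon"

def costars(person):
    """Everyone sharing at least one work with `person` (person included)."""
    return set().union(*(cast for cast in works.values() if person in cast))

# Candidates for ?actorMiddle: people who share a work with Kevin Bacon.
number_1 = costars(BACON) - {BACON}

# Co-stars of ?actorMiddle, minus Bacon and his direct co-stars
# (the FILTER NOT EXISTS), then restricted to the series cast.
number_2 = set().union(*(costars(p) for p in number_1)) - costars(BACON)
print(sorted(number_1))                 # ['Ann']
print(sorted(number_2 & series_cast))   # ['Bob']
```

Cid has Bacon number 3 here, so neither set contains him.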

In [32]:
queryString = """
SELECT DISTINCT ?actorName WHERE { 

    # Retrieve HIMYM actors
    wd:Q147235 wdt:P161 ?actor .
    
    # Ensure that the actor worked together with another actor "in the middle"
    ?filmMiddle wdt:P161 ?actor       ;
                wdt:P161 ?actorMiddle .
    
    # Ensure that the actor "in the middle" worked with Kevin Bacon
    ?film       wdt:P161 ?actorMiddle  ;
                wdt:P161 wd:Q3454165 .
    
    # Ensure that the "first" actor and Kevin Bacon did not work together
    FILTER NOT EXISTS{
        ?film3 wdt:P161 ?actor      ;
               wdt:P161 wd:Q3454165 .
    }
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
LIMIT 30
"""

print("Results")
run_query(queryString)

Results
[('actorName', 'Mircea Monroe')]
[('actorName', 'Virginia Williams')]
[('actorName', 'Kate Micucci')]
[('actorName', 'Alex Trebek')]
[('actorName', 'Eva Amurri')]
[('actorName', 'Elizabeth Bogush')]
[('actorName', 'Peter Gallagher')]
[('actorName', 'Jayma Mays')]
[('actorName', 'George Cheung')]
[('actorName', 'John Getz')]
[('actorName', 'Darcy Rose Byrnes')]
[('actorName', 'Will Sasso')]
[('actorName', 'Scoot McNairy')]
[('actorName', 'Alyssa Shafer')]
[('actorName', 'Laura Prepon')]
[('actorName', 'Kal Penn')]
[('actorName', 'Jason Lewis')]
[('actorName', 'Charlene Amoia')]
[('actorName', 'Judy Greer')]
[('actorName', 'America Olivo')]
[('actorName', 'Michael York')]
[('actorName', 'Valerie Azlynn')]
[('actorName', 'Joe Manganiello')]
[('actorName', 'Chi McBride')]
[('actorName', 'Ed Brigadier')]
[('actorName', 'Anna Camp')]
[('actorName', 'Anne Dudek')]
[('actorName', 'Seth Green')]
[('actorName', 'Lyndsy Fonseca')]
[('actorName', 'Nate Torrence')]


30

In [33]:
queryString = """
SELECT COUNT(DISTINCT ?actor) WHERE { 

    # Retrieve HIMYM actors
    wd:Q147235 wdt:P161 ?actor .
    
    # Ensure that the actor worked together with another actor "in the middle"
    ?filmMiddle wdt:P161 ?actor       ;
                wdt:P161 ?actorMiddle .
    
    # Ensure that the actor "in the middle" worked with Kevin Bacon
    ?film       wdt:P161 ?actorMiddle  ;
                wdt:P161 wd:Q3454165 .
    
    # Ensure that the "first" actor and Kevin Bacon did not work together
    FILTER NOT EXISTS{
        ?film3 wdt:P161 ?actor      ;
               wdt:P161 wd:Q3454165 .
    }
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
}
"""

print("Results")
run_query(queryString)


Results
[('callret-0', '464')]


1

There are 464 cast members of ***"How I met your mother" (wd:Q147235)*** with a Kevin Bacon number equal to 2.

I can also show how a cast member of ***"How I met your mother" (wd:Q147235)***  with Kevin Bacon Number equal to 2 is connected to ***Kevin Bacon (wd:Q3454165)***.

In [34]:
queryString = """
SELECT DISTINCT ?actorName ?filmMiddleName ?actorMiddleName ?filmName WHERE { 

    # Retrieve HIMYM actors
    wd:Q147235 wdt:P161 ?actor .
    
    # Ensure that the actor worked together with another actor "in the middle"
    ?filmMiddle wdt:P161 ?actor       ;
                wdt:P161 ?actorMiddle .
    
    # Ensure that the actor "in the middle" worked with Kevin Bacon
    ?film       wdt:P161 ?actorMiddle  ;
                wdt:P161 wd:Q3454165 .
    
    # Ensure that the "first" actor and Kevin Bacon did not work together
    FILTER NOT EXISTS{
        ?film3 wdt:P161 ?actor      ;
               wdt:P161 wd:Q3454165 .
    }
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
    ?film <http://schema.org/name> ?filmName .
    ?actorMiddle <http://schema.org/name> ?actorMiddleName .
    ?filmMiddle <http://schema.org/name> ?filmMiddleName .
}
LIMIT 1
"""

print("Results")
run_query(queryString)

Results
[('actorName', 'John Getz'), ('filmMiddleName', 'Halt and Catch Fire'), ('actorMiddleName', "Mark O'Brien"), ('filmName', 'City on a Hill')]


1

For example, in this case:
* ***John Getz*** acted in HIMYM
* ***John Getz*** worked with ***Mark O'Brien*** in ***Halt and Catch Fire***
* ***Mark O'Brien*** worked with ***Kevin Bacon*** in ***City on a Hill***

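So ***John Getz*** has a Kevin Bacon number equal to 2. Note that both connecting works here are TV series rather than films: the queries do not restrict `?film` or `?filmMiddle` to instances of film (wd:Q11424), so any item with ***cast member (wdt:P161)*** statements counts as a connection.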
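The queries above hard-code the length of the chain. A breadth-first search over the co-star graph computes the Bacon number for any distance and recovers the connecting chain at the same time. The sketch below is plain Python over a hypothetical work-to-cast mapping, which would have to be filled from `wdt:P161` results; here it contains only the two connections printed above.

```python
from collections import deque

def bacon_path(works, start, target="Kevin Bacon"):
    """Breadth-first search on the co-star graph. Returns the shortest
    alternating list [actor, work, actor, ..., target], or None."""
    # Index: actor -> works in which they appear.
    works_of = {}
    for work, cast in works.items():
        for actor in cast:
            works_of.setdefault(actor, []).append(work)

    parent = {start: None}   # actor -> (previous actor, shared work)
    queue = deque([start])
    while queue:
        actor = queue.popleft()
        if actor == target:
            # Walk the parent pointers back to the start, then reverse.
            path = [actor]
            while parent[actor] is not None:
                prev, work = parent[actor]
                path += [work, prev]
                actor = prev
            return path[::-1]
        for work in works_of.get(actor, []):
            for costar in works[work]:
                if costar not in parent:
                    parent[costar] = (actor, work)
                    queue.append(costar)
    return None

works = {
    "Halt and Catch Fire": {"John Getz", "Mark O'Brien"},
    "City on a Hill": {"Mark O'Brien", "Kevin Bacon"},
}
path = bacon_path(works, "John Getz")
print(path)
print((len(path) - 1) // 2)   # Bacon number: 2
```

Because BFS visits actors in order of distance, the first time it reaches Kevin Bacon the path is a shortest one, which is exactly what the Bacon number requires.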

##### END TASK 5:
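I discovered that 464 of the 480 cast members of ***"How I met your mother" (wd:Q147235)*** have a Kevin Bacon number equal to 2. This high share should be read with care: `?filmMiddle` can also be HIMYM itself, so every cast member who does not share a work with Kevin Bacon directly is linked, through the series, to the HIMYM actors with Bacon number 1 found in In [31], which likely explains why almost the whole cast qualifies.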

### Extra analytics queries

#### Which episodes of ***"How I met your mother" (wd:Q147235)*** have the highest number of cast members?


In [35]:
queryString = """
SELECT ?episode ?episodeName (COUNT(DISTINCT ?actor) AS ?numActors) WHERE { 

    # Retrieve HIMYM episodes 
    wd:Q147235 wdt:P527{2} ?episode .
    
    # Retrieve cast members
    ?episode wdt:P161 ?actor .
    
    # This returns the labels
    ?episode <http://schema.org/name> ?episodeName .
}
GROUP BY ?episode ?episodeName
ORDER BY DESC(?numActors)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('episode', 'http://www.wikidata.org/entity/Q6494765'), ('episodeName', 'Last Words'), ('numActors', '14')]
[('episode', 'http://www.wikidata.org/entity/Q7776702'), ('episodeName', 'The Yips'), ('numActors', '14')]
[('episode', 'http://www.wikidata.org/entity/Q16745474'), ('episodeName', 'Noretta'), ('numActors', '14')]
[('episode', 'http://www.wikidata.org/entity/Q5453909'), ('episodeName', 'First Time in New York'), ('numActors', '13')]
[('episode', 'http://www.wikidata.org/entity/Q5888848'), ('episodeName', 'Home Wreckers'), ('numActors', '13')]
[('episode', 'http://www.wikidata.org/entity/Q5917914'), ('episodeName', 'How Lily Stole Christmas'), ('numActors', '12')]
[('episode', 'http://www.wikidata.org/entity/Q7416049'), ('episodeName', 'Sandcastles in the Sand'), ('numActors', '12')]
[('episode', 'http://www.wikidata.org/entity/Q7733793'), ('episodeName', 'The Fight'), ('numActors', '12')]
[('episode', 'http://www.wikidata.org/entity/Q7753043'), ('episodeName', 'The Naked

10

#### On average, how many actors participate in a single episode of ***"How I met your mother" (wd:Q147235)***?

In [36]:
queryString = """
SELECT (AVG(?numActors) AS ?avgNumActors) WHERE {

    {   # Count how many actors there are in each episode
        # (episodes without any cast member statement form no group)
        SELECT ?episode ?episodeName (COUNT(DISTINCT ?actor) AS ?numActors) WHERE { 

            # Retrieve HIMYM episodes 
            wd:Q147235 wdt:P527{2} ?episode .

            # Retrieve cast members
            ?episode wdt:P161 ?actor .

            # This returns the labels
            ?episode <http://schema.org/name> ?episodeName .
        }
         GROUP BY ?episode ?episodeName
    }
}  

"""

print("Results")
run_query(queryString)

Results
[('avgNumActors', '9.116438356164384')]


1
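The query above is a two-level aggregation: the subquery counts distinct actors per episode, and the outer `AVG` averages those counts. Episodes without any cast member statement produce no group, so they are left out of the average rather than counted as zero. A toy Python analogue with hypothetical episode and actor IDs:

```python
from collections import Counter
from statistics import mean

# Toy (episode, actor) matches of ?episode wdt:P161 ?actor (hypothetical).
pairs = [
    ("ep1", "Ann"), ("ep1", "Bob"), ("ep1", "Ann"),   # duplicate match
    ("ep2", "Ann"),
]
episodes = ["ep1", "ep2", "ep3"]   # ep3 has no cast member statement

# Inner query: COUNT(DISTINCT ?actor) ... GROUP BY ?episode
per_episode = Counter(ep for ep, _ in set(pairs))
print(per_episode.most_common())   # [('ep1', 2), ('ep2', 1)]

# Outer query: AVG only sees the episodes that produced a group.
print(mean(per_episode.values()))  # 1.5

# Averaging over every episode instead would count ep3 as zero.
print(sum(per_episode.values()) / len(episodes))  # 1.0
```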

#### How many awards did ***"How I met your mother"*** win?

One of the first queries returned the properties ***nominated for (wdt:P1411)*** and ***award received (wdt:P166)***.

I can retrieve the number of nominations and awards received by ***"How I met your mother" (wd:Q147235)***.

In [37]:
queryString = """
SELECT (COUNT(DISTINCT ?nomination) AS ?numNominations) (COUNT(DISTINCT ?award) AS ?numAwards) WHERE { 

    # Retrieve HIMYM nominations and awards
    wd:Q147235  wdt:P1411 ?nomination ;
                wdt:P166  ?award      .
}

"""

print("Results")
run_query(queryString)

Results
[('numNominations', '10'), ('numAwards', '7')]


1

#### Show the ***"How I met your mother"*** actors who appear in both the first and the last episode of the TV series, together with the percentage of episodes in which they appear.

I can use two properties retrieved in a previous query, ***followed by (wdt:P156)*** and ***follows (wdt:P155)***, to identify the first and the last episode: the first episode of the first season follows no other episode, and the last episode of the last season is followed by none.
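The two filters and the rate expression can be checked on a toy chain. The sketch below (hypothetical episode IDs) mirrors the `FILTER NOT EXISTS` conditions and `?numEpisodesByActor * 100 / ?numEpisodes`; it assumes, as the query does, that the real first and last episodes lack `P155`/`P156` statements in Wikidata. Inverting the formula, the rates reported below correspond to 145, 143 and 48 of the 208 episodes.

```python
# Toy chain of three episodes (hypothetical IDs). Both directions are stored
# explicitly, since follows (P155) and followed by (P156) are separate statements.
episodes = {"e1", "e2", "e3"}
follows = {"e2": "e1", "e3": "e2"}       # wdt:P155
followed_by = {"e1": "e2", "e2": "e3"}   # wdt:P156

# FILTER NOT EXISTS { ?firstEpisode wdt:P155 ?x }: it follows nothing.
first = {ep for ep in episodes if ep not in follows}
# FILTER NOT EXISTS { ?lastEpisode wdt:P156 ?y }: nothing follows it.
last = {ep for ep in episodes if ep not in followed_by}
print(first, last)   # {'e1'} {'e3'}

def participation_rate(num_episodes_by_actor, num_episodes):
    """Percentage of the series' episodes in which an actor appears."""
    return num_episodes_by_actor * 100 / num_episodes

print(participation_rate(143, 208))   # 68.75
```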

In [38]:
queryString = """
SELECT DISTINCT ?actor ?actorName ((?numEpisodesByActor * 100 / ?numEpisodes) AS ?partecipationRate) WHERE { 

    # Retrieve total number of episodes of HIMYM
    wd:Q147235  wdt:P1113 ?numEpisodes .
    
    # Retrieve episodes of HIMYM first season
    wd:Q2715578 wdt:P527 ?firstEpisode .
    
    # Retrieve episodes of HIMYM last season
    wd:Q13567027 wdt:P527 ?lastEpisode .
    
    # Retrieve cast members
    ?firstEpisode wdt:P161 ?actor .
    ?lastEpisode wdt:P161 ?actor .
    
    # This returns the labels
    ?actor <http://schema.org/name> ?actorName .
    
    # Keep only the last episode (no episode follows it)
    FILTER NOT EXISTS{ ?lastEpisode  wdt:P156 ?y .}
    
    # Keep only the first episode (it follows no episode)
    FILTER NOT EXISTS{ ?firstEpisode wdt:P155 ?x .}
    
    # Count in how many episodes the actor participated
    {   SELECT ?actor ?actorName (COUNT(DISTINCT ?episode) AS ?numEpisodesByActor) WHERE { 

            # Retrieve HIMYM episodes 
            wd:Q147235 wdt:P527{2} ?episode .

            # Retrieve cast members
            ?episode wdt:P161 ?actor .

            # This returns the labels
            ?actor <http://schema.org/name> ?actorName .
        }
        GROUP BY ?actor ?actorName
    }
}
ORDER BY DESC(?partecipationRate)
"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q200566'), ('actorName', 'Cobie Smulders'), ('partecipationRate', '69.711538461538462')]
[('actor', 'http://www.wikidata.org/entity/Q223455'), ('actorName', 'Josh Radnor'), ('partecipationRate', '69.711538461538462')]
[('actor', 'http://www.wikidata.org/entity/Q485310'), ('actorName', 'Neil Patrick Harris'), ('partecipationRate', '69.711538461538462')]
[('actor', 'http://www.wikidata.org/entity/Q202304'), ('actorName', 'Jason Segel'), ('partecipationRate', '69.711538461538462')]
[('actor', 'http://www.wikidata.org/entity/Q199927'), ('actorName', 'Alyson Hannigan'), ('partecipationRate', '68.75')]
[('actor', 'http://www.wikidata.org/entity/Q297128'), ('actorName', 'David Henrie'), ('partecipationRate', '23.076923076923077')]
[('actor', 'http://www.wikidata.org/entity/Q229914'), ('actorName', 'Lyndsy Fonseca'), ('partecipationRate', '23.076923076923077')]


7

## END