# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [WikiData](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained previously.

An exploratory workload starts with a usually vague, open-ended question and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-3cebe828f0-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings']) == 0:
            print("Empty")
            return 0

        for bindings in json_results['results']['bindings']:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(json_results['results']['bindings'])

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
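
`run_query` prints each binding and returns only the number of rows. When a result needs further processing in Python, it can help to flatten the SPARQL JSON response into plain dictionaries. The following is a minimal sketch, not part of the original setup: `bindings_to_rows` is a hypothetical helper, and the sample response is hand-built in the SPARQL JSON results layout from values that the seasons/episodes query prints later in this notebook (the `type` fields are assumed).

```python
def bindings_to_rows(json_results):
    """Flatten a SPARQL JSON SELECT response into a list of {variable: value} dicts."""
    rows = []
    for binding in json_results["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hand-built sample response, so the sketch runs without the endpoint.
sample = {
    "head": {"vars": ["himymEpisodes", "himymSeasons"]},
    "results": {"bindings": [
        {"himymEpisodes": {"type": "literal", "value": "208"},
         "himymSeasons": {"type": "literal", "value": "9"}},
    ]},
}

rows = bindings_to_rows(sample)
print(rows)
```

Note that all values come back as strings, so numeric comparisons need an explicit `int(...)` conversion.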

# Movie Workflow Series ("Tv series explorative search") 

## Workflow 4


Consider the following exploratory scenario:


> We are interested in the TV series "How I Met Your Mother" and want to investigate the main aspects related to the actors and directors involved in the production, learn the number of seasons, and check which episodes had the greatest success/impact.


## Useful URIs for the current workflow
The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P106`    | profession    | predicate | 
| `wdt:P279`    | subclass      | predicate |
| `wdt:P4969`    | derivative work      | predicate |
| `wd:Q147235` | How I met your mother        | node |
| `wd:Q23831` | The Office (US)        | node |



Also consider

```
wd:Q23831 ?p ?obj .
```

is the BGP to retrieve all **properties of The Office (US)**

The workload should


1. Return the number of seasons and episodes per season of the TV series.

2. Get the number of episodes in which the cast members played a role. Who are the most present actors?

3. Check which actor acted in the most films while working on "How I Met Your Mother" and which actor participated in the most films after the end of the TV series.

4. Compare HIMYM with the TV series "The Office (US)" in terms of number of seasons, episodes and cast members.

5. Return how many of the actors who are members of the cast of the TV series have a [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2.
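
The Bacon number in task 5 is a shortest-path distance in the co-appearance graph: Kevin Bacon has number 0, and anyone who shared a film with a person of number N has number at most N+1. A minimal sketch of that computation as a breadth-first search is shown below. The graph is a toy example and its actor names (`actor_a`, `actor_b`, ...) are purely hypothetical placeholders, not Wikidata data.

```python
from collections import deque


def bacon_numbers(co_appearances, source):
    """Breadth-first search: shortest co-appearance distance from `source`."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for colleague in co_appearances.get(person, ()):
            if colleague not in distance:
                distance[colleague] = distance[person] + 1
                queue.append(colleague)
    return distance


# Hypothetical toy graph (undirected, stored as adjacency sets).
toy_graph = {
    "kevin_bacon": {"actor_a"},
    "actor_a": {"kevin_bacon", "actor_b", "actor_c"},
    "actor_b": {"actor_a"},
    "actor_c": {"actor_a", "actor_d"},
    "actor_d": {"actor_c"},
}

numbers = bacon_numbers(toy_graph, "kevin_bacon")
with_number_two = sorted(name for name, n in numbers.items() if n == 2)
print(with_number_two)
```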

In [1]:
# start your workflow here

In [2]:
queryString = """
SELECT *
WHERE { 

wd:Q23831 ?p ?obj .

} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), ('obj', 'http://wikiba.se/ontology#Item')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P214'), ('obj', 'http://viaf.org/viaf/207954525')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P244'), ('obj', 'https://id.loc.gov/authorities/names/no2006017037')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P2581'), ('obj', 'http://babelnet.org/rdf/s02352929n')]
[('p', 'http://www.wikidata.org/prop/direct-normalized/P646'), ('obj', 'http://g.co/kg/m/08jgk1')]
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('obj', '201')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('obj', 'tv/the_office')]
[('p', 'http://www.wikidata.org/prop/direct/P1267'), ('obj', '199')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('obj', 'http://www.wikidata.org/entity/Q21188110')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('obj', 'http://www.wikidata.org/entity/Q459435')]
[('p', 'http://www.wikid

128

## Seasons and Episodes
This section contains the queries that answer the first question: 
_Return the number of seasons and episodes per season of the tv series_

### Number of seasons and episodes
First I need to find out how this information is stored, so I run a query that retrieves all properties of The Office (US). I can print them all because the query above returned only 128 results. For each property I also print the number of objects it connects to the node The Office (US) (wd:Q23831), to get a sense of the property's significance.

In [3]:
queryString = """
SELECT ?p ?pname COUNT(?o) AS ?objects
WHERE { 

wd:Q23831 ?p ?o.
?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname
ORDER BY DESC(?objects)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pname', 'cast member'), ('objects', '25')]
[('p', 'http://www.wikidata.org/prop/direct/P527'), ('pname', 'has part'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P57'), ('pname', 'director'), ('objects', '8')]
[('p', 'http://www.wikidata.org/prop/direct/P674'), ('pname', 'characters'), ('objects', '8')]
[('p', 'http://www.wikidata.org/prop/direct/P272'), ('pname', 'production company'), ('objects', '3')]
[('p', 'http://www.wikidata.org/prop/direct/P18'), ('pname', 'image'), ('objects', '2')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('pname', 'genre'), ('objects', '2')]
[('p', 'http://www.wikidata.org/prop/direct/P750'), ('pname', 'distributed by'), ('objects', '2')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pname', 'Rotten Tomatoes ID'), ('objects', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P1562'), ('pname', 'AllMovie title ID'), ('objects', '1')]
[('p', 'http://www.wikidata

71

I note some interesting properties that will also be useful later.

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P161`   | cast member   | predicate |
| `wdt:P57`     | director   | predicate |
| `wdt:P170`     | creator   | predicate |
| `wdt:P674`    | characters    | predicate | 
| `wdt:P1811`    | list of episodes      | predicate |
| `wdt:P1113`    | number of episodes      | predicate |
| `wdt:P2437`    | number of seasons      | predicate |
| `wdt:P582`    | end time     | predicate |
| `wdt:P580`    | start time     | predicate |

Before moving on, I also check the other direction, that is, the triples in which The Office (US) is the object.

In [4]:
queryString = """
SELECT ?p ?pname COUNT(?s) AS ?subjects
WHERE { 

?s ?p wd:Q23831.
?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname
ORDER BY DESC(?subjects)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pname', 'part of the series'), ('subjects', '226')]
[('p', 'http://www.wikidata.org/prop/direct/P1441'), ('pname', 'present in work'), ('subjects', '40')]
[('p', 'http://www.wikidata.org/prop/direct/P971'), ('pname', 'category combines topics'), ('subjects', '11')]
[('p', 'http://www.wikidata.org/prop/direct/P800'), ('pname', 'notable work'), ('subjects', '3')]
[('p', 'http://www.wikidata.org/prop/direct/P361'), ('pname', 'part of'), ('subjects', '2')]
[('p', 'http://www.wikidata.org/prop/direct/P4969'), ('pname', 'derivative work'), ('subjects', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P301'), ('pname', "category's main topic"), ('subjects', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P1889'), ('pname', 'different from'), ('subjects', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P144'), ('pname', 'based on'), ('subjects', '1')]


9

I did not find any new useful property; I already had "derivative work" (wdt:P4969).
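
Because the same outgoing and incoming property overviews are run for several nodes, the query text could be generated instead of copied. This is a minimal sketch under stated assumptions: `property_overview_query` is a hypothetical helper, and it uses the parenthesised `(COUNT(?x) AS ?n)` projection from the SPARQL 1.1 specification rather than the bare `COUNT(?o) AS ?objects` form used in the cells of this notebook, on the assumption that the endpoint accepts both.

```python
def property_overview_query(entity, direction="out"):
    """Build a query counting, per property, the nodes linked to `entity`.

    direction="out": `entity` is the subject; direction="in": `entity` is the object.
    """
    if direction == "out":
        pattern = f"{entity} ?p ?x ."
    elif direction == "in":
        pattern = f"?x ?p {entity} ."
    else:
        raise ValueError("direction must be 'out' or 'in'")
    return f"""
SELECT ?p ?pname (COUNT(?x) AS ?n)
WHERE {{
    {pattern}
    ?p <http://schema.org/name> ?pname .
}}
GROUP BY ?p ?pname
ORDER BY DESC(?n)
"""


incoming = property_overview_query("wd:Q23831", direction="in")
print(incoming)
```

The resulting string can be passed to `run_query` exactly like the hand-written `queryString` values.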

I now run the same query for How I Met Your Mother in order to check if I have the same properties.

In [5]:
queryString = """
SELECT ?p ?pname COUNT(?o) AS ?objects
WHERE { 

wd:Q147235 ?p ?o.
?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname
ORDER BY DESC(?objects)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pname', 'cast member'), ('objects', '480')]
[('p', 'http://www.wikidata.org/prop/direct/P674'), ('pname', 'characters'), ('objects', '109')]
[('p', 'http://www.wikidata.org/prop/direct/P1411'), ('pname', 'nominated for'), ('objects', '10')]
[('p', 'http://www.wikidata.org/prop/direct/P527'), ('pname', 'has part'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('pname', 'award received'), ('objects', '7')]
[('p', 'http://www.wikidata.org/prop/direct/P750'), ('pname', 'distributed by'), ('objects', '4')]
[('p', 'http://www.wikidata.org/prop/direct/P4073'), ('pname', 'Fandom wiki ID'), ('objects', '3')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('pname', 'genre'), ('objects', '3')]
[('p', 'http://www.wikidata.org/prop/direct/P4969'), ('pname', 'derivative work'), ('objects', '2')]
[('p', 'http://www.wikidata.org/prop/direct/P170'), ('pname', 'creator'), ('objects', '2')]
[('p', 'http://www.wiki

88

In this case there is data about the screenwriters, but there seems to be no "director" property (wdt:P57) for How I Met Your Mother. Let us check, just to be sure.

In [6]:
queryString = """
SELECT ?o
WHERE { 

wd:Q147235 wdt:P57 ?o.

} 
"""

print("Results")
run_query(queryString)

Results
Empty


0
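
A pure existence check like this one is also a natural fit for the `run_ask_query` helper defined in the setup. The sketch below only builds the ASK query and reads the boolean out of a response; `ask_has_property` and `ask_answer` are hypothetical helpers, and the sample response is hand-built in the SPARQL JSON results layout, with `False` chosen to mirror the "Empty" result above.

```python
def ask_has_property(entity, prop):
    """Build an ASK query: does `entity` have at least one value for `prop`?"""
    return f"ASK {{ {entity} {prop} ?o . }}"


def ask_answer(json_results):
    """Extract the boolean from a SPARQL JSON ASK response."""
    return bool(json_results["boolean"])


query = ask_has_property("wd:Q147235", "wdt:P57")
print(query)

# Hand-built response, so the sketch runs without the endpoint.
sample_response = {"head": {}, "boolean": False}
print(ask_answer(sample_response))
```

In the notebook, the dictionary returned by `run_ask_query(query)` would take the place of `sample_response`.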

I will investigate this further later on, should I need information about HIMYM's directors. Going back to the question I am trying to answer, I can use the properties "number of episodes" (wdt:P1113) and "number of seasons" (wdt:P2437) to retrieve this information.

In [7]:
queryString = """
SELECT ?himymEpisodes ?himymSeasons ?offEpisodes ?offSeasons 
WHERE { 

wd:Q147235 wdt:P1113 ?himymEpisodes;
           wdt:P2437 ?himymSeasons.

wd:Q23831 wdt:P1113 ?offEpisodes;
           wdt:P2437 ?offSeasons.

} 
"""

print("Results")
run_query(queryString)

Results
[('himymEpisodes', '208'), ('himymSeasons', '9'), ('offEpisodes', '201'), ('offSeasons', '9')]


1

Surprisingly, both series have the same number of seasons (9) and almost the same number of episodes (208 for HIMYM, 201 for The Office (US)).
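
The comparison can be made explicit with a little arithmetic on the returned row. This is only an illustrative sketch: the dictionary is copied by hand from the printed result, and the values are converted because the endpoint returns them as strings.

```python
# Row copied from the printed result of the query above.
row = {"himymEpisodes": "208", "himymSeasons": "9",
       "offEpisodes": "201", "offSeasons": "9"}

episode_gap = int(row["himymEpisodes"]) - int(row["offEpisodes"])
himym_avg = int(row["himymEpisodes"]) / int(row["himymSeasons"])
office_avg = int(row["offEpisodes"]) / int(row["offSeasons"])

print(f"gap: {episode_gap} episodes")
print(f"average per season: HIMYM {himym_avg:.1f}, The Office (US) {office_avg:.1f}")
```

The averages are close, which says nothing yet about how evenly the episodes are spread over the seasons.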

### Number of episodes for each season
I now look at how many episodes each season has. Going back to the results of the property overview queries, I notice that the property "has part" (wdt:P527) has exactly 9 objects for both HIMYM and The Office (US), so I select them to see whether this property lists the seasons. Since it is just a check, I only look at the objects whose subject is The Office (US).

In [8]:
queryString = """
SELECT ?o ?obj ?i ?instanceOf
WHERE { 

wd:Q23831 wdt:P527 ?o.
?o wdt:P31 ?i.

?o <http://schema.org/name> ?obj.
?i <http://schema.org/name> ?instanceOf.

} 
"""

print("Results")
run_query(queryString)

Results
[('o', 'http://www.wikidata.org/entity/Q3468906'), ('obj', 'The Office, season 4'), ('i', 'http://www.wikidata.org/entity/Q3464665'), ('instanceOf', 'television series season')]
[('o', 'http://www.wikidata.org/entity/Q3730261'), ('obj', 'The Office, season 6'), ('i', 'http://www.wikidata.org/entity/Q3464665'), ('instanceOf', 'television series season')]
[('o', 'http://www.wikidata.org/entity/Q3468601'), ('obj', 'The Office, season 2'), ('i', 'http://www.wikidata.org/entity/Q3464665'), ('instanceOf', 'television series season')]
[('o', 'http://www.wikidata.org/entity/Q3730253'), ('obj', 'The Office, season 9'), ('i', 'http://www.wikidata.org/entity/Q3464665'), ('instanceOf', 'television series season')]
[('o', 'http://www.wikidata.org/entity/Q3465812'), ('obj', 'The Office, season 1'), ('i', 'http://www.wikidata.org/entity/Q3464665'), ('instanceOf', 'television series season')]
[('o', 'http://www.wikidata.org/entity/Q3730255'), ('obj', 'The Office, season 8'), ('i', 'http://www.

9

My assumption was right: the property "has part" (wdt:P527) retrieves the series' seasons. I now look at each season's properties.

In [9]:
queryString = """
SELECT ?p ?pname COUNT(?o) AS ?objects
WHERE { 

wd:Q23831 wdt:P527 ?season.
?season ?p ?o.

?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname
ORDER BY DESC(?objects)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P527'), ('pname', 'has part'), ('objects', '201')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pname', 'Rotten Tomatoes ID'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P1712'), ('pname', 'Metacritic ID'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P364'), ('pname', 'original language of film or TV show'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('pname', 'number of episodes'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pname', 'part of the series'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P2638'), ('pname', 'TV.com ID'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P2704'), ('pname', 'EIDR content ID'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P2529'), ('pname', 'ČSFD film I

17

To retrieve the number of episodes in each season, I can again use the property "number of episodes" (wdt:P1113). Another interesting finding is that the property "has part" (wdt:P527) of a season also gives access to each episode. Before moving on, I check whether I get the same results for HIMYM.

In [10]:
queryString = """
SELECT ?p ?pname COUNT(?o) AS ?objects
WHERE { 

wd:Q147235 wdt:P527 ?season.
?season ?p ?o.

?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname
ORDER BY DESC(?objects)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P527'), ('pname', 'has part'), ('objects', '208')]
[('p', 'http://www.wikidata.org/prop/direct/P437'), ('pname', 'distribution format'), ('objects', '16')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pname', 'Rotten Tomatoes ID'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P1712'), ('pname', 'Metacritic ID'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P3302'), ('pname', 'Open Media Database film ID'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P582'), ('pname', 'end time'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P364'), ('pname', 'original language of film or TV show'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('pname', 'number of episodes'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pname', '

21

I have the same results for HIMYM. Now I look for properties in the other direction, for both The Office (US) (wd:Q23831) and HIMYM (wd:Q147235).

In [11]:
queryString = """
SELECT ?p ?pname COUNT(?s) AS ?subjects
WHERE { 

wd:Q23831 wdt:P527 ?season.
?s ?p ?season.

?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname
ORDER BY DESC(?subjects)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P4908'), ('pname', 'season'), ('subjects', '217')]
[('p', 'http://www.wikidata.org/prop/direct/P527'), ('pname', 'has part'), ('subjects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P301'), ('pname', "category's main topic"), ('subjects', '1')]


3

In [12]:
queryString = """
SELECT ?p ?pname COUNT(?s) AS ?subjects
WHERE { 

wd:Q147235 wdt:P527 ?season.
?s ?p ?season.

?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname
ORDER BY DESC(?subjects)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P4908'), ('pname', 'season'), ('subjects', '211')]
[('p', 'http://www.wikidata.org/prop/direct/P527'), ('pname', 'has part'), ('subjects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P361'), ('pname', 'part of'), ('subjects', '3')]


3

I did not find any other interesting property. I now return the number of episodes per season as asked.

In [13]:
queryString = """
SELECT ?season ?episodes
WHERE { 

?series wdt:P527 ?s.
?s wdt:P1113 ?episodes.

?s <http://schema.org/name> ?season.
FILTER(?series = wd:Q147235 || ?series = wd:Q23831)

} 
ORDER BY ?season
"""

print("Results")
run_query(queryString)

Results
[('season', 'How I Met Your Mother, season 1'), ('episodes', '22')]
[('season', 'How I Met Your Mother, season 9'), ('episodes', '24')]
[('season', 'How I Met Your Mother, season 7'), ('episodes', '24')]
[('season', 'How I Met Your Mother, season 6'), ('episodes', '24')]
[('season', 'How I Met Your Mother, season 8'), ('episodes', '24')]
[('season', 'How I Met Your Mother, season 4'), ('episodes', '24')]
[('season', 'How I Met Your Mother, season 2'), ('episodes', '22')]
[('season', 'How I Met Your Mother, season 5'), ('episodes', '24')]
[('season', 'How I Met Your Mother, season 3'), ('episodes', '20')]
[('season', 'The Office, season 1'), ('episodes', '6')]
[('season', 'The Office, season 2'), ('episodes', '22')]
[('season', 'The Office, season 3'), ('episodes', '25')]
[('season', 'The Office, season 4'), ('episodes', '19')]
[('season', 'The Office, season 5'), ('episodes', '28')]
[('season', 'The Office, season 6'), ('episodes', '26')]
[('season', 'The Office, season 7'), ('

18

From these results I notice that, even though both series have the same number of seasons and almost the same number of episodes, the distribution of episodes across seasons is very different. HIMYM has 20 to 24 episodes per season, while the number of episodes per season in The Office (US) varies much more: the first season has 6 episodes and the fifth has 28.
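
These observations can be cross-checked in Python. The sketch below, which is not part of the original workflow, parses the printed rows, verifies that the HIMYM per-season counts add up to the series-level "number of episodes", and computes the range per series. `season_table` is a hypothetical helper, and The Office (US) is included only up to season 6 because the printed output is truncated after that.

```python
# Rows copied from the printed output above.
printed_rows = [
    ("How I Met Your Mother, season 1", "22"),
    ("How I Met Your Mother, season 9", "24"),
    ("How I Met Your Mother, season 7", "24"),
    ("How I Met Your Mother, season 6", "24"),
    ("How I Met Your Mother, season 8", "24"),
    ("How I Met Your Mother, season 4", "24"),
    ("How I Met Your Mother, season 2", "22"),
    ("How I Met Your Mother, season 5", "24"),
    ("How I Met Your Mother, season 3", "20"),
    ("The Office, season 1", "6"),
    ("The Office, season 2", "22"),
    ("The Office, season 3", "25"),
    ("The Office, season 4", "19"),
    ("The Office, season 5", "28"),
    ("The Office, season 6", "26"),
]


def season_table(rows):
    """Group rows as {series: {season number: episode count}}."""
    table = {}
    for label, episodes in rows:
        series, season = label.rsplit(", season ", 1)
        table.setdefault(series, {})[int(season)] = int(episodes)
    return table


table = season_table(printed_rows)
himym = table["How I Met Your Mother"]
office_visible = table["The Office"]  # seasons 1 to 6 only

himym_total = sum(himym.values())  # to compare with wdt:P1113 on the series
himym_range = (min(himym.values()), max(himym.values()))
office_range = (min(office_visible.values()), max(office_visible.values()))
print(himym_total, himym_range, office_range)
```

Keying the table by the parsed season number also makes it easy to print the seasons in numeric order, which the label-based `ORDER BY ?season` did not achieve for HIMYM in the output above.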

### Single Episodes
Just out of curiosity, I look at the episodes and their properties to see whether there is something useful. First I check what the property "has part" (wdt:P527) of a season stores.

In [14]:
queryString = """
SELECT DISTINCT ?e ?episode ?i ?instanceOf
WHERE { 

?series wdt:P527 ?s.
?s wdt:P527 ?e.
?e wdt:P31 ?i.

?e <http://schema.org/name> ?episode.
?i <http://schema.org/name> ?instanceOf.
FILTER(?series = wd:Q147235 || ?series = wd:Q23831)

} 
LIMIT 15
"""

print("Results")
run_query(queryString)

Results
[('e', 'http://www.wikidata.org/entity/Q50379836'), ('episode', 'Classy Christmas (part 1)'), ('i', 'http://www.wikidata.org/entity/Q21191270'), ('instanceOf', 'television series episode')]
[('e', 'http://www.wikidata.org/entity/Q50379837'), ('episode', 'Classy Christmas (part 2)'), ('i', 'http://www.wikidata.org/entity/Q21191270'), ('instanceOf', 'television series episode')]
[('e', 'http://www.wikidata.org/entity/Q5099551'), ('episode', 'China'), ('i', 'http://www.wikidata.org/entity/Q21191270'), ('instanceOf', 'television series episode')]
[('e', 'http://www.wikidata.org/entity/Q5178024'), ('episode', 'Couples Discount'), ('i', 'http://www.wikidata.org/entity/Q21191270'), ('instanceOf', 'television series episode')]
[('e', 'http://www.wikidata.org/entity/Q6927074'), ('episode', 'Moving On'), ('i', 'http://www.wikidata.org/entity/Q21191270'), ('instanceOf', 'television series episode')]
[('e', 'http://www.wikidata.org/entity/Q7880294'), ('episode', 'Ultimatum'), ('i', 'http:/

15

The query confirms that this path returns the individual episodes. I now look at the properties each episode has.

In [15]:
queryString = """
SELECT ?p ?pname COUNT(?o) AS ?objects
WHERE { 

?series wdt:P527 ?s.
?s wdt:P527 ?e.
?e ?p ?o.

?p <http://schema.org/name> ?pname.

FILTER(?series = wd:Q147235 || ?series = wd:Q23831)

} 
GROUP BY ?p ?pname
ORDER BY DESC(?objects)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pname', 'cast member'), ('objects', '1338')]
[('p', 'http://www.wikidata.org/prop/direct/P58'), ('pname', 'screenwriter'), ('objects', '529')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of'), ('objects', '409')]
[('p', 'http://www.wikidata.org/prop/direct/P364'), ('pname', 'original language of film or TV show'), ('objects', '409')]
[('p', 'http://www.wikidata.org/prop/direct/P4908'), ('pname', 'season'), ('objects', '409')]
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pname', 'part of the series'), ('objects', '409')]
[('p', 'http://www.wikidata.org/prop/direct/P1476'), ('pname', 'title'), ('objects', '409')]
[('p', 'http://www.wikidata.org/prop/direct/P155'), ('pname', 'follows'), ('objects', '408')]
[('p', 'http://www.wikidata.org/prop/direct/P156'), ('pname', 'followed by'), ('objects', '407')]
[('p', 'http://www.wikidata.org/prop/direct/P2638'), ('pname', 'TV.com ID'), ('objects', '4

49

These properties could be useful later on:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P161`   | cast member   | predicate |
| `wdt:P57`     | director  | predicate |
| `wdt:P58`    | screenwriter    | predicate | 
| `wdt:P4908`    | season    | predicate | 

## Cast Members
This section contains the queries that answer the second question:
_Get the number of episodes in which the cast members played a role. Who are the most present actors?_

### Fast-forward approach
I recall that each episode has a property "cast member" (wdt:P161), so I could use it to count in how many episodes a cast member played a role. First I check what information this property stores.

_Note: instead of using two separate triple patterns to go from the series to its seasons and then from each season to its episodes, I use a sequence property path (`wdt:P527/wdt:P527`)._
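
To make the note concrete, the sketch below builds both formulations as strings. `sequence_path` and the two pattern builders are hypothetical helpers introduced only for illustration. The path form is shorter but does not bind the intermediate season, so it is suitable only when the season itself is not needed in the result.

```python
def sequence_path(*predicates):
    """Join predicates into a SPARQL sequence property path, e.g. a/b/c."""
    return "/".join(predicates)


def episodes_two_patterns(series):
    """Series -> season -> episode, with the season bound to ?s."""
    return f"{series} wdt:P527 ?s .\n?s wdt:P527 ?e ."


def episodes_path(series):
    """Series -> episode in one pattern; the season is not bound."""
    return f"{series} {sequence_path('wdt:P527', 'wdt:P527')} ?e ."


print(episodes_two_patterns("wd:Q147235"))
print(episodes_path("wd:Q147235"))
# Paths can be extended, e.g. from the series straight to the cast members.
print(sequence_path("wdt:P527", "wdt:P527", "wdt:P161"))
```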

In [16]:
queryString = """
SELECT DISTINCT ?c ?cname ?i ?instanceOf
WHERE { 

?series wdt:P527/wdt:P527 ?e.
?e wdt:P161 ?c.
?c wdt:P31 ?i.

?c <http://schema.org/name> ?cname.
?i <http://schema.org/name> ?instanceOf.

FILTER(?series = wd:Q147235 || ?series = wd:Q23831)

} 
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('c', 'http://www.wikidata.org/entity/Q465556'), ('cname', 'Annie Ilonzeh'), ('i', 'http://www.wikidata.org/entity/Q5'), ('instanceOf', 'human')]
[('c', 'http://www.wikidata.org/entity/Q269891'), ('cname', 'Julianna Guill'), ('i', 'http://www.wikidata.org/entity/Q5'), ('instanceOf', 'human')]
[('c', 'http://www.wikidata.org/entity/Q446031'), ('cname', 'Nikki Griffin'), ('i', 'http://www.wikidata.org/entity/Q5'), ('instanceOf', 'human')]
[('c', 'http://www.wikidata.org/entity/Q312705'), ('cname', 'John Cho'), ('i', 'http://www.wikidata.org/entity/Q5'), ('instanceOf', 'human')]
[('c', 'http://www.wikidata.org/entity/Q516659'), ('cname', 'Virginia Williams'), ('i', 'http://www.wikidata.org/entity/Q5'), ('instanceOf', 'human')]
[('c', 'http://www.wikidata.org/entity/Q522856'), ('cname', 'Kate Micucci'), ('i', 'http://www.wikidata.org/entity/Q5'), ('instanceOf', 'human')]
[('c', 'http://www.wikidata.org/entity/Q200566'), ('cname', 'Cobie Smulders'), ('i', 'http://www.wikidata.org/e

25

The property "instance of" (wdt:P31) only tells me that the objects are humans, so I try the property "profession" (wdt:P106) instead.

In [17]:
queryString = """
SELECT DISTINCT ?c ?cname ?p ?profession
WHERE { 

?series wdt:P527/wdt:P527/wdt:P161 ?c.
?c wdt:P106 ?p.

?c <http://schema.org/name> ?cname.
?p <http://schema.org/name> ?profession.

FILTER(?series = wd:Q147235 || ?series = wd:Q23831)

} 
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('c', 'http://www.wikidata.org/entity/Q311271'), ('cname', 'John Lithgow'), ('p', 'http://www.wikidata.org/entity/Q9648008'), ('profession', 'banjoist')]
[('c', 'http://www.wikidata.org/entity/Q328790'), ('cname', 'Ed Helms'), ('p', 'http://www.wikidata.org/entity/Q9648008'), ('profession', 'banjoist')]
[('c', 'http://www.wikidata.org/entity/Q3306283'), ('cname', 'Dwight Hicks'), ('p', 'http://www.wikidata.org/entity/Q19841381'), ('profession', 'Canadian football player')]
[('c', 'http://www.wikidata.org/entity/Q465556'), ('cname', 'Annie Ilonzeh'), ('p', 'http://www.wikidata.org/entity/Q10798782'), ('profession', 'television actor')]
[('c', 'http://www.wikidata.org/entity/Q269891'), ('cname', 'Julianna Guill'), ('p', 'http://www.wikidata.org/entity/Q10798782'), ('profession', 'television actor')]
[('c', 'http://www.wikidata.org/entity/Q446031'), ('cname', 'Nikki Griffin'), ('p', 'http://www.wikidata.org/entity/Q10798782'), ('profession', 'television actor')]
[('c', 'http://ww

25

The property "cast member" (wdt:P161) returns the information I am looking for, so I will use it to return the number of episodes in which each cast member played a role. In addition, I discovered that not all cast members are listed as television actors (wd:Q10798782); I will use this information later on.

I now return the 10 most present actors for HIMYM (wd:Q147235) and The Office (US) (wd:Q23831).
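
Since one person can have several professions, the rows are easier to read when grouped per person. The following is a minimal sketch using only the six rows visible in the truncated output above; `occupations_by_person` is a hypothetical helper. Because the query was limited to 25 rows, a missing "television actor" entry in this sample does not prove that the person lacks that profession in the graph.

```python
from collections import defaultdict

# (name, profession) pairs copied from the visible part of the output above.
sample_rows = [
    ("John Lithgow", "banjoist"),
    ("Ed Helms", "banjoist"),
    ("Dwight Hicks", "Canadian football player"),
    ("Annie Ilonzeh", "television actor"),
    ("Julianna Guill", "television actor"),
    ("Nikki Griffin", "television actor"),
]


def occupations_by_person(rows):
    """Group professions as {name: set of professions}."""
    grouped = defaultdict(set)
    for name, occupation in rows:
        grouped[name].add(occupation)
    return dict(grouped)


grouped = occupations_by_person(sample_rows)
# People without a "television actor" row in this partial sample.
not_tv_actor_in_sample = sorted(
    name for name, occupations in grouped.items()
    if "television actor" not in occupations
)
print(not_tv_actor_in_sample)
```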

In [18]:
queryString = """
SELECT DISTINCT ?cname COUNT(DISTINCT ?e) AS ?episodes
WHERE { 

wd:Q147235 wdt:P527/wdt:P527 ?e.
?e wdt:P161 ?c.

?c <http://schema.org/name> ?cname.

} 
GROUP BY ?cname
ORDER BY DESC(?episodes)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('cname', 'Josh Radnor'), ('episodes', '145')]
[('cname', 'Jason Segel'), ('episodes', '145')]
[('cname', 'Cobie Smulders'), ('episodes', '145')]
[('cname', 'Neil Patrick Harris'), ('episodes', '145')]
[('cname', 'Alyson Hannigan'), ('episodes', '143')]
[('cname', 'Bob Saget'), ('episodes', '142')]
[('cname', 'Lyndsy Fonseca'), ('episodes', '48')]
[('cname', 'David Henrie'), ('episodes', '48')]
[('cname', 'Charlene Amoia'), ('episodes', '17')]
[('cname', 'Jennifer Morrison'), ('episodes', '12')]


10

In [19]:
queryString = """
SELECT DISTINCT ?cname COUNT(DISTINCT ?e) AS ?episodes
WHERE { 

wd:Q23831 wdt:P527/wdt:P527 ?e.
?e wdt:P161 ?c.

?c <http://schema.org/name> ?cname.

} 
GROUP BY ?cname
ORDER BY DESC(?episodes)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('cname', 'Ellie Kemper'), ('episodes', '1')]
[('cname', 'Craig Robinson'), ('episodes', '1')]
[('cname', 'Catherine Tate'), ('episodes', '1')]
[('cname', 'Jenna Fischer'), ('episodes', '1')]
[('cname', 'Rainn Wilson'), ('episodes', '1')]
[('cname', 'Ed Helms'), ('episodes', '1')]
[('cname', 'John Krasinski'), ('episodes', '1')]


7

These results highlight some crucial aspects. First, they are hard to read because I display only the actor's name, so someone who does not know the series well cannot tell who plays whom. A solution could be to display both the actor and the character name.

As far as HIMYM is concerned, the results seem plausible, since the most present actors played a role in 145 episodes out of 208. The results for The Office (US), on the other hand, are clearly incorrect: every listed actor appears in only one episode. I will therefore take a step back and look more closely at the information about the individual episodes. The results above show that the data is stored differently for the two TV series, so from now on I will look at the two series separately.
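
The plausibility judgement above can be written down as a simple ratio: the episode count of the most present cast member divided by the total number of episodes. The sketch below is illustrative only. `top_coverage` is a hypothetical helper, the counts are copied from the top rows of the two outputs, and the 0.5 cut-off is an arbitrary assumption, not a value taken from the data.

```python
def top_coverage(episode_counts, total_episodes):
    """Share of all episodes in which the most present cast member appears."""
    return max(episode_counts.values()) / total_episodes


# Counts and totals copied from the outputs above (top rows only).
himym_counts = {"Josh Radnor": 145, "Jason Segel": 145, "Alyson Hannigan": 143}
office_counts = {"Ellie Kemper": 1, "Craig Robinson": 1, "Rainn Wilson": 1}

MIN_PLAUSIBLE_COVERAGE = 0.5  # hypothetical cut-off, for illustration only

himym_cov = top_coverage(himym_counts, 208)
office_cov = top_coverage(office_counts, 201)
for name, cov in (("HIMYM", himym_cov), ("The Office (US)", office_cov)):
    verdict = "plausible" if cov >= MIN_PLAUSIBLE_COVERAGE else "suspicious"
    print(f"{name}: {cov:.1%} -> {verdict}")
```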

### Retrieving the number of episodes each cast member appears in (The Office (US))

I recall there are 208 episodes for HIMYM and 201 for The Office (US). I will check the seasons' and the episodes' properties again. Since the results seemed correct for HIMYM, I look at these properties for The Office (US) only.

In [20]:
queryString = """
SELECT ?p ?pname COUNT(?o) AS ?objects
WHERE { 

wd:Q23831 wdt:P527 ?s.
?s ?p ?o.

?p <http://schema.org/name> ?pname.



} 
GROUP BY ?p ?pname
ORDER BY DESC(?objects)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P527'), ('pname', 'has part'), ('objects', '201')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pname', 'Rotten Tomatoes ID'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P1712'), ('pname', 'Metacritic ID'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P364'), ('pname', 'original language of film or TV show'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('pname', 'number of episodes'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pname', 'part of the series'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P2638'), ('pname', 'TV.com ID'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P2704'), ('pname', 'EIDR content ID'), ('objects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P2529'), ('pname', 'ČSFD film I

17

Cast member information is not stored as a property of the seasons, so I look more closely at the episodes' properties.

In [21]:
queryString = """
SELECT ?p ?pname COUNT(?o) AS ?objects
WHERE { 

wd:Q23831 wdt:P527/wdt:P527 ?e.
?e ?p ?o.

?p <http://schema.org/name> ?pname.



} 
GROUP BY ?p ?pname
ORDER BY DESC(?objects)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P495'), ('pname', 'country of origin'), ('objects', '203')]
[('p', 'http://www.wikidata.org/prop/direct/P155'), ('pname', 'follows'), ('objects', '201')]
[('p', 'http://www.wikidata.org/prop/direct/P2364'), ('pname', 'production code'), ('objects', '201')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of'), ('objects', '201')]
[('p', 'http://www.wikidata.org/prop/direct/P364'), ('pname', 'original language of film or TV show'), ('objects', '201')]
[('p', 'http://www.wikidata.org/prop/direct/P4908'), ('pname', 'season'), ('objects', '201')]
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pname', 'part of the series'), ('objects', '201')]
[('p', 'http://www.wikidata.org/prop/direct/P1476'), ('pname', 'title'), ('objects', '201')]
[('p', 'http://www.wikidata.org/prop/direct/P156'), ('pname', 'followed by'), ('objects', '200')]
[('p', 'http://www.wikidata.org/prop/direct/P58'), ('pname', 'screenwriter'), ('o

41

As expected, differently from HIMYM, the "cast member" property has only 7 objects, so I cannot derive from it the number of episodes each cast member played a role in. I take a further step back and look at the properties "cast member" (wdt:P161) and "characters" (wdt:P674) of the node The Office (US).

In [22]:
queryString = """
SELECT ?c ?cname
WHERE { 

wd:Q23831 wdt:P674 ?c.

?c <http://schema.org/name> ?cname.

} 
"""

print("Results")
run_query(queryString)

Results
[('c', 'http://www.wikidata.org/entity/Q1967229'), ('cname', 'Ryan Howard')]
[('c', 'http://www.wikidata.org/entity/Q2013030'), ('cname', 'Jim Halpert')]
[('c', 'http://www.wikidata.org/entity/Q2027359'), ('cname', 'Pam Halpert')]
[('c', 'http://www.wikidata.org/entity/Q2346771'), ('cname', 'Michael Scott')]
[('c', 'http://www.wikidata.org/entity/Q4169030'), ('cname', 'Dwight Schrute')]
[('c', 'http://www.wikidata.org/entity/Q4760429'), ('cname', 'Andy Bernard')]
[('c', 'http://www.wikidata.org/entity/Q6149368'), ('cname', 'Jan Levinson')]
[('c', 'http://www.wikidata.org/entity/Q7342694'), ('cname', 'Robert California')]


8

I look for all properties that have these characters as objects.

In [23]:
queryString = """
SELECT ?p ?pname COUNT(?s) AS ?subjects
WHERE { 

wd:Q23831 wdt:P674 ?c.
?s ?p ?c.

?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname
ORDER BY DESC(?subjects)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P674'), ('pname', 'characters'), ('subjects', '8')]
[('p', 'http://www.wikidata.org/prop/direct/P26'), ('pname', 'spouse'), ('subjects', '2')]
[('p', 'http://www.wikidata.org/prop/direct/P451'), ('pname', 'unmarried partner'), ('subjects', '2')]
[('p', 'http://www.wikidata.org/prop/direct/P25'), ('pname', 'mother'), ('subjects', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P22'), ('pname', 'father'), ('subjects', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P1038'), ('pname', 'relative'), ('subjects', '1')]


6

I found no interesting properties, so I look in the other direction.

In [24]:
queryString = """
SELECT ?p ?pname COUNT(?o) AS ?objects
WHERE { 

wd:Q23831 wdt:P674 ?c.
?c ?p ?o.

?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname
ORDER BY DESC(?objects)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P735'), ('pname', 'given name'), ('objects', '11')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of'), ('objects', '10')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation'), ('objects', '8')]
[('p', 'http://www.wikidata.org/prop/direct/P1441'), ('pname', 'present in work'), ('objects', '8')]
[('p', 'http://www.wikidata.org/prop/direct/P21'), ('pname', 'sex or gender'), ('objects', '8')]
[('p', 'http://www.wikidata.org/prop/direct/P175'), ('pname', 'performer'), ('objects', '8')]
[('p', 'http://www.wikidata.org/prop/direct/P646'), ('pname', 'Freebase ID'), ('objects', '8')]
[('p', 'http://www.wikidata.org/prop/direct/P170'), ('pname', 'creator'), ('objects', '5')]
[('p', 'http://www.wikidata.org/prop/direct/P40'), ('pname', 'child'), ('objects', '4')]
[('p', 'http://www.wikidata.org/prop/direct/P26'), ('pname', 'spouse'), ('objects', '4')]
[('p', 'http://www.wikidata.org/prop/direct/P5

27

Again, no information about the presence in an episode is provided by such properties.

I now look at the cast members.

In [25]:
queryString = """
SELECT ?c ?cname 
WHERE { 

wd:Q23831 wdt:P161 ?c.

?c <http://schema.org/name> ?cname.



} 
"""

print("Results")
run_query(queryString)

Results
[('c', 'http://www.wikidata.org/entity/Q1050211'), ('cname', 'Leslie David Baker')]
[('c', 'http://www.wikidata.org/entity/Q1139248'), ('cname', 'Oscar Nunez')]
[('c', 'http://www.wikidata.org/entity/Q216221'), ('cname', 'Steve Carell')]
[('c', 'http://www.wikidata.org/entity/Q2238008'), ('cname', 'Creed Bratton')]
[('c', 'http://www.wikidata.org/entity/Q231203'), ('cname', 'Amy Ryan')]
[('c', 'http://www.wikidata.org/entity/Q238877'), ('cname', 'Jenna Fischer')]
[('c', 'http://www.wikidata.org/entity/Q254766'), ('cname', 'Catherine Tate')]
[('c', 'http://www.wikidata.org/entity/Q2669971'), ('cname', 'Angela Kinsey')]
[('c', 'http://www.wikidata.org/entity/Q2671438'), ('cname', 'Paul Lieberstein')]
[('c', 'http://www.wikidata.org/entity/Q269901'), ('cname', 'Melora Hardin')]
[('c', 'http://www.wikidata.org/entity/Q2924850'), ('cname', 'Brian Baumgartner')]
[('c', 'http://www.wikidata.org/entity/Q296928'), ('cname', 'James Spader')]
[('c', 'http://www.wikidata.org/entity/Q302820

25

I retrieve all properties that have a cast member as object.

In [26]:
queryString = """
SELECT ?p ?pname COUNT(?s) AS ?subjects
WHERE { 

wd:Q23831 wdt:P161 ?c.

?s ?p ?c.

?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname
ORDER BY DESC(?subjects)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pname', 'cast member'), ('subjects', '526')]
[('p', 'http://www.wikidata.org/prop/direct/P58'), ('pname', 'screenwriter'), ('subjects', '67')]
[('p', 'http://www.wikidata.org/prop/direct/P175'), ('pname', 'performer'), ('subjects', '48')]
[('p', 'http://www.wikidata.org/prop/direct/P57'), ('pname', 'director'), ('subjects', '42')]
[('p', 'http://www.wikidata.org/prop/direct/P725'), ('pname', 'voice actor'), ('subjects', '35')]
[('p', 'http://www.wikidata.org/prop/direct/P1346'), ('pname', 'winner'), ('subjects', '28')]
[('p', 'http://www.wikidata.org/prop/direct/P170'), ('pname', 'creator'), ('subjects', '13')]
[('p', 'http://www.wikidata.org/prop/direct/P162'), ('pname', 'producer'), ('subjects', '11')]
[('p', 'http://www.wikidata.org/prop/direct/P26'), ('pname', 'spouse'), ('subjects', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P301'), ('pname', "category's main topic"), ('subjects', '6')]
[('p', 'http://www.wikidat

26

I take a closer look at the "cast member" property again. Let us see which entities it links to these actors.

In [27]:
queryString = """
SELECT ?s ?sname COUNT(DISTINCT ?c) AS ?members
WHERE { 

wd:Q23831 wdt:P161 ?c.

?s wdt:P161 ?c;
   <http://schema.org/name> ?sname.

} 
GROUP BY ?s ?sname
ORDER BY DESC(?members)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('s', 'http://www.wikidata.org/entity/Q23831'), ('sname', 'The Office'), ('members', '25')]
[('s', 'http://www.wikidata.org/entity/Q4760324'), ('sname', "Andy's Ancestry"), ('members', '7')]
[('s', 'http://www.wikidata.org/entity/Q1139794'), ('sname', 'The 40-Year-Old Virgin'), ('members', '4')]
[('s', 'http://www.wikidata.org/entity/Q368674'), ('sname', 'License to Wed'), ('members', '4')]
[('s', 'http://www.wikidata.org/entity/Q222800'), ('sname', 'Knocked Up'), ('members', '3')]
[('s', 'http://www.wikidata.org/entity/Q2060077'), ('sname', 'Walk Hard: The Dewey Cox Story'), ('members', '3')]
[('s', 'http://www.wikidata.org/entity/Q844059'), ('sname', 'New Girl'), ('members', '3')]
[('s', 'http://www.wikidata.org/entity/Q2457512'), ('sname', 'The Goods: Live Hard, Sell Hard'), ('members', '3')]
[('s', 'http://www.wikidata.org/entity/Q476726'), ('sname', 'Night at the Museum: Battle of the Smithsonian'), ('members', '3')]
[('s', 'http://www.wikidata.org/entity/Q117396'), ('sna

10

The "cast member" property links the actors to the other productions in which they played a role. I cannot find information about the episodes in which each actor appeared in The Office (US).

Running this query, I discovered that 3 cast members of The Office (US) also acted in New Girl, so I return them just to satisfy my curiosity and see who they are.

In [28]:
queryString = """
SELECT ?c ?cname 
WHERE { 

wd:Q23831 wdt:P161 ?c.

wd:Q844059 wdt:P161 ?c.
   
?c <http://schema.org/name> ?cname.

} 
"""

print("Results")
run_query(queryString)



Results
[('c', 'http://www.wikidata.org/entity/Q2669971'), ('cname', 'Angela Kinsey')]
[('c', 'http://www.wikidata.org/entity/Q3028200'), ('cname', 'Kate Flannery')]
[('c', 'http://www.wikidata.org/entity/Q512818'), ('cname', 'Clark Duke')]


3

### Retrieving the role information
I look for information about the character each cast member played. I start from The Office (US) and look for any property that could link a character and a cast member, in both directions.

In [29]:
queryString = """
SELECT ?p ?pname COUNT(?actor) AS ?objects
WHERE { 

wd:Q23831 wdt:P674 ?character;
          wdt:P161 ?actor.
        
?character ?p ?actor.

?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P170'), ('pname', 'creator'), ('objects', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P175'), ('pname', 'performer'), ('objects', '8')]


2

In [30]:
queryString = """
SELECT ?p ?pname COUNT(?character) AS ?objects
WHERE { 

wd:Q23831 wdt:P674 ?character;
          wdt:P161 ?actor.
        
?actor ?p ?character.

?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname
"""

print("Results")
run_query(queryString)

Results
Empty


0

I discovered there is a property called "performer" (wdt:P175) that links a character with the actor who plays that role. Therefore I return all characters and who performs them (if data is available).

In [31]:
queryString = """
SELECT ?character ?performedBy 
WHERE { 

wd:Q23831 wdt:P674 ?c;
          wdt:P161 ?actor.
        
?c wdt:P175 ?actor.

?c <http://schema.org/name> ?character.
?actor <http://schema.org/name> ?performedBy.

} 
"""

print("Results")
run_query(queryString)

Results
[('character', 'Ryan Howard'), ('performedBy', 'B. J. Novak')]
[('character', 'Jim Halpert'), ('performedBy', 'John Krasinski')]
[('character', 'Pam Halpert'), ('performedBy', 'Jenna Fischer')]
[('character', 'Michael Scott'), ('performedBy', 'Steve Carell')]
[('character', 'Dwight Schrute'), ('performedBy', 'Rainn Wilson')]
[('character', 'Andy Bernard'), ('performedBy', 'Ed Helms')]
[('character', 'Jan Levinson'), ('performedBy', 'Melora Hardin')]
[('character', 'Robert California'), ('performedBy', 'James Spader')]


8
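The rows printed above can be collected into a lookup table for later steps. The following is a minimal sketch, assuming the bindings have already been flattened into (character, performer) pairs. The `pairs` list copies the output above, and `performers_by_character` is a hypothetical helper name that is not part of the notebook.

```python
from collections import defaultdict

# (character, performer) pairs copied from the query output above
pairs = [
    ("Ryan Howard", "B. J. Novak"),
    ("Jim Halpert", "John Krasinski"),
    ("Pam Halpert", "Jenna Fischer"),
    ("Michael Scott", "Steve Carell"),
    ("Dwight Schrute", "Rainn Wilson"),
    ("Andy Bernard", "Ed Helms"),
    ("Jan Levinson", "Melora Hardin"),
    ("Robert California", "James Spader"),
]


def performers_by_character(rows):
    """Group performers under each character (hypothetical helper).

    A list is used as the value because a character may have
    more than one performer.
    """
    lookup = defaultdict(list)
    for character, performer in rows:
        lookup[character].append(performer)
    return dict(lookup)


cast = performers_by_character(pairs)
print(cast["Michael Scott"])
```

Keeping a list per character means the same helper still works when a character is linked to several performers.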

I check whether this property also exists for HIMYM (recall that HIMYM has 480 cast members).

In [32]:
queryString = """
SELECT ?p ?pname COUNT(?actor) AS ?objects
WHERE { 

wd:Q147235 wdt:P674 ?character;
          wdt:P161 ?actor.
        
?character ?p ?actor.

?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P170'), ('pname', 'creator'), ('objects', '12')]
[('p', 'http://www.wikidata.org/prop/direct/P175'), ('pname', 'performer'), ('objects', '116')]


2

I can now return all characters and who performs them (if data is available).

In [33]:
queryString = """
SELECT ?character ?performedBy 
WHERE { 

wd:Q147235 wdt:P674 ?c;
          wdt:P161 ?actor.
        
?c wdt:P175 ?actor.

?c <http://schema.org/name> ?character.
?actor <http://schema.org/name> ?performedBy.

} 
"""

print("Results")
run_query(queryString)

Results
[('character', 'Robin Scherbatsky'), ('performedBy', 'Cobie Smulders')]
[('character', 'Don Frank'), ('performedBy', 'Benjamin Koldyke')]
[('character', 'Kevin Venkataraghavan'), ('performedBy', 'Kal Penn')]
[('character', 'Nick Podarutti'), ('performedBy', 'Michael Trucco')]
[('character', 'Genevieve Scherbatsky'), ('performedBy', 'Tracey Ullman')]
[('character', 'Katie Scherbatsky'), ('performedBy', 'Lucy Hale')]
[('character', 'Katie Scherbatsky'), ('performedBy', 'Pamela Darling')]
[('character', 'Max'), ('performedBy', 'Geoff Stults')]
[('character', 'Scooby Scooberman'), ('performedBy', 'Robbie Amell')]
[('character', 'Gael'), ('performedBy', 'Enrique Iglesias')]
[('character', 'Ted Mosby'), ('performedBy', 'Josh Radnor')]
[('character', 'Barney Stinson'), ('performedBy', 'Neil Patrick Harris')]
[('character', 'Doctor John Stangel'), ('performedBy', 'Neil Patrick Harris')]
[('character', 'Loretta Stinson'), ('performedBy', 'Frances Conroy')]
[('character', 'Carl MacLaren'

116

### Putting it all together
I recall we found no information about the number of episodes in which each cast member appears for The Office (US). However, such information is available for HIMYM, so I can return the 15 most-present actors and display information about the characters they played, if data is available.

In [34]:
queryString = """
SELECT DISTINCT ?actor ?character COUNT(DISTINCT ?e) AS ?episodes
WHERE { 

wd:Q147235 wdt:P527/wdt:P527 ?e.
?e wdt:P161 ?a.

?a <http://schema.org/name> ?actor.

OPTIONAL{ wd:Q147235 wdt:P674 ?c.
          ?c wdt:P175 ?a.
          ?c <http://schema.org/name> ?character. }

} 
GROUP BY ?actor ?character
ORDER BY DESC(?episodes)
LIMIT 15
"""

print("Results")
run_query(queryString)

Results
[('actor', 'Cobie Smulders'), ('character', 'Lesbian Robin'), ('episodes', '145')]
[('actor', 'Josh Radnor'), ('character', 'Future Ted'), ('episodes', '145')]
[('actor', 'Josh Radnor'), ('character', 'Mexican Wrestler Ted'), ('episodes', '145')]
[('actor', 'Josh Radnor'), ('character', 'Ted Mosby'), ('episodes', '145')]
[('actor', 'Neil Patrick Harris'), ('character', 'Barney Stinson'), ('episodes', '145')]
[('actor', 'Jason Segel'), ('character', 'Moustache Marshall'), ('episodes', '145')]
[('actor', 'Cobie Smulders'), ('character', 'Robin Scherbatsky'), ('episodes', '145')]
[('actor', 'Jason Segel'), ('character', 'Marshall Eriksen'), ('episodes', '145')]
[('actor', 'Neil Patrick Harris'), ('character', 'Doctor John Stangel'), ('episodes', '145')]
[('actor', 'Alyson Hannigan'), ('character', 'Stripper Lily'), ('episodes', '143')]
[('actor', 'Alyson Hannigan'), ('character', 'Lily Aldrin'), ('episodes', '143')]
[('actor', 'Bob Saget'), ('episodes', '142')]
[('actor', 'David H

15

Now the results are more readable, even though some actors played multiple characters. The results look plausible, because the most-present actors correspond to the main characters of the TV series.
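Because the query groups by both actor and character, an actor with several characters occupies several of the 15 rows. The following is a minimal sketch of merging those rows into one entry per actor on the Python side. The `rows` list copies the fully visible lines of the output above (the truncated line is left out), and `collapse` is a hypothetical helper, not part of the notebook.

```python
# (actor, character or None, episodes) rows copied from the output above
rows = [
    ("Cobie Smulders", "Lesbian Robin", 145),
    ("Josh Radnor", "Future Ted", 145),
    ("Josh Radnor", "Mexican Wrestler Ted", 145),
    ("Josh Radnor", "Ted Mosby", 145),
    ("Neil Patrick Harris", "Barney Stinson", 145),
    ("Jason Segel", "Moustache Marshall", 145),
    ("Cobie Smulders", "Robin Scherbatsky", 145),
    ("Jason Segel", "Marshall Eriksen", 145),
    ("Neil Patrick Harris", "Doctor John Stangel", 145),
    ("Alyson Hannigan", "Stripper Lily", 143),
    ("Alyson Hannigan", "Lily Aldrin", 143),
    ("Bob Saget", None, 142),  # no character bound by the OPTIONAL block
]


def collapse(rows):
    """Merge rows so each actor appears once with all known characters."""
    merged = {}
    for actor, character, episodes in rows:
        entry = merged.setdefault(actor, {"episodes": episodes, "characters": []})
        if character is not None:
            entry["characters"].append(character)
    return merged


merged = collapse(rows)
for actor, info in merged.items():
    print(actor, info["episodes"], info["characters"])
```

The twelve visible rows cover only six distinct actors, so the `LIMIT 15` in the query returns fewer than 15 actors.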

## Actors working in other productions
This section contains the queries that answer the third question:
_Check who is the actor who acted in more films while working on "How I met your mother" and who is the actor who participated in more films after the end of the tv series._

In order to answer the third question I need to retrieve information about the period in which actors were working on "How I Met Your Mother". I recall I discovered the properties "end time" (wdt:P582) and "start time" (wdt:P580) when answering the first question. Therefore, I now check what type of information such properties contain.

In [35]:
queryString = """
SELECT ?startTime ?endTime
WHERE { 

wd:Q147235 wdt:P580 ?startTime;
           wdt:P582 ?endTime.

} 
"""

print("Results")
run_query(queryString)

Results
[('startTime', '2005-09-19T00:00:00Z'), ('endTime', '2014-03-31T00:00:00Z')]


1

These properties seem to contain the information I need. However, some actors may have worked on HIMYM only for a short period (e.g., in some seasons or only in some episodes), so it is more useful to check whether time information is available for single seasons or episodes, since we know which actors played a role in each episode. I recall from a previous query that episodes have the property "publication date" (wdt:P577), so I can use it to retrieve the first and last appearance of each actor in the show.

In [36]:
queryString = """
SELECT ?actor MIN(?pubDate) AS ?firstApp MAX(?pubDate) AS ?lastApp
WHERE { 

wd:Q147235 wdt:P527/wdt:P527 ?e.

?e wdt:P161 ?a;
   wdt:P577 ?pubDate.
   
?a <http://schema.org/name> ?actor.

} 
GROUP BY ?actor
ORDER BY ?firstApp
LIMIT 30
"""

print("Results")
run_query(queryString)

Results
[('actor', 'Marshall Manesh'), ('firstApp', '2005-09-19T00:00:00Z'), ('lastApp', '2011-05-16T00:00:00Z')]
[('actor', 'Lyndsy Fonseca'), ('firstApp', '2005-09-19T00:00:00Z'), ('lastApp', '2014-03-31T00:00:00Z')]
[('actor', 'Josh Radnor'), ('firstApp', '2005-09-19T00:00:00Z'), ('lastApp', '2014-03-31T00:00:00Z')]
[('actor', 'David Henrie'), ('firstApp', '2005-09-19T00:00:00Z'), ('lastApp', '2014-03-31T00:00:00Z')]
[('actor', 'Jason Segel'), ('firstApp', '2005-09-19T00:00:00Z'), ('lastApp', '2014-03-31T00:00:00Z')]
[('actor', 'Cobie Smulders'), ('firstApp', '2005-09-19T00:00:00Z'), ('lastApp', '2014-03-31T00:00:00Z')]
[('actor', 'Neil Patrick Harris'), ('firstApp', '2005-09-19T00:00:00Z'), ('lastApp', '2014-03-31T00:00:00Z')]
[('actor', 'Bob Saget'), ('firstApp', '2005-09-19T00:00:00Z'), ('lastApp', '2011-10-24T00:00:00Z')]
[('actor', 'Alyson Hannigan'), ('firstApp', '2005-09-19T00:00:00Z'), ('lastApp', '2014-03-31T00:00:00Z')]
[('actor', 'Jon Bernthal'), ('firstApp', '2005-09-26T

30
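The aggregation in the query above takes the earliest and latest episode date per actor. The following is a minimal sketch of the same reduction in plain Python, assuming the per-episode rows are available as (actor, timestamp) pairs. The sample pairs reuse only dates that appear in the output above, and `first_and_last` is a hypothetical helper name.

```python
from datetime import datetime

# Timestamp layout as printed by the endpoint in the output above
FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# (actor, publication date of an episode) pairs; the dates are taken
# from the firstApp/lastApp values printed above
appearances = [
    ("Marshall Manesh", "2005-09-19T00:00:00Z"),
    ("Marshall Manesh", "2011-05-16T00:00:00Z"),
    ("Bob Saget", "2011-10-24T00:00:00Z"),
    ("Bob Saget", "2005-09-19T00:00:00Z"),
    ("Josh Radnor", "2005-09-19T00:00:00Z"),
    ("Josh Radnor", "2014-03-31T00:00:00Z"),
]


def first_and_last(rows):
    """Return {actor: (first_date, last_date)}, like MIN/MAX with GROUP BY."""
    spans = {}
    for actor, stamp in rows:
        day = datetime.strptime(stamp, FORMAT).date()
        first, last = spans.get(actor, (day, day))
        spans[actor] = (min(first, day), max(last, day))
    return spans


spans = first_and_last(appearances)
print(spans["Bob Saget"])
```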

I now need to filter the actors' work based on the time information. To do so, I have to see how the time information is stored for the other films the actors worked on. Firstly, I retrieve which "instance of" (wdt:P31) values occur for the works connected to the actors by means of the property "cast member" (wdt:P161).

In [37]:
queryString = """
SELECT ?i ?instanceOf COUNT(?f) AS ?films
WHERE { 

wd:Q147235 wdt:P161 ?a.

?f wdt:P161 ?a;
   wdt:P31 ?i.

?i <http://schema.org/name> ?instanceOf.

} 
GROUP BY ?i ?instanceOf
ORDER BY DESC(?films)
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('i', 'http://www.wikidata.org/entity/Q11424'), ('instanceOf', 'film'), ('films', '3399')]
[('i', 'http://www.wikidata.org/entity/Q5398426'), ('instanceOf', 'television series'), ('films', '1644')]
[('i', 'http://www.wikidata.org/entity/Q21191270'), ('instanceOf', 'television series episode'), ('films', '1503')]
[('i', 'http://www.wikidata.org/entity/Q506240'), ('instanceOf', 'television film'), ('films', '352')]
[('i', 'http://www.wikidata.org/entity/Q229390'), ('instanceOf', '3D film'), ('films', '85')]
[('i', 'http://www.wikidata.org/entity/Q24869'), ('instanceOf', 'feature film'), ('films', '82')]
[('i', 'http://www.wikidata.org/entity/Q3464665'), ('instanceOf', 'television series season'), ('films', '40')]
[('i', 'http://www.wikidata.org/entity/Q1259759'), ('instanceOf', 'miniseries'), ('films', '36')]
[('i', 'http://www.wikidata.org/entity/Q24862'), ('instanceOf', 'short film'), ('films', '25')]
[('i', 'http://www.wikidata.org/entity/Q15416'), ('instanceOf', 'television 

25

In order to retrieve only the films or TV series, I need to filter out episodes and seasons. Since several instance types match, I use two regex filters on the instance label. In the following query I return the number of films each HIMYM actor worked on.

In [38]:
queryString = """
SELECT ?a ?actor COUNT(?f) AS ?films
WHERE { 

wd:Q147235 wdt:P161 ?a.

?f wdt:P161 ?a;
   wdt:P31 ?i.
   
?a <http://schema.org/name> ?actor.
?i <http://schema.org/name> ?instanceOf.
FILTER(!regex(?instanceOf, ".*episode.*") && !regex(?instanceOf, ".*season.*"))

} 
GROUP BY ?a ?actor
ORDER BY DESC(?films)
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('a', 'http://www.wikidata.org/entity/Q192165'), ('actor', 'Danny Glover'), ('films', '120')]
[('a', 'http://www.wikidata.org/entity/Q236189'), ('actor', 'Judy Greer'), ('films', '75')]
[('a', 'http://www.wikidata.org/entity/Q536437'), ('actor', 'Edward Herrmann'), ('films', '75')]
[('a', 'http://www.wikidata.org/entity/Q471018'), ('actor', 'Ernie Hudson'), ('films', '74')]
[('a', 'http://www.wikidata.org/entity/Q311271'), ('actor', 'John Lithgow'), ('films', '73')]
[('a', 'http://www.wikidata.org/entity/Q298777'), ('actor', 'Michael York'), ('films', '71')]
[('a', 'http://www.wikidata.org/entity/Q186757'), ('actor', 'Seth Green'), ('films', '66')]
[('a', 'http://www.wikidata.org/entity/Q309900'), ('actor', 'Peter Gallagher'), ('films', '65')]
[('a', 'http://www.wikidata.org/entity/Q545172'), ('actor', 'Matt Frewer'), ('films', '64')]
[('a', 'http://www.wikidata.org/entity/Q123849'), ('actor', 'Jane Seymour'), ('films', '61')]
[('a', 'http://www.wikidata.org/entity/Q1332676'),

25
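The label filter used in the query above can be checked offline against the "instance of" labels retrieved earlier. The following is a minimal sketch using Python's `re` module. It assumes that a SPARQL `regex` matches anywhere in the string unless anchored, so `".*episode.*"` behaves like a plain search for `episode`. The `keep` function is a hypothetical name.

```python
import re

# Labels copied from the fully visible rows of the "instance of" query output
labels = [
    "film",
    "television series",
    "television series episode",
    "television film",
    "3D film",
    "feature film",
    "television series season",
    "miniseries",
    "short film",
]

# One pattern standing in for the two negated regex filters
EXCLUDED = re.compile(r"episode|season")


def keep(label):
    """True when the label mentions neither 'episode' nor 'season'."""
    return EXCLUDED.search(label) is None


kept = [label for label in labels if keep(label)]
print(kept)
```

The match is on substrings, so any class whose label merely contains "season" or "episode" is dropped as well. This is a property of the filter rather than of this sketch.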

Now I need to find the information about when the actors worked on the different films. In order to do that I check the properties of such films. As always, I order these properties by the number of objects connected to them.

In [40]:
queryString = """
SELECT ?p ?pname COUNT(DISTINCT ?o) AS ?objs
WHERE { 

wd:Q147235 wdt:P161 ?a.

?f wdt:P161 ?a;
   wdt:P31 ?i;
   ?p ?o.
   
?p <http://schema.org/name> ?pname.
?i <http://schema.org/name> ?instanceOf.
FILTER(!regex(?instanceOf, ".*episode.*") && !regex(?instanceOf, ".*season.*"))

} 
GROUP BY ?p ?pname
ORDER BY DESC(?objs)
LIMIT 30
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pname', 'cast member'), ('objs', '19685')]
[('p', 'http://www.wikidata.org/prop/direct/P345'), ('pname', 'IMDb ID'), ('objs', '3725')]
[('p', 'http://www.wikidata.org/prop/direct/P2603'), ('pname', 'Kinopoisk film ID'), ('objs', '3515')]
[('p', 'http://www.wikidata.org/prop/direct/P1476'), ('pname', 'title'), ('objs', '3424')]
[('p', 'http://www.wikidata.org/prop/direct/P3138'), ('pname', 'OFDb film ID'), ('objs', '3293')]
[('p', 'http://www.wikidata.org/prop/direct/P646'), ('pname', 'Freebase ID'), ('objs', '3134')]
[('p', 'http://www.wikidata.org/prop/direct/P4947'), ('pname', 'TMDb movie ID'), ('objs', '3089')]
[('p', 'http://www.wikidata.org/prop/direct/P2529'), ('pname', 'ČSFD film ID'), ('objs', '3059')]
[('p', 'http://www.wikidata.org/prop/direct/P6127'), ('pname', 'Letterboxd film ID'), ('objs', '2983')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pname', 'Rotten Tomatoes ID'), ('objs', '2916')]
[('p', 'ht

30

I discovered the property "publication date" (wdt:P577) that could contain the information I need. I run a query to return some triples with such property. 

In [41]:
queryString = """
SELECT DISTINCT ?film ?pubDate
WHERE { 

wd:Q147235 wdt:P161 ?a.

?f wdt:P161 ?a;
   wdt:P31 ?i;
   wdt:P577 ?pubDate.
   
?f <http://schema.org/name> ?film.
?i <http://schema.org/name> ?instanceOf.
FILTER(!regex(?instanceOf, ".*episode.*") && !regex(?instanceOf, ".*season.*"))

} 
LIMIT 30
"""

print("Results")
run_query(queryString)

Results
[('film', 'Some Came Running'), ('pubDate', '1958-01-01T00:00:00Z')]
[('film', 'Miss Meadows'), ('pubDate', '2014-04-21T00:00:00Z')]
[('film', 'Nantucket'), ('pubDate', '2002-01-01T00:00:00Z')]
[('film', 'Magic Mike'), ('pubDate', '2012-01-01T00:00:00Z')]
[('film', 'Magic Mike'), ('pubDate', '2012-08-09T00:00:00Z')]
[('film', 'Magic Mike'), ('pubDate', '2012-08-16T00:00:00Z')]
[('film', 'Under Wraps'), ('pubDate', '1997-10-25T00:00:00Z')]
[('film', 'The Jerk'), ('pubDate', '1979-01-01T00:00:00Z')]
[('film', 'The Jerk'), ('pubDate', '1980-06-20T00:00:00Z')]
[('film', 'Austin Powers: The Spy Who Shagged Me'), ('pubDate', '1999-06-08T00:00:00Z')]
[('film', 'Austin Powers: The Spy Who Shagged Me'), ('pubDate', '1999-06-11T00:00:00Z')]
[('film', 'Austin Powers: The Spy Who Shagged Me'), ('pubDate', '1999-10-14T00:00:00Z')]
[('film', 'Rush Hour 3'), ('pubDate', '2007-07-30T00:00:00Z')]
[('film', 'Rush Hour 3'), ('pubDate', '2007-08-16T00:00:00Z')]
[('film', 'Legally Blonde'), ('pubDa

30

We have different publication dates for the same film, but that is not strange because they could indicate different distributions (for example in different countries). Now I have all the information I need to answer the questions. Firstly, I retrieve the 10 actors who acted in the most films while working on HIMYM.

In [42]:
queryString = """
SELECT ?actor COUNT(?f) AS ?films
WHERE { 

?f wdt:P161 ?a;
   wdt:P31 ?i;
   wdt:P577 ?filmDate.
   
?a <http://schema.org/name> ?actor.
?i <http://schema.org/name> ?instanceOf.
FILTER(?f != wd:Q147235).
FILTER(!regex(?instanceOf, ".*episode.*") && !regex(?instanceOf, ".*season.*")).
FILTER(?filmDate >= ?firstApp && ?filmDate <= ?lastApp).

{
    SELECT ?a MIN(?pubDate) AS ?firstApp MAX(?pubDate) AS ?lastApp
    WHERE { 

        wd:Q147235 wdt:P527/wdt:P527 ?e.

        ?e wdt:P161 ?a;
           wdt:P577 ?pubDate.

    } 
    GROUP BY ?a
}
} 
GROUP BY ?actor
ORDER BY DESC(?films)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('actor', 'Neil Patrick Harris'), ('films', '38')]
[('actor', 'Jason Segel'), ('films', '30')]
[('actor', 'Cobie Smulders'), ('films', '26')]
[('actor', 'Bill Fagerbakke'), ('films', '17')]
[('actor', 'Lyndsy Fonseca'), ('films', '14')]
[('actor', 'Zachary Gordon'), ('films', '11')]
[('actor', 'Marshall Manesh'), ('films', '10')]
[('actor', 'Alyson Hannigan'), ('films', '8')]
[('actor', 'Will Forte'), ('films', '8')]
[('actor', 'Frances Conroy'), ('films', '6')]


10
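Since a film can carry several publication dates, `COUNT(?f)` counts one row per (film, date) combination that passes the filter, so the same film may be counted more than once. The following is a minimal sketch of the difference. The film dates are copied from the publication-date output shown earlier and the window is the HIMYM start and end time. Treating all three films as the work of a single actor is a hypothetical assumption made only for illustration.

```python
from datetime import datetime

FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse(stamp):
    """Parse a timestamp in the layout printed by the endpoint."""
    return datetime.strptime(stamp, FORMAT)


# (film, publication date) rows; assumed here to belong to one actor
rows = [
    ("Magic Mike", "2012-01-01T00:00:00Z"),
    ("Magic Mike", "2012-08-09T00:00:00Z"),
    ("Magic Mike", "2012-08-16T00:00:00Z"),
    ("Rush Hour 3", "2007-07-30T00:00:00Z"),
    ("Rush Hour 3", "2007-08-16T00:00:00Z"),
    ("The Jerk", "1979-01-01T00:00:00Z"),
    ("The Jerk", "1980-06-20T00:00:00Z"),
]

first_app = parse("2005-09-19T00:00:00Z")
last_app = parse("2014-03-31T00:00:00Z")

# Rows whose date falls inside the window, bounds included
in_window = [(film, stamp) for film, stamp in rows
             if first_app <= parse(stamp) <= last_app]

row_count = len(in_window)                          # analogue of COUNT(?f)
film_count = len({film for film, _ in in_window})   # analogue of COUNT(DISTINCT ?f)
print(row_count, film_count)
```

If distinct films are the quantity of interest, `COUNT(DISTINCT ?f)` in the query would be the corresponding change. The counts printed in the notebook were produced without it.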

I now return the 10 actors who participated in the most films after the end of the TV series.

In [43]:
queryString = """
SELECT ?actor COUNT(?f) AS ?films
WHERE { 

?f wdt:P161 ?a;
   wdt:P31 ?i;
   wdt:P577 ?filmDate.
   
?a <http://schema.org/name> ?actor.
?i <http://schema.org/name> ?instanceOf.
FILTER(?f != wd:Q147235).
FILTER(!regex(?instanceOf, ".*episode.*") && !regex(?instanceOf, ".*season.*")).
FILTER(?filmDate > ?lastApp).

{
    SELECT ?a MAX(?pubDate) AS ?lastApp
    WHERE { 

        wd:Q147235 wdt:P527/wdt:P527 ?e.

        ?e wdt:P161 ?a;
           wdt:P577 ?pubDate.

    } 
    GROUP BY ?a
}
} 
GROUP BY ?actor
ORDER BY DESC(?films)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('actor', 'Judy Greer'), ('films', '100')]
[('actor', 'Will Forte'), ('films', '83')]
[('actor', 'Danny Glover'), ('films', '79')]
[('actor', 'Taran Killam'), ('films', '78')]
[('actor', 'Jon Bernthal'), ('films', '66')]
[('actor', 'John Cho'), ('films', '64')]
[('actor', 'Bryan Cranston'), ('films', '63')]
[('actor', 'Scoot McNairy'), ('films', '54')]
[('actor', 'Gary Anthony Williams'), ('films', '51')]
[('actor', 'Kevin Heffernan'), ('films', '47')]


10

The two result sets are very different from each other. As expected, the actors who worked in more films while playing a role in HIMYM are the ones who worked on HIMYM for a longer period of time, for example Neil Patrick Harris. The second question instead returns mostly different actors from the previous one.
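The overlap between the two top-10 lists can be checked directly. The following is a minimal sketch using Python sets, with both lists copied from the two outputs above. The variable names are illustrative.

```python
# Top 10 while working on HIMYM (first result set)
during = [
    "Neil Patrick Harris", "Jason Segel", "Cobie Smulders",
    "Bill Fagerbakke", "Lyndsy Fonseca", "Zachary Gordon",
    "Marshall Manesh", "Alyson Hannigan", "Will Forte", "Frances Conroy",
]

# Top 10 after their last appearance in HIMYM (second result set)
after = [
    "Judy Greer", "Will Forte", "Danny Glover", "Taran Killam",
    "Jon Bernthal", "John Cho", "Bryan Cranston", "Scoot McNairy",
    "Gary Anthony Williams", "Kevin Heffernan",
]

# Actors present in both rankings
both = set(during) & set(after)
print(sorted(both))
```

Only one name appears in both lists, which is consistent with the observation that the second ranking is made up of mostly different actors.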

### Screenwriters
Before moving on, I recall we have information about HIMYM screenwriters (wdt:P58), so I can also return the number of films or TV series each screenwriter worked on while HIMYM was airing. Firstly, I retrieve how many screenwriters HIMYM has.

In [44]:
queryString = """
SELECT COUNT(DISTINCT ?sw) AS ?screenwriters
WHERE { 

wd:Q147235 wdt:P58 ?sw.

} 

"""

print("Results")
run_query(queryString)

Results
[('screenwriters', '2')]


1

Since there are only 2, I can return, for both of them, the number of films or TV series each screenwriter worked on while HIMYM was airing. In order to do that I use the properties "start time" (wdt:P580) and "end time" (wdt:P582). It is important to notice that this query is based on the assumption that each screenwriter worked on HIMYM for its whole duration.

In [45]:
queryString = """
SELECT ?screenwriter COUNT(?f) AS ?films
WHERE { 

wd:Q147235 wdt:P58 ?sw; 
           wdt:P580 ?startTime;
           wdt:P582 ?endTime.

?f wdt:P58 ?sw;
   wdt:P31 ?i;
   wdt:P577 ?filmDate.
   
?sw <http://schema.org/name> ?screenwriter.
?i <http://schema.org/name> ?instanceOf.
FILTER(?f != wd:Q147235).
FILTER(!regex(?instanceOf, ".*episode.*") && !regex(?instanceOf, ".*season.*")).
FILTER(?filmDate > ?startTime && ?filmDate < ?endTime).

} 
GROUP BY ?screenwriter
ORDER BY DESC(?films)
"""

print("Results")
run_query(queryString)

Results
Empty


0

The result set is empty. This could mean that we have no data about other films or TV series written by them. Let us check if this is the case.

In [46]:
queryString = """
SELECT ?screenwriter ?film
WHERE { 

wd:Q147235 wdt:P58 ?sw. 

?f wdt:P58 ?sw;
   wdt:P31 ?i.

   
?sw <http://schema.org/name> ?screenwriter.
?f <http://schema.org/name> ?film.
?i <http://schema.org/name> ?instanceOf.
FILTER(!regex(?instanceOf, ".*episode.*") && !regex(?instanceOf, ".*season.*")).

} 
"""

print("Results")
run_query(queryString)

Results
[('screenwriter', 'Craig Thomas'), ('film', 'How I Met Your Mother')]
[('screenwriter', 'Carter Bays'), ('film', 'How I Met Your Mother')]


2

As expected we only have data about HIMYM therefore we cannot return any useful information following this flow.

## Comparison between HIMYM and The Office (US)
This section contains the queries that answer the fourth question:
_Compare HIMYM with the tv series "The Office (US)" in terms of number of seasons, episodes and cast members._

I have already discussed the number of seasons and episodes in the first question. I recall I discovered that both TV series have the same number of seasons and almost the same number of episodes, but the number of episodes per season differs.

I now compare the two TV series based on the cast members (wdt:P161). I recall that The Office (US) has less data about its cast members, so the results may be affected by this. Firstly, I return the number of cast members for each show.

In [47]:
queryString = """
SELECT COUNT(DISTINCT ?chimym) AS ?himymCast COUNT(DISTINCT ?coff) AS ?offCast
WHERE { 

wd:Q23831 wdt:P161 ?coff.

wd:Q147235 wdt:P161 ?chimym.

} 
"""

print("Results")
run_query(queryString)

Results
[('himymCast', '480'), ('offCast', '25')]


1
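The two triple patterns in the query above share no variable, so conceptually every HIMYM cast member is paired with every cast member of The Office (US) before the distinct counts are taken. The following is a minimal sketch of that effect, using integer stand-ins for the 480 and 25 cast members reported above. Whether the engine actually materialises all pairs is an implementation detail that this sketch does not cover.

```python
from itertools import product

# Integer stand-ins for the distinct cast members of each show
himym_cast = range(480)
office_cast = range(25)

# Joining patterns with no shared variable yields every combination
rows = list(product(himym_cast, office_cast))

himym_count = len({h for h, _ in rows})   # analogue of COUNT(DISTINCT ?chimym)
office_count = len({o for _, o in rows})  # analogue of COUNT(DISTINCT ?coff)
print(len(rows), himym_count, office_count)
```

The distinct counts are unaffected by the pairing, which is why the query still reports the right numbers. Two separate counting queries, or two subqueries, would reach the same result without the intermediate combinations.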

In order to have a fair comparison between the two I return only the 25 most-present actors in HIMYM.

In [48]:
queryString = """
SELECT DISTINCT ?actor COUNT(DISTINCT ?e) AS ?episodes
WHERE { 

wd:Q147235 wdt:P527/wdt:P527 ?e.
?e wdt:P161 ?a.

?a <http://schema.org/name> ?actor.

} 
GROUP BY ?actor 
ORDER BY DESC(?episodes)
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('actor', 'Josh Radnor'), ('episodes', '145')]
[('actor', 'Jason Segel'), ('episodes', '145')]
[('actor', 'Cobie Smulders'), ('episodes', '145')]
[('actor', 'Neil Patrick Harris'), ('episodes', '145')]
[('actor', 'Alyson Hannigan'), ('episodes', '143')]
[('actor', 'Bob Saget'), ('episodes', '142')]
[('actor', 'Lyndsy Fonseca'), ('episodes', '48')]
[('actor', 'David Henrie'), ('episodes', '48')]
[('actor', 'Charlene Amoia'), ('episodes', '17')]
[('actor', 'Jennifer Morrison'), ('episodes', '12')]
[('actor', 'Marshall Manesh'), ('episodes', '11')]
[('actor', 'Sarah Chalke'), ('episodes', '10')]
[('actor', 'Ashley Williams'), ('episodes', '9')]
[('actor', 'Bill Fagerbakke'), ('episodes', '9')]
[('actor', 'Suzie Plakson'), ('episodes', '9')]
[('actor', 'Nazanin Boniadi'), ('episodes', '8')]
[('actor', 'Bob Odenkirk'), ('episodes', '7')]
[('actor', 'Bryan Callen'), ('episodes', '6')]
[('actor', 'David Burtka'), ('episodes', '5')]
[('actor', 'Joe Manganiello'), ('episodes', '5')]
[(

25

I now look for the actors' properties. As usual, I order such properties based on the number of objects. I run two queries in order to retrieve such properties for the two TV series.

In [49]:
queryString = """
SELECT ?p ?pname COUNT(DISTINCT ?o) AS ?objs
WHERE { 
?a ?p ?o.

?p <http://schema.org/name> ?pname.
{
    SELECT ?a COUNT(DISTINCT ?e) AS ?episodes
    WHERE { 

        wd:Q147235 wdt:P527/wdt:P527 ?e.
        ?e wdt:P161 ?a.

    } 
    GROUP BY ?a
    ORDER BY DESC(?episodes)
    LIMIT 25
}

} 
GROUP BY ?p ?pname 
ORDER BY DESC(?objs)
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P6262'), ('pname', 'Fandom article ID'), ('objs', '40')]
[('p', 'http://www.wikidata.org/prop/direct/P69'), ('pname', 'educated at'), ('objs', '31')]
[('p', 'http://www.wikidata.org/prop/direct/P735'), ('pname', 'given name'), ('objs', '29')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation'), ('objs', '27')]
[('p', 'http://www.wikidata.org/prop/direct/P1266'), ('pname', 'AlloCiné person ID'), ('objs', '26')]
[('p', 'http://www.wikidata.org/prop/direct/P2605'), ('pname', 'ČSFD person ID'), ('objs', '25')]
[('p', 'http://www.wikidata.org/prop/direct/P2604'), ('pname', 'Kinopoisk person ID'), ('objs', '25')]
[('p', 'http://www.wikidata.org/prop/direct/P2019'), ('pname', 'AllMovie person ID'), ('objs', '25')]
[('p', 'http://www.wikidata.org/prop/direct/P2435'), ('pname', 'PORT person ID'), ('objs', '25')]
[('p', 'http://www.wikidata.org/prop/direct/P4985'), ('pname', 'TMDb person ID'), ('objs', '25')]
[('p', 'htt

25

In [50]:
queryString = """
SELECT ?p ?pname COUNT(DISTINCT ?o) AS ?objs
WHERE { 

wd:Q23831 wdt:P161 ?a.
?a ?p ?o.

?p <http://schema.org/name> ?pname.

} 
GROUP BY ?p ?pname 
ORDER BY DESC(?objs)
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P69'), ('pname', 'educated at'), ('objs', '47')]
[('p', 'http://www.wikidata.org/prop/direct/P735'), ('pname', 'given name'), ('objs', '33')]
[('p', 'http://www.wikidata.org/prop/direct/P18'), ('pname', 'image'), ('objs', '28')]
[('p', 'http://www.wikidata.org/prop/direct/P7859'), ('pname', 'WorldCat Identities ID'), ('objs', '26')]
[('p', 'http://www.wikidata.org/prop/direct/P4985'), ('pname', 'TMDb person ID'), ('objs', '25')]
[('p', 'http://www.wikidata.org/prop/direct/P646'), ('pname', 'Freebase ID'), ('objs', '25')]
[('p', 'http://www.wikidata.org/prop/direct/P734'), ('pname', 'family name'), ('objs', '25')]
[('p', 'http://www.wikidata.org/prop/direct/P214'), ('pname', 'VIAF ID'), ('objs', '25')]
[('p', 'http://www.wikidata.org/prop/direct/P569'), ('pname', 'date of birth'), ('objs', '25')]
[('p', 'http://www.wikidata.org/prop/direct/P244'), ('pname', 'Library of Congress authority ID'), ('objs', '25')]
[('p', 'http://www.wikidat

25

I can return the youngest and the oldest actor of each TV series by means of the property "date of birth" (wdt:P569).

In [51]:
queryString = """
SELECT ?offActor ?birthDate ?character
WHERE { 

wd:Q23831 wdt:P161 ?a.
?a wdt:P569 ?birthDate.

?a <http://schema.org/name> ?offActor.

OPTIONAL{ wd:Q23831 wdt:P674 ?c.
          ?c wdt:P175 ?a.
          ?c <http://schema.org/name> ?character. }

}
ORDER BY ?birthDate
LIMIT 1
"""

print("Results")
run_query(queryString)

Results
[('offActor', 'Creed Bratton'), ('birthDate', '1943-02-08T00:00:00Z')]


1

In [52]:
queryString = """
SELECT ?offActor ?birthDate ?character
WHERE { 

wd:Q23831 wdt:P161 ?a.
?a wdt:P569 ?birthDate.

?a <http://schema.org/name> ?offActor.

OPTIONAL{ wd:Q23831 wdt:P674 ?c.
          ?c wdt:P175 ?a.
          ?c <http://schema.org/name> ?character. }

}
ORDER BY DESC(?birthDate)
LIMIT 1
"""

print("Results")
run_query(queryString)

Results
[('offActor', 'Clark Duke'), ('birthDate', '1985-05-05T00:00:00Z')]


1

In [53]:
queryString = """
SELECT ?himymActor ?birthDate ?character
WHERE { 

?a wdt:P569 ?birthDate.

?a <http://schema.org/name> ?himymActor.

OPTIONAL{ wd:Q147235 wdt:P674 ?c.
          ?c wdt:P175 ?a.
          ?c <http://schema.org/name> ?character. }

{
    SELECT ?a COUNT(DISTINCT ?e) AS ?episodes
    WHERE { 

        wd:Q147235 wdt:P527/wdt:P527 ?e.
        ?e wdt:P161 ?a.

    } 
    GROUP BY ?a
    ORDER BY DESC(?episodes)
    LIMIT 25
}

}
ORDER BY ?birthDate
LIMIT 1
"""

print("Results")
run_query(queryString)

Results
[('himymActor', 'Marshall Manesh'), ('birthDate', '1950-08-16T00:00:00Z'), ('character', 'Ranjit Singh')]


1

In [54]:
queryString = """
SELECT ?himymActor ?birthDate ?character
WHERE { 

?a wdt:P569 ?birthDate.

?a <http://schema.org/name> ?himymActor.

OPTIONAL{ wd:Q147235 wdt:P674 ?c.
          ?c wdt:P175 ?a.
          ?c <http://schema.org/name> ?character. }
          
{
    SELECT ?a COUNT(DISTINCT ?e) AS ?episodes
    WHERE { 

        wd:Q147235 wdt:P527/wdt:P527 ?e.
        ?e wdt:P161 ?a.

    } 
    GROUP BY ?a
    ORDER BY DESC(?episodes)
    LIMIT 25
}

}
ORDER BY DESC(?birthDate)
LIMIT 1
"""

print("Results")
run_query(queryString)

Results
[('himymActor', 'Darcy Rose Byrnes'), ('birthDate', '1998-11-04T00:00:00Z'), ('character', 'Lucy Zinman')]


1

Judging only by the oldest and the youngest actor (and, for HIMYM, only among the 25 actors with the most episodes), the cast of HIMYM looks younger than that of The Office(US).
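
The comparison above rests on only four birth dates. As a minimal sketch in plain Python (outside the endpoint), the birth-year span of each cast can be computed from the timestamps returned by the four queries. The names `BIRTH_DATES`, `birth_year` and `year_span` are my own and are not part of the notebook setup.

```python
from datetime import datetime

# Birth dates copied from the four query results above.
BIRTH_DATES = {
    "The Office(US)": {"oldest": "1943-02-08T00:00:00Z", "youngest": "1985-05-05T00:00:00Z"},
    "HIMYM": {"oldest": "1950-08-16T00:00:00Z", "youngest": "1998-11-04T00:00:00Z"},
}

def birth_year(timestamp):
    """Parse a timestamp in the format printed by run_query and return the year."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").year

def year_span(series):
    """Return (birth year of the oldest actor, birth year of the youngest actor)."""
    dates = BIRTH_DATES[series]
    return birth_year(dates["oldest"]), birth_year(dates["youngest"])

for series in BIRTH_DATES:
    print(series, year_span(series))
```

Both ends of the HIMYM span are later than those of The Office(US), which is all the comparison can claim: nothing is said about the average age of the two casts.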

Since the cast of HIMYM is large, I can return the occupations, other than television actor, of the actors who appeared in HIMYM. In a previous question I discovered the property "occupation" (wdt:P106) and the node "television actor" (wd:Q10798782).

In [55]:
queryString = """
SELECT ?occupation COUNT(DISTINCT ?a) AS ?actors
WHERE { 

wd:Q147235 wdt:P161 ?a.

?a wdt:P106 ?o.

?o <http://schema.org/name> ?occupation.

FILTER (?o != wd:Q10798782).

}
GROUP BY ?occupation
ORDER BY DESC(?actors)
LIMIT 15
"""

print("Results")
run_query(queryString)

Results
[('occupation', 'actor'), ('actors', '396')]
[('occupation', 'film actor'), ('actors', '252')]
[('occupation', 'voice actor'), ('actors', '94')]
[('occupation', 'stage actor'), ('actors', '70')]
[('occupation', 'screenwriter'), ('actors', '65')]
[('occupation', 'singer'), ('actors', '52')]
[('occupation', 'model'), ('actors', '49')]
[('occupation', 'film producer'), ('actors', '48')]
[('occupation', 'film director'), ('actors', '34')]
[('occupation', 'writer'), ('actors', '27')]
[('occupation', 'comedian'), ('actors', '26')]
[('occupation', 'television producer'), ('actors', '26')]
[('occupation', 'musician'), ('actors', '24')]
[('occupation', 'singer-songwriter'), ('actors', '14')]
[('occupation', 'dancer'), ('actors', '14')]


15

Wikidata occupations are fine-grained: besides "television actor", most of the cast is also listed as a generic "actor" (396) or as a "film actor" (252). Singers, models and other occupations follow.
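
Note that the FILTER in the query removes the occupation "television actor" from the result, not the actors who have it, and that an actor with several occupations is counted once per occupation. A minimal sketch on hypothetical toy data (the actor names and occupations below are placeholders, not Wikidata results) shows both effects:

```python
from collections import Counter

# Hypothetical toy data: actor -> set of occupations.
OCCUPATIONS = {
    "actor_1": {"television actor", "actor", "singer"},
    "actor_2": {"actor", "film actor"},
    "actor_3": {"television actor"},
}
EXCLUDED = "television actor"

def count_by_occupation(occupations, excluded):
    """Count actors per occupation, skipping the excluded occupation (like the FILTER)."""
    counts = Counter()
    for occs in occupations.values():
        for occ in occs:
            if occ != excluded:
                counts[occ] += 1
    return counts

counts = count_by_occupation(OCCUPATIONS, EXCLUDED)
print(dict(counts))
```

Here `actor_1` is a television actor but still contributes to "actor" and "singer", and the per-occupation counts add up to 4 although there are only 3 actors. For the same reason the rows of the real result cannot be summed to obtain the size of the cast.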

I can also check if there are some actors who played a role both in HIMYM and The Office(US) even though, given the lack of information, it is very unlikely.

In [56]:
queryString = """
SELECT DISTINCT ?actor
WHERE { 

wd:Q147235 wdt:P161 ?a.
wd:Q23831 wdt:P161 ?a.

?a <http://schema.org/name> ?actor.

}
"""

print("Results")
run_query(queryString)

Results
Empty


0

Finally, I can compare the two tv series based on the number of different directors they had. In fact, I previously discovered we have information about the directors of the single episodes.

In [57]:
queryString = """
SELECT COUNT(DISTINCT ?h) AS ?himymDirectors COUNT(DISTINCT ?o) AS ?offDirectors
WHERE { 

wd:Q147235 wdt:P527/wdt:P527/wdt:P57 ?h.

wd:Q23831 wdt:P527/wdt:P527/wdt:P57 ?o.

}
"""

print("Results")
run_query(queryString)

Results
[('himymDirectors', '4'), ('offDirectors', '51')]


1

It turns out that all HIMYM episodes were directed by just 4 directors, while the episodes of The Office(US) were directed by 51 different people.

## Kevin Bacon number
This section contains the queries that answer the last question:
_Return how many of the actors who are members of the cast of the tv series have [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2_

Actors with Kevin Bacon number equal to 2 are actors who did not act with Kevin Bacon themselves but acted in a film with someone who did. First I need to find the node for Kevin Bacon.

In [58]:
queryString = """
SELECT DISTINCT ?a ?actor
WHERE { 
?a wdt:P106 ?p.
?p <http://schema.org/name> ?profession.
?a <http://schema.org/name> ?actor.
FILTER(regex(?profession, ".*actor.*")).
FILTER(regex(?actor, "Kevin Bacon")).

} 
"""

print("Results")
run_query(queryString)

Results
[('a', 'http://www.wikidata.org/entity/Q3454165'), ('actor', 'Kevin Bacon')]


1
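
The definition of Kevin Bacon number used in this section amounts to a shortest-path distance in the co-star graph. As a minimal sketch, assuming a hypothetical toy mapping from works to cast members (all names except "Kevin Bacon" are placeholders), a breadth-first search computes the number of every reachable actor:

```python
from collections import deque

# Hypothetical toy data: work -> cast members.
CASTS = {
    "film_A": {"Kevin Bacon", "actor_x"},
    "series_S": {"actor_x", "actor_y", "actor_z"},
    "film_B": {"actor_z", "actor_w"},
}

def bacon_numbers(casts, source="Kevin Bacon"):
    """Breadth-first search over the co-star graph; returns actor -> Bacon number."""
    # Two actors are linked if they share at least one work.
    costars = {}
    for members in casts.values():
        for actor in members:
            costars.setdefault(actor, set()).update(members - {actor})

    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in costars.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance

numbers = bacon_numbers(CASTS)
print(numbers)
```

In this toy graph `actor_x` has number 1, `actor_y` and `actor_z` have number 2 through `series_S`, and `actor_w` has number 3. The SPARQL queries below only need the first two levels of this search.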

Now I can retrieve all actors who worked in a movie with him by using the property "cast member"(wdt:P161). 

In [59]:
queryString = """
SELECT COUNT(DISTINCT ?a) AS ?actors
WHERE {

?f wdt:P161 ?a;
   wdt:P161 wd:Q3454165.
   
FILTER(?a != wd:Q3454165).

} 
"""

print("Results")
run_query(queryString)

Results
[('actors', '838')]


1

Finally, I can retrieve how many actors of the two TV series have a Kevin Bacon number equal to 2.

In [60]:
queryString = """
SELECT COUNT(DISTINCT ?hActor) AS ?himymActors COUNT(DISTINCT ?oActor) AS ?offActors
WHERE {

    wd:Q147235 wdt:P161 ?hActor.
    
    wd:Q23831 wdt:P161 ?oActor.

    ?f1 wdt:P161 ?hActor;
        wdt:P161 ?a1.
        
    ?f2 wdt:P161 ?oActor;
        wdt:P161 ?a2.

    FILTER(?a1 != ?hActor && ?a1 = ?a).
    FILTER(?a2 != ?oActor && ?a2 = ?a).

{
    SELECT DISTINCT ?a
    WHERE { 

    ?f wdt:P161 ?a;
       wdt:P161 wd:Q3454165.

    FILTER(?a != wd:Q3454165).

    } 
}

}
"""

print("Results")
run_query(queryString)

Results
[('himymActors', '480'), ('offActors', '25')]


1

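According to this query, all cast members of both TV series (480 and 25 actors) have a Kevin Bacon number equal to 2. This cannot be right, because the query does not exclude the actors who worked directly with Kevin Bacon. I think there is a problem with the paths.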
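
A likely explanation, sketched here on hypothetical toy data and simplified to a single series, is that the work variable of the join can bind to the TV series itself. As soon as two cast members have worked with Kevin Bacon, every cast member shares a work (the series) with a co-star other than themselves, including the co-stars. The names `CASTS`, `bacon_costars` and `matched_by_join` are my own.

```python
# Hypothetical toy data: work -> cast members (placeholder names).
CASTS = {
    "film_A": {"Kevin Bacon", "actor_x"},
    "film_B": {"Kevin Bacon", "actor_y"},
    "series_S": {"actor_x", "actor_y", "actor_z"},
}

def bacon_costars(casts, source="Kevin Bacon"):
    """Actors who share at least one work with the source (the inner subquery)."""
    costars = set()
    for members in casts.values():
        if source in members:
            costars |= members
    return costars - {source}

def matched_by_join(casts, series, source="Kevin Bacon"):
    """Cast members of `series` who share any work with a co-star other than themselves."""
    costars = bacon_costars(casts, source)
    matched = set()
    for actor in casts[series]:
        for members in casts.values():  # the work may be the series itself
            if actor in members and (members - {actor}) & costars:
                matched.add(actor)
                break
    return matched

print(sorted(bacon_costars(CASTS)))               # ['actor_x', 'actor_y']
print(sorted(matched_by_join(CASTS, "series_S")))  # ['actor_x', 'actor_y', 'actor_z']
```

The join matches all three cast members, although `actor_x` and `actor_y` have number 1. The pattern therefore selects actors with number at most 2, not exactly 2.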

I change the approach and count the actors of the two TV series in two separate queries. Starting with The Office(US), I first check whether Kevin Bacon played a role in it.

In [61]:
queryString = """
SELECT DISTINCT ?a
WHERE {

wd:Q23831 wdt:P161 ?a.
   
FILTER(?a = wd:Q3454165).

} 
"""

print("Results")
run_query(queryString)

Results
Empty


0

Since Kevin Bacon did not work on The Office(US), I check whether any actor of The Office(US) has worked with Kevin Bacon in any other film.

In [62]:
queryString = """
SELECT DISTINCT ?a ?actor
WHERE {

?f wdt:P161 ?a;
   wdt:P161 wd:Q3454165.
   
wd:Q23831 wdt:P161 ?a.
   
FILTER(?a != wd:Q3454165).
?a <http://schema.org/name> ?actor.

} 
"""

print("Results")
run_query(queryString)

Results
[('a', 'http://www.wikidata.org/entity/Q216221'), ('actor', 'Steve Carell')]
[('a', 'http://www.wikidata.org/entity/Q349548'), ('actor', 'Rainn Wilson')]


2

Two actors have a Kevin Bacon number equal to 1 because they worked with him directly. Every other cast member worked with these two on The Office(US) but never with Kevin Bacon, so the rest of the cast has a Kevin Bacon number equal to 2. Therefore I can count the actors with Kevin Bacon number 2 by removing from the cast those who worked directly with Kevin Bacon.

In [63]:
queryString = """
SELECT COUNT(DISTINCT ?a) AS ?offActors
WHERE {

    { wd:Q23831 wdt:P161 ?a. }
    MINUS
    {
        ?f wdt:P161 ?a;
           wdt:P161 wd:Q3454165.

        wd:Q23831 wdt:P161 ?a.
        FILTER(?a != wd:Q3454165).
        
    }

}
"""

print("Results")
run_query(queryString)

Results
[('offActors', '23')]


1

I now repeat the same reasoning for HIMYM actors. First I check if Kevin Bacon acted in HIMYM.

In [64]:
queryString = """
SELECT DISTINCT ?a
WHERE {

wd:Q147235 wdt:P161 ?a.
   
FILTER(?a = wd:Q3454165).

} 
"""

print("Results")
run_query(queryString)

Results
Empty


0

Now I check whether any actor who worked in HIMYM played a role in a film together with Kevin Bacon.

In [65]:
queryString = """
SELECT DISTINCT ?a ?actor
WHERE {

?f wdt:P161 ?a;
   wdt:P161 wd:Q3454165.
   
wd:Q147235 wdt:P161 ?a.
   
FILTER(?a != wd:Q3454165).
?a <http://schema.org/name> ?actor.

} 
"""

print("Results")
run_query(queryString)

Results
[('a', 'http://www.wikidata.org/entity/Q269891'), ('actor', 'Julianna Guill')]
[('a', 'http://www.wikidata.org/entity/Q312705'), ('actor', 'John Cho')]
[('a', 'http://www.wikidata.org/entity/Q234715'), ('actor', 'Jamie-Lynn Sigler')]
[('a', 'http://www.wikidata.org/entity/Q311271'), ('actor', 'John Lithgow')]
[('a', 'http://www.wikidata.org/entity/Q234137'), ('actor', 'Megan Mullally')]
[('a', 'http://www.wikidata.org/entity/Q530646'), ('actor', 'Ray Wise')]
[('a', 'http://www.wikidata.org/entity/Q199929'), ('actor', 'Jennifer Morrison')]
[('a', 'http://www.wikidata.org/entity/Q433355'), ('actor', 'Patricia Belcher')]
[('a', 'http://www.wikidata.org/entity/Q234514'), ('actor', 'Camryn Manheim')]
[('a', 'http://www.wikidata.org/entity/Q362616'), ('actor', 'Jon Bernthal')]
[('a', 'http://www.wikidata.org/entity/Q329744'), ('actor', 'Martin Short')]
[('a', 'http://www.wikidata.org/entity/Q5357354'), ('actor', 'Todd Stashwick')]
[('a', 'http://www.wikidata.org/entity/Q782662'), ('a

16

Here 16 actors have a Kevin Bacon number equal to 1 because they worked with him directly. Every other cast member worked with them on HIMYM but never with Kevin Bacon, so the rest of the cast has a Kevin Bacon number equal to 2. Therefore I can count them by removing from the cast the actors who worked directly with Kevin Bacon.

In [66]:
queryString = """
SELECT COUNT(DISTINCT ?a) AS ?himymActors
WHERE {

    { wd:Q147235 wdt:P161 ?a. }
    MINUS
    {
        ?f wdt:P161 ?a;
        wdt:P161 wd:Q3454165.
   
        wd:Q147235 wdt:P161 ?a.
   
        FILTER(?a != wd:Q3454165).

    }

}
"""

print("Results")
run_query(queryString)

Results
[('himymActors', '464')]


1
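
The two MINUS queries are set differences, which can be cross-checked against the earlier counts in plain Python. This is a minimal sketch: `count_bacon_two` is my own helper, and apart from Steve Carell and Rainn Wilson the cast entries are placeholders standing in for the actors returned by the queries.

```python
def count_bacon_two(cast, worked_with_bacon):
    """Cast members left after removing those who worked directly with Kevin Bacon.

    Assumes, as in the text, that at least one cast member has number 1,
    so every remaining cast member reaches Kevin Bacon through the series.
    """
    return len(set(cast) - set(worked_with_bacon))

# The Office(US): the two named actors come from the query above;
# the other 23 entries are placeholders for the rest of the 25 cast members.
office_direct = {"Steve Carell", "Rainn Wilson"}
office_cast = office_direct | {f"office_actor_{i}" for i in range(23)}

print(count_bacon_two(office_cast, office_direct))  # 23
```

The same arithmetic holds for HIMYM: 480 cast members minus 16 direct co-stars gives the 464 returned by the last query.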