# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [WikiData](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained previously.

An exploratory workload usually starts with a vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. You do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP that returns the human-readable name of a property or a class in Wikidata.
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-1e291c28f0-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings']) == 0:
            print("Empty")
            return 0

        for bindings in json_results['results']['bindings']:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(json_results['results']['bindings'])

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        return sparql.query().convert()

    except Exception as e:
        print("The operation failed", e)
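
The SPARQL JSON result format nests each value under `results -> bindings -> variable -> value`, which is the structure that `run_query` walks when printing. As a minimal sketch, a hypothetical helper `bindings_to_rows` (not part of the original setup) could flatten that structure into plain dictionaries for further processing in Python. The sample response is hand-written for illustration and only mimics the shape that `run_query` reads.

```python
def bindings_to_rows(json_results):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    rows = []
    for binding in json_results["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hand-written sample in the SPARQL JSON results shape (illustrative only).
sample = {
    "head": {"vars": ["p", "pname"]},
    "results": {
        "bindings": [
            {
                "p": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P161"},
                "pname": {"type": "literal", "value": "cast member"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```

Returning rows instead of printing them would make it possible to sort, filter or count the results in later cells.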

# Movie Workflow Series ("TV series explorative search")

## Workflow 4


Consider the following exploratory scenario:


> We are interested in the TV series "How I Met Your Mother" and we want to investigate the main aspects related to the actors and directors involved in the production, know the number of seasons, and check which episodes had the highest success/impact.


## Useful URIs for the current workflow
The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P106`    | profession    | predicate | 
| `wdt:P279`    | subclass      | predicate |
| `wdt:P4969`    | derivative work      | predicate |
| `wd:Q147235` | How I met your mother        | node |
| `wd:Q23831` | The Office (US)        | node |



Also consider

```
wd:Q23831 ?p ?obj .
```

is the BGP to retrieve all **properties of The Office (US)**
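
Most exploration steps in this workflow are variations of this pattern, looking either at the outgoing or at the incoming properties of a node. As a rough sketch, the query text can be generated by a small helper; `property_query` and its `direction` parameter are hypothetical names introduced here, and the resulting string would be passed to `run_query` from the setup cell.

```python
def property_query(entity, direction="out"):
    """Build a SPARQL query listing the named properties of an entity.

    direction="out" lists properties where the entity is the subject,
    direction="in" lists properties where the entity is the object.
    """
    if direction == "out":
        pattern = f"{entity} ?p ?obj ."
    elif direction == "in":
        pattern = f"?sub ?p {entity} ."
    else:
        raise ValueError("direction must be 'out' or 'in'")
    return (
        "SELECT DISTINCT ?p ?pname WHERE {\n"
        f"  {pattern}\n"
        "  ?p <http://schema.org/name> ?pname .\n"
        "}"
    )


queryString = property_query("wd:Q23831")
print(queryString)
```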

The workload should


1. Return the number of seasons and episodes per season of the TV series.

2. Get the number of episodes in which the cast members played a role. Who are the most present actors?

3. Find the actor who appeared in the most films while working on "How I Met Your Mother", and the actor who appeared in the most films after the end of the TV series.

4. Compare HIMYM with the TV series "The Office (US)" in terms of number of seasons, episodes and cast members.

5. Return how many of the actors who are members of the cast of the TV series have a [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2.
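
The Bacon number in point 5 is a shortest-path distance in the co-star graph: Kevin Bacon has number 0, anyone who shares a production with him has number 1, and so on. A minimal sketch of that computation with breadth-first search follows. The cast lists are made-up placeholders, not Wikidata results; in the workload the same information would have to come from cast-member triples discovered through SPARQL queries.

```python
from collections import deque


def bacon_numbers(casts, source):
    """Return {actor: distance from source} over the co-star graph.

    casts maps a production name to the list of its cast members.
    """
    # Two actors are neighbours if they appear in the same production.
    neighbours = {}
    for members in casts.values():
        for actor in members:
            neighbours.setdefault(actor, set()).update(
                m for m in members if m != actor
            )

    # Breadth-first search assigns each actor the length of the shortest path.
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in distance:
                distance[other] = distance[current] + 1
                queue.append(other)
    return distance


# Hypothetical toy data (placeholder names, not real filmographies).
toy_casts = {
    "production A": ["Kevin Bacon", "actor 1"],
    "production B": ["actor 1", "actor 2"],
    "production C": ["actor 2", "actor 3"],
}

numbers = bacon_numbers(toy_casts, "Kevin Bacon")
with_number_two = sorted(a for a, n in numbers.items() if n == 2)
print(with_number_two)
```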

## Task 1 - (HOW I MET YOUR MOTHER)

#### Getting all properties of HIMYM

In [25]:
queryString = """
SELECT distinct  ?p ?pname WHERE { 

?sub ?p wd:Q147235.

?p <http://schema.org/name> ?pname .
}
"""
print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1423'), ('pname', 'template has topic')]
[('p', 'http://www.wikidata.org/prop/direct/P138'), ('pname', 'named after')]
[('p', 'http://www.wikidata.org/prop/direct/P144'), ('pname', 'based on')]
[('p', 'http://www.wikidata.org/prop/direct/P1441'), ('pname', 'present in work')]
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pname', 'part of the series')]
[('p', 'http://www.wikidata.org/prop/direct/P301'), ('pname', "category's main topic")]
[('p', 'http://www.wikidata.org/prop/direct/P800'), ('pname', 'notable work')]
[('p', 'http://www.wikidata.org/prop/direct/P971'), ('pname', 'category combines topics')]


8

In [8]:
queryString = """
SELECT distinct  ?p ?pname WHERE { 

wd:Q147235 ?p ?obj.

# this returns the labels
?p <http://schema.org/name> ?pname .
} 
"""
print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P2438'), ('pname', 'narrator')]
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('pname', 'number of episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pname', 'Rotten Tomatoes ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1267'), ('pname', 'AlloCiné series ID')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('pname', 'genre')]
[('p', 'http://www.wikidata.org/prop/direct/P138'), ('pname', 'named after')]
[('p', 'http://www.wikidata.org/prop/direct/P1411'), ('pname', 'nominated for')]
[('p', 'http://www.wikidata.org/prop/direct/P1424'), ('pname', "topic's main template")]
[('p', 'http://www.wikidata.org/prop/direct/P1476'), ('pname', 'title')]
[('p', 'http://www.wikidata.org/prop/direct/P154'), ('pname', 'logo image')]
[('p', 'http://www.wikidata.org/prop/direct/P1562'), ('pname', 'AllMovie title ID')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pname', 'cast member')]
[('p', 'http://www.wikida

88

#### Get the properties of an episode of the series (via "part of the series", wdt:P179)

In [7]:
queryString = """
SELECT distinct ?p ?pname (COUNT(*) AS ?number) WHERE { 

?sub wdt:P179 wd:Q147235;
     ?p ?obj.

?p <http://schema.org/name> ?pname .
} 
GROUP BY ?p ?pname
ORDER BY DESC (?number)
"""
print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pname', 'cast member'), ('number', '1331')]
[('p', 'http://www.wikidata.org/prop/direct/P58'), ('pname', 'screenwriter'), ('number', '335')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of'), ('number', '220')]
[('p', 'http://www.wikidata.org/prop/direct/P364'), ('pname', 'original language of film or TV show'), ('number', '220')]
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pname', 'part of the series'), ('number', '220')]
[('p', 'http://www.wikidata.org/prop/direct/P2704'), ('pname', 'EIDR content ID'), ('number', '218')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pname', 'Rotten Tomatoes ID'), ('number', '217')]
[('p', 'http://www.wikidata.org/prop/direct/P3302'), ('pname', 'Open Media Database film ID'), ('number', '217')]
[('p', 'http://www.wikidata.org/prop/direct/P2638'), ('pname', 'TV.com ID'), ('number', '217')]
[('p', 'http://www.wikidata.org/prop/direct/P2529'), ('pna

50

#### Getting all directors of the series

In [None]:
queryString = """
SELECT distinct ?directorname WHERE { 

?sub wdt:P179 wd:Q147235;
     wdt:P57 ?director.

?director <http://schema.org/name> ?directorname .
} 
"""
print("Results")
run_query(queryString)

#### Get the number of seasons

In [2]:
queryString = """
SELECT distinct ?seasons WHERE { 

wd:Q147235 wdt:P2437 ?seasons.
} 
"""
print("Results")
run_query(queryString)

Results
[('seasons', '9')]


1

#### I retrieve the seasons of HIMYM with "has part" (wdt:P527)

In [3]:
queryString = """
SELECT distinct ?seasons
WHERE { 

wd:Q147235 wdt:P527 ?seasons.
} 
"""
print("Results")
run_query(queryString)

Results
[('seasons', 'http://www.wikidata.org/entity/Q2438066')]
[('seasons', 'http://www.wikidata.org/entity/Q2715578')]
[('seasons', 'http://www.wikidata.org/entity/Q13567027')]
[('seasons', 'http://www.wikidata.org/entity/Q2567330')]
[('seasons', 'http://www.wikidata.org/entity/Q338715')]
[('seasons', 'http://www.wikidata.org/entity/Q582332')]
[('seasons', 'http://www.wikidata.org/entity/Q3468515')]
[('seasons', 'http://www.wikidata.org/entity/Q2555117')]
[('seasons', 'http://www.wikidata.org/entity/Q2472427')]


9

#### ...and I inspect their properties

In [11]:
queryString = """
SELECT distinct ?p ?pname
WHERE { 

wd:Q147235 wdt:P527 ?seasons.
?seasons ?p ?obj.

?p <http://schema.org/name> ?pname .
} 
"""
print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('pname', 'number of episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pname', 'Rotten Tomatoes ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1712'), ('pname', 'Metacritic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pname', 'part of the series')]
[('p', 'http://www.wikidata.org/prop/direct/P2529'), ('pname', 'ČSFD film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2581'), ('pname', 'BabelNet ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2638'), ('pname', 'TV.com ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2704'), ('pname', 'EIDR content ID')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P3302'), ('pname', 'Open Media Database film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P364'), ('pname', 'original language of film or TV show')]
[('p', 'http://www.wikidata.org/prop/direct/P437'), ('pname'

21

#### Found "number of episodes" (wdt:P1113)

In [12]:
queryString = """
SELECT distinct ?season ?number
WHERE { 

wd:Q147235 wdt:P527 ?s.
?s wdt:P1113 ?number.

?s <http://schema.org/name> ?season .
} 

"""
print("Results")
run_query(queryString)

Results
[('season', 'How I Met Your Mother, season 6'), ('number', '24')]
[('season', 'How I Met Your Mother, season 1'), ('number', '22')]
[('season', 'How I Met Your Mother, season 9'), ('number', '24')]
[('season', 'How I Met Your Mother, season 4'), ('number', '24')]
[('season', 'How I Met Your Mother, season 8'), ('number', '24')]
[('season', 'How I Met Your Mother, season 5'), ('number', '24')]
[('season', 'How I Met Your Mother, season 2'), ('number', '22')]
[('season', 'How I Met Your Mother, season 3'), ('number', '20')]
[('season', 'How I Met Your Mother, season 7'), ('number', '24')]


9
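
The rows above come back in no particular order. As a small sketch, the season labels can be sorted by their trailing number in Python; the pairs are copied from the output above, and `season_number` is a hypothetical helper that assumes every label ends with the season number.

```python
# (label, number of episodes) pairs copied from the query output above.
rows = [
    ("How I Met Your Mother, season 6", 24),
    ("How I Met Your Mother, season 1", 22),
    ("How I Met Your Mother, season 9", 24),
    ("How I Met Your Mother, season 4", 24),
    ("How I Met Your Mother, season 8", 24),
    ("How I Met Your Mother, season 5", 24),
    ("How I Met Your Mother, season 2", 22),
    ("How I Met Your Mother, season 3", 20),
    ("How I Met Your Mother, season 7", 24),
]


def season_number(label):
    """Extract the trailing season number from a label like '..., season 6'."""
    return int(label.rsplit(" ", 1)[1])


ordered = sorted(rows, key=lambda row: season_number(row[0]))
for label, episodes in ordered:
    print(season_number(label), episodes)
```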

In [4]:
queryString = """
SELECT distinct ?actor ?actorname WHERE { 

wd:Q147235 wdt:P161 ?actor.

?actor <http://schema.org/name> ?actorname .
} 

"""
print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q465556'), ('actorname', 'Annie Ilonzeh')]
[('actor', 'http://www.wikidata.org/entity/Q269891'), ('actorname', 'Julianna Guill')]
[('actor', 'http://www.wikidata.org/entity/Q4888924'), ('actorname', 'Benjamin Koldyke')]
[('actor', 'http://www.wikidata.org/entity/Q446031'), ('actorname', 'Nikki Griffin')]
[('actor', 'http://www.wikidata.org/entity/Q312705'), ('actorname', 'John Cho')]
[('actor', 'http://www.wikidata.org/entity/Q469579'), ('actorname', 'Mircea Monroe')]
[('actor', 'http://www.wikidata.org/entity/Q516659'), ('actorname', 'Virginia Williams')]
[('actor', 'http://www.wikidata.org/entity/Q522856'), ('actorname', 'Kate Micucci')]
[('actor', 'http://www.wikidata.org/entity/Q200566'), ('actorname', 'Cobie Smulders')]
[('actor', 'http://www.wikidata.org/entity/Q435839'), ('actorname', 'Ashley Williams')]
[('actor', 'http://www.wikidata.org/entity/Q446481'), ('actorname', 'Harvey Fierstein')]
[('actor', 'http://www.wikidata.org/e

480

In [None]:
# Missing: famous episodes

## Task 1 - (The Office)

### Get all properties of The Office

In [None]:
queryString = """
SELECT distinct  ?p ?pname WHERE { 

wd:Q23831 ?p ?obj .

?p <http://schema.org/name> ?pname .
} 
"""

print("Results")
run_query(queryString)

In [None]:
queryString = """
SELECT distinct  ?p ?pname WHERE { 

?sub ?p wd:Q23831 .

?p <http://schema.org/name> ?pname.
} 
"""
print("Results")
run_query(queryString)

#### Searching for all seasons: first try with "part of the series"

In [5]:
queryString = """
SELECT distinct ?sub ?subname
WHERE { 

?sub wdt:P179 wd:Q23831 .

?sub <http://schema.org/name> ?subname .
} 
"""

print("Results")
run_query(queryString)

Results
[('sub', 'http://www.wikidata.org/entity/Q50379836'), ('subname', 'Classy Christmas (part 1)')]
[('sub', 'http://www.wikidata.org/entity/Q50379837'), ('subname', 'Classy Christmas (part 2)')]
[('sub', 'http://www.wikidata.org/entity/Q5099551'), ('subname', 'China')]
[('sub', 'http://www.wikidata.org/entity/Q5128465'), ('subname', 'Classy Christmas')]
[('sub', 'http://www.wikidata.org/entity/Q5178024'), ('subname', 'Couples Discount')]
[('sub', 'http://www.wikidata.org/entity/Q6927074'), ('subname', 'Moving On')]
[('sub', 'http://www.wikidata.org/entity/Q7880294'), ('subname', 'Ultimatum')]
[('sub', 'http://www.wikidata.org/entity/Q7914333'), ('subname', 'Vandalism')]
[('sub', 'http://www.wikidata.org/entity/Q3465812'), ('subname', 'The Office, season 1')]
[('sub', 'http://www.wikidata.org/entity/Q3468601'), ('subname', 'The Office, season 2')]
[('sub', 'http://www.wikidata.org/entity/Q3468797'), ('subname', 'The Office, season 3')]
[('sub', 'http://www.wikidata.org/entity/Q3468

226

#### Too much data: I try the other direction, from The Office via "has part" (wdt:P527)

In [6]:
queryString = """
SELECT distinct  ?obj ?objname WHERE { 

wd:Q23831 wdt:P527 ?obj .

?obj <http://schema.org/name> ?objname .
} 

"""
print("Results")
run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q3465812'), ('objname', 'The Office, season 1')]
[('obj', 'http://www.wikidata.org/entity/Q3468601'), ('objname', 'The Office, season 2')]
[('obj', 'http://www.wikidata.org/entity/Q3468797'), ('objname', 'The Office, season 3')]
[('obj', 'http://www.wikidata.org/entity/Q3468906'), ('objname', 'The Office, season 4')]
[('obj', 'http://www.wikidata.org/entity/Q3468990'), ('objname', 'The Office, season 5')]
[('obj', 'http://www.wikidata.org/entity/Q3730253'), ('objname', 'The Office, season 9')]
[('obj', 'http://www.wikidata.org/entity/Q3730255'), ('objname', 'The Office, season 8')]
[('obj', 'http://www.wikidata.org/entity/Q3730261'), ('objname', 'The Office, season 6')]
[('obj', 'http://www.wikidata.org/entity/Q3730263'), ('objname', 'The Office, season 7')]


9

#### Found! Now I display all the properties of a season

In [None]:
queryString = """
SELECT distinct  ?p ?pname WHERE { 

wd:Q3465812 ?p ?obj .

?p <http://schema.org/name> ?pname .
} 

"""
print("Results")
run_query(queryString)

#### I found the property "number of episodes"; now I combine everything to get the number of episodes for each season

In [13]:
queryString = """
SELECT distinct  ?seasonname ?number WHERE { 

wd:Q23831 wdt:P527 ?season.
?season wdt:P1113 ?number.

?season <http://schema.org/name> ?seasonname.
} 

"""
print("Results")
run_query(queryString)

Results
[('seasonname', 'The Office, season 1'), ('number', '6')]
[('seasonname', 'The Office, season 2'), ('number', '22')]
[('seasonname', 'The Office, season 3'), ('number', '25')]
[('seasonname', 'The Office, season 4'), ('number', '19')]
[('seasonname', 'The Office, season 5'), ('number', '28')]
[('seasonname', 'The Office, season 9'), ('number', '25')]
[('seasonname', 'The Office, season 8'), ('number', '24')]
[('seasonname', 'The Office, season 6'), ('number', '26')]
[('seasonname', 'The Office, season 7'), ('number', '26')]


9

## Task 2 (The Office)

#### Try to get list of all episodes

In [3]:
queryString = """
SELECT distinct  ?episodename WHERE { 

wd:Q23831 wdt:P527 ?season.
?season wdt:P527 ?episode.

?episode <http://schema.org/name> ?episodename.
} 

"""
print("Results")
run_query(queryString)

Results
[('episodename', 'Classy Christmas (part 1)')]
[('episodename', 'Classy Christmas (part 2)')]
[('episodename', 'China')]
[('episodename', 'Couples Discount')]
[('episodename', 'Moving On')]
[('episodename', 'Ultimatum')]
[('episodename', 'Vandalism')]
[('episodename', 'Baby Shower')]
[('episodename', 'Business Ethics')]
[('episodename', 'Crime Aid')]
[('episodename', 'Branch Closing')]
[('episodename', 'Diwali')]
[('episodename', 'The Merger')]
[('episodename', 'Cafe Disco')]
[('episodename', 'Casual Friday')]
[('episodename', 'Company Picnic')]
[('episodename', 'Job Fair')]
[('episodename', 'Goodbye, Toby (part 1)')]
[('episodename', 'Goodbye, Toby (part 2)')]
[('episodename', 'Goodbye, Michael')]
[('episodename', 'Gossip')]
[('episodename', "Michael's Last Dundies")]
[('episodename', 'The Meeting')]
[('episodename', 'Training Day')]
[('episodename', 'Doomsday')]
[('episodename', 'Gettysburg')]
[('episodename', "Pam's Replacement")]
[('episodename', 'Tallahassee')]
[('episoden

201

#### Good, now retrieve all properties of an episode

In [None]:
queryString = """
SELECT distinct  ?p  ?pname WHERE { 

wd:Q23831 wdt:P527 ?season.
?season wdt:P527 ?episode.
?episode ?p ?obj.

?p <http://schema.org/name> ?pname.
} 

"""
print("Results")
run_query(queryString)

#### Found "cast member"

In [4]:
queryString = """
SELECT distinct ?actorname WHERE { 

wd:Q23831 wdt:P527 ?season.
?season wdt:P527 ?episode.
?episode wdt:P161 ?actor.

?actor <http://schema.org/name> ?actorname.
} 

"""
print("Results")
run_query(queryString)

Results
[('actorname', 'Jenna Fischer')]
[('actorname', 'Catherine Tate')]
[('actorname', 'John Krasinski')]
[('actorname', 'Ed Helms')]
[('actorname', 'Rainn Wilson')]
[('actorname', 'Ellie Kemper')]
[('actorname', 'Craig Robinson')]


7

#### Very little data: I check the incoming properties of the episodes

In [7]:
queryString = """
SELECT distinct  ?p  ?pname WHERE { 

wd:Q23831 wdt:P527 ?season.
?season wdt:P527 ?episode.
?sub ?p ?episode.

# this returns the labels
?p <http://schema.org/name> ?pname.
} 

"""
print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P155'), ('pname', 'follows')]
[('p', 'http://www.wikidata.org/prop/direct/P156'), ('pname', 'followed by')]
[('p', 'http://www.wikidata.org/prop/direct/P1889'), ('pname', 'different from')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P527'), ('pname', 'has part')]
[('p', 'http://www.wikidata.org/prop/direct/P921'), ('pname', 'main subject')]


6

#### I cannot find the cast for each episode, so it is not possible to count appearances; I just return the main cast of the series

In [None]:
queryString = """
SELECT distinct  ?actorname WHERE { 

wd:Q23831 wdt:P161 ?actor .

# this returns the labels
?actor <http://schema.org/name> ?actorname .

} 

"""

print("Results")
run_query(queryString)

## Task 2 (HIMYM)

#### I reuse a query from Task 1 to get the cast members of the episodes of the series

In [15]:
queryString = """
SELECT distinct ?actorname WHERE { 

?sub wdt:P179 wd:Q147235;
     wdt:P161 ?actor.

?actor <http://schema.org/name> ?actorname.
} limit 10

"""
print("Results")
run_query(queryString)

Results
[('actorname', 'Annie Ilonzeh')]
[('actorname', 'Julianna Guill')]
[('actorname', 'Nikki Griffin')]
[('actorname', 'John Cho')]
[('actorname', 'Virginia Williams')]
[('actorname', 'Kate Micucci')]
[('actorname', 'Cobie Smulders')]
[('actorname', 'Ashley Williams')]
[('actorname', 'Harvey Fierstein')]
[('actorname', 'Robbie Amell')]


10

#### Now I remove the "distinct" to count each appearance of an actor in the episodes of the series

In [17]:
queryString = """
SELECT ?actor ?actorname (COUNT(*) as ?appearance) WHERE { 

?sub wdt:P179 wd:Q147235;
     wdt:P161 ?actor.

?actor <http://schema.org/name> ?actorname.
}
GROUP BY ?actor ?actorname
ORDER BY DESC (?appearance)

"""
print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q485310'), ('actorname', 'Neil Patrick Harris'), ('appearance', '145')]
[('actor', 'http://www.wikidata.org/entity/Q200566'), ('actorname', 'Cobie Smulders'), ('appearance', '145')]
[('actor', 'http://www.wikidata.org/entity/Q202304'), ('actorname', 'Jason Segel'), ('appearance', '145')]
[('actor', 'http://www.wikidata.org/entity/Q223455'), ('actorname', 'Josh Radnor'), ('appearance', '145')]
[('actor', 'http://www.wikidata.org/entity/Q199927'), ('actorname', 'Alyson Hannigan'), ('appearance', '143')]
[('actor', 'http://www.wikidata.org/entity/Q333544'), ('actorname', 'Bob Saget'), ('appearance', '142')]
[('actor', 'http://www.wikidata.org/entity/Q229914'), ('actorname', 'Lyndsy Fonseca'), ('appearance', '48')]
[('actor', 'http://www.wikidata.org/entity/Q297128'), ('actorname', 'David Henrie'), ('appearance', '48')]
[('actor', 'http://www.wikidata.org/entity/Q16149506'), ('actorname', 'Charlene Amoia'), ('appearance', '17')]
[('actor',

214

#### The cast is not reported for every episode of the series: there are around 200 episodes, while the highest appearance count is 145. However, the most recurrent actors are the 5 who belong to the main cast
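
To put a number on that gap, the highest appearance count can be compared with the total number of episodes obtained in Task 1. This is only a back-of-the-envelope check; the figures are copied from the outputs of this notebook and the variable names are illustrative.

```python
# Episodes per season for HIMYM (seasons 1 to 9), from the Task 1 output.
episodes_per_season = [22, 22, 20, 24, 24, 24, 24, 24, 24]
total_episodes = sum(episodes_per_season)

# Highest appearance count returned by the cast query above.
max_appearances = 145

# Share of episodes in which the most present actors are listed as cast.
coverage = max_appearances / total_episodes
print(total_episodes, f"{coverage:.0%}")
```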

## Task 3 - Actors that worked on movies while on HIMYM and after

#### Get all properties of the actors

In [2]:
queryString = """
SELECT distinct ?p ?pname WHERE { 

wd:Q147235 wdt:P161 ?actor.
?actor ?p ?obj.

# this returns the labels
?p <http://schema.org/name> ?pname.
} 

"""
print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P7837'), ('pname', 'Nederlands Fotomuseum photographer ID')]
[('p', 'http://www.wikidata.org/prop/direct/P7836'), ('pname', 'Livelib.ru person ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1005'), ('pname', 'Portuguese National Library ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1006'), ('pname', 'Nationale Thesaurus voor Auteurs ID')]
[('p', 'http://www.wikidata.org/prop/direct/P101'), ('pname', 'field of work')]
[('p', 'http://www.wikidata.org/prop/direct/P1015'), ('pname', 'NORAF ID')]
[('p', 'http://www.wikidata.org/prop/direct/P102'), ('pname', 'member of political party')]
[('p', 'http://www.wikidata.org/prop/direct/P103'), ('pname', 'native language')]
[('p', 'http://www.wikidata.org/prop/direct/P1038'), ('pname', 'relative')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P109'), ('pname', 'signature')]
[('p', 'http://www.wikidata.org/prop/di

606

#### Trying a subcase with a single actor (Josh Radnor, wd:Q223455, the main protagonist)

In [14]:
queryString = """
SELECT distinct ?p ?pname WHERE { 

wd:Q223455 ?p ?actor.

?p <http://schema.org/name> ?pname .
} 

"""
print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1006'), ('pname', 'Nationale Thesaurus voor Auteurs ID')]
[('p', 'http://www.wikidata.org/prop/direct/P102'), ('pname', 'member of political party')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P1220'), ('pname', 'Internet Broadway Database person ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1263'), ('pname', 'NNDB people ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1266'), ('pname', 'AlloCiné person ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('pname', 'described by source')]
[('p', 'http://www.wikidata.org/prop/direct/P140'), ('pname', 'religion')]
[('p', 'http://www.wikidata.org/prop/direct/P1695'), ('pname', 'NLP ID (unique)')]
[('p', 'http://www.wikidata.org/prop/direct/P18'), ('pname', 'image')]
[('p', 'http://www.wikidata.org/prop/direct/P19'), ('pname', 'place of birth')]
[('p', 'http://www.wikidata.org/prop/direct/P1969'),

53

In [23]:
queryString = """
SELECT distinct ?p ?pname WHERE { 

?sub ?p wd:Q223455.

?p <http://schema.org/name> ?pname .
} 

"""
print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pname', 'cast member')]
[('p', 'http://www.wikidata.org/prop/direct/P175'), ('pname', 'performer')]
[('p', 'http://www.wikidata.org/prop/direct/P57'), ('pname', 'director')]
[('p', 'http://www.wikidata.org/prop/direct/P58'), ('pname', 'screenwriter')]
[('p', 'http://www.wikidata.org/prop/direct/P971'), ('pname', 'category combines topics')]


5

#### Test: trying to get all the actors who played a role in HIMYM in 2008

#### Get the release date of an episode with "publication date" (wdt:P577)

In [28]:
# Publication Date P577

queryString = """
SELECT distinct ?date
WHERE { 

?sub wdt:P179 wd:Q147235;
     wdt:P577 ?date   

} 
limit 1
"""

print("Results")
run_query(queryString)

Results
[('date', '2011-02-14T00:00:00Z')]


1
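
The date comes back as a timestamp string, so the year is simply its first four characters; the SPARQL filters in the following cells rely on the same idea through `SUBSTR`. A minimal Python sketch of the equivalent extraction, using the value printed above:

```python
from datetime import datetime

date_value = "2011-02-14T00:00:00Z"  # value returned by the query above

# Option 1: take the first four characters, like SUBSTR(xsd:string(?date), 1, 4).
year_from_prefix = date_value[:4]

# Option 2: parse the full timestamp and read the year as an integer.
parsed = datetime.strptime(date_value, "%Y-%m-%dT%H:%M:%SZ")

print(year_from_prefix, parsed.year)
```

The string prefix is enough for equality checks on the year, while the parsed value is more convenient for before/after comparisons.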

#### For example, I get the episodes of the series released in 2008

In [18]:
# Publication Date P577
# Cast member P161
queryString = """
SELECT distinct ?name
WHERE { 

?sub wdt:P179 wd:Q147235;
     wdt:P577 ?date   

filter (SUBSTR(xsd:string(?date),1,4)="2008")

# this returns the labels
?sub <http://schema.org/name> ?name .
} 

"""

print("Results")
run_query(queryString)

Results
[('name', 'Do I Know You?')]
[('name', 'I Heart NJ')]
[('name', 'The Best Burger in New York')]
[('name', 'Little Minnesota')]
[('name', 'Happily Ever After')]
[('name', "Not a Father's Day")]
[('name', 'Woooo!')]
[('name', 'Intervention')]
[('name', 'Shelter Island')]
[('name', 'The Fight')]
[('name', 'The Naked Man')]
[('name', 'Everything Must Go')]
[('name', 'Miracles')]
[('name', 'No Tomorrow')]
[('name', 'Ten Sessions')]
[('name', 'The Bracket')]
[('name', 'The Chain of Screaming')]
[('name', 'Sandcastles in the Sand')]
[('name', 'Rebound Bro')]
[('name', 'The Goat')]


20

#### ...and get the actors

In [45]:
# Publication Date P577
# Cast member P161
queryString = """
SELECT distinct ?name
WHERE { 

?sub wdt:P179 wd:Q147235;
     wdt:P577 ?date;
     wdt:P161 ?actor

filter (SUBSTR(xsd:string(?date),1,4)="2008")

# this returns the labels
?actor <http://schema.org/name> ?name .
} 
limit 5
"""

print("Results")
run_query(queryString)

Results
[('name', 'Virginia Williams')]
[('name', 'Cobie Smulders')]
[('name', 'Jamie-Lynn Sigler')]
[('name', 'Britney Spears')]
[('name', 'Sarah Chalke')]


5

#### Now I need to know the class of movies, so I inspect the productions in which the actress "Cobie Smulders" (wd:Q200566) played a role

In [19]:
# Cast member P161
queryString = """
SELECT DISTINCT ?prod ?prodname WHERE{

?prod wdt:P161 wd:Q200566.
?prod <http://schema.org/name> ?prodname .
}
"""
print("Results")
run_query(queryString)


Results
[('prod', 'http://www.wikidata.org/entity/Q65070140'), ('prodname', 'Stumptown')]
[('prod', 'http://www.wikidata.org/entity/Q51963292'), ('prodname', 'Marvel Cinematic Universe Phase One')]
[('prod', 'http://www.wikidata.org/entity/Q63405798'), ('prodname', 'The Infinity Saga')]
[('prod', 'http://www.wikidata.org/entity/Q147235'), ('prodname', 'How I Met Your Mother')]
[('prod', 'http://www.wikidata.org/entity/Q5264968'), ('prodname', 'Desperation Day')]
[('prod', 'http://www.wikidata.org/entity/Q5521981'), ('prodname', 'Garbage Island')]
[('prod', 'http://www.wikidata.org/entity/Q7080458'), ('prodname', 'Oh Honey')]
[('prod', 'http://www.wikidata.org/entity/Q5888848'), ('prodname', 'Home Wreckers')]
[('prod', 'http://www.wikidata.org/entity/Q7858076'), ('prodname', 'Twin Beds')]
[('prod', 'http://www.wikidata.org/entity/Q8074081'), ('prodname', 'Zoo or False')]
[('prod', 'http://www.wikidata.org/entity/Q17113214'), ('prodname', 'They Came Together')]
[('prod', 'http://www.wiki

173

#### I inspect "The Avengers", which I know is a movie, and print all its "instance of" values

In [21]:
# Instance of P31
queryString = """
SELECT DISTINCT ?obj ?objname WHERE{

wd:Q182218 wdt:P31 ?obj.
?obj <http://schema.org/name> ?objname .
}
"""
print("Results")
run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q11424'), ('objname', 'film')]
[('obj', 'http://www.wikidata.org/entity/Q229390'), ('objname', '3D film')]


2

#### I count all entities that are instances of "film" to check whether we have data

In [23]:
# Instance of P31
queryString = """
SELECT DISTINCT (COUNT(?sub) AS ?number) WHERE{

?sub wdt:P31 wd:Q11424.

}
"""
print("Results")
run_query(queryString)

Results
[('number', '263260')]


1

#### Good amount of movies

#### Next step: retrieve and count all the movies the actors worked on in the years in which they played a role in HIMYM

In [25]:
# Publication Date P577
# Cast member P161
#I use instance of "film" to pick movies (P31)

queryString = """
SELECT distinct ?name (count(distinct ?movie) as ?number) WHERE { 

?sub wdt:P179 wd:Q147235;
     wdt:P577 ?date;
     wdt:P161 ?actor.

?movie wdt:P31 wd:Q11424; 
       wdt:P577 ?date2;
       wdt:P161 ?actor.
filter (SUBSTR(xsd:string(?date),1,4) = SUBSTR(xsd:string(?date2),1,4) ) #Substring to compare only the year


?actor <http://schema.org/name> ?name .
?movie <http://schema.org/name> ?moviename .
} 
group by(?name)
order by desc (?number)
limit 10
"""

print("Results")
run_query(queryString)

Results
[('name', 'Jason Segel'), ('number', '11')]
[('name', 'Neil Patrick Harris'), ('number', '10')]
[('name', 'Danny Glover'), ('number', '7')]
[('name', 'Frances Conroy'), ('number', '7')]
[('name', 'Marshall Manesh'), ('number', '6')]
[('name', 'Lyndsy Fonseca'), ('number', '5')]
[('name', 'Cobie Smulders'), ('number', '5')]
[('name', 'Bill Fagerbakke'), ('number', '5')]
[('name', 'Judy Greer'), ('number', '5')]
[('name', 'John Cho'), ('number', '4')]


10
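
The query above joins episodes and films on the actor and keeps only the pairs released in the same year. The same logic can be sketched in plain Python; the records below are hypothetical placeholders, not Wikidata results, and only show how the year-based join and the distinct count interact.

```python
from collections import defaultdict

# Hypothetical (actor, release date) pairs for episodes of the series.
episode_appearances = [
    ("actor A", "2008-01-10T00:00:00Z"),
    ("actor A", "2009-03-02T00:00:00Z"),
    ("actor B", "2008-05-19T00:00:00Z"),
]

# Hypothetical (actor, film, release date) triples for films.
film_appearances = [
    ("actor A", "film 1", "2008-07-01T00:00:00Z"),
    ("actor A", "film 2", "2009-11-20T00:00:00Z"),
    ("actor A", "film 3", "2012-02-14T00:00:00Z"),  # no episode in that year
    ("actor B", "film 1", "2008-07-01T00:00:00Z"),
]

# Years in which each actor appeared in at least one episode.
active_years = defaultdict(set)
for actor, date in episode_appearances:
    active_years[actor].add(date[:4])

# Distinct films per actor released in one of those years.
films_while_on_series = defaultdict(set)
for actor, film, date in film_appearances:
    if date[:4] in active_years[actor]:
        films_while_on_series[actor].add(film)

counts = {actor: len(films) for actor, films in films_while_on_series.items()}
ranking = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

Collecting films in a set mirrors `count(distinct ?movie)`: a film matched through several episodes of the same year is still counted once.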

#### Second Part of the task: get movies after end of series

In [6]:
# I use the "end time" property of the series found at the beginning. Then I keep only the movies released after the end of the series

# endtime P582
# Publication Date P577
# Cast member P161


queryString = """

SELECT DISTINCT ?actor ?name (COUNT(DISTINCT ?movie) AS ?number) WHERE{

{
#Query 1:Get All Pairs Actor-Movie
SELECT DISTINCT ?actor ?movie ?date2 WHERE { 

    ?sub wdt:P179 wd:Q147235;
         wdt:P161 ?actor.

    ?movie wdt:P31 wd:Q11424; 
         wdt:P577 ?date2;
         wdt:P161 ?actor.

    } 
}
{
#Query 2: Get end date of the series
SELECT DISTINCT ?date  WHERE {
    wd:Q147235 wdt:P582 ?date.
    }
}

?actor <http://schema.org/name> ?name .

FILTER(?date2 > ?date)

}

GROUP BY ?actor ?name
ORDER BY DESC (?number)
LIMIT 10

"""

print("Results")
run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q192165'), ('name', 'Danny Glover'), ('number', '14')]
[('actor', 'http://www.wikidata.org/entity/Q236189'), ('name', 'Judy Greer'), ('number', '12')]
[('actor', 'http://www.wikidata.org/entity/Q566037'), ('name', 'Scoot McNairy'), ('number', '12')]
[('actor', 'http://www.wikidata.org/entity/Q200566'), ('name', 'Cobie Smulders'), ('number', '11')]
[('actor', 'http://www.wikidata.org/entity/Q311271'), ('name', 'John Lithgow'), ('number', '11')]
[('actor', 'http://www.wikidata.org/entity/Q312705'), ('name', 'John Cho'), ('number', '8')]
[('actor', 'http://www.wikidata.org/entity/Q231091'), ('name', 'Morena Baccarin'), ('number', '7')]
[('actor', 'http://www.wikidata.org/entity/Q310322'), ('name', 'Will Sasso'), ('number', '7')]
[('actor', 'http://www.wikidata.org/entity/Q1319744'), ('name', 'Will Forte'), ('number', '7')]
[('actor', 'http://www.wikidata.org/entity/Q362616'), ('name', 'Jon Bernthal'), ('number', '7')]


10

## Task 4 - Compare HIMYM with the TV series "The Office (US)" in terms of number of seasons, episodes and cast members

#### I repeat the two queries from Task 1 to get the number of episodes of each season of the two series

In [13]:
queryString = """
SELECT distinct ?season ?number
WHERE { 

wd:Q147235 wdt:P527 ?s.
?s wdt:P1113 ?number.

?s <http://schema.org/name> ?season .
} 

"""
print("Results")
run_query(queryString)

Results
[('season', 'How I Met Your Mother, season 6'), ('number', '24')]
[('season', 'How I Met Your Mother, season 1'), ('number', '22')]
[('season', 'How I Met Your Mother, season 9'), ('number', '24')]
[('season', 'How I Met Your Mother, season 4'), ('number', '24')]
[('season', 'How I Met Your Mother, season 8'), ('number', '24')]
[('season', 'How I Met Your Mother, season 5'), ('number', '24')]
[('season', 'How I Met Your Mother, season 2'), ('number', '22')]
[('season', 'How I Met Your Mother, season 3'), ('number', '20')]
[('season', 'How I Met Your Mother, season 7'), ('number', '24')]


9

#### Same query for The Office

In [14]:
queryString = """
SELECT distinct  ?seasonname ?number
WHERE { 

wd:Q23831 wdt:P527 ?season.
?season wdt:P1113 ?number.

?season <http://schema.org/name> ?seasonname.
} 

"""
print("Results")
run_query(queryString)

Results
[('seasonname', 'The Office, season 1'), ('number', '6')]
[('seasonname', 'The Office, season 2'), ('number', '22')]
[('seasonname', 'The Office, season 3'), ('number', '25')]
[('seasonname', 'The Office, season 4'), ('number', '19')]
[('seasonname', 'The Office, season 5'), ('number', '28')]
[('seasonname', 'The Office, season 9'), ('number', '25')]
[('seasonname', 'The Office, season 8'), ('number', '24')]
[('seasonname', 'The Office, season 6'), ('number', '26')]
[('seasonname', 'The Office, season 7'), ('number', '26')]


9

#### Both series clearly have the same number of seasons (9). Now I want to check whether HIMYM has more episodes than The Office, so I combine the two queries into a single ASK query

In [27]:
queryString = """
ASK WHERE{

{
SELECT distinct (SUM(?number) as ?himym) WHERE { 

wd:Q147235 wdt:P527 ?s.
?s wdt:P1113 ?number.
} 
}
{

SELECT distinct (SUM(?number) as ?theoffice) WHERE { 

wd:Q23831 wdt:P527 ?season.
?season wdt:P1113 ?number.
} 
}

FILTER(?himym>?theoffice)
}
"""
print("Results")
run_ask_query(queryString)

Results


{'head': {'link': []}, 'boolean': True}

#### So HIMYM has more episodes than The Office
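The ASK result can be cross-checked offline by summing the per-season counts printed by the two queries above. This is only a sanity check in plain Python: the numbers are copied by hand from the two result listings, and the variable names are illustrative.

```python
# Episode counts per season, copied from the two result listings above.
himym_episodes = {1: 22, 2: 22, 3: 20, 4: 24, 5: 24, 6: 24, 7: 24, 8: 24, 9: 24}
office_episodes = {1: 6, 2: 22, 3: 25, 4: 19, 5: 28, 6: 26, 7: 26, 8: 24, 9: 25}

himym_total = sum(himym_episodes.values())
office_total = sum(office_episodes.values())

print("HIMYM:", himym_total, "episodes in", len(himym_episodes), "seasons")
print("The Office:", office_total, "episodes in", len(office_episodes), "seasons")
print("HIMYM has more episodes:", himym_total > office_total)
```

With these inputs the totals are 208 and 201, which agrees with the `True` returned by the ASK query.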

#### Let's see whether any actor played a role in both series

In [15]:
queryString = """
SELECT distinct ?actorname WHERE { 

wd:Q23831 wdt:P161 ?actor.

?sub wdt:P179 wd:Q147235;
     wdt:P161 ?actor.
     
?actor <http://schema.org/name> ?actorname .

} 

"""

print("Results")
run_query(queryString)

Results
Empty


0

#### Let's count the HIMYM cast members

In [18]:
queryString = """
SELECT distinct (count( distinct ?actor) as ?number)
WHERE { 

?sub wdt:P179 wd:Q147235;
     wdt:P161 ?actor.
} 
"""
print("Results")
run_query(queryString)

Results
[('number', '214')]


1

#### And The Office. I recall that for The Office we only have the main cast at series level, not the cast of each episode

In [20]:
queryString= """
SELECT distinct (count (distinct ?actor) as ?number) WHERE { 

wd:Q23831 wdt:P161 ?actor.
}
"""
print("Results")
run_query(queryString)


Results
[('number', '25')]


1
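The two counts above are taken at different levels: per episode for HIMYM and at series level for The Office. A possible way to count both series the same way is to build one query that accepts cast members attached either to the series item or to items that are part of the series. The helper below is a sketch, not part of the original notebook: `cast_count_query` is a hypothetical name, it only reuses the properties already found (`wdt:P161`, `wdt:P179`), and it relies on the `wd:`/`wdt:` prefixes that `run_query` prepends.

```python
def cast_count_query(series_qid: str) -> str:
    """Build a SPARQL query counting the distinct cast members of a series,
    whether they are attached to the series item or to its episodes."""
    return f"""
SELECT (COUNT(DISTINCT ?actor) AS ?number) WHERE {{
  {{ wd:{series_qid} wdt:P161 ?actor . }}
  UNION
  {{ ?episode wdt:P179 wd:{series_qid} ;
              wdt:P161 ?actor . }}
}}
"""

# Print the query text for The Office; nothing is sent to the endpoint here.
print(cast_count_query("Q23831"))
```

In the notebook the string would be passed to the existing helper, for example `run_query(cast_count_query("Q147235"))`. The result has not been run here, so no expected count is given.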

## Task 5 - Return how many of the actors who are members of the cast of the TV series have a Kevin Bacon number equal to 2

#### First I need the Kevin Bacon entity. I look for an occupation such as "actor" by inspecting the occupations of an actor I already found, Neil Patrick Harris (wd:Q485310), using the property "occupation" (wdt:P106) that I found earlier

In [29]:
queryString= """
SELECT distinct ?occupation ?occupationname WHERE { 

wd:Q485310 wdt:P106 ?occupation.

?occupation <http://schema.org/name> ?occupationname.
}
"""
print("Results")
run_query(queryString)


Results
[('occupation', 'http://www.wikidata.org/entity/Q10798782'), ('occupationname', 'television actor')]
[('occupation', 'http://www.wikidata.org/entity/Q10800557'), ('occupationname', 'film actor')]
[('occupation', 'http://www.wikidata.org/entity/Q177220'), ('occupationname', 'singer')]
[('occupation', 'http://www.wikidata.org/entity/Q18814623'), ('occupationname', 'autobiographer')]
[('occupation', 'http://www.wikidata.org/entity/Q2259451'), ('occupationname', 'stage actor')]
[('occupation', 'http://www.wikidata.org/entity/Q2405480'), ('occupationname', 'voice actor')]
[('occupation', 'http://www.wikidata.org/entity/Q2526255'), ('occupationname', 'film director')]
[('occupation', 'http://www.wikidata.org/entity/Q28389'), ('occupationname', 'screenwriter')]
[('occupation', 'http://www.wikidata.org/entity/Q3282637'), ('occupationname', 'film producer')]
[('occupation', 'http://www.wikidata.org/entity/Q3387717'), ('occupationname', 'theatrical director')]
[('occupation', 'http://www

17

#### Found actor (wd:Q33999). Now I search for Kevin Bacon among all actors with a substring filter (CONTAINS) on the name

In [30]:
queryString= """
SELECT distinct ?entity ?name WHERE { 

?entity wdt:P106 wd:Q33999.

?entity <http://schema.org/name> ?name.
FILTER CONTAINS (?name,"Bacon")
}
"""
print("Results")
run_query(queryString)

Results
[('entity', 'http://www.wikidata.org/entity/Q16031668'), ('name', 'Frank Bacon')]
[('entity', 'http://www.wikidata.org/entity/Q74864775'), ('name', 'Bessie Bacon')]
[('entity', 'http://www.wikidata.org/entity/Q3116093'), ('name', 'Irving Bacon')]
[('entity', 'http://www.wikidata.org/entity/Q24844419'), ('name', 'Tim Bacon')]
[('entity', 'http://www.wikidata.org/entity/Q3454165'), ('name', 'Kevin Bacon')]
[('entity', 'http://www.wikidata.org/entity/Q3491343'), ('name', 'Sosie Bacon')]
[('entity', 'http://www.wikidata.org/entity/Q706678'), ('name', 'Lloyd Bacon')]
[('entity', 'http://www.wikidata.org/entity/Q3102228'), ('name', 'Georges Baconnet')]
[('entity', 'http://www.wikidata.org/entity/Q3992438'), ('name', 'Tom Bacon')]
[('entity', 'http://www.wikidata.org/entity/Q65116263'), ('name', 'Marco Bacon')]
[('entity', 'http://www.wikidata.org/entity/Q5216474'), ('name', 'Daniel Bacon')]
[('entity', 'http://www.wikidata.org/entity/Q6794548'), ('name', 'Max Bacon')]
[('entity', 'ht

21
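The substring filter also matches names such as "Georges Baconnet", so the right entity still has to be picked out of the 21 rows. A minimal sketch of that step in Python, using a few rows copied from the output above (`find_exact` is an illustrative helper, not part of the notebook):

```python
# A few (entity URI, name) rows copied from the output above.
rows = [
    ("http://www.wikidata.org/entity/Q16031668", "Frank Bacon"),
    ("http://www.wikidata.org/entity/Q3454165", "Kevin Bacon"),
    ("http://www.wikidata.org/entity/Q3491343", "Sosie Bacon"),
    ("http://www.wikidata.org/entity/Q3102228", "Georges Baconnet"),
]

def find_exact(rows, wanted):
    """Return the Q-ids of the rows whose name equals `wanted` exactly."""
    return [uri.rsplit("/", 1)[-1] for uri, name in rows if name == wanted]

matches = find_exact(rows, "Kevin Bacon")
print(matches)  # ['Q3454165']
```

An exact comparison leaves a single candidate here, which is the entity used in the following queries.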

#### Get all productions in which Kevin Bacon (wd:Q3454165) played a role

In [None]:
queryString = """
SELECT distinct ?prod WHERE { 

?sub wdt:P161 wd:Q3454165.

?sub <http://schema.org/name> ?prod.
} 

"""

print("Results")
run_query(queryString)

#### Get all the actors who played a role in a production with Kevin Bacon

In [None]:
queryString = """
SELECT distinct  ?name WHERE { 

?sub wdt:P161 wd:Q3454165;
     wdt:P161 ?actor.

?actor <http://schema.org/name> ?name.
} 
"""

print("Results")
run_query(queryString)

#### Get the actors of the two series who played a role in a production with someone who played a role with Kevin Bacon (Kevin Bacon number = 2)

In [4]:
queryString = """
SELECT DISTINCT ?name WHERE { 

#Query 1: HIMYM Cast
{ 
?prod wdt:P161 wd:Q3454165;
      wdt:P161 ?actor.

?ep wdt:P179 wd:Q147235;
     wdt:P161 ?actor2.

?both wdt:P161 ?actor;
      wdt:P161 ?actor2.

FILTER NOT EXISTS {?prod wdt:P161 ?actor2.}
} 

UNION
{
wd:Q23831 wdt:P161 ?actor2.

?prod wdt:P161 wd:Q3454165;
     wdt:P161 ?actor.
     
?both wdt:P161 ?actor;
      wdt:P161 ?actor2.

FILTER NOT EXISTS {?prod wdt:P161 ?actor2}
}

# this returns the labels
?actor2 <http://schema.org/name> ?name.
}
"""

print("Results")
run_query(queryString)

Results
[('name', 'John Cho')]
[('name', 'Virginia Williams')]
[('name', 'Kate Micucci')]
[('name', 'Eva Amurri')]
[('name', 'Elizabeth Bogush')]
[('name', 'Jayma Mays')]
[('name', 'George Cheung')]
[('name', 'Darcy Rose Byrnes')]
[('name', 'Will Sasso')]
[('name', 'Scoot McNairy')]
[('name', 'Alyssa Shafer')]
[('name', 'Laura Prepon')]
[('name', 'Jennifer Morrison')]
[('name', 'Kal Penn')]
[('name', 'Jason Lewis')]
[('name', 'Jesse Heiman')]
[('name', 'Charlene Amoia')]
[('name', 'Judy Greer')]
[('name', 'Jennifer Grey')]
[('name', 'Rachelle Lefevre')]
[('name', 'America Olivo')]
[('name', 'Michael York')]
[('name', 'Emily Baldoni')]
[('name', 'Valerie Azlynn')]
[('name', 'Joe Manganiello')]
[('name', 'Chi McBride')]
[('name', 'Ed Brigadier')]
[('name', 'Anne Dudek')]
[('name', 'Kevin Christy')]
[('name', 'Connie Sawyer')]
[('name', 'Lyndsy Fonseca')]
[('name', 'Nate Torrence')]
[('name', 'Italia Ricci')]
[('name', 'Erin Cahill')]
[('name', 'David Henrie')]
[('name', 'Charles Robinson

237

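#### We see that almost everybody is returned: 237 of the 239 cast members (214 for HIMYM plus 25 for The Office, with no overlap), so only two are left out. Note that FILTER NOT EXISTS only excludes the one production shared with ?actor, so actors who worked directly with Kevin Bacon in another production may also be included; strictly speaking, the result is "Bacon number at most 2"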
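To make the definition concrete: the Bacon number is the length of the shortest co-star chain to Kevin Bacon, so an actor with a two-step chain can still have number 1 if a direct link also exists. The sketch below computes it with a breadth-first search over a small, entirely hypothetical set of productions (`film_a`, `actor_1` and so on are made-up placeholders, not Wikidata results).

```python
from collections import deque

# Hypothetical toy data: production -> set of cast members.
productions = {
    "film_a": {"Kevin Bacon", "actor_1"},
    "film_b": {"actor_1", "actor_2"},
    "film_c": {"Kevin Bacon", "actor_3"},
    "film_d": {"actor_1", "actor_3"},
    "film_e": {"actor_2", "actor_4"},
}

def bacon_numbers(productions, source="Kevin Bacon"):
    """Return the shortest co-star distance from `source` for every reachable actor."""
    # Build the co-star graph: two actors are linked if they share a production.
    costars = {}
    for cast in productions.values():
        for actor in cast:
            costars.setdefault(actor, set()).update(cast - {actor})

    # Breadth-first search visits actors in order of increasing distance.
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for other in costars.get(current, ()):
            if other not in distance:
                distance[other] = distance[current] + 1
                queue.append(other)
    return distance

numbers = bacon_numbers(productions)
print(sorted(name for name, n in numbers.items() if n == 2))  # ['actor_2']
```

In this toy graph `actor_3` is reachable through `actor_1` in two steps but also shares `film_c` with Kevin Bacon, so its number is 1. A pattern-based query that only looks for a two-step chain would need an extra check on direct co-appearances to exclude such cases.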