# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload usually starts with a vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-9214f28053-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString,verbose = True):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings'])==0:
            print("Empty")
            return []
        array = []
        for bindings in json_results['results']['bindings']:
            app =  [ (var, value['value'])  for var, value in bindings.items() ] 
            if verbose:
                print( app)
            array.append(app)
        if verbose:
            print(len(array))
        return array

    except Exception as e :
        print("The operation failed", e)
        # return an empty list so callers always get a list back
        return []
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)

# Movie Workflow Series ("Tv series HIMYM explorative search") 


Consider the following exploratory scenario:


> We are interested in the TV series "How I Met Your Mother" and we want to investigate the main aspects related to the actors and directors involved in the production, find out the number of seasons, and check which episodes had the greatest success/impact.


## Useful URIs for the current workflow
The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P106`    | profession    | predicate | 
| `wdt:P279`    | subclass      | predicate |
| `wdt:P4969`    | derivative work      | predicate |
| `wd:Q147235` | How I met your mother        | node |
| `wd:Q23831` | The Office (US)        | node |



Also consider

```
wd:Q23831 ?p ?obj .
```

is the BGP to retrieve all **properties of The Office (US)**

Please consider that when you return a resource, you should return both the IRI and the label of the resource. In particular, when the task requires you to identify a BGP, the result set must always be a list of IRI-label pairs.

The workload should

1. Identify the BGP for tv series

2. Return the number of seasons and episodes per season of the tv series (the result set must be triples of season IRI, label and #episodes).

3. Get the number of episodes in which the cast members played a role. Who are the most present actors? (the result set must be a list of triples actor/actress IRI, label and #episodes)

4. Check which actor acted in the most films while working on "How I met your mother" (the result set must be a list of triples actor/actress IRI, label and #films).

5. Compare HIMYM with the tv series "The Office (US)" in terms of number of seasons, episodes and cast members (the result set must be two elements, one for each tv series, of tv series IRI, label, #seasons, #episodes and #cast members).

6. Return how many of the actors who are members of the cast of the tv series have a [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2 (the result set must be a list of pairs actor/actress IRI and label).

7. Consider the actors who are members of the cast of HIMYM. Amongst the tv series in which these actors acted, return only those which received more than 2 awards (the result set must be triples of tv series IRI, label, #awards won).

## Task 1
Identify the BGP for tv series

In [12]:
# query example
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   # bind something
   wd:Q147235 ?p ?obj . #HIMYM
   # get the label
   ?p sc:name ?name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P2438'), ('name', 'narrator')]
[('p', 'http://www.wikidata.org/prop/direct/P1113'), ('name', 'number of episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('name', 'Rotten Tomatoes ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1267'), ('name', 'AlloCiné series ID')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('name', 'genre')]
[('p', 'http://www.wikidata.org/prop/direct/P138'), ('name', 'named after')]
[('p', 'http://www.wikidata.org/prop/direct/P1411'), ('name', 'nominated for')]
[('p', 'http://www.wikidata.org/prop/direct/P1424'), ('name', "topic's main template")]
[('p', 'http://www.wikidata.org/prop/direct/P1476'), ('name', 'title')]
[('p', 'http://www.wikidata.org/prop/direct/P154'), ('name', 'logo image')]
[('p', 'http://www.wikidata.org/prop/direct/P1562'), ('name', 'AllMovie title ID')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('name', 'cast member')]
[('p', 'http://www.wikidata.org/prop/

In [13]:
#Previous query results don't look very promising
#Intuitively, we want to see what "HIMYM" is instance of 

queryString = """
SELECT DISTINCT ?type ?Tname
WHERE {
   wd:Q147235 wdt:P31 ?type . #HIMYM a
   
   ?type sc:name ?Tname.
}
"""

print("Results")
x=run_query(queryString)

Results
[('type', 'http://www.wikidata.org/entity/Q5398426'), ('Tname', 'television series')]
1


Final query for this task

In [14]:
# Final query

queryString = """
SELECT DISTINCT ?type ?Tname
WHERE {
   wd:Q147235 wdt:P31 ?type . #HIMYM a
   
   ?type sc:name ?Tname.
}
"""

print("Results")
x=run_query(queryString)

Results
[('type', 'http://www.wikidata.org/entity/Q5398426'), ('Tname', 'television series')]
1
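
The query above returns the class of HIMYM. The BGP for TV series then follows by using that class as the object of `wdt:P31`. The snippet below is a minimal sketch of such a query; the variable names `?series` and `?seriesName` are illustrative, and it was not run against the endpoint here.

```python
# BGP for TV series: every ?series that is an instance of (wdt:P31)
# "television series" (wd:Q5398426), the class found by the query above.
tv_series_bgp = "?series wdt:P31 wd:Q5398426 ."

queryString = """
SELECT DISTINCT ?series ?seriesName
WHERE {
   %s
   ?series sc:name ?seriesName .
}
LIMIT 20
""" % tv_series_bgp

print(queryString)
# In the notebook: x = run_query(queryString)
```
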


## Task 2
Return the number of seasons and episodes per season of the tv series (the result set must be triples of season IRI, label and #episodes).

In [15]:
# Let's explore properties/objects related to HIMYM but filtering with the keywords episod and season

queryString = """
SELECT DISTINCT ?p ?Pname ?obj ?Oname 
WHERE {
   wd:Q147235 ?p ?obj . #HIMYM
   
   ?p sc:name ?Pname.
   ?obj sc:name ?Oname
   FILTER(REGEX(?Oname,"(episod|season)"))
}
LIMIT 30
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1811'), ('Pname', 'list of episodes'), ('obj', 'http://www.wikidata.org/entity/Q785891'), ('Oname', 'list of How I Met Your Mother episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P2670'), ('Pname', 'has parts of the class'), ('obj', 'http://www.wikidata.org/entity/Q21664088'), ('Oname', 'two-part episode')]
[('p', 'http://www.wikidata.org/prop/direct/P527'), ('Pname', 'has part'), ('obj', 'http://www.wikidata.org/entity/Q3468515'), ('Oname', 'How I Met Your Mother, season 2')]
[('p', 'http://www.wikidata.org/prop/direct/P527'), ('Pname', 'has part'), ('obj', 'http://www.wikidata.org/entity/Q13567027'), ('Oname', 'How I Met Your Mother, season 9')]
[('p', 'http://www.wikidata.org/prop/direct/P527'), ('Pname', 'has part'), ('obj', 'http://www.wikidata.org/entity/Q2472427'), ('Oname', 'How I Met Your Mother, season 7')]
[('p', 'http://www.wikidata.org/prop/direct/P527'), ('Pname', 'has part'), ('obj', 'http://www.wikidata.org/en

In [18]:
#The previous query shows that the property hasPart (wdt:P527) links the TV series to its seasons; let's check whether it also links a season to its episodes

queryString = """
SELECT DISTINCT ?objTypeN ?obj ?Oname 
WHERE {
   wd:Q2715578 wdt:P527 ?obj . #HIMYMSeason1 hasPart
   ?obj wdt:P31 ?objType.      #a
   
   ?objType sc:name ?objTypeN.
   ?obj sc:name ?Oname
}
LIMIT 30
"""

print("Results")
x=run_query(queryString)

Results
[('objTypeN', 'television series episode'), ('obj', 'http://www.wikidata.org/entity/Q11696021'), ('Oname', 'Nothing Good Happens After 2 A.M.')]
[('objTypeN', 'television series episode'), ('obj', 'http://www.wikidata.org/entity/Q1327587'), ('Oname', 'Okay Awesome')]
[('objTypeN', 'television series episode'), ('obj', 'http://www.wikidata.org/entity/Q3480575'), ('Oname', 'Return of the Shirt')]
[('objTypeN', 'television series episode'), ('obj', 'http://www.wikidata.org/entity/Q467447'), ('Oname', 'Pilot')]
[('objTypeN', 'television series episode'), ('obj', 'http://www.wikidata.org/entity/Q468587'), ('Oname', 'Purple Giraffe')]
[('objTypeN', 'television series episode'), ('obj', 'http://www.wikidata.org/entity/Q471448'), ('Oname', 'Sweet Taste of Liberty')]
[('objTypeN', 'television series episode'), ('obj', 'http://www.wikidata.org/entity/Q4809956'), ('Oname', 'Game Night')]
[('objTypeN', 'television series episode'), ('obj', 'http://www.wikidata.org/entity/Q4817584'), ('Onam

In [19]:
#Episodes also use the hasPart property, we have all the information needed to build the query :

queryString = """
SELECT DISTINCT ?season ?seasonName (count(distinct ?episode) as ?count) 
WHERE {
   wd:Q147235 wdt:P527 ?season . #HIMYM hasPart
   ?season wdt:P527 ?episode .   #hasPart
   
   ?season sc:name ?seasonName.
}
GROUP BY ?season ?seasonName
ORDER BY ?seasonName
LIMIT 30
"""

print("Results")
x=run_query(queryString)

Results
[('season', 'http://www.wikidata.org/entity/Q2715578'), ('seasonName', 'How I Met Your Mother, season 1'), ('count', '22')]
[('season', 'http://www.wikidata.org/entity/Q3468515'), ('seasonName', 'How I Met Your Mother, season 2'), ('count', '22')]
[('season', 'http://www.wikidata.org/entity/Q2555117'), ('seasonName', 'How I Met Your Mother, season 3'), ('count', '20')]
[('season', 'http://www.wikidata.org/entity/Q2567330'), ('seasonName', 'How I Met Your Mother, season 4'), ('count', '24')]
[('season', 'http://www.wikidata.org/entity/Q582332'), ('seasonName', 'How I Met Your Mother, season 5'), ('count', '24')]
[('season', 'http://www.wikidata.org/entity/Q2438066'), ('seasonName', 'How I Met Your Mother, season 6'), ('count', '24')]
[('season', 'http://www.wikidata.org/entity/Q2472427'), ('seasonName', 'How I Met Your Mother, season 7'), ('count', '24')]
[('season', 'http://www.wikidata.org/entity/Q338715'), ('seasonName', 'How I Met Your Mother, season 8'), ('count', '24')]
[(

Final query for this task

In [20]:
# Final Query

queryString = """
SELECT DISTINCT ?season ?seasonName (count(distinct ?episode) as ?count) 
WHERE {
   wd:Q147235 wdt:P527 ?season . #HIMYM hasPart
   ?season wdt:P527 ?episode . #hasPart
   
   ?season sc:name ?seasonName.
}
GROUP BY ?season ?seasonName
ORDER BY ?seasonName
LIMIT 30
"""

print("Results")
x=run_query(queryString)

#Total of 208

Results
[('season', 'http://www.wikidata.org/entity/Q2715578'), ('seasonName', 'How I Met Your Mother, season 1'), ('count', '22')]
[('season', 'http://www.wikidata.org/entity/Q3468515'), ('seasonName', 'How I Met Your Mother, season 2'), ('count', '22')]
[('season', 'http://www.wikidata.org/entity/Q2555117'), ('seasonName', 'How I Met Your Mother, season 3'), ('count', '20')]
[('season', 'http://www.wikidata.org/entity/Q2567330'), ('seasonName', 'How I Met Your Mother, season 4'), ('count', '24')]
[('season', 'http://www.wikidata.org/entity/Q582332'), ('seasonName', 'How I Met Your Mother, season 5'), ('count', '24')]
[('season', 'http://www.wikidata.org/entity/Q2438066'), ('seasonName', 'How I Met Your Mother, season 6'), ('count', '24')]
[('season', 'http://www.wikidata.org/entity/Q2472427'), ('seasonName', 'How I Met Your Mother, season 7'), ('count', '24')]
[('season', 'http://www.wikidata.org/entity/Q338715'), ('seasonName', 'How I Met Your Mother, season 8'), ('count', '24')]
[(
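
The `#Total of 208` comment in the cell above is the sum of the per-season counts. Below is a minimal sketch of computing such a total from the `run_query` result format; `total_episodes` is a hypothetical helper, and the sample contains only the first three rows of the output above, so it does not reproduce the full total.

```python
def total_episodes(rows, count_var='count'):
    """Sum the count_var column of run_query rows (lists of (variable, value) tuples)."""
    return sum(int(dict(row)[count_var]) for row in rows)


# First three rows of the Task 2 output
sample = [
    [('season', 'http://www.wikidata.org/entity/Q2715578'), ('seasonName', 'How I Met Your Mother, season 1'), ('count', '22')],
    [('season', 'http://www.wikidata.org/entity/Q3468515'), ('seasonName', 'How I Met Your Mother, season 2'), ('count', '22')],
    [('season', 'http://www.wikidata.org/entity/Q2555117'), ('seasonName', 'How I Met Your Mother, season 3'), ('count', '20')],
]
print(total_episodes(sample))  # 64 for these three seasons
```

In the notebook, `total_episodes(x)` would give the total over all seasons returned by the final query.
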

## Task 3
Get the number of episodes in which the cast members played a role. Who are the most present actors? (the result set must be a list of triples actor/actress IRI, label and #episodes)

In [22]:
# We got from the very first query the bgp of cast member: [('p', 'http://www.wikidata.org/prop/direct/P161'), ('name', 'cast member')]
# Let's see the cast members' professions (URI given: wdt:P106) related to acting

queryString = """
SELECT DISTINCT ?profession ?Pname
WHERE {
   wd:Q147235 wdt:P161 ?castMember . #HIMYM castMember
   ?castMember wdt:P106 ?profession. #profession
   
   ?profession sc:name ?Pname.
   FILTER(REGEX(?Pname,"actor"))
}
LIMIT 50
"""

print("Results")
x=run_query(queryString)

Results
[('profession', 'http://www.wikidata.org/entity/Q10798782'), ('Pname', 'television actor')]
[('profession', 'http://www.wikidata.org/entity/Q10800557'), ('Pname', 'film actor')]
[('profession', 'http://www.wikidata.org/entity/Q2259451'), ('Pname', 'stage actor')]
[('profession', 'http://www.wikidata.org/entity/Q2405480'), ('Pname', 'voice actor')]
[('profession', 'http://www.wikidata.org/entity/Q33999'), ('Pname', 'actor')]
[('profession', 'http://www.wikidata.org/entity/Q11481802'), ('Pname', 'dub actor')]
[('profession', 'http://www.wikidata.org/entity/Q488111'), ('Pname', 'pornographic actor')]
[('profession', 'http://www.wikidata.org/entity/Q948329'), ('Pname', 'character actor')]
[('profession', 'http://www.wikidata.org/entity/Q970153'), ('Pname', 'child actor')]
[('profession', 'http://www.wikidata.org/entity/Q1954956'), ('Pname', 'musical theatre actor')]
10


In [4]:
#We pick Q1327587 ("Okay Awesome") at random, an episode returned by a previous query

# query example
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   # bind something
   wd:Q1327587 ?p ?obj . #HIMYMRandomEpisode
   # get the label
   ?p sc:name ?name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('name', 'Rotten Tomatoes ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1476'), ('name', 'title')]
[('p', 'http://www.wikidata.org/prop/direct/P155'), ('name', 'follows')]
[('p', 'http://www.wikidata.org/prop/direct/P156'), ('name', 'followed by')]
[('p', 'http://www.wikidata.org/prop/direct/P1562'), ('name', 'AllMovie title ID')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('name', 'cast member')]
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('name', 'part of the series')]
[('p', 'http://www.wikidata.org/prop/direct/P2364'), ('name', 'production code')]
[('p', 'http://www.wikidata.org/prop/direct/P2529'), ('name', 'ČSFD film ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2638'), ('name', 'TV.com ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2704'), ('name', 'EIDR content ID')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('name', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P

In [6]:
#Let's see what an episode is instance of

queryString = """
SELECT DISTINCT ?epTypeName ?epType ?p ?pName
WHERE {
   
   wd:Q1327587 wdt:P31 ?epType . #HIMYMRandomEpisode
   ?epType ?p ?obj.
   
   ?epType sc:name ?epTypeName.
   ?p sc:name ?pName.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('epTypeName', 'television series episode'), ('epType', 'http://www.wikidata.org/entity/Q21191270'), ('p', 'http://www.wikidata.org/prop/direct/P1424'), ('pName', "topic's main template")]
[('epTypeName', 'television series episode'), ('epType', 'http://www.wikidata.org/entity/Q21191270'), ('p', 'http://www.wikidata.org/prop/direct/P1552'), ('pName', 'has quality')]
[('epTypeName', 'television series episode'), ('epType', 'http://www.wikidata.org/entity/Q21191270'), ('p', 'http://www.wikidata.org/prop/direct/P1709'), ('pName', 'equivalent class')]
[('epTypeName', 'television series episode'), ('epType', 'http://www.wikidata.org/entity/Q21191270'), ('p', 'http://www.wikidata.org/prop/direct/P1889'), ('pName', 'different from')]
[('epTypeName', 'television series episode'), ('epType', 'http://www.wikidata.org/entity/Q21191270'), ('p', 'http://www.wikidata.org/prop/direct/P1963'), ('pName', 'properties for this type')]
[('epTypeName', 'television series episode'), ('epType', 'htt

In [18]:
#Let's explore tv series episode properties

queryString = """
SELECT DISTINCT ?p ?pName
WHERE {
   
   ?ep wdt:P31 wd:Q21191270 . #TVepisode
   ?ep ?p ?obj.
   
   ?p sc:name ?pName.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

#This confirms that TV series episodes also have the cast member property

Results
[('p', 'http://www.wikidata.org/prop/direct/P1258'), ('pName', 'Rotten Tomatoes ID')]
[('p', 'http://www.wikidata.org/prop/direct/P138'), ('pName', 'named after')]
[('p', 'http://www.wikidata.org/prop/direct/P1476'), ('pName', 'title')]
[('p', 'http://www.wikidata.org/prop/direct/P155'), ('pName', 'follows')]
[('p', 'http://www.wikidata.org/prop/direct/P156'), ('pName', 'followed by')]
[('p', 'http://www.wikidata.org/prop/direct/P1562'), ('pName', 'AllMovie title ID')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pName', 'cast member')]
[('p', 'http://www.wikidata.org/prop/direct/P179'), ('pName', 'part of the series')]
[('p', 'http://www.wikidata.org/prop/direct/P1889'), ('pName', 'different from')]
[('p', 'http://www.wikidata.org/prop/direct/P2061'), ('pName', 'aspect ratio')]
[('p', 'http://www.wikidata.org/prop/direct/P2364'), ('pName', 'production code')]
[('p', 'http://www.wikidata.org/prop/direct/P2515'), ('pName', 'costume designer')]
[('p', 'http://www.wikidat

In [8]:
#Tv series episode have the property cast member
#We can now build the answer :

queryString = """
SELECT DISTINCT ?actor ?actorName (count(distinct ?episode) as ?count)
WHERE {
   
   wd:Q147235 wdt:P161 ?actor; #HIMYM castmember
             wdt:P527 ?season . #HIMYM hasPart
   ?season wdt:P527 ?episode . #hasPart
   
   ?episode wdt:P161 ?actor.
          
   ?actor sc:name ?actorName.
}
GROUP BY ?actor ?actorName
LIMIT 50
"""

print("Results")
x=run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q234514'), ('actorName', 'Camryn Manheim'), ('count', '1')]
[('actor', 'http://www.wikidata.org/entity/Q1255263'), ('actorName', 'Jason Jones'), ('count', '4')]
[('actor', 'http://www.wikidata.org/entity/Q488335'), ('actorName', 'Kevin Michael Richardson'), ('count', '1')]
[('actor', 'http://www.wikidata.org/entity/Q27322'), ('actorName', 'Jason Lewis'), ('count', '1')]
[('actor', 'http://www.wikidata.org/entity/Q469914'), ('actorName', 'Lindsay Sloane'), ('count', '1')]
[('actor', 'http://www.wikidata.org/entity/Q1189095'), ('actorName', 'Courtney Ford'), ('count', '1')]
[('actor', 'http://www.wikidata.org/entity/Q558412'), ('actorName', 'Rick Malambri'), ('count', '1')]
[('actor', 'http://www.wikidata.org/entity/Q470005'), ('actorName', 'Joe Manganiello'), ('count', '5')]
[('actor', 'http://www.wikidata.org/entity/Q298777'), ('actorName', 'Michael York'), ('count', '1')]
[('actor', 'http://www.wikidata.org/entity/Q3306283'), ('actorN

In [12]:
#We add an order by to see the most present ones:

queryString = """
SELECT DISTINCT ?actor ?actorName (count(distinct ?episode) as ?count)
WHERE {
   
   wd:Q147235 wdt:P161 ?actor; #HIMYM castmember
             wdt:P527 ?season . #HIMYM hasPart
   ?season wdt:P527 ?episode . #hasPart
   
   ?episode wdt:P161 ?actor.
          
   ?actor sc:name ?actorName.
}
GROUP BY ?actor ?actorName
ORDER BY DESC (?count)
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q200566'), ('actorName', 'Cobie Smulders'), ('count', '145')]
[('actor', 'http://www.wikidata.org/entity/Q485310'), ('actorName', 'Neil Patrick Harris'), ('count', '145')]
[('actor', 'http://www.wikidata.org/entity/Q223455'), ('actorName', 'Josh Radnor'), ('count', '145')]
[('actor', 'http://www.wikidata.org/entity/Q202304'), ('actorName', 'Jason Segel'), ('count', '145')]
[('actor', 'http://www.wikidata.org/entity/Q199927'), ('actorName', 'Alyson Hannigan'), ('count', '143')]
[('actor', 'http://www.wikidata.org/entity/Q297128'), ('actorName', 'David Henrie'), ('count', '48')]
[('actor', 'http://www.wikidata.org/entity/Q229914'), ('actorName', 'Lyndsy Fonseca'), ('count', '48')]
[('actor', 'http://www.wikidata.org/entity/Q16149506'), ('actorName', 'Charlene Amoia'), ('count', '17')]
[('actor', 'http://www.wikidata.org/entity/Q199929'), ('actorName', 'Jennifer Morrison'), ('count', '12')]
[('actor', 'http://www.wikidata.org/entity/Q5528

Final query for this task

In [13]:
# Final query

queryString = """
SELECT DISTINCT ?actor ?actorName (count(distinct ?episode) as ?count)
WHERE {
   
   wd:Q147235 wdt:P161 ?actor; #HIMYM castmember
             wdt:P527 ?season . #HIMYM hasPart
   ?season wdt:P527 ?episode . #hasPart
   
   ?episode wdt:P161 ?actor.
          
   ?actor sc:name ?actorName.
}
GROUP BY ?actor ?actorName
ORDER BY DESC (?count)
LIMIT 20
"""

print("Results")
x=run_query(queryString)

#We have a leading top5 (who supposedly are the main protagonists)

Results
[('actor', 'http://www.wikidata.org/entity/Q200566'), ('actorName', 'Cobie Smulders'), ('count', '145')]
[('actor', 'http://www.wikidata.org/entity/Q485310'), ('actorName', 'Neil Patrick Harris'), ('count', '145')]
[('actor', 'http://www.wikidata.org/entity/Q223455'), ('actorName', 'Josh Radnor'), ('count', '145')]
[('actor', 'http://www.wikidata.org/entity/Q202304'), ('actorName', 'Jason Segel'), ('count', '145')]
[('actor', 'http://www.wikidata.org/entity/Q199927'), ('actorName', 'Alyson Hannigan'), ('count', '143')]
[('actor', 'http://www.wikidata.org/entity/Q297128'), ('actorName', 'David Henrie'), ('count', '48')]
[('actor', 'http://www.wikidata.org/entity/Q229914'), ('actorName', 'Lyndsy Fonseca'), ('count', '48')]
[('actor', 'http://www.wikidata.org/entity/Q16149506'), ('actorName', 'Charlene Amoia'), ('count', '17')]
[('actor', 'http://www.wikidata.org/entity/Q199929'), ('actorName', 'Jennifer Morrison'), ('count', '12')]
[('actor', 'http://www.wikidata.org/entity/Q5528
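
The top of the list is a four-way tie at 145 episodes, so "the most present actor" is not a single row. The snippet below is a minimal sketch for extracting every row that shares the highest count; `most_present` is a hypothetical helper, and the sample uses the first five rows of the output above.

```python
def most_present(rows, count_var='count'):
    """Return, as dicts, every run_query row that shares the highest count_var value."""
    records = [dict(row) for row in rows]
    if not records:
        return []
    top = max(int(r[count_var]) for r in records)
    return [r for r in records if int(r[count_var]) == top]


# First five rows of the Task 3 output
sample = [
    [('actor', 'http://www.wikidata.org/entity/Q200566'), ('actorName', 'Cobie Smulders'), ('count', '145')],
    [('actor', 'http://www.wikidata.org/entity/Q485310'), ('actorName', 'Neil Patrick Harris'), ('count', '145')],
    [('actor', 'http://www.wikidata.org/entity/Q223455'), ('actorName', 'Josh Radnor'), ('count', '145')],
    [('actor', 'http://www.wikidata.org/entity/Q202304'), ('actorName', 'Jason Segel'), ('count', '145')],
    [('actor', 'http://www.wikidata.org/entity/Q199927'), ('actorName', 'Alyson Hannigan'), ('count', '143')],
]
for record in most_present(sample):
    print(record['actorName'], record['count'])
```
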

## Task 4
Check which actor acted in the most films while working on "How I met your mother" (the result set must be a list of triples actor/actress IRI, label and #films).

In [14]:
#"While working on HIMYM" suggests looking for an "end of production date" of the TV show. We also need the BGP for films, so let's start with that
#To find the BGP for films, let's pick one actor/actress at random and get the types of the works in which he/she was a cast member
#[('actor', 'http://www.wikidata.org/entity/Q200566'), ('actorName', 'Cobie Smulders'), ('count', '145')]


queryString = """
SELECT DISTINCT ?pieceType ?pieceTypeN
WHERE {
   
   ?piece wdt:P161 wd:Q200566; #castMember CobieSmulders
          wdt:P31 ?pieceType. #a
             
   ?pieceType sc:name ?pieceTypeN.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('pieceType', 'http://www.wikidata.org/entity/Q24856'), ('pieceTypeN', 'film series')]
[('pieceType', 'http://www.wikidata.org/entity/Q11424'), ('pieceTypeN', 'film')]
[('pieceType', 'http://www.wikidata.org/entity/Q21191270'), ('pieceTypeN', 'television series episode')]
[('pieceType', 'http://www.wikidata.org/entity/Q5398426'), ('pieceTypeN', 'television series')]
[('pieceType', 'http://www.wikidata.org/entity/Q229390'), ('pieceTypeN', '3D film')]
[('pieceType', 'http://www.wikidata.org/entity/Q374821'), ('pieceTypeN', 'film poster')]
6


In [17]:
#We now have the BGPs for films and TV series. Let's look for an "end of production"-like property

queryString = """
SELECT DISTINCT ?p ?pName
WHERE {
   
   ?tvserie wdt:P31 wd:Q5398426; #a tvserie
            ?p ?obj.
             
   ?p sc:name ?pName.
   FILTER(REGEX(?pName,"(date|time|schedule)"))
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P577'), ('pName', 'publication date')]
[('p', 'http://www.wikidata.org/prop/direct/P580'), ('pName', 'start time')]
[('p', 'http://www.wikidata.org/prop/direct/P582'), ('pName', 'end time')]
[('p', 'http://www.wikidata.org/prop/direct/P585'), ('pName', 'point in time')]
[('p', 'http://www.wikidata.org/prop/direct/P1191'), ('pName', 'date of first performance')]
[('p', 'http://www.wikidata.org/prop/direct/P576'), ('pName', 'dissolved, abolished or demolished date')]
[('p', 'http://www.wikidata.org/prop/direct/P6949'), ('pName', 'announcement date')]
[('p', 'http://www.wikidata.org/prop/direct/P3999'), ('pName', 'date of official closure')]
[('p', 'http://www.wikidata.org/prop/direct/P6458'), ('pName', 'Mtime movie ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2754'), ('pName', 'production date')]
[('p', 'http://www.wikidata.org/prop/direct/P2348'), ('pName', 'time period')]
[('p', 'http://www.wikidata.org/prop/direct/P569'), ('pNa

In [25]:
#Same for episodes 

queryString = """
SELECT DISTINCT ?p ?pName
WHERE {
   
   ?tvserie wdt:P31 wd:Q21191270; #a tvSeriesEpisode
            ?p ?obj.
             
   ?p sc:name ?pName.
   FILTER(REGEX(?pName,"(date|time|schedule)"))
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P577'), ('pName', 'publication date')]
[('p', 'http://www.wikidata.org/prop/direct/P1191'), ('pName', 'date of first performance')]
[('p', 'http://www.wikidata.org/prop/direct/P2913'), ('pName', 'date depicted')]
[('p', 'http://www.wikidata.org/prop/direct/P2754'), ('pName', 'production date')]
[('p', 'http://www.wikidata.org/prop/direct/P580'), ('pName', 'start time')]
[('p', 'http://www.wikidata.org/prop/direct/P582'), ('pName', 'end time')]
[('p', 'http://www.wikidata.org/prop/direct/P585'), ('pName', 'point in time')]
[('p', 'http://www.wikidata.org/prop/direct/P6949'), ('pName', 'announcement date')]
[('p', 'http://www.wikidata.org/prop/direct/P421'), ('pName', 'located in time zone')]
9


In [3]:
#The "production date" property should be the most interesting for us based on the question

queryString = """
SELECT DISTINCT ?proddate
WHERE {   
      wd:Q147235 wdt:P527 ?season . #HIMYM hasPart
      ?season wdt:P527 ?episode   . #hasPart
      
      ?episode wdt:P31 wd:Q21191270;   #a TVserieEpisode
               wdt:P161 ?actor;        #castMember
               wdt:P2754 ?proddate     #productionDate
   
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
Empty
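
An empty result can be double-checked with an ASK query through the `run_ask_query` helper from the setup cell. The snippet below is a minimal sketch under that assumption: the query was not run against the endpoint here, and `ask_answer` is a hypothetical helper that reads the `boolean` field which the SPARQL JSON results format uses for ASK answers.

```python
# Does any HIMYM episode carry a production date (wdt:P2754) at all?
askString = """
ASK {
   wd:Q147235 wdt:P527 ?season .
   ?season wdt:P527 ?episode .
   ?episode wdt:P2754 ?proddate .
}
"""
# In the notebook: answer = ask_answer(run_ask_query(askString))


def ask_answer(json_result):
    """Extract the boolean of a SPARQL ASK JSON result, or None if it is missing."""
    return (json_result or {}).get('boolean')


# Example with a hand-written result in the SPARQL JSON results format
print(ask_answer({'head': {}, 'boolean': False}))  # False
```
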


In [48]:
#The production date query returns nothing, so we use the publication date as an alternative

queryString = """
SELECT DISTINCT ?publidate
WHERE {   
      wd:Q147235 wdt:P527 ?season . #HIMYM hasPart
      ?season wdt:P527 ?episode .   #hasPart
      
      ?episode wdt:P31 wd:Q21191270;  #a tvepisode
               wdt:P161 ?actor;       #castmember
               wdt:P577 ?publidate    #publidate
   
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('publidate', '2011-02-14T00:00:00Z')]
[('publidate', '2011-02-21T00:00:00Z')]
[('publidate', '2011-02-07T00:00:00Z')]
[('publidate', '2010-04-19T00:00:00Z')]
[('publidate', '2010-05-03T00:00:00Z')]
[('publidate', '2010-04-12T00:00:00Z')]
[('publidate', '2010-10-18T00:00:00Z')]
[('publidate', '2010-10-25T00:00:00Z')]
[('publidate', '2010-10-11T00:00:00Z')]
[('publidate', '2006-04-10T00:00:00Z')]
[('publidate', '2005-10-17T00:00:00Z')]
[('publidate', '2005-10-10T00:00:00Z')]
[('publidate', '2005-09-19T00:00:00Z')]
[('publidate', '2005-09-26T00:00:00Z')]
[('publidate', '2005-10-03T00:00:00Z')]
[('publidate', '2006-02-27T00:00:00Z')]
[('publidate', '2006-01-23T00:00:00Z')]
[('publidate', '2006-03-20T00:00:00Z')]
[('publidate', '2006-03-06T00:00:00Z')]
[('publidate', '2006-01-09T00:00:00Z')]
20


In [50]:
#We will have to get the most recent publication date of episodes for each actor: let's try with a subquery

queryString = """
SELECT ?publidate
WHERE {   
          wd:Q147235 wdt:P527 ?season . #HIMYM hasPart
          ?season wdt:P527 ?episode . #hasPart
          ?episode wdt:P31 wd:Q21191270; #a tvepisode
                   wdt:P161 ?actor; #castmember
                   wdt:P577 ?publidate #publicationDate
                   
         FILTER(?publidate=?maxdate && ?actor=?inneractor)
     {
         SELECT ?inneractor (MAX(?publidate2) as ?maxdate)
         WHERE {
              wd:Q147235 wdt:P527 ?season . #HIMYM hasPart
              ?season wdt:P527 ?episode . #hasPart
              ?episode wdt:P31 wd:Q21191270; #a tvepisode
                       wdt:P161 ?inneractor; #castmember
                       wdt:P577 ?publidate2 #publicationDate
     }
         GROUP BY (?inneractor)
     }
}
ORDER BY DESC (?publidate)
LIMIT 20
"""

print("Results")
x=run_query(queryString)



Results
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2014-03-31T00:00:00Z')]
[('publidate', '2011-10-24T00:00:00Z')]
[('publidate', '2011-10-24T00:00:00Z')]
[('publidate', '2011-10-24T00:00:00Z')]
[('publidate', '2011-10-24T00:00:00Z')]
[('publidate', '2011-10-24T00:00:00Z')]
[('publidate', '2011-10-24T00:00:00Z')]
20
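
The subquery above groups the episodes by actor and keeps the latest publication date in each group. The same logic can be sketched in plain Python to make the grouping explicit; `latest_date_per_actor` is a hypothetical helper, the actor keys are placeholders, and only the date format is taken from the output above.

```python
from datetime import datetime


def latest_date_per_actor(pairs):
    """Map each actor to the most recent date among (actor, ISO date string) pairs."""
    latest = {}
    for actor, date_str in pairs:
        date = datetime.strptime(date_str, '%Y-%m-%dT%H:%M:%SZ')
        if actor not in latest or date > latest[actor]:
            latest[actor] = date
    return latest


# Hypothetical pairs: placeholder actors, dates in the format returned by the endpoint
pairs = [
    ('actorA', '2005-09-19T00:00:00Z'),
    ('actorA', '2014-03-31T00:00:00Z'),
    ('actorB', '2011-10-24T00:00:00Z'),
    ('actorB', '2010-04-19T00:00:00Z'),
]
latest = latest_date_per_actor(pairs)
print(latest['actorA'].date())  # 2014-03-31
```
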


In [55]:
#We can now add the publication date of films where actor played :

queryString = """
SELECT ?actor ?actorName (count(?film) as ?countfilms)
WHERE {   
          wd:Q147235 wdt:P527 ?season . #HIMYM hasPart
          ?season wdt:P527 ?episode . #hasPart
          ?episode wdt:P31 wd:Q21191270; #a tvepisode
                   wdt:P161 ?actor; #castmember
                   wdt:P577 ?publidate #publicationDate

         FILTER(?publidate=?maxdate && ?actor=?inneractor)
     {
         SELECT ?inneractor (MAX(?publidate2) as ?maxdate)
         WHERE {
              wd:Q147235 wdt:P527 ?season . #HIMYM hasPart
              ?season wdt:P527 ?episode . #hasPart
              ?episode wdt:P31 wd:Q21191270; #a tvepisode
                       wdt:P161 ?inneractor; #castmember
                       wdt:P577 ?publidate2 #publicationDate
     }
     GROUP BY (?inneractor)
     }
     
     ?film wdt:P31 wd:Q11424;        #a film
           wdt:P161 ?actor;          #castmember
           wdt:P577 ?publidatefilm.  #publiDate
           
     ?actor sc:name ?actorName.
     
     FILTER(?publidatefilm < ?publidate)
     
} 
GROUP BY ?actor ?actorName
ORDER BY DESC(?countfilms)
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q192165'), ('actorName', 'Danny Glover'), ('countfilms', '94')]
[('actor', 'http://www.wikidata.org/entity/Q485310'), ('actorName', 'Neil Patrick Harris'), ('countfilms', '84')]
[('actor', 'http://www.wikidata.org/entity/Q298777'), ('actorName', 'Michael York'), ('countfilms', '76')]
[('actor', 'http://www.wikidata.org/entity/Q311271'), ('actorName', 'John Lithgow'), ('countfilms', '71')]
[('actor', 'http://www.wikidata.org/entity/Q202304'), ('actorName', 'Jason Segel'), ('countfilms', '70')]
[('actor', 'http://www.wikidata.org/entity/Q49004'), ('actorName', 'Patrick Swayze'), ('countfilms', '63')]
[('actor', 'http://www.wikidata.org/entity/Q3045427'), ('actorName', 'George Cheung'), ('countfilms', '58')]
[('actor', 'http://www.wikidata.org/entity/Q131332'), ('actorName', 'Amanda Peet'), ('countfilms', '57')]
[('actor', 'http://www.wikidata.org/entity/Q2636696'), ('actorName', 'Alan Fudge'), ('countfilms', '56')]
[('actor', 'http://www

In [4]:
#Let's try the production date property (P2754) instead of the publication date for the films

queryString = """
SELECT ?actor ?actorName (count(?film) as ?countfilms)
WHERE {   
      wd:Q147235 wdt:P527 ?season . #HIMYM hasPart
      ?season wdt:P527 ?episode . #hasPart
      ?episode wdt:P31 wd:Q21191270; #a tvepisode
               wdt:P161 ?actor; #castmember
               wdt:P577 ?publidate #publication date
               
     FILTER(?publidate=?maxdate && ?actor=?inneractor)
     {
     SELECT ?inneractor (MAX(?publidate2) as ?maxdate)
     WHERE {
          wd:Q147235 wdt:P527 ?season . #HIMYM hasPart
          ?season wdt:P527 ?episode . #hasPart
          ?episode wdt:P31 wd:Q21191270; #a tvepisode
                   wdt:P161 ?inneractor; #castmember
                   wdt:P577 ?publidate2 #publication date
     }
     GROUP BY (?inneractor)
     }
     
     ?film wdt:P31 wd:Q11424;        #a film
           wdt:P161 ?actor;          #castmember
           wdt:P2754 ?publidatefilm. #prodDate 
           
     ?actor sc:name ?actorName.
     
     FILTER(?publidatefilm < ?publidate)
     
} 
GROUP BY ?actor ?actorName
ORDER BY DESC(?countfilms)
LIMIT 20
"""

print("Results")
x=run_query(queryString)

#Films also lack a production date, so we keep the publication date for the answer

Results
Empty


Final query for this task

In [6]:
# Final query
#A subquery gets the release date of the latest episode each actor played in; the outer filter keeps only the films he/she played in that were released before that date

queryString = """
SELECT ?actor ?actorName (count(?film) as ?countfilms)
WHERE {   
      wd:Q147235 wdt:P527 ?season . #HIMYM hasPart
      ?season wdt:P527 ?episode . #hasPart
      ?episode wdt:P31 wd:Q21191270; #a tvepisode
               wdt:P161 ?actor; #castmember
               wdt:P577 ?publidate #publication date
               
     FILTER(?publidate=?maxdate && ?actor=?inneractor)
     {
     SELECT ?inneractor (MAX(?publidate2) as ?maxdate)
     WHERE {
          wd:Q147235 wdt:P527 ?season . #HIMYM hasPart
          ?season wdt:P527 ?episode . #hasPart
          ?episode wdt:P31 wd:Q21191270; #a tvepisode
                   wdt:P161 ?inneractor; #castmember
                   wdt:P577 ?publidate2 #publication date
     }
     GROUP BY (?inneractor)
     }
     
     ?film wdt:P31 wd:Q11424;       #a film
           wdt:P161 ?actor;         #castmember
           wdt:P577 ?publidatefilm. #publidate
           
     ?actor sc:name ?actorName.
     
     FILTER(?publidatefilm < ?publidate)
     
} 
GROUP BY ?actor ?actorName
ORDER BY DESC(?countfilms)
LIMIT 30
"""

print("Results")
x=run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q192165'), ('actorName', 'Danny Glover'), ('countfilms', '94')]
[('actor', 'http://www.wikidata.org/entity/Q485310'), ('actorName', 'Neil Patrick Harris'), ('countfilms', '84')]
[('actor', 'http://www.wikidata.org/entity/Q298777'), ('actorName', 'Michael York'), ('countfilms', '76')]
[('actor', 'http://www.wikidata.org/entity/Q311271'), ('actorName', 'John Lithgow'), ('countfilms', '71')]
[('actor', 'http://www.wikidata.org/entity/Q202304'), ('actorName', 'Jason Segel'), ('countfilms', '70')]
[('actor', 'http://www.wikidata.org/entity/Q49004'), ('actorName', 'Patrick Swayze'), ('countfilms', '63')]
[('actor', 'http://www.wikidata.org/entity/Q3045427'), ('actorName', 'George Cheung'), ('countfilms', '58')]
[('actor', 'http://www.wikidata.org/entity/Q131332'), ('actorName', 'Amanda Peet'), ('countfilms', '57')]
[('actor', 'http://www.wikidata.org/entity/Q2636696'), ('actorName', 'Alan Fudge'), ('countfilms', '56')]
[('actor', 'http://www
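
The two-step logic of the final query (latest episode date per actor, then films released before that date) can be sketched in plain Python. This is only an illustration on hypothetical toy data: the actor names, dates and the helper `films_before_last_episode` are assumptions and do not come from the endpoint.

```python
from datetime import datetime

def parse(ts):
    # Dates are returned as ISO strings such as '2011-10-24T00:00:00Z'
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

# Hypothetical toy data: (actor, publication date) pairs
episodes = [
    ("actorA", "2011-10-24T00:00:00Z"),
    ("actorA", "2013-03-18T00:00:00Z"),
    ("actorB", "2009-05-11T00:00:00Z"),
]
films = [
    ("actorA", "2008-04-18T00:00:00Z"),
    ("actorA", "2012-07-01T00:00:00Z"),
    ("actorA", "2014-01-01T00:00:00Z"),
    ("actorB", "2010-06-01T00:00:00Z"),
]

def films_before_last_episode(episodes, films):
    # Step 1 (the subquery): latest episode date per actor
    last_episode = {}
    for actor, ts in episodes:
        date = parse(ts)
        if actor not in last_episode or date > last_episode[actor]:
            last_episode[actor] = date
    # Step 2 (the outer FILTER): count films released strictly before that date
    counts = {}
    for actor, ts in films:
        if actor in last_episode and parse(ts) < last_episode[actor]:
            counts[actor] = counts.get(actor, 0) + 1
    # Like the join in the query, actors without any matching film are dropped
    return counts

film_counts = films_before_last_episode(episodes, films)
print(film_counts)
```

One thing to keep in mind: `count(?film)` counts result rows, so a film with several publication dates before the cutoff is counted once per date; `count(distinct ?film)` would count each film once.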

## Task 5
Compare HIMYM with the tv series "The Office (US)" in terms of number of seasons, episodes and cast members (the result set must be two elements, one for each tv series, of tv series IRI, label, #seasons, #episodes and #cast members).

In [7]:
# For this task we need the VALUES keyword. Let's first return the IRI and label of the two series (the IRI of The Office is given)

queryString = """
SELECT DISTINCT ?series ?seriesName
WHERE {
   VALUES ?series { wd:Q147235 wd:Q23831 }.
   
   ?series sc:name ?seriesName.
}

LIMIT 30
"""

print("Results")
x=run_query(queryString)

Results
[('series', 'http://www.wikidata.org/entity/Q147235'), ('seriesName', 'How I Met Your Mother')]
[('series', 'http://www.wikidata.org/entity/Q23831'), ('seriesName', 'The Office')]
2


In [8]:
# We now need to count the 3 asked variables for both series

queryString = """
SELECT DISTINCT ?series ?seriesName (count(distinct ?season) as ?seasonCount) (count(distinct ?episode) as ?episodeCount) (count(distinct ?castmember) as ?castmemberCount) 
WHERE {
   VALUES ?series {wd:Q147235 wd:Q23831}
   
   ?series wdt:P527 ?season ; #hasPart (season)
           wdt:P161 ?castmember .
   ?season wdt:P527 ?episode . #hasPart
   
   ?series sc:name ?seriesName
}
GROUP BY ?series ?seriesName

LIMIT 30
"""

print("Results")
x=run_query(queryString)

Results
[('series', 'http://www.wikidata.org/entity/Q147235'), ('seriesName', 'How I Met Your Mother'), ('seasonCount', '9'), ('episodeCount', '208'), ('castmemberCount', '480')]
[('series', 'http://www.wikidata.org/entity/Q23831'), ('seriesName', 'The Office'), ('seasonCount', '9'), ('episodeCount', '201'), ('castmemberCount', '25')]
2


Final query for this task

In [70]:
# Final Query

queryString = """
SELECT DISTINCT ?series ?seriesName (count(distinct ?season) as ?seasonCount) (count(distinct ?episode) as ?episodeCount) (count(distinct ?castmember) as ?castmemberCount) 
WHERE {
   VALUES ?series {wd:Q147235 wd:Q23831}
   
   ?series wdt:P527 ?season ; #hasPart (season)
           wdt:P161 ?castmember .
   ?season wdt:P527 ?episode . #hasPart
   
   ?series sc:name ?seriesName
}
GROUP BY ?series ?seriesName

LIMIT 30
"""

print("Results")
x=run_query(queryString)

Results
[('series', 'http://www.wikidata.org/entity/Q147235'), ('seriesName', 'How I Met Your Mother'), ('seasonCount', '9'), ('episodeCount', '208'), ('castmemberCount', '480')]
[('series', 'http://www.wikidata.org/entity/Q23831'), ('seriesName', 'The Office'), ('seasonCount', '9'), ('episodeCount', '201'), ('castmemberCount', '25')]
2
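
To compare the two series side by side, the rows returned by `run_query` (lists of `(variable, value)` tuples) can be turned into dictionaries. The helper name `rows_to_dicts` is an assumption introduced for this sketch; the rows are copied from the output above.

```python
def rows_to_dicts(rows):
    # Each row returned by run_query is a list of (variable, value) tuples
    return [dict(row) for row in rows]

# Rows copied from the output of the final query above
rows = [
    [('series', 'http://www.wikidata.org/entity/Q147235'), ('seriesName', 'How I Met Your Mother'),
     ('seasonCount', '9'), ('episodeCount', '208'), ('castmemberCount', '480')],
    [('series', 'http://www.wikidata.org/entity/Q23831'), ('seriesName', 'The Office'),
     ('seasonCount', '9'), ('episodeCount', '201'), ('castmemberCount', '25')],
]

table = rows_to_dicts(rows)
for record in table:
    # Counts arrive as strings, so cast them before comparing
    print(record['seriesName'],
          int(record['seasonCount']),
          int(record['episodeCount']),
          int(record['castmemberCount']))

episode_gap = int(table[0]['episodeCount']) - int(table[1]['episodeCount'])
print(episode_gap)
```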


## Task 6
Return how many of the actors who are members of the cast of the tv series have [Kevin Bacon number](https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon#:~:text=Kevin%20Bacon%20himself%20has%20a,Bacon%20number%20is%20N%2B1.) equal to 2 (the result set must be a list of pairs of actor/actress IRI and label).

In [9]:
# According to the link, the Bacon number applies to actors. Let's check whether such a property exists in Wikidata:

queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   ?actor wdt:P31 wd:Q33999; #actor
          ?p ?obj. 
   
   ?p sc:name ?name.
   FILTER(REGEX(?name,"([Bb]acon|number|distance|degree)"))
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
Empty
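
The pattern used in the FILTER is case-sensitive, apart from the explicit `[Bb]` alternative. A quick offline check with Python's `re` module shows which labels it would accept. This is only a sketch: Python's regex engine is not the SPARQL one, although both should agree on a pattern this simple. Most labels are taken from outputs in this notebook; `Number of seasons` with a capital N is a hypothetical variant added to show a miss.

```python
import re

# Same pattern as in the SPARQL FILTER
pattern = re.compile(r"([Bb]acon|number|distance|degree)")

labels = [
    "Erdős–Bacon number",
    "Bacon number",
    "distance",
    "named after",
    "derivative work",
    "Number of seasons",  # hypothetical capitalised variant
]

# search() looks for the pattern anywhere in the label, like REGEX in SPARQL
matched = [label for label in labels if pattern.search(label)]
print(matched)
```

In SPARQL, passing the flag `"i"` as a third argument to `REGEX` makes the match case-insensitive, which would remove the need for `[Bb]`.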


In [10]:
#Or whether it exists as an object:

queryString = """
SELECT DISTINCT ?obj ?Oname
WHERE {

      ?actor wdt:P31 wd:Q33999; #an actor
          ?p ?obj. 
   
   ?obj sc:name ?Oname.
   FILTER(REGEX(?Oname,"([Bb]acon|number|distance|degree)"))
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
Empty


In [3]:
#We did not find anything using a regex on the properties/objects related to actors.
#Let's look up Kevin Bacon's IRI directly, since he should be related to the Bacon number

queryString = """
SELECT DISTINCT ?actor ?Actorname
WHERE {

    ?piece wdt:P161 ?actor. #castmember
   
   ?actor sc:name ?Actorname.
   FILTER(REGEX(?Actorname,"([Kk]evin [Bb]acon)"))
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('actor', 'http://www.wikidata.org/entity/Q3454165'), ('Actorname', 'Kevin Bacon')]
1


In [10]:
#Let's explore the properties attached to him

queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   wd:Q3454165 ?p ?obj . #KevinBacon

   ?p sc:name ?name.
   FILTER(REGEX(?name,"([Bb]acon|number|distance|degree)"))

}
LIMIT 80
"""

print("Results")
x=run_query(queryString)

Results
Empty


In [6]:
#same for objects

queryString = """
SELECT DISTINCT ?obj ?name
WHERE {
   wd:Q3454165 ?p ?obj . #KevinBacon

   ?obj sc:name ?name.
    FILTER(REGEX(?name,"([Bb]acon|number|distance|degree)"))

}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q3308009'), ('name', 'Michael Bacon')]
[('obj', 'http://www.wikidata.org/entity/Q6951606'), ('name', 'Bacon')]
[('obj', 'http://www.wikidata.org/entity/Q3491343'), ('name', 'Sosie Bacon')]
[('obj', 'http://www.wikidata.org/entity/Q17085462'), ('name', 'Kevin Bacon filmography')]
[('obj', 'http://www.wikidata.org/entity/Q1285964'), ('name', 'Edmund Bacon')]
[('obj', 'http://www.wikidata.org/entity/Q32710290'), ('name', 'Category:Kevin Bacon')]
[('obj', 'http://www.wikidata.org/entity/Q3519857'), ('name', 'The Bacon Brothers')]
[('obj', 'http://www.wikidata.org/entity/Q8455196'), ('name', 'Category:Films directed by Kevin Bacon')]
8


In [11]:
#Let's try a different approach and explore the given predicate wdt:P4969 (derivative work)
#The Bacon number comes from Kevin Bacon (so it may be a derivative work); let's check the objects of this predicate with the regex

queryString = """
SELECT DISTINCT ?obj ?name
WHERE {
   ?s wdt:P4969 ?obj . #derivativeWork

   ?obj sc:name ?name.
    FILTER(REGEX(?name,"([Bb]acon|number|distance|degree)"))

}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q81209971'), ('name', 'Friar Bacon’s Brass Head')]
[('obj', 'http://www.wikidata.org/entity/Q722700'), ('name', 'Erdős–Bacon number')]
[('obj', 'http://www.wikidata.org/entity/Q70706634'), ('name', 'Ysgol Gynradd Pencarnisiog, Pencarnisiog, Tŷ Croes, Anglesey, LL63 5RY : summary report for parents on the inspection under Section 28 of the Education Act 2005 : school number : 660/2160 : date of inspection : 16/06/2009')]
[('obj', 'http://www.wikidata.org/entity/Q70694861'), ('name', 'Summary report for parents : inspection under section 10 of the Schools Inspection Act 1966 : Ysgol Botwnnog, Botwnnog, Llŷn, Gwynedd, LL53 8PY : school number : 661/4003 : date of the inspection : 5-9 November, 2001')]
[('obj', 'http://www.wikidata.org/entity/Q69648989'), ('name', 'Bodnant Junior School, Nant Hall Road, Prestatyn, Denbighshire, LL19 9NW : inspection under Section 28 of the Education Act 2005 : a summary report for parents : school number : 6

In [15]:
#We found the IRI of the Erdős–Bacon number: [('obj', 'http://www.wikidata.org/entity/Q722700'), ('name', 'Erdős–Bacon number')]
#The Bacon number could be a subclass of it:

queryString = """
SELECT DISTINCT ?s ?name
WHERE {
   ?s wdt:P279 wd:Q722700 . #subclassOf erdosBaconNumber

   ?s sc:name ?name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
Empty


In [16]:
#It is not. Let's look at its related properties and objects instead

queryString = """
SELECT DISTINCT ?obj ?name
WHERE {
   wd:Q722700 ?p ?obj . #erdosBaconNumber

   ?obj sc:name ?name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q173746'), ('name', 'Paul Erdős')]
[('obj', 'http://www.wikidata.org/entity/Q42'), ('name', 'Douglas Adams')]
[('obj', 'http://www.wikidata.org/entity/Q3454165'), ('name', 'Kevin Bacon')]
[('obj', 'http://www.wikidata.org/entity/Q25972335'), ('name', 'Bacon number')]
[('obj', 'http://www.wikidata.org/entity/Q2742711'), ('name', 'distance')]
[('obj', 'http://www.wikidata.org/entity/Q243972'), ('name', 'Erdős number')]
6


In [3]:
#We found the IRI of the Bacon number. Let's see how it is related to people (actors)

queryString = """
SELECT DISTINCT ?p ?pname ?objType ?name
WHERE {
   wd:Q25972335 ?p ?obj.
   ?obj wdt:P31 ?objType.
   
   ?objType sc:name ?name.
   ?p sc:name ?pname
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1269'), ('pname', 'facet of'), ('objType', 'http://www.wikidata.org/entity/Q14947863'), ('name', 'parlour game')]
[('p', 'http://www.wikidata.org/prop/direct/P138'), ('pname', 'named after'), ('objType', 'http://www.wikidata.org/entity/Q5'), ('name', 'human')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of'), ('objType', 'http://www.wikidata.org/entity/Q865746'), ('name', 'metric function')]
[('p', 'http://www.wikidata.org/prop/direct/P4969'), ('pname', 'derivative work'), ('objType', 'http://www.wikidata.org/entity/Q2742711'), ('name', 'distance')]
4


In [64]:
#It looks like there are no Bacon number individuals in the database, and this class is not related to any human
#We change strategy and compute it ourselves:
#If an actor played in the same piece as Kevin Bacon, his/her Bacon number is 1, as below:

queryString = """
SELECT DISTINCT ?actor ?actorName 
WHERE {
   ?piece wdt:P161 wd:Q3454165; #castmember KevinBacon
          wdt:P161 ?actor. #castmember

    ?actor sc:name ?actorName.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)


Results
[('actor', 'http://www.wikidata.org/entity/Q15867224'), ('actorName', "Mark O'Brien")]
[('actor', 'http://www.wikidata.org/entity/Q16185788'), ('actorName', 'Kyle Catlett')]
[('actor', 'http://www.wikidata.org/entity/Q269891'), ('actorName', 'Julianna Guill')]
[('actor', 'http://www.wikidata.org/entity/Q391359'), ('actorName', 'David Harbour')]
[('actor', 'http://www.wikidata.org/entity/Q449580'), ('actorName', 'Lynne Thigpen')]
[('actor', 'http://www.wikidata.org/entity/Q918866'), ('actorName', 'Troy Garity')]
[('actor', 'http://www.wikidata.org/entity/Q1101612'), ('actorName', 'Clint Howard')]
[('actor', 'http://www.wikidata.org/entity/Q164328'), ('actorName', 'David Koechner')]
[('actor', 'http://www.wikidata.org/entity/Q189400'), ('actorName', 'Brooke Shields')]
[('actor', 'http://www.wikidata.org/entity/Q19609853'), ('actorName', 'Eloy Casados')]
[('actor', 'http://www.wikidata.org/entity/Q198684'), ('actorName', 'Wayne Duvall')]
[('actor', 'http://www.wikidata.org/entity/

In [67]:
#Let's now get the actors with a Bacon number of 2

#For actor2 to have a Bacon number of 2:
#We need to find an actor1 who played in a piece1 in which Kevin Bacon also played
#Then we find a piece2, different from piece1, in which actor1 and actor2 both played (actor1 and actor2 being two different people, neither of them Kevin Bacon)

queryString = """
SELECT DISTINCT ?actor2 ?actor2Name 
WHERE {
   ?piece1 wdt:P161 wd:Q3454165; #castmember KevinBacon
          wdt:P161 ?actor1. #castmember
    
    ?piece2 wdt:P161 ?actor1;
            wdt:P161 ?actor2.
    
    FILTER(?actor1 != wd:Q3454165 && ?actor2 != wd:Q3454165 && ?actor1!=?actor2 && ?piece1!=?piece2).
    
    ?actor2 sc:name ?actor2Name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('actor2', 'http://www.wikidata.org/entity/Q1354565'), ('actor2Name', 'Julian Wadham')]
[('actor2', 'http://www.wikidata.org/entity/Q18162913'), ('actor2Name', 'Callum Turner')]
[('actor2', 'http://www.wikidata.org/entity/Q17100475'), ('actor2Name', 'Brian Kubach')]
[('actor2', 'http://www.wikidata.org/entity/Q266888'), ('actor2Name', 'Vinessa Shaw')]
[('actor2', 'http://www.wikidata.org/entity/Q5799862'), ('actor2Name', 'Josh Helman')]
[('actor2', 'http://www.wikidata.org/entity/Q2151703'), ('actor2Name', 'James Murtaugh')]
[('actor2', 'http://www.wikidata.org/entity/Q6758083'), ('actor2Name', 'Marcus Carl Franklin')]
[('actor2', 'http://www.wikidata.org/entity/Q504338'), ('actor2Name', 'Joshua Cox')]
[('actor2', 'http://www.wikidata.org/entity/Q16013023'), ('actor2Name', 'William Lanteau')]
[('actor2', 'http://www.wikidata.org/entity/Q20973982'), ('actor2Name', 'Doug McKeon')]
[('actor2', 'http://www.wikidata.org/entity/Q271883'), ('actor2Name', 'Melinda Dillon')]
[('actor2'

In [66]:
#Checking whether the results are the same with separate FILTER clauses

queryString = """
SELECT DISTINCT ?actor2 ?actor2Name 
WHERE {
   ?piece1 wdt:P161 wd:Q3454165; #castmember KevinBacon
          wdt:P161 ?actor1. #castmember
    
    ?piece2 wdt:P161 ?actor1;
            wdt:P161 ?actor2.
    
    FILTER(?actor1 != wd:Q3454165). 
    FILTER(?actor2 != wd:Q3454165).
    FILTER(?actor1 != ?actor2).
    FILTER(?piece1 != ?piece2).
    
    ?actor2 sc:name ?actor2Name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('actor2', 'http://www.wikidata.org/entity/Q1354565'), ('actor2Name', 'Julian Wadham')]
[('actor2', 'http://www.wikidata.org/entity/Q18162913'), ('actor2Name', 'Callum Turner')]
[('actor2', 'http://www.wikidata.org/entity/Q17100475'), ('actor2Name', 'Brian Kubach')]
[('actor2', 'http://www.wikidata.org/entity/Q266888'), ('actor2Name', 'Vinessa Shaw')]
[('actor2', 'http://www.wikidata.org/entity/Q5799862'), ('actor2Name', 'Josh Helman')]
[('actor2', 'http://www.wikidata.org/entity/Q2151703'), ('actor2Name', 'James Murtaugh')]
[('actor2', 'http://www.wikidata.org/entity/Q6758083'), ('actor2Name', 'Marcus Carl Franklin')]
[('actor2', 'http://www.wikidata.org/entity/Q504338'), ('actor2Name', 'Joshua Cox')]
[('actor2', 'http://www.wikidata.org/entity/Q16013023'), ('actor2Name', 'William Lanteau')]
[('actor2', 'http://www.wikidata.org/entity/Q20973982'), ('actor2Name', 'Doug McKeon')]
[('actor2', 'http://www.wikidata.org/entity/Q271883'), ('actor2Name', 'Melinda Dillon')]
[('actor2'

Final query for this task

In [15]:
# Final Query

#For actor2 to have a Bacon number of 2:
#We need to find an actor1 who played in a piece1 in which Kevin Bacon also played
#Then we find a piece2 in which actor1 and actor2 played together
#The filter makes sure that actor1 and actor2 are neither Kevin Bacon nor the same person, and that piece1 and piece2 are different.

queryString = """
SELECT DISTINCT ?actor2 ?actor2Name 
WHERE {
    ?piece1 wdt:P161 wd:Q3454165; #castmember KevinBacon
            wdt:P161 ?actor1.     #castmember
    
    ?piece2 wdt:P161 ?actor1; #castmember
            wdt:P161 ?actor2. #castmember
    
    FILTER(?actor1 != wd:Q3454165 && ?actor2 != wd:Q3454165 && ?actor1 != ?actor2 && ?piece1 != ?piece2).
    
    ?actor2 sc:name ?actor2Name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('actor2', 'http://www.wikidata.org/entity/Q1354565'), ('actor2Name', 'Julian Wadham')]
[('actor2', 'http://www.wikidata.org/entity/Q18162913'), ('actor2Name', 'Callum Turner')]
[('actor2', 'http://www.wikidata.org/entity/Q17100475'), ('actor2Name', 'Brian Kubach')]
[('actor2', 'http://www.wikidata.org/entity/Q266888'), ('actor2Name', 'Vinessa Shaw')]
[('actor2', 'http://www.wikidata.org/entity/Q5799862'), ('actor2Name', 'Josh Helman')]
[('actor2', 'http://www.wikidata.org/entity/Q2151703'), ('actor2Name', 'James Murtaugh')]
[('actor2', 'http://www.wikidata.org/entity/Q6758083'), ('actor2Name', 'Marcus Carl Franklin')]
[('actor2', 'http://www.wikidata.org/entity/Q504338'), ('actor2Name', 'Joshua Cox')]
[('actor2', 'http://www.wikidata.org/entity/Q16013023'), ('actor2Name', 'William Lanteau')]
[('actor2', 'http://www.wikidata.org/entity/Q20973982'), ('actor2Name', 'Doug McKeon')]
[('actor2', 'http://www.wikidata.org/entity/Q271883'), ('actor2Name', 'Melinda Dillon')]
[('actor2'
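
The two-hop pattern above returns every actor reachable in two steps from Kevin Bacon, which can include actors who also played with him directly (Bacon number 1). The sketch below contrasts the two notions on a hypothetical toy cast list; the piece and actor names and both helper functions are assumptions introduced only for illustration.

```python
from collections import deque

# Hypothetical toy casts: piece -> set of cast members
casts = {
    "piece1": {"Kevin Bacon", "A"},
    "piece2": {"A", "B", "C"},
    "piece3": {"Kevin Bacon", "C"},
    "piece4": {"B", "D"},
}

def bacon_numbers(casts, source="Kevin Bacon"):
    # Build the co-star graph: two actors are linked if they share a piece
    neighbours = {}
    for cast in casts.values():
        for actor in cast:
            neighbours.setdefault(actor, set()).update(cast - {actor})
    # Breadth-first search gives the shortest distance to the source
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in distance:
                distance[other] = distance[current] + 1
                queue.append(other)
    return distance

def two_hop_pattern(casts, source="Kevin Bacon"):
    # Mirrors the graph pattern and FILTER of the query above
    found = set()
    for piece1, cast1 in casts.items():
        if source not in cast1:
            continue
        for actor1 in cast1 - {source}:
            for piece2, cast2 in casts.items():
                if piece2 != piece1 and actor1 in cast2:
                    found |= cast2 - {source, actor1}
    return found

numbers = bacon_numbers(casts)
exactly_two = sorted(a for a, n in numbers.items() if n == 2)
two_hop = sorted(two_hop_pattern(casts))
print(exactly_two, two_hop)
```

On this toy data the two-hop pattern also returns `A` and `C`, who co-starred with Kevin Bacon directly. In SPARQL, one possible way to exclude them (untested here) is a `FILTER NOT EXISTS { ?p wdt:P161 wd:Q3454165 ; wdt:P161 ?actor2 }`, and a triple such as `wd:Q147235 wdt:P161 ?actor2` would restrict the answer to the HIMYM cast as the task asks.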

## Task 7
Consider the actors who are members of the cast of HIMYM. Amongst the tv series in which these actors acted, return only those which received more than 2 awards (the result set must be triples of tv series IRI, label, #awards won).

In [16]:
# For this task, we have all the necessary IRIs except the one for awards. Let's see whether an award property is attached to the tv series class:

queryString = """
SELECT DISTINCT ?p ?pname 
WHERE {
   wd:Q5398426 ?p ?obj.

   ?p sc:name ?pname.
   FILTER(REGEX(?pname,"(award|reward|prize|decoration)"))
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)


Results
Empty


In [17]:
#Let's check without the regex

queryString = """
SELECT DISTINCT ?p ?pname ?obj ?Oname
WHERE {
   wd:Q5398426 ?p ?obj.

   ?p sc:name ?pname.
   ?obj sc:name ?Oname.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1963'), ('pname', 'properties for this type'), ('obj', 'http://www.wikidata.org/entity/P1113'), ('Oname', 'number of episodes')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('pname', 'subclass of'), ('obj', 'http://www.wikidata.org/entity/Q15416'), ('Oname', 'television program')]
[('p', 'http://www.wikidata.org/prop/direct/P1963'), ('pname', 'properties for this type'), ('obj', 'http://www.wikidata.org/entity/P580'), ('Oname', 'start time')]
[('p', 'http://www.wikidata.org/prop/direct/P1963'), ('pname', 'properties for this type'), ('obj', 'http://www.wikidata.org/entity/P582'), ('Oname', 'end time')]
[('p', 'http://www.wikidata.org/prop/direct/P1963'), ('pname', 'properties for this type'), ('obj', 'http://www.wikidata.org/entity/P1476'), ('Oname', 'title')]
[('p', 'http://www.wikidata.org/prop/direct/P5869'), ('pname', 'model item'), ('obj', 'http://www.wikidata.org/entity/Q131758'), ('Oname', 'Desperate Housewives')]
[('p

In [18]:
#Let's explore objects related to "properties for this type"

queryString = """
SELECT DISTINCT ?obj ?Oname 
WHERE {
   wd:Q5398426 wdt:P1963 ?obj. #tvSerie properties

   ?obj sc:name ?Oname.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/P1113'), ('Oname', 'number of episodes')]
[('obj', 'http://www.wikidata.org/entity/P580'), ('Oname', 'start time')]
[('obj', 'http://www.wikidata.org/entity/P582'), ('Oname', 'end time')]
[('obj', 'http://www.wikidata.org/entity/P1476'), ('Oname', 'title')]
[('obj', 'http://www.wikidata.org/entity/P921'), ('Oname', 'main subject')]
[('obj', 'http://www.wikidata.org/entity/P527'), ('Oname', 'has part')]
[('obj', 'http://www.wikidata.org/entity/P495'), ('Oname', 'country of origin')]
[('obj', 'http://www.wikidata.org/entity/P136'), ('Oname', 'genre')]
[('obj', 'http://www.wikidata.org/entity/P2437'), ('Oname', 'number of seasons')]
[('obj', 'http://www.wikidata.org/entity/P449'), ('Oname', 'original broadcaster')]
[('obj', 'http://www.wikidata.org/entity/P840'), ('Oname', 'narrative location')]
[('obj', 'http://www.wikidata.org/entity/P915'), ('Oname', 'filming location')]
[('obj', 'http://www.wikidata.org/entity/P161'), ('Oname', 'cast me

In [26]:
#We see the award received property (P166). Let's list the awards received by HIMYM

queryString = """
SELECT ?award ?awardName
WHERE {
   wd:Q147235 wdt:P166 ?award. #HIMYM award
   ?award sc:name ?awardName.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('award', 'http://www.wikidata.org/entity/Q2300462'), ('awardName', 'Streamy Awards')]
[('award', 'http://www.wikidata.org/entity/Q3002950'), ('awardName', "Critics' Choice Television Award for Best Supporting Actor in a Comedy Series")]
[('award', 'http://www.wikidata.org/entity/Q7243510'), ('awardName', 'Primetime Emmy Award for Outstanding Single-Camera Picture Editing for a Comedy Series')]
[('award', 'http://www.wikidata.org/entity/Q23011211'), ('awardName', 'International TV Audience Award for Best Comedy TV Series')]
[('award', 'http://www.wikidata.org/entity/Q7243501'), ('awardName', 'Primetime Emmy Award for Outstanding Cinematography for a Multi-Camera Series')]
[('award', 'http://www.wikidata.org/entity/Q17985609'), ('awardName', 'Primetime Emmy Award for Outstanding Art Direction for a Contemporary Program')]
[('award', 'http://www.wikidata.org/entity/Q30633707'), ('awardName', 'Primetime Emmy Award for Outstanding Multi-Camera Picture Editing for a Comedy Series')

In [42]:
#Consider the actors who are members of the cast of HIMYM. 
#Amongst the tv series in which these actors acted, return only those which received more than 2 awards (the result set must be triples of tv series IRI, label, #awards won).
#We can now try:

queryString = """
SELECT ?series ?seriesName (count(distinct ?award) as ?count)

WHERE {
    wd:Q147235 wdt:P161 ?actorH. #HIMYM castmember


   ?series wdt:P31 wd:Q5398426; #a tvseries
           wdt:P161 ?actorH; #castmember
           wdt:P166 ?award. #award
           
   ?series sc:name ?seriesName

}
GROUP BY ?series ?seriesName
HAVING(count(distinct ?award)>2)
ORDER BY DESC(?count)

LIMIT 200
"""

print("Results")
x=run_query(queryString)

Results
[('series', 'http://www.wikidata.org/entity/Q23628'), ('seriesName', 'The Sopranos'), ('count', '22')]
[('series', 'http://www.wikidata.org/entity/Q1079'), ('seriesName', 'Breaking Bad'), ('count', '21')]
[('series', 'http://www.wikidata.org/entity/Q1132439'), ('seriesName', 'The Practice'), ('count', '12')]
[('series', 'http://www.wikidata.org/entity/Q16756'), ('seriesName', 'Modern Family'), ('count', '11')]
[('series', 'http://www.wikidata.org/entity/Q79784'), ('seriesName', 'Friends'), ('count', '10')]
[('series', 'http://www.wikidata.org/entity/Q8539'), ('seriesName', 'The Big Bang Theory'), ('count', '10')]
[('series', 'http://www.wikidata.org/entity/Q1136370'), ('seriesName', 'General Hospital'), ('count', '10')]
[('series', 'http://www.wikidata.org/entity/Q244803'), ('seriesName', 'Ally McBeal'), ('count', '9')]
[('series', 'http://www.wikidata.org/entity/Q1030713'), ('seriesName', 'Another World'), ('count', '8')]
[('series', 'http://www.wikidata.org/entity/Q438406'), 

In [23]:
#Same query as above, but without DISTINCT on the awards:

queryString = """
SELECT ?series ?seriesName (count(?award) as ?count)

WHERE {
    wd:Q147235 wdt:P161 ?actorH. #HIMYM castmember


   ?series wdt:P31 wd:Q5398426; #a tvseries
           wdt:P161 ?actorH;    #castmember
           wdt:P166 ?award.     #award
           
   ?series sc:name ?seriesName

}
GROUP BY ?series ?seriesName
HAVING(count(?award)>2)
ORDER BY DESC(?count)

LIMIT 200
"""

print("Results")
x=run_query(queryString)


Results
[('series', 'http://www.wikidata.org/entity/Q147235'), ('seriesName', 'How I Met Your Mother'), ('count', '3360')]
[('series', 'http://www.wikidata.org/entity/Q34659'), ('seriesName', 'My Name Is Earl'), ('count', '119')]
[('series', 'http://www.wikidata.org/entity/Q1079'), ('seriesName', 'Breaking Bad'), ('count', '105')]
[('series', 'http://www.wikidata.org/entity/Q16756'), ('seriesName', 'Modern Family'), ('count', '88')]
[('series', 'http://www.wikidata.org/entity/Q117396'), ('seriesName', 'CSI: Vegas'), ('count', '84')]
[('series', 'http://www.wikidata.org/entity/Q438406'), ('seriesName', "Grey's Anatomy"), ('count', '80')]
[('series', 'http://www.wikidata.org/entity/Q8539'), ('seriesName', 'The Big Bang Theory'), ('count', '60')]
[('series', 'http://www.wikidata.org/entity/Q19570'), ('seriesName', 'The Good Wife'), ('count', '39')]
[('series', 'http://www.wikidata.org/entity/Q244803'), ('seriesName', 'Ally McBeal'), ('count', '36')]
[('series', 'http://www.wikidata.org/en
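
The counts are inflated without DISTINCT because the join produces one row per (shared cast member, award) combination, so each award is counted once per matching actor. The 3360 reported for HIMYM is consistent with the 480 cast members found in Task 5 multiplied by 7 awards. A minimal sketch on hypothetical toy data (the actor and award names are assumptions):

```python
# Hypothetical toy data for one series
shared_cast = ["actor1", "actor2", "actor3"]   # cast members also in HIMYM
awards = ["awardX", "awardY"]                  # awards received by the series

# The join yields one row per (shared cast member, award) combination
rows = [(actor, award) for actor in shared_cast for award in awards]

plain_count = len([award for _, award in rows])      # like count(?award)
distinct_count = len({award for _, award in rows})   # like count(distinct ?award)
print(plain_count, distinct_count)

# Same arithmetic for HIMYM, using the cast size from Task 5
himym_rows = 480 * 7
print(himym_rows)
```

This is why the final query keeps `count(distinct ?award)`: only the distinct count reflects the number of awards a series actually received.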

Final query for this task

In [27]:
# Final query

queryString = """
SELECT ?series ?seriesName (count(distinct ?award) as ?count)

WHERE {
    wd:Q147235 wdt:P161 ?actorH. #HIMYM castmember


   ?series wdt:P31 wd:Q5398426;   #a tvseries
           wdt:P161 ?actorH;      #castmember
           wdt:P166 ?award.       #award
           
   ?series sc:name ?seriesName

}
GROUP BY ?series ?seriesName
HAVING(count(distinct ?award)>2)
ORDER BY DESC(?count)

LIMIT 200
"""

print("Results")
x=run_query(queryString)

Results
[('series', 'http://www.wikidata.org/entity/Q23628'), ('seriesName', 'The Sopranos'), ('count', '22')]
[('series', 'http://www.wikidata.org/entity/Q1079'), ('seriesName', 'Breaking Bad'), ('count', '21')]
[('series', 'http://www.wikidata.org/entity/Q1132439'), ('seriesName', 'The Practice'), ('count', '12')]
[('series', 'http://www.wikidata.org/entity/Q16756'), ('seriesName', 'Modern Family'), ('count', '11')]
[('series', 'http://www.wikidata.org/entity/Q79784'), ('seriesName', 'Friends'), ('count', '10')]
[('series', 'http://www.wikidata.org/entity/Q8539'), ('seriesName', 'The Big Bang Theory'), ('count', '10')]
[('series', 'http://www.wikidata.org/entity/Q1136370'), ('seriesName', 'General Hospital'), ('count', '10')]
[('series', 'http://www.wikidata.org/entity/Q244803'), ('seriesName', 'Ally McBeal'), ('count', '9')]
[('series', 'http://www.wikidata.org/entity/Q1030713'), ('seriesName', 'Another World'), ('count', '8')]
[('series', 'http://www.wikidata.org/entity/Q438406'), 