# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question and does not assume that the person issuing it has a clear understanding of the data in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook
3. Do not delete useless queries. Keep everything that is syntactically valid

```
?p <http://schema.org/name> ?name .
```
    
is the BGP that returns the human-readable name of a property or a class in Wikidata.

In [28]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-9d624085bd-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString,verbose = True):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings'])==0:
            print("Empty")
            return []
        array = []
        for bindings in json_results['results']['bindings']:
            app =  [ (var, value['value'])  for var, value in bindings.items() ] 
            if verbose:
                print( app)
            array.append(app)
        if verbose:
            print(len(array))
        return array

    except Exception as e :
        print("The operation failed", e)
        # return an empty list so callers can always iterate over the result
        return []

# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
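`run_query` returns every row as a list of `(variable, value)` tuples, which is awkward to index by variable name. The following is a minimal sketch of a conversion step; `rows_to_dicts` is a hypothetical helper that is not part of the original setup, and the sample row is copied from the Task 1 output further down.

```python
def rows_to_dicts(rows):
    """Convert run_query-style rows into dictionaries keyed by variable name."""
    # tolerate None as well as an empty list
    return [dict(row) for row in (rows or [])]


# sample row in the shape produced by run_query
sample_rows = [
    [("p", "http://www.wikidata.org/entity/Q29168811"),
     ("name", "animated feature film")],
]

records = rows_to_dicts(sample_rows)
print(records[0]["p"], "-", records[0]["name"])
```

With dictionaries, a later cell can read `record["name"]` instead of relying on the position of each variable in the row.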

# Movie Workflow Series ("Production company explorative search") 

Consider the following exploratory information need:

> Investigate the main companies working on cinema-related content. We want to know which are the main television production companies and film production companies, which company distributes the most films, and some considerations about awards.

## Useful URIs for the current workflow
The following are given:

| IRI         | Description   | Role      |
| ----------- | ------------- | --------- |
| `wdt:P1647` | subproperty   | predicate |
| `wdt:P31`   | instance of   | predicate |
| `wdt:P106`  | profession    | predicate |
| `wdt:P279`  | subclass      | predicate |
| `wdt:P27`   | nationality   | predicate |
| `wd:Q5`     | Human         | node      |
| `wd:Q36479` | The Lion King | node      |




Also consider

```
wd:Q36479 ?p ?obj .
```

is the BGP that retrieves all **properties of The Lion King**.

Please note that whenever you return a resource, you should return both its IRI and its label. In particular, when a task requires you to identify a BGP, the result set must always be a list of IRI-label pairs.

The workload should:


1. Identify the BGP for films

2. Identify the BGP for Netflix

3. Identify the BGP for television production company

4. Identify the BGP for film production company

5. Find the top-5 companies (among television production companies and film production companies) that produced the highest number of crime films (the result set must be a list of triples: company IRI, label, and #crime films).

6. Find the company (among television production companies and film production companies) that distributed the most films (of any genre) that it did not produce (the result set must be a list of triples: company IRI, label, and #films).

7. Find how many companies are listed on each stock exchange (the result set must be a list of triples: stock exchange IRI, label, and #companies).

8. Identify the BGP for Academy Award

9. Find the companies (among television and film production companies) that won at least 5 Academy Awards for Best Actress for the movies they produced (the result set must be a list of triples: company IRI, label, and #awards).

## Task 1
Identify the BGP for films

In [29]:
# checking if lion king is an instance of films
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   # instance of
   wd:Q36479 wdt:P31 ?p.
   ?p sc:name ?name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q29168811'), ('name', 'animated feature film')]
1


In [30]:
# checking if animated feature film is a subclass of films
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   # subclass of
   wd:Q29168811 wdt:P279 ?p.
   ?p sc:name ?name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q202866'), ('name', 'animated film')]
[('p', 'http://www.wikidata.org/entity/Q24869'), ('name', 'feature film')]
2


In [31]:
# checking if animated film is a subclass of films
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   # subclass of
   wd:Q202866 wdt:P279 ?p.
   ?p sc:name ?name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q11424'), ('name', 'film')]
1


In [32]:
# bgp of films
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   
   wd:Q11424 ?p ?obj.
   ?p sc:name ?name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1014'), ('name', 'Art & Architecture Thesaurus ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1036'), ('name', 'Dewey Decimal Classification')]
[('p', 'http://www.wikidata.org/prop/direct/P1051'), ('name', 'PSH ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1151'), ('name', "topic's main Wikimedia portal")]
[('p', 'http://www.wikidata.org/prop/direct/P1225'), ('name', 'U.S. National Archives Identifier')]
[('p', 'http://www.wikidata.org/prop/direct/P1245'), ('name', 'OmegaWiki Defined Meaning')]
[('p', 'http://www.wikidata.org/prop/direct/P1256'), ('name', 'Iconclass notation')]
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('name', 'described by source')]
[('p', 'http://www.wikidata.org/prop/direct/P1368'), ('name', 'LNB ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1417'), ('name', 'Encyclopædia Britannica Online ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1424'), ('name', "topic's main template")]
[('p',

Final query for this task

In [6]:
# bgp of films
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   
   # instance of
   wd:Q36479 wdt:P31 ?animated_feature_film .
   #subclass of
   ?animated_feature_film wdt:P279 ?animated_film .
   #subclass of
   ?animated_film wdt:P279 ?film .
   
   ?film ?p ?obj .
   ?p sc:name ?name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1014'), ('name', 'Art & Architecture Thesaurus ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1036'), ('name', 'Dewey Decimal Classification')]
[('p', 'http://www.wikidata.org/prop/direct/P1051'), ('name', 'PSH ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1151'), ('name', "topic's main Wikimedia portal")]
[('p', 'http://www.wikidata.org/prop/direct/P1225'), ('name', 'U.S. National Archives Identifier')]
[('p', 'http://www.wikidata.org/prop/direct/P1245'), ('name', 'OmegaWiki Defined Meaning')]
[('p', 'http://www.wikidata.org/prop/direct/P1256'), ('name', 'Iconclass notation')]
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('name', 'described by source')]
[('p', 'http://www.wikidata.org/prop/direct/P1368'), ('name', 'LNB ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1417'), ('name', 'Encyclopædia Britannica Online ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1424'), ('name', "topic's main template")]
[('p',
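The final query above hard-codes exactly two `subclass of` hops between The Lion King's class and `film`. A SPARQL 1.1 property path can follow any number of hops instead. The sketch below only builds the query string; it assumes the endpoint supports property paths, and `build_ancestor_classes_query` is a hypothetical helper name introduced for illustration.

```python
import re


def build_ancestor_classes_query(entity_qid, limit=20):
    """Build a query listing classes reachable via P31 followed by zero or more P279 hops."""
    if not re.fullmatch(r"Q\d+", entity_qid):
        raise ValueError(f"not a Wikidata QID: {entity_qid!r}")
    return f"""
SELECT DISTINCT ?class ?name
WHERE {{
   # instance of, then any number of subclass-of hops
   wd:{entity_qid} wdt:P31/wdt:P279* ?class .
   ?class sc:name ?name .
}}
LIMIT {int(limit)}
"""


query = build_ancestor_classes_query("Q36479")  # The Lion King
print(query)
```

The resulting string can be passed to `run_query`, which prepends the `wd:`, `wdt:` and `sc:` prefixes.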

## Task 2
Identify the BGP for Netflix

In [7]:
# write your queries

In [45]:
# looking at all objects of lion king, maybe it was made by netflix
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   #bgp of lion king
   wd:Q36479 ?prd ?p .
   ?p sc:name ?name .
   FILTER REGEX(?name, "^[Nn].*") .
}
order by ?name
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q2365445'), ('name', 'Nala')]
[('p', 'http://www.wikidata.org/entity/Q28828203'), ('name', 'Nancy Kniep')]
[('p', 'http://www.wikidata.org/entity/Q28859490'), ('name', 'Natalie Franscioni-Karp')]
[('p', 'http://www.wikidata.org/entity/Q491264'), ('name', 'Nathan Lane')]
[('p', 'http://www.wikidata.org/entity/Q823422'), ('name', 'National Film Registry')]
[('p', 'http://www.wikidata.org/entity/Q907311'), ('name', 'Netflix')]
[('p', 'http://www.wikidata.org/entity/Q1993004'), ('name', 'Niketa Calame')]
[('p', 'http://www.wikidata.org/entity/Q25558510'), ('name', 'Noni White')]
[('p', 'http://www.wikidata.org/entity/Q23817729'), ('name', 'no age restriction')]
9


In [50]:
# Netflix (wd:Q907311) is one of the objects of The Lion King; listing its properties
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   #bgp of netflix
   wd:Q907311 ?p ?obj .
   ?p sc:name ?name.
}
order by ?name
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1661'), ('name', 'Alexa rank')]
[('p', 'http://www.wikidata.org/prop/direct/P9618'), ('name', 'AlternativeTo software ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3861'), ('name', 'App Store app ID')]
[('p', 'http://www.wikidata.org/prop/direct/P268'), ('name', 'Bibliothèque nationale de France ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2531'), ('name', 'Box Office Mojo studio ID')]
[('p', 'http://www.wikidata.org/prop/direct/P5019'), ('name', 'Brockhaus Enzyklopädie online ID')]
[('p', 'http://www.wikidata.org/prop/direct/P5531'), ('name', 'Central Index Key')]
[('p', 'http://www.wikidata.org/prop/direct/P373'), ('name', 'Commons category')]
[('p', 'http://www.wikidata.org/prop/direct/P2088'), ('name', 'Crunchbase organization ID')]
[('p', 'http://www.wikidata.org/prop/direct/P6849'), ('name', 'DR topic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P6181'), ('name', 'Disney A to Z ID')]
[('p', 'http://www.wikidata.

Final query for this task

In [10]:
# write your final query

In [69]:
# Netflix is one of the objects of The Lion King
queryString = """
SELECT ?p ?name
WHERE {
   
   wd:Q36479 ?prd ?netflixObj .
   ?netflixObj sc:name ?netflix.
   FILTER Regex(?netflix, "Netflix") .
   
   ?netflixObj ?p ?obj .
   ?p sc:name ?name.
   
}
 order by ?name
 LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1661'), ('name', 'Alexa rank')]
[('p', 'http://www.wikidata.org/prop/direct/P9618'), ('name', 'AlternativeTo software ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3861'), ('name', 'App Store app ID')]
[('p', 'http://www.wikidata.org/prop/direct/P268'), ('name', 'Bibliothèque nationale de France ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2531'), ('name', 'Box Office Mojo studio ID')]
[('p', 'http://www.wikidata.org/prop/direct/P5019'), ('name', 'Brockhaus Enzyklopädie online ID')]
[('p', 'http://www.wikidata.org/prop/direct/P5531'), ('name', 'Central Index Key')]
[('p', 'http://www.wikidata.org/prop/direct/P373'), ('name', 'Commons category')]
[('p', 'http://www.wikidata.org/prop/direct/P2088'), ('name', 'Crunchbase organization ID')]
[('p', 'http://www.wikidata.org/prop/direct/P6849'), ('name', 'DR topic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P6181'), ('name', 'Disney A to Z ID')]
[('p', 'http://www.wikidata.
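`FILTER REGEX(?netflix, "Netflix")` matches any label that merely contains the word, so it could also select nodes such as `Template:Netflix` if they were linked from the film. Anchoring the pattern (`"^Netflix$"`) restricts the match to the exact label. The sketch below illustrates the difference with Python's `re` module on labels that appear in this notebook; for such simple patterns Python and SPARQL regular expressions are assumed to behave alike.

```python
import re

# labels seen in the outputs of this notebook
labels = ["Netflix", "Template:Netflix", "Template:Netflix title"]

# unanchored pattern: substring match, like REGEX(?name, "Netflix")
substring_hits = [label for label in labels if re.search("Netflix", label)]

# anchored pattern: whole-label match, like REGEX(?name, "^Netflix$")
exact_hits = [label for label in labels if re.search("^Netflix$", label)]

print(substring_hits)
print(exact_hits)
```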

## Task 3
Identify the BGP for television production company

In [58]:
# write your queries

In [66]:
# checking all objects of Netflix for "television production company"
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   
   wd:Q907311 ?pr ?p.
   ?p sc:name ?name . 
   FILTER REGEX(?name, "^[Tt]") .
}
order by ?name
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q47472306'), ('name', 'Template:Netflix')]
[('p', 'http://www.wikidata.org/entity/Q54007319'), ('name', 'Template:Netflix title')]
[('p', 'http://www.wikidata.org/entity/Q849363'), ('name', 'The Vanguard Group')]
[('p', 'http://www.wikidata.org/entity/Q10689397'), ('name', 'television production company')]
4


Final query for this task

In [None]:
# write your final query

In [79]:
# television production company is one of the objects of Netflix
# and that object's name is "television production company"
queryString = """
SELECT ?p ?name
WHERE {
   
   wd:Q907311 ?pr ?televionProductionCompanyObj .
   ?televionProductionCompanyObj sc:name ?televionProductionCompany . 
   FILTER REGEX(?televionProductionCompany, "television production company") .
   
   ?televionProductionCompanyObj ?p ?obj .
   ?p sc:name ?name .
}
order by ?name
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P373'), ('name', 'Commons category')]
[('p', 'http://www.wikidata.org/prop/direct/P2671'), ('name', 'Google Knowledge Graph ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3417'), ('name', 'Quora topic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P17'), ('name', 'country')]
[('p', 'http://www.wikidata.org/prop/direct/P452'), ('name', 'industry')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('name', 'subclass of')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('name', 'subclass of')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('name', 'subclass of')]
[('p', 'http://www.wikidata.org/prop/direct/P910'), ('name', "topic's main category")]
9


## Task 4
Identify the BGP for film production company

In [None]:
# write your queries

In [80]:
# checking if film production company is related to television production company in any way

# Q10689397 = television production company
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   
   wd:Q10689397 ?pr ?p.
   ?p sc:name ?name . 
   FILTER REGEX(?name, ".*company.*") .
}
order by ?name
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q1762059'), ('name', 'film production company')]
[('p', 'http://www.wikidata.org/entity/Q11396960'), ('name', 'production company')]
2


In [81]:
#film production company is one of the objects of television production company
# and the name of this object is film production company

# Q10689397 = television production company
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   
   wd:Q10689397 ?pr ?p.
   ?p sc:name ?name . 
   FILTER REGEX(?name, ".*film production company.*") .
}
order by ?name
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q1762059'), ('name', 'film production company')]
1


Final query for this task

In [None]:
# write your final query

In [87]:
# film production company is one of the objects of television production company
# the name of this object is "film production company"

# Q10689397 = television production company
queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   
   wd:Q10689397 ?pr ?filmProductionCompanyObject.
   ?filmProductionCompanyObject sc:name ?filmProductionCompanyObjectName . 
   FILTER REGEX(?filmProductionCompanyObjectName, ".*film production company.*") .
   
   ?filmProductionCompanyObject ?p ?obj .
   ?p sc:name ?name .
}
order by ?name
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P373'), ('name', 'Commons category')]
[('p', 'http://www.wikidata.org/prop/direct/P646'), ('name', 'Freebase ID')]
[('p', 'http://www.wikidata.org/prop/direct/P227'), ('name', 'GND ID')]
[('p', 'http://www.wikidata.org/prop/direct/P8408'), ('name', 'KBpedia ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1889'), ('name', 'different from')]
[('p', 'http://www.wikidata.org/prop/direct/P452'), ('name', 'industry')]
[('p', 'http://www.wikidata.org/prop/direct/P361'), ('name', 'part of')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('name', 'subclass of')]
[('p', 'http://www.wikidata.org/prop/direct/P910'), ('name', "topic's main category")]
9


## Task 5
Find the top-5 companies (among television production companies and film production companies) that produced the highest number of crime films (the result set must be a list of triples: company IRI, label, and #crime films).

In [None]:
# write your queries

In [199]:
# checking if films are related to crime in any way

# Q11424 = films
# wdt:P31 = instance of

queryString = """
SELECT DISTINCT ?prd ?p ?prdname ?name
WHERE {
   
   ?randomFilm wdt:P31 wd:Q11424 .
   ?randomFilm ?prd ?p .
   ?p sc:name ?name.
   FILTER REGEX(?name, ".*[Cc]rime.*") .
   
   ?prd sc:name ?prdname.
   
}
order by ?name
 LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('prd', 'http://www.wikidata.org/prop/direct/P921'), ('p', 'http://www.wikidata.org/entity/Q190357'), ('prdname', 'main subject'), ('name', 'Central Office of the State Justice Administrations for the Investigation of National Socialist Crimes')]
[('prd', 'http://www.wikidata.org/prop/direct/P136'), ('p', 'http://www.wikidata.org/entity/Q3697149'), ('prdname', 'genre'), ('name', 'Crime')]
[('prd', 'http://www.wikidata.org/prop/direct/P161'), ('p', 'http://www.wikidata.org/entity/Q46663'), ('prdname', 'cast member'), ('name', 'Crime & the City Solution')]
[('prd', 'http://www.wikidata.org/prop/direct/P406'), ('p', 'http://www.wikidata.org/entity/Q5185165'), ('prdname', 'soundtrack release'), ('name', 'Crime + Punishment in Suburbia')]
[('prd', 'http://www.wikidata.org/prop/direct/P1889'), ('p', 'http://www.wikidata.org/entity/Q2514298'), ('prdname', 'different from'), ('name', 'Crime After School')]
[('prd', 'http://www.wikidata.org/prop/direct/P1889'), ('p', 'http://www.wikida

In [204]:
# checking if films are related to film production company in any way

# Q11424 = films
# Q1762059 = film production company
# wdt:P31 = instance of
queryString = """
SELECT DISTINCT ?firstHop ?firstHopName ?secondHop ?secondHopName
WHERE {
   
   ?randomFilm wdt:P31 wd:Q11424 .
   ?randomFilm ?firstHop ?something .
   ?something ?secondHop wd:Q1762059.
   ?firstHop sc:name ?firstHopName.
   ?secondHop sc:name ?secondHopName
   
   
}
order by ?firstHopName
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('firstHop', 'http://www.wikidata.org/prop/direct/P161'), ('firstHopName', 'cast member'), ('secondHop', 'http://www.wikidata.org/prop/direct/P31'), ('secondHopName', 'instance of')]
[('firstHop', 'http://www.wikidata.org/prop/direct/P88'), ('firstHopName', 'commissioned by'), ('secondHop', 'http://www.wikidata.org/prop/direct/P31'), ('secondHopName', 'instance of')]
[('firstHop', 'http://www.wikidata.org/prop/direct/P767'), ('firstHopName', 'contributor to the creative work or subject'), ('secondHop', 'http://www.wikidata.org/prop/direct/P31'), ('secondHopName', 'instance of')]
[('firstHop', 'http://www.wikidata.org/prop/direct/P3931'), ('firstHopName', 'copyright holder'), ('secondHop', 'http://www.wikidata.org/prop/direct/P31'), ('secondHopName', 'instance of')]
[('firstHop', 'http://www.wikidata.org/prop/direct/P170'), ('firstHopName', 'creator'), ('secondHop', 'http://www.wikidata.org/prop/direct/P31'), ('secondHopName', 'instance of')]
[('firstHop', 'http://www.wikidata.

In [181]:
# we know that films are produced by someCompany which is an instance of television production company
# getting films produced by either television production company or film production company

# Q11424 = films
# Q10689397 = television production company
# Q1762059 = film production company
# wdt:P31 = instance of
# P162 = producer

queryString = """
SELECT DISTINCT *
WHERE {
   
   
   ?randomFilm wdt:P31 wd:Q11424 .
   ?randomFilm wdt:P162 ?producer .
   {
       ?producer wdt:P31 wd:Q1762059.
   }
   UNION
   {
       ?producer wdt:P31 wd:Q10689397 .
   }
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)


Results
[('randomFilm', 'http://www.wikidata.org/entity/Q7659277'), ('producer', 'http://www.wikidata.org/entity/Q947354')]
[('randomFilm', 'http://www.wikidata.org/entity/Q21819857'), ('producer', 'http://www.wikidata.org/entity/Q23017119')]
[('randomFilm', 'http://www.wikidata.org/entity/Q5115110'), ('producer', 'http://www.wikidata.org/entity/Q41468')]
[('randomFilm', 'http://www.wikidata.org/entity/Q22694584'), ('producer', 'http://www.wikidata.org/entity/Q2114144')]
[('randomFilm', 'http://www.wikidata.org/entity/Q300502'), ('producer', 'http://www.wikidata.org/entity/Q159846')]
[('randomFilm', 'http://www.wikidata.org/entity/Q470344'), ('producer', 'http://www.wikidata.org/entity/Q41468')]
[('randomFilm', 'http://www.wikidata.org/entity/Q815425'), ('producer', 'http://www.wikidata.org/entity/Q1254356')]
[('randomFilm', 'http://www.wikidata.org/entity/Q7549559'), ('producer', 'http://www.wikidata.org/entity/Q126399')]
[('randomFilm', 'http://www.wikidata.org/entity/Q3282562'), ('p

In [189]:
# getting films whose genre is Crime
# (the producer restriction from the previous query is not applied here yet)

# Q11424 = films
# P136 = genre
# Q3697149 = Crime (genre value found above)
# Q10689397 = television production company
# Q1762059 = film production company
# wdt:P31 = instance of
# P162 = producer

queryString = """
SELECT  *
WHERE {
   
   
   ?randomFilm wdt:P31 wd:Q11424 .
   ?randomFilm wdt:P136 wd:Q3697149 .
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)


Results
[('randomFilm', 'http://www.wikidata.org/entity/Q4655749')]
1


Final query for this task

In [None]:
# write your final query
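The exploratory queries for this task found the producer property (`wdt:P162`), the genre property (`wdt:P136`) and the two company classes, but no final query was written. The following is only a sketch of how those pieces might be combined; `build_top_crime_producers_query` is a hypothetical helper, and the default genre node `Q3697149` is an assumption taken from the exploration above, where it matched a single film, so a better "crime film" genre node may still have to be discovered.

```python
def build_top_crime_producers_query(genre_qid="Q3697149", limit=5):
    """Build a query ranking production companies by number of films of one genre."""
    return f"""
SELECT ?company ?companyName (COUNT(DISTINCT ?film) AS ?crimeFilms)
WHERE {{
   ?film wdt:P31 wd:Q11424 .          # instance of film
   ?film wdt:P136 wd:{genre_qid} .    # genre (assumed crime genre node)
   ?film wdt:P162 ?company .          # producer
   {{ ?company wdt:P31 wd:Q1762059 . }}    # film production company
   UNION
   {{ ?company wdt:P31 wd:Q10689397 . }}   # television production company
   ?company sc:name ?companyName .
}}
GROUP BY ?company ?companyName
ORDER BY DESC(?crimeFilms)
LIMIT {int(limit)}
"""


query = build_top_crime_producers_query()
print(query)
```

`COUNT(DISTINCT ?film)` is used so that a company matching both classes is not counted twice for the same film.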

## Task 6
Find the company (among television production companies and film production companies) that distributed the most films (of any genre) that it did not produce (the result set must be a list of triples: company IRI, label, and #films).

In [None]:
# write your queries

In [221]:
# we know that films are produced by someCompany which is an instance of television production company
# getting films produced by either television production company or film production company
# using films distributed by and then checking distributed_by != produced_by

# Q11424 = films
# Q10689397 = television production company
# Q1762059 = film production company
# wdt:P31 = instance of
# P162 = producer
# P750 = distributed by

queryString = """
SELECT DISTINCT ?distributor ?distributorName (count(?randomFilm) as ?numberOfFilmsDistributed)
WHERE {
   
   
   ?randomFilm wdt:P31 wd:Q11424 .
   ?randomFilm wdt:P162 ?producer .
   {
       ?producer wdt:P31 wd:Q1762059.
   }
   UNION
   {
       ?producer wdt:P31 wd:Q10689397 .
   }
   
   ?randomFilm wdt:P750 ?distributor .
   ?distributor sc:name ?distributorName
   
   Filter (?distributor != ?producer)
}
group by ?distributor ?distributorName
order by desc (?numberOfFilmsDistributed)
LIMIT 20
"""

print("Results")
x=run_query(queryString)


Results
[('distributor', 'http://www.wikidata.org/entity/Q907311'), ('distributorName', 'Netflix'), ('numberOfFilmsDistributed', '116')]
[('distributor', 'http://www.wikidata.org/entity/Q80948336'), ('distributorName', 'FandangoNow'), ('numberOfFilmsDistributed', '25')]
[('distributor', 'http://www.wikidata.org/entity/Q159846'), ('distributorName', 'Paramount Pictures'), ('numberOfFilmsDistributed', '20')]
[('distributor', 'http://www.wikidata.org/entity/Q3963283'), ('distributorName', 'Società Anonima Stefano Pittaluga'), ('numberOfFilmsDistributed', '12')]
[('distributor', 'http://www.wikidata.org/entity/Q1066018'), ('distributorName', 'Toei Company'), ('numberOfFilmsDistributed', '12')]
[('distributor', 'http://www.wikidata.org/entity/Q179200'), ('distributorName', 'Metro-Goldwyn-Mayer'), ('numberOfFilmsDistributed', '10')]
[('distributor', 'http://www.wikidata.org/entity/Q564960'), ('distributorName', 'Constantin Film'), ('numberOfFilmsDistributed', '10')]
[('distributor', 'http://

Final query for this task

In [None]:
# write your final query

In [222]:
# we know that films are produced by someCompany which is an instance of television production company
# getting films produced by either television production company or film production company
# using films distributed by and then checking distributed_by != produced_by

# Q11424 = films
# Q10689397 = television production company
# Q1762059 = film production company
# wdt:P31 = instance of
# P162 = producer
# P750 = distributed by

queryString = """
SELECT DISTINCT ?distributor ?distributorName (count(?randomFilm) as ?numberOfFilmsDistributed)
WHERE {
   
   
   ?randomFilm wdt:P31 wd:Q11424 .
   ?randomFilm wdt:P162 ?producer .
   {
       ?producer wdt:P31 wd:Q1762059.
   }
   UNION
   {
       ?producer wdt:P31 wd:Q10689397 .
   }
   
   ?randomFilm wdt:P750 ?distributor .
   ?distributor sc:name ?distributorName
   
   Filter (?distributor != ?producer)
}
group by ?distributor ?distributorName
order by desc (?numberOfFilmsDistributed)
LIMIT 20
"""

print("Results")
x=run_query(queryString)


Results
[('distributor', 'http://www.wikidata.org/entity/Q907311'), ('distributorName', 'Netflix'), ('numberOfFilmsDistributed', '116')]
[('distributor', 'http://www.wikidata.org/entity/Q80948336'), ('distributorName', 'FandangoNow'), ('numberOfFilmsDistributed', '25')]
[('distributor', 'http://www.wikidata.org/entity/Q159846'), ('distributorName', 'Paramount Pictures'), ('numberOfFilmsDistributed', '20')]
[('distributor', 'http://www.wikidata.org/entity/Q3963283'), ('distributorName', 'Società Anonima Stefano Pittaluga'), ('numberOfFilmsDistributed', '12')]
[('distributor', 'http://www.wikidata.org/entity/Q1066018'), ('distributorName', 'Toei Company'), ('numberOfFilmsDistributed', '12')]
[('distributor', 'http://www.wikidata.org/entity/Q179200'), ('distributorName', 'Metro-Goldwyn-Mayer'), ('numberOfFilmsDistributed', '10')]
[('distributor', 'http://www.wikidata.org/entity/Q564960'), ('distributorName', 'Constantin Film'), ('numberOfFilmsDistributed', '10')]
[('distributor', 'http://
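Note that `FILTER (?distributor != ?producer)` is evaluated per joined row: a film with several producers still counts for a distributor that is one of them, and it is counted once per non-matching producer. One possible alternative is `FILTER NOT EXISTS { ?randomFilm wdt:P162 ?distributor }` together with `COUNT(DISTINCT ?randomFilm)`. The sketch below simulates both behaviours in plain Python on hypothetical toy data (the film and company names are made up, and the company-class restriction is ignored).

```python
# hypothetical toy data: film -> set of producers / distributors
producers = {"film1": {"A", "B"}, "film2": {"B"}, "film3": {"B", "C"}}
distributors = {"film1": {"A"}, "film2": {"A"}, "film3": {"A"}}


def count_with_inequality(producers, distributors):
    """Mimic the join plus FILTER(?distributor != ?producer): one count per surviving row."""
    counts = {}
    for film, dists in distributors.items():
        for dist in dists:
            for prod in producers.get(film, ()):
                if dist != prod:
                    counts[dist] = counts.get(dist, 0) + 1
    return counts


def count_with_not_exists(producers, distributors):
    """Mimic COUNT(DISTINCT film) with NOT EXISTS { film producer distributor }."""
    counts = {}
    for film, dists in distributors.items():
        for dist in dists:
            if dist not in producers.get(film, ()):
                counts[dist] = counts.get(dist, 0) + 1
    return counts


print(count_with_inequality(producers, distributors))
print(count_with_not_exists(producers, distributors))
```

In the toy data company A co-produced `film1`, so only `film2` and `film3` are films it distributed without producing; the inequality version reports a larger number.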

## Task 7
Find how many companies are listed on each stock exchange (the result set must be a list of triples: stock exchange IRI, label, and #companies).

In [None]:
# write your queries

In [257]:
# looking at a random film production company to see if it has a property for stock exchange

# Q1762059 = film production company
# wdt:P31 = instance of

queryString = """
SELECT DISTINCT ?pr ?obj ?prName ?objName
WHERE {
   
   
   ?randomCompany wdt:P31 wd:Q1762059 .
   ?randomCompany ?pr ?obj .
   
   ?pr sc:name ?prName .
   ?obj sc:name ?objName .
   
   FILTER regex(?objName, ".*[Ee]xchange.*") .
   
}
# group by ?distributor ?distributorName
order by ?prName
LIMIT 20
"""

print("Results")
x=run_query(queryString)


Results
[('pr', 'http://www.wikidata.org/prop/direct/P414'), ('obj', 'http://www.wikidata.org/entity/Q496672'), ('prName', 'stock exchange'), ('objName', 'Hong Kong Stock Exchange')]
[('pr', 'http://www.wikidata.org/prop/direct/P414'), ('obj', 'http://www.wikidata.org/entity/Q217475'), ('prName', 'stock exchange'), ('objName', 'Tokyo Stock Exchange')]
[('pr', 'http://www.wikidata.org/prop/direct/P414'), ('obj', 'http://www.wikidata.org/entity/Q13677'), ('prName', 'stock exchange'), ('objName', 'New York Stock Exchange')]
[('pr', 'http://www.wikidata.org/prop/direct/P414'), ('obj', 'http://www.wikidata.org/entity/Q732670'), ('prName', 'stock exchange'), ('objName', 'Australian Securities Exchange')]
[('pr', 'http://www.wikidata.org/prop/direct/P414'), ('obj', 'http://www.wikidata.org/entity/Q1661737'), ('prName', 'stock exchange'), ('objName', 'Indonesia Stock Exchange')]
5


In [258]:
# looking at a random film production company to see if it has a property to tell if it is a company

# Q1762059 = film production company
# wdt:P31 = instance of

queryString = """
SELECT DISTINCT ?pr ?obj ?prName ?objName
WHERE {
   
   
   ?randomCompany wdt:P31 wd:Q1762059 .
   ?randomCompany ?pr ?obj .
   
   ?pr sc:name ?prName .
   ?obj sc:name ?objName .
   
   FILTER regex(?objName, ".*company.*") .
   
}
# group by ?distributor ?distributorName
order by ?prName
LIMIT 20
"""

print("Results")
x=run_query(queryString)


Results
[('pr', 'http://www.wikidata.org/prop/direct/P1889'), ('obj', 'http://www.wikidata.org/entity/Q16220833'), ('prName', 'different from'), ('objName', 'Star Film (Dutch East Indies company)')]
[('pr', 'http://www.wikidata.org/prop/direct/P452'), ('obj', 'http://www.wikidata.org/entity/Q1762059'), ('prName', 'industry'), ('objName', 'film production company')]
[('pr', 'http://www.wikidata.org/prop/direct/P452'), ('obj', 'http://www.wikidata.org/entity/Q11396960'), ('prName', 'industry'), ('objName', 'production company')]
[('pr', 'http://www.wikidata.org/prop/direct/P452'), ('obj', 'http://www.wikidata.org/entity/Q1331793'), ('prName', 'industry'), ('objName', 'media company')]
[('pr', 'http://www.wikidata.org/prop/direct/P452'), ('obj', 'http://www.wikidata.org/entity/Q1589009'), ('prName', 'industry'), ('objName', 'privately held company')]
[('pr', 'http://www.wikidata.org/prop/direct/P31'), ('obj', 'http://www.wikidata.org/entity/Q10689397'), ('prName', 'instance of'), ('objNam

In [274]:
# we know that every company is an instance of company
# and that stock exchange property of every company lists the stock exchange
# will group all objects of company by their stock exchange property

# Q783794 = company
# wdt:P31 = instance of
# P414 = stock exchange

queryString = """
SELECT DISTINCT ?stockExchange ?stockExchangeName (count(?randomCompany) as ?numberOfCompanies)
WHERE {
   
   
   ?randomCompany wdt:P31 wd:Q783794 .
   ?randomCompany wdt:P414 ?stockExchange .
   
   ?stockExchange sc:name ?stockExchangeName
   
}
group by ?stockExchange ?stockExchangeName
order by desc (?numberOfCompanies)
LIMIT 20
"""

print("Results")
x=run_query(queryString)


Results
[('stockExchange', 'http://www.wikidata.org/entity/Q517750'), ('stockExchangeName', 'Shenzhen Stock Exchange'), ('numberOfCompanies', '143')]
[('stockExchange', 'http://www.wikidata.org/entity/Q2006583'), ('stockExchangeName', 'Tadawul'), ('numberOfCompanies', '113')]
[('stockExchange', 'http://www.wikidata.org/entity/Q82059'), ('stockExchangeName', 'NASDAQ'), ('numberOfCompanies', '50')]
[('stockExchange', 'http://www.wikidata.org/entity/Q739514'), ('stockExchangeName', 'Shanghai Stock Exchange'), ('numberOfCompanies', '49')]
[('stockExchange', 'http://www.wikidata.org/entity/Q13677'), ('stockExchangeName', 'New York Stock Exchange'), ('numberOfCompanies', '31')]
[('stockExchange', 'http://www.wikidata.org/entity/Q171240'), ('stockExchangeName', 'London Stock Exchange'), ('numberOfCompanies', '12')]
[('stockExchange', 'http://www.wikidata.org/entity/Q496672'), ('stockExchangeName', 'Hong Kong Stock Exchange'), ('numberOfCompanies', '12')]
[('stockExchange', 'http://www.wikidat

Final query for this task

In [None]:
# write your final query

In [276]:
# we know that every company is an instance of company
# and that stock exchange property of every company lists the stock exchange
# we will group all objects of company by their stock exchange property

# Q783794 = company
# wdt:P31 = instance of
# P414 = stock exchange

queryString = """
SELECT DISTINCT ?stockExchange ?stockExchangeName (count(?randomCompany) as ?numberOfCompanies)
WHERE {
   
   
   ?randomCompany wdt:P31 wd:Q783794 .
   ?randomCompany wdt:P414 ?stockExchange .
   
   ?stockExchange sc:name ?stockExchangeName
   
}
group by ?stockExchange ?stockExchangeName
order by desc (?numberOfCompanies)
LIMIT 20
"""

print("Results")
x=run_query(queryString)


Results
[('stockExchange', 'http://www.wikidata.org/entity/Q517750'), ('stockExchangeName', 'Shenzhen Stock Exchange'), ('numberOfCompanies', '143')]
[('stockExchange', 'http://www.wikidata.org/entity/Q2006583'), ('stockExchangeName', 'Tadawul'), ('numberOfCompanies', '113')]
[('stockExchange', 'http://www.wikidata.org/entity/Q82059'), ('stockExchangeName', 'NASDAQ'), ('numberOfCompanies', '50')]
[('stockExchange', 'http://www.wikidata.org/entity/Q739514'), ('stockExchangeName', 'Shanghai Stock Exchange'), ('numberOfCompanies', '49')]
[('stockExchange', 'http://www.wikidata.org/entity/Q13677'), ('stockExchangeName', 'New York Stock Exchange'), ('numberOfCompanies', '31')]
[('stockExchange', 'http://www.wikidata.org/entity/Q171240'), ('stockExchangeName', 'London Stock Exchange'), ('numberOfCompanies', '12')]
[('stockExchange', 'http://www.wikidata.org/entity/Q496672'), ('stockExchangeName', 'Hong Kong Stock Exchange'), ('numberOfCompanies', '12')]
[('stockExchange', 'http://www.wikidat
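
The counts above come back as strings (`'143'`), so they need a cast before any arithmetic or sorting in Python. A minimal sketch of one way to do that: `rows_to_dicts` is a hypothetical helper, not part of the notebook setup, and the sample rows are copied from the first three result lines above so the cell runs without the endpoint.

```python
# Sample rows copied from the output above; run_query returns this same shape
# (a list of rows, each row a list of (variable, value) pairs).
rows = [
    [('stockExchange', 'http://www.wikidata.org/entity/Q517750'),
     ('stockExchangeName', 'Shenzhen Stock Exchange'),
     ('numberOfCompanies', '143')],
    [('stockExchange', 'http://www.wikidata.org/entity/Q2006583'),
     ('stockExchangeName', 'Tadawul'),
     ('numberOfCompanies', '113')],
    [('stockExchange', 'http://www.wikidata.org/entity/Q82059'),
     ('stockExchangeName', 'NASDAQ'),
     ('numberOfCompanies', '50')],
]


def rows_to_dicts(rows, int_fields=()):
    """Hypothetical helper: turn each row into a dict and cast the
    variables named in int_fields from str to int."""
    records = []
    for row in rows:
        record = dict(row)
        for field in int_fields:
            record[field] = int(record[field])
        records.append(record)
    return records


records = rows_to_dicts(rows, int_fields=("numberOfCompanies",))
total = sum(r["numberOfCompanies"] for r in records)

for r in records:
    print(r["stockExchangeName"], r["numberOfCompanies"])
print("companies in this sample:", total)
```

In the notebook the same helper could be applied to `x`, the value returned by `run_query(queryString, verbose=False)`.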

## Task 8
Identify the BGP for Academy Award

In [None]:
# write your queries

In [287]:
# looking for something related to Academy Award in a random film

# Q11424 = films
# wdt:P31 = instance of

queryString = """
SELECT DISTINCT ?pr ?obj ?prName ?objName
WHERE {
   
   ?randomFilm wdt:P31 wd:Q11424 .
   ?randomFilm ?pr ?obj .
   ?pr sc:name ?prName .
   ?obj sc:name ?objName .
   
   FILTER regex(?objName, ".*Academy.*Award.*") .
   
}
# group by 
order by ?prName
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('pr', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q41417'), ('prName', 'award received'), ('objName', 'Academy Award for Best Writing, Original Screenplay')]
[('pr', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q102427'), ('prName', 'award received'), ('objName', 'Academy Award for Best Picture')]
[('pr', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q488645'), ('prName', 'award received'), ('objName', 'Academy Award for Best Sound Editing')]
[('pr', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q281939'), ('prName', 'award received'), ('objName', 'Academy Award for Best Film Editing')]
[('pr', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q830079'), ('prName', 'award received'), ('objName', 'Academy Award for Best Sound')]
[('pr', 'http://www.wikidata.org/prop/direct/P166'), ('obj

In [304]:
# checking whether the awards received by films (e.g. Academy Award for Best Documentary Feature) are instances of an Academy Award class

# Q11424 = films
# wdt:P31 = instance of
# P166 = award received
# Q111332 = Academy Award for Best Documentary Feature

queryString = """
SELECT DISTINCT ?obj ?objName
WHERE {
   
   ?randomFilm wdt:P31 wd:Q11424 .
   ?randomFilm wdt:P166 ?award .
   ?award wdt:P31 ?obj .
   ?obj sc:name ?objName .
   
   FILTER regex(?objName, ".*[Aa]cademy.*[Aa]ward.*") .
   
}
# group by 
# order by ?prName
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q19020'), ('objName', 'Academy Awards')]
[('obj', 'http://www.wikidata.org/entity/Q732997'), ('objName', 'British Academy Film Awards')]
[('obj', 'http://www.wikidata.org/entity/Q488651'), ('objName', 'Academy Award for Best Original Score')]
[('obj', 'http://www.wikidata.org/entity/Q655089'), ('objName', 'International Indian Film Academy Awards')]
[('obj', 'http://www.wikidata.org/entity/Q384139'), ('objName', 'Africa Movie Academy Award')]
5
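
The loose pattern used above also matches unrelated awards such as British Academy Film Awards. A small offline check with Python's `re` module on the five names returned above shows what an anchored pattern would keep. This is only a sketch: SPARQL `regex` follows the XPath/XQuery regular expression rules, so the assumption is that `^` behaves the same way in `FILTER regex(?objName, "^Academy Award")`; that has not been run against the endpoint here.

```python
import re

# Class names copied from the results above.
names = [
    "Academy Awards",
    "British Academy Film Awards",
    "Academy Award for Best Original Score",
    "International Indian Film Academy Awards",
    "Africa Movie Academy Award",
]

# Pattern used in the query above: "Academy" followed anywhere by "Award".
loose = re.compile(r".*[Aa]cademy.*[Aa]ward.*")
# Stricter alternative: the name must start with "Academy Award".
anchored = re.compile(r"^Academy Award")

loose_hits = [n for n in names if loose.search(n)]
anchored_hits = [n for n in names if anchored.search(n)]

print("loose:", len(loose_hits), "matches")
print("anchored:", anchored_hits)
```

Only `Academy Awards` (Q19020) and `Academy Award for Best Original Score` survive the anchored pattern, which is why Q19020 is the class explored next.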


Final query for this task

In [None]:
# write your final query

In [307]:
# listing the properties of Academy Awards (Q19020), the class found in the previous query

# Q19020 = Academy Awards

queryString = """
SELECT DISTINCT ?p ?name
WHERE {
   
   wd:Q19020 ?p ?obj .
   ?p sc:name ?name .
   
}
# group by 
order by ?name
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P9084'), ('name', 'ABC News topic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P9629'), ('name', 'Armeniapedia ID')]
[('p', 'http://www.wikidata.org/prop/direct/P8295'), ('name', 'AustLit ID')]
[('p', 'http://www.wikidata.org/prop/direct/P6200'), ('name', 'BBC News topic ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2581'), ('name', 'BabelNet ID')]
[('p', 'http://www.wikidata.org/prop/direct/P268'), ('name', 'Bibliothèque nationale de France ID')]
[('p', 'http://www.wikidata.org/prop/direct/P5019'), ('name', 'Brockhaus Enzyklopädie online ID')]
[('p', 'http://www.wikidata.org/prop/direct/P5905'), ('name', 'Comic Vine ID')]
[('p', 'http://www.wikidata.org/prop/direct/P373'), ('name', 'Commons category')]
[('p', 'http://www.wikidata.org/prop/direct/P935'), ('name', 'Commons gallery')]
[('p', 'http://www.wikidata.org/prop/direct/P3569'), ('name', 'Cultureel Woordenboek ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3509'), 
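
The property listing above does not yet state a BGP. Based on the earlier result, where `?award wdt:P31 ?obj` returned Academy Awards (Q19020), one candidate BGP is "award received by a film that is an instance of Q19020". The query below is a sketch under that assumption and has not been run here, so no output is shown; the variable names `?film`, `?award` and `?awardName` are chosen for illustration.

```python
# Q11424 = film
# Q19020 = Academy Awards (class found via the regex query above)
# wdt:P31 = instance of
# P166 = award received

queryString = """
SELECT DISTINCT ?award ?awardName
WHERE {
   ?film wdt:P31 wd:Q11424 .
   ?film wdt:P166 ?award .
   ?award wdt:P31 wd:Q19020 .
   ?award sc:name ?awardName .
}
ORDER BY ?awardName
LIMIT 20
"""

print(queryString)
# In the notebook, run it with the setup helper:
# x = run_query(queryString)
```

The core of the BGP is the single triple pattern `?award wdt:P31 wd:Q19020`; the two film patterns only restrict it to awards that some film actually received.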

## Task 9
Find the companies (among television and film production companies) that won at least 5 Academy Awards for Best Actress for the movies they produced (the result set must be a list of triples company IRI, label and #awards).

In [None]:
# write your queries

In [308]:
# looking for something related to Academy Award for Best Actress in a random film

# Q11424 = films
# wdt:P31 = instance of

queryString = """
SELECT DISTINCT ?pr ?obj ?prName ?objName
WHERE {
   
   ?randomFilm wdt:P31 wd:Q11424 .
   ?randomFilm ?pr ?obj .
   ?pr sc:name ?prName .
   ?obj sc:name ?objName .
   
   FILTER regex(?objName, ".*[Aa]cademy.*[Aa]ward.*[Bb]est.*[Aa]ctress.*") .
   
}
# group by 
order by ?prName
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('pr', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q103618'), ('prName', 'award received'), ('objName', 'Academy Award for Best Actress')]
[('pr', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q106301'), ('prName', 'award received'), ('objName', 'Academy Award for Best Supporting Actress')]
[('pr', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q2102534'), ('prName', 'award received'), ('objName', 'Polish Academy Award for Best Supporting Actress')]
[('pr', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q928164'), ('prName', 'award received'), ('objName', 'Polish Academy Award for Best Actress')]
[('pr', 'http://www.wikidata.org/prop/direct/P1411'), ('obj', 'http://www.wikidata.org/entity/Q103618'), ('prName', 'nominated for'), ('objName', 'Academy Award for Best Actress')]
[('pr', 'http://www.wikidata.org/prop/direct/P1411

In [311]:
# films are linked to their producer through the producer property (P162)
# keep only producers that are an instance of television production company or film production company
# group by producer and count the films
# use the award received property (P166) to keep only films with the Academy Award for Best Actress

# Q11424 = films
# Q10689397 = television production company
# Q1762059 = film production company
# wdt:P31 = instance of
# P162 = producer
# P166 = award received
# Q103618 = Academy Award for Best Actress

queryString = """
SELECT DISTINCT ?producer ?producerName (count(?randomFilm) as ?numberOfFilms)
WHERE {
   ?randomFilm wdt:P31 wd:Q11424 .
   ?randomFilm wdt:P166 wd:Q103618 .
   
   ?randomFilm wdt:P162 ?producer .
   {
       ?producer wdt:P31 wd:Q1762059.
   }
   UNION
   {
       ?producer wdt:P31 wd:Q10689397 .
   }
   ?producer sc:name ?producerName
}
group by ?producer ?producerName
order by desc (?numberOfFilms)
LIMIT 20
"""

print("Results")
x=run_query(queryString)


Results
[('producer', 'http://www.wikidata.org/entity/Q3791677'), ('producerName', 'John and James Woolf'), ('numberOfFilms', '1')]
1


Final query for this task

In [None]:
# write your final query

In [312]:
# films are linked to their producer through the producer property (P162)
# keep only producers that are an instance of television production company or film production company
# group by producer and count the films
# use the award received property (P166) to keep only films with the Academy Award for Best Actress

# Q11424 = films
# Q10689397 = television production company
# Q1762059 = film production company
# wdt:P31 = instance of
# P162 = producer
# P166 = award received
# Q103618 = Academy Award for Best Actress

queryString = """
SELECT DISTINCT ?producer ?producerName (count(?randomFilm) as ?numberOfFilms)
WHERE {
   ?randomFilm wdt:P31 wd:Q11424 .
   ?randomFilm wdt:P166 wd:Q103618 .
   
   ?randomFilm wdt:P162 ?producer .
   {
       ?producer wdt:P31 wd:Q1762059.
   }
   UNION
   {
       ?producer wdt:P31 wd:Q10689397 .
   }
   ?producer sc:name ?producerName
}
group by ?producer ?producerName
order by desc (?numberOfFilms)
LIMIT 20
"""

print("Results")
x=run_query(queryString)


Results
[('producer', 'http://www.wikidata.org/entity/Q3791677'), ('producerName', 'John and James Woolf'), ('numberOfFilms', '1')]
1
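
The task asks for companies with at least 5 awards, but the query above only sorts by count and never applies the threshold. A possible way to express it is a `HAVING` clause on the grouped count. The sketch below only builds the query string: `build_award_count_query` is a hypothetical helper, the count of films is used as a stand-in for the number of awards, and `COUNT(DISTINCT ?film)` replaces the plain count so that a producer matching both branches of the `UNION` is not counted twice. Given that the best count seen above is 1, this query would be expected to print `Empty` on the current data.

```python
import re


def build_award_count_query(award_qid, min_awards, limit=20):
    """Hypothetical helper: build the Task 9 query with a HAVING threshold.

    award_qid  -- Wikidata item id of the award, e.g. "Q103618"
    min_awards -- minimum number of awarded films per producer
    """
    if not re.fullmatch(r"Q\d+", award_qid):
        raise ValueError("award_qid must look like Q103618")
    return f"""
SELECT ?producer ?producerName (COUNT(DISTINCT ?film) AS ?numberOfAwards)
WHERE {{
   ?film wdt:P31 wd:Q11424 .
   ?film wdt:P166 wd:{award_qid} .
   ?film wdt:P162 ?producer .
   {{ ?producer wdt:P31 wd:Q1762059 . }}
   UNION
   {{ ?producer wdt:P31 wd:Q10689397 . }}
   ?producer sc:name ?producerName .
}}
GROUP BY ?producer ?producerName
HAVING (COUNT(DISTINCT ?film) >= {int(min_awards)})
ORDER BY DESC(?numberOfAwards)
LIMIT {int(limit)}
"""


# Q103618 = Academy Award for Best Actress, threshold taken from the task text
queryString = build_award_count_query("Q103618", 5)
print(queryString)
# In the notebook: x = run_query(queryString)
```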
