# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that is present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP that returns the human-readable name of a property or a class in Wikidata.
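
As a minimal sketch of how this BGP can be used, the helper below builds a query string that looks up the name of a single IRI. The function name `label_query` is hypothetical and not part of the notebook; the full `<http://schema.org/name>` IRI is written out so that the query does not depend on any prefix declaration.

```python
def label_query(iri: str) -> str:
    """Build a SELECT query returning the human-readable name of one IRI."""
    # The predicate is the schema.org name property quoted in the text above.
    return f"""
SELECT DISTINCT ?name
WHERE {{
    <{iri}> <http://schema.org/name> ?name .
}}
"""

# Example: the query for wd:Q5, written with its full IRI.
print(label_query("http://www.wikidata.org/entity/Q5"))
```

The resulting string could then be passed to whatever function sends queries to the endpoint.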

In [1]:
## SETUP used later
import sys
import os
import json
import pandas as pd
sys.path.insert(1, '/locale/data/jupyter/prando/notebook/sparqlthesis/')
import modules.evaluation as evaluation
from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-movie1-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString,verbose = True):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://gracevirtuoso.dei.unipd.it/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings'])==0:
            print("Empty")
            return []
        array = []
        for bindings in json_results['results']['bindings']:
            app =  [ (var, value['value'])  for var, value in bindings.items() ] 
            if verbose:
                print( app)
            array.append(app)
        if verbose:
            print(len(array))
        return array

    except Exception as e :
        print("The operation failed", e)
        # return an empty list so that callers can still iterate over the result
        return []
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://gracevirtuoso.dei.unipd.it/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)

# Movie Workflow Series ("Film Genre and composer explorative search") 

Consider the following exploratory information need:

> Investigate the results concerning the different film genres over the years and the composers working for the cinema.

## Useful URIs for the current workflow
The following are given:

| IRI          | Description        | Role      |
| ------------ | ------------------ | --------- |
| `wdt:P1647`  | subproperty        | predicate |
| `wdt:P31`    | instance of        | predicate |
| `wdt:P106`   | occupation         | predicate |
| `wdt:P279`   | subclass           | predicate |
| `wdt:P27`    | nationality        | predicate |
| `wdt:P3342`  | significant person | predicate |
| `wd:Q5`      | human              | node      |
| `wd:Q25089`  | Woody Allen        | node      |





Also consider

```
wd:Q25089 ?p ?obj .
```

is the BGP to retrieve all **properties of Woody Allen**.

Please consider that when you return a resource, you should return both the IRI and the label of the resource. In particular, when a task requires you to identify a BGP, the result set must always be a list of (IRI, label) pairs.
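
As an illustration of this result shape, here is a minimal sketch that turns rows in the format returned by `run_query` (a list of `(variable, value)` tuples per row) into IRI and label pairs. The helper name `to_iri_label_pairs` is hypothetical; the sample rows are copied from results that appear later in this notebook.

```python
def to_iri_label_pairs(rows, iri_var, label_var):
    """Turn run_query-style rows into a list of (IRI, label) tuples."""
    pairs = []
    for row in rows:
        bindings = dict(row)  # each row is a list of (variable, value) tuples
        pairs.append((bindings[iri_var], bindings[label_var]))
    return pairs

# Sample rows in the shape printed by run_query.
rows = [
    [("inst", "http://www.wikidata.org/entity/Q11424"), ("name", "film")],
    [("inst", "http://www.wikidata.org/entity/Q24862"), ("name", "short film")],
]
pairs = to_iri_label_pairs(rows, "inst", "name")
print(pairs)
```

Looking variables up by name rather than by position keeps the helper independent of the order of the variables in the `SELECT` clause.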

The workload should:


1. Identify the BGP for films

2. Identify the BGP for composer

3. Identify the BGP for film genre

4. Find how many films were released in the United States of America from 2010-01-01 to 2015-12-31 for each available film genre (the result set must be genre IRI, label and #films).

5. Consider the timespan from 2001-01-01 to the present. Find the number of films released in this timespan by film genre and return only the genres with more than 50 films released every year (the result set must be genre IRI and label).

6. Consider the composers of Western films and their country of citizenship. Count the number of Western films composers worked on, grouped by country of citizenship, and return the top 10 (the result set must be country IRI, label and #films).

7. Consider the decades from 1961 to 1970 and from 2001 to 2010, and compare the total number of Western films released in each one (the result set must be two pairs of decade and #Western films of that decade).

8. Consider the decades from 1961 to 1970 and from 2001 to 2010, and select only Western films. Then, for each decade, compare the average cast size per film (the result set must be two pairs of decade and average cast size of that decade).

In [2]:
## startup the evaluation
# setup the file and create the empty json
ipname = "m1.ipynb"
pt = os.getcwd()+os.sep+ipname
evaluation.setup(pt)

The index of this workflow is: 2_5


## Task 1

In [3]:
# properties of Woody Allen
queryString = """
SELECT DISTINCT ?p ?name
WHERE { 
    wd:Q25089 ?p ?obj.
    ?p sc:name ?name.
} 
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1003'), ('name', 'National Library of Romania ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1005'), ('name', 'Portuguese National Library ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1006'), ('name', 'Nationale Thesaurus voor Auteurs ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1015'), ('name', 'NORAF ID')]
[('p', 'http://www.wikidata.org/prop/direct/P103'), ('name', 'native language')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('name', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P108'), ('name', 'employer')]
[('p', 'http://www.wikidata.org/prop/direct/P109'), ('name', 'signature')]
[('p', 'http://www.wikidata.org/prop/direct/P1150'), ('name', 'Regensburg Classification')]
[('p', 'http://www.wikidata.org/prop/direct/P1207'), ('name', 'NUKAT ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1220'), ('name', 'Internet Broadway Database person ID')]
[('p', 'http://www.wikidata.org/prop/direct/P

In [4]:
# Allen as object of the triples, properties incoming
queryString = """
SELECT DISTINCT ?p ?name
WHERE { 
    ?obj ?p wd:Q25089.
    ?p sc:name ?name.
} 
"""

print("Results")
x=run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P108'), ('name', 'employer')]
[('p', 'http://www.wikidata.org/prop/direct/P1269'), ('name', 'facet of')]
[('p', 'http://www.wikidata.org/prop/direct/P1346'), ('name', 'winner')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('name', 'cast member')]
[('p', 'http://www.wikidata.org/prop/direct/P170'), ('name', 'creator')]
[('p', 'http://www.wikidata.org/prop/direct/P175'), ('name', 'performer')]
[('p', 'http://www.wikidata.org/prop/direct/P180'), ('name', 'depicts')]
[('p', 'http://www.wikidata.org/prop/direct/P22'), ('name', 'father')]
[('p', 'http://www.wikidata.org/prop/direct/P26'), ('name', 'spouse')]
[('p', 'http://www.wikidata.org/prop/direct/P2650'), ('name', 'interested in')]
[('p', 'http://www.wikidata.org/prop/direct/P301'), ('name', "category's main topic")]
[('p', 'http://www.wikidata.org/prop/direct/P3373'), ('name', 'sibling')]
[('p', 'http://www.wikidata.org/prop/direct/P40'), ('name', 'child')]
[('p', 'http://www.

In [5]:
# use the director property (P57) to list films directed by Woody Allen
queryString = """
SELECT DISTINCT ?obj ?name
WHERE { 
    ?obj wdt:P57 wd:Q25089 .
    ?obj sc:name ?name.
} 
LIMIT 10
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q206124'), ('name', 'Midnight in Paris')]
[('obj', 'http://www.wikidata.org/entity/Q682262'), ('name', 'Alice')]
[('obj', 'http://www.wikidata.org/entity/Q14511869'), ('name', 'Magic in the Moonlight')]
[('obj', 'http://www.wikidata.org/entity/Q806092'), ('name', 'Bananas')]
[('obj', 'http://www.wikidata.org/entity/Q1004531'), ('name', 'Bullets Over Broadway')]
[('obj', 'http://www.wikidata.org/entity/Q1321004'), ('name', 'You Will Meet a Tall Dark Stranger')]
[('obj', 'http://www.wikidata.org/entity/Q971865'), ('name', "What's Up, Tiger Lily?")]
[('obj', 'http://www.wikidata.org/entity/Q729026'), ('name', 'Broadway Danny Rose')]
[('obj', 'http://www.wikidata.org/entity/Q733677'), ('name', 'Match Point')]
[('obj', 'http://www.wikidata.org/entity/Q740143'), ('name', 'Husbands and Wives')]
10


In [6]:
# instance of films
queryString = """
SELECT DISTINCT ?inst ?name
WHERE { 
    ?obj wdt:P57 wd:Q25089 ;
        wdt:P31 ?inst.
    ?inst sc:name ?name.
}
"""

print("Results")
x=run_query(queryString)

Results
[('inst', 'http://www.wikidata.org/entity/Q11424'), ('name', 'film')]
[('inst', 'http://www.wikidata.org/entity/Q506240'), ('name', 'television film')]
[('inst', 'http://www.wikidata.org/entity/Q5398426'), ('name', 'television series')]
[('inst', 'http://www.wikidata.org/entity/Q24862'), ('name', 'short film')]
[('inst', 'http://www.wikidata.org/entity/Q47467768'), ('name', 'operatic production')]
5


In [7]:
### insert the result of TASK 1 in the file
og_uri = "http://www.wikidata.org/entity/Q11424"
og_name = "film"
obj = {"uri":og_uri,"name":og_name}
evaluation.add_result(evaluation.get_index_workflow(pt),"1", evaluation.TYPE_SINGLE ,"uri", [obj] ,"all")

The index of this workflow is: 2_5
The path is /locale/data/jupyter/prando/notebook/2022/results/workflow2_5.json
JSON object updated


Found `wd:Q11424` -> film

## Task 2

In [8]:
# properties of films
queryString = """
SELECT DISTINCT ?prop ?name
WHERE { 
    ?film wdt:P31 wd:Q11424;
        ?prop ?obj.
    ?prop sc:name ?name.
}
ORDER BY ?name
"""

print("Results")
x=run_query(queryString)

Results
[('prop', 'http://www.wikidata.org/prop/direct/P7003'), ('name', 'ACMI web ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P3593'), ('name', 'AFI Catalog of Feature Films ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P7118'), ('name', 'AMPAS collections film ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P6151'), ('name', 'ANICA ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P9032'), ('name', 'AVN movie ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P7194'), ('name', 'AZLyrics.com artist ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P9283'), ('name', 'AaRC title ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P6145'), ('name', 'Academy Awards Database film ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P6150'), ('name', 'Academy Awards Database nominee ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P7777'), ('name', 'AdoroCinema film ID')]
[('prop', 'http://www.wikidata.org/prop/direct/P5083'), ('name', 'Adult Film Database 

In [9]:
#use the property P86 to list some composers
queryString = """
SELECT DISTINCT ?obj ?name
WHERE { 
    ?film wdt:P31 wd:Q11424;
        wdt:P86 ?obj.
    ?obj sc:name ?name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q11720172'), ('name', 'Janusz Grudziński')]
[('obj', 'http://www.wikidata.org/entity/Q11767762'), ('name', 'Marcin Pospieszalski')]
[('obj', 'http://www.wikidata.org/entity/Q11817456'), ('name', 'Piotr Figiel')]
[('obj', 'http://www.wikidata.org/entity/Q13036911'), ('name', 'Aleksandër Lalo')]
[('obj', 'http://www.wikidata.org/entity/Q3299298'), ('name', 'Avni Mula')]
[('obj', 'http://www.wikidata.org/entity/Q4759658'), ('name', 'Andrzej Markowski')]
[('obj', 'http://www.wikidata.org/entity/Q7196950'), ('name', 'Piotr Hertel')]
[('obj', 'http://www.wikidata.org/entity/Q9285011'), ('name', 'Hajg Zaharian')]
[('obj', 'http://www.wikidata.org/entity/Q9312585'), ('name', 'Roman Rewakowicz')]
[('obj', 'http://www.wikidata.org/entity/Q159250'), ('name', 'Daniel Landa')]
[('obj', 'http://www.wikidata.org/entity/Q10287874'), ('name', 'Gabriel Migliori')]
[('obj', 'http://www.wikidata.org/entity/Q1692106'), ('name', 'Johan Zachrisson')]
[('obj', 

In [10]:
# look at their occupation
queryString = """
SELECT DISTINCT ?inst ?name
WHERE { 
    ?film wdt:P31 wd:Q11424;
        wdt:P86 ?obj.
    ?obj wdt:P106 ?inst.
    ?inst sc:name ?name.
    FILTER(REGEX(?name,"ompo")).
}
"""

print("Results")
x=run_query(queryString)

Results
[('inst', 'http://www.wikidata.org/entity/Q1415090'), ('name', 'film score composer')]
[('inst', 'http://www.wikidata.org/entity/Q36834'), ('name', 'composer')]
[('inst', 'http://www.wikidata.org/entity/Q21680731'), ('name', 'opera composer')]
[('inst', 'http://www.wikidata.org/entity/Q21680663'), ('name', 'classical composer')]
[('inst', 'http://www.wikidata.org/entity/Q21680699'), ('name', 'operetta composer')]
[('inst', 'http://www.wikidata.org/entity/Q24256060'), ('name', 'professor of music composition')]
[('inst', 'http://www.wikidata.org/entity/Q64356038'), ('name', 'musical theatre composer')]
[('inst', 'http://www.wikidata.org/entity/Q8008937'), ('name', 'Category:Composers')]
[('inst', 'http://www.wikidata.org/entity/Q63536580'), ('name', 'video game composer')]
[('inst', 'http://www.wikidata.org/entity/Q5156675'), ('name', 'Compo')]
10
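
The filter above matches on the fragment `ompo` so that both capitalised and lower-case spellings are caught. A minimal sketch of the same idea with Python's `re` module follows; Python's regex dialect is used here only as a stand-in for SPARQL's `REGEX`, and the list of names is a small hand-picked sample (the last entry is a hypothetical non-matching label).

```python
import re

names = [
    "film score composer",
    "Compo",
    "professor of music composition",
    "Category:Composers",
    "film director",  # hypothetical label that should not match
]

# Matching on "ompo" skips the first letter, so its case does not matter.
by_fragment = [n for n in names if re.search("ompo", n)]

# An alternative is a case-insensitive match on the full stem.
by_flag = [n for n in names if re.search("compo", n, re.IGNORECASE)]

print(by_fragment)
print(by_flag)
```

SPARQL's `REGEX` also accepts an optional third argument for flags, so `REGEX(?name, "compo", "i")` would be the case-insensitive counterpart of the second variant.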


In [11]:
### insert the result of TASK 2 in the file
og_uri = "http://www.wikidata.org/entity/Q36834"
og_name = "composer"
obj = {"uri":og_uri,"name":og_name}
evaluation.add_result(evaluation.get_index_workflow(pt),"2", evaluation.TYPE_SINGLE ,"uri", [obj] ,"all")

The index of this workflow is: 2_5
The path is /locale/data/jupyter/prando/notebook/2022/results/workflow2_5.json
JSON object updated


Found `wd:Q36834` -> composer

## Task 3

In [12]:
# use the property genre (P136) found some queries above
queryString = """
SELECT DISTINCT ?obj ?name
WHERE { 
    ?film wdt:P31 wd:Q11424;
        wdt:P136 ?obj.
    ?obj sc:name ?name.
}
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q1054574'), ('name', 'romance film')]
[('obj', 'http://www.wikidata.org/entity/Q130232'), ('name', 'drama')]
[('obj', 'http://www.wikidata.org/entity/Q157443'), ('name', 'comedy film')]
[('obj', 'http://www.wikidata.org/entity/Q188473'), ('name', 'action film')]
[('obj', 'http://www.wikidata.org/entity/Q200092'), ('name', 'horror film')]
[('obj', 'http://www.wikidata.org/entity/Q2297927'), ('name', 'spy film')]
[('obj', 'http://www.wikidata.org/entity/Q369747'), ('name', 'war film')]
[('obj', 'http://www.wikidata.org/entity/Q52162262'), ('name', 'film based on literature')]
[('obj', 'http://www.wikidata.org/entity/Q645928'), ('name', 'biographical film')]
[('obj', 'http://www.wikidata.org/entity/Q846544'), ('name', 'disaster film')]
[('obj', 'http://www.wikidata.org/entity/Q52207399'), ('name', 'film based on a novel')]
[('obj', 'http://www.wikidata.org/entity/Q17013749'), ('name', 'historical film')]
[('obj', 'http://www.wikidata.org/en

In [13]:
# use the property genre (P136) found some queries above and get the class
queryString = """
SELECT DISTINCT ?inst ?name
WHERE { 
    ?film wdt:P31 wd:Q11424;
        wdt:P136 ?obj.
    ?obj wdt:P31 ?inst.
    ?inst sc:name ?name.
}
"""

print("Results")
x=run_query(queryString)

Results
[('inst', 'http://www.wikidata.org/entity/Q108329096'), ('name', 'genre by art form')]
[('inst', 'http://www.wikidata.org/entity/Q5937792'), ('name', 'crime fiction')]
[('inst', 'http://www.wikidata.org/entity/Q201658'), ('name', 'film genre')]
[('inst', 'http://www.wikidata.org/entity/Q1194240'), ('name', 'science fiction genre')]
[('inst', 'http://www.wikidata.org/entity/Q223393'), ('name', 'literary genre')]
[('inst', 'http://www.wikidata.org/entity/Q1792379'), ('name', 'art genre')]
[('inst', 'http://www.wikidata.org/entity/Q20076756'), ('name', 'speculative fiction genre')]
[('inst', 'http://www.wikidata.org/entity/Q108465955'), ('name', 'fiction genre')]
[('inst', 'http://www.wikidata.org/entity/Q15961987'), ('name', 'television genre')]
[('inst', 'http://www.wikidata.org/entity/Q7777573'), ('name', 'theatrical genre')]
[('inst', 'http://www.wikidata.org/entity/Q5151404'), ('name', 'comedic genre')]
[('inst', 'http://www.wikidata.org/entity/Q20656220'), ('name', 'thriller

In [15]:
# film genre (wd:Q201658)
queryString = """
SELECT DISTINCT ?inst ?name
WHERE { 
    BIND(wd:Q201658 AS ?inst).
    ?inst sc:name ?name.
}
"""

print("Results")
x=run_query(queryString)

Results
[('inst', 'http://www.wikidata.org/entity/Q201658'), ('name', 'film genre')]
1


In [16]:
### insert the result of TASK 3 in the file
og_uri = "http://www.wikidata.org/entity/Q201658"
og_name = "film genre"
obj = {"uri":og_uri,"name":og_name}
evaluation.add_result(evaluation.get_index_workflow(pt),"3", evaluation.TYPE_SINGLE ,"uri", [obj] ,"all")

The index of this workflow is: 2_5
The path is /locale/data/jupyter/prando/notebook/2022/results/workflow2_5.json
JSON object updated


Found `wd:Q201658` -> film genre

## Task 4

We need to find the release date and the country of release.

In [17]:
# find the date and the country
queryString = """
SELECT DISTINCT ?prop ?name
WHERE { 
    ?film wdt:P31 wd:Q11424;
        ?prop ?obj.
    ?prop sc:name ?name.
    FILTER(REGEX(?name,"date") || REGEX(?name,"ountr") ).
}
"""

print("Results")
x=run_query(queryString)

Results
[('prop', 'http://www.wikidata.org/prop/direct/P577'), ('name', 'publication date')]
[('prop', 'http://www.wikidata.org/prop/direct/P495'), ('name', 'country of origin')]
[('prop', 'http://www.wikidata.org/prop/direct/P17'), ('name', 'country')]
[('prop', 'http://www.wikidata.org/prop/direct/P2754'), ('name', 'production date')]
[('prop', 'http://www.wikidata.org/prop/direct/P3893'), ('name', 'public domain date')]
[('prop', 'http://www.wikidata.org/prop/direct/P2913'), ('name', 'date depicted')]
[('prop', 'http://www.wikidata.org/prop/direct/P1191'), ('name', 'date of first performance')]
[('prop', 'http://www.wikidata.org/prop/direct/P1619'), ('name', 'date of official opening')]
[('prop', 'http://www.wikidata.org/prop/direct/P6949'), ('name', 'announcement date')]
[('prop', 'http://www.wikidata.org/prop/direct/P3999'), ('name', 'date of official closure')]
[('prop', 'http://www.wikidata.org/prop/direct/P569'), ('name', 'date of birth')]
[('prop', 'http://www.wikidata.org/pro

In [19]:
# use the property publication date and country of origin. But first need to find United States
queryString = """
SELECT DISTINCT ?obj ?name
WHERE { 
    ?film wdt:P31 wd:Q11424;
        wdt:P495 ?obj.
    ?obj sc:name ?name.
    FILTER(REGEX(?name,"[U,u]")).
}
ORDER BY DESC (?name)
LIMIT 20
"""

print("Results")
x=run_query(queryString)

Results
[('obj', 'http://www.wikidata.org/entity/Q42537'), ('name', 'flag of the United States')]
[('obj', 'http://www.wikidata.org/entity/Q165230'), ('name', 'flag of the Soviet Union')]
[('obj', 'http://www.wikidata.org/entity/Q380675'), ('name', 'cinema of the United States')]
[('obj', 'http://www.wikidata.org/entity/Q36704'), ('name', 'Yugoslavia')]
[('obj', 'http://www.wikidata.org/entity/Q41304'), ('name', 'Weimar Republic')]
[('obj', 'http://www.wikidata.org/entity/Q717'), ('name', 'Venezuela')]
[('obj', 'http://www.wikidata.org/entity/Q265'), ('name', 'Uzbekistan')]
[('obj', 'http://www.wikidata.org/entity/Q77'), ('name', 'Uruguay')]
[('obj', 'http://www.wikidata.org/entity/Q30'), ('name', 'United States of America')]
[('obj', 'http://www.wikidata.org/entity/Q174193'), ('name', 'United Kingdom of Great Britain and Ireland')]
[('obj', 'http://www.wikidata.org/entity/Q145'), ('name', 'United Kingdom')]
[('obj', 'http://www.wikidata.org/entity/Q170468'), ('name', 'United Arab Repu

In [20]:
# use the property publication date and country of origin. USA (Q30)
queryString = """
SELECT DISTINCT ?genre ?name COUNT(DISTINCT ?film) AS ?counts
WHERE { 
    ?film wdt:P31 wd:Q11424;
        wdt:P495 wd:Q30;
        wdt:P136 ?genre;
        wdt:P577 ?date.
    ?genre wdt:P31 wd:Q201658.
    FILTER ( isLiteral(?date) ) .
    FILTER(?date>=xsd:date("2010-01-01") && ?date<xsd:date("2016-01-01")).
    ?genre sc:name ?name.
}
GROUP BY ?genre ?name
ORDER BY DESC (?counts)
LIMIT 10
"""

print("Results")
x=run_query(queryString)

Results
[('genre', 'http://www.wikidata.org/entity/Q93204'), ('name', 'documentary film'), ('counts', '1648')]
[('genre', 'http://www.wikidata.org/entity/Q130232'), ('name', 'drama'), ('counts', '1584')]
[('genre', 'http://www.wikidata.org/entity/Q157443'), ('name', 'comedy film'), ('counts', '687')]
[('genre', 'http://www.wikidata.org/entity/Q200092'), ('name', 'horror film'), ('counts', '663')]
[('genre', 'http://www.wikidata.org/entity/Q188473'), ('name', 'action film'), ('counts', '573')]
[('genre', 'http://www.wikidata.org/entity/Q2484376'), ('name', 'thriller film'), ('counts', '521')]
[('genre', 'http://www.wikidata.org/entity/Q471839'), ('name', 'science fiction film'), ('counts', '318')]
[('genre', 'http://www.wikidata.org/entity/Q20442589'), ('name', 'LGBT-related film'), ('counts', '311')]
[('genre', 'http://www.wikidata.org/entity/Q859369'), ('name', 'comedy-drama'), ('counts', '290')]
[('genre', 'http://www.wikidata.org/entity/Q959790'), ('name', 'crime film'), ('counts', 

In [23]:
# same query on a different timespan (1960-01-01 to 1964-12-31). USA (Q30)
queryString = """
SELECT DISTINCT ?genre ?name COUNT(DISTINCT ?film) AS ?counts
WHERE { 
    ?film wdt:P31 wd:Q11424;
        wdt:P495 wd:Q30;
        wdt:P136 ?genre;
        wdt:P577 ?date.
    ?genre wdt:P31 wd:Q201658.
    FILTER ( isLiteral(?date) ) .
    FILTER(?date>=xsd:date("1960-01-01") && ?date<xsd:date("1965-01-01")).
    ?genre sc:name ?name.
}
GROUP BY ?genre ?name
ORDER BY DESC (?counts)
LIMIT 10
"""

print("Results")
x=run_query(queryString)

Results
[('genre', 'http://www.wikidata.org/entity/Q130232'), ('name', 'drama'), ('counts', '339')]
[('genre', 'http://www.wikidata.org/entity/Q157443'), ('name', 'comedy film'), ('counts', '124')]
[('genre', 'http://www.wikidata.org/entity/Q200092'), ('name', 'horror film'), ('counts', '81')]
[('genre', 'http://www.wikidata.org/entity/Q860626'), ('name', 'romantic comedy'), ('counts', '81')]
[('genre', 'http://www.wikidata.org/entity/Q172980'), ('name', 'Western film'), ('counts', '77')]
[('genre', 'http://www.wikidata.org/entity/Q369747'), ('name', 'war film'), ('counts', '71')]
[('genre', 'http://www.wikidata.org/entity/Q842256'), ('name', 'musical film'), ('counts', '66')]
[('genre', 'http://www.wikidata.org/entity/Q471839'), ('name', 'science fiction film'), ('counts', '58')]
[('genre', 'http://www.wikidata.org/entity/Q959790'), ('name', 'crime film'), ('counts', '52')]
[('genre', 'http://www.wikidata.org/entity/Q1054574'), ('name', 'romance film'), ('counts', '50')]
10
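
Both counting queries above select dates with a half-open interval: an inclusive lower bound and an exclusive upper bound set to the day after the last day of interest. The sketch below illustrates this for the timespan requested by the task (2010-01-01 to 2015-12-31) using Python's `datetime`; the function name `in_window` is hypothetical.

```python
from datetime import date, timedelta

def in_window(d, start, end_exclusive):
    """True when start <= d < end_exclusive (half-open interval)."""
    return start <= d < end_exclusive

start = date(2010, 1, 1)
last_day = date(2015, 12, 31)
# The exclusive upper bound is the day after the last day of the timespan.
end_exclusive = last_day + timedelta(days=1)

print(end_exclusive)
print(in_window(date(2015, 12, 31), start, end_exclusive))
print(in_window(date(2016, 1, 1), start, end_exclusive))
```

Writing the upper bound as the following day avoids having to spell out the last instant of 2015-12-31 when the stored values carry a time component.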


In [24]:
## single literal associated with a URI
objs = []
for i in x:
    f_uri = i[0][1]
    f_name = i[1][1]
    val = i[2][1]
    obj = {}
    obj["refers_to"] = f_uri
    obj["refers_to_name"] = f_name
    obj["check"] = "value"
    obj["value"]= val
    objs.append(obj)
evaluation.add_result(evaluation.get_index_workflow(pt),"4", evaluation.TYPE_REFERRED ,"value", objs)

The index of this workflow is: 2_5
The path is /locale/data/jupyter/prando/notebook/2022/results/workflow2_5.json
JSON object updated
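
The loop that builds the list of result objects can be factored into a small function, which is convenient because the same pattern is needed for every task whose answer is a value attached to a URI. This is only a sketch: the name `to_referred_values` is hypothetical, and it assumes each row holds exactly three bindings in the order IRI, label, value, as in the cell above.

```python
def to_referred_values(rows):
    """Map (IRI, label, value) rows to the dictionaries stored for evaluation."""
    objs = []
    for row in rows:
        # Each binding is a (variable, value) tuple; only the values are kept.
        (_, uri), (_, name), (_, value) = row
        objs.append({
            "refers_to": uri,
            "refers_to_name": name,
            "check": "value",
            "value": value,
        })
    return objs

# Two rows in the shape returned by run_query.
sample = [
    [("genre", "http://www.wikidata.org/entity/Q130232"), ("name", "drama"), ("counts", "339")],
    [("genre", "http://www.wikidata.org/entity/Q157443"), ("name", "comedy film"), ("counts", "124")],
]
objs = to_referred_values(sample)
print(objs)
```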


## Task 5

In [28]:
# look at which release years appear from 2018 onwards
queryString = """
SELECT DISTINCT(YEAR(?date)) AS ?max_year
WHERE {
   ?film wdt:P577 ?date ;
         wdt:P136 ?genre .
   ?genre wdt:P31 wd:Q201658 .
   FILTER(isLiteral(?date)).
   FILTER(?date >="2018-01-01T00:00:00Z"^^xsd:dateTime)
}
"""

print("Results")
x=run_query(queryString)

Results
[('max_year', '2018')]
[('max_year', '2019')]
[('max_year', '2021')]
[('max_year', '2020')]
[('max_year', '2022')]
[('max_year', '2023')]
[('max_year', '2115')]
[('max_year', '2040')]
[('max_year', '2100')]
[('max_year', '2024')]
[('max_year', '2188')]
11


In [35]:
# find all the genres from 2001 to now that have more than 50 films in each year
# we consider films published up to 2021, given that this Wikidata instance is from 2021
queryString = """
SELECT ?genre ?g_name 
WHERE{
    
    {
        SELECT DISTINCT ?genre YEAR(?date) AS ?year (COUNT(?film) AS ?films)
        WHERE {
           ?film wdt:P577 ?date ;
                 wdt:P136 ?genre .
           ?genre wdt:P31 wd:Q201658 .
           FILTER(?date >="2001-01-01T00:00:00Z"^^xsd:dateTime && ?date <="2022-01-01T00:00:00Z"^^xsd:dateTime)
        }
        GROUP BY ?genre YEAR(?date)
    }
    ?genre sc:name ?g_name .
    FILTER (?films > 50)
}
GROUP BY ?genre ?g_name
HAVING (COUNT(DISTINCT ?year) >20)
"""

print("Results")
x=run_query(queryString)

Results
[('genre', 'http://www.wikidata.org/entity/Q2484376'), ('g_name', 'thriller film')]
[('genre', 'http://www.wikidata.org/entity/Q959790'), ('g_name', 'crime film')]
[('genre', 'http://www.wikidata.org/entity/Q93204'), ('g_name', 'documentary film')]
[('genre', 'http://www.wikidata.org/entity/Q188473'), ('g_name', 'action film')]
[('genre', 'http://www.wikidata.org/entity/Q157443'), ('g_name', 'comedy film')]
[('genre', 'http://www.wikidata.org/entity/Q130232'), ('g_name', 'drama')]
[('genre', 'http://www.wikidata.org/entity/Q319221'), ('g_name', 'adventure film')]
[('genre', 'http://www.wikidata.org/entity/Q200092'), ('g_name', 'horror film')]
8
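
The query above works in two steps: it keeps the (genre, year) groups with more than 50 films, then keeps the genres for which more than 20 distinct years remain. Since 2001 to 2021 spans 21 years, this amounts to requiring every year. The sketch below reproduces that logic in plain Python on made-up counts; the function name and the toy data are hypothetical. Because the upper bound of the query's filter is inclusive, a date of exactly 2022-01-01T00:00:00Z would also pass and fall in year 2022, so the sketch checks the required years explicitly instead of counting them.

```python
def genres_over_threshold_every_year(counts, first_year, last_year, threshold):
    """counts maps (genre, year) -> number of films released that year."""
    required = set(range(first_year, last_year + 1))
    years_ok = {}
    for (genre, year), n in counts.items():
        if n > threshold:
            years_ok.setdefault(genre, set()).add(year)
    # Keep a genre only if every required year passed the threshold.
    return sorted(g for g, years in years_ok.items() if required <= years)

# Toy data over a three-year span (hypothetical numbers).
counts = {
    ("drama", 2001): 80, ("drama", 2002): 75, ("drama", 2003): 90,
    ("western", 2001): 60, ("western", 2002): 12, ("western", 2003): 55,
}
result = genres_over_threshold_every_year(counts, 2001, 2003, 50)
print(result)
print(len(range(2001, 2022)))  # number of years from 2001 to 2021
```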


In [36]:
obj = [{"uri":r[0][1],"name":r[1][1]} for r in x]
evaluation.add_result(evaluation.get_index_workflow(pt),"5", evaluation.TYPE_SET ,"uri", obj)

The index of this workflow is: 2_5
The path is /locale/data/jupyter/prando/notebook/2022/results/workflow2_5.json
JSON object updated


## Task 6

In [38]:
# top-10 countries of citizenship of composers of Western films, ranked by number of films
queryString = """
SELECT DISTINCT ?country ?name COUNT(DISTINCT ?film) AS ?counts
WHERE { 
    ?film wdt:P31 wd:Q11424;
        wdt:P86 ?composer;
        wdt:P136 wd:Q172980.
    ?composer wdt:P27 ?country.
    ?country sc:name ?name.
}
GROUP BY ?country ?name
ORDER BY DESC (?counts)
LIMIT 10
"""

print("Results")
x=run_query(queryString)

Results
[('country', 'http://www.wikidata.org/entity/Q30'), ('name', 'United States of America'), ('counts', '1024')]
[('country', 'http://www.wikidata.org/entity/Q145'), ('name', 'United Kingdom'), ('counts', '54')]
[('country', 'http://www.wikidata.org/entity/Q34266'), ('name', 'Russian Empire'), ('counts', '45')]
[('country', 'http://www.wikidata.org/entity/Q38'), ('name', 'Italy'), ('counts', '40')]
[('country', 'http://www.wikidata.org/entity/Q183'), ('name', 'Germany'), ('counts', '38')]
[('country', 'http://www.wikidata.org/entity/Q40'), ('name', 'Austria'), ('counts', '36')]
[('country', 'http://www.wikidata.org/entity/Q15180'), ('name', 'Soviet Union'), ('counts', '36')]
[('country', 'http://www.wikidata.org/entity/Q28513'), ('name', 'Austria-Hungary'), ('counts', '31')]
[('country', 'http://www.wikidata.org/entity/Q172579'), ('name', 'Kingdom of Italy'), ('counts', '31')]
[('country', 'http://www.wikidata.org/entity/Q142'), ('name', 'France'), ('counts', '20')]
10
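
The query counts distinct films per country, so a film is counted once for a country even when several of its composers share that citizenship, while a composer with two citizenships contributes the film to both countries. A minimal sketch of this counting on made-up data follows; the function name and all identifiers in the toy data are hypothetical.

```python
from collections import defaultdict

def films_per_country(film_composers, composer_countries):
    """Count distinct films per country of citizenship of their composers."""
    films = defaultdict(set)
    for film, composer in film_composers:
        for country in composer_countries.get(composer, ()):
            films[country].add(film)  # a set keeps each film once per country
    ranked = ((country, len(fs)) for country, fs in films.items())
    return sorted(ranked, key=lambda item: (-item[1], item[0]))

# Toy data: film f1 has two composers, and composer c2 has two citizenships.
film_composers = [("f1", "c1"), ("f1", "c2"), ("f2", "c1")]
composer_countries = {"c1": ["US"], "c2": ["US", "IT"]}
ranking = films_per_country(film_composers, composer_countries)
print(ranking)
```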


In [39]:
## single literal associated with a URI
objs = []
for i in x:
    f_uri = i[0][1]
    f_name = i[1][1]
    val = i[2][1]
    obj = {}
    obj["refers_to"] = f_uri
    obj["refers_to_name"] = f_name
    obj["check"] = "value"
    obj["value"]= val
    objs.append(obj)
evaluation.add_result(evaluation.get_index_workflow(pt),"6", evaluation.TYPE_REFERRED ,"value", objs)

The index of this workflow is: 2_5
The path is /locale/data/jupyter/prando/notebook/2022/results/workflow2_5.json
JSON object updated


## Task 7

In [44]:
# count the Western films released in the decade 1961-1970
queryString = """
SELECT COUNT(DISTINCT ?film) AS ?counts
WHERE { 
    ?film wdt:P136 wd:Q172980 ;
        wdt:P31 wd:Q11424;
        wdt:P577 ?date.
    FILTER(?date>=xsd:date("1961-01-01") && ?date<xsd:date("1971-01-01")).
}
"""

print("Results")
x=run_query(queryString)

Results
[('counts', '265')]
1


In [45]:
# count the Western films released in the decade 2001-2010
queryString = """
SELECT COUNT(DISTINCT ?film) AS ?counts
WHERE { 
    ?film wdt:P136 wd:Q172980 ;
        wdt:P31 wd:Q11424;
        wdt:P577 ?date.
    FILTER(?date>=xsd:date("2001-01-01") && ?date<xsd:date("2011-01-01")).
}
"""

print("Results")
x=run_query(queryString)

Results
[('counts', '65')]
1


In [46]:
# find n_western films between 1961-1970 and 2001-2010 
queryString = """
SELECT DISTINCT ?decade (COUNT( DISTINCT ?film) AS ?films)
        WHERE {
           ?film wdt:P136 wd:Q172980 ;
                 wdt:P31 wd:Q11424 ;
                 wdt:P577 ?date .
           FILTER((YEAR(?date) > 1960 && YEAR(?date) <= 1970) || (YEAR(?date) > 2000 && YEAR(?date) <= 2010))
           BIND (if(YEAR(?date) > 1960 && YEAR(?date) <= 1970,"1961-1970", "2001-2010") AS ?decade)
        }
        GROUP BY ?decade
LIMIT 100
"""

print("Results")
x=run_query(queryString)

Results
[('decade', '2001-2010'), ('films', '64')]
[('decade', '1961-1970'), ('films', '250')]
2
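
The last query assigns each release date to a decade label and then counts distinct films per label. The sketch below mirrors that bucketing in Python on made-up data; the function names and the film identifiers are hypothetical.

```python
def decade_label(year):
    """Return the decade label used in the query, or None outside both decades."""
    if 1960 < year <= 1970:
        return "1961-1970"
    if 2000 < year <= 2010:
        return "2001-2010"
    return None

def films_per_decade(releases):
    """releases: iterable of (film, year); counts distinct films per decade."""
    films = {}
    for film, year in releases:
        label = decade_label(year)
        if label is not None:
            films.setdefault(label, set()).add(film)
    return {label: len(fs) for label, fs in films.items()}

# Toy data: f1 has two release years in the same decade and is counted once.
releases = [("f1", 1961), ("f1", 1962), ("f2", 1970), ("f3", 1971), ("f4", 2005)]
per_decade = films_per_decade(releases)
print(per_decade)
```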


## Task 8

In [48]:
# find average cast members size for western films between 1961-1970 and 2001-2010
queryString = """
SELECT DISTINCT ?decade (AVG(?cast_size) AS ?avg_cast)
WHERE {
    {
        SELECT DISTINCT ?decade ?film (COUNT(?cast) AS ?cast_size)
        WHERE {
           ?film wdt:P136 wd:Q172980 ;
                 wdt:P31 wd:Q11424 ;
                 wdt:P577 ?date ;
                 wdt:P161 ?cast .
           FILTER((YEAR(?date) > 1960 && YEAR(?date) <= 1970) || (YEAR(?date) > 2000 && YEAR(?date) <= 2010))
           BIND (if(YEAR(?date) > 1960 && YEAR(?date) <= 1970,"1961-1970","2001-2010") AS ?decade)
        }
        GROUP BY ?decade ?film
    }
}
GROUP BY ?decade
LIMIT 100
"""

print("Results")
x=run_query(queryString)

Results
[('decade', '2001-2010'), ('avg_cast', '12.272727272727273')]
[('decade', '1961-1970'), ('avg_cast', '13.259414225941423')]
2
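
The query is a two-level aggregation: the inner part computes the cast size of each film, and the outer part averages those sizes per decade. A minimal Python sketch of the same computation on made-up data is given below. The function name and the toy rows are hypothetical, and the sketch counts distinct cast members per film, which is a small departure from the `COUNT(?cast)` used in the query.

```python
from statistics import mean

def average_cast_size(cast_rows):
    """cast_rows: iterable of (decade, film, cast_member) triples."""
    # Inner aggregation: collect the cast of each film within a decade.
    members = {}
    for decade, film, member in cast_rows:
        members.setdefault((decade, film), set()).add(member)
    # Outer aggregation: average the per-film cast sizes of each decade.
    sizes = {}
    for (decade, _film), cast in members.items():
        sizes.setdefault(decade, []).append(len(cast))
    return {decade: mean(values) for decade, values in sizes.items()}

# Toy data: f1 has two cast members, f2 and f3 have one each.
cast_rows = [
    ("1961-1970", "f1", "a"),
    ("1961-1970", "f1", "b"),
    ("1961-1970", "f2", "a"),
    ("2001-2010", "f3", "c"),
]
averages = average_cast_size(cast_rows)
print(averages)
```

As in the query, films without any recorded cast member never appear in the input, so they do not lower the average.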
