# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained by the previous ones.

An exploratory workload starts with a usually vague, open-ended question and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. You do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-7bcf47b429-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# SELECT and CONSTRUCT queries: print every binding and return the number of rows
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings']) == 0:
            print("Empty")
            return 0

        for bindings in json_results['results']['bindings']:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(json_results['results']['bindings'])

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
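
`run_query` only prints the bindings and returns their count. When the results need further processing in Python, a small helper that turns the SPARQL JSON response into plain dictionaries can help. The following is a minimal sketch: `bindings_to_rows` and the `sample` response are hypothetical and not part of the original setup; the sample only mimics the shape of the standard SPARQL JSON results format.

```python
def bindings_to_rows(json_results):
    """Flatten a SPARQL JSON response into a list of {variable: value} dicts."""
    rows = []
    for binding in json_results["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hand-written response in the SPARQL JSON results shape (illustrative only).
sample = {
    "head": {"vars": ["p", "pname"]},
    "results": {
        "bindings": [
            {
                "p": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P50"},
                "pname": {"type": "literal", "value": "author"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```

In the notebook, the same helper could be applied to the value returned by `results.convert()` inside `run_query`, assuming the endpoint answers in this format.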

# Book Workflow Series ("Authors comparison explorative search") 

Consider the following exploratory scenario:


> Investigate the production of Paul Auster and Ian McEwan: check how many books they have written for each literary genre, and gather information about their production and about their works which are not books.



## Useful URIs for the current workflow
The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P106`    | profession    | predicate | 
| `wdt:P279`    | subclass      | predicate |
| `wd:Q47461344`| writtenwork   | node |
| `wd:Q214642`  | Paul Auster   | node |
| `wd:Q190379`  | Ian McEwan    | node |




Also consider

```
wd:Q214642 ?rel ?obj  . 
```
is the BGP to retrieve all **properties of Paul Auster**


The workload should


1. Identify the BGP for obtaining the books (with publishing date and genre) published by the two authors

2. Did the authors publish a book in the same year? What is the longest period without publishing a book for the two authors?

3. Did the authors produce, act in, or direct a film? If so, did they write the screenplay?

4. How many films were derived from the books of these two authors?

5. Which author won more literature-related awards? Have they ever been nominated for a Nobel Prize?

In [6]:
# start your workflow here

In [7]:
queryString = """
SELECT COUNT( ?obj)
WHERE { 
wd:Q214642 ?rel ?obj  . 
} 
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '191')]


1

## Task 1

1. Identify the BGP for obtaining the books (with publishing date and genre) published by the two authors

#### Comment
**IMPORTANT** From now on I'm going to refer to Paul Auster as PA and Ian McEwan as IM

In [8]:
#Let's check the properties for PA
queryString = """
SELECT *
WHERE { 
wd:Q214642 ?p ?obj  .

# this returns the labels
?p <http://schema.org/name> ?pname .
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1005'), ('obj', '72675'), ('pname', 'Portuguese National Library ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1006'), ('obj', '072139331'), ('pname', 'Nationale Thesaurus voor Auteurs ID')]
[('p', 'http://www.wikidata.org/prop/direct/P101'), ('obj', 'http://www.wikidata.org/entity/Q8242'), ('pname', 'field of work')]
[('p', 'http://www.wikidata.org/prop/direct/P1015'), ('obj', '90118966'), ('pname', 'NORAF ID')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('obj', 'http://www.wikidata.org/entity/Q11774202'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('obj', 'http://www.wikidata.org/entity/Q14467526'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('obj', 'http://www.wikidata.org/entity/Q1622272'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('obj', 'http://www.wikidata.org/entity/Q18814623'), ('pname', 'occupation')]
[('p',

163

In [9]:
#Let's check the properties for IM
queryString = """
SELECT *
WHERE { 
wd:Q190379 ?p ?obj  .

# this returns the labels
?p <http://schema.org/name> ?pname .
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P2255'), ('obj', '1647'), ('pname', "Debrett's People of Today ID")]
[('p', 'http://www.wikidata.org/prop/direct/P5502'), ('obj', 'ian-mcewan'), ('pname', 'LRB contributor ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1006'), ('obj', '068842910'), ('pname', 'Nationale Thesaurus voor Auteurs ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1015'), ('obj', '90068931'), ('pname', 'NORAF ID')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('obj', 'http://www.wikidata.org/entity/Q214917'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('obj', 'http://www.wikidata.org/entity/Q28389'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('obj', 'http://www.wikidata.org/entity/Q3282637'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('obj', 'http://www.wikidata.org/entity/Q36180'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/

168

In [10]:
#Let's check the incoming edges for PA
queryString = """
SELECT *
WHERE { 
?sub ?p wd:Q214642  .

# this returns the labels
?p <http://schema.org/name> ?pname .
} 
"""

print("Results")
run_query(queryString)

Results
[('sub', 'http://www.wikidata.org/entity/Q2915460'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pname', 'cast member')]
[('sub', 'http://www.wikidata.org/entity/Q2364684'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pname', 'cast member')]
[('sub', 'http://www.wikidata.org/entity/Q3213673'), ('p', 'http://www.wikidata.org/prop/direct/P162'), ('pname', 'producer')]
[('sub', 'http://www.wikidata.org/entity/Q3213673'), ('p', 'http://www.wikidata.org/prop/direct/P170'), ('pname', 'creator')]
[('sub', 'http://www.wikidata.org/entity/Q445905'), ('p', 'http://www.wikidata.org/prop/direct/P22'), ('pname', 'father')]
[('sub', 'http://www.wikidata.org/entity/Q1280035'), ('p', 'http://www.wikidata.org/prop/direct/P26'), ('pname', 'spouse')]
[('sub', 'http://www.wikidata.org/entity/Q259543'), ('p', 'http://www.wikidata.org/prop/direct/P26'), ('pname', 'spouse')]
[('sub', 'http://www.wikidata.org/entity/Q56296595'), ('p', 'http://www.wikidata.org/prop/direct/P2789'), ('p

61

In [11]:
#Let's check the incoming edges for IM
queryString = """
SELECT *
WHERE { 
?sub ?p wd:Q190379  .

# this returns the labels
?p <http://schema.org/name> ?pname .
} 
"""

print("Results")
run_query(queryString)

Results
[('sub', 'http://www.wikidata.org/entity/Q160082'), ('p', 'http://www.wikidata.org/prop/direct/P1346'), ('pname', 'winner')]
[('sub', 'http://www.wikidata.org/entity/Q5709276'), ('p', 'http://www.wikidata.org/prop/direct/P1346'), ('pname', 'winner')]
[('sub', 'http://www.wikidata.org/entity/Q7348342'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('pname', 'cast member')]
[('sub', 'http://www.wikidata.org/entity/Q1626186'), ('p', 'http://www.wikidata.org/prop/direct/P162'), ('pname', 'producer')]
[('sub', 'http://www.wikidata.org/entity/Q3243784'), ('p', 'http://www.wikidata.org/prop/direct/P26'), ('pname', 'spouse')]
[('sub', 'http://www.wikidata.org/entity/Q15831559'), ('p', 'http://www.wikidata.org/prop/direct/P301'), ('pname', "category's main topic")]
[('sub', 'http://www.wikidata.org/entity/Q1723053'), ('p', 'http://www.wikidata.org/prop/direct/P50'), ('pname', 'author')]
[('sub', 'http://www.wikidata.org/entity/Q58883639'), ('p', 'http://www.wikidata.org/prop/direc

44

#### Comment
Let's save the reference to the "author" property

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P50`   | author   | predicate |

In [24]:
#It seems that the property "author" (wdt:P50) connects a book with its author, so let's check the properties of what is probably a book to confirm that it actually is one
#Let's check the properties of a book
queryString = """
SELECT ?sname ?pname ?obj ?p
WHERE { 
?sub wdt:P50 wd:Q214642  ;
        ?p ?obj .

# this returns the labels
?sub <http://schema.org/name> ?sname .
?p <http://schema.org/name> ?pname .
}
ORDER BY ASC (?sname)
"""

print("Results")
run_query(queryString)

Results
[('sname', '4 3 2 1'), ('pname', 'significant event'), ('obj', 'http://www.wikidata.org/entity/Q193484'), ('p', 'http://www.wikidata.org/prop/direct/P793')]
[('sname', '4 3 2 1'), ('pname', 'significant event'), ('obj', 'http://www.wikidata.org/entity/Q5149908'), ('p', 'http://www.wikidata.org/prop/direct/P793')]
[('sname', '4 3 2 1'), ('pname', 'significant event'), ('obj', 'http://www.wikidata.org/entity/Q4572764'), ('p', 'http://www.wikidata.org/prop/direct/P793')]
[('sname', '4 3 2 1'), ('pname', 'narrative location'), ('obj', 'http://www.wikidata.org/entity/Q25395'), ('p', 'http://www.wikidata.org/prop/direct/P840')]
[('sname', '4 3 2 1'), ('pname', 'narrative location'), ('obj', 'http://www.wikidata.org/entity/Q1408'), ('p', 'http://www.wikidata.org/prop/direct/P840')]
[('sname', '4 3 2 1'), ('pname', 'narrative location'), ('obj', 'http://www.wikidata.org/entity/Q60'), ('p', 'http://www.wikidata.org/prop/direct/P840')]
[('sname', '4 3 2 1'), ('pname', 'narrative location

492

In [35]:
#Let's check the "instance of" property of a book and in particular the destination node of that property
queryString = """
SELECT ?sname ?oname ?obj
WHERE { 
?sub wdt:P50 wd:Q214642 ;
    wdt:P31 ?obj .

# this returns the labels
?sub <http://schema.org/name> ?sname .
#?p <http://schema.org/name> ?pname .
?obj <http://schema.org/name> ?oname .
}
ORDER BY ASC (?sname)
"""

print("Results")
run_query(queryString)

Results
[('sname', '4 3 2 1'), ('oname', 'literary work'), ('obj', 'http://www.wikidata.org/entity/Q7725634')]
[('sname', '4 3 2 1'), ('oname', 'novel'), ('obj', 'http://www.wikidata.org/entity/Q8261')]
[('sname', "Auggie Wren's Christmas Story"), ('oname', 'literary work'), ('obj', 'http://www.wikidata.org/entity/Q7725634')]
[('sname', 'City of Glass'), ('oname', 'literary work'), ('obj', 'http://www.wikidata.org/entity/Q7725634')]
[('sname', 'City of Glass'), ('oname', 'graphic novel'), ('obj', 'http://www.wikidata.org/entity/Q725377')]
[('sname', 'Ghosts'), ('oname', 'literary work'), ('obj', 'http://www.wikidata.org/entity/Q7725634')]
[('sname', 'Here and Now: Letters, 2008-2011'), ('oname', 'literary work'), ('obj', 'http://www.wikidata.org/entity/Q7725634')]
[('sname', 'I Thought My Father Was God'), ('oname', 'literary work'), ('obj', 'http://www.wikidata.org/entity/Q7725634')]
[('sname', 'I Thought My Father Was God'), ('oname', 'short story collection'), ('obj', 'http://www.wi

35

#### Comment
As we can see, we have the book class, but also other similar classes such as literary work, written work, novel, book series and graphic novel, which may be connected to the book class through other paths.  
So we now need to check whether this is really the case.

In [38]:
#Let's check if there are any relations between literary work, written work, novel, book series and graphic novel classes with the book class (wd:Q571)  
queryString = """
SELECT DISTINCT ?oname ?pname ?p
WHERE { 
?sub wdt:P50 wd:Q214642 ;
    wdt:P31 ?obj .
?obj ?p wd:Q571.

FILTER(!(?obj = wd:Q3331189)).

# this returns the labels
?p <http://schema.org/name> ?pname .
?obj <http://schema.org/name> ?oname .
}
ORDER BY ASC (?oname)
"""

print("Results")
run_query(queryString)

Results
[('oname', 'book series'), ('pname', 'has part'), ('p', 'http://www.wikidata.org/prop/direct/P527')]
[('oname', 'novel'), ('pname', 'subclass of'), ('p', 'http://www.wikidata.org/prop/direct/P279')]
[('oname', 'written work'), ('pname', 'different from'), ('p', 'http://www.wikidata.org/prop/direct/P1889')]


3

#### Comment
In this way we have discovered that written work is considered different from book and that novel is a subclass of book, but nothing about the literary work and graphic novel classes, so let's try to find additional information about these last two classes.

In [42]:
#Let's find info about literary work (wd:Q7725634) and graphic novels (wd:Q725377)
queryString = """
SELECT DISTINCT ?oname ?pname ?propname ?p ?prop
WHERE { 
?sub wdt:P50 wd:Q214642 ;
    wdt:P31 ?obj .
?obj ?p ?prop.

FILTER(?obj = wd:Q7725634 || ?obj = wd:Q725377).

# this returns the labels
?prop <http://schema.org/name> ?propname .
?p <http://schema.org/name> ?pname .
?obj <http://schema.org/name> ?oname .
}
ORDER BY ASC (?oname)
"""

print("Results")
run_query(queryString)

Results
[('oname', 'graphic novel'), ('pname', 'instance of'), ('propname', 'literary form'), ('p', 'http://www.wikidata.org/prop/direct/P31'), ('prop', 'http://www.wikidata.org/entity/Q4263830')]
[('oname', 'graphic novel'), ('pname', 'subclass of'), ('propname', 'novel'), ('p', 'http://www.wikidata.org/prop/direct/P279'), ('prop', 'http://www.wikidata.org/entity/Q8261')]
[('oname', 'graphic novel'), ('pname', 'instance of'), ('propname', 'literary genre'), ('p', 'http://www.wikidata.org/prop/direct/P31'), ('prop', 'http://www.wikidata.org/entity/Q223393')]
[('oname', 'graphic novel'), ('pname', 'subclass of'), ('propname', 'comic book album'), ('p', 'http://www.wikidata.org/prop/direct/P279'), ('prop', 'http://www.wikidata.org/entity/Q2831984')]
[('oname', 'graphic novel'), ('pname', 'instance of'), ('propname', 'art movement'), ('p', 'http://www.wikidata.org/prop/direct/P31'), ('prop', 'http://www.wikidata.org/entity/Q968159')]
[('oname', 'graphic novel'), ('pname', 'main subject'),

38

#### Comment
We have discovered that graphic novel is a subclass of novel, so we can consider it a book. Literary work, instead, is a subclass of written work but also a distinct class
(different from). This probably does not mean that a written work is not a book, only that in terms of classes they are not the same.

To settle this problem, I'm going to treat written works and literary works as books: searching the web for any of them returns books, and otherwise the task would lose its sense.
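
The reasoning above can be sketched as a small reachability check over the class relations found so far. This is only an illustration: `SUBCLASS_OF` and `reaches` are hypothetical names, the edges are limited to the relations reported in the outputs and comments above, and the real Wikidata hierarchy contains many more.

```python
# Direct "subclass of" relations read from the results above (labels only).
SUBCLASS_OF = {
    "novel": {"book"},
    "graphic novel": {"novel", "comic book album"},
    "literary work": {"written work"},
}


def reaches(cls, target, edges):
    """Return True if `target` is reachable from `cls` by following subclass edges."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(edges.get(current, ()))
    return False


for cls in ["novel", "graphic novel", "literary work", "written work"]:
    print(cls, "->", reaches(cls, "book", SUBCLASS_OF))
```

With only these edges, novel and graphic novel reach book while literary work and written work do not, which is why the latter two are included by an explicit decision rather than through the class hierarchy.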

Now let's repeat the same process for the other author, IM.

In [43]:
#It seems that the property "author" (wdt:P50) connects a book with its author, so let's check the properties of IM's works to confirm that we are talking about books
#Let's check the properties of a book
queryString = """
SELECT ?sname ?pname ?obj ?p
WHERE { 
?sub wdt:P50 wd:Q190379  ;
        ?p ?obj .

# this returns the labels
?sub <http://schema.org/name> ?sname .
?p <http://schema.org/name> ?pname .
}
ORDER BY ASC (?sname)
"""

print("Results")
run_query(queryString)

Results
[('sname', 'Amsterdam'), ('pname', 'NNL item ID'), ('obj', '001983322'), ('p', 'http://www.wikidata.org/prop/direct/P3959')]
[('sname', 'Amsterdam'), ('pname', 'publisher'), ('obj', 'http://www.wikidata.org/entity/Q3277534'), ('p', 'http://www.wikidata.org/prop/direct/P123')]
[('sname', 'Amsterdam'), ('pname', 'genre'), ('obj', 'http://www.wikidata.org/entity/Q8261'), ('p', 'http://www.wikidata.org/prop/direct/P136')]
[('sname', 'Amsterdam'), ('pname', 'Encyclopædia Britannica Online ID'), ('obj', 'topic/Amsterdam-by-McEwan'), ('p', 'http://www.wikidata.org/prop/direct/P1417')]
[('sname', 'Amsterdam'), ('pname', 'title'), ('obj', 'Amsterdam'), ('p', 'http://www.wikidata.org/prop/direct/P1476')]
[('sname', 'Amsterdam'), ('pname', 'follows'), ('obj', 'http://www.wikidata.org/entity/Q2920448'), ('p', 'http://www.wikidata.org/prop/direct/P155')]
[('sname', 'Amsterdam'), ('pname', 'followed by'), ('obj', 'http://www.wikidata.org/entity/Q306619'), ('p', 'http://www.wikidata.org/prop/

335

In [44]:
#Let's check the "instance of" property to know which classes IM's works belong to
queryString = """
SELECT ?sname ?oname ?obj
WHERE { 
?sub wdt:P50 wd:Q190379 ;
    wdt:P31 ?obj .

# this returns the labels
?sub <http://schema.org/name> ?sname .
#?p <http://schema.org/name> ?pname .
?obj <http://schema.org/name> ?oname .
}
ORDER BY ASC (?sname)
"""

print("Results")
run_query(queryString)

Results
[('sname', 'Amsterdam'), ('oname', 'written work'), ('obj', 'http://www.wikidata.org/entity/Q47461344')]
[('sname', 'Atonement'), ('oname', 'written work'), ('obj', 'http://www.wikidata.org/entity/Q47461344')]
[('sname', 'Atonement'), ('oname', 'literary work'), ('obj', 'http://www.wikidata.org/entity/Q7725634')]
[('sname', 'Atonement'), ('oname', 'version, edition, or translation'), ('obj', 'http://www.wikidata.org/entity/Q3331189')]
[('sname', 'Black Dogs'), ('oname', 'written work'), ('obj', 'http://www.wikidata.org/entity/Q47461344')]
[('sname', 'Enduring Love'), ('oname', 'written work'), ('obj', 'http://www.wikidata.org/entity/Q47461344')]
[('sname', 'First Love, Last Rites'), ('oname', 'literary work'), ('obj', 'http://www.wikidata.org/entity/Q7725634')]
[('sname', 'In Between the Sheets'), ('oname', 'literary work'), ('obj', 'http://www.wikidata.org/entity/Q7725634')]
[('sname', 'Machines Like Me'), ('oname', 'literary work'), ('obj', 'http://www.wikidata.org/entity/Q77

25

#### Comment 
The analysis of the other author doesn't reveal anything new.

After all these considerations, we can say that, looking at the incoming edges of the two authors, the property "author" (wdt:P50) connects them with the books, or more generally the written/literary works, that they wrote.

Starting from here, I can obtain the genre (*wdt:P136*) and the publication date (*wdt:P577*) of the books.  
From the text of the current task, I assume that only the books that have both a publication date and a genre should be retrieved.

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P50`   | author   | predicate |
| `wdt:P136`     | genre   | predicate |
| `wdt:P577`    | publication date    | predicate | 

In [51]:
#Let's find publication date and genre of each book of PA
queryString = """
SELECT ?sname (GROUP_CONCAT(DISTINCT ?gname ; separator = ", ") AS ?genre) ?pub
WHERE { 
?sub wdt:P50 wd:Q214642 ;
        wdt:P136 ?gen ;
        wdt:P577 ?pub .

# this returns the labels
?sub <http://schema.org/name> ?sname .
?gen <http://schema.org/name> ?gname .
}
GROUP BY ?sname ?pub
"""

print("Results")
run_query(queryString)

Results
[('sname', 'Man in the Dark'), ('genre', 'novel'), ('pub', '2008-01-01T00:00:00Z')]
[('sname', '4 3 2 1'), ('genre', 'Bildungsroman'), ('pub', '2017-01-01T00:00:00Z')]
[('sname', 'The Locked Room'), ('genre', 'novel'), ('pub', '1986-01-01T00:00:00Z')]
[('sname', 'Mr. Vertigo'), ('genre', 'novel'), ('pub', '1994-04-05T00:00:00Z')]
[('sname', 'Moon Palace'), ('genre', 'picaresque novel, romance novel'), ('pub', '1989-02-01T00:00:00Z')]
[('sname', 'Report from the Interior'), ('genre', 'memoir'), ('pub', '2013-11-19T00:00:00Z')]
[('sname', 'The Brooklyn Follies'), ('genre', 'novel'), ('pub', '2005-01-01T00:00:00Z')]
[('sname', "Auggie Wren's Christmas Story"), ('genre', 'short story'), ('pub', '1990-01-01T00:00:00Z')]
[('sname', 'The New York Trilogy'), ('genre', 'mystery fiction, novel'), ('pub', '1987-01-01T00:00:00Z')]
[('sname', 'Leviathan'), ('genre', 'crime novel, detective fiction, hardboiled'), ('pub', '1992-01-01T00:00:00Z')]
[('sname', 'Here and Now: Letters, 2008-2011')

22

In [52]:
#Let's find publication date and genre of each book of IM
queryString = """
SELECT ?sname (GROUP_CONCAT(DISTINCT ?gname ; separator = ", ") AS ?genre) ?pub
WHERE { 
?sub wdt:P50 wd:Q190379 ;
        wdt:P136 ?gen ;
        wdt:P577 ?pub .

# this returns the labels
?sub <http://schema.org/name> ?sname .
?gen <http://schema.org/name> ?gname .
}
GROUP BY ?sname ?pub
"""

print("Results")
run_query(queryString)

Results
[('sname', 'Amsterdam'), ('genre', 'novel'), ('pub', '1998-12-01T00:00:00Z')]
[('sname', 'The Children Act'), ('genre', 'novel'), ('pub', '2014-09-02T00:00:00Z')]
[('sname', 'Machines Like Me'), ('genre', 'novel'), ('pub', '2019-01-01T00:00:00Z')]
[('sname', 'On Chesil Beach'), ('genre', 'novel'), ('pub', '2007-01-01T00:00:00Z')]
[('sname', 'The Innocent (McEwan novel)'), ('genre', 'novel'), ('pub', '1990-01-01T00:00:00Z')]
[('sname', 'The Child in Time'), ('genre', 'novel'), ('pub', '1987-01-01T00:00:00Z')]
[('sname', 'Atonement'), ('genre', 'novel'), ('pub', '2001-01-01T00:00:00Z')]
[('sname', 'The Innocent'), ('genre', 'novel'), ('pub', '1990-01-01T00:00:00Z')]
[('sname', 'My Purple Scented Novel'), ('genre', 'novel'), ('pub', '2016-01-01T00:00:00Z')]
[('sname', 'Sweet Tooth'), ('genre', 'novel'), ('pub', '2012-01-01T00:00:00Z')]
[('sname', 'Enduring Love'), ('genre', 'novel'), ('pub', '1997-01-01T00:00:00Z')]
[('sname', 'The Comfort of Strangers'), ('genre', 'novel'), ('pub

17

#### Comment 
Using GROUP_CONCAT, the genres of a book with more than one genre are listed together, which avoids repeating the book. This final query shows the BGP requested by the task, built from the predicates listed in the previous comment.
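
Since the two final queries differ only in the author IRI, the BGP can be written once and parameterized. The following is a minimal sketch; `books_query` and the two constants are hypothetical helpers, and the query text is the one used in the two cells above.

```python
PAUL_AUSTER = "wd:Q214642"
IAN_MCEWAN = "wd:Q190379"


def books_query(author):
    """Build the Task 1 query (title, genres, publication date) for one author."""
    return f"""
SELECT ?sname (GROUP_CONCAT(DISTINCT ?gname ; separator = ", ") AS ?genre) ?pub
WHERE {{
    ?sub wdt:P50 {author} ;
         wdt:P136 ?gen ;
         wdt:P577 ?pub .

    # this returns the labels
    ?sub <http://schema.org/name> ?sname .
    ?gen <http://schema.org/name> ?gname .
}}
GROUP BY ?sname ?pub
"""


for author in (PAUL_AUSTER, IAN_MCEWAN):
    query = books_query(author)
    print(query)  # in the notebook this string would be passed to run_query
```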

## Task 2

2. Did the authors publish a book in the same year? What is the longest period without publishing a book for the two authors?

In [60]:
# Let's check the datatype of the publication date
queryString = """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT DATATYPE(?pub) WHERE{
    ?subpa wdt:P50 wd:Q214642 ;
        wdt:P577 ?pub .
        
        
# this returns the labels
#?subpa <http://schema.org/name> ?pabookname .
#?subim <http://schema.org/name> ?imbookname .
}
"""

print("Results")
run_query(queryString)

Results
[('callret-0', 'http://www.w3.org/2001/XMLSchema#dateTime')]


1

In [12]:
# Trying the YEAR() function
queryString = """

SELECT (YEAR(?pub) AS ?date) WHERE{
    ?subpa wdt:P50 wd:Q214642 ;
        wdt:P577 ?pub .
        
# this returns the labels
#?subpa <http://schema.org/name> ?pabookname .
#?subim <http://schema.org/name> ?imbookname .
}
ORDER BY ASC (?date)  
"""

print("Results")
run_query(queryString)

Results
[('date', '1985')]
[('date', '1986')]
[('date', '1987')]
[('date', '1987')]
[('date', '1989')]
[('date', '1990')]
[('date', '1990')]
[('date', '1992')]
[('date', '1993')]
[('date', '1994')]
[('date', '1994')]
[('date', '1998')]
[('date', '1999')]
[('date', '2001')]
[('date', '2002')]
[('date', '2002')]
[('date', '2003')]
[('date', '2004')]
[('date', '2005')]
[('date', '2006')]
[('date', '2008')]
[('date', '2008')]
[('date', '2009')]
[('date', '2010')]
[('date', '2010')]
[('date', '2012')]
[('date', '2012')]
[('date', '2013')]
[('date', '2013')]
[('date', '2017')]


30

In [13]:
# Another test
queryString = """
SELECT ?date WHERE{
    ?subpa wdt:P50 wd:Q190379 ;
        wdt:P577 ?pub .
        
        BIND(YEAR(?pub) AS ?date).
        
# this returns the labels
#?subpa <http://schema.org/name> ?pabookname .
#?subim <http://schema.org/name> ?imbookname .
}
ORDER BY ASC (?date) 
"""

print("Results")
run_query(queryString)

Results
[('date', '1975')]
[('date', '1978')]
[('date', '1978')]
[('date', '1981')]
[('date', '1987')]
[('date', '1990')]
[('date', '1990')]
[('date', '1992')]
[('date', '1994')]
[('date', '1997')]
[('date', '1998')]
[('date', '2001')]
[('date', '2001')]
[('date', '2005')]
[('date', '2005')]
[('date', '2007')]
[('date', '2010')]
[('date', '2012')]
[('date', '2014')]
[('date', '2016')]
[('date', '2016')]
[('date', '2019')]
[('date', '2019')]


23

#### Comment
Publication dates are in xsd:dateTime format, so I can apply the YEAR function to extract the year.
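
The same extraction can be reproduced on the Python side, for example when checking results by hand. This is a minimal sketch: `year_of` is a hypothetical helper, and it assumes that every value has exactly the shape seen in the outputs above (for instance `1994-04-05T00:00:00Z`); values with a different layout would raise a `ValueError`.

```python
from datetime import datetime


def year_of(xsd_datetime):
    """Return the year of a value such as '2008-01-01T00:00:00Z'."""
    return datetime.strptime(xsd_datetime, "%Y-%m-%dT%H:%M:%SZ").year


print(year_of("1994-04-05T00:00:00Z"))  # 1994
```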

In [4]:
#Same year of publication and relative books 
queryString = """

SELECT ?pabookname ?imbookname ?pub WHERE{
    ?subpa wdt:P50 wd:Q214642 ;
        wdt:P577 ?p .
    ?subim wdt:P50 wd:Q190379 ;
        wdt:P577 ?d .
            
    BIND((YEAR(?p)) AS ?pub).
    BIND((YEAR(?d)) AS ?da).
    
    FILTER(?pub = ?da).
        
# this returns the labels
?subpa <http://schema.org/name> ?pabookname .
?subim <http://schema.org/name> ?imbookname .
}
"""

print("Results")
run_query(queryString)

Results
The operation failed EndPointInternalError: endpoint returned code 500 and response. 

Response:
b'Virtuoso 22007 Error DT001: Function year needs a datetime, date or time as argument 1, not an arg of type IRI_ID (243)\n\nSPARQL query:\ndefine sql:big-data-const 0\n#output-format:application/sparql-results+json\n\n##-7bcf47b429-##\nPREFIX wd: <http://www.wikidata.org/entity/> \nPREFIX wdt: <http://www.wikidata.org/prop/direct/> \nPREFIX sc: <http://schema.org/>\n\n\n\nSELECT ?pabookname ?imbookname ?pub WHERE{\n    ?subpa wdt:P50 wd:Q214642 ;\n        wdt:P577 ?p .\n    ?subim wdt:P50 wd:Q190379 ;\n        wdt:P577 ?d .\n            \n    BIND((YEAR(?p)) AS ?pub).\n    BIND((YEAR(?d)) AS ?da).\n    \n    FILTER(?pub = ?da).\n        \n# this returns the labels\n?subpa <http://schema.org/name> ?pabookname .\n?subim <http://schema.org/name> ?imbookname .\n}\n'


In [8]:
#Another try
queryString = """

SELECT ?pabookname ?imbookname ?pub WHERE{
    ?subpa wdt:P50 wd:Q214642 ;
        wdt:P577 ?p .
    
    BIND((YEAR(?p)) AS ?pub).
    
    FILTER(?pub = ?da)
    {
        SELECT ?subim ?da WHERE
        {
            
            ?subim wdt:P50 wd:Q190379 ;
                wdt:P577 ?d .
            
            BIND((YEAR(?d)) AS ?da).
        }
    }    
# this returns the labels
?subpa <http://schema.org/name> ?pabookname .
?subim <http://schema.org/name> ?imbookname .
}
"""

print("Results")
run_query(queryString)

Results
The operation failed EndPointInternalError: endpoint returned code 500 and response. 

Response:
b'Virtuoso 22007 Error DT001: Function year needs a datetime, date or time as argument 1, not an arg of type IRI_ID (243)\n\nSPARQL query:\ndefine sql:big-data-const 0\n#output-format:application/sparql-results+json\n\n##-7bcf47b429-##\nPREFIX wd: <http://www.wikidata.org/entity/> \nPREFIX wdt: <http://www.wikidata.org/prop/direct/> \nPREFIX sc: <http://schema.org/>\n\n\n\nSELECT ?pabookname ?imbookname ?pub WHERE{\n    ?subpa wdt:P50 wd:Q214642 ;\n        wdt:P577 ?p .\n    \n    BIND((YEAR(?p)) AS ?pub).\n    \n    FILTER(?pub = ?da)\n    {\n        SELECT ?subim ?da WHERE\n        {\n            \n            ?subim wdt:P50 wd:Q190379 ;\n                wdt:P577 ?d .\n            \n            BIND((YEAR(?d)) AS ?da).\n        }\n    }    \n# this returns the labels\n?subpa <http://schema.org/name> ?pabookname .\n?subim <http://schema.org/name> ?imbookname .\n}\n'


#### Comment
I don't know why, but FILTER(?pub = ?da) doesn't work, even though ?pub and ?da are integers and the BIND is correct. The error message reports that YEAR received an IRI instead of a date, which suggests that the failure happens when the year is computed, before the comparison itself. The same filter without the YEAR function works, but the results do not match the requirements of the task (see the query below for this implementation).

In [127]:
#Same year of publication and relative books looking at the dates without using the year function
queryString = """

SELECT DISTINCT ?pabookname ?imbookname ?p WHERE{
    
    ?subpa wdt:P50 wd:Q214642 ;
        wdt:P577 ?p .
            
    BIND((YEAR(?p)) AS ?pub).    
    
    ?subim wdt:P50 wd:Q190379 ;
        wdt:P577 ?d .
            
    BIND((YEAR(?d)) AS ?da).
        
    FILTER(?d = ?p).
    
# this returns the labels
?subpa <http://schema.org/name> ?pabookname .
?subim <http://schema.org/name> ?imbookname .
}
"""

print("Results")
run_query(queryString)

Results
[('pabookname', 'The New York Trilogy'), ('imbookname', 'The Child in Time'), ('p', '1987-01-01T00:00:00Z')]
[('pabookname', 'In the Country of Last Things'), ('imbookname', 'The Child in Time'), ('p', '1987-01-01T00:00:00Z')]
[('pabookname', 'The Music of Chance'), ('imbookname', 'The Innocent (McEwan novel)'), ('p', '1990-01-01T00:00:00Z')]
[('pabookname', "Auggie Wren's Christmas Story"), ('imbookname', 'The Innocent (McEwan novel)'), ('p', '1990-01-01T00:00:00Z')]
[('pabookname', 'The Music of Chance'), ('imbookname', 'The Innocent'), ('p', '1990-01-01T00:00:00Z')]
[('pabookname', "Auggie Wren's Christmas Story"), ('imbookname', 'The Innocent'), ('p', '1990-01-01T00:00:00Z')]
[('pabookname', 'City of Glass'), ('imbookname', 'The Daydreamer'), ('p', '1994-01-01T00:00:00Z')]
[('pabookname', 'I Thought My Father Was God'), ('imbookname', 'Atonement'), ('p', '2001-01-01T00:00:00Z')]
[('pabookname', 'The Brooklyn Follies'), ('imbookname', 'Saturday'), ('p', '2005-01-01T00:00:00Z

9

#### Comment
Not completely correct, because it only matches books with exactly the same publication date, but this is the solution that comes closest to the correct result.
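
As a cross-check outside SPARQL, the years returned by the two YEAR() queries above can be compared in Python. This is only a sketch under stated assumptions: the lists are copied by hand from those outputs, they cover every item linked through wdt:P50 (not only the ones confirmed as books), and the variable names are hypothetical.

```python
# Years copied from the outputs of the two YEAR() queries above.
pa_years = [1985, 1986, 1987, 1987, 1989, 1990, 1990, 1992, 1993, 1994, 1994,
            1998, 1999, 2001, 2002, 2002, 2003, 2004, 2005, 2006, 2008, 2008,
            2009, 2010, 2010, 2012, 2012, 2013, 2013, 2017]
im_years = [1975, 1978, 1978, 1981, 1987, 1990, 1990, 1992, 1994, 1997, 1998,
            2001, 2001, 2005, 2005, 2007, 2010, 2012, 2014, 2016, 2016, 2019,
            2019]

# Years in which both authors have at least one publication date.
shared_years = sorted(set(pa_years) & set(im_years))
print(shared_years)
```

The intersection contains years such as 1992, 1998, 2010 and 2012 that the exact-date comparison cannot guarantee to find, since it pairs books only when the full dates coincide.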

Let's now move on to the "longest period without publishing" part of the task.

In [14]:
#Let's print the publication dates to have an idea about what we should return to solve the task (for the author PA)
queryString = """

SELECT (YEAR(?pub) AS ?date) WHERE{
    ?subpa wdt:P50 wd:Q214642 ;
        wdt:P577 ?pub .
        
# this returns the labels
#?subpa <http://schema.org/name> ?pabookname .
#?subim <http://schema.org/name> ?imbookname .
}
ORDER BY ASC (?date)  
"""

print("Results")
run_query(queryString)

Results
[('date', '1985')]
[('date', '1986')]
[('date', '1987')]
[('date', '1987')]
[('date', '1989')]
[('date', '1990')]
[('date', '1990')]
[('date', '1992')]
[('date', '1993')]
[('date', '1994')]
[('date', '1994')]
[('date', '1998')]
[('date', '1999')]
[('date', '2001')]
[('date', '2002')]
[('date', '2002')]
[('date', '2003')]
[('date', '2004')]
[('date', '2005')]
[('date', '2006')]
[('date', '2008')]
[('date', '2008')]
[('date', '2009')]
[('date', '2010')]
[('date', '2010')]
[('date', '2012')]
[('date', '2012')]
[('date', '2013')]
[('date', '2013')]
[('date', '2017')]


30

In [29]:
#Let's print the interval between the release of a book and the next one (for the author PA)
queryString = """
SELECT (YEAR(?de) AS ?date) (YEAR(MIN(?dt)) AS ?min) WHERE{
                ?subpa wdt:P50 wd:Q214642 ;
                    wdt:P577 ?de .
                    
                    FILTER(?de<?dt).
                    {
                        SELECT ?dt WHERE{
                            ?subpa wdt:P50 wd:Q214642 ;
                                wdt:P577 ?dt .
                        }
                    }
            }
            GROUP BY ?de ?date
            ORDER BY ASC (?date)
"""

print("Results")
run_query(queryString)

Results
[('date', '1985'), ('min', '1986')]
[('date', '1986'), ('min', '1987')]
[('date', '1987'), ('min', '1989')]
[('date', '1989'), ('min', '1990')]
[('date', '1990'), ('min', '1992')]
[('date', '1992'), ('min', '1993')]
[('date', '1993'), ('min', '1994')]
[('date', '1994'), ('min', '1998')]
[('date', '1994'), ('min', '1994')]
[('date', '1998'), ('min', '1999')]
[('date', '1999'), ('min', '2001')]
[('date', '2001'), ('min', '2002')]
[('date', '2002'), ('min', '2003')]
[('date', '2003'), ('min', '2004')]
[('date', '2004'), ('min', '2005')]
[('date', '2005'), ('min', '2006')]
[('date', '2006'), ('min', '2008')]
[('date', '2008'), ('min', '2009')]
[('date', '2009'), ('min', '2010')]
[('date', '2010'), ('min', '2012')]
[('date', '2012'), ('min', '2013')]
[('date', '2012'), ('min', '2012')]
[('date', '2013'), ('min', '2013')]
[('date', '2013'), ('min', '2017')]


24

In [32]:
#Let's print the distance in years between consecutive book releases, from the longest gap to the shortest (for the author PA)
queryString = """
SELECT ((YEAR(MIN(?dt)) AS ?min)-(YEAR(?de) AS ?date) AS ?res) WHERE{
                ?subpa wdt:P50 wd:Q214642 ;
                    wdt:P577 ?de .
                    
                    FILTER(?de<?dt).
                    {
                        SELECT ?dt WHERE{
                            ?subpa wdt:P50 wd:Q214642 ;
                                wdt:P577 ?dt .
                        }
                    }
            }
            GROUP BY ?min ?de ?date
            ORDER BY DESC (?res)
"""

print("Results")
run_query(queryString)

Results
[('res', '4')]
[('res', '4')]
[('res', '2')]
[('res', '2')]
[('res', '2')]
[('res', '2')]
[('res', '2')]
[('res', '1')]
[('res', '1')]
[('res', '1')]
[('res', '1')]
[('res', '1')]
[('res', '1')]
[('res', '1')]
[('res', '1')]
[('res', '1')]
[('res', '1')]
[('res', '1')]
[('res', '1')]
[('res', '1')]
[('res', '1')]
[('res', '0')]
[('res', '0')]
[('res', '0')]


24

In [34]:
#Let's print the publication dates to have an idea about what we should return to solve the task (for the author IM)
queryString = """

SELECT (YEAR(?pub) AS ?date) WHERE{
    ?subpa wdt:P50 wd:Q190379 ;
        wdt:P577 ?pub .
        
# this returns the labels
#?subpa <http://schema.org/name> ?pabookname .
#?subim <http://schema.org/name> ?imbookname .
}
ORDER BY ASC (?date)  
"""

print("Results")
run_query(queryString)

Results
[('date', '1975')]
[('date', '1978')]
[('date', '1978')]
[('date', '1981')]
[('date', '1987')]
[('date', '1990')]
[('date', '1990')]
[('date', '1992')]
[('date', '1994')]
[('date', '1997')]
[('date', '1998')]
[('date', '2001')]
[('date', '2001')]
[('date', '2005')]
[('date', '2005')]
[('date', '2007')]
[('date', '2010')]
[('date', '2012')]
[('date', '2014')]
[('date', '2016')]
[('date', '2016')]
[('date', '2019')]
[('date', '2019')]


23

In [33]:
#Let's print the distance in years between consecutive book releases, from the longest gap to the shortest (for the author IM)
queryString = """
SELECT ((YEAR(MIN(?dt)) AS ?min)-(YEAR(?de) AS ?date) AS ?res) WHERE{
                ?subpa wdt:P50 wd:Q190379 ;
                    wdt:P577 ?de .
                    
                    FILTER(?de<?dt).
                    {
                        SELECT ?dt WHERE{
                            ?subpa wdt:P50 wd:Q190379 ;
                                wdt:P577 ?dt .
                        }
                    }
            }
            GROUP BY ?min ?de ?date
            ORDER BY DESC (?res)
"""

print("Results")
run_query(queryString)

Results
[('res', '6')]
[('res', '4')]
[('res', '3')]
[('res', '3')]
[('res', '3')]
[('res', '3')]
[('res', '3')]
[('res', '3')]
[('res', '3')]
[('res', '2')]
[('res', '2')]
[('res', '2')]
[('res', '2')]
[('res', '2')]
[('res', '2')]
[('res', '1')]


16

#### Comment
The longest period without publishing is 4 years for Paul Auster and 6 years for Ian McEwan.
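
As a cross-check, the same gaps can be recomputed in plain Python from the years printed by the two "publication dates" queries above. This is only a sketch: `publication_gaps` is a hypothetical helper, not part of the notebook setup, and it works at year granularity, so books released in the same year are collapsed into one entry (the SPARQL queries compare full dates, which is presumably why they also return gaps of 0).

```python
def publication_gaps(years):
    """Return the gaps (in years) between consecutive distinct publication years."""
    distinct = sorted(set(years))
    return [later - earlier for earlier, later in zip(distinct, distinct[1:])]

# Years copied from the two "publication dates" result lists above.
pa_years = [1985, 1986, 1987, 1987, 1989, 1990, 1990, 1992, 1993, 1994,
            1994, 1998, 1999, 2001, 2002, 2002, 2003, 2004, 2005, 2006,
            2008, 2008, 2009, 2010, 2010, 2012, 2012, 2013, 2013, 2017]
im_years = [1975, 1978, 1978, 1981, 1987, 1990, 1990, 1992, 1994, 1997,
            1998, 2001, 2001, 2005, 2005, 2007, 2010, 2012, 2014, 2016,
            2016, 2019, 2019]

print("PA longest gap:", max(publication_gaps(pa_years)))
print("IM longest gap:", max(publication_gaps(im_years)))
```

The maxima agree with the first rows returned by the two ordered SPARQL queries (4 and 6).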

## Task 3
3. Did the authors produce, act in, or direct a film? If so, did they write the screenplay?

From the first task we know that:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P161`   | cast member   | predicate |
| `wdt:P162`   | producer   | predicate |
| `wdt:P57`     | director   | predicate |
| `wdt:P58`    | screenwriter    | predicate |

In [37]:
#Let's see if PA produced, acted in or directed any films, and print them
queryString = """
SELECT ?pname ?sname ?p ?sub
WHERE { 
?sub ?p wd:Q214642  .

FILTER(?p = wdt:P161 || ?p = wdt:P57 || ?p = wdt:P162).

# this returns the labels
?sub <http://schema.org/name> ?sname .
?p <http://schema.org/name> ?pname .
} 
"""

print("Results")
run_query(queryString)

Results
[('pname', 'cast member'), ('sname', 'Act of God'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('sub', 'http://www.wikidata.org/entity/Q2915460')]
[('pname', 'producer'), ('sname', 'The Inner Life of Martin Frost'), ('p', 'http://www.wikidata.org/prop/direct/P162'), ('sub', 'http://www.wikidata.org/entity/Q3213673')]
[('pname', 'cast member'), ('sname', 'The Music of Chance'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('sub', 'http://www.wikidata.org/entity/Q2364684')]
[('pname', 'director'), ('sname', 'Blue in the Face'), ('p', 'http://www.wikidata.org/prop/direct/P57'), ('sub', 'http://www.wikidata.org/entity/Q2478977')]
[('pname', 'director'), ('sname', 'The Inner Life of Martin Frost'), ('p', 'http://www.wikidata.org/prop/direct/P57'), ('sub', 'http://www.wikidata.org/entity/Q3213673')]
[('pname', 'director'), ('sname', 'Smoke'), ('p', 'http://www.wikidata.org/prop/direct/P57'), ('sub', 'http://www.wikidata.org/entity/Q653447')]
[('pname', 'director'), ('sn

7

In [38]:
#Let's see if IM produced, acted in or directed any films, and print them
queryString = """
SELECT ?pname ?sname ?p ?sub
WHERE { 
?sub ?p wd:Q190379  .

FILTER(?p = wdt:P161 || ?p = wdt:P57 || ?p = wdt:P162).

# this returns the labels
?sub <http://schema.org/name> ?sname .
?p <http://schema.org/name> ?pname .
} 
"""

print("Results")
run_query(queryString)

Results
[('pname', 'producer'), ('sname', 'Atonement'), ('p', 'http://www.wikidata.org/prop/direct/P162'), ('sub', 'http://www.wikidata.org/entity/Q1626186')]
[('pname', 'cast member'), ('sname', 'The Unbelievers'), ('p', 'http://www.wikidata.org/prop/direct/P161'), ('sub', 'http://www.wikidata.org/entity/Q7348342')]


2

#### Comment
Paul Auster produced 1 film, acted in 2 and directed 4. Ian McEwan, on the other hand, produced one film and acted in one.
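
The per-role counts can also be tallied programmatically instead of by reading the printed rows. The sketch below assumes the rows have been reduced to dictionaries holding the `pname` and `sname` columns; `count_roles` is a hypothetical helper, and only the two complete rows printed for Ian McEwan are used as sample data.

```python
from collections import Counter

def count_roles(rows):
    """Count how many result rows carry each role label (the ?pname column)."""
    return Counter(row["pname"] for row in rows)

# The two rows printed for Ian McEwan above, reduced to the label columns.
im_rows = [
    {"pname": "producer", "sname": "Atonement"},
    {"pname": "cast member", "sname": "The Unbelievers"},
]

for role, count in count_roles(im_rows).items():
    print(role, count)
```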

In [54]:
#Let's see if PA is also the screenwriter of the films he produced, acted in or directed 
queryString = """
SELECT ?sname (GROUP_CONCAT(DISTINCT ?pname ; separator = ", ") AS ?role) ?sw
WHERE { 
?sub ?p wd:Q214642 ;
    ?s wd:Q214642 .

FILTER(?p = wdt:P161 || ?p = wdt:P57 || ?p = wdt:P162) 
FILTER(?s = wdt:P58).

# this returns the labels
?sub <http://schema.org/name> ?sname .
?s <http://schema.org/name> ?sw .
?p <http://schema.org/name> ?pname .
}
GROUP BY ?sname ?sw
"""

print("Results")
run_query(queryString)

Results
[('sname', 'Lulu on the Bridge'), ('role', 'director'), ('sw', 'screenwriter')]
[('sname', 'Blue in the Face'), ('role', 'director'), ('sw', 'screenwriter')]
[('sname', 'Act of God'), ('role', 'cast member'), ('sw', 'screenwriter')]
[('sname', 'The Music of Chance'), ('role', 'cast member'), ('sw', 'screenwriter')]
[('sname', 'Smoke'), ('role', 'director'), ('sw', 'screenwriter')]
[('sname', 'The Inner Life of Martin Frost'), ('role', 'director, producer'), ('sw', 'screenwriter')]


6

In [55]:
#Same thing as before for IM
queryString = """
SELECT ?sname (GROUP_CONCAT(DISTINCT ?pname ; separator = ", ") AS ?role) ?sw
WHERE { 
?sub ?p wd:Q190379 ;
    ?s wd:Q190379 .

FILTER(?p = wdt:P161 || ?p = wdt:P57 || ?p = wdt:P162) 
FILTER(?s = wdt:P58).

# this returns the labels
?sub <http://schema.org/name> ?sname .
?s <http://schema.org/name> ?sw .
?p <http://schema.org/name> ?pname .
}
GROUP BY ?sname ?sw
"""

print("Results")
run_query(queryString)

Results
[('sname', 'Atonement'), ('role', 'producer'), ('sw', 'screenwriter')]


1

#### Comment
Paul Auster was the screenwriter of every film in which he participated, whereas Ian McEwan was the screenwriter only of the film he produced (Atonement).
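
This comparison amounts to a set difference between the films in which an author has some role and the films for which the author is also the screenwriter. A minimal sketch for Ian McEwan, using the film names printed above (`films_without_screenplay` is a hypothetical helper introduced only for illustration):

```python
def films_without_screenplay(role_films, screenwriter_films):
    """Return the films the author took part in without being the screenwriter."""
    return set(role_films) - set(screenwriter_films)

# Film names taken from the result rows printed above for Ian McEwan.
im_role_films = {"Atonement", "The Unbelievers"}
im_screenwriter_films = {"Atonement"}

print(films_without_screenplay(im_role_films, im_screenwriter_films))
```

An empty result would mean the author wrote the screenplay of every film he took part in.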

## Task 4
4. How many films were derived from the books of these two authors? 

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P31`   | instance of   | predicate |

#### Comment
Looking at the query in Task 1, the outgoing edges from a book don't seem to show any relation with a film derived from it, so I search in the opposite direction: from the films to the books.

In [60]:
#Let's find the class ("instance of") of a film IM worked on, to discover the IRI of the film node
queryString = """
SELECT ?sname ?ioname ?sub ?io
WHERE { 
?sub ?p wd:Q190379 ;
    ?s wd:Q190379 ;
    wdt:P31 ?io .

FILTER(?p = wdt:P161 || ?p = wdt:P57 || ?p = wdt:P162) 
FILTER(?s = wdt:P58).

# this returns the labels
?sub <http://schema.org/name> ?sname .
?io <http://schema.org/name> ?ioname .
#?p <http://schema.org/name> ?pname .
}
"""

print("Results")
run_query(queryString)

Results
[('sname', 'Atonement'), ('ioname', 'film'), ('sub', 'http://www.wikidata.org/entity/Q1626186'), ('io', 'http://www.wikidata.org/entity/Q11424')]


1

We have discovered the IRI for the node film:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wd:Q11424`   | film   | node |

In [64]:
#Check if there are relations between the books of PA and some films
queryString = """
SELECT ?fname ?pname ?sname ?film ?p ?sub
WHERE { 
?sub wdt:P50 wd:Q214642 .
?film ?p ?sub;
    wdt:P31 wd:Q11424 .

# this returns the labels
?sub <http://schema.org/name> ?sname .
?film <http://schema.org/name> ?fname .
?p <http://schema.org/name> ?pname .
}
"""

print("Results")
run_query(queryString)

Results
Empty


0

In [67]:
#Check if there are relations between the books of IM and some films
queryString = """
SELECT ?fname ?pname ?sname ?film ?p ?sub
WHERE { 
?sub wdt:P50 wd:Q190379 .
?film ?p ?sub;
    wdt:P31 wd:Q11424 .

# this returns the labels
?sub <http://schema.org/name> ?sname .
?film <http://schema.org/name> ?fname .
?p <http://schema.org/name> ?pname .
}
"""

print("Results")
run_query(queryString)

Results
[('fname', 'On Chesil Beach'), ('pname', 'based on'), ('sname', 'On Chesil Beach'), ('film', 'http://www.wikidata.org/entity/Q27959336'), ('p', 'http://www.wikidata.org/prop/direct/P144'), ('sub', 'http://www.wikidata.org/entity/Q451491')]
[('fname', 'Atonement'), ('pname', 'based on'), ('sname', 'Atonement'), ('film', 'http://www.wikidata.org/entity/Q1626186'), ('p', 'http://www.wikidata.org/prop/direct/P144'), ('sub', 'http://www.wikidata.org/entity/Q306619')]
[('fname', 'The Cement Garden'), ('pname', 'based on'), ('sname', 'The Cement Garden'), ('film', 'http://www.wikidata.org/entity/Q1916423'), ('p', 'http://www.wikidata.org/prop/direct/P144'), ('sub', 'http://www.wikidata.org/entity/Q1198186')]
[('fname', 'The Children Act'), ('pname', 'based on'), ('sname', 'The Children Act'), ('film', 'http://www.wikidata.org/entity/Q28923729'), ('p', 'http://www.wikidata.org/prop/direct/P144'), ('sub', 'http://www.wikidata.org/entity/Q18162235')]


4

#### Comment
We discovered that no films were derived from Paul Auster's books, while 4 films were derived from Ian McEwan's books.
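
Since the only connecting property returned above is "based on" (`wdt:P144`), the count could also be obtained directly with an aggregate query. The sketch below only builds the query string; `based_on_count_query` is a hypothetical helper, and in the notebook the resulting string would be passed to `run_query` from the setup cell.

```python
def based_on_count_query(author_qid):
    """Build a SPARQL query counting distinct films based on books by the author."""
    return f"""
SELECT (COUNT(DISTINCT ?film) AS ?films)
WHERE {{
    ?book wdt:P50 wd:{author_qid} .   # book written by the author
    ?film wdt:P144 ?book ;            # film "based on" that book
          wdt:P31 wd:Q11424 .         # the film is an instance of "film"
}}
"""

queryString = based_on_count_query("Q190379")  # Ian McEwan
print(queryString)
# In the notebook: run_query(queryString)
```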

## Task 5
5. Which author won more literature-related awards? Have they ever been nominated for a Nobel award?

In the first task we discovered the properties: 

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1411`   | nominated for   | predicate |
| `wdt:P166`   | award received   | predicate |

So we can search for information about the type of each award, to check whether it is a literature-related award and whether it is a Nobel prize.

In [73]:
#Let's print the awards received and the nominations for PA
queryString = """
SELECT ?pname ?oname ?p ?obj
WHERE { 
wd:Q214642 ?p ?obj  .

FILTER(?p = wdt:P166 || ?p = wdt:P1411).

# this returns the labels
?p <http://schema.org/name> ?pname .
OPTIONAL {?obj <http://schema.org/name> ?oname }.
} 
"""

print("Results")
run_query(queryString)

Results
[('pname', 'award received'), ('oname', 'Commandeur des Arts et des Lettres\u200e'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q13452531')]
[('pname', 'award received'), ('oname', 'Princess of Asturias Literary Prize'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q3332454')]
[('pname', 'award received'), ('oname', 'AAAS Fellow'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q52382875')]
[('pname', 'award received'), ('oname', 'PEN/Faulkner Award for Fiction'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q1188661')]
[('pname', 'award received'), ('oname', 'Prix Médicis for foreign literature'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q3405076')]


5

In [71]:
#Let's print the awards received and the nominations for IM
queryString = """
SELECT ?pname ?oname ?p ?obj
WHERE { 
wd:Q190379 ?p ?obj  .

FILTER(?p = wdt:P166 || ?p = wdt:P1411).

# this returns the labels
?p <http://schema.org/name> ?pname .
OPTIONAL {?obj <http://schema.org/name> ?oname} .
} 
"""

print("Results")
run_query(queryString)

Results
[('pname', 'award received'), ('oname', 'honorary doctor of the University of Sussex'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q56494487')]
[('pname', 'award received'), ('oname', 'Common Wealth Award of Distinguished Service'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q5153503')]
[('pname', 'award received'), ('oname', 'Commander of the Order of the British Empire'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q12201477')]
[('pname', 'award received'), ('oname', 'Jerusalem Prize'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q277787')]
[('pname', 'award received'), ('oname', 'Fellow of the Royal Society of Literature'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q39477935')]
[('pname', 'award received'), ('oname', 'AAAS Fellow'), ('p', '

19

In [78]:
#Let's look for the outgoing edges from the nodes connected with the author PA through the relations "award received" and "nominated for" (we are looking for the properties of the awards)
queryString = """
SELECT ?rname ?ename ?rel ?end
WHERE { 
wd:Q214642 ?p ?obj  .
?obj ?rel ?end .

FILTER(?p = wdt:P166 || ?p = wdt:P1411).

# this returns the labels
?p <http://schema.org/name> ?pname .
?rel <http://schema.org/name> ?rname .
OPTIONAL {?obj <http://schema.org/name> ?oname .
?end <http://schema.org/name> ?ename}.
} 
"""

print("Results")
run_query(queryString)

Results
[('rname', 'next lower rank'), ('ename', 'Officier des Arts et des Lettres\u200e'), ('rel', 'http://www.wikidata.org/prop/direct/P3729'), ('end', 'http://www.wikidata.org/entity/Q13452524')]
[('rname', 'founded by'), ('ename', 'Jean-Pierre Giraudoux'), ('rel', 'http://www.wikidata.org/prop/direct/P112'), ('end', 'http://www.wikidata.org/entity/Q3169349')]
[('rname', 'founded by'), ('ename', 'Gala Barbisan'), ('rel', 'http://www.wikidata.org/prop/direct/P112'), ('end', 'http://www.wikidata.org/entity/Q15139267')]
[('rname', 'country'), ('ename', 'United States of America'), ('rel', 'http://www.wikidata.org/prop/direct/P17'), ('end', 'http://www.wikidata.org/entity/Q30')]
[('rname', 'country'), ('ename', 'France'), ('rel', 'http://www.wikidata.org/prop/direct/P17'), ('end', 'http://www.wikidata.org/entity/Q142')]
[('rname', 'country'), ('ename', 'France'), ('rel', 'http://www.wikidata.org/prop/direct/P17'), ('end', 'http://www.wikidata.org/entity/Q142')]
[('rname', 'country'), ('

44

In [79]:
#Let's look for the outgoing edges from the nodes connected with the author IM through the relations "award received" and "nominated for" (we are looking for the properties of the awards)
queryString = """
SELECT DISTINCT ?rname ?ename ?rel ?end
WHERE { 
wd:Q190379 ?p ?obj  .
?obj ?rel ?end .

FILTER(?p = wdt:P166 || ?p = wdt:P1411).

# this returns the labels
?p <http://schema.org/name> ?pname .
?rel <http://schema.org/name> ?rname .
OPTIONAL {?obj <http://schema.org/name> ?oname .
?end <http://schema.org/name> ?ename}.
} 
"""

print("Results")
run_query(queryString)

Results
[('rname', 'next lower rank'), ('ename', 'Officer of the Order of the British Empire'), ('rel', 'http://www.wikidata.org/prop/direct/P3729'), ('end', 'http://www.wikidata.org/entity/Q10762848')]
[('rname', 'winner'), ('ename', 'Yann Martel'), ('rel', 'http://www.wikidata.org/prop/direct/P1346'), ('end', 'http://www.wikidata.org/entity/Q13914')]
[('rname', 'winner'), ('ename', 'Ruth Prawer Jhabvala'), ('rel', 'http://www.wikidata.org/prop/direct/P1346'), ('end', 'http://www.wikidata.org/entity/Q235759')]
[('rname', 'winner'), ('ename', 'Anita Brookner'), ('rel', 'http://www.wikidata.org/prop/direct/P1346'), ('end', 'http://www.wikidata.org/entity/Q237687')]
[('rname', 'winner'), ('ename', 'Kingsley Amis'), ('rel', 'http://www.wikidata.org/prop/direct/P1346'), ('end', 'http://www.wikidata.org/entity/Q220078')]
[('rname', 'winner'), ('ename', 'Michael Ondaatje'), ('rel', 'http://www.wikidata.org/prop/direct/P1346'), ('end', 'http://www.wikidata.org/entity/Q313593')]
[('rname', 'wi

203

We have discovered that the awards are related to the "literary award" node through the properties "subclass of" and "instance of":

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P31`   | instance of   | predicate |
| `wdt:P279`   | subclass of   | predicate |
| `wd:Q378427`   | literary award   | node |

Now we can count the literary awards won by each author.

In [85]:
#Let's print the literary awards won by PA
queryString = """
SELECT ?pname ?oname ?p ?obj
WHERE { 
wd:Q214642 ?p ?obj .
?obj ?r wd:Q378427 .

FILTER(?p = wdt:P166).
FILTER(?r = wdt:P31 || ?r = wdt:P279).

# this returns the labels
?p <http://schema.org/name> ?pname .
?obj <http://schema.org/name> ?oname .
} 
"""

print("Results")
run_query(queryString)

Results
[('pname', 'award received'), ('oname', 'Prix Médicis for foreign literature'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q3405076')]
[('pname', 'award received'), ('oname', 'Princess of Asturias Literary Prize'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q3332454')]
[('pname', 'award received'), ('oname', 'PEN/Faulkner Award for Fiction'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q1188661')]


3

In [84]:
#Let's print the literary awards won by IM
queryString = """
SELECT ?pname ?oname ?p ?obj
WHERE { 
wd:Q190379 ?p ?obj .
?obj ?r wd:Q378427 .

FILTER(?p = wdt:P166).
FILTER(?r = wdt:P31 || ?r = wdt:P279).

# this returns the labels
?p <http://schema.org/name> ?pname .
?obj <http://schema.org/name> ?oname .
} 
"""

print("Results")
run_query(queryString)

Results
[('pname', 'award received'), ('oname', 'Fellow of the Royal Society of Literature'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q39477935')]
[('pname', 'award received'), ('oname', 'Booker Prize'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q160082')]
[('pname', 'award received'), ('oname', 'James Tait Black Memorial Prize'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q392586')]
[('pname', 'award received'), ('oname', 'Helmerich Award'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q5709276')]
[('pname', 'award received'), ('oname', 'Prix Femina étranger'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikidata.org/entity/Q3404580')]
[('pname', 'award received'), ('oname', 'Bodley Medal'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('obj', 'http://www.wikida

7

#### Comment
Through these queries we have discovered that Ian McEwan has won more literary awards than Paul Auster (7 versus 3) and that neither of the two has been nominated for a Nobel award (this second result can be seen in the first two queries of this task).
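
The Nobel check could also be made explicit with an ASK query rather than by reading the result rows. The sketch below only builds the query string and rests on an assumption not verified in this notebook: that a Nobel award node has the word "nobel" in its `<http://schema.org/name>` label. `nobel_nomination_ask` is a hypothetical helper; the resulting string would be passed to `run_ask_query` from the setup cell.

```python
def nobel_nomination_ask(author_qid):
    """Build an ASK query: was the author nominated for an award whose name contains 'nobel'?"""
    return f"""
ASK {{
    wd:{author_qid} wdt:P1411 ?award .            # "nominated for" edge
    ?award <http://schema.org/name> ?name .       # human-readable award name
    FILTER(CONTAINS(LCASE(?name), "nobel"))       # assumption: Nobel awards are named as such
}}
"""

for qid in ("Q214642", "Q190379"):  # Paul Auster, Ian McEwan
    print(nobel_nomination_ask(qid))
    # In the notebook: run_ask_query(nobel_nomination_ask(qid))
```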