# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
    is the BGP returning a human-readable name of a property or a class in Wikidata.
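
As a minimal sketch, this pattern can be generated for any variable with a small helper. The function `with_name` and its default naming rule (appending `name` to the variable) are assumptions made for illustration; they are not part of the notebook setup.

```python
def with_name(var, name_var=None):
    """Return the BGP that binds the schema.org name of `var` (hypothetical helper)."""
    # By default ?p gets ?pname, ?o gets ?oname, and so on.
    name_var = name_var or var + "name"
    return f"{var} <http://schema.org/name> {name_var} ."

print(with_name("?p"))
print(with_name("?o", "?label"))
```

The returned string can be concatenated into the `WHERE` clause of any query string passed to `run_query`.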
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-34ebe815f6-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
       results = sparql.query()
       json_results = results.convert()
       if len(json_results['results']['bindings'])==0:
          print("Empty")
          return 0
    
       for bindings in json_results['results']['bindings']:
          print( [ (var, value['value'])  for var, value in bindings.items() ] )

       return len(json_results['results']['bindings'])

    except Exception as e :
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
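
The way `run_query` prints each binding can be tried without contacting the endpoint. The sketch below applies the same list comprehension to a hand-written dictionary in the SPARQL JSON results layout; the helper name `bindings_to_rows` and the sample dictionary are assumptions introduced only for this illustration.

```python
def bindings_to_rows(json_results):
    """Flatten a SPARQL JSON result into lists of (variable, value) pairs."""
    return [
        [(var, value["value"]) for var, value in binding.items()]
        for binding in json_results["results"]["bindings"]
    ]

# Hand-written sample mimicking the JSON layout (illustrative data only).
sample = {
    "head": {"vars": ["p", "pname"]},
    "results": {"bindings": [
        {"p": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P106"},
         "pname": {"type": "literal", "value": "occupation"}},
    ]},
}

rows = bindings_to_rows(sample)
for row in rows:
    print(row)
```

Returning the rows instead of only printing them would make the results reusable in later cells, at the cost of changing the setup function that was given.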

# Book Workflow Series ("Author comparison explorative search") 

Consider the following exploratory scenario:


>  Investigate Italian and French book authors in terms of awards, books published and copyright types



## Useful URIs for the current workflow
The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P279`    | subclass      | predicate |
| `wdt:P106`    | occupation    | predicate | 
| `wdt:P17`     | country       | predicate | 
| `wdt:P27`     | citizenship   | predicate | 
| `wd:Q36180`   | writer        | node |
| `wd:Q38`      | Italy         | node |
| `wd:Q172579`  | Kingdom of Italy        | node |
| `wd:Q142`     | France        | node |
| `wd:Q37922`   | Nobel Prize literature        | node |
| `wd:Q213678`  | Vatican Library        | node |


Also consider that

```
?p wdt:P27 wd:Q142
```

is the BGP to retrieve all **French citizens**


The workload should


1. Identify the BGP for obtaining the Italian and French writers who published a book in the last 50 years

2. Compare the number of books written by Italian and French writers

3. Count how many books written by Italian authors are now released with a "public domain" copyright form

4. How many Nobel Prizes in Literature were won by authors from Italy and from the Kingdom of Italy?

5. Are there books by Nobel Prize in Literature winners that are not present in the Vatican Library? (If so, which author has the most books not in the Vatican Library?)

In [1]:
# start your workflow here

In [2]:
queryString = """
SELECT COUNT(?p)
WHERE { 
?p wdt:P27 wd:Q142 .
} 
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '273456')]


1

## Explorative Queries
I run some queries to understand the data structure.

First, I look for properties and objects linked to the node writer (wd:Q36180).

In [3]:
queryString = """
SELECT DISTINCT ?p ?pname ?o ?oname
WHERE { 
wd:Q36180 ?p ?o.
?p <http://schema.org/name> ?pname.
?o <http://schema.org/name> ?oname.
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P425'), ('pname', 'field of this occupation'), ('o', 'http://www.wikidata.org/entity/Q37260'), ('oname', 'writing')]
[('p', 'http://www.wikidata.org/prop/direct/P425'), ('pname', 'field of this occupation'), ('o', 'http://www.wikidata.org/entity/Q2250012'), ('oname', 'writing')]
[('p', 'http://www.wikidata.org/prop/direct/P910'), ('pname', "topic's main category"), ('o', 'http://www.wikidata.org/entity/Q5849863'), ('oname', 'Category:Writers')]
[('p', 'http://www.wikidata.org/prop/direct/P1687'), ('pname', 'Wikidata property'), ('o', 'http://www.wikidata.org/entity/P50'), ('oname', 'author')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q12737077'), ('oname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q28640'), ('oname', 'profession')]
[('p', 'http://www.wikidata.org/prop/direct/P

14

It is actually more reasonable to look in the other direction, so I retrieve all properties that link any node to the node writer (wd:Q36180).

In [4]:
queryString = """
SELECT DISTINCT ?p ?pname 
WHERE { 
?s ?p wd:Q36180.
?p <http://schema.org/name> ?pname.

} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1423'), ('pname', 'template has topic')]
[('p', 'http://www.wikidata.org/prop/direct/P101'), ('pname', 'field of work')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P123'), ('pname', 'publisher')]
[('p', 'http://www.wikidata.org/prop/direct/P1269'), ('pname', 'facet of')]
[('p', 'http://www.wikidata.org/prop/direct/P135'), ('pname', 'movement')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('pname', 'genre')]
[('p', 'http://www.wikidata.org/prop/direct/P1412'), ('pname', 'languages spoken, written or signed')]
[('p', 'http://www.wikidata.org/prop/direct/P1535'), ('pname', 'used by')]
[('p', 'http://www.wikidata.org/prop/direct/P1552'), ('pname', 'has quality')]
[('p', 'http://www.wikidata.org/prop/direct/P180'), ('pname', 'depicts')]
[('p', 'http://www.wikidata.org/prop/direct/P1889'), ('pname', 'different from')]
[('p', 'http://www.wikidata.org/prop/

33

I found a property that could be useful later: wdt:P50, named author.

## Italian and French writers active in the last 50 years
This section contains the queries used to answer the first question: _Identify the BGP for obtaining the Italian and French writers who published a book in the last 50 years_ .

### Retrieving writers
First, I look for the BGP to retrieve writers. I expect writers to be connected either through "occupation (wdt:P106) writer (wd:Q36180)" or through "instance of (wdt:P31) writer (wd:Q36180)". I check these assumptions by running two queries that select some of the nodes that are subjects of the above-mentioned patterns.

In [5]:
queryString = """
SELECT ?s ?sname 
WHERE { 
?s wdt:P106 wd:Q36180.
?s <http://schema.org/name> ?sname.

}
LIMIT 15
"""

print("Results")
run_query(queryString)

Results
[('s', 'http://www.wikidata.org/entity/Q1342082'), ('sname', 'Youssef Ziedan')]
[('s', 'http://www.wikidata.org/entity/Q1373516'), ('sname', 'Eugénio de Castro')]
[('s', 'http://www.wikidata.org/entity/Q240377'), ('sname', 'Amanda Palmer')]
[('s', 'http://www.wikidata.org/entity/Q3191192'), ('sname', 'Júlia Lopes de Almeida')]
[('s', 'http://www.wikidata.org/entity/Q589406'), ('sname', 'Empress Xiaoquancheng')]
[('s', 'http://www.wikidata.org/entity/Q610608'), ('sname', 'António Aleixo')]
[('s', 'http://www.wikidata.org/entity/Q710504'), ('sname', 'Bohuslav Balbín')]
[('s', 'http://www.wikidata.org/entity/Q8183671'), ('sname', 'Aarão de Lacerda')]
[('s', 'http://www.wikidata.org/entity/Q8188119'), ('sname', 'Adelina Lopes Vieira')]
[('s', 'http://www.wikidata.org/entity/Q8196972'), ('sname', 'Amado Gómez Ugarte')]
[('s', 'http://www.wikidata.org/entity/Q8198136'), ('sname', 'Ana Vasco')]
[('s', 'http://www.wikidata.org/entity/Q8199338'), ('sname', 'Andrés García de Céspedes')]


15

The path ?s wdt:P106 (occupation) wd:Q36180 (writer) correctly retrieves writers. Let us check the second path:

In [6]:
queryString = """
SELECT ?s ?sname 
WHERE { 
?s wdt:P31 wd:Q36180.
?s <http://schema.org/name> ?sname.

}
LIMIT 15
"""

print("Results")
run_query(queryString)

Results
[('s', 'http://www.wikidata.org/entity/Q108669177'), ('sname', 'Abigail Jones')]


1

This query retrieved only one result, Abigail Jones, who is a fictional character; therefore the second path does not retrieve writers.

### Retrieving interesting properties of writers

#### Information about writers' nationality
Now I look for some properties that have a writer as subject.

In [7]:
queryString = """
SELECT DISTINCT ?p ?pname 
WHERE { 
?w wdt:P106 wd:Q36180;
    ?p ?o.
?p <http://schema.org/name> ?pname.

}
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P9271'), ('pname', 'Kickstarter username')]
[('p', 'http://www.wikidata.org/prop/direct/P3269'), ('pname', 'Fotografen.nl ID')]
[('p', 'http://www.wikidata.org/prop/direct/P8577'), ('pname', 'ICP artist ID')]
[('p', 'http://www.wikidata.org/prop/direct/P6891'), ('pname', 'National Film Board of Canada director identifier')]
[('p', 'http://www.wikidata.org/prop/direct/P4015'), ('pname', 'Vimeo identifier')]
[('p', 'http://www.wikidata.org/prop/direct/P5498'), ('pname', 'Base de datos de premiados person ID')]
[('p', 'http://www.wikidata.org/prop/direct/P7357'), ('pname', 'Diccionari de la Literatura Catalana ID')]
[('p', 'http://www.wikidata.org/prop/direct/P9256'), ('pname', 'Diccionari de la traducció catalana ID')]
[('p', 'http://www.wikidata.org/prop/direct/P8749'), ('pname', 'Dictionary of Portuguese Historians ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2985'), ('pname', 'DBSE ID')]
[('p', 'http://www.wikidata.org/prop/dir

50

Since I did not retrieve many interesting properties, I count them to see whether this result is due to the 'LIMIT 50' or to writers' properties not being very informative.

In [8]:
queryString = """
SELECT COUNT(DISTINCT ?p) AS ?writerProperties 
WHERE { 
?w wdt:P106 wd:Q36180;
    ?p ?o.
}
"""

print("Results")
run_query(queryString)

Results
[('writerProperties', '2923')]


1

There are 2923 distinct properties that have a writer as subject, so next time I will need to be more specific if I want to retrieve useful properties.

Before investigating this point further, just out of curiosity I check the number of distinct properties that have a writer as object.

In [9]:
queryString = """
SELECT COUNT(DISTINCT ?p) AS ?props
WHERE { 
?w wdt:P106 wd:Q36180.
?n ?p ?w.
}
"""

print("Results")
run_query(queryString)

Results
[('props', '325')]


1

This is a more manageable number, so I take a look at them. I noticed that the more common properties (for example country, citizenship, occupation) have smaller numeric identifiers than more specific properties, so I order the properties by their URI.

In [10]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE { 
?w wdt:P106 wd:Q36180.
?n ?p ?w.
?p <http://schema.org/name> ?pname.
}
ORDER BY ?p
LIMIT 100
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P101'), ('pname', 'field of work')]
[('p', 'http://www.wikidata.org/prop/direct/P1018'), ('pname', 'language regulatory body')]
[('p', 'http://www.wikidata.org/prop/direct/P1026'), ('pname', 'academic thesis')]
[('p', 'http://www.wikidata.org/prop/direct/P1027'), ('pname', 'conferred by')]
[('p', 'http://www.wikidata.org/prop/direct/P1028'), ('pname', 'donated by')]
[('p', 'http://www.wikidata.org/prop/direct/P1029'), ('pname', 'crew member(s)')]
[('p', 'http://www.wikidata.org/prop/direct/P1037'), ('pname', 'director / manager')]
[('p', 'http://www.wikidata.org/prop/direct/P1038'), ('pname', 'relative')]
[('p', 'http://www.wikidata.org/prop/direct/P1039'), ('pname', 'kinship to subject')]
[('p', 'http://www.wikidata.org/prop/direct/P1040'), ('pname', 'film editor')]
[('p', 'http://www.wikidata.org/prop/direct/P1049'), ('pname', 'worshipped by')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation')]
[('p', 'http

100
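
The output above shows that `ORDER BY ?p` compares the IRIs as strings, so `P1018` comes before `P106`. A minimal sketch of a numeric ordering done on the Python side follows; the helper `property_number` is an assumption for illustration, and the IRIs are copied from the results above.

```python
import re

def property_number(iri):
    """Extract the numeric part of a Wikidata property IRI, e.g. .../P106 -> 106."""
    match = re.search(r"/P(\d+)$", iri)
    if match is None:
        raise ValueError(f"not a property IRI: {iri}")
    return int(match.group(1))

iris = [
    "http://www.wikidata.org/prop/direct/P101",
    "http://www.wikidata.org/prop/direct/P1018",
    "http://www.wikidata.org/prop/direct/P1026",
    "http://www.wikidata.org/prop/direct/P106",
]

# Sort by the integer identifier instead of the IRI string.
ordered = sorted(iris, key=property_number)
print([property_number(iri) for iri in ordered])
```

With this ordering the low-numbered, more general properties are listed first, which is the effect the string ordering only approximates.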

Properties that have a writer as subject are more likely to carry useful information, and the last two queries confirm this, so I go back to the previous line of investigation: the properties that writers are subject of. First I look for the class of country nodes, starting from the node "Italy" (wd:Q38).

In [11]:
queryString = """
SELECT ?o ?oname
WHERE { 
wd:Q38 wdt:P31 ?o.
?o <http://schema.org/name> ?oname.

}
"""

print("Results")
run_query(queryString)

Results
[('o', 'http://www.wikidata.org/entity/Q3624078'), ('oname', 'sovereign state')]
[('o', 'http://www.wikidata.org/entity/Q6256'), ('oname', 'country')]


2

Now I check that the nodes that are instances of country (wd:Q6256) are actually countries.

In [12]:
queryString = """
SELECT ?c ?cname
WHERE { 
?c wdt:P31 wd:Q6256.
?c <http://schema.org/name> ?cname.
}
LIMIT 15
"""

print("Results")
run_query(queryString)

Results
[('c', 'http://www.wikidata.org/entity/Q16644'), ('cname', 'Northern Mariana Islands')]
[('c', 'http://www.wikidata.org/entity/Q207991'), ('cname', 'Pahlavi Dynasty')]
[('c', 'http://www.wikidata.org/entity/Q756617'), ('cname', 'Danish Realm')]
[('c', 'http://www.wikidata.org/entity/Q786'), ('cname', 'Dominican Republic')]
[('c', 'http://www.wikidata.org/entity/Q907112'), ('cname', 'Transnistria')]
[('c', 'http://www.wikidata.org/entity/Q244165'), ('cname', 'Republic of Artsakh')]
[('c', 'http://www.wikidata.org/entity/Q1005'), ('cname', 'The Gambia')]
[('c', 'http://www.wikidata.org/entity/Q1006'), ('cname', 'Guinea')]
[('c', 'http://www.wikidata.org/entity/Q1007'), ('cname', 'Guinea-Bissau')]
[('c', 'http://www.wikidata.org/entity/Q1008'), ('cname', 'Ivory Coast')]
[('c', 'http://www.wikidata.org/entity/Q1009'), ('cname', 'Cameroon')]
[('c', 'http://www.wikidata.org/entity/Q1011'), ('cname', 'Cape Verde')]
[('c', 'http://www.wikidata.org/entity/Q1013'), ('cname', 'Lesotho')]


15

In this way I can retrieve the writers' properties whose objects are instances of country, and see how citizenship information is stored in the data.

In [13]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE { 
?w wdt:P106 wd:Q36180;
    ?p ?c.
?c wdt:P31 wd:Q6256.
?p <http://schema.org/name> ?pname.

}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P101'), ('pname', 'field of work')]
[('p', 'http://www.wikidata.org/prop/direct/P119'), ('pname', 'place of burial')]
[('p', 'http://www.wikidata.org/prop/direct/P1532'), ('pname', 'country for sport')]
[('p', 'http://www.wikidata.org/prop/direct/P17'), ('pname', 'country')]
[('p', 'http://www.wikidata.org/prop/direct/P19'), ('pname', 'place of birth')]
[('p', 'http://www.wikidata.org/prop/direct/P20'), ('pname', 'place of death')]
[('p', 'http://www.wikidata.org/prop/direct/P2632'), ('pname', 'place of detention')]
[('p', 'http://www.wikidata.org/prop/direct/P27'), ('pname', 'country of citizenship')]
[('p', 'http://www.wikidata.org/prop/direct/P276'), ('pname', 'location')]
[('p', 'http://www.wikidata.org/prop/direct/P495'), ('pname', 'country of origin')]
[('p', 'http://www.wikidata.org/prop/direct/P53'), ('pname', 'family')]
[('p', 'http://www.wikidata.org/prop/direct/P551'), ('pname', 'residence')]
[('p', 'http://www.wikidata.org

27

Some properties that could help me retrieve writers' citizenship are:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P17`   | country   | predicate |
| `wdt:P19`   | place of birth   | predicate |
| `wdt:P27`    | country of citizenship      | predicate |
| `wdt:P495`     | country of origin   | predicate |

Therefore, to retrieve Italian and French writers we can use these properties linked to Italy (wd:Q38), Kingdom of Italy (wd:Q172579) and France (wd:Q142). We need two nodes for Italy because writers who published at least one book in the last 50 years could still be linked to the Kingdom of Italy.

Before moving on, I check whether there are other properties asserting writers' nationality without using a country node, for example through the words 'italian' or 'french'.

In [14]:
queryString = """
SELECT ?p ?pname ?o ?oname
WHERE { 
?w wdt:P106 wd:Q36180;
    wdt:P27 wdt:Q142;
    ?p ?o.
    
?p <http://schema.org/name> ?pname.
?o <http://schema.org/name> ?oname.

FILTER(regex(?pname,".*citizenship.*") || regex(?pname, ".*nationality.*")).

}
"""

print("Results")
run_query(queryString)

Results
Empty


0

No other property named citizenship or nationality is connected to French writers. Therefore I can retrieve all Italian and French writers using the properties I already discovered without losing information. First, I count the French writers to see if the reasoning is correct.

In [15]:
queryString = """
SELECT COUNT(DISTINCT ?w) AS ?frenchWriters
WHERE { 
    ?w wdt:P106 wd:Q36180;
        wdt:P27 wd:Q142.

}
"""

print("Results")
run_query(queryString)

Results
[('frenchWriters', '14400')]


1

I add the other properties previously retrieved to see if they actually retrieve more results or if their impact is negligible.

In [16]:
queryString = """
SELECT COUNT(DISTINCT ?w) AS ?frenchWriters
WHERE { 
    ?w wdt:P106 wd:Q36180;
        wdt:P17|wdt:P27|wdt:P19|wdt:P459 wd:Q142.

}
"""

print("Results")
run_query(queryString)

Results
[('frenchWriters', '14421')]


1

This retrieves only 21 more authors out of more than 14 thousand, so it looks like I can simply use wdt:P27. I check the same for Italy before deciding.

In [17]:
queryString = """
SELECT COUNT(DISTINCT ?w) AS ?italianWriters
WHERE { 
{
    ?w wdt:P106 wd:Q36180;
        wdt:P27 wd:Q38.
}
UNION
{
    ?w wdt:P106 wd:Q36180;
        wdt:P27 wd:Q172579.
}
}
"""

print("Results")
run_query(queryString)

Results
[('italianWriters', '7442')]


1

In [18]:
queryString = """
SELECT COUNT(DISTINCT ?w) AS ?italianWriters
WHERE { 
{
    ?w wdt:P106 wd:Q36180;
        wdt:P17|wdt:P27|wdt:P19|wdt:P459 wd:Q38.
}
UNION
{
    ?w wdt:P106 wd:Q36180;
        wdt:P17|wdt:P27|wdt:P19|wdt:P459 wd:Q172579.
}
}
"""

print("Results")
run_query(queryString)

Results
[('italianWriters', '7454')]


1

Again, I retrieved only 12 more writers on top of the more than 7 thousand I had already found, so I decide to simply use the wdt:P27 property.
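
The size of this effect can be checked with a short computation on the four counts reported above. This is only a sketch; `relative_gain` is a hypothetical helper name.

```python
def relative_gain(base, extended):
    """Percentage of extra results obtained by the extended property path."""
    return 100 * (extended - base) / base

# Counts copied from the query results above.
french = relative_gain(14400, 14421)
italian = relative_gain(7442, 7454)

print(f"French writers:  +{french:.2f}%")
print(f"Italian writers: +{italian:.2f}%")
```

In both cases the additional properties add well under 1% of the writers, which supports restricting the pattern to wdt:P27.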

#### BGP for all Italian and French writers
Now I can retrieve all French and Italian writers.

In [19]:
queryString = """
SELECT COUNT(DISTINCT ?w) As ?writers
WHERE { 
{
    ?w wdt:P106 wd:Q36180;
        wdt:P27 wd:Q142.
}
UNION
{
    ?w wdt:P106 wd:Q36180;
        wdt:P27 wd:Q38.
}
UNION
{
    ?w wdt:P106 wd:Q36180;
        wdt:P27 wd:Q172579.
}
}
"""

print("Results")
run_query(queryString)

Results
[('writers', '21821')]


1

Equivalent formulation:

In [20]:
queryString = """
SELECT COUNT(DISTINCT ?w) As ?writers
WHERE { 

    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.
    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
    
}
"""

print("Results")
run_query(queryString)

Results
[('writers', '21821')]


1
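
A third formulation of the same selection uses the SPARQL 1.1 `VALUES` clause. The sketch below only builds the query string: the helper `writers_pattern` is hypothetical, and whether the endpoint accepts `VALUES` has not been verified here. The parenthesised `(COUNT(...) AS ?writers)` is the standard SPARQL 1.1 spelling of the projection used in the cells above.

```python
def writers_pattern(countries, var="?w"):
    """Build a BGP selecting writers (wd:Q36180) holding one of the given citizenships."""
    values = " ".join(countries)
    return (
        f"VALUES ?c {{ {values} }}\n"
        f"{var} wdt:P106 wd:Q36180 ;\n"
        f"    wdt:P27 ?c ."
    )

# France, Italy and Kingdom of Italy, as in the two queries above.
pattern = writers_pattern(["wd:Q142", "wd:Q38", "wd:Q172579"])
query = "SELECT (COUNT(DISTINCT ?w) AS ?writers)\nWHERE {\n" + pattern + "\n}"
print(query)
```

Keeping the list of countries in one place avoids repeating the three UNION branches whenever the set of writers is needed as a subquery.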

#### Information about writers' work
Now I need to restrict the Italian and French writers to those who published at least one book in the last 50 years. Therefore I need to understand how this information is stored in the database. I restrict the properties to those of Italian and French writers in order to ease the computation. I use a subquery to retrieve the writers first.

In [21]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE {
    
    ?w ?p ?o.
    
    ?p <http://schema.org/name> ?pname.
    FILTER(regex(?pname,".*publish.*") || regex(?pname, ".*wrote.*")).
    
{ 
    SELECT ?w
    WHERE { 
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q142.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q38.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q172579.
    }
    }
}

}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P123'), ('pname', 'publisher')]
[('p', 'http://www.wikidata.org/prop/direct/P1239'), ('pname', 'ISFDB publisher ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1433'), ('pname', 'published in')]


3

It looks like the property wdt:P123 (publisher) could be useful; let us see the objects linked through it.

In [22]:
queryString = """
SELECT DISTINCT ?o ?oname
WHERE {
    
    ?w wdt:P123 ?o.
    ?o <http://schema.org/name> ?oname.
    
{ 
    SELECT ?w
    WHERE { 
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q142.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q38.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q172579.
    }
    }
}

}
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('o', 'http://www.wikidata.org/entity/Q2190366'), ('oname', 'Éditions Albin Michel')]
[('o', 'http://www.wikidata.org/entity/Q16628535'), ('oname', 'Des idées & des hommes')]
[('o', 'http://www.wikidata.org/entity/Q28777944'), ('oname', 'Éditions Goélette')]
[('o', 'http://www.wikidata.org/entity/Q3220877'), ('oname', 'Le Castor Astral')]
[('o', 'http://www.wikidata.org/entity/Q28777883'), ('oname', 'French Pulp')]
[('o', 'http://www.wikidata.org/entity/Q3345036'), ('oname', 'Éditions Nouveau Monde')]
[('o', 'http://www.wikidata.org/entity/Q273819'), ('oname', 'Éditions Gallimard')]
[('o', 'http://www.wikidata.org/entity/Q2823584'), ('oname', 'Actes Sud')]
[('o', 'http://www.wikidata.org/entity/Q3563761'), ('oname', 'Vuibert')]
[('o', 'http://www.wikidata.org/entity/Q3579411'), ('oname', 'Éditions Liana Levi')]
[('o', 'http://www.wikidata.org/entity/Q3579478'), ('oname', 'Philippe Picquier Publishing')]
[('o', 'http://www.wikidata.org/entity/Q3579598'), ('oname', "L'Insomniaqu

22

I retrieved only 22 results even though the limit was 50, so this does not look like the right property. I start exploring the properties of this subset by counting them. I first look at the properties where the writers are the subjects, then I will do the same for the other direction.

In [23]:
queryString = """
SELECT COUNT(DISTINCT ?p) AS ?props 
WHERE {
    
    ?w ?p ?o.
    
{ 
    SELECT ?w
    WHERE { 
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q142.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q38.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q172579.
    }
    }
}

}
"""

print("Results")
run_query(queryString)

Results
[('props', '1707')]


1

Now I retrieve the top 50 most used properties in the subset.

In [24]:
queryString = """
SELECT ?p ?pname COUNT(?o) AS ?objects 
WHERE {
    
    ?w ?p ?o.
    ?p <http://schema.org/name> ?pname.
    
{ 
    SELECT ?w
    WHERE { 
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q142.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q38.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q172579.
    }
    }
}
}
GROUP BY ?p ?pname
ORDER BY DESC(COUNT(?o))
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation'), ('objects', '62679')]
[('p', 'http://www.wikidata.org/prop/direct/P27'), ('pname', 'country of citizenship'), ('objects', '30291')]
[('p', 'http://www.wikidata.org/prop/direct/P1412'), ('pname', 'languages spoken, written or signed'), ('objects', '25719')]
[('p', 'http://www.wikidata.org/prop/direct/P735'), ('pname', 'given name'), ('objects', '24243')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of'), ('objects', '24117')]
[('p', 'http://www.wikidata.org/prop/direct/P21'), ('pname', 'sex or gender'), ('objects', '24071')]
[('p', 'http://www.wikidata.org/prop/direct/P569'), ('pname', 'date of birth'), ('objects', '23787')]
[('p', 'http://www.wikidata.org/prop/direct/P214'), ('pname', 'VIAF ID'), ('objects', '22108')]
[('p', 'http://www.wikidata.org/prop/direct/P7859'), ('pname', 'WorldCat Identities ID'), ('objects', '20970')]
[('p', 'http://www.wikidata.org/prop/direct/P19')

50

I did not retrieve any useful information about published books, but I discovered the property wdt:P166, named "award received", which might be useful later. I now run the same queries with writers as objects of the properties.

In [25]:
queryString = """
SELECT COUNT(DISTINCT ?p) AS ?props 
WHERE {
    
    ?s ?p ?w.
    
{ 
    SELECT ?w
    WHERE { 
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q142.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q38.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q172579.
    }
    }
}

}
"""

print("Results")
run_query(queryString)

Results
[('props', '178')]


1

In [26]:
queryString = """
SELECT ?p ?pname COUNT(?w) AS ?objects 
WHERE {
    
    ?s ?p ?w.
    ?p <http://schema.org/name> ?pname.
    
{ 
    SELECT ?w
    WHERE { 
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q142.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q38.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q172579.
    }
    }
}
}
GROUP BY ?p ?pname
ORDER BY DESC(COUNT(?w))
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P50'), ('pname', 'author'), ('objects', '46832')]
[('p', 'http://www.wikidata.org/prop/direct/P58'), ('pname', 'screenwriter'), ('objects', '12770')]
[('p', 'http://www.wikidata.org/prop/direct/P161'), ('pname', 'cast member'), ('objects', '10895')]
[('p', 'http://www.wikidata.org/prop/direct/P170'), ('pname', 'creator'), ('objects', '5348')]
[('p', 'http://www.wikidata.org/prop/direct/P138'), ('pname', 'named after'), ('objects', '5326')]
[('p', 'http://www.wikidata.org/prop/direct/P57'), ('pname', 'director'), ('objects', '4637')]
[('p', 'http://www.wikidata.org/prop/direct/P921'), ('pname', 'main subject'), ('objects', '4052')]
[('p', 'http://www.wikidata.org/prop/direct/P40'), ('pname', 'child'), ('objects', '2051')]
[('p', 'http://www.wikidata.org/prop/direct/P175'), ('pname', 'performer'), ('objects', '2043')]
[('p', 'http://www.wikidata.org/prop/direct/P26'), ('pname', 'spouse'), ('objects', '1905')]
[('p', 'http://www.wikidata

50

I found again the property wdt:P50 (author), which I had already encountered. Before using it, I count the distinct subjects that are linked to these writers.

In [27]:
queryString = """
SELECT COUNT(DISTINCT ?s) AS ?subjects 
WHERE {
    
    ?s ?p ?w.
    
{ 
    SELECT ?w
    WHERE { 
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q142.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q38.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q172579.
    }
    }
}
}
"""

print("Results")
run_query(queryString)

Results
[('subjects', '116326')]


1

Since there are many subjects, I group them by class and retrieve only the top 25 classes by the number of writers linked to them.

In [28]:
queryString = """
SELECT ?b ?bname COUNT(?w) AS ?writers 
WHERE {
    
    ?s ?p ?w.
    ?s wdt:P31 ?b.
    ?b <http://schema.org/name> ?bname.
    
{ 
    SELECT ?w
    WHERE { 
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q142.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q38.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q172579.
    }
    }
}
}
GROUP BY ?b ?bname
ORDER BY DESC(COUNT(?w))
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('b', 'http://www.wikidata.org/entity/Q11424'), ('bname', 'film'), ('writers', '27960')]
[('b', 'http://www.wikidata.org/entity/Q3331189'), ('bname', 'version, edition, or translation'), ('writers', '16375')]
[('b', 'http://www.wikidata.org/entity/Q5'), ('bname', 'human'), ('writers', '12175')]
[('b', 'http://www.wikidata.org/entity/Q7725634'), ('bname', 'literary work'), ('writers', '8244')]
[('b', 'http://www.wikidata.org/entity/Q191067'), ('bname', 'article'), ('writers', '6892')]
[('b', 'http://www.wikidata.org/entity/Q47461344'), ('bname', 'written work'), ('writers', '6267')]
[('b', 'http://www.wikidata.org/entity/Q3305213'), ('bname', 'painting'), ('writers', '3674')]
[('b', 'http://www.wikidata.org/entity/Q13433827'), ('bname', 'encyclopedia article'), ('writers', '2290')]
[('b', 'http://www.wikidata.org/entity/Q5185279'), ('bname', 'poem'), ('writers', '2089')]
[('b', 'http://www.wikidata.org/entity/Q2831984'), ('bname', 'comic book album'), ('writers', '1921')]
[('b'

25

To retrieve the Italian and French writers who published at least one book, I could select the writers for which there exists a triple "?s wdt:P50 ?writer" where ?s is an instance of "literary work" (wd:Q7725634), "written work" (wd:Q47461344) or "book" (wd:Q571). Before that, I want to investigate the instances of literary work further, to see whether they are actually books or whether it would be wrong to include them in the above-mentioned query.

I retrieve some triples using only Italian writers to see whether such instances are books.

In [29]:
queryString = """
SELECT ?sname ?pname ?wname
WHERE {
    
    ?s ?p ?w.
    ?s wdt:P31 wd:Q7725634.
    ?w <http://schema.org/name> ?wname.
    ?p <http://schema.org/name> ?pname.
    ?s <http://schema.org/name> ?sname.
    
{ 
    SELECT ?w
    WHERE { 
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q38.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q172579.
    }
    }
}
}
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('sname', 'The Story of a New Name'), ('pname', 'author'), ('wname', 'Elena Ferrante')]
[('sname', 'Bononiensis de architectura libri quinque quibus cuncta fere architectonicae facultatis mysteria docte, perspicue . . .'), ('pname', 'author'), ('wname', 'Sebastiano Serlio')]
[('sname', 'Smog'), ('pname', 'author'), ('wname', 'Italo Calvino')]
[('sname', 'Il Ghebo'), ('pname', 'author'), ('wname', 'Elio Bartolini')]
[('sname', 'Il signor Diavolo'), ('pname', 'author'), ('wname', 'Pupi Avati')]
[('sname', 'Sotto il sole giaguaro'), ('pname', 'author'), ('wname', 'Italo Calvino')]
[('sname', 'Five Stories of Ferrara'), ('pname', 'author'), ('wname', 'Giorgio Bassani')]
[('sname', 'The Story of the Lost Child'), ('pname', 'author'), ('wname', 'Elena Ferrante')]
[('sname', "La bellezza d'Ippolita"), ('pname', 'author'), ('wname', 'Elio Bartolini')]
[('sname', "L'attore"), ('pname', 'author'), ('wname', 'Mario Soldati')]
[('sname', 'The eye of the cat'), ('pname', 'author'), ('wname

50

We can clearly see some well-known books, so I will consider instances of literary work as well. I also check the subjects that are instances of "written work" (wd:Q47461344).

In [30]:
queryString = """
SELECT ?sname ?pname ?wname
WHERE {
    
    ?s ?p ?w.
    ?s wdt:P31 wd:Q47461344.
    ?w <http://schema.org/name> ?wname.
    ?p <http://schema.org/name> ?pname.
    ?s <http://schema.org/name> ?sname.
    
{ 
    SELECT ?w
    WHERE { 
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q38.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q172579.
    }
    }
}
}
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('sname', 'La carne, la morte e il diavolo nella letteratura romantica'), ('pname', 'author'), ('wname', 'Mario Praz')]
[('sname', 'Invisible Cities'), ('pname', 'author'), ('wname', 'Italo Calvino')]
[('sname', 'Veritas'), ('pname', 'author'), ('wname', 'Francesco Sorti')]
[('sname', 'Conversations in Sicily'), ('pname', 'author'), ('wname', 'Elio Vittorini')]
[('sname', 'Imprimatur'), ('pname', 'author'), ('wname', 'Rita Monaldi')]
[('sname', 'The Cloven Viscount'), ('pname', 'author'), ('wname', 'Italo Calvino')]
[('sname', 'Bisexuality in the ancient Word'), ('pname', 'author'), ('wname', 'Eva Cantarella')]
[('sname', 'Men and Not Men'), ('pname', 'author'), ('wname', 'Elio Vittorini')]
[('sname', 'The Watcher'), ('pname', 'author'), ('wname', 'Italo Calvino')]
[('sname', 'Our Ancestors'), ('pname', 'author'), ('wname', 'Italo Calvino')]
[('sname', 'The Lost Daughter'), ('pname', 'author'), ('wname', 'Elena Ferrante')]
[('sname', 'The Baron in the Trees'), ('pname', 'autho

25

#### BGP for all Italian and French writers that have published at least a book
From the previous queries I discovered that, to retrieve all Italian and French writers who have published at least one book, I need to look for writers ?w for which there exists a triple "?object wdt:P50 ?w", where ?object is an instance of one of these three entities: book (wd:Q571), literary work (wd:Q7725634) and written work (wd:Q47461344).

In [32]:
queryString = """
SELECT COUNT(DISTINCT ?w) AS ?writers
WHERE {
    
    FILTER EXISTS{
        ?s wdt:P50 ?w;
           wdt:P31 ?books.
        FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    }
    
{ 
    SELECT ?w
    WHERE { 
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q142.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q38.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q172579.
    }
    }
}
}
"""

print("Results")
run_query(queryString)

Results
[('writers', '3478')]


1
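The class filter on `?books` is repeated verbatim in most of the following cells. A minimal sketch of how it could be generated from a single list, so that the three classes are written only once. The names `BOOK_CLASSES` and `class_filter` are hypothetical and not part of the notebook:

```python
# Hypothetical helper (not part of the original notebook): build the
# disjunctive FILTER that restricts a variable to the three classes found above.
BOOK_CLASSES = ["wd:Q571", "wd:Q7725634", "wd:Q47461344"]  # book, literary work, written work


def class_filter(var, classes):
    """Return a SPARQL FILTER accepting any of the given class IRIs for `var`."""
    if not classes:
        raise ValueError("at least one class is required")
    alternatives = " || ".join(f"{var} = {c}" for c in classes)
    return f"FILTER({alternatives})."


print(class_filter("?books", BOOK_CLASSES))
```

The printed string is the same filter used in the cells of this notebook, so it could be spliced into a `queryString` before calling `run_query`.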

I count the writers by country. I use the equivalent formulation for retrieving Italian and French authors. _All details about the two equivalent queries are in the previous section._

In [34]:
queryString = """
SELECT ?country COUNT(DISTINCT ?w) AS ?writers
WHERE {

    ?c <http://schema.org/name> ?country.
    
    FILTER EXISTS{
        ?s wdt:P50 ?w;
           wdt:P31 ?books.
        FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    }
    
{ 
    SELECT ?c ?w
    WHERE { 
    
        ?w wdt:P106 wd:Q36180;
            wdt:P27 ?c.
        FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
    }
}
}
GROUP BY ?country
ORDER BY DESC(?writers)
"""

print("Results")
run_query(queryString)

Results
[('country', 'France'), ('writers', '2539')]
[('country', 'Italy'), ('writers', '847')]
[('country', 'Kingdom of Italy'), ('writers', '411')]


3
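The three per-country counts do not add up to the 3478 distinct writers found earlier. A plausible explanation is that a writer with more than one of the three citizenships is counted once per country. A quick check with the numbers printed above, assuming both queries ran over the same data:

```python
# Counts copied from the two outputs above.
total_distinct = 3478
per_country = {"France": 2539, "Italy": 847, "Kingdom of Italy": 411}

# A writer holding k of the three citizenships contributes k to the sum
# of the per-country counts but only 1 to the overall distinct count.
extra_memberships = sum(per_country.values()) - total_distinct
print(extra_memberships)  # 319
```

The surplus of 319 is the number of additional citizenship memberships, not necessarily the number of writers with multiple citizenships.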

#### Information about books' publishing date
Finally, I need to keep only books written in the last 50 years, so I need to find out how the publication date is stored. First, I count how many distinct properties these books have.

In [36]:
queryString = """
SELECT COUNT(DISTINCT ?p) AS ?bookProps
WHERE {
    
    ?s wdt:P50 ?w;
       wdt:P31 ?books;
       ?p ?o.
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    
{ 
    SELECT ?w
    WHERE { 
    
        ?w wdt:P106 wd:Q36180;
            wdt:P27 ?c.
        FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
    }
}
}
"""

print("Results")
run_query(queryString)

Results
[('bookProps', '382')]


1

Since there are many of them, I retrieve the 25 most frequently used.

In [37]:
queryString = """
SELECT ?p ?pname COUNT(DISTINCT ?s) AS ?subjects
WHERE {
    
    ?s wdt:P50 ?w;
       wdt:P31 ?books;
       ?p ?o.
    ?p <http://schema.org/name> ?pname.
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    
{ 
    SELECT ?w
    WHERE { 
    
        ?w wdt:P106 wd:Q36180;
            wdt:P27 ?c.
        FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
    }
}
}
GROUP BY ?p ?pname
ORDER BY DESC(?subjects)
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P50'), ('pname', 'author'), ('subjects', '13606')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of'), ('subjects', '13606')]
[('p', 'http://www.wikidata.org/prop/direct/P407'), ('pname', 'language of work or name'), ('subjects', '10688')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('pname', 'genre'), ('subjects', '10485')]
[('p', 'http://www.wikidata.org/prop/direct/P577'), ('pname', 'publication date'), ('subjects', '10215')]
[('p', 'http://www.wikidata.org/prop/direct/P2671'), ('pname', 'Google Knowledge Graph ID'), ('subjects', '6792')]
[('p', 'http://www.wikidata.org/prop/direct/P495'), ('pname', 'country of origin'), ('subjects', '6272')]
[('p', 'http://www.wikidata.org/prop/direct/P6216'), ('pname', 'copyright status'), ('subjects', '2611')]
[('p', 'http://www.wikidata.org/prop/direct/P747'), ('pname', 'has edition or translation'), ('subjects', '2375')]
[('p', 'http://www.wikidata.org/prop/dir

25

Before moving on with the property "publication date" (wdt:P577), note that the above query also retrieved some properties that will come in handy for answering the following questions. These properties are:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P6216`   | copyright status  | predicate |
| `wdt:P747`   | has edition or translation  | predicate |
| `wdt:P136` | genre | predicate |
| `wdt:P361`   | part of | predicate |
| `wdt:P179`   | part of the series | predicate |


Most books have the property publication date (wdt:P577), so I will use it to retrieve this information. Before moving on, I inspect some values of the publication date property to check whether I can extract useful information from them, i.e. the year of publication, with SPARQL functions.

In [38]:
queryString = """
SELECT DISTINCT ?b ?bname ?pubDate
WHERE {
    
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P577 ?pubDate.
    ?b <http://schema.org/name> ?bname.
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    
{ 
    SELECT ?w
    WHERE { 
    
        ?w wdt:P106 wd:Q36180;
            wdt:P27 ?c.
        FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
    }
}
}
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('b', 'http://www.wikidata.org/entity/Q3597882'), ('bname', '1934'), ('pubDate', '1982-01-01T00:00:00Z')]
[('b', 'http://www.wikidata.org/entity/Q17629132'), ('bname', 'È Stato la mafia'), ('pubDate', '2014-01-01T00:00:00Z')]
[('b', 'http://www.wikidata.org/entity/Q21015518'), ('bname', 'Jack'), ('pubDate', '1875-01-01T00:00:00Z')]
[('b', 'http://www.wikidata.org/entity/Q2756710'), ('bname', 'Thaïs'), ('pubDate', '1890-01-01T00:00:00Z')]
[('b', 'http://www.wikidata.org/entity/Q22084323'), ('bname', 'La Philosophie critique de Kant'), ('pubDate', '1963-01-01T00:00:00Z')]
[('b', 'http://www.wikidata.org/entity/Q5332810'), ('bname', 'Echographies of Television'), ('pubDate', '1996-01-01T00:00:00Z')]
[('b', 'http://www.wikidata.org/entity/Q1213210'), ('bname', 'The Countess of Charny'), ('pubDate', '1853-01-01T00:00:00Z')]
[('b', 'http://www.wikidata.org/entity/Q17635217'), ('bname', "L'amore è un dio"), ('pubDate', '2007-01-01T00:00:00Z')]
[('b', 'http://www.wikidata.org/entity/Q

25

The date is in the datetime format, so I try to retrieve only the year of publication by using the SPARQL function year().

In [39]:
queryString = """
SELECT DISTINCT ?b ?bname ?yearPub
WHERE {
    
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P577 ?pubDate.
    BIND(year(?pubDate) AS ?yearPub).
    ?b <http://schema.org/name> ?bname.
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    
{ 
    SELECT ?w
    WHERE { 
    
        ?w wdt:P106 wd:Q36180;
            wdt:P27 ?c.
        FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
    }
}
}
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('b', 'http://www.wikidata.org/entity/Q3210864'), ('bname', 'Death and the Lumberjack'), ('yearPub', '1668')]
[('b', 'http://www.wikidata.org/entity/Q3597882'), ('bname', '1934'), ('yearPub', '1982')]
[('b', 'http://www.wikidata.org/entity/Q17629132'), ('bname', 'È Stato la mafia'), ('yearPub', '2014')]
[('b', 'http://www.wikidata.org/entity/Q20991677'), ('bname', 'The Twelve Abbots of Challant'), ('yearPub', '1981')]
[('b', 'http://www.wikidata.org/entity/Q21009532'), ('bname', 'Vernon Subutex 1'), ('yearPub', '2015')]
[('b', 'http://www.wikidata.org/entity/Q21015157'), ('bname', 'En Chine'), ('yearPub', '1924')]
[('b', 'http://www.wikidata.org/entity/Q21015288'), ('bname', 'Les Royautés'), ('yearPub', '1908')]
[('b', 'http://www.wikidata.org/entity/Q21015518'), ('bname', 'Jack'), ('yearPub', '1875')]
[('b', 'http://www.wikidata.org/entity/Q25338045'), ('bname', 'The Kid and Carlsson. The Little Prince'), ('yearPub', '1988')]
[('b', 'http://www.wikidata.org/entity/Q2756710'),

25
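For post-processing outside SPARQL, the same year extraction can be sketched in Python. This is a minimal sketch that assumes the exact layout of the values printed earlier (for example `1982-01-01T00:00:00Z`); other date forms that may occur in the data are not handled. The function name `publication_year` is hypothetical:

```python
from datetime import datetime


def publication_year(value):
    """Return the year of a timestamp such as '1982-01-01T00:00:00Z'."""
    # The format string mirrors the layout of the ?pubDate values shown above.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").year


print(publication_year("1982-01-01T00:00:00Z"))  # 1982
```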

### BGP for obtaining Italian and French writers who published a book in the last 50 years.
Since I expect there to be many writers, I output only their total number.

In [40]:
queryString = """
SELECT COUNT(DISTINCT ?w) AS ?writers
WHERE {
    
    ?b wdt:P50 ?w;
        wdt:P31 ?books;
        wdt:P577 ?pubDate.
        
    FILTER(datatype(?pubDate) = xsd:dateTime).
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    BIND(year(xsd:dateTime(?pubDate)) AS ?yearPub).
    FILTER(?yearPub >= 1970).
    
{ 
    SELECT ?w
    WHERE { 
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q142.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q38.
    }
    UNION
    {
        ?w wdt:P106 wd:Q36180;
            wdt:P27 wd:Q172579.
    }
    }
}
}
"""

print("Results")
run_query(queryString)

Results
[('writers', '1956')]


1

Equivalent formulation:

In [41]:
queryString = """
SELECT COUNT(DISTINCT ?w) AS ?writers
WHERE {
    
    ?b wdt:P50 ?w;
        wdt:P31 ?books;
        wdt:P577 ?pubDate.
        
    FILTER(datatype(?pubDate) = xsd:dateTime).
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    BIND(year(xsd:dateTime(?pubDate)) AS ?yearPub).
    FILTER(?yearPub >= 1970).
    
{ 
    SELECT ?w
    WHERE { 
    
        ?w wdt:P106 wd:Q36180;
            wdt:P27 ?c.
        FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
    }
}
}
"""

print("Results")
run_query(queryString)

Results
[('writers', '1956')]


1
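Both queries above hard-code 1970 as the start of the 50-year window. A minimal sketch of how that threshold could be derived instead of typed by hand. The function name `cutoff_year` is hypothetical, and the reference date in 2020 is an assumption chosen only because it is consistent with the 1970 used in the queries:

```python
from datetime import date


def cutoff_year(span_years=50, today=None):
    """First year inside a window of `span_years` ending in the year of `today`."""
    today = today or date.today()
    return today.year - span_years


# With a reference date in 2020 (assumed), the result matches the hard-coded 1970.
print(cutoff_year(50, date(2020, 6, 1)))  # 1970
```

The returned value could be formatted into the `FILTER(?yearPub >= ...)` line of the query string.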

Out of curiosity, I also count these writers by country.

In [42]:
queryString = """
SELECT ?country COUNT(DISTINCT ?w) AS ?writers
WHERE {
    
    ?b wdt:P50 ?w;
        wdt:P31 ?books;
        wdt:P577 ?pubDate.
        
    ?w wdt:P106 wd:Q36180;
                wdt:P27 ?c.   
                
    ?c <http://schema.org/name> ?country.
    
    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).    
    FILTER(datatype(?pubDate) = xsd:dateTime).
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    BIND(year(xsd:dateTime(?pubDate)) AS ?yearPub).
    FILTER(?yearPub >= 1970).
}
GROUP BY ?country
ORDER BY DESC(?writers)
"""

print("Results")
run_query(queryString)

Results
[('country', 'France'), ('writers', '1271')]
[('country', 'Italy'), ('writers', '681')]
[('country', 'Kingdom of Italy'), ('writers', '193')]


3

## Comparing the number of books written by Italian and French authors
This section contains the queries used to answer the second question: _Compare the number of books written by Italian and French writers_ .

Recall that in the previous part of the workflow I retrieved these properties for books:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P577`   | publication date  | predicate |
| `wdt:P747`   | has edition or translation  | predicate |
| `wdt:P136` | genre | predicate |
| `wdt:P361`   | part of | predicate |
| `wdt:P179`   | part of the series | predicate |

I will now use them to investigate the number of books written by Italian and French writers. Note that Italian writers can belong to two countries: Italy (wd:Q38) and Kingdom of Italy (wd:Q172579). For this reason, the following per-country queries return up to three rows, one for each of the three countries.

### Number of books written by Italian and French authors
First, I count the books written by Italian and French authors, grouped by the author's country.

In [43]:
queryString = """
SELECT ?country COUNT(DISTINCT ?b) AS ?numberBooks
WHERE {

    ?c <http://schema.org/name> ?country.
            
    ?b wdt:P50 ?w;
       wdt:P31 ?books.
       
    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.

    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).   
    
}
GROUP BY ?country
ORDER BY DESC(?numberBooks)
"""

print("Results")
run_query(queryString)

Results
[('country', 'France'), ('numberBooks', '10357')]
[('country', 'Italy'), ('numberBooks', '2699')]
[('country', 'Kingdom of Italy'), ('numberBooks', '1652')]


3

### Number of books written by Italian and French authors published in the last 50 years
Building on the previous query, I count the number of books written by Italian and French authors that were published in the last 50 years.

In [44]:
queryString = """
SELECT ?country COUNT(DISTINCT ?b) AS ?numberBooks
WHERE {

    ?c <http://schema.org/name> ?country.
            
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P577 ?pubDate.
       
    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.

    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
        
    FILTER(datatype(?pubDate) = xsd:dateTime).
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    BIND(year(xsd:dateTime(?pubDate)) AS ?pubYear).
    FILTER(?pubYear >= 1970). 
    
}
GROUP BY ?country
ORDER BY DESC(?numberBooks)
"""

print("Results")
run_query(queryString)

Results
[('country', 'France'), ('numberBooks', '3859')]
[('country', 'Italy'), ('numberBooks', '1970')]
[('country', 'Kingdom of Italy'), ('numberBooks', '514')]


3

### Number of books based on their publication year
Now I check the number of books published per year by Italian and French writers in the last 10 years, i.e. from 2010 onwards. I also filter out publication dates set in the future, to avoid counting implausible data.

In [47]:
queryString = """
SELECT ?country ?pubYear COUNT(DISTINCT ?b) AS ?numberBooks
WHERE {

    ?c <http://schema.org/name> ?country.
            
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P577 ?pubDate.
    
    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.

    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
        
    FILTER(datatype(?pubDate) = xsd:dateTime).
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    BIND(year(xsd:dateTime(?pubDate)) AS ?pubYear).
    FILTER(?pubDate < NOW()).
    FILTER(?pubYear >= 2010). 
    

}
GROUP BY ?country ?pubYear
ORDER BY ?pubYear DESC(?numberBooks)
"""

print("Results")
run_query(queryString)

Results
[('country', 'France'), ('pubYear', '2010'), ('numberBooks', '114')]
[('country', 'Italy'), ('pubYear', '2010'), ('numberBooks', '88')]
[('country', 'Kingdom of Italy'), ('pubYear', '2010'), ('numberBooks', '9')]
[('country', 'France'), ('pubYear', '2011'), ('numberBooks', '113')]
[('country', 'Italy'), ('pubYear', '2011'), ('numberBooks', '71')]
[('country', 'Kingdom of Italy'), ('pubYear', '2011'), ('numberBooks', '3')]
[('country', 'France'), ('pubYear', '2012'), ('numberBooks', '112')]
[('country', 'Italy'), ('pubYear', '2012'), ('numberBooks', '71')]
[('country', 'France'), ('pubYear', '2013'), ('numberBooks', '87')]
[('country', 'Italy'), ('pubYear', '2013'), ('numberBooks', '65')]
[('country', 'Kingdom of Italy'), ('pubYear', '2013'), ('numberBooks', '5')]
[('country', 'France'), ('pubYear', '2014'), ('numberBooks', '99')]
[('country', 'Italy'), ('pubYear', '2014'), ('numberBooks', '45')]
[('country', 'France'), ('pubYear', '2015'), ('numberBooks', '90')]
[('country', 'I

30
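Since `run_query` only prints the bindings, comparing countries year by year is easier after pivoting the rows. A minimal sketch using the rows copied by hand from the first lines of the output above; the variable names are hypothetical:

```python
from collections import defaultdict

# Rows copied from the first lines of the output above:
# (country, publication year, number of books).
rows = [
    ("France", 2010, 114), ("Italy", 2010, 88), ("Kingdom of Italy", 2010, 9),
    ("France", 2011, 113), ("Italy", 2011, 71), ("Kingdom of Italy", 2011, 3),
    ("France", 2012, 112), ("Italy", 2012, 71),
]

# Pivot into {year: {country: count}}; a missing country means no row was returned.
by_year = defaultdict(dict)
for country, year, count in rows:
    by_year[year][country] = count

for year in sorted(by_year):
    counts = by_year[year]
    print(year, counts.get("France", 0), counts.get("Italy", 0),
          counts.get("Kingdom of Italy", 0))
```

A year with no row for a country, such as 2012 for Kingdom of Italy, is shown as 0.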

As expected, French authors publish more books than Italian ones in every year. This might be because the data contains more French authors than Italian ones, and the population of France is also larger. Another interesting point is that the number of books published by writers from the Kingdom of Italy decreases over the years, and from 2018 onwards there are none. Italy has been a republic since 1946, so authors who were citizens of the Kingdom of Italy are by now elderly and most of them, if still alive, are no longer active.

### Number of books based on the number of translations
Now I check whether Italian or French authors have written more books that have been translated into more than 5 different languages. To do that, I use the property "has edition or translation" (wdt:P747). First, I check what type of information is stored with that property.

In [48]:
queryString = """
SELECT DISTINCT ?bname ?edition ?instanceOf
WHERE {
            
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P747 ?e.
       
    ?e wdt:P31 ?i.
    
    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.
        
    ?b <http://schema.org/name> ?bname.
    ?e <http://schema.org/name> ?edition.
    ?i <http://schema.org/name> ?instanceOf.

    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
        
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).

}
LIMIT 15
"""

print("Results")
run_query(queryString)

Results
[('bname', 'Life Is Elsewhere'), ('edition', 'Život je jinde'), ('instanceOf', 'version, edition, or translation')]
[('bname', 'Life Is Elsewhere'), ('edition', 'La vie est ailleurs'), ('instanceOf', 'version, edition, or translation')]
[('bname', 'Thaïs'), ('edition', 'Thaïs'), ('instanceOf', 'version, edition, or translation')]
[('bname', 'The Countess of Charny'), ('edition', 'Hrabina Charny'), ('instanceOf', 'version, edition, or translation')]
[('bname', 'Le Chevalier de Maison-Rouge'), ('edition', 'Kawaler de Maison Rouge'), ('instanceOf', 'version, edition, or translation')]
[('bname', 'Le Chevalier de Maison-Rouge'), ('edition', 'Kawaler de Maison Rouge'), ('instanceOf', 'translated work')]
[('bname', 'Le Chevalier de Maison-Rouge'), ('edition', 'Le Chevalier de Maison-Rouge'), ('instanceOf', 'version, edition, or translation')]
[('bname', 'Ange Pitou'), ('edition', 'Ange Pitou'), ('instanceOf', 'version, edition, or translation')]
[('bname', 'Ange Pitou'), ('edition', 

15

This query gave me no interesting results, so I check how many triples of this type there are for each book.

In [49]:
queryString = """
SELECT ?b ?bname COUNT(?e) AS ?editions
WHERE {
            
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P747 ?e.
    
    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.
        
    ?b <http://schema.org/name> ?bname.

    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
        
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).

}
GROUP BY ?b ?bname
ORDER BY DESC(?editions)
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('b', 'http://www.wikidata.org/entity/Q25338'), ('bname', 'The Little Prince'), ('editions', '14')]
[('b', 'http://www.wikidata.org/entity/Q1219561'), ('bname', 'Around the World in Eighty Days'), ('editions', '11')]
[('b', 'http://www.wikidata.org/entity/Q11834'), ('bname', 'Puss in Boots'), ('editions', '11')]
[('b', 'http://www.wikidata.org/entity/Q98688752'), ('bname', 'Historia Ecclesiastica'), ('editions', '9')]
[('b', 'http://www.wikidata.org/entity/Q183565'), ('bname', 'Twenty Thousand Leagues Under the Sea'), ('editions', '8')]
[('b', 'http://www.wikidata.org/entity/Q219457'), ('bname', 'A Journey to the Center of the Earth'), ('editions', '8')]
[('b', 'http://www.wikidata.org/entity/Q77117286'), ('bname', 'La Traviata : [programme] : opera in three acts'), ('editions', '8')]
[('b', 'http://www.wikidata.org/entity/Q1156065'), ('bname', 'The Joke'), ('editions', '7')]
[('b', 'http://www.wikidata.org/entity/Q612570'), ('bname', 'Lelia'), ('editions', '6')]
[('b', 'http:

25

I guess this property relates more to the various editions of a book than to its translations into different languages. This could be because a translated book is treated as a different book, since the title might change. Given these results, I count, for each country, the books that have at least 5 editions or translations.

In [50]:
queryString = """
SELECT ?country COUNT(?b) AS ?books
WHERE {
{    
    SELECT ?country ?b WHERE{
    
        ?b wdt:P50 ?w;
           wdt:P31 ?books;
           wdt:P747 ?e.

        ?w wdt:P106 wd:Q36180;
            wdt:P27 ?c.

        ?c <http://schema.org/name> ?country.

        FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).

        FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
        
    }
    GROUP BY ?country ?b
    HAVING(COUNT(?e) >= 5)
}
}
GROUP BY ?country
ORDER BY DESC(?books)
"""

print("Results")
run_query(queryString)

Results
[('country', 'France'), ('books', '29')]
[('country', 'Kingdom of Italy'), ('books', '2')]


2

France has far more books with at least 5 editions than Italy, which does not appear in the results at all. However, I am not sure this property is actually useful.
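A closer match to the original question (books translated into more than 5 different languages) might be to count the distinct languages of the editions rather than the editions themselves. The following is only a sketch and was not run in this notebook. It assumes that edition items carry the property "language of work or name" (wdt:P407), which appeared among the book properties retrieved earlier but has not been verified for editions. The names `QUERY_TEMPLATE` and `translations_query` are hypothetical. `COUNT(DISTINCT ...)` is used so that a language is not counted several times when the joins produce duplicate rows, and the aggregate is written in the parenthesised form of standard SPARQL 1.1:

```python
# Hypothetical variant of the query above; it was not run in this notebook.
# Assumption: edition items (?e) state their language with wdt:P407.
QUERY_TEMPLATE = """
SELECT ?country (COUNT(DISTINCT ?b) AS ?books)
WHERE {
{
    SELECT ?country ?b WHERE {

        ?b wdt:P50 ?w;
           wdt:P31 ?books;
           wdt:P747 ?e.

        ?e wdt:P407 ?lang.

        ?w wdt:P106 wd:Q36180;
           wdt:P27 ?c.

        ?c <http://schema.org/name> ?country.

        FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
        FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    }
    GROUP BY ?country ?b
    HAVING(COUNT(DISTINCT ?lang) > @MIN_LANGUAGES@)
}
}
GROUP BY ?country
ORDER BY DESC(?books)
"""


def translations_query(min_languages=5):
    """Return the query text with the language threshold filled in."""
    return QUERY_TEMPLATE.replace("@MIN_LANGUAGES@", str(int(min_languages)))


queryString = translations_query(5)
print(queryString)
```

The resulting string could then be passed to `run_query` like the other queries in this notebook.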

### Number of books based on the genre
Now I count the number of books by Italian and French authors for each genre (wdt:P136). First, I retrieve how many distinct genres there are.

In [51]:
queryString = """
SELECT COUNT(DISTINCT ?g) AS ?genres
WHERE {
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P136 ?g.
    
    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.

    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).        
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).

}
"""

print("Results")
run_query(queryString)

Results
[('genres', '345')]


1

There are 345 different genres. Before moving on with the comparison, I retrieve the 15 most popular genres across all books written by Italian and French authors.

In [52]:
queryString = """
SELECT ?genre COUNT(DISTINCT ?b) AS ?numberBooks
WHERE {
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P136 ?g.
    
    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.
    
    ?g <http://schema.org/name> ?genre.

    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).        
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).

}
GROUP BY ?genre
ORDER BY DESC(?numberBooks)
LIMIT 15
"""

print("Results")
run_query(queryString)

Results
[('genre', 'novel'), ('numberBooks', '5256')]
[('genre', 'essay'), ('numberBooks', '992')]
[('genre', 'short story'), ('numberBooks', '680')]
[('genre', 'crime novel'), ('numberBooks', '355')]
[('genre', 'song'), ('numberBooks', '344')]
[('genre', 'science fiction novel'), ('numberBooks', '313')]
[('genre', 'poetry'), ('numberBooks', '311')]
[('genre', 'espionage novel'), ('numberBooks', '285')]
[('genre', 'historical novel'), ('numberBooks', '192')]
[('genre', 'autobiographical novel'), ('numberBooks', '117')]
[('genre', 'fable'), ('numberBooks', '100')]
[('genre', 'adventure novel'), ('numberBooks', '92')]
[('genre', 'theatre'), ('numberBooks', '88')]
[('genre', 'novella'), ('numberBooks', '86')]
[('genre', 'autobiography'), ('numberBooks', '84')]


15

Now I count the number of books written by Italian and French authors for each country and genre, keeping the 15 largest groups.

In [56]:
queryString = """
SELECT ?country ?genre COUNT(DISTINCT ?b) AS ?numberBooks
WHERE {
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P136 ?g.
    
    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.
    
    ?g <http://schema.org/name> ?genre.
    ?c <http://schema.org/name> ?country.

    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).        
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).

}
GROUP BY ?country ?genre
ORDER BY DESC(?numberBooks)
LIMIT 15
"""

print("Results")
run_query(queryString)

Results
[('country', 'France'), ('genre', 'novel'), ('numberBooks', '3603')]
[('country', 'Italy'), ('genre', 'novel'), ('numberBooks', '1545')]
[('country', 'Kingdom of Italy'), ('genre', 'novel'), ('numberBooks', '618')]
[('country', 'France'), ('genre', 'essay'), ('numberBooks', '592')]
[('country', 'France'), ('genre', 'short story'), ('numberBooks', '493')]
[('country', 'Italy'), ('genre', 'essay'), ('numberBooks', '385')]
[('country', 'France'), ('genre', 'song'), ('numberBooks', '344')]
[('country', 'France'), ('genre', 'crime novel'), ('numberBooks', '333')]
[('country', 'France'), ('genre', 'espionage novel'), ('numberBooks', '285')]
[('country', 'France'), ('genre', 'poetry'), ('numberBooks', '245')]
[('country', 'France'), ('genre', 'science fiction novel'), ('numberBooks', '240')]
[('country', 'Kingdom of Italy'), ('genre', 'essay'), ('numberBooks', '169')]
[('country', 'France'), ('genre', 'historical novel'), ('numberBooks', '167')]
[('country', 'Italy'), ('genre', 'short

15

### Number of books that are part of a book series
Finally, I check whether Italian or French authors have written more books that are part of a book series. To do that, I use the properties "part of the series" (wdt:P179) and "part of" (wdt:P361). First, I check what such triples look like.

In [57]:
queryString = """
SELECT DISTINCT ?bname ?pname ?object
WHERE {
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P179 ?o.
    
    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.

    ?b <http://schema.org/name> ?bname.
    wdt:P179 <http://schema.org/name> ?pname.
    ?o <http://schema.org/name> ?object.

    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).        
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).

}
LIMIT 15
"""

print("Results")
run_query(queryString)

Results
[('bname', "L'Œuvre"), ('pname', 'part of the series'), ('object', 'Les Rougon-Macquart')]
[('bname', 'Le Parfum de la dame en noir'), ('pname', 'part of the series'), ('object', 'Joseph Rouletabille')]
[('bname', 'The Firm of Nucingen'), ('pname', 'part of the series'), ('object', 'La Comédie humaine')]
[('bname', 'Heretic Dawn'), ('pname', 'part of the series'), ('object', 'Fortune de France')]
[('bname', "L'avenir comme je l'imaginais, ou pas"), ('pname', 'part of the series'), ('object', 'Ma vie selon moi')]
[('bname', 'Ensemble, enfin !'), ('pname', 'part of the series'), ('object', 'Ma vie selon moi')]
[('bname', 'Sous le soleil de Floride'), ('pname', 'part of the series'), ('object', 'Ma vie selon moi')]
[('bname', 'The Adventures of Captain Hatteras'), ('pname', 'part of the series'), ('object', 'Voyages Extraordinaires')]
[('bname', 'The Three Musketeers'), ('pname', 'part of the series'), ('object', "D'Artagnan Romances")]
[('bname', 'The Iron King'), ('pname', 'part

15

In [58]:
queryString = """
SELECT DISTINCT ?bname ?pname ?object
WHERE {
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P361 ?o.
    
    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.

    ?b <http://schema.org/name> ?bname.
    wdt:P361 <http://schema.org/name> ?pname.
    ?o <http://schema.org/name> ?object.

    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).        
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).

}
LIMIT 15
"""

print("Results")
run_query(queryString)

Results
[('bname', 'Sauron Defeated'), ('pname', 'part of'), ('object', 'The History of The Lord of the Rings')]
[('bname', 'Sauron Defeated'), ('pname', 'part of'), ('object', 'The History of Middle-earth')]
[('bname', 'The Firm of Nucingen'), ('pname', 'part of'), ('object', 'Scenes from Parisian life')]
[('bname', 'The Second Sex'), ('pname', 'part of'), ('object', "Le Monde's 100 Books of the Century")]
[('bname', 'Béatrix'), ('pname', 'part of'), ('object', 'Scenes from Private Life')]
[('bname', 'Gobseck'), ('pname', 'part of'), ('object', 'Scenes from Private Life')]
[('bname', 'Le Bal de Sceaux'), ('pname', 'part of'), ('object', 'Scenes from Private Life')]
[('bname', 'La Maison du chat-qui-pelote'), ('pname', 'part of'), ('object', 'Scenes from Private Life')]
[('bname', 'Un drame au bord de la mer'), ('pname', 'part of'), ('object', 'Philosophical studies')]
[('bname', 'Splendeurs et misères des courtisanes'), ('pname', 'part of'), ('object', 'Scenes from Parisian life')]
[(

15

Now I can count, for each country, how many books written by Italian and French authors are part of a series.

In [59]:
queryString = """
SELECT ?country COUNT(DISTINCT ?b) AS ?numberBooks
WHERE {
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P179|wdt:P361 ?o.
    
    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.

    ?c <http://schema.org/name> ?country.

    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).        
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).

}
GROUP BY ?country
ORDER BY DESC(?numberBooks)
"""

print("Results")
run_query(queryString)

Results
[('country', 'France'), ('numberBooks', '1628')]
[('country', 'Italy'), ('numberBooks', '51')]
[('country', 'Kingdom of Italy'), ('numberBooks', '20')]


3

## Copyright forms
This section contains the queries used to answer the third question: _Count how many books written by Italian authors are now released with a "public domain" copyright form_ .

### Retrieving the copyright information

Recall that, while answering the first question, I retrieved the property "copyright status" (wdt:P6216). I therefore start from there to see if I can find the "public domain" copyright form. \
First, I retrieve some triples containing that property to see what type of information it stores.

In [60]:
queryString = """
SELECT DISTINCT ?bname ?pname ?cr ?copyright 
WHERE {
    
    ?b wdt:P50 ?w;
           wdt:P31 ?books;
           wdt:P6216 ?cr.
           
    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.
        
    ?b <http://schema.org/name> ?bname.
    wdt:P6216 <http://schema.org/name> ?pname.
    ?cr <http://schema.org/name> ?copyright.
    
    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    
}
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('bname', 'En Chine'), ('pname', 'copyright status'), ('cr', 'http://www.wikidata.org/entity/Q19652'), ('copyright', 'public domain')]
[('bname', 'Les Royautés'), ('pname', 'copyright status'), ('cr', 'http://www.wikidata.org/entity/Q19652'), ('copyright', 'public domain')]
[('bname', 'La cigarette'), ('pname', 'copyright status'), ('cr', 'http://www.wikidata.org/entity/Q19652'), ('copyright', 'public domain')]
[('bname', 'Jack'), ('pname', 'copyright status'), ('cr', 'http://www.wikidata.org/entity/Q19652'), ('copyright', 'public domain')]
[('bname', 'Thaïs'), ('pname', 'copyright status'), ('cr', 'http://www.wikidata.org/entity/Q19652'), ('copyright', 'public domain')]
[('bname', "L'Œuvre"), ('pname', 'copyright status'), ('cr', 'http://www.wikidata.org/entity/Q19652'), ('copyright', 'public domain')]
[('bname', 'La Vie de Bohème'), ('pname', 'copyright status'), ('cr', 'http://www.wikidata.org/entity/Q19652'), ('copyright', 'public domain')]
[('bname', 'Le Parfum de la dame

25

I discovered that "public domain" is the node wd:Q19652, so the triple pattern "?b wdt:P6216 wd:Q19652" retrieves all books ?b with a "public domain" copyright status. \
Before moving further, out of curiosity I retrieve all the types of copyright status present in the data.

In [61]:
queryString = """
SELECT DISTINCT ?cr ?copyright 
WHERE {
    
    ?b wdt:P50 ?w;
           wdt:P31 ?books;
           wdt:P6216 ?cr.
           
    ?w wdt:P106 wd:Q36180;
        wdt:P27 ?c.
        
    ?b <http://schema.org/name> ?bname.
    wdt:P6216 <http://schema.org/name> ?pname.
    ?cr <http://schema.org/name> ?copyright.
    
    FILTER(?c = wd:Q142 || ?c = wd:Q38 || ?c = wd:Q172579).
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    
}
"""

print("Results")
run_query(queryString)

Results
[('cr', 'http://www.wikidata.org/entity/Q19652'), ('copyright', 'public domain')]
[('cr', 'http://www.wikidata.org/entity/Q50423863'), ('copyright', 'copyrighted')]
[('cr', 'http://www.wikidata.org/entity/Q24238356'), ('copyright', 'unknown')]


3

It comes as no surprise that these are the only copyright statuses available.
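
The `FILTER(?x = ... || ?x = ...)` disjunctions repeated in these queries could also be written as SPARQL `VALUES` blocks. The sketch below only builds the query text. `values_clause` is a hypothetical helper that is not part of the notebook's setup, and the result has not been run against the endpoint.

```python
def values_clause(var, qids):
    """Build a SPARQL VALUES block binding ?var to the given Wikidata QIDs."""
    terms = " ".join(f"wd:{qid}" for qid in qids)
    return f"VALUES ?{var} {{ {terms} }}"


# QIDs copied from the FILTER expressions used in the queries above.
book_types = values_clause("books", ["Q571", "Q7725634", "Q47461344"])
countries = values_clause("c", ["Q142", "Q38", "Q172579"])

print(book_types)
print(countries)
```

Each generated line could replace the corresponding `FILTER` inside the `WHERE` block, assuming the endpoint supports `VALUES`.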

### Number of books written by Italian authors with a "public domain" copyright
I now count how many books written by Italian authors have a "public domain" copyright status (wd:Q19652). I assume that "Italian authors" means authors from both Italy (wd:Q38) and the Kingdom of Italy (wd:Q172579).

In [62]:
queryString = """
SELECT ?country COUNT(DISTINCT ?b) AS ?freeBooks
WHERE {
    
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P6216 wd:Q19652.
           
    ?w wdt:P106 wd:Q36180;
       wdt:P27 ?c.
        
    ?c <http://schema.org/name> ?country.
    
    FILTER(?c = wd:Q38 || ?c = wd:Q172579).
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    
}
GROUP BY ?country
ORDER BY DESC(?freeBooks)
"""

print("Results")
run_query(queryString)

Results
[('country', 'Kingdom of Italy'), ('freeBooks', '500')]
[('country', 'Italy'), ('freeBooks', '42')]


2

As expected, authors from the Kingdom of Italy have many books in the public domain: these are older books, and a work enters the public domain a certain amount of time after its author's death. \
Out of curiosity, I compare the number of "public domain" books with the total number of books for each country.

In [63]:
queryString = """
SELECT ?country COUNT(DISTINCT ?b) AS ?freeBooks ?totBooks
WHERE {
    
    ?b wdt:P50 ?w;
       wdt:P31 ?books;
       wdt:P6216 wd:Q19652.
           
    ?w wdt:P106 wd:Q36180;
       wdt:P27 ?c.
       
    ?c <http://schema.org/name> ?country.
        
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    FILTER(?c = wd:Q38 || ?c = wd:Q172579).
    
    {
        SELECT ?country COUNT(DISTINCT ?b) AS ?totBooks
        WHERE{
            ?b wdt:P50 ?w;
               wdt:P31 ?books.
           
            ?w wdt:P106 wd:Q36180;
               wdt:P27 ?c.
               
            ?c <http://schema.org/name> ?country.
        
            FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
            FILTER(?c = wd:Q38 || ?c = wd:Q172579).
        }
        GROUP BY ?country
    }   
}
GROUP BY ?country ?totBooks
"""

print("Results")
run_query(queryString)

Results
[('country', 'Kingdom of Italy'), ('freeBooks', '500'), ('totBooks', '1652')]
[('country', 'Italy'), ('freeBooks', '42'), ('totBooks', '2699')]


2

This result confirms my assumption. More than 30% of the books by Kingdom of Italy authors (500 of 1652) have the "public domain" copyright status, whereas only about 1.6% of the books by authors from Italy (42 of 2699) do.
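
As a quick check of these percentages, the shares can be recomputed from the counts printed by the query. This is a minimal sketch; the numbers are copied by hand from the output above rather than fetched from the endpoint.

```python
# (public-domain books, total books) per country, copied from the query output.
counts = {
    "Kingdom of Italy": (500, 1652),
    "Italy": (42, 2699),
}

# Share of public-domain books, as a percentage of all books for that country.
shares = {
    country: 100 * free / total
    for country, (free, total) in counts.items()
}

for country, share in shares.items():
    print(f"{country}: {share:.1f}% public domain")
```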

## Nobel Prize Literature
This section contains the queries used to answer the fourth question: _How many Literature Nobel awards won authors from Italy and from the Kingdom of Italy?_ 

I recall from the previous workflows that writers have a property called "award received" (wdt:P166). I can therefore start by retrieving the Italian writers who received an award, to see what these triples look like.

In [64]:
queryString = """
SELECT DISTINCT ?wname ?pname ?o ?object
WHERE {
           
    ?w wdt:P106 wd:Q36180;
       wdt:P166 ?o;
       wdt:P27 ?c.
    
    ?w <http://schema.org/name> ?wname.
    wdt:P166 <http://schema.org/name> ?pname.
    ?o <http://schema.org/name> ?object.
    
    FILTER(?c = wd:Q38 || ?c = wd:Q172579).
    
}
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('wname', 'Mario Cavaliere'), ('pname', 'award received'), ('o', 'http://www.wikidata.org/entity/Q28871406'), ('object', 'Jugendliteraturpreis Altmühlfranken')]
[('wname', 'Renzo De Felice'), ('pname', 'award received'), ('o', 'http://www.wikidata.org/entity/Q14539974'), ('object', 'Knight Grand Cross of the Order of Merit of the Italian Republic')]
[('wname', 'Paolo Caccia Dominioni'), ('pname', 'award received'), ('o', 'http://www.wikidata.org/entity/Q15042072'), ('object', 'Gold Medal of Military Valour')]
[('wname', 'Renzo De Felice'), ('pname', 'award received'), ('o', 'http://www.wikidata.org/entity/Q3638154'), ('object', 'Italian Order of Merit for Culture and Art')]
[('wname', 'Magdi Allam'), ('pname', 'award received'), ('o', 'http://www.wikidata.org/entity/Q1158951'), ('object', 'Dan David Prize')]
[('wname', 'Eva Cantarella'), ('pname', 'award received'), ('o', 'http://www.wikidata.org/entity/Q788568'), ('object', 'Bagutta Prize')]
[('wname', 'Natalia Ginzburg'), ('

25

Now I count the distinct prizes won by Italian writers.

In [65]:
queryString = """
SELECT COUNT(DISTINCT ?o) AS ?distinctPrizes
WHERE {
           
    ?w wdt:P106 wd:Q36180;
       wdt:P166 ?o;
       wdt:P27 ?c.
    
    FILTER(?c = wd:Q38 || ?c = wd:Q172579).
    
}
"""

print("Results")
run_query(queryString)

Results
[('distinctPrizes', '521')]


1

Since the number is large, I retrieve the 15 prizes won by the most Italian writers.

In [66]:
queryString = """
SELECT ?prize COUNT(DISTINCT ?w) AS ?winners
WHERE {
           
    ?w wdt:P106 wd:Q36180;
       wdt:P166 ?o;
       wdt:P27 ?c.
    
    ?o <http://schema.org/name> ?prize.
    
    FILTER(?c = wd:Q38 || ?c = wd:Q172579).
    
}
GROUP BY ?prize
ORDER BY DESC(?winners)
LIMIT 15
"""

print("Results")
run_query(queryString)

Results
[('prize', 'Viareggio Prize'), ('winners', '89')]
[('prize', 'Order of Merit of the Italian Republic'), ('winners', '82')]
[('prize', 'Knight Grand Cross of the Order of Merit of the Italian Republic'), ('winners', '78')]
[('prize', 'Bagutta Prize'), ('winners', '75')]
[('prize', 'Strega Prize'), ('winners', '68')]
[('prize', 'Medal of Military Valour'), ('winners', '40')]
[('prize', 'Feltrinelli Prize'), ('winners', '34')]
[('prize', 'Italian Order of Merit for Culture and Art'), ('winners', '26')]
[('prize', "Ambrogino d'oro"), ('winners', '18')]
[('prize', 'Premio Campiello'), ('winners', '15')]
[('prize', 'Commander of the Order of Merit of the Italian Republic'), ('winners', '14')]
[('prize', 'Urania Award'), ('winners', '14')]
[('prize', 'Grand Officer of the Order of Merit of the Italian Republic'), ('winners', '13')]
[('prize', 'Napoli Prize'), ('winners', '13')]
[('prize', 'Gold Medal of Military Valour'), ('winners', '12')]


15

Only a few Italians have won the Nobel Prize in Literature, so it does not appear in the top 15. However, I know from the instructions that wd:Q37922 is the node for the Nobel Prize in Literature, so I check whether any Italian writers ?w match the triple pattern " ?w wdt:P166 wd:Q37922 ".

In [67]:
queryString = """
SELECT DISTINCT ?w ?writerName ?country
WHERE {
           
    ?w wdt:P106 wd:Q36180;
       wdt:P166 wd:Q37922;
       wdt:P27 ?c.
       
    ?w <http://schema.org/name> ?writerName.
    ?c <http://schema.org/name> ?country.
    
    
    FILTER(?c = wd:Q38 || ?c = wd:Q172579).
    
}
"""

print("Results")
run_query(queryString)

Results
[('w', 'http://www.wikidata.org/entity/Q7728'), ('writerName', 'Grazia Deledda'), ('country', 'Kingdom of Italy')]
[('w', 'http://www.wikidata.org/entity/Q43440'), ('writerName', 'Giosuè Carducci'), ('country', 'Kingdom of Italy')]
[('w', 'http://www.wikidata.org/entity/Q1403'), ('writerName', 'Luigi Pirandello'), ('country', 'Kingdom of Italy')]
[('w', 'http://www.wikidata.org/entity/Q83038'), ('writerName', 'Salvatore Quasimodo'), ('country', 'Kingdom of Italy')]
[('w', 'http://www.wikidata.org/entity/Q83038'), ('writerName', 'Salvatore Quasimodo'), ('country', 'Italy')]
[('w', 'http://www.wikidata.org/entity/Q765'), ('writerName', 'Dario Fo'), ('country', 'Italy')]


6

The query returns six rows but only five distinct writers, because Salvatore Quasimodo is listed under both Italy and the Kingdom of Italy. A quick check on Wikipedia confirms that all five won the Nobel Prize in Literature.
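
Because a writer can have more than one country of citizenship, the number of rows is not the number of laureates. The sketch below tallies the rows printed above, per country and overall. The rows are copied by hand from the output, with entity URIs shortened to their QIDs.

```python
# (writer QID, writer name, country) rows copied from the query output.
rows = [
    ("Q7728", "Grazia Deledda", "Kingdom of Italy"),
    ("Q43440", "Giosuè Carducci", "Kingdom of Italy"),
    ("Q1403", "Luigi Pirandello", "Kingdom of Italy"),
    ("Q83038", "Salvatore Quasimodo", "Kingdom of Italy"),
    ("Q83038", "Salvatore Quasimodo", "Italy"),
    ("Q765", "Dario Fo", "Italy"),
]

# Group writer QIDs by country; sets drop any repeated writer within a country.
laureates_by_country = {}
for qid, _name, country in rows:
    laureates_by_country.setdefault(country, set()).add(qid)

# Distinct writers overall, regardless of country.
distinct_laureates = {qid for qid, _name, _country in rows}

for country, qids in laureates_by_country.items():
    print(country, len(qids))
print("distinct writers:", len(distinct_laureates))
```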

## Vatican Library
This section contains the queries used to answer the last question: _Are there books from Literature Nobel Award winners which are not present in the Vatican Library? (if so, who is the author with more books not in the Vatican Library)?_

### Retrieving books stored in the Vatican Library
First of all, I need to understand how to determine whether a book is stored in the Vatican Library. I have the node for the Vatican Library (wd:Q213678), so I count how many properties it has in both directions.

In [68]:
queryString = """
SELECT COUNT(DISTINCT ?p) AS ?subjProps
WHERE {
           
    wd:Q213678 ?p ?o.
    
}
"""

print("Results")
run_query(queryString)

Results
[('subjProps', '79')]


1

In [70]:
queryString = """
SELECT COUNT(DISTINCT ?p) AS ?objProps
WHERE {
           
    ?s ?p wd:Q213678.
    
}
"""

print("Results")
run_query(queryString)

Results
[('objProps', '20')]


1

Since there are not many, I can list them all. I start with the properties that have the Vatican Library as object, ordered by the number of subjects linked through them.

In [71]:
queryString = """
SELECT ?p ?pname COUNT(DISTINCT ?s) AS ?subjs
WHERE {
           
    ?s ?p wd:Q213678.
    ?p <http://schema.org/name> ?pname.
    
}
GROUP BY ?p ?pname
ORDER BY DESC(?subjs)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P195'), ('pname', 'collection'), ('subjs', '61')]
[('p', 'http://www.wikidata.org/prop/direct/P276'), ('pname', 'location'), ('subjs', '24')]
[('p', 'http://www.wikidata.org/prop/direct/P108'), ('pname', 'employer'), ('subjs', '19')]
[('p', 'http://www.wikidata.org/prop/direct/P921'), ('pname', 'main subject'), ('subjs', '9')]
[('p', 'http://www.wikidata.org/prop/direct/P1416'), ('pname', 'affiliation'), ('subjs', '4')]
[('p', 'http://www.wikidata.org/prop/direct/P361'), ('pname', 'part of'), ('subjs', '3')]
[('p', 'http://www.wikidata.org/prop/direct/P9419'), ('pname', 'personal library at'), ('subjs', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P2378'), ('pname', 'issued by'), ('subjs', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P5869'), ('pname', 'model item'), ('subjs', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P1629'), ('pname', 'Wikidata item of this property'), ('subjs', '1')]
[('p', 'http://www.wikidata.or

18

The properties "owned by" (wdt:P127), "location" (wdt:P276) and "archives at" (wdt:P485) look promising. Before investigating them further, I also list all the properties that have the Vatican Library as subject, again ordered by the number of objects linked through them.

In [72]:
queryString = """
SELECT ?p ?pname COUNT(DISTINCT ?o) AS ?objs
WHERE {
           
    wd:Q213678 ?p ?o.
    ?p <http://schema.org/name> ?pname.
    
}
GROUP BY ?p ?pname
ORDER BY DESC(?objs)
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P463'), ('pname', 'member of'), ('objs', '3')]
[('p', 'http://www.wikidata.org/prop/direct/P268'), ('pname', 'Bibliothèque nationale de France ID'), ('objs', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P1207'), ('pname', 'NUKAT ID'), ('objs', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P1017'), ('pname', 'Vatican Library ID (former scheme)'), ('objs', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P9661'), ('pname', 'EBAF authority ID'), ('objs', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P2924'), ('pname', 'Great Russian Encyclopedia Online ID'), ('objs', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P4808'), ('pname', 'Royal Academy new identifier'), ('objs', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P856'), ('pname', 'official website'), ('objs', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P791'), ('pname', 'ISIL'), ('objs', '1')]
[('p', 'http://www.wikidata.org/prop/direct/P9037'), ('pname',

60

I found no useful property in this direction, so I check the types of the subjects linked to the Vatican Library through the three properties I selected earlier.

In [73]:
queryString = """
SELECT ?i ?instanceOf COUNT(DISTINCT ?s) AS ?subjs
WHERE {
           
    ?s wdt:P127|wdt:P276|wdt:P485 wd:Q213678;
        wdt:P31 ?i.
    ?i <http://schema.org/name> ?instanceOf.
    
}
GROUP BY ?i ?instanceOf
ORDER BY DESC(?subjs)
"""

print("Results")
run_query(queryString)

Results
[('i', 'http://www.wikidata.org/entity/Q48498'), ('instanceOf', 'illuminated manuscript'), ('subjs', '13')]
[('i', 'http://www.wikidata.org/entity/Q87167'), ('instanceOf', 'manuscript'), ('subjs', '5')]
[('i', 'http://www.wikidata.org/entity/Q5292'), ('instanceOf', 'encyclopedia'), ('subjs', '4')]
[('i', 'http://www.wikidata.org/entity/Q830560'), ('instanceOf', 'bestiary'), ('subjs', '4')]
[('i', 'http://www.wikidata.org/entity/Q213924'), ('instanceOf', 'codex'), ('subjs', '2')]
[('i', 'http://www.wikidata.org/entity/Q7725634'), ('instanceOf', 'literary work'), ('subjs', '2')]
[('i', 'http://www.wikidata.org/entity/Q856638'), ('instanceOf', 'library catalog'), ('subjs', '1')]
[('i', 'http://www.wikidata.org/entity/Q1754581'), ('instanceOf', 'Evangeliary'), ('subjs', '1')]
[('i', 'http://www.wikidata.org/entity/Q3052382'), ('instanceOf', 'fonds'), ('subjs', '1')]
[('i', 'http://www.wikidata.org/entity/Q5'), ('instanceOf', 'human'), ('subjs', '1')]
[('i', 'http://www.wikidata.org

17

The subjects of these properties are mostly books and manuscripts, so I can use these three properties to retrieve the books stored in the Vatican Library.
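
The three properties can be combined in a single SPARQL property path with `|`, which matches an item linked to the library by any of them. The sketch below only builds the query text for an ASK check on one author. `stored_in_pattern` and `ask_author_in_library` are hypothetical helpers, not part of the notebook's setup.

```python
def stored_in_pattern(subject_var, property_ids, library_qid):
    """Triple pattern matching items linked to the library by any given property."""
    path = "|".join(f"wdt:{pid}" for pid in property_ids)
    return f"?{subject_var} {path} wd:{library_qid} ."


def ask_author_in_library(author_qid):
    """ASK query text: is any work by this author linked to the Vatican Library?"""
    pattern = stored_in_pattern("b", ["P127", "P276", "P485"], "Q213678")
    return f"ASK {{ ?b wdt:P50 wd:{author_qid} . {pattern} }}"


# Q1403 is Luigi Pirandello in the Nobel Prize results above.
query = ask_author_in_library("Q1403")
print(query)
```

In the notebook, this text could presumably be passed to the `run_ask_query` function defined in the setup cell.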

### Retrieving authors who won a Nobel Literature Prize whose books are not stored in the Vatican Library

I reuse the query from the previous question to retrieve the authors who won the Nobel Prize in Literature, and I check whether any of their books are stored in the Vatican Library.

In [74]:
queryString = """
SELECT DISTINCT ?w ?writerName
WHERE {
           
    ?w wdt:P106 wd:Q36180;
       wdt:P166 wd:Q37922;
       wdt:P27 ?c.
       
    ?w <http://schema.org/name> ?writerName.
    
    FILTER(?c = wd:Q38 || ?c = wd:Q172579).
    
    FILTER NOT EXISTS{
        ?b wdt:P50 ?w;
           wdt:P127|wdt:P276|wdt:P485 wd:Q213678;
           wdt:P31 ?books.
        FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    }
    
}
"""

print("Results")
run_query(queryString)

Results
[('w', 'http://www.wikidata.org/entity/Q7728'), ('writerName', 'Grazia Deledda')]
[('w', 'http://www.wikidata.org/entity/Q43440'), ('writerName', 'Giosuè Carducci')]
[('w', 'http://www.wikidata.org/entity/Q1403'), ('writerName', 'Luigi Pirandello')]
[('w', 'http://www.wikidata.org/entity/Q765'), ('writerName', 'Dario Fo')]
[('w', 'http://www.wikidata.org/entity/Q83038'), ('writerName', 'Salvatore Quasimodo')]


5

It turns out that none of the authors who won the Nobel Prize in Literature has a book stored in the Vatican Library. I now retrieve the author with the most books not stored in the Vatican Library.

In [75]:
queryString = """
SELECT ?writerName COUNT(DISTINCT ?b) AS ?numberBooks
WHERE {
           
    ?w wdt:P106 wd:Q36180;
       wdt:P166 wd:Q37922;
       wdt:P27 ?c.
    
    ?b wdt:P50 ?w;
       wdt:P31 ?books.
       
    ?w <http://schema.org/name> ?writerName.
    
    FILTER(?c = wd:Q38 || ?c = wd:Q172579).
    FILTER(?books = wd:Q571 || ?books = wd:Q7725634 || ?books = wd:Q47461344).
    
    FILTER NOT EXISTS{ ?b wdt:P127|wdt:P276|wdt:P485 wd:Q213678. }
    
}
GROUP BY ?writerName
ORDER BY DESC(?numberBooks)
LIMIT 1
"""

print("Results")
run_query(queryString)

Results
[('writerName', 'Luigi Pirandello'), ('numberBooks', '45')]


1

Luigi Pirandello is the Nobel Prize in Literature winner with the highest number of books not stored in the Vatican Library.
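
The `run_query` helper in the setup cell prints each row and returns only the row count. To reuse a result such as this one in Python, the JSON bindings could be flattened into plain dictionaries. This is a minimal sketch: `bindings_to_rows` is a hypothetical helper, and `sample` is a hand-written payload that mirrors the row printed above and follows the `results` / `bindings` / `value` layout that `run_query` already reads.

```python
def bindings_to_rows(json_results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in json_results["results"]["bindings"]
    ]


# Hand-written payload mirroring the single row printed by the last query.
sample = {
    "head": {"vars": ["writerName", "numberBooks"]},
    "results": {
        "bindings": [
            {
                "writerName": {"type": "literal", "value": "Luigi Pirandello"},
                "numberBooks": {"type": "literal", "value": "45"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
top = rows[0]
# Values arrive as strings, so numeric columns need an explicit conversion.
print(top["writerName"], int(top["numberBooks"]))
```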