# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```

is the BGP returning the human-readable name of a property or a class in Wikidata.

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-ab8f3961c4-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        bindings = json_results['results']['bindings']
        if len(bindings) == 0:
            print("Empty")
            return 0

        # print each row as a list of (variable, value) pairs
        for binding in bindings:
            print([(var, value['value']) for var, value in binding.items()])

        return len(bindings)

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
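`run_query` prints each result row as a list of `(variable, value)` pairs and returns the number of rows. The sketch below reproduces that flattening offline, so it can be checked without the endpoint. The helper name `bindings_to_rows` and the sample response are hypothetical; the sample is shaped after the SPARQL 1.1 Query Results JSON Format, which is what `SPARQLWrapper` converts when the return format is `JSON`.

```python
from typing import Any, Dict, List, Tuple


def bindings_to_rows(json_results: Dict[str, Any]) -> List[List[Tuple[str, str]]]:
    """Flatten SPARQL JSON results into rows of (variable, value) pairs.

    Mirrors the printing loop of run_query (hypothetical helper, not used above).
    """
    rows = []
    for binding in json_results["results"]["bindings"]:
        # Unbound variables (e.g. from OPTIONAL or a failed BIND) are simply
        # absent from the binding, so the row gets shorter.
        rows.append([(var, cell["value"]) for var, cell in binding.items()])
    return rows


# Hypothetical response in the SPARQL 1.1 JSON results layout.
sample = {
    "head": {"vars": ["country", "countryName"]},
    "results": {
        "bindings": [
            {
                "country": {"type": "uri", "value": "http://www.wikidata.org/entity/Q38"},
                "countryName": {"type": "literal", "value": "Italy"},
            },
            {"country": {"type": "uri", "value": "http://www.wikidata.org/entity/Q142"}},
        ]
    },
}

rows = bindings_to_rows(sample)
for row in rows:
    print(row)
print(len(rows))
```

The second row shows why some lines in the outputs below have fewer pairs than others: a variable that is not bound is missing from the binding rather than reported as empty.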

# Book Workflow Series ("Nobel laureates explorative search") 


Consider the following exploratory scenario:

>  Investigate the authors who won the Nobel award for literature and get information about the nationality of the winners and analyse their literary production



## Useful URIs for the current workflow
The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty of   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P106`    | occupation    | predicate |
| `wdt:P279`    | subclass of   | predicate |
| `wdt:P17`     | country       | predicate |
| `wdt:P27`     | country of citizenship   | predicate |
| `wd:Q37922`   | Nobel Prize in Literature     | node |
| `wd:Q36180`   | writer        | node |
| `wd:Q38`      | Italy         | node |
| `wd:Q3624078` | sovereign state  | node |
| `wd:Q213678`  | Vatican Library  | node |


Also consider

```
?p wdt:P27 wd:Q38  . 
?p wdt:P106/wdt:P279* wd:Q36180  . 
```

is the BGP to retrieve all **Italian authors**.
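In this BGP, `/` chains two steps and `*` repeats `wdt:P279` zero or more times, so a person matches if one of their occupations is *writer* itself or any (transitive) subclass of it. A minimal Python sketch of that evaluation over a hypothetical toy graph (the class names and people below are made up, not Wikidata data):

```python
from collections import deque
from typing import Dict, Set


def subclass_closure(start: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """Classes reachable from `start` through zero or more subclass-of edges (P279*)."""
    seen = {start}  # zero steps: the class itself is included
    queue = deque([start])
    while queue:
        cls = queue.popleft()
        for parent in subclass_of.get(cls, set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def matches_writer_path(person: str, occupation: Dict[str, Set[str]],
                        subclass_of: Dict[str, Set[str]], writer: str = "writer") -> bool:
    """True if some occupation of `person` reaches `writer` via P279* (P106/P279*)."""
    return any(writer in subclass_closure(occ, subclass_of)
               for occ in occupation.get(person, set()))


# Hypothetical toy graph.
subclass_of = {"novelist": {"writer"}, "poet": {"writer"}, "writer": {"artist"}}
occupation = {"A": {"novelist"}, "B": {"physicist"}, "C": {"writer"}, "D": {"artist"}}

matching = sorted(p for p in occupation if matches_writer_path(p, occupation, subclass_of))
print(matching)  # ['A', 'C']
```

D is excluded because the path only climbs from an occupation towards its superclasses: having an occupation broader than *writer* is not enough.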

The workload should


1. Identify the BGP for obtaining the Nobel laureates and the nominees with their relevant attributes.

2. Get the number of Italian, French and Iranian winners and check which nation won more awards in the last twenty years.

3. Determine how many Literature Nobel laureates are citizens of the following countries: Italy, Germany, France, Romania, Denmark, Iran, and China.

4. How many Literature Nobel award winners have a PhD (Doctor of Philosophy)? (You may check whether they have a doctoral advisor.)

5. Are there books by Literature Nobel award winners that are not present in the Vatican Library? If so, which author has the most books not in the Vatican Library?

## 1. Identify the BGP for obtaining the Nobel laureates and the nominees with their relevant attributes

First, I retrieve ten arbitrary Italian authors.

In [2]:
queryString = """
select distinct * where {
    ?p wdt:P27 wd:Q38 .
    ?p wdt:P106/wdt:P279* wd:Q36180 .
    
    ?p <http://schema.org/name> ?name .
} limit 10
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q61981709'), ('name', 'Francesco Lo Coco')]
[('p', 'http://www.wikidata.org/entity/Q16185271'), ('name', 'Antonello Bonci')]
[('p', 'http://www.wikidata.org/entity/Q43094174'), ('name', 'Cecilia Guariglia')]
[('p', 'http://www.wikidata.org/entity/Q59528334'), ('name', 'Mauro Mancia')]
[('p', 'http://www.wikidata.org/entity/Q48362107'), ('name', 'Fabrizio Doricchi')]
[('p', 'http://www.wikidata.org/entity/Q104212291'), ('name', 'Manuela Musco')]
[('p', 'http://www.wikidata.org/entity/Q102047173'), ('name', 'Davide Del Popolo Riolo')]
[('p', 'http://www.wikidata.org/entity/Q90217125'), ('name', 'Giovanna Tosato')]
[('p', 'http://www.wikidata.org/entity/Q96185295'), ('name', 'Enrica Petrini')]
[('p', 'http://www.wikidata.org/entity/Q22582937'), ('name', 'Lorenzo Losa')]


10

List all the properties related to Italian authors.

In [3]:
queryString = """
select distinct ?p ?pName where {
    ?author wdt:P27 wd:Q38 ;
            wdt:P106/wdt:P279* wd:Q36180 ;
            ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P9199'), ('pName', 'Protagonisti della storia delle scienze della mente ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2798'), ('pName', 'Loop ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3835'), ('pName', 'Mendeley person ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1005'), ('pName', 'Portuguese National Library ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1006'), ('pName', 'Nationale Thesaurus voor Auteurs ID')]
[('p', 'http://www.wikidata.org/prop/direct/P101'), ('pName', 'field of work')]
[('p', 'http://www.wikidata.org/prop/direct/P1015'), ('pName', 'NORAF ID')]
[('p', 'http://www.wikidata.org/prop/direct/P103'), ('pName', 'native language')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pName', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P1066'), ('pName', 'student of')]
[('p', 'http://www.wikidata.org/prop/direct/P108'), ('pName', 'employer')]
[('p', 'http://www.wikidata.org/prop/dir

1397

Search for properties related to winners or nominees.

In [4]:
queryString = """
select distinct ?p ?pName where {
    ?author wdt:P27 wd:Q38 ;
            wdt:P106/wdt:P279* wd:Q36180 ;
            ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    
    filter (regex(?pName, "win|nomin|rece", "i")).
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('pName', 'award received')]
[('p', 'http://www.wikidata.org/prop/direct/P1411'), ('pName', 'nominated for')]
[('p', 'http://www.wikidata.org/prop/direct/P6150'), ('pName', 'Academy Awards Database nominee ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3360'), ('pName', 'Nobel Prize People Nomination ID')]
[('p', 'http://www.wikidata.org/prop/direct/P4286'), ('pName', 'Nominis saint ID')]
[('p', 'http://www.wikidata.org/prop/direct/P5645'), ('pName', 'Académie française award winner ID')]
[('p', 'http://www.wikidata.org/prop/direct/P7184'), ('pName', 'Awards & Winners artist ID')]
[('p', 'http://www.wikidata.org/prop/direct/P5719'), ('pName', 'National Medal of Arts winner ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1346'), ('pName', 'winner')]


9

So I get the properties *award received (**P166**)*, *winner (**P1346**)* and *nominated for (**P1411**)*. To get the people who won a Nobel Prize, I first need the identifier of the Nobel Prize itself. I can find it by looking at the classes of which the Nobel Prize in Literature is an instance or a subclass.

In [5]:
queryString = """
select ?p ?award ?awardName where {
    values ?p { wdt:P31 wdt:P279 } .
    
    wd:Q37922 ?p ?award.
    
    ?award <http://schema.org/name> ?awardName .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('award', 'http://www.wikidata.org/entity/Q7191'), ('awardName', 'Nobel Prize')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('award', 'http://www.wikidata.org/entity/Q378427'), ('awardName', 'literary award')]


2

So I need to look for awards that are subclasses of *Nobel Prize (**Q7191**)*. Let's count the writers who have been nominated for or have won a Nobel Prize.

In [6]:
queryString = """
select ?p (count(distinct ?author) as ?numAuthors) where {
    values ?p { wdt:P1411 wdt:P1346 wdt:P166 } .

    ?author wdt:P106/wdt:P279* wd:Q36180 ;
            ?p ?award .
    ?award wdt:P279 wd:Q7191 .
} group by ?p
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('numAuthors', '901')]
[('p', 'http://www.wikidata.org/prop/direct/P1411'), ('numAuthors', '2132')]


2

So the property *winner* never links an author to a Nobel Prize: from the author's side I only get *award received* and *nominated for*. To check whether *winner* is used in the opposite direction, from the prize to its winners, I look at the properties of the Nobel Prizes themselves.

In [7]:
queryString = """
select distinct ?p ?pName where {
    ?award ?p ?o ;
            wdt:P279 wd:Q7191 .
    
    ?p <http://schema.org/name> ?pName .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P6291'), ('pName', 'advertises')]
[('p', 'http://www.wikidata.org/prop/direct/P112'), ('pName', 'founded by')]
[('p', 'http://www.wikidata.org/prop/direct/P1296'), ('pName', 'Gran Enciclopèdia Catalana ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1346'), ('pName', 'winner')]
[('p', 'http://www.wikidata.org/prop/direct/P138'), ('pName', 'named after')]
[('p', 'http://www.wikidata.org/prop/direct/P1417'), ('pName', 'Encyclopædia Britannica Online ID')]
[('p', 'http://www.wikidata.org/prop/direct/P155'), ('pName', 'follows')]
[('p', 'http://www.wikidata.org/prop/direct/P1617'), ('pName', 'BBC Things ID')]
[('p', 'http://www.wikidata.org/prop/direct/P17'), ('pName', 'country')]
[('p', 'http://www.wikidata.org/prop/direct/P1705'), ('pName', 'native label')]
[('p', 'http://www.wikidata.org/prop/direct/P18'), ('pName', 'image')]
[('p', 'http://www.wikidata.org/prop/direct/P2257'), ('pName', 'event interval')]
[('p', 'http://www.wikidat

53

So the property *winner (**P1346**)* connects the Nobel Prize to the authors, in the direction prize to winner. The BGP for the authors who won or have been nominated for the Nobel Prize in Literature is:

```
?author ^wdt:P1346|wdt:P1411|wdt:P166 wd:Q37922 .
```
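The `^` operator reverses the direction of `wdt:P1346`, and `|` takes the union of the alternatives, so the path matches `wd:Q37922 wdt:P1346 ?author` as well as `?author wdt:P1411 wd:Q37922` and `?author wdt:P166 wd:Q37922`. A small sketch that evaluates the same path over hypothetical triples (the `ex:` names are made up):

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def path_subjects(triples: List[Triple], target: str) -> Set[str]:
    """Values of ?author for `?author ^wdt:P1346|wdt:P1411|wdt:P166 target`."""
    found = set()
    for s, p, o in triples:
        if p == "wdt:P1346" and s == target:
            found.add(o)  # inverse step: target wdt:P1346 ?author
        elif p in ("wdt:P1411", "wdt:P166") and o == target:
            found.add(s)  # forward steps: ?author wdt:P1411|wdt:P166 target
    return found


triples = [
    ("wd:Q37922", "wdt:P1346", "ex:alice"),
    ("ex:alice", "wdt:P166", "wd:Q37922"),   # same person via a second property
    ("ex:bob", "wdt:P1411", "wd:Q37922"),
    ("ex:carol", "wdt:P166", "wd:Q37922"),
    ("ex:dave", "wdt:P1346", "wd:Q37922"),   # wrong direction: not matched
]
authors = path_subjects(triples, "wd:Q37922")
print(sorted(authors))  # ['ex:alice', 'ex:bob', 'ex:carol']
```

As with `select distinct`, a person reached through several alternatives (here `ex:alice`) appears only once in the set.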

The BGP for the winners is:

```
?author ^wdt:P1346|wdt:P166 wd:Q37922 .
```

And for the nominations:

```
?author wdt:P1411 wd:Q37922 .
```

An alternative BGP, which lets us distinguish between nominations and wins, is:

```
values ?p { wdt:P1411 wdt:P1346 wdt:P166 } .
{ ?author ?p wd:Q37922 . } union { wd:Q37922 ?p ?author } .
```

Or, combining `wdt:P279` in a sequence path (`/`), for example in the BGP for the nominations to any Nobel Prize:

```
?author wdt:P1411/wdt:P279 wd:Q7191 .
```
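The operator between the two properties matters: `|` is an alternative, where each property on its own must lead to `wd:Q7191`, while `/` is a sequence, where `wdt:P1411` leads to some award that is then a `wdt:P279` subclass of `wd:Q7191`. A sketch on two triples (one hypothetical nominee, `ex:bob`, plus the subclass link found in In [5]) shows that only the sequence returns the nominee:

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def alternative(triples: List[Triple], obj: str) -> Set[str]:
    """?x wdt:P1411|wdt:P279 obj : a single edge of either property ending at obj."""
    return {s for s, p, o in triples if p in ("wdt:P1411", "wdt:P279") and o == obj}


def sequence(triples: List[Triple], obj: str) -> Set[str]:
    """?x wdt:P1411/wdt:P279 obj : ?x wdt:P1411 ?mid . ?mid wdt:P279 obj ."""
    mids = {s for s, p, o in triples if p == "wdt:P279" and o == obj}
    return {s for s, p, o in triples if p == "wdt:P1411" and o in mids}


triples = [
    ("ex:bob", "wdt:P1411", "wd:Q37922"),   # hypothetical nominee
    ("wd:Q37922", "wdt:P279", "wd:Q7191"),  # Literature prize subclass of Nobel Prize
]
print(sorted(alternative(triples, "wd:Q7191")))  # ['wd:Q37922']
print(sorted(sequence(triples, "wd:Q7191")))     # ['ex:bob']
```

With `|` the pattern returns the award class itself instead of the person, which is why the sequence form is the one that expresses "nominated for some Nobel Prize".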

For Nobel Prizes in general, instead of `wd:Q37922` I need to use the variable `?award` and add the following BGP:

```
?award wdt:P279 wd:Q7191 .
```

The next two queries count the total number of Nobel laureates and nominees using the two different BGPs. The second one lets me separate winners from nominees.

In [8]:
queryString = """
select (count(distinct ?author) as ?numAuthors) where {
    ?author ^wdt:P1346|wdt:P1411|wdt:P166 ?award .
    
    ?award wdt:P279 wd:Q7191 .
}
"""

print("Results")
run_query(queryString)

Results
[('numAuthors', '2946')]


1

In [9]:
queryString = """
select ?p (count(distinct ?author) as ?numAuthors) where {
    values ?p { wdt:P1411 wdt:P1346 wdt:P166 } .
    
    { ?author ?p ?award . } union { ?award ?p ?author } .
    
    ?award wdt:P279 wd:Q7191 .
} group by ?p
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('numAuthors', '958')]
[('p', 'http://www.wikidata.org/prop/direct/P1346'), ('numAuthors', '178')]
[('p', 'http://www.wikidata.org/prop/direct/P1411'), ('numAuthors', '2395')]


3

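The total of 2946 Nobel laureates and nominees is lower than 3531, the sum of the per-property counts of the second query, because the same person can be counted under more than one property: some people both won and were nominated for a Nobel Prize, and a winner can be linked both through *award received* and through *winner*.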
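The gap between the two totals can be made explicit: summing the per-property counts of In [9] counts each person once per property linking them to a prize, while In [8] counts each person once. A short check with the numbers reported above, assuming both queries reach the same set of people, plus a toy example with hypothetical people:

```python
# Counts reported by In [9] (per property) and In [8] (distinct people).
per_property = {"wdt:P166": 958, "wdt:P1346": 178, "wdt:P1411": 2395}
distinct_people = 2946

summed = sum(per_property.values())
# Assuming the same people in both queries: sum over people of (links - 1).
extra = summed - distinct_people
print(summed, extra)  # 3531 585

# Toy illustration with hypothetical people: summing set sizes double counts overlaps.
award_received = {"a", "b"}
winner = {"a"}
nominated_for = {"a", "c"}
summed_toy = len(award_received) + len(winner) + len(nominated_for)
distinct_toy = len(award_received | winner | nominated_for)
print(summed_toy, distinct_toy)  # 5 3
```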

I can also count the winners of each Nobel Prize category.

In [10]:
queryString = """
select ?awardName (count(distinct ?author) as ?numAuthors) where {
    ?author ^wdt:P1346|wdt:P166 ?award .
    
    ?award wdt:P279 wd:Q7191 .
    
    ?award <http://schema.org/name> ?awardName .
} group by ?awardName
"""

print("Results")
run_query(queryString)

Results
[('awardName', 'Nobel Peace Prize'), ('numAuthors', '134')]
[('awardName', 'Prize in Economic Sciences in Memory of Alfred Nobel'), ('numAuthors', '86')]
[('awardName', 'Nobel Prize in Physics'), ('numAuthors', '215')]
[('awardName', 'Nobel Prize in Literature'), ('numAuthors', '118')]
[('awardName', 'Nobel Prize in Chemistry'), ('numAuthors', '185')]
[('awardName', 'Nobel Prize in Physiology or Medicine'), ('numAuthors', '222')]


6

As for the attributes of the laureates, I first retrieve a few Nobel winners and then list all the properties connected to two of them.

In [11]:
queryString = """
select distinct ?author ?authorName where {
    ?author ^wdt:P1346|wdt:P166 ?award .
    
    ?award wdt:P279 wd:Q7191 .
    ?author <http://schema.org/name> ?authorName .
} limit 5
"""

print("Results")
run_query(queryString)

Results
[('author', 'http://www.wikidata.org/entity/Q115448'), ('authorName', 'Daniel Bovet')]
[('author', 'http://www.wikidata.org/entity/Q15900993'), ('authorName', 'Barry C. Barish')]
[('author', 'http://www.wikidata.org/entity/Q106751'), ('authorName', 'Alan J. Heeger')]
[('author', 'http://www.wikidata.org/entity/Q76766'), ('authorName', 'Gerd Binnig')]
[('author', 'http://www.wikidata.org/entity/Q192708'), ('authorName', 'Robert Coleman Richardson')]


5

In [12]:
queryString = """
select ?author ?p ?pName ?o ?oName where {
    values ?author { wd:Q115448 wd:Q106751 }
    
    ?author ?p ?o .
    
    optional { ?p <http://schema.org/name> ?pName } .
    optional { ?o <http://schema.org/name> ?oName } .
} order by ?author
"""

print("Results")
run_query(queryString)

Results
[('author', 'http://www.wikidata.org/entity/Q106751'), ('p', 'http://www.wikidata.org/prop/direct/P101'), ('pName', 'field of work'), ('o', 'http://www.wikidata.org/entity/Q2329'), ('oName', 'chemistry')]
[('author', 'http://www.wikidata.org/entity/Q106751'), ('p', 'http://www.wikidata.org/prop/direct/P101'), ('pName', 'field of work'), ('o', 'http://www.wikidata.org/entity/Q413'), ('oName', 'physics')]
[('author', 'http://www.wikidata.org/entity/Q106751'), ('p', 'http://www.wikidata.org/prop/direct/P106'), ('pName', 'occupation'), ('o', 'http://www.wikidata.org/entity/Q1622272'), ('oName', 'university teacher')]
[('author', 'http://www.wikidata.org/entity/Q106751'), ('p', 'http://www.wikidata.org/prop/direct/P106'), ('pName', 'occupation'), ('o', 'http://www.wikidata.org/entity/Q169470'), ('oName', 'physicist')]
[('author', 'http://www.wikidata.org/entity/Q106751'), ('p', 'http://www.wikidata.org/prop/direct/P106'), ('pName', 'occupation'), ('o', 'http://www.wikidata.org/entit

194

The most relevant property related to the Nobel Prize is probably *Nobel prize ID (**P3188**)*, whose values have the form `{category}/laureates/{year}/{surname}`.

I can therefore try to extract the year in which each laureate won the prize with a regular expression.

In [13]:
queryString = """
select distinct ?author ?nobelID ?yearWin where {
    values ?author { wd:Q115448 wd:Q106751 }
    
    ?author wdt:P3188 ?nobelID .
    
    bind (xsd:gYear(replace(?nobelID, "[a-zA-Z0-9]+/laureates/([0-9]+)/.*", "$1")) as ?yearWin)
} order by ?author
"""

print("Results")
run_query(queryString)

Results
[('author', 'http://www.wikidata.org/entity/Q106751'), ('nobelID', 'chemistry/laureates/2000/heeger'), ('yearWin', '2000-01-01')]
[('author', 'http://www.wikidata.org/entity/Q115448'), ('nobelID', 'medicine/laureates/1957/bovet'), ('yearWin', '1957-01-01')]


2

So the final pattern to obtain the year in which a laureate won a Nobel Prize is:

```
?author wdt:P3188 ?nobelID .
bind (xsd:gYear(replace(?nobelID, "[a-zA-Z0-9]+/laureates/([0-9]+)/.*", "$1")) as ?yearWin)
```
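The same extraction can be mirrored in Python to see how it behaves. SPARQL `replace` follows XPath `fn:replace`, which returns the input unchanged when the pattern does not match; the cast then fails and the `bind` leaves `?yearWin` unbound. That is presumably why, for example, Emmanuelle Charpentier appears in In [16] without a `?yearWin` even though she has a Nobel prize ID. A minimal sketch (the function name is hypothetical; Python writes the group reference as `\1` instead of SPARQL's `$1`):

```python
import re
from typing import Optional

# Same pattern as in the BIND above; like SPARQL regex, re.sub is not anchored.
NOBEL_ID_YEAR = re.compile(r"[a-zA-Z0-9]+/laureates/([0-9]+)/.*")


def year_from_nobel_id(nobel_id: str) -> Optional[int]:
    """Mirror of casting replace(?nobelID, pattern, "$1"); None plays the role of unbound."""
    replaced = NOBEL_ID_YEAR.sub(r"\1", nobel_id)  # unchanged if there is no match
    return int(replaced) if replaced.isdigit() else None


print(year_from_nobel_id("chemistry/laureates/2000/heeger"))  # 2000
print(year_from_nobel_id("medicine/laureates/1957/bovet"))    # 1957
print(year_from_nobel_id("an-id-in-another-format"))          # None
```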

I can also check whether this property is always present or not.

In [37]:
queryString = """
select ?totalNobels ?nobelsWithID where {
    {
        select (count(distinct ?nobel) as ?totalNobels) where {
            ?nobel ^wdt:P1346|wdt:P166 ?award .
    
            ?award wdt:P279 wd:Q7191 .
        }
    } .
    {
        select (count(distinct ?yearWin) as ?nobelsWithID) where {
            ?nobel ^wdt:P1346|wdt:P166 ?award ;
                    wdt:P3188 ?nobelID .            
            ?award wdt:P279 wd:Q7191 .
            
            bind (xsd:gYear(replace(?nobelID, "[a-zA-Z0-9]+/laureates/([0-9]+)/.*", "$1")) as ?yearWin)
        }
    }
}
"""

print("Results")
run_query(queryString)

Results
[('totalNobels', '958'), ('nobelsWithID', '115')]


1

In [38]:
queryString = """
select ?countryName ?totalNobels ?nobelsWithID where {
    {
        select ?country (count(distinct ?nobel) as ?totalNobels) where {
            ?nobel ^wdt:P1346|wdt:P166 ?award ;
                    wdt:P27 ?country .
    
            ?award wdt:P279 wd:Q7191 .
        } group by ?country
    } .
    {
        select ?country (count(distinct ?yearWin) as ?nobelsWithID) where {
            ?nobel ^wdt:P1346|wdt:P166 ?award ;
                    wdt:P3188 ?nobelID ;
                    wdt:P27 ?country .
            
            ?award wdt:P279 wd:Q7191 .
            
            bind (xsd:gYear(replace(?nobelID, "[a-zA-Z0-9]+/laureates/([0-9]+)/.*", "$1")) as ?yearWin)
        } group by ?country
    }
    
    ?country <http://schema.org/name> ?countryName .
} order by ?countryName
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'Albania'), ('totalNobels', '1'), ('nobelsWithID', '1')]
[('countryName', 'Argentina'), ('totalNobels', '5'), ('nobelsWithID', '5')]
[('countryName', 'Australia'), ('totalNobels', '12'), ('nobelsWithID', '10')]
[('countryName', 'Austria'), ('totalNobels', '18'), ('nobelsWithID', '12')]
[('countryName', 'Austria-Hungary'), ('totalNobels', '10'), ('nobelsWithID', '8')]
[('countryName', 'Austrian Empire'), ('totalNobels', '1'), ('nobelsWithID', '1')]
[('countryName', 'Azerbaijan Democratic Republic'), ('totalNobels', '1'), ('nobelsWithID', '1')]
[('countryName', 'Baltic Germans'), ('totalNobels', '1'), ('nobelsWithID', '1')]
[('countryName', 'Bangladesh'), ('totalNobels', '1'), ('nobelsWithID', '1')]
[('countryName', 'Belarus'), ('totalNobels', '1'), ('nobelsWithID', '1')]
[('countryName', 'Belgium'), ('totalNobels', '10'), ('nobelsWithID', '9')]
[('countryName', 'Bizone'), ('totalNobels', '1'), ('nobelsWithID', '0')]
[('countryName', 'Brazil'), ('totalNobels', '1

110

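So the year in which the prize was won can be retrieved only in some cases, but the Nobel prize ID is the only way to obtain this information for the next questions. Note, however, that the second subquery counts `distinct ?yearWin`, i.e. distinct award years rather than distinct laureates: 115 is the number of different years covered by the laureates that have an ID, and the per-country `?nobelsWithID` values are also numbers of distinct years. The gap with `?totalNobels` is therefore not only due to missing IDs but also to laureates who won in the same year, which probably explains why it is so large for the USA. Counting `distinct ?nobel` in the second subquery would give the number of laureates that actually have an ID.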
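In In [37] and In [38] the second subquery aggregates with `count(distinct ?yearWin)`, which counts distinct award years rather than distinct laureates; the two numbers coincide only when no two laureates in the same group won in the same year. A tiny illustration, using the 2000 Chemistry laureates as an example of a shared year (the pairs are illustrative input, not query output):

```python
# (laureate, award year) pairs; three laureates share the year 2000.
laureates = [
    ("Heeger", 2000),
    ("MacDiarmid", 2000),
    ("Shirakawa", 2000),
    ("Bovet", 1957),
]

distinct_laureates = len({name for name, _ in laureates})  # like count(distinct ?nobel)
distinct_years = len({year for _, year in laureates})      # like count(distinct ?yearWin)
print(distinct_laureates, distinct_years)  # 4 2
```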

## 2. Get the number of Italian, French and Iranian winners and check which nation won more awards in the last twenty years.

First, I need the identifiers of Italy, France and Iran. I can find them by looking for the countries that are instances of *Sovereign state (**Q3624078**)*.

In [15]:
queryString = """
select ?country ?countryName where {
    ?country wdt:P31 wd:Q3624078 .

    ?country <http://schema.org/name> ?countryName .
        
    filter regex(?countryName, "[Ii]tal|[Ff]ranc|[Ii]ran") .
}
"""

print("Results")
run_query(queryString)

Results
[('country', 'http://www.wikidata.org/entity/Q142'), ('countryName', 'France')]
[('country', 'http://www.wikidata.org/entity/Q38'), ('countryName', 'Italy')]
[('country', 'http://www.wikidata.org/entity/Q794'), ('countryName', 'Iran')]


3

So I get *France (**Q142**)*, *Italy (**Q38**)* and *Iran (**Q794**)*. Now I can retrieve some Nobel laureates of these countries, with their nationality and the year of their award.

In [16]:
queryString = """
select distinct ?author ?authorName ?countryName ?yearWin ?yearElapsed where {
    values ?country { wd:Q142 wd:Q38 wd:Q794 } .

    ?author ^wdt:P1346|wdt:P166 ?award ;
            wdt:P3188 ?nobelID ;
            wdt:P27 ?country .
    
    ?award wdt:P279 wd:Q7191 .
    
    bind (xsd:int(replace(?nobelID, "[a-zA-Z0-9]+/laureates/([0-9]+)/.*", "$1")) as ?yearWin) .
    bind (year(now()) - ?yearWin as ?yearElapsed) .
    
    ?author <http://schema.org/name> ?authorName .
    ?country <http://schema.org/name> ?countryName .
} limit 20
"""

print("Results")
run_query(queryString)

Results
[('author', 'http://www.wikidata.org/entity/Q115448'), ('authorName', 'Daniel Bovet'), ('countryName', 'Italy'), ('yearWin', '1957'), ('yearElapsed', '65')]
[('author', 'http://www.wikidata.org/entity/Q103598'), ('authorName', 'Luc Montagnier'), ('countryName', 'France'), ('yearWin', '2008'), ('yearElapsed', '14')]
[('author', 'http://www.wikidata.org/entity/Q103844'), ('authorName', 'Françoise Barré-Sinoussi'), ('countryName', 'France'), ('yearWin', '2008'), ('yearElapsed', '14')]
[('author', 'http://www.wikidata.org/entity/Q109553'), ('authorName', 'Renato Dulbecco'), ('countryName', 'Italy'), ('yearWin', '1975'), ('yearElapsed', '47')]
[('author', 'http://www.wikidata.org/entity/Q17280087'), ('authorName', 'Emmanuelle Charpentier'), ('countryName', 'France')]
[('author', 'http://www.wikidata.org/entity/Q244395'), ('authorName', 'Seán MacBride'), ('countryName', 'France'), ('yearWin', '1974'), ('yearElapsed', '48')]
[('author', 'http://www.wikidata.org/entity/Q71023'), ('auth

20

Finally, I count the number of Nobel laureates for the three countries.

In [17]:
queryString = """
select ?countryName (count(distinct ?winner) as ?numWinners) where {
    values ?country { wd:Q142 wd:Q38 wd:Q794 } .

    ?winner ^wdt:P1346|wdt:P166 ?award ;
            wdt:P27 ?country .
    
    ?award wdt:P279 wd:Q7191 .
    
    ?country <http://schema.org/name> ?countryName .
} group by ?countryName
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'Iran'), ('numWinners', '1')]
[('countryName', 'France'), ('numWinners', '70')]
[('countryName', 'Italy'), ('numWinners', '13')]


3

And the country with the greatest number of Nobel laureates in the last 20 years is:

In [18]:
queryString = """
select ?country ?countryName (count(distinct ?winner) as ?numWinners) where {
    ?winner ^wdt:P1346|wdt:P166 ?award ;
            wdt:P3188 ?nobelID ;
            wdt:P27 ?country .
    
    ?award wdt:P279 wd:Q7191 .
    
    ?country <http://schema.org/name> ?countryName .
    
    bind (xsd:int(replace(?nobelID, "[a-zA-Z0-9]+/laureates/([0-9]+)/.*", "$1")) as ?yearWin) .
    
    filter (year(now()) - ?yearWin <= "20"^^xsd:int)
} group by ?country ?countryName
order by desc(?numWinners)
limit 1
"""

print("Results")
run_query(queryString)

Results
[('country', 'http://www.wikidata.org/entity/Q30'), ('countryName', 'United States of America'), ('numWinners', '77')]


1

The USA is the country with the greatest number of laureates in the last 20 years (77). Only laureates with a Nobel prize ID are considered, since the ID is the only source of the award year. The 89 shown for the USA in the check of point 1 is a count of distinct award years, not of laureates (408 in total), so it cannot be used to tell how many US laureates were left out here.
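The logic of In [18] (keep wins with at most 20 elapsed years, count distinct laureates per country, keep the top country) can be sketched in plain Python. The rows are hypothetical and `current_year` is passed explicitly instead of `year(now())`, so the result does not depend on when the code runs:

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def top_country_recent(rows: Iterable[Tuple[str, str, int]], current_year: int,
                       window: int = 20) -> Optional[Tuple[str, int]]:
    """rows are (laureate, country, year_won); mirrors the filter, group by and limit 1."""
    # A set of (laureate, country) pairs plays the role of count(distinct ?winner)
    # per country; a laureate with two citizenships counts for both countries.
    recent = {(laureate, country) for laureate, country, year in rows
              if current_year - year <= window}
    counts = Counter(country for _, country in recent)
    return counts.most_common(1)[0] if counts else None


rows = [
    ("a", "USA", 2010),
    ("a", "USA", 2010),     # duplicate row, counted once
    ("b", "USA", 2015),
    ("c", "France", 2008),
    ("d", "France", 1990),  # more than 20 years before 2022: filtered out
]
print(top_country_recent(rows, current_year=2022))  # ('USA', 2)
```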

## 3. Determine how many Literature Nobel laureates are citizens of the following countries: Italy, Germany, France, Romania, Denmark, Iran, and China

First, I need the identifiers of the countries.

In [19]:
queryString = """
select ?country ?countryName where {
    ?country wdt:P31 wd:Q3624078 .

    ?country <http://schema.org/name> ?countryName .
    
    filter regex(?countryName, "[Ii]tal|[Ff]ranc|[Ii]ran|[Gg]erman|[Rr]omania|[Dd]enmark|[Cc]hina")
}
"""

print("Results")
run_query(queryString)

Results
[('country', 'http://www.wikidata.org/entity/Q142'), ('countryName', 'France')]
[('country', 'http://www.wikidata.org/entity/Q148'), ('countryName', "People's Republic of China")]
[('country', 'http://www.wikidata.org/entity/Q183'), ('countryName', 'Germany')]
[('country', 'http://www.wikidata.org/entity/Q218'), ('countryName', 'Romania')]
[('country', 'http://www.wikidata.org/entity/Q35'), ('countryName', 'Denmark')]
[('country', 'http://www.wikidata.org/entity/Q38'), ('countryName', 'Italy')]
[('country', 'http://www.wikidata.org/entity/Q7318'), ('countryName', 'Nazi Germany')]
[('country', 'http://www.wikidata.org/entity/Q794'), ('countryName', 'Iran')]
[('country', 'http://www.wikidata.org/entity/Q107312248'), ('countryName', 'Republic of China')]


9

So I get *France (**Q142**)*, *Italy (**Q38**)*, *Iran (**Q794**)*, *Germany (**Q183**)*, *Romania (**Q218**)*, *Denmark (**Q35**)* and *China (**Q148**)*. For China, I use the *People's Republic of China (PRC)*, because it is the official name of the present-day Chinese state (the Republic of China is Taiwan). I can then count the laureates with the BGP for Literature Nobel laureates from point 1.

In [20]:
queryString = """
select ?countryName (count(distinct ?author) as ?numLaureates) where {
    values ?country { wd:Q142 wd:Q38 wd:Q794 wd:Q183 wd:Q218 wd:Q35 wd:Q148 } .

    ?author ^wdt:P1346|wdt:P166 wd:Q37922 ;
            wdt:P27 ?country .
        
    ?country <http://schema.org/name> ?countryName .
} group by ?countryName
order by desc(?numLaureates)
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'France'), ('numLaureates', '17')]
[('countryName', 'Germany'), ('numLaureates', '6')]
[('countryName', 'Denmark'), ('numLaureates', '4')]
[('countryName', 'Italy'), ('numLaureates', '3')]
[('countryName', "People's Republic of China"), ('numLaureates', '2')]
[('countryName', 'Romania'), ('numLaureates', '1')]


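6

Iran does not appear in the results: `group by` only produces a row for countries with at least one matching laureate, so no Literature Nobel laureate in the KG has Iranian citizenship.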

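If the answer should list every requested country, including those without laureates, the missing groups have to be filled in with zero, for example on the Python side. A sketch that does this with the counts returned by In [20] (the dictionary is copied from the output above; the zero-filling is the only addition):

```python
# Countries requested in point 3, with the names used in the KG.
requested = ["Italy", "Germany", "France", "Romania", "Denmark", "Iran",
             "People's Republic of China"]

# Counts returned by In [20]; countries without any match produce no row.
returned = {"France": 17, "Germany": 6, "Denmark": 4, "Italy": 3,
            "People's Republic of China": 2, "Romania": 1}

counts = {country: returned.get(country, 0) for country in requested}
for country, n in counts.items():
    print(f"{country}: {n}")
```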
## 4. How many Literature Nobel award winners have a PhD (Doctor of Philosophy)? (You may check whether they have a doctoral advisor.)

To find doctoral advisors, I list the properties that go from the Literature Nobel winners to any entity and keep those whose name contains "doctor".

In [21]:
queryString = """
select distinct ?p ?pName where {
    ?author ^wdt:P1346|wdt:P166 wd:Q37922 .
    
    ?author ?p ?o .
        
    ?p <http://schema.org/name> ?pName .
    
    filter (regex(?pName, "doctor"))
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P184'), ('pName', 'doctoral advisor')]
[('p', 'http://www.wikidata.org/prop/direct/P185'), ('pName', 'doctoral student')]
[('p', 'http://www.wikidata.org/prop/direct/P5459'), ('pName', 'RHE doctor ID')]


3

So a person has a PhD if they have a *doctoral advisor (**P184**)*, or if some entity lists them as its *doctoral student (**P185**)*. I consider both paths (from the PhD to the advisor and from the advisor to the PhD), so the BGP is the following:

```
?author ^wdt:P185|wdt:P184 ?o .
```
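The two directions of this path can be illustrated with a small sketch over hypothetical triples: a person matches if the advisor link is stored on the person (`wdt:P184`) or on the advisor (`^wdt:P185`). Supervising someone else is not matched by this BGP.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def has_doctoral_link(person: str, triples: List[Triple]) -> bool:
    """Mimics `?person ^wdt:P185|wdt:P184 ?o` for a fixed ?person."""
    return any(
        (p == "wdt:P184" and s == person)      # person has a doctoral advisor
        or (p == "wdt:P185" and o == person)   # someone lists person as doctoral student
        for s, p, o in triples
    )


# Hypothetical triples.
triples = [
    ("ex:anna", "wdt:P184", "ex:prof1"),  # link stored on the student
    ("ex:prof2", "wdt:P185", "ex:ben"),   # link stored only on the advisor
    ("ex:carl", "wdt:P185", "ex:dora"),   # carl supervised dora: carl is not matched
]
with_phd = [x for x in ("ex:anna", "ex:ben", "ex:carl") if has_doctoral_link(x, triples)]
print(with_phd)  # ['ex:anna', 'ex:ben']
```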

And so, I can answer the question.

In [39]:
queryString = """
select (count(distinct ?author) as ?numLaureatesWithPhD) where {
    ?author ^wdt:P1346|wdt:P166 wd:Q37922 ;
            ^wdt:P185|wdt:P184 ?o .
}
"""

print("Results")
run_query(queryString)

Results
[('numLaureatesWithPhD', '2')]


1

The number of Literature Nobel laureates with a PhD is 2. We can also list these people, who are:

In [41]:
queryString = """
select distinct ?author ?authorName where {
    ?author ^wdt:P1346|wdt:P166 wd:Q37922 ;
            ^wdt:P185|wdt:P184 ?o .
    
    ?author <http://schema.org/name> ?authorName .
}
"""

print("Results")
run_query(queryString)

Results
[('author', 'http://www.wikidata.org/entity/Q33760'), ('authorName', 'Bertrand Russell')]
[('author', 'http://www.wikidata.org/entity/Q47695'), ('authorName', 'Rudolf Christoph Eucken')]


2

Out of curiosity, and for comparison, I can also count the total number of Nobel laureates with a PhD.

In [42]:
queryString = """
select (count(distinct ?author) as ?numLaureatesWithPhD) where {
    ?author ^wdt:P1346|wdt:P166 ?award ;
            ^wdt:P185|wdt:P184 ?o .
    
    ?award wdt:P279 wd:Q7191 .
}
"""

print("Results")
run_query(queryString)

Results
[('numLaureatesWithPhD', '428')]


1

That is about 45% of all Nobel laureates (428 out of 958). Finally, I can see which categories have the highest number of laureates with a PhD.

In [43]:
queryString = """
select ?awardName (count(distinct ?author) as ?numLaureatesWithPhD) where {
    ?author ^wdt:P1346|wdt:P166 ?award ;
            ^wdt:P185|wdt:P184 ?o .
    
    ?award wdt:P279 wd:Q7191 .
    
    ?award <http://schema.org/name> ?awardName .
} group by ?award ?awardName
"""

print("Results")
run_query(queryString)

Results
[('awardName', 'Nobel Prize in Literature'), ('numLaureatesWithPhD', '2')]
[('awardName', 'Nobel Peace Prize'), ('numLaureatesWithPhD', '8')]
[('awardName', 'Nobel Prize in Physiology or Medicine'), ('numLaureatesWithPhD', '73')]
[('awardName', 'Nobel Prize in Physics'), ('numLaureatesWithPhD', '153')]
[('awardName', 'Nobel Prize in Chemistry'), ('numLaureatesWithPhD', '116')]
[('awardName', 'Prize in Economic Sciences in Memory of Alfred Nobel'), ('numLaureatesWithPhD', '78')]


6

And I discover that Physics and Chemistry are the categories with the highest number of laureates with a PhD.
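Absolute numbers favour the categories with more laureates. Dividing the counts of In [43] by the number of winners per category from In [10] (both queries use the same winner BGP) gives the share of laureates with a recorded doctoral link; on these numbers, the Prize in Economic Sciences has the highest share. Since the BGP only detects doctorates recorded through an advisor/student link, the shares are best read as lower bounds.

```python
# Winners per category (In [10]) and winners with a doctoral link (In [43]).
winners = {
    "Literature": 118, "Peace": 134, "Physiology or Medicine": 222,
    "Physics": 215, "Chemistry": 185, "Economic Sciences": 86,
}
with_phd = {
    "Literature": 2, "Peace": 8, "Physiology or Medicine": 73,
    "Physics": 153, "Chemistry": 116, "Economic Sciences": 78,
}

shares = {category: with_phd[category] / winners[category] for category in winners}
for category, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{category:<24}{share:.1%}")

# Distinct laureates with a doctoral link (In [42]) over all laureates (In [37]).
overall = 428 / 958
print(f"{'Overall':<24}{overall:.1%}")
```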

## 5. Are there books from Literature Nobel Award winners which are not present in the Vatican Library? (if so, who is the author with the most books not in the Vatican Library)?

First, I need to find a property that links Nobel winners, and people in general, to books.

In [26]:
queryString = """
select distinct ?p ?pName where {
    ?author ^wdt:P1346|wdt:P166 wd:Q37922 ;
            ?p ?o .
        
    ?p <http://schema.org/name> ?pName .
    
    filter (regex(?pName, "author", "i"))
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1153'), ('pName', 'Scopus author ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1233'), ('pName', 'Internet Speculative Fiction Database author ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1556'), ('pName', 'zbMATH author ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1580'), ('pName', 'University of Barcelona authority ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1607'), ('pName', 'Dialnet author ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1670'), ('pName', 'Canadiana Authorities ID (former scheme)')]
[('p', 'http://www.wikidata.org/prop/direct/P1899'), ('pName', 'LibriVox author ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1938'), ('pName', 'Project Gutenberg author ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1960'), ('pName', 'Google Scholar author ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2188'), ('pName', 'BiblioNet author ID')]
[('p', 'http://www.wikidata.org/prop/direct/P244'), 

87

The property *author (**P50**)* is probably the one I am looking for; I can check it by looking at the entities connected to the laureates through this property.

In [27]:
queryString = """
select distinct ?o ?oName ?subclass ?subclassName ?instance ?instanceName where {
    ?author ^wdt:P1346|wdt:P166 wd:Q37922 .
    ?o wdt:P50 ?author .
    
    optional {
        ?o wdt:P279 ?subclass .
        ?subclass <http://schema.org/name> ?subclassName .
    }
    
    optional {
        ?o wdt:P31 ?instance .
        ?instance <http://schema.org/name> ?instanceName .
    }
        
    ?o <http://schema.org/name> ?oName .
} limit 20
"""

print("Results")
run_query(queryString)

Results
[('o', 'http://www.wikidata.org/entity/Q7746252'), ('oName', 'The Last of the Light Brigade'), ('subclass', 'http://www.wikidata.org/entity/Q34743'), ('subclassName', 'Rudyard Kipling'), ('instance', 'http://www.wikidata.org/entity/Q5185279'), ('instanceName', 'poem')]
[('o', 'http://www.wikidata.org/entity/Q19724435'), ('oName', 'Total Dictation'), ('subclass', 'http://www.wikidata.org/entity/Q1087138'), ('subclassName', 'dictation'), ('instance', 'http://www.wikidata.org/entity/Q2668072'), ('instanceName', 'collection')]
[('o', 'http://www.wikidata.org/entity/Q19724435'), ('oName', 'Total Dictation'), ('subclass', 'http://www.wikidata.org/entity/Q1087138'), ('subclassName', 'dictation'), ('instance', 'http://www.wikidata.org/entity/Q11483816'), ('instanceName', 'annual event')]
[('o', 'http://www.wikidata.org/entity/Q4657570'), ('oName', 'A Kind of Alaska'), ('instance', 'http://www.wikidata.org/entity/Q1194583'), ('instanceName', 'one-act play')]
[('o', 'http://www.wikidata.

20
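Because of the two `optional` blocks, the query returns one row per combination of superclass and class, so the same work can appear several times (for example *Total Dictation* above). A small sketch of how the rows could be grouped by work in Python, assuming the SPARQL 1.1 JSON results format that `run_query` reads from. The bindings below are simplified copies of the rows printed above (only `oName` and `instanceName` are kept, and real literals may carry extra keys such as `xml:lang`); the helper `classes_per_work` is hypothetical.

```python
from collections import defaultdict

# Simplified bindings copied from the rows printed above.
bindings = [
    {"oName": {"type": "literal", "value": "The Last of the Light Brigade"},
     "instanceName": {"type": "literal", "value": "poem"}},
    {"oName": {"type": "literal", "value": "Total Dictation"},
     "instanceName": {"type": "literal", "value": "collection"}},
    {"oName": {"type": "literal", "value": "Total Dictation"},
     "instanceName": {"type": "literal", "value": "annual event"}},
    {"oName": {"type": "literal", "value": "A Kind of Alaska"},
     "instanceName": {"type": "literal", "value": "one-act play"}},
]


def classes_per_work(bindings):
    """Group rows by work name, collecting the names of the classes it is an instance of."""
    grouped = defaultdict(set)
    for row in bindings:
        classes = grouped[row["oName"]["value"]]  # keeps works with no class too
        if "instanceName" in row:  # OPTIONAL variables may be unbound
            classes.add(row["instanceName"]["value"])
    return dict(grouped)


for work, classes in classes_per_work(bindings).items():
    print(work, "->", sorted(classes))
```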

These works are instances of the following classes:

In [28]:
queryString = """
select distinct ?instance ?instanceName where {
    ?author ^wdt:P1346|wdt:P166 wd:Q37922 .
    ?o wdt:P50 ?author ;
        wdt:P31 ?instance .
    
    ?instance <http://schema.org/name> ?instanceName .   
}
"""

print("Results")
run_query(queryString)

Results
[('instance', 'http://www.wikidata.org/entity/Q17518557'), ('instanceName', 'collection of articles')]
[('instance', 'http://www.wikidata.org/entity/Q1194583'), ('instanceName', 'one-act play')]
[('instance', 'http://www.wikidata.org/entity/Q25379'), ('instanceName', 'play')]
[('instance', 'http://www.wikidata.org/entity/Q49848'), ('instanceName', 'document')]
[('instance', 'http://www.wikidata.org/entity/Q11483816'), ('instanceName', 'annual event')]
[('instance', 'http://www.wikidata.org/entity/Q254554'), ('instanceName', 'picture book')]
[('instance', 'http://www.wikidata.org/entity/Q732577'), ('instanceName', 'publication')]
[('instance', 'http://www.wikidata.org/entity/Q7553'), ('instanceName', 'translation')]
[('instance', 'http://www.wikidata.org/entity/Q105420'), ('instanceName', 'anthology')]
[('instance', 'http://www.wikidata.org/entity/Q105543609'), ('instanceName', 'musical work/composition')]
[('instance', 'http://www.wikidata.org/entity/Q13442814'), ('instanceName

106
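With 106 distinct classes, it could help to see which ones actually dominate. The following is a hypothetical follow-up query, not run here, that counts distinct works per class with `group by`. It reuses the same patterns as the query above and assumes it would be passed to `run_query`, which prepends the prefixes from the setup cell.

```python
# Hypothetical follow-up query (not run here): number of distinct works by
# Literature laureates per class, most frequent classes first.
classCountQuery = """
select ?instance ?instanceName (count(distinct ?o) as ?numWorks) where {
    ?author ^wdt:P1346|wdt:P166 wd:Q37922 .
    ?o wdt:P50 ?author ;
        wdt:P31 ?instance .
    ?instance <http://schema.org/name> ?instanceName .
}
group by ?instance ?instanceName
order by desc(?numWorks)
limit 10
"""

print(classCountQuery)  # in the notebook: run_query(classCountQuery)
```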

The fact that these entities are instances of classes such as poems, plays, and publications confirms that *author* is the property I am looking for. As a first attempt, I can check whether any property connects these works directly to the Vatican Library.

In [29]:
queryString = """
select distinct ?p ?pName where {
    ?author ^wdt:P1346|wdt:P166 wd:Q37922 .
    
    ?book wdt:P50 ?author ;
          ?p wd:Q213678 .
    
    ?p <http://schema.org/name> ?pName .
}
"""

print("Results")
run_query(queryString)

Results
Empty


0
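Since the question here is only whether such a link exists, the same check could also be written as an `ask` query, which returns true or false instead of rows. A sketch, assuming it would be passed to the `run_ask_query` helper defined in the setup cell (not run here):

```python
# Hypothetical ASK version of the query above (not run here): true if at least
# one work by a Literature laureate points directly to the Vatican Library.
askString = """
ask {
    ?author ^wdt:P1346|wdt:P166 wd:Q37922 .
    ?book wdt:P50 ?author ;
          ?p wd:Q213678 .
}
"""

print(askString)  # in the notebook: run_ask_query(askString)
```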

There are no results, so the connection may go through intermediate entities. First, I check which entities are connected to the Vatican Library, and through which properties.

In [30]:
queryString = """
select distinct ?o ?oName ?p ?pName {
    ?o ?p wd:Q213678 .
    
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
} limit 100
"""

print("Results")
run_query(queryString)

Results
[('o', 'http://www.wikidata.org/entity/Q17174857'), ('oName', 'Bibbia di Federico da Montefeltro - BAV Urb.Lat.1&2'), ('p', 'http://www.wikidata.org/prop/direct/P195'), ('pName', 'collection')]
[('o', 'http://www.wikidata.org/entity/Q2754987'), ('oName', 'Leopoldo Cicognara'), ('p', 'http://www.wikidata.org/prop/direct/P9419'), ('pName', 'personal library at')]
[('o', 'http://www.wikidata.org/entity/Q209285'), ('oName', 'Codex Vaticanus'), ('p', 'http://www.wikidata.org/prop/direct/P195'), ('pName', 'collection')]
[('o', 'http://www.wikidata.org/entity/Q7075'), ('oName', 'library'), ('p', 'http://www.wikidata.org/prop/direct/P5869'), ('pName', 'model item')]
[('o', 'http://www.wikidata.org/entity/Q19823475'), ('oName', 'prefect of the Vatican Library'), ('p', 'http://www.wikidata.org/prop/direct/P2389'), ('pName', 'organization directed by the office or position')]
[('o', 'http://www.wikidata.org/entity/Q1806588'), ('oName', 'Lasimos Krater'), ('p', 'http://www.wikidata.org/pro

100
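To see which properties link entities to the Vatican Library most often, the `pName` values can be tallied (in SPARQL this would be a `group by ?p`). A minimal sketch using only the five rows printed in full above, since the other 95 are truncated in this output:

```python
from collections import Counter

# Property names of the five rows printed in full above.
visible_pnames = [
    "collection",
    "personal library at",
    "collection",
    "model item",
    "organization directed by the office or position",
]

counts = Counter(visible_pnames)
for name, n in counts.most_common():
    print(n, name)
```

Among these rows, *collection* (P195) is the only property that appears more than once, and it links manuscripts such as the Codex Vaticanus to the library.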

The property *part of (**P361**)* may help here: books might be held by other libraries that are part of the Vatican Library.

In [31]:
queryString = """
select distinct ?p ?pName where {
    ?author ^wdt:P1346|wdt:P166 wd:Q37922 .
    
    ?book wdt:P50 ?author ;
          ?p ?library .
    
    ?library wdt:P361* wd:Q213678 .
    
    ?p <http://schema.org/name> ?pName .
}
"""

print("Results")
run_query(queryString)

Results
Empty


0
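The path `wdt:P361*` matches zero or more *part of* steps, so `?library` can be the Vatican Library itself as well as anything nested inside it. The query above therefore also covers the direct check done before. A Python sketch of this reflexive-transitive reachability, using a breadth-first search over a hypothetical part-of graph (the `ex:` names are illustrative only):

```python
from collections import deque

# Hypothetical "part of" edges (child -> parents); not Wikidata data.
part_of = {
    "ex:readingRoom": ["ex:branchLibrary"],
    "ex:branchLibrary": ["wd:Q213678"],
    "ex:unrelatedLibrary": [],
}


def reaches(start, target, edges):
    """True if target is reachable from start in zero or more steps (like wdt:P361*)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:  # zero steps: a node always reaches itself
            return True
        for parent in edges.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


for lib in ["wd:Q213678", "ex:branchLibrary", "ex:readingRoom", "ex:unrelatedLibrary"]:
    print(lib, reaches(lib, "wd:Q213678", part_of))
```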

Again, there are no connections between books written by Nobel laureates and the Vatican Library, not even through libraries that are part of it.

I conclude that the database does not link any work by a Literature Nobel laureate to the Vatican Library. Still, I can check whether the laureates have any property whose name mentions the Vatican.

In [32]:
queryString = """
select distinct ?p ?pName where {
    ?author ^wdt:P1346|wdt:P166 wd:Q37922 ;
            ?p ?o .
        
    ?p <http://schema.org/name> ?pName .
    
    filter (regex(?pName, "vatica", "i"))
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1017'), ('pName', 'Vatican Library ID (former scheme)')]
[('p', 'http://www.wikidata.org/prop/direct/P8034'), ('pName', 'Vatican Library VcBA ID')]


2
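The filter `regex(?pName, "vatica", "i")` keeps every property name that contains "vatica" anywhere, ignoring case. Python's `re.search` with `re.IGNORECASE` behaves roughly the same for a plain pattern like this one (SPARQL follows XPath regex syntax, which differs from Python's in some details). A small sketch on names taken from the outputs above; the helper `matching` is hypothetical:

```python
import re

# Property names taken from the outputs above.
names = [
    "Vatican Library ID (former scheme)",
    "Vatican Library VcBA ID",
    "Scopus author ID",
    "collection",
]


def matching(names, pattern):
    """Approximate FILTER(regex(?pName, pattern, "i")): unanchored, case-insensitive."""
    return [n for n in names if re.search(pattern, n, re.IGNORECASE)]


print(matching(names, "vatica"))
```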

The Vatican Library IDs do not list the books an author wrote, but they indicate that the library catalogues the author, which suggests it holds some of their books. I can use this to count the Nobel laureates that have at least one of these two properties.

In [33]:
queryString = """
select (count(distinct ?author) as ?numAuthors) where {
    ?author ^wdt:P1346|wdt:P166 wd:Q37922 ;
            wdt:P1017|wdt:P8034 ?o .
}
"""

print("Results")
run_query(queryString)

Results
[('numAuthors', '75')]


1
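The `distinct` inside `count` matters here: an author with both `wdt:P1017` and `wdt:P8034`, or one matched by both branches of the laureate path, produces more than one solution. A minimal illustration on hypothetical rows (the `ex:` names are made up):

```python
# Hypothetical (author, property) solutions of "?author wdt:P1017|wdt:P8034 ?o".
rows = [
    ("ex:authorA", "wdt:P1017"),
    ("ex:authorA", "wdt:P8034"),  # same author, second ID
    ("ex:authorB", "wdt:P8034"),
]

count_rows = len(rows)                                # like count(?author)
count_distinct = len({author for author, _ in rows})  # like count(distinct ?author)
print(count_rows, count_distinct)  # 3 2
```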

75 of the 118 Literature laureates have at least one Vatican Library ID, so the remaining 43 laureates probably have no books in the Vatican Library.
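The 43 laureates themselves could be listed with `filter not exists`, keeping only the laureates that have neither ID. The query below is an untested sketch built from the patterns used above and would be passed to `run_query` in the notebook. The arithmetic uses the two totals reported in the text.

```python
# Untested sketch: laureates with neither Vatican Library ID.
complementQuery = """
select distinct ?author ?authorName where {
    ?author ^wdt:P1346|wdt:P166 wd:Q37922 ;
            <http://schema.org/name> ?authorName .
    filter not exists { ?author wdt:P1017|wdt:P8034 ?id . }
}
"""

total_laureates = 118  # total reported in the text
with_vatican_id = 75   # result of the count query above
without_vatican_id = total_laureates - with_vatican_id
share = with_vatican_id / total_laureates
print(without_vatican_id, f"{share:.1%}")  # 43 63.6%
```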