# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that is present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
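
As a small illustration, the pattern can be wrapped in a helper that builds a name-lookup query for a single term. This is only a sketch: `name_lookup_query` is a hypothetical helper that is not part of the notebook, and the string it returns would still have to be sent to the endpoint by a query runner.

```python
def name_lookup_query(term, var="name"):
    """Return a SELECT query asking for the schema.org name of one term.

    `term` is a prefixed name such as "wd:Q5" or a full IRI in angle brackets.
    """
    return (
        f"select ?{var} where {{\n"
        f"    {term} <http://schema.org/name> ?{var} .\n"
        f"}}"
    )


print(name_lookup_query("wd:Q5"))
```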
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-e68f39aa8e-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# SELECT and CONSTRUCT queries
def run_query(queryString):
    """Run a query, print every result row, and return the number of rows."""
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        bindings_list = json_results['results']['bindings']
        if len(bindings_list) == 0:
            print("Empty")
            return 0

        # One printed line per result row: a list of (variable, value) pairs
        for bindings in bindings_list:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(bindings_list)

    except Exception as e:
        # On failure the error is printed and the function returns None
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
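
`run_query` only prints its results. If the rows are needed for further processing in Python, the JSON document can be flattened into plain dictionaries. The sketch below assumes the standard SPARQL JSON results layout that `run_query` already relies on; `bindings_to_rows` is a hypothetical helper, and the sample document is hand-written from one result row printed further down in this notebook.

```python
def bindings_to_rows(json_results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in json_results["results"]["bindings"]
    ]


# Hand-written sample in the SPARQL JSON results layout (not fetched live)
sample = {
    "head": {"vars": ["p", "pName"]},
    "results": {
        "bindings": [
            {
                "p": {"type": "uri", "value": "http://www.wikidata.org/entity/Q14345"},
                "pName": {"type": "literal", "value": "Roh Moo-hyun"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```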


# Politics Workflow Series ("Presidents of countries") 

Consider the following exploratory information need:

> You are investigating presidents of the republic, or similar roles, across countries around the world

## Useful URIs for the current workflow


The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P279`    | subclass      | predicate |
| `wdt:P17`     | country       | predicate |
| `wdt:P27`     | citizenship   | predicate |
| `wdt:P39`     | position held   | predicate |
| `wd:Q248577`  | President of the republic | node      |
| `wd:Q11696`   | President of U.S.A.      | node      |
| `wd:Q332711`  | President of Italy        | node |
| `wd:Q38`      | Italy          | node |
| `wd:Q30`      | U.S.A.        | node |



Also consider

```
?p wdt:P39/wdt:P279* wd:Q248577  .  
```

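is the BGP to retrieve all **presidents of the world's countries throughout history**: every entity that holds a position (`wdt:P39`) which is *President of the republic* itself or any direct or indirect subclass of it (`wdt:P279*`).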
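
To make the property path more concrete, here is a minimal sketch that emulates `wdt:P39/wdt:P279*` over a tiny in-memory graph. All names in the toy data (`alice`, `president_of_X`, and so on) are hypothetical placeholders, not Wikidata entities, and the functions are illustrative only.

```python
def reaches(start, target, subclass_of):
    """True if `target` is reachable from `start` via zero or more subclass hops."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(subclass_of.get(node, ()))
    return False


def presidents(position_held, subclass_of, root):
    """People with at least one held position that reaches `root`."""
    return sorted(
        person
        for person, positions in position_held.items()
        if any(reaches(pos, root, subclass_of) for pos in positions)
    )


# Toy data (hypothetical): who holds which position, and the subclass edges
position_held = {
    "alice": ["president_of_X"],
    "bob": ["mayor_of_Y"],
    "carol": ["president_of_the_republic"],  # zero subclass hops
}
subclass_of = {
    "president_of_X": ["president_of_the_republic"],
    "mayor_of_Y": ["mayor"],
}

print(presidents(position_held, subclass_of, "president_of_the_republic"))
```

The `*` in the path is what lets `carol` match with zero hops, while `alice` matches after one subclass hop.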

## Workload Goals

1. Identify the BGP for obtaining important attributes for people, e.g., date of birth/death, gender, profession 

2. Identify the BGP to retrieve countries which have had a president at least once

3. When was the first president of each country born? How many presidents has each country had?

4. Is there a country that had at some point a woman as a president?

5. Analyze the number of presidents per country through time
 
   5.1 What are the top-5 countries by number of presidents? Which countries had the fewest?
   
   5.2 For how many presidents in each country do we know a date of death?
   
   5.3 What are the professions of the different presidents? How many presidents had a specific profession?


In [2]:
# start your workflow here

## 1. Identify the BGP for obtaining important attributes for people, e.g., date of birth/death, gender, profession

First, I need the class that identifies people. I can retrieve it by looking at what a president is an instance or subclass of. For that I first need a president, which I can take from the list of all presidents.

In [3]:
queryString = """
select distinct ?p ?pName {
    ?p wdt:P39/wdt:P279* wd:Q248577 .
    
    ?p <http://schema.org/name> ?pName .
} limit 1
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q14345'), ('pName', 'Roh Moo-hyun')]


1

Now I can use **Q14345** to see what it is an instance or subclass of.

In [4]:
queryString = """
select ?p ?pName ?o ?oName where {
    values ?p { wdt:P31 wdt:P279 }
    
    wd:Q14345 ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q5'), ('oName', 'human')]


1

So, people are instances of *human (**Q5**)*. I can now retrieve the desired properties.

In [5]:
queryString = """
select distinct ?p ?pName where {
    ?people wdt:P31 wd:Q5 ;
            ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    
    filter (regex(?pName, "gender|profession|date", "i") || isNumeric(?o))
}
"""

# print("Results")
# run_query(queryString)

Since there are too many humans, selecting all of them and scanning their properties takes too long, so the query above is left commented out. Instead, I rely only on president entities to look for relevant properties.

In [6]:
queryString = """
select distinct ?p ?pName where {
    ?people wdt:P39/wdt:P279* wd:Q248577 ;
            ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    
    filter (regex(?pName, "gender|profes|job|occup|positio|date|birth|death", "i") || isNumeric(?o))
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pName', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P1196'), ('pName', 'manner of death')]
[('p', 'http://www.wikidata.org/prop/direct/P1477'), ('pName', 'birth name')]
[('p', 'http://www.wikidata.org/prop/direct/P19'), ('pName', 'place of birth')]
[('p', 'http://www.wikidata.org/prop/direct/P20'), ('pName', 'place of death')]
[('p', 'http://www.wikidata.org/prop/direct/P21'), ('pName', 'sex or gender')]
[('p', 'http://www.wikidata.org/prop/direct/P39'), ('pName', 'position held')]
[('p', 'http://www.wikidata.org/prop/direct/P509'), ('pName', 'cause of death')]
[('p', 'http://www.wikidata.org/prop/direct/P569'), ('pName', 'date of birth')]
[('p', 'http://www.wikidata.org/prop/direct/P570'), ('pName', 'date of death')]
[('p', 'http://www.wikidata.org/prop/direct/P1971'), ('pName', 'number of children')]
[('p', 'http://www.wikidata.org/prop/direct/P2048'), ('pName', 'height')]
[('p', 'http://www.wikidata.org/pro

26
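
The filter in the query combines a name-based regular expression with `isNumeric(?o)`. As a rough check of what the regular expression alone selects, the sketch below applies the same pattern with Python's `re` module to the property names visible in the output above. It assumes that Python and SPARQL treat this simple alternation the same way; the list is copied by hand from the printed (truncated) results.

```python
import re

# Same alternation as in the SPARQL filter, case-insensitive like the "i" flag
PATTERN = re.compile(r"gender|profes|job|occup|positio|date|birth|death", re.IGNORECASE)

# Property names copied from the visible part of the output above
property_names = [
    "occupation", "manner of death", "birth name", "place of birth",
    "place of death", "sex or gender", "position held", "cause of death",
    "date of birth", "date of death", "number of children", "height",
]

matched = [name for name in property_names if PATTERN.search(name)]
unmatched = [name for name in property_names if not PATTERN.search(name)]

print("matched by name:", matched)
print("not matched by name:", unmatched)
```

The two names that the pattern does not match, *number of children* and *height*, presumably enter the result through the `isNumeric(?o)` branch of the filter.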

Some relevant attributes are *sex or gender (**P21**)*, *date of birth (**P569**)*, *date of death (**P570**)*, *number of children (**P1971**)*, *manner of death (**P1196**)*, *cause of death (**P509**)* and *occupation (**P106**)*.

The BGP to get all these attributes is:

```
?person wdt:P31 wd:Q5 .

optional { ?person wdt:P21 ?gender } .
optional { ?person wdt:P106 ?occupation } .
optional { ?person wdt:P569 ?birthDate } .
optional { ?person wdt:P570 ?deathDate } .
optional { ?person wdt:P1196 ?mannerDeath } .
optional { ?person wdt:P509 ?causeDeath } .
optional { ?person wdt:P1971 ?numChildren } .
```
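
Since every attribute follows the same `optional { ... }` shape, the BGP can also be generated from a mapping of variable names to properties. This is a minimal sketch; `ATTRIBUTES` and `person_attributes_bgp` are hypothetical names introduced here, while the property identifiers are the ones found above.

```python
# Variable name -> property, as discovered by the queries above
ATTRIBUTES = {
    "gender": "wdt:P21",
    "occupation": "wdt:P106",
    "birthDate": "wdt:P569",
    "deathDate": "wdt:P570",
    "mannerDeath": "wdt:P1196",
    "causeDeath": "wdt:P509",
    "numChildren": "wdt:P1971",
}


def person_attributes_bgp(attributes, subject="?person"):
    """Build the BGP text: one type triple plus one OPTIONAL per attribute."""
    lines = [f"{subject} wdt:P31 wd:Q5 ."]
    lines += [
        f"optional {{ {subject} {prop} ?{var} }} ."
        for var, prop in attributes.items()
    ]
    return "\n".join(lines)


print(person_attributes_bgp(ATTRIBUTES))
```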

## 2. Identify the BGP to retrieve countries which have had a president at least once

First, I need to understand which positions are held by a president. I can use the same president used in point 1.

In [7]:
queryString = """
select distinct ?presidentPosition ?presidentPositionName {
    wd:Q14345 wdt:P39 ?presidentPosition .
    
    ?presidentPosition <http://schema.org/name> ?presidentPositionName .
}
"""

print("Results")
run_query(queryString)

Results
[('presidentPosition', 'http://www.wikidata.org/entity/Q14850694'), ('presidentPositionName', 'Member of National Assembly of South Korea')]
[('presidentPosition', 'http://www.wikidata.org/entity/Q6296418'), ('presidentPositionName', 'president of South Korea')]
[('presidentPosition', 'http://www.wikidata.org/entity/Q12592560'), ('presidentPositionName', 'Minister of Oceans and Fisheries of South Korea')]


3

I can also look at the properties related to the president position. I can use the properties related to the *president of South Korea*, *president of USA* and *president of Italy*.

In [8]:
queryString = """
select distinct ?p ?pName {
    values ?presidentPosition { wd:Q6296418 wd:Q11696 wd:Q332711 }
    
    ?presidentPosition ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P2389'), ('pName', 'organization directed by the office or position')]
[('p', 'http://www.wikidata.org/prop/direct/P2098'), ('pName', 'substitute/deputy/replacement of office/officeholder')]
[('p', 'http://www.wikidata.org/prop/direct/P1001'), ('pName', 'applies to jurisdiction')]
[('p', 'http://www.wikidata.org/prop/direct/P1225'), ('pName', 'U.S. National Archives Identifier')]
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('pName', 'described by source')]
[('p', 'http://www.wikidata.org/prop/direct/P1417'), ('pName', 'Encyclopædia Britannica Online ID')]
[('p', 'http://www.wikidata.org/prop/direct/P155'), ('pName', 'follows')]
[('p', 'http://www.wikidata.org/prop/direct/P158'), ('pName', 'seal image')]
[('p', 'http://www.wikidata.org/prop/direct/P1617'), ('pName', 'BBC Things ID')]
[('p', 'http://www.wikidata.org/prop/direct/P163'), ('pName', 'flag')]
[('p', 'http://www.wikidata.org/prop/direct/P17'), ('pName', 'country')]
[

62

In [9]:
queryString = """
select distinct ?p ?pName ?o ?oName {
    values ?presidentPosition { wd:Q332711 }
    
    ?presidentPosition ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    optional { ?o <http://schema.org/name> ?oName . }
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P155'), ('pName', 'follows'), ('o', 'http://www.wikidata.org/entity/Q1220'), ('oName', 'Giorgio Napolitano')]
[('p', 'http://www.wikidata.org/prop/direct/P263'), ('pName', 'official residence'), ('o', 'http://www.wikidata.org/entity/Q223079'), ('oName', 'Quirinal Palace')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('pName', 'subclass of'), ('o', 'http://www.wikidata.org/entity/Q248577'), ('oName', 'President of the Republic')]
[('p', 'http://www.wikidata.org/prop/direct/P17'), ('pName', 'country'), ('o', 'http://www.wikidata.org/entity/Q38'), ('oName', 'Italy')]
[('p', 'http://www.wikidata.org/prop/direct/P1001'), ('pName', 'applies to jurisdiction'), ('o', 'http://www.wikidata.org/entity/Q38'), ('oName', 'Italy')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q4164871'), ('oName', 'position')]
[('p', 'http://www.wikidata.org/prop/direct/P1889'), ('pName', 

19

So, taking the president of Italy as an example, it is a subclass of *President of the Republic*, and it has a *country* property that refers to Italy. With this knowledge, I can connect each president to its country, as in the following query:



In [10]:
queryString = """
select distinct ?countryName ?presidentPositionName (count(?president) as ?numPresidents) {
    ?president wdt:P39 ?presidentPosition .
    ?presidentPosition wdt:P279* wd:Q248577 ;
                       wdt:P17 ?country .
    
    ?presidentPosition <http://schema.org/name> ?presidentPositionName .
    ?country <http://schema.org/name> ?countryName .
} order by desc(?numPresidents)
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'Honduras'), ('presidentPositionName', 'President of Honduras'), ('numPresidents', '84')]
[('countryName', 'United States of America'), ('presidentPositionName', 'President of the United States'), ('numPresidents', '64')]
[('countryName', 'Colombia'), ('presidentPositionName', 'President of Colombia'), ('numPresidents', '60')]
[('countryName', 'Mexico'), ('presidentPositionName', 'President of Mexico'), ('numPresidents', '59')]
[('countryName', 'Argentina'), ('presidentPositionName', 'President of Argentina'), ('numPresidents', '59')]
[('countryName', 'Venezuela'), ('presidentPositionName', 'President of Venezuela'), ('numPresidents', '58')]
[('countryName', 'Paraguay'), ('presidentPositionName', 'President of Paraguay'), ('numPresidents', '57')]
[('countryName', 'Haiti'), ('presidentPositionName', 'President of Haiti'), ('numPresidents', '55')]
[('countryName', 'Guatemala'), ('presidentPositionName', 'President of the Republic of Guatemala'), ('numPresidents',

101

To keep the BGP simpler, an alternative could be to use the nationality of the president instead of the country of the role. I can do this only if the two countries (nationality and country of the role) match, so I first need to check this point.

In [11]:
queryString = """
select distinct ?countryName ?nationalityName (count(?president) as ?numMismatchPresidents) {
    ?president wdt:P39 ?presidentPosition ;
               wdt:P27 ?nationality .
    ?presidentPosition wdt:P279* wd:Q248577 ;
                       wdt:P17 ?country .
    
    ?country <http://schema.org/name> ?countryName .
    ?nationality <http://schema.org/name> ?nationalityName .
    
    filter (?nationality != ?country)
} order by ?countryName desc(?numMismatchPresidents)
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'Albania'), ('nationalityName', 'Ottoman Empire'), ('numMismatchPresidents', '2')]
[('countryName', 'Albania'), ('nationalityName', 'Albanian Kingdom'), ('numMismatchPresidents', '1')]
[('countryName', 'Albania'), ('nationalityName', 'Independent Albania'), ('numMismatchPresidents', '1')]
[('countryName', 'Albania'), ('nationalityName', 'Principality of Albania'), ('numMismatchPresidents', '1')]
[('countryName', 'Albania'), ('nationalityName', 'Albanian Republic'), ('numMismatchPresidents', '1')]
[('countryName', 'Angola'), ('nationalityName', 'Portugal'), ('numMismatchPresidents', '1')]
[('countryName', 'Angola'), ('nationalityName', "People's Republic of Angola"), ('numMismatchPresidents', '1')]
[('countryName', 'Argentina'), ('nationalityName', 'Spain'), ('numMismatchPresidents', '2')]
[('countryName', 'Argentina'), ('nationalityName', 'Peru'), ('numMismatchPresidents', '1')]
[('countryName', 'Argentina'), ('nationalityName', 'United Provinces of the Río de 

209

So I discover that there are some mismatches between nationalities and countries. This happens with presidents from earlier eras, when the geopolitical situation was different. An example is Albania, which had two presidents whose nationality is the Ottoman Empire. For this reason, nationality cannot replace the country of the role.
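
The same check can be expressed in a few lines of Python once the rows are available locally. The sketch below uses hypothetical placeholder rows (`p1`, `A`, `B`, ...) rather than real query results, and `count_mismatches` is an illustrative name.

```python
from collections import Counter


def count_mismatches(rows):
    """Count distinct presidents per (country, nationality) pair that differs.

    `rows` holds (president, country_of_role, nationality) tuples.
    """
    distinct = {row for row in rows if row[1] != row[2]}
    return Counter((country, nationality) for _, country, nationality in distinct)


# Hypothetical rows; p3 appears twice to mimic a duplicated match
rows = [
    ("p1", "A", "A"),
    ("p2", "A", "B"),
    ("p3", "A", "B"),
    ("p3", "A", "B"),
    ("p4", "C", "A"),
]

print(count_mismatches(rows))
```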

So, finally, the BGP to obtain the countries which had at least one president is:

```
?president wdt:P39 ?presidentPosition .
?presidentPosition wdt:P279* wd:Q248577 ;
                   wdt:P17 ?country .
```


## 3. When was the first president of each country born? How many presidents has each country had?

To get the first president of each country, I first need to know when a president was elected. I can check whether the attributes of presidents include a property which connects a president to their election.

In [12]:
queryString = """
select distinct ?p ?pName where {
    ?people wdt:P39/wdt:P279* wd:Q248577 ;
            ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    
    filter (regex(?pName, "elect", "i"))
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P3602'), ('pName', 'candidacy in election')]
[('p', 'http://www.wikidata.org/prop/direct/P3429'), ('pName', 'Electronic Enlightenment ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1839'), ('pName', 'US Federal Election Commission ID')]


3

So, the property *candidacy in election (**P3602**)* probably refers to the elections the president has participated in. I can try to build a path that connects a president to their election and then to the year of that election. I can start from the properties of the president of USA, to find one of the presidents.

In [13]:
queryString = """
select distinct ?p ?pName ?o ?oName where {
    wd:Q11696 ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    optional { ?o <http://schema.org/name> ?oName } .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('pName', 'subclass of'), ('o', 'http://www.wikidata.org/entity/Q2285706'), ('oName', 'head of government')]
[('p', 'http://www.wikidata.org/prop/direct/P748'), ('pName', 'appointed by'), ('o', 'http://www.wikidata.org/entity/Q11701'), ('oName', 'United States House of Representatives')]
[('p', 'http://www.wikidata.org/prop/direct/P511'), ('pName', 'honorific prefix'), ('o', 'http://www.wikidata.org/entity/Q1067610'), ('oName', 'Excellency')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q17279032'), ('oName', 'elective office')]
[('p', 'http://www.wikidata.org/prop/direct/P263'), ('pName', 'official residence'), ('o', 'http://www.wikidata.org/entity/Q35525'), ('oName', 'White House')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('pName', 'subclass of'), ('o', 'http://www.wikidata.org/entity/Q248577'), ('oName', 'President of the Republic')]
[('p', 'h

68

**Q6279** corresponds to the current president, Joe Biden, so I can check what the property *candidacy in election* returns for him.

In [14]:
queryString = """
select distinct ?election ?electionName where {
    wd:Q6279 wdt:P3602 ?election .
    
    ?election <http://schema.org/name> ?electionName .
}
"""

print("Results")
run_query(queryString)

Results
[('election', 'http://www.wikidata.org/entity/Q22923830'), ('electionName', '2020 United States presidential election')]
[('election', 'http://www.wikidata.org/entity/Q7891450'), ('electionName', '1972 United States Senate election in Delaware')]


2

So, Joe Biden participated in two elections. I'm interested in the presidential election of 2020 (**Q22923830**) and I can list its properties.

In [15]:
queryString = """
select distinct ?p ?pName ?o ?oName where {
    wd:Q22923830 ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    optional { ?o <http://schema.org/name> ?oName } .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P726'), ('pName', 'candidate'), ('o', 'http://www.wikidata.org/entity/Q22686'), ('oName', 'Donald Trump')]
[('p', 'http://www.wikidata.org/prop/direct/P991'), ('pName', 'successful candidate'), ('o', 'http://www.wikidata.org/entity/Q22686'), ('oName', 'Donald Trump')]
[('p', 'http://www.wikidata.org/prop/direct/P541'), ('pName', 'office contested'), ('o', 'http://www.wikidata.org/entity/Q11696'), ('oName', 'President of the United States')]
[('p', 'http://www.wikidata.org/prop/direct/P541'), ('pName', 'office contested'), ('o', 'http://www.wikidata.org/entity/Q11699'), ('oName', 'Vice President of the United States')]
[('p', 'http://www.wikidata.org/prop/direct/P17'), ('pName', 'country'), ('o', 'http://www.wikidata.org/entity/Q30'), ('oName', 'United States of America')]
[('p', 'http://www.wikidata.org/prop/direct/P1001'), ('pName', 'applies to jurisdiction'), ('o', 'http://www.wikidata.org/entity/Q30'), ('oName', 'United States of A

39

I can use *successful candidate (**P991**)* to check whether the winner corresponds to the president, and *point in time (**P585**)* to get the date of the election. I also need to check that the election is a presidential election. The one considered is an instance of *United States presidential election*, so I need to check whether that entity is an instance or subclass of some kind of presidential election.

In [16]:
queryString = """
select distinct ?p ?pName ?o ?oName where {
    values ?p { wdt:P31 wdt:P279 }
    
    wd:Q47566 ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    optional { ?o <http://schema.org/name> ?oName } .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('pName', 'subclass of'), ('o', 'http://www.wikidata.org/entity/Q877353'), ('oName', 'indirect election')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('pName', 'subclass of'), ('o', 'http://www.wikidata.org/entity/Q858439'), ('oName', 'presidential election')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q15275719'), ('oName', 'recurring event')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('pName', 'subclass of'), ('o', 'http://www.wikidata.org/entity/Q5452198'), ('oName', 'first-order election')]


4

So, the USA presidential election is a subclass of *presidential election (**Q858439**)*. I can define a BGP from a president to their election year:

```
?president wdt:P3602 ?election .
?election wdt:P31/wdt:P279* wd:Q858439 ;
          wdt:P991 ?president ;
          wdt:P585 ?dateElection .

bind (year(?dateElection) as ?yearElection) .
```
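
The `year()` call in the BGP extracts the year component of the election date. For readers more used to Python, the same step on the textual form of a date value looks roughly like this. `year_of` is a hypothetical helper; it assumes the value starts with an optionally signed year followed by `-`, as in `1966-09-29Z`.

```python
def year_of(date_text):
    """Return the year of a date string such as '1966-09-29Z'.

    Assumes the text begins with an optionally signed year followed by '-'.
    """
    sign = -1 if date_text.startswith("-") else 1
    digits = date_text.lstrip("+-").split("-", 1)[0]
    return sign * int(digits)


print(year_of("1966-09-29Z"))
print(year_of("2012-06-11T00:00:00Z"))
```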

Then I can merge everything together to answer the question. The following query returns the first president of each country (among those connected to an election), their birth date, and the year of the election.

In [17]:
queryString = """
select distinct ?countryName ?presidentName ?birthDate ?yearElection {
    ?president wdt:P39 ?presidentPosition ;
               wdt:P3602 ?election ;
               wdt:P569 ?birthDateTime .
    ?presidentPosition wdt:P279* wd:Q248577 ;
                       wdt:P17 ?country .
    ?election wdt:P31/wdt:P279* wd:Q858439 ;
          wdt:P991 ?president ;
          wdt:P585 ?minDateElection .
                       
    {
        select ?country (min(?dateElection) as ?minDateElection) where {
            ?president wdt:P39 ?presidentPosition ;
                       wdt:P3602 ?election .
            ?presidentPosition wdt:P279* wd:Q248577 ;
                       wdt:P17 ?country .
            ?election wdt:P31/wdt:P279* wd:Q858439 ;
                      wdt:P991 ?president ;
                      wdt:P585 ?dateElection .
        } group by ?country
    }
    
    bind (year(?minDateElection) as ?yearElection) .
    bind (xsd:date(?birthDateTime) as ?birthDate) .
    
    ?country <http://schema.org/name> ?countryName .
    ?president <http://schema.org/name> ?presidentName .    
} order by ?countryName
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'Albania'), ('presidentName', 'Bujar Nishani'), ('birthDate', '1966-09-29Z'), ('yearElection', '2012')]
[('countryName', 'Argentina'), ('presidentName', 'Bernardino Rivadavia'), ('birthDate', '1780-05-20Z'), ('yearElection', '1826')]
[('countryName', 'Armenia'), ('presidentName', 'Levon Ter-Petrosyan'), ('birthDate', '1945-01-09Z'), ('yearElection', '1991')]
[('countryName', 'Austria'), ('presidentName', 'Adolf Schärf'), ('birthDate', '1890-04-20Z'), ('yearElection', '1963')]
[('countryName', 'Azerbaijan'), ('presidentName', 'Abulfaz Elchibey'), ('birthDate', '1938-06-24Z'), ('yearElection', '1992')]
[('countryName', 'Benin'), ('presidentName', 'Thomas Boni Yayi'), ('birthDate', '1952-07-01Z'), ('yearElection', '2006')]
[('countryName', 'Bosnia and Herzegovina'), ('presidentName', 'Dragan Čavić'), ('birthDate', '1958-03-10Z'), ('yearElection', '2002')]
[('countryName', 'Brazil'), ('presidentName', 'Deodoro da Fonseca'), ('birthDate', '1827-08-05Z'), ('yearElect

86
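
The query follows a common two-step pattern: a subquery computes the earliest election date per country, and the outer query keeps the rows whose date equals that minimum. A minimal Python sketch of the same pattern, on hypothetical placeholder rows, is shown here; `first_elected` is an illustrative name, and dates are assumed to be ISO-formatted strings so that string comparison matches chronological order.

```python
def first_elected(rows):
    """Keep the rows whose election date is the earliest for their country.

    `rows` holds (country, president, election_date) tuples with ISO dates.
    """
    # Step 1: earliest date per country (the role of the subquery)
    min_date = {}
    for country, _, date in rows:
        if country not in min_date or date < min_date[country]:
            min_date[country] = date
    # Step 2: keep rows matching that minimum (the role of the outer join)
    return [row for row in rows if row[2] == min_date[row[0]]]


# Hypothetical rows; bob and dave tie for the earliest election in X
rows = [
    ("X", "alice", "1990-05-01"),
    ("X", "bob", "1985-03-10"),
    ("Y", "carol", "2001-01-01"),
    ("X", "dave", "1985-03-10"),
]

print(first_elected(rows))
```

As in the SPARQL version, ties are kept: if two presidents share the earliest date, both are returned for that country.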

In order to answer the second question, I can simply use the BGP for obtaining all the presidents, and count them for each country.

In [18]:
queryString = """
select distinct ?country ?countryName (count(distinct ?president) as ?numPresidents) where {
    ?president wdt:P39 ?presidentPosition .
    ?presidentPosition wdt:P279* wd:Q248577 ;
                       wdt:P17 ?country .
    
    ?country <http://schema.org/name> ?countryName .
} order by desc(?numPresidents)
"""

print("Results")
run_query(queryString)

Results
[('country', 'http://www.wikidata.org/entity/Q783'), ('countryName', 'Honduras'), ('numPresidents', '84')]
[('country', 'http://www.wikidata.org/entity/Q30'), ('countryName', 'United States of America'), ('numPresidents', '64')]
[('country', 'http://www.wikidata.org/entity/Q739'), ('countryName', 'Colombia'), ('numPresidents', '60')]
[('country', 'http://www.wikidata.org/entity/Q414'), ('countryName', 'Argentina'), ('numPresidents', '59')]
[('country', 'http://www.wikidata.org/entity/Q96'), ('countryName', 'Mexico'), ('numPresidents', '59')]
[('country', 'http://www.wikidata.org/entity/Q717'), ('countryName', 'Venezuela'), ('numPresidents', '58')]
[('country', 'http://www.wikidata.org/entity/Q733'), ('countryName', 'Paraguay'), ('numPresidents', '57')]
[('country', 'http://www.wikidata.org/entity/Q790'), ('countryName', 'Haiti'), ('numPresidents', '55')]
[('country', 'http://www.wikidata.org/entity/Q774'), ('countryName', 'Guatemala'), ('numPresidents', '53')]
[('country', 'htt

99

## 5.1. What are the top-5 countries for number of presidents? Which countries had the least?

I can use the same BGP as before to count the number of presidents, and take only the first five countries.

In [19]:
queryString = """
select distinct ?countryName (count(distinct ?president) as ?numPresidents) where {
    ?president wdt:P39 ?presidentPosition .
    ?presidentPosition wdt:P279* wd:Q248577 ;
                       wdt:P17 ?country .
    
    ?country <http://schema.org/name> ?countryName .
} order by desc(?numPresidents)
limit 5
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'Honduras'), ('numPresidents', '84')]
[('countryName', 'United States of America'), ('numPresidents', '64')]
[('countryName', 'Colombia'), ('numPresidents', '60')]
[('countryName', 'Argentina'), ('numPresidents', '59')]
[('countryName', 'Mexico'), ('numPresidents', '59')]


5

And the last five

In [20]:
queryString = """
select distinct ?countryName (count(distinct ?president) as ?numPresidents) where {
    ?president wdt:P39 ?presidentPosition .
    ?presidentPosition wdt:P279* wd:Q248577 ;
                       wdt:P17 ?country .
    
    ?country <http://schema.org/name> ?countryName .
} order by asc(?numPresidents)
limit 5
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'Confederate States of America'), ('numPresidents', '1')]
[('countryName', 'Republic of Kosova'), ('numPresidents', '1')]
[('countryName', 'Spanish Empire'), ('numPresidents', '1')]
[('countryName', 'Serbia and Montenegro'), ('numPresidents', '2')]
[('countryName', 'Turkmenistan'), ('numPresidents', '2')]


5

So the top-5 countries by number of presidents are Honduras (84 presidents), USA (64), Colombia (60), Argentina (59) and Mexico (59).

The five countries with the fewest presidents are Confederate States of America, Republic of Kosova, Spanish Empire, Serbia and Montenegro, and Turkmenistan.
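
Both rankings can also be derived from a single result set instead of two queries. The sketch below uses only the counts visible in the outputs above, which is a small subset of the 99 rows, and breaks ties alphabetically; that tie-breaking rule is an assumption, since the queries themselves do not define an order among countries with equal counts.

```python
# Counts copied from the printed outputs above (subset of the full result)
counts = {
    "Honduras": 84,
    "United States of America": 64,
    "Colombia": 60,
    "Argentina": 59,
    "Mexico": 59,
    "Venezuela": 58,
    "Paraguay": 57,
    "Haiti": 55,
    "Guatemala": 53,
    "Confederate States of America": 1,
    "Republic of Kosova": 1,
    "Spanish Empire": 1,
    "Serbia and Montenegro": 2,
    "Turkmenistan": 2,
}

# Highest counts first; ties resolved by country name
top5 = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:5]
# Lowest counts first; ties resolved by country name
bottom5 = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))[:5]

print(top5)
print(bottom5)
```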

## 5.2. For how many presidents in each country do we know a date of death?

I can count the presidents for whom we know a date of death and group them by country. I can also compare this number with the total number of presidents of that country, obtaining the percentage of presidents with a known date of death.

In [21]:
queryString = """
select distinct ?countryName ?numPresidents ?numDeathPresidents ?deathRatio where {
    {
        select ?country (count(?president) as ?numPresidents) (count(distinct ?deathDate) as ?numDeathPresidents) where {
            ?president wdt:P39 ?presidentPosition .
            ?presidentPosition wdt:P279* wd:Q248577 ;
                               wdt:P17 ?country .
            
            optional { ?president wdt:P570 ?deathDate }
        } group by ?country
    }
    
    bind (?numDeathPresidents * 100. / ?numPresidents as ?deathRatio)
    
    ?country <http://schema.org/name> ?countryName .
} order by desc(?deathRatio)
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'Republic of China 1912–1949'), ('numPresidents', '3'), ('numDeathPresidents', '3'), ('deathRatio', '100')]
[('countryName', 'Spanish Empire'), ('numPresidents', '1'), ('numDeathPresidents', '1'), ('deathRatio', '100')]
[('countryName', 'Confederate States of America'), ('numPresidents', '1'), ('numDeathPresidents', '1'), ('deathRatio', '100')]
[('countryName', 'Republic of Kosova'), ('numPresidents', '1'), ('numDeathPresidents', '1'), ('deathRatio', '100')]
[('countryName', 'Venezuela'), ('numPresidents', '61'), ('numDeathPresidents', '58'), ('deathRatio', '95.081967213114754')]
[('countryName', 'Cuba'), ('numPresidents', '30'), ('numDeathPresidents', '28'), ('deathRatio', '93.333333333333333')]
[('countryName', 'Syria'), ('numPresidents', '25'), ('numDeathPresidents', '23'), ('deathRatio', '92')]
[('countryName', 'Chile'), ('numPresidents', '37'), ('numDeathPresidents', '34'), ('deathRatio', '91.891891891891892')]
[('countryName', 'Colombia'), ('numPresidents

99
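
One caveat: the subquery counts `?president` without `distinct` and counts distinct `?deathDate` values rather than presidents. Venezuela is reported with 61 presidents here but 58 in the distinct count of point 3, which suggests that some presidents are matched more than once. The sketch below shows, on hypothetical placeholder rows, how the percentage could be computed over distinct presidents instead; `death_coverage` is an illustrative name and not part of the notebook.

```python
def death_coverage(rows):
    """Per country: (distinct presidents, those with a death date, percentage).

    `rows` holds (country, president, death_date_or_None) tuples and may
    contain the same president several times.
    """
    presidents, deceased = {}, {}
    for country, president, death_date in rows:
        presidents.setdefault(country, set()).add(president)
        if death_date is not None:
            deceased.setdefault(country, set()).add(president)
    result = {}
    for country, people in presidents.items():
        known = len(deceased.get(country, ()))
        result[country] = (len(people), known, 100.0 * known / len(people))
    return result


# Hypothetical rows; alice is matched twice (e.g. through two positions)
rows = [
    ("X", "alice", "1950-01-01"),
    ("X", "alice", "1950-01-01"),
    ("X", "bob", None),
    ("Y", "carol", "1950-01-01"),
]

print(death_coverage(rows))
```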

I can also list the most frequent manner/cause of death of the presidents of each country.

In [22]:
queryString = """
select distinct ?countryName (group_concat(distinct ?mannerDeathName; separator=", ") as ?mannerDeaths) (max(?numDeaths) as ?numDeaths) {               
    {
        select ?country (max(?numDeaths) as ?moreFrequentMannerDeath) where {
            {
                select ?country ?mannerDeath (count(distinct ?president) as ?numDeaths) where {
                    ?president wdt:P39 ?presidentPosition ;
                               wdt:P1196|wdt:P509 ?mannerDeath .
                    ?presidentPosition wdt:P279* wd:Q248577 ;
                               wdt:P17 ?country .
                } group by ?country ?mannerDeath
            }
        }
    } .
    {
        select ?country ?mannerDeath (count(distinct ?president) as ?numDeaths) where {
            ?president wdt:P39 ?presidentPosition ;
                       wdt:P1196|wdt:P509 ?mannerDeath .
            ?presidentPosition wdt:P279* wd:Q248577 ;
                       wdt:P17 ?country .
        } group by ?country ?mannerDeath
    }
    
    filter (?numDeaths = ?moreFrequentMannerDeath)
        
    ?country <http://schema.org/name> ?countryName .
    ?mannerDeath <http://schema.org/name> ?mannerDeathName .    
} order by ?countryName
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'Albania'), ('mannerDeaths', 'disease, lung disease, myocardial infarction, natural causes, suicide'), ('numDeaths', '1')]
[('countryName', 'Angola'), ('mannerDeaths', 'surgical complications'), ('numDeaths', '1')]
[('countryName', 'Argentina'), ('mannerDeaths', 'natural causes'), ('numDeaths', '15')]
[('countryName', 'Austria'), ('mannerDeaths', 'natural causes'), ('numDeaths', '5')]
[('countryName', 'Azerbaijan'), ('mannerDeaths', 'cancer, kidney failure, natural causes, unnatural death'), ('numDeaths', '1')]
[('countryName', 'Bosnia and Herzegovina'), ('mannerDeaths', 'myocardial infarction, natural causes'), ('numDeaths', '1')]
[('countryName', 'Botswana'), ('mannerDeaths', 'natural causes, pancreatic cancer, surgical complications'), ('numDeaths', '1')]
[('countryName', 'Brazil'), ('mannerDeaths', 'natural causes'), ('numDeaths', '4')]
[('countryName', 'Burkina Faso'), ('mannerDeaths', 'homicide'), ('numDeaths', '1')]
[('countryName', 'Burundi'), ('mannerD

80
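
The query computes a per-country mode that keeps ties, which is why several manners of death can appear for the same country. A minimal Python sketch of the same logic follows; the rows are hypothetical placeholders and `most_frequent_manners` is an illustrative name.

```python
from collections import Counter


def most_frequent_manners(rows):
    """Per country: (sorted list of the most frequent manners, their count).

    `rows` holds (country, president, manner) tuples; duplicates are ignored
    so that each president counts once per manner.
    """
    counts = Counter((country, manner) for country, _, manner in set(rows))
    # Highest count reached by any manner in each country
    best = {}
    for (country, _), n in counts.items():
        best[country] = max(best.get(country, 0), n)
    # Keep every manner that reaches that count (ties included)
    winners = {}
    for (country, manner), n in counts.items():
        if n == best[country]:
            winners.setdefault(country, []).append(manner)
    return {c: (sorted(ms), best[c]) for c, ms in winners.items()}


# Hypothetical rows; the last one duplicates the first on purpose
rows = [
    ("X", "a", "natural causes"),
    ("X", "b", "natural causes"),
    ("X", "c", "homicide"),
    ("Y", "d", "disease"),
    ("Y", "e", "suicide"),
    ("X", "a", "natural causes"),
]

print(most_frequent_manners(rows))
```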

## 5.3. Which are the professions of different presidents? How many presidents had a specific profession?

To answer both questions at the same time, I can list all the occupations of the presidents and count how many presidents had each of them.

In [23]:
queryString = """
select ?occupation ?occupationName (count(distinct ?president) as ?numPresidents) where {
    ?president wdt:P39/wdt:P279* wd:Q248577 ;
               wdt:P106 ?occupation .

    ?occupation <http://schema.org/name> ?occupationName .
} group by ?occupation ?occupationName order by desc(?numPresidents)
"""

print("Results")
run_query(queryString)

Results
[('occupation', 'http://www.wikidata.org/entity/Q82955'), ('occupationName', 'politician'), ('numPresidents', '1544')]
[('occupation', 'http://www.wikidata.org/entity/Q40348'), ('occupationName', 'lawyer'), ('numPresidents', '353')]
[('occupation', 'http://www.wikidata.org/entity/Q193391'), ('occupationName', 'diplomat'), ('numPresidents', '211')]
[('occupation', 'http://www.wikidata.org/entity/Q47064'), ('occupationName', 'military personnel'), ('numPresidents', '200')]
[('occupation', 'http://www.wikidata.org/entity/Q1930187'), ('occupationName', 'journalist'), ('numPresidents', '83')]
[('occupation', 'http://www.wikidata.org/entity/Q36180'), ('occupationName', 'writer'), ('numPresidents', '76')]
[('occupation', 'http://www.wikidata.org/entity/Q188094'), ('occupationName', 'economist'), ('numPresidents', '75')]
[('occupation', 'http://www.wikidata.org/entity/Q1622272'), ('occupationName', 'university teacher'), ('numPresidents', '71')]
[('occupation', 'http://www.wikidata.org

210

There are some strange occupations for a president, such as slaveholder, DJ, comedian, pornographic actor, serial rapist or superhero. It is interesting to see which presidents have these occupations, together with some other information that gives the context.
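
The occupations of interest can be passed to a query through a SPARQL `values` clause. Writing the entity codes by hand is error-prone, so the following is a minimal sketch of a helper that builds the clause from a list of codes. The function name `values_clause` is hypothetical and not part of the notebook setup; the two codes in the example are copied from the query in the next cell.

```python
import re

def values_clause(variable, qids):
    """Build a SPARQL values clause binding `variable` to Wikidata entities."""
    for qid in qids:
        # Accept only well-formed item identifiers such as "Q130857"
        if not re.fullmatch(r"Q[1-9][0-9]*", qid):
            raise ValueError(f"not a Wikidata item id: {qid!r}")
    entities = " ".join(f"wd:{qid}" for qid in qids)
    return f"values ?{variable} {{ {entities} }}"

clause = values_clause("strangeOccupation", ["Q130857", "Q26267537"])
print(clause)
```

The returned string relies on the `wd:` prefix declared in `prefixString`, so it can be pasted into a query body passed to `run_query`.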

In [24]:
queryString = """
select distinct ?president ?presidentName ?occupationName ?countryName ?birthDate (group_concat(distinct ?instanceName; separator=", ") as ?instances) where {
    values ?strangeOccupation { wd:Q130857 wd:Q26267537 wd:Q188784 wd:Q488111 wd:Q245068 wd:Q10076267 }
    
    ?president wdt:P39 ?presidentPosition ;
               wdt:P31 ?instance;
               wdt:P106 ?strangeOccupation .
               
    optional { ?president wdt:P569 ?birthDate } .
    
    ?presidentPosition wdt:P279* wd:Q248577 ;
                       wdt:P17 ?country .
                   
    ?country <http://schema.org/name> ?countryName .
    ?instance <http://schema.org/name> ?instanceName .
    ?strangeOccupation <http://schema.org/name> ?occupationName .
    ?president <http://schema.org/name> ?presidentName .
} order by ?presidentName
"""

print("Results")
run_query(queryString)

Results
[('president', 'http://www.wikidata.org/entity/Q8612'), ('presidentName', 'Andrew Johnson'), ('occupationName', 'slaveholder'), ('countryName', 'United States of America'), ('birthDate', '1808-12-29T00:00:00Z'), ('instances', 'human')]
[('president', 'http://www.wikidata.org/entity/Q57363'), ('presidentName', 'Andry Rajoelina'), ('occupationName', 'disc jockey'), ('countryName', 'Madagascar'), ('birthDate', '1974-05-30T00:00:00Z'), ('instances', 'human')]
[('president', 'http://www.wikidata.org/entity/Q11891'), ('presidentName', 'James K. Polk'), ('occupationName', 'slaveholder'), ('countryName', 'United States of America'), ('birthDate', '1795-11-02T00:00:00Z'), ('instances', 'human')]
[('president', 'http://www.wikidata.org/entity/Q162269'), ('presidentName', 'Jefferson Davis'), ('occupationName', 'slaveholder'), ('countryName', 'Confederate States of America'), ('birthDate', '1808-06-03T00:00:00Z'), ('instances', 'human')]
[('president', 'http://www.wikidata.org/entity/Q1799

8

So the slaveholders, for example, are presidents of the USA (and of the Confederate States of America) in the 19th century, which can explain their "occupation". The superhero, instead, is a fictional character.

I can also list the most frequent profession (excluding "politician") for each country.

In [25]:
queryString = """
select distinct ?countryName (group_concat(?occupationName; separator=", ") as ?occupations) (max(?numPresidents) as ?numPresidents) {               
    {
        select ?country (max(?numPresidents) as ?moreFrequentOccupation) where {
            {
                select ?country ?occupation (count(distinct ?president) as ?numPresidents) where {
                    ?president wdt:P39 ?presidentPosition ;
                               wdt:P106 ?occupation .
                    ?presidentPosition wdt:P279* wd:Q248577 ;
                               wdt:P17 ?country .
                    
                    filter (?occupation != wd:Q82955)
                } group by ?country ?occupation
            }
        }
    } .
    {
        select ?country ?occupation (count(distinct ?president) as ?numPresidents) where {
            ?president wdt:P39 ?presidentPosition ;
                       wdt:P106 ?occupation .
            ?presidentPosition wdt:P279* wd:Q248577 ;
                       wdt:P17 ?country .
            
            filter (?occupation != wd:Q82955)
        } group by ?country ?occupation
    }
    
    filter (?numPresidents = ?moreFrequentOccupation)
        
    ?country <http://schema.org/name> ?countryName .
    ?occupation <http://schema.org/name> ?occupationName .    
} order by ?countryName ?numPresidents
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'Albania'), ('occupations', 'journalist'), ('numPresidents', '3')]
[('countryName', 'Angola'), ('occupations', 'civil engineer, writer, military personnel, poet, physician writer, partisan'), ('numPresidents', '1')]
[('countryName', 'Armenia'), ('occupations', 'entrepreneur, university teacher, historian, military personnel, scientist'), ('numPresidents', '1')]
[('countryName', 'Azerbaijan'), ('occupations', 'diplomat'), ('numPresidents', '1')]
[('countryName', 'Benin'), ('occupations', 'economist, banker'), ('numPresidents', '2')]
[('countryName', 'Botswana'), ('occupations', 'judge, journalist, writer, teacher, businessperson, nurse'), ('numPresidents', '1')]
[('countryName', 'Burundi'), ('occupations', 'military personnel'), ('numPresidents', '2')]
[('countryName', 'Central African Republic'), ('occupations', 'criminal, university teacher, military officer, diplomat, sovereign, lawyer'), ('numPresidents', '1')]
[('countryName', 'Comoros'), ('occupations', 'j

54
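
The query above follows a "maximum per group, keeping ties" pattern: one sub-query computes the highest count per country, a second one recomputes the counts per country and occupation, and the `filter` keeps only the rows that reach the maximum. A minimal sketch of the same logic on made-up data follows; the countries `A` and `B`, the counts and the function name `most_frequent_per_country` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (country, occupation, numPresidents) rows, standing in for the
# sub-query that counts presidents per country and occupation.
counts = [
    ("A", "lawyer", 3),
    ("A", "journalist", 1),
    ("B", "economist", 2),
    ("B", "banker", 2),
]

def most_frequent_per_country(counts):
    # Step 1: highest count per country (the max(?numPresidents) sub-query)
    best = defaultdict(int)
    for country, _, n in counts:
        best[country] = max(best[country], n)
    # Step 2: keep the rows whose count equals that maximum (the filter),
    # then join the tied occupations (the group_concat)
    winners = defaultdict(list)
    for country, occupation, n in counts:
        if n == best[country]:
            winners[country].append(occupation)
    return {c: (", ".join(occs), best[c]) for c, occs in winners.items()}

result = most_frequent_per_country(counts)
print(result)
```

Ties explain why many countries in the output list several occupations with a count of 1: every occupation held by a single president reaches the maximum.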

## (Bonus) Comparison of gender of presidents for countries

It could also be interesting to list and count the gender of the presidents for each country. First, I need to obtain the entity codes and the list of available genders.

In [26]:
queryString = """
select ?gender ?genderName (count(distinct ?president) as ?numPresidents) where {    
    ?president wdt:P39 ?presidentPosition ;
               wdt:P21 ?gender .

    ?presidentPosition wdt:P279* wd:Q248577 .    
             
    ?gender <http://schema.org/name> ?genderName .
} group by ?gender ?genderName
"""

print("Results")
run_query(queryString)

Results
[('gender', 'http://www.wikidata.org/entity/Q6581097'), ('genderName', 'male'), ('numPresidents', '1551')]
[('gender', 'http://www.wikidata.org/entity/Q1052281'), ('genderName', 'transgender female'), ('numPresidents', '1')]
[('gender', 'http://www.wikidata.org/entity/Q6581072'), ('genderName', 'female'), ('numPresidents', '49')]


3

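So I can count how many females and males have been presidents for each country. I can also compute the ratio of female presidents over the total (females plus males) to see which countries are the most balanced. In particular, I can assign a parity score using the following formula: `-4 * ratio^2 + 4 * ratio`. The formula gives 1 when the ratio is 0.5 (perfect balance) and 0 when the ratio is 0 or 1; the query multiplies it by 100.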

In [30]:
queryString = """
select ?countryName ?numMales ?numFemales ?ratio ?parityScore {
    {
        select ?country (count(distinct ?president) as ?numMales) where {
            ?president wdt:P39 ?presidentPosition ;
                       wdt:P21 wd:Q6581097 .

            ?presidentPosition wdt:P279* wd:Q248577 ;
                               wdt:P17 ?country .
        } group by ?country
    } .
    optional {
        select ?country (count(distinct ?president) as ?numFemales) where {
            ?president wdt:P39 ?presidentPosition ;
                       wdt:P21 wd:Q6581072 .

            ?presidentPosition wdt:P279* wd:Q248577 ;
                               wdt:P17 ?country . 
        } group by ?country
    } .
    
    bind (?numMales + ?numFemales as ?totalPresidents) .
    bind (?numFemales * 1.0 / ?totalPresidents as ?ratio) .
    bind ((?ratio * ?ratio * (-4.0) + 4.0 * ?ratio) * 100 as ?parityScore) .
                   
    ?country <http://schema.org/name> ?countryName .
} order by desc(?parityScore)
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'Georgia'), ('numMales', '4'), ('numFemales', '2'), ('ratio', '0.333333333333333'), ('parityScore', '88.888888888888889')]
[('countryName', 'Nepal'), ('numMales', '2'), ('numFemales', '1'), ('ratio', '0.333333333333333'), ('parityScore', '88.888888888888889')]
[('countryName', 'Kosovo'), ('numMales', '5'), ('numFemales', '2'), ('ratio', '0.285714285714286'), ('parityScore', '81.63265306122449')]
[('countryName', 'Bosnia and Herzegovina'), ('numMales', '14'), ('numFemales', '3'), ('ratio', '0.176470588235294'), ('parityScore', '58.131487889273356')]
[('countryName', 'Iceland'), ('numMales', '5'), ('numFemales', '1'), ('ratio', '0.166666666666667'), ('parityScore', '55.555555555555556')]
[('countryName', 'Estonia'), ('numMales', '5'), ('numFemales', '1'), ('ratio', '0.166666666666667'), ('parityScore', '55.555555555555556')]
[('countryName', 'Croatia'), ('numMales', '10'), ('numFemales', '2'), ('ratio', '0.166666666666667'), ('parityScore', '55.555555555555556')]

99
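
The parity score can be checked by hand against the rows printed above. The following is a minimal sketch that recomputes it in Python; the function name `parity_score` is hypothetical, and the ratio is the number of females divided by the total, as in the `bind` expressions of the query.

```python
def parity_score(num_males, num_females):
    """Score in [0, 100]; 100 means as many female as male presidents."""
    total = num_males + num_females
    if total == 0:
        raise ValueError("no presidents to score")
    ratio = num_females / total  # share of female presidents
    return (-4.0 * ratio**2 + 4.0 * ratio) * 100

georgia = parity_score(4, 2)  # counts from the Georgia row
kosovo = parity_score(5, 2)   # counts from the Kosovo row
print(round(georgia, 2), round(kosovo, 2))
```

One difference to keep in mind: passing 0 females here gives a score of 0, whereas in the query a country without female presidents leaves `?numFemales` unbound, so, as far as I can tell, `?ratio` and `?parityScore` are simply missing for those rows.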