# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [WikiData](https://www.wikidata.org/wiki/Wikidata:Main_Page)

An exploratory workload is composed of a set of queries, where each query builds on the information obtained by the previous ones.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook
3. Do not delete useless queries. Keep everything that is syntactically valid

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
    
    

In [2]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://a256-gc1-02.srv.aau.dk:5820/sparql"

prefixString = """
##-e1cf476009-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

def _make_sparql(queryString):
    # Prepend the shared prefixes and configure the endpoint connection
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(prefixString + "\n" + queryString)
    return sparql

# SELECT queries: print each result row and return the number of rows
def run_query(queryString):
    try:
        json_results = _make_sparql(queryString).query().convert()
        bindings = json_results['results']['bindings']
        if len(bindings) == 0:
            print("Empty")
            return 0

        for row in bindings:
            print([(var, value['value']) for var, value in row.items()])

        return len(bindings)

    except Exception as e:
        print("The operation failed", e)
        return None

# ASK queries: return the converted JSON result
def run_ask_query(queryString):
    try:
        return _make_sparql(queryString).query().convert()

    except Exception as e:
        print("The operation failed", e)
        return None
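`run_query` only prints rows, which makes it hard to compare counts across queries programmatically. The sketch below assumes the standard SPARQL 1.1 JSON results layout (the dictionary that `.query().convert()` returns when the return format is `JSON`) and flattens it into plain dictionaries; `bindings_to_rows` is a hypothetical helper, not part of SPARQLWrapper.

```python
from typing import Any, Dict, List


def bindings_to_rows(json_results: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts.

    Variables left unbound in a solution (e.g. by an OPTIONAL) are simply
    absent from that row, as they are in the JSON format itself.
    """
    rows = []
    for binding in json_results.get("results", {}).get("bindings", []):
        # each binding maps a variable name to {"type": ..., "value": ...}
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hand-written result document shaped like the endpoint's JSON output
sample = {
    "head": {"vars": ["country", "numpol"]},
    "results": {"bindings": [
        {"country": {"type": "literal", "value": "Austria"},
         "numpol": {"type": "literal", "value": "4693"}},
    ]},
}
print(bindings_to_rows(sample))  # [{'country': 'Austria', 'numpol': '4693'}]
```

With the setup above it could be used as `bindings_to_rows(_make_sparql(queryString).query().convert())`.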


# Politics Workflow Series ("Politicians in E.U.") 

Consider the following exploratory information need:

> You are investigating the careers of politicians in the E.U.

## Useful URIs for the current workflow


The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty of | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P279`    | subclass of   | predicate |
| `wdt:P17`     | country       | predicate |
| `wdt:P27`     | country of citizenship | predicate |
| `wdt:P39`     | position held | predicate |
| `wdt:P106`    | occupation    | predicate |
| `wd:Q82955`   | politician    | node      |
| `wd:Q46`      | Europe        | node      |
| `wd:Q38`      | Italy         | node      |



Also consider

```
?p wdt:P106/wdt:P279* wd:Q82955  . 
```

is the BGP to retrieve all **politicians**

## Workload Goals


1. Identify the BGP to retrieve E.U. countries and their politicians

2. Identify the BGP for obtaining other occupations and properties of politicians

3. How many politicians are recorder for each E.U. country?

4. Are there politicians with double citizenship?

5. Analyze the number of politicians in each country by occupation, for instance
 
   5.1 What are the top-3 occupations for a politician in Italy and France?
   
   5.2 What if you consider only politicians for which we don't have a date of death?
   
   5.3 Which politicians had a spouse that was also a politician? How many in each country?


In [3]:
# start your workflow here

In [5]:
queryString = """
SELECT COUNT(*)
WHERE { 

?p wdt:P106/wdt:P279 wd:Q82955  . 
} 
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '15556')]


1

## Task 1
1. Identify the BGP to retrieve E.U. countries and their politicians

In [24]:
# Let's see what happens if we use only the property "occupation" (wdt:P106) to identify a politician
queryString = """
SELECT ?polname ?oname
WHERE { 

?pol wdt:P106 ?obj .

FILTER(?obj = wd:Q82955).

?pol <http://schema.org/name> ?polname .
?obj <http://schema.org/name> ?oname .
}
ORDER BY ASC (?polname)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('polname', '"Big" Donnie MacLeod'), ('oname', 'politician')]
[('polname', "'Abd Allah II ibn 'Ali 'Abd ash-Shakur"), ('oname', 'politician')]
[('polname', "'Abd al-'Aziz ibn Marwan"), ('oname', 'politician')]
[('polname', "'Abd al-Ilah"), ('oname', 'politician')]
[('polname', "'Abd al-Razzaq al-Hasani"), ('oname', 'politician')]
[('polname', "'Abdallah ibn Ishaq ibn Ibrahim"), ('oname', 'politician')]
[('polname', "'Alisi Afeaki Taumoepeau"), ('oname', 'politician')]
[('polname', "'Amr ibn Luhayy"), ('oname', 'politician')]
[('polname', "'Aziz 'Ali al-Misri"), ('oname', 'politician')]
[('polname', "'General' Elijah Combs"), ('oname', 'politician')]


10

In [23]:
# Let's see what is counted in the example query
queryString = """
SELECT ?polname ?oname
WHERE { 

?pol wdt:P106/wdt:P279 ?obj .

FILTER(?obj = wd:Q82955).

?pol <http://schema.org/name> ?polname .
?obj <http://schema.org/name> ?oname .
}
ORDER BY ASC (?polname)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('polname', "'Amr ibn al-'As"), ('oname', 'politician')]
[('polname', "'Apepi"), ('oname', 'politician')]
[('polname', '...ka I'), ('oname', 'politician')]
[('polname', '...ka II'), ('oname', 'politician')]
[('polname', '...ka Nebnanati'), ('oname', 'politician')]
[('polname', '...ren Hepu'), ('oname', 'politician')]
[('polname', '...sa...'), ('oname', 'politician')]
[('polname', '...webenra'), ('oname', 'politician')]
[('polname', 'A Foster'), ('oname', 'politician')]
[('polname', 'A G Phillips'), ('oname', 'politician')]


10

#### Comment 
The BGP given for retrieving politicians differs from the one used in the example query, so let's see how the two behave:

QUERY -> ?p wdt:P106/wdt:P279 wd:Q82955  .  
BGP -> ?p wdt:P106/wdt:P279* wd:Q82955  .

The BGP has one extra asterisk: `wdt:P279*` follows "subclass of" zero or more times, while `wdt:P279` requires exactly one hop, so the example query misses people whose occupation is "politician" itself.
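The difference between the two paths can be reproduced on a tiny hand-made graph. This is an illustrative sketch only: the people and occupations are hypothetical labels, not Wikidata items, and the unbounded `*` is approximated by a large hop bound, which is exact on this small acyclic graph.

```python
from typing import Dict, Set

# Toy "subclass of" (P279) edges: child -> parents. Hypothetical labels.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "mayor": {"politician"},
    "pharaoh": {"monarch"},
    "monarch": {"politician"},
}
# Toy "occupation" (P106) edges: person -> occupations. Hypothetical people.
OCCUPATION: Dict[str, Set[str]] = {
    "Alice": {"politician"},  # occupation is "politician" itself (0 hops)
    "Bob": {"mayor"},         # 1 subclass hop away
    "Carol": {"pharaoh"},     # 2 subclass hops away
}


def reachable(start: Set[str], min_hops: int, max_hops: int) -> Set[str]:
    """Nodes reachable from `start` via SUBCLASS_OF in [min_hops, max_hops] steps."""
    result: Set[str] = set(start) if min_hops == 0 else set()
    frontier = set(start)
    for hop in range(1, max_hops + 1):
        frontier = {parent for node in frontier for parent in SUBCLASS_OF.get(node, set())}
        if hop >= min_hops:
            result |= frontier
        if not frontier:
            break
    return result


def matches(person: str, min_hops: int, max_hops: int) -> bool:
    """Does `person` satisfy P106/P279{min_hops,max_hops} "politician"?"""
    return "politician" in reachable(OCCUPATION[person], min_hops, max_hops)


people = sorted(OCCUPATION)
print([p for p in people if matches(p, 1, 1)])   # P106/P279  -> ['Bob']
print([p for p in people if matches(p, 0, 10)])  # P106/P279* -> ['Alice', 'Bob', 'Carol']
```

Only the starred form keeps "Alice", the analogue of "Big" Donnie MacLeod, whose occupation is "politician" directly.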

In [22]:
# Let's see whether the results differ when using the given BGP
queryString = """
SELECT ?polname ?oname
WHERE { 

?pol wdt:P106/wdt:P279* ?obj .

FILTER(?obj = wd:Q82955).

?pol <http://schema.org/name> ?polname .
?obj <http://schema.org/name> ?oname .
}
ORDER BY ASC (?polname)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('polname', '"Big" Donnie MacLeod'), ('oname', 'politician')]
[('polname', "'Abd Allah II ibn 'Ali 'Abd ash-Shakur"), ('oname', 'politician')]
[('polname', "'Abd Allāh ibn Muhammad ibn Ġānīya"), ('oname', 'politician')]
[('polname', "'Abd al-'Aziz ibn Marwan"), ('oname', 'politician')]
[('polname', "'Abd al-Ilah"), ('oname', 'politician')]
[('polname', "'Abd al-Razzaq al-Hasani"), ('oname', 'politician')]
[('polname', "'Abdallah ibn Ishaq ibn Ibrahim"), ('oname', 'politician')]
[('polname', "'Abdawayh ibn Jabalah"), ('oname', 'politician')]
[('polname', "'Abdun ibn Muhammad"), ('oname', 'politician')]
[('polname', "'Adud al-Dawla"), ('oname', 'politician')]


10

#### Comment
During a brief internet search I found that "Big" Donnie MacLeod is indeed a politician. His occupation is "politician" itself (zero "subclass of" hops), so only the BGP with the asterisk retrieves him, and this last query is the one to use.

Next, let's find which property links a politician to the country he/she works for, and which property records that a country belongs to the E.U.

In [36]:
# Let's check how a country's membership in the E.U. is recorded, trying with Italy
queryString = """
SELECT ?pname ?oname ?p ?obj
WHERE { 

wd:Q38 ?p ?obj .

?p <http://schema.org/name> ?pname .
OPTIONAL { ?obj <http://schema.org/name> ?oname .} .
}
ORDER BY ASC (?pname)
"""

print("Results")
run_query(queryString)

Results
[('pname', 'AGROVOC ID'), ('p', 'http://www.wikidata.org/prop/direct/P8061'), ('obj', 'c_4026')]
[('pname', 'ASC Leiden Thesaurus ID'), ('p', 'http://www.wikidata.org/prop/direct/P5198'), ('obj', '294918175')]
[('pname', 'Academy Awards Database nominee ID'), ('p', 'http://www.wikidata.org/prop/direct/P6150'), ('obj', '7512')]
[('pname', 'All the Tropes identifier'), ('p', 'http://www.wikidata.org/prop/direct/P8895'), ('obj', 'Italy')]
[('pname', 'Analysis & Policy Observatory term ID'), ('p', 'http://www.wikidata.org/prop/direct/P7870'), ('obj', '10099')]
[('pname', 'AniDB tag ID'), ('p', 'http://www.wikidata.org/prop/direct/P8785'), ('obj', '486')]
[('pname', 'Armeniapedia ID'), ('p', 'http://www.wikidata.org/prop/direct/P9629'), ('obj', '2924')]
[('pname', 'BBC News topic ID'), ('p', 'http://www.wikidata.org/prop/direct/P6200'), ('obj', 'crr7mlg0d2wt')]
[('pname', 'BBC Things ID'), ('p', 'http://www.wikidata.org/prop/direct/P1617'), ('obj', '0021de37-b64a-46ac-a4bb-5bdbdf090

495

#### Comment
We have discovered: 
    
| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P463`   | member of   | predicate |
| `wd:Q458`     | European Union   | node |

European Union = E.U.
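With these two IRIs, E.U. membership of a single country can be checked with an ASK query through the notebook's `run_ask_query`. The builder below is a hypothetical convenience sketch; it assumes the `wd:`/`wdt:` prefixes are prepended by `run_ask_query`, as in the setup cell.

```python
def eu_membership_ask(country_qid: str) -> str:
    """Build an ASK query for `wd:<country> wdt:P463 wd:Q458` (member of the European Union)."""
    # Reject anything that does not look like a Wikidata item id (e.g. labels like "Italy")
    if not (country_qid.startswith("Q") and country_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {country_qid!r}")
    return f"ASK {{ wd:{country_qid} wdt:P463 wd:Q458 . }}"


print(eu_membership_ask("Q38"))  # ASK { wd:Q38 wdt:P463 wd:Q458 . }
```

Passing the result to `run_ask_query` should return a JSON dictionary whose `boolean` field holds the answer.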

In [52]:
# Let's find the properties linking a politician to a country, using "Italy" as an example
queryString = """
SELECT DISTINCT ?pname ?p
WHERE { 

?pol wdt:P106/wdt:P279* wd:Q82955 ;
     ?p wd:Q38 .

?p <http://schema.org/name> ?pname .
?pol <http://schema.org/name> ?sname .
}
"""

print("Results")
run_query(queryString)

Results
The operation failed EndPointInternalError: endpoint returned code 500 and response. 

Response:
b'Virtuoso 42000 Error TN...: Exceeded 1000000000 bytes in transitive temp memory.  use t_distinct, t_max or more T_MAX_memory options to limit the search or increase the pool\n\nSPARQL query:\ndefine sql:big-data-const 0\n#output-format:application/sparql-results+json\n\n##-e1cf476009-##\nPREFIX wd: <http://www.wikidata.org/entity/> \nPREFIX wdt: <http://www.wikidata.org/prop/direct/> \nPREFIX sc: <http://schema.org/>\n\n\nSELECT DISTINCT ?pname\nWHERE { \n\n?pol wdt:P106/wdt:P279* wd:Q82955 ;\n     ?p wd:Q38 .\n\n?p <http://schema.org/name> ?pname .\n?pol <http://schema.org/name> ?sname .\n}\n'


#### Comment
It seems that with the asterisk the search is too expensive (the endpoint runs out of transitive memory), so let's try something else.
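The next cell bounds the "subclass of" chain with the range `{0,2}`. That syntax appeared in SPARQL 1.1 drafts but is not part of the final SPARQL 1.1 Recommendation; this endpoint (Virtuoso, judging by the error message above) accepts it. A portable equivalent chains zero-or-one paths (`?`). The helper name `bounded_path` is hypothetical.

```python
def bounded_path(prop: str, max_hops: int) -> str:
    """Standard SPARQL 1.1 path matching `prop` between 0 and max_hops times.

    Each `prop?` is a zero-or-one path, so chaining max_hops of them with `/`
    accepts every length from 0 up to max_hops.
    """
    if max_hops < 1:
        raise ValueError("max_hops must be at least 1")
    return "/".join([f"{prop}?"] * max_hops)


path = "wdt:P106/" + bounded_path("wdt:P279", 2)
print(path)  # wdt:P106/wdt:P279?/wdt:P279?
```

Chained optional steps can produce repeated solutions for the same binding, so `SELECT DISTINCT` remains useful with this form.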

In [54]:
# Let's restrict the "subclass of" property to between 0 and 2 hops
queryString = """
SELECT DISTINCT ?pname ?p
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     ?p wd:Q38 .

?p <http://schema.org/name> ?pname .
?pol <http://schema.org/name> ?sname .
}
"""

print("Results")
run_query(queryString)

Results
[('pname', 'country for sport'), ('p', 'http://www.wikidata.org/prop/direct/P1532')]
[('pname', 'headquarters location'), ('p', 'http://www.wikidata.org/prop/direct/P159')]
[('pname', 'place of birth'), ('p', 'http://www.wikidata.org/prop/direct/P19')]
[('pname', 'ethnic group'), ('p', 'http://www.wikidata.org/prop/direct/P172')]
[('pname', 'country of citizenship'), ('p', 'http://www.wikidata.org/prop/direct/P27')]
[('pname', 'place of death'), ('p', 'http://www.wikidata.org/prop/direct/P20')]
[('pname', 'place of detention'), ('p', 'http://www.wikidata.org/prop/direct/P2632')]
[('pname', 'residence'), ('p', 'http://www.wikidata.org/prop/direct/P551')]
[('pname', 'ancestral home'), ('p', 'http://www.wikidata.org/prop/direct/P66')]
[('pname', 'work location'), ('p', 'http://www.wikidata.org/prop/direct/P937')]
[('pname', 'allegiance'), ('p', 'http://www.wikidata.org/prop/direct/P945')]
[('pname', 'domain of saint or deity'), ('p', 'http://www.wikidata.org/prop/direct/P2925')]


12

#### Comment 
These are candidate properties connecting a politician with the country he/she works for.

In [55]:
# Let's print the distinct properties of politicians
queryString = """
SELECT DISTINCT ?pname ?p
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     ?p ?obj .

?p <http://schema.org/name> ?pname .
?pol <http://schema.org/name> ?sname .
OPTIONAL { ?obj <http://schema.org/name> ?oname . } .

}
"""

print("Results")
run_query(queryString)

Results
[('pname', 'stepparent'), ('p', 'http://www.wikidata.org/prop/direct/P3448')]
[('pname', 'Loop ID'), ('p', 'http://www.wikidata.org/prop/direct/P2798')]
[('pname', 'Nationale Thesaurus voor Auteurs ID'), ('p', 'http://www.wikidata.org/prop/direct/P1006')]
[('pname', 'field of work'), ('p', 'http://www.wikidata.org/prop/direct/P101')]
[('pname', 'member of political party'), ('p', 'http://www.wikidata.org/prop/direct/P102')]
[('pname', 'native language'), ('p', 'http://www.wikidata.org/prop/direct/P103')]
[('pname', 'relative'), ('p', 'http://www.wikidata.org/prop/direct/P1038')]
[('pname', 'ResearcherID'), ('p', 'http://www.wikidata.org/prop/direct/P1053')]
[('pname', 'occupation'), ('p', 'http://www.wikidata.org/prop/direct/P106')]
[('pname', 'employer'), ('p', 'http://www.wikidata.org/prop/direct/P108')]
[('pname', 'from narrative universe'), ('p', 'http://www.wikidata.org/prop/direct/P1080')]
[('pname', 'signature'), ('p', 'http://www.wikidata.org/prop/direct/P109')]
[('pnam

2770

#### Comment
Some properties that may indicate the country a politician works for are:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P27`     | country of citizenship | predicate |
| `wdt:P39`     | position held | predicate |
| `wdt:P937`    | work location | predicate |
| `wdt:P945`    | allegiance    | predicate |

Let's explore the path starting from "position held".

In [59]:
# Let's print some "position held" nodes
queryString = """
SELECT DISTINCT ?sname ?oname ?obj
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     wdt:P39 ?obj .

#?p <http://schema.org/name> ?pname .
?pol <http://schema.org/name> ?sname .
OPTIONAL { ?obj <http://schema.org/name> ?oname . } .

}
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('sname', 'Emperor Taizu of Song'), ('oname', 'Emperor'), ('obj', 'http://www.wikidata.org/entity/Q39018')]
[('sname', 'Emperor Taizu of Song'), ('oname', 'Emperor of China'), ('obj', 'http://www.wikidata.org/entity/Q268218')]
[('sname', 'Patrick Buisson'), ('oname', 'vice president'), ('obj', 'http://www.wikidata.org/entity/Q42178')]
[('sname', 'Patrick Buisson'), ('oname', 'general director'), ('obj', 'http://www.wikidata.org/entity/Q1048210')]
[('sname', 'Mahapadma Nanda'), ('oname', 'Emperor'), ('obj', 'http://www.wikidata.org/entity/Q39018')]
[('sname', 'Emperor Duzong of Song'), ('oname', 'crown prince'), ('obj', 'http://www.wikidata.org/entity/Q207293')]
[('sname', 'Emperor Duzong of Song'), ('oname', 'Emperor of China'), ('obj', 'http://www.wikidata.org/entity/Q268218')]
[('sname', 'Al-Mujahid Shirkuh'), ('oname', 'emir'), ('obj', 'http://www.wikidata.org/entity/Q166382')]
[('sname', 'Bahramshah'), ('oname', 'emir'), ('obj', 'http://www.wikidata.org/entity/Q166382')]
[

50

#### Comment
Let's check whether some of these "position held" nodes have an outgoing "country" (wdt:P17) edge linking them to the country in which the position is held.

In [63]:
# Let's test the idea from the previous comment by extracting Italian politicians
queryString = """
SELECT DISTINCT ?sname ?oname
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     wdt:P39 ?obj .
?obj wdt:P17 wd:Q38 .

#?cou <http://schema.org/name> ?cname .
?pol <http://schema.org/name> ?sname .
?obj <http://schema.org/name> ?oname .

}
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('sname', 'Henry II, Holy Roman Emperor'), ('oname', 'monarch of Italy')]
[('sname', 'Frederick III, Holy Roman Emperor'), ('oname', 'monarch of Italy')]
[('sname', 'James of Baux'), ('oname', 'Prince of Tarente')]
[('sname', 'Francesco De Pasquale'), ('oname', 'mayor of Carrara')]
[('sname', 'Mario Martone'), ('oname', 'president of the Province of Potenza')]
[('sname', 'Salvatore Chiantia'), ('oname', 'mayor of a place in Italy')]
[('sname', 'Giovanni Bernardini'), ('oname', 'mayor of a place in Italy')]
[('sname', 'Rosa Stanisci'), ('oname', 'member of the Italian Senate')]
[('sname', 'Rosa Stanisci'), ('oname', 'member of the Chamber of Deputies of the Italian Republic')]
[('sname', 'Michelangelo Giusta'), ('oname', 'mayor of a place in Italy')]
[('sname', 'Ludovico Sforza'), ('oname', 'duke of Milan')]
[('sname', 'Filippo Maria Visconti'), ('oname', 'duke of Milan')]
[('sname', 'Mario Beccaria'), ('oname', 'member of the Chamber of Deputies of the Italian Republic')]
[('s

50

#### Comment 
So the BGP to retrieve the politicians of each E.U. country is:

```    
?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     wdt:P39 ?pos .
?pos wdt:P17 ?cou .
?cou wdt:P463 wd:Q458 .
```
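Since the later tasks reuse this BGP for single countries (Italy, France) as well as for all E.U. members, it can be kept in one place and assembled into queries. This is a sketch with hypothetical names (`POLITICIAN_POSITION_BGP`, `politicians_query`); the output string is meant for the notebook's `run_query`, which adds the prefixes.

```python
from typing import Optional

# The BGP found above; a plain string, so "{0,2}" is not treated as a format field
POLITICIAN_POSITION_BGP = """?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     wdt:P39 ?pos ."""


def politicians_query(country_qid: Optional[str] = None, limit: int = 25) -> str:
    """Build a SELECT listing politicians and the positions they held.

    With country_qid=None the position's country must be an E.U. member
    (wdt:P463 wd:Q458); otherwise it must be the given country, e.g. "Q38".
    """
    if country_qid is None:
        country_part = "?pos wdt:P17 ?cou .\n?cou wdt:P463 wd:Q458 ."
    else:
        country_part = f"?pos wdt:P17 wd:{country_qid} ."
    return f"""SELECT DISTINCT ?pol ?pos
WHERE {{
{POLITICIAN_POSITION_BGP}
{country_part}
}}
LIMIT {limit}
"""


print(politicians_query("Q38", limit=10))
```

For example, `run_query(politicians_query())` would list politicians of any E.U. country.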

In [67]:
# Let's see if it works in an example query (result list limited to 25) 
queryString = """
SELECT DISTINCT ?country ?sname ?posname
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     wdt:P39 ?pos .
?pos wdt:P17 ?cou .
?cou wdt:P463 wd:Q458 .

?cou <http://schema.org/name> ?country .
?pol <http://schema.org/name> ?sname .
?pos <http://schema.org/name> ?posname .

}
ORDER BY ASC (?country)
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('country', 'Austria'), ('sname', 'Gernot Rumpold'), ('posname', 'member of the Austrian federal council')]
[('country', 'Austria'), ('sname', 'Sebastian Hutstocker'), ('posname', 'mayor of Vienna')]
[('country', 'Austria'), ('sname', 'Josef Georg Hörl'), ('posname', 'mayor of Vienna')]
[('country', 'Austria'), ('sname', 'Gustav Zeller'), ('posname', 'mayor of Salzburg')]
[('country', 'Austria'), ('sname', 'Karl Bregartner'), ('posname', 'member of the National Council of Austria')]
[('country', 'Austria'), ('sname', 'Gerald Hauser'), ('posname', 'member of the National Council of Austria')]
[('country', 'Austria'), ('sname', 'Alois Spängler'), ('posname', 'mayor of Salzburg')]
[('country', 'Austria'), ('sname', 'Johann Singer'), ('posname', 'member of the National Council of Austria')]
[('country', 'Austria'), ('sname', 'Sigmund Haffner der Ältere'), ('posname', 'mayor of Salzburg')]
[('country', 'Austria'), ('sname', 'Thomas Drozda'), ('posname', 'member of the National Coun

25

#### Comment 
The BGP retrieves the information requested by the task.

## Task 2
2. Identify the BGP for obtaining other occupations and properties of politicians

In [76]:
# For every politician, print his/her other occupations and properties
queryString = """
SELECT DISTINCT ?politician ?property ?oname
WHERE { 

{
?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     ?p ?obj .

FILTER(?obj != wd:Q82955 && ?p = wdt:P106) .

}UNION{
?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     ?p ?obj .
     
FILTER(?obj != wd:Q82955) .
}

?p <http://schema.org/name> ?property .
?pol <http://schema.org/name> ?politician .
OPTIONAL { ?obj <http://schema.org/name> ?oname . } .

}
#GROUP BY ?politician ?property
ORDER BY ASC (?politician)
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('politician', '"Big" Donnie MacLeod'), ('property', 'member of political party'), ('oname', 'Progressive Conservative Association of Nova Scotia')]
[('politician', '"Big" Donnie MacLeod'), ('property', 'languages spoken, written or signed'), ('oname', 'English')]
[('politician', '"Big" Donnie MacLeod'), ('property', 'place of birth'), ('oname', 'Nova Scotia')]
[('politician', '"Big" Donnie MacLeod'), ('property', 'sex or gender'), ('oname', 'male')]
[('politician', '"Big" Donnie MacLeod'), ('property', 'country of citizenship'), ('oname', 'Canada')]
[('politician', '"Big" Donnie MacLeod'), ('property', 'instance of'), ('oname', 'human')]
[('politician', '"Big" Donnie MacLeod'), ('property', 'family name'), ('oname', 'MacLeod')]
[('politician', '"Big" Donnie MacLeod'), ('property', 'birth name')]
[('politician', '"Big" Donnie MacLeod'), ('property', 'date of birth')]
[('politician', '"Big" Donnie MacLeod'), ('property', 'date of death')]
[('politician', '"Big" Donnie MacLeod')

25

#### Comment
The BGP to find other occupations and properties of a politician is:

```
{
?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     ?p ?obj .

FILTER(?obj != wd:Q82955 && ?p = wdt:P106) .

}UNION{
?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     ?p ?obj .
     
FILTER(?obj != wd:Q82955) .
}
```
I use the UNION only to point out the two parts of the question (other occupations and other properties) in the solution; the same result can be obtained without this construct, because with `SELECT DISTINCT` the second branch already contains every result of the first one.

As the results show, the query retrieves the other properties of each politician and the occupations other than "politician".
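Why the UNION is redundant can be checked on a handful of toy triples: the first branch's filter is strictly more restrictive than the second's, so its results are a subset. Only `wdt:P106` and `wd:Q82955` are real IRIs here; the other predicate and object strings are illustrative stand-ins.

```python
# Toy (subject, predicate, object) triples for one politician
triples = {
    ("'Abd al-Ilah", "wdt:P106", "wd:Q82955"),           # occupation: politician
    ("'Abd al-Ilah", "wdt:P106", "military personnel"),  # another occupation
    ("'Abd al-Ilah", "religion", "Islam"),               # some other property
}

# First UNION branch: FILTER(?obj != wd:Q82955 && ?p = wdt:P106)
branch1 = {t for t in triples if t[2] != "wd:Q82955" and t[1] == "wdt:P106"}
# Second UNION branch: FILTER(?obj != wd:Q82955)
branch2 = {t for t in triples if t[2] != "wd:Q82955"}

# branch1 is a subset of branch2, so their (distinct) union is just branch2
print(branch1 <= branch2)             # True
print((branch1 | branch2) == branch2)  # True
```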

EXAMPLE:

[('politician', "'Abd al-Ilah"), ('property', 'occupation'), ('oname', 'military personnel')]  
[('politician', "'Abd al-Ilah"), ('property', 'place of burial'), ('oname', 'Al Khaisaran cemetery')]  
[('politician', "'Abd al-Ilah"), ('property', 'manner of death'), ('oname', 'homicide')]  
[('politician', "'Abd al-Ilah"), ('property', 'religion'), ('oname', 'Islam')]  

## Task 3
3. How many politicians are recorder for each E.U. country?

#### Comment
I am not sure what "how many politicians ARE RECORDER..." means; I presume it is a typo for "how many politicians ARE RECORDED...".  
For this reason I solve the task according to this second reading.

In [64]:
# For each E.U. country, count the politicians in the DB; the count includes both living and dead politicians
queryString = """
SELECT DISTINCT ?country (COUNT(DISTINCT ?pol) AS ?numpol)
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     wdt:P39 ?pos .
?pos wdt:P17 ?cou .
?cou wdt:P463 wd:Q458 .

?cou <http://schema.org/name> ?country .

}
GROUP BY ?country
ORDER BY ASC (?country)
"""

print("Results")
run_query(queryString)

Results
[('country', 'Austria'), ('numpol', '4693')]
[('country', 'Belgium'), ('numpol', '5661')]
[('country', 'Bulgaria'), ('numpol', '549')]
[('country', 'Croatia'), ('numpol', '346')]
[('country', 'Cyprus'), ('numpol', '494')]
[('country', 'Czech Republic'), ('numpol', '1684')]
[('country', 'Denmark'), ('numpol', '329')]
[('country', 'Estonia'), ('numpol', '1050')]
[('country', 'Finland'), ('numpol', '16175')]
[('country', 'France'), ('numpol', '72983')]
[('country', 'Germany'), ('numpol', '16951')]
[('country', 'Greece'), ('numpol', '3814')]
[('country', 'Hungary'), ('numpol', '3137')]
[('country', 'Ireland'), ('numpol', '1981')]
[('country', 'Italy'), ('numpol', '9727')]
[('country', 'Kingdom of the Netherlands'), ('numpol', '3698')]
[('country', 'Latvia'), ('numpol', '740')]
[('country', 'Lithuania'), ('numpol', '787')]
[('country', 'Luxembourg'), ('numpol', '989')]
[('country', 'Malta'), ('numpol', '112')]
[('country', 'Poland'), ('numpol', '2647')]
[('country', 'Portugal'), ('n

28

#### Comment
As you can see, I have counted the number of distinct politicians for each E.U. country.  
Note that the "United Kingdom" is still returned as an E.U. member, although it left the E.U. in 2020.
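To read these counts as data rather than printed rows, a small post-processing step can rank the countries and drop the ones that are no longer E.U. members. This is a sketch: `rank_countries` is a hypothetical helper, the rows are copied from the visible (truncated) part of the output above, and excluding the United Kingdom by its label assumes it appears under that name in the results.

```python
from typing import Iterable, List, Tuple


def rank_countries(rows: Iterable[Tuple[str, str]],
                   exclude: Iterable[str] = ("United Kingdom",)) -> List[Tuple[str, int]]:
    """Sort (country, count) pairs by count, descending, dropping excluded countries."""
    skip = set(exclude)
    parsed = [(country, int(n)) for country, n in rows if country not in skip]
    return sorted(parsed, key=lambda pair: pair[1], reverse=True)


# Rows copied from the visible part of the output above (the printed list is truncated)
visible = [
    ("Austria", "4693"), ("Belgium", "5661"), ("Bulgaria", "549"), ("Croatia", "346"),
    ("Cyprus", "494"), ("Czech Republic", "1684"), ("Denmark", "329"), ("Estonia", "1050"),
    ("Finland", "16175"), ("France", "72983"), ("Germany", "16951"), ("Greece", "3814"),
    ("Hungary", "3137"), ("Ireland", "1981"), ("Italy", "9727"),
    ("Kingdom of the Netherlands", "3698"), ("Latvia", "740"), ("Lithuania", "787"),
    ("Luxembourg", "989"), ("Malta", "112"), ("Poland", "2647"),
]
print(rank_countries(visible)[:3])  # [('France', 72983), ('Germany', 16951), ('Finland', 16175)]
```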

## Task 4
4. Are there politicians with double citizenship?

In [14]:
# For each politician with two or more citizenships, return the number of distinct citizenships
queryString = """
SELECT ?polname (COUNT(DISTINCT ?c) AS ?numcity)
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     wdt:P39 ?pos ;
     ?city ?c .
?pos wdt:P17 ?cou .

FILTER(?city = wdt:P27)

?pol <http://schema.org/name> ?polname .

}
GROUP BY ?polname
HAVING((COUNT(DISTINCT ?c)) > 1)
ORDER BY ASC (?polname)
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('polname', 'A V Swamy'), ('numcity', '2')]
[('polname', 'A. A. Nikonov'), ('numcity', '2')]
[('polname', 'A. C. Shanmughadas'), ('numcity', '3')]
[('polname', 'A. Charles'), ('numcity', '3')]
[('polname', 'A. Chatterjee'), ('numcity', '2')]
[('polname', 'A. Dharam Dass'), ('numcity', '3')]
[('polname', 'A. F. M. Ahsanuddin Chowdhury'), ('numcity', '5')]
[('polname', 'A. G.  Savinykh'), ('numcity', '3')]
[('polname', 'A. G. Kulkarni'), ('numcity', '3')]
[('polname', 'A. G. Subburaman'), ('numcity', '2')]
[('polname', 'A. J. John, Anaparambil'), ('numcity', '3')]
[('polname', 'A. K. A. Abdul Samad'), ('numcity', '3')]
[('polname', 'A. K. Bose'), ('numcity', '2')]
[('polname', 'A. K. Faezul Huq'), ('numcity', '3')]
[('polname', 'A. K. Gopalan'), ('numcity', '3')]
[('polname', 'A. K. Roy'), ('numcity', '2')]
[('polname', 'A. Karunakara Menon'), ('numcity', '3')]
[('polname', 'A. M. Paraman'), ('numcity', '3')]
[('polname', 'A. M. Zahiruddin Khan'), ('numcity', '3')]
[('polname', 

25

In [16]:
# Same result as the previous query, written with a subquery instead of GROUP BY ... HAVING
queryString = """

SELECT DISTINCT ?polname ?num ?p WHERE {

?p wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     wdt:P39 ?pos ;
     wdt:P27 ?c .
?pos wdt:P17 ?cou .

FILTER(?p = ?pol && ?num > 1)
{
    SELECT ?pol (COUNT(DISTINCT ?nat) AS ?num) WHERE { 

        ?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
             wdt:P39 ?pos1 ;
             wdt:P27 ?nat .
        ?pos1 wdt:P17 ?cou1 .
            
    }
    GROUP BY ?pol
}

?p <http://schema.org/name> ?polname .
}
ORDER BY ASC (?polname)
LIMIT 25
"""

print("Results")
run_query(queryString)

Results
[('polname', 'A V Swamy'), ('num', '2'), ('p', 'http://www.wikidata.org/entity/Q18611756')]
[('polname', 'A. A. Nikonov'), ('num', '2'), ('p', 'http://www.wikidata.org/entity/Q4321109')]
[('polname', 'A. C. Shanmughadas'), ('num', '3'), ('p', 'http://www.wikidata.org/entity/Q15993493')]
[('polname', 'A. Charles'), ('num', '3'), ('p', 'http://www.wikidata.org/entity/Q20047087')]
[('polname', 'A. Chatterjee'), ('num', '2'), ('p', 'http://www.wikidata.org/entity/Q106314804')]
[('polname', 'A. Dharam Dass'), ('num', '3'), ('p', 'http://www.wikidata.org/entity/Q61943802')]
[('polname', 'A. F. M. Ahsanuddin Chowdhury'), ('num', '5'), ('p', 'http://www.wikidata.org/entity/Q278218')]
[('polname', 'A. G.  Savinykh'), ('num', '3'), ('p', 'http://www.wikidata.org/entity/Q4404043')]
[('polname', 'A. G. Kulkarni'), ('num', '3'), ('p', 'http://www.wikidata.org/entity/Q16012824')]
[('polname', 'A. G. Subburaman'), ('num', '2'), ('p', 'http://www.wikidata.org/entity/Q4647807')]
[('polname', 'A

25

#### Comment 
Let's check whether the results are correct for the first politician returned (A V Swamy, `wd:Q18611756`)

In [18]:
queryString = """
SELECT ?polname ?citizenship WHERE { 

?p wdt:P27 ?nat .

FILTER (?p = wd:Q18611756) .

?p <http://schema.org/name> ?polname .
?nat <http://schema.org/name> ?citizenship .

}
"""

print("Results")
run_query(queryString)

Results
[('polname', 'A V Swamy'), ('citizenship', 'Dominion of India')]
[('polname', 'A V Swamy'), ('citizenship', 'India')]


2

#### Comment
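The results are correct, so some politicians do have more than one citizenship recorded. Note, however, that in this example the two citizenships (Dominion of India and India) belong to successive states, so a count greater than 1 does not always mean a simultaneous double citizenship.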
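The `GROUP BY ... HAVING(COUNT(DISTINCT ?c) > 1)` logic can be mirrored in plain Python. `COUNT(DISTINCT ...)` matters here because the queries also join `wdt:P39 ?pos`, so the same citizenship can be repeated once per position held. The helper `multi_citizenship` and the second person are hypothetical; the A V Swamy rows come from the check above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def multi_citizenship(pairs: Iterable[Tuple[str, str]], min_count: int = 2) -> Dict[str, int]:
    """Mimic GROUP BY ?pol HAVING(COUNT(DISTINCT ?c) >= min_count) on (politician, citizenship) pairs."""
    by_pol: Dict[str, Set[str]] = defaultdict(set)
    for pol, country in pairs:
        by_pol[pol].add(country)  # a set gives COUNT(DISTINCT ...)
    return {pol: len(cs) for pol, cs in by_pol.items() if len(cs) >= min_count}


pairs = [
    ("A V Swamy", "Dominion of India"),   # from the check above
    ("A V Swamy", "India"),
    ("A V Swamy", "India"),               # duplicate row, as produced by joining several ?pos
    ("Hypothetical Politician", "Italy"),
]
print(multi_citizenship(pairs))  # {'A V Swamy': 2}
```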

## Task 5
5. Analyze the number of politicians in each country by occupation, for instance

### Task 5.1
5.1 What are the top-3 occupations for a politician in Italy and France?

In [74]:
# Top-3 occupations for a politician in Italy
queryString = """
SELECT DISTINCT ?oname (COUNT(DISTINCT ?pol) AS ?num)
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     ?p ?occ ;
     wdt:P39 ?pos .
?pos wdt:P17 ?cou .

FILTER(?occ != wd:Q82955 && ?p = wdt:P106) .
FILTER(?cou = wd:Q38) .

?occ <http://schema.org/name> ?oname .

}
GROUP BY ?oname
ORDER BY DESC (?num)
LIMIT 3
"""

print("Results")
run_query(queryString)

Results
[('oname', 'lawyer'), ('num', '593')]
[('oname', 'journalist'), ('num', '507')]
[('oname', 'writer'), ('num', '189')]


3

In [65]:
# Let's print the URIs of the E.U. countries to find the one for France
queryString = """
SELECT DISTINCT ?country ?cou
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     wdt:P39 ?pos .
?pos wdt:P17 ?cou .
?cou wdt:P463 wd:Q458 .

?cou <http://schema.org/name> ?country .
?pol <http://schema.org/name> ?sname .
?pos <http://schema.org/name> ?posname .

}
ORDER BY ASC (?country)
"""

print("Results")
run_query(queryString)

Results
[('country', 'Austria'), ('cou', 'http://www.wikidata.org/entity/Q40')]
[('country', 'Belgium'), ('cou', 'http://www.wikidata.org/entity/Q31')]
[('country', 'Bulgaria'), ('cou', 'http://www.wikidata.org/entity/Q219')]
[('country', 'Croatia'), ('cou', 'http://www.wikidata.org/entity/Q224')]
[('country', 'Cyprus'), ('cou', 'http://www.wikidata.org/entity/Q229')]
[('country', 'Czech Republic'), ('cou', 'http://www.wikidata.org/entity/Q213')]
[('country', 'Denmark'), ('cou', 'http://www.wikidata.org/entity/Q35')]
[('country', 'Estonia'), ('cou', 'http://www.wikidata.org/entity/Q191')]
[('country', 'Finland'), ('cou', 'http://www.wikidata.org/entity/Q33')]
[('country', 'France'), ('cou', 'http://www.wikidata.org/entity/Q142')]
[('country', 'Germany'), ('cou', 'http://www.wikidata.org/entity/Q183')]
[('country', 'Greece'), ('cou', 'http://www.wikidata.org/entity/Q41')]
[('country', 'Hungary'), ('cou', 'http://www.wikidata.org/entity/Q28')]
[('country', 'Ireland'), ('cou', 'http://www

28

#### Comment
We found:
    
| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wd:Q142`     | France   | node |

In [25]:
# Top-3 occupations for a politician in France
queryString = """
SELECT DISTINCT ?oname (COUNT(DISTINCT ?pol) AS ?num)
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     ?p ?occ ;
     wdt:P39 ?pos .
?pos wdt:P17 ?cou .

FILTER(?occ != wd:Q82955 && ?p = wdt:P106) .
FILTER(?cou = wd:Q142) .

?occ <http://schema.org/name> ?oname .

}
GROUP BY ?oname
ORDER BY DESC (?num)
LIMIT 3
"""

print("Results")
run_query(queryString)

Results
[('oname', 'pensioner'), ('num', '14285')]
[('oname', 'anciens cadres'), ('num', '5433')]
[('oname', 'farm operator'), ('num', '4812')]


3

#### Comment 
Let's check the results

In [50]:
# Let's print the URIs of the top-3 occupations for French politicians
queryString = """
SELECT DISTINCT ?oname (COUNT(DISTINCT ?pol) AS ?num) ?occ
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     ?p ?occ ;
     wdt:P39 ?pos .
?pos wdt:P17 ?cou .

FILTER(?occ != wd:Q82955 && ?p = wdt:P106) .
FILTER(?cou = wd:Q142) .

?occ <http://schema.org/name> ?oname .

}
GROUP BY ?occ ?oname
ORDER BY DESC (?num)
LIMIT 3
"""

print("Results")
run_query(queryString)

Results
[('oname', 'pensioner'), ('num', '14285'), ('occ', 'http://www.wikidata.org/entity/Q1749879')]
[('oname', 'anciens cadres'), ('num', '5433'), ('occ', 'http://www.wikidata.org/entity/Q97768274')]
[('oname', 'farm operator'), ('num', '4812'), ('occ', 'http://www.wikidata.org/entity/Q3062119')]


3

In [72]:
# Let's count the French politicians with occupation "farm operator" (wd:Q3062119)
queryString = """
SELECT COUNT(DISTINCT ?pol) 
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     ?p ?occ ;
     wdt:P39 ?pos .
?pos wdt:P17 ?cou .

FILTER(?p = wdt:P106 && ?occ = wd:Q3062119) .
FILTER(?cou = wd:Q142) .

?occ <http://schema.org/name> ?oname .

}
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '4812')]


1

#### Comment
The check confirms the count for "farm operator" (4812), so we have found the top-3 occupations for Italian and French politicians.
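The aggregation used in these cells (count distinct politicians per occupation, exclude "politician", sort descending, keep the first 3) can be sketched with `collections.Counter`. The rows below are toy data with hypothetical politicians. Note that the SPARQL queries group by the label `?oname`, so two occupations sharing a label would be merged; grouping by `?occ`, as in the cell that prints the URIs, avoids that.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_occupations(rows: Iterable[Tuple[str, str]], k: int = 3,
                    exclude: str = "politician") -> List[Tuple[str, int]]:
    """Rank occupations by number of distinct politicians, like
    GROUP BY ?oname ORDER BY DESC(COUNT(DISTINCT ?pol)) LIMIT k."""
    distinct_pairs = {(pol, occ) for pol, occ in rows if occ != exclude}
    counts = Counter(occ for _, occ in distinct_pairs)
    return counts.most_common(k)


# Toy (politician, occupation) rows; duplicates mimic the joins on ?pos
rows = [
    ("P1", "lawyer"), ("P1", "politician"), ("P2", "lawyer"), ("P2", "lawyer"),
    ("P3", "journalist"), ("P3", "lawyer"), ("P4", "journalist"), ("P5", "writer"),
]
print(top_occupations(rows))  # [('lawyer', 3), ('journalist', 2), ('writer', 1)]
```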

### Task 5.2
5.2 What if you consider only politicians for which we don't have a date of death?

#### Comment 
I assume this task asks to repeat the previous analysis with this additional condition.
So the results below consider only politicians without a recorded date of death, i.e., those still alive and those whose date of death is simply missing from the KG.

From the initial tasks we discovered that

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P570`   | date of death   | predicate |

In [73]:
# Top-3 occupations for a politician in Italy without "date of death" relation
queryString = """
SELECT DISTINCT ?oname (COUNT(DISTINCT ?pol) AS ?num)
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     ?p ?occ ;
     wdt:P39 ?pos .
?pos wdt:P17 ?cou .

FILTER(?occ != wd:Q82955 && ?p = wdt:P106) .
FILTER(?cou = wd:Q38) .
FILTER NOT EXISTS { ?pol wdt:P570 ?dd . } .

?p <http://schema.org/name> ?property .
?occ <http://schema.org/name> ?oname .

}
GROUP BY ?oname
ORDER BY DESC (?num)
LIMIT 3
"""

print("Results")
run_query(queryString)

Results
[('oname', 'journalist'), ('num', '236')]
[('oname', 'lawyer'), ('num', '214')]
[('oname', 'entrepreneur'), ('num', '90')]


3

In [60]:
# Top-3 occupations for a politician in France without "date of death" relation
queryString = """
SELECT DISTINCT ?oname (COUNT(DISTINCT ?pol) AS ?num)
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     ?p ?occ ;
     wdt:P39 ?pos .
?pos wdt:P17 ?cou .

FILTER(?occ != wd:Q82955 && ?p = wdt:P106) .
FILTER(?cou = wd:Q142) .
FILTER NOT EXISTS { ?pol wdt:P570 ?dd . } .

?p <http://schema.org/name> ?property .
?occ <http://schema.org/name> ?oname .

}
GROUP BY ?oname
ORDER BY DESC (?num)
LIMIT 3
"""

print("Results")
run_query(queryString)

Results
[('oname', 'pensioner'), ('num', '14176')]
[('oname', 'anciens cadres'), ('num', '5433')]
[('oname', 'farm operator'), ('num', '4799')]


3

#### Comment
Comparing these results with those of the previous task, roughly half of the Italian politicians have a recorded "date of death", whereas the vast majority of the French ones do not (e.g. 14285 vs 14176 pensioners).  
Maybe this behaviour is due to the difference in the total number of politicians between the two countries. 
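The comparison for the French politicians can be made explicit, because both sets of French counts (with and without the date-of-death condition) appear in the outputs above. The sketch below computes, for each occupation, the fraction of politicians with no recorded date of death. The numbers are copied from those outputs, and `share_without_death` is a hypothetical helper name.

```python
# Counts copied from the outputs above: French politicians per occupation,
# without (all) and with (no_death) the FILTER NOT EXISTS { ?pol wdt:P570 ?dd } condition
france_all = {"pensioner": 14285, "anciens cadres": 5433, "farm operator": 4812}
france_no_death = {"pensioner": 14176, "anciens cadres": 5433, "farm operator": 4799}


def share_without_death(all_counts, no_death_counts):
    """Fraction of politicians per occupation with no recorded date of death."""
    return {occ: no_death_counts[occ] / n for occ, n in all_counts.items()}


for occ, share in share_without_death(france_all, france_no_death).items():
    print(f"{occ}: {share:.1%}")
# pensioner: 99.2%
# anciens cadres: 100.0%
# farm operator: 99.7%
```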

### Task 5.3
5.3 Which politicians had a spouse that was also a politician? How many in each country?

#### Comment
As in the previous tasks, I consider only Italian and French politicians.

We have:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P26`     | spouse   | predicate |

In [16]:
# Let's print the Italian and French politicians who had a spouse that was also a politician
queryString = """
SELECT DISTINCT ?politician ?country ?spouse
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     wdt:P26 ?sp ;
     wdt:P39 ?pos .
?pos wdt:P17 ?cou .
?sp wdt:P106/wdt:P279{0,2} wd:Q82955 ;
    wdt:P39 ?pos1 .
?pos1 wdt:P17 ?cou1 .

FILTER(?cou = wd:Q142 || ?cou = wd:Q38) .

?pol <http://schema.org/name> ?politician .
?sp <http://schema.org/name> ?spouse .
?cou <http://schema.org/name> ?country .

}
"""

print("Results")
run_query(queryString)

Results
[('politician', 'Yvonne de Gaulle'), ('country', 'France'), ('spouse', 'Charles de Gaulle')]
[('politician', 'Cécilia Attias'), ('country', 'France'), ('spouse', 'Nicolas Sarkozy')]
[('politician', 'Bernadette Chirac'), ('country', 'France'), ('spouse', 'Jacques Chirac')]
[('politician', "Anne-Aymone Giscard d'Estaing"), ('country', 'France'), ('spouse', "Valéry Giscard d'Estaing")]
[('politician', 'Boris Vallaud'), ('country', 'France'), ('spouse', 'Najat Vallaud-Belkacem')]
[('politician', 'Aurore Bergé'), ('country', 'France'), ('spouse', 'Nicolas Bays')]
[('politician', 'Girolamo Riario'), ('country', 'Italy'), ('spouse', 'Caterina Sforza')]
[('politician', 'Caterina Sforza'), ('country', 'Italy'), ('spouse', 'Girolamo Riario')]
[('politician', "Marie de' Medici"), ('country', 'France'), ('spouse', 'Henry IV of France')]
[('politician', 'Marie-Claude Bompard'), ('country', 'France'), ('spouse', 'Jacques Bompard')]
[('politician', 'Mara Carfagna'), ('country', 'Italy'), ('sp

160

In [15]:
#Let's count the number of Italian and French politicians who had a spouse that was also a politician 
queryString = """
SELECT ?country (COUNT(DISTINCT ?pol) AS ?num) 
WHERE { 

?pol wdt:P106/wdt:P279{0,2} wd:Q82955 ;
     wdt:P26 ?sp ;
     wdt:P39 ?pos .
?pos wdt:P17 ?cou .
?sp wdt:P106/wdt:P279{0,2} wd:Q82955 ;
    wdt:P39 ?pos1 .
?pos1 wdt:P17 ?cou1 .

FILTER(?cou = wd:Q142 || ?cou = wd:Q38) .

?cou <http://schema.org/name> ?country .

}
GROUP BY ?country

"""

print("Results")
run_query(queryString)

Results
[('country', 'France'), ('num', '124')]
[('country', 'Italy'), ('num', '36')]


2

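#### Comment
The two queries above answer the task: the first lists the Italian and French politicians whose spouse was also a politician, and the second counts them per country. The second query also lets us check the first, since the per-country counts (124 + 36 = 160) add up to the 160 rows returned by the first.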
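The per-country counts can also be derived client-side from the rows of the first query. Below is a minimal sketch, assuming the rows are available as `(politician, country, spouse)` tuples. The sample contains only the first complete rows printed above, so it does not reproduce the full 124/36 counts, and `politicians_per_country` is a hypothetical helper. The first query returns names rather than IRIs, so homonymous politicians would be merged here, whereas `COUNT(DISTINCT ?pol)` counts IRIs.

```python
from collections import defaultdict


def politicians_per_country(rows):
    """Count distinct politician names per country from (politician, country, spouse) rows."""
    seen = defaultdict(set)
    for politician, country, _spouse in rows:
        seen[country].add(politician)
    return {country: len(names) for country, names in seen.items()}


# Sample: the first complete rows printed by the first query above (not the full result)
sample = [
    ("Yvonne de Gaulle", "France", "Charles de Gaulle"),
    ("Cécilia Attias", "France", "Nicolas Sarkozy"),
    ("Bernadette Chirac", "France", "Jacques Chirac"),
    ("Anne-Aymone Giscard d'Estaing", "France", "Valéry Giscard d'Estaing"),
    ("Boris Vallaud", "France", "Najat Vallaud-Belkacem"),
    ("Aurore Bergé", "France", "Nicolas Bays"),
    ("Girolamo Riario", "Italy", "Caterina Sforza"),
    ("Caterina Sforza", "Italy", "Girolamo Riario"),
    ("Marie de' Medici", "France", "Henry IV of France"),
    ("Marie-Claude Bompard", "France", "Jacques Bompard"),
]

print(politicians_per_country(sample))  # {'France': 8, 'Italy': 2}
```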