# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that is present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
    
    

In [2]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-10b0246009-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings']) == 0:
            print("Empty")
            return 0

        for bindings in json_results['results']['bindings']:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(json_results['results']['bindings'])

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        return sparql.query().convert()

    except Exception as e:
        print("The operation failed", e)
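
`run_query` prints each binding and returns only the row count, so the results cannot be reused in later Python code. A minimal sketch of a helper that flattens a SPARQL JSON result into a list of dictionaries follows. The name `bindings_to_rows` is an assumption and not part of the original setup; the sample input is written by hand in the standard SPARQL 1.1 JSON layout, with the value taken from the first count output of this notebook.

```python
def bindings_to_rows(json_results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    bindings = json_results.get("results", {}).get("bindings", [])
    return [{var: cell["value"] for var, cell in row.items()} for row in bindings]


# Hand-written sample in the SPARQL 1.1 JSON result layout (structure only).
sample = {
    "head": {"vars": ["callret-0"]},
    "results": {"bindings": [{"callret-0": {"type": "literal", "value": "15556"}}]},
}

rows = bindings_to_rows(sample)
print(rows)
```

Inside `run_query`, the same helper could be applied to `json_results` before printing, if returning the rows instead of their count is preferred.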


# Politics Workflow Series ("Politicians in E.U.") 

Consider the following exploratory information need:

> You are investigating the careers of politicians in the E.U.

## Useful URIs for the current workflow


The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P279`    | subclass      | predicate |
| `wdt:P17`     | country       | predicate |
| `wdt:P27`     | citizenship   | predicate |
| `wdt:P39`     | position held   | predicate |
| `wdt:P106`    | profession   | predicate |
| `wd:Q82955`   | politician | node      |
| `wd:Q46`      | Europe        | node |
| `wd:Q38`      | Italy          | node |



Also consider

```
?p wdt:P106/wdt:P279* wd:Q82955  . 
```

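is the BGP to retrieve all **politicians**: it matches anyone whose occupation (`wdt:P106`) is `wd:Q82955` or any direct or indirect subclass of it (`wdt:P279*`).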
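
To make the role of the `*` in the property path concrete, here is a small sketch that evaluates `wdt:P106/wdt:P279*` and `wdt:P106/wdt:P279` over a toy graph. The `ex:` identifiers and the function names are hypothetical and exist only for this illustration; only `wd:Q82955`, `wdt:P106` and `wdt:P279` come from the table above.

```python
# Toy triples with hypothetical "ex:" identifiers (not real Wikidata data).
TRIPLES = [
    ("ex:alice", "wdt:P106", "wd:Q82955"),  # occupation is politician itself
    ("ex:bob", "wdt:P106", "ex:mayor"),     # occupation is a subclass
    ("ex:mayor", "wdt:P279", "wd:Q82955"),  # mayor is a subclass of politician
    ("ex:carol", "wdt:P106", "ex:farmer"),  # unrelated occupation
]


def objects(subject, predicate):
    """Return all objects o such that (subject, predicate, o) is a triple."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def reachable_star(cls):
    """Classes reachable from cls via zero or more wdt:P279 hops (the `*` operator)."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in objects(stack.pop(), "wdt:P279"):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_politician(person, exactly_one_hop=False):
    """Evaluate wdt:P106/wdt:P279* (default) or wdt:P106/wdt:P279 on the toy graph."""
    for occupation in objects(person, "wdt:P106"):
        if exactly_one_hop:
            targets = set(objects(occupation, "wdt:P279"))
        else:
            targets = reachable_star(occupation)
        if "wd:Q82955" in targets:
            return True
    return False


for person in ("ex:alice", "ex:bob", "ex:carol"):
    print(person, is_politician(person), is_politician(person, exactly_one_hop=True))
```

In this toy graph, dropping the `*` misses `ex:alice`, whose occupation is `wd:Q82955` directly, because the path then requires exactly one subclass hop.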

## Workload Goals


1. Identify the BGP to retrieve E.U. countries and their politicians

2. Identify the BGP for obtaining other occupations and properties of politicians

3. How many politicians are recorded for each E.U. country?

4. Are there politicians with double citizenship?

5. Analyze the number of politicians in each country by occupation, for instance
 
   5.1 What are the top-3 occupations for a politician in Italy and France?
   
   5.2 What if you consider only politicians for which we don't have a date of death?
   
   5.3 Which politicians had a spouse that was also a politician? How many in each country?


In [3]:
# start your workflow here

In [4]:
queryString = """
SELECT COUNT(*)
WHERE { 

?p wdt:P106/wdt:P279 wd:Q82955  . 
} 
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '15556')]


1
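
The column name `callret-0` in the output is generated by the endpoint because the projection has no alias. Standard SPARQL 1.1 writes aggregate projections as `(COUNT(*) AS ?var)`. The sketch below builds such a query around the BGP given in the workload description, which keeps the `*` on `wdt:P279`. The helper name `count_query` and the variable `?n` are assumptions, and the query has not been executed against the endpoint.

```python
def count_query(pattern, var="n"):
    """Wrap a graph pattern in a COUNT query with an explicit alias."""
    return f"SELECT (COUNT(*) AS ?{var})\nWHERE {{\n    {pattern}\n}}"


# Pattern given in the workload description (note the `*` on wdt:P279).
queryString = count_query("?p wdt:P106/wdt:P279* wd:Q82955 .")
print(queryString)
# run_query(queryString)  # run_query is the helper from the setup cell
```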

### 1.Identify the BGP to retrieve E.U. countries and their politicians

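I add the constraint that politicians must be alive (no date of death, `wdt:P570`); otherwise the queries return many politicians with no connection to the E.U. The filter is applied in the queries for goals 2 to 4; the first query below does not include it yet.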

In [15]:
queryString = """
SELECT ?c ?cname ?s ?sname WHERE { 

    ?s wdt:P106/wdt:P279* wd:Q82955  ;
       wdt:P27 ?c .
    ?c wdt:P463 wd:Q458 . # same codes found in notebook edb024393e, countries in E.U.

    # this returns the labels
    ?c <http://schema.org/name> ?cname .
    ?s <http://schema.org/name> ?sname .
    
} 
GROUP BY ?c
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('c', 'http://www.wikidata.org/entity/Q142'), ('cname', 'France'), ('s', 'http://www.wikidata.org/entity/Q65591310'), ('sname', 'Benoît Machinet')]
[('c', 'http://www.wikidata.org/entity/Q142'), ('cname', 'France'), ('s', 'http://www.wikidata.org/entity/Q65591365'), ('sname', 'Francis Magdalou')]
[('c', 'http://www.wikidata.org/entity/Q29'), ('cname', 'Spain'), ('s', 'http://www.wikidata.org/entity/Q86849628'), ('sname', 'Manuel María Lejarreta Allende')]
[('c', 'http://www.wikidata.org/entity/Q29'), ('cname', 'Spain'), ('s', 'http://www.wikidata.org/entity/Q66812732'), ('sname', 'Carme Benages i Cort')]
[('c', 'http://www.wikidata.org/entity/Q38'), ('cname', 'Italy'), ('s', 'http://www.wikidata.org/entity/Q105084911'), ('sname', 'Luigia Cordati Rosaia')]
[('c', 'http://www.wikidata.org/entity/Q142'), ('cname', 'France'), ('s', 'http://www.wikidata.org/entity/Q99173782'), ('sname', 'Pierre-Antoine Mio')]
[('c', 'http://www.wikidata.org/entity/Q142'), ('cname', 'France'), ('s',

10
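
The query above groups by `?c` while projecting variables that are neither grouped nor aggregated, which standard SPARQL 1.1 does not allow even though this endpoint accepted it. A hedged sketch of a more portable variant uses `DISTINCT` instead and makes the liveness filter optional. The helper name `eu_politicians_query` and its parameters are assumptions; the query has not been run here.

```python
def eu_politicians_query(alive_only=False, limit=10):
    """Build a query listing E.U. countries and their politicians."""
    alive_filter = "    FILTER NOT EXISTS { ?s wdt:P570 ?d } .\n" if alive_only else ""
    return (
        "SELECT DISTINCT ?c ?cname ?s ?sname WHERE {\n"
        "    ?s wdt:P106/wdt:P279* wd:Q82955 ;\n"
        "       wdt:P27 ?c .\n"
        "    ?c wdt:P463 wd:Q458 .\n"
        "    ?c <http://schema.org/name> ?cname .\n"
        "    ?s <http://schema.org/name> ?sname .\n"
        f"{alive_filter}"
        "}\n"
        f"LIMIT {limit}"
    )


queryString = eu_politicians_query(alive_only=True)
print(queryString)
# run_query(queryString)  # run_query is the helper from the setup cell
```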

In [16]:
queryString = """
SELECT ?p ?pname ?o ?oname WHERE { 

    wd:Q65591310 ?p ?o .

    # this returns the labels
    ?p <http://schema.org/name> ?pname .
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation'), ('o', 'http://www.wikidata.org/entity/Q3062119')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation'), ('o', 'http://www.wikidata.org/entity/Q82955')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation'), ('o', 'http://www.wikidata.org/entity/Q97768146')]
[('p', 'http://www.wikidata.org/prop/direct/P1412'), ('pname', 'languages spoken, written or signed'), ('o', 'http://www.wikidata.org/entity/Q150')]
[('p', 'http://www.wikidata.org/prop/direct/P21'), ('pname', 'sex or gender'), ('o', 'http://www.wikidata.org/entity/Q6581097')]
[('p', 'http://www.wikidata.org/prop/direct/P27'), ('pname', 'country of citizenship'), ('o', 'http://www.wikidata.org/entity/Q142')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q5')]
[('p', 'http://www.wikidata.org/prop/direct/P39'), ('pname', 'position held'), 

13

In [None]:
Let's look at the properties of an E.U. member state.

In [3]:
queryString = """
SELECT ?p ?pname ?o ?oname WHERE { 

    wd:Q38 ?p ?o .

    # this returns the labels
    ?p <http://schema.org/name> ?pname .
    ?o <http://schema.org/name> ?oname .
} 
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P463'), ('pname', 'member of'), ('o', 'http://www.wikidata.org/entity/Q488981'), ('oname', 'European Bank for Reconstruction and Development')]
[('p', 'http://www.wikidata.org/prop/direct/P2936'), ('pname', 'language used'), ('o', 'http://www.wikidata.org/entity/Q33973'), ('oname', 'Sicilian')]
[('p', 'http://www.wikidata.org/prop/direct/P2184'), ('pname', 'history of topic'), ('o', 'http://www.wikidata.org/entity/Q7791'), ('oname', 'history of Italy')]
[('p', 'http://www.wikidata.org/prop/direct/P163'), ('pname', 'flag'), ('o', 'http://www.wikidata.org/entity/Q42876'), ('oname', 'flag of Italy')]
[('p', 'http://www.wikidata.org/prop/direct/P150'), ('pname', 'contains administrative territorial entity'), ('o', 'http://www.wikidata.org/entity/Q1273'), ('oname', 'Tuscany')]
[('p', 'http://www.wikidata.org/prop/direct/P2936'), ('pname', 'language used'), ('o', 'http://www.wikidata.org/entity/Q33845'), ('oname', 'Neapolitan')]
[('p', 'htt

301

I find that E.U. membership is expressed as `member of` (`wdt:P463`) pointing to the E.U. (`wd:Q458`).
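
This finding can be double-checked with an ASK query sent through `run_ask_query` from the setup cell. The sketch below only shows how to write the query and read the answer: in the SPARQL 1.1 JSON format, an ASK response carries its result under the `boolean` key. The helper name `ask_answer` is an assumption, and the sample dictionary is written by hand to show the shape; it is not an actual endpoint response.

```python
askString = """
ASK { wd:Q38 wdt:P463 wd:Q458 . }
"""


def ask_answer(json_result):
    """Return the truth value of an ASK response, or None if the call failed."""
    if not json_result:
        return None
    return json_result.get("boolean")


# Hand-written response shape, for illustration only.
print(ask_answer({"head": {}, "boolean": True}))
# print(ask_answer(run_ask_query(askString)))  # against the real endpoint
```

The `None` branch covers the case where `run_ask_query` catches an exception and returns nothing.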

### 2. Identify the BGP for obtaining other occupations and properties of politicians

The following BGP retrieves all the properties attached to politicians in the E.U.

In [25]:
queryString = """
SELECT ?s ?p ?pname  WHERE { 

    ?s wdt:P106/wdt:P279* wd:Q82955  ;
        wdt:P27 ?c ; # to retrieve the country
        ?p ?o .
    ?c wdt:P463 wd:Q458 . # country must be in the E.U.

    # this returns the labels
    ?p <http://schema.org/name> ?pname .
    
    FILTER NOT EXISTS {?s wdt:P570 ?d} . # add the constraint that they are alive


} 
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('s', 'http://www.wikidata.org/entity/Q48321355'), ('p', 'http://www.wikidata.org/prop/direct/P102'), ('pname', 'member of political party')]
[('s', 'http://www.wikidata.org/entity/Q48321355'), ('p', 'http://www.wikidata.org/prop/direct/P102'), ('pname', 'member of political party')]
[('s', 'http://www.wikidata.org/entity/Q28964087'), ('p', 'http://www.wikidata.org/prop/direct/P102'), ('pname', 'member of political party')]
[('s', 'http://www.wikidata.org/entity/Q28964087'), ('p', 'http://www.wikidata.org/prop/direct/P102'), ('pname', 'member of political party')]
[('s', 'http://www.wikidata.org/entity/Q28964087'), ('p', 'http://www.wikidata.org/prop/direct/P102'), ('pname', 'member of political party')]
[('s', 'http://www.wikidata.org/entity/Q28964087'), ('p', 'http://www.wikidata.org/prop/direct/P102'), ('pname', 'member of political party')]
[('s', 'http://www.wikidata.org/entity/Q65131729'), ('p', 'http://www.wikidata.org/prop/direct/P102'), ('pname', 'member of political 

10
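
The output above repeats the same (subject, property) pair, most likely because every distinct value of `?o` or `?c` yields its own row. One option is `SELECT DISTINCT` in the query; another is to deduplicate in Python. The sketch below does the latter on the six complete rows copied from the output (the truncated seventh row is left out). The helper name `unique_rows` is an assumption.

```python
def unique_rows(rows):
    """Drop duplicate rows while keeping first-seen order."""
    return list(dict.fromkeys(rows))


P102 = "http://www.wikidata.org/prop/direct/P102"
rows = (
    [("http://www.wikidata.org/entity/Q48321355", P102, "member of political party")] * 2
    + [("http://www.wikidata.org/entity/Q28964087", P102, "member of political party")] * 4
)

for row in unique_rows(rows):
    print(row)
```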

As an example, I choose Francis Magdalou (`wd:Q65591365`), found in the results of goal 1.

In [26]:
queryString = """
SELECT ?p ?pname WHERE { 

    wd:Q65591365 ?p ?o .

    # this returns the labels
    ?p <http://schema.org/name> ?pname .
} 
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P1412'), ('pname', 'languages spoken, written or signed')]
[('p', 'http://www.wikidata.org/prop/direct/P21'), ('pname', 'sex or gender')]
[('p', 'http://www.wikidata.org/prop/direct/P27'), ('pname', 'country of citizenship')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pname', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P39'), ('pname', 'position held')]
[('p', 'http://www.wikidata.org/prop/direct/P39'), ('pname', 'position held')]
[('p', 'http://www.wikidata.org/prop/direct/P39'), ('pname', 'position held')]


10

### 3. How many politicians are recorded for each E.U. country?

In [30]:
queryString = """
SELECT (COUNT (?s) as ?numberPoliticians) ?country  WHERE { 

    ?s wdt:P106/wdt:P279* wd:Q82955  ;
       wdt:P27 ?c . # to retrieve the country
    ?c wdt:P463 wd:Q458 . # country must be in the E.U.

    # this returns the labels
    ?c <http://schema.org/name> ?country .
    
    FILTER NOT EXISTS {?s wdt:P570 ?d} . # add the constraint that they are alive

} 
GROUP BY ?country
ORDER BY desc (?numberPoliticians)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('numberPoliticians', '57987'), ('country', 'France')]
[('numberPoliticians', '22250'), ('country', 'Spain')]
[('numberPoliticians', '11934'), ('country', 'Germany')]
[('numberPoliticians', '7645'), ('country', 'Romania')]
[('numberPoliticians', '6182'), ('country', 'Italy')]
[('numberPoliticians', '5091'), ('country', 'United Kingdom')]
[('numberPoliticians', '4120'), ('country', 'Kingdom of the Netherlands')]
[('numberPoliticians', '3101'), ('country', 'Poland')]
[('numberPoliticians', '2984'), ('country', 'Austria')]
[('numberPoliticians', '2867'), ('country', 'Sweden')]


10
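
`COUNT(?s)` counts solutions, not people: a politician who matches the pattern more than once (for instance through two occupations that both lead to `wd:Q82955`, or through a country with more than one name) may be counted repeatedly. A sketch of a variant with `COUNT(DISTINCT ?s)` follows. It has not been executed against the endpoint, so whether the figures above change is unknown.

```python
queryString = """
SELECT (COUNT(DISTINCT ?s) AS ?numberPoliticians) ?country WHERE {
    ?s wdt:P106/wdt:P279* wd:Q82955 ;
       wdt:P27 ?c .
    ?c wdt:P463 wd:Q458 .
    ?c <http://schema.org/name> ?country .
    FILTER NOT EXISTS { ?s wdt:P570 ?d } .
}
GROUP BY ?country
ORDER BY DESC(?numberPoliticians)
LIMIT 10
"""

print(queryString)
# run_query(queryString)  # run_query is the helper from the setup cell
```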

### 4. Are there politicians with double citizenship?

In [38]:
queryString = """
SELECT ?s ?politician ?country  WHERE { 

    ?s wdt:P106/wdt:P279* wd:Q82955  ;
                wdt:P27 ?c1 ;
                wdt:P27 ?c2 . # a second citizenship
    ?c1 wdt:P463 wd:Q458 . # country must be in the E.U.

    # this returns the labels
    ?s <http://schema.org/name> ?politician .
    ?c1 <http://schema.org/name> ?country1 .
    ?c2 <http://schema.org/name> ?country2 .
    
    FILTER NOT EXISTS {?s wdt:P570 ?d} . # add the constraint that they are alive
    FILTER (?c1 != ?c2)
    
} 
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('s', 'http://www.wikidata.org/entity/Q3434633'), ('politician', 'Robert Bourgi')]
[('s', 'http://www.wikidata.org/entity/Q6514674'), ('politician', 'Lee Odenwalder')]
[('s', 'http://www.wikidata.org/entity/Q5246470'), ('politician', 'Dean Smith')]
[('s', 'http://www.wikidata.org/entity/Q3108729'), ('politician', 'Mike Rann')]
[('s', 'http://www.wikidata.org/entity/Q7442423'), ('politician', 'Sebastian Gorka')]
[('s', 'http://www.wikidata.org/entity/Q63987238'), ('politician', 'Josh Burns')]
[('s', 'http://www.wikidata.org/entity/Q3108729'), ('politician', 'Mike Rann')]
[('s', 'http://www.wikidata.org/entity/Q7442423'), ('politician', 'Sebastian Gorka')]
[('s', 'http://www.wikidata.org/entity/Q5271209'), ('politician', 'Diana Laidlaw')]
[('s', 'http://www.wikidata.org/entity/Q22230789'), ('politician', 'Liz Mair')]


10
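
The query above projects `?country`, which is never bound (the pattern binds `?country1` and `?country2`), so the output has no country column, and some politicians appear twice. A sketch of a corrected projection with `DISTINCT` follows. It has not been run against the endpoint.

```python
queryString = """
SELECT DISTINCT ?s ?politician ?country1 ?country2 WHERE {
    ?s wdt:P106/wdt:P279* wd:Q82955 ;
       wdt:P27 ?c1 ;
       wdt:P27 ?c2 .
    ?c1 wdt:P463 wd:Q458 .   # at least one citizenship is an E.U. member
    ?s  <http://schema.org/name> ?politician .
    ?c1 <http://schema.org/name> ?country1 .
    ?c2 <http://schema.org/name> ?country2 .
    FILTER NOT EXISTS { ?s wdt:P570 ?d } .
    FILTER (?c1 != ?c2)
}
LIMIT 10
"""

print(queryString)
# run_query(queryString)  # run_query is the helper from the setup cell
```

Since the goal is a yes/no question, the same pattern could also be wrapped in `ASK { ... }` and sent through `run_ask_query`.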

#### 5.1 What are the top-3 occupations for a politician in Italy and France?

In [5]:
queryString = """
SELECT ( COUNT (?sname) as ?number) ?oname {
    
    ?s  wdt:P106/wdt:P279* wd:Q82955  ;
        wdt:P106 ?o ;
        wdt:P27 ?c .
    
    ?s <http://schema.org/name> ?sname .
    ?o <http://schema.org/name> ?oname .

    FILTER ((?c = wd:Q38) || (?c = wd:Q142)) .
}
GROUP BY ?oname
ORDER BY desc (?number)
LIMIT 4

"""

print("Results")
run_query(queryString)

Results
[('number', '91577'), ('oname', 'politician')]
[('number', '14607'), ('oname', 'pensioner')]
[('number', '5384'), ('oname', 'anciens cadres')]
[('number', '4854'), ('oname', 'farm operator')]


4
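
The query above merges Italy and France into a single ranking, and `politician` itself takes first place, which is presumably why it uses `LIMIT 4`. If the query instead returned one row per (country, occupation, count), the top 3 per country could be picked in Python. The sketch below assumes that row shape; the function name and the sample rows are hypothetical placeholders, not results from the endpoint.

```python
from collections import defaultdict


def top_k_per_country(rows, k=3, exclude=("politician",)):
    """Return {country: [(occupation, count), ...]} with the k largest counts."""
    grouped = defaultdict(list)
    for country, occupation, count in rows:
        if occupation not in exclude:
            grouped[country].append((occupation, int(count)))
    return {
        country: sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]
        for country, pairs in grouped.items()
    }


# Placeholder rows (made-up counts) in the assumed (country, occupation, count) shape.
rows = [
    ("country A", "politician", "50"),
    ("country A", "occupation x", "7"),
    ("country A", "occupation y", "9"),
    ("country A", "occupation z", "3"),
    ("country A", "occupation w", "1"),
    ("country B", "occupation x", "4"),
]

print(top_k_per_country(rows))
```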

#### 5.2 What if you consider only politicians for which we don't have a date of death?

In [6]:
queryString = """
SELECT ( COUNT (?sname) as ?number) ?oname {
    
    ?s  wdt:P106/wdt:P279* wd:Q82955  ;
        wdt:P106 ?o ;
        wdt:P27 ?c .
    
    ?s <http://schema.org/name> ?sname .
    ?o <http://schema.org/name> ?oname .

    FILTER ((?c = wd:Q38) || (?c = wd:Q142)) .
    FILTER NOT EXISTS {?s wdt:P570 ?d} .
}
GROUP BY ?oname
ORDER BY desc (?number)
LIMIT 4

"""

print("Results")
run_query(queryString)

Results
[('number', '63074'), ('oname', 'politician')]
[('number', '14496'), ('oname', 'pensioner')]
[('number', '5384'), ('oname', 'anciens cadres')]
[('number', '4840'), ('oname', 'farm operator')]


4
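
The ranking is unchanged, but the counts drop by different amounts. Subtracting the 5.2 counts from the 5.1 counts gives the number of counted rows that have a date of death. The sketch below computes this from the numbers printed in the two outputs; the variable names are assumptions.

```python
# Counts copied from the outputs of 5.1 (all) and 5.2 (no date of death).
all_politicians = {
    "politician": 91577,
    "pensioner": 14607,
    "anciens cadres": 5384,
    "farm operator": 4854,
}
no_death_date = {
    "politician": 63074,
    "pensioner": 14496,
    "anciens cadres": 5384,
    "farm operator": 4840,
}

# Rows removed by the FILTER NOT EXISTS on wdt:P570, per occupation.
removed = {occ: total - no_death_date[occ] for occ, total in all_politicians.items()}

for occupation, count in removed.items():
    print(f"{occupation}: {count} rows with a date of death")
```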

#### 5.3 Which politicians had a spouse that was also a politician? How many in each country?

Find P26 (the spouse property).

In [None]:
Let's search for the property that expresses marriage.

In [50]:
queryString = """
SELECT DISTINCT ?p ?pname {
    
    ?s  wdt:P31 wd:Q5 ;
        ?p ?sp .
    ?sp wdt:P31 wd:Q5 .
    
    ?s <http://schema.org/name> ?sname .
    ?p <http://schema.org/name> ?pname .
}
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1038'), ('pname', 'relative')]
[('p', 'http://www.wikidata.org/prop/direct/P1066'), ('pname', 'student of')]
[('p', 'http://www.wikidata.org/prop/direct/P1290'), ('pname', 'godparent')]
[('p', 'http://www.wikidata.org/prop/direct/P3448'), ('pname', 'stepparent')]
[('p', 'http://www.wikidata.org/prop/direct/P185'), ('pname', 'doctoral student')]
[('p', 'http://www.wikidata.org/prop/direct/P3373'), ('pname', 'sibling')]
[('p', 'http://www.wikidata.org/prop/direct/P40'), ('pname', 'child')]
[('p', 'http://www.wikidata.org/prop/direct/P22'), ('pname', 'father')]
[('p', 'http://www.wikidata.org/prop/direct/P26'), ('pname', 'spouse')]
[('p', 'http://www.wikidata.org/prop/direct/P1889'), ('pname', 'different from')]


10

I found `spouse`: `wdt:P26`.

In [60]:
queryString = """
SELECT (COUNT (?s) AS ?numberPolitician) ?cname {
    
    ?s  wdt:P106 wd:Q82955  ;
        wdt:P26 ?sp ;
        wdt:P27 ?c .
        
    ?sp  wdt:P106 wd:Q82955 .
    
    ?c wdt:P463 wd:Q458 .   # country must be in the E.U.
    
    ?c <http://schema.org/name> ?cname .
}
GROUP BY ?cname
ORDER BY desc(?numberPolitician)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('numberPolitician', '311'), ('cname', 'United Kingdom')]
[('numberPolitician', '294'), ('cname', 'France')]
[('numberPolitician', '221'), ('cname', 'Germany')]
[('numberPolitician', '163'), ('cname', 'Spain')]
[('numberPolitician', '95'), ('cname', 'Sweden')]
[('numberPolitician', '79'), ('cname', 'Finland')]
[('numberPolitician', '69'), ('cname', 'Italy')]
[('numberPolitician', '51'), ('cname', 'Denmark')]
[('numberPolitician', '44'), ('cname', 'Poland')]
[('numberPolitician', '43'), ('cname', 'Hungary')]


10

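Note: I had to remove `/wdt:P279*` from the property path because the system returned a memory error. As a result, this query only counts people whose occupation is exactly `wd:Q82955`, not a subclass of it.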
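
One possible workaround, offered as an untested assumption, is to split the work: first fetch the occupations below `wd:Q82955` with a small query (`?occ wdt:P279* wd:Q82955`), then inject them through a `VALUES` clause so that the spouse query no longer evaluates the path for every person. The helper `values_clause` is hypothetical, and the single IRI in the example is the one from the table of given URIs. The sketch also switches to `COUNT(DISTINCT ?s)`, which is a change from the query above.

```python
def values_clause(variable, iris):
    """Build a SPARQL VALUES clause binding `variable` to the given prefixed IRIs."""
    if not iris:
        raise ValueError("an empty IRI list would make the query match nothing")
    return f"VALUES ?{variable} {{ {' '.join(iris)} }}"


occupations = ["wd:Q82955"]  # extend with the IRIs returned by the first query

queryString = f"""
SELECT (COUNT(DISTINCT ?s) AS ?numberPolitician) ?cname WHERE {{
    {values_clause('occ', occupations)}
    {values_clause('spOcc', occupations)}
    ?s  wdt:P106 ?occ ;
        wdt:P26 ?sp ;
        wdt:P27 ?c .
    ?sp wdt:P106 ?spOcc .
    ?c  wdt:P463 wd:Q458 .
    ?c  <http://schema.org/name> ?cname .
}}
GROUP BY ?cname
ORDER BY DESC(?numberPolitician)
LIMIT 10
"""

print(queryString)
# run_query(queryString)  # run_query is the helper from the setup cell
```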