# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain knowledge graph (KG): [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained previously.

An exploratory workload usually starts with a vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook
3. You do not delete useless queries. Keep everything that is syntactically valid

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-c9b02411ac-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
       results = sparql.query()
       json_results = results.convert()
       if len(json_results['results']['bindings'])==0:
          print("Empty")
          return 0
    
       for bindings in json_results['results']['bindings']:
          print( [ (var, value['value'])  for var, value in bindings.items() ] )

       return len(json_results['results']['bindings'])

    except Exception as e :
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
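
`run_query` prints each row and returns only the number of rows, so later cells cannot reuse the values. Below is a minimal sketch of a helper that keeps the rows, assuming the standard SPARQL JSON results layout that `run_query` already reads. The names `bindings_to_rows` and `sample` are hypothetical, and the sample is hand-written for illustration, not endpoint output.

```python
def bindings_to_rows(json_results):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    rows = []
    for binding in json_results["results"]["bindings"]:
        # each cell looks like {"type": ..., "value": ...}; keep only the value
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hand-written sample in the SPARQL JSON results layout (not endpoint output).
sample = {
    "head": {"vars": ["p", "pname"]},
    "results": {
        "bindings": [
            {
                "p": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P19"},
                "pname": {"type": "literal", "value": "place of birth"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```

Inside the notebook, the same function could be applied to `sparql.query().convert()` before printing.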


# GEO Workflow Series ("Place of Birth, Death, and Burial") 

Consider the following exploratory information need:

> You want to visit cities connected to famous writers and poets, and you are deciding whether to visit France or Germany

## Useful URIs for the current workflow


The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P279`    | subclass      | predicate |
| `wdt:P106`    | occupation    | predicate |
| `wdt:P27`     | citizenship   | predicate |
| `wd:Q183`     | Germany       | node |
| `wd:Q142`     | France        | node |
| `wd:Q90`      | Paris         | node |
| `wd:Q49757`   | Poet          | node |
| `wd:Q36180`   | Writer        | node |
| `wd:Q501`     | Charles Baudelaire  | node      |
| `wd:Q272208`  | Montparnasse Cemetery       | node |


Also consider

```
?p wdt:P27 wd:Q142  . 
?p wdt:P106 wd:Q36180  . 
```

is the BGP that retrieves all **French writers**.

## Workload Goals

1. Identify the BGPs that connect people to their place of birth, death, or burial

2. Identify the BGP to obtain the country in which a place is located

3. How many poets and writers have a place of birth, death, or burial in Germany and France?

4. Analyze cities across the two countries
 
   4.1 Is there any poet for which the birth place and the place of burial are located in the same city either in Germany or France?
   
   4.2 Which cities host the place of birth of the largest number of poets or writers across the two countries?
   
   4.3 What are the top 3 cities in each country that you could visit? Based on what criteria?


In [2]:
# start your workflow here

In [3]:
queryString = """
SELECT COUNT(*)
WHERE { 

?p wdt:P27 wd:Q142  . 
?p wdt:P106 wd:Q36180  . 
} 
GROUP BY ?cult  ?arch
"""

print("Predicates")
run_query(queryString)

Predicates
[('callret-0', '14400')]


1

## My workflow
### Search for properties of writers that are related to places of burial, death and birth

### 1. Identify the BGPs that connect people to their place of birth, death, or burial

In [4]:
queryString = """
SELECT DISTINCT ?p ?pname WHERE {
    # search for writers
    ?poet wdt:P106 wd:Q36180 ;
            ?p ?o .
    
    # this returns the labels
    ?p <http://schema.org/name> ?pname .
    
    # exclude data properties
    FILTER(!isLiteral(?o))
}
LIMIT 30
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P102'), ('pname', 'member of political party')]
[('p', 'http://www.wikidata.org/prop/direct/P1050'), ('pname', 'medical condition')]
[('p', 'http://www.wikidata.org/prop/direct/P106'), ('pname', 'occupation')]
[('p', 'http://www.wikidata.org/prop/direct/P1066'), ('pname', 'student of')]
[('p', 'http://www.wikidata.org/prop/direct/P108'), ('pname', 'employer')]
[('p', 'http://www.wikidata.org/prop/direct/P109'), ('pname', 'signature')]
[('p', 'http://www.wikidata.org/prop/direct/P119'), ('pname', 'place of burial')]
[('p', 'http://www.wikidata.org/prop/direct/P1196'), ('pname', 'manner of death')]
[('p', 'http://www.wikidata.org/prop/direct/P1303'), ('pname', 'instrument')]
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('pname', 'described by source')]
[('p', 'http://www.wikidata.org/prop/direct/P1344'), ('pname', 'participant in')]
[('p', 'http://www.wikidata.org/prop/direct/P136'), ('pname', 'genre')]
[('p', 'http://www.wikida

30

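From these results I discover the properties P119 (place of burial), P19 (place of birth), and P20 (place of death), which give the BGPs `?person wdt:P19 ?birthplace`, `?person wdt:P20 ?deathplace`, and `?person wdt:P119 ?burialplace`.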

In [7]:
queryString = """
SELECT DISTINCT ?s ?sname ?class ?classname WHERE {
   
    ?s  wdt:P19 ?o ;
        wdt:P31 ?class .
        
    
    # this returns the labels
    ?s <http://schema.org/name> ?sname .
    ?class <http://schema.org/name> ?classname .
    
}
LIMIT 30
"""

print("Results")
run_query(queryString)

Results
[('s', 'http://www.wikidata.org/entity/Q13656462'), ('sname', 'Iris Brooks'), ('class', 'http://www.wikidata.org/entity/Q15632617'), ('classname', 'fictional human')]
[('s', 'http://www.wikidata.org/entity/Q13656760'), ('sname', 'Lisa Dagdelen'), ('class', 'http://www.wikidata.org/entity/Q15632617'), ('classname', 'fictional human')]
[('s', 'http://www.wikidata.org/entity/Q13656907'), ('sname', 'Murat Dagdelen'), ('class', 'http://www.wikidata.org/entity/Q15632617'), ('classname', 'fictional human')]
[('s', 'http://www.wikidata.org/entity/Q13657412'), ('sname', 'Carsten Flöter'), ('class', 'http://www.wikidata.org/entity/Q15632617'), ('classname', 'fictional human')]
[('s', 'http://www.wikidata.org/entity/Q13660282'), ('sname', 'Andrea Neumann'), ('class', 'http://www.wikidata.org/entity/Q15632617'), ('classname', 'fictional human')]
[('s', 'http://www.wikidata.org/entity/Q13882716'), ('sname', 'Ahmet Dagdelen'), ('class', 'http://www.wikidata.org/entity/Q15632617'), ('classnam

30

I decided to explicitly exclude literary and fictional characters, because they often lack information, and to use only instances of Q5 (human).

In [6]:
queryString = """
SELECT DISTINCT ?hname ?bname ?dname ?burial WHERE {
   
   # humans with a place of birth, death, and burial
    ?human  wdt:P19 ?birthplace ;
            wdt:P20 ?deathplace ;
            wdt:P119 ?burialplace ;
            wdt:P31 wd:Q5 .
        
    
    # this returns the labels
    ?human <http://schema.org/name> ?hname .
    ?birthplace <http://schema.org/name> ?bname .
    ?deathplace <http://schema.org/name> ?dname .
    ?burialplace <http://schema.org/name> ?burial .

}
LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('hname', 'Henri Fantin-Latour'), ('bname', 'Grenoble'), ('dname', 'Buré'), ('burial', 'Montparnasse Cemetery')]
[('hname', 'Pablo Gargallo'), ('bname', 'Maella'), ('dname', 'Reus'), ('burial', 'Montjuïc Cemetery')]
[('hname', 'Vicente Enrique y Tarancón'), ('bname', 'Borriana'), ('dname', 'Valencia'), ('burial', 'San Isidro Church')]
[('hname', 'Bohuslav Balbín'), ('bname', 'Hradec Králové'), ('dname', 'Prague'), ('burial', 'Prague')]
[('hname', 'Bohuslav Balbín'), ('bname', 'Hradec Králové'), ('dname', 'Prague'), ('burial', 'St. Salvator Church')]


5

### 2. Identify the BGP to obtain the country in which a place is located

Starting from Montparnasse Cemetery (`wd:Q272208`), which is given in the table, I look for a property that links the cemetery to a country.

In [22]:
queryString = """
SELECT DISTINCT ?p ?property ?o ?oname WHERE {
    # properties of Montparnasse Cemetery
    wd:Q272208 ?p ?o .
    
    # this returns the labels
    ?p <http://schema.org/name> ?property .
    ?o <http://schema.org/name> ?oname .
    
    # exclude data properties
    FILTER(!isLiteral(?o))
}
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1791'), ('property', 'category of people buried here'), ('o', 'http://www.wikidata.org/entity/Q6444336'), ('oname', 'Category:Burials at Montparnasse Cemetery')]
[('p', 'http://www.wikidata.org/prop/direct/P17'), ('property', 'country'), ('o', 'http://www.wikidata.org/entity/Q142'), ('oname', 'France')]
[('p', 'http://www.wikidata.org/prop/direct/P495'), ('property', 'country of origin'), ('o', 'http://www.wikidata.org/entity/Q142'), ('oname', 'France')]
[('p', 'http://www.wikidata.org/prop/direct/P131'), ('property', 'located in the administrative territorial entity'), ('o', 'http://www.wikidata.org/entity/Q187153'), ('oname', '14th arrondissement of Paris')]
[('p', 'http://www.wikidata.org/prop/direct/P3032'), ('property', 'adjacent building'), ('o', 'http://www.wikidata.org/entity/Q764465'), ('oname', 'Montparnasse – Bienvenüe')]
[('p', 'http://www.wikidata.org/prop/direct/P669'), ('property', 'located on street'), ('o', 'http://w

9

P17 (country) links the cemetery to its country, so the BGP is:

`?place wdt:P17 ?country .`
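
The place properties found in step 1 (P19, P20, P119) and the country property P17 can be combined mechanically. The sketch below only builds the BGP text; `PLACE_PROPERTIES` and `person_place_country_bgp` are hypothetical names introduced for illustration, and the variable names are arbitrary.

```python
# Properties discovered in step 1 of the workflow.
PLACE_PROPERTIES = {
    "birth": "wdt:P19",
    "death": "wdt:P20",
    "burial": "wdt:P119",
}


def person_place_country_bgp(kind, person="?person", place="?place", country="?country"):
    """Return the two triple patterns linking a person to a place and its country."""
    prop = PLACE_PROPERTIES[kind]
    return f"{person} {prop} {place} .\n{place} wdt:P17 {country} ."


bgp = person_place_country_bgp("burial", country="wd:Q142")
print(bgp)
```

The returned text can be pasted inside the `WHERE { ... }` block of a query string passed to `run_query`.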
            

### 3. How many poets and writers have a place of birth, death, or burial in Germany and France?

In [8]:
queryString = """
SELECT (COUNT (?literati) AS ?count) WHERE {
   
   # people with a place of birth, death, and burial, all in the same country
    ?s  wdt:P19 ?birthplace ;
        wdt:P20 ?deathplace ;
        wdt:P119 ?burialplace ;  
        wdt:P106 ?occupation .
                
    ?birthplace wdt:P17 ?country .
    ?deathplace wdt:P17 ?country .
    ?burialplace wdt:P17 ?country . 
    
    ?s <http://schema.org/name> ?literati .

    
    # filters restricting the occupation (poet or writer) and the country (Germany or France)
    FILTER ((?occupation = wd:Q49757) || (?occupation = wd:Q36180)) .
    FILTER ((?country = wd:Q183) || (?country = wd:Q142)) .

}
"""

print("Results")
run_query(queryString)

Results
[('count', '2454')]


1
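
The query above requires a place of birth, death, and burial that all lie in the same country, and `COUNT(?literati)` counts one row per match, so a person listed as both poet and writer can be counted more than once. A hedged sketch of the "or" reading of the question follows, using a SPARQL 1.1 property path alternative (`|`) and `COUNT(DISTINCT ?s)`. The function name `count_literati_query` is hypothetical, and the query has not been run against the endpoint, so no count is reported here.

```python
def count_literati_query(countries, occupations):
    """Build a query counting distinct people with ANY of the three places in the countries."""
    country_values = " ".join(countries)
    occupation_values = " ".join(occupations)
    return f"""
SELECT (COUNT(DISTINCT ?s) AS ?count) WHERE {{
    VALUES ?country {{ {country_values} }}
    VALUES ?occupation {{ {occupation_values} }}
    ?s wdt:P106 ?occupation ;
       wdt:P19|wdt:P20|wdt:P119 ?place .
    ?place wdt:P17 ?country .
}}
"""


query = count_literati_query(["wd:Q183", "wd:Q142"], ["wd:Q49757", "wd:Q36180"])
print(query)
```

In the notebook the string could be passed to `run_query(query)`.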

### 4.1 Is there any poet for which the birth place and the place of burial are located in the same city either in Germany or France?


From the query on Montparnasse Cemetery in step 2, I discover that the cemetery is linked to the 14th arrondissement of Paris via P131 (located in the administrative territorial entity).

In [10]:
queryString = """
SELECT DISTINCT ?p ?property ?o ?oname WHERE {
    # properties of the 14th arrondissement of Paris
    wd:Q187153 ?p ?o .
    
    # this returns the labels
    ?p <http://schema.org/name> ?property .
    ?o <http://schema.org/name> ?oname .
    
    # exclude data properties
    FILTER(!isLiteral(?o))
}
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P47'), ('property', 'shares border with'), ('o', 'http://www.wikidata.org/entity/Q175129'), ('oname', '13th arrondissement of Paris')]
[('p', 'http://www.wikidata.org/prop/direct/P1365'), ('property', 'replaces'), ('o', 'http://www.wikidata.org/entity/Q15149909'), ('oname', 'former 12th arrondissement of Paris')]
[('p', 'http://www.wikidata.org/prop/direct/P17'), ('property', 'country'), ('o', 'http://www.wikidata.org/entity/Q142'), ('oname', 'France')]
[('p', 'http://www.wikidata.org/prop/direct/P47'), ('property', 'shares border with'), ('o', 'http://www.wikidata.org/entity/Q238723'), ('oname', '5th arrondissement of Paris')]
[('p', 'http://www.wikidata.org/prop/direct/P47'), ('property', 'shares border with'), ('o', 'http://www.wikidata.org/entity/Q245546'), ('oname', '6th arrondissement of Paris')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('property', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q702842'), ('ona

22

So, following P131 once more from the arrondissement, I reach Paris.

In [11]:
queryString = """
SELECT DISTINCT ?literati ?birth ?cityb WHERE {
   
   # search for poet
    ?s  wdt:P19 ?citybirth ;
        wdt:P119 ?burialplace ;
        wdt:P106 wd:Q49757 .
                
    ?burialplace wdt:P131 ?cityburial .
    
    ?citybirth wdt:P17 ?country .
    ?cityburial wdt:P17 ?country .
    
    # this returns the labels
     ?s <http://schema.org/name> ?literati .
     ?citybirth <http://schema.org/name> ?birth .
     ?cityburial <http://schema.org/name> ?cityb .
     
    FILTER (?citybirth = ?cityburial) .
    FILTER ((?country = wd:Q183) || (?country = wd:Q142)) .
} 
LIMIT 10
"""
print("Results")
run_query(queryString)

Results
[('literati', 'Valery Larbaud'), ('birth', 'Vichy'), ('cityb', 'Vichy')]
[('literati', 'Joseph Victor von Scheffel'), ('birth', 'Karlsruhe'), ('cityb', 'Karlsruhe')]
[('literati', 'Johann Georg Daniel Arnold'), ('birth', 'Strasbourg'), ('cityb', 'Strasbourg')]
[('literati', 'Hanns Heinz Ewers'), ('birth', 'Düsseldorf'), ('cityb', 'Düsseldorf')]
[('literati', 'Jean Lescure'), ('birth', 'Asnières-sur-Seine'), ('cityb', 'Asnières-sur-Seine')]
[('literati', 'Hans Sachs'), ('birth', 'Nuremberg'), ('cityb', 'Nuremberg')]
[('literati', 'Roger Gilbert-Lecomte'), ('birth', 'Reims'), ('cityb', 'Reims')]
[('literati', 'Sophie Hoechstetter'), ('birth', 'Pappenheim'), ('cityb', 'Pappenheim')]
[('literati', 'Édouard Turquety'), ('birth', 'Rennes'), ('cityb', 'Rennes')]
[('literati', 'August Hermann Niemeyer'), ('birth', 'Halle (Saale)'), ('cityb', 'Halle (Saale)')]


10

### 4.2 Which cities host the place of birth of the largest number of poets or writers across the two countries?

In [20]:
queryString = """
SELECT (COUNT (?literari) AS ?numLiterari) ?birthplace ?birth WHERE {
   
    ?s  wdt:P19 ?birthplace ;  
        wdt:P106 ?lavoro .
    
    ?birthplace wdt:P17 ?country .
    
    # this returns the labels
     ?s <http://schema.org/name> ?literari .
     ?birthplace <http://schema.org/name> ?birth .
     
     FILTER ((?lavoro = wd:Q49757) || (?lavoro = wd:Q36180)) .
     FILTER ((?country = wd:Q183) || (?country = wd:Q142)) .
}

GROUP BY ?birthplace ?birth 
ORDER BY desc(?numLiterari)

LIMIT 10


"""
print("Results")
run_query(queryString)

Results
[('numLiterari', '2787'), ('birthplace', 'http://www.wikidata.org/entity/Q90'), ('birth', 'Paris')]
[('numLiterari', '2007'), ('birthplace', 'http://www.wikidata.org/entity/Q64'), ('birth', 'Berlin')]
[('numLiterari', '895'), ('birthplace', 'http://www.wikidata.org/entity/Q1055'), ('birth', 'Hamburg')]
[('numLiterari', '816'), ('birthplace', 'http://www.wikidata.org/entity/Q1726'), ('birth', 'Munich')]
[('numLiterari', '481'), ('birthplace', 'http://www.wikidata.org/entity/Q2079'), ('birth', 'Leipzig')]
[('numLiterari', '461'), ('birthplace', 'http://www.wikidata.org/entity/Q365'), ('birth', 'Cologne')]
[('numLiterari', '446'), ('birthplace', 'http://www.wikidata.org/entity/Q1794'), ('birth', 'Frankfurt am Main')]
[('numLiterari', '395'), ('birthplace', 'http://www.wikidata.org/entity/Q1022'), ('birth', 'Stuttgart')]
[('numLiterari', '388'), ('birthplace', 'http://www.wikidata.org/entity/Q1731'), ('birth', 'Dresden')]
[('numLiterari', '323'), ('birthplace', 'http://www.wikidata

10

Since the counts look rather low, I assume that many birthplaces are recorded as neighborhoods rather than as cities. I therefore use a property path to group every place of birth under an entity that is an instance of city.

In [32]:
queryString = """
SELECT (COUNT (?literati) AS ?numArtists) ?vera WHERE {
       values ?country {wd:Q183 wd:Q142}
    ?s  wdt:P19 ?birthplace ;  
        wdt:P106 ?job .
                          
    ?birthplace wdt:P17 ?country ;
                wdt:P131? ?real .
    
    ?real wdt:P31*/wdt:P279 wd:Q515 . # path to specifically find a city
    
    # this returns the labels
     ?s <http://schema.org/name> ?literati .
     ?birthplace <http://schema.org/name> ?birth .
     ?real <http://schema.org/name> ?vera .
     
     FILTER ((?job = wd:Q49757) || (?job = wd:Q36180)) .
}

GROUP BY ?vera 
ORDER BY desc(?numArtists)

LIMIT 10


"""
print("Results")
run_query(queryString)

Results
[('numArtists', '6063'), ('vera', 'Berlin')]
[('numArtists', '5622'), ('vera', 'Hamburg')]
[('numArtists', '2463'), ('vera', 'Munich')]
[('numArtists', '1800'), ('vera', 'Frankfurt am Main')]
[('numArtists', '978'), ('vera', 'Leipzig')]
[('numArtists', '905'), ('vera', 'Magdeburg')]
[('numArtists', '836'), ('vera', 'Stuttgart')]
[('numArtists', '820'), ('vera', 'Dresden')]
[('numArtists', '796'), ('vera', 'Bonn')]
[('numArtists', '792'), ('vera', 'Essen')]


10
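
The path `wdt:P31*/wdt:P279 wd:Q515` reads as zero or more "instance of" steps followed by exactly one "subclass of" step, so it matches items whose class is a direct subclass of city. A commonly used alternative is `wdt:P31/wdt:P279*`, which takes exactly one "instance of" step and then any number of "subclass of" steps. The toy evaluation below contrasts the two; the graph and all node names are hypothetical and say nothing about how any real city is modelled in Wikidata, and whether this affects the counts above has not been checked against the endpoint.

```python
# Toy graph; node names are hypothetical, not real Wikidata items.
P31 = {  # instance of
    "town_a": {"city"},
    "town_b": {"big_city"},
    "town_c": {"capital"},
}
P279 = {  # subclass of
    "big_city": {"city"},
    "capital": {"seat_of_government"},
    "seat_of_government": {"city"},
}


def step(nodes, edges):
    """Follow exactly one edge from every node."""
    out = set()
    for node in nodes:
        out |= edges.get(node, set())
    return out


def closure(nodes, edges):
    """Follow zero or more edges (the `*` operator)."""
    seen = set(nodes)
    frontier = set(nodes)
    while frontier:
        frontier = step(frontier, edges) - seen
        seen |= frontier
    return seen


def matches_notebook_path(node, target="city"):
    # wdt:P31*/wdt:P279 : zero or more P31 steps, then exactly one P279 step
    return target in step(closure({node}, P31), P279)


def matches_common_idiom(node, target="city"):
    # wdt:P31/wdt:P279* : exactly one P31 step, then zero or more P279 steps
    return target in closure(step({node}, P31), P279)


for town in ("town_a", "town_b", "town_c"):
    print(town, matches_notebook_path(town), matches_common_idiom(town))
```

In this toy graph only `town_b` matches the first path, while all three towns match the second.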

### 4.3 What are the top 3 cities in each country that you could visit? Based on what criteria?

In [22]:
# I search for interesting properties of a city (Berlin, wd:Q64)

queryString = """
SELECT DISTINCT ?p ?properties ?o ?oname WHERE {
    wd:Q64 ?p ?o  .
    
    ?p <http://schema.org/name> ?properties .
    ?o <http://schema.org/name> ?oname
}
LIMIT 50
"""
print("Results")
run_query(queryString)


Results
[('p', 'http://www.wikidata.org/prop/direct/P8744'), ('properties', 'economy of topic'), ('o', 'http://www.wikidata.org/entity/Q320054'), ('oname', 'economy of Berlin')]
[('p', 'http://www.wikidata.org/prop/direct/P6104'), ('properties', 'maintained by WikiProject'), ('o', 'http://www.wikidata.org/entity/Q59078352'), ('oname', 'Wikipedia:WikiProject Berlin')]
[('p', 'http://www.wikidata.org/prop/direct/P1830'), ('properties', 'owner of'), ('o', 'http://www.wikidata.org/entity/Q685725'), ('oname', 'Deutschlandhalle')]
[('p', 'http://www.wikidata.org/prop/direct/P194'), ('properties', 'legislative body'), ('o', 'http://www.wikidata.org/entity/Q640859'), ('oname', 'Abgeordnetenhaus of Berlin')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('properties', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q257391'), ('oname', 'federal capital')]
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('properties', 'award received'), ('o', 'http://www.wikidata.org/entity/Q33234

50

I find P8687 (social media followers) interesting, so I use it as the criterion.

Now I list the cities with the most social media followers across France and Germany.

In [15]:
queryString = """
SELECT DISTINCT ?numFollowers ?city WHERE {
        values ?country {wd:Q183 wd:Q142}
        ?c  wdt:P31*/wdt:P279 wd:Q515 ; #path created to specifically find a city
            wdt:P8687 ?numFollowers  ;
            wdt:P17 ?country .
            
        # this returns the labels
        ?c <http://schema.org/name> ?city .
}

ORDER BY desc (?numFollowers)

"""
print("Results")
run_query(queryString)

Results
[('numFollowers', '260627'), ('city', 'Munich')]
[('numFollowers', '232696'), ('city', 'Bordeaux')]
[('numFollowers', '214132'), ('city', 'Frankfurt am Main')]
[('numFollowers', '201515'), ('city', 'Toulouse')]
[('numFollowers', '159032'), ('city', 'Marseille')]
[('numFollowers', '139002'), ('city', 'Nantes')]
[('numFollowers', '116784'), ('city', 'Berlin')]
[('numFollowers', '105857'), ('city', 'Strasbourg')]
[('numFollowers', '99902'), ('city', 'Nice')]
[('numFollowers', '75580'), ('city', 'Rennes')]
[('numFollowers', '65755'), ('city', 'Düsseldorf')]
[('numFollowers', '55654'), ('city', 'Rouen')]
[('numFollowers', '42931'), ('city', 'Angers')]
[('numFollowers', '42666'), ('city', 'Grenoble')]
[('numFollowers', '37968'), ('city', 'Lille')]
[('numFollowers', '37908'), ('city', 'Montpellier')]
[('numFollowers', '22528'), ('city', 'Hanover')]
[('numFollowers', '20332'), ('city', 'Nancy')]
[('numFollowers', '19812'), ('city', 'Caen')]
[('numFollowers', '17186'), ('city', 'Bonn')]

41

Paris does not appear in the results, so I suspect a linking issue.
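
Goal 4.3 asks for the top 3 cities in each country, while the query returns a single ranking. One way to split it, sketched below, is to also select the country and group the rows in Python. The follower counts are copied from the first rows of the output above; the country column is added by hand for illustration because the query does not return it, and `top_n_per_country` is a hypothetical helper.

```python
from collections import defaultdict

# (country, city, followers): followers copied as strings from the output above,
# country added by hand because the query does not return it.
rows = [
    ("Germany", "Munich", "260627"),
    ("France", "Bordeaux", "232696"),
    ("Germany", "Frankfurt am Main", "214132"),
    ("France", "Toulouse", "201515"),
    ("France", "Marseille", "159032"),
    ("France", "Nantes", "139002"),
    ("Germany", "Berlin", "116784"),
    ("France", "Strasbourg", "105857"),
]


def top_n_per_country(rows, n=3):
    """Group rows by country and keep the n cities with the most followers."""
    by_country = defaultdict(list)
    for country, city, followers in rows:
        # convert to int so that sorting is numeric, not lexicographic
        by_country[country].append((int(followers), city))
    return {
        country: [city for _, city in sorted(cities, reverse=True)[:n]]
        for country, cities in by_country.items()
    }


top3 = top_n_per_country(rows)
print(top3)
```

With these rows the ranking by followers gives Munich, Frankfurt am Main, and Berlin for Germany, and Bordeaux, Toulouse, and Marseille for France.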