# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with an often vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook
3. Do not delete useless queries. Keep everything that is syntactically valid

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-1f86b111ac-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# SELECT and CONSTRUCT queries: print every row of bindings and return the number of rows
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        bindings = json_results['results']['bindings']
        if len(bindings) == 0:
            print("Empty")
            return 0

        for binding in bindings:
            print([(var, value['value']) for var, value in binding.items()])

        return len(bindings)

    except Exception as e:
        # on failure the error is printed and the function returns None
        print("The operation failed", e)


# ASK queries: return the raw JSON result, whose 'boolean' key holds the answer
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        return sparql.query().convert()

    except Exception as e:
        print("The operation failed", e)
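`run_query` mixes the HTTP call with formatting the bindings. As a sketch (the helper name `bindings_to_rows` is hypothetical and not part of SPARQLWrapper), the formatting step can be pulled out into a pure function over the SPARQL 1.1 JSON results layout (`results` → `bindings`, each binding mapping a variable to a dict with a `value` key), which makes it testable without contacting the endpoint:

```python
from typing import Any, Dict, List, Tuple


def bindings_to_rows(json_results: Dict[str, Any]) -> List[List[Tuple[str, str]]]:
    """Turn a SPARQL 1.1 JSON result into rows of (variable, value) pairs.

    Hypothetical helper: it mirrors the printing loop of run_query but returns
    the rows instead of printing them. Unbound variables are simply missing
    from a binding, so rows can have different lengths.
    """
    bindings = json_results.get("results", {}).get("bindings", [])
    return [[(var, cell["value"]) for var, cell in binding.items()]
            for binding in bindings]


# Hand-written input shaped like the endpoint's JSON output (no network needed)
sample = {
    "head": {"vars": ["x", "name"]},
    "results": {"bindings": [
        {"x": {"type": "uri",
               "value": "http://www.wikidata.org/prop/direct/P119"},
         "name": {"type": "literal", "value": "place of burial"}},
    ]},
}
for row in bindings_to_rows(sample):
    print(row)
```

With this split, `run_query` would only need to call the endpoint and print whatever `bindings_to_rows` returns.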


# GEO Workflow Series ("Place of Birth, Death, and Burial") 

Consider the following exploratory information need:

> You want to visit cities connected to famous writers and poets, and you are deciding whether to visit France or Germany.

## Useful URIs for the current workflow


The following are given:

| IRI           | Description            | Role      |
| -----------   | -----------            | --------- |
| `wdt:P1647`   | subproperty of         | predicate |
| `wdt:P31`     | instance of            | predicate |
| `wdt:P279`    | subclass of            | predicate |
| `wdt:P106`    | occupation             | predicate |
| `wdt:P27`     | country of citizenship | predicate |
| `wd:Q183`     | Germany                | node      |
| `wd:Q142`     | France                 | node      |
| `wd:Q90`      | Paris                  | node      |
| `wd:Q49757`   | Poet                   | node      |
| `wd:Q36180`   | Writer                 | node      |
| `wd:Q501`     | Charles Baudelaire     | node      |
| `wd:Q272208`  | Montparnasse Cemetery  | node      |


Also consider

```
?p wdt:P27 wd:Q142  . 
?p wdt:P106 wd:Q36180  . 
```

is the BGP to retrieve all **French writers**

## Workload Goals

1. Identify the BGPs that connect people to their place of birth, death, or burial

2. Identify the BGP to obtain the country in which a place is located

3. How many poets and writers have a place of birth, death, or burial in Germany and France?

4. Analyze cities across the two countries

   4.1 Is there any poet whose place of birth and place of burial are in the same city, in either Germany or France?

   4.2 Which cities are the place of birth of the largest number of poets or writers across the two countries?

   4.3 What are the top 3 cities in each country that you could visit? Based on what criteria?


In [2]:
# start your workflow here

In [3]:
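# count French writers using the BGP given above
# note: SELECT COUNT(*) without "AS ?var" is not standard SPARQL 1.1, which expects (COUNT(*) AS ?n);
# this endpoint accepts it and names the result column callret-0
queryString = """
SELECT COUNT(*)
WHERE { 

?p wdt:P27 wd:Q142  . 
?p wdt:P106 wd:Q36180  . 
} 
"""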

print("Predicates")
run_query(queryString)

Predicates
[('callret-0', '14400')]


1

***

# Task 1

*Identify the BGPs that connect people to their place of birth, death, or burial*

Find what the task requires one property at a time, filtering the property names with a regex.
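The three lookups in this task share one shape: fix the subject, leave the predicate free, and filter the predicate's `schema.org/name` with a regex. A minimal sketch of that pattern as a query builder (the function `property_search_query` is hypothetical; it only builds the string, and assumes the `wd:`/`wdt:` prefixes are prepended separately, as `run_query` does):

```python
import re


def property_search_query(subject: str, keyword: str, obj: str = "?y") -> str:
    """Build a query listing the predicates of `subject` whose human-readable
    name contains `keyword`.

    Hypothetical helper mirroring the Task 1 cells: the predicate ?x is free,
    its name comes from <http://schema.org/name>, and a regex filter keeps
    only the matching names.
    """
    # Accept only plain words so the keyword can be pasted into the SPARQL
    # string literal without any escaping.
    if not re.fullmatch(r"[A-Za-z ]+", keyword):
        raise ValueError(f"unsupported keyword: {keyword!r}")
    return f"""
SELECT DISTINCT ?x ?name
WHERE {{
  {subject} ?x {obj} .
  ?x <http://schema.org/name> ?name .
  FILTER regex(?name, "{keyword}")
}}
"""


# The three Task 1 lookups for Charles Baudelaire (wd:Q501)
burial_query = property_search_query("wd:Q501", "burial", obj="wd:Q272208")
birth_query = property_search_query("wd:Q501", "birth")
death_query = property_search_query("wd:Q501", "death")
print(birth_query)
```

Each string could then be passed to `run_query`; restricting the keyword to letters and spaces is a deliberate simplification to avoid escaping rules for SPARQL string literals.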

In [6]:
# identify the place of burial property
queryString = """
SELECT ?x ?name
WHERE { 

wd:Q501 ?x wd:Q272208.
 ?x <http://schema.org/name> ?name .
 filter regex(?name, "burial")
} 
"""

print("Predicates")
run_query(queryString)

Predicates
[('x', 'http://www.wikidata.org/prop/direct/P119'), ('name', 'place of burial')]


1

In [7]:
# identify the properties whose name contains "birth"
queryString = """
SELECT distinct ?x ?name
WHERE { 

wd:Q501 ?x ?y.
 ?x <http://schema.org/name> ?name .
 filter regex(?name, "birth")
} 
"""

print("Predicates")
run_query(queryString)

Predicates
[('x', 'http://www.wikidata.org/prop/direct/P19'), ('name', 'place of birth')]
[('x', 'http://www.wikidata.org/prop/direct/P569'), ('name', 'date of birth')]


2

In [2]:
# identify the properties whose name contains "death"
queryString = """
SELECT distinct ?x ?name
WHERE { 

wd:Q501 ?x ?y.
 ?x <http://schema.org/name> ?name .
 filter regex(?name, "death")
} 
"""

print("Predicates")
run_query(queryString)

Predicates
[('x', 'http://www.wikidata.org/prop/direct/P1196'), ('name', 'manner of death')]
[('x', 'http://www.wikidata.org/prop/direct/P20'), ('name', 'place of death')]
[('x', 'http://www.wikidata.org/prop/direct/P509'), ('name', 'cause of death')]
[('x', 'http://www.wikidata.org/prop/direct/P570'), ('name', 'date of death')]


4

***

# Task 2

*Identify the BGP to obtain the country in which a place is located*

We use the nodes provided in the introduction.

In [11]:
# ANSWER to 2) identify the property that identifies the country where a place is located
queryString = """
SELECT distinct ?x ?name
WHERE { 
 wd:Q272208 ?x wd:Q142 .
 ?x <http://schema.org/name> ?name .
} 
"""

print("Predicates")
run_query(queryString)

Predicates
[('x', 'http://www.wikidata.org/prop/direct/P17'), ('name', 'country')]
[('x', 'http://www.wikidata.org/prop/direct/P495'), ('name', 'country of origin')]


2

***

# Task 3

*How many poets and writers have a place of birth, death, or burial in Germany and France?*

We use what we gathered from the previous queries.

In [12]:
# ANSWER to 3) How many poets and writers have a place of birth, death, or burial in Germany and France?
queryString = """
SELECT (count(distinct ?x)as ?xx) ?name
WHERE { 
  ?x wdt:P106 ?y.
  ?x wdt:P27 ?country .
  ?x wdt:P19|wdt:P20|wdt:P119 ?z .
   ?country <http://schema.org/name> ?name .
  filter(?country=wd:Q183 || ?country=wd:Q142)
  filter(?y=wd:Q36180 || ?y=wd:Q49757)
} group by ?name
"""

print("Predicates")
run_query(queryString)

Predicates
[('xx', '14014'), ('name', 'France')]
[('xx', '19862'), ('name', 'Germany')]


2
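The query above filters on the people's citizenship (`wdt:P27`) rather than on where their places are, so it counts French and German writers and poets with *any* recorded place of birth, death or burial, even one outside the two countries. A sketch of a variant that instead checks the country of the place itself through `wdt:P17`, the property found in Task 2 (it also holds for Montparnasse Cemetery). It has not been run against the endpoint, `VALUES` replaces the `FILTER` disjunctions only for readability, and places whose country is reachable only through `wdt:P131` would be missed:

```python
def place_count_query(countries, occupations,
                      place_props=("wdt:P19", "wdt:P20", "wdt:P119")):
    """Count people with one of `occupations` whose place of birth, death or
    burial is itself located (wdt:P17) in one of `countries`.

    Sketch only: assumes each relevant place has a direct wdt:P17 statement.
    A person born in France and buried in Germany is counted in both groups.
    """
    country_values = " ".join(countries)
    occupation_values = " ".join(occupations)
    place_path = "|".join(place_props)  # alternative property path, as in the original
    return f"""
SELECT ?name (COUNT(DISTINCT ?person) AS ?n)
WHERE {{
  VALUES ?country {{ {country_values} }}
  VALUES ?occupation {{ {occupation_values} }}
  ?person wdt:P106 ?occupation .
  ?person {place_path} ?place .
  ?place wdt:P17 ?country .
  ?country <http://schema.org/name> ?name .
}}
GROUP BY ?name
"""


# Germany (wd:Q183) and France (wd:Q142); writers (wd:Q36180) and poets (wd:Q49757)
query = place_count_query(["wd:Q183", "wd:Q142"], ["wd:Q36180", "wd:Q49757"])
print(query)
```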

***

# Task 4.1

*Is there any poet whose place of birth and place of burial are in the same city, in either Germany or France?*

First, I want to check whether the place-of-birth property returns a city, at least for Baudelaire.

In [14]:
# Does the place-of-birth property return a city?
queryString = """
SELECT ?city ?name
WHERE { 
  wd:Q501 wdt:P19 ?city .
   ?city <http://schema.org/name> ?name .
} 
"""

print("Predicates")
run_query(queryString)

Predicates
[('city', 'http://www.wikidata.org/entity/Q90'), ('name', 'Paris')]


1

In [2]:
# ANSWER to 4) 
queryString = """
ASK WHERE{
SELECT ?p 
WHERE {
  ?p wdt:P19 ?city .
  ?p wdt:P119 ?city .
  ?p wdt:P106 wd:Q49757.
  ?p wdt:P27 ?country .
  filter(?country=wd:Q183 || ?country=wd:Q142)

  }
}

"""

print("Predicates")
run_ask_query(queryString)

Predicates


{'head': {'link': []}, 'boolean': True}

***

# Task 4.2

*Which cities are the place of birth of the largest number of poets or writers across the two countries?*



In [26]:
# ANSWER to 5) 
queryString = """
SELECT (count(distinct ?x)as ?xx) ?name
WHERE { 
  ?x wdt:P106 ?y.
  ?x wdt:P27 ?country .
  ?x wdt:P19 ?city .
  filter(?country=wd:Q183 || ?country=wd:Q142)
  filter(?y=wd:Q36180 || ?y=wd:Q49757)
 ?city <http://schema.org/name> ?name .
} group by ?name
order by desc (?xx)
limit 10
"""

print("Predicates")
run_query(queryString)

Predicates
[('xx', '2260'), ('name', 'Paris')]
[('xx', '1243'), ('name', 'Berlin')]
[('xx', '594'), ('name', 'Hamburg')]
[('xx', '540'), ('name', 'Munich')]
[('xx', '321'), ('name', 'Cologne')]
[('xx', '318'), ('name', 'Leipzig')]
[('xx', '285'), ('name', 'Frankfurt am Main')]
[('xx', '260'), ('name', 'Stuttgart')]
[('xx', '241'), ('name', 'Lyon')]
[('xx', '237'), ('name', 'Marseille')]


10
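Two details of the query above may affect this ranking: it groups by the city's *label*, so distinct cities sharing a name would be merged, and it again filters on citizenship rather than on where the city is. A sketch of a variant, not run against the endpoint, that groups by the city IRI and requires the birthplace itself to be located in one of the two countries through `wdt:P17` (the function name is hypothetical):

```python
def top_birth_cities_query(countries, occupations, limit=10):
    """Rank birth cities located (wdt:P17) in `countries` by the number of
    distinct people with one of `occupations` born there.

    Sketch only: grouping by the city IRI keeps homonymous cities apart, and
    the country of the city (not the citizenship of the person) decides
    whether it is counted.
    """
    country_values = " ".join(countries)
    occupation_values = " ".join(occupations)
    return f"""
SELECT ?city ?name ?countryName (COUNT(DISTINCT ?person) AS ?n)
WHERE {{
  VALUES ?country {{ {country_values} }}
  VALUES ?occupation {{ {occupation_values} }}
  ?person wdt:P106 ?occupation .
  ?person wdt:P19 ?city .
  ?city wdt:P17 ?country .
  ?city <http://schema.org/name> ?name .
  ?country <http://schema.org/name> ?countryName .
}}
GROUP BY ?city ?name ?countryName
ORDER BY DESC(?n)
LIMIT {int(limit)}
"""


query = top_birth_cities_query(["wd:Q183", "wd:Q142"], ["wd:Q36180", "wd:Q49757"])
print(query)
```

Keeping `?countryName` in the result also shows at a glance which country each city belongs to.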

# Random exploration 

I realized that one of the given nodes is a cemetery. Until now I had assumed that the "place of burial" would be a city, but it probably is not. So I want to check whether the burial place can be connected to its city.

In [3]:
# START EXPLORATION: even though the cemetery is in Paris, it is not directly linked to it
queryString = """
SELECT ?x
WHERE { 
  wd:Q272208 ?x wd:Q90
  
} 
"""

print("Predicates")
run_query(queryString)

Predicates
Empty


0

In [5]:
queryString = """
SELECT distinct ?x ?name
WHERE { 
  wd:Q272208 ?x ?y .
  ?x <http://schema.org/name> ?name .
} 
"""

print("Predicates")
run_query(queryString)

Predicates
[('x', 'http://www.wikidata.org/prop/direct/P131'), ('name', 'located in the administrative territorial entity')]
[('x', 'http://www.wikidata.org/prop/direct/P1329'), ('name', 'phone number')]
[('x', 'http://www.wikidata.org/prop/direct/P1566'), ('name', 'GeoNames ID')]
[('x', 'http://www.wikidata.org/prop/direct/P17'), ('name', 'country')]
[('x', 'http://www.wikidata.org/prop/direct/P1766'), ('name', 'place name sign')]
[('x', 'http://www.wikidata.org/prop/direct/P1791'), ('name', 'category of people buried here')]
[('x', 'http://www.wikidata.org/prop/direct/P18'), ('name', 'image')]
[('x', 'http://www.wikidata.org/prop/direct/P2025'), ('name', 'Find A Grave cemetery ID')]
[('x', 'http://www.wikidata.org/prop/direct/P2046'), ('name', 'area')]
[('x', 'http://www.wikidata.org/prop/direct/P281'), ('name', 'postal code')]
[('x', 'http://www.wikidata.org/prop/direct/P31'), ('name', 'instance of')]
[('x', 'http://www.wikidata.org/prop/direct/P3182'), ('name', 'FANTOIR code')]
[('

27

In [9]:
# trying to match for location or located
queryString = """
SELECT ?name ?x
WHERE { 
  wd:Q272208 ?x ?y .
  ?x <http://schema.org/name> ?name .
  filter regex(?name, "loca")
  
} 
"""
print("Predicates")
run_query(queryString)

Predicates
[('name', 'located in the administrative territorial entity'), ('x', 'http://www.wikidata.org/prop/direct/P131')]
[('name', 'located on street'), ('x', 'http://www.wikidata.org/prop/direct/P669')]


2

In [20]:
# Following P131
queryString = """
SELECT ?name ?p
WHERE { 
  wd:Q272208 wdt:P131 ?p .
   ?p <http://schema.org/name> ?name .
  
} 
"""
print("Predicates")

run_query(queryString)

Predicates
[('name', '14th arrondissement of Paris'), ('p', 'http://www.wikidata.org/entity/Q187153')]


1

In [13]:
# Is the administrative territorial entity related to the city, and through which property?
queryString = """
SELECT ?name ?p
WHERE { 
  wd:Q187153 ?p wd:Q90 .
   ?p <http://schema.org/name> ?name .
} 
"""
print("Predicates")
run_query(queryString)

Predicates
[('name', 'located in the administrative territorial entity'), ('p', 'http://www.wikidata.org/prop/direct/P131')]


1

In [17]:
# Find by what a city is related to its country
queryString = """
SELECT ?x ?name
WHERE { 
  wd:Q90 ?x wd:Q142 .
   ?x <http://schema.org/name> ?name .
} 
"""
print("Predicates")
run_query(queryString)

Predicates
[('x', 'http://www.wikidata.org/prop/direct/P1376'), ('name', 'capital of')]
[('x', 'http://www.wikidata.org/prop/direct/P17'), ('name', 'country')]


2

In [24]:
# List the properties of the cemetery node again
queryString = """
SELECT distinct ?x ?name
WHERE { 
  wd:Q272208 ?x ?z .
   ?x <http://schema.org/name> ?name .
} 
"""
print("Predicates")
run_query(queryString)

Predicates
[('x', 'http://www.wikidata.org/prop/direct/P131'), ('name', 'located in the administrative territorial entity')]
[('x', 'http://www.wikidata.org/prop/direct/P1329'), ('name', 'phone number')]
[('x', 'http://www.wikidata.org/prop/direct/P1566'), ('name', 'GeoNames ID')]
[('x', 'http://www.wikidata.org/prop/direct/P17'), ('name', 'country')]
[('x', 'http://www.wikidata.org/prop/direct/P1766'), ('name', 'place name sign')]
[('x', 'http://www.wikidata.org/prop/direct/P1791'), ('name', 'category of people buried here')]
[('x', 'http://www.wikidata.org/prop/direct/P18'), ('name', 'image')]
[('x', 'http://www.wikidata.org/prop/direct/P2025'), ('name', 'Find A Grave cemetery ID')]
[('x', 'http://www.wikidata.org/prop/direct/P2046'), ('name', 'area')]
[('x', 'http://www.wikidata.org/prop/direct/P281'), ('name', 'postal code')]
[('x', 'http://www.wikidata.org/prop/direct/P31'), ('name', 'instance of')]
[('x', 'http://www.wikidata.org/prop/direct/P3182'), ('name', 'FANTOIR code')]
[('

27

So the cemetery is not directly linked to Paris: it is located in the 14th arrondissement of Paris, which is in turn located in Paris (both links use P131). This is a problem because, in other countries or in smaller cities, the "place of burial" may be:
1) directly linked to a city;
2) indirectly linked to a city, but through a different path.

In [6]:
# Find the class of the cemetery node
queryString = """
SELECT distinct ?x ?name
WHERE { 
  wd:Q272208 wdt:P31|wdt:P279 ?x .
   ?x <http://schema.org/name> ?name .
} 
"""
print("Predicates")
run_query(queryString)

Predicates
[('x', 'http://www.wikidata.org/entity/Q2972684'), ('name', 'Parisian cemetery')]


1

In [5]:
# Check whether it is an instance or subclass of something more general
queryString = """
SELECT distinct ?x ?name
WHERE { 
    wd:Q2972684 wdt:P31|wdt:P279 ?x .
   ?x <http://schema.org/name> ?name .
} 
"""
print("Predicates")
run_query(queryString)

Predicates
[('x', 'http://www.wikidata.org/entity/Q39614'), ('name', 'cemetery')]


1

In [7]:
# Apart from Parisian cemetery, are there other types of cemetery?
queryString = """
SELECT distinct ?x ?name
WHERE { 
    ?x wdt:P31|wdt:P279 wd:Q39614.
   ?x <http://schema.org/name> ?name .
} 
limit 100
"""
print("Predicates")
run_query(queryString)

Predicates
[('x', 'http://www.wikidata.org/entity/Q13637728'), ('name', 'Imprisoned Graves')]
[('x', 'http://www.wikidata.org/entity/Q19879396'), ('name', 'St Kilda Cemetery')]
[('x', 'http://www.wikidata.org/entity/Q15726482'), ('name', 'Alter Friedhof (Karlsruhe)')]
[('x', 'http://www.wikidata.org/entity/Q15730201'), ('name', 'Jüdischer Friedhof in Neumarkt in der Oberpfalz')]
[('x', 'http://www.wikidata.org/entity/Q15916295'), ('name', "St. Raphael's Catholic Cemetery")]
[('x', 'http://www.wikidata.org/entity/Q15918892'), ('name', "Tianyi's Tomb")]
[('x', 'http://www.wikidata.org/entity/Q15929006'), ('name', 'Tomb of Ge Yunfei')]
[('x', 'http://www.wikidata.org/entity/Q23930654'), ('name', 'Friedhof Kessenich')]
[('x', 'http://www.wikidata.org/entity/Q98484105'), ('name', 'Friedhof')]
[('x', 'http://www.wikidata.org/entity/Q107404053'), ('name', 'Right Bank Cemetery, Saint-Dié-des-Vosges')]
[('x', 'http://www.wikidata.org/entity/Q16237853'), ('name', 'Lawnview Memorial Park')]
[('x'

100

Another dead end. Going back to the city of Paris, maybe I can find a path that connects it to a class such as "city" and work from there.

In [6]:
# What is Paris
queryString = """
SELECT distinct ?x ?name
WHERE { 
   wd:Q90 wdt:P31|wdt:P279 ?x.
   ?x <http://schema.org/name> ?name .
} 
limit 100
"""
print("Predicates")
run_query(queryString)

Predicates
[('x', 'http://www.wikidata.org/entity/Q5119'), ('name', 'capital')]
[('x', 'http://www.wikidata.org/entity/Q22923920'), ('name', 'territorial collectivity of France with special status')]


2

In [10]:
# What is a Capital
queryString = """
SELECT distinct ?x ?name
WHERE { 
   wd:Q5119 wdt:P279 ?x.
   ?x <http://schema.org/name> ?name .
} 
limit 100
"""
print("Predicates")
run_query(queryString)

Predicates
[('x', 'http://www.wikidata.org/entity/Q486972'), ('name', 'human settlement')]
[('x', 'http://www.wikidata.org/entity/Q2097994'), ('name', 'municipal corporation')]


2

I can consider "human settlement" as my notion of "city".

In [20]:
# Create a BGP to go from the cemetery to the human settlement
queryString = """
SELECT distinct ?x ?name
WHERE { 
    wd:Q272208 wdt:P131* ?x .
    ?x (wdt:P31*|wdt:P279)/wdt:P279 wd:Q486972 .
   ?x <http://schema.org/name> ?name .
} 

"""
print("Predicates")
run_query(queryString)

Predicates
[('x', 'http://www.wikidata.org/entity/Q90'), ('name', 'Paris')]


1

I knew that some number of P131 hops would bring me to the human settlement, and that from there I have to follow P31 or P279 to reach a class that is a subclass of human settlement.
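This path suggests a way to revisit Task 4.1, whose ASK query only succeeds when the place of burial is recorded as the very same node as the birth city. A sketch, assuming that walking `wdt:P131` upwards from the burial place eventually reaches the birth city (as it does for Montparnasse Cemetery and Paris) and that the birth city has a direct `wdt:P17` statement; the function name is hypothetical and the query has not been run against the endpoint:

```python
def same_city_ask_query(countries, occupation="wd:Q49757"):
    """ASK whether some person with `occupation` (default: poet) was born in a
    city of one of `countries` and buried in that same city.

    Sketch only: the burial place may be the city itself or lie within it
    through any number of wdt:P131 hops (cemetery -> arrondissement -> city).
    """
    country_values = " ".join(countries)
    return f"""
ASK WHERE {{
  VALUES ?country {{ {country_values} }}
  ?person wdt:P106 {occupation} .
  ?person wdt:P19 ?city .
  ?city wdt:P17 ?country .
  ?person wdt:P119 ?burial .
  ?burial wdt:P131* ?city .
}}
"""


query = same_city_ask_query(["wd:Q183", "wd:Q142"])
print(query)
```

The string would be passed to `run_ask_query`. If one also wanted to require that `?city` is a human settlement, the usual Wikidata idiom is `?city wdt:P31/wdt:P279* wd:Q486972`, which is close to, but not the same as, the path used in the cell above.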

Let's go back to the topic
***

# Task 4.3

*What are the top 3 cities in each country that you could visit? Based on what criteria?*

Let's explore a bit and think about the features that could make a place worth a visit.

In [34]:
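# Top 3 German cities by births of writers and poets
# note: count(?x) is not DISTINCT, so someone who is both a writer and a poet is counted twice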
queryString = """
SELECT ?name (count(?x) as ?n)
WHERE { 
  ?x wdt:P106 ?y.
  ?city wdt:P17 wd:Q183.
  ?x wdt:P19 ?city .
  ?city <http://schema.org/name> ?name .
  filter(?y=wd:Q36180 || ?y=wd:Q49757)
  
} group by ?name
order by desc(?n)
limit 3

"""
print("Predicates")
run_query(queryString)

Predicates
[('name', 'Berlin'), ('n', '2015')]
[('name', 'Hamburg'), ('n', '896')]
[('name', 'Munich'), ('n', '819')]


3

Usually more than half of the properties are external identifiers, so I started filtering out property names containing "ID", to have fewer properties to check and to make the output more readable.

In [8]:
# properties of Baudelaire
queryString = """
SELECT distinct ?name ?p
WHERE { 
 wd:Q501 ?p ?x .
  ?p <http://schema.org/name> ?name .
  filter(!regex(?name,"ID"))
}
"""
print("Predicates")
run_query(queryString)

Predicates
[('name', 'field of work'), ('p', 'http://www.wikidata.org/prop/direct/P101')]
[('name', 'native language'), ('p', 'http://www.wikidata.org/prop/direct/P103')]
[('name', 'occupation'), ('p', 'http://www.wikidata.org/prop/direct/P106')]
[('name', 'signature'), ('p', 'http://www.wikidata.org/prop/direct/P109')]
[('name', 'Library of Congress Classification'), ('p', 'http://www.wikidata.org/prop/direct/P1149')]
[('name', 'place of burial'), ('p', 'http://www.wikidata.org/prop/direct/P119')]
[('name', 'manner of death'), ('p', 'http://www.wikidata.org/prop/direct/P1196')]
[('name', 'described by source'), ('p', 'http://www.wikidata.org/prop/direct/P1343')]
[('name', 'genre'), ('p', 'http://www.wikidata.org/prop/direct/P136')]
[('name', 'religion'), ('p', 'http://www.wikidata.org/prop/direct/P140')]
[('name', 'languages spoken, written or signed'), ('p', 'http://www.wikidata.org/prop/direct/P1412')]
[('name', "topic's main template"), ('p', 'http://www.wikidata.org/prop/direct/P1

53

The criterion I chose is the number of notable works produced by people born in the city. It seems reasonable to assume that the place where an author was born has more to offer a visitor interested in that author than the place where he or she died.

This rule makes sense in general, but it is not always correct: here in Padua, for example, we have a clear exception.

In addition, I considered notable works because some authors are more famous than others, and the number of notable works they produced gives a rough ranking.

In [37]:
# ANSWER to 6 (part 1) criterion is "cities by number of notable works produced by people born in them"
# note: ?y (occupation) is not filtered, so people of any occupation are counted
queryString = """
SELECT ?name (count(distinct ?notable) as ?n)
WHERE { 
  ?city wdt:P17 wd:Q183.
  ?p wdt:P19 ?city .
  ?p wdt:P106 ?y .
  ?p wdt:P800 ?notable . 
  ?city <http://schema.org/name> ?name .
} group by ?name
order by desc (?n)
limit 3
"""
print("Predicates")
run_query(queryString)

Predicates
[('name', 'Berlin'), ('n', '373')]
[('name', 'Hamburg'), ('n', '141')]
[('name', 'Munich'), ('n', '135')]


3

In [38]:
# ANSWER to 6 (part 2)
queryString = """
SELECT ?name (count(distinct ?notable) as ?n)
WHERE { 
  ?city wdt:P17 wd:Q142.
  ?p wdt:P19 ?city .
  ?p wdt:P106 ?y .
  ?p wdt:P800 ?notable . 
  ?city <http://schema.org/name> ?name .
} group by ?name
order by desc (?n)
limit 3
"""
print("Predicates")
run_query(queryString)

Predicates
[('name', 'Paris'), ('n', '1752')]
[('name', 'Lyon'), ('n', '208')]
[('name', 'Nantes'), ('n', '126')]


3
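The two ranking queries above leave the occupation `?y` unconstrained, so they count notable works (`wdt:P800`) of everyone born in a city, not only of writers and poets. A sketch of a parameterized version that builds one query per country and, as an assumption about what the criterion should measure, restricts the people to writers and poets. It has not been run, so its ranking may differ from the results shown above; the function name is hypothetical:

```python
WRITER, POET = "wd:Q36180", "wd:Q49757"


def notable_works_ranking_query(country, occupations=(WRITER, POET), limit=3):
    """Rank cities located in `country` by the number of distinct notable works
    (wdt:P800) of people born there whose occupation is in `occupations`.

    Sketch only: unlike the two cells above, the occupation is constrained.
    """
    occupation_values = " ".join(occupations)
    return f"""
SELECT ?name (COUNT(DISTINCT ?notable) AS ?n)
WHERE {{
  VALUES ?occupation {{ {occupation_values} }}
  ?city wdt:P17 {country} .
  ?person wdt:P19 ?city .
  ?person wdt:P106 ?occupation .
  ?person wdt:P800 ?notable .
  ?city <http://schema.org/name> ?name .
}}
GROUP BY ?name
ORDER BY DESC(?n)
LIMIT {int(limit)}
"""


# One query per country, mirroring "part 1" (Germany) and "part 2" (France)
queries = {
    "Germany": notable_works_ranking_query("wd:Q183"),
    "France": notable_works_ranking_query("wd:Q142"),
}
for country, query in queries.items():
    print(country, query)
```

Generating both queries from one function keeps the criterion identical for the two countries, which matters when their top 3 lists are compared.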