# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
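
As a minimal sketch of how this BGP can be used, the helper below builds a query that looks up the name of a single property or class. The function name `name_lookup_query` is hypothetical and only serves as an illustration; the resulting string is meant to be passed to the `run_query` function defined in the setup cell.

```python
def name_lookup_query(iri):
    """Build a SPARQL query returning the human-readable name of `iri`.

    `iri` is a prefixed name such as "wdt:P31", or a full IRI in angle brackets.
    (Hypothetical helper, not part of the original notebook.)
    """
    return f"""
SELECT ?name
WHERE {{
    {iri} <http://schema.org/name> ?name .
}}
LIMIT 5
"""


# wdt:P31 is one of the predicates given for the workflow.
query = name_lookup_query("wdt:P31")
print(query)
```

The query only binds `?name`, so an "Empty" result simply means that no name is stored for that IRI.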
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-NOTEBOOK_CODE_HERE-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# SELECT and CONSTRUCT queries
def run_query(queryString):
    """Run a query, print each result row, and return the number of rows."""
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        bindings_list = json_results['results']['bindings']
        if len(bindings_list) == 0:
            print("Empty")
            return 0

        for bindings in bindings_list:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(bindings_list)

    except Exception as e:
        # On failure the error is printed and the function returns None
        print("The operation failed", e)


# ASK queries
def run_ask_query(queryString):
    """Run an ASK query and return the converted JSON response."""
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        return sparql.query().convert()

    except Exception as e:
        # On failure the error is printed and the function returns None
        print("The operation failed", e)
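
`run_query` prints each binding and returns only the row count. When results need to be reused in later cells, it can help to turn the JSON response into plain Python rows. The sketch below assumes the same response layout that the setup code already relies on (`results`, then `bindings`, with each value stored under a `value` key). The helper name `bindings_to_rows` and the sample payload are hypothetical.

```python
def bindings_to_rows(json_results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in json_results["results"]["bindings"]
    ]


# Hand-written sample payload for illustration, not real endpoint output.
sample = {
    "head": {"vars": ["p", "name"]},
    "results": {
        "bindings": [
            {
                "p": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P31"},
                "name": {"type": "literal", "value": "instance of"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```

Keeping the rows as dictionaries makes it straightforward to feed an IRI found by one query into the next query of the workload.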


# GEO Workflow Series ("Place of Birth, Death, and Burial") 

Consider the following exploratory information need:

> You want to visit cities connected to famous writers and poets, and you are deciding whether to visit France or Germany.

## Useful URIs for the current workflow


The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P279`    | subclass      | predicate |
| `wdt:P106`    | occupation    | predicate |
| `wdt:P27`     | citizenship   | predicate |
| `wd:Q183`     | Germany       | node |
| `wd:Q142`     | France        | node |
| `wd:Q90`      | Paris         | node |
| `wd:Q49757`   | Poet          | node |
| `wd:Q36180`   | Writer        | node |
| `wd:Q501`     | Charles Baudelaire  | node      |
| `wd:Q272208`  | Montparnasse Cemetery       | node |


Also consider

```
?p wdt:P27 wd:Q142  . 
?p wdt:P106 wd:Q36180  . 
```

is the BGP to retrieve all **French writers**.
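
The same BGP can be generalized to both countries and both occupations with a SPARQL `VALUES` clause. The following is a sketch under the assumption that the endpoint supports SPARQL 1.1 `VALUES`; the helper name `people_query` and its parameters are hypothetical, while all IRIs come from the table above.

```python
def people_query(countries, occupations, limit=10):
    """Build a query for people with one of the given citizenships and occupations.

    Hypothetical helper: `countries` and `occupations` are lists of prefixed IRIs.
    """
    country_values = " ".join(countries)
    occupation_values = " ".join(occupations)
    return f"""
SELECT DISTINCT ?p ?country ?occupation
WHERE {{
    VALUES ?country {{ {country_values} }}
    VALUES ?occupation {{ {occupation_values} }}
    ?p wdt:P27 ?country .
    ?p wdt:P106 ?occupation .
}}
LIMIT {limit}
"""


# France and Germany, writers and poets (IRIs from the table above).
query = people_query(["wd:Q142", "wd:Q183"], ["wd:Q36180", "wd:Q49757"])
print(query)
```

Selecting `?country` and `?occupation` alongside `?p` keeps it visible which combination matched each person. The `LIMIT` is there only to keep the printed output short while exploring.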

## Workload Goals

1. Identify the BGPs that connect people to their place of birth, death, or burial

2. Identify the BGP to obtain the country in which a place is located

3. How many poets and writers have a place of birth, death, or burial in Germany and France?

4. Analyze cities across the two countries
 
   4.1 Is there any poet whose place of birth and place of burial are located in the same city, either in Germany or in France?
   
   4.2 Which cities host the place of birth of the largest number of poets or writers across the two countries?
   
   4.3 What are the top 3 cities in each country that you could visit? Based on what criteria?
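
One possible way to start on the first goal is to list the properties attached to a known person and keep those whose name mentions a keyword. The sketch below assumes that property IRIs used as predicates carry a `sc:name`, as described in the general instructions; the helper name `property_discovery_query` and the keyword are hypothetical choices, and `wd:Q501` (Charles Baudelaire) comes from the table of given IRIs.

```python
def property_discovery_query(entity, keyword, limit=50):
    """Build a query listing properties of `entity` whose name contains `keyword`.

    Hypothetical helper: the match is case-insensitive on the property name.
    """
    return f"""
SELECT DISTINCT ?p ?name
WHERE {{
    {entity} ?p ?o .
    ?p sc:name ?name .
    FILTER(CONTAINS(LCASE(STR(?name)), "{keyword.lower()}"))
}}
LIMIT {limit}
"""


query = property_discovery_query("wd:Q501", "place")
print(query)
```

Any property found this way still has to be checked against the data before it is used in later queries, since a name match alone does not guarantee the intended meaning.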


In [3]:
# start your workflow here

In [2]:
queryString = """
SELECT COUNT(*)
WHERE { 

?p wdt:P27 wd:Q142  . 
?p wdt:P106 wd:Q36180  . 
} 
"""

print("Predicates")
run_query(queryString)

Predicates
[('callret-0', '14400')]


1