# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open-ended question and does not assume that the person issuing it has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that is present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
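
As a minimal sketch, the pattern above can be wrapped in a small helper that builds a name-lookup query for a single IRI. The helper name `name_query` is hypothetical and not part of the notebook; the string it returns would be passed to the notebook's own `run_query` function.

```python
def name_query(iri: str) -> str:
    """Build a SPARQL query returning the human-readable name of `iri`.

    `iri` is a full IRI without angle brackets, e.g.
    "http://www.wikidata.org/prop/direct/P31".
    (Hypothetical helper, not part of the original notebook.)
    """
    return (
        "SELECT ?name\n"
        "WHERE {\n"
        f"    <{iri}> <http://schema.org/name> ?name .\n"
        "}\n"
    )


# Build the lookup for wdt:P31, one of the given predicates.
query = name_query("http://www.wikidata.org/prop/direct/P31")
print(query)
```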
    
    

In [2]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-1b1cee840b-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings']) == 0:
            print("Empty")
            return 0

        for bindings in json_results['results']['bindings']:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(json_results['results']['bindings'])

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
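
`run_query` only prints the bindings and returns their count. If later steps need the values themselves, a flattening helper along the following lines may be useful. This is a sketch: `bindings_to_rows` is a hypothetical name, and the sample dictionary is hand-written in the SPARQL JSON results layout rather than copied from the endpoint.

```python
def bindings_to_rows(json_results: dict) -> list:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in json_results["results"]["bindings"]
    ]


# Hand-written sample in the SPARQL JSON results layout (not endpoint output).
sample = {
    "head": {"vars": ["p", "pname"]},
    "results": {
        "bindings": [
            {
                "p": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P149"},
                "pname": {"type": "literal", "value": "architectural style"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```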


# GEO Workflow Series ("European Cathedrals") 

Consider the following exploratory information need:

> Explore how many cathedrals exist in the U.K., Italy, and France, and what information is available about them

## Useful URIs for the current workflow


The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P279`    | subclass      | predicate |
| `wdt:P106`    | occupation    | predicate |
| `wdt:P17`     | country   | predicate |
| `wd:Q2977`    | Cathedral     | node      |
| `wd:Q29265`   | Canterbury Cathedral   | node |
| `wd:Q38`      | Italy         | node |
| `wd:Q142`     | France        | node |
| `wd:Q145`     | U.K.          | node |
| `wd:Q46261`   | Romanesque architecture | node |


Also consider

```
?p wdt:P17 wd:Q142  . 
?p wdt:P31 wd:Q2977  . 
```

is the BGP to retrieve all **French cathedrals**.
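
The same BGP can be generalized to the three countries of the information need with a `VALUES` clause. The following is a sketch that only builds the query string; the function name `cathedrals_per_country_query` is hypothetical, the country IRIs come from the table above, and the query has not been run against the endpoint here.

```python
# Country IRIs taken from the table of given URIs.
COUNTRIES = {"Italy": "wd:Q38", "France": "wd:Q142", "U.K.": "wd:Q145"}


def cathedrals_per_country_query(country_iris) -> str:
    """Build a query counting cathedrals for each of the given countries."""
    values = " ".join(country_iris)
    return f"""
SELECT ?co (COUNT(DISTINCT ?p) AS ?cathedrals)
WHERE {{
    VALUES ?co {{ {values} }}
    ?p wdt:P17 ?co .
    ?p wdt:P31 wd:Q2977 .
}}
GROUP BY ?co
ORDER BY DESC(?cathedrals)
"""


queryString = cathedrals_per_country_query(COUNTRIES.values())
print(queryString)
```

In the notebook, the resulting `queryString` would be passed to `run_query`, which prepends the `wd:` and `wdt:` prefixes.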

## Workload Goals

1. Identify the BGP to obtain relevant attributes of a cathedral, e.g., the inception or the height

2. Identify the BGP that connects each cathedral to other relevant entities, e.g., architectural style and city

3. Which country hosts the largest number of cathedrals?

4. Analyze cathedrals across architectural styles in each city
 
   4.1 Which styles exist in each country, and how many cathedrals are there for each one?
   
   4.2 Where is the largest cathedral, or the oldest?
   
   4.3 If you had to pick a city or a country to visit some cathedrals, which one would you choose? Based on what criteria?


In [24]:
# start your workflow here

In [3]:
# search all data properties of a cathedral node
queryString = """
SELECT DISTINCT ?p ?pname
WHERE {
    ?c wdt:P31 wd:Q2977 ;
        ?p ?o . FILTER(isLiteral(?o))
    ?p <http://schema.org/name> ?pname .
}
LIMIT 50
   
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1626'), ('pname', 'Thai cultural heritage ID')]
[('p', 'http://www.wikidata.org/prop/direct/P5611'), ('pname', 'BeWeb church ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3449'), ('pname', 'NSW Heritage database ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3941'), ('pname', 'Israel Antiquities Authority ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2526'), ('pname', 'National Historic Sites of Canada ID')]
[('p', 'http://www.wikidata.org/prop/direct/P381'), ('pname', 'PCP reference number')]
[('p', 'http://www.wikidata.org/prop/direct/P6776'), ('pname', 'IMTL.org ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1453'), ('pname', 'catholic.ru ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3371'), ('pname', 'Observatoire du Patrimoine Religieux ID')]
[('p', 'http://www.wikidata.org/prop/direct/P9637'), ('pname', 'Erfgoedkaart ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1004'), ('pname', 'MusicBrainz place I

50

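These are attributes (data properties, i.e. properties whose value is a literal) of the nodes that are instances of 'cathedral'. The query returns exactly the 50 rows allowed by `LIMIT 50`, so the list is probably incomplete.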
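
Most of the names printed above are external identifiers, which makes descriptive attributes hard to spot. One possible way to narrow the list is to drop names ending in " ID" on the Python side. This is only a sketch: `drop_identifier_properties` is a hypothetical helper, the first two sample rows are copied from the output above, and the last one is the `area` property (`wdt:P2046`) that this notebook uses in a later step.

```python
def drop_identifier_properties(rows):
    """Keep (iri, name) pairs whose name does not end in ' ID'."""
    return [(iri, name) for iri, name in rows if not name.endswith(" ID")]


# Small illustrative sample; not the full endpoint output.
sample_rows = [
    ("http://www.wikidata.org/prop/direct/P1626", "Thai cultural heritage ID"),
    ("http://www.wikidata.org/prop/direct/P5611", "BeWeb church ID"),
    ("http://www.wikidata.org/prop/direct/P2046", "area"),
]

descriptive = drop_identifier_properties(sample_rows)
print(descriptive)
```

The name-based filter is a heuristic: identifier properties whose name does not end in " ID" (for example 'PCP reference number') would still pass through.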

In [4]:
# search all properties that connect a cathedral to other nodes
queryString = """
SELECT DISTINCT ?p ?pname
WHERE {
    ?c wdt:P31 wd:Q2977 ;
        ?p ?o . FILTER(!isLiteral(?o))
    ?p <http://schema.org/name> ?pname .
}
LIMIT 200
   
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P8517'), ('pname', 'view')]
[('p', 'http://www.wikidata.org/prop/direct/P101'), ('pname', 'field of work')]
[('p', 'http://www.wikidata.org/prop/direct/P112'), ('pname', 'founded by')]
[('p', 'http://www.wikidata.org/prop/direct/P127'), ('pname', 'owned by')]
[('p', 'http://www.wikidata.org/prop/direct/P131'), ('pname', 'located in the administrative territorial entity')]
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('pname', 'described by source')]
[('p', 'http://www.wikidata.org/prop/direct/P1366'), ('pname', 'replaced by')]
[('p', 'http://www.wikidata.org/prop/direct/P138'), ('pname', 'named after')]
[('p', 'http://www.wikidata.org/prop/direct/P140'), ('pname', 'religion')]
[('p', 'http://www.wikidata.org/prop/direct/P1435'), ('pname', 'heritage designation')]
[('p', 'http://www.wikidata.org/prop/direct/P149'), ('pname', 'architectural style')]
[('p', 'http://www.wikidata.org/prop/direct/P1621'), ('pname', 'detail map')]
[(

107

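These are the object properties, i.e. the properties that connect a node that is an instance of 'cathedral' to other entities. The query returns 107 rows, fewer than the `LIMIT 200`, so the list is complete.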

In [5]:
# list cathedrals together with their country
queryString = """
SELECT DISTINCT ?ch ?coname
WHERE {
    ?ch wdt:P31 wd:Q2977.
    ?ch wdt:P17 ?co.
    ?co <http://schema.org/name> ?coname
}
LIMIT 50
"""
print("Results")
run_query(queryString)

Results
[('ch', 'http://www.wikidata.org/entity/Q38251739'), ('coname', 'Greenland')]
[('ch', 'http://www.wikidata.org/entity/Q4868422'), ('coname', 'Dominican Republic')]
[('ch', 'http://www.wikidata.org/entity/Q20921731'), ('coname', 'Dominican Republic')]
[('ch', 'http://www.wikidata.org/entity/Q4868419'), ('coname', 'Dominican Republic')]
[('ch', 'http://www.wikidata.org/entity/Q5758519'), ('coname', 'Dominican Republic')]
[('ch', 'http://www.wikidata.org/entity/Q16399765'), ('coname', 'Republic of Artsakh')]
[('ch', 'http://www.wikidata.org/entity/Q63385248'), ('coname', 'Gabon')]
[('ch', 'http://www.wikidata.org/entity/Q63400454'), ('coname', 'Gabon')]
[('ch', 'http://www.wikidata.org/entity/Q11466397'), ('coname', 'Gabon')]
[('ch', 'http://www.wikidata.org/entity/Q63401277'), ('coname', 'Gabon')]
[('ch', 'http://www.wikidata.org/entity/Q63400348'), ('coname', 'Gabon')]
[('ch', 'http://www.wikidata.org/entity/Q63385297'), ('coname', 'Gabon')]
[('ch', 'http://www.wikidata.org/enti

50

In [6]:
# group results by country
queryString = """
SELECT DISTINCT (COUNT(?ch) AS ?howManyChatedrals) ?coname
WHERE {
    ?ch wdt:P31 wd:Q2977.
    ?ch wdt:P17 ?co.
    ?co <http://schema.org/name> ?coname
}
GROUP BY ?coname
LIMIT 100
"""
print("Results")
run_query(queryString)

Results
[('howManyChatedrals', '3'), ('coname', 'Mozambique')]
[('howManyChatedrals', '2'), ('coname', 'Uganda')]
[('howManyChatedrals', '8'), ('coname', 'Slovenia')]
[('howManyChatedrals', '10'), ('coname', 'South Korea')]
[('howManyChatedrals', '6'), ('coname', 'Burundi')]
[('howManyChatedrals', '1'), ('coname', 'Uzbekistan')]
[('howManyChatedrals', '2'), ('coname', 'Eritrea')]
[('howManyChatedrals', '1'), ('coname', 'Samoa')]
[('howManyChatedrals', '1'), ('coname', 'Marshall Islands')]
[('howManyChatedrals', '1'), ('coname', 'East Jerusalem')]
[('howManyChatedrals', '3'), ('coname', 'Cameroon')]
[('howManyChatedrals', '6'), ('coname', 'Czech Republic')]
[('howManyChatedrals', '10'), ('coname', 'Cuba')]
[('howManyChatedrals', '1'), ('coname', 'Algeria')]
[('howManyChatedrals', '35'), ('coname', 'Poland')]
[('howManyChatedrals', '24'), ('coname', 'Australia')]
[('howManyChatedrals', '7'), ('coname', 'New Zealand')]
[('howManyChatedrals', '12'), ('coname', 'Uruguay')]
[('howManyChatedr

100

In [7]:
# select the country with the most cathedrals
queryString = """
SELECT DISTINCT (COUNT(?ch) AS ?howManyChatedrals) ?coname
WHERE {
    ?ch wdt:P31 wd:Q2977.
    ?ch wdt:P17 ?co.
    ?co <http://schema.org/name> ?coname
}
GROUP BY ?coname
ORDER BY DESC (COUNT(?ch))
LIMIT 1
"""
print("Results")
run_query(queryString)

Results
[('howManyChatedrals', '293'), ('coname', 'Italy')]


1

Italy is the country with the highest number of cathedrals (293).

In [8]:
# count cathedrals per country and architectural style
queryString = """
SELECT ?country ?style (COUNT(?ch) AS ?howManyChatedrals)
WHERE {
    ?ch wdt:P31 wd:Q2977.
    ?ch wdt:P17 ?co.
    ?ch wdt:P149 ?s.
    ?co <http://schema.org/name> ?country.
    ?s <http://schema.org/name> ?style.
}
GROUP BY ?country ?style
LIMIT 100
"""
print("Results")
run_query(queryString)

Results
[('country', 'Philippines'), ('style', 'Neoclassical architecture'), ('howManyChatedrals', '3')]
[('country', 'Colombia'), ('style', 'Neoclassical architecture'), ('howManyChatedrals', '7')]
[('country', 'Germany'), ('style', 'modern architecture'), ('howManyChatedrals', '1')]
[('country', 'Slovenia'), ('style', 'Gothic architecture'), ('howManyChatedrals', '2')]
[('country', 'France'), ('style', 'Gothic architecture'), ('howManyChatedrals', '21')]
[('country', 'The Gambia'), ('style', 'Gothic Revival'), ('howManyChatedrals', '1')]
[('country', 'Austria'), ('style', 'Gothic Revival'), ('howManyChatedrals', '1')]
[('country', 'Ukraine'), ('style', 'baroque architecture'), ('howManyChatedrals', '3')]
[('country', 'Uruguay'), ('style', 'Armenian architecture'), ('howManyChatedrals', '2')]
[('country', 'Mexico'), ('style', 'rationalism'), ('howManyChatedrals', '1')]
[('country', 'Croatia'), ('style', 'Byzantine architecture'), ('howManyChatedrals', '1')]
[('country', 'Italy'), ('st

100

This gives the number of cathedrals grouped by country and architectural style. The property needed here, `wdt:P149` (architectural style), comes from the earlier query that lists the object properties connected to cathedrals; the property for country, `wdt:P17`, was already given. The query returns exactly the 100 rows allowed by `LIMIT 100`, so the list is probably truncated.
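
To compare styles within a country, the flat (country, style, count) rows can be regrouped into one dictionary per country. The sketch below uses four rows copied from the output above; `styles_by_country` is a hypothetical helper name and is not part of the notebook.

```python
from collections import defaultdict


def styles_by_country(rows):
    """Regroup (country, style, count) rows as {country: {style: count}}."""
    table = defaultdict(dict)
    for country, style, count in rows:
        # Counts arrive as strings in the SPARQL JSON results.
        table[country][style] = int(count)
    return dict(table)


# Rows copied from the printed results of the query above.
sample_rows = [
    ("Philippines", "Neoclassical architecture", "3"),
    ("Colombia", "Neoclassical architecture", "7"),
    ("Slovenia", "Gothic architecture", "2"),
    ("France", "Gothic architecture", "21"),
]

table = styles_by_country(sample_rows)
print(table["France"])
```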

In [9]:
# search the largest cathedral
queryString = """
SELECT ?cname ?area
WHERE {
    ?ch wdt:P31 wd:Q2977.
    ?ch wdt:P2046 ?area.
    ?ch <http://schema.org/name> ?cname.
  
}
ORDER BY DESC (?area)
LIMIT 1
"""
print("Results")
run_query(queryString)

Results
[('cname', 'Roman Catholic Diocese of Yendi'), ('area', '19160')]


1

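The query orders cathedrals by `wdt:P2046` (area) and keeps the top result, so this is the largest entry by area. The property 'area' was found in the earlier query that lists the data properties connected to cathedrals. Note that the returned name refers to a diocese rather than to a building, so the value may describe the territory of the diocese and not the cathedral itself.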

In [10]:
#search the oldest cathedral
queryString = """
SELECT ?cname ?date
WHERE {
    ?ch wdt:P31 wd:Q2977.
    ?ch wdt:P1619 ?date.
    ?ch <http://schema.org/name> ?cname

}
ORDER BY ASC (?date)
LIMIT 1
"""
print("Results")
run_query(queryString)

Results
[('cname', 'Cathedral of Ani'), ('date', '1001-01-01T00:00:00Z')]


1

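This is the oldest cathedral in the database according to the date property `wdt:P1619` used in the query; cathedrals without a value for this property are not considered.

We already found that Italy has the highest number of cathedrals, so if I had to choose a country in which to visit some cathedrals, I would choose Italy.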
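
The date comes back as a date-time string. If only the year matters, it can be extracted on the Python side, as in this sketch. `year_of` is a hypothetical helper; the regular expression assumes the `YYYY-MM-DDThh:mm:ssZ` shape seen in the output above, with an optional leading minus sign for years before the common era.

```python
import re


def year_of(xsd_datetime: str) -> int:
    """Extract the year from a string such as '1001-01-01T00:00:00Z'."""
    match = re.match(r"^(-?\d+)-\d{2}-\d{2}T", xsd_datetime)
    if match is None:
        raise ValueError(f"unexpected date format: {xsd_datetime!r}")
    return int(match.group(1))


# Value copied from the result printed above.
year = year_of("1001-01-01T00:00:00Z")
print(year)
```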

In [11]:
# city with the highest number of cathedrals
queryString = """
SELECT (COUNT(?cname) AS ?howmanycathedral) ?cityname
WHERE {
    ?ch wdt:P31 wd:Q2977.
    ?ch wdt:P276 ?city.
    ?ch <http://schema.org/name> ?cname.
    ?city <http://schema.org/name> ?cityname.
}
GROUP BY ?cityname
ORDER BY DESC (?howmanycathedral)
LIMIT 1
"""
print("Results")
run_query(queryString)

Results
[('howmanycathedral', '4'), ('cityname', 'Sé')]


1

So, in the end, we found that 'Sé' has the highest number of cathedrals (4).
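
One caveat: the query groups by the city name, so distinct places that share a name would be counted together. A variant that groups by the city IRI as well may give a more reliable ranking. The following is a sketch that has not been run against the endpoint; it reuses only the properties already used in this notebook.

```python
# Variant of the city query that groups by the city IRI, not only by its name.
queryString = """
SELECT ?city ?cityname (COUNT(DISTINCT ?ch) AS ?howmanycathedral)
WHERE {
    ?ch wdt:P31 wd:Q2977 .
    ?ch wdt:P276 ?city .
    ?city <http://schema.org/name> ?cityname .
}
GROUP BY ?city ?cityname
ORDER BY DESC(?howmanycathedral)
LIMIT 5
"""

print(queryString)
```

In the notebook, this string would be passed to `run_query`; showing the top five rows instead of one makes it easier to see whether several places are tied.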