# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained by the previous ones.

An exploratory workload starts with a usually vague, open-ended question and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. You do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-NOTEBOOK_CODE_HERE-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        bindings_list = json_results['results']['bindings']
        if len(bindings_list) == 0:
            print("Empty")
            return 0

        # print each solution as a list of (variable, value) pairs
        for bindings in bindings_list:
            print([(var, value['value']) for var, value in bindings.items()])

        # number of solutions returned by the endpoint
        return len(bindings_list)

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
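
`run_query` prints every solution but only returns the number of solutions. When a later step needs the values themselves, the bindings can be flattened into plain dictionaries. The sketch below assumes the JSON layout that `run_query` already iterates over (`results` → `bindings`, each variable mapped to a dict with a `value` key); `bindings_to_rows` is a hypothetical helper, not part of the given setup, and the sample is hand-written so that no endpoint is needed.

```python
def bindings_to_rows(json_results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in json_results["results"]["bindings"]
    ]


# Hand-written sample with the same structure that run_query iterates over.
sample = {
    "head": {"vars": ["p", "pname"]},
    "results": {"bindings": [
        {"p": {"type": "uri",
               "value": "http://www.wikidata.org/prop/direct/P571"},
         "pname": {"type": "literal", "value": "inception"}},
    ]},
}

rows = bindings_to_rows(sample)
print(rows)
```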


# GEO Workflow Series ("European Cathedrals") 

Consider the following exploratory information need:

> Explore how many cathedrals exist in the U.K., Italy, and France, and the information available about them

## Useful URIs for the current workflow


The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P279`    | subclass      | predicate |
| `wdt:P106`    | occupation    | predicate |
| `wdt:P17`     | country   | predicate |
| `wd:Q2977`    | Cathedral     | node      |
| `wd:Q29265`   | Canterbury Cathedral   | node |
| `wd:Q38`      | Italy         | node |
| `wd:Q142`     | France        | node |
| `wd:Q145`     | U.K.          | node |
| `wd:Q46261`   | Romanesque architecture | node |


Also consider

```
?p wdt:P17 wd:Q142  . 
?p wdt:P31 wd:Q2977  . 
```

is the BGP to retrieve all **French cathedrals**.
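
The same BGP can cover the three countries named in the information need by binding the country with `VALUES`. This is a minimal sketch, assuming the endpoint accepts standard SPARQL 1.1 `VALUES` and aliased aggregates such as `(COUNT(...) AS ?x)`. `country_count_query` is a hypothetical helper that only builds the query string; the result is meant to be passed to `run_query`.

```python
def country_count_query(country_iris, class_iri="wd:Q2977"):
    """Build a query counting instances of `class_iri` per listed country."""
    values = " ".join(country_iris)
    return f"""
SELECT ?country (COUNT(DISTINCT ?p) AS ?cathedrals)
WHERE {{
    VALUES ?c {{ {values} }}
    ?p wdt:P17 ?c .
    ?p wdt:P31 {class_iri} .
    ?c sc:name ?country .
}}
GROUP BY ?country
ORDER BY DESC(?cathedrals)
"""


# Italy, France, and the U.K., using the IRIs from the table of given URIs.
query = country_count_query(["wd:Q38", "wd:Q142", "wd:Q145"])
print(query)
```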

## Workload Goals

1. Identify the BGP to obtain relevant attributes of a cathedral, e.g., the inception or the height

2. Identify the BGP that connects each cathedral to other relevant entities, e.g., architectural style and city

3. Which country hosts the largest number of cathedrals?

4. Analyze cathedrals across architectural styles in each city
 
   4.1 Which styles exist in each country, and how many cathedrals are there for each one?
   
   4.2 Where is the largest cathedral, or the oldest?
   
   4.3 If you had to pick a city or a country to visit some cathedrals, which one would you choose, and based on what criteria?


In [2]:
# start your workflow here

In [3]:
queryString = """
SELECT COUNT(*)
WHERE { 

?p wdt:P17 wd:Q142  . 
?p wdt:P31 wd:Q2977  . 

} 
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '87')]


1

### 1. Identify the BGP to obtain relevant attributes of a cathedral, e.g., the inception or the height

In [4]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE {
    ?c wdt:P31 wd:Q2977 ;
        ?p ?o . FILTER(isLiteral(?o))
    ?p sc:name ?pname .
    FILTER(REGEX(?pname, "[Hh]eight") || REGEX(?pname, "[Ii]nception"))
}
   
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P571'), ('pname', 'inception')]
[('p', 'http://www.wikidata.org/prop/direct/P2048'), ('pname', 'height')]


2
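
The discovery pattern used in this cell (class restriction, optional literal filter, name lookup, keyword filter) recurs throughout the workload with different keywords. The sketch below factors it into a hypothetical helper, `property_search_query`, which is not part of the notebook setup. It assumes the keyword contains no double quotes and uses the standard `"i"` flag of `REGEX` for case-insensitive matching instead of character classes.

```python
def property_search_query(class_iri, keyword, literal_only=None):
    """Build a query listing properties of instances of `class_iri`
    whose human-readable name matches `keyword` (case-insensitive).

    literal_only=True  keeps literal-valued properties (attributes),
    literal_only=False keeps properties pointing at other nodes,
    literal_only=None  applies no restriction on the value.
    """
    if literal_only is None:
        value_filter = ""
    elif literal_only:
        value_filter = "FILTER(isLiteral(?o))"
    else:
        value_filter = "FILTER(!isLiteral(?o))"
    return f"""
SELECT DISTINCT ?p ?pname
WHERE {{
    ?c wdt:P31 {class_iri} ;
       ?p ?o .
    {value_filter}
    ?p sc:name ?pname .
    FILTER(REGEX(?pname, "{keyword}", "i"))
}}
"""


# Same intent as the height lookup above; pass the string to run_query.
print(property_search_query("wd:Q2977", "height", literal_only=True))
```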

### 2. Identify the BGP that connects each cathedral to other relevant entities, e.g., architectural style and city

In [5]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE {
    ?c wdt:P31 wd:Q2977;
        ?p ?o . FILTER(isLiteral(?o))
    ?p sc:name ?pname .
    FILTER(REGEX(?pname, ".*ID.*"))
}
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1626'), ('pname', 'Thai cultural heritage ID')]
[('p', 'http://www.wikidata.org/prop/direct/P5611'), ('pname', 'BeWeb church ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3449'), ('pname', 'NSW Heritage database ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3941'), ('pname', 'Israel Antiquities Authority ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2526'), ('pname', 'National Historic Sites of Canada ID')]
[('p', 'http://www.wikidata.org/prop/direct/P6776'), ('pname', 'IMTL.org ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1453'), ('pname', 'catholic.ru ID')]
[('p', 'http://www.wikidata.org/prop/direct/P3371'), ('pname', 'Observatoire du Patrimoine Religieux ID')]
[('p', 'http://www.wikidata.org/prop/direct/P9637'), ('pname', 'Erfgoedkaart ID')]
[('p', 'http://www.wikidata.org/prop/direct/P1004'), ('pname', 'MusicBrainz place ID')]


10

In [6]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE {
    ?c wdt:P31 wd:Q2977;
        ?p ?o . FILTER(!isLiteral(?o))
    ?p sc:name ?pname .
    FILTER(REGEX(?pname, "[Ss]tyle"))
}
   
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P149'), ('pname', 'architectural style')]


1

In [7]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE {
    ?c wdt:P31 wd:Q2977;
        ?p ?o . FILTER(!isLiteral(?o))
    ?p sc:name ?pname .
    FILTER(REGEX(?pname, "[Ll]oc|[Cc]ity"))
}
   
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P131'), ('pname', 'located in the administrative territorial entity')]
[('p', 'http://www.wikidata.org/prop/direct/P242'), ('pname', 'locator map image')]
[('p', 'http://www.wikidata.org/prop/direct/P276'), ('pname', 'location')]
[('p', 'http://www.wikidata.org/prop/direct/P5607'), ('pname', 'located in the ecclesiastical territorial entity')]
[('p', 'http://www.wikidata.org/prop/direct/P669'), ('pname', 'located on street')]
[('p', 'http://www.wikidata.org/prop/direct/P1943'), ('pname', 'location map')]
[('p', 'http://www.wikidata.org/prop/direct/P421'), ('pname', 'located in time zone')]
[('p', 'http://www.wikidata.org/prop/direct/P706'), ('pname', 'located in/on physical feature')]
[('p', 'http://www.wikidata.org/prop/direct/P159'), ('pname', 'headquarters location')]


9

### 3. Which country hosts the largest number of cathedrals?

In [8]:
queryString = """
SELECT COUNT(*) ?country
WHERE {
    ?p wdt:P31 wd:Q2977.
    ?p wdt:P17 ?c.
    ?c sc:name ?country
}
GROUP BY ?country
ORDER BY DESC (COUNT(*))
LIMIT 1
"""
print("Results")
run_query(queryString)

Results
[('callret-0', '293'), ('country', 'Italy')]


1

### 4.1 Which styles exist in each country, and how many cathedrals are there for each one?
I will use the architectural style property (`wdt:P149`) found in step 2.

In [9]:
queryString = """
SELECT DISTINCT ?country ?arch COUNT(*)
WHERE {
    ?p wdt:P31 wd:Q2977.
    ?p wdt:P17 ?c.
    ?c sc:name ?country.
    ?p wdt:P149 ?a.
    ?a sc:name ?arch.
}
GROUP BY ?country ?arch
ORDER BY ?country
"""
print("Results")
run_query(queryString)

Results
[('country', 'Albania'), ('arch', 'Byzantine Revival architecture'), ('callret-2', '1')]
[('country', 'Albania'), ('arch', 'Romanesque Revival architecture'), ('callret-2', '1')]
[('country', 'Albania'), ('arch', 'Byzantine art'), ('callret-2', '1')]
[('country', 'Algeria'), ('arch', 'Islamic architecture'), ('callret-2', '1')]
[('country', 'Angola'), ('arch', 'modern architecture'), ('callret-2', '1')]
[('country', 'Angola'), ('arch', 'Neoclassical architecture'), ('callret-2', '1')]
[('country', 'Angola'), ('arch', 'Gothic Revival'), ('callret-2', '1')]
[('country', 'Argentina'), ('arch', 'Neoclassical architecture'), ('callret-2', '2')]
[('country', 'Argentina'), ('arch', 'Gothic Revival'), ('callret-2', '7')]
[('country', 'Argentina'), ('arch', 'Gothic architecture'), ('callret-2', '1')]
[('country', 'Argentina'), ('arch', 'neoclassicism'), ('callret-2', '1')]
[('country', 'Argentina'), ('arch', 'Baroque Revival architecture'), ('callret-2', '1')]
[('country', 'Argentina'),

475
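
With 475 (country, style) rows, the flat listing is hard to read. A small post-processing step can group the rows per country and pick the most frequent style. The sketch below uses only a few rows copied from the output above, not the full result; `styles_by_country` is a hypothetical helper.

```python
from collections import defaultdict


def styles_by_country(rows):
    """Group (country, style, count) rows into {country: {style: count}}."""
    table = defaultdict(dict)
    for country, style, count in rows:
        table[country][style] = int(count)  # counts arrive as strings
    return dict(table)


# A few rows copied from the printed output (not the full 475-row result).
rows = [
    ("Albania", "Byzantine Revival architecture", "1"),
    ("Argentina", "Neoclassical architecture", "2"),
    ("Argentina", "Gothic Revival", "7"),
    ("Argentina", "Gothic architecture", "1"),
]

table = styles_by_country(rows)
# Most frequent style per country within this sample.
top = {country: max(styles, key=styles.get) for country, styles in table.items()}
print(top)
```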

### 4.2 Where is the largest cathedral, or the oldest?

In [10]:
queryString = """
SELECT ?name ?country ?i
WHERE {
    ?p wdt:P31 wd:Q2977.
    ?p sc:name ?name.
    ?p wdt:P17 ?c.
    ?c sc:name ?country.
    ?p wdt:P571 ?i.
}
ORDER BY ASC (?i)
LIMIT 1
"""
print("Results")
run_query(queryString)

Results
[('name', 'Etchmiadzin Cathedral'), ('country', 'Armenia'), ('i', '0303-01-01T00:00:00Z')]


1
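
The inception value comes back as a date-time string with a zero-padded year. Python's `datetime` cannot represent years before 1, so a sketch for comparing inception years on the client side can extract the year directly from the string. It assumes the value follows the `YYYY-MM-DDThh:mm:ssZ` layout seen in the result above, optionally with a leading minus sign; `inception_year` is a hypothetical helper.

```python
import re


def inception_year(value):
    """Extract the (possibly negative) year from a date-time string."""
    match = re.match(r"^(-?\d{4,})-\d{2}-\d{2}", value)
    if match is None:
        raise ValueError(f"not a date-time literal: {value!r}")
    return int(match.group(1))


# Value copied from the result above.
print(inception_year("0303-01-01T00:00:00Z"))
```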

In [11]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE {
    ?c wdt:P31 wd:Q2977 ;
        ?p ?o . FILTER(isLiteral(?o))
    ?p sc:name ?pname .
    FILTER(REGEX(?pname, "[Ll]ength"))
}
   
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P2043'), ('pname', 'length')]


1

In [12]:
queryString = """
SELECT ?name ?country ?l
WHERE {
    ?p wdt:P31 wd:Q2977.
    ?p sc:name ?name.
    ?p wdt:P17 ?c.
    ?c sc:name ?country.
    ?p wdt:P2043 ?l.
}
ORDER BY DESC (?l)
LIMIT 1
"""
print("Results")
run_query(queryString)

Results
[('name', 'Mary, Queen of the World Cathedral'), ('country', 'Canada'), ('l', '10150')]


1

### 4.3 If you had to pick a city or a country to visit some cathedrals, which one would you choose, and based on what criteria?
I will choose the cathedral with the highest maximum capacity, since a large capacity suggests a large building. First I look for a capacity property, then I rank the cathedrals by it.

In [13]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE {
    ?c wdt:P31 wd:Q2977 ;
        ?p ?o .
    ?p sc:name ?pname .
    FILTER(REGEX(?pname, ".*capacity.*"))
}
   
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1083'), ('pname', 'maximum capacity')]


1

In [14]:
queryString = """
SELECT ?name ?capacity
WHERE {
    ?p wdt:P31 wd:Q2977.
    ?p sc:name ?name.
    ?p wdt:P1083 ?capacity.
}
ORDER BY DESC (?capacity)
LIMIT 1
"""
print("Results")
run_query(queryString)

Results
[('name', "St. Michael's Cathedral"), ('capacity', '12000')]


1
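
The capacity ranking in this section returns a name and a capacity but no place, so it does not yet say which city or country to visit. A sketch of a variant that also binds a location is given below. It assumes that `wdt:P131` (found in cell In [7]) and `wdt:P17` are populated for the cathedrals of interest; `OPTIONAL` keeps the cathedrals that lack them. `capacity_with_location_query` is a hypothetical helper, and the query has not been run here, so no output is shown.

```python
def capacity_with_location_query(limit=5):
    """Build a query ranking cathedrals by capacity, with place and country."""
    return f"""
SELECT ?name ?location ?country ?capacity
WHERE {{
    ?p wdt:P31 wd:Q2977 .
    ?p sc:name ?name .
    ?p wdt:P1083 ?capacity .
    OPTIONAL {{ ?p wdt:P131 ?l . ?l sc:name ?location . }}
    OPTIONAL {{ ?p wdt:P17 ?c . ?c sc:name ?country . }}
}}
ORDER BY DESC(?capacity)
LIMIT {int(limit)}
"""


# Pass the resulting string to run_query to execute it.
query = capacity_with_location_query(limit=5)
print(query)
```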