# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained previously.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook
3. Do not delete useless queries. Keep everything that is syntactically valid

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
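
As a minimal sketch of how this BGP can be used, the helper below builds a query that looks up the name of a single IRI. `label_query` is a hypothetical name, not part of the notebook setup, and the function only builds the query string; running it is left to the `run_query` function defined in the setup cell.

```python
def label_query(iri: str) -> str:
    """Build a SPARQL query returning the schema.org name of one IRI.

    `iri` is a full IRI without angle brackets, for example
    "http://www.wikidata.org/prop/direct/P31".
    """
    return (
        "SELECT ?name\n"
        "WHERE {\n"
        f"    <{iri}> <http://schema.org/name> ?name .\n"
        "}"
    )


# Print the query for the 'instance of' property listed among the given IRIs.
print(label_query("http://www.wikidata.org/prop/direct/P31"))
```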
    
    

In [None]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-09c3207aca-## 
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings']) == 0:
            print("Empty")
            return 0

        for bindings in json_results['results']['bindings']:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(json_results['results']['bindings'])

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
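
`run_query` prints each binding but returns only the number of rows. When a later cell needs to reuse the values, a small converter can help. The following is a sketch that assumes the endpoint returns the standard SPARQL JSON results layout (a `results` object holding a `bindings` list); `bindings_to_rows` is a hypothetical helper, not part of the original setup, and the sample input is written by hand.

```python
def bindings_to_rows(json_results: dict) -> list:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    rows = []
    for binding in json_results.get("results", {}).get("bindings", []):
        # Keep only the plain value of each bound variable.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hand-written sample shaped like the result of a COUNT query.
sample = {
    "head": {"vars": ["callret-0"]},
    "results": {
        "bindings": [
            {"callret-0": {"type": "typed-literal", "value": "3354"}}
        ]
    },
}
print(bindings_to_rows(sample))
```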


# GEO Workflow Series ("American Architects") 

Consider the following exploratory information need:

> You want to study the most prolific American architects

## Useful URIs for the current workflow


The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P279`    | subclass      | predicate |
| `wdt:P106`    | occupation    | predicate |
| `wdt:P17`     | country       | predicate |
| `wdt:P27`     | citizenship   | predicate |
| `wd:Q5604`    | Frank Lloyd Wright     | node      |
| `wd:Q30`      | U.S.A.        | node |
| `wd:Q42973`   | architect     | node |
| `wd:Q2081276` | Pettit Memorial Chapel        | node |
| `wd:Q2977`    | Cathedral     | node |



Also consider

```
?p wdt:P27 wd:Q30  . 
?p wdt:P106 wd:Q42973  . 
```

is the BGP to retrieve all **American architects**.
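
One quick way to sanity-check this BGP is an ASK query on a known entity. The sketch below only builds the query string for Frank Lloyd Wright (`wd:Q5604` from the table above). `ask_is_american_architect` is a hypothetical helper, the query has not been run against the endpoint here, and it relies on the `wd:` and `wdt:` prefixes declared in `prefixString`.

```python
def ask_is_american_architect(entity: str) -> str:
    """Build an ASK query testing the American-architect BGP for one entity.

    `entity` is a prefixed name such as "wd:Q5604".
    """
    return (
        "ASK {\n"
        f"    {entity} wdt:P27 wd:Q30 .\n"
        f"    {entity} wdt:P106 wd:Q42973 .\n"
        "}"
    )


ask_query = ask_is_american_architect("wd:Q5604")
print(ask_query)
# In the notebook this string could be passed to run_ask_query(ask_query).
```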

## Workload Goals

1. Identify the BGP to obtain the notable works of an architect and their location

2. Identify the BGP that connects a building with its architect, type, and architectural style

3. Which architect is the most prolific in the U.S. and outside?

4. Analyze the location of the buildings designed by American architects
 
   4.1 Which styles exist in the U.S.? Which types of buildings?
   
   4.2 Which U.S. states have the most notable buildings?
   
   4.3 Which U.S. state contains the largest number of buildings designed by U.S. architects?


In [None]:
# start your workflow here

In [296]:
queryString = """
SELECT COUNT(*)
WHERE { 

?p wdt:P27 wd:Q30  . 
?p wdt:P106 wd:Q42973  . 

}
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '3354')]


1

In [312]:
# First, list architect names and check whether Frank Lloyd Wright is in the dataset
queryString = """
SELECT DISTINCT  ?architect ?architectName 
WHERE { 
    #?architect wdt:P27 wd:Q30  .
    ?architect wdt:P106 wd:Q42973  . 
    
    # this returns the labels
    ?architect <http://schema.org/name> ?architectName .
    #Filter regex(?architectName,"Frank Lloyd Wright",'i')
} 
LIMIT 5
"""


print("Results")
run_query(queryString)

Results
[('architect', 'http://www.wikidata.org/entity/Q30346830'), ('architectName', 'Carlos Lamela')]
[('architect', 'http://www.wikidata.org/entity/Q5860205'), ('architectName', 'Fernando Salvador Carreras')]
[('architect', 'http://www.wikidata.org/entity/Q8194957'), ('architectName', 'Alexandre Azedo Lacerda')]
[('architect', 'http://www.wikidata.org/entity/Q8195105'), ('architectName', 'Alfons Milà i Sagnier')]
[('architect', 'http://www.wikidata.org/entity/Q8197980'), ('architectName', 'Amós Salvador Sáenz y Carreras')]


5

In [311]:
# Extract the properties of a specific architect (e.g. Frank Lloyd Wright -> wd:Q5604).
# This query shows that the 'notable work' property is P800.
queryString = """
SELECT  DISTINCT ?p ?pName 
WHERE { 
     
    wd:Q5604 ?p  ?o.FILTER(!isLiteral(?o)).
    
    # this returns the labels
    ?p <http://schema.org/name> ?pName .
    Filter regex(?pName,"notable",'i')

} 
LIMIT 5
"""


print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P800'), ('pName', 'notable work')]


1

In [313]:
# find all notable works and their country (location)
queryString = """
SELECT  DISTINCT ?work ?workName ?locationName
WHERE { 
     ?architect wdt:P106 wd:Q42973  ; 
                 wdt:P800  ?work.
    ?work wdt:P17 ?location.
    # this returns the labels
    ?work <http://schema.org/name> ?workName .
    ?location <http://schema.org/name> ?locationName 

} 
LIMIT 5
"""


print("Results")
run_query(queryString)

Results
[('work', 'http://www.wikidata.org/entity/Q1369314'), ('workName', 'Estadio Las Gaunas'), ('locationName', 'Spain')]
[('work', 'http://www.wikidata.org/entity/Q15885886'), ('workName', 'Torre Caja Badajoz'), ('locationName', 'Spain')]
[('work', 'http://www.wikidata.org/entity/Q24439272'), ('workName', 'Edificio Pirámide, Madrid'), ('locationName', 'Spain')]
[('work', 'http://www.wikidata.org/entity/Q26834201'), ('workName', 'Edificio Galaxia, Madrid'), ('locationName', 'Spain')]
[('work', 'http://www.wikidata.org/entity/Q27032920'), ('workName', 'Edificio de viviendas en Paseo de la Castellana 121-123, Madrid'), ('locationName', 'Spain')]


5

In [None]:
1. Identify the BGP to obtain the notable works of an architect and their location

In [320]:
# all notable works and locations for a specific architect name (ex. Frank Lloyd Wright)
queryString = """
SELECT  DISTINCT ?workName ?locationName ?architectName
WHERE { 
    ?architect wdt:P106 wd:Q42973  ; 
                 wdt:P800  ?work.
    ?work wdt:P17 ?location.
    # this returns the labels
    ?work <http://schema.org/name> ?workName .
    ?architect <http://schema.org/name> ?architectName .
    ?location <http://schema.org/name> ?locationName 
     Filter regex(?architectName,"Frank Lloyd Wright",'i')
}

"""

print("Results")
run_query(queryString)

Results
[('workName', 'Pettit Memorial Chapel'), ('locationName', 'United States of America'), ('architectName', 'Frank Lloyd Wright')]
[('workName', 'Solomon R. Guggenheim Museum'), ('locationName', 'United States of America'), ('architectName', 'Frank Lloyd Wright')]
[('workName', 'Fallingwater'), ('locationName', 'United States of America'), ('architectName', 'Frank Lloyd Wright')]
[('workName', 'A. D. German Warehouse'), ('locationName', 'United States of America'), ('architectName', 'Frank Lloyd Wright')]
[('workName', 'Annunciation Greek Orthodox Church'), ('locationName', 'United States of America'), ('architectName', 'Frank Lloyd Wright')]
[('workName', 'Emil Bach House'), ('locationName', 'United States of America'), ('architectName', 'Frank Lloyd Wright')]


6
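
Filtering on the label with `regex` has to test the name of every architect that matches the pattern. Since the IRI of Frank Lloyd Wright is already given (`wd:Q5604`), the same BGP can be anchored directly on the entity. This is a sketch that has not been run here; `notable_works_query` is a hypothetical helper and assumes the prefixes from the setup cell.

```python
def notable_works_query(architect: str) -> str:
    """Build a query for the notable works (P800) of one architect and their country (P17).

    `architect` is a prefixed name such as "wd:Q5604".
    """
    return (
        "SELECT DISTINCT ?workName ?locationName\n"
        "WHERE {\n"
        f"    {architect} wdt:P800 ?work .\n"
        "    ?work wdt:P17 ?location .\n"
        "    ?work <http://schema.org/name> ?workName .\n"
        "    ?location <http://schema.org/name> ?locationName .\n"
        "}"
    )


print(notable_works_query("wd:Q5604"))
```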

In [321]:
# Choose a notable work (e.g. Pettit Memorial Chapel -> wd:Q2081276) and list all its properties.
# Found: architect -> P84, architectural style -> P149, type (instance of) -> P31.
queryString = """
SELECT  DISTINCT ?p ?pName 
WHERE { 
     wd:Q2081276 ?p ?o
     .
    # this returns the labels
    ?p <http://schema.org/name> ?pName .

}

"""

print("Results")

run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P131'), ('pName', 'located in the administrative territorial entity')]
[('p', 'http://www.wikidata.org/prop/direct/P1435'), ('pName', 'heritage designation')]
[('p', 'http://www.wikidata.org/prop/direct/P149'), ('pName', 'architectural style')]
[('p', 'http://www.wikidata.org/prop/direct/P1566'), ('pName', 'GeoNames ID')]
[('p', 'http://www.wikidata.org/prop/direct/P17'), ('pName', 'country')]
[('p', 'http://www.wikidata.org/prop/direct/P18'), ('pName', 'image')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of')]
[('p', 'http://www.wikidata.org/prop/direct/P373'), ('pName', 'Commons category')]
[('p', 'http://www.wikidata.org/prop/direct/P5383'), ('pName', 'archINFORM project ID')]
[('p', 'http://www.wikidata.org/prop/direct/P646'), ('pName', 'Freebase ID')]
[('p', 'http://www.wikidata.org/prop/direct/P84'), ('pName', 'architect')]
[('p', 'http://www.wikidata.org/prop/direct/P910'), ('pName', "topic's main ca

14

In [None]:
2. Identify the BGP that connects a building with its architect, type, and architectural style

In [324]:
# Use architect -> P84, architectural style -> P149, type (instance of) -> P31
# for a notable work (e.g. Pettit Memorial Chapel -> wd:Q2081276)
queryString = """
SELECT  DISTINCT ?architectName ?typeName ?styleName
WHERE { 
     wd:Q2081276 wdt:P84 ?architect;
                 wdt:P149 ?style;
                 wdt:P31 ?type.
                 
     
    # this returns the labels
    ?architect <http://schema.org/name> ?architectName .
    ?style <http://schema.org/name> ?styleName .
    ?type <http://schema.org/name> ?typeName 

}

"""

print("Results")
run_query(queryString)

Results
[('architectName', 'Frank Lloyd Wright'), ('typeName', 'chapel'), ('styleName', 'modern architecture')]


1
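
The query above fixes the building to `wd:Q2081276`. A sketch of the same BGP with the building left as a variable follows. Wrapping the style in `OPTIONAL` is an added assumption, so that buildings without a recorded style are not dropped; the query has not been run against the endpoint here and `building_query` is a hypothetical name.

```python
# Same three properties as above (P84, P31, P149), with ?building as a variable.
building_query = """
SELECT DISTINCT ?buildingName ?architectName ?typeName ?styleName
WHERE {
    ?building wdt:P84 ?architect ;
              wdt:P31 ?type .
    OPTIONAL {
        ?building wdt:P149 ?style .
        ?style <http://schema.org/name> ?styleName .
    }
    ?building <http://schema.org/name> ?buildingName .
    ?architect <http://schema.org/name> ?architectName .
    ?type <http://schema.org/name> ?typeName .
}
LIMIT 10
"""

print(building_query)
# In the notebook this string could be passed to run_query(building_query).
```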

In [None]:
3. Which architect is the most prolific in the U.S. and outside?

In [349]:
# Most prolific U.S. architects: count the notable works (P800) of architects with U.S. citizenship (P27)
queryString = """
SELECT  DISTINCT ?architectName  (COUNT(?work) AS ?howmanywork)
WHERE { 
    
     ?architect wdt:P106 wd:Q42973  ;
                wdt:P27 wd:Q30   ;
                wdt:P800  ?work.
    
    
    # this returns the labels
    ?architect <http://schema.org/name> ?architectName.

} 
GROUP BY ?architectName
ORDER BY DESC (?howmanywork)
LIMIT 5
"""


print("Results")
run_query(queryString)

Results
[('architectName', 'Bruce Price'), ('howmanywork', '17')]
[('architectName', 'Cornelia Oberlander'), ('howmanywork', '13')]
[('architectName', 'Egerton Swartwout'), ('howmanywork', '11')]
[('architectName', 'John Trumbull'), ('howmanywork', '10')]
[('architectName', 'Frank Gehry'), ('howmanywork', '9')]


5

In [367]:
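# Most prolific architects outside the USA: same count for architects without U.S. citizenship.
# Note: the join on ?country yields one row per citizenship, so COUNT(?work) can be inflated.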
queryString = """
SELECT  DISTINCT ?architectName  (COUNT(?work) AS ?howmanywork)
WHERE { 
       
     ?architect wdt:P106 wd:Q42973 ; 
                 wdt:P27 ?country  ;
                 wdt:P800  ?work.
    
    
    # this returns the labels
    
    ?architect <http://schema.org/name> ?architectName.
    ?country <http://schema.org/name> ?countryName. 
    FILTER NOT EXISTS {?architect wdt:P27 wd:Q30.}
} 
GROUP BY ?architectName
ORDER BY DESC (?howmanywork)
LIMIT 5
"""


print("Results")
run_query(queryString)

Results
[('architectName', 'Norman Foster'), ('howmanywork', '50')]
[('architectName', 'Zaha Hadid'), ('howmanywork', '38')]
[('architectName', 'Henrik Nissen'), ('howmanywork', '33')]
[('architectName', 'Friedrich Grünanger'), ('howmanywork', '32')]
[('architectName', 'Jelisaveta Načić'), ('howmanywork', '30')]


5
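
Two details of these counts are worth checking. Joining on `?country` yields one row per (work, citizenship) pair, so an architect with two citizenships has every work counted twice by `COUNT(?work)`; grouping by name also merges distinct architects who share a name. The pure-Python sketch below reproduces the first effect on invented toy data and shows why counting distinct works avoids it, which in SPARQL corresponds to `COUNT(DISTINCT ?work)`.

```python
from collections import defaultdict

# Invented toy data: one architect with two citizenships and three works.
citizenships = {"architect_A": ["country_X", "country_Y"]}
works = {"architect_A": ["work_1", "work_2", "work_3"]}

# The join produces one row per (architect, country, work) combination.
rows = [
    (architect, country, work)
    for architect, countries in citizenships.items()
    for country in countries
    for work in works[architect]
]

plain_count = defaultdict(int)      # behaves like COUNT(?work)
distinct_works = defaultdict(set)   # behaves like COUNT(DISTINCT ?work)
for architect, _country, work in rows:
    plain_count[architect] += 1
    distinct_works[architect].add(work)

distinct_count = {a: len(w) for a, w in distinct_works.items()}
print(dict(plain_count))
print(distinct_count)
```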

In [None]:
   4.1 Which styles exist in the U.S.?

In [370]:
# Sample of architectural styles (P149) of notable works located in the USA
queryString = """
SELECT  DISTINCT ?styleName  ?work
WHERE { 
    ?architect wdt:P106 wd:Q42973  ; 
                 wdt:P800  ?work.
                 
    ?work   wdt:P17 wd:Q30;
            wdt:P149 ?style.
            
    # this returns the labels
    ?style <http://schema.org/name> ?styleName.
    
} 

LIMIT 5
"""


print("Results")
run_query(queryString)

Results
[('styleName', 'Greek Revival architecture'), ('work', 'http://www.wikidata.org/entity/Q3584637')]
[('styleName', 'Stripped Classicism'), ('work', 'http://www.wikidata.org/entity/Q11208')]
[('styleName', 'Romanesque Revival in America'), ('work', 'http://www.wikidata.org/entity/Q74861306')]
[('styleName', 'Neoclassical architecture'), ('work', 'http://www.wikidata.org/entity/Q673076')]
[('styleName', 'Beaux-Arts'), ('work', 'http://www.wikidata.org/entity/Q4820259')]


5

In [None]:
  4.1 Which types of buildings exist in the U.S.?

In [372]:
# Sample of building types (P31) of notable works located in the USA
queryString = """
SELECT DISTINCT ?typeName  
WHERE { 
     
    ?architect wdt:P106 wd:Q42973  ; 
                 wdt:P800  ?work.
                 
    ?work   wdt:P17 wd:Q30;
            wdt:P31 ?type.
            
    # this returns the labels
    ?type <http://schema.org/name> ?typeName.
    
} 

LIMIT 5
"""


print("Results")
run_query(queryString)

Results
[('typeName', 'government building')]
[('typeName', 'sculpture garden')]
[('typeName', 'headquarters')]
[('typeName', 'neighborhood')]
[('typeName', 'art museum')]


5
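
`LIMIT 5` without an ordering returns an arbitrary sample. To see which styles or building types dominate, the values can be grouped and counted. The helper below is a sketch, not run against the endpoint here; `top_values_query` is a hypothetical name, and the property is a parameter: `wdt:P149` for style, `wdt:P31` for type.

```python
def top_values_query(prop: str, limit: int = 10) -> str:
    """Build a query ranking the values of `prop` over notable works in the USA."""
    return (
        "SELECT ?valueName (COUNT(DISTINCT ?work) AS ?works)\n"
        "WHERE {\n"
        "    ?architect wdt:P106 wd:Q42973 ;\n"
        "               wdt:P800 ?work .\n"
        "    ?work wdt:P17 wd:Q30 ;\n"
        f"          {prop} ?value .\n"
        "    ?value <http://schema.org/name> ?valueName .\n"
        "}\n"
        "GROUP BY ?valueName\n"
        "ORDER BY DESC(?works)\n"
        f"LIMIT {limit}"
    )


print(top_values_query("wdt:P149"))       # architectural styles
print(top_values_query("wdt:P31", 5))     # building types
```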

In [None]:
4.2 Which U.S. states have the most notable buildings?

In [388]:
# Explore the properties of the works' locations, looking for a way to reach the U.S. state (Q35657) of each city
queryString = """
#SELECT  DISTINCT ?work ?locationName  ?countryName
SELECT  DISTINCT ?p  ?pName
WHERE { 
      
    ?architect wdt:P106 wd:Q42973  ; 
               wdt:P800  ?work.
               
    ?work wdt:P17 wd:Q30;
           wdt:P131 ?location.
           
    #?location wdt:P17 ?country.
    
    ?location ?p ?o.FILTER(!isLiteral(?o))
    
    # this returns the labels
    #?location <http://schema.org/name> ?locationName.
    #?country <http://schema.org/name> ?countryName.
    ?p <http://schema.org/name> ?pName.
    #Filter exists {?location wdt:P17 wd:Q30}
    
} 

LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P5353'), ('pName', 'school district')]
[('p', 'http://www.wikidata.org/prop/direct/P112'), ('pName', 'founded by')]
[('p', 'http://www.wikidata.org/prop/direct/P1151'), ('pName', "topic's main Wikimedia portal")]
[('p', 'http://www.wikidata.org/prop/direct/P131'), ('pName', 'located in the administrative territorial entity')]
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('pName', 'described by source')]


5

In [395]:
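# Count notable buildings per location (P131).
# Note: ?country is the parent administrative entity of ?location, not a country, and the results mix cities and states.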
queryString = """
SELECT  DISTINCT  ?locationName  (COUNT(?countryName) AS ?howmanywork)
WHERE { 
      
    ?architect wdt:P106 wd:Q42973  ; 
               wdt:P800  ?work.
               
    ?work wdt:P17 wd:Q30;
           wdt:P131 ?location.
           
    ?location wdt:P131 ?country.
    
    
    # this returns the labels
    ?location <http://schema.org/name> ?locationName.
    ?country <http://schema.org/name> ?countryName.
    
} 
GROUP BY ?locationName
ORDER BY DESC (?howmanywork)
LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('locationName', 'Manhattan'), ('howmanywork', '79')]
[('locationName', 'Washington, D.C.'), ('howmanywork', '47')]
[('locationName', 'Chicago'), ('howmanywork', '32')]
[('locationName', 'California'), ('howmanywork', '26')]
[('locationName', 'Los Angeles'), ('howmanywork', '21')]


5

In [255]:
# List the countries of the places given as location (P276) of notable works in the USA
queryString = """
SELECT  DISTINCT ?p ?pName  
WHERE { 
      
     ?architect wdt:P106 wd:Q42973  ; 
                 wdt:P800  ?work.
      ?work wdt:P17 wd:Q30;
           wdt:P276 ?s.
           
     ?s wdt:P17 ?p.
    
    # this returns the labels
    ?p <http://schema.org/name> ?pName.
    
    
} 

LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q30'), ('pName', 'United States of America')]
[('p', 'http://www.wikidata.org/entity/Q34'), ('pName', 'Sweden')]
[('p', 'http://www.wikidata.org/entity/Q55'), ('pName', 'Netherlands')]


3

In [None]:
4.3 Which U.S. state contains the largest number of buildings designed by U.S. architects?
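
This question is left open in the notebook. One possible approach, sketched below and not run against the endpoint, climbs the `wdt:P131` chain from each work until it reaches an entity that is an instance of `wd:Q35657` (the identifier mentioned in the earlier exploration cell, assumed here to be the class of U.S. states), and restricts architects to U.S. citizens. The path `wdt:P131+` assumes the endpoint supports SPARQL 1.1 property paths, and `works_per_state_query` is a hypothetical helper name.

```python
def works_per_state_query(us_architects_only: bool = True, limit: int = 5) -> str:
    """Build a query counting notable works per U.S. state."""
    # Optional triple pattern restricting architects to U.S. citizens (P27).
    citizenship = "    ?architect wdt:P27 wd:Q30 .\n" if us_architects_only else ""
    return (
        "SELECT ?stateName (COUNT(DISTINCT ?work) AS ?works)\n"
        "WHERE {\n"
        "    ?architect wdt:P106 wd:Q42973 ;\n"
        "               wdt:P800 ?work .\n"
        f"{citizenship}"
        "    ?work wdt:P17 wd:Q30 ;\n"
        "          wdt:P131+ ?state .\n"
        "    ?state wdt:P31 wd:Q35657 ;\n"
        "           <http://schema.org/name> ?stateName .\n"
        "}\n"
        "GROUP BY ?stateName\n"
        "ORDER BY DESC(?works)\n"
        f"LIMIT {limit}"
    )


print(works_per_state_query())
```

Passing `us_architects_only=False` drops the citizenship pattern, which gives a state-level variant for question 4.2.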