# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain knowledge graph (KG): [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload usually starts with a vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that is present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP (basic graph pattern) that returns the human-readable name of a property or a class in Wikidata.
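
As a minimal sketch, this BGP can be combined with a `VALUES` clause to look up the names of several IRIs in one query. The helper name `build_name_query` is hypothetical. The returned string uses the prefixed names `wdt:P27` and `wdt:P106` from the table of given IRIs, so it assumes the prefix declarations that `run_query` in the setup cell prepends.

```python
def build_name_query(iris):
    """Return a SPARQL query that looks up schema.org names for the given IRIs.

    `iris` is a list of prefixed names (e.g. "wdt:P27") or full IRIs
    written in angle brackets. (Hypothetical helper, for illustration.)
    """
    values = " ".join(iris)
    return (
        "SELECT ?p ?name\n"
        "WHERE {\n"
        f"  VALUES ?p {{ {values} }}\n"
        "  ?p <http://schema.org/name> ?name .\n"
        "}"
    )


# Build the lookup for two of the given predicates and inspect the text.
query = build_name_query(["wdt:P27", "wdt:P106"])
print(query)
```

The string can then be passed to `run_query` unchanged.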
    
    

In [2]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-NOTEBOOK_CODE_HERE-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# SELECT and CONSTRUCT queries: prints every result row and returns the row count
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings']) == 0:
            print("Empty")
            return 0

        for bindings in json_results['results']['bindings']:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(json_results['results']['bindings'])

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        return sparql.query().convert()

    except Exception as e:
        print("The operation failed", e)
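
`run_query` prints each binding and returns only the number of rows. When a later query needs to reuse values from an earlier one, it may help to flatten the JSON into plain dictionaries. The following is a minimal sketch under that assumption: the helper `bindings_to_rows` is hypothetical, and the sample dictionary is hand-written in the standard SPARQL JSON results layout rather than fetched from the endpoint.

```python
def bindings_to_rows(json_results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    rows = []
    for binding in json_results["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hand-written sample in the SPARQL JSON results layout (illustrative only).
sample = {
    "head": {"vars": ["p", "name"]},
    "results": {
        "bindings": [
            {
                "p": {
                    "type": "uri",
                    "value": "http://www.wikidata.org/prop/direct/P31",
                },
                "name": {"type": "literal", "value": "instance of"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```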


# Companies Workflow Series ("Business People in Germany") 

Consider the following exploratory information need:

> You are investigating important business people in Germany

## Useful URIs for the current workflow


The following are given:

| IRI            | Description            | Role      |
| -------------- | ---------------------- | --------- |
| `wdt:P1647`    | subproperty of         | predicate |
| `wdt:P31`      | instance of            | predicate |
| `wdt:P279`     | subclass of            | predicate |
| `wdt:P17`      | country                | predicate |
| `wdt:P27`      | country of citizenship | predicate |
| `wdt:P106`     | occupation             | predicate |
| `wd:Q183`      | Germany                | node      |
| `wd:Q2462658`  | manager                | node      |
| `wd:Q40966`    | Opel                   | node      |
| `wd:Q56509715` | Michael Lohscheller    | node      |
| `wd:Q57479`    | Adam Opel              | node      |



Also consider that

```

?p wdt:P27 wd:Q183  . 
?p wdt:P106 wd:Q2462658  . 

```

is the BGP to retrieve all **German managers**.
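
One possible first exploratory step is to survey which predicates the matched people actually use, so that later queries rely only on properties discovered through SPARQL. The sketch below builds such a query as a string. The helper `build_property_survey` and the variable names `?prop`, `?o`, and `?uses` are assumptions made for illustration, and the result depends on what the endpoint contains.

```python
GERMAN_MANAGERS_BGP = (
    "?p wdt:P27 wd:Q183 .\n"
    "?p wdt:P106 wd:Q2462658 .\n"
)


def build_property_survey(bgp, limit=20):
    """Return a query counting which predicates the subjects matched by `bgp` use.

    `bgp` must bind the subject to ?p. (Hypothetical helper, for illustration.)
    """
    return (
        "SELECT ?prop ?name (COUNT(*) AS ?uses)\n"
        "WHERE {\n"
        f"{bgp}"
        "?p ?prop ?o .\n"
        "?prop <http://schema.org/name> ?name .\n"
        "}\n"
        "GROUP BY ?prop ?name\n"
        "ORDER BY DESC(?uses)\n"
        f"LIMIT {int(limit)}"
    )


survey_query = build_property_survey(GERMAN_MANAGERS_BGP)
print(survey_query)
```

Sorting by usage count puts the most common predicates first, which is usually where gender, employer, or position-like properties would show up if they exist in the data.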

## Workload Goals

1. Identify the BGP for obtaining CEOs, managers, directors, or founders of German companies.

2. Identify the BGP to retrieve the gender and profession of people and the industry of companies.

3. Are there German companies with CEOs, managers, or founders who are not German?

4. Are there people related to multiple German companies?

5. Analyze the number of business people per role, type of company, and gender.

   5.1 How many people are there in Germany for each role and gender?

   5.2 Are there companies with multiple German people in important roles?

   5.3 In which sectors are important German business people working?

   5.4 Are there German business people related to non-German companies? How many?
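
For the first goal, a reasonable starting point is to ask which predicates connect the given company node to the given people, in either direction, instead of guessing property identifiers. The sketch below only builds the query text. The helper `build_link_query` and the `?direction` labels are hypothetical, and whether any link is returned depends on the data behind the endpoint.

```python
def build_link_query(company, people):
    """Return a query listing predicates that connect `company` to any of
    `people`, in either direction. (Hypothetical helper, for illustration.)"""
    values = " ".join(people)
    return (
        "SELECT DISTINCT ?person ?prop ?name ?direction\n"
        "WHERE {\n"
        f"  VALUES ?person {{ {values} }}\n"
        "  {\n"
        f"    {company} ?prop ?person .\n"
        '    BIND("company->person" AS ?direction)\n'
        "  } UNION {\n"
        f"    ?person ?prop {company} .\n"
        '    BIND("person->company" AS ?direction)\n'
        "  }\n"
        "  OPTIONAL { ?prop <http://schema.org/name> ?name . }\n"
        "}"
    )


# Opel, Michael Lohscheller, and Adam Opel are taken from the table of given IRIs.
link_query = build_link_query("wd:Q40966", ["wd:Q56509715", "wd:Q57479"])
print(link_query)
```

The predicates returned by such a query can then be generalized from Opel to all companies located in Germany.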


In [1]:
# start your workflow here

In [3]:
queryString = """
SELECT COUNT(*)
WHERE { 

?p wdt:P27 wd:Q183  . 
?p wdt:P106 wd:Q2462658  . 
} 
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '829')]


1