# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [WikiData](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload starts with a usually vague, open ended question, and does not assume the person issuing the workload has a clear understanding of the data contained in the target database or its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. You do not delete useless queries. Keep everything that is syntactically valid.

```
?p sc:name ?name .
```

is the BGP returning a human-readable name of a property or a class in Wikidata.
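
As a small illustration, the name lookup can be wrapped in a helper that builds the query text for a given IRI. This is only a sketch: `name_query` is a hypothetical helper that does not appear in the notebook, and it assumes the `wd:`, `wdt:` and `sc:` prefixes are prepended by whatever function submits the query (for example `run_query` from the setup cell).

```python
def name_query(iri):
    """Build a SELECT query returning the human-readable name of `iri`.

    `iri` is a prefixed name such as "wd:Q41298" or "wdt:P921".
    The PREFIX declarations are expected to be added by the caller.
    """
    return f"""
SELECT DISTINCT ?name
WHERE {{
    {iri} sc:name ?name .
}}
"""


# Example: query text for the name of the Magazine class.
query = name_query("wd:Q41298")
print(query)
```

The resulting string can be passed to `run_query` unchanged, since that function prepends the prefix declarations itself.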
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-NOTEBOOK_CODE_HERE-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
       results = sparql.query()
       json_results = results.convert()
       if len(json_results['results']['bindings'])==0:
          print("Empty")
          return 0
    
       for bindings in json_results['results']['bindings']:
          print( [ (var, value['value'])  for var, value in bindings.items() ] )

       return len(json_results['results']['bindings'])

    except Exception as e :
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
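
To make the shape of the returned data concrete, the sketch below applies the same flattening that `run_query` prints to a hand-written sample in the SPARQL JSON results format, and reads the boolean answer that an ASK query carries. The sample dictionaries are illustrative and are not real endpoint output; `rows_from_select` and `answer_from_ask` are hypothetical helper names.

```python
# Hand-written samples in the SPARQL JSON results format (not endpoint output).
sample_select = {
    "head": {"vars": ["name", "country"]},
    "results": {
        "bindings": [
            {
                "name": {"type": "literal", "value": "Dissent"},
                "country": {"type": "literal", "value": "United States of America"},
            }
        ]
    },
}
sample_ask = {"head": {}, "boolean": True}


def rows_from_select(json_results):
    """Flatten SELECT bindings into lists of (variable, value) pairs."""
    return [
        [(var, value["value"]) for var, value in binding.items()]
        for binding in json_results["results"]["bindings"]
    ]


def answer_from_ask(json_results):
    """Return the boolean answer of an ASK query result."""
    return json_results["boolean"]


rows = rows_from_select(sample_select)
print(rows)
print(answer_from_ask(sample_ask))
```

Returning the rows instead of only printing them makes it easier to post-process results in later cells, which `run_query` as written does not allow because it returns only the row count.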

# Book Workflow Series ("Political Magazines explorative search") 

Consider the following exploratory scenario:


>  Investigate the U.S. magazines that write about politics and their media presence



## Useful URIs for the current workflow
The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P921`    | main subject  | predicate | 
| `wdt:P17`     | country       | predicate | 
| `wdt:P279`    | subclass      | predicate |
| `wd:Q41298`   | Magazine      | node |
| `wd:Q7163`    | politics      | node |
| `wd:Q30`      | U.S.          | node |
| `wd:Q217305`  | The New Yorker  | node |







Also consider

```
?p wdt:P17 wd:Q30 .
?p wdt:P31/wdt:P279* wd:Q41298  . 
```

is the BGP to retrieve all **U.S. publications that are magazines**, including instances of any subclass of `wd:Q41298`.


The workload should


1. Identify the BGP for obtaining the US magazines that write about politics

2. Compare the number of social media followers and get the top five magazines along with main properties such as the place of publication and subfields of work

3. Compare the number of followers with the top three US magazines writing mainly about sports

4. Get the names of notable employees working for The New Yorker and any other political magazine published in the US. Check if these employees have written any book and, if so, get the title.

5. Has any employee of The New Yorker ever been nominated for a prize or award?

In [2]:
# start your workflow here

In [3]:
queryString = """
SELECT COUNT(?p)
WHERE { 
?p wdt:P17 wd:Q30 .
?p wdt:P31/wdt:P279* wd:Q41298  . 
} 
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '5422')]


1

### 1. Identify the BGP for obtaining the US magazines that write about politics

In [4]:
queryString = """
SELECT DISTINCT ?name ?subject ?country
WHERE { 
    ?p wdt:P17 wd:Q30 .
    ?p wdt:P31/wdt:P279* wd:Q41298  . 
    ?p sc:name ?name .
    ?p wdt:P921 ?m .
    ?m sc:name ?subject .
    FILTER REGEX(?subject, ".*polit.*")
    ?p wdt:P17 ?c.
    ?c sc:name ?country .
} ORDER BY ?subject
  
"""

print("Results")
run_query(queryString)

Results
[('name', 'Legislative Studies Quarterly'), ('subject', 'political science'), ('country', 'United States of America')]
[('name', 'The Good Society'), ('subject', 'political science'), ('country', 'United States of America')]
[('name', 'Journal of World-Systems Research'), ('subject', 'political science'), ('country', 'United States of America')]
[('name', 'Research & Politics'), ('subject', 'political science'), ('country', 'United States of America')]
[('name', 'Bulletin of the Atomic Scientists'), ('subject', 'political science'), ('country', 'United States of America')]
[('name', 'Journal of Political Ecology'), ('subject', 'political science'), ('country', 'United States of America')]
[('name', 'Dissent'), ('subject', 'political science'), ('country', 'United States of America')]
[('name', 'Telos'), ('subject', 'political science'), ('country', 'United States of America')]
[('name', 'The Port Folio'), ('subject', 'politics'), ('country', 'United States of America')]
[('name

15

### 2. Compare the number of social media followers and get the top five magazines along with main properties such as the place of publication and subfields of work

In [5]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE {
    ?cs wdt:P31/wdt:P279* wd:Q41298;
        ?p ?o .
    # this returns the labels
    ?p sc:name ?pname .
    FILTER(REGEX(?pname, ".*follow.*") || REGEX(?pname, ".*work.*") || REGEX(?pname, ".*publ.*"))
}

"""

print("Results")
run_query(queryString)

Results
The operation failed timed out
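
One thing to try after a timeout like this is to narrow the set of subjects before enumerating their properties, as the employee-property discovery query in cell In [8] does with the `wdt:P17 wd:Q30` constraint. The sketch below builds such a query as a string; `property_discovery_query` is a hypothetical helper, the case-insensitive `"i"` flag is an addition, and there is no guarantee that the narrowed query finishes within the 300-second timeout.

```python
def property_discovery_query(keywords, subject_patterns):
    """Build a query listing properties of ?cs whose name matches a keyword.

    `keywords` are plain substrings matched case-insensitively;
    `subject_patterns` are triple patterns that restrict ?cs.
    """
    filters = " || ".join(f'REGEX(?pname, "{kw}", "i")' for kw in keywords)
    return f"""
SELECT DISTINCT ?p ?pname
WHERE {{
    {subject_patterns}
    ?cs ?p ?o .
    ?p sc:name ?pname .
    FILTER({filters})
}}
"""


# Restrict the subjects to U.S. magazines before looking at their properties.
us_magazines = """?cs wdt:P17 wd:Q30 .
    ?cs wdt:P31/wdt:P279* wd:Q41298 ."""

query = property_discovery_query(["follow", "work", "publ"], us_magazines)
print(query)
```

If the narrowed query still times out, running it once per keyword is another option, at the cost of more cells in the notebook.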


`wdt:P8687`: social media followers

`wdt:P101`: field of work

`wdt:P291`: place of publication

In [6]:
queryString = """
SELECT DISTINCT ?name ?place ?work ?mediafollowers
WHERE { 
    ?p wdt:P31/wdt:P279* wd:Q41298.
    ?p wdt:P1476 ?name.
    ?p wdt:P291 ?pl .
    ?pl sc:name ?place .
    ?p wdt:P101 ?w .
    ?w sc:name ?work .
    ?p wdt:P8687 ?mediafollowers .
       
} ORDER BY DESC(?mediafollowers)
LIMIT 5

"""

print("Results")
run_query(queryString)

Results
[('name', 'Time'), ('place', 'New York City'), ('work', 'news'), ('mediafollowers', '17806837')]
[('name', 'Apparel Arts'), ('place', 'New York City'), ('work', 'lifestyle'), ('mediafollowers', '6280000')]
[('name', 'GQ'), ('place', 'New York City'), ('work', 'lifestyle'), ('mediafollowers', '6280000')]
[('name', "Gentlemen's Quarterly"), ('place', 'New York City'), ('work', 'lifestyle'), ('mediafollowers', '6280000')]
[('name', 'Scientific American'), ('place', 'New York City'), ('work', 'history'), ('mediafollowers', '3964337')]


5
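
Three of the five rows above share the same place, field and follower count and differ only in title, which suggests they may be alternative titles of a single magazine. The query does not return the entity IRI, so this cannot be confirmed from the output alone. Under that assumption, a minimal sketch for collapsing such rows (with `collapse_titles` as a hypothetical helper and the rows copied from the output above) could look like this:

```python
# Rows copied from the output above: (name, place, work, followers).
rows = [
    ("Time", "New York City", "news", 17806837),
    ("Apparel Arts", "New York City", "lifestyle", 6280000),
    ("GQ", "New York City", "lifestyle", 6280000),
    ("Gentlemen's Quarterly", "New York City", "lifestyle", 6280000),
    ("Scientific American", "New York City", "history", 3964337),
]


def collapse_titles(rows):
    """Group rows that agree on place, work and followers.

    Assumption: such rows describe the same magazine under several titles.
    """
    grouped = {}
    for name, place, work, followers in rows:
        grouped.setdefault((place, work, followers), []).append(name)
    # Dicts keep insertion order, so the ranking by followers is preserved.
    return [(names, *key) for key, names in grouped.items()]


collapsed = collapse_titles(rows)
for names, place, work, followers in collapsed:
    print(" / ".join(names), place, work, followers)
```

With the duplicates collapsed, only three distinct magazines remain, so the `LIMIT 5` in the query does not actually yield five magazines. Selecting `?p` in the query and grouping on it would be a more reliable way to detect this.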

### 3. Compare the number of followers with the top three US magazines writing mainly about sports

In [7]:
queryString = """
SELECT DISTINCT ?name ?subject ?country ?mediafollowers
WHERE { 
    ?p wdt:P17 wd:Q30 .
    ?p wdt:P31/wdt:P279* wd:Q41298  . 
    ?p sc:name ?name .
    ?p wdt:P921 ?m .
    ?m sc:name ?subject .
    FILTER REGEX(?subject, ".*sport.*")
    ?p wdt:P17 ?c.
    ?c sc:name ?country .
    ?p wdt:P8687 ?mediafollowers .
} ORDER BY DESC(?mediafollowers)
LIMIT 3
  
"""

print("Results")
run_query(queryString)

Results
[('name', 'Track & Field News'), ('subject', 'sports magazine'), ('country', 'United States of America'), ('mediafollowers', '144716')]
[('name', 'The Ring'), ('subject', 'sports magazine'), ('country', 'United States of America'), ('mediafollowers', '104600')]
[('name', 'National Speed Sport News'), ('subject', 'motorsport'), ('country', 'United States of America'), ('mediafollowers', '24714')]


3
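
The comparison asked for in this step can be made explicit with a little arithmetic on the two result sets. The sketch below copies the follower counts from the outputs of cells In [6] and In [7]. Note that the query in In [6] is not restricted to U.S. or political magazines, so the figures on the first side are those of the top magazines overall, and `GQ` stands in for the three rows with identical counts.

```python
# Follower counts copied from the outputs of In [6] and In [7].
top_overall = {
    "Time": 17806837,
    "GQ": 6280000,
    "Scientific American": 3964337,
}
top_sports = {
    "Track & Field News": 144716,
    "The Ring": 104600,
    "National Speed Sport News": 24714,
}

# Magazine with the most followers in the first result set.
leader_name, leader_followers = max(top_overall.items(), key=lambda kv: kv[1])

# How many times more followers the leader has than each sports magazine.
for name, followers in top_sports.items():
    ratio = leader_followers / followers
    print(f"{leader_name} has about {ratio:.0f}x the followers of {name}")

# Combined audience of the three sports magazines.
total_sports = sum(top_sports.values())
print("Sports total:", total_sports)
```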

### 4. Get the names of notable employees working for The New Yorker and any other political magazine published in the US. Check if these employees have written any book and, if so, get the title.

In [8]:
queryString = """
SELECT DISTINCT ?p ?pname
WHERE {
    ?cs wdt:P17 wd:Q30 .
    ?cs wdt:P31/wdt:P279* wd:Q41298;
        ?p ?o .
    # this returns the labels
    ?p sc:name ?pname .
    FILTER(REGEX(?pname, ".*empl.*"))
}

"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1128'), ('pname', 'employees')]
[('p', 'http://www.wikidata.org/prop/direct/P4195'), ('pname', 'category for employees of the organization')]
[('p', 'http://www.wikidata.org/prop/direct/P108'), ('pname', 'employer')]


3

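`wdt:P108` (employer) is the property used in the following queries to link a person to the magazine they work for. `wdt:P1128` (employees) also matched the filter but is not used below.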

In [9]:
queryString = """
SELECT DISTINCT ?predicatename ?predicate
WHERE {
    ?p wdt:P17 wd:Q30 .
    ?p wdt:P31/wdt:P279* wd:Q41298 .
    ?p wdt:P921 ?m .
    ?m sc:name ?subjectname . FILTER REGEX(?subjectname, "politic", 'i').
    ?e wdt:P108 ?p .
    ?e ?predicate ?o .
    ?predicate sc:name ?predicatename
    FILTER (REGEX(?predicatename, ".*book.*") || REGEX(?predicatename, ".*work.*"))
}
"""

print("Results")
run_query(queryString)

Results
[('predicatename', 'field of work'), ('predicate', 'http://www.wikidata.org/prop/direct/P101')]
[('predicatename', 'work period (start)'), ('predicate', 'http://www.wikidata.org/prop/direct/P2031')]
[('predicatename', 'has works in the collection'), ('predicate', 'http://www.wikidata.org/prop/direct/P6379')]
[('predicatename', 'Facebook ID'), ('predicate', 'http://www.wikidata.org/prop/direct/P2013')]
[('predicatename', 'CiNii author ID (books)'), ('predicate', 'http://www.wikidata.org/prop/direct/P271')]
[('predicatename', 'Prabook ID'), ('predicate', 'http://www.wikidata.org/prop/direct/P3368')]
[('predicatename', 'notable work'), ('predicate', 'http://www.wikidata.org/prop/direct/P800')]
[('predicatename', 'contributed to creative work'), ('predicate', 'http://www.wikidata.org/prop/direct/P3919')]


8

In [10]:
queryString = """
SELECT DISTINCT ?employer ?book
WHERE {
    ?p wdt:P17 wd:Q30 .
    ?p wdt:P31/wdt:P279* wd:Q41298 .
    ?e wdt:P108 ?p.
    ?p wdt:P921 ?m .
    ?m sc:name ?subject . FILTER REGEX(?subject, ".*polit.*")
    ?e sc:name ?employer .
    ?e wdt:P800 ?b .
    ?b sc:name ?book
}
"""

print("Results")
run_query(queryString)

Results
[('employer', 'Hannah Arendt'), ('book', 'The Human Condition')]
[('employer', 'Hannah Arendt'), ('book', 'The Origins of Totalitarianism')]
[('employer', 'Hannah Arendt'), ('book', 'Eichmann in Jerusalem')]
[('employer', 'Hannah Arendt'), ('book', 'Rahel Varnhagen')]
[('employer', 'Hannah Arendt'), ('book', 'On Revolution')]
[('employer', 'Saul Steinberg'), ('book', 'View of the World from 9th Avenue')]
[('employer', 'Jean-Jacques Sempé'), ('book', 'Le petit Nicolas')]
[('employer', 'Edmund Wilson'), ('book', 'Patriotic Gore')]
[('employer', 'Edmund Wilson'), ('book', 'To the Finland Station')]
[('employer', 'Jonathan Schell'), ('book', 'The Fate of the Earth')]
[('employer', 'Adam Gopnik'), ('book', 'A Thousand Small Sanities')]
[('employer', 'Dexter Filkins'), ('book', 'The Forever War')]
[('employer', 'Lewis Mumford'), ('book', 'The City in History')]
[('employer', 'Lewis Mumford'), ('book', 'The Myth of the Machine')]
[('employer', 'Lewis Mumford'), ('book', 'Technics and 

24
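
The last item of the workload, whether any employee of The New Yorker has been nominated for a prize or award, follows the same two-step pattern: first discover a suitable property through its name, then use it in an ASK query. The sketch below only builds the two query strings. Both function names are hypothetical, no nomination property has been discovered in this notebook, and `wdt:PLACEHOLDER` must be replaced by whatever IRI the discovery query returns.

```python
def nomination_property_discovery(employer_iri="wd:Q217305"):
    """Query listing properties of the employer's staff that mention
    nominations or awards in their name."""
    return f"""
SELECT DISTINCT ?predicate ?predicatename
WHERE {{
    ?e wdt:P108 {employer_iri} .
    ?e ?predicate ?o .
    ?predicate sc:name ?predicatename .
    FILTER(REGEX(?predicatename, "nominat", "i") || REGEX(?predicatename, "award", "i"))
}}
"""


def nomination_ask(property_iri, employer_iri="wd:Q217305"):
    """ASK whether any employee of the employer has a value for `property_iri`."""
    return f"""
ASK {{
    ?e wdt:P108 {employer_iri} .
    ?e {property_iri} ?award .
}}
"""


discovery = nomination_property_discovery()
# "wdt:PLACEHOLDER" is NOT a real property; substitute the discovered IRI.
ask = nomination_ask("wdt:PLACEHOLDER")
print(discovery)
print(ask)
```

The first string is meant for `run_query` and the second for `run_ask_query`, whose result carries the answer under the `boolean` key.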