# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [WikiData](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained previously.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that is present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```

is the BGP that returns the human-readable name of a property or a class in Wikidata.


In [113]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-831cee9075-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        bindings_list = json_results['results']['bindings']
        if len(bindings_list) == 0:
            print("Empty")
            return 0

        # Print each solution as a list of (variable, value) pairs
        for bindings in bindings_list:
            print([(var, value['value']) for var, value in bindings.items()])

        # Return the number of solutions
        return len(bindings_list)

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        return sparql.query().convert()

    except Exception as e:
        print("The operation failed", e)
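
`run_query` prints each solution but returns only the row count. When a later step needs to reuse the values, it can help to turn the JSON results into plain Python rows. The following is a minimal sketch, assuming the standard SPARQL JSON results layout (`results` then `bindings`). The helper name `bindings_to_rows` and the sample payload are hypothetical; the sample values are copied from the step 1 output of this notebook and are not fetched from the endpoint.

```python
def bindings_to_rows(json_results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    rows = []
    for binding in json_results["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hand-written sample in the SPARQL JSON results layout (an assumption,
# not a real response from the endpoint).
sample = {
    "head": {"vars": ["pname", "subjectname"]},
    "results": {
        "bindings": [
            {
                "pname": {"type": "literal", "value": "The New Yorker"},
                "subjectname": {"type": "literal", "value": "politics"},
            },
            {
                "pname": {"type": "literal", "value": "Politico"},
                "subjectname": {"type": "literal", "value": "politics"},
            },
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```

Returning rows instead of printing them keeps the exploratory queries unchanged while making their results available to ordinary Python code.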

# Book Workflow Series ("Political Magazines explorative search") 

Consider the following exploratory scenario:


>  Investigate the U.S. magazines that write about politics and their media presence



## Useful URIs for the current workflow
The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P921`    | main subject  | predicate | 
| `wdt:P17`     | country       | predicate | 
| `wdt:P279`    | subclass      | predicate |
| `wd:Q41298`   | Magazine      | node |
| `wd:Q7163`    | politics      | node |
| `wd:Q30`      | U.S.          | node |
| `wd:Q217305`  | The New Yorker  | node |







Also consider

```
?p wdt:P17 wd:Q30 .
?p wdt:P31/wdt:P279* wd:Q41298  . 
```

is the BGP to retrieve all **types of publications in the U.S.**


The workload should


1. Identify the BGP for obtaining the US magazines that write about politics

2. Compare the number of social media followers and get the top five magazines along with main properties such as the place of publication and subfields of work

3. Compare the number of followers with the top three US magazines writing mainly about sports

4. Get the names of notable employees working for The New Yorker and any other political magazine published in the US. Check if these employees have written any books and, if so, get the titles.

5. Has any employee of The New Yorker ever been nominated for a prize or award?

In [114]:
# start your workflow here

1) Here we retrieve the U.S. publications whose main subject contains the word 'politic', meaning that the retrieved publications write about politics.

In [115]:
queryString = """
SELECT DISTINCT ?pname ?subjectname
WHERE { 

    ?p wdt:P17 wd:Q30 .
    ?p wdt:P31/wdt:P279* wd:Q41298  . 
    ?p <http://schema.org/name> ?pname .
    ?p wdt:P921 ?m .
    ?m <http://schema.org/name> ?subjectname .
    FILTER REGEX(?subjectname, "politic", 'i')
} 


"""

print("Results")
run_query(queryString)

Results
[('pname', 'Legislative Studies Quarterly'), ('subjectname', 'political science')]
[('pname', 'Journal of World-Systems Research'), ('subjectname', 'political science')]
[('pname', 'The Good Society'), ('subjectname', 'political science')]
[('pname', 'Research & Politics'), ('subjectname', 'political science')]
[('pname', 'Bulletin of the Atomic Scientists'), ('subjectname', 'political science')]
[('pname', 'Journal of Political Ecology'), ('subjectname', 'political science')]
[('pname', 'The Port Folio'), ('subjectname', 'politics')]
[('pname', 'The New Yorker'), ('subjectname', 'politics')]
[('pname', 'Politico'), ('subjectname', 'politics')]
[('pname', 'Commentary'), ('subjectname', 'politics')]
[('pname', 'Dissent'), ('subjectname', 'political science')]
[('pname', 'Telos'), ('subjectname', 'political science')]
[('pname', 'The Politic'), ('subjectname', 'politics of the United States')]
[('pname', 'Jacobin'), ('subjectname', 'politics')]
[('pname', 'The Liberator'), ('subj

15

2) First of all, let's search for the predicates of magazines that have literal values.

In [116]:
queryString = """
SELECT DISTINCT ?predicatename
WHERE { 
    ?p wdt:P31 wd:Q41298 .
    ?p ?predicate ?o . FILTER(isLiteral(?o))
    ?predicate <http://schema.org/name> ?predicatename .
} 

LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('predicatename', 'memoriademadrid publication ID')]
[('predicatename', 'Czech National Bibliography book ID')]
[('predicatename', 'Periodical ID in a database of the Ministry of Culture of the Czech Republic')]
[('predicatename', 'Norwegian Register journal ID')]
[('predicatename', 'ERA Journal ID')]
[('predicatename', 'CODEN')]
[('predicatename', 'ISO 4 abbreviation')]
[('predicatename', 'Danish Bibliometric Research Indicator level')]
[('predicatename', 'Danish Bibliometric Research Indicator (BFI) SNO/CNO')]
[('predicatename', 'JUFO ID')]
[('predicatename', 'OpenCitations bibliographic resource ID')]
[('predicatename', 'DOI')]
[('predicatename', 'UniProt journal ID')]
[('predicatename', 'Scilit journal ID')]
[('predicatename', 'Crossref journal ID')]
[('predicatename', 'Courrier international source ID')]
[('predicatename', 'Japanese magazine code')]
[('predicatename', 'Directory of Open Access Journals ID')]
[('predicatename', 'HathiTrust ID')]
[('predicatename', 'bibcode

50

Let's filter the predicate names to find the right one.

In [118]:
queryString = """
SELECT DISTINCT ?predicatename ?predicate
WHERE { 
    ?p wdt:P31 wd:Q41298 .
    ?p ?predicate ?o . FILTER(isLiteral(?o))
    ?predicate <http://schema.org/name> ?predicatename .
    FILTER REGEX(?predicatename, 'follower|social', 'i')
} 

LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('predicatename', 'social media followers'), ('predicate', 'http://www.wikidata.org/prop/direct/P8687')]


1

So we found a predicate named 'social media followers' (P8687). Let's use it to find the five magazines with the most followers. Note that this query ranks all magazines, not only the U.S. ones that write about politics.

In [119]:
queryString = """
SELECT DISTINCT ?magazines ?followers
WHERE { 
    ?p wdt:P31 wd:Q41298 .
    ?p <http://schema.org/name> ?magazines .
    ?p wdt:P8687 ?followers
} 
ORDER BY DESC (?followers)
LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('magazines', 'The Economist'), ('followers', '25421540')]
[('magazines', 'Time'), ('followers', '17806837')]
[('magazines', 'Vogue'), ('followers', '9770000')]
[('magazines', 'Veja'), ('followers', '8617912')]
[('magazines', 'Muy Interesante'), ('followers', '8404687')]


5
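
In the SPARQL JSON results format, values arrive as strings, and `run_query` prints them as such. Any client-side re-ranking therefore has to convert the follower counts to integers first. A small sketch using the five rows printed above; the variable names are illustrative only.

```python
rows = [
    ("The Economist", "25421540"),
    ("Time", "17806837"),
    ("Vogue", "9770000"),
    ("Veja", "8617912"),
    ("Muy Interesante", "8404687"),
]

# Sorting the raw strings compares character by character,
# so "9770000" ranks ahead of "25421540".
by_string = sorted(rows, key=lambda r: r[1], reverse=True)

# Converting to int restores the numeric ranking shown in the query output.
by_number = sorted(rows, key=lambda r: int(r[1]), reverse=True)

print([name for name, _ in by_string])
print([name for name, _ in by_number])
```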

Here we look at one specific magazine (`wd:Q180089`) and search for properties that are useful for our objective, such as its country, place of publication and field of work.

In [120]:
queryString = """
SELECT DISTINCT ?predicatename ?predicate
WHERE { 
    wd:Q180089 ?predicate ?o .
    ?predicate <http://schema.org/name> ?predicatename .
    FILTER REGEX(?predicatename, 'work|place|country', 'i')
} 


"""

print("Results")
run_query(queryString)

Results
[('predicatename', 'country'), ('predicate', 'http://www.wikidata.org/prop/direct/P17')]
[('predicatename', 'place of publication'), ('predicate', 'http://www.wikidata.org/prop/direct/P291')]
[('predicatename', 'language of work or name'), ('predicate', 'http://www.wikidata.org/prop/direct/P407')]
[('predicatename', 'country of origin'), ('predicate', 'http://www.wikidata.org/prop/direct/P495')]


4

Let's add this information to the previous query and see what we can find.

In [121]:
queryString = """
SELECT DISTINCT ?magazines ?followers ?fieldofworkname ?countryname ?placeofpublicationsname
WHERE { 
    ?p wdt:P31 wd:Q41298 ;
      <http://schema.org/name> ?magazines ;
      wdt:P8687 ?followers .
     OPTIONAL{
     ?p wdt:P291 ?placeofpublications .
     ?placeofpublications <http://schema.org/name> ?placeofpublicationsname 
     }
     OPTIONAL{
        ?p wdt:P101 ?fieldofwork .
        ?fieldofwork <http://schema.org/name> ?fieldofworkname 
     }
     OPTIONAL{
        ?p wdt:P17 ?country .
        ?country <http://schema.org/name> ?countryname
     }
    
} 
ORDER BY DESC (?followers)
LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('magazines', 'The Economist'), ('followers', '25421540'), ('countryname', 'United Kingdom'), ('placeofpublicationsname', 'London')]
[('magazines', 'Time'), ('followers', '17806837'), ('fieldofworkname', 'news'), ('countryname', 'United States of America'), ('placeofpublicationsname', 'New York City')]
[('magazines', 'Vogue'), ('followers', '9770000'), ('fieldofworkname', 'fashion')]
[('magazines', 'Veja'), ('followers', '8617912'), ('countryname', 'Brazil')]
[('magazines', 'Muy Interesante'), ('followers', '8404687')]


5
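
Because the three lookups are wrapped in `OPTIONAL`, a row simply lacks the variables that could not be bound, which is why the printed rows above have different lengths. A minimal sketch of how such rows could be aligned into a fixed set of columns; the rows are copied from the output above, while the `"n/a"` placeholder and the variable names are assumptions made for illustration.

```python
rows = [
    {"magazines": "The Economist", "followers": "25421540",
     "countryname": "United Kingdom", "placeofpublicationsname": "London"},
    {"magazines": "Time", "followers": "17806837", "fieldofworkname": "news",
     "countryname": "United States of America",
     "placeofpublicationsname": "New York City"},
    {"magazines": "Vogue", "followers": "9770000", "fieldofworkname": "fashion"},
    {"magazines": "Veja", "followers": "8617912", "countryname": "Brazil"},
    {"magazines": "Muy Interesante", "followers": "8404687"},
]

columns = ["magazines", "followers", "fieldofworkname",
           "countryname", "placeofpublicationsname"]

# dict.get supplies a placeholder for variables left unbound by OPTIONAL.
table = [[row.get(col, "n/a") for col in columns] for row in rows]

for line in table:
    print(" | ".join(line))
```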

3) Here we search for the main subjects that are related to sports.

In [124]:
queryString = """
SELECT DISTINCT ?subjectname ?m
WHERE { 

    ?p wdt:P17 wd:Q30 .
    ?p wdt:P31/wdt:P279* wd:Q41298  . 
    ?p <http://schema.org/name> ?pname .
    ?p wdt:P921 ?m .
    ?m <http://schema.org/name> ?subjectname .
    FILTER REGEX(?subjectname, "sport", 'i')
} 
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('subjectname', 'sports medicine'), ('m', 'http://www.wikidata.org/entity/Q840545')]
[('subjectname', 'transportation engineering'), ('m', 'http://www.wikidata.org/entity/Q775325')]
[('subjectname', 'sports magazine'), ('m', 'http://www.wikidata.org/entity/Q568765')]
[('subjectname', 'rail transport in the United States'), ('m', 'http://www.wikidata.org/entity/Q3537832')]
[('subjectname', 'sport'), ('m', 'http://www.wikidata.org/entity/Q349')]
[('subjectname', 'motorsport'), ('m', 'http://www.wikidata.org/entity/Q5367')]
[('subjectname', 'sport psychology'), ('m', 'http://www.wikidata.org/entity/Q632190')]


7

We found that the subjects we are interested in are 'sports medicine' (Q840545), 'sports magazine' (Q568765), 'sport' (Q349), 'motorsport' (Q5367) and 'sport psychology' (Q632190). The two transport-related subjects are discarded, since they match only because 'transport' contains 'sport'. Now we can find the top 3 magazines that talk about sport.
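
The substring behaviour of the `sport` pattern can be reproduced offline with Python's `re` module on the seven subject names returned above. This is only an illustration: the negative lookbehind used in the second pattern is Python syntax and is not assumed to be supported by the endpoint's regex dialect, which is one reason to select the five IRIs explicitly in the next query.

```python
import re

subjects = [
    "sports medicine",
    "transportation engineering",
    "sports magazine",
    "rail transport in the United States",
    "sport",
    "motorsport",
    "sport psychology",
]

# Plain substring match, as in the query: "transport" also contains "sport".
loose = [s for s in subjects if re.search("sport", s, re.IGNORECASE)]

# Python-only refinement: skip "sport" when it is preceded by "tran".
strict = [s for s in subjects if re.search(r"(?<!tran)sport", s, re.IGNORECASE)]

print(len(loose), len(strict))
print(strict)
```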

In [143]:
queryString = """
SELECT DISTINCT ?p ?pname ?followers ?subjectname
WHERE {
    ?p wdt:P17 wd:Q30 .
    ?p wdt:P31/wdt:P279* wd:Q41298  .
    ?p wdt:P8687 ?followers .
    ?p <http://schema.org/name> ?pname .
    ?p wdt:P921 ?subject .
    ?subject <http://schema.org/name> ?subjectname .
    FILTER(?subject IN (wd:Q840545, wd:Q568765, wd:Q349, wd:Q5367, wd:Q632190))
}
ORDER BY DESC (?followers)
LIMIT 3

"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q3108681'), ('pname', 'Track & Field News'), ('followers', '144716'), ('subjectname', 'sports magazine')]
[('p', 'http://www.wikidata.org/entity/Q1140774'), ('pname', 'The Ring'), ('followers', '104600'), ('subjectname', 'sports magazine')]
[('p', 'http://www.wikidata.org/entity/Q22023367'), ('pname', 'National Speed Sport News'), ('followers', '24714'), ('subjectname', 'motorsport')]


3
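
To make the comparison concrete, the follower counts printed in this notebook can be put side by side. The sketch below uses the top five from step 2 and the top three sports magazines above. Keep in mind that the step 2 ranking covers all magazines rather than only U.S. political ones, so this is a rough comparison under that assumption; the variable names are illustrative.

```python
top_overall = {
    "The Economist": 25421540,
    "Time": 17806837,
    "Vogue": 9770000,
    "Veja": 8617912,
    "Muy Interesante": 8404687,
}
top_sports = {
    "Track & Field News": 144716,
    "The Ring": 104600,
    "National Speed Sport News": 24714,
}

best_overall = max(top_overall.values())
best_sports = max(top_sports.values())

# How many times more followers the leading magazine has
# than the leading U.S. sports magazine.
ratio = best_overall / best_sports
print(f"{ratio:.1f}x")

# True when every sports title sits below the smallest count in the top five.
sports_all_below = best_sports < min(top_overall.values())
print(sports_all_below)
```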

4) We can easily retrieve the magazines published in the U.S., but we have to find the property that relates an employee to the magazine.

In [164]:
queryString = """
SELECT DISTINCT ?predicate ?p
WHERE {
    ?o ?p wd:Q217305.
    ?p <http://schema.org/name> ?predicate
    FILTER REGEX(?predicate, 'employe', 'i')
}

"""

print("Results")
run_query(queryString)

Results
[('predicate', 'employer'), ('p', 'http://www.wikidata.org/prop/direct/P108')]


1

Now we search for all the employees (in the query, the variable `?employer` holds the employee's name).

In [176]:
queryString = """
SELECT DISTINCT ?employer ?e ?pname
WHERE {
    ?p wdt:P17 wd:Q30 .
    ?p wdt:P31/wdt:P279* wd:Q41298 .
    ?e wdt:P108 ?p.
    ?p wdt:P921 ?m .
    ?m <http://schema.org/name> ?subjectname .
    FILTER REGEX(?subjectname, "politic", 'i').
    ?e <http://schema.org/name> ?employer .
    ?p <http://schema.org/name> ?pname
}
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('employer', 'Hyman H. Goldsmith'), ('e', 'http://www.wikidata.org/entity/Q108598571'), ('pname', 'Bulletin of the Atomic Scientists')]
[('employer', 'Hannah Arendt'), ('e', 'http://www.wikidata.org/entity/Q60025'), ('pname', 'The New Yorker')]
[('employer', 'Saul Steinberg'), ('e', 'http://www.wikidata.org/entity/Q432856'), ('pname', 'The New Yorker')]
[('employer', 'Jean-Jacques Sempé'), ('e', 'http://www.wikidata.org/entity/Q354371'), ('pname', 'The New Yorker')]
[('employer', 'Alastair Reid'), ('e', 'http://www.wikidata.org/entity/Q4708767'), ('pname', 'The New Yorker')]
[('employer', 'Robert Mankoff'), ('e', 'http://www.wikidata.org/entity/Q7347308'), ('pname', 'The New Yorker')]
[('employer', 'Françoise Mouly'), ('e', 'http://www.wikidata.org/entity/Q941550'), ('pname', 'The New Yorker')]
[('employer', 'Edmund Wilson'), ('e', 'http://www.wikidata.org/entity/Q704931'), ('pname', 'The New Yorker')]
[('employer', 'Rea Irvin'), ('e', 'http://www.wikidata.org/entity/Q7300143'

20

Here we have to find the predicate that connects an employee to the books they wrote.

In [193]:
queryString = """
SELECT DISTINCT ?predicatename ?predicate
WHERE {
    ?p wdt:P17 wd:Q30 .
    ?p wdt:P31/wdt:P279* wd:Q41298 .
    ?p wdt:P921 ?m .
    ?m <http://schema.org/name> ?subjectname .
    FILTER REGEX(?subjectname, "politic", 'i').
    ?e wdt:P108 ?p .
    ?e ?predicate ?o .
    ?predicate <http://schema.org/name> ?predicatename
    FILTER REGEX(?predicatename, 'notable|work|book', 'i')
}
LIMIT 1000
"""

print("Results")
run_query(queryString)

Results
[('predicatename', 'field of work'), ('predicate', 'http://www.wikidata.org/prop/direct/P101')]
[('predicatename', 'Facebook ID'), ('predicate', 'http://www.wikidata.org/prop/direct/P2013')]
[('predicatename', 'work period (start)'), ('predicate', 'http://www.wikidata.org/prop/direct/P2031')]
[('predicatename', 'BookBrainz author ID'), ('predicate', 'http://www.wikidata.org/prop/direct/P2607')]
[('predicatename', 'CiNii author ID (books)'), ('predicate', 'http://www.wikidata.org/prop/direct/P271')]
[('predicatename', 'Prabook ID'), ('predicate', 'http://www.wikidata.org/prop/direct/P3368')]
[('predicatename', 'Online Books Page author ID'), ('predicate', 'http://www.wikidata.org/prop/direct/P4629')]
[('predicatename', 'Los Angeles Review of Books author ID'), ('predicate', 'http://www.wikidata.org/prop/direct/P5705')]
[('predicatename', 'has works in the collection'), ('predicate', 'http://www.wikidata.org/prop/direct/P6379')]
[('predicatename', 'notable work'), ('predicate', '

13

After some tests, we find that the right predicate is 'notable work' (P800). A quick look at some of its values:

In [191]:
queryString = """
SELECT DISTINCT ?oname
WHERE {
    ?e wdt:P800 ?o .
    ?o <http://schema.org/name> ?oname
}
LIMIT 20

"""

print("Results")
run_query(queryString)

Results
[('oname', 'Kubuntu')]
[('oname', 'Ironfist Chinmi')]
[('oname', 'The Hunger Games')]
[('oname', 'Estadio Las Gaunas')]
[('oname', 'Torre Caja Badajoz')]
[('oname', 'Edificio Pirámide, Madrid')]
[('oname', 'Edificio Galaxia, Madrid')]
[('oname', 'Edificio de viviendas en Paseo de la Castellana 121-123, Madrid')]
[('oname', 'Dispensario antituberculoso Victoria Eugenia')]
[('oname', 'Torres de Colón')]
[('oname', 'Motel El Hidalgo')]
[('oname', 'Break Shot')]
[('oname', 'Gal Factory')]
[('oname', 'Mobile Fighter G Gundam')]
[('oname', '15 Park Avenue')]
[('oname', 'Amanda Krueger')]
[('oname', 'Convent of Santo Domingo, La Guardia de Jaén')]
[('oname', 'Casa Consistorial de Valladolid')]
[('oname', 'Estació del Nord, Valencia')]
[('oname', 'Charlie and Lola')]


20

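Here we have the employees of the U.S. magazines that write about politics, together with the titles of their notable works. Note that P800 is not limited to books, as values such as 'Kubuntu' in the previous output show.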

In [192]:
queryString = """
SELECT DISTINCT ?employer ?e ?pname ?oname
WHERE {
    ?p wdt:P17 wd:Q30 .
    ?p wdt:P31/wdt:P279* wd:Q41298 .
    ?e wdt:P108 ?p.
    ?p wdt:P921 ?m .
    ?m <http://schema.org/name> ?subjectname .
    FILTER REGEX(?subjectname, "politic", 'i').
    ?e <http://schema.org/name> ?employer .
    ?p <http://schema.org/name> ?pname .
    ?e wdt:P800 ?o .
    ?o <http://schema.org/name> ?oname
}
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('employer', 'Hannah Arendt'), ('e', 'http://www.wikidata.org/entity/Q60025'), ('pname', 'The New Yorker'), ('oname', 'The Human Condition')]
[('employer', 'Hannah Arendt'), ('e', 'http://www.wikidata.org/entity/Q60025'), ('pname', 'The New Yorker'), ('oname', 'The Origins of Totalitarianism')]
[('employer', 'Hannah Arendt'), ('e', 'http://www.wikidata.org/entity/Q60025'), ('pname', 'The New Yorker'), ('oname', 'Eichmann in Jerusalem')]
[('employer', 'Hannah Arendt'), ('e', 'http://www.wikidata.org/entity/Q60025'), ('pname', 'The New Yorker'), ('oname', 'Rahel Varnhagen')]
[('employer', 'Hannah Arendt'), ('e', 'http://www.wikidata.org/entity/Q60025'), ('pname', 'The New Yorker'), ('oname', 'On Revolution')]
[('employer', 'Saul Steinberg'), ('e', 'http://www.wikidata.org/entity/Q432856'), ('pname', 'The New Yorker'), ('oname', 'View of the World from 9th Avenue')]
[('employer', 'Jean-Jacques Sempé'), ('e', 'http://www.wikidata.org/entity/Q354371'), ('pname', 'The New Yorker'), 

20
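
The query returns one row per (employee, notable work) pair, so an employee with several works appears several times. A minimal sketch of grouping the titles by employee, using only the complete rows visible above; the variable names are illustrative.

```python
from collections import defaultdict

# (employee, notable work) pairs copied from the complete rows printed above.
pairs = [
    ("Hannah Arendt", "The Human Condition"),
    ("Hannah Arendt", "The Origins of Totalitarianism"),
    ("Hannah Arendt", "Eichmann in Jerusalem"),
    ("Hannah Arendt", "Rahel Varnhagen"),
    ("Hannah Arendt", "On Revolution"),
    ("Saul Steinberg", "View of the World from 9th Avenue"),
]

# Collect all titles under the employee they belong to.
works_by_employee = defaultdict(list)
for employee, work in pairs:
    works_by_employee[employee].append(work)

for employee, works in works_by_employee.items():
    print(f"{employee}: {len(works)} notable work(s)")
```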

5) For the last point we need to find the relevant predicates. Let's start from the employees of 'The New Yorker'.

In [201]:
queryString = """
SELECT DISTINCT ?predicate ?p ?ename 
WHERE {
    ?e wdt:P108 wd:Q217305 .
    ?e <http://schema.org/name> ?ename .
    ?e ?p ?o .
    ?p <http://schema.org/name> ?predicate
    FILTER REGEX(?predicate, 'prize|award|nominated', 'i')
}
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('predicate', 'nominated for'), ('p', 'http://www.wikidata.org/prop/direct/P1411'), ('ename', 'Hannah Arendt')]
[('predicate', 'nominated for'), ('p', 'http://www.wikidata.org/prop/direct/P1411'), ('ename', 'Edmund Wilson')]
[('predicate', 'nominated for'), ('p', 'http://www.wikidata.org/prop/direct/P1411'), ('ename', 'Lewis Mumford')]
[('predicate', 'nominated for'), ('p', 'http://www.wikidata.org/prop/direct/P1411'), ('ename', 'Lawrence Wright')]
[('predicate', 'award received'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('ename', 'Hannah Arendt')]
[('predicate', 'award received'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('ename', 'Saul Steinberg')]
[('predicate', 'award received'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('ename', 'Jean-Jacques Sempé')]
[('predicate', 'award received'), ('p', 'http://www.wikidata.org/prop/direct/P166'), ('ename', 'Alastair Reid')]
[('predicate', 'award received'), ('p', 'http://www.wikidata.org/prop/direct/P16

41

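Finally, we found the employees of The New Yorker who received an award, were nominated for one, or won a Nobel Prize. The answer to the last question is therefore yes: the 'nominated for' (P1411) rows above include, among others, Hannah Arendt and Edmund Wilson.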
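
Since question 5 is a yes/no question, it could also be phrased as an ASK query and sent through `run_ask_query`. The sketch below builds such a query from the predicates found above (P108, P1411) and shows how the boolean could be read from the response. The helper `ask_answer` is hypothetical, and the response dict is simulated under the assumption that the endpoint follows the SPARQL JSON format for ASK results; no request is made here.

```python
# ASK pattern: is there at least one employee of The New Yorker
# with a "nominated for" (P1411) statement?
ask_query = """
ASK {
    ?e wdt:P108 wd:Q217305 ;
       wdt:P1411 ?nomination .
}
"""


def ask_answer(json_results):
    """Return the boolean carried by a SPARQL ASK response in JSON format."""
    return bool(json_results["boolean"])


# Simulated response (an assumption); in the notebook it would come from
# run_ask_query(ask_query).
simulated = {"head": {}, "boolean": True}
print(ask_answer(simulated))
```

An ASK query avoids transferring all 41 rows when only the existence of a match matters.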