# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [WikiData](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained previously.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. You do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP that returns the human-readable name of a property or a class in Wikidata.


In [82]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-7cc32015f6-## 
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings']) == 0:
            print("Empty")
            return 0

        # print every result row as a list of (variable, value) pairs
        for bindings in json_results['results']['bindings']:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(json_results['results']['bindings'])

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
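
The `run_query` helper above prints each binding but returns only the row count. As a minimal sketch (not part of the original setup), the function below flattens the SPARQL JSON result shape into plain dictionaries so that rows could be reused in later cells. The name `bindings_to_rows` and the hand-built `sample` response are illustrative assumptions; the sample only mirrors the structure that `results.convert()` returns for the JSON format.

```python
def bindings_to_rows(json_results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    rows = []
    for binding in json_results["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


# Hand-built sample in the SPARQL JSON results shape (illustrative, not fetched).
sample = {
    "head": {"vars": ["p", "pName"]},
    "results": {
        "bindings": [
            {
                "p": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P106"},
                "pName": {"type": "literal", "value": "occupation"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```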

# Book Workflow Series ("Author comparison explorative search") 

Consider the following exploratory scenario:


>  Investigate Italian and French book authors in terms of awards, books published and copyright types



## Useful URIs for the current workflow
The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P279`    | subclass      | predicate |
| `wdt:P106`    | occupation    | predicate | 
| `wdt:P17`     | country       | predicate | 
| `wdt:P27`     | citizenship   | predicate | 
| `wd:Q36180`   | writer        | node |
| `wd:Q38`      | Italy         | node |
| `wd:Q172579`  | Kingdom of Italy        | node |
| `wd:Q142`     | France        | node |
| `wd:Q37922`   | Nobel Prize in Literature | node |
| `wd:Q213678`  | Vatican Library        | node |


Also consider that

```
?p wdt:P27 wd:Q142
```

is the BGP to retrieve all **French citizens**.
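
The citizenship BGP can be combined with the occupation predicate from the table to narrow citizens down to writers. The sketch below builds that pattern as a string; the helper name `writer_bgp` is hypothetical, and only IRIs listed in the table above are used.

```python
def writer_bgp(country_qid, subject="?writer"):
    """Return a BGP matching writers (wd:Q36180) who are citizens of the given country."""
    return (
        f"{subject} wdt:P27 wd:{country_qid} ;\n"
        f"{' ' * len(subject)} wdt:P106 wd:Q36180 ."
    )


french_writers = writer_bgp("Q142")   # France
italian_writers = writer_bgp("Q38")   # Italy
print(french_writers)
```

The returned pattern can be pasted into the `WHERE` clause of any query in this workflow.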


The workload should


1. Identify the BGP for obtaining the Italian and French writers who published a book in the last 50 years

2. Compare the number of books written by Italian and French writers

3. Count how many books written by Italian authors are now released with a "public domain" copyright form

4. How many Nobel Prizes in Literature were won by authors from Italy and from the Kingdom of Italy?

5. Are there books by Nobel Prize in Literature winners that are not present in the Vatican Library? (If so, which author has the most books not in the Vatican Library?)

In [3]:
# start your workflow here

In [4]:
# count all French citizens
queryString = """
SELECT COUNT(?p)
WHERE { 
?p wdt:P27 wd:Q142 .
} 
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '273456')]


1

In [5]:
# count all Italian citizens
queryString = """
SELECT COUNT(?p)
WHERE { 
?p wdt:P27 wd:Q38 .
} 
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '103803')]


1

In [90]:
# get occupation->P106
queryString = """
SELECT DISTINCT ?p ?pName
WHERE { 

    ?person wdt:P27 wd:Q38 ;
        ?p  ?o. FILTER(!isLiteral(?o))
     
      #?o <http://schema.org/name> ?oName .
      ?p <http://schema.org/name> ?pName .
} 
order by ?p
LIMIT 5

"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P10'), ('pName', 'video')]
[('p', 'http://www.wikidata.org/prop/direct/P1000'), ('pName', 'record held')]
[('p', 'http://www.wikidata.org/prop/direct/P101'), ('pName', 'field of work')]
[('p', 'http://www.wikidata.org/prop/direct/P1019'), ('pName', 'web feed URL')]
[('p', 'http://www.wikidata.org/prop/direct/P102'), ('pName', 'member of political party')]


5

In [7]:
# get writer->Q36180
queryString = """
SELECT DISTINCT ?o ?oName
WHERE { 

    ?person wdt:P27 wd:Q38 ;
                wdt:P106  ?o.
     
      ?o <http://schema.org/name> ?oName .
      Filter regex(?oName,"writer",'i')
      #?p <http://schema.org/name> ?pName .
} 
LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('o', 'http://www.wikidata.org/entity/Q15949613'), ('oName', 'short story writer')]
[('o', 'http://www.wikidata.org/entity/Q15980158'), ('oName', 'non-fiction writer')]
[('o', 'http://www.wikidata.org/entity/Q18844224'), ('oName', 'science fiction writer')]
[('o', 'http://www.wikidata.org/entity/Q28389'), ('oName', 'screenwriter')]
[('o', 'http://www.wikidata.org/entity/Q36180'), ('oName', 'writer')]


5

In [8]:
# get the names of Italian writers
queryString = """
SELECT DISTINCT ?writer ?writerName
WHERE { 

    ?writer wdt:P27 wd:Q38 ;
            wdt:P106  wd:Q36180.
     
      ?writer <http://schema.org/name> ?writerName .
      #?p <http://schema.org/name> ?pName .
} 
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('writer', 'http://www.wikidata.org/entity/Q15918504'), ('writerName', 'Giovanni Pascutto')]
[('writer', 'http://www.wikidata.org/entity/Q17285057'), ('writerName', 'Philip Kwok')]
[('writer', 'http://www.wikidata.org/entity/Q17285111'), ('writerName', 'Francesca Manzoni')]
[('writer', 'http://www.wikidata.org/entity/Q17285372'), ('writerName', 'Benito Recchilongo')]
[('writer', 'http://www.wikidata.org/entity/Q17285428'), ('writerName', 'Eugenia Romanelli')]
[('writer', 'http://www.wikidata.org/entity/Q17285433'), ('writerName', 'Rosa Rosà')]
[('writer', 'http://www.wikidata.org/entity/Q17319638'), ('writerName', 'Francesco Gungui')]
[('writer', 'http://www.wikidata.org/entity/Q17327845'), ('writerName', 'Giuseppe Ciabattini')]
[('writer', 'http://www.wikidata.org/entity/Q2963243'), ('writerName', 'Chiara Valentini')]
[('writer', 'http://www.wikidata.org/entity/Q17453230'), ('writerName', 'Maria Costa')]


10

In [9]:
# get the incoming properties of Italian writers to find author -> P50
queryString = """
SELECT DISTINCT ?p ?pName
WHERE { 

    ?writer wdt:P27 wd:Q38 ;
            wdt:P106  wd:Q36180.
            
    ?o ?p ?writer.
     
    ?p <http://schema.org/name> ?pName .
} 
LIMIT 5

"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1037'), ('pName', 'director / manager')]
[('p', 'http://www.wikidata.org/prop/direct/P1038'), ('pName', 'relative')]
[('p', 'http://www.wikidata.org/prop/direct/P1040'), ('pName', 'film editor')]
[('p', 'http://www.wikidata.org/prop/direct/P1066'), ('pName', 'student of')]
[('p', 'http://www.wikidata.org/prop/direct/P110'), ('pName', 'illustrator')]


5

In [10]:
# get the works authored by Italian writers
queryString = """
SELECT DISTINCT ?work ?workName
WHERE { 

    ?writer wdt:P27 wd:Q38 ;
            wdt:P106  wd:Q36180.
            
    ?work wdt:P50 ?writer.
     
    ?work <http://schema.org/name> ?workName .
} 
LIMIT 5

"""

print("Results")
run_query(queryString)

Results
[('work', 'http://www.wikidata.org/entity/Q17635217'), ('workName', "L'amore è un dio")]
[('work', 'http://www.wikidata.org/entity/Q17652232'), ('workName', 'Bisexuality in the ancient Word')]
[('work', 'http://www.wikidata.org/entity/Q18745523'), ('workName', 'Indian Summer')]
[('work', 'http://www.wikidata.org/entity/Q2915264'), ('workName', 'Click')]
[('work', 'http://www.wikidata.org/entity/Q533391'), ('workName', 'The Garden of the Finzi-Continis')]


5

In [11]:
# list the types (instance of) of these works to find book -> Q571
queryString = """
SELECT DISTINCT ?p ?pName
WHERE { 

      ?writer wdt:P27 wd:Q38 ;
              wdt:P106  wd:Q36180.
            
      ?work wdt:P50 ?writer;
              wdt:P31 ?p .
     
     ?p <http://schema.org/name> ?pName .

} 
LIMIT 5

"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q25379'), ('pName', 'play')]
[('p', 'http://www.wikidata.org/entity/Q732577'), ('pName', 'publication')]
[('p', 'http://www.wikidata.org/entity/Q105420'), ('pName', 'anthology')]
[('p', 'http://www.wikidata.org/entity/Q14406742'), ('pName', 'comic book series')]
[('p', 'http://www.wikidata.org/entity/Q17537576'), ('pName', 'creative work')]


5
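
The five rows above do not include a class named "book", although later cells rely on `wd:Q571`. A sketch of how the same discovery query could be narrowed with a name filter, as was done for "writer", is given below. The helper name `type_lookup_query` and its `limit` default are assumptions, and the query has not been run against the endpoint here.

```python
def type_lookup_query(keyword, limit=20):
    """Build a query listing classes of works by Italian writers whose name contains keyword."""
    return f"""
SELECT DISTINCT ?type ?typeName
WHERE {{
    ?writer wdt:P27 wd:Q38 ;
            wdt:P106 wd:Q36180 .
    ?work wdt:P50 ?writer ;
          wdt:P31 ?type .
    ?type <http://schema.org/name> ?typeName .
    FILTER regex(?typeName, "{keyword}", "i")
}}
LIMIT {limit}
"""


queryString = type_lookup_query("book")
print(queryString)
# run_query(queryString)  # run_query is the helper defined in the setup cell
```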

In [134]:
# find the date of published books (publication date->P577)   (copyright status->P6216) (award received ->P166)
queryString = """ 
SELECT DISTINCT ?p  ?pName
WHERE { 

      ?writer wdt:P27 wd:Q38 ;
              wdt:P106  wd:Q36180.
            
      ?work wdt:P50 ?writer;
              wdt:P31 wd:Q571 ;
              ?p ?o.
     
          ?p <http://schema.org/name> ?pName .
} 
order by ?pName
LIMIT 500

"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P5749'), ('pName', 'Amazon Standard Identification Number')]
[('p', 'http://www.wikidata.org/prop/direct/P268'), ('pName', 'Bibliothèque nationale de France ID')]
[('p', 'http://www.wikidata.org/prop/direct/P373'), ('pName', 'Commons category')]
[('p', 'http://www.wikidata.org/prop/direct/P356'), ('pName', 'DOI')]
[('p', 'http://www.wikidata.org/prop/direct/P1036'), ('pName', 'Dewey Decimal Classification')]
[('p', 'http://www.wikidata.org/prop/direct/P8359'), ('pName', 'Dewey Decimal Classification (works and editions)')]
[('p', 'http://www.wikidata.org/prop/direct/P3962'), ('pName', 'Global Trade Item Number')]
[('p', 'http://www.wikidata.org/prop/direct/P2969'), ('pName', 'Goodreads version/edition ID')]
[('p', 'http://www.wikidata.org/prop/direct/P8383'), ('pName', 'Goodreads work ID')]
[('p', 'http://www.wikidata.org/prop/direct/P675'), ('pName', 'Google Books ID')]
[('p', 'http://www.wikidata.org/prop/direct/P2671'), ('pName', '

61

In [79]:
# find books published since 1970 by French and Italian writers
queryString = """
SELECT DISTINCT ?period ?workName
WHERE { 

      ?writer wdt:P27 ?country ;
              wdt:P106  wd:Q36180.
            
      ?work wdt:P50 ?writer;
              wdt:P31 wd:Q571 ;
              wdt:P577 ?period.
     
     
    ?work <http://schema.org/name> ?workName .
    Filter regex (?country,"Q38|Q142",'i')
    Filter regex(str(?period),"197|198|199|200|201|202",'i')
} 
order by ?period
LIMIT 5

"""

print("Results")
run_query(queryString)

Results
[('period', '1970-01-01T00:00:00Z'), ('workName', 'Les sanglots longs')]
[('period', '1972-01-01T00:00:00Z'), ('workName', 'The Necrophiliac')]
[('period', '1972-01-01T00:00:00Z'), ('workName', 'Les Boulevards de Ceinture')]
[('period', '1973-01-01T00:00:00Z'), ('workName', "L'Educazione delle donne")]
[('period', '1973-01-01T00:00:00Z'), ('workName', 'Oh, Serafinaǃ (romanzo)')]


5
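
The two `regex` filters in the query above match substrings, so they can accept more than intended: `"Q38"` also matches any IRI ending in `Q380`, and `"200"` also matches the year 1200. The sketch below first illustrates this with Python's `re` module (used only as a stand-in for SPARQL `regex()`, which behaves the same way for these simple alternations), then shows a stricter variant using `VALUES` and `YEAR()`. The 1970 cutoff and the assumption that `?period` is an `xsd:dateTime` are taken from the output above; the stricter query has not been run here.

```python
import re

year_pattern = "197|198|199|200|201|202"
country_pattern = "Q38|Q142"

# Both searches succeed although neither value should qualify.
false_year = re.search(year_pattern, "1200-01-01T00:00:00Z") is not None
false_country = re.search(country_pattern, "http://www.wikidata.org/entity/Q380") is not None
print(false_year, false_country)

# Stricter sketch: exact country IRIs and a numeric year comparison.
queryString = """
SELECT DISTINCT ?period ?workName
WHERE {
    VALUES ?country { wd:Q38 wd:Q142 }
    ?writer wdt:P27 ?country ;
            wdt:P106 wd:Q36180 .
    ?work wdt:P50 ?writer ;
          wdt:P31 wd:Q571 ;
          wdt:P577 ?period .
    ?work <http://schema.org/name> ?workName .
    FILTER (YEAR(?period) >= 1970)
}
ORDER BY ?period
LIMIT 5
"""
# run_query(queryString)  # run_query is the helper defined in the setup cell
```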

In [13]:
2. Compare the number of books written by Italian and French writers

In [93]:
# Number of French books
queryString = """
SELECT DISTINCT (COUNT(?frenchbook) AS ?howmanyfrenchbook) 
WHERE { 

       ?frenchwriter wdt:P27 wd:Q142 ;
                     wdt:P106  wd:Q36180.
               
       ?frenchbook wdt:P50 ?frenchwriter;
                   wdt:P31 wd:Q571 .

} 


"""

print("Results")
run_query(queryString)

Results
[('howmanyfrenchbook', '906')]


1

In [107]:
# Number of Italian books
queryString = """
SELECT DISTINCT (COUNT(?italybook) AS ?howmanyitalybook)
WHERE { 


             
        ?italywriter wdt:P27 wd:Q38 ;
                     wdt:P106  wd:Q36180.
               
        ?italybook wdt:P50 ?italywriter;
                   wdt:P31 wd:Q571 .
               
     
} 


"""

print("Results")
run_query(queryString)

Results
[('howmanyitalybook', '75')]


1
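
The two counts can also be obtained side by side in one grouped query. The sketch below is an alternative formulation, not the query that produced the numbers above: it uses `COUNT(DISTINCT ?book)`, so a book with several matching authors is counted once and the totals may differ from 906 and 75. The ratio at the end is computed from the recorded outputs only.

```python
queryString = """
SELECT ?country (COUNT(DISTINCT ?book) AS ?books)
WHERE {
    VALUES ?country { wd:Q38 wd:Q142 }
    ?writer wdt:P27 ?country ;
            wdt:P106 wd:Q36180 .
    ?book wdt:P50 ?writer ;
          wdt:P31 wd:Q571 .
}
GROUP BY ?country
"""
# run_query(queryString)  # run_query is the helper defined in the setup cell

# Ratio from the two counts printed above (906 French, 75 Italian).
french_books, italian_books = 906, 75
ratio = french_books / italian_books
print(f"French/Italian book ratio: {ratio:.2f}")
```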

In [None]:
3. Count how many books written by Italian authors are now released with a "public domain" copyright form

In [128]:
# get public domain -> Q19652
queryString = """
SELECT ?copyright  ?copyrightName 
WHERE { 


             
    ?italywriter wdt:P27 wd:Q38 ;
                 wdt:P106  wd:Q36180.
               
     ?italybook wdt:P50 ?italywriter;
                wdt:P31 wd:Q571 ;
                wdt:P6216 ?copyright.
    #filter regex()
                   
    ?copyright <http://schema.org/name> ?copyrightName .
} 

LIMIT 10

"""

print("Results")
run_query(queryString)

Results
[('copyright', 'http://www.wikidata.org/entity/Q19652'), ('copyrightName', 'public domain')]
[('copyright', 'http://www.wikidata.org/entity/Q50423863'), ('copyrightName', 'copyrighted')]
[('copyright', 'http://www.wikidata.org/entity/Q50423863'), ('copyrightName', 'copyrighted')]
[('copyright', 'http://www.wikidata.org/entity/Q50423863'), ('copyrightName', 'copyrighted')]
[('copyright', 'http://www.wikidata.org/entity/Q50423863'), ('copyrightName', 'copyrighted')]
[('copyright', 'http://www.wikidata.org/entity/Q50423863'), ('copyrightName', 'copyrighted')]
[('copyright', 'http://www.wikidata.org/entity/Q50423863'), ('copyrightName', 'copyrighted')]


7

In [133]:
# Number of books written by Italian authors that are now released as "public domain"
queryString = """
SELECT Count(?italybook)
WHERE { 


             
    ?italywriter wdt:P27 wd:Q38 ;
                 wdt:P106  wd:Q36180.
               
     ?italybook wdt:P50 ?italywriter;
                wdt:P31 wd:Q571 ;
                wdt:P6216 wd:Q19652.
  
} 


"""

print("Results")
run_query(queryString)

Results
[('callret-0', '1')]


1

In [None]:
4. How many Nobel Prizes in Literature were won by authors from Italy and from the Kingdom of Italy?

In [145]:
# find the Nobel Prize in Literature -> Q37922
queryString = """
SELECT distinct ?o ?oName
WHERE { 


             
    ?italywriter wdt:P27 wd:Q38 ;
                 wdt:P106  wd:Q36180;
                 wdt:P166 ?o.
                 
   ?o <http://schema.org/name> ?oName .

  Filter Regex(?oName,"Nobel",'i')
  
} 
limit 5

"""

print("Results")
run_query(queryString)

Results
[('o', 'http://www.wikidata.org/entity/Q37922'), ('oName', 'Nobel Prize in Literature')]


1

In [149]:
# how many authors won the Nobel Prize in Literature
queryString = """
SELECT distinct (Count( ?italywriter) AS ?howmanyauthors)
WHERE { 


             
    ?italywriter wdt:P27 ?country ;
                 wdt:P106  wd:Q36180;
                 wdt:P166 wd:Q37922.
                 
   ?country <http://schema.org/name> ?countryName .

  Filter Regex(?countryName,"Italy",'i')
  
} 
limit 5

"""

print("Results")
run_query(queryString)

Results
[('howmanyauthors', '6')]


1
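
The name filter on "Italy" in the query above accepts any country whose name contains that string, and a laureate holding both citizenships is counted twice. A sketch that restricts the match to the two given nodes (`wd:Q38` and `wd:Q172579`) and reports one distinct count per country follows. The variable names are illustrative, and the query has not been run here, so its counts may differ from the 6 reported above.

```python
# The two country nodes given in the table of useful IRIs.
countries = {"Q38": "Italy", "Q172579": "Kingdom of Italy"}
values_clause = " ".join(f"wd:{qid}" for qid in countries)

queryString = f"""
SELECT ?country ?countryName (COUNT(DISTINCT ?winner) AS ?winners)
WHERE {{
    VALUES ?country {{ {values_clause} }}
    ?winner wdt:P27 ?country ;
            wdt:P106 wd:Q36180 ;
            wdt:P166 wd:Q37922 .
    ?country <http://schema.org/name> ?countryName .
}}
GROUP BY ?country ?countryName
"""
print(queryString)
# run_query(queryString)  # run_query is the helper defined in the setup cell
```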

In [None]:
5. Are there books by Nobel Prize in Literature winners that are not present in the Vatican Library? (If so, which author has the most books not in the Vatican Library?)

In [None]:
# find the Vatican Library -> Q213678
queryString = """
SELECT distinct ?o ?oName
WHERE { 


             
    ?s ?p ?o.
                 
   ?o <http://schema.org/name> ?oName .

  filter regex(?oName,"Vatican Library",'i')
  
} 
limit 50

"""

print("Results")
run_query(queryString)

Results
[('o', 'http://www.wikidata.org/entity/Q254032'), ('oName', 'Olga Tokarczuk')]
[('o', 'http://www.wikidata.org/entity/Q34743'), ('oName', 'Rudyard Kipling')]
[('o', 'http://www.wikidata.org/entity/Q5878'), ('oName', 'Gabriel García Márquez')]
[('o', 'http://www.wikidata.org/entity/Q1860'), ('oName', 'English')]
[('o', 'http://www.wikidata.org/entity/Q571'), ('oName', 'book')]


5

In [274]:
# find the Vatican Library ID -> P8034
queryString = """
SELECT distinct ?p ?pName 
WHERE { 


             
    ?winner wdt:P106  wd:Q36180;
             wdt:P166 wd:Q37922;
                ?p ?o.
                
            
                 
   ?p <http://schema.org/name> ?pName .

   filter regex(?pName,"vatican",'i')  
  
} 
limit 500

"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1017'), ('pName', 'Vatican Library ID (former scheme)')]
[('p', 'http://www.wikidata.org/prop/direct/P8034'), ('pName', 'Vatican Library VcBA ID')]


2

In [280]:
# Nobel Award winners which are not present in the Vatican Library
queryString = """
SELECT distinct ?winner 
WHERE { 


             
    ?winner wdt:P106  wd:Q36180;
             wdt:P166 wd:Q37922;
             wdt:P8034 ?o.
                 
    Filter not exists {?winner wdt:P8034 ?o}.
                
        
                 
   ?winner <http://schema.org/name> ?winnerName .
    #filter regex(?oName,"Vatican Library",'i')
  
} 
limit 500

"""

print("Results")
run_query(queryString)

Results
Empty


0
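
The query above is empty by construction: it requires `?winner wdt:P8034 ?o` and, in the same pattern, filters out every `?winner` for which that triple exists. A possible reformulation moves the check to the books and drops the positive triple. This is only a sketch under a strong assumption, namely that a book without a Vatican Library VcBA ID (`wdt:P8034`) counts as "not present"; the property was discovered on laureates, so it may not be used on books at all in this dataset. The query has not been run here.

```python
queryString = """
SELECT ?winner ?winnerName (COUNT(DISTINCT ?book) AS ?missingBooks)
WHERE {
    ?winner wdt:P106 wd:Q36180 ;
            wdt:P166 wd:Q37922 .
    ?book wdt:P50 ?winner ;
          wdt:P31 wd:Q571 .
    # Assumption: a book without a Vatican Library VcBA ID (P8034) is "not present".
    FILTER NOT EXISTS { ?book wdt:P8034 ?id }
    ?winner <http://schema.org/name> ?winnerName .
}
GROUP BY ?winner ?winnerName
ORDER BY DESC(?missingBooks)
LIMIT 1
"""
print(queryString)
# run_query(queryString)  # run_query is the helper defined in the setup cell
```

With `LIMIT 1` and the descending order, the single row returned would be the author with the most books lacking the identifier.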