# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained from the previous ones.

An exploratory workload usually starts with a vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that is present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
    
    

In [89]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-42c3202f8a-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings']) == 0:
            print("Empty")
            return 0

        for bindings in json_results['results']['bindings']:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(json_results['results']['bindings'])

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-03.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
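
`run_query` prints each binding and returns only the number of rows, so the values cannot be reused in later cells. The following is a minimal sketch of a helper that flattens a SPARQL JSON response into plain dictionaries, assuming the standard SPARQL 1.1 JSON results layout. The name `bindings_to_rows` and the sample data are hypothetical and not part of the original notebook.

```python
def bindings_to_rows(json_results):
    """Flatten a SPARQL JSON SELECT response into a list of {variable: value} dicts."""
    # ASK responses carry a "boolean" key and no "results", so they yield an empty list.
    bindings = json_results.get("results", {}).get("bindings", [])
    return [{var: cell["value"] for var, cell in row.items()} for row in bindings]


# Hand-written sample in the SPARQL JSON results layout (not fetched from the endpoint).
sample = {
    "head": {"vars": ["p", "pName"]},
    "results": {
        "bindings": [
            {
                "p": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P127"},
                "pName": {"type": "literal", "value": "owned by"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```

In the notebook, such a helper could be applied to the value of `results.convert()` inside `run_query` if the rows need to be kept for further processing.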


# Companies Workflow Series ("Trademarks across the world") 

Consider the following exploratory information need:

> You are investigating different kinds of trademarks

## Useful URIs for the current workflow


The following are given:

| IRI          | Description | Role      |
| ------------ | ----------- | --------- |
| `wdt:P1647`  | subproperty | predicate |
| `wdt:P31`    | instance of | predicate |
| `wdt:P279`   | subclass    | predicate |
| `wdt:P17`    | country     | predicate |
| `wdt:P27`    | citizenship | predicate |
| `wd:Q167270` | trademark   | node      |
| `wd:Q14091`  | iMac        | node      |
| `wd:Q312`    | Apple Inc.  | node      |



Also consider

```
?p wdt:P31/wdt:P279* wd:Q167270  . 
```

is the BGP to retrieve all **entities that are trademarks**
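
The same BGP can also be bound to a single entity to test whether that entity is a trademark. The sketch below only builds the ASK query string for `wd:Q14091` (iMac, from the table of given IRIs). The helper name `build_trademark_ask` is hypothetical, and the query has not been run here, so no result is claimed.

```python
# Property path from the BGP above: instance of, followed by any number of subclass steps.
TRADEMARK_PATH = "wdt:P31/wdt:P279* wd:Q167270"


def build_trademark_ask(entity):
    """Return an ASK query testing whether `entity` is an instance of (a subclass of) trademark."""
    return "ASK { " + entity + " " + TRADEMARK_PATH + " . }"


askQuery = build_trademark_ask("wd:Q14091")
print(askQuery)
# In the notebook, this string would be passed to run_ask_query(askQuery).
```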

## Workload Goals

1. Identify the BGP for the company or person related to a given trademark

2. Identify the BGP to retrieve types or categories for a given trademark

3. What companies have the largest number of trademarks? 

4. What are the types of trademarks, and how many trademarks exist for each type?

5. Analyze the number of trademarks across types, companies, and countries
 
   5.1 How many U.S. companies hold trademarks? In which sectors?
   
   5.2 How many people hold or are connected to trademarks? In which role?
   
   5.3 Which sector has the highest number of trademarks?


In [3]:
# start your workflow here 

In [4]:
queryString = """
SELECT COUNT(?p)
WHERE { 


?p wdt:P31/wdt:P279* wd:Q167270  . 

} 
LIMIT 3
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '44700')]


1

In [20]:
# list the properties of Apple Inc.; found: owned by -> P127, founded by -> P112, industry -> P452
queryString = """
SELECT DISTINCT ?p ?pName
WHERE { 
wd:Q312 wdt:P31/wdt:P279* wd:Q167270;
       ?p ?o.FILTER(!isLiteral(?o))
        
?p <http://schema.org/name> ?pName.


} 
LIMIT 500
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1056'), ('pName', 'product or material produced')]
[('p', 'http://www.wikidata.org/prop/direct/P112'), ('pName', 'founded by')]
[('p', 'http://www.wikidata.org/prop/direct/P1151'), ('pName', "topic's main Wikimedia portal")]
[('p', 'http://www.wikidata.org/prop/direct/P127'), ('pName', 'owned by')]
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('pName', 'described by source')]
[('p', 'http://www.wikidata.org/prop/direct/P138'), ('pName', 'named after')]
[('p', 'http://www.wikidata.org/prop/direct/P1454'), ('pName', 'legal form')]
[('p', 'http://www.wikidata.org/prop/direct/P154'), ('pName', 'logo image')]
[('p', 'http://www.wikidata.org/prop/direct/P1546'), ('pName', 'motto')]
[('p', 'http://www.wikidata.org/prop/direct/P159'), ('pName', 'headquarters location')]
[('p', 'http://www.wikidata.org/prop/direct/P166'), ('pName', 'award received')]
[('p', 'http://www.wikidata.org/prop/direct/P169'), ('pName', 'chief executive office

41

In [14]:
1. Identify the BGP for the company or person related to a given trademark

In [13]:
#find owner of Apple Inc. 
queryString = """
SELECT DISTINCT ?p ?pName
WHERE { 
wd:Q312 wdt:P127 ?p.

            
?p <http://schema.org/name> ?pName.


} 
LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q849363'), ('pName', 'The Vanguard Group')]
[('p', 'http://www.wikidata.org/entity/Q217583'), ('pName', 'Berkshire Hathaway')]
[('p', 'http://www.wikidata.org/entity/Q1196231'), ('pName', 'Government Pension Fund Global')]
[('p', 'http://www.wikidata.org/entity/Q219635'), ('pName', 'BlackRock')]


4

In [None]:
2. Identify the BGP to retrieve types or categories for a given trademark

In [25]:
# find the type (industry) of Apple Inc.
queryString = """
SELECT DISTINCT ?p ?pName
WHERE { 
wd:Q312 wdt:P31/wdt:P279* wd:Q167270;
       wdt:P452 ?p.
   
?p <http://schema.org/name> ?pName.


} 
LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q581105'), ('pName', 'consumer electronics')]
[('p', 'http://www.wikidata.org/entity/Q11650'), ('pName', 'electronics')]
[('p', 'http://www.wikidata.org/entity/Q56598901'), ('pName', 'mobile phone industry')]
[('p', 'http://www.wikidata.org/entity/Q11661'), ('pName', 'information technology')]
[('p', 'http://www.wikidata.org/entity/Q880371'), ('pName', 'software industry')]


5

In [None]:
3. What companies have the largest number of trademarks? 

In [39]:
queryString = """
SELECT DISTINCT (count(?trademarkName) as ?numberoftrademark) ?companyName 
WHERE { 
?trademark wdt:P31/wdt:P279* wd:Q167270;
           wdt:P127 ?company.
   
?company <http://schema.org/name> ?companyName.
?trademark <http://schema.org/name> ?trademarkName.

} 
order by desc (?numberoftrademark)
LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('numberoftrademark', '108'), ('companyName', 'Nestlé')]
[('numberoftrademark', '60'), ('companyName', 'Universal Music Group')]
[('numberoftrademark', '57'), ('companyName', 'Covestro')]
[('numberoftrademark', '44'), ('companyName', 'Unilever')]
[('numberoftrademark', '40'), ('companyName', 'Mondelēz International')]


5
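
The query above aggregates without a `GROUP BY` clause, which the endpoint used here accepts but which standard SPARQL 1.1 does not allow when a non-aggregated variable is projected. A possible standard-compliant variant is sketched below. It has not been run against the endpoint, and because it counts distinct trademark IRIs instead of trademark names, its counts may differ from those listed above.

```python
queryString = """
SELECT ?company ?companyName (COUNT(DISTINCT ?trademark) AS ?numberOfTrademarks)
WHERE {
  ?trademark wdt:P31/wdt:P279* wd:Q167270 ;
             wdt:P127 ?company .
  ?company <http://schema.org/name> ?companyName .
}
GROUP BY ?company ?companyName
ORDER BY DESC(?numberOfTrademarks)
LIMIT 5
"""

print(queryString)
# In the notebook: run_query(queryString)
```

Grouping on `?company` as well as `?companyName` keeps two different companies that share a name from being merged into one row.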

In [None]:
4. What are the types of trademarks, and how many trademarks exist for each type?

In [43]:
# get all types of trademarks
queryString = """
SELECT DISTINCT ?type ?typeName
WHERE { 
?trademark wdt:P31/wdt:P279* wd:Q167270;
       wdt:P452 ?type.
   
?type <http://schema.org/name> ?typeName.


} 
LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('type', 'http://www.wikidata.org/entity/Q20679540'), ('typeName', 'cinema chain')]
[('type', 'http://www.wikidata.org/entity/Q126793'), ('typeName', 'retail')]
[('type', 'http://www.wikidata.org/entity/Q19595701'), ('typeName', 'entertainment industry')]
[('type', 'http://www.wikidata.org/entity/Q5354754'), ('typeName', 'talent agency')]
[('type', 'http://www.wikidata.org/entity/Q55638'), ('typeName', 'tertiary sector of the economy')]


5

In [46]:
# get how many trademarks exist for each type
queryString = """
SELECT DISTINCT  ?typeName (count(?trademark ) AS ?howmanytrademarks)
WHERE { 
?trademark wdt:P31/wdt:P279* wd:Q167270;
       wdt:P452 ?type.
   
?type <http://schema.org/name> ?typeName.


} 
order by desc (?howmanytrademarks)
LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('typeName', 'retail'), ('howmanytrademarks', '1652')]
[('typeName', 'automotive industry'), ('howmanytrademarks', '509')]
[('typeName', 'music industry'), ('howmanytrademarks', '413')]
[('typeName', 'restaurant'), ('howmanytrademarks', '372')]
[('typeName', 'gastronomy'), ('howmanytrademarks', '272')]


5

In [None]:
5.1 How many U.S. companies hold trademarks? In which sectors?

In [61]:
# get the trademarks located in the U.S., with their sectors
queryString = """
SELECT DISTINCT ?trademarkName ?typeName ?countryName
WHERE { 

?trademark wdt:P31/wdt:P279* wd:Q167270;
           wdt:P17 ?country;
           wdt:P452 ?type.
   
   filter regex(?countryName,"united states of am",'i')
   
?type <http://schema.org/name> ?typeName.
?country <http://schema.org/name> ?countryName.
?trademark <http://schema.org/name> ?trademarkName.
} 

LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('trademarkName', 'Foot Locker'), ('typeName', 'retail'), ('countryName', 'United States of America')]
[('trademarkName', 'Lord & Taylor'), ('typeName', 'retail'), ('countryName', 'United States of America')]
[('trademarkName', 'Abraham & Straus'), ('typeName', 'retail'), ('countryName', 'United States of America')]
[('trademarkName', 'Elder-Beerman'), ('typeName', 'retail'), ('countryName', 'United States of America')]
[('trademarkName', "Macy's"), ('typeName', 'retail'), ('countryName', 'United States of America')]


5

In [64]:
# get the number of U.S. trademarks in each sector
queryString = """
SELECT DISTINCT (count(?trademarkName) as ?howmanytrademarks)  ?typeName ?countryName
WHERE { 

?trademark wdt:P31/wdt:P279* wd:Q167270;
           wdt:P17 ?country;
           wdt:P452 ?type.
   
   filter regex(?countryName,"united states of am",'i')
   
?type <http://schema.org/name> ?typeName.
?country <http://schema.org/name> ?countryName.
?trademark <http://schema.org/name> ?trademarkName.
} 

LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('howmanytrademarks', '10'), ('typeName', 'hotel'), ('countryName', 'United States of America')]
[('howmanytrademarks', '12'), ('typeName', 'film industry'), ('countryName', 'United States of America')]
[('howmanytrademarks', '40'), ('typeName', 'entertainment'), ('countryName', 'United States of America')]
[('howmanytrademarks', '2'), ('typeName', 'retail chain'), ('countryName', 'United States of America')]
[('howmanytrademarks', '2'), ('typeName', 'textile and clothing industry'), ('countryName', 'United States of America')]


5
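
The query above counts trademarks per sector, whereas the question asks about companies. A sketch of a variant that counts distinct owning companies per sector follows. It reuses only properties already found in this notebook (`wdt:P127`, `wdt:P17`, `wdt:P452`) and assumes that the country and the industry are recorded on the owning company, which has not been verified against the data. The query has not been run here.

```python
queryString = """
SELECT ?typeName (COUNT(DISTINCT ?company) AS ?howManyCompanies)
WHERE {
  ?trademark wdt:P31/wdt:P279* wd:Q167270 ;
             wdt:P127 ?company .
  # Assumption: country and industry are stated on the owning company.
  ?company wdt:P17 ?country ;
           wdt:P452 ?type .
  ?country <http://schema.org/name> ?countryName .
  ?type <http://schema.org/name> ?typeName .
  FILTER regex(?countryName, "^united states of america$", "i")
}
GROUP BY ?typeName
ORDER BY DESC(?howManyCompanies)
LIMIT 5
"""

print(queryString)
# In the notebook: run_query(queryString)
```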

In [None]:
 5.2 How many people hold or are connected to trademarks? In which role?

In [111]:
# find all properties that connect trademarks to people
queryString = """
SELECT DISTINCT  ?p ?pName
WHERE { 

?person wdt:P31 ?human.
?trademark wdt:P31/wdt:P279* wd:Q167270;
           ?p ?person.
   
filter regex(?humanName,"human$",'i')
   
?person <http://schema.org/name> ?personName.
?human <http://schema.org/name> ?humanName.
?p <http://schema.org/name> ?pName.
} 

LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1037'), ('pName', 'director / manager')]
[('p', 'http://www.wikidata.org/prop/direct/P112'), ('pName', 'founded by')]
[('p', 'http://www.wikidata.org/prop/direct/P127'), ('pName', 'owned by')]
[('p', 'http://www.wikidata.org/prop/direct/P169'), ('pName', 'chief executive officer')]
[('p', 'http://www.wikidata.org/prop/direct/P1889'), ('pName', 'different from')]


5

In [114]:
# count the people who hold or are connected to trademarks, per connection
queryString = """
SELECT DISTINCT ?connectionName (count(?personName) AS ?numberofpeople)
WHERE { 

?person wdt:P31 ?human.
?trademark wdt:P31/wdt:P279* wd:Q167270;
           ?connection ?person.
   
filter regex(?humanName,"human$",'i')
   
?person <http://schema.org/name> ?personName.
?human <http://schema.org/name> ?humanName.
?connection <http://schema.org/name> ?connectionName.
} 
order by desc (?numberofpeople)

"""

print("Results")
run_query(queryString)

Results
[('connectionName', 'founded by'), ('numberofpeople', '5288')]
[('connectionName', 'owned by'), ('numberofpeople', '486')]
[('connectionName', 'chief executive officer'), ('numberofpeople', '321')]
[('connectionName', 'named after'), ('numberofpeople', '294')]
[('connectionName', 'has part'), ('numberofpeople', '177')]
[('connectionName', 'director / manager'), ('numberofpeople', '161')]
[('connectionName', 'performer'), ('numberofpeople', '81')]
[('connectionName', 'board member'), ('numberofpeople', '72')]
[('connectionName', 'discoverer or inventor'), ('numberofpeople', '70')]
[('connectionName', 'chairperson'), ('numberofpeople', '69')]
[('connectionName', 'different from'), ('numberofpeople', '67')]
[('connectionName', 'creator'), ('numberofpeople', '57')]
[('connectionName', 'represents'), ('numberofpeople', '50')]
[('connectionName', 'significant person'), ('numberofpeople', '18')]
[('connectionName', 'designed by'), ('numberofpeople', '17')]
[('connectionName', 'partici

50

In [None]:
5.3 Which sector has the highest number of trademarks?

In [88]:
queryString = """
SELECT DISTINCT (count(?trademarkName) as ?howmanytrademarks)  ?typeName 
WHERE { 

?trademark wdt:P31/wdt:P279* wd:Q167270;
           wdt:P452 ?type.
   

   
?type <http://schema.org/name> ?typeName.
?trademark <http://schema.org/name> ?trademarkName.
} 
order by desc (?howmanytrademarks)
LIMIT 5
"""

print("Results")
run_query(queryString)

Results
[('howmanytrademarks', '1598'), ('typeName', 'retail')]
[('howmanytrademarks', '403'), ('typeName', 'music industry')]
[('howmanytrademarks', '368'), ('typeName', 'restaurant')]
[('howmanytrademarks', '307'), ('typeName', 'automotive industry')]
[('howmanytrademarks', '262'), ('typeName', 'gastronomy')]


5
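
These counts are lower than those returned by the earlier per-type query, which counted `?trademark` and did not require a name for the trademark. One plausible explanation, stated here as an assumption and not checked against the data, is that trademarks without a `schema:name` are dropped by the extra triple pattern. The sketch below only compares the two recorded result listings.

```python
# Counts copied from the two result listings in this notebook.
count_by_trademark = {
    "retail": 1652,
    "automotive industry": 509,
    "music industry": 413,
    "restaurant": 372,
    "gastronomy": 272,
}
count_by_trademark_name = {
    "retail": 1598,
    "music industry": 403,
    "restaurant": 368,
    "automotive industry": 307,
    "gastronomy": 262,
}

# Difference per sector between the two counting strategies.
differences = {
    sector: count_by_trademark[sector] - count_by_trademark_name[sector]
    for sector in count_by_trademark
}

for sector, diff in sorted(differences.items(), key=lambda item: item[1], reverse=True):
    print(sector, diff)
```

Retail ranks first under both counts, but the automotive industry moves from second to fourth place, so the ranking depends on which variable is counted.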