# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query is related to the information obtained previously.

An exploratory workload usually starts with a vague, open-ended question and does not assume that the person issuing it has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook.
3. Do not delete useless queries. Keep everything that is syntactically valid.

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
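
As a minimal sketch, this pattern can be wrapped in a small query builder. The helper name `name_query` is hypothetical (it is not part of the notebook); it only assembles a query string, which could then be passed to the `run_query` function defined in the setup cell.

```python
def name_query(iri: str) -> str:
    """Build a SPARQL query returning the human-readable name(s) of `iri`.

    `iri` is expected in prefixed form (e.g. "wdt:P31") or as a full IRI
    wrapped in angle brackets. Hypothetical helper, for illustration only.
    """
    return (
        "SELECT DISTINCT ?name\n"
        "WHERE {\n"
        f"  {iri} <http://schema.org/name> ?name .\n"
        "}"
    )


# Print the generated query for the 'instance of' property.
print(name_query("wdt:P31"))
```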
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-8186b1393e-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        results = sparql.query()
        json_results = results.convert()
        if len(json_results['results']['bindings']) == 0:
            print("Empty")
            return 0

        for bindings in json_results['results']['bindings']:
            print([(var, value['value']) for var, value in bindings.items()])

        return len(json_results['results']['bindings'])

    except Exception as e:
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try:
        return sparql.query().convert()

    except Exception as e:
        print("The operation failed", e)
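
`run_query` prints each row and returns only the row count. When the values themselves are needed for further processing, the JSON document can be flattened into plain dictionaries. The sketch below assumes the standard SPARQL JSON results layout (`results` then `bindings`), which is what the code above already indexes into. The name `bindings_to_rows` is hypothetical, and the sample document is hand-written rather than fetched from the endpoint.

```python
def bindings_to_rows(json_results: dict) -> list[dict]:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in json_results["results"]["bindings"]
    ]


# Hand-written sample mimicking the shape of the endpoint's response.
sample = {
    "head": {"vars": ["x", "name"]},
    "results": {
        "bindings": [
            {
                "x": {"type": "uri", "value": "http://www.wikidata.org/entity/Q4830453"},
                "name": {"type": "literal", "value": "business"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```

Returning rows instead of printing them makes it possible to reuse a result (for example an IRI found by one query) as input to the next query.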


# Companies Workflow Series ("Economy of EU States") 

Consider the following exploratory information need:

> Compare businesses across different sectors and types in E.U. countries

## Useful URIs for the current workflow


The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P279`    | subclass      | predicate |
| `wdt:P17`      | country       | predicate |
| `wd:Q458`      | E.U.         | node |
| `wd:Q142`      | France       | node      |
| `wd:Q4830453`  | Business     | node      |
| `wd:Q6881511`  | Enterprise   | node      |
| `wd:Q29110228` | AXA          | node |
| `wd:Q43183`    | insurance    | node |




Also consider

```
{ 
?p wdt:P17 wd:Q142  . 
?p wdt:P31 wd:Q6881511  . 
} UNION {
?p wdt:P17 wd:Q142  . 
?p wdt:P31 wd:Q4830453  . 
}



```

is the BGP to retrieve all **French enterprises and businesses**.
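
The two `UNION` branches differ only in the class, so the same intent can also be expressed with a `VALUES` clause (standard SPARQL 1.1). This is a minimal sketch, assuming a hypothetical helper `companies_bgp` that parameterizes the country; the generated pattern has not been run against the endpoint used in this notebook.

```python
def companies_bgp(country: str = "wd:Q142", var: str = "?p") -> str:
    """Return a BGP matching the enterprises and businesses of one country.

    The VALUES clause binds ?type to either class, mirroring the two
    UNION branches. Hypothetical helper, for illustration only.
    """
    return (
        "VALUES ?type { wd:Q6881511 wd:Q4830453 }\n"
        f"{var} wdt:P17 {country} .\n"
        f"{var} wdt:P31 ?type ."
    )


# Default arguments reproduce the French case.
print(companies_bgp())
```

As with the `UNION` form, an entity that is both a business and an enterprise matches twice, so counts should use `DISTINCT` when each company must be counted once.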

## Workload Goals

1. Identify the BGP for obtaining the type, legal form, and industry of a company

2. Identify the BGP to retrieve all companies owned by a company located in an EU country

3. Which company has the largest presence in E.U.?

4. Companies have different 'legal forms', compare the number of companies divided in different legal forms

5. Analyze the number of companies per type, legal form, and industry in each state
 
   5.1 What are the top-3 legal forms in E.U.? 
   
   5.2 For which companies is defined some form of income or market capitalization or total assets? What is the min, max, and average in each country for a given legal form?
   
   5.3 Which business in each country owns more businesses in other E.U. countries?
   
   5.4 What can we say about industry sectors in various countries?


In [2]:
# start your workflow here

In [3]:
queryString = """
SELECT COUNT(*)
WHERE { 

{ 
?p wdt:P17 wd:Q142  . 
?p wdt:P31 wd:Q6881511  . 
} UNION {
?p wdt:P17 wd:Q142  . 
?p wdt:P31 wd:Q4830453  . 
}
} 
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '4577')]


1

# Task 1

*Identify the BGP for obtaining the type, legal form, and industry of a company*

I already know that AXA is an insurance company. Confusingly, the node given for AXA (`wd:Q29110228`) is not AXA but Panasonic Europe Ltd, as the queries below show.

In [4]:
# Find the properties of a company (in this case AXA)
queryString = """
SELECT distinct ?x ?name
WHERE { 
  wd:Q29110228 ?x ?y .
  ?x <http://schema.org/name> ?name .
} 
"""

print("Results")
run_query(queryString)

Results
[('x', 'http://www.wikidata.org/prop/direct/P1278'), ('name', 'Legal Entity Identifier')]
[('x', 'http://www.wikidata.org/prop/direct/P1320'), ('name', 'OpenCorporates ID')]
[('x', 'http://www.wikidata.org/prop/direct/P1454'), ('name', 'legal form')]
[('x', 'http://www.wikidata.org/prop/direct/P17'), ('name', 'country')]
[('x', 'http://www.wikidata.org/prop/direct/P1830'), ('name', 'owner of')]
[('x', 'http://www.wikidata.org/prop/direct/P213'), ('name', 'ISNI')]
[('x', 'http://www.wikidata.org/prop/direct/P2657'), ('name', 'EU Transparency Register ID')]
[('x', 'http://www.wikidata.org/prop/direct/P31'), ('name', 'instance of')]
[('x', 'http://www.wikidata.org/prop/direct/P3500'), ('name', 'Ringgold ID')]
[('x', 'http://www.wikidata.org/prop/direct/P355'), ('name', 'subsidiary')]
[('x', 'http://www.wikidata.org/prop/direct/P3608'), ('name', 'EU VAT number')]
[('x', 'http://www.wikidata.org/prop/direct/P4776'), ('name', 'MAC Address Block Large ID')]
[('x', 'http://www.wikidata

17

In [6]:
# Find which properties connect something to insurance (an industry)
queryString = """
SELECT distinct ?x ?name
WHERE { 
  ?y ?x wd:Q43183 .
  ?x <http://schema.org/name> ?name .
} 
"""

print("Results")
run_query(queryString)

Results
[('x', 'http://www.wikidata.org/prop/direct/P1423'), ('name', 'template has topic')]
[('x', 'http://www.wikidata.org/prop/direct/P101'), ('name', 'field of work')]
[('x', 'http://www.wikidata.org/prop/direct/P1056'), ('name', 'product or material produced')]
[('x', 'http://www.wikidata.org/prop/direct/P106'), ('name', 'occupation')]
[('x', 'http://www.wikidata.org/prop/direct/P1269'), ('name', 'facet of')]
[('x', 'http://www.wikidata.org/prop/direct/P1889'), ('name', 'different from')]
[('x', 'http://www.wikidata.org/prop/direct/P279'), ('name', 'subclass of')]
[('x', 'http://www.wikidata.org/prop/direct/P2959'), ('name', 'permanent duplicated item')]
[('x', 'http://www.wikidata.org/prop/direct/P301'), ('name', "category's main topic")]
[('x', 'http://www.wikidata.org/prop/direct/P31'), ('name', 'instance of')]
[('x', 'http://www.wikidata.org/prop/direct/P361'), ('name', 'part of')]
[('x', 'http://www.wikidata.org/prop/direct/P425'), ('name', 'field of this occupation')]
[('x',

15

In [8]:
# Find which properties connect something to business (a type of company, maybe)
queryString = """
SELECT distinct ?x ?name
WHERE { 
  ?y ?x wd:Q4830453 .
  ?x <http://schema.org/name> ?name .
} 
"""

print("Results")
run_query(queryString)

Results
[('x', 'http://www.wikidata.org/prop/direct/P1423'), ('name', 'template has topic')]
[('x', 'http://www.wikidata.org/prop/direct/P101'), ('name', 'field of work')]
[('x', 'http://www.wikidata.org/prop/direct/P1056'), ('name', 'product or material produced')]
[('x', 'http://www.wikidata.org/prop/direct/P106'), ('name', 'occupation')]
[('x', 'http://www.wikidata.org/prop/direct/P1269'), ('name', 'facet of')]
[('x', 'http://www.wikidata.org/prop/direct/P127'), ('name', 'owned by')]
[('x', 'http://www.wikidata.org/prop/direct/P136'), ('name', 'genre')]
[('x', 'http://www.wikidata.org/prop/direct/P137'), ('name', 'operator')]
[('x', 'http://www.wikidata.org/prop/direct/P138'), ('name', 'named after')]
[('x', 'http://www.wikidata.org/prop/direct/P1454'), ('name', 'legal form')]
[('x', 'http://www.wikidata.org/prop/direct/P1535'), ('name', 'used by')]
[('x', 'http://www.wikidata.org/prop/direct/P159'), ('name', 'headquarters location')]
[('x', 'http://www.wikidata.org/prop/direct/P169

39

In [9]:
# Check 'instance of' (maybe it is the type?)
queryString = """
SELECT distinct ?x ?name
WHERE { 
  wd:Q29110228 wdt:P31 ?x .
  ?x <http://schema.org/name> ?name .
} 
"""
# it is
print("Results")
run_query(queryString)

Results
[('x', 'http://www.wikidata.org/entity/Q4830453'), ('name', 'business')]
[('x', 'http://www.wikidata.org/entity/Q6881511'), ('name', 'enterprise')]


2

In [12]:
# Find what is connected to the industry (for example insurance)
queryString = """
SELECT distinct ?x ?name
WHERE { 
  ?x wdt:P452 wd:Q43183 .
  ?x <http://schema.org/name> ?name .
  filter(?x= wd:Q29110228 )
} 

"""
# I'm pretty sure AXA does insurance so there is something strange
print("Results")
run_query(queryString)

Results
Empty


0

In [13]:
# Apparently it is not connected in any way to insurance
queryString = """
SELECT distinct ?x ?name
WHERE { 
  ?x ?y wd:Q43183 .
  ?x <http://schema.org/name> ?name .
  filter(?x= wd:Q29110228 )
} 

"""

print("Results")
run_query(queryString)

Results
Empty


0

In [14]:
queryString = """
SELECT distinct ?x ?name
WHERE { 
  wd:Q43183 ?y ?x.
  ?x <http://schema.org/name> ?name .
  filter(?x= wd:Q29110228 )
} 

"""

print("Results")
run_query(queryString)

Results
Empty


0

In [18]:
queryString = """
SELECT distinct ?y ?name
WHERE { 
  ?x wdt:P452 wd:Q43183 .
  ?x ?y ?z .
  ?y <http://schema.org/name> ?name .
  filter(!regex(?name, "ID"))
} 

"""

print("Results")
run_query(queryString)

Results
[('y', 'http://www.wikidata.org/prop/direct/P9279'), ('name', 'Egapro gender equality index')]
[('y', 'http://www.wikidata.org/prop/direct/P4829'), ('name', 'Swiss Enterprise Identification Number')]
[('y', 'http://www.wikidata.org/prop/direct/P101'), ('name', 'field of work')]
[('y', 'http://www.wikidata.org/prop/direct/P1037'), ('name', 'director / manager')]
[('y', 'http://www.wikidata.org/prop/direct/P1056'), ('name', 'product or material produced')]
[('y', 'http://www.wikidata.org/prop/direct/P112'), ('name', 'founded by')]
[('y', 'http://www.wikidata.org/prop/direct/P1128'), ('name', 'employees')]
[('y', 'http://www.wikidata.org/prop/direct/P127'), ('name', 'owned by')]
[('y', 'http://www.wikidata.org/prop/direct/P1278'), ('name', 'Legal Entity Identifier')]
[('y', 'http://www.wikidata.org/prop/direct/P1299'), ('name', 'depicted by')]
[('y', 'http://www.wikidata.org/prop/direct/P1329'), ('name', 'phone number')]
[('y', 'http://www.wikidata.org/prop/direct/P1344'), ('name'

124

In [25]:
queryString = """
SELECT distinct ?name 
WHERE { 
  
  wd:Q29110228 wdt:P31 ?type .
  ?type <http://schema.org/name> ?name .
  
}
"""

print("Results")
run_query(queryString)

Results
[('name', 'business')]
[('name', 'enterprise')]


2

In [26]:
# ( Q29110228 is not AXA)
queryString = """
SELECT distinct ?name 
WHERE { 
  wd:Q29110228 <http://schema.org/name> ?name .
}
"""

print("Results")
run_query(queryString)

Results
[('name', 'Panasonic Europe Ltd')]


1

In [27]:
# New node for AXA (that is also connected with "industry" so we have everything for ANSWER 1)
queryString = """
SELECT distinct ?name ?x 
WHERE { 
  ?x wdt:P452 wd:Q43183 .
  ?x <http://schema.org/name> ?name .
  filter(regex(?name, "AXA"))
}
"""

print("Results")
run_query(queryString)

Results
[('name', 'AXA'), ('x', 'http://www.wikidata.org/entity/Q160054')]


1

Putting everything together:

In [29]:
# ANSWER 1
queryString = """
SELECT ?name ?legal ?type ?industry 
WHERE { 
  wd:Q160054 <http://schema.org/name> ?name .
  wd:Q160054 wdt:P1454 ?x . 
  wd:Q160054 wdt:P31 ?y .
  wd:Q160054 wdt:P452 ?z .
  ?x <http://schema.org/name> ?legal .
  ?y <http://schema.org/name> ?type .
  ?z <http://schema.org/name> ?industry . 
}
"""

print("Results")
run_query(queryString)

Results
[('name', 'AXA'), ('legal', 'société anonyme'), ('type', 'insurance company'), ('industry', 'insurance')]
[('name', 'AXA'), ('legal', 'société anonyme'), ('type', 'insurance company'), ('industry', 'financial services')]
[('name', 'AXA'), ('legal', 'société anonyme'), ('type', 'enterprise'), ('industry', 'insurance')]
[('name', 'AXA'), ('legal', 'société anonyme'), ('type', 'enterprise'), ('industry', 'financial services')]
[('name', 'AXA'), ('legal', 'société anonyme'), ('type', 'public company'), ('industry', 'insurance')]
[('name', 'AXA'), ('legal', 'société anonyme'), ('type', 'public company'), ('industry', 'financial services')]


6
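
The six rows are the cross product of the three types and the two industries found for AXA, all with the same legal form. Below is a minimal sketch of collapsing them on the Python side, rebuilding the rows printed above as literal input; the helper name `collapse` is an assumption and not part of the notebook.

```python
from collections import defaultdict
from itertools import product

# Rebuild the six result rows shown above: 3 types x 2 industries.
types = ["insurance company", "enterprise", "public company"]
industries = ["insurance", "financial services"]
rows = [
    {"name": "AXA", "legal": "société anonyme", "type": t, "industry": i}
    for t, i in product(types, industries)
]


def collapse(rows):
    """Group rows by company name and collect the distinct values per column."""
    out = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for col, value in row.items():
            if col != "name":
                out[row["name"]][col].add(value)
    return out


summary = collapse(rows)
for col, values in summary["AXA"].items():
    print(col, sorted(values))
```

The same effect could be obtained inside SPARQL with grouping and an aggregate such as `GROUP_CONCAT`, which is not tried here.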

***

# Task 2

*Identify the BGP to retrieve all companies owned by a company located in an EU country*

In [30]:
# How is a state connected to the EU?
queryString = """
SELECT ?x
WHERE { 
  wd:Q142 ?x wd:Q458 
}
"""

print("Results")
run_query(queryString)

Results
[('x', 'http://www.wikidata.org/prop/direct/P463')]


1

In [33]:
# ANSWER to 2
queryString = """
SELECT (count(distinct ?company) as ?c)
WHERE { 
  ?x wdt:P17 ?country .
  ?country wdt:P463 wd:Q458 .
  ?x wdt:P1830 ?company .
}
"""

print("Results")
run_query(queryString)

Results
[('c', '11170')]


1

***

# Task 3

*Which company has the largest presence in E.U.?*

I assume that the presence of a company means the number of places where it can be found; in this case, the number of countries in which it operates.
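
Under this assumption, presence is a count of distinct countries per company. Here is a minimal sketch of that counting on the Python side, using hand-written (company, country) pairs that are purely illustrative and not taken from the endpoint; `presence` is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative pairs only; the company and country names are placeholders.
pairs = [
    ("Company A", "France"),
    ("Company A", "Germany"),
    ("Company A", "Germany"),  # a second entry in the same country
    ("Company B", "Italy"),
]


def presence(pairs):
    """Return (distinct country count, company) tuples, largest count first."""
    countries = defaultdict(set)
    for company, country in pairs:
        countries[company].add(country)  # a set ignores repeated countries
    return sorted(((len(c), name) for name, c in countries.items()), reverse=True)


print(presence(pairs))
```

Collecting countries in a set plays the role of `count(distinct ?country)` in SPARQL: several entries in the same country count once.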

In [39]:
# Companies that operate in the most EU states
queryString = """
SELECT (count(distinct ?country) as ?cc) ?x
 WHERE { 
    ?x wdt:P17 ?country .
    ?x wdt:P31 wd:Q4830453 .
    ?country wdt:P463 wd:Q458 .
} group by ?x
order by desc (?x)
limit 5

"""
# It doesn't work (note that the ORDER BY is on ?x rather than on ?cc); let's analyze a company that I know operates in several states
print("Results")
run_query(queryString)

Results
[('cc', '1'), ('x', 'http://www.wikidata.org/entity/Q99976386')]
[('cc', '1'), ('x', 'http://www.wikidata.org/entity/Q99976320')]
[('cc', '1'), ('x', 'http://www.wikidata.org/entity/Q99976174')]
[('cc', '1'), ('x', 'http://www.wikidata.org/entity/Q99968126')]
[('cc', '1'), ('x', 'http://www.wikidata.org/entity/Q99964754')]


5

I know for sure that Vodafone operates at least in the UK and in Italy, so I start by analyzing it.

In [41]:
# Check for Vodafone
queryString = """
SELECT ?x ?name
 WHERE { 
    ?x wdt:P17 ?country .
    ?x wdt:P31 wd:Q4830453 .
    ?country wdt:P463 wd:Q458 .
    ?x <http://schema.org/name> ?name .
    filter(regex(lcase(str(?name)), "vodafone"))
} 

"""
# There is a general entity and country-specific ones
print("Results")
run_query(queryString)

Results
[('x', 'http://www.wikidata.org/entity/Q122141'), ('name', 'Vodafone')]
[('x', 'http://www.wikidata.org/entity/Q7939294'), ('name', 'Vodafone UK')]
[('x', 'http://www.wikidata.org/entity/Q1560372'), ('name', 'Vodafone Kabel Deutschland')]
[('x', 'http://www.wikidata.org/entity/Q2529830'), ('name', 'Vodafone Germany')]
[('x', 'http://www.wikidata.org/entity/Q1673864'), ('name', 'Vodafone NRW GmbH')]
[('x', 'http://www.wikidata.org/entity/Q3486990'), ('name', 'Vodafone Czech Republic')]
[('x', 'http://www.wikidata.org/entity/Q98271125'), ('name', 'Vodafone Towers Czech Republic 1')]
[('x', 'http://www.wikidata.org/entity/Q6272576'), ('name', 'Vodafone Romania')]
[('x', 'http://www.wikidata.org/entity/Q7939286'), ('name', 'Vodafone Ireland')]
[('x', 'http://www.wikidata.org/entity/Q7939283'), ('name', 'Vodafone Hungary')]
[('x', 'http://www.wikidata.org/entity/Q108169117'), ('name', 'Vodafone Magyarország Mobil Távközlési')]
[('x', 'http://www.wikidata.org/entity/Q7939295'), ('nam

16

In [43]:
# How are they connected?
queryString = """
SELECT ?x ?name
 WHERE { 
     wd:Q122141 ?x wd:Q3562042.
    ?x <http://schema.org/name> ?name .
} 

"""
# They are connected by 'owner of' and 'subsidiary'
print("Results")
run_query(queryString)

Results
[('x', 'http://www.wikidata.org/prop/direct/P1830'), ('name', 'owner of')]
[('x', 'http://www.wikidata.org/prop/direct/P355'), ('name', 'subsidiary')]


2

In [3]:
# ANSWER to 3. Companies that operate in the most EU states
queryString = """
SELECT (count(distinct ?country) as ?cc) ?name
 WHERE { 
    ?x wdt:P1830|wdt:P355 ?subs .
    ?x wdt:P31 wd:Q4830453 .
    ?subs wdt:P17 ?country .
    ?country wdt:P463 wd:Q458 .
    ?x <http://schema.org/name> ?name .
} group by ?name
order by desc (?cc)
limit 100

"""
# The guess about Vodafone was pretty good. There can also be several subsidiaries in the same country (e.g. Germany and the Czech Republic), but each country is counted only once
print("Results")

run_query(queryString)

Results
[('cc', '11'), ('name', 'Siemens')]
[('cc', '11'), ('name', 'Vodafone')]
[('cc', '10'), ('name', 'Deutsche Telekom')]
[('cc', '10'), ('name', 'The Walt Disney Company')]
[('cc', '10'), ('name', 'Heineken')]
[('cc', '10'), ('name', 'Volkswagen Group')]
[('cc', '9'), ('name', 'Carlsberg Group')]
[('cc', '9'), ('name', 'Microsoft')]
[('cc', '9'), ('name', 'RELX Group')]
[('cc', '9'), ('name', 'Vinci')]
[('cc', '9'), ('name', 'ArcelorMittal')]
[('cc', '8'), ('name', 'Phoenix Pharmahandel')]
[('cc', '8'), ('name', 'TE Connectivity')]
[('cc', '8'), ('name', 'Hutchison Whampoa')]
[('cc', '8'), ('name', 'Unibail Rodamco Westfield')]
[('cc', '8'), ('name', 'Nokia')]
[('cc', '8'), ('name', 'LeasePlan Corporation')]
[('cc', '8'), ('name', 'BNP Paribas')]
[('cc', '7'), ('name', 'Gazprom')]
[('cc', '7'), ('name', 'Liberty Global')]
[('cc', '7'), ('name', 'TotalEnergies')]
[('cc', '7'), ('name', 'Medtronic plc')]
[('cc', '7'), ('name', 'Thermo Fisher Scientific')]
[('cc', '7'), ('name', 'Ora

100

***

# Task 4

*Companies have different 'legal forms', compare the number of companies divided in different legal forms*

In [8]:
queryString = """
SELECT distinct (count(?x) as ?xx) sample(?name) ?y
 WHERE { 
    ?x wdt:P1454/wdt:P279 ?y .
    ?x wdt:P31 wd:Q4830453 .
    ?x wdt:P17 ?country .
    ?country wdt:P463 wd:Q458 .
    ?y <http://schema.org/name> ?name .
    
} group by ?y
order by desc (?xx)
limit 100
"""
print("Results")
run_query(queryString)

Results
[('xx', '29204'), ('callret-1', 'juridical person'), ('y', 'http://www.wikidata.org/entity/Q155076')]
[('xx', '17463'), ('callret-1', 'private limited liability company'), ('y', 'http://www.wikidata.org/entity/Q18624259')]
[('xx', '9308'), ('callret-1', 'joint-stock company'), ('y', 'http://www.wikidata.org/entity/Q134161')]
[('xx', '3089'), ('callret-1', 'limited company'), ('y', 'http://www.wikidata.org/entity/Q33685')]
[('xx', '2580'), ('callret-1', 'Gesellschaft mit beschränkter Haftung'), ('y', 'http://www.wikidata.org/entity/Q15829892')]
[('xx', '1067'), ('callret-1', 'Kommanditgesellschaft'), ('y', 'http://www.wikidata.org/entity/Q1780029')]
[('xx', '734'), ('callret-1', 'Aktiengesellschaft'), ('y', 'http://www.wikidata.org/entity/Q22084735')]
[('xx', '733'), ('callret-1', 'S.A.'), ('y', 'http://www.wikidata.org/entity/Q166280')]
[('xx', '593'), ('callret-1', 'public company'), ('y', 'http://www.wikidata.org/entity/Q891723')]
[('xx', '550'), ('callret-1', 'company'), ('y

100

***

# Task 5.1

*What are the top-3 legal forms in E.U.?*

In [7]:
# Top 3 legal forms in the EU
queryString = """
SELECT distinct (count(?x) as ?xx) sample(?name) ?y
 WHERE { 
    ?x wdt:P1454 ?y .
    ?x wdt:P31 wd:Q4830453 .
    ?x wdt:P17 ?country .
    ?country wdt:P463 wd:Q458 .
    ?y <http://schema.org/name> ?name .
    
} group by ?y
order by desc (?xx)
limit 3
"""
print("Results")
run_query(queryString)

Results
[('xx', '29204'), ('callret-1', 'juridical person'), ('y', 'http://www.wikidata.org/entity/Q155076')]
[('xx', '17463'), ('callret-1', 'private limited liability company'), ('y', 'http://www.wikidata.org/entity/Q18624259')]
[('xx', '9308'), ('callret-1', 'joint-stock company'), ('y', 'http://www.wikidata.org/entity/Q134161')]


3

In [65]:
# Get information about this very common legal form, whose name is in a language I don't understand
queryString = """
SELECT ?name1 ?name2 
 WHERE { 
   wd:Q15646299 ?x ?y .
   ?x <http://schema.org/name> ?name1 .
   ?y <http://schema.org/name> ?name2 .
} 
"""
# Basically, it is the equivalent of the Srl in Italy
print("Results")
run_query(queryString)

Results
[('name1', 'country'), ('name2', 'Czech Republic')]
[('name1', 'subclass of'), ('name2', 'juridical person')]
[('name1', 'subclass of'), ('name2', 'private limited liability company')]
[('name1', 'instance of'), ('name2', 'type of business entity in Czech Republic')]


4

In [5]:
# ANSWER to 5.1 (legal forms grouped by their superclass, excluding the generic 'juridical person')
queryString = """
SELECT distinct (count(?x) as ?xx) ?name
 WHERE { 
    ?x wdt:P1454 ?y .
    ?x wdt:P31 wd:Q4830453 .
    ?x wdt:P17 ?country .
    ?country wdt:P463 wd:Q458 .
    ?y wdt:P279 ?z .
    ?z <http://schema.org/name> ?name .
    filter(!regex(lcase(str(?name)),"juridical person"))
} group by ?name
order by desc (?xx)
limit 3
"""
print("Results")
run_query(queryString)

Results
[('xx', '17463'), ('name', 'private limited liability company')]
[('xx', '9308'), ('name', 'joint-stock company')]
[('xx', '3089'), ('name', 'limited company')]


3

***

# Task 5.2

*For which companies is defined some form of income or market capitalization or total assets? What is the min, max, and average in each country for a given legal form?*

First, I take the hint from the question and check whether these values are present for every business.

In [71]:
# Business with market cap
queryString = """
SELECT DISTINCT ?x ?xx ?xxx
WHERE{
    ?y wdt:P17 ?country .
    ?country wdt:P463 wd:Q458 .
    ?y wdt:P31 wd:Q4830453 .
    ?y wdt:P2226 ?z .
    ?y wdt:P452 ?x1 .
    ?y wdt:P31 ?xx1 .
    ?y wdt:P1454 ?xxx1. 
    ?xxx1 wdt:P279 ?xxx2 .
    ?x1 <http://schema.org/name> ?x .
    ?xx1 <http://schema.org/name> ?xx .
    ?xxx2 <http://schema.org/name> ?xxx .
}limit 100
"""
print("Results")
run_query(queryString)

Results
[('x', 'electricity generation'), ('xx', 'business'), ('xxx', 'public company')]
[('x', 'electricity generation'), ('xx', 'business'), ('xxx', 'joint-stock company')]
[('x', 'electricity generation'), ('xx', 'enterprise'), ('xxx', 'public company')]
[('x', 'electricity generation'), ('xx', 'enterprise'), ('xxx', 'joint-stock company')]
[('x', 'electricity generation'), ('xx', 'public company'), ('xxx', 'public company')]
[('x', 'electricity generation'), ('xx', 'public company'), ('xxx', 'joint-stock company')]
[('x', 'retail'), ('xx', 'business'), ('xxx', 'joint-stock company')]
[('x', 'retail'), ('xx', 'business'), ('xxx', 'type of business entity')]
[('x', 'retail'), ('xx', 'public company'), ('xxx', 'joint-stock company')]
[('x', 'retail'), ('xx', 'public company'), ('xxx', 'type of business entity')]
[('x', 'mechanical engineering'), ('xx', 'business'), ('xxx', 'Aktiengesellschaft')]
[('x', 'mechanical engineering'), ('xx', 'business'), ('xxx', 'juridical person')]
[('x', 

100

In [75]:
# It seems that Srl companies do not have a market cap (which makes sense); basically only the public ones do
queryString = """
SELECT DISTINCT ?x ?xx ?xxx
WHERE{
    ?y wdt:P17 ?country .
    ?country wdt:P463 wd:Q458 .
    ?y wdt:P31 wd:Q4830453 .
    ?y wdt:P2226 ?z .
    ?y wdt:P452 ?x1 .
    ?y wdt:P31 ?xx1 .
    ?y wdt:P1454 ?xxx1. 
    ?xxx1 wdt:P279 ?xxx2 .
    ?x1 <http://schema.org/name> ?x .
    ?xx1 <http://schema.org/name> ?xx .
    ?xxx2 <http://schema.org/name> ?xxx .
    filter(regex(?xxx, "private"))
}limit 100
"""
# I also filtered for "liability", in addition to "private"
print("Results")
run_query(queryString)

Results
Empty


0

In [77]:
# Asset-wise, there seems to be no limitation
queryString = """
SELECT DISTINCT ?x ?xx ?xxx
WHERE{
    ?y wdt:P17 ?country .
    ?country wdt:P463 wd:Q458 .
    ?y wdt:P31 wd:Q4830453 .
    ?y wdt:P2403 ?z .
    ?y wdt:P452 ?x1 .
    ?y wdt:P31 ?xx1 .
    ?y wdt:P1454 ?xxx1. 
    ?xxx1 wdt:P279 ?xxx2 .
    ?x1 <http://schema.org/name> ?x .
    ?xx1 <http://schema.org/name> ?xx .
    ?xxx2 <http://schema.org/name> ?xxx .
    filter(regex(?xxx, "private"))
}limit 100
"""
# I also filtered for "public" (the output below appears to come from that run)
print("Results")
run_query(queryString)

Results
[('x', 'electricity generation'), ('xx', 'business'), ('xxx', 'public company')]
[('x', 'electricity generation'), ('xx', 'enterprise'), ('xxx', 'public company')]
[('x', 'electricity generation'), ('xx', 'public company'), ('xxx', 'public company')]
[('x', 'transport'), ('xx', 'railway company'), ('xxx', 'public company')]
[('x', 'transport'), ('xx', 'business'), ('xxx', 'public company')]
[('x', 'transport'), ('xx', 'public company'), ('xxx', 'public company')]
[('x', 'energy industry'), ('xx', 'business'), ('xxx', 'public company')]
[('x', 'energy industry'), ('xx', 'public limited company'), ('xxx', 'public company')]
[('x', 'electrical appliance'), ('xx', 'brand'), ('xxx', 'public company')]
[('x', 'electrical appliance'), ('xx', 'business'), ('xxx', 'public company')]
[('x', 'electrical appliance'), ('xx', 'public company'), ('xxx', 'public company')]
[('x', 'aircraft construction'), ('xx', 'business'), ('xxx', 'public company')]
[('x', 'aircraft construction'), ('xx', 'e

100


Based on the analysis above, market capitalization appears to be defined only for publicly traded companies, while total assets are not limited to a particular kind of business.

I did not filter out the more specific or more general legal forms because I think they are interesting.
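
The aggregation used in the answers that follow (sum, max, min, and average per legal form, optionally per country) can be sketched in plain Python. The figures here are made-up placeholders, not values from Wikidata, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Placeholder records: (country, legal form, value). Not real data.
records = [
    ("Country X", "joint-stock company", 100.0),
    ("Country X", "joint-stock company", 300.0),
    ("Country Y", "limited company", 50.0),
]


def summarize(records):
    """Aggregate values per (country, legal form), like GROUP BY in SPARQL."""
    groups = defaultdict(list)
    for country, legal_form, value in records:
        groups[(country, legal_form)].append(value)
    return {
        key: {"tot": sum(v), "max": max(v), "min": min(v), "avg": mean(v)}
        for key, v in groups.items()
    }


stats = summarize(records)
for key, agg in stats.items():
    print(key, agg)
```

A group with a single record has identical `tot`, `max`, `min`, and `avg`, which is a quick way to spot legal forms backed by only one company in the query results.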

In [9]:
# ANSWER to 5.2 (market cap per legal form over all EU states; binding ?country to e.g. wd:Q142 would restrict it to France)
queryString = """
SELECT DISTINCT ?name (sum(?mcap) as ?tot) (max(?mcap) as ?max) (min(?mcap) as ?min) (avg(?mcap) as ?avg)
WHERE {
    ?y wdt:P17 ?country  .
    ?country wdt:P463 wd:Q458 .
    ?y wdt:P31 wd:Q4830453 .
    ?y wdt:P1454 ?x .
    ?x wdt:P279 ?legalform .
    ?y wdt:P2226 ?mcap .
    ?legalform <http://schema.org/name> ?name .
    filter(!regex(?name,"business entity"))
}group by (?name )
order by desc (?tot)
"""

print("Results")
run_query(queryString)

Results
[('name', 'joint-stock company'), ('tot', '1723383755522.7'), ('max', '289600000000'), ('min', '3896025.7'), ('avg', '71807656480.1125')]
[('name', 'juridical person'), ('tot', '1601114860100'), ('max', '289600000000'), ('min', '111760000'), ('avg', '84269203163.157894736842105')]
[('name', 'aktiebolag'), ('tot', '415000000000'), ('max', '415000000000'), ('min', '415000000000'), ('avg', '415000000000')]
[('name', 'public company'), ('tot', '270427000000'), ('max', '185400000000'), ('min', '3500000000'), ('avg', '54085400000')]
[('name', 'S.A.'), ('tot', '269105000000'), ('max', '139000000000'), ('min', '32000000000'), ('avg', '67276250000')]
[('name', 'Aktiengesellschaft'), ('tot', '250462660100'), ('max', '139800000000'), ('min', '111760000'), ('avg', '41743776683.333333333333333')]
[('name', 'Oy'), ('tot', '26257000000'), ('max', '26257000000'), ('min', '26257000000'), ('avg', '26257000000')]
[('name', 'limited company'), ('tot', '9431200000'), ('max', '9431200000'), ('min', 

8

In [87]:
# ANSWER to 5.2 (total assets instead of market cap; the query is otherwise the same, so I didn't change the variable names)
queryString = """
SELECT DISTINCT ?name (sum(?mcap) as ?tot) (max(?mcap) as ?max) (min(?mcap) as ?min) (avg(?mcap) as ?avg)
WHERE {
    ?y wdt:P17 ?country  .
    ?country wdt:P463 wd:Q458 .
    ?y wdt:P31 wd:Q4830453 .
    ?y wdt:P1454 ?x .
    ?x wdt:P279 ?legalform .
    ?y wdt:P2403 ?mcap .
    ?legalform <http://schema.org/name> ?name .
    filter(!regex(?name,"business entity"))
}group by (?name )
order by desc (?tot)
"""

print("Results")
run_query(queryString)

Results
[('name', 'juridical person'), ('tot', '16574162319972.78'), ('max', '1458650000000'), ('min', '0'), ('avg', '19249898164.892891986062718')]
[('name', 'joint-stock company'), ('tot', '16263319620387'), ('max', '1458650000000'), ('min', '3165000'), ('avg', '29731845741.109689213893967')]
[('name', 'limited company'), ('tot', '5572286762564.16'), ('max', '4103786000000'), ('min', '58464000'), ('avg', '398020483040.297142857142857')]
[('name', 'public company'), ('tot', '3914905010000'), ('max', '2374986000000'), ('min', '3165000'), ('avg', '156596200400')]
[('name', 'S.A.'), ('tot', '1741716687000'), ('max', '698690000000'), ('min', '3310735000'), ('avg', '108857292937.5')]
[('name', 'private limited liability company'), ('tot', '1550513127387'), ('max', '75540700000'), ('min', '0'), ('avg', '4891208603.744479495268139')]
[('name', 'Aktiengesellschaft'), ('tot', '1142490358839'), ('max', '299060000000'), ('min', '82339839'), ('avg', '95207529903.25')]
[('name', 'company'), ('tot'

34

In [14]:
# ANSWER to 5.2 (for every country, market cap)
queryString = """
SELECT DISTINCT ?name ?nameC (sum(?mcap) as ?tot) (max(?mcap) as ?max) (min(?mcap) as ?min) (avg(?mcap) as ?avg)
WHERE {
    ?y wdt:P17 ?country  .
    ?country wdt:P463 wd:Q458 .
    ?y wdt:P31 wd:Q4830453 .
    ?y wdt:P1454 ?x .
    ?x wdt:P279 ?legalform .
    ?y wdt:P2226 ?mcap .
    ?legalform <http://schema.org/name> ?name .
    ?country  <http://schema.org/name> ?nameC .
    filter(!regex(?name,"business entity"))
}group by ?name ?nameC
order by desc (?tot)
"""

print("Results")
run_query(queryString)

Results
[('name', 'joint-stock company'), ('nameC', 'Czech Republic'), ('tot', '832800000000'), ('max', '286200000000'), ('min', '7100000000'), ('avg', '138800000000')]
[('name', 'juridical person'), ('nameC', 'Czech Republic'), ('tot', '832800000000'), ('max', '286200000000'), ('min', '7100000000'), ('avg', '138800000000')]
[('name', 'juridical person'), ('nameC', 'Denmark'), ('tot', '427356000000'), ('max', '289600000000'), ('min', '25256000000'), ('avg', '142452000000')]
[('name', 'joint-stock company'), ('nameC', 'Denmark'), ('tot', '427356000000'), ('max', '289600000000'), ('min', '25256000000'), ('avg', '142452000000')]
[('name', 'aktiebolag'), ('nameC', 'Finland'), ('tot', '415000000000'), ('max', '415000000000'), ('min', '415000000000'), ('avg', '415000000000')]
[('name', 'Aktiengesellschaft'), ('nameC', 'Germany'), ('tot', '250462660100'), ('max', '139800000000'), ('min', '111760000'), ('avg', '41743776683.333333333333333')]
[('name', 'juridical person'), ('nameC', 'Germany'),

27

In [15]:
# ANSWER to 5.2 (for every country, total assets)
queryString = """
SELECT DISTINCT ?name ?nameC (sum(?mcap) as ?tot) (max(?mcap) as ?max) (min(?mcap) as ?min) (avg(?mcap) as ?avg)
WHERE {
    ?y wdt:P17 ?country  .
    ?country wdt:P463 wd:Q458 .
    ?y wdt:P31 wd:Q4830453 .
    ?y wdt:P1454 ?x .
    ?x wdt:P279 ?legalform .
    ?y wdt:P2403 ?mcap .
    ?legalform <http://schema.org/name> ?name .
    ?country  <http://schema.org/name> ?nameC .
    filter(!regex(?name,"business entity"))
}group by ?name ?nameC
order by desc (?tot)
"""

print("Results")
run_query(queryString)

Results
[('name', 'juridical person'), ('nameC', 'Czech Republic'), ('tot', '12796785034570'), ('max', '1458650000000'), ('min', '0'), ('avg', '16056192013.262233375156838')]
[('name', 'joint-stock company'), ('nameC', 'Czech Republic'), ('tot', '11493197007680'), ('max', '1458650000000'), ('min', '3165000'), ('avg', '23697313417.896907216494845')]
[('name', 'limited company'), ('nameC', 'Hungary'), ('tot', '4823786000000'), ('max', '4103786000000'), ('min', '720000000000'), ('avg', '2411893000000')]
[('name', 'public company'), ('nameC', 'United Kingdom'), ('tot', '3471337625000'), ('max', '2374986000000'), ('min', '682825000'), ('avg', '495905375000')]
[('name', 'private limited liability company'), ('nameC', 'Czech Republic'), ('tot', '1543228456890'), ('max', '75540700000'), ('min', '0'), ('avg', '4962149379.067524115755627')]
[('name', 'joint-stock company'), ('nameC', 'Spain'), ('tot', '1345895083000'), ('max', '1340000000000'), ('min', '5895083000'), ('avg', '672947541500')]
[('

97

***

# Task 5.3

*Which business in each country owns more businesses in other E.U. countries?*

In [6]:
# Companies that work in other states
queryString = """
select ?x (count(distinct ?countries) as ?tot) sample(?name)
where {
  ?x wdt:P31 wd:Q4830453 .
  ?x wdt:P17 ?country .
  ?country wdt:P463 wd:Q458 .
  ?x wdt:P355|wdt:P1830*/wdt:P17 ?countries .
  ?countries wdt:P463 wd:Q458 .
  filter(?countries!=?country)
  ?x <http://schema.org/name> ?name .
} group by ?x
order by desc (?tot)
limit 50
"""
print("Results")
run_query(queryString)

Results
[('x', 'http://www.wikidata.org/entity/Q2357706'), ('tot', '13'), ('callret-2', 'Lycamobile')]
[('x', 'http://www.wikidata.org/entity/Q113215'), ('tot', '10'), ('callret-2', 'Carlsberg Group')]
[('x', 'http://www.wikidata.org/entity/Q1344573'), ('tot', '9'), ('callret-2', 'Foyer S.A.')]
[('x', 'http://www.wikidata.org/entity/Q1412968'), ('tot', '8'), ('callret-2', 'NRJ Group')]
[('x', 'http://www.wikidata.org/entity/Q608518'), ('tot', '7'), ('callret-2', 'Unibail Rodamco Westfield')]
[('x', 'http://www.wikidata.org/entity/Q128738'), ('tot', '7'), ('callret-2', 'Anheuser-Busch InBev')]
[('x', 'http://www.wikidata.org/entity/Q1550912'), ('tot', '7'), ('callret-2', 'GMV Innovating Solutions')]
[('x', 'http://www.wikidata.org/entity/Q407009'), ('tot', '6'), ('callret-2', '3')]
[('x', 'http://www.wikidata.org/entity/Q903805'), ('tot', '6'), ('callret-2', 'hema')]
[('x', 'http://www.wikidata.org/entity/Q59163923'), ('tot', '6'), ('callret-2', 'EP Investment')]
[('x', 'http://www.wiki

50

I noticed that Vodafone has fewer subsidiaries than what I saw earlier in the notebook, so I had to check what was wrong with the query.

In [54]:
# Check on Vodafone: there should be more countries than the previous query suggests
queryString = """
select distinct ?name
where {
  wd:Q122141 wdt:P355|wdt:P1830*/wdt:P17 ?countries .
  ?countries wdt:P463 wd:Q458 .
 ?countries <http://schema.org/name> ?name .
}
"""
print("Results")
run_query(queryString)

Results
[('name', 'United Kingdom')]
[('name', 'Germany')]
[('name', 'Italy')]
[('name', 'Greece')]
[('name', 'Portugal')]


5

Looking back at the earlier results in the notebook, I am sure there are other countries, for example Hungary.

In [55]:
# Check for vodafone hungary
queryString = """
SELECT ?x ?name
 WHERE { 
    wd:Q108169117 wdt:P17 ?x .
    ?x <http://schema.org/name> ?name .
} 

"""
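# It exists, so something is off with the paths.
# In a property path "/" binds tighter than "|", so
# wdt:P355|wdt:P1830*/wdt:P17 is read as wdt:P355 | (wdt:P1830*/wdt:P17)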
print("Results")
run_query(queryString)

Results
[('x', 'http://www.wikidata.org/entity/Q28'), ('name', 'Hungary')]


1

In [57]:
# Check on vodafone with different path
queryString = """
select distinct ?name
where {
  wd:Q122141 wdt:P355*/wdt:P17|wdt:P1830*/wdt:P17 ?countries .
  ?countries wdt:P463 wd:Q458 .
 ?countries <http://schema.org/name> ?name .
}
"""
print("Results")
run_query(queryString)

Results
[('name', 'France')]
[('name', 'United Kingdom')]
[('name', 'Germany')]
[('name', 'Czech Republic')]
[('name', 'Slovakia')]
[('name', 'Romania')]
[('name', 'Malta')]
[('name', 'Ireland')]
[('name', 'Hungary')]
[('name', 'Spain')]
[('name', 'Belgium')]
[('name', 'Poland')]
[('name', 'Italy')]
[('name', 'Greece')]
[('name', 'Portugal')]


15
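The jump from 5 to 15 countries is consistent with how SPARQL property paths are parsed: `*` binds tightest, then `/`, then `|`. The first path therefore reads as `wdt:P355 | (wdt:P1830*/wdt:P17)`, and its left branch returns subsidiaries rather than their countries. The sketch below replays both readings on a tiny made-up graph in plain Python. The node names (`parent`, `sub_a`, `sub_b`, `owned`) and the helpers `step` and `star` are illustrative assumptions, not part of the notebook or of any library.

```python
from collections import defaultdict

# Hypothetical toy graph: (subject, property, object)
TRIPLES = [
    ("parent", "P355", "sub_a"),   # parent has subsidiary sub_a
    ("sub_a", "P355", "sub_b"),    # sub_a has subsidiary sub_b
    ("parent", "P1830", "owned"),  # parent is owner of owned
    ("parent", "P17", "UK"),
    ("sub_a", "P17", "HU"),
    ("sub_b", "P17", "DE"),
    ("owned", "P17", "IT"),
]

EDGES = defaultdict(set)
for s, p, o in TRIPLES:
    EDGES[(s, p)].add(o)

def step(nodes, prop):
    """Follow one `prop` edge from every node in `nodes`."""
    return {o for n in nodes for o in EDGES.get((n, prop), ())}

def star(nodes, prop):
    """Zero-or-more `prop` edges (the `*` operator), start nodes included."""
    seen = set(nodes)
    frontier = set(nodes)
    while frontier:
        frontier = step(frontier, prop) - seen
        seen |= frontier
    return seen

start = {"parent"}

# P355 | (P1830*/P17): the left branch yields subsidiaries, not countries
original = step(start, "P355") | step(star(start, "P1830"), "P17")

# (P355*/P17) | (P1830*/P17): both branches end with P17
fixed = step(star(start, "P355"), "P17") | step(star(start, "P1830"), "P17")

print(sorted(original))  # ['IT', 'UK', 'sub_a']
print(sorted(fixed))     # ['DE', 'HU', 'IT', 'UK']
```

In the real query the stray subsidiary nodes are then discarded by the `wdt:P463 wd:Q458` pattern, which would leave only the countries reached through the `P1830*` branch.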

In [60]:
# Companies that work in other states (with new path)
queryString = """
select ?name (count(distinct ?countries) as ?tot)
where {
  ?x wdt:P31 wd:Q4830453 .
  ?x wdt:P17 ?country .
  ?country wdt:P463 wd:Q458 .
  ?x wdt:P355*/wdt:P17|wdt:P1830*/wdt:P17 ?countries .
  ?countries wdt:P463 wd:Q458 .
  filter(?countries!=?country)
  ?x <http://schema.org/name> ?name .
} group by ?name
order by desc (?tot)
limit 50
"""
# There are two Vodafone entries now, but I guess it depends on the level of granularity at which I want to stop
print("Results")
run_query(queryString)

Results
[('name', 'Volkswagen Group'), ('tot', '14')]
[('name', 'Porsche Automobil Holding SE'), ('tot', '14')]
[('name', 'Vodafone'), ('tot', '14')]
[('name', 'Hannoversche Beteiligungsgesellschaft'), ('tot', '14')]
[('name', 'Lycamobile'), ('tot', '13')]
[('name', 'Siemens'), ('tot', '11')]
[('name', 'Nokia'), ('tot', '11')]
[('name', 'Carlsberg Group'), ('tot', '11')]
[('name', 'Deutsche Telekom'), ('tot', '10')]
[('name', 'Foyer S.A.'), ('tot', '9')]
[('name', 'Energetický a průmyslový holding'), ('tot', '8')]
[('name', 'NRJ Group'), ('tot', '8')]
[('name', 'Vinci'), ('tot', '8')]
[('name', 'ArcelorMittal'), ('tot', '8')]
[('name', 'RELX Group'), ('tot', '8')]
[('name', 'Phoenix Pharmahandel'), ('tot', '7')]
[('name', 'Anheuser-Busch InBev'), ('tot', '7')]
[('name', 'BNP Paribas'), ('tot', '7')]
[('name', 'GMV Innovating Solutions'), ('tot', '7')]
[('name', 'Vivendi'), ('tot', '7')]
[('name', 'Vodafone Germany'), ('tot', '7')]
[('name', 'Unibail Rodamco Westfield'), ('tot', '7')]
[

50

There is a general "Vodafone" but also a "Vodafone Germany"

In [61]:
# I can try to require that the companies shown have no parent organization
queryString = """
select ?name (count(distinct ?countries) as ?tot)
where {
  ?x wdt:P31 wd:Q4830453 .
  ?x wdt:P17 ?country .
  ?country wdt:P463 wd:Q458 .
  ?x wdt:P355*/wdt:P17|wdt:P1830*/wdt:P17 ?countries .
  ?countries wdt:P463 wd:Q458 .
  filter(?countries!=?country)
  filter not exists { ?x wdt:P749 ?zzz}
  ?x <http://schema.org/name> ?name .
} group by ?name
order by desc (?tot)
limit 50
"""
# nice
print("Results")
run_query(queryString)

Results
[('name', 'Volkswagen Group'), ('tot', '14')]
[('name', 'Porsche Automobil Holding SE'), ('tot', '14')]
[('name', 'Vodafone'), ('tot', '14')]
[('name', 'Hannoversche Beteiligungsgesellschaft'), ('tot', '14')]
[('name', 'Lycamobile'), ('tot', '13')]
[('name', 'Siemens'), ('tot', '11')]
[('name', 'Nokia'), ('tot', '11')]
[('name', 'Carlsberg Group'), ('tot', '11')]
[('name', 'Deutsche Telekom'), ('tot', '10')]
[('name', 'Foyer S.A.'), ('tot', '9')]
[('name', 'Energetický a průmyslový holding'), ('tot', '8')]
[('name', 'NRJ Group'), ('tot', '8')]
[('name', 'Vinci'), ('tot', '8')]
[('name', 'ArcelorMittal'), ('tot', '8')]
[('name', 'RELX Group'), ('tot', '8')]
[('name', 'Phoenix Pharmahandel'), ('tot', '7')]
[('name', 'Anheuser-Busch InBev'), ('tot', '7')]
[('name', 'BNP Paribas'), ('tot', '7')]
[('name', 'GMV Innovating Solutions'), ('tot', '7')]
[('name', 'Vivendi'), ('tot', '7')]
[('name', 'Unibail Rodamco Westfield'), ('tot', '7')]
[('name', 'Liberty Global'), ('tot', '7')]
[('

50

In [7]:
# Same query (companies with no parent organization), now also showing the company's country
queryString = """

    select  ?name ?nameC (count(distinct ?countries) as ?tot)
    where {
      ?x wdt:P31 wd:Q4830453 .
      ?x wdt:P17 ?country .
      ?country wdt:P463 wd:Q458 .
      ?x wdt:P355*/wdt:P17|wdt:P1830*/wdt:P17 ?countries .
      ?countries wdt:P463 wd:Q458 .
      filter(?countries!=?country)
      filter not exists { ?x wdt:P749 ?zzz}
      ?x <http://schema.org/name> ?name .
      ?country <http://schema.org/name> ?nameC .
    } group by ?name ?nameC
    order by desc (?tot)
    limit 50

"""
# nice
print("Results")
run_query(queryString)

Results
[('name', 'Volkswagen Group'), ('nameC', 'Germany'), ('tot', '14')]
[('name', 'Vodafone'), ('nameC', 'United Kingdom'), ('tot', '14')]
[('name', 'Hannoversche Beteiligungsgesellschaft'), ('nameC', 'Germany'), ('tot', '14')]
[('name', 'Porsche Automobil Holding SE'), ('nameC', 'Germany'), ('tot', '14')]
[('name', 'Lycamobile'), ('nameC', 'Portugal'), ('tot', '12')]
[('name', 'Lycamobile'), ('nameC', 'United Kingdom'), ('tot', '12')]
[('name', 'Lycamobile'), ('nameC', 'France'), ('tot', '12')]
[('name', 'Lycamobile'), ('nameC', 'Romania'), ('tot', '12')]
[('name', 'Lycamobile'), ('nameC', 'Denmark'), ('tot', '12')]
[('name', 'Lycamobile'), ('nameC', 'Italy'), ('tot', '12')]
[('name', 'Lycamobile'), ('nameC', 'Ireland'), ('tot', '12')]
[('name', 'Lycamobile'), ('nameC', 'Austria'), ('tot', '12')]
[('name', 'Lycamobile'), ('nameC', 'Spain'), ('tot', '12')]
[('name', 'Lycamobile'), ('nameC', 'Sweden'), ('tot', '12')]
[('name', 'Lycamobile'), ('nameC', 'Poland'), ('tot', '12')]
[('na

50
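Lycamobile now appears once per home country with 12 instead of 13. One plausible reading, not verified here, is that the entity carries several `P17` values, so grouping by `?name ?nameC` produces one row per home country and each row excludes only that country. The sketch below reproduces the effect on made-up data; the sets `home` and `present` are hypothetical.

```python
# Hypothetical data: countries attached to the company itself (P17)
home = {"UK", "FR", "PT"}
# Hypothetical data: countries reached through subsidiaries / owned entities
present = {"UK", "FR", "PT", "ES"}

# Join corresponding to: filter(?countries != ?country)
pairs = [(h, c) for h in home for c in present if h != c]

# group by ?name only: distinct ?countries over all joined rows
per_name = len({c for _, c in pairs})

# group by ?name ?nameC: one group per home country
per_home = {h: len({c for hh, c in pairs if hh == h}) for h in home}

print(per_name)                  # 4
print(sorted(per_home.items()))  # [('FR', 3), ('PT', 3), ('UK', 3)]
```

With a single home country per company both groupings agree, which is what a single-valued location property would give.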

In [69]:
# Now it also works with the countries, but apparently P17 is not the right property for the company's own country.
# Looking back, P159 (headquarters location) might be what I need
queryString = """

    select  ?name ?nameC (count(distinct ?countries) as ?tot)
    where {
      ?x wdt:P31 wd:Q4830453 .
      ?x wdt:P159/wdt:P17 ?country .
      ?country wdt:P463 wd:Q458 .
      ?x wdt:P355*/wdt:P17|wdt:P1830*/wdt:P17 ?countries .
      ?countries wdt:P463 wd:Q458 .
      filter(?countries!=?country)
      filter not exists { ?x wdt:P749 ?zzz}
      ?x <http://schema.org/name> ?name .
      ?country <http://schema.org/name> ?nameC .
    } group by ?name ?nameC
    order by desc (?tot)
    limit 50

"""
# nice
print("Results")
run_query(queryString)

Results
[('name', 'Volkswagen Group'), ('nameC', 'Germany'), ('tot', '14')]
[('name', 'Vodafone'), ('nameC', 'United Kingdom'), ('tot', '14')]
[('name', 'Hannoversche Beteiligungsgesellschaft'), ('nameC', 'Germany'), ('tot', '14')]
[('name', 'Porsche Automobil Holding SE'), ('nameC', 'Germany'), ('tot', '14')]
[('name', 'Lycamobile'), ('nameC', 'United Kingdom'), ('tot', '12')]
[('name', 'Siemens'), ('nameC', 'Germany'), ('tot', '11')]
[('name', 'Nokia'), ('nameC', 'Finland'), ('tot', '11')]
[('name', 'Deutsche Telekom'), ('nameC', 'Germany'), ('tot', '10')]
[('name', 'Heineken'), ('nameC', 'Kingdom of the Netherlands'), ('tot', '10')]
[('name', 'PPF Group'), ('nameC', 'Kingdom of the Netherlands'), ('tot', '9')]
[('name', 'Foyer S.A.'), ('nameC', 'Luxembourg'), ('tot', '8')]
[('name', 'RELX Group'), ('nameC', 'Kingdom of the Netherlands'), ('tot', '8')]
[('name', 'Tyco International'), ('nameC', 'Ireland'), ('tot', '8')]
[('name', 'LeasePlan Corporation'), ('nameC', 'Kingdom of the Ne

50

In [75]:
# Group by states
queryString = """
select (max(?tot) as ?max) ?nameC where{
    select  ?name ?nameC (count(distinct ?countries) as ?tot)
    where {
      ?x wdt:P31 wd:Q4830453 .
      ?x wdt:P159/wdt:P17 ?country .
      ?country wdt:P463 wd:Q458 .
      ?x wdt:P355*/wdt:P17|wdt:P1830*/wdt:P17 ?countries .
      ?countries wdt:P463 wd:Q458 .
      filter(?countries!=?country)
      filter not exists { ?x wdt:P749 ?zzz}
      ?x <http://schema.org/name> ?name .
      ?country <http://schema.org/name> ?nameC .
    } group by ?name ?nameC
    order by desc (?tot)
    limit 100
} group by ?nameC 
order by desc (?max)
"""
# nice
print("Results")
run_query(queryString)

Results
[('max', '14'), ('nameC', 'United Kingdom')]
[('max', '14'), ('nameC', 'Germany')]
[('max', '11'), ('nameC', 'Finland')]
[('max', '10'), ('nameC', 'Kingdom of the Netherlands')]
[('max', '8'), ('nameC', 'Czech Republic')]
[('max', '8'), ('nameC', 'France')]
[('max', '8'), ('nameC', 'Luxembourg')]
[('max', '8'), ('nameC', 'Ireland')]
[('max', '7'), ('nameC', 'Belgium')]
[('max', '6'), ('nameC', 'Spain')]
[('max', '6'), ('nameC', 'Italy')]
[('max', '6'), ('nameC', 'Sweden')]
[('max', '5'), ('nameC', 'Hungary')]
[('max', '5'), ('nameC', 'Austria')]
[('max', '4'), ('nameC', 'Cyprus')]


15
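Only 15 countries come back here. A likely cause is the inner `limit 100`: it cuts the subquery result before the outer `group by` runs, so countries whose best company ranks below the cut disappear. A minimal sketch with made-up rows (the company names `A` to `E` and the helper `max_per_country` are hypothetical):

```python
# Hypothetical (company, country, tot) rows, already sorted by tot descending
rows = [
    ("A", "Germany", 14),
    ("B", "United Kingdom", 14),
    ("C", "Germany", 11),
    ("D", "Finland", 11),
    ("E", "Cyprus", 1),
]

def max_per_country(rows, limit=None):
    """Apply an optional LIMIT first, then take max(tot) per country."""
    kept = rows if limit is None else rows[:limit]
    result = {}
    for _, country, tot in kept:
        result[country] = max(tot, result.get(country, 0))
    return result

truncated = max_per_country(rows, limit=3)
full = max_per_country(rows)

# Countries lost because of the inner limit
print(sorted(set(full) - set(truncated)))  # ['Cyprus', 'Finland']
```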

Finally putting everything together

In [82]:
# ANSWER to 5.3
queryString = """
select (max(?tot) as ?max) sample(?name) ?nameC where{
    {select  ?name ?nameC (count(distinct ?countries) as ?tot)
    where {
      ?x wdt:P31 wd:Q4830453 .
      ?x wdt:P159/wdt:P17 ?country .
      ?country wdt:P463 wd:Q458 .
      ?x wdt:P355*/wdt:P17|wdt:P1830*/wdt:P17 ?countries .
      ?countries wdt:P463 wd:Q458 .
      filter(?countries!=?country)
      filter not exists { ?x wdt:P749 ?zzz}
      ?x <http://schema.org/name> ?name .
      ?country <http://schema.org/name> ?nameC .
    } group by ?name ?nameC
    order by desc (?tot)
}
} group by ?nameC 
order by desc (?max)
"""
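# nice (caveat: sample(?name) returns an arbitrary company of each group,
# so it is not guaranteed to be the one that reaches ?max)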
print("Results")
run_query(queryString)

Results
[('max', '14'), ('callret-1', 'Vodafone'), ('nameC', 'United Kingdom')]
[('max', '14'), ('callret-1', 'Volkswagen Group'), ('nameC', 'Germany')]
[('max', '11'), ('callret-1', 'Nokia'), ('nameC', 'Finland')]
[('max', '10'), ('callret-1', 'Heineken'), ('nameC', 'Kingdom of the Netherlands')]
[('max', '8'), ('callret-1', 'Energetický a průmyslový holding'), ('nameC', 'Czech Republic')]
[('max', '8'), ('callret-1', 'Tyco International'), ('nameC', 'Ireland')]
[('max', '8'), ('callret-1', 'Foyer S.A.'), ('nameC', 'Luxembourg')]
[('max', '8'), ('callret-1', 'Vinci'), ('nameC', 'France')]
[('max', '7'), ('callret-1', 'Anheuser-Busch InBev'), ('nameC', 'Belgium')]
[('max', '6'), ('callret-1', 'Embracer Group'), ('nameC', 'Sweden')]
[('max', '6'), ('callret-1', 'GMV Innovating Solutions'), ('nameC', 'Spain')]
[('max', '6'), ('callret-1', 'Assicurazioni Generali'), ('nameC', 'Italy')]
[('max', '5'), ('callret-1', 'Müller'), ('nameC', 'Austria')]
[('max', '5'), ('callret-1', 'MOL Group'),

27
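In SPARQL, `sample(?name)` may return any member of the group, so the name shown next to `max(?tot)` is not guaranteed to belong to the company that reaches that maximum. A hedged alternative is to keep the per-company rows of the inner query and pick the top company per country in Python. The helper `top_per_country` is hypothetical; the rows are copied from the per-company output earlier in this section.

```python
# (company, country, tot) rows taken from the per-company results above
rows = [
    ("Volkswagen Group", "Germany", 14),
    ("Vodafone", "United Kingdom", 14),
    ("Lycamobile", "United Kingdom", 12),
    ("Siemens", "Germany", 11),
    ("Nokia", "Finland", 11),
]

def top_per_country(rows):
    """Return {country: (company, tot)} keeping the highest tot per country.

    Ties keep the first company encountered.
    """
    best = {}
    for name, country, tot in rows:
        if country not in best or tot > best[country][1]:
            best[country] = (name, tot)
    return best

for country, (name, tot) in sorted(top_per_country(rows).items()):
    print(country, name, tot)
```

To use it with the endpoint, `run_query` would have to return the bindings instead of only printing them; that change is an assumption and is not made in this notebook.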

***

# Task 5.4

*What can we say about industry sectors in various countries?*

Beyond what has been analyzed so far, I think it is interesting to look at which industry most companies in each country belong to.

In [8]:
# ANSWER to 7: most common industry in each state
# (note: ?countries below actually holds the industry, P452)
queryString = """
select (max(?tot) as ?max) sample(?name) ?nameC where{
    {select  ?name ?nameC (count(distinct ?x) as ?tot)
    where {
      ?x wdt:P31 wd:Q4830453 .
      ?x wdt:P159*/wdt:P17 ?country .
      ?country wdt:P463 wd:Q458 .
      ?x wdt:P355/wdt:P452|wdt:P1830*/wdt:P452 ?countries .
      filter not exists { ?x wdt:P749 ?zzz}
      ?countries <http://schema.org/name> ?name .
      ?country <http://schema.org/name> ?nameC .
    } group by ?name ?nameC
    order by desc (?tot)
}
} group by ?nameC 
order by desc (?max)
"""
# nice
print("Results")
run_query(queryString)

Results
[('max', '478'), ('callret-1', 'Crop production'), ('nameC', 'United Kingdom')]
[('max', '244'), ('callret-1', 'food and tobacco industry'), ('nameC', 'Germany')]
[('max', '91'), ('callret-1', 'textile industry'), ('nameC', 'Belgium')]
[('max', '88'), ('callret-1', 'Financial service activities, except insurance and pension funding'), ('nameC', 'France')]
[('max', '48'), ('callret-1', 'automotive industry'), ('nameC', 'Italy')]
[('max', '36'), ('callret-1', 'Crop production'), ('nameC', 'Kingdom of the Netherlands')]
[('max', '28'), ('callret-1', 'retail'), ('nameC', 'Poland')]
[('max', '27'), ('callret-1', 'Finance'), ('nameC', 'Czech Republic')]
[('max', '27'), ('callret-1', 'financial services'), ('nameC', 'Spain')]
[('max', '24'), ('callret-1', 'retail'), ('nameC', 'Ireland')]
[('max', '21'), ('callret-1', 'retail'), ('nameC', 'Sweden')]
[('max', '21'), ('callret-1', 'retail'), ('nameC', 'Denmark')]
[('max', '15'), ('callret-1', 'Financial service activities, except insuran

28

There are 28 results rather than 27 because the UK is still considered. Before, I had only 27 because Romania was missing. This means that there are no independent Romanian companies with a presence in other states.