# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale, open-domain KG: [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page).

An exploratory workload is composed of a set of queries, where each query builds on the information obtained previously.

An exploratory workload starts with a usually vague, open-ended question, and does not assume that the person issuing the workload has a clear understanding of the data contained in the target database or of its structure.

Remember that:

1. All the queries must run in the Python notebook.
2. You can use classes and properties only if you find them via a SPARQL query that is present in the notebook.
3. Do not delete queries that turned out to be useless. Keep everything that is syntactically valid.

```
?p sc:name ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
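
As a small illustration of how this BGP can be used, the sketch below builds a lookup query for the name of a single IRI. The function name `name_lookup_query` is hypothetical and not part of the notebook; it assumes the `wd:`, `wdt:` and `sc:` prefixes are prepended by the setup cell.

```python
def name_lookup_query(iri: str) -> str:
    """Return a SELECT query asking for the sc:name of one prefixed IRI.

    `iri` is expected in prefixed form, e.g. 'wdt:P31' or 'wd:Q16'.
    """
    return f"SELECT ?name WHERE {{ {iri} sc:name ?name . }}"


# Example: the query text for the 'instance of' property.
example_query = name_lookup_query("wdt:P31")
print(example_query)
```

The resulting string can be passed to `run_query` once the setup cell has been executed.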
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-e5a4ae57ad-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
       results = sparql.query()
       json_results = results.convert()
       if len(json_results['results']['bindings'])==0:
          print("Empty")
          return 0
    
       for bindings in json_results['results']['bindings']:
          print( [ (var, value['value'])  for var, value in bindings.items() ] )

       return len(json_results['results']['bindings'])

    except Exception as e :
        print("The operation failed", e)
    
# ASK queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)
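
`run_query` only prints the bindings and returns their count. When the rows are needed for further processing, a helper like the one below may be convenient. This is a minimal sketch: `bindings_to_rows` is a hypothetical name, and the sample document is hand-written in the standard SPARQL JSON results layout rather than copied from the endpoint.

```python
from typing import Any, Dict, List


def bindings_to_rows(json_results: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten a SPARQL JSON result into one {variable: value} dict per row."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in json_results["results"]["bindings"]
    ]


# Hand-written sample (assumption: same shape as what results.convert() returns).
sample = {
    "head": {"vars": ["pr", "name"]},
    "results": {
        "bindings": [
            {
                "pr": {"type": "uri", "value": "http://www.wikidata.org/prop/direct/P1128"},
                "name": {"type": "literal", "value": "employees"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
print(rows)
```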


# Companies Workflow Series ("IT Companies explorative search") 

Consider the following exploratory information need:

> Compare companies across different sectors in U.K., U.S., and Canada, consider number of employees, companies owned or acquired, and revenue or assets

## Useful URIs for the current workflow


The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P279`    | subclass      | predicate |
| `wdt:P17`      | country       | predicate |
| `wd:Q4830453`  | Business      | node      |
| `wd:Q13977`    | Bloomberg L.P.| node |
| `wd:Q502121`   | BlackBerry    | node |
| `wd:Q16`        | Canada        | node |
| `wd:Q145`      | U.K.          | node |
| `wd:Q30`       | U.S.A.        | node |


Also consider

```
?p wdt:P17 wd:Q16  . 
?p wdt:P31 wd:Q4830453  . 
```

is the BGP to retrieve all **Canadian businesses**.
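
Since the information need covers three countries, the same BGP can be parameterized by country. The sketch below only builds the query strings; `COUNTRIES` and `businesses_in_country_query` are hypothetical names, and the IRIs are the ones listed in the table above.

```python
# Country IRIs taken from the "Useful URIs" table.
COUNTRIES = {"Canada": "wd:Q16", "U.K.": "wd:Q145", "U.S.A.": "wd:Q30"}


def businesses_in_country_query(country_iri: str) -> str:
    """Build a query counting the direct instances of Business in one country."""
    return (
        "SELECT (COUNT(DISTINCT ?p) AS ?businesses)\n"
        "WHERE {\n"
        f"  ?p wdt:P17 {country_iri} .\n"
        "  ?p wdt:P31 wd:Q4830453 .\n"
        "}"
    )


queries = {label: businesses_in_country_query(iri) for label, iri in COUNTRIES.items()}
print(queries["U.K."])
```

Each string in `queries` can then be handed to `run_query`, one country at a time.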

## Workload Goals

1. Identify the BGP for obtaining number of employees of a company and other relevant numerical attributes

2. Identify the BGP to retrieve all companies owned by a company

3. Is there some company that owns companies in other countries?

4. Companies have different 'legal forms', compare the number of companies divided in different legal forms

5. Analyze the number of employees  and other relevant numeric attributes
 
   5.1 What are the top-10 companies for a given attribute?
   
   5.2 For which companies is defined some form of income or market capitalization or total assets? What is the min, max, and average in each category and country?
   
   5.3 Which business in each country owns more businesses in other countries?


In [2]:
# start your workflow here

In [4]:
queryString = """
SELECT COUNT(*)
WHERE { 

?p wdt:P17 wd:Q16  . 
?p wdt:P31 wd:Q4830453  . 
} 
"""

print("Results")
run_query(queryString)

Results
[('callret-0', '2482')]


1

## Task 1
Identify the BGP for obtaining number of employees of a company and other relevant numerical attributes

Starting from the companies, I look for the relations whose object is numeric.

In [4]:
queryString = """
SELECT DISTINCT ?pr ?name
WHERE { 

?p wdt:P31 wd:Q4830453 ;
    ?pr ?o.
?pr sc:name ?name .
FILTER(isNumeric(?o))
}
"""

print("Results")
run_query(queryString)

Results
[('pr', 'http://www.wikidata.org/prop/direct/P1128'), ('name', 'employees')]
[('pr', 'http://www.wikidata.org/prop/direct/P1661'), ('name', 'Alexa rank')]
[('pr', 'http://www.wikidata.org/prop/direct/P2139'), ('name', 'total revenue')]
[('pr', 'http://www.wikidata.org/prop/direct/P2295'), ('name', 'net profit')]
[('pr', 'http://www.wikidata.org/prop/direct/P2403'), ('name', 'total assets')]
[('pr', 'http://www.wikidata.org/prop/direct/P3362'), ('name', 'operating income')]
[('pr', 'http://www.wikidata.org/prop/direct/P8687'), ('name', 'social media followers')]
[('pr', 'http://www.wikidata.org/prop/direct/P2137'), ('name', 'total equity')]
[('pr', 'http://www.wikidata.org/prop/direct/P2138'), ('name', 'total liabilities')]
[('pr', 'http://www.wikidata.org/prop/direct/P8340'), ('name', 'estimated value')]
[('pr', 'http://www.wikidata.org/prop/direct/P9279'), ('name', 'Egapro gender equality index')]
[('pr', 'http://www.wikidata.org/prop/direct/P2043'), ('name', 'length')]
[('pr'

72

For example, we can retrieve employees, total revenue, net profit, market capitalization, and total assets for each company.

In [5]:
queryString = """
SELECT ?name ?employees ?revenues ?net_profit ?market_cap ?tot_assets
WHERE { 

?p wdt:P31 wd:Q4830453.
 
?p  wdt:P1128 ?employees;
    wdt:P2139 ?revenues;
    wdt:P2295 ?net_profit;
    wdt:P2226 ?market_cap;
    wdt:P2403 ?tot_assets.

?p sc:name ?name .
}
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('name', 'HHLA'), ('employees', '5528'), ('revenues', '1177700000'), ('net_profit', '105000000'), ('market_cap', '1239900000'), ('tot_assets', '2610019000')]
[('name', 'Vinci'), ('employees', '222397'), ('revenues', '48053000000'), ('net_profit', '3408000000'), ('market_cap', '60000000000'), ('tot_assets', '91102000000')]
[('name', 'ČEZ Group'), ('employees', '31385'), ('revenues', '184486000000'), ('net_profit', '10500000000'), ('market_cap', '265300000000'), ('tot_assets', '707443000000')]
[('name', 'ČEZ Group'), ('employees', '31385'), ('revenues', '184486000000'), ('net_profit', '10500000000'), ('market_cap', '286200000000'), ('tot_assets', '707443000000')]
[('name', 'S.P. Korolev Rocket and Space Corporation Energia'), ('employees', '7791'), ('revenues', '42373811000'), ('net_profit', '1232438000'), ('market_cap', '6858000000'), ('tot_assets', '114995477000')]
[('name', 'Telefónica'), ('employees', '121853'), ('revenues', '48693000000'), ('net_profit', '3950000000'), ('ma

20

The same company can appear in more than one row (for example ČEZ Group, which has two market capitalization values), probably because the data covers different years.
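
To see which companies carry more than one value for an attribute, the rows can be grouped by name on the client side. The sketch below is an illustration under assumptions: `values_per_company` is a hypothetical helper, and the three rows are copied from the output above and reduced to two columns.

```python
from collections import defaultdict

# Rows copied from the output of the previous query (name and market cap only).
rows = [
    {"name": "Vinci", "market_cap": "60000000000"},
    {"name": "ČEZ Group", "market_cap": "265300000000"},
    {"name": "ČEZ Group", "market_cap": "286200000000"},
]


def values_per_company(rows, attribute):
    """Collect the distinct values of `attribute` for each company name."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["name"]].add(row[attribute])
    return grouped


# Keep only the companies with more than one distinct value.
multi = {
    name: sorted(values)
    for name, values in values_per_company(rows, "market_cap").items()
    if len(values) > 1
}
print(multi)
```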

## Task 2
Identify the BGP to retrieve all companies owned by a company

Starting from the companies, I look for predicates whose name resembles "own".

In [6]:
queryString = """
SELECT DISTINCT ?pr ?name
WHERE { 

?p wdt:P31 wd:Q4830453.
?p ?pr ?o.

?pr sc:name ?name .
FILTER (!isLiteral(?o)).

FILTER (REGEX(?name,".*wn.*"))
}
ORDER BY (?name)
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('pr', 'http://www.wikidata.org/prop/direct/P4945'), ('name', 'download link')]
[('pr', 'http://www.wikidata.org/prop/direct/P127'), ('name', 'owned by')]
[('pr', 'http://www.wikidata.org/prop/direct/P1830'), ('name', 'owner of')]


3

`P127` (owned by) points from a company to its owner. It is probably more efficient to start from the owning company and follow `P1830` (owner of), which lists the entities owned by that company.

In [7]:
queryString = """
SELECT ?company (count(DISTINCT ?o) AS ?companies_owned)
WHERE { 

?p wdt:P31 wd:Q4830453.
?p wdt:P1830 ?o.

?p sc:name ?company .

}
GROUP BY ?p ?company
ORDER BY DESC (?companies_owned)
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('company', 'The Walt Disney Company'), ('companies_owned', '224')]
[('company', 'Google'), ('companies_owned', '136')]
[('company', 'EFE Trenes de Chile'), ('companies_owned', '133')]
[('company', 'Bell Media'), ('companies_owned', '125')]
[('company', 'Azienda Trasporti Milanesi'), ('companies_owned', '109')]
[('company', 'DSB'), ('companies_owned', '101')]
[('company', 'WarnerMedia'), ('companies_owned', '93')]
[('company', 'Microsoft'), ('companies_owned', '81')]
[('company', 'Sony'), ('companies_owned', '70')]
[('company', 'BBC'), ('companies_owned', '70')]
[('company', 'ATAC'), ('companies_owned', '69')]
[('company', 'Viacom'), ('companies_owned', '56')]
[('company', 'RAI'), ('companies_owned', '53')]
[('company', 'Budapesti Közlekedési Zrt.'), ('companies_owned', '53')]
[('company', 'AG für Verkehrswesen'), ('companies_owned', '51')]
[('company', 'Yahoo'), ('companies_owned', '47')]
[('company', 'Discovery Inc.'), ('companies_owned', '46')]
[('company', 'White Star Line

50
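
For a single known company, the same predicate can be used with the owner fixed. The sketch below builds such a query for `wd:Q13977` (Bloomberg L.P., from the table of given URIs). `owned_companies_query` and its `only_businesses` flag are hypothetical; the flag adds the optional restriction that the owned entity is itself an instance of Business, which the query above does not impose.

```python
def owned_companies_query(owner_iri: str, only_businesses: bool = False) -> str:
    """Build a query listing what `owner_iri` owns via wdt:P1830 (owner of)."""
    # Optional restriction: keep only owned entities typed as Business.
    type_filter = "  ?o wdt:P31 wd:Q4830453 .\n" if only_businesses else ""
    return (
        "SELECT DISTINCT ?o ?name\n"
        "WHERE {\n"
        f"  {owner_iri} wdt:P1830 ?o .\n"
        f"{type_filter}"
        "  ?o sc:name ?name .\n"
        "}"
    )


query = owned_companies_query("wd:Q13977", only_businesses=True)
print(query)
```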

## Task 3
Is there some company that owns companies in other countries?

Starting from a company and its country, I look for the companies it owns and their countries. I run an ASK query first.

In [8]:
queryString = """
ASK WHERE { 

?p wdt:P31 wd:Q4830453;
    wdt:P17 ?c;
    wdt:P1830 ?o.
?o wdt:P17 ?own_c.
FILTER(?c != ?own_c).

}
"""

print("Results")
run_ask_query(queryString)

Results


{'head': {'link': []}, 'boolean': True}
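
`run_ask_query` returns the raw JSON document, or `None` when the request fails. A minimal sketch for extracting the answer is shown below; `ask_to_bool` is a hypothetical helper, and the sample dictionary is the output printed above.

```python
from typing import Any, Dict, Optional


def ask_to_bool(result: Optional[Dict[str, Any]]) -> Optional[bool]:
    """Return the ASK answer, or None if the result is missing or malformed."""
    if not result or "boolean" not in result:
        return None
    return bool(result["boolean"])


# The document returned by the ASK query above.
sample = {"head": {"link": []}, "boolean": True}
print(ask_to_bool(sample))
```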

At least one company owns companies in other countries. Let us return the companies that own the most companies in other countries.

In [9]:
queryString = """
SELECT ?company (count(DISTINCT ?o) AS ?companies_owned)
WHERE { 

?p wdt:P31 wd:Q4830453;
    wdt:P17 ?c;
    wdt:P1830 ?o.
?o wdt:P17 ?own_c.
FILTER(?c != ?own_c).

?p sc:name ?company .

}
GROUP BY ?p ?company
ORDER BY DESC (?companies_owned)
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('company', 'The Walt Disney Company'), ('companies_owned', '66')]
[('company', 'Sony'), ('companies_owned', '29')]
[('company', 'AG für Verkehrswesen'), ('companies_owned', '25')]
[('company', 'Anheuser-Busch InBev'), ('companies_owned', '25')]
[('company', 'WarnerMedia'), ('companies_owned', '23')]
[('company', 'Viacom'), ('companies_owned', '20')]
[('company', 'Discovery Inc.'), ('companies_owned', '17')]
[('company', 'Heineken'), ('companies_owned', '16')]
[('company', 'LVMH'), ('companies_owned', '13')]
[('company', 'Daimler AG'), ('companies_owned', '13')]
[('company', 'Carlsberg Group'), ('companies_owned', '13')]
[('company', 'NexGen'), ('companies_owned', '13')]
[('company', 'Ford Motor Company'), ('companies_owned', '12')]
[('company', 'Modern Times Group'), ('companies_owned', '11')]
[('company', 'BNP Paribas'), ('companies_owned', '10')]
[('company', 'Unibail Rodamco Westfield'), ('companies_owned', '10')]
[('company', 'News Corporation'), ('companies_owned', '10')

50

## Task 4 
Companies have different 'legal forms', compare the number of companies divided in different legal forms

I look for a predicate named "legal form" or something similar.

In [10]:
queryString = """
SELECT DISTINCT ?pr ?name
WHERE { 

?p wdt:P31 wd:Q4830453.
?p ?pr ?o.

?pr sc:name ?name .

FILTER (REGEX(?name,".*egal.*"))
}
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('pr', 'http://www.wikidata.org/prop/direct/P1278'), ('name', 'Legal Entity Identifier')]
[('pr', 'http://www.wikidata.org/prop/direct/P1454'), ('name', 'legal form')]
[('pr', 'http://www.wikidata.org/prop/direct/P1031'), ('name', 'legal citation of this text')]


3

The predicate for the legal form is `P1454`.

In [11]:
queryString = """
SELECT DISTINCT ?legal_form (COUNT(DISTINCT ?p) AS ?companies)
WHERE { 

?p wdt:P31 wd:Q4830453;
    wdt:P1454 ?o.

?o sc:name ?legal_form .

}
GROUP BY ?legal_form
ORDER BY DESC (?companies)
LIMIT 100
"""

print("Results")
run_query(queryString)

Results
[('legal_form', 'společnost s ručením omezeným'), ('companies', '13059')]
[('legal_form', 'akciová společnost'), ('companies', '5551')]
[('legal_form', 'joint-stock company'), ('companies', '5168')]
[('legal_form', 'GmbH'), ('companies', '2823')]
[('legal_form', 'Spoločnosť s ručením obmedzeným'), ('companies', '2302')]
[('legal_form', 'Aktiengesellschaft'), ('companies', '1842')]
[('legal_form', 'public company'), ('companies', '1407')]
[('legal_form', 'aksjeselskap'), ('companies', '1312')]
[('legal_form', 'kabushiki gaisha'), ('companies', '1228')]
[('legal_form', 'S.A.'), ('companies', '1134')]
[('legal_form', 'GmbH & Co. KG'), ('companies', '1127')]
[('legal_form', 'private limited liability company'), ('companies', '1112')]
[('legal_form', 'privately held company'), ('companies', '968')]
[('legal_form', 'Gesellschaft mit beschränkter Haftung'), ('companies', '867')]
[('legal_form', 'akciová spoločnosť'), ('companies', '838')]
[('legal_form', 'open joint-stock company'), (

100

## Task 5
Analyze the number of employees and other relevant numeric attributes

#### 5.1 What are the top-10 companies for a given attribute?

Rerun the first query to have all numeric attributes. 

In [12]:
queryString = """
SELECT DISTINCT ?pr ?name
WHERE { 

?p wdt:P31 wd:Q4830453 ;
    ?pr ?o.
?pr sc:name ?name .
FILTER(isNumeric(?o))
}
"""

print("Results")
run_query(queryString)

Results
[('pr', 'http://www.wikidata.org/prop/direct/P1128'), ('name', 'employees')]
[('pr', 'http://www.wikidata.org/prop/direct/P1661'), ('name', 'Alexa rank')]
[('pr', 'http://www.wikidata.org/prop/direct/P2139'), ('name', 'total revenue')]
[('pr', 'http://www.wikidata.org/prop/direct/P2295'), ('name', 'net profit')]
[('pr', 'http://www.wikidata.org/prop/direct/P2403'), ('name', 'total assets')]
[('pr', 'http://www.wikidata.org/prop/direct/P3362'), ('name', 'operating income')]
[('pr', 'http://www.wikidata.org/prop/direct/P8687'), ('name', 'social media followers')]
[('pr', 'http://www.wikidata.org/prop/direct/P2137'), ('name', 'total equity')]
[('pr', 'http://www.wikidata.org/prop/direct/P2138'), ('name', 'total liabilities')]
[('pr', 'http://www.wikidata.org/prop/direct/P8340'), ('name', 'estimated value')]
[('pr', 'http://www.wikidata.org/prop/direct/P9279'), ('name', 'Egapro gender equality index')]
[('pr', 'http://www.wikidata.org/prop/direct/P2043'), ('name', 'length')]
[('pr'

72

Now we can query a single attribute at a time, sort the results in descending order, and limit the result set to 10, which answers question 5.1. I show examples for two numeric attributes (employees and total revenue); the same can be done with any of the predicates listed above.

In [13]:
queryString = """
SELECT ?name ?employees
WHERE { 

?p wdt:P31 wd:Q4830453;
   wdt:P1128 ?employees.

?p sc:name ?name .
FILTER (isNumeric(?employees))
}
ORDER BY DESC (?employees)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('name', 'Walmart'), ('employees', '2500000')]
[('name', 'China National Petroleum Corporation'), ('employees', '1589508')]
[('name', 'State Grid Corporation of China'), ('employees', '1581000')]
[('name', 'Amazon'), ('employees', '1298000')]
[('name', 'Randstad N.V.'), ('employees', '658580')]
[('name', 'Foxconn'), ('employees', '618460')]
[('name', 'G4S'), ('employees', '618260')]
[('name', 'RAO UES'), ('employees', '577000')]
[('name', 'DHL'), ('employees', '570000')]
[('name', 'Deutsche Post AG'), ('employees', '547459')]


10

In [14]:
queryString = """
SELECT ?name ?revenues
WHERE { 

?p wdt:P31 wd:Q4830453;
    wdt:P2139 ?revenues.

?p sc:name ?name .
}
ORDER BY DESC (?revenues)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('name', 'Samsung Electronics'), ('revenues', '201870000000000')]
[('name', 'Bank Rakyat Indonesia'), ('revenues', '111582000000000')]
[('name', 'RAO UES'), ('revenues', '40377000000000')]
[('name', 'Toyota'), ('revenues', '28403118000000')]
[('name', 'Indofood Agri Resources Limited'), ('revenues', '13650388000000')]
[('name', 'SoftBank'), ('revenues', '9158765000000')]
[('name', 'Hanjin Shipping'), ('revenues', '7669598000000')]
[('name', 'Panasonic Corporation'), ('revenues', '7343707000000')]
[('name', 'Gazprom'), ('revenues', '6321559000000')]
[('name', 'Yukos'), ('revenues', '5860000000000')]


10
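
The two queries above differ only in the property, so they can be generated from one template. The following is a sketch under assumptions: `top_n_query` is a hypothetical helper, and it applies the `isNumeric` filter to every attribute, which only the employees query above does.

```python
def top_n_query(property_iri: str, n: int = 10) -> str:
    """Build a top-n query over businesses for one numeric property."""
    return f"""
SELECT ?name ?value
WHERE {{
  ?p wdt:P31 wd:Q4830453 ;
     {property_iri} ?value .
  ?p sc:name ?name .
  FILTER (isNumeric(?value))
}}
ORDER BY DESC (?value)
LIMIT {int(n)}
"""


# Example: top-10 by number of employees, top-5 by total revenue.
employees_query = top_n_query("wdt:P1128")
revenue_query = top_n_query("wdt:P2139", 5)
print(employees_query)
```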

#### 5.2 For which companies is defined some form of income or market capitalization or total assets? What is the min, max, and average in each category and country?

From the queries about numeric attributes done before, I found some predicates useful to answer this query. The predicates are:
- `P2139`: total revenue;
- `P2295`: net profit;
- `P3362`: operating income;
- `P8340`: estimated value;
- `P2133`: total debt;
- `P2226`: market capitalization;
- `P2403`: total assets.

For each of these predicates, I run a query that shows the min, max, and average per country.

In [15]:
#query for the revenues
queryString = """
SELECT DISTINCT ?country (min(?revenues) AS ?min) (max(?revenues) AS ?max) (avg(?revenues) AS ?avg)
WHERE { 

?p wdt:P31 wd:Q4830453;
    wdt:P2139 ?revenues;
    wdt:P17 ?c.

?c sc:name ?country .
FILTER (isNumeric(?revenues))
}
GROUP BY ?c ?country
ORDER BY DESC (?avg)
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('country', 'Indonesia'), ('min', '111582000000000'), ('max', '111582000000000'), ('avg', '111582000000000')]
[('country', 'South Korea'), ('min', '700000000'), ('max', '201870000000000'), ('avg', '19408289289454.545454545454545')]
[('country', 'Singapore'), ('min', '102400000'), ('max', '13650388000000'), ('avg', '2739130932000')]
[('country', 'Japan'), ('min', '180000000'), ('max', '28403118000000'), ('avg', '2061081295925.925925925925926')]
[('country', 'Iraq'), ('min', '1361329000000'), ('max', '1361329000000'), ('avg', '1361329000000')]
[('country', 'Taiwan'), ('min', '234285354'), ('max', '4358733357000'), ('avg', '822597048764.857142857142857')]
[('country', 'Hong Kong'), ('min', '566497000000'), ('max', '566497000000'), ('avg', '566497000000')]
[('country', 'Nigeria'), ('min', '449800000000'), ('max', '449800000000'), ('avg', '449800000000')]
[('country', "People's Republic of China"), ('min', '80000000'), ('max', '2742779810000'), ('avg', '402983084099.816363636363636

20

In [16]:
#query for the net profit
queryString = """
SELECT DISTINCT ?country (min(?net) AS ?min) (max(?net) AS ?max) (avg(?net) AS ?avg)
WHERE { 

?p wdt:P31 wd:Q4830453;
    wdt:P2295 ?net;
    wdt:P17 ?c.

?c sc:name ?country .
FILTER (isNumeric(?net))
}
GROUP BY ?c ?country
ORDER BY DESC (?avg)
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('country', 'South Korea'), ('min', '182000000'), ('max', '22730000000000'), ('avg', '5691187331750')]
[('country', 'Saudi Arabia'), ('min', '330693000000'), ('max', '330693000000'), ('avg', '330693000000')]
[('country', 'Japan'), ('min', '-5114000000'), ('max', '2312694000000'), ('avg', '247274351941.778111111111111')]
[('country', 'Iraq'), ('min', '243618000000'), ('max', '243618000000'), ('avg', '243618000000')]
[('country', 'Hong Kong'), ('min', '78188000000'), ('max', '78188000000'), ('avg', '78188000000')]
[('country', "People's Republic of China"), ('min', '38668000'), ('max', '317685000000'), ('avg', '51582444678.618965517241379')]
[('country', 'Crimean Peninsula'), ('min', '37611743000'), ('max', '37611743000'), ('avg', '37611743000')]
[('country', 'Taiwan'), ('min', '2632565'), ('max', '151357164000'), ('avg', '27938595094.166666666666667')]
[('country', 'Cayman Islands'), ('min', '18881000'), ('max', '31400000000'), ('avg', '15709440500')]
[('country', 'Russia'), ('

20

In [17]:
#query for the operating income
queryString = """
SELECT DISTINCT ?country (min(?op_income) AS ?min) (max(?op_income) AS ?max) (avg(?op_income) AS ?avg)
WHERE { 

?p wdt:P31 wd:Q4830453;
    wdt:P3362 ?op_income;
    wdt:P17 ?c.

?c sc:name ?country .
FILTER (isNumeric(?op_income))
}
GROUP BY ?c ?country
ORDER BY DESC (?avg)
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('country', 'South Korea'), ('min', '97000000'), ('max', '29240000000000'), ('avg', '7348055899750')]
[('country', 'Saudi Arabia'), ('min', '674871000000'), ('max', '674871000000'), ('avg', '674871000000')]
[('country', 'Japan'), ('min', '-5104000000'), ('max', '2853971000000'), ('avg', '613255182222.222222222222222')]
[('country', 'Hungary'), ('min', '307905000000'), ('max', '307905000000'), ('avg', '307905000000')]
[('country', "People's Republic of China"), ('min', '33685000'), ('max', '390822000000'), ('avg', '71484820490.857142857142857')]
[('country', 'Russia'), ('min', '-48465118000'), ('max', '1702100000000'), ('avg', '51122627077.777777777777778')]
[('country', 'Taiwan'), ('min', '24716786'), ('max', '174939501000'), ('avg', '50573411196.5')]
[('country', 'Cayman Islands'), ('min', '49985000'), ('max', '69712000000'), ('avg', '34880992500')]
[('country', 'Australia'), ('min', '15996000000'), ('max', '15996000000'), ('avg', '15996000000')]
[('country', 'Turkey'), ('min

20

In [18]:
#query for the estimated value
queryString = """
SELECT DISTINCT ?country (min(?value) AS ?min) (max(?value) AS ?max) (avg(?value) AS ?avg)
WHERE { 

?p wdt:P31 wd:Q4830453;
    wdt:P8340 ?value;
    wdt:P17 ?c.

?c sc:name ?country .
FILTER (isNumeric(?value))
}
GROUP BY ?c ?country
ORDER BY DESC (?avg)
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('country', "People's Republic of China"), ('min', '2000000000'), ('max', '105000000000'), ('avg', '21042857142.857142857142857')]
[('country', 'United States of America'), ('min', '1000000000'), ('max', '105000000000'), ('avg', '5858108108.108108108108108')]
[('country', 'Sweden'), ('min', '2500000000'), ('max', '8500000000'), ('avg', '5500000000')]
[('country', 'India'), ('min', '1000000000'), ('max', '15000000000'), ('avg', '3914285714.285714285714286')]
[('country', 'Netherlands'), ('min', '2300000000'), ('max', '2300000000'), ('avg', '2300000000')]
[('country', 'Germany'), ('min', '1000000000'), ('max', '3100000000'), ('avg', '1880000000')]
[('country', 'Singapore'), ('min', '1000000000'), ('max', '2500000000'), ('avg', '1600000000')]
[('country', 'Israel'), ('min', '1500000000'), ('max', '1500000000'), ('avg', '1500000000')]
[('country', 'Malaysia'), ('min', '1300000000'), ('max', '1600000000'), ('avg', '1450000000')]
[('country', 'United Kingdom'), ('min', '1000000000')

17

In [19]:
#query for the total debt
queryString = """
SELECT DISTINCT ?country (min(?debt) AS ?min) (max(?debt) AS ?max) (avg(?debt) AS ?avg)
WHERE { 

?p wdt:P31 wd:Q4830453;
    wdt:P2133 ?debt;
    wdt:P17 ?c.

?c sc:name ?country .
FILTER (isNumeric(?debt))
}
GROUP BY ?c ?country
ORDER BY DESC (?avg)
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('country', 'United States of America'), ('min', '1683900000'), ('max', '33717000000'), ('avg', '15571521428.571428571428571')]
[('country', 'France'), ('min', '1622100'), ('max', '14001000000'), ('avg', '5569155525')]
[('country', 'Spain'), ('min', '442000000'), ('max', '442000000'), ('avg', '442000000')]


3

In [20]:
#query for the market capitalization
queryString = """
SELECT DISTINCT ?country (min(?cap) AS ?min) (max(?cap) AS ?max) (avg(?cap) AS ?avg)
WHERE { 

?p wdt:P31 wd:Q4830453;
    wdt:P2226 ?cap;
    wdt:P17 ?c.

?c sc:name ?country .
FILTER (isNumeric(?cap))
}
GROUP BY ?c ?country
ORDER BY DESC (?avg)
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('country', 'Saudi Arabia'), ('min', '2458000000000'), ('max', '2458000000000'), ('avg', '2458000000000')]
[('country', 'Russia'), ('min', '6858000000'), ('max', '3100000000000'), ('avg', '731755800000')]
[('country', 'United States of America'), ('min', '500000'), ('max', '2470000000000'), ('avg', '411943146923')]
[('country', "People's Republic of China"), ('min', '57703130'), ('max', '486000000000'), ('avg', '243028851565')]
[('country', 'Finland'), ('min', '26257000000'), ('max', '415000000000'), ('avg', '220628500000')]
[('country', 'Switzerland'), ('min', '172750000000'), ('max', '237363000000'), ('avg', '205056500000')]
[('country', 'Japan'), ('min', '8414000000'), ('max', '500000000000'), ('avg', '176578250000')]
[('country', 'Denmark'), ('min', '25256000000'), ('max', '289600000000'), ('avg', '142452000000')]
[('country', 'Czech Republic'), ('min', '7100000000'), ('max', '286200000000'), ('avg', '138800000000')]
[('country', 'United Kingdom'), ('min', '47000000000'), 

20

In [21]:
#query for the total assets
queryString = """
SELECT DISTINCT ?country (min(?assets) AS ?min) (max(?assets) AS ?max) (avg(?assets) AS ?avg)
WHERE { 

?p wdt:P31 wd:Q4830453;
    wdt:P2403 ?assets;
    wdt:P17 ?c.

?c sc:name ?country .
FILTER (isNumeric(?assets))
}
GROUP BY ?c ?country
ORDER BY DESC (?avg)
LIMIT 20
"""

print("Results")
run_query(queryString)

Results
[('country', 'South Korea'), ('min', '6450000000'), ('max', '262174300000000'), ('avg', '53747392301544.4')]
[('country', 'Japan'), ('min', '1412070000'), ('max', '47427597000000'), ('avg', '10170190707000')]
[('country', 'Hong Kong'), ('min', '8289924000000'), ('max', '8289924000000'), ('avg', '8289924000000')]
[('country', "People's Republic of China"), ('min', '3727000000'), ('max', '33345058000000'), ('avg', '4439947726959.48875')]
[('country', 'Hungary'), ('min', '720000000000'), ('max', '4103786000000'), ('avg', '2411893000000')]
[('country', 'Russia'), ('min', '50858000'), ('max', '36016000000000'), ('avg', '1292729712609.523809523809524')]
[('country', 'Cayman Islands'), ('min', '1496525000'), ('max', '2301159000000'), ('avg', '1151327762500')]
[('country', 'Norway'), ('min', '42988000'), ('max', '2793294000000'), ('avg', '740150872000')]
[('country', 'Croatia'), ('min', '675233145000'), ('max', '675233145000'), ('avg', '675233145000')]
[('country', 'Finland'), ('min', 

20
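
The seven queries of this step share one shape, so they can be produced from a single template. The sketch below also accepts an optional list of countries, which could be used to restrict the statistics to Canada, the U.K. and the U.S.A. as in the information need. `ATTRIBUTES` and `stats_query` are hypothetical names; the property IRIs are the ones listed at the start of 5.2, and the `VALUES` restriction is an addition that the queries above do not contain.

```python
# Predicates listed at the beginning of step 5.2.
ATTRIBUTES = {
    "total revenue": "wdt:P2139",
    "net profit": "wdt:P2295",
    "operating income": "wdt:P3362",
    "estimated value": "wdt:P8340",
    "total debt": "wdt:P2133",
    "market capitalization": "wdt:P2226",
    "total assets": "wdt:P2403",
}


def stats_query(property_iri, countries=None, limit=20):
    """Build a min/max/avg per country query for one numeric property."""
    values = ""
    if countries:
        # Optional restriction to a fixed set of country IRIs.
        values = "  VALUES ?c { " + " ".join(countries) + " }\n"
    return (
        "SELECT ?country (MIN(?v) AS ?min) (MAX(?v) AS ?max) (AVG(?v) AS ?avg)\n"
        "WHERE {\n"
        f"{values}"
        "  ?p wdt:P31 wd:Q4830453 ;\n"
        f"     {property_iri} ?v ;\n"
        "     wdt:P17 ?c .\n"
        "  ?c sc:name ?country .\n"
        "  FILTER (isNumeric(?v))\n"
        "}\n"
        "GROUP BY ?c ?country\n"
        "ORDER BY DESC (?avg)\n"
        f"LIMIT {int(limit)}"
    )


restricted = stats_query(ATTRIBUTES["total assets"], ["wd:Q16", "wd:Q145", "wd:Q30"])
print(restricted)
```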

#### 5.3 Which business in each country owns more businesses in other countries?

First, group by company to count the companies owned in other countries, and add the owner's country.

In [22]:
queryString = """
SELECT ?company ?country (COUNT(DISTINCT ?o) AS ?companies_owned)
WHERE { 

    ?p wdt:P31 wd:Q4830453;
        wdt:P17 ?c;
        wdt:P1830 ?o.
    ?o wdt:P17 ?own_c.
    FILTER(?c != ?own_c).

    ?p sc:name ?company .
    ?c sc:name ?country .

}
GROUP BY ?p ?company ?c ?country
ORDER BY DESC (?companies_owned)

LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('company', 'The Walt Disney Company'), ('country', 'United States of America'), ('companies_owned', '66')]
[('company', 'Sony'), ('country', 'Japan'), ('companies_owned', '29')]
[('company', 'AG für Verkehrswesen'), ('country', 'German Reich'), ('companies_owned', '25')]
[('company', 'WarnerMedia'), ('country', 'United States of America'), ('companies_owned', '23')]
[('company', 'Anheuser-Busch InBev'), ('country', 'Brazil'), ('companies_owned', '22')]
[('company', 'Anheuser-Busch InBev'), ('country', 'Belgium'), ('companies_owned', '21')]
[('company', 'Viacom'), ('country', 'United States of America'), ('companies_owned', '20')]
[('company', 'Discovery Inc.'), ('country', 'United States of America'), ('companies_owned', '17')]
[('company', 'Heineken'), ('country', 'Netherlands'), ('companies_owned', '16')]
[('company', 'LVMH'), ('country', 'France'), ('companies_owned', '13')]


10

Then find, for each country, the maximum number of companies owned.

In [23]:
queryString = """
SELECT ?country (max(?companies_owned) AS ?max_owned)
WHERE{
    {
        SELECT ?p ?company ?country (COUNT(DISTINCT ?o) AS ?companies_owned)
        WHERE { 

            ?p wdt:P31 wd:Q4830453;
                wdt:P17 ?c;
                wdt:P1830 ?o.
            ?o wdt:P17 ?own_c.
            FILTER(?c != ?own_c).

            ?p sc:name ?company .
            ?c sc:name ?country .

        }
        GROUP BY ?p ?company ?c ?country
    }
}
GROUP BY ?c ?country
ORDER BY DESC (?max_owned)
LIMIT 10
"""

print("Results")
run_query(queryString)

Results
[('country', 'United States of America'), ('max_owned', '66')]
[('country', 'Japan'), ('max_owned', '29')]
[('country', 'German Reich'), ('max_owned', '25')]
[('country', 'Brazil'), ('max_owned', '22')]
[('country', 'Belgium'), ('max_owned', '21')]
[('country', 'Netherlands'), ('max_owned', '16')]
[('country', 'Luxembourg'), ('max_owned', '13')]
[('country', 'Denmark'), ('max_owned', '13')]
[('country', 'Germany'), ('max_owned', '13')]
[('country', 'France'), ('max_owned', '13')]


10

Combine the two queries above and keep only the rows where the company's count equals the country's maximum and the countries match. This yields, for each country, the company that owns the most companies in other countries.

In [24]:
queryString = """
SELECT ?country ?company1 ?max_owned{
    {
        SELECT ?country (max(?companies_owned) AS ?max_owned)
        WHERE{
            {
                SELECT ?p ?company ?country (COUNT(DISTINCT ?o) AS ?companies_owned)
                WHERE { 

                    ?p wdt:P31 wd:Q4830453;
                        wdt:P17 ?c;
                        wdt:P1830 ?o.
                    ?o wdt:P17 ?own_c.
                    FILTER(?c != ?own_c).

                    ?p sc:name ?company .
                    ?c sc:name ?country .

                }
                GROUP BY ?p ?company ?c ?country
            }
        }
        GROUP BY ?c ?country
        ORDER BY DESC (?max_owned)
    }
    {
        SELECT ?p1 ?company1 ?country1 (COUNT(DISTINCT ?o1) AS ?inside_companies_owned)
        WHERE { 

            ?p1 wdt:P31 wd:Q4830453;
                wdt:P17 ?c1;
                wdt:P1830 ?o1.
            ?o1 wdt:P17 ?own_c1.
            FILTER(?c1 != ?own_c1).

            ?p1 sc:name ?company1 .
            ?c1 sc:name ?country1 .

        }
        GROUP BY ?p1 ?company1 ?c1 ?country1
    }
    FILTER (?inside_companies_owned = ?max_owned && ?country1 = ?country)
}
ORDER BY DESC (?max_owned)
LIMIT 50
"""

print("Results")
run_query(queryString)

Results
[('country', 'United States of America'), ('company1', 'The Walt Disney Company'), ('max_owned', '66')]
[('country', 'Japan'), ('company1', 'Sony'), ('max_owned', '29')]
[('country', 'German Reich'), ('company1', 'AG für Verkehrswesen'), ('max_owned', '25')]
[('country', 'Brazil'), ('company1', 'Anheuser-Busch InBev'), ('max_owned', '22')]
[('country', 'Belgium'), ('company1', 'Anheuser-Busch InBev'), ('max_owned', '21')]
[('country', 'Netherlands'), ('company1', 'Heineken'), ('max_owned', '16')]
[('country', 'France'), ('company1', 'LVMH'), ('max_owned', '13')]
[('country', 'Denmark'), ('company1', 'Carlsberg Group'), ('max_owned', '13')]
[('country', 'Germany'), ('company1', 'Daimler AG'), ('max_owned', '13')]
[('country', 'Luxembourg'), ('company1', 'NexGen'), ('max_owned', '13')]
[('country', 'Sweden'), ('company1', 'Modern Times Group'), ('max_owned', '11')]
[('country', 'South Africa'), ('company1', 'AngloGold Ashanti'), ('max_owned', '10')]
[('country', 'United Kingdom')

50
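
The join between the two subqueries can be cross-checked on the client side. The sketch below recomputes the per-country winner from a few rows of the first query of this step; `top_owner_per_country` is a hypothetical helper, the rows are copied from that output, and the counts are cast to integers by hand.

```python
# (company, country, companies owned abroad), copied from the In [22] output.
rows = [
    ("The Walt Disney Company", "United States of America", 66),
    ("Sony", "Japan", 29),
    ("WarnerMedia", "United States of America", 23),
    ("Viacom", "United States of America", 20),
    ("Heineken", "Netherlands", 16),
    ("LVMH", "France", 13),
]


def top_owner_per_country(rows):
    """Map each country to ([companies with the highest count], count)."""
    best = {}
    for company, country, owned in rows:
        current = best.get(country)
        if current is None or owned > current[1]:
            best[country] = ([company], owned)
        elif owned == current[1]:
            # Tie: keep every company that reaches the maximum.
            current[0].append(company)
    return best


winners = top_owner_per_country(rows)
for country, (companies, owned) in winners.items():
    print(country, companies, owned)
```

Keeping ties in a list mirrors the SPARQL version, where every company whose count equals the country's maximum passes the filter.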