# General instructions

The goal of the project is to materialize a set of **exploratory workloads** over a real-world, large-scale,  open-domain KG: [WikiData](https://www.wikidata.org/wiki/Wikidata:Main_Page)

An exploratory workload  is composed by a set of queries, where each query is related to the information obtained previously.

An exploratory workload starts with a usually vague, open ended question, and does not assume the person issuing the workload has a clear understanding of the data contained in the target database or its structure.

Remeber that:

1. All the queries must run in the python notebook
2. You can use classes and properties only if you find them via a SPARQL query that must be present in the notebook
3. You do not delete useless queries. Keep everything that is synthatically valid 

```
?p <http://schema.org/name> ?name .
```
    
is the BGP returning a human-readable name of a property or a class in Wikidata.
    
    

In [1]:
## SETUP used later

from SPARQLWrapper import SPARQLWrapper, JSON


prefixString = """
##-808f3957ad-##
PREFIX wd: <http://www.wikidata.org/entity/> 
PREFIX wdt: <http://www.wikidata.org/prop/direct/> 
PREFIX sc: <http://schema.org/>
"""

# select and construct queries
def run_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
       results = sparql.query()
       json_results = results.convert()
       if len(json_results['results']['bindings'])==0:
          print("Empty")
          return 0
    
       for bindings in json_results['results']['bindings']:
          print( [ (var, value['value'])  for var, value in bindings.items() ] )

       return len(json_results['results']['bindings'])

    except Exception as e :
        print("The operation failed", e)
    
# ASk queries
def run_ask_query(queryString):
    to_run = prefixString + "\n" + queryString

    sparql = SPARQLWrapper("http://a256-gc1-02.srv.aau.dk:5820/sparql")
    sparql.setTimeout(300)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(to_run)

    try :
        return sparql.query().convert()

    except Exception as e :
        print("The operation failed", e)


# Companies Workflow Series ("IT Companies explorative search") 

Consider the following exploratory information need:

> Compare companies across different sectors in U.K., U.S., and Canada, consider number of employees, companies owned or acquired, and revenue or assets

## Useful URIs for the current workflow


The following are given:

| IRI           | Description   | Role      |
| -----------   | -----------   |-----------|
| `wdt:P1647`   | subproperty   | predicate |
| `wdt:P31`     | instance of   | predicate |
| `wdt:P279`    | subclass      | predicate |
| `wdt:P17`      | country       | predicate |
| `wd:Q4830453`  | Business      | node      |
| `wd:Q13977`    | Bloomberg L.P.| node |
| `wd:Q502121`   | BlackBerry    | node |
| `wd:Q16`        | Canada        | node |
| `wd:Q145`      | U.K.          | node |
| `wd:Q30`       | U.S.A.        | node |


Also consider

```
?p wdt:P17 wd:Q16  . 
?p wdt:P31 wd:Q4830453  . 
```

is the BGP to retrieve all **canadian businesses**

## Workload Goals

1. Identify the BGP for obtaining number of employees of a company and other relevant numerical attributes

2. Identify the BGP to retrieve all companies owned by a company

3. Is there some company that owns companies in other countries?

4. Companies have different 'legal forms', compare the number of companies divided in different legal forms

5. Analyze the number of employees  and other relevant numeric attributes
 
   5.1 What are the top-10 companies for a given attribute?
   
   5.2 For which companies is defined some form of income or market capitalization or total assets? What is the min, max, and average in each category and country?
   
   5.3 Which business in each country owns more businesses in other countries?


# 1. Identify the BGP for obtaining number of employees of a company and other relevant numerical attributes

Take ten casual canadian businesses, just for test.

In [2]:
queryString = """
select * where {
    ?p wdt:P17 wd:Q16  . 
    ?p wdt:P31 wd:Q4830453 . 
    
    ?p <http://schema.org/name> ?name .
} limit 10
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/entity/Q17107879'), ('name', 'Martin & Vleminckx')]
[('p', 'http://www.wikidata.org/entity/Q3347600'), ('name', 'OMERS')]
[('p', 'http://www.wikidata.org/entity/Q4724292'), ('name', 'Algoma Central')]
[('p', 'http://www.wikidata.org/entity/Q1647539'), ('name', 'Royal Canadian Mint')]
[('p', 'http://www.wikidata.org/entity/Q17012723'), ('name', 'Nova Scotia Light and Power')]
[('p', 'http://www.wikidata.org/entity/Q3499025'), ('name', 'Fresh TV')]
[('p', 'http://www.wikidata.org/entity/Q6934764'), ('name', 'Multimatic Motorsports')]
[('p', 'http://www.wikidata.org/entity/Q10439578'), ('name', 'Calgary Sports and Entertainment')]
[('p', 'http://www.wikidata.org/entity/Q7059529'), ('name', 'Northland Power')]
[('p', 'http://www.wikidata.org/entity/Q29123207'), ('name', 'Stryker (Canada)')]


10

I see the properties for Bloomberg L.P. (**Q13977**) and BlackBerry (**Q502121**)

In [3]:
queryString = """
select ?companyName ?p ?pName ?o ?oName where {
    values ?company {wd:Q13977 wd:Q502121}
    
    ?company ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    ?company <http://schema.org/name> ?companyName .
    optional { ?o <http://schema.org/name> ?oName . } .
} order by desc(?company)
"""

print("Results")
run_query(queryString)

Results
[('companyName', 'BlackBerry'), ('p', 'http://www.wikidata.org/prop/direct/P1056'), ('pName', 'product or material produced'), ('o', 'http://www.wikidata.org/entity/Q1780509'), ('oName', 'communication device')]
[('companyName', 'BlackBerry'), ('p', 'http://www.wikidata.org/prop/direct/P1056'), ('pName', 'product or material produced'), ('o', 'http://www.wikidata.org/entity/Q7397'), ('oName', 'software')]
[('companyName', 'BlackBerry'), ('p', 'http://www.wikidata.org/prop/direct/P1056'), ('pName', 'product or material produced'), ('o', 'http://www.wikidata.org/entity/Q17517'), ('oName', 'mobile phone')]
[('companyName', 'BlackBerry'), ('p', 'http://www.wikidata.org/prop/direct/P112'), ('pName', 'founded by'), ('o', 'http://www.wikidata.org/entity/Q1379779'), ('oName', 'Mike Lazaridis')]
[('companyName', 'BlackBerry'), ('p', 'http://www.wikidata.org/prop/direct/P1454'), ('pName', 'legal form'), ('o', 'http://www.wikidata.org/entity/Q422074'), ('oName', 'corporation')]
[('company

159

The number of employees is given by the property **P1128**, so the BGP I want is:

```
?business wdt:P1128 ?numEmployees .
```

I can list all the numeric properties releated businesses filtering them with "isNumeric"

In [4]:
queryString = """
select distinct ?p ?pName where {
    ?business ?p ?o ;
              wdt:P31 wd:Q4830453 .
    
    ?p <http://schema.org/name> ?pName .
    
    filter isNumeric(?o) .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1128'), ('pName', 'employees')]
[('p', 'http://www.wikidata.org/prop/direct/P1661'), ('pName', 'Alexa rank')]
[('p', 'http://www.wikidata.org/prop/direct/P2139'), ('pName', 'total revenue')]
[('p', 'http://www.wikidata.org/prop/direct/P2295'), ('pName', 'net profit')]
[('p', 'http://www.wikidata.org/prop/direct/P2403'), ('pName', 'total assets')]
[('p', 'http://www.wikidata.org/prop/direct/P3362'), ('pName', 'operating income')]
[('p', 'http://www.wikidata.org/prop/direct/P8687'), ('pName', 'social media followers')]
[('p', 'http://www.wikidata.org/prop/direct/P2137'), ('pName', 'total equity')]
[('p', 'http://www.wikidata.org/prop/direct/P2138'), ('pName', 'total liabilities')]
[('p', 'http://www.wikidata.org/prop/direct/P8340'), ('pName', 'estimated value')]
[('p', 'http://www.wikidata.org/prop/direct/P9279'), ('pName', 'Egapro gender equality index')]
[('p', 'http://www.wikidata.org/prop/direct/P2043'), ('pName', 'length')]
[('p',

72

And so we can get other relevant numerical attributes like the *net profit (**P2295**)*, the *market capitalization (**P2226**)*, the *estimated value (**P8340**)*, the *equality index (**P9279**)*, the *total debpt (**P2133**)* and so on. The BGP for there properties is:

```
?business wdt:{property code} ?value .
```

## 2. Identify the BGP to retrieve all companies owned by a company

From the list of properties of Bloomberg L.P. I see that the property *owner of* has the code **P1830**. I can also see the properties of an owned company, like Bloomberg News, in order to see if there is a property "owned".

In [5]:
queryString = """
select ?p ?pName ?o ?oName where {
    wd:Q14270642 ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    optional { ?o <http://schema.org/name> ?oName . } .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q192283'), ('oName', 'news agency')]
[('p', 'http://www.wikidata.org/prop/direct/P452'), ('pName', 'industry'), ('o', 'http://www.wikidata.org/entity/Q192283'), ('oName', 'news agency')]
[('p', 'http://www.wikidata.org/prop/direct/P17'), ('pName', 'country'), ('o', 'http://www.wikidata.org/entity/Q30'), ('oName', 'United States of America')]
[('p', 'http://www.wikidata.org/prop/direct/P159'), ('pName', 'headquarters location'), ('o', 'http://www.wikidata.org/entity/Q60'), ('oName', 'New York City')]
[('p', 'http://www.wikidata.org/prop/direct/P740'), ('pName', 'location of formation'), ('o', 'http://www.wikidata.org/entity/Q60'), ('oName', 'New York City')]
[('p', 'http://www.wikidata.org/prop/direct/P749'), ('pName', 'parent organization'), ('o', 'http://www.wikidata.org/entity/Q13977'), ('oName', 'Bloomberg L.P.')]
[('p', 'http://www.wikidata.org/prop/direct/P127

17

There is a property *owned by* with the code **P127**. The final BGP for all the companies owned by a business is:

```
{ ?business wdt:P1830 ?owned . } union { ?owned wdt:P127 ?business . }
```

The following query is an example for the companies owned by Bloomberg L.P.

In [6]:
queryString = """
select distinct ?owned ?ownedName where {
    { wd:Q13977 wdt:P1830+ ?owned . } union { ?owned wdt:P127+ wd:Q13977 . } .

    ?owned <http://schema.org/name> ?ownedName .
}
"""

print("Results")
run_query(queryString)

Results
[('owned', 'http://www.wikidata.org/entity/Q13975'), ('ownedName', 'Bloomberg Television')]
[('owned', 'http://www.wikidata.org/entity/Q14270642'), ('ownedName', 'Bloomberg News')]
[('owned', 'http://www.wikidata.org/entity/Q15524972'), ('ownedName', 'Bloomberg TV Indonesia Jakarta')]
[('owned', 'http://www.wikidata.org/entity/Q25245121'), ('ownedName', 'Bloomberg Radio')]
[('owned', 'http://www.wikidata.org/entity/Q3564619'), ('ownedName', 'WBBR')]
[('owned', 'http://www.wikidata.org/entity/Q46996829'), ('ownedName', 'Bloomberg London')]
[('owned', 'http://www.wikidata.org/entity/Q4928189'), ('ownedName', 'Bloomberg Law')]
[('owned', 'http://www.wikidata.org/entity/Q4998403'), ('ownedName', 'Bloomberg Industry Group')]
[('owned', 'http://www.wikidata.org/entity/Q7953259'), ('ownedName', 'WNBP')]
[('owned', 'http://www.wikidata.org/entity/Q875078'), ('ownedName', 'Bloomberg Tower')]
[('owned', 'http://www.wikidata.org/entity/Q93357917'), ('ownedName', 'BLOOMBERG SEF LLC')]
[('o

13

## 3. Is there some company that owns companies in other countries?

I can answer this questions with an ask query, taking the companies that have a country different from their owned companies.

In [7]:
queryString = """
ask where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}
    
    { ?company wdt:P1830+ ?owned . } union { ?owned wdt:P127+ ?company . } .

    ?company wdt:P17 ?companyCountry .
    ?owned wdt:P17 ?ownedCountry .

    filter (?companyCountry != ?ownedCountry)
}
"""

print("Results")
run_ask_query(queryString)

Results


{'head': {'link': []}, 'boolean': True}

So there is some company that owns companies in other countries.

## 4. Companies have different 'legal forms', compare the number of companies divided in different legal forms

From the properties of Bloomberg L.P. I see that the legal form is described by the property **P1454**.

In [8]:
queryString = """
select distinct ?legalForm ?legalFormName (count(distinct ?company) as ?numCompanies) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             wdt:P1454 ?legalForm .

    ?legalForm <http://schema.org/name> ?legalFormName .
} group by ?legalForm ?legalFormName
order by desc(?numCompanies)
"""

print("Results")
run_query(queryString)

Results
[('legalForm', 'http://www.wikidata.org/entity/Q891723'), ('legalFormName', 'public company'), ('numCompanies', '490')]
[('legalForm', 'http://www.wikidata.org/entity/Q134161'), ('legalFormName', 'joint-stock company'), ('numCompanies', '392')]
[('legalForm', 'http://www.wikidata.org/entity/Q1589009'), ('legalFormName', 'privately held company'), ('numCompanies', '342')]
[('legalForm', 'http://www.wikidata.org/entity/Q5225895'), ('legalFormName', 'public limited company'), ('numCompanies', '255')]
[('legalForm', 'http://www.wikidata.org/entity/Q6832945'), ('legalFormName', 'private company limited by shares'), ('numCompanies', '231')]
[('legalForm', 'http://www.wikidata.org/entity/Q149789'), ('legalFormName', 'limited liability company'), ('numCompanies', '197')]
[('legalForm', 'http://www.wikidata.org/entity/Q783794'), ('legalFormName', 'company'), ('numCompanies', '91')]
[('legalForm', 'http://www.wikidata.org/entity/Q658255'), ('legalFormName', 'subsidiary'), ('numCompanies'

110

I can see that some companies has a legal form that doesn't have any sense, like "United Kingdom", or are localized for a specific country. So I want to retrieve the general legal forms. In order to do this, I need to know the entity for "legal form" and the path, if any, from the legal form name and its generalized version.

I start with the properties of the legal form *joint-stock company (**Q134161**)*.

In [9]:
queryString = """
select distinct ?p ?pName ?o ?oName where {
    wd:Q134161 ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('pName', 'described by source'), ('o', 'http://www.wikidata.org/entity/Q19180675'), ('oName', 'Small Brockhaus and Efron Encyclopedic Dictionary')]
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('pName', 'described by source'), ('o', 'http://www.wikidata.org/entity/Q2041543'), ('oName', 'Ottův slovník naučný')]
[('p', 'http://www.wikidata.org/prop/direct/P1343'), ('pName', 'described by source'), ('o', 'http://www.wikidata.org/entity/Q602358'), ('oName', 'Brockhaus and Efron Encyclopedic Dictionary')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('pName', 'subclass of'), ('o', 'http://www.wikidata.org/entity/Q33685'), ('oName', 'limited company')]
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q1269299'), ('oName', 'type of business entity')]
[('p', 'http://www.wikidata.org/prop/direct/P92'), ('pName', 'main regulatory text'), ('o', 'http://www

6

That is subclass of *limited company (**Q33685**)*.

In [10]:
queryString = """
select distinct ?p ?pName ?o ?oName where {
    wd:Q33685 ?p ?o .
    
    ?p <http://schema.org/name> ?pName .
    ?o <http://schema.org/name> ?oName .
}
"""

print("Results")
run_query(queryString)

Results
[('p', 'http://www.wikidata.org/prop/direct/P31'), ('pName', 'instance of'), ('o', 'http://www.wikidata.org/entity/Q12047392'), ('oName', 'legal form')]
[('p', 'http://www.wikidata.org/prop/direct/P279'), ('pName', 'subclass of'), ('o', 'http://www.wikidata.org/entity/Q567521'), ('oName', 'commercial company')]
[('p', 'http://www.wikidata.org/prop/direct/P461'), ('pName', 'opposite of'), ('o', 'http://www.wikidata.org/entity/Q17152511'), ('oName', 'unlimited company')]


3

That is instance of *legal form (**Q12047392**)*.

So the BGP to use to get the general legal from of a company is

```
?company wdt:P1454/wdt:P279*/wdt:P31* ?legalForm .
?legalForm wdt:P31 wd:Q12047392 .
```

In [11]:
queryString = """
select distinct ?legalForm ?legalFormName (count(distinct ?company) as ?numCompanies) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             wdt:P1454/wdt:P279*/wdt:P31* ?legalForm .
    ?legalForm wdt:P31 wd:Q12047392 .

    ?legalForm <http://schema.org/name> ?legalFormName .
} group by ?legalForm ?legalFormName
order by desc(?numCompanies)
"""

print("Results")
run_query(queryString)

Results
[('legalForm', 'http://www.wikidata.org/entity/Q155076'), ('legalFormName', 'juridical person'), ('numCompanies', '2031')]
[('legalForm', 'http://www.wikidata.org/entity/Q33685'), ('legalFormName', 'limited company'), ('numCompanies', '1962')]
[('legalForm', 'http://www.wikidata.org/entity/Q1589009'), ('legalFormName', 'privately held company'), ('numCompanies', '934')]
[('legalForm', 'http://www.wikidata.org/entity/Q891723'), ('legalFormName', 'public company'), ('numCompanies', '748')]
[('legalForm', 'http://www.wikidata.org/entity/Q167037'), ('legalFormName', 'corporation'), ('numCompanies', '211')]
[('legalForm', 'http://www.wikidata.org/entity/Q166280'), ('legalFormName', 'S.A.'), ('numCompanies', '77')]
[('legalForm', 'http://www.wikidata.org/entity/Q728646'), ('legalFormName', 'partnership'), ('numCompanies', '69')]
[('legalForm', 'http://www.wikidata.org/entity/Q163740'), ('legalFormName', 'nonprofit organization'), ('numCompanies', '46')]
[('legalForm', 'http://www.wik

18

## 5. Analyze the number of employees and other relevant numeric attributes

### 5.1 What are the top-10 companies for a given attribute?

#### Top-10 companies for number of empolyees

In [12]:
queryString = """
select ?company ?companyName (max(?value) as ?maxValue) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             wdt:P1128 ?value .

    ?company <http://schema.org/name> ?companyName .
} group by ?company ?companyName
order by desc(?maxValue)
limit 10
"""

print("Results")
run_query(queryString)

Results
[('company', 'http://www.wikidata.org/entity/Q483551'), ('companyName', 'Walmart'), ('maxValue', '2500000')]
[('company', 'http://www.wikidata.org/entity/Q3884'), ('companyName', 'Amazon'), ('maxValue', '1298000')]
[('company', 'http://www.wikidata.org/entity/Q1322045'), ('companyName', 'G4S'), ('maxValue', '618260')]
[('company', 'http://www.wikidata.org/entity/Q155026'), ('companyName', 'United Parcel Service'), ('maxValue', '434000')]
[('company', 'http://www.wikidata.org/entity/Q487494'), ('companyName', 'Tesco'), ('maxValue', '423092')]
[('company', 'http://www.wikidata.org/entity/Q30645592'), ('companyName', 'Yum China'), ('maxValue', '420000')]
[('company', 'http://www.wikidata.org/entity/Q1759032'), ('companyName', 'R. J. Reynolds Tobacco Company'), ('maxValue', '400000')]
[('company', 'http://www.wikidata.org/entity/Q864407'), ('companyName', 'The Home Depot'), ('maxValue', '400000')]
[('company', 'http://www.wikidata.org/entity/Q459477'), ('companyName', 'FedEx'), ('m

10

#### Top-10 companies for market capitalization

In [13]:
queryString = """
select ?company ?companyName (max(?value) as ?maxValue) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             wdt:P2226 ?value .

    ?company <http://schema.org/name> ?companyName .
} group by ?company ?companyName
order by desc(?maxValue)
limit 10
"""

print("Results")
run_query(queryString)

Results
[('company', 'http://www.wikidata.org/entity/Q312'), ('companyName', 'Apple Inc.'), ('maxValue', '2470000000000')]
[('company', 'http://www.wikidata.org/entity/Q2283'), ('companyName', 'Microsoft'), ('maxValue', '2270000000000')]
[('company', 'http://www.wikidata.org/entity/Q20800404'), ('companyName', 'Alphabet Inc.'), ('maxValue', '1900000000000')]
[('company', 'http://www.wikidata.org/entity/Q3884'), ('companyName', 'Amazon'), ('maxValue', '1670000000000')]
[('company', 'http://www.wikidata.org/entity/Q478214'), ('companyName', 'Tesla, Inc.'), ('maxValue', '751910000000')]
[('company', 'http://www.wikidata.org/entity/Q217583'), ('companyName', 'Berkshire Hathaway'), ('maxValue', '650340000000')]
[('company', 'http://www.wikidata.org/entity/Q182477'), ('companyName', 'Nvidia'), ('maxValue', '548070000000')]
[('company', 'http://www.wikidata.org/entity/Q328840'), ('companyName', 'Visa Inc.'), ('maxValue', '494680000000')]
[('company', 'http://www.wikidata.org/entity/Q333718'),

10

#### Top-10 companies for total assets

In [14]:
queryString = """
select ?company ?companyName (max(?value) as ?maxValue) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             wdt:P2403 ?value .

    ?company <http://schema.org/name> ?companyName .
} group by ?company ?companyName
order by desc(?maxValue)
limit 10
"""

print("Results")
run_query(queryString)

Results
[('company', 'http://www.wikidata.org/entity/Q621096'), ('companyName', 'Fannie Mae'), ('maxValue', '3960490000000')]
[('company', 'http://www.wikidata.org/entity/Q192314'), ('companyName', 'JPMorgan Chase'), ('maxValue', '2490972000000')]
[('company', 'http://www.wikidata.org/entity/Q190464'), ('companyName', 'HSBC'), ('maxValue', '2374986000000')]
[('company', 'http://www.wikidata.org/entity/Q487907'), ('companyName', 'Bank of America'), ('maxValue', '2144000000000')]
[('company', 'http://www.wikidata.org/entity/Q251546'), ('companyName', 'Novatek'), ('maxValue', '2059178000000')]
[('company', 'http://www.wikidata.org/entity/Q935969'), ('companyName', 'Freddie Mac'), ('maxValue', '2049776000000')]
[('company', 'http://www.wikidata.org/entity/Q219508'), ('companyName', 'Citigroup'), ('maxValue', '1792077000000')]
[('company', 'http://www.wikidata.org/entity/Q217583'), ('companyName', 'Berkshire Hathaway'), ('maxValue', '873729000000')]
[('company', 'http://www.wikidata.org/ent

10

#### Top-10 companies for total net profit

In [15]:
queryString = """
select ?company ?companyName (max(?value) as ?maxValue) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             wdt:P2295 ?value .

    ?company <http://schema.org/name> ?companyName .
} group by ?company ?companyName
order by desc(?maxValue)
limit 10
"""

print("Results")
run_query(queryString)

Results
[('company', 'http://www.wikidata.org/entity/Q251546'), ('companyName', 'Novatek'), ('maxValue', '78586000000')]
[('company', 'http://www.wikidata.org/entity/Q312'), ('companyName', 'Apple Inc.'), ('maxValue', '57411000000')]
[('company', 'http://www.wikidata.org/entity/Q217583'), ('companyName', 'Berkshire Hathaway'), ('maxValue', '42521000000')]
[('company', 'http://www.wikidata.org/entity/Q20800404'), ('companyName', 'Alphabet Inc.'), ('maxValue', '40269000000')]
[('company', 'http://www.wikidata.org/entity/Q2283'), ('companyName', 'Microsoft'), ('maxValue', '39240000000')]
[('company', 'http://www.wikidata.org/entity/Q380'), ('companyName', 'Facebook, Inc.'), ('maxValue', '29146000000')]
[('company', 'http://www.wikidata.org/entity/Q192314'), ('companyName', 'JPMorgan Chase'), ('maxValue', '24733000000')]
[('company', 'http://www.wikidata.org/entity/Q3884'), ('companyName', 'Amazon'), ('maxValue', '21300000000')]
[('company', 'http://www.wikidata.org/entity/Q248'), ('compan

10

#### Top-10 companies for social media followers

In [16]:
queryString = """
select ?company ?companyName (max(?value) as ?maxValue) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             wdt:P8687 ?value .

    ?company <http://schema.org/name> ?companyName .
} group by ?company ?companyName
order by desc(?maxValue)
limit 10
"""

print("Results")
run_query(queryString)

Results
[('company', 'http://www.wikidata.org/entity/Q35339'), ('companyName', 'WWE'), ('maxValue', '78800000')]
[('company', 'http://www.wikidata.org/entity/Q1390577'), ('companyName', 'Twitter, Inc.'), ('maxValue', '58852256')]
[('company', 'http://www.wikidata.org/entity/Q18656'), ('companyName', 'Manchester United F.C.'), ('maxValue', '24503883')]
[('company', 'http://www.wikidata.org/entity/Q7973038'), ('companyName', 'WatchMojo.com'), ('maxValue', '23700000')]
[('company', 'http://www.wikidata.org/entity/Q907311'), ('companyName', 'Netflix'), ('maxValue', '20400000')]
[('company', 'http://www.wikidata.org/entity/Q5003490'), ('companyName', 'BuzzFeed'), ('maxValue', '20300000')]
[('company', 'http://www.wikidata.org/entity/Q166032'), ('companyName', 'The Washington Post'), ('maxValue', '17041168')]
[('company', 'http://www.wikidata.org/entity/Q207708'), ('companyName', 'IGN'), ('maxValue', '15700000')]
[('company', 'http://www.wikidata.org/entity/Q193701'), ('companyName', 'SpaceX

10

#### Top-10 companies for number of companies owned

In [17]:
queryString = """
select ?company ?companyName (count(distinct ?owned) as ?value) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             wdt:P8687 ?value .
             
    { ?company wdt:P1830 ?owned . } union { ?owned wdt:P127 ?company . } .

    ?company <http://schema.org/name> ?companyName .
} group by ?company ?companyName
order by desc(?value)
limit 10
"""

print("Results")
run_query(queryString)

Results
[('company', 'http://www.wikidata.org/entity/Q5194189'), ('companyName', 'Cumulus Media'), ('value', '439')]
[('company', 'http://www.wikidata.org/entity/Q95'), ('companyName', 'Google'), ('value', '263')]
[('company', 'http://www.wikidata.org/entity/Q2118336'), ('companyName', 'Punch Pubs'), ('value', '250')]
[('company', 'http://www.wikidata.org/entity/Q5380195'), ('companyName', 'Audacy, Inc.'), ('value', '241')]
[('company', 'http://www.wikidata.org/entity/Q6773982'), ('companyName', "Marston's"), ('value', '207')]
[('company', 'http://www.wikidata.org/entity/Q1345971'), ('companyName', 'Gannett'), ('value', '146')]
[('company', 'http://www.wikidata.org/entity/Q5431118'), ('companyName', 'Salem Media Group'), ('value', '114')]
[('company', 'http://www.wikidata.org/entity/Q3277465'), ('companyName', 'Sirius XM Radio'), ('value', '106')]
[('company', 'http://www.wikidata.org/entity/Q2283'), ('companyName', 'Microsoft'), ('value', '106')]
[('company', 'http://www.wikidata.org/

10

#### Top-10 companies for total revenue

In [18]:
queryString = """
select ?company ?companyName (max(?value) as ?maxValue) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             wdt:P2139 ?value .

    ?company <http://schema.org/name> ?companyName .
} group by ?company ?companyName
order by desc(?maxValue)
limit 10
"""

print("Results")
run_query(queryString)

Results
[('company', 'http://www.wikidata.org/entity/Q251546'), ('companyName', 'Novatek'), ('maxValue', '711812000000')]
[('company', 'http://www.wikidata.org/entity/Q1382653'), ('companyName', 'Evraz'), ('maxValue', '631214100000')]
[('company', 'http://www.wikidata.org/entity/Q483551'), ('companyName', 'Walmart'), ('maxValue', '559151000000')]
[('company', 'http://www.wikidata.org/entity/Q3884'), ('companyName', 'Amazon'), ('maxValue', '386064000000')]
[('company', 'http://www.wikidata.org/entity/Q154950'), ('companyName', 'Royal Dutch Shell'), ('maxValue', '344877000000')]
[('company', 'http://www.wikidata.org/entity/Q152057'), ('companyName', 'BP'), ('maxValue', '278397000000')]
[('company', 'http://www.wikidata.org/entity/Q312'), ('companyName', 'Apple Inc.'), ('maxValue', '274515000000')]
[('company', 'http://www.wikidata.org/entity/Q624375'), ('companyName', 'CVS Health'), ('maxValue', '268706000000')]
[('company', 'http://www.wikidata.org/entity/Q217583'), ('companyName', 'Ber

10

#### Top-10 companies for estimated values

In [19]:
queryString = """
select ?company ?companyName (max(?value) as ?maxValue) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             wdt:P8340 ?value .

    ?company <http://schema.org/name> ?companyName .
} group by ?company ?companyName
order by desc(?maxValue)
limit 10
"""

print("Results")
run_query(queryString)

Results
[('company', 'http://www.wikidata.org/entity/Q1496384'), ('companyName', 'Koch Industries'), ('maxValue', '105000000000')]
[('company', 'http://www.wikidata.org/entity/Q780442'), ('companyName', 'Uber Technology'), ('maxValue', '62000000000')]
[('company', 'http://www.wikidata.org/entity/Q7624104'), ('companyName', 'Stripe'), ('maxValue', '36000000000')]
[('company', 'http://www.wikidata.org/entity/Q63327'), ('companyName', 'Airbnb'), ('maxValue', '25500000000')]
[('company', 'http://www.wikidata.org/entity/Q2047336'), ('companyName', 'Palantir Technologies'), ('maxValue', '20500000000')]
[('company', 'http://www.wikidata.org/entity/Q27022329'), ('companyName', 'Snap Inc.'), ('maxValue', '16000000000')]
[('company', 'http://www.wikidata.org/entity/Q193701'), ('companyName', 'SpaceX'), ('maxValue', '12000000000')]
[('company', 'http://www.wikidata.org/entity/Q255381'), ('companyName', 'Pinterest'), ('maxValue', '11000000000')]
[('company', 'http://www.wikidata.org/entity/Q199950

10

#### Top-10 companies for operating income

In [20]:
queryString = """
select ?company ?companyName (max(?value) as ?maxValue) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             wdt:P3362 ?value .

    ?company <http://schema.org/name> ?companyName .
} group by ?company ?companyName
order by desc(?maxValue)
limit 10
"""

print("Results")
run_query(queryString)

Results
[('company', 'http://www.wikidata.org/entity/Q251546'), ('companyName', 'Novatek'), ('maxValue', '113012000000')]
[('company', 'http://www.wikidata.org/entity/Q312'), ('companyName', 'Apple Inc.'), ('maxValue', '70898000000')]
[('company', 'http://www.wikidata.org/entity/Q217583'), ('companyName', 'Berkshire Hathaway'), ('maxValue', '55693000000')]
[('company', 'http://www.wikidata.org/entity/Q2283'), ('companyName', 'Microsoft'), ('maxValue', '42959000000')]
[('company', 'http://www.wikidata.org/entity/Q20800404'), ('companyName', 'Alphabet Inc.'), ('maxValue', '41224000000')]
[('company', 'http://www.wikidata.org/entity/Q156238'), ('companyName', 'ExxonMobil'), ('maxValue', '34082000000')]
[('company', 'http://www.wikidata.org/entity/Q192314'), ('companyName', 'JPMorgan Chase'), ('maxValue', '30702000000')]
[('company', 'http://www.wikidata.org/entity/Q467752'), ('companyName', 'Verizon'), ('maxValue', '28798000000')]
[('company', 'http://www.wikidata.org/entity/Q54173'), ('c

10

### 5.2 For which companies is defined some form of income or market capitalization or total assets? What is the min, max, and average in each category and country?

From the list of numeric properties associated to the companies, I've found that for some companies are defined the *operating income (**P3362**)*, the *total assets (**P2403**)* and the *market capitalization (**P2226**)*. I can see for which companies is defined some form of income or market capitalization or total assets with the query below.

In [21]:
queryString = """
select ?company ?companyName where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             ?category ?value .

    ?company <http://schema.org/name> ?companyName .
    ?category <http://schema.org/name> ?categoryName .
    
    filter (?category in (wdt:P2403, wdt:P3362, wdt:P2226))
} group by ?company ?companyName
"""

print("Results")
run_query(queryString)

Results
[('company', 'http://www.wikidata.org/entity/Q99872290'), ('companyName', 'Manchester United Plc')]
[('company', 'http://www.wikidata.org/entity/Q7240'), ('companyName', 'Lockheed Martin')]
[('company', 'http://www.wikidata.org/entity/Q128896'), ('companyName', 'Advanced Micro Devices')]
[('company', 'http://www.wikidata.org/entity/Q17081612'), ('companyName', 'Moderna')]
[('company', 'http://www.wikidata.org/entity/Q17489012'), ('companyName', 'Stock Spirits Group')]
[('company', 'http://www.wikidata.org/entity/Q66662001'), ('companyName', 'Chisholm Hunter')]
[('company', 'http://www.wikidata.org/entity/Q219508'), ('companyName', 'Citigroup')]
[('company', 'http://www.wikidata.org/entity/Q30873'), ('companyName', 'Dell Inc.')]
[('company', 'http://www.wikidata.org/entity/Q1562826'), ('companyName', 'VEON')]
[('company', 'http://www.wikidata.org/entity/Q570473'), ('companyName', 'McKesson Corporation')]
[('company', 'http://www.wikidata.org/entity/Q1531473'), ('companyName', 'P

154

In [22]:
queryString = """
select ?countryName
    (min(?capitalization) as ?minCapitalization) (max(?capitalization) as ?maxCapitalization) (avg(?capitalization) as ?avgCapitalization)
    (min(?income) as ?minIncome) (max(?income) as ?maxIncome) (avg(?income) as ?avgIncome)
    (min(?assets) as ?minAssets) (max(?assets) as ?maxAssets) (avg(?assets) as ?avgAssets) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry .
    optional { ?company wdt:P2403 ?assets . } .
    optional { ?company wdt:P3362 ?income . } .
    optional { ?company wdt:P2226 ?capitalization . } .

    ?company <http://schema.org/name> ?companyName .
    ?companyCountry <http://schema.org/name> ?countryName .
    
    filter exists { ?company wdt:P2403|wdt:P3362|wdt:P2226 ?assets } .
} group by ?countryName
order by desc(?avgCapitalization) desc(?avgIncome) desc(?avgAssets)
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'United States of America'), ('minCapitalization', '1530000000'), ('maxCapitalization', '2470000000000'), ('avgCapitalization', '471593972252.27'), ('minIncome', '-4408000000'), ('maxIncome', '113012000000'), ('avgIncome', '12880049628.166666666666667'), ('minAssets', '1'), ('maxAssets', '3960490000000'), ('avgAssets', '147764660474.470411067193676')]
[('countryName', 'United Kingdom'), ('minCapitalization', '185400000000'), ('maxCapitalization', '185400000000'), ('avgCapitalization', '185400000000'), ('minIncome', '-1161000000'), ('maxIncome', '14792000000'), ('avgIncome', '3574466367.75'), ('minAssets', '135646163'), ('maxAssets', '2374986000000'), ('avgAssets', '319181544833')]
[('countryName', 'Canada'), ('minCapitalization', '6882000000'), ('maxCapitalization', '45900000000'), ('avgCapitalization', '32894000000'), ('minIncome', '-149000000'), ('maxIncome', '2498000000'), ('avgIncome', '1426000000'), ('minAssets', '1640000000'), ('maxAssets', '17881000000')

3

In order to analyze the categories of the companies, I first need a BGP for the category. Analyzing the properties for BlackBerry and Bloomberg L.P. I notice that the "category" of the business is under the property *industry **P452***. I print the 20 more popular categories.

In [23]:
queryString = """
select ?industryCategory ?industryCategoryName (count(distinct ?company) as ?numCompanies) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             wdt:P452 ?industryCategory .

    ?company <http://schema.org/name> ?companyName .
    ?companyCountry <http://schema.org/name> ?countryName .
    ?industryCategory <http://schema.org/name> ?industryCategoryName .
} group by ?industryCategory ?industryCategoryName
order by desc(?numCompanies)
limit 20
"""

print("Results")
run_query(queryString)

Results
[('industryCategory', 'http://www.wikidata.org/entity/Q126793'), ('industryCategoryName', 'retail'), ('numCompanies', '1000')]
[('industryCategory', 'http://www.wikidata.org/entity/Q837171'), ('industryCategoryName', 'financial services'), ('numCompanies', '572')]
[('industryCategory', 'http://www.wikidata.org/entity/Q1283714'), ('industryCategoryName', 'Crop production'), ('numCompanies', '480')]
[('industryCategory', 'http://www.wikidata.org/entity/Q941594'), ('industryCategoryName', 'video game industry'), ('numCompanies', '414')]
[('industryCategory', 'http://www.wikidata.org/entity/Q190117'), ('industryCategoryName', 'automotive industry'), ('numCompanies', '280')]
[('industryCategory', 'http://www.wikidata.org/entity/Q7397'), ('industryCategoryName', 'software'), ('numCompanies', '267')]
[('industryCategory', 'http://www.wikidata.org/entity/Q187939'), ('industryCategoryName', 'manufacturing'), ('numCompanies', '251')]
[('industryCategory', 'http://www.wikidata.org/entity/

20

And then I can obtain the min, max, and average of capitalization, income and assets for each category.

In [24]:
queryString = """
select ?categoryName
    (min(?capitalization) as ?minCapitalization) (max(?capitalization) as ?maxCapitalization) (avg(?capitalization) as ?avgCapitalization)
    (min(?income) as ?minIncome) (max(?income) as ?maxIncome) (avg(?income) as ?avgIncome)
    (min(?assets) as ?minAssets) (max(?assets) as ?maxAssets) (avg(?assets) as ?avgAssets) where {
    values ?companyCountry {wd:Q16 wd:Q145 wd:Q30}

    ?company wdt:P31 wd:Q4830453 ;
             wdt:P17 ?companyCountry ;
             wdt:P452 ?category .
    optional { ?company wdt:P2403 ?assets . } .
    optional { ?company wdt:P3362 ?income . } .
    optional { ?company wdt:P2226 ?capitalization . } .

    ?category <http://schema.org/name> ?categoryName .
    
    filter exists { ?company wdt:P2403|wdt:P3362|wdt:P2226 ?assets } .
} group by ?categoryName
order by desc(?avgCapitalization) desc(?avgIncome) desc(?avgAssets)
"""

print("Results")
run_query(queryString)

Results
[('categoryName', 'mobile phone industry'), ('minCapitalization', '1000000000000'), ('maxCapitalization', '2470000000000'), ('avgCapitalization', '1829000000000'), ('minIncome', '66288000000'), ('maxIncome', '70898000000'), ('avgIncome', '68593000000'), ('minAssets', '323888000000'), ('maxAssets', '323888000000'), ('avgAssets', '323888000000')]
[('categoryName', 'digital distribution'), ('minCapitalization', '1000000000000'), ('maxCapitalization', '2470000000000'), ('avgCapitalization', '1829000000000'), ('minIncome', '66288000000'), ('maxIncome', '70898000000'), ('avgIncome', '68593000000'), ('minAssets', '323888000000'), ('maxAssets', '323888000000'), ('avgAssets', '323888000000')]
[('categoryName', 'consumer electronics'), ('minCapitalization', '1000000000000'), ('maxCapitalization', '2470000000000'), ('avgCapitalization', '1829000000000'), ('minIncome', '-19620000'), ('maxIncome', '70898000000'), ('avgIncome', '63315106153.846153846153846'), ('minAssets', '360480000'), ('ma

114

### 5.3 Which business in each country owns more businesses in other countries?

In [25]:
queryString = """

select ?countryName ?companyName ?numOwned {
    {
        select ?country (max(?numOwned) as ?maxOwned) where {
            {
                select ?country ?company (count(distinct ?owned) as ?numOwned) {
                    values ?country {wd:Q16 wd:Q145 wd:Q30}
                    ?company wdt:P31 wd:Q4830453 ;
                             wdt:P17 ?country ;
                             wdt:P8687 ?value .
                    ?owned wdt:P17 ?ownedCountry .
                    { ?company wdt:P1830 ?owned . } union { ?owned wdt:P127 ?company . } .
                    filter (?ownedCountry != ?country)
                } group by ?country ?company
            }
        } group by ?country
    } .
    {
        select ?country ?company (count(distinct ?owned) as ?numOwned) {
            values ?country {wd:Q16 wd:Q145 wd:Q30}
            ?company wdt:P31 wd:Q4830453 ;
                     wdt:P17 ?country ;
                     wdt:P8687 ?value .
            ?owned wdt:P17 ?ownedCountry .
            { ?company wdt:P1830 ?owned . } union { ?owned wdt:P127 ?company . } .
            filter (?ownedCountry != ?country)
        }  group by ?country ?company
    }
    
    ?country <http://schema.org/name> ?countryName .
    ?company <http://schema.org/name> ?companyName .
    
    filter (?maxOwned = ?numOwned) .
} order by desc(?numOwned)
"""

print("Results")
run_query(queryString)

Results
[('countryName', 'United States of America'), ('companyName', 'BlackRock'), ('numOwned', '58')]
[('countryName', 'United Kingdom'), ('companyName', 'Vodafone'), ('numOwned', '17')]
[('countryName', 'Canada'), ('companyName', 'Barrick Gold'), ('numOwned', '6')]


3