# DBpedia Currency Preprocessing

The DBpedia company dataset contains a revenue attribute whose values are reported in several different currencies. These values therefore need to be converted into USD, which is the base currency in all our other datasets.

Since we do not know when each revenue figure was reported, we cannot convert it at the exchange rate that was valid at that time. The conversion shown here was run on 26 October 2022. When the notebook is rerun, the code uses the latest available exchange rate between each currency and USD, so the converted values will differ from the outputs below.

In [1]:
import pandas as pd

In [5]:
dbpedia_json = pd.read_json("data/dbpedia.json")

In [6]:
dbpedia_json.head()

Unnamed: 0,head,results
link,[],
vars,"[name, industry_label_list, revenue, netIncome...",
distinct,,False
ordered,,True
bindings,,"[{'name': {'type': 'literal', 'xml:lang': 'en'..."


In [7]:
dbpedia_json = dbpedia_json.loc["bindings","results"]

In [8]:
type(dbpedia_json)

list

In [9]:
dbpedia_json = pd.DataFrame(dbpedia_json)

In [10]:
dbpedia_json.head()

Unnamed: 0,name,industry_label_list,revenue,numEmployees,locationCity_list,tradedAs_list,ceo_name_list,founded_list,netIncome,equity,stockSymbol,foundingDate
0,"{'type': 'literal', 'xml:lang': 'en', 'value':...","{'type': 'literal', 'value': 'Fast food'}","{'type': 'typed-literal', 'datatype': 'http://...","{'type': 'typed-literal', 'datatype': 'http://...","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}",,,,
1,"{'type': 'literal', 'xml:lang': 'en', 'value':...","{'type': 'literal', 'value': 'Consultant,Infor...","{'type': 'typed-literal', 'datatype': 'http://...","{'type': 'typed-literal', 'datatype': 'http://...","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}",,,,
2,"{'type': 'literal', 'xml:lang': 'en', 'value':...","{'type': 'literal', 'value': 'Entertainment'}","{'type': 'typed-literal', 'datatype': 'http://...","{'type': 'typed-literal', 'datatype': 'http://...","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}",,,,
3,"{'type': 'literal', 'xml:lang': 'en', 'value':...","{'type': 'literal', 'value': 'Petroleum indust...","{'type': 'typed-literal', 'datatype': 'http://...","{'type': 'typed-literal', 'datatype': 'http://...","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}",,,,
4,"{'type': 'literal', 'xml:lang': 'en', 'value':...","{'type': 'literal', 'value': 'Sport industry'}","{'type': 'typed-literal', 'datatype': 'http://...","{'type': 'typed-literal', 'datatype': 'http://...","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}","{'type': 'literal', 'value': ''}",,,,


## Create a revenue dictionary for easier processing

In [11]:
revenue_dict = dbpedia_json["revenue"].to_dict()

In [12]:
revenue_dict[0]

{'type': 'typed-literal',
 'datatype': 'http://dbpedia.org/datatype/canadianDollar',
 'value': '1.34E9'}

The datatype is a string containing the URL of the DBpedia entity for the currency. To get the currency name, we take the last path segment of that URL.
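As a minimal sketch of that extraction step, the helper below splits the URL at its last slash and lowercases the result. The function name `currency_name_from_datatype` is introduced here for illustration only and is not part of the notebook.

```python
def currency_name_from_datatype(datatype_url: str) -> str:
    """Return the lowercased last path segment of a DBpedia datatype URL."""
    # rsplit with maxsplit=1 cuts only at the final "/".
    return datatype_url.rsplit("/", 1)[-1].lower()


example = currency_name_from_datatype("http://dbpedia.org/datatype/canadianDollar")
print(example)  # canadiandollar
```

Lowercasing makes the name comparable with the normalized names from the currency code list used later.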

In [13]:
len(revenue_dict)

6918

We have 6918 revenue entities.

In [14]:
type(revenue_dict)

dict

In [15]:
revenue_dict

{0: {'type': 'typed-literal',
  'datatype': 'http://dbpedia.org/datatype/canadianDollar',
  'value': '1.34E9'},
 1: {'type': 'typed-literal',
  'datatype': 'http://dbpedia.org/datatype/euro',
  'value': '2.269E9'},
 2: {'type': 'typed-literal',
  'datatype': 'http://dbpedia.org/datatype/usDollar',
  'value': '1.2424E9'},
 3: {'type': 'typed-literal',
  'datatype': 'http://dbpedia.org/datatype/usDollar',
  'value': '4.308E9'},
 4: {'type': 'typed-literal',
  'datatype': 'http://dbpedia.org/datatype/euro',
  'value': '1.18E8'},
 5: {'type': 'typed-literal',
  'datatype': 'http://dbpedia.org/datatype/usDollar',
  'value': '1.0E10'},
 6: {'type': 'typed-literal',
  'datatype': 'http://dbpedia.org/datatype/usDollar',
  'value': '2.587E10'},
 7: {'type': 'typed-literal',
  'datatype': 'http://dbpedia.org/datatype/poundSterling',
  'value': '67.2'},
 8: {'type': 'typed-literal',
  'datatype': 'http://dbpedia.org/datatype/euro',
  'value': '8110000.0'},
 9: {'type': 'typed-literal',
  'datatyp

In [16]:
# Collect the currency datatype URL of every revenue entry.
currency_list = [entry['datatype'] for entry in revenue_dict.values()]

In [17]:
len(set(currency_list))

71

We have 71 distinct currencies.
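Beyond the number of distinct currencies, it can help to see how often each one occurs. The sketch below counts currencies with `collections.Counter` on a small stand-in dictionary built from the first four entries shown above; `sample_revenue` and `currency_counts` are illustrative names, not notebook variables.

```python
from collections import Counter

# Stand-in for revenue_dict, using the first four entries printed above.
sample_revenue = {
    0: {"datatype": "http://dbpedia.org/datatype/canadianDollar", "value": "1.34E9"},
    1: {"datatype": "http://dbpedia.org/datatype/euro", "value": "2.269E9"},
    2: {"datatype": "http://dbpedia.org/datatype/usDollar", "value": "1.2424E9"},
    3: {"datatype": "http://dbpedia.org/datatype/usDollar", "value": "4.308E9"},
}

# Count how many entries use each currency (last URL segment).
currency_counts = Counter(
    entry["datatype"].rsplit("/", 1)[-1] for entry in sample_revenue.values()
)
print(currency_counts.most_common())
```

On the full dataset, the same pattern would show which currencies dominate and which appear only a handful of times.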

In [18]:
set(currency_list)

{'http://dbpedia.org/datatype/albanianLek',
 'http://dbpedia.org/datatype/algerianDinar',
 'http://dbpedia.org/datatype/argentinePeso',
 'http://dbpedia.org/datatype/australianDollar',
 'http://dbpedia.org/datatype/azerbaijaniManat',
 'http://dbpedia.org/datatype/bahrainiDinar',
 'http://dbpedia.org/datatype/bangladeshiTaka',
 'http://dbpedia.org/datatype/botswanaPula',
 'http://dbpedia.org/datatype/bulgarianLev',
 'http://dbpedia.org/datatype/canadianDollar',
 'http://dbpedia.org/datatype/chileanPeso',
 'http://dbpedia.org/datatype/colombianPeso',
 'http://dbpedia.org/datatype/croatianKuna',
 'http://dbpedia.org/datatype/czechKoruna',
 'http://dbpedia.org/datatype/danishKrone',
 'http://dbpedia.org/datatype/djiboutianFranc',
 'http://dbpedia.org/datatype/ethiopianBirr',
 'http://dbpedia.org/datatype/euro',
 'http://dbpedia.org/datatype/gambianDalasi',
 'http://dbpedia.org/datatype/ghanaianCedi',
 'http://dbpedia.org/datatype/hongKongDollar',
 'http://dbpedia.org/datatype/hungarianFori

To get the exchange rate for a currency pair, we need the currency codes, which are sent to a service that returns the exchange rate. To get the codes, we use a compiled list of currency names and codes from [this GitHub repository](https://github.com/OpenBookPrices/country-data/blob/master/data/currencies.csv).

In [20]:
currency_codes = pd.read_csv("data/currencies.csv")

In [21]:
currency_codes

Unnamed: 0,code,decimals,name,number
0,AED,2,United Arab Emirates dirham,784
1,AFN,2,Afghan afghani,971
2,ALL,2,Albanian lek,8
3,AMD,2,Armenian dram,51
4,ANG,2,Netherlands Antillean guilder,532
...,...,...,...,...
173,XTS,.,Code reserved for testing purposes,963
174,XXX,.,No currency,999
175,YER,2,Yemeni rial,886
176,ZAR,2,South African rand,710


In [22]:
import json

In [23]:
with open(file="data/currencies.json", mode="r") as data_file:
    currency_codes_json = json.load(data_file)

In [24]:
currency_codes_json

[{'code': 'AED',
  'decimals': 2,
  'name': 'United Arab Emirates dirham',
  'number': '784'},
 {'code': 'AFN', 'decimals': 2, 'name': 'Afghan afghani', 'number': '971'},
 {'code': 'ALL', 'decimals': 2, 'name': 'Albanian lek', 'number': '8'},
 {'code': 'AMD', 'decimals': 2, 'name': 'Armenian dram', 'number': '51'},
 {'code': 'ANG',
  'decimals': 2,
  'name': 'Netherlands Antillean guilder',
  'number': '532'},
 {'code': 'AOA', 'decimals': 2, 'name': 'Angolan kwanza', 'number': '973'},
 {'code': 'ARS', 'decimals': 2, 'name': 'Argentine peso', 'number': '32'},
 {'code': 'AUD', 'decimals': 2, 'name': 'Australian dollar', 'number': '36'},
 {'code': 'AWG', 'decimals': 2, 'name': 'Aruban florin', 'number': '533'},
 {'code': 'AZN', 'decimals': 2, 'name': 'Azerbaijani manat', 'number': '944'},
 {'code': 'BAM',
  'decimals': 2,
  'name': 'Bosnia and Herzegovina convertible mark',
  'number': '977'},
 {'code': 'BBD', 'decimals': 2, 'name': 'Barbados dollar', 'number': '52'},
 {'code': 'BDT', 'de

In [25]:
currency_codes_json[0]['name'].lower().replace(" ", "")

'unitedarabemiratesdirham'

In [26]:
for currency in currency_codes_json:
    print(currency['name'])

United Arab Emirates dirham
Afghan afghani
Albanian lek
Armenian dram
Netherlands Antillean guilder
Angolan kwanza
Argentine peso
Australian dollar
Aruban florin
Azerbaijani manat
Bosnia and Herzegovina convertible mark
Barbados dollar
Bangladeshi taka
Bulgarian lev
Bahraini dinar
Burundian franc
Bermudian dollar (customarily known as Bermuda dollar)
Brunei dollar
Boliviano
Bolivian Mvdol (funds code)
Brazilian real
Bahamian dollar
Bhutanese ngultrum
Botswana pula
Belarusian ruble
Belize dollar
Canadian dollar
Congolese franc
WIR Euro (complementary currency)
Swiss franc
WIR Franc (complementary currency)
Unidad de Fomento (funds code)
Chilean peso
Chinese yuan
Colombian peso
Unidad de Valor Real
Costa Rican colon
Cuban convertible peso
Cuban peso
Cape Verde escudo
Czech koruna
Djiboutian franc
Danish krone
Dominican peso
Algerian dinar
Egyptian pound
Eritrean nakfa
Ethiopian birr
Euro
Fiji dollar
Falkland Islands pound
Pound sterling
Georgian lari
Ghanaian cedi
Gibraltar pound
Gambian

## Get currency codes

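The following function looks up the currency code for a given currency name in the currency code dataset. The name is expected in lowercase with spaces removed; if no currency matches, the function returns `None`.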

In [27]:
def get_currency_code(search_for):
    """Return the code for a normalized currency name, or None if not found."""
    # DBpedia's "usDollar" is handled as a special case.
    if search_for == "usdollar":
        return "USD"
    for currency in currency_codes_json:
        # Normalize the dataset name the same way as the DBpedia name.
        currency_name = currency['name'].lower().replace(" ", "")
        if search_for == currency_name:
            return currency['code']
    return None

In [28]:
get_currency_code("euro")

'EUR'
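Because the function scans the whole list on every call, one possible alternative is to build a dictionary once and look names up directly. The sketch below assumes entries shaped like those in `currency_codes_json`; `sample_currencies` is a three-entry stand-in limited to the `code` and `name` keys, and `build_code_lookup` is a hypothetical helper.

```python
# Stand-in for currency_codes_json with only the keys needed here.
sample_currencies = [
    {"code": "AED", "name": "United Arab Emirates dirham"},
    {"code": "CAD", "name": "Canadian dollar"},
    {"code": "EUR", "name": "Euro"},
]


def build_code_lookup(currencies):
    """Map normalized currency names (lowercase, no spaces) to currency codes."""
    lookup = {c["name"].lower().replace(" ", ""): c["code"] for c in currencies}
    # Same special case as in the notebook: DBpedia's "usDollar".
    lookup["usdollar"] = "USD"
    return lookup


code_lookup = build_code_lookup(sample_currencies)
print(code_lookup.get("canadiandollar"))  # CAD
print(code_lookup.get("notacurrency"))    # None
```

`dict.get` returns `None` for unknown names, which matches how missing codes are treated in the conversion step.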

## Currency conversion

We use forex-python to get the exchange rate for a currency pair and convert the revenue to USD.
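With 6918 entries and only 71 distinct currencies, the same rate would be requested many times. The sketch below caches one rate per currency code so that each rate is fetched only once. It takes the rate source as a callable so that it runs offline; `make_cached_rate_lookup` and `fake_get_rate` are hypothetical names, and the EUR rate is the one from the run shown in this notebook. With forex-python, the `get_rate` method of a `CurrencyRates` instance could presumably be passed in as the rate source.

```python
from typing import Callable, Dict, Optional


def make_cached_rate_lookup(
    get_rate: Callable[[str, str], float]
) -> Callable[[str], Optional[float]]:
    """Wrap a rate source so each currency-to-USD rate is fetched at most once."""
    cache: Dict[str, Optional[float]] = {}

    def rate_to_usd(code: str) -> Optional[float]:
        if code == "USD":
            return 1.0
        if code not in cache:
            try:
                cache[code] = get_rate(code, "USD")
            except Exception:
                # Remember failures too, so they are not retried for every row.
                cache[code] = None
        return cache[code]

    return rate_to_usd


# Offline stand-in for a real rate service (assumption for this sketch).
calls = []


def fake_get_rate(base: str, dest: str) -> float:
    calls.append(base)
    return {"EUR": 1.0023}[base]  # raises KeyError for unknown codes


rate_to_usd = make_cached_rate_lookup(fake_get_rate)
converted = float("2.269E9") * rate_to_usd("EUR")
print(converted)
```

Caching also keeps the rate constant within one run, so all rows of the same currency are converted consistently.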

In [29]:
!pip install forex-python 



In [30]:
from forex_python.converter import CurrencyRates
import datetime

In [31]:
c = CurrencyRates()

In [32]:
converted_revenue = []

In [33]:
for key, value in revenue_dict.items():
    # The currency name is the last path segment of the datatype URL.
    currency_dbpedia = value['datatype'].rsplit('/', 1)[-1].lower()
    currency_code = get_currency_code(currency_dbpedia)

    if currency_code == "USD":
        # Already in the base currency; only parse the string value.
        rev_converted = float(value['value'])
    elif currency_code is None:
        # No matching currency code was found.
        rev_converted = None
    else:
        try:
            currency_rate = c.get_rate(currency_code, "USD")
            print("currency code: {currency_code} ///// currency rate: {currency_rate}".format(currency_code=currency_code, currency_rate=currency_rate))
            # The value is a string (possibly in scientific notation), so parse it first.
            rev_converted = float(value['value']) * currency_rate
        except Exception:
            # Rate lookup or parsing failed; mark the revenue as missing.
            rev_converted = None

    print(rev_converted)
    converted_revenue.append(rev_converted)


currency code: CAD ///// currency rate: 0.738723466981132
989889445.7547169
currency code: EUR ///// currency rate: 1.0023
2274218700.0
1242400000.0
4308000000.0
currency code: EUR ///// currency rate: 1.0023
118271400.0
10000000000.0
25870000000.0
currency code: GBP ///// currency rate: 1.1573502072676467
77.77393392838586
currency code: EUR ///// currency rate: 1.0023
8128653.0
60000000.0
13000000000.0
currency code: JPY ///// currency rate: 0.006803556882975835
51.18315843062721
1849.0
currency code: EUR ///// currency rate: 1.0023
10023000.0
currency code: EUR ///// currency rate: 1.0023
456046500.0
30800000000.0
1296.0
1200000000.0
661000000.0
currency code: GBP ///// currency rate: 1.1573502072676467
633880708.52049
10113000000.0
currency code: EUR ///// currency rate: 1.0023
18843240000.0
5148000000.0
-72000000.0
currency code: EUR ///// currency rate: 1.0023
501150000.0
14700000000.0
40000000000.0
currency code: NOK ///// currency rate: 0.09692673680953118
2177.0714354788797
cu

In [34]:
len(converted_revenue)

6918

In [35]:
converted_revenue

[989889445.7547169,
 2274218700.0,
 1242400000.0,
 4308000000.0,
 118271400.0,
 10000000000.0,
 25870000000.0,
 77.77393392838586,
 8128653.0,
 60000000.0,
 13000000000.0,
 51.18315843062721,
 1849.0,
 10023000.0,
 456046500.0,
 30800000000.0,
 1296.0,
 1200000000.0,
 661000000.0,
 633880708.52049,
 10113000000.0,
 18843240000.0,
 5148000000.0,
 -72000000.0,
 501150000.0,
 14700000000.0,
 40000000000.0,
 2177.0714354788797,
 13430221286.994299,
 1.59,
 316000000.0,
 55808064000.0,
 186197271.0,
 8264000000.0,
 2535819000.0,
 2369096996.8703527,
 1050410400.0,
 420000000.0,
 109480000000.0,
 720000000.0,
 155000000.0,
 4880000.0,
 1067449500.0,
 982400000.0,
 79000000.0,
 8600000000.0,
 1800000000.0,
 243858.0,
 13520000000.0,
 2220.3,
 100000000.0,
 0.0380952380952381,
 1516128771.5206172,
 9696000000.0,
 703509000.0,
 51.2,
 9944.0,
 6200000000.0,
 3100000000.0,
 1288957800.0,
 20046000.0,
 181100000.0,
 1390000000.0,
 37185330000.0,
 2216170400.943396,
 782094690.0,
 41896140000.0,
 

In [36]:
sum(x is None for x in converted_revenue)

228
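To see why some entries could not be converted, one option is to group the missing values by currency name. The sketch below uses made-up stand-in lists (`sample_names`, `sample_converted`) in place of the real currency names and `converted_revenue`; the entries are illustrative only and do not describe the actual 228 failures.

```python
from collections import Counter

# Illustrative stand-ins: currency name per row and its converted value.
sample_names = ["euro", "usdollar", "someunknowncurrency", "euro"]
sample_converted = [2274218700.0, 1242400000.0, None, None]

# Count the rows without a converted value, grouped by currency name.
unconverted = Counter(
    name for name, usd in zip(sample_names, sample_converted) if usd is None
)
print(unconverted.most_common())
```

A currency that appears here for every one of its rows probably has no matching code or no available rate, whereas a currency that fails only occasionally points to a problem with individual values or requests.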

In [37]:
dbpedia_csv = pd.read_csv("data/dbpedia_final.csv")

In [38]:
dbpedia_csv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6918 entries, 0 to 6917
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   name                 6918 non-null   object 
 1   industry_label_list  6918 non-null   object 
 2   revenue              6918 non-null   float64
 3   netIncome            9 non-null      float64
 4   equity               9 non-null      float64
 5   numEmployees         6918 non-null   int64  
 6   locationCity_list    9 non-null      object 
 7   stockSymbol          9 non-null      object 
 8   tradedAs_list        9 non-null      object 
 9   ceo_name_list        9 non-null      object 
 10  foundingDate         9 non-null      object 
 11  founded_list         9 non-null      object 
dtypes: float64(3), int64(1), object(8)
memory usage: 648.7+ KB


Add the converted revenue to the CSV that was initially retrieved from DBpedia.

In [39]:
revenue_df = pd.DataFrame(converted_revenue, columns=['revenue_usd'])

In [40]:
dbpedia_csv['revenue_usd'] = converted_revenue

In [41]:
dbpedia_csv.head()

Unnamed: 0,name,industry_label_list,revenue,netIncome,equity,numEmployees,locationCity_list,stockSymbol,tradedAs_list,ceo_name_list,foundingDate,founded_list,revenue_usd
0,A&W (Canada),Fast food,1340000000.0,,,20000,,,,,,,989889400.0
1,ALTEN,"Consultant,Information technology,Outsourcing",2269000000.0,,,33700,,,,,,,2274219000.0
2,AMC Theatres,Entertainment,1242400000.0,,,4408,,,,,,,1242400000.0
3,APA Corporation,Petroleum industry,4308000000.0,,,3163,,,,,,,4308000000.0
4,AFC Ajax N.V.,Sport industry,118000000.0,,,339,,,,,,,118271400.0


In [43]:
dbpedia_csv.to_csv("data/dbpedia_preprocessed.csv")