# Final project: Global macroeconomic data

### Are we about to enter the next big world economical crisis?

Goals: Compare current trends in world economic indicators to the trends in previous crisis periods to assess whether the world is about to enter or perhaps already in the next global recession. Predict next year's (2024) indicators.

Main indicator of recession:  real per capita global GDP

Further important indicators of economical activity: Inflation (Consumer Price Index), unemployment, government debt, house prices, current account balance, government expenditure, industrial production, trade, capital flows, business confidence


### Databases

Econdb

World Bank

International Monetary Fund


### Notebooks

1-Data acquisition

2-Data cleaning and preparation

3- Machine Learning: Time Series and Regression

4- Machine Learning: Deep Learning Classification


In [1]:
import pandas as pd
import requests
import json
from pandas.io.json import json_normalize
import getpass
from time import sleep
import random
import time
from time import sleep
from random import randint

# Data acquisition

## Web scraping: macroeconomy by country

Econdb provides country data for several countries from different platforms, but no global data.
I will use it to extract information about G20 countries and see what I can do with it.

In [2]:
# In the Econdb database, the indicators are identified by codes followed by the country code

# Main indicators found in this database:
# Real gross domestic product (RGDP)
# Consumer Price Index (CPI)
# Unemployment (URATE)
# Government debt (GDEBT)
# House prices (HOU)
# Current account balance (CA)
# Governenment expenditure  (GSPE)
# Industrial production (IP)

# G20 countries:
# Argentina (AR)
# Australia (AU)
# Brazil (BR)
# Canada (CA)
# China (CN)
# France (FR)
# Germany (DE)
# India (IN)
# Indonesia (ID)
# Italy (IT)
# Japan (JP)
# Mexico (MX)
# Russia (RU)
# Saudi Arabia (SA)
# South Africa (ZA)
# South Korea (KR)
# Turkey (TR)
# United Kingdom (UK)
# United States (US)


In [3]:
# The database provides data as CSV. The code of indicator+country is part of the URL (in this example CPIUK) 

API_TOKEN = 'b42767206d76188148990fdb986c35cf4121d717'

df = pd.read_csv(
    'https://www.econdb.com/api/series/CPIUK/?token=%s&format=csv' % API_TOKEN,
    index_col='Date', parse_dates=['Date'])

df

Unnamed: 0_level_0,CPIUK
Date,Unnamed: 1_level_1
1988-01-01,48.4
1988-02-01,48.6
1988-03-01,48.7
1988-04-01,49.3
1988-05-01,49.5
...,...
2023-04-01,130.4
2023-05-01,131.3
2023-06-01,131.5
2023-07-01,130.9


In [4]:
#function to get a list of countries and indicator and scrape it from Econdb
def indicator_by_country(indicators_list, countries_list):
    indicators_country = pd.DataFrame()
    indicators_df = pd.DataFrame()
    for c in range(len(countries_list)):
        for i in range(len(indicators_list)):
            #exports as CSV
            url = 'https://www.econdb.com/api/series/'+indicators_list[i]+countries_list[c]+'/?token=%s&format=csv'
            try:
                #get CSV into a df and concatenate with the existing data after every country/indicator
                df = pd.read_csv(url)
                indicators_country = pd.concat([indicators_country, df], axis=1)
            except:
                pass
            wait_time = randint(1,4)
            sleep(wait_time)
    indicators_df= pd.concat([indicators_df, indicators_country], axis=1)
    
    return indicators_df

In [5]:
indicators = ['RGDP', 'CPI','URATE','GDEBT','HOU','CA','GSPE','IP']
countries = ['AR','AU','BR','CA','CN','FR','DE','IN','ID','IT','JP','MX','RU','SA','ZA','KR','TR','UK','US']

In [6]:
%%time
indicators_df = indicator_by_country(indicators,countries)

CPU times: total: 1.98 s
Wall time: 7min 29s


In [12]:
# indicators_df.head()

In [13]:
# indicators_df.tail()

In [14]:
# indicators_df.shape

In [11]:
indicators_df.to_csv(r'C:\Users\menno\IH-Labs\final-project\01_CSV files\indicators_countries.csv', index=False)

## API: World Bank Database

The World Bank Database provides global and country information about all main economic indicators. The data is available for years between 1960 and 2023.

To be consistent with the previous data set, we are going to extract G20 countries, but also Global data.

In [16]:
# There is a python package for this API

# !pip install wbgapi
import wbgapi as wb

In [17]:
# Example of query: GDP per capta, global (world)
# One needs to know the indicator code and the country code
df = wb.data.DataFrame(['NY.GDP.PCAP.CD'], 'WLD', time=range(1950,2023), columns='series')
df

Unnamed: 0_level_0,NY.GDP.PCAP.CD
time,Unnamed: 1_level_1
YR1960,456.826375
YR1961,471.686740
YR1962,495.967407
YR1963,523.401899
YR1964,561.424407
...,...
YR2018,11290.099975
YR2019,11330.453670
YR2020,10895.735233
YR2021,12282.000071


In [18]:
# Checking the periods of time available in the database
wb.time.info()

id,value
YR1960,1960
YR1961,1961
YR1962,1962
YR1963,1963
YR1964,1964
YR1965,1965
YR1966,1966
YR1967,1967
YR1968,1968
YR1969,1969


In [19]:
#function to get lists of indicators codes, indicators names and countries codes and scrape the data
def indicators_world_bank(indicators_id, indicators_name, countries_list):
    indicators_df = pd.DataFrame()
    for c in range(len(countries_list)):
        for i in range(len(indicators_id)):
            try:
                #Get indicator for the country
                indicator_country = wb.data.DataFrame([indicators_id[i]], countries_list[c], range(1960,2023), columns='series')
                #change column name to make it human-friendly
                column_name = countries_list[c]+'_'+indicators_name[i]
                indicator_country.columns= ([column_name])
                #concatenate the data
                indicators_df= pd.concat([indicators_df, indicator_country], axis=1)
            except:
                pass
            wait_time = randint(1,4)
            sleep(wait_time)
        # progress indication
        print(countries_list[c],' done')    
    return indicators_df 


In [20]:
# Getting a list of countries codes
codes = {}

countries_list = ['World', 'Argentina', 'Australia','Brazil','Canada','China','France','Germany','India','Indonesia','Italy','Japan',
               'Mexico', 'Russia','Saudi Arabia','South Africa','South Korea','Turkey','United Kingdom','United States']
for country in countries_list:
    codes[country] = wb.economy.coder(country)

codes['World'] = 'WLD' # Search did not return a code, I got this code from the website
codes = pd.DataFrame.from_dict(codes, orient='index').reset_index()
codes.columns=(['country', 'code'])
codes

Unnamed: 0,country,code
0,World,WLD
1,Argentina,ARG
2,Australia,AUS
3,Brazil,BRA
4,Canada,CAN
5,China,CHN
6,France,FRA
7,Germany,DEU
8,India,IND
9,Indonesia,IDN


In [21]:
countries_list = codes['code'].to_list()
countries_list

['WLD',
 'ARG',
 'AUS',
 'BRA',
 'CAN',
 'CHN',
 'FRA',
 'DEU',
 'IND',
 'IDN',
 'ITA',
 'JPN',
 'MEX',
 'RUS',
 'SAU',
 'ZAF',
 'KOR',
 'TUR',
 'GBR',
 'USA']

In [22]:
# Indicators found in the world bank database and their respective identification code 
indicators = {'gdp_per_capita_constant_us$':'NY.GDP.PCAP.KD',
              'gdp_per_capita_%_growth':'NY.GDP.PCAP.KD.ZG',
              'consumer_price_index_%':'FP.CPI.TOTL.ZG',
              'unemployment_%_of_total_labor_force':'SL.UEM.TOTL.ZS',
              'government_debt_total_local_currency':'GC.DOD.TOTL.CN',
              'government_debt_total_%_of_gdp':'GC.DOD.TOTL.GD.ZS',
              'government_expense_local_currency': 'GC.XPN.TOTL.CN',
              'government_expense_%_gdp':'GC.XPN.TOTL.GD.ZS',
              'current_account_balance_us$':'BN.CAB.XOKA.CD',
              'current_account_balance_%_gdp':'BN.CAB.XOKA.GN.ZS',
              'industry_value_added_us$': 'GC.XPN.TOTL.GD.ZS',
              'industry_value_added_%_growth':'NV.IND.TOTL.KD.ZG',
              'net_trade_goods_services_us$':'BN.GSR.GNFS.CD',
              'net_financial_flows_multilateral_us$':'DT.NFL.MLAT.CD',
            'gdp_%_growth':'NY.GDP.MKTP.KD.ZG'}

In [23]:
indicators_df = pd.DataFrame.from_dict(indicators, orient='index').reset_index()
indicators_df.columns=(['indicator', 'id'])
indicators_df

Unnamed: 0,indicator,id
0,gdp_per_capita_constant_us$,NY.GDP.PCAP.KD
1,gdp_per_capita_%_growth,NY.GDP.PCAP.KD.ZG
2,consumer_price_index_%,FP.CPI.TOTL.ZG
3,unemployment_%_of_total_labor_force,SL.UEM.TOTL.ZS
4,government_debt_total_local_currency,GC.DOD.TOTL.CN
5,government_debt_total_%_of_gdp,GC.DOD.TOTL.GD.ZS
6,government_expense_local_currency,GC.XPN.TOTL.CN
7,government_expense_%_gdp,GC.XPN.TOTL.GD.ZS
8,current_account_balance_us$,BN.CAB.XOKA.CD
9,current_account_balance_%_gdp,BN.CAB.XOKA.GN.ZS


In [24]:
# testing the function
test_df = indicators_df[:2]
indicators_id = test_df['id'].to_list()
indicators_name = test_df['indicator'].to_list()
test_countries = ['JPN', 'WLD']
test_indicators = indicators_world_bank(indicators_id, indicators_name, test_countries)

test_indicators

JPN  done
WLD  done


Unnamed: 0_level_0,JPN_gdp_per_capita_constant_us$,JPN_gdp_per_capita_%_growth,WLD_gdp_per_capita_constant_us$,WLD_gdp_per_capita_%_growth
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
YR1960,6260.675970,,3608.126850,
YR1961,6952.109597,11.044073,3694.968086,2.406823
YR1962,7501.445444,7.901714,3823.614985,3.481678
YR1963,8055.041395,7.379857,3937.998747,2.991508
YR1964,8902.495553,10.520792,4109.998137,4.367685
...,...,...,...,...
YR2018,36138.529082,0.771168,10777.870452,2.161637
YR2019,36043.784401,-0.262171,10941.964488,1.522509
YR2020,34603.265056,-3.996582,10499.647111,-4.042395
YR2021,35507.578588,2.613376,11037.294044,5.120619


In [25]:
indicators_id = indicators_df['id'].to_list()
indicators_name = indicators_df['indicator'].to_list()

In [26]:
%%time
indicators_worldbank = indicators_world_bank(indicators_id, indicators_name, countries_list)

WLD  done
ARG  done
AUS  done
BRA  done
CAN  done
CHN  done
FRA  done
DEU  done
IND  done
IDN  done
ITA  done
JPN  done
MEX  done
RUS  done
SAU  done
ZAF  done
KOR  done
TUR  done
GBR  done
USA  done
CPU times: total: 18.6 s
Wall time: 16min 16s


In [27]:
indicators_worldbank= indicators_worldbank.reset_index(drop=True)
indicators_worldbank

Unnamed: 0,WLD_gdp_per_capita_constant_us$,WLD_gdp_per_capita_%_growth,WLD_consumer_price_index_%,WLD_unemployment_%_of_total_labor_force,WLD_government_debt_total_local_currency,WLD_government_debt_total_%_of_gdp,WLD_government_expense_local_currency,WLD_government_expense_%_gdp,WLD_current_account_balance_us$,WLD_industry_value_added_us$,...,USA_government_debt_total_local_currency,USA_government_debt_total_%_of_gdp,USA_government_expense_local_currency,USA_government_expense_%_gdp,USA_current_account_balance_us$,USA_industry_value_added_us$,USA_industry_value_added_%_growth,USA_net_trade_goods_services_us$,USA_net_financial_flows_multilateral_us$,USA_gdp_%_growth
0,3608.126850,,,,,,,,,,...,,,,,,,,,,
1,3694.968086,2.406823,,,,,,,,,...,,,,,,,,,,2.300000
2,3823.614985,3.481678,,,,,,,,,...,,,,,,,,,,6.100000
3,3937.998747,2.991508,,,,,,,,,...,,,,,,,,,,4.400000
4,4109.998137,4.367685,,,,,,,,,...,,,,,,,,,,5.800000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58,10777.870452,2.161637,2.438737,5.699076,,,,26.301876,,26.301876,...,2.034025e+13,99.060976,4.583096e+12,22.320573,-4.398510e+11,22.320573,3.120148,-5.786000e+11,,2.945385
59,10941.964488,1.522509,2.206073,5.543161,,,,26.944593,,26.944593,...,2.155310e+13,100.805042,4.859070e+12,22.726136,-4.459550e+11,22.726136,2.286841,-5.596750e+11,,2.294439
60,10499.647111,-4.042395,1.928955,6.898273,,,,33.696136,,33.696136,...,2.658516e+13,126.232494,6.780402e+12,32.194915,-6.197020e+11,32.194915,-3.520053,-6.539900e+11,,-2.767803
61,11037.294044,5.120619,3.475403,6.201871,,,,32.167331,,32.167331,...,2.806530e+13,120.374017,7.310239e+12,31.354124,-8.463540e+11,31.354124,3.254431,-8.450500e+11,,5.945485


In [28]:
indicators_worldbank.to_csv(r'C:\Users\menno\IH-Labs\final-project\01_CSV files\indicators_worldbank.csv', index=False)

## Database Download:  IMF

The international Monetary Fund offers the possibility of downloading their entired databases directly from the website.

They contain countries and global data since 1980 and provide estimates for 2022-2027.

In [34]:
# Database for individual countries
df_countries = pd.read_excel(r'C:\Users\menno\IH-Labs\final-project\01_CSV files\WEOApr2023all.xlsx')
df_countries

Unnamed: 0,WEO Country Code,ISO,WEO Subject Code,Country,Subject Descriptor,Subject Notes,Units,Scale,Country/Series-specific Notes,1980,...,2020,2021,2022,2023,2024,2025,2026,2027,2028,Estimates Start After
0,512,AFG,NGDP_R,Afghanistan,"Gross domestic product, constant prices",Expressed in billions of national currency uni...,National currency,Billions,Source: National Statistics Office Latest actu...,,...,1288.869,,,,,,,,,2020.0
1,512,AFG,NGDP_RPCH,Afghanistan,"Gross domestic product, constant prices",Annual percentages of constant price GDP are y...,Percent change,,"See notes for: Gross domestic product, consta...",,...,-2.351,,,,,,,,,2020.0
2,512,AFG,NGDP,Afghanistan,"Gross domestic product, current prices",Expressed in billions of national currency uni...,National currency,Billions,Source: National Statistics Office Latest actu...,,...,1547.289,,,,,,,,,2020.0
3,512,AFG,NGDPD,Afghanistan,"Gross domestic product, current prices",Values are based upon GDP in national currency...,U.S. dollars,Billions,"See notes for: Gross domestic product, curren...",,...,20.136,,,,,,,,,2020.0
4,512,AFG,PPPGDP,Afghanistan,"Gross domestic product, current prices",These data form the basis for the country weig...,Purchasing power parity; international dollars,Billions,"See notes for: Gross domestic product, curren...",,...,81.007,,,,,,,,,2020.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8621,698,ZWE,NGDP_FY,Zimbabwe,Gross domestic product corresponding to fiscal...,Gross domestic product corresponding to fiscal...,National currency,Billions,Source: Ministry of Finance or Treasury Latest...,,...,1378.795,3185.624,12889.976,40449.579,103721.176,240299.084,456210.642,682456.905,904204.582,2021.0
8622,698,ZWE,BCA,Zimbabwe,Current account balance,Current account is all transactions other than...,U.S. dollars,Billions,Source: Reserve Bank of Zimbabwe and Ministry ...,-0.301,...,0.678,0.348,0.267,0.12,0.203,0.2,0.184,0.154,0.143,2021.0
8623,698,ZWE,BCA_NGDPD,Zimbabwe,Current account balance,Current account is all transactions other than...,Percent of GDP,,"See notes for: Gross domestic product, curren...",,...,2.524,0.968,0.808,0.402,0.762,0.714,0.646,0.578,0.515,2021.0
8624,,,,,,,,,,,...,,,,,,,,,,


In [37]:
# Database for country groups
df_country_groups = pd.read_excel(r'C:\Users\menno\IH-Labs\final-project\01_CSV files\WEOApr2023alla.xlsx')
df_country_groups

Unnamed: 0,WEO Country Group Code,WEO Subject Code,Country Group Name,Subject Descriptor,Subject Notes,Units,Scale,Country/Series-specific Notes,1980,1981,...,2020,2021,2022,2023,2024,2025,2026,2027,2028,Estimates Start After
0,1,NGDP_RPCH,World,"Gross domestic product, constant prices",,Percent change,,,2.059,1.944,...,-2.805,6.275,3.416,2.831,3.023,3.157,3.160,3.065,3.049,
1,1,NGDP_RPCHMK,World,"Gross domestic product, constant prices",,Percent change (market exchange rates),,,1.638,1.992,...,-3.218,6.040,3.009,2.369,2.437,2.686,2.708,2.619,2.593,
2,1,NGDPD,World,"Gross domestic product, current prices",,U.S. dollars,Billions,,11236.068,11496.407,...,84894.923,96314.464,100218.398,105568.776,110763.878,116449.845,122425.775,128505.325,134950.483,
3,1,PPPGDP,World,"Gross domestic product, current prices",,Purchasing power parity; international dollars,Billions,,13428.478,14959.641,...,133368.115,147909.745,163510.095,174471.283,183653.986,193002.212,202751.822,212711.061,223271.938,
4,1,NGDP_D,World,"Gross domestic product, deflator",,"Index, 2000=100",,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1661,603,PZINC,Sub-Saharan Africa,"Zinc, high grade 98% pure, US$ per metric tonne",,U.S. dollars,Units,,,,...,,,,,,,,,,
1662,603,PLEAD,Sub-Saharan Africa,"Lead, 99.97% pure, LME spot price, CIF Europea...",,U.S. dollars,Units,,,,...,,,,,,,,,,
1663,603,PURAN,Sub-Saharan Africa,"Uranium, u3o8 restricted price, Nuexco exchang...",,U.S. dollars,Units,,,,...,,,,,,,,,,
1664,,,,,,,,,,,...,,,,,,,,,,


In [38]:
df = pd.concat([df_countries, df_country_groups], axis=0)
df

Unnamed: 0,WEO Country Code,ISO,WEO Subject Code,Country,Subject Descriptor,Subject Notes,Units,Scale,Country/Series-specific Notes,1980,...,2022,2023,2024,2025,2026,2027,2028,Estimates Start After,WEO Country Group Code,Country Group Name
0,512,AFG,NGDP_R,Afghanistan,"Gross domestic product, constant prices",Expressed in billions of national currency uni...,National currency,Billions,Source: National Statistics Office Latest actu...,,...,,,,,,,,2020.0,,
1,512,AFG,NGDP_RPCH,Afghanistan,"Gross domestic product, constant prices",Annual percentages of constant price GDP are y...,Percent change,,"See notes for: Gross domestic product, consta...",,...,,,,,,,,2020.0,,
2,512,AFG,NGDP,Afghanistan,"Gross domestic product, current prices",Expressed in billions of national currency uni...,National currency,Billions,Source: National Statistics Office Latest actu...,,...,,,,,,,,2020.0,,
3,512,AFG,NGDPD,Afghanistan,"Gross domestic product, current prices",Values are based upon GDP in national currency...,U.S. dollars,Billions,"See notes for: Gross domestic product, curren...",,...,,,,,,,,2020.0,,
4,512,AFG,PPPGDP,Afghanistan,"Gross domestic product, current prices",These data form the basis for the country weig...,Purchasing power parity; international dollars,Billions,"See notes for: Gross domestic product, curren...",,...,,,,,,,,2020.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1661,,,PZINC,,"Zinc, high grade 98% pure, US$ per metric tonne",,U.S. dollars,Units,,,...,,,,,,,,,603,Sub-Saharan Africa
1662,,,PLEAD,,"Lead, 99.97% pure, LME spot price, CIF Europea...",,U.S. dollars,Units,,,...,,,,,,,,,603,Sub-Saharan Africa
1663,,,PURAN,,"Uranium, u3o8 restricted price, Nuexco exchang...",,U.S. dollars,Units,,,...,,,,,,,,,603,Sub-Saharan Africa
1664,,,,,,,,,,,...,,,,,,,,,,


In [39]:
# As I already have consolidated data until 2022 from the World Bank, I want to use the IMF data because of their estimates
# for 2023. Therefore, I am filtering the dataset to show me only the countries where there is an estimate for this year.

df_with_2023 = df[df[2023].isna()==False]
df_with_2023

Unnamed: 0,WEO Country Code,ISO,WEO Subject Code,Country,Subject Descriptor,Subject Notes,Units,Scale,Country/Series-specific Notes,1980,...,2022,2023,2024,2025,2026,2027,2028,Estimates Start After,WEO Country Group Code,Country Group Name
44,914,ALB,NGDP_R,Albania,"Gross domestic product, constant prices",Expressed in billions of national currency uni...,National currency,Billions,Source: IMF Staff Estimates. Official national...,311.514,...,910.293,930.774,961.862,994.565,1028.38,1063.345,1099.499,2021.0,,
45,914,ALB,NGDP_RPCH,Albania,"Gross domestic product, constant prices",Annual percentages of constant price GDP are y...,Percent change,,"See notes for: Gross domestic product, consta...",2.684,...,3.74,2.25,3.34,3.4,3.4,3.4,3.4,2021.0,,
46,914,ALB,NGDP,Albania,"Gross domestic product, current prices",Expressed in billions of national currency uni...,National currency,Billions,Source: IMF Staff Estimates. Official national...,18.489,...,2073.967,2189.584,2310.315,2432.559,2562.478,2699.746,2844.763,2021.0,,
47,914,ALB,NGDPD,Albania,"Gross domestic product, current prices",Values are based upon GDP in national currency...,U.S. dollars,Billions,"See notes for: Gross domestic product, curren...",1.946,...,18.509,20.177,20.847,21.932,23.086,24.217,25.46,2021.0,,
48,914,ALB,PPPGDP,Albania,"Gross domestic product, current prices",These data form the basis for the country weig...,Purchasing power parity; international dollars,Billions,"See notes for: Gross domestic product, curren...",5.759,...,51.237,54.392,57.454,60.536,63.76,67.126,70.716,2021.0,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1584,,,DSI_NGDPD,,"External debt, total debt service, interest",,Percent of GDP,,,1.04,...,0.93,1.063,1.097,,,,,,603,Sub-Saharan Africa
1585,,,DSI_BX,,"External debt, total debt service, interest",,Percent of exports of goods and services,,,,...,3.724,4.437,4.696,,,,,,603,Sub-Saharan Africa
1586,,,DSP,,"External debt, total debt service, amortization",,U.S. dollars,Billions,,7.193,...,89.137,122.346,100.093,,,,,,603,Sub-Saharan Africa
1587,,,DSP_NGDPD,,"External debt, total debt service, amortization",,Percent of GDP,,,2.266,...,4.431,5.812,4.431,,,,,,603,Sub-Saharan Africa


In [41]:
df_with_2023.to_csv(r'C:\Users\menno\IH-Labs\final-project\01_CSV files\indicators_imf.csv', index=False)