## GDP Growth Rate

In [2]:
import pandas as pd
import numpy as np

- **path_gdp**: path of the GDP growth rate file for an individual country (the data can only be downloaded one country at a time)
- **country_code**: country code used throughout (refer to README)
- **fred_code**: alphanumeric code given for each dataset, for each country by the FRED website

In [11]:
def make_clean(path_gdp, country_code, fred_code):
    
    df_gdp = pd.read_csv(path_gdp)
    df_gdp['DATE'] = pd.to_datetime(df_gdp['DATE'])
    df_gdp['month'] = df_gdp['DATE'].dt.month
    df_gdp['year'] = df_gdp['DATE'].dt.year
    df_gdp = df_gdp.rename(columns={fred_code: f'{country_code}_GDP', 'DATE': 'index'})
    df_gdp['shift'] = df_gdp[f'{country_code}_GDP'].shift(1)
    df_gdp = df_gdp.dropna()

    """
    For this section, in order to make the dates uniform, we need to use an interest rate file for reference; any file can be used.
    Take note of the fred_code of the file because we will need it in the 'change_columns' function. For reference, the interest rate 
    file used was Switzerland. 
    """
    #need this for time reference 
    test = pd.read_csv('<path to any interest rate file>')
    test['DATE'] = pd.to_datetime(test['DATE'])
    test['month'] = test['DATE'].dt.month
    test['year'] = test['DATE'].dt.year
    test = test.rename(columns={'DATE': 'index'})

    merged_df = pd.merge(test, df_gdp, left_on=['index'], right_on=['index'], how='outer').ffill().bfill()

    return merged_df
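
The date alignment inside `make_clean` can be illustrated on synthetic data. The following is a minimal sketch, not the notebook's actual files: the column names `REF` and `XXX_GDP` and all values are made up for illustration, and the one-period `shift` used above is left out to keep the example short.

```python
import pandas as pd

# Hypothetical monthly reference series (stands in for the interest rate file).
monthly = pd.DataFrame({
    "index": pd.date_range("2000-01-01", periods=6, freq="MS"),
    "REF": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5],
})

# Hypothetical quarterly GDP growth values, dated at the start of each quarter.
quarterly = pd.DataFrame({
    "index": pd.to_datetime(["2000-01-01", "2000-04-01"]),
    "XXX_GDP": [1.7, 0.4],
})

# Outer merge on the date, then forward-fill so that every month
# inherits the value of the quarter it belongs to.
aligned = (
    pd.merge(monthly, quarterly, on="index", how="outer")
    .sort_values("index")
    .ffill()
    .bfill()
)
gdp_monthly = aligned["XXX_GDP"].tolist()
print(aligned)
```

Sorting by date before filling makes the result independent of the row order returned by the merge.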


In [6]:
def change_columns(df, country_code):
    
    #replace the fred_code written here with the fred_code from the interest rate file chosen 
    df = df.drop(columns=['IRSTCI01CHM156N', 'month_y', 'year_y', f'{country_code}_GDP'])
    df.rename(columns={'month_x':'month', 'year_x':'year', 'shift':f'{country_code}_GDP'}, inplace=True)
    df[f'{country_code}_GDP'] = df[f'{country_code}_GDP']/100

    return df

In [7]:
def append_df(df, df_gdp):
    
    #drop without inplace so the caller's dataframe is not modified and the cell can be rerun
    df_gdp = df_gdp.drop(columns='index')
    df = pd.merge(df, df_gdp, left_on=['month', 'year'], right_on=['month', 'year'])

    return df

In [8]:
def data_combine(path, df_gdp):
    
    exchange_df = pd.read_csv(path) 
    print(exchange_df.shape)

    df_with_gdp = pd.merge(exchange_df, df_gdp, left_on=['month', 'year'], right_on=['month', 'year'])
    df_with_gdp.drop(columns='index', inplace=True)

    return df_with_gdp
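
Because `data_combine` joins daily exchange rate rows to monthly GDP rows on `month` and `year`, a duplicated month in the GDP table would silently duplicate daily rows. One possible safeguard, sketched here on made-up data (the `rate` and `XXX_GDP` columns are placeholders), is the `validate` argument of `pd.merge`:

```python
import pandas as pd

# Hypothetical daily exchange-rate rows; "rate" is a placeholder column.
daily = pd.DataFrame({
    "Time Series": pd.to_datetime(["2000-01-03", "2000-01-04", "2000-02-01"]),
    "rate": [1.50, 1.51, 1.49],
})
daily["month"] = daily["Time Series"].dt.month
daily["year"] = daily["Time Series"].dt.year

# Hypothetical monthly GDP table with exactly one row per (month, year).
gdp = pd.DataFrame({
    "month": [1, 2],
    "year": [2000, 2000],
    "XXX_GDP": [0.017, 0.017],
})

# validate="many_to_one" raises pandas.errors.MergeError if the GDP table
# contains more than one row for the same (month, year) pair.
combined = pd.merge(daily, gdp, on=["month", "year"], validate="many_to_one")
print(combined.shape)
```

With this check in place, the row count of the combined frame should match the row count of the daily frame whenever every month has a GDP value.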

In [9]:
#replace country_code & fred_code values with values for the chosen GDP file 

gdp_path = '<path to gdp file>'
country_code = 'USD'
fred_code = 'NAEXKP01USQ657S'

In [12]:
df_gdp = make_clean(gdp_path, country_code, fred_code)
print(df_gdp.isna().sum())

df_gdp = change_columns(df_gdp, country_code)
df_gdp #240 rows


index              0
IRSTCI01CHM156N    0
month_x            0
year_x             0
USD_GDP            0
month_y            0
year_y             0
shift              0
dtype: int64


Unnamed: 0,index,month,year,USD_GDP
0,2000-01-01,1,2000,0.016996
1,2000-02-01,2,2000,0.016996
2,2000-03-01,3,2000,0.016996
3,2000-04-01,4,2000,0.003618
4,2000-05-01,5,2000,0.003618
...,...,...,...,...
235,2019-08-01,8,2019,0.003707
236,2019-09-01,9,2019,0.003707
237,2019-10-01,10,2019,0.006369
238,2019-11-01,11,2019,0.006369


Because the data can only be downloaded one country at a time, the GDP growth rate values have to be merged into one dataframe manually. To do so, a copy of the first 'df_gdp' is set aside as the dataframe ('df') to which the other countries' values are appended. From the second 'df_gdp' onward, each new 'df_gdp' is appended to 'df', producing a dataset that contains only the GDP growth rate values.
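
If the per-country frames are collected in a list first, the copy-then-append procedure can also be written as a loop. This is a hedged sketch rather than the workflow used in this notebook: `combine_gdp_frames` is a hypothetical helper, and the two small frames stand in for cleaned `df_gdp` outputs.

```python
import pandas as pd

def combine_gdp_frames(frames):
    """Merge per-country frames that each hold index, month, year and one <code>_GDP column."""
    combined = frames[0].copy()
    for frame in frames[1:]:
        # Drop the duplicate date column, then join on month and year.
        combined = pd.merge(combined, frame.drop(columns="index"), on=["month", "year"])
    return combined

# Made-up example frames for two countries.
dates = pd.date_range("2000-01-01", periods=3, freq="MS")
base = pd.DataFrame({"index": dates, "month": dates.month, "year": dates.year})
aud = base.assign(AUD_GDP=[0.017, 0.017, 0.017])
nzd = base.assign(NZD_GDP=[0.014, 0.014, 0.014])

combined = combine_gdp_frames([aud, nzd])
print(combined)
```

The loop avoids having to comment cells in and out between countries, at the cost of keeping all cleaned frames in memory at once.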

In [77]:
#use only for the first instance, comment out afterwards
df = df_gdp.copy()

In [78]:
#start using from the second instance
df = append_df(df, df_gdp)

In [79]:
df #240 rows

Unnamed: 0,index,month,year,AUD_GDP,NZD_GDP,GBP_GDP,BRL_GDP,CND_GDP,IDR_GDP,KRW_GDP,MXN_GDP,ZAR_GDP,DKK_GDP,JPY_GDP,NOK_GDP,SEK_GDP,CHF_GDP,USD_GDP
0,2000-01-01,1,2000,0.016971,0.013969,0.014767,0.014192,0.014067,0.001072,0.028863,0.010390,0.010997,0.012470,0.000564,0.017204,0.017814,0.023764,0.016996
1,2000-02-01,2,2000,0.016971,0.013969,0.014767,0.014192,0.014067,0.001072,0.028863,0.010390,0.010997,0.012470,0.000564,0.017204,0.017814,0.023764,0.016996
2,2000-03-01,3,2000,0.016971,0.013969,0.014767,0.014192,0.014067,0.001072,0.028863,0.010390,0.010997,0.012470,0.000564,0.017204,0.017814,0.023764,0.016996
3,2000-04-01,4,2000,0.003780,0.026107,0.008053,0.010680,0.015997,0.031475,0.018951,0.019163,0.011248,0.011461,0.018088,0.014782,0.007820,0.007515,0.003618
4,2000-05-01,5,2000,0.003780,0.026107,0.008053,0.010680,0.015997,0.031475,0.018951,0.019163,0.011248,0.011461,0.018088,0.014782,0.007820,0.007515,0.003618
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
235,2019-08-01,8,2019,0.007581,0.002036,-0.000179,0.004561,0.008014,0.012521,0.010141,-0.001688,0.008167,0.010579,0.004097,0.002212,0.000627,0.005249,0.003707
236,2019-09-01,9,2019,0.007581,0.002036,-0.000179,0.004561,0.008014,0.012521,0.010141,-0.001688,0.008167,0.010579,0.004097,0.002212,0.000627,0.005249,0.003707
237,2019-10-01,10,2019,0.005059,0.006500,0.003258,0.000588,0.002792,0.012057,0.003755,-0.002268,-0.002096,-0.002002,0.000413,-0.000156,0.003000,0.004473,0.006369
238,2019-11-01,11,2019,0.005059,0.006500,0.003258,0.000588,0.002792,0.012057,0.003755,-0.002268,-0.002096,-0.002002,0.000413,-0.000156,0.003000,0.004473,0.006369


In [80]:
df.to_csv('<path to save GDP growth rate dataset>', index=False)

## With IMF data (for China)

As stated in the README file, FRED provides only annual GDP growth rate data for China. Because China is an important comparison country, we use quarterly data from the IMF instead. The IMF data comes in a different format and requires more preprocessing than the FRED data. In addition, the growth rate is calculated explicitly (see the 'calc_rate' function) as (current value - previous value) / previous value.
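
As a small worked example of the period-over-period growth rate, the sketch below uses made-up level values and compares the manual calculation with pandas' built-in `pct_change`. Note that the first period has no predecessor, so its rate is undefined here; `calc_rate` below instead reuses the rate of the second quarter for the first one.

```python
import pandas as pd

# Made-up quarterly level values for illustration only.
levels = pd.Series([100.0, 102.0, 99.96], index=["2000Q1", "2000Q2", "2000Q3"])

# Manual formula: (current - previous) / previous.
manual = (levels - levels.shift(1)) / levels.shift(1)

# Equivalent built-in calculation.
growth = levels.pct_change()

print(growth)
```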

In [81]:
def make_nice_for_rating(df):
    
    df.drop(columns=['Unnamed: 0', 'Unnamed: 2', 'Indicator', 'Scale', 'Base Year'], inplace=True)
    df.drop([0,1,2,3], inplace=True)
    
    return df

In [82]:
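def calc_rate(df):
    
    #the first quarter has no predecessor, so column 0 reuses the rate of the second quarter
    df_rate = pd.DataFrame((df.iloc[:,1] - df.iloc[:,0])/(df.iloc[:,0]))
    #quarter-over-quarter rate: (current - previous) / previous
    for i in range(1, df.shape[1]):
        df_rate[i] = (df.iloc[:,i] - df.iloc[:,i-1])/(df.iloc[:,i-1])
    
    return df_rate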

In [83]:
def transpose_merge(df, df_rate, index_num):
    
    df = df.transpose().reset_index()
    df.rename(columns={index_num:'gdp deflator'}, inplace=True)
    df_rate = df_rate.transpose()
    df_merge = pd.merge(df, df_rate, left_index=True, right_index=True)
    df_append = pd.DataFrame([['2019Q4', np.nan, np.nan]], columns=['index', 'gdp deflator', index_num])
    #DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
    df_merge = pd.concat([df_merge, df_append], ignore_index=True)
    df_merge = df_merge.ffill()

    return df_merge


In [93]:
def make_clean_imf(df, country_code, index_num):
    
    df['index'] = pd.to_datetime(df['index'])

    df['year'] = df['index'].dt.year
    df['month'] = df['index'].dt.month

    df = df.rename(columns={index_num: f'{country_code}_GDP'})

    #needed for time reference 
    test = pd.read_csv('<path to any interest rate file>')
    test['DATE'] = pd.to_datetime(test['DATE'])
    test['month'] = test['DATE'].dt.month
    test['year'] = test['DATE'].dt.year
    test = test.rename(columns={'DATE': 'index'})

    merged_df = pd.merge(test, df, left_on=['month', 'year'], right_on=['month', 'year'], how='outer').ffill().bfill()
    merged_df.drop(columns=['index_y', 'IRSTCI01CHM156N', 'gdp deflator'], inplace=True)
    merged_df.rename(columns={'index_x':'index'}, inplace=True)

    return merged_df
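
`make_clean_imf` relies on `pd.to_datetime` to interpret quarter labels such as '2000Q1'. A more explicit alternative, shown here only as a sketch on a few sample labels, is to parse them as quarterly periods and convert each period to its starting timestamp:

```python
import pandas as pd

# Sample quarter labels in the same style as the IMF column headers.
labels = pd.Series(["2000Q1", "2000Q2", "2019Q4"])

# Parse as quarterly periods, then take the first day of each quarter.
starts = pd.PeriodIndex(labels, freq="Q").to_timestamp()

months = starts.month.tolist()
years = starts.year.tolist()
print(starts)
```

Each quarter maps to its first month (1, 4, 7 or 10), which matches the month and year columns used for merging with the monthly reference file.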


In [87]:
df_cny = pd.read_excel('<path to file>', header=7)
df_cny = make_nice_for_rating(df_cny)
df_cny



Unnamed: 0,2000Q1,2000Q2,2000Q3,2000Q4,2001Q1,2001Q2,2001Q3,2001Q4,2002Q1,2002Q2,...,2017Q2,2017Q3,2017Q4,2018Q1,2018Q2,2018Q3,2018Q4,2019Q1,2019Q2,2019Q3
4,66.995,67.8131,68.9522,60.8359,69.7262,69.5984,70.5531,61.9875,69.8987,70.4319,...,122.33,123.171,124.99,126.636,126.243,126.554,128.273,128.31,128.768,128.476


In [88]:
df_rate = calc_rate(df_cny)
df_rate

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,69,70,71,72,73,74,75,76,77,78
4,0.0122114,0.0122114,0.0167975,-0.117709,0.146135,-0.00183271,0.0137171,-0.121406,0.127626,0.00762769,...,-0.00256154,0.00687341,0.0147655,0.0131757,-0.00310647,0.00246736,0.0135799,0.000285552,0.00356856,-0.00226119


In [89]:
df_merge = transpose_merge(df_cny, df_rate, 4)
df_merge #80 rows

Unnamed: 0,index,gdp deflator,4
0,2000Q1,66.995042,0.012211
1,2000Q2,67.813143,0.012211
2,2000Q3,68.952233,0.016797
3,2000Q4,60.835936,-0.117709
4,2001Q1,69.726194,0.146135
...,...,...,...
75,2018Q4,128.273059,0.013580
76,2019Q1,128.309688,0.000286
77,2019Q2,128.767569,0.003569
78,2019Q3,128.476400,-0.002261


In [97]:
df_merge_clean = make_clean_imf(df_merge, 'CNY', 4)
df_merge_clean #240 rows

Unnamed: 0,index,month,year,CNY_GDP
0,2000-01-01,1,2000,0.012211
1,2000-02-01,2,2000,0.012211
2,2000-03-01,3,2000,0.012211
3,2000-04-01,4,2000,0.012211
4,2000-05-01,5,2000,0.012211
...,...,...,...,...
235,2019-08-01,8,2019,-0.002261
236,2019-09-01,9,2019,-0.002261
237,2019-10-01,10,2019,-0.002261
238,2019-11-01,11,2019,-0.002261


In [98]:
df = append_df(df, df_merge_clean)

In [99]:
df

Unnamed: 0,index,month,year,AUD_GDP,NZD_GDP,GBP_GDP,BRL_GDP,CND_GDP,IDR_GDP,KRW_GDP,MXN_GDP,ZAR_GDP,DKK_GDP,JPY_GDP,NOK_GDP,SEK_GDP,CHF_GDP,USD_GDP,CNY_GDP
0,2000-01-01,1,2000,0.016971,0.013969,0.014767,0.014192,0.014067,0.001072,0.028863,0.010390,0.010997,0.012470,0.000564,0.017204,0.017814,0.023764,0.016996,0.012211
1,2000-02-01,2,2000,0.016971,0.013969,0.014767,0.014192,0.014067,0.001072,0.028863,0.010390,0.010997,0.012470,0.000564,0.017204,0.017814,0.023764,0.016996,0.012211
2,2000-03-01,3,2000,0.016971,0.013969,0.014767,0.014192,0.014067,0.001072,0.028863,0.010390,0.010997,0.012470,0.000564,0.017204,0.017814,0.023764,0.016996,0.012211
3,2000-04-01,4,2000,0.003780,0.026107,0.008053,0.010680,0.015997,0.031475,0.018951,0.019163,0.011248,0.011461,0.018088,0.014782,0.007820,0.007515,0.003618,0.012211
4,2000-05-01,5,2000,0.003780,0.026107,0.008053,0.010680,0.015997,0.031475,0.018951,0.019163,0.011248,0.011461,0.018088,0.014782,0.007820,0.007515,0.003618,0.012211
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
235,2019-08-01,8,2019,0.007581,0.002036,-0.000179,0.004561,0.008014,0.012521,0.010141,-0.001688,0.008167,0.010579,0.004097,0.002212,0.000627,0.005249,0.003707,-0.002261
236,2019-09-01,9,2019,0.007581,0.002036,-0.000179,0.004561,0.008014,0.012521,0.010141,-0.001688,0.008167,0.010579,0.004097,0.002212,0.000627,0.005249,0.003707,-0.002261
237,2019-10-01,10,2019,0.005059,0.006500,0.003258,0.000588,0.002792,0.012057,0.003755,-0.002268,-0.002096,-0.002002,0.000413,-0.000156,0.003000,0.004473,0.006369,-0.002261
238,2019-11-01,11,2019,0.005059,0.006500,0.003258,0.000588,0.002792,0.012057,0.003755,-0.002268,-0.002096,-0.002002,0.000413,-0.000156,0.003000,0.004473,0.006369,-0.002261


In [100]:
df.to_csv('<path to save GDP growth rate dataset>', index=False)

- **new_path**: path of new main dataset 1, created from the 'load_data_ir' notebook

In [101]:
new_path = '<path of new main dataset 1>'

In [103]:
df_with_gdp = data_combine(new_path, df)
print(df_with_gdp.shape) #4997 rows
print(df_with_gdp.isna().sum())


(4997, 37)
(4997, 53)
Time Series    0
AUD_USD        0
NZD_USD        0
GBP_USD        0
BRL_USD        0
CND_USD        0
CNY_USD        0
IDR_USD        0
KRW_USD        0
MXN_USD        0
ZAR_USD        0
DKK_USD        0
JPY_USD        0
NOK_USD        0
SEK_USD        0
CHF_USD        0
month          0
year           0
USD_USD        0
price_gold     0
fc_year        0
AUD_IR         0
NZD_IR         0
GBP_IR         0
BRL_IR         0
CND_IR         0
CNY_IR         0
IDR_IR         0
KRW_IR         0
MXN_IR         0
ZAR_IR         0
DKK_IR         0
JPY_IR         0
NOK_IR         0
SEK_IR         0
CHF_IR         0
USD_IR         0
AUD_GDP        0
NZD_GDP        0
GBP_GDP        0
BRL_GDP        0
CND_GDP        0
IDR_GDP        0
KRW_GDP        0
MXN_GDP        0
ZAR_GDP        0
DKK_GDP        0
JPY_GDP        0
NOK_GDP        0
SEK_GDP        0
CHF_GDP        0
USD_GDP        0
CNY_GDP        0
dtype: int64


In [158]:
df_with_gdp.to_csv('<path to save the new main dataset 2>', index=False)