<ins>**Hodrick-Prescott Filter**</ins>

By Noah Rubin

December 2021

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.filters.hp_filter import hpfilter

%config InlineBackend.figure_format = 'svg'

In [2]:
df = pd.read_csv('gdp_cap.csv')
df

|     | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
|-----|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | GDP per capita (current US$) | NY.GDP.PCAP.CD | NaN | NaN | NaN | NaN | NaN | NaN | ... | 24985.013920 | 24712.493260 | 26441.619940 | 26893.011510 | 28396.908420 | 28452.170610 | 29350.805020 | 30253.279360 | NaN | NaN |
| 1 | Africa Eastern and Southern | AFE | GDP per capita (current US$) | NY.GDP.PCAP.CD | 147.507808 | 146.910908 | 156.078705 | 182.115000 | 162.232750 | 180.087426 | ... | 1769.483634 | 1734.938417 | 1712.686908 | 1701.765354 | 1546.877709 | 1429.596045 | 1571.307053 | 1573.221585 | 1527.734558 | 1356.699267 |
| 2 | Afghanistan | AFG | GDP per capita (current US$) | NY.GDP.PCAP.CD | 59.773234 | 59.860900 | 58.458009 | 78.706429 | 82.095307 | 101.108325 | ... | 591.190030 | 638.845852 | 624.315454 | 614.223342 | 556.007221 | 512.012778 | 516.679862 | 485.668419 | 494.179350 | 516.747871 |
| 3 | Africa Western and Central | AFW | GDP per capita (current US$) | NY.GDP.PCAP.CD | 107.932233 | 113.081647 | 118.831107 | 123.442888 | 131.854402 | 138.526332 | ... | 1862.308267 | 1965.118485 | 2157.481149 | 2178.368454 | 1894.310195 | 1673.835527 | 1613.473553 | 1704.139603 | 1777.918672 | 1710.073363 |
| 4 | Angola | AGO | GDP per capita (current US$) | NY.GDP.PCAP.CD | NaN | NaN | NaN | NaN | NaN | NaN | ... | 4615.468219 | 5100.097027 | 5254.881126 | 5408.411700 | 4166.979833 | 3506.073128 | 4095.810057 | 3289.643995 | 2809.626088 | 1776.166868 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 261 | Kosovo | XKX | GDP per capita (current US$) | NY.GDP.PCAP.CD | NaN | NaN | NaN | NaN | NaN | NaN | ... | 3540.891789 | 3410.859780 | 3704.784221 | 3902.676013 | 3520.766449 | 3759.560246 | 4009.380987 | 4384.048892 | 4416.108358 | 4346.637931 |
| 262 | Yemen, Rep. | YEM | GDP per capita (current US$) | NY.GDP.PCAP.CD | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1374.621401 | 1446.536472 | 1607.152173 | 1674.002572 | 1601.830063 | 1152.720966 | 964.264810 | 758.145242 | NaN | NaN |
| 263 | South Africa | ZAF | GDP per capita (current US$) | NY.GDP.PCAP.CD | 443.009920 | 454.962013 | 473.011405 | 511.497364 | 548.996058 | 584.704163 | ... | 8810.930651 | 8222.197279 | 7467.079185 | 6988.808739 | 6259.839681 | 5756.965741 | 6690.939847 | 7005.095413 | 6624.761865 | 5655.867654 |
| 264 | Zambia | ZMB | GDP per capita (current US$) | NY.GDP.PCAP.CD | 232.188565 | 220.042067 | 212.578449 | 213.896759 | 242.384472 | 303.281741 | ... | 1672.907535 | 1763.069442 | 1878.346811 | 1762.427817 | 1338.290927 | 1280.806543 | 1535.196574 | 1516.368371 | 1305.001031 | 985.132436 |


**No need to keep country code and indicator name/code**

In [3]:
df.drop(['Country Code', 'Indicator Name', 'Indicator Code'], axis='columns', inplace=True)

**Tidy Data - Hadley Wickham**

Definitions of a tidy dataset can be somewhat subjective, but most people would agree that tidy data has the following characteristics:

* Each variable has its own column.
* Each observation is its own row.
* Each value has its own cell.

According to [Hadley Wickham](http://hadley.nz/) (a major contributor to many packages in the R programming language):
* There’s a general advantage to picking one consistent way of storing data. If you have a consistent data structure, it’s easier to learn the tools that work with it because they have an underlying uniformity.
* Tidy datasets are easy to manipulate, model and visualise, and have a specific structure.
* Tidy datasets provide a standardised way to link the structure of a dataset (its physical layout) with its semantics (its meaning).

---

In this dataset, the years are spread across the column names; it would be more suitable to gather them into a single variable called 'Year'.

[Academic paper on tidy data](https://vita.had.co.nz/papers/tidy-data.pdf)
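To make the reshape concrete, here is a toy sketch with made-up numbers (the countries `A` and `B` and their values are hypothetical, not from the World Bank data). `melt` keeps `Country Name` as an identifier and gathers the year columns into a single `Year` column, giving one row per (country, year) observation.

```python
import pandas as pd

# Hypothetical wide table: one row per country, one column per year
wide = pd.DataFrame({
    'Country Name': ['A', 'B'],
    '1960': [100.0, 200.0],
    '1961': [110.0, 210.0],
})

# Gather the year columns into a single 'Year' variable (tidy, long format)
tidy = wide.melt(id_vars=['Country Name'], var_name='Year', value_name='GDP Per Capita')
print(tidy)
```

Note that the new `Year` column holds the original column labels, so its values are strings such as `'1960'`.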

In [4]:
df = df.melt(id_vars=['Country Name'], var_name='Year', value_name='GDP Per Capita')
df.head(30)  # one row per (country, year) observation

|     | Country Name | Year | GDP Per Capita |
|-----|---|---|---|
| 0 | Aruba | 1960 | NaN |
| 1 | Africa Eastern and Southern | 1960 | 147.507808 |
| 2 | Afghanistan | 1960 | 59.773234 |
| 3 | Africa Western and Central | 1960 | 107.932233 |
| 4 | Angola | 1960 | NaN |
| 5 | Albania | 1960 | NaN |
| 6 | Andorra | 1960 | NaN |
| 7 | Arab World | 1960 | NaN |
| 8 | United Arab Emirates | 1960 | NaN |
| 9 | Argentina | 1960 | NaN |


**Convert the year strings to dates**
- Assume each year refers to the end of the calendar year (31 December)

In [5]:
# e.g. '1960' -> 1960-12-31; the ISO 'YYYY-MM-DD' string avoids day/month ambiguity
df['Year'] = pd.to_datetime(df['Year'] + '-12-31')

**The HP filter needs a complete time series to estimate the trend and cyclical components**
- Keep only countries with a GDP per capita value for every year from 1960 to 2020
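For reference, the HP filter splits a series $y_t$ into a trend $\tau_t$ and a cycle $c_t = y_t - \tau_t$ by solving the penalised least-squares problem below, where $\lambda$ corresponds to the `lamb` argument of `hpfilter`. Writing $D$ for the $(T-2)\times T$ second-difference matrix (each row has entries $1, -2, 1$), the first-order condition is a single linear system in all $T$ observations. Each fitted trend value therefore generally depends on every observation, so a single NaN in $y$ contaminates the estimated trend and cycle:

$$
\hat{\tau} = \arg\min_{\tau}\; \sum_{t=1}^{T} (y_t - \tau_t)^2 + \lambda \sum_{t=2}^{T-1} \left[ (\tau_{t+1} - \tau_t) - (\tau_t - \tau_{t-1}) \right]^2
$$

$$
\hat{\tau} = \left(I_T + \lambda D^\top D\right)^{-1} y, \qquad \hat{c} = y - \hat{\tau}
$$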

In [None]:
# Keep only countries with no missing GDP per capita values
df = df.groupby('Country Name').filter(lambda g: g['GDP Per Capita'].notna().all())

In [None]:
def apply_hp_filter(dataset, country, column='GDP Per Capita', lamb=100):
    """Apply the HP filter to one country's series and plot the trend and cycle.

    lamb=100 is a common choice for annual data.
    """
    # Select the country and index it by date so the x-axis shows years
    country_df = dataset.loc[dataset['Country Name'] == country].copy()
    country_df = country_df.set_index('Year')

    # hpfilter returns (cycle, trend), in that order
    country_df['Estimated Cycle'], country_df['Smoothed Trend Estimate HP Filter'] = hpfilter(country_df[column], lamb=lamb)

    fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(14, 6))
    country_df[[column, 'Smoothed Trend Estimate HP Filter']].plot(ax=ax1, color=['black', 'blue'])
    country_df['Estimated Cycle'].plot(ax=ax2, color='black')

    # Titles and labels
    fig.suptitle(f'HP Filter: {country}', fontsize='xx-large')
    ax1.set(title=f'{column} (US$)', ylabel=f'{column} (US$)')
    ax2.set(title='Estimated Cyclical Component', ylabel='Deviation from trend (US$)')

In [None]:
apply_hp_filter(df, 'United Kingdom', column='GDP Per Capita', lamb=100)

In [None]:
apply_hp_filter(df, 'Canada', column='GDP Per Capita', lamb=100)

In [None]:
apply_hp_filter(df, 'Zimbabwe', column='GDP Per Capita', lamb=100)
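To see what `hpfilter` computes, here is a minimal dense NumPy sketch of the same minimisation (the name `hp_filter_dense` is hypothetical and not part of statsmodels). It solves $(I + \lambda D^\top D)\,\tau = y$ for the trend, with $D$ the second-difference matrix, and returns `(cycle, trend)` in the same order as `hpfilter`. statsmodels solves the same linear system with a sparse solver, so the two should agree up to floating-point error; the dense version is only sensible for short series like these 61 annual observations.

```python
import numpy as np


def hp_filter_dense(y, lamb=100.0):
    """Hypothetical dense sketch of the Hodrick-Prescott filter.

    Solves (I + lamb * D'D) trend = y, where D is the (T-2) x T
    second-difference matrix, and returns (cycle, trend) in the same
    order as statsmodels' hpfilter.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    if n < 3:
        raise ValueError("the HP filter needs at least 3 observations")

    # Row t of D computes tau[t] - 2 * tau[t + 1] + tau[t + 2]
    D = np.zeros((n - 2, n))
    for t in range(n - 2):
        D[t, t:t + 3] = [1.0, -2.0, 1.0]

    # First-order condition of the penalised least-squares problem
    trend = np.linalg.solve(np.eye(n) + lamb * D.T @ D, y)
    cycle = y - trend
    return cycle, trend


# Usage on a made-up series: a linear trend plus a slow oscillation
t = np.arange(61)
y = 100 + 5 * t + 10 * np.sin(t / 3)
cycle, trend = hp_filter_dense(y, lamb=100)
print(cycle[:3], trend[:3])
```

A quick sanity check for any implementation: a straight line has zero second differences, so the filter returns it unchanged as the trend, with a cycle of zero.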

In [None]:
df['Country Name'].unique()

**Select Australia and take data up to 2020**

In [None]:
# Select Australia up to and including 2020
# (the Year column already holds end-of-year dates at this point)
aus = df.loc[(df['Country Name'] == 'Australia') & (df['Year'] <= pd.Timestamp('2020-12-31'))].copy()

# Set the index to the end-of-year dates
aus.set_index('Year', inplace=True)

**Plot GDP per capita over time**

In [None]:
aus['GDP Per Capita'].plot(figsize=(10, 6), title='Australia GDP Per Capita (1960-2020)', color='black', ylabel='GDP per capita (US$)');

**Apply the Hodrick-Prescott Filter**

In [None]:
def apply_hp_filter_to_frame(dataset, series='GDP Per Capita', lamb=100):
    """Apply the HP filter to a single-country frame indexed by date and plot the results.

    Unlike apply_hp_filter above, this expects data for one country only
    (e.g. `aus`), so no country selection is done here.
    """
    # Work on a copy so the caller's frame is not modified
    data = dataset.copy()

    # Extract the estimated cycle and trend (hpfilter returns them in that order)
    data['Estimated Cycle'], data['Smoothed Trend Estimate HP Filter'] = hpfilter(data[series], lamb=lamb)

    fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(14, 6))
    data[[series, 'Smoothed Trend Estimate HP Filter']].plot(ax=ax1, color=['black', 'blue'])
    data['Estimated Cycle'].plot(ax=ax2, color='black')

    # Titles and labels
    fig.suptitle('HP Filter', fontsize='xx-large')
    ax1.set(title=f'{series} (US$)', ylabel=f'{series} (US$)')
    ax2.set(title='Estimated Cyclical Component', ylabel='Deviation from trend (US$)')

In [None]:
df

In [None]:

# hpfilter returns (cycle, trend); lamb = 100 for yearly data
aus['Estimated Cycle'], aus['Smoothed Trend Estimate HP Filter'] = hpfilter(aus['GDP Per Capita'], lamb=100)

fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(14, 6))

aus[['GDP Per Capita', 'Smoothed Trend Estimate HP Filter']].plot(ax=ax1, color=['black', 'blue'])
aus['Estimated Cycle'].plot(ax=ax2, color='black')

# Titles and labels
fig.suptitle('HP Filter: Australia', fontsize='xx-large')
ax1.set(title='GDP per capita (US$)', ylabel='GDP per capita (US$)')
ax2.set(title='Estimated Cyclical Component', ylabel='Deviation from trend (US$)');
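The notebook uses `lamb=100` for annual data, which is a common convention. The statsmodels documentation for `hpfilter` also mentions Ravn and Uhlig's suggestion of rescaling the quarterly value of 1600 by the fourth power of the change in observation frequency, which gives 6.25 for annual and 129600 for monthly data. A small sketch of that rule (the helper `ravn_uhlig_lambda` is hypothetical):

```python
def ravn_uhlig_lambda(periods_per_year, quarterly_lambda=1600.0):
    """Hypothetical helper: rescale the quarterly HP smoothing parameter.

    Multiplies lambda by the fourth power of the ratio between the new
    frequency and the quarterly frequency (4 observations per year).
    """
    return quarterly_lambda * (periods_per_year / 4) ** 4


for label, freq in [('annual', 1), ('quarterly', 4), ('monthly', 12)]:
    print(f'{label}: lamb = {ravn_uhlig_lambda(freq)}')
```

Re-running the Australia cell with `lamb=6.25` instead of `lamb=100` is a simple way to see how sensitive the estimated cycle is to this choice.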