# BUG: Beveridgean Unemployment Gap
This notebook demonstrates a Python implementation of the "Beveridgean Unemployment Gap" by Pascal Michaillat and Emmanuel Saez (M&S). The original code is in MATLAB; see the [GitHub repository](https://github.com/pascalmichaillat/unemployment-gap) for the original.

## Section 5: Unemployment gap in the United States, 1951–2019

## import packages

In [None]:
import pandas as pd
import numpy as np
import ruptures as rpt

In [None]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.style.use('fivethirtyeight')

In [None]:
import sys
sys.path.insert(0, '../')
import bug

## Read the data
Here, we read from the [Excel file](https://github.com/pascalmichaillat/unemployment-gap/blob/main/code/data.xlsx) provided with the unemployment-gap MATLAB package.

The goal of this notebook is to re-create the analysis and some figures from the Unemployment Gap paper, so that we can verify that we get the *same* outputs. (*Sameness* here allows for small numerical differences between the two language implementations.)

#### Recession information

In [None]:
df = pd.read_excel('../../code/data.xlsx', sheet_name='Recession dates', header=1, 
                   usecols=['Peak month', 'Trough month'],).drop([0]).reset_index() 
starts =  pd.to_datetime(df['Peak month'])
ends = pd.to_datetime(df['Trough month'])

#### unemployment rate

In [None]:
df = pd.read_excel('../../code/data.xlsx', sheet_name='Monthly data',
                           header=1, usecols=['Unemployment rate (percent)', 'Year', 'Month'],)
# set the index 
dates = pd.PeriodIndex(pd.to_datetime(dict(year=df.Year, month=df.Month, day=15)).dt.to_period('M'))
unempl_rate = pd.Series(data=df['Unemployment rate (percent)'].values,
                       index=dates, name='unempl_rate')

#### vacancy info
For 1951–2000, we use the vacancy proxy constructed by Barnichon (2010).

For 2001–2019, we use the number of job openings measured by the Bureau of
Labor Statistics (2020b) in the Job Opening and Labor Turnover Survey,
divided by the civilian labor force constructed by the Bureau of Labor
Statistics (2020a) from the Current Population Survey. 

We then splice
the two series to obtain a vacancy rate for 1951–2019 (Fig. 1(b)).
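
The splice can be illustrated on toy numbers. The sketch below uses made-up values and hypothetical variable names (`proxy_pct`, `openings`, `labor_force`); it assumes, as in the cells that follow, that the proxy is already in percent while the ratio of levels must be multiplied by 100.

```python
import pandas as pd

# Toy monthly index spanning the splice point (all values below are made up).
idx = pd.period_range('2000-10', '2001-03', freq='M')

# Hypothetical proxy series, already expressed in percent.
proxy_pct = pd.Series([3.1, 3.0, 2.9, 2.8, 2.7, 2.6], index=idx)

# Hypothetical job openings and labor force, both in thousands.
openings = pd.Series([5000.0] * 6, index=idx)
labor_force = pd.Series([143000.0] * 6, index=idx)

# Ratio of levels, converted to percent to match the proxy's units.
jolts_pct = openings / labor_force * 100

# Keep the proxy through 2000-12 and the ratio from 2001-01 onward.
spliced = pd.concat([proxy_pct.loc[:'2000-12'], jolts_pct.loc['2001-01':]])
print(spliced)
```

Slicing both pieces on the same monthly `PeriodIndex` keeps the result free of duplicate dates at the splice point.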

In [None]:
df = pd.read_excel('../../code/data.xlsx', sheet_name='Monthly data',
                           header=1, usecols=['Vacancy rate (thousands)', 'Year', 'Month'],)
# set the index 
dates = pd.PeriodIndex(pd.to_datetime(dict(year=df.Year, month=df.Month, day=15)).dt.to_period('M'))
vac_rate_proxy = pd.Series(data=df['Vacancy rate (thousands)'].values,
                       index=dates, name='vacancy_rate_proxy')

#### labor force level

In [None]:
df = pd.read_excel('../../code/data.xlsx', sheet_name='Monthly data',
                           header=1, usecols=['Labor force level (thousands of persons)', 'Year', 'Month'],)
# set the index 
dates = pd.PeriodIndex(pd.to_datetime(dict(year=df.Year, month=df.Month, day=15)).dt.to_period('M'))
labor_level = pd.Series(data=df['Labor force level (thousands of persons)'].values,
                       index=dates, name='labor_force_level')

#### vacancies

In [None]:
df = pd.read_excel('../../code/data.xlsx', sheet_name='Monthly data',
                           header=1, usecols=['Vacancy level (thousands)', 'Year', 'Month'],)
# set the index 
dates = pd.PeriodIndex(pd.to_datetime(dict(year=df.Year, month=df.Month, day=15)).dt.to_period('M'))
vacancy_level = pd.Series(data=df['Vacancy level (thousands)'].values,
                       index=dates, name='vacancy_level')

In [None]:
vacancy_rate_2001 = vacancy_level/labor_level
vacancy_rate_splice = pd.concat([vac_rate_proxy.loc[:'2000-12'], vacancy_rate_2001.loc['2001-01':]*100])

# Section 5.1 
## Beveridge Elasticity
From the M&S paper:

"We estimate the Beveridge elasticity in the United States by regressing log vacancy rate (from Fig. 1(b)) on log unemployment rate
(from Fig. 1(a)). The data are quarterly from 1951Q1 to 2019Q4, so
the sample contains 276 observations. Since the Beveridge curve shifts
at multiple points in time, we use the algorithm proposed by Bai and
Perron (1998, 2003) to estimate linear models with multiple structural
breaks."
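
As a sanity check on the sample size quoted above, 69 years of monthly data average into 276 quarters. The sketch below uses a synthetic monthly series and groups months by quarter via `asfreq('Q')`. This is an alternative to the `resample('Q')` call used in this notebook, not the notebook's own method, and the placeholder values carry no economic meaning.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series covering 1951-01 through 2019-12 (placeholder values).
months = pd.period_range('1951-01', '2019-12', freq='M')
monthly = pd.Series(np.arange(len(months), dtype=float), index=months)

# Map each month to its quarter, then average within each quarter.
quarterly = monthly.groupby(monthly.index.asfreq('Q')).mean()

print(len(months), len(quarterly))  # 828 276
```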

In [None]:
# create the quarterly unemployment rate series (as a fraction, not percent)
u_q = unempl_rate.resample('Q').mean()/100
u_q = u_q.loc[:u_q.last_valid_index()]

# log of the quarterly series, reusing u_q instead of recomputing it
log_u_q = np.log(u_q)

In [None]:
# create the quarterly vacancy rate series (as a fraction, not percent)
v_q = vacancy_rate_splice.resample('Q').mean()/100
v_q = v_q.loc[:v_q.last_valid_index()]

# log of the quarterly series, reusing v_q instead of recomputing it
log_v_q = np.log(v_q)

In [None]:
fig = plt.figure(figsize = (7,7))
ax = fig.add_subplot(1, 1, 1)
ax.plot(log_u_q, log_v_q, linewidth=1,)

bug.format_plot(ax, xgrid=False)
plt.xlim(-3.8, -2.1)
plt.ylim(-4.3, -2.95)
plt.ylabel('Log Vacancy Rate', fontsize=12)
plt.xlabel('Log Unemployment Rate', fontsize=12)
plt.title('Beveridge Curve (quarterly)', fontsize=14)

## structural breaks with Bai-Perron 

### Input parameters

The MATLAB implementation takes a parameter `epsilon`, which determines the maximum number of breaks (m) in the series. M&S set epsilon=0.15 (which in turn sets m=5), as recommended by B&P.

The Python implementation in the *ruptures* package takes a `min_size` parameter, which we set to `min_size=41` because that is what the epsilon parameter implies (epsilon\*series length = 0.15\*276 ≈ 41). We also need to set `jump=1` so that breakpoints can fall at any index rather than only at multiples of `jump` (the ruptures default for `Dynp` is `jump=5`).

We have set our default bug implementation to mimic the MATLAB code, so that, when run on the same data series, the returned breaks and coefficients should match the MATLAB results.
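
The arithmetic behind `min_size=41` can be checked directly. The sketch below assumes the product is rounded down, which is our reading of how 0.15 × 276 = 41.4 becomes 41. It also confirms that every segment implied by the MATLAB breakpoints (listed in the next cell, from `getBreakDate.m`) is at least that long.

```python
import math

epsilon = 0.15   # value of epsilon used by M&S
n_obs = 276      # quarterly observations, 1951Q1 to 2019Q4

# Minimum segment length implied by epsilon (rounding down is an assumption).
min_size = math.floor(epsilon * n_obs)

# Breakpoints reported by the MATLAB code (from getBreakDate.m).
mat_breaks = [0, 41, 84, 153, 194, 235, 276]
segment_lengths = [b - a for a, b in zip(mat_breaks[:-1], mat_breaks[1:])]

print(min_size)          # 41
print(segment_lengths)   # [41, 43, 69, 41, 41, 41]
```

Four of the six segments sit exactly at the minimum length, so the `min_size` constraint binds for most of the estimated breaks.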

In [None]:
# from the matlab file getBreakDate.m:
mat_breaks = [0, 41, 84, 153, 194, 235, 276]

# from getBeveridgeElasticity.m:
mat_coeffs = [0.8437, 1.0182, 0.8376, 0.9390, 0.9985, 0.8364]

### Get breakpoints
Call the function `bug.get_bp_breakpoints()` with arguments:
  * **log** unemployment rate, required
  * **log** vacancy rate, required
  * `use_bp_defaults` optional, default is True
  * `min_size`, optional, default is None.  A valid parameter (int) to be used with *ruptures* `rpt.Dynp` algorithm. Must be specified if `use_bp_defaults=False`.
  * `n_bkps`, optional, default is None. A valid parameter (int). Must be specified if `use_bp_defaults=False`.

Returns: the estimated breakpoints

In [None]:
est_bkps = bug.get_bp_breakpoints(log_u_q, log_v_q, use_bp_defaults=True,)

In [None]:
est_bkps

#### Visualize with the ruptures package
The black lines represent our Python-computed breakpoints. The pink/blue regions are delimited by the MATLAB-computed breakpoints.

As you can see, they line up exactly.

In [None]:
y = np.array(log_v_q)
X = np.array(log_u_q).T
signal = np.column_stack((y.reshape(-1, 1), X))
rpt.show.display(signal, mat_breaks, est_bkps, figsize=(10, 6))
plt.title("Log Vacancy (TOP); Log Unemployment (BOTTOM)")
plt.show()

#### Visualize like the M&S paper
# FIGURE 5 
### Beveridge-curve branches in the United States, 1951–2019.

In [None]:
bug.plot_beveridge_curve_segments(log_u_q, log_v_q, est_bkps)

## Get Elasticity as a series 
### (calls the breakpoint estimation internally)
Call the function `bug.compute_beveridge_elasticity()` with arguments:
  * **log** unemployment rate, required
  * **log** vacancy rate, required
  * list of breakpoints, optional, default is None. If None, the `bug.get_bp_breakpoints()` function is called under the hood with `use_bp_defaults=True` to obtain the necessary breakpoints.
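
Conceptually, once the breakpoints are fixed, each segment gets its own OLS fit of log vacancy rate on log unemployment rate, and the elasticity is the negative of the fitted slope. The sketch below is an assumption about what such a computation could look like, run on synthetic data with a hypothetical helper `segment_elasticities`. It is not the `bug` source code, and the breakpoint convention (segment end indices, the last equal to the series length) is likewise assumed.

```python
import numpy as np

def segment_elasticities(log_u, log_v, bkps):
    """Fit log_v = a + b*log_u on each segment and return -b per segment.

    `bkps` lists segment end indices, the last being len(log_u).
    """
    out = []
    start = 0
    for end in bkps:
        slope, _intercept = np.polyfit(log_u[start:end], log_v[start:end], 1)
        out.append(-slope)
        start = end
    return out

# Synthetic data: true elasticity 0.8 in the first half, 1.0 in the second.
rng = np.random.default_rng(0)
log_u = rng.uniform(-3.5, -2.3, size=100)
true_eps = np.where(np.arange(100) < 50, 0.8, 1.0)
log_v = -6.0 - true_eps * log_u + rng.normal(0, 0.01, size=100)

eps_hat = segment_elasticities(log_u, log_v, [50, 100])
print(np.round(eps_hat, 2))
```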

In [None]:
bev_e, python_coeffs = bug.compute_beveridge_elasticity(log_u_q, log_v_q)

# FIGURE  6 
## Beveridge elasticity in the United States, 1951–2019.

In [None]:
bug.plot_beveridge_elasticity_series(bev_e, recession_dates=[starts, ends],draw_legend=True)
plt.ylim(0, 1.5)

#### Check python estimated coeffs vs matlab

In [None]:
[round(c[0],4) for c in python_coeffs]

In [None]:
mat_coeffs

### Standard errors of the Elasticity estimate
Here is where our estimates using Python diverge from the results found with the MATLAB implementation. Why? Because of the choice of robust covariance estimator.

We can find the exact same breakpoints as the matlab method, and from there we can fit a regression model to each sub-sequence. **We get the same fitted coefficients, why don't we get the same standard errors?** 

Any regression model requires an assumption on the nature of the errors. For time series like this, we want to allow for heteroskedasticity as well as autocorrelation in the error terms. The type of covariance estimator we want is a 'HAC' (heteroskedasticity- and autocorrelation-consistent) estimator.

Bai & Perron suggest using the HAC estimator from Andrews (1991), and this is the approach implemented in the MATLAB code. This was not the main focus of the work by M&S, so having settled on the B&P method for finding the breaks, they presumably also adopted the B&P-suggested defaults, which include the Andrews HAC.



Like M&S, we want to be judicious in leveraging existing methods for parameter estimation. In the package we use for OLS regression (Python `statsmodels`), the robust HAC method that is implemented is the Newey & West (1994) estimator. The Andrews HAC is not available.

See [Cheung and Lai (1997)](https://people.ucsc.edu/~cheung/WorkingPapers/BandWidthSelectionPowerPPTest_ET1997.pdf) for nice discussion of N-W vs Andrews HAC. 

(TL;DR: different kernels, different bandwidth estimation.)
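
To make the kernel-and-bandwidth point concrete, here is a minimal NumPy sketch of a Bartlett-kernel (Newey-West style) HAC covariance for OLS with a fixed, hand-picked lag truncation. The helper name `bartlett_hac_se`, the lag choices, and the synthetic data are assumptions for illustration. The sketch does not reproduce the `statsmodels` or MATLAB code paths, and an Andrews-style estimator would differ in both the kernel and the way the bandwidth is chosen.

```python
import numpy as np

def bartlett_hac_se(X, y, maxlags):
    """OLS coefficients and Bartlett-kernel HAC standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    Xu = X * resid[:, None]              # rows are x_t * u_t
    S = Xu.T @ Xu                        # lag-0 term (heteroskedasticity only)
    for lag in range(1, maxlags + 1):
        w = 1.0 - lag / (maxlags + 1.0)  # Bartlett weight, declining in lag
        G = Xu[lag:].T @ Xu[:-lag]       # lag-`lag` cross products
        S += w * (G + G.T)
    cov = XtX_inv @ S @ XtX_inv          # sandwich form
    return beta, np.sqrt(np.diag(cov))

# Synthetic regression with AR(1) errors (made-up parameters).
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
shocks = rng.normal(scale=0.1, size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + shocks[t]
y = 1.0 - 0.9 * x + e
X = np.column_stack([np.ones(n), x])

beta, se_lag0 = bartlett_hac_se(X, y, maxlags=0)
_, se_lag4 = bartlett_hac_se(X, y, maxlags=4)
print(beta, se_lag0, se_lag4)
```

The coefficients are identical for both lag choices; only the standard errors change, which mirrors why our fitted elasticities can match MATLAB while the standard errors do not.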

In [None]:
# from getBeveridgeElasticity.m:
mat_se = [0.066707, 0.068795, 0.11244, 0.14772, 0.057224, 0.056694]

In [None]:
# Our estimates
bev_e['SE'].unique()

### Confidence intervals of break points
Alas, this is one aspect that we have not been able to port into our python code. We hope to work on this in the future.

# Section 5.4
## Unemployment Gap

Call the function `bug.compute_efficient_tightness()` with arguments:
  * Beveridge elasticity, required
  * value of non-work (zeta), optional; default is zeta=0.26
  * recruitment cost (kappa), optional; default is kappa=0.92
  
Theta is labor market tightness, computed below as the vacancy rate divided by the unemployment rate.
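
For intuition about magnitudes, the M&S efficiency condition implies an efficient tightness of (1 - zeta) / (kappa × elasticity). The sketch below evaluates that expression directly. That `bug.compute_efficient_tightness()` uses exactly this formula is an assumption, and `efficient_tightness` is a hypothetical stand-in rather than the package function.

```python
def efficient_tightness(elasticity, zeta=0.26, kappa=0.92):
    """Efficient tightness implied by (1 - zeta) / (kappa * elasticity)."""
    return (1.0 - zeta) / (kappa * elasticity)

# Evaluate at two illustrative elasticity values.
for eps in (0.84, 1.0):
    print(eps, round(efficient_tightness(eps), 3))
```

A higher elasticity lowers the efficient tightness, so segments with a steeper Beveridge curve call for a slacker efficient labor market.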

In [None]:
eff_mar_tightness = bug.compute_efficient_tightness(bev_e['E'])
theta = v_q/u_q

# FIGURE 7A
## Efficient labor market tightness

In [None]:
ax = theta.plot(color='navy', linewidth=2, figsize=(10, 7), label='Actual')
eff_mar_tightness.plot(ax=ax,color='magenta', linewidth=2,label='Efficient')

plt.fill_between(theta.index, eff_mar_tightness, 
                 np.max((theta, eff_mar_tightness), axis=0),color='magenta', alpha=.2)
plt.fill_between(theta.index, eff_mar_tightness, 
                 np.min((theta, eff_mar_tightness), axis=0),color='navy', alpha=.2)

bug.format_plot(ax, recession_dates=[starts, ends], xgrid=True, 
                augment_legend=True, legend_loc=1)
plt.ylim(0, 1.6)
plt.ylabel('Labor Market Tightness', fontsize=12)
plt.title('Labor Market Tightness', fontsize=14)


# FIGURE 7B
## Efficient unemployment rate
Call the function `bug.compute_efficient_unemployment()` with arguments:
  * unemployment rate, required
  * vacancy rate, required
  * Beveridge elasticity, required
  * value of non-work (zeta), optional; default is zeta=0.26
  * recruitment cost (kappa), optional; default is kappa=0.92
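
Under an isoelastic Beveridge curve, v = A·u^(-ε), combining the curve with the efficient tightness (1 - zeta)/(kappa·ε) gives u* = [kappa·ε/(1 - zeta) · v·u^ε]^(1/(1+ε)). The sketch below evaluates this on a single made-up observation. Whether `bug.compute_efficient_unemployment()` is implemented this way is an assumption, and the function name `efficient_unemployment` is hypothetical.

```python
def efficient_unemployment(u, v, elasticity, zeta=0.26, kappa=0.92):
    """Efficient unemployment rate under an isoelastic Beveridge curve."""
    scale = kappa * elasticity / (1.0 - zeta)
    return (scale * v * u ** elasticity) ** (1.0 / (1.0 + elasticity))

# Made-up observation: 6% unemployment, 3% vacancies, unit elasticity.
u, v, eps = 0.06, 0.03, 1.0
u_star = efficient_unemployment(u, v, eps)

# The unemployment gap is actual minus efficient unemployment.
gap = u - u_star
print(round(u_star, 4), round(gap, 4))
```

A positive gap means actual unemployment is above its efficient level for that observation.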

In [None]:
eff_unempl = bug.compute_efficient_unemployment(u_q, v_q, bev_e['E'], zeta=0.26, kappa=0.92)

In [None]:
ax = u_q.plot(color='navy', linewidth=2, figsize=(10, 7), label='Actual')
eff_unempl.plot(ax=ax,color='magenta', linewidth=2,label='Efficient')

# shade using the unemployment series' own index (identical to theta.index)
plt.fill_between(u_q.index, eff_unempl, 
                 np.min((u_q, eff_unempl), axis=0),color='magenta', alpha=.2)
plt.fill_between(u_q.index, eff_unempl, 
                 np.max((u_q, eff_unempl), axis=0),color='navy', alpha=.2)

bug.format_plot(ax, recession_dates=[starts, ends], xgrid=True, 
                augment_legend=True, legend_loc=4)
plt.ylim(0, .12)

plt.ylabel('Unemployment', fontsize=12)
plt.title('Unemployment', fontsize=14)

# FIGURE 7C
## Unemployment gap
Call the function `bug.compute_unemployment_gap()` with arguments:
  * unemployment rate, required
  * vacancy rate, required
  * Beveridge elasticity, required
  * value of non-work (zeta), optional; default is zeta=0.26
  * recruitment cost (kappa), optional; default is kappa=0.92

In [None]:
unepl_gap = bug.compute_unemployment_gap(u_q, v_q, bev_e['E'], zeta=0.26, kappa=0.92)

In [None]:
ax = unepl_gap.plot(color='navy', linewidth=2, figsize=(10, 7), label='unemployment gap')
plt.axhline(y=0, color='magenta', linewidth=2,)

plt.fill_between(unepl_gap.index, 0, [min(0,g) for g in unepl_gap],color='magenta', alpha=.2)
plt.fill_between(unepl_gap.index, 0, [max(0,g) for g in unepl_gap],color='navy', alpha=.2)

plt.ylim(-.02, .08)
bug.format_plot(ax, recession_dates=[starts, ends], xgrid=True, 
                augment_legend=True, legend_loc=2)
plt.ylabel('Unemployment Gap', fontsize=12)
plt.title('Unemployment Gap', fontsize=14)

# Section 5.6: Other unemployment gaps

In [None]:
nairu = pd.read_excel('../../code/data.xlsx', sheet_name='Quarterly data',
                           header=1, usecols=['NAIRU (percent)', 'Year', 'Quarter'],)
nairu['date'] = nairu['Year'].astype(str) +'-Q' + nairu['Quarter'].astype(str)
nairu['NAIRU (percent)'] = nairu['NAIRU (percent)']/100.
nairu['date'] = pd.PeriodIndex(nairu['date'], freq='Q').to_timestamp()
nairu = nairu.set_index('date')

In [None]:
natural = pd.read_excel('../../code/data.xlsx', sheet_name='Quarterly data',
                           header=1, usecols=['Natural rate of unemployment (percent)', 'Year', 'Quarter'],)
natural['date'] = natural['Year'].astype(str) +'-Q' + natural['Quarter'].astype(str)
natural['Natural rate of unemployment (percent)'] = natural['Natural rate of unemployment (percent)']/100.
natural['date'] = pd.PeriodIndex(natural['date'], freq='Q').to_timestamp()
natural = natural.set_index('date')


In [None]:
trend = pd.read_excel('../../code/data.xlsx', sheet_name='Quarterly data',
                           header=1, usecols=['Trend of unemployment rate (percent)', 'Year', 'Quarter'],)
trend['date'] = trend['Year'].astype(str) +'-Q' + trend['Quarter'].astype(str)
trend['Trend of unemployment rate (percent)'] = trend['Trend of unemployment rate (percent)']/100.
trend['date'] = pd.PeriodIndex(trend['date'], freq='Q').to_timestamp()
trend = trend.set_index('date')

# FIGURE 7D
## Alternative unemployment rates

In [None]:
ax = eff_unempl.plot(figsize=(10, 7),color='magenta', linewidth=2,label='Efficient')
nairu['NAIRU (percent)'].plot(color='darkorange', linewidth=2,label='NAIRU')
natural['Natural rate of unemployment (percent)'].plot(color='darkgreen', 
                                                       linestyle='dashed',linewidth=2,
                                                       label='Natural')
trend['Trend of unemployment rate (percent)'].plot(color='k', 
                                                   linestyle='dotted',linewidth=2.5,
                                                   label='Trend')

bug.format_plot(ax, recession_dates=[starts, ends], xgrid=True, 
                augment_legend=True, legend_loc=4)

plt.ylim(0, .1)
plt.title('Alternative Unemployment Rates', fontsize=14)
