# BUG: Beveridgean Unemployment Gap
This series of notebooks demonstrates a Python implementation of the "Beveridgean Unemployment Gap" (BUG) by Pascal Michaillat and Emmanuel Saez (M&S). The original code was written in MATLAB; see the [GitHub repository](https://github.com/pascalmichaillat/unemployment-gap) for the original.

## Using latest data
Here we show how to pull the latest economic data and compute the BUG.

### import packages

In [None]:
import pandas as pd
import numpy as np

In [None]:
import matplotlib
import matplotlib.pyplot as plt

%matplotlib inline
matplotlib.style.use('fivethirtyeight')

In [None]:
import sys
sys.path.insert(0, '../')
import bug

## Get the data
For computing the BUG, we need:
  * unemployment rate: u
  * vacancy rate: v
  * Beveridge curve elasticity (computed from u, v, and breakpoints on the v/u series)
  * social value of non-work (default is zeta = 0.26)
  * recruiting costs (default is kappa = 0.92)
  
For context in the plots, we also want recession information.

### Data source

![FRED logo](https://fred.stlouisfed.org/images/fred-logo-2x.png)

The St. Louis Fed has an [API](https://fred.stlouisfed.org/docs/api/fred/series_observations.html) that lets you pull data programmatically.
You can call it yourself with a registered API key (see [here](https://fred.stlouisfed.org/docs/api/api_key.html) for details).
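For reference, a direct call needs only the standard library plus pandas. This is a minimal sketch, not part of `bug`: the helpers `fetch_fred_json` and `observations_to_series` are hypothetical, and it assumes the documented `series/observations` endpoint with `file_type=json`, which returns an `observations` list of records carrying `date` and `value` strings (FRED writes missing values as `.`).

```python
import json
import urllib.parse
import urllib.request

import pandas as pd

# Endpoint documented at fred.stlouisfed.org/docs/api/fred/series_observations.html
FRED_URL = "https://api.stlouisfed.org/fred/series/observations"


def fetch_fred_json(series_id, api_key, start="1951-01-01"):
    """Download one FRED series as parsed JSON (needs network access and a key)."""
    params = urllib.parse.urlencode({
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "observation_start": start,
    })
    with urllib.request.urlopen(f"{FRED_URL}?{params}") as resp:
        return json.load(resp)


def observations_to_series(payload, name):
    """Turn the 'observations' list into a float Series indexed by date.

    FRED encodes missing values as '.', which errors='coerce' maps to NaN.
    """
    obs = payload["observations"]
    index = pd.to_datetime([o["date"] for o in obs])
    values = pd.to_numeric([o["value"] for o in obs], errors="coerce")
    return pd.Series(values, index=index, name=name)
```

Usage would look like `observations_to_series(fetch_fred_json("UNRATE", "YOUR_API_KEY"), "UNRATE") / 100.0`, where `YOUR_API_KEY` is a placeholder for your own key.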

*Or, we can use the handy FredReader class from the [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/index.html) package!*

In [None]:
from pandas_datareader.fred import FredReader
default_start_date = '1951-01-01'

### Recession information

In [None]:
recession = FredReader('USREC', start=default_start_date).read()
# USREC is 1 in NBER-based recession months and 0 otherwise:
# a +1 step marks the first recession month, a -1 step the first month after it
recession['starts'] = recession.USREC.diff() == 1
recession['ends'] = recession.USREC.diff() == -1

In [None]:
starts = recession.index[recession['starts']].to_list()
ends = recession.index[recession['ends']].to_list()

### unemployment rate

In [None]:
# UNRATE is reported in percent; convert it to a fraction
u = FredReader('UNRATE', start=default_start_date).read() / 100.0
# single-column DataFrame -> Series
u = u.squeeze()
u.head()

In [None]:
ax = u.plot(figsize=(8,5), linewidth=2, color='darkred')
bug.format_plot(ax, recession_dates=[starts, ends], augment_legend=True, legend_loc=2)

plt.ylim(0,.16)
plt.title('Monthly Unemployment Rate')

### vacancy info
For 1951–2000, we use the vacancy proxy "Composite Help-Wanted index" constructed by Barnichon (2010).

From 2001 onward, we use the number of job openings measured by the BLS in the Job Openings and Labor Turnover Survey (JOLTS), divided by the civilian labor force from the Current Population Survey (CPS).

We then splice the two series to obtain a vacancy rate.

#### help-wanted index
OK, so the HWI is kept in a Google Drive file. Yes, we could download it with Python, but the Google Drive API is overkill (and not worth the learning curve) if this is the only thing we use it for.

Instead, we suggest downloading a copy of the file and reading it from disk.

For reference, the file link is: [HWI_index.txt](https://drive.google.com/file/d/1s9yGoAt6wfpKaBGkP7xV7Hvs7RVV9deS/view)

Alternatively, if you don't need data before 2001, you can skip this step, since the vacancy data from 2001 onward come from FRED (in that case, drop `vac_proxy` from the splice below).

In [None]:
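# Skip the 6 header lines; column 0 is a date code (year in the first 4
# characters, month in the last 2) and column 1 is the index value.
# sep=r'\s+' replaces delim_whitespace=True, which pandas 2.2 deprecates.
hwi = pd.read_csv("../new_data/HWI_index.txt", skiprows=6, header=None,
                  sep=r'\s+', dtype={0: str})
hwi['date'] = pd.to_datetime(hwi[0].str[:4] + '-' + hwi[0].str[-2:] + '-01')
vac_proxy = pd.Series(data=pd.to_numeric(hwi[1].values), index=hwi['date'],
                      name='help-wanted index')
vac_proxy.index.freq = vac_proxy.index.inferred_freq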

#### labor force level

In [None]:
labor_lev = FredReader('CLF16OV', start=default_start_date).read().astype(float)

#### vacancies

In [None]:
nf_vac = FredReader('JTSJOL', start=default_start_date).read().astype(float)

In [None]:
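# JTSJOL and CLF16OV are both in thousands, so the ratio is a rate;
# months before the JOLTS series begins come out as NaN
vac_rate = nf_vac.JTSJOL / labor_lev.CLF16OV
vac_rate.index.freq = vac_rate.index.inferred_freq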

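Here we splice the two series together: the HWI through December 2000 (divided by 100 to match the units of the JOLTS-based rate), then the JOLTS-based rate from January 2001.

In [None]:
v = pd.concat([vac_proxy.loc[:'2000-12-01'] / 100., vac_rate.loc['2001-01-01':]])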
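A splice like this can silently leave a gap or a duplicated month at the seam. A minimal sketch of a checked splice follows, assuming both inputs are month-start (`MS`) series like `vac_proxy` and `vac_rate`; `splice_monthly` is a hypothetical helper, not part of `bug`.

```python
import pandas as pd


def splice_monthly(old, new, last_old, first_new):
    """Join two monthly series at a cutoff and verify the seam is clean."""
    spliced = pd.concat([old.loc[:last_old], new.loc[first_new:]])
    # No month may appear twice at the seam
    if not spliced.index.is_unique:
        raise ValueError("duplicated dates after splicing")
    # Every month between the first and last date must be present
    expected = pd.date_range(spliced.index[0], spliced.index[-1], freq="MS")
    if not spliced.index.equals(expected):
        raise ValueError("splice leaves a gap in the monthly index")
    return spliced
```

In this notebook it would be called as `splice_monthly(vac_proxy / 100., vac_rate, '2000-12-01', '2001-01-01')`, which mirrors the cell above with the extra checks.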

In [None]:
ax = v.plot(figsize=(8,5), linewidth=2, color='darkgreen', label='vacancy')
bug.format_plot(ax, recession_dates=[starts, ends], augment_legend=True, legend_loc=2)

plt.ylim(0,.08)
plt.title('Monthly Vacancy Rate')

### Beveridge Curve

In [None]:
fig = plt.figure(figsize = (7,7))
ax = fig.add_subplot(1, 1, 1)
ax.plot(np.log(u), np.log(v), linewidth=1, color='navy')
bug.format_plot(ax, xgrid=False)

plt.ylabel('Log Vacancy Rate', fontsize=12)
plt.xlabel('Log Unemployment Rate', fontsize=12)
plt.title('Beveridge Curve (monthly)', fontsize=14)

In [None]:
u_q = u.resample('Q').mean()
v_q = v.resample('Q').mean()

u_q.index = u_q.index.to_period('Q')
v_q.index = v_q.index.to_period('Q')

In [None]:
log_u_q = np.log(u_q)
log_v_q = np.log(v_q)

In [None]:
fig = plt.figure(figsize = (7,7))
ax = fig.add_subplot(1, 1, 1)
ax.plot(log_u_q, log_v_q, linewidth=1,color='grey')

plt.plot(log_u_q.loc['2020Q1':],log_v_q.loc['2020Q1':], 
         linewidth=2, color='darkred', alpha=.7, marker='o',)

for q in ['2020Q1', '2020Q2', '2021Q1', '2022Q1', '2023Q1']:
    plt.annotate(q, (log_u_q.loc[q], log_v_q.loc[q]))


bug.format_plot(ax, xgrid=False)

plt.ylabel('Log Vacancy Rate', fontsize=12)
plt.xlabel('Log Unemployment Rate', fontsize=12)
plt.title('Beveridge Curve (quarterly)', fontsize=15)
_=plt.suptitle('Highlighting COVID effects')

## Beveridge Elasticity
### finding the v/u breakpoints
#### Bai-Perron suggested parameterization
The breakpoints in the original M&S paper were calculated on the series of log(vacancy) and log(unemployment) rates from 1951Q1 to 2019Q4 (length=276).  

In implementing B-P, M&S set the trimming parameter to 0.15, which determines the minimum length of a detected segment: floor(0.15×276) = 41.

Setting the trimming parameter to 0.15 also caps the number of breaks at 5, as stated in B&P (2003, p. 14).
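The trimming arithmetic can be checked directly. This sketch assumes the cap on breaks comes from a simple feasibility bound: if every segment needs at least h = floor(trim × T) quarters, at most floor(T / h) segments fit. That reproduces the figure of 5 for T = 276, though B&P's own tabulation may be derived differently. The helper names are hypothetical, and the 292-quarter case (1951Q1 to 2023Q4) is our own example of a longer series.

```python
import math


def bp_min_size(n_obs, trim=0.15):
    """Minimum segment length implied by the trimming parameter: floor(trim * T)."""
    return math.floor(trim * n_obs)


def max_breaks(n_obs, min_size):
    """Most internal breaks that fit when each segment has >= min_size observations."""
    return n_obs // min_size - 1


for n_obs in (276, 292):
    h = bp_min_size(n_obs)
    print(n_obs, h, max_breaks(n_obs, h))
```

With T = 292 the cap is still 5, which is why the default parameterization cannot add a post-COVID segment on top of the original ones.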

  * The resulting values were: [0, 41, 84, 153, 194, 235, 276];
  * corresponding to the dates: [1951Q1, 1961Q2, 1972Q1, 1989Q2, 1999Q3, 2009Q4, 2019Q4].

By convention the first value and last values of the series are also listed as breakpoints. So in this case, we had 5 *internal* breakpoints.
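To see how these indices map to dates, here is a small sketch; `bkps_to_quarters` is our own helper, not part of `bug`. It reads each index as the first quarter of a new segment and, following the convention above, maps the final entry (equal to the series length, one past the last observation) to the last observed quarter.

```python
import pandas as pd


def bkps_to_quarters(bkps, start="1951Q1"):
    """Map breakpoint indices to quarterly periods."""
    # The last breakpoint equals the series length, so it sizes the range
    quarters = pd.period_range(start=start, periods=bkps[-1], freq="Q")
    return [quarters[b] if b < len(quarters) else quarters[-1] for b in bkps]


print(bkps_to_quarters([0, 41, 84, 153, 194, 235, 276]))
```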

### The new data
OK, so what happens when we use the default B-P parameter values on our longer series, running from 1951Q1 to the present quarter (2023 or later)?

#### Can't find more than 5 internal breakpoints
This means the post-COVID era gets lumped in with the segment starting around 2009 to 2011 (depending on the data you are fitting), and that seems _wrong_.

In [None]:
bkps_default = bug.get_bp_breakpoints(log_u_q, log_v_q, use_bp_defaults=True)
bkps_default

In [None]:
bug.plot_beveridge_curve_segments(log_u_q, log_v_q, bkps_default,)

### Compute the Beveridge elasticity given these breakpoints

In [None]:
e, _ = bug.compute_beveridge_elasticity(log_u_q, log_v_q, bkps_in=bkps_default)
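We have not reproduced the internals of `bug.compute_beveridge_elasticity`. As a rough sketch of the idea, assume the elasticity in each segment is minus the OLS slope of log v on log u (the Beveridge curve being locally v = A·u^(−ε)). The hypothetical helper below returns each segment's estimate with its classical standard error; those errors ignore autocorrelation, so they need not match the notebook's confidence intervals.

```python
import numpy as np


def segment_elasticities(log_u, log_v, bkps):
    """Per-segment (elasticity, std_error) from OLS of log v on log u.

    bkps follows the [0, b_1, ..., n] convention described above.
    """
    log_u = np.asarray(log_u, dtype=float)
    log_v = np.asarray(log_v, dtype=float)
    out = []
    for start, end in zip(bkps[:-1], bkps[1:]):
        x, y = log_u[start:end], log_v[start:end]
        slope, intercept = np.polyfit(x, y, 1)  # highest degree first
        resid = y - (intercept + slope * x)
        dof = len(x) - 2
        # Homoskedastic OLS standard error of the slope
        se = np.sqrt(resid @ resid / dof / np.sum((x - x.mean()) ** 2))
        # v = A * u**(-eps) means the slope is -eps
        out.append((-slope, se))
    return out
```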

In [None]:
bug.plot_beveridge_elasticity_series(e, recession_dates=[starts,ends], draw_legend=True)
plt.ylim(0,2.2)

#### Discussion
So we see in the graph above that the last segment, from 2011Q3 to the end of the sample, has a really wide confidence interval. (For comparison, the average standard error across all segments is 0.097.) This is likely because the breakpoints are mis-specified.

We *KNOW* there was a huge shock to the US (and worldwide) economy due to COVID in 2020Q2. It's really not reasonable to say that the periods 2011Q3 to 2020Q1 and 2020Q2 to the present quarter are the *SAME* regime.

The ONLY reason those two periods end up together is the default B-P parameterization, which does not allow a segment short enough to isolate the post-COVID period.

### Re-parameterize
We set the minimum segment length to 10 and the number of breakpoints to 6, and check whether this gives Beveridge elasticity estimates with narrower confidence intervals.

Basically, we expect the B-P algorithm to recover the original 5 breakpoints in the 1951 to 2019 span while allowing for an extra COVID segment.
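The B-P procedure chooses the breakpoints that minimize the total sum of squared residuals (SSR) of the segment-wise regressions, which B&P (2003) compute by dynamic programming. The sketch below is a minimal version of that search under our own assumptions: it regresses y (here log v) on x (log u) with an intercept, enforces `min_size` and `n_bkps` as in the cell below, and skips B-P's sup-F tests and break-date confidence intervals. It is not necessarily what `bug.get_bp_breakpoints` does internally, and `dp_breakpoints` is a hypothetical name.

```python
import numpy as np


def segment_ssr(x, y):
    """Sum of squared residuals from an OLS fit y = a + b * x on one segment."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def dp_breakpoints(x, y, n_bkps, min_size):
    """SSR-minimizing breakpoints of a piecewise linear regression of y on x.

    Returns [0, b_1, ..., b_n_bkps, len(y)]; every segment y[b_k:b_(k+1)]
    has at least min_size observations.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, n_seg = len(y), n_bkps + 1
    if n < n_seg * min_size:
        raise ValueError("series too short for this many segments of min_size")
    # cost[i, j] = SSR of segment y[i:j]; segments shorter than min_size stay inf
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + min_size, n + 1):
            cost[i, j] = segment_ssr(x[i:j], y[i:j])
    # best[k, j] = lowest total SSR splitting y[:j] into k segments
    # start[k, j] = index where the last of those k segments begins
    best = np.full((n_seg + 1, n + 1), np.inf)
    start = np.zeros((n_seg + 1, n + 1), dtype=int)
    best[1] = cost[0]
    for k in range(2, n_seg + 1):
        for j in range(k * min_size, n + 1):
            candidates = best[k - 1, :j] + cost[:j, j]
            i = int(np.argmin(candidates))
            best[k, j], start[k, j] = candidates[i], i
    # Walk back from the full sample to recover each segment's start
    bkps, j = [n], n
    for k in range(n_seg, 1, -1):
        j = int(start[k, j])
        bkps.append(j)
    return [0] + sorted(bkps)
```

On the notebook data this would be `dp_breakpoints(log_u_q.values, log_v_q.values, n_bkps=6, min_size=10)`. Filling the cost table takes O(T²) small regressions, so expect it to run for a few seconds at T ≈ 290.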

In [None]:
bkps_new = bug.get_bp_breakpoints(log_u_q, log_v_q, use_bp_defaults=False, n_bkps=6, min_size=10)
bkps_new

In [None]:
bug.plot_beveridge_curve_segments(log_u_q, log_v_q, bkps_new,)

In [None]:
e_new, coeffs = bug.compute_beveridge_elasticity(log_u_q, log_v_q, bkps_in=bkps_new)

In [None]:
bug.plot_beveridge_elasticity_series(e_new, recession_dates=[starts,ends], draw_legend=True)
plt.ylim(0,2.2)

## Beveridge unemployment gap: BUG

In [None]:
gap = bug.compute_unemployment_gap(u_q, v_q, e_new.E)
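For reference, a minimal sketch of the gap formula. It assumes, from our reading of M&S, that the efficient unemployment rate is $u^* = \left(\frac{\kappa\,\varepsilon\,v\,u^{\varepsilon}}{1-\zeta}\right)^{1/(1+\varepsilon)}$, which follows from minimizing the social cost $(1-\zeta)u + \kappa v$ along a Beveridge curve $v = A u^{-\varepsilon}$, and that the BUG is $u - u^*$. The function names are hypothetical; check the result against `bug.compute_unemployment_gap` before relying on it.

```python
def efficient_unemployment(u, v, eps, zeta=0.26, kappa=0.92):
    """Assumed efficient rate u* = (kappa * eps * v * u**eps / (1 - zeta))**(1 / (1 + eps)).

    Works elementwise on floats, NumPy arrays, or aligned pandas Series.
    """
    return (kappa * eps * v * u**eps / (1.0 - zeta)) ** (1.0 / (1.0 + eps))


def unemployment_gap(u, v, eps, zeta=0.26, kappa=0.92):
    """Assumed BUG: actual minus efficient unemployment rate."""
    return u - efficient_unemployment(u, v, eps, zeta, kappa)
```

As a quick check, with ε = 1 and κ = 1 − ζ the formula reduces to $u^* = \sqrt{uv}$. If `e_new.E` is aligned with `u_q`, `unemployment_gap(u_q, v_q, e_new.E)` should behave like the cell above under these assumptions.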

In [None]:
bug.plot_beveridge_gap_series(gap, internal_bkps=[u_q.index[b] for b in bkps_new[:-1]], 
                              recession_dates=[starts, ends], ) 
plt.ylim(-.02,.12)

In [None]:
bug.plot_beveridge_curve_fits(log_u_q, log_v_q, bkps_new, coeffs, figsize=(7,7))

### Note
The main goal of this notebook was simply to show how to compute the BUG with new data.

We will return to the issue of breakpoint estimation in more depth in the next notebook.