# BUG: Beveridgean Unemployment Gap
This series of notebooks demonstrates a Python implementation of the "Beveridgean Unemployment Gap" (BUG) by Pascal Michaillat and Emmanuel Saez (M&S). The original code was written in MATLAB; see the original [GitHub repository](https://github.com/pascalmichaillat/unemployment-gap).

## Using latest data
Here we show how to pull the latest economic data and compute the BUG.

### import packages

In [None]:
import pandas as pd
import numpy as np
import urllib.request
import json

In [None]:
import matplotlib
import matplotlib.pyplot as plt

%matplotlib inline
matplotlib.style.use('fivethirtyeight')

In [None]:
import sys
sys.path.insert(0, '../bug')
import bug

## Get the data
To compute the BUG, we need:
  * unemployment rate: u
  * vacancy rate: v
  * Beveridge curve elasticity (computed from u, v, and breakpoints in the v/u series)
  * social value of non-work (default is zeta = 0.26)
  * recruiting costs (default is kappa = 0.92)
  
For context in the plots, we also want recession information.

### Data source

![FRED logo](https://fred.stlouisfed.org/images/fred-logo-2x.png)

The St. Louis Fed provides an [API](https://fred.stlouisfed.org/docs/api/fred/series_observations.html) that lets you pull data programmatically.

**You will need a registered API key**. See the [API key documentation](https://fred.stlouisfed.org/docs/api/api_key.html) for details.

In [None]:
import urllib.parse

# a helper function for pulling JSON-formatted observations from FRED
def get_series(series_id, my_key, start_date='1951-01-01'):
    """Return the 'date' and 'value' columns (as strings; FRED uses '.' for missing values)."""
    params = urllib.parse.urlencode({
        'series_id': series_id,
        'observation_start': start_date,
        'file_type': 'json',
        'api_key': my_key,
    })
    link = 'https://api.stlouisfed.org/fred/series/observations?' + params
    with urllib.request.urlopen(link) as response:
        data = json.loads(response.read())
    return pd.DataFrame(data['observations'])[['date', 'value']]
 

In [None]:
## This key is an EXAMPLE ONLY!! 
# You will need to replace this with your actual registered key for the FRED API
my_key = 'abcdef1234567890abcdef'
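
Rather than pasting your key into the notebook (where it is easy to commit by accident), you could read it from an environment variable. This is a minimal sketch; `load_fred_key` and the variable name `FRED_API_KEY` are hypothetical choices for this example, not part of FRED or the `bug` package.

```python
import os


def load_fred_key(var_name="FRED_API_KEY"):
    """Read a FRED API key from an environment variable (hypothetical helper).

    FRED_API_KEY is an assumed variable name; use whatever name you export.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your FRED API key")
    return key
```

With the variable exported in your shell, `my_key = load_fred_key()` would replace the hard-coded example key.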

### Recession information

In [None]:
recession = get_series('USREC', my_key)
recession['date'] = pd.to_datetime(recession['date'] )
recession['value'] = recession['value'].astype(int)
recession.set_index('date', inplace=True)
recession['starts'] = recession['value'].diff() == 1   # 0 -> 1: a recession begins
recession['ends'] = recession['value'].diff() == -1    # 1 -> 0: first month after a recession

In [None]:
starts = recession.index[recession['starts']==1].to_list()
ends = recession.index[recession['ends']==1].to_list()
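
The start/end logic above marks a start where the indicator switches from 0 to 1 and an "end" at the first month back at 0, so shading from a start to its end covers every recession month. A toy check with a made-up 0/1 indicator (not NBER data) illustrates this. Note that if the sample began or ended mid-recession, `starts` and `ends` would have different lengths, which the plotting loops below assume does not happen.

```python
import pandas as pd

# Made-up monthly 0/1 indicator, for illustration only
dates = pd.date_range("2000-01-01", periods=8, freq="MS")
toy = pd.DataFrame({"value": [0, 1, 1, 0, 0, 1, 1, 0]}, index=dates)

# A start is a 0 -> 1 switch; an "end" is the first month back at 0 (1 -> 0)
toy["starts"] = toy["value"].diff() == 1
toy["ends"] = toy["value"].diff() == -1

toy_starts = toy.index[toy["starts"].to_numpy()].to_list()
toy_ends = toy.index[toy["ends"].to_numpy()].to_list()
print(list(zip(toy_starts, toy_ends)))
```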

In [None]:
ax = recession['value'].plot(color='grey', linewidth=1, alpha=.6, figsize=(9,6), legend=False)
plt.fill_between(recession.index, 0, recession.value, color='grey', alpha=.6, zorder=-100)

ax.grid(axis='x')
plt.ylim(0,1)
plt.title('NBER based Recession Indicators for the United States')
ax.spines["bottom"].set_linewidth(1.5)
ax.spines["bottom"].set_color('k')
ax.spines["left"].set_linewidth(1.5)
ax.spines["left"].set_color('k')

### unemployment rate

In [None]:
unempl = get_series('UNRATE', my_key)
unempl['date'] = pd.PeriodIndex(pd.to_datetime(unempl['date']).dt.to_period('M'))
u = pd.Series(data=unempl['value'].values,index=unempl['date'], name='unempl_rate')
u = pd.to_numeric(u)/100.0
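
The same steps (parse dates into monthly periods, convert the value strings to numbers, build a Series) are repeated below for each FRED series. A minimal helper sketch follows; `fred_to_monthly_series` is a hypothetical name, and the payload is made up but has the shape the observations endpoint returns. Using `errors="coerce"` turns FRED's `'.'` missing-value marker into `NaN` instead of raising an error. Any rescaling (such as dividing a percentage by 100) is left to the caller.

```python
import json

import pandas as pd


def fred_to_monthly_series(observations, name):
    """Convert FRED-style observations (dicts with 'date' and 'value' strings)
    into a float Series indexed by monthly periods (hypothetical helper)."""
    df = pd.DataFrame(observations)[["date", "value"]]
    index = pd.PeriodIndex(pd.to_datetime(df["date"]).dt.to_period("M"))
    values = pd.to_numeric(df["value"], errors="coerce")  # '.' -> NaN
    return pd.Series(values.to_numpy(), index=index, name=name)


# Made-up payload in the shape returned by the observations endpoint
payload = json.loads('''{"observations": [
    {"date": "2022-01-01", "value": "4.0"},
    {"date": "2022-02-01", "value": "3.8"},
    {"date": "2022-03-01", "value": "."}
]}''')
s = fred_to_monthly_series(payload["observations"], "toy_rate")
print(s)
```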

In [None]:
u_q = u.resample('Q').mean()
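
`resample('Q').mean()` averages whatever months are available in each quarter, so if the latest release falls mid-quarter, the last quarterly value rests on only one or two months. The sketch below shows this with made-up numbers, using `groupby` on quarterly periods, which for gap-free monthly data gives the same averages as `resample('Q').mean()`.

```python
import pandas as pd

# Seven made-up monthly values, October 2021 through April 2022
months = pd.period_range("2021-10", "2022-04", freq="M")
monthly = pd.Series(range(1, 8), index=months, dtype=float)

# Map each month to its calendar quarter, then average within quarters
quarters = monthly.index.asfreq("Q")
quarterly_mean = monthly.groupby(quarters).mean()
months_per_quarter = monthly.groupby(quarters).size()
print(pd.DataFrame({"mean": quarterly_mean, "n_months": months_per_quarter}))
```

Here the 2022Q2 value is based on April alone, so it may move noticeably once May and June are published.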

In [None]:
ax = u.plot(figsize=(9,6), linewidth=2, color='darkred', label='unemployment')

for idx, s in enumerate(starts):
    plt.axvspan(starts[idx], ends[idx], facecolor='grey', alpha=0.6,zorder=-100)


plt.legend()
ax.grid(axis='x')
plt.ylim(0,.18)
plt.title('Monthly Unemployment Rate')
ax.spines["bottom"].set_linewidth(1.5)
ax.spines["bottom"].set_color('k')
ax.spines["left"].set_linewidth(1.5)
ax.spines["left"].set_color('k')

### vacancy info
For 1951–2000, we use the vacancy proxy "Composite Help-Wanted index" constructed by Barnichon (2010).

From 2001 onward, we use the number of job openings measured by the BLS in the Job Openings and Labor Turnover Survey (JOLTS), divided by the civilian labor force from the Current Population Survey (CPS).

We then splice the two series to obtain a vacancy rate.
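
A toy version of the splice, with made-up numbers: the proxy is taken through December 2000 and rescaled from percent to a fraction (mirroring the division by 100 in the splice cell below), and the JOLTS-style rate is used from January 2001 on. Slicing each series on either side of the cutoff guarantees the result has no duplicate months even though the two inputs overlap.

```python
import pandas as pd

# Made-up numbers: a proxy in percent and a JOLTS-style rate (a fraction)
proxy = pd.Series([3.1, 3.0, 2.9, 2.8, 2.7, 2.6],
                  index=pd.period_range("2000-10", "2001-03", freq="M"))
jolts_rate = pd.Series([0.037, 0.036, 0.035, 0.034, 0.033],
                       index=pd.period_range("2000-12", "2001-04", freq="M"))

# Proxy through December 2000 (rescaled), JOLTS-style rate afterwards
spliced = pd.concat([proxy.loc[:"2000-12"] / 100.0, jolts_rate.loc["2001-01":]])
print(spliced)
```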

#### help-wanted index
The author hosts the HWI file on Google Drive. We could download it with Python, but the Google Drive API is overkill (and not worth the learning curve) if this is the only thing we use it for.

Instead, we suggest downloading a copy of the file and reading it from disk.

For reference, the file link is: [HWI_index.txt](https://drive.google.com/file/d/1s9yGoAt6wfpKaBGkP7xV7Hvs7RVV9deS/view)

Or, if you don't need data before 2001, you can skip this step, since the vacancy rate from 2001 onward can be built entirely from FRED data. In that case, drop `vac_proxy` from the splice and use `vac_rate` alone.

In [None]:
# whitespace-delimited text file; skip the first 6 lines
hwi = pd.read_csv("../new_data/HWI_index.txt", skiprows=6, header=None, sep=r'\s+')
hwi['date'] = pd.PeriodIndex(pd.to_datetime(hwi[0].str[:4] + '-' + hwi[0].str[-2:]).dt.to_period('M'))
vac_proxy = pd.Series(data=pd.to_numeric(hwi[1].values), index=hwi['date'], name='help-wanted index')

#### labor force level

In [None]:
labor_lev = get_series('CLF16OV', my_key)
labor_lev['date'] = pd.PeriodIndex(pd.to_datetime(labor_lev['date']).dt.to_period('M'))
lfl = pd.Series(data=pd.to_numeric(labor_lev['value'].values),index=labor_lev['date'], name='labor_force_level')

#### vacancies

In [None]:
nf_vac = get_series('JTSJOL', my_key)
nf_vac['date'] = pd.PeriodIndex(pd.to_datetime(nf_vac['date']).dt.to_period('M'))
vac = pd.Series(data=pd.to_numeric(nf_vac['value'].values),index=nf_vac['date'], name='nonfarm_vacancies')

In [None]:
# CLF16OV and JTSJOL are both reported in thousands, so the ratio is a rate
vac_rate = vac/lfl

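Here we splice the two series: the help-wanted index through 2000-12 (divided by 100 to put it on the same scale as the JOLTS-based rate), then the JOLTS-based vacancy rate from 2001-01 onward.

In [None]:
v = pd.concat([vac_proxy.loc[:'2000-12'] / 100., vac_rate.loc['2001-01':]])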
v_q = v.resample('Q').mean()

In [None]:
ax = v.plot(figsize=(9,6), linewidth=2, color='darkgreen', label='vacancy')

for idx, s in enumerate(starts):
    plt.axvspan(starts[idx], ends[idx], facecolor='grey', alpha=0.6,zorder=-100)

plt.legend()
ax.grid(axis='x')
plt.ylim(0,.09)
plt.title('Monthly Vacancy Rate')
ax.spines["bottom"].set_linewidth(1.5)
ax.spines["bottom"].set_color('k')
ax.spines["left"].set_linewidth(1.5)
ax.spines["left"].set_color('k')

### Beveridge Curve

In [None]:
plt.figure(figsize = (8,8))
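# JOLTS data are released about a month after the unemployment rate for the same month,
# so keep only the months both series cover before plotting
log_u, log_v = np.log(u).align(np.log(v), join='inner')
plt.plot(log_u, log_v, linewidth=1, color='darkblue')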

plt.gca().spines["bottom"].set_linewidth(1.5)
plt.gca().spines["bottom"].set_color('k')
plt.gca().spines["left"].set_linewidth(1.5)
plt.gca().spines["left"].set_color('k')

plt.ylabel('Log Vacancy Rate', fontsize=12)
plt.xlabel('Log Unemployment Rate', fontsize=12)
plt.title('Beveridge Curve (monthly)', fontsize=14)

In [None]:
# quarterly log rates, restricted to the quarters both series cover
log_u_q, log_v_q = np.log(u_q).align(np.log(v_q), join='inner')

In [None]:
plt.figure(figsize = (8,8))
plt.plot(log_u_q, log_v_q, linewidth=1,color='darkblue')

plt.plot(log_u_q.loc['2020Q1':'2022Q1'],log_v_q.loc['2020Q1':'2022Q1'], 
         linewidth=3, color='darkcyan', alpha=.5)

plt.annotate('2020Q1', (log_u_q.loc['2020Q1'], log_v_q.loc['2020Q1']))
plt.annotate('2022Q1', (log_u_q.loc['2022Q1'], log_v_q.loc['2022Q1']))


plt.gca().spines["bottom"].set_linewidth(1.5)
plt.gca().spines["bottom"].set_color('k')
plt.gca().spines["left"].set_linewidth(1.5)
plt.gca().spines["left"].set_color('k')

plt.ylabel('Log Vacancy Rate', fontsize=12)
plt.xlabel('Log Unemployment Rate', fontsize=12)
plt.title('Beveridge Curve (quarterly)', fontsize=15)
plt.suptitle('Highlighting COVID effects')

## Beveridge Elasticity
### finding the v/u breakpoints
#### Bai-Perron suggested parameterization
The breakpoints in the original M&S paper were calculated on the series of log(vacancy) and log(unemployment) rates from 1951Q1 to 2019Q4 (length=276).  

In implementing B-P, M&S set the trimming parameter to 0.15, which determines the minimum length of detected segments: floor(0.15 × 276) = 41.

Furthermore, a trimming parameter of 0.15 sets the maximum number of breaks to 5, as stated in Bai and Perron (2003, page 14).
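
The arithmetic is easy to check. With segments of at least 41 quarters, at most 6 segments fit into 276 quarters, which is consistent with the cap of 5 breaks. This is only a sanity check on the numbers, not a restatement of how Bai and Perron derive their recommendation.

```python
import math

T = 276        # quarters from 1951Q1 to 2019Q4
trim = 0.15    # Bai-Perron trimming parameter used by M&S

min_size = math.floor(trim * T)   # minimum segment length in quarters
max_segments = T // min_size      # how many minimum-length segments fit
max_breaks = max_segments - 1     # internal breaks between those segments
print(min_size, max_segments, max_breaks)
```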

  * The resulting breakpoint indices were [0, 41, 84, 153, 194, 235, 276],
  * corresponding to the dates [1951Q1, 1961Q2, 1972Q1, 1989Q2, 1999Q3, 2009Q4, 2019Q4].

By convention, the first and last indices of the series are also listed as breakpoints (the final index, 276, is one past the last observation, 2019Q4), so in this case there were 5 *internal* breakpoints.
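
Mapping the indices to quarters makes the convention concrete: each internal breakpoint is the first quarter of a new segment, and the final value is one past the last index. This sketch rebuilds the date list and segment lengths from the indices above.

```python
import pandas as pd

quarters = pd.period_range("1951Q1", "2019Q4", freq="Q")   # 276 quarters
bkps = [0, 41, 84, 153, 194, 235, 276]

# The final breakpoint (276) is one past the last index, so clamp it
# to the last quarter when converting to a label
labels = [str(quarters[min(b, len(quarters) - 1)]) for b in bkps]
segment_lengths = [b - a for a, b in zip(bkps[:-1], bkps[1:])]
print(labels)
print(segment_lengths)
```

Every segment is at least 41 quarters long, as the trimming parameter requires.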

### The new data
So what happens when we use the default B-P parameter values on our longer series, 1951Q1 to 2022Q1 (length 285)?

In [None]:
bkps_default = bug.get_bp_breakpoints(log_u_q, log_v_q, use_bp_defaults=True)
bkps_default

In [None]:
for idx, b in enumerate(bkps_default[:-1]):

    plt.figure(figsize = (6,6))
    plt.plot(log_u_q, log_v_q, linewidth=1, color='grey')
    plt.plot(log_u_q.iloc[bkps_default[idx]:bkps_default[idx+1]],log_v_q.iloc[bkps_default[idx]:bkps_default[idx+1]], 
             linewidth=3, color='teal')
    
    plt.annotate(str(log_u_q.index[bkps_default[idx]]), (log_u_q.iloc[bkps_default[idx]], log_v_q.iloc[bkps_default[idx]]) )
    plt.annotate(str(log_u_q.index[bkps_default[idx+1]-1]), (log_u_q.iloc[bkps_default[idx+1]-1], log_v_q.iloc[bkps_default[idx+1]-1] ) )

    plt.gca().spines["bottom"].set_linewidth(1.5)
    plt.gca().spines["bottom"].set_color('k')
    plt.gca().spines["left"].set_linewidth(1.5)
    plt.gca().spines["left"].set_color('k')
    plt.ylabel('Log Vacancy Rate', fontsize=12)
    plt.xlabel('Log Unemployment Rate', fontsize=12)
    plt.title('Beveridge Curve (quarterly)', fontsize=14)

### Compute the Beveridge elasticity given these breakpoints
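
The `bug` package handles the estimation. For intuition, if a segment of the Beveridge curve behaves like v ∝ u^(−ε), then the elasticity ε is minus the slope of log v on log u. The sketch below estimates it with ordinary least squares on one segment. `ols_beveridge_elasticity` is a hypothetical helper, and OLS is a stand-in that may differ from the estimator `bug.compute_beveridge_elasticity` actually uses.

```python
import numpy as np


def ols_beveridge_elasticity(log_u_seg, log_v_seg):
    """Rough Beveridge elasticity for one segment: minus the OLS slope of
    log v on log u (illustrative stand-in, not the bug package's estimator)."""
    slope, _intercept = np.polyfit(np.asarray(log_u_seg), np.asarray(log_v_seg), deg=1)
    return -slope


# Synthetic segment lying exactly on v = 0.002 * u**(-1.2)
u_seg = np.linspace(0.04, 0.08, 20)
v_seg = 0.002 * u_seg ** (-1.2)
print(ols_beveridge_elasticity(np.log(u_seg), np.log(v_seg)))
```

On the notebook's data you could pass, for example, `log_u_q.iloc[bkps_default[0]:bkps_default[1]]` and the matching slice of `log_v_q`.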

In [None]:
e = bug.compute_beveridge_elasticity(u, v, bkps_in=bkps_default)

In [None]:
ax= e['E'].plot(color='blueviolet', linewidth=2, figsize=(9, 6))
e[['LB', 'UB']].plot(ax=ax, color='blueviolet', linewidth=2,linestyle='dotted',)
plt.fill_between(e.index, e['UB'], e['LB'], color='blueviolet', alpha=.3)


for idx, s in enumerate(starts):
    plt.axvspan(starts[idx], ends[idx], facecolor='grey', alpha=0.6,zorder=-100)

ax.grid(axis='x')
plt.ylim(0,2.2)
ax.spines["bottom"].set_linewidth(1.5)
ax.spines["bottom"].set_color('k')
ax.spines["left"].set_linewidth(1.5)
ax.spines["left"].set_color('k')
plt.ylabel('Beveridge Elasticity', fontsize=12)
plt.title('Beveridge Elasticity', fontsize=14)


In [None]:
e.SE.unique().mean()

#### Discussion
The graph above shows that the last segment, from 2011Q3 to 2022Q1, has a very wide confidence interval. (The average standard error across all segments is 0.097.) This is likely because the breakpoints are mis-specified.

We *know* that COVID hit the US (and world) economy with a huge shock in 2020Q2. It is not reasonable to treat 2011Q3 to 2020Q1 and 2020Q2 to 2022Q1 as the *same* regime.

Those two periods end up in one segment only because the default B-P parameterization does not allow a segment as short as 8 quarters (the length of the post-COVID period, 2020Q2 to 2022Q1).
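
A quick check of this claim, assuming the package's default applies the same 0.15 trimming as the original M&S setup to the longer series (we have not confirmed how `bug.get_bp_breakpoints` computes its default minimum length):

```python
import math

import pandas as pd

T_new = 285   # quarters from 1951Q1 to 2022Q1
trim = 0.15   # assumed default trimming, as in the original M&S setup

min_size_new = math.floor(trim * T_new)
covid_segment = len(pd.period_range("2020Q2", "2022Q1", freq="Q"))
print(min_size_new, covid_segment, covid_segment >= min_size_new)
```

Under that assumption, the minimum segment is 42 quarters, far longer than the 8-quarter post-COVID period.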

### Re-parameterize
We set the minimum segment length to 8 and the number of breakpoints to 6, and check whether this gives Beveridge elasticity estimates with narrower confidence intervals.

Basically, we are guessing that the B-P algorithm will recover the 5 earlier breakpoints in the 1951-2019 span and add one for the COVID period.

In [None]:
bkps_new = bug.get_bp_breakpoints(log_u_q, log_v_q, use_bp_defaults=False,n_bkps=6, min_size=8)
bkps_new

In [None]:
for idx, b in enumerate(bkps_new[:-1]):

    plt.figure(figsize = (6,6))
    plt.plot(log_u_q, log_v_q, linewidth=1, color='grey')
    plt.plot(log_u_q.iloc[bkps_new[idx]:bkps_new[idx+1]],log_v_q.iloc[bkps_new[idx]:bkps_new[idx+1]], 
             linewidth=3, color='teal')
    
    plt.annotate(str(log_u_q.index[bkps_new[idx]]), (log_u_q.iloc[bkps_new[idx]], log_v_q.iloc[bkps_new[idx]]) )
    plt.annotate(str(log_u_q.index[bkps_new[idx+1]-1]), (log_u_q.iloc[bkps_new[idx+1]-1], log_v_q.iloc[bkps_new[idx+1]-1] ) )

    plt.gca().spines["bottom"].set_linewidth(1.5)
    plt.gca().spines["bottom"].set_color('k')
    plt.gca().spines["left"].set_linewidth(1.5)
    plt.gca().spines["left"].set_color('k')
    plt.ylabel('Log Vacancy Rate', fontsize=12)
    plt.xlabel('Log Unemployment Rate', fontsize=12)
    plt.title('Beveridge Curve (quarterly)', fontsize=14)

In [None]:
e_new = bug.compute_beveridge_elasticity(u, v, bkps_in=bkps_new)

In [None]:
ax= e_new['E'].plot(color='blueviolet', linewidth=2, figsize=(9, 6))
e_new[['LB', 'UB']].plot(ax=ax, color='blueviolet', linewidth=2,linestyle='dotted',)
plt.legend()

for idx, s in enumerate(starts):
    plt.axvspan(starts[idx], ends[idx], facecolor='grey', alpha=0.6,zorder=-100)

plt.fill_between(e_new.index, e_new['UB'], e_new['LB'], color='blueviolet', alpha=.3)

ax.grid(axis='x')
plt.ylim(0,2.2)
ax.spines["bottom"].set_linewidth(1.5)
ax.spines["bottom"].set_color('k')
ax.spines["left"].set_linewidth(1.5)
ax.spines["left"].set_color('k')
plt.ylabel('Beveridge Elasticity', fontsize=12)
plt.title('Beveridge Elasticity', fontsize=14)

In [None]:
e_new.SE.unique().mean()

Notice the narrower confidence intervals: the average standard error across *all* segments dropped from 0.097 to 0.061, a reduction of about 37%.

## Beveridge unemployment gap: BUG

In [None]:
gap = bug.compute_unemployment_gap(u_q, v_q, e_new['E'])
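
For readers who want to see the arithmetic behind the gap, here is a minimal standalone sketch of the efficient-unemployment formula as we read it in M&S, $u^* = \left(\frac{\kappa \varepsilon}{1-\zeta}\, v\, u^{\varepsilon}\right)^{1/(1+\varepsilon)}$, with the gap defined as $u - u^*$ and the defaults zeta = 0.26 and kappa = 0.92 listed above. The function names are hypothetical, and we have not verified that `bug.compute_unemployment_gap` implements exactly this expression (it may, for example, align indices or handle parameters differently).

```python
import pandas as pd


def efficient_unemployment_sketch(u, v, eps, zeta=0.26, kappa=0.92):
    """u* = (kappa * eps / (1 - zeta) * v * u**eps) ** (1 / (1 + eps)).

    Works elementwise on scalars, NumPy arrays, or index-aligned pandas Series.
    """
    return (kappa * eps / (1.0 - zeta) * v * u ** eps) ** (1.0 / (1.0 + eps))


def unemployment_gap_sketch(u, v, eps, zeta=0.26, kappa=0.92):
    """Gap = actual unemployment rate minus the efficient rate u*."""
    return u - efficient_unemployment_sketch(u, v, eps, zeta, kappa)


# Made-up quarterly values, for illustration only
idx = pd.period_range("2021Q1", periods=3, freq="Q")
u_toy = pd.Series([0.062, 0.059, 0.051], index=idx)
v_toy = pd.Series([0.050, 0.057, 0.066], index=idx)
eps_toy = pd.Series(1.0, index=idx)
print(unemployment_gap_sketch(u_toy, v_toy, eps_toy).round(4))
```

Holding u fixed, a higher vacancy rate raises u* and so shrinks the gap, which is why tight labor markets push the BUG toward or below zero.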

In [None]:
ax = gap.plot(color='navy', linewidth=2, figsize=(9, 6), label='unemployment gap')
plt.axhline(y=0, color='magenta', linewidth=2,)
plt.legend()

for idx, s in enumerate(starts):
    plt.axvspan(starts[idx], ends[idx], facecolor='grey', alpha=0.6,zorder=-100)

ax.grid(axis='x')
plt.ylim(-.02,.1)
ax.spines["bottom"].set_linewidth(1.5)
ax.spines["bottom"].set_color('k')
ax.spines["left"].set_linewidth(1.5)
ax.spines["left"].set_color('k')
plt.ylabel('Unemployment Gap', fontsize=12)
plt.title('Unemployment Gap', fontsize=14)

### Note
The main goal of this notebook was simply to show how to compute the BUG with new data.

We will return to the issue of breakpoint estimation in another notebook.