# Further discussion on breakpoint detection
In this notebook, we explore in more detail the issue of detecting structural breaks in the regression of log(vacancy rate) on the log(unemployment rate).

These breakpoints are needed because the Beveridge Elasticity is computed directly from this regression. 

## Monthly series?
The original paper uses quarterly series of log(vacancy rate) and log(unemployment rate). But why? The quarterly values are themselves derived from the monthly values.
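
As a minimal sketch of how a quarterly series can be derived from a monthly one, the cell below averages the monthly values within each quarter. The averaging rule is an assumption made for illustration (the paper's exact aggregation is not reproduced here), and the numbers are placeholders rather than real data.

```python
import pandas as pd

# Placeholder monthly series (illustration only, not real FRED values)
monthly = pd.Series(
    [0.040, 0.042, 0.044, 0.050, 0.052, 0.054],
    index=pd.date_range("2001-01-01", periods=6, freq="MS"),
)

# Assumption: a quarterly observation is the mean of its three monthly values
quarterly = monthly.groupby(monthly.index.to_period("Q")).mean()
print(quarterly)
```

Going the other way is not possible without extra assumptions, which is one argument for working with the monthly values directly.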


## Get our economic data
For computing the BUG, we need:
  * unemployment rate: u
  * vacancy rate: v
  * Beveridge curve elasticity (computed from u, v, and breakpoints on the v/u series)
  * social value of non-work (default is zeta = 0.26)
  * recruiting costs (default is kappa = 0.92)

The vacancy rate is the trickiest to get. We can compute it as total non-farm job openings divided by the labor force level. With both series in hand, the vacancy rate can be computed from 2001 onwards. For pre-2001, we would need to splice in the "composite help-wanted index (cHWI)" values from Barnichon (2010). However, we prefer to avoid this, as the cHWI is not as easily obtained as the other economic data.

This is another reason to consider the monthly Beveridge Gap: starting in 2001, a monthly series gives us three times as many data points as a quarterly one.

<br>

### Data source: 

![image.png](https://fred.stlouisfed.org/images/fred-logo-2x.png)
<br>

The St. Louis Fed has an [API](https://fred.stlouisfed.org/docs/api/fred/series_observations.html) which allows you to pull data programmatically.
You can do this yourself with a registered API key. (See [here](https://fred.stlouisfed.org/docs/api/api_key.html) for info.)
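
For reference, a rough sketch of the do-it-yourself route is shown below. It only builds the request URL for the `series/observations` endpoint and parses a hand-written payload in the documented JSON shape, so it runs without a key or network access. The helper names `build_fred_url` and `parse_observations` are hypothetical, and `YOUR_API_KEY` is a placeholder.

```python
import json
from urllib.parse import urlencode

FRED_OBS_URL = "https://api.stlouisfed.org/fred/series/observations"

def build_fred_url(series_id, api_key, start="2001-01-01"):
    """Build the request URL for one series (hypothetical helper)."""
    params = {
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "observation_start": start,
    }
    return FRED_OBS_URL + "?" + urlencode(params)

def parse_observations(payload):
    """Turn a response payload into (date, value) pairs.

    FRED encodes values as strings and marks missing values with '.'.
    """
    rows = []
    for obs in payload["observations"]:
        value = None if obs["value"] == "." else float(obs["value"])
        rows.append((obs["date"], value))
    return rows

# Hand-written payload with placeholder values (not real data)
sample = json.loads(
    '{"observations": ['
    '{"date": "2001-01-01", "value": "1.0"}, '
    '{"date": "2001-02-01", "value": "."}]}'
)

url = build_fred_url("UNRATE", "YOUR_API_KEY")
rows = parse_observations(sample)
print(url)
print(rows)
```

With a real key, the payload would come from something like `json.load(urllib.request.urlopen(url))` instead of the sample string.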

*Or, we can use the handy FredReader class from the [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/index.html) package!*

In [None]:
import pandas as pd
import numpy as np
import ruptures as rpt
from kneed import KneeLocator
from pandas_datareader.fred import FredReader
default_start_date = '2001-01-01'

In [None]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.style.use('fivethirtyeight')

In [None]:
import sys
sys.path.insert(0, '../')
import bug

In [None]:
# Recession information: USREC is 1 in recession months and 0 otherwise
recession = FredReader('USREC', start=default_start_date).read()
# A recession starts where the indicator steps 0 -> 1 and ends where it steps 1 -> 0
change = recession.USREC.diff()
recession['starts'] = change == 1
recession['ends'] = change == -1
starts = recession.index[recession['starts']].to_list()
ends = recession.index[recession['ends']].to_list()

In [None]:
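# Unemployment rate: FRED reports percent, so convert to a fraction
u = FredReader('UNRATE', start=default_start_date).read() / 100.0
u = u.squeeze()  # single-column DataFrame -> Series
u.index.freq = u.index.inferred_freq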

In [None]:
labor_lev = FredReader('CLF16OV', start=default_start_date).read().astype(float)

In [None]:
nf_vac = FredReader('JTSJOL', start=default_start_date).read().astype(float)
# Vacancy rate = job openings / labor force (both series are in thousands)
v = nf_vac.JTSJOL / labor_lev.CLF16OV
v.index.freq = v.index.inferred_freq

In [None]:
last = min(v.last_valid_index(),u.last_valid_index())
v = v.loc[:last]
u = u.loc[:last]

u.index = u.index.to_period('M')
v.index = v.index.to_period('M')

log_v = np.log(v)
log_u = np.log(u)


### Beveridge Curve

In [None]:
fig = plt.figure(figsize = (7,7))
ax = fig.add_subplot(1, 1, 1)
ax.plot(np.log(u), np.log(v), linewidth=1, color='darkblue')
bug.format_plot(ax, xgrid=False)

plt.ylabel('Log Vacancy Rate', fontsize=12)
plt.xlabel('Log Unemployment Rate', fontsize=12)
plt.title('Beveridge Curve (monthly)', fontsize=14)

### How to decide the number of breaks
Other than just eyeballing?

Bai & Perron (2003) suggest supF-type tests, e.g. for no break (*m=0*) vs *m=k* breaks. This can be extended to a test of *m* breaks vs *m+1* breaks. They also discuss using the BIC or other information criteria.
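
To make the information-criterion route concrete, here is a small sketch of the BIC and the LWZ criterion (Liu, Wu and Zidek) in the form given by Bai & Perron (2003), computed from the sum of squared residuals (SSR) of a model with *m* breaks. It is not taken from the `bug` module, whose implementation may differ; the function names and the SSR values are made up for illustration.

```python
import math

def n_params(m, q):
    # (m + 1) regimes with q regression coefficients each, plus m break dates
    return (m + 1) * q + m

def bic(ssr, T, m, q):
    """BIC for a model with m breaks fitted on T observations."""
    return math.log(ssr / T) + n_params(m, q) * math.log(T) / T

def lwz(ssr, T, m, q, c0=0.299, delta0=0.1):
    """LWZ criterion; c0 and delta0 are the commonly used default constants."""
    p = n_params(m, q)
    return math.log(ssr / (T - p)) + (p / T) * c0 * math.log(T) ** (2 + delta0)

# Hypothetical SSR values for m = 0..4 breaks (illustration only)
ssr_by_m = [10.0, 4.0, 1.5, 1.45, 1.42]
T, q = 240, 2  # q = 2: slope and intercept

bic_by_m = [bic(s, T, m, q) for m, s in enumerate(ssr_by_m)]
lwz_by_m = [lwz(s, T, m, q) for m, s in enumerate(ssr_by_m)]

best_bic = min(range(len(bic_by_m)), key=bic_by_m.__getitem__)
best_lwz = min(range(len(lwz_by_m)), key=lwz_by_m.__getitem__)
print(best_bic, best_lwz)
```

Both criteria reward a lower SSR and penalise extra parameters; LWZ applies the heavier penalty, so it tends to select fewer breaks.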

The ruptures package does not include hypothesis tests for determining the number of breaks, so we'll have to write some tests ourselves.
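
As an illustration of what such a test can look like, the sketch below computes a Chow-type F statistic that compares a null set of breakpoints with a nested alternative containing more breaks. It is a simplified stand-in, not the code in `bug.evaluate_num_breaks`; the helper names and the synthetic data are assumptions. Breakpoints follow the ruptures convention of segment end indices, with the last one equal to the sample length.

```python
import numpy as np

def segment_ssr(y, X):
    """Sum of squared OLS residuals on one segment."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def total_ssr(y, X, bkps):
    """SSR of a piecewise regression; bkps are segment end indices."""
    start, total = 0, 0.0
    for end in bkps:
        total += segment_ssr(y[start:end], X[start:end])
        start = end
    return total

def f_stat(y, X, bkps_null, bkps_alt):
    """F statistic for the null breakpoints vs a nested alternative."""
    T, q = X.shape
    m0, m1 = len(bkps_null) - 1, len(bkps_alt) - 1
    ssr0 = total_ssr(y, X, bkps_null)
    ssr1 = total_ssr(y, X, bkps_alt)
    df_num = (m1 - m0) * q       # extra coefficients under the alternative
    df_den = T - (m1 + 1) * q    # residual degrees of freedom
    return ((ssr0 - ssr1) / df_num) / (ssr1 / df_den)

# Synthetic data with one true break at observation 60
rng = np.random.default_rng(0)
x = rng.normal(size=120)
first_half = np.arange(120) < 60
y = np.where(first_half, 1.0 - 1.0 * x, 3.0 - 0.5 * x)
y = y + 0.1 * rng.normal(size=120)
X = np.column_stack((x, np.ones_like(x)))

F = f_stat(y, X, bkps_null=[120], bkps_alt=[60, 120])
print(F)
```

When the break dates are themselves estimated from the data, the statistic no longer follows a standard F distribution, so it should be compared with the critical values tabulated by Bai & Perron rather than with ordinary F tables.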

#### Evaluating the BIC/SSR plots
Find the "elbow" or "knee" where the curve starts to show diminishing returns.

We can eyeball it, or try the detector from the "kneed" package.
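
For intuition about what a knee detector is looking for, here is a deliberately simple heuristic: pick the point farthest from the straight line joining the first and last points of the curve. This is not the algorithm that kneed implements, and the function name and SSR values below are made up for illustration.

```python
def knee_by_chord_distance(x, y):
    """Index of the point farthest from the chord joining the endpoints."""
    x0, y0, x1, y1 = x[0], y[0], x[-1], y[-1]
    # Normalise both axes to [0, 1] so the result does not depend on units
    xs = [(xi - x0) / (x1 - x0) for xi in x]
    ys = [(yi - y0) / (y1 - y0) for yi in y]
    # After normalising, the chord is the line y = x, so the distance
    # from each point to it is proportional to |ys - xs|
    dists = [abs(b - a) for a, b in zip(xs, ys)]
    return max(range(len(x)), key=dists.__getitem__)

# Hypothetical SSR values for m = 0..5 breaks (illustration only)
m = [0, 1, 2, 3, 4, 5]
ssr = [10.0, 4.0, 1.5, 1.3, 1.2, 1.15]

knee_idx = knee_by_chord_distance(m, ssr)
print(m[knee_idx])
```

The heuristic assumes the first and last values of the curve differ; a flat curve has no knee to find.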

In [None]:
# get the data into the format that the ruptures package likes:
# column 0 is the response (log v); the remaining columns are the regressors (log u, constant)
signal = np.column_stack((log_v.to_numpy(), log_u.to_numpy(), np.ones(len(log_u))))

In [None]:
Eval = bug.evaluate_num_breaks(signal, max_bkps=8, min_size=16)

In [None]:
Eval.f_stats_zero_v_m

In [None]:
k = KneeLocator(np.arange(1,Eval.max_bkps + 1), [x['F'] for x in Eval.f_stats_zero_v_m[1:]], curve="concave", direction="increasing")
ax = k.plot_knee()
plt.title('F-test: null=0 breaks, alt=m breaks')

In [None]:
Eval.f_stats_running

In [None]:
k = KneeLocator(np.arange(1,Eval.max_bkps +1), [x['F'] for x in Eval.f_stats_running[1:]], curve="concave", direction="increasing")
ax = k.plot_knee()
plt.title('F-test: null=m-1 breaks, alt=m breaks')

In [None]:
k = KneeLocator(np.arange(0,Eval.max_bkps + 1), Eval.ssr, curve="convex", direction="decreasing")
ax = k.plot_knee()
plt.title('SSR')

In [None]:
k = KneeLocator(np.arange(0,Eval.max_bkps + 1), Eval.bic, curve="convex", direction="decreasing")
ax = k.plot_knee()
plt.title('BIC')

In [None]:
k = KneeLocator(np.arange(0,Eval.max_bkps + 1), Eval.lwz, curve="convex", direction="decreasing")
ax = k.plot_knee()
plt.title('LWZ')

### The consensus seems to be 2 breakpoints!!

In [None]:
# k is the most recent KneeLocator (the LWZ curve above); k.knee is its number of breaks
opt = Eval.bkps[k.knee]
opt

In [None]:
e_opt, coeffs = bug.compute_beveridge_elasticity(log_u, log_v, bkps_in=opt )
gap_opt = bug.compute_unemployment_gap(u, v, e_opt['E'])

In [None]:
bug.plot_beveridge_gap_series(gap_opt, [u.index[b] for b in opt[:-1]], 
                              recession_dates=[starts, ends], )
plt.ylim(-.03,.13)

In [None]:
bug.plot_beveridge_elasticity_series(e_opt, recession_dates=[starts, ends], draw_legend=True)
plt.ylim(0,2)

In [None]:
bug.plot_beveridge_curve_fits(log_u, log_v, opt, coeffs, figsize=(7,7))