# Examining the Choice of Kernel

This notebook examines how the choice of kernel affects a Gaussian Process (GP) forecast of the labor force participation rate (LFPR). Specifying the kernel implicitly determines the structure of the function that the GP forecasts. Previous efforts to forecast LFPR have imposed particular structures *a priori* without examining how that structure affects forecast performance. I consider several candidate structures, based on those used in prior efforts to forecast LFPR, and plot the resulting forecasts here.

## Gaussian Processes for Forecasting

We start by treating the LFPR $y$ within an age-gender-year cell as a noisy observation from an unknown function of covariates
$$y = f(X) + \epsilon$$
where the noise is normally distributed, $\epsilon \sim N(0, \sigma^2)$. The unknown function $f$ is assumed to follow a Gaussian Process with covariance kernel $K(X, X')$.
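
The generative model can be simulated directly as a quick illustration. This is a minimal sketch, assuming a zero prior mean (natural once the outcome is standardized) and a one-dimensional squared-exponential kernel chosen only for illustration; `simulate_gp_data` and `se_kernel` are hypothetical helper names, not functions from the scripts used in this notebook.

```python
import numpy as np

def se_kernel(X1, X2, ell=1.0):
    """Squared-exponential kernel on 1-D inputs (illustrative choice)."""
    return np.exp(-0.5 * (X1[:, None] - X2[None, :]) ** 2 / ell**2)

def simulate_gp_data(X, kernel, sigma, rng):
    """Draw one dataset from y = f(X) + eps, f ~ GP(0, K), eps ~ N(0, sigma^2)."""
    K = kernel(X, X)
    f = rng.multivariate_normal(np.zeros(len(X)), K)   # latent function values
    eps = rng.normal(0.0, sigma, size=len(X))          # i.i.d. Gaussian noise
    return f, f + eps

rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 50)
f, y = simulate_gp_data(X, se_kernel, sigma=0.1, rng=rng)
```

Estimation runs this logic in reverse: given observed $y$, it infers the kernel hyperparameters and the posterior over $f$.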

The choice of covariance kernel implicitly imposes structure on $f$. For example, some kernels are stationary, so forecasts built on them revert toward the prior mean far from the observed data, while other kernels, such as the linear kernel, are non-stationary and can extrapolate trends. Some kernels, like the squared-exponential kernel, imply infinitely differentiable functions, while others imply finitely differentiable or non-differentiable functions. Further, since sums and products of covariance kernels are themselves covariance kernels, complicated structure can be imposed by composing the overall kernel for $f$ from many simpler kernels.
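
The contrast between stationary and non-stationary kernels shows up clearly in a toy forecast. The sketch below assumes a zero prior mean, a fixed noise level of 0.1, and an upward-trending toy series; `se`, `linear`, and `posterior_mean` are hypothetical names. Far from the data, the SE kernel's forecast falls back to the prior mean while the linear kernel's forecast continues the trend. The last loop checks numerically that a sum and an elementwise product of the two kernels are still positive semi-definite.

```python
import numpy as np

def se(x1, x2, ell=1.0):
    # Stationary: depends only on the distance x1 - x2
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ell**2)

def linear(x1, x2):
    # Non-stationary: depends on where x1 and x2 are, not just their distance
    return np.outer(x1, x2)

def posterior_mean(kernel, x, y, x_new, sigma=0.1):
    """GP posterior mean at x_new (zero prior mean, Gaussian noise sigma)."""
    K = kernel(x, x) + sigma**2 * np.eye(len(x))
    return kernel(x_new, x) @ np.linalg.solve(K, y)

x = np.linspace(0.0, 2.0, 9)
y = 0.5 * x                     # training data following a pure upward trend
x_far = np.array([20.0])        # a point far outside the training range

print(posterior_mean(se, x, y, x_far))      # ~0: reverts to the prior mean
print(posterior_mean(linear, x, y, x_far))  # ~10: continues the trend

# Sums and elementwise products of valid kernels are again valid (PSD) kernels.
for K in (se(x, x) + linear(x, x), se(x, x) * linear(x, x)):
    print(np.linalg.eigvalsh(K).min() > -1e-8)   # True
```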

As an example of how to impose structure through kernels, take the analysis of recent trends in LFPR contained in Krueger (2017). He estimates linear trends from 2000-2007 for sixteen demographic groups and projects these trends out through the Great Recession to examine how much of the post-2007 decline could be explained by continuation of these pre-existing trends. This method imposes the structure that 1) trends in LFPR over time are linear, and 2) similar ages are likely to have similar LFPR (and hence similar trends). This structure can be captured by the product of a linear kernel over the time dimension with a squared-exponential kernel over the age dimension. Functions drawn from a prior with this composite covariance kernel inherit this structure. 
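
A prior draw makes the inherited structure visible. This minimal sketch uses age only (gender omitted), an illustrative age length scale of 10 years, years centered at 2000, and hypothetical helper names; it also adds an SE(age) level term, mirroring the additive SE(age, gender) term in kernel #1 below. Every age's sampled path is a straight line in year, with slopes that vary smoothly across ages.

```python
import numpy as np

def se(u, v, ell):
    # Squared-exponential kernel on a single column
    return np.exp(-0.5 * (u[:, None] - v[None, :]) ** 2 / ell**2)

def krueger_style_kernel(age1, t1, age2, t2, ell_age=10.0):
    """SE(age) + Linear(year) * SE(age): an age-specific level plus an
    age-specific linear time trend whose slope varies smoothly with age."""
    level = se(age1, age2, ell_age)
    trend = np.outer(t1, t2) * se(age1, age2, ell_age)
    return level + trend

ages = np.array([25.0, 35.0, 45.0, 55.0])
years = np.arange(2000.0, 2008.0)                  # 2000-2007
A, Y = np.meshgrid(ages, years, indexing="ij")     # one row per age, one column per year
a, t = A.ravel(), Y.ravel() - 2000.0               # center years at 2000

# Draw f from the GP prior via an eigendecomposition (K is low rank).
K = krueger_style_kernel(a, t, a, t)
w, V = np.linalg.eigh(K)
w = np.where(w > 1e-10 * w.max(), w, 0.0)          # treat rounding-level eigenvalues as zero
rng = np.random.default_rng(0)
f = (V @ (np.sqrt(w) * rng.normal(size=w.size))).reshape(A.shape)

# Each age's path is a straight line in year: second differences vanish.
print(np.abs(np.diff(f, n=2, axis=1)).max())       # ~0 up to rounding
```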

Expressing the structure of $f$ through kernels is beneficial for comparative, expositional, and computational reasons. In addition to Krueger (2017), many statistical models of LFPR can be expressed as a composition of kernels, including the commonly used age, gender, and cohort fixed effects approach. This allows these models to be compared within a unified framework and makes the differences between them easier to understand. It also eases computation, as many different models can be estimated on the same dataset just by changing the covariance kernel.
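
As a worked example of the fixed-effects case, suppose (as an assumption, since it turns the fixed effects into Gaussian random effects) that each age effect $\alpha_a$, gender effect $\gamma_g$, and cohort effect $\delta_c$ is drawn independently with mean zero:

$$f(x) = \alpha_a + \gamma_g + \delta_c, \qquad \alpha_a \sim N(0, \sigma_\alpha^2),\ \gamma_g \sim N(0, \sigma_\gamma^2),\ \delta_c \sim N(0, \sigma_\delta^2)$$

Because the effects are independent, the covariance between two cells $x = (a, g, c)$ and $x' = (a', g', c')$ is a sum of indicator kernels:

$$\operatorname{Cov}\big(f(x), f(x')\big) = \sigma_\alpha^2\,\mathbb{1}[a = a'] + \sigma_\gamma^2\,\mathbb{1}[g = g'] + \sigma_\delta^2\,\mathbb{1}[c = c']$$

Replacing an indicator with a smooth kernel, such as an SE kernel over age, lets neighboring ages share information rather than being estimated separately.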

Our method is not the first to approach this type of forecasting problem. Forecasting LFPR is conceptually similar to forecasting the yield curve, as both are high-dimensional but contain a clear structure evident in the data. A long literature has developed methods to forecast the yield curve in a stationary environment. Some methods impose particular functional forms, justified by economic reasoning, to reduce the dimension of the forecast, as in the Dynamic Nelson-Siegel methods developed in Diebold & Li (2006) and Diebold et al. (2006). Other approaches allow for general functional forms, as in the Functional Autoregression method employed by Kowal et al. (2017). We follow the spirit of the latter approaches by allowing for a wide variety of functional forms. However, we depart from this literature by examining a nonstationary process, LFPR, to which these stationary methods cannot be directly applied.

## Comparison of Kernels

I examine the performance of seven candidate kernel structures, each of which may contain sums or products of other kernels. I employ only two types of base kernels, Linear and Squared-Exponential (SE). Linear kernels are non-stationary and SE kernels are stationary; their product is non-stationary. Refer to [this page](https://en.wikipedia.org/wiki/Gaussian_process#Usual_covariance_functions) for the exact formula of each kernel.
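
For reference, one common parameterization of the two base kernels is given below. The signal variances $\sigma_v^2$ and $\sigma_s^2$ and the per-dimension length scales $\ell_d$ are hyperparameters; whether the plotting scripts use exactly this parameterization is an assumption.

$$k_{\text{Lin}}(x, x') = \sigma_v^2\, x\, x', \qquad k_{\text{SE}}(\mathbf{x}, \mathbf{x}') = \sigma_s^2 \exp\!\left(-\frac{1}{2}\sum_{d} \frac{(x_d - x'_d)^2}{\ell_d^2}\right)$$

The SE kernel depends on its inputs only through $\mathbf{x} - \mathbf{x}'$, which is what makes it stationary. The linear kernel grows with $|x|$ and $|x'|$, so it is not.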

1. $SE(age,gender)+Linear(year)*SE(age,gender)$ - This kernel estimates linear trends that vary across age/gender, similar to Krueger (2017). 

2. $SE(age,gender)+Linear(year)*SE(year)*SE(age,gender)$ - This kernel estimates linear trends across age/gender, but allows these trends to change slowly over time. Since these kernels will be fit to data from 1976-1999, this may provide a better approximation to Krueger (2017) by extrapolating only the recent trend at each age/gender group. 

3. $SE(age,gender)+Linear(cohort)*SE(cohort)*SE(age,gender)$ - This is conceptually similar to #2 but forecasts trends by cohort rather than by year. It is likely to be close to #2, since cohort is a linear function of age and year (cohort = year - age), but not identical, since the SE kernel is nonlinear and may therefore induce a slightly different structure.

4. $SE(age,gender)+Linear(year)*SE(year)*SE(age,gender)+SE(cohort)$ - This adds a stationary cohort effect to kernel #2, which may capture additional variation in cohort-specific participation patterns. 

5. $SE(age,gender)+Linear(cohort)*SE(cohort)$ - This kernel approximates the age/gender/cohort fixed effects model. Age and gender contribute additively with a cohort term, the latter of which takes the form of a time-varying linear trend in order to extrapolate to new cohorts. (This approximation relies on the fact that a GP whose kernel is the sum of two kernels has the same distribution as the sum of two independent GPs, one with each kernel.)

6. $SE(age,gender)+Linear(year)*SE(year)*SE(age,gender)+Linear(ugap)*SE(age,gender)$ - This adds a cyclical term to kernel #2, linear in the unemployment gap, allowing the coefficient on this term to vary smoothly across age and gender. The unemployment gap is computed as the overall unemployment rate minus a long-term trend estimated non-parametrically with a biweight kernel with bandwidth 120. (The unemployment gap is the same for all age/gender cells, but the coefficient on it is allowed to vary.)

7. $SE(age,gender)+Linear(year)*SE(year)*SE(age,gender)+Linear(ugap)*SE(age,gender)+Linear(L(ugap))*SE(age,gender)+Linear(L^2(ugap))*SE(age,gender)$ - This is the same as kernel #6, with the one- and two-year lags of the unemployment gap added ($L$ denotes the lag operator).
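
To make the composition concrete, kernels #2 and #6 can be assembled from the two base kernels with ordinary array sums and elementwise products. This is a minimal sketch: the column layout, the hyperparameter names, their values, and the toy covariates are all hypothetical, and the notebook's own scripts may organize this differently.

```python
import numpy as np

def se(u, v, ell):
    """SE kernel over the columns of u (n x d) and v (m x d), length scales ell (d,)."""
    diff = (u[:, None, :] - v[None, :, :]) / ell
    return np.exp(-0.5 * (diff**2).sum(axis=-1))

def lin(u, v):
    """Linear kernel over a single column."""
    return np.outer(u, v)

# Hypothetical column layout: 0 age, 1 gender, 2 year, 3 unemployment gap.
AGE_GENDER, YEAR, UGAP = [0, 1], 2, 3

def kernel_2(X1, X2, p):
    """SE(age,gender) + Linear(year) * SE(year) * SE(age,gender)."""
    level = p["v_level"] * se(X1[:, AGE_GENDER], X2[:, AGE_GENDER], p["ell_level"])
    trend = (p["v_trend"] * lin(X1[:, YEAR], X2[:, YEAR])
             * se(X1[:, [YEAR]], X2[:, [YEAR]], p["ell_year"])
             * se(X1[:, AGE_GENDER], X2[:, AGE_GENDER], p["ell_trend"]))
    return level + trend

def kernel_6(X1, X2, p):
    """Kernel #2 + Linear(ugap) * SE(age,gender)."""
    cycle = (p["v_cycle"] * lin(X1[:, UGAP], X2[:, UGAP])
             * se(X1[:, AGE_GENDER], X2[:, AGE_GENDER], p["ell_cycle"]))
    return kernel_2(X1, X2, p) + cycle

# Illustrative hyperparameter values only; in practice these are estimated.
params = {"v_level": 1.0, "ell_level": np.array([1.0, 1.0]),
          "v_trend": 0.5, "ell_year": np.array([2.0]), "ell_trend": np.array([1.0, 1.0]),
          "v_cycle": 0.2, "ell_cycle": np.array([1.0, 1.0])}

rng = np.random.default_rng(0)
n = 20
X = np.column_stack([rng.normal(size=n),                        # age (standardized)
                     rng.integers(0, 2, size=n).astype(float),  # gender indicator
                     rng.normal(size=n),                        # year (standardized)
                     rng.normal(size=n)])                       # unemployment gap
K6 = kernel_6(X, X, params)
```

Kernel #7 would follow the same pattern, adding one `lin(...) * se(...)` term for each lagged unemployment-gap column.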

## Estimation

To estimate and forecast using each kernel, I compile a dataset of LFPR by age, gender, and year from the CPS. Since the LFPR is constrained between 0 and 1, I transform it into an unconstrained outcome by passing each observation through the logit function, $\log\big(y/(1-y)\big)$, the inverse of the logistic function. I also normalize the outcome and all covariates into z-scores to ease estimation.
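
A minimal sketch of these transformations follows. It assumes the z-score uses the mean and standard deviation of whatever sample is passed in (which sample the notebook uses for these moments is not stated), and the function names and example rates are hypothetical. The logit maps $(0, 1)$ onto the whole real line, so the GP's Gaussian outcome is never forced outside the valid range once predictions are mapped back.

```python
import numpy as np

def to_model_scale(lfpr):
    """Logit-transform LFPR in (0, 1), then standardize to mean 0, s.d. 1."""
    z = np.log(lfpr) - np.log1p(-lfpr)     # logit: maps (0, 1) onto the real line
    mu, sd = z.mean(), z.std()
    return (z - mu) / sd, mu, sd           # keep mu, sd to undo the scaling later

def zscore_columns(X):
    """Standardize each covariate column (age, year, ...) to mean 0, s.d. 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

lfpr = np.array([0.35, 0.62, 0.81, 0.93])
y, mu, sd = to_model_scale(lfpr)
```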

I begin by estimating the hyperparameters for each kernel. Each kernel is governed by several hyperparameters controlling the variance and length scale of each component, as well as the overall noise in the data, $\sigma$. I estimate the hyperparameters by choosing the values that maximize the marginal likelihood of the 1976-1999 data.
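
For a zero-mean GP with Gaussian noise, the log marginal likelihood is $\log p(y \mid \theta) = -\tfrac{1}{2} y^\top (K_\theta + \sigma^2 I)^{-1} y - \tfrac{1}{2}\log\lvert K_\theta + \sigma^2 I\rvert - \tfrac{n}{2}\log 2\pi$. The sketch below computes it with a Cholesky factor and maximizes it over a single length scale. A grid search stands in for the gradient-based optimizer a GP library would typically use, the noise level is held fixed for simplicity, and the toy data are illustrative only, not the CPS sample.

```python
import numpy as np

def log_marginal_likelihood(K, y, sigma):
    """log p(y | theta) for y ~ N(0, K + sigma^2 I), computed via Cholesky."""
    n = len(y)
    L = np.linalg.cholesky(K + sigma**2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # (K + sigma^2 I)^{-1} y
    return (-0.5 * y @ alpha                  # data-fit term
            - np.log(np.diag(L)).sum()        # 0.5 * log det(K + sigma^2 I)
            - 0.5 * n * np.log(2 * np.pi))    # normalizing constant

def se_1d(x, ell):
    return np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell**2)

# Toy data standing in for a training sample (illustrative only).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 40)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

# Grid search over the length scale, holding sigma fixed.
grid = np.linspace(0.2, 3.0, 29)
scores = [log_marginal_likelihood(se_1d(x, ell), y, sigma=0.1) for ell in grid]
best_ell = grid[int(np.argmax(scores))]
```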

Using these hyperparameter estimates, I compute predictions over the 2000-2016 period, then un-normalize them and map them back to the (0, 1) scale with the logistic function. These are plotted along with the full data below.
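
The back-transformation can be sketched as follows, assuming predictions come as a mean and standard deviation on the standardized logit scale and that `mu` and `sd` are the moments used to standardize the logit outcome; the function names and the 1.96 multiplier for a 95% band are illustrative choices. Because the logistic function is monotone, interval endpoints transform directly, and the transformed mean is the predictive median of LFPR rather than its mean.

```python
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def back_transform(mean_std, sd_std, mu, sd, z=1.96):
    """Map predictions on the standardized logit scale back to LFPR.

    mean_std, sd_std: predictive mean and s.d. on the standardized logit scale.
    mu, sd: the mean and s.d. used to standardize the logit outcome."""
    center = logistic(mean_std * sd + mu)               # predictive median of LFPR
    lower = logistic((mean_std - z * sd_std) * sd + mu)
    upper = logistic((mean_std + z * sd_std) * sd + mu)
    return center, lower, upper

center, lower, upper = back_transform(np.array([0.2, -0.1]), np.array([0.3, 0.4]),
                                      mu=0.9, sd=0.6)
```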

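```python
# Python 3 has no execfile(); exec(open(...).read()) runs each plotting script in this namespace.
exec(open("scripts/draw_testkernel_plots_plotly1.py").read())
exec(open("scripts/draw_testkernel_plots_plotly2.py").read())
```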