# The Search Model

## Identification of the Model without Heterogeneity

The @Mccall Search Model that we introduced as a [prototype model](../models/search.qmd) is an excellent introduction to the identification of structural models. A classic treatment of identification of this model is provided by @flinn1982. 


### Data

Identification arguments *always* begin with an assumption on available data. Even though the model is dynamic, we will show that all we really need is a single cross-section of data:

$$ (E_{n},t_{U,n},W_{n})_{n=1}^{N} $$

where:

- $E_{n}\in\{0,1\}$ indicates the employment status of individual $n$
- If the individual is employed ($E_{n}=1$) we see the wage, $W_{n}$, which is otherwise assumed to be missing.
- If the individual is unemployed ($E_{n}=0$), then we see the duration of unemployment $t_{U,n}$, which is otherwise assumed to be missing.

::: {.callout-note icon="false" collapse="true"}
## Example: CPS Data

::: {#exm-CPS_data}
### Reading the data
Let's take a look at data from the CPS on wages, employment status, and labor market transitions. Here is code to read in the data:


In [None]:
using CSV, DataFrames, DataFramesMeta, Statistics

data = CSV.read("../data/cps_00019.csv",DataFrame)

As you can see from the preview of the data, the data is taken from January-March 2018. Here is a quick snippet of code to see how many observations we have on average per person:


In [None]:
@chain data begin
    groupby(:CPSIDP)
    @combine :T = length(:EMPSTAT)
    @combine :average = mean(:T) :frac_panel = mean(:T.>1)
end

So we see that more than half of the individuals in this sample can be found in more than one month of the data.

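The `@chain` macro is re-exported by the package `DataFramesMeta` (it is defined in `Chain.jl`) and is a convenient syntax for composing operations into one block: each call receives the result of the previous step as its first argument. For example: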


In [None]:
#| eval: false
@chain x begin
    func1(y1)
    func2(y2)
    func3(y3)
end

is equivalent to


In [None]:
#| eval: false
func3(func2(func1(x,y1),y2),y3)

:::

::: {#exm-CPS_moments}
### Calculating some moments

You may find the [codebook](/data/CPS-Codebook.html) useful for understanding particular variables. We have already limited the data to individuals who are working (`EMPSTAT`==10), have a job but did not work last week (`EMPSTAT`==12), or are unemployed (`EMPSTAT`==21).

Suppose we wanted to use the panel dimension to measure transition rates. A simple way to do that is to measure transitions between January and February.


In [None]:
data[!,:E] .= data.EMPSTAT.<21 #<- code the employment variable

data_jan = @chain data begin
    @subset :MONTH.==1
    @select :CPSIDP :AGE :SEX :EDUC :RACE :E
    @rename :E_lag = :E
end

data_merged = @chain data begin
    @subset :MONTH.==2
    @select :CPSIDP :E
    innerjoin(data_jan,on=:CPSIDP)
end

So now we can calculate the overall transition rates into unemployment (`EU`) and out of unemployment (`UE`):


In [None]:
@combine data_merged begin
    :EU =  1-mean(:E[:E_lag.==1])
    :UE = mean(:E[:E_lag.==0])
end 

So here we're estimating a very low separation rate and a pretty high hazard rate out of unemployment.
:::
:::{#exm-CPS_hetero}
### Observable heterogeneity

Next we'll define a very simple education classification (Bachelor's degree or not) and race classification (white vs non-white), and use `groupby` to calculate rates separately by demographics:


In [None]:
@chain data_merged begin
    @transform begin
        :bachelors = :EDUC.>=111
        :nonwhite = :RACE.!=100 
    end
    groupby([:bachelors,:nonwhite,:SEX])
    @combine begin
       :EU =  1-mean(:E[:E_lag.==1])
       :UE = mean(:E[:E_lag.==0])
    end 
end

What do these differences in transition rates tell you about how we should extend the simple model with homogeneous parameters?
:::
:::

### Steady State

A key assumption for identification in this case is that the economy is in **steady state**. Let $U_{t}$ be the fraction of individuals that are unemployed at time $t$. Let $p_{\tau,t}$ be the fraction of individuals at time $t$ with unemployment duration $\tau$. In general, these objects evolve according to the rules:

\begin{eqnarray}
U_{t+1} = (1-h)U_{t} + \delta (1-U_{t}) \\
p_{\tau+1,t+1} = (1-h)p_{\tau,t} \\
p_{0,t+1} = \delta (1-U_{t})
\end{eqnarray}
where $h = \lambda(1-F_{W}(w^*))$ is the "hazard rate" of exiting unemployment. Enforcing that these objects are constant between $t$ and $t+1$ (the steady state assumption) gives:

\begin{eqnarray}
U = \frac{\delta}{\delta+h} \\
p_\tau = \frac{\delta h}{\delta+h}(1-h)^{\tau}
\end{eqnarray}
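
As a quick numerical check of these formulas, here is a minimal sketch that iterates the laws of motion from an arbitrary starting point and compares the result with the closed-form steady state. The values of `h` and `δ`, the truncation `Tmax`, and the function name `iterate_flows` are illustrative assumptions, not estimates or part of the original model code.

```julia
# Iterate the laws of motion for the unemployment rate and the duration
# distribution, then compare with the closed-form steady state.
# h, δ, and Tmax are illustrative assumptions.
function iterate_flows(h, δ; Tmax = 50, periods = 2_000)
    U = 0.5                                   # arbitrary initial unemployment rate
    p = fill(U / (Tmax + 1), Tmax + 1)        # p[τ+1] = share with duration τ
    for _ in 1:periods
        p_new = similar(p)
        p_new[1] = δ * (1 - U)                # new inflows have duration 0
        p_new[2:end] .= (1 - h) .* p[1:end-1] # survivors age by one period
        U = (1 - h) * U + δ * (1 - U)
        p = p_new
    end
    return U, p
end

h, δ = 0.3, 0.02
U_sim, p_sim = iterate_flows(h, δ)
U_ss = δ / (δ + h)
p_ss = [δ * h / (δ + h) * (1 - h)^τ for τ in 0:50]

println("simulated U = ", U_sim, ", closed form = ", U_ss)
println("max gap in p_τ = ", maximum(abs.(p_sim .- p_ss)))
```

Durations beyond `Tmax` are simply dropped, which does not affect the first `Tmax + 1` entries of the distribution.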

### Writing Joint Probabilities

With these probabilities in hand, we can write the sampling distribution $\mathbb{P}$ in terms of equilibrium objects:

$$ \mathbb{P}(E,t_U,W) = \left(\frac{h}{h+\delta}F_{W}(W|W>w^*)\right)^{E}\left(\frac{\delta h}{\delta+h}(1-h)^{t_{U}}\right)^{1-E} $$

### Thinking Through Identification

Often it is helpful to break down the joint distribution of observables into unconditional and conditional distributions. Notice here that:

$$ \mathbb{P}(t_{U}|E=0) = h\times(1-h)^{t_{U}} $$

and so the hazard rate $h$ can be inferred from the distribution of unemployment durations.

Next, notice that the probability of being unemployed is 

$$ \mathbb{P}(E=0) = \frac{\delta}{\delta+h} $$

and so $\delta$ is given by $h$ (which we now know) and the fraction of unemployed.
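
To make these two steps concrete, here is a small simulation sketch. It draws a steady-state cross-section, recovers $h$ from the mean unemployment duration, and then inverts $\mathbb{P}(E=0)$ to get $\delta = hU/(1-U)$. The "true" parameter values, the sample size, and the helper `simulate_cross_section` are assumptions made for illustration only.

```julia
using Random, Statistics

# Simulate a steady-state cross-section and recover (h, δ) from it.
# The "true" values below are illustrative assumptions.
function simulate_cross_section(rng, N, h, δ)
    U = δ / (δ + h)                      # steady-state unemployment rate
    E = rand(rng, N) .>= U               # employed with probability 1 - U
    # Durations among the unemployed are geometric on {0, 1, 2, ...}:
    # P(t) = h (1 - h)^t, drawn here by inverting the CDF.
    t_U = [floor(Int, log(1 - rand(rng)) / log(1 - h)) for _ in 1:N]
    return E, t_U
end

rng = MersenneTwister(1234)
h_true, δ_true = 0.3, 0.02
E, t_U = simulate_cross_section(rng, 500_000, h_true, δ_true)

# Step 1: a geometric duration has mean (1 - h) / h, so h = 1 / (1 + mean).
h_hat = 1 / (1 + mean(t_U[.!E]))
# Step 2: invert P(E = 0) = δ / (δ + h) to get δ = h U / (1 - U).
U_hat = mean(.!E)
δ_hat = h_hat * U_hat / (1 - U_hat)

println((h_hat = h_hat, δ_hat = δ_hat))
```

Only the durations of the unemployed (`t_U[.!E]`) are used, mirroring the assumption that durations are missing for the employed.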

Finally, we see that

$$ \mathbb{P}(W|E=1) = F_{W}(W|W>w^*) $$

The distribution of observed wages is equal to the offer distribution *conditional on the wage offer being acceptable*. This tells us that, although the conditional distribution is identified, we do not know the distribution of wage offers that are never accepted (those below $w^*$). Let $\underline{w}$ be the lower bound on the support of the sampling distribution $\mathbb{P}$. Clearly, $w^*$ is identified since:

$$ w^* = \underline{w} $$

Can we identify the deeper structural parameters, $b$, $\lambda$, and $\beta$? Not quite. Let's rewrite the reservation wage equation as:

$$ w^* = b + \frac{\beta h}{1-\beta(1-\delta)}\int_{w^*}[1-F_{W}(w|W>w^*)]dw $$

and we can see that infinitely many combinations of $b$ and $\beta$ can rationalize the same reservation wage. Assuming a plausible value for $\beta$ (you will learn that this is common across many structural estimation exercises), $b$ is identified by this equation.
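
Here is a minimal sketch of that last step. Since $\int_{w^*}[1-F_{W}(w|W>w^*)]dw = \mathbb{E}[W|W>w^*] - w^*$, the reservation wage equation can be solved for $b$ using only accepted wages, $h$, $\delta$, and an assumed $\beta$. The function name `implied_b` and all numerical inputs are hypothetical and chosen for illustration.

```julia
using Statistics

# Back out b from the reservation wage equation for an assumed β.
# Uses ∫_{w*}^∞ [1 - F(w | W > w*)] dw = E[W | W > w*] - w*,
# so only accepted wages, h, and δ are needed.
function implied_b(wages, h, δ, β)
    w_star = minimum(wages)              # reservation wage = lowest accepted wage
    surplus = mean(wages) - w_star       # E[W | W > w*] - w*
    return w_star - β * h / (1 - β * (1 - δ)) * surplus
end

# Illustrative inputs (assumptions, not CPS estimates).
wages = [10.0, 10.5, 11.0, 12.0, 14.0]
h, δ = 0.3, 0.02

for β in (0.90, 0.95, 0.99)
    println("β = ", β, "  =>  b = ", round(implied_b(wages, h, δ, β), digits = 2))
end
```

Each assumed $\beta$ delivers a different $b$ that rationalizes the same reservation wage, which is the non-identification result stated above.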

## Credibility and a Policy Counterfactual

Suppose we used this model to forecast the effect of a tax credit $\tau$ that is proportional to earnings. Under the counterfactual the value function $V$ becomes:

$$ V(w) = (1+\tau)w + \beta\left[\delta U + (1-\delta) V(w)\right] $$

Repeating the previous derivation gives a new reservation wage equation:

$$ w^* = \frac{b}{1+\tau} + \frac{\beta\lambda}{1-\beta(1-\delta)} \int_{w^*}[1-F_{W}(w)]dw $$

and so we see that the effect of the subsidy in the model is equivalent to reducing the flow utility of unemployment. 

It is simple to forecast the effect of the policy counterfactual: we take our estimates and re-calculate the reservation wage. However, notice that since the reservation wage will decrease in response to the tax credit, our forecast of the effect depends on an arbitrary parametric assumption that we had to make in order to identify the distribution of wages below the reservation wage $w^*$.

Things look slightly better if we were to impose a tax ($\tau<0$) since this would use information about the wage distribution that is directly observable, but the underlying identifying assumption is particularly stark: it says that the policy's effect can essentially be inferred from the cross-sectional distribution of wages without any existing policy variation to speak of.
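
To see how the forecast leans on functional form, here is a minimal sketch of the counterfactual calculation. It assumes, purely for illustration, that wage offers are exponential with mean $\theta$, so that $\int_{w^*}[1-F_{W}(w)]dw = \theta e^{-w^*/\theta}$, and solves the reservation wage equation by bisection. The function `reservation_wage` and every parameter value are hypothetical.

```julia
# Reservation wage under a proportional credit τ, assuming (for illustration
# only) exponential wage offers with mean θ, so that
# ∫_{w*}^∞ [1 - F_W(w)] dw = θ * exp(-w* / θ).
function reservation_wage(; b, β, λ, δ, θ, τ = 0.0)
    κ = β * λ / (1 - β * (1 - δ))
    g(w) = w - b / (1 + τ) - κ * θ * exp(-w / θ)   # increasing in w
    lo, hi = b / (1 + τ), b / (1 + τ) + κ * θ      # g(lo) < 0 < g(hi)
    for _ in 1:200                                  # bisection
        mid = (lo + hi) / 2
        g(mid) < 0 ? (lo = mid) : (hi = mid)
    end
    return (lo + hi) / 2
end

# Illustrative parameter values (assumptions, not estimates).
pars = (b = 5.0, β = 0.95, λ = 0.5, δ = 0.02, θ = 10.0)

w_base   = reservation_wage(; pars...)
w_credit = reservation_wage(; pars..., τ = 0.10)

# Hazard out of unemployment: h = λ (1 - F_W(w*)) = λ exp(-w* / θ).
hazard(w) = pars.λ * exp(-w / pars.θ)

println("w* baseline = ", w_base, ", with credit = ", w_credit)
println("h  baseline = ", hazard(w_base), ", with credit = ", hazard(w_credit))
```

The new reservation wage falls below the baseline one, so the forecasted hazard depends on the assumed shape of $F_{W}$ in a region where no wages are observed.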

::: {.callout-warning icon="false"}
## Discussion
Does this seem like a credible way to identify the causal impact of the tax subsidy?
:::

::: {.callout-note}
One thing to note about this discussion: while the initial identification discussion seems intuitive and reasonable, it takes on a new light once we arrive at the *specific research question of interest*. Conceptually it is important to view the strengths and weaknesses of identification through the lens of desired counterfactuals.
:::

## Identification of the model with an exclusion restriction

Suppose that we have access to a variable $Z$ that enters the model only by moving the flow utility from unemployment. We can re-write the reservation wage equation as:

$$ w^*(z) = b(z) + \frac{\beta\lambda}{1-\beta(1-\delta)}\int_{w^*(z)}[1-F_{W}(w)]dw $$

We'll refer to $Z$ as an excluded variable because it affects selection into and out of jobs without moving the distribution of offered wages. Let $\underline{w}(z)$ be the lower bound on the support of the *conditional* sampling distribution of wages $\mathbb{P}(W|E=1,Z)$ and let $\underline{w}_{F}$ be the lower bound on the support of the offer distribution $F_{W}$. 

As before, we know that $w^*(z) = \underline{w}(z)$. Now suppose there is sufficient movement in $Z$ such that the set

$$ \underline{\mathcal{Z}} = \{z: w^*(z)\leq \underline{w}_{F}\} $$

has positive measure. This gives $\underline{w}_{F} = \inf_{z}\underline{w}(z) = \underline{w}$ and the set itself is identified as:

$$ \underline{\mathcal{Z}} = \{z: \underline{w}(z)=\underline{w}\} $$

The unconditional distribution $F_{W}$ is non-parametrically identified by simply looking at the distribution of wages for values of $Z$ where the lower bound of the support achieves this minimum:

$$ F_{W}(w) = \mathbb{P}(w | E = 1, z),\ z \in \underline{\mathcal{Z}} $$

Identification of all other parameters can be pursued non-parametrically as a function of $z$.

:::{.callout-note}
Forecasting the effect of the policy now makes use of *articulated variation*. The model casts variation that scales wages as being equivalent to variation in the value of unemployment $b$, for which we now have a source. This means that we can recreate the experiment by interpolating existing variation in $b$. 
:::

### Further identification by functional form

Suppose that $b(z)$ takes a linear form:

$$ b(z) = b_0 + b_1 z. $$

In this case, for three values ($z,z',z''$) we can write:

$$ w^*(z')-w^*(z) = b_1(z'-z) - \frac{\beta\lambda}{1-\beta(1-\delta)}\int_{w^*(z)}^{w^*(z')}[1-F_{W}(w)]dw $$
$$ w^*(z'')-w^*(z) = b_1(z''-z) - \frac{\beta\lambda}{1-\beta(1-\delta)}\int_{w^*(z)}^{w^*(z'')}[1-F_{W}(w)]dw $$

which is sufficient to identify the two unknowns in the equation, $\beta$ and $b_1$. Thus, $\beta$ is identified by additional functional form restrictions.
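
Here is a sketch of how this argument could be carried out numerically, treating $F_{W}$, $\lambda$, and $\delta$ as already identified. Subtracting the reservation wage equation at $z$ from the one at $z'$ gives $w^*(z')-w^*(z) = b_1(z'-z) - \kappa\int_{w^*(z)}^{w^*(z')}[1-F_{W}(w)]dw$ with $\kappa = \beta\lambda/(1-\beta(1-\delta))$, which is linear in $(b_1,\kappa)$. The exponential offer distribution, the parameter values, and the helper `solve_wstar` are assumptions used only to generate test data.

```julia
using LinearAlgebra

# Recover b_1 and β from reservation wages at three values of z.
# Exponential offers with mean θ are an illustrative assumption.
θ, λ, δ = 10.0, 0.5, 0.02
S(w) = θ * exp(-w / θ)                    # ∫_w^∞ [1 - F_W(x)] dx

# Generate "observed" reservation wages from assumed true parameters.
b0, b1_true, β_true = 5.0, 1.0, 0.95
κ_true = β_true * λ / (1 - β_true * (1 - δ))
function solve_wstar(z)
    lo, hi = b0 + b1_true * z, b0 + b1_true * z + κ_true * θ
    for _ in 1:200                         # bisection on an increasing function
        mid = (lo + hi) / 2
        mid - b0 - b1_true * z - κ_true * S(mid) < 0 ? (lo = mid) : (hi = mid)
    end
    return (lo + hi) / 2
end
zs = [0.0, 5.0, 10.0]
ws = solve_wstar.(zs)

# Integrals of 1 - F_W between reservation wages.
I2 = S(ws[1]) - S(ws[2])                   # from w*(z) to w*(z')
I3 = S(ws[1]) - S(ws[3])                   # from w*(z) to w*(z'')

# Two differenced equations in the two unknowns (b_1, κ).
A = hcat([zs[2] - zs[1], zs[3] - zs[1]], [-I2, -I3])
y = [ws[2] - ws[1], ws[3] - ws[1]]
b1_hat, κ_hat = A \ y

# Invert κ = βλ / (1 - β(1 - δ)) for β.
β_hat = κ_hat / (λ + κ_hat * (1 - δ))
println((b1_hat = b1_hat, β_hat = β_hat))
```

The system is only solvable because $1-F_{W}$ is nonlinear between the three reservation wages; with a locally flat integrand the two equations would be collinear.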

:::{.callout-warning icon="false"}
## Discussion
- Notice that now the policy counterfactual is given by additional parameters that attempt to match how wages and hazard rates respond to existing policy variation. 
- How do you feel about this approach relative to the case without variation? 
- How do you feel about the model's key implication that variation in $b$ is equivalent to variation in $\tau$?
- What would be an alternative and potentially preferable source of variation?
:::

### Identification with known scale restrictions

There are cases where it would be reasonable to assume a known relationship $b(z)$. For example, $Z$ could be a random policy variable that tells us the generosity of welfare payments. To be consistent with our specification of utility while employed, we should then write:

$$ b(z) = b_0 + z $$

and here $\beta$ would be identified. 

:::{.callout-note}
Not only is $\beta$ identified in this case, but we can see that it is a key parameter for matching the effect of $Z$, and hence will be a key parameter in determining the response of reservation wages and hazard rates to the policy.
:::


### The Whether vs How of Identification

Let's now assume that we have access to two instruments, $(Z_{1},Z_{2})$. The former shifts $b$, as in the previous example, while the latter shifts wages. $Z_{2}$ could be a plausible instrument for labor demand, or if we wanted to think of $W$ as net wages, $Z_{2}$ could represent a policy variable that sets marginal tax rates. Let's assume that:

$$ \log(W) = \log(\omega) + \log(\mu(Z_{2})),\ Z_{2} \perp \omega $$

where $\omega \sim F_{\omega}$ is the component of wages net of $Z_{2}$. The reservation wage formula can be written in terms of $\omega^*$:

$$ \omega^*(z_1,z_2) = \frac{b(z_1)}{\mu(z_2)} + \frac{\beta\lambda}{1-\beta(1-\delta)}\int_{\omega^*(z_1,z_2)}(1-F_{\omega}(\omega))d\omega $$

Fixing $Z_{2}$, let $Z_{1}$ have the same support conditions as in the previous section, implying that $F_{W}(W|Z_2)$ is identified for all $Z_{2}$ in an appropriately defined subset of its support. Notice that the location of the distribution of $\log(\omega)$ and the level of $\log(\mu)$ are not separately identified. Hence, we are free to normalize $\mathbb{E}[\log(\omega)] = 0$ such that $\log(\mu(Z_2)) = \mathbb{E}[\log(W)|Z_2]$, and hence $\mu(z_2)$ is non-parametrically identified by the conditional mean of $\log(W)$ when $Z_{1}$ is in the region of its support where $\omega^*(Z_1,Z_2)\leq \underline{\omega}$.

In order to introduce the **whether** versus the **how** of identification, let us begin with the observation that the model is *over-identified*: we only need a subset of the available data to identify the model. For example, note that hazard rates out of unemployment are:

$$ h(z_1,z_2) = \lambda (1-F_{\omega}(\omega^*(z_1,z_2))) $$

As we now know, the hazard rate $h(z_1,z_2)$ can be identified directly by the distribution of durations, conditional on $z_1,z_2$. But notice that $h(z'_1,z'_2)/h(z_1,z_2) = (1-F_\omega(\omega^*(z'_1,z'_2)))/(1-F_{\omega}(\omega^*(z_1,z_2)))$ where the right hand side consists of objects that are directly identified from wage data. 

Now you can go one of two ways: you can enrich the model based on the observation that you have many "spare" features of the data that could be used to identify this extra richness. **Or** you could decide that your model is sufficiently detailed to answer your question of interest (shout out @Marschak1953), and choose the estimation approach that makes your answer most credible. Since the model is undoubtedly misspecified, you will have to make peace with the idea that your model will not match every feature of the data simultaneously. This is the **how** of identification: which features of the data will you use to identify and estimate your model to make the answer to your question of interest most credible?

:::{.callout-important icon="false"}
## The "how" of identification
When your model is over-identified, which features of the data will you use to identify and estimate your model?
:::

Here are two potential tensions from our example:

1. As we discussed above: the shape of the hazard function is identified directly from wage data. Hazard rates are important because they determine unemployment rates, and their responsiveness to $Z_1$ and $Z_2$ determines how unemployment rates shift with these variables.
2. The reservation wage equation above shows that the model treats changes in $Z_1$ and $Z_2$ through the same mechanism. The model requires that variation in these two variables has an identical effect on reservation wages (and hence hazard rates). This is a strong restriction that would likely not hold in extended versions of the model. Would we want identification to rely on this property if we didn't have to?

How should we approach these tensions? Just as @Marschak1953 did, let's return to the **question of interest**. We want to evaluate a policy counterfactual that varies marginal tax rates. Wouldn't this be most convincing if we show that the model parameters have been chosen to replicate the effects of pre-existing variation that is functionally identical? Some options along these lines:

1. [Minimum Distance] Choose structural parameters to match --- along with more fundamental moments --- moments based on the joint distribution of unemployment durations and $z_2$. 
2. [Indirect Inference] Compute some quasi-experimental estimates of the effect of $Z_2$ on wages and hazard rates, and choose structural parameters so that the model replicates these estimates.
3. [Maximum Likelihood] Maximize the log-likelihood, and validate ex-post that the model can fit observed changes in the hazard rate with respect to $Z_2$, or replicate the results of a quasi-experimental study. This would show that even though the model is over-identified, it is still using the sources of variation that we view conceptually as being most appropriate.

In the next section of this course, we will review the asymptotic properties of these estimators and discuss practical issues related to implementation.

::: {.callout-warning icon="false"}
## Discussion
- How do you think this approach compares to the case where we did not have plausible exogenous variation in wages?
- How do you think it compares to the case where we had exogenous variation in tax rates explicitly (rather than interpolating via articulated variation: wage rates and marginal tax rates have identical effects)?
:::

## Identification with unobserved heterogeneity

Now suppose that we allow for $K$ *unobserved types* with population proportions given by $\pi=\{\pi_{k}\}_{k=1}^{K}$ that enter the model through the value of unemployment. Assume that we have an instrument $Z$ that shifts the flow value of unemployment in a known way:

$$ b_{k}(Z) = b_{k} + Z $$

such that we get $K$ latent reservation wages:

$$ w^*_{k}(z) = b_{k} + z + \frac{\beta\lambda}{1-\beta(1-\delta)}\int_{w^*_{k}(z)}[1-F_{W}(w)]dw $$

This gives a vector of latent hazard rates $h_{k}(z)$ for all values of the instrument $z$. The distribution of unemployment durations is itself a mixture:

$$ \mathbb{P}(t_U=t|E=0,Z) = \sum_{k}\tilde{\pi}_{k}h_{k}(z)(1-h_{k}(z))^{t} $$

where $\tilde{\pi}_{k}$ denotes the representation of each type among the unemployed: 

$$\tilde{\pi}_{k} = \frac{\pi_{k}\frac{\delta}{\delta+h_{k}(z)}}{U(z)} $$

@heckman1984 show that, for a fixed $K$, the parameters $(\tilde{\pi}_{k},h_{k}(z))$ are identified from this distribution of durations.
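
To build intuition for this mixture, here is a minimal sketch that computes the type shares among the unemployed and the implied duration distribution at a fixed $z$. The two-type setup, the parameter values, and the function name `duration_mixture` are illustrative assumptions.

```julia
# Duration distribution among the unemployed when K latent types differ in
# their hazard rates. The values of π_pop, h, and δ are illustrative assumptions.
function duration_mixture(π_pop, h, δ; Tmax = 24)
    u_k = δ ./ (δ .+ h)                  # type-specific unemployment rates
    U = sum(π_pop .* u_k)                # aggregate unemployment rate
    π_tilde = π_pop .* u_k ./ U          # type shares among the unemployed
    pmf = [sum(π_tilde .* h .* (1 .- h) .^ t) for t in 0:Tmax]
    return π_tilde, pmf
end

π_pop = [0.5, 0.5]       # population shares
h     = [0.5, 0.1]       # type-specific hazards h_k(z) at a fixed z
δ     = 0.02

π_tilde, pmf = duration_mixture(π_pop, h, δ)

# Implied hazard of the mixture at each duration: P(t_U = t) / P(t_U ≥ t).
survivor = 1 .- vcat(0.0, cumsum(pmf)[1:end-1])
agg_hazard = pmf ./ survivor

println("shares among unemployed: ", π_tilde)
println("mixture hazard at t = 0, 6, 12: ", agg_hazard[[1, 7, 13]])
```

Low-hazard types are over-represented among the unemployed, and the implied hazard of the mixture declines with duration even though each type's hazard is constant. It is this departure from a single geometric distribution that carries information about the latent types.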

::: {.callout-tip icon="false"}
## Exercise
::: {#exr-search_unobs_hetero}
Complete the identification argument for this model by following these steps:

1. Argue that the vector $\pi$ and $\delta$ can be inverted from $\tilde{\pi}$ and the unemployment rate $U(z)$.
2. State an assumption on the support of $Z$ such that $F_{W}$ is identified in some part of the support of $Z$.
3. Argue that $\lambda$ is identified for $Z$ in this same region of the support.
4. Argue that each $w_{k}^*(z)$ is identified from $h_{k}(z)$ and $F_{W}$.
5. Use the reservation wage equation to argue that each $b_{k}$ and $\beta$ is identified.
:::
:::


<!-- Can we show that the model is not identified and that we need panel data? -->