# **Assignment 3: Estimating a Search Model**

## **Applied Econometrics**

### Conor Bayliss

In this homework, we are going to estimate the parameters of the search model for each demographic group *individually*. That is, you will *not* impose the parametric restrictions that mapped demographics $X$ to deeper parameters using the `NamedTuples` I used last week.

First, import the necessary packages.

In [5]:
using LinearAlgebra, QuadGK, Distributions, CSV, DataFrames, DataFramesMeta, Statistics, Optim, FastGaussQuadrature, ForwardDiff, Roots

#### **Adapting Code for Automatic Differentiation**

`QuadGK` doesn't play nicely with automatic differentiation since it adjusts the number of nodes adaptively. One solution is to use a fixed number of nodes and weights with `FastGaussQuadrature`. Here is a simple example.

In [7]:
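# Integrate f over [a,b] with a fixed-node Gauss-Legendre rule.
function integrandGL(f,a,b;num_nodes = 10)
    # nodes and weights are defined on the reference interval [-1,1]
    nodes, weights = gausslegendre(num_nodes)
    ∫f = 0.
    for k in eachindex(nodes)
        # map the k-th node from [-1,1] to [a,b]
        x = (a+b)/2 + (b-a)/2*nodes[k]
        ∫f += weights[k]*f(x)
    end
    # rescale by the Jacobian of the change of variables
    return ∫f*(b-a)/2
end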

dS(x;F,β,δ) = (1-cdf(F,x)) / (1-β*(1-δ))
res_wage(wres, b,λ,δ,β,F) = wres - b - β * λ * integrandGL(x -> dS(x;F,β,δ),wres,quantile(F,0.999))
ForwardDiff.derivative(wres -> res_wage(wres,0.,0.5,0.03,0.99,LogNormal(0.,1.)),1.)

7.235638422590369
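
As a quick sanity check on the fixed-node rule (a minimal sketch, not part of the assignment code): a 10-node Gauss-Legendre rule integrates polynomials up to degree 19 exactly, so it should reproduce $\int_0^b x^2 dx = b^3/3$ and its derivative $b^2$ with respect to the upper limit. The test integrand and the evaluation point $b=2$ are arbitrary choices made here for illustration.

```julia
using FastGaussQuadrature, ForwardDiff

# Same fixed-node rule as above, repeated so this cell runs on its own.
function integrandGL(f, a, b; num_nodes = 10)
    nodes, weights = gausslegendre(num_nodes)
    ∫f = 0.
    for k in eachindex(nodes)
        x = (a+b)/2 + (b-a)/2*nodes[k]   # map node from [-1,1] to [a,b]
        ∫f += weights[k]*f(x)
    end
    return ∫f*(b-a)/2
end

# Integral of x^2 on [0,2]; the exact value is 8/3.
approx_integral = integrandGL(x -> x^2, 0., 2.)

# Derivative with respect to the upper limit; the exact value is 2^2 = 4.
approx_derivative = ForwardDiff.derivative(b -> integrandGL(x -> x^2, 0., b), 2.)

println((approx_integral, approx_derivative))
```

Because the nodes and weights do not depend on the limits of integration, `ForwardDiff` only has to differentiate through the affine map from $[-1,1]$ to $[a,b]$ and through `f` itself.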

#### **Re-writing the model solution**

Based on this, we're going to re-write the model solution using this new integration routine. We will also use `Roots` to solve for the reservation wage in a way that will also play nicely with `ForwardDiff`.

In [8]:
res_wage_solution(wres, b,λ,δ,β,F::Distribution) = wres - b - β * λ * integrandGL(x -> dS(x;F,β,δ),wres,quantile(F,0.999))
pars = (;b=-5., λ=0.45,δ=0.03,β=0.99, F=LogNormal(1.,1.))

function solve_res_wage(b,λ,δ,β,F)
    return find_zero(wres -> res_wage_solution(wres,b,λ,δ,β,F),eltype(b)(4.))
end

solve_res_wage(0.,0.4,0.03,0.995,LogNormal())
ForwardDiff.derivative(x -> solve_res_wage(x,0.4,0.03,0.995,LogNormal()),0.)

0.4400135335598148
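
In symbols, `res_wage_solution` is the residual of the reservation wage equation, with the upper limit of integration truncated at the 0.999 quantile of $F$. Writing that quantile as $\bar{w}$ (notation introduced here, read off from the code above):
$$
w^* = b + \beta\lambda\int_{w^*}^{\bar{w}}\frac{1-F(x)}{1-\beta(1-\delta)}dx
$$
Differentiating both sides with respect to $b$ and solving for the derivative (the implicit function theorem) gives the quantity that the `ForwardDiff.derivative` call should be approximating, up to quadrature error:
$$
\frac{\partial w^*}{\partial b} = \left[1 + \frac{\beta\lambda(1-F(w^*))}{1-\beta(1-\delta)}\right]^{-1}
$$
The term in brackets exceeds one, so the derivative lies strictly between 0 and 1, which is consistent with the value of about 0.44 printed above.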

#### **Cleaning the Data**

The data cleaning is mostly the same as in **Assignment 2**. We add a function which pulls a `NamedTuple` out for a specific demographic group.

In [11]:
data = CSV.read("C:\\Users\\bayle\\Documents\\Github\\metrics\\hw2\\data\\cps_00019.csv",DataFrame)
data = @chain data begin
    @transform :E = :EMPSTAT.<21
    @transform @byrow :wage = begin
        if :PAIDHOUR==0
            return missing
        elseif :PAIDHOUR==2
            if :HOURWAGE<99.99 && :HOURWAGE>0
                return :HOURWAGE
            else
                return missing
            end
        elseif :PAIDHOUR==1
            if :EARNWEEK>0 && :UHRSWORKT<997 && :UHRSWORKT>0
                return :EARNWEEK / :UHRSWORKT
            else
                return missing
            end
        end
    end
    @subset :MONTH.==1
    @select :AGE :SEX :RACE :EDUC :wage :E :DURUNEMP
    @transform begin
        :bachelors = :EDUC.>=111
        :nonwhite = :RACE.!=100 
        :female = :SEX.==2
        :DURUNEMP = round.(:DURUNEMP .* 12/52)
    end
end

# the whole dataset in a named tuple
wage_missing = ismissing.(data.wage)
wage = coalesce.(data.wage,1.)
N = length(data.AGE)
X = [ones(N) data.bachelors data.female data.nonwhite]
# create a named tuple with all variables to conveniently pass to the log-likelihood:
d = (;logwage = log.(wage),wage_missing,E = data.E,tU = data.DURUNEMP, X) #<- you will need to add your demographics as well.

function get_data(data,C,F,R)
    data = @subset data :bachelors.==C :female.==F :nonwhite.==R
    wage_missing = ismissing.(data.wage)
    wage = coalesce.(data.wage,1.)
    N = length(data.AGE)
    # create a named tuple with all variables to conveniently pass to the log-likelihood:
    return d = (;logwage = log.(wage),wage_missing,E = data.E,tU = data.DURUNEMP) 
end

dx = get_data(data,1,0,0) #<- data for white men with a college degree

(logwage = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.516124491189194, 3.872798692268385, 0.0  …  2.7762279256323206, 0.0, 0.0, 0.0, 2.976307324928243, 0.0, 0.0, 3.410759848526933, 0.0, 0.0], wage_missing = Bool[1, 1, 1, 1, 1, 1, 1, 0, 0, 1  …  0, 1, 1, 1, 0, 1, 1, 0, 1, 1], E = Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1], tU = [231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0  …  231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0, 231.0])

### **Part 1**

Fix $\sigma_{\zeta}$ (the standard deviation of measurement error in log wages) to 0.05. Following your work from last week (and recitation this week), write a function that calculates the log-likelihood of a single month of data from the CPS given $(h,\delta,\mu,\sigma,w^*)$, where $w^*$ is the reservation wage and $h = \lambda \times (1-F_W(w^*;\mu,\sigma))$.

### **Part 2**

Use the log-likelihood to get maximum likelihood estimates of $(\hat{h},\hat{\delta},\hat{\mu},\hat{\sigma},\hat{w^*})$ for *white men with a college degree*. What is the advantage of estimating $h$ and $w^*$ directly instead of $\lambda$ and $b$?

### **Part 3**

Back out the implied maximum likelihood estimates of $\hat{\lambda}$ and $\hat{b}$ as a function of the estimated parameters from **Part 2**.
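
One way this inversion could look is sketched below. It uses the definition $h = \lambda \times (1-F_W(w^*;\mu,\sigma))$ to recover $\lambda$, and then rearranges the reservation wage equation from `res_wage_solution` to recover $b$. The helper name `back_out`, the default $\beta = 0.995$, and the input values are assumptions made here for illustration; they are not estimates and not part of the assignment code.

```julia
using Distributions, FastGaussQuadrature

# Fixed-node Gauss-Legendre rule, as defined earlier in the notebook.
function integrandGL(f, a, b; num_nodes = 10)
    nodes, weights = gausslegendre(num_nodes)
    ∫f = 0.
    for k in eachindex(nodes)
        x = (a+b)/2 + (b-a)/2*nodes[k]
        ∫f += weights[k]*f(x)
    end
    return ∫f*(b-a)/2
end

dS(x; F, β, δ) = (1-cdf(F,x)) / (1-β*(1-δ))

# Hypothetical helper: invert h = λ(1-F(w*)) and the reservation wage equation.
# β is not estimated; 0.995 is an assumed value.
function back_out(h, δ, μ, σ, wres; β = 0.995)
    F = LogNormal(μ, σ)
    λ = h / (1 - cdf(F, wres))
    b = wres - β * λ * integrandGL(x -> dS(x; F, β, δ), wres, quantile(F, 0.999))
    return (; λ, b)
end

# Made-up inputs, for illustration only (not estimates).
est = back_out(0.3, 0.03, 1.0, 1.0, 3.0)
println(est)
```

Writing the mapping as a single function of $(h,\delta,\mu,\sigma,w^*)$ keeps it easy to reuse when standard errors for $\hat{\lambda}$ and $\hat{b}$ are needed.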

### **Part 4**

Provide an estimate of the asymptotic variance of $(\hat{h},\hat{\delta},\hat{\mu},\hat{\sigma},\hat{w^*})$ using the standard MLE formula.
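
One common version of the standard MLE formula inverts the negative Hessian of the total log-likelihood at the estimate. The sketch below illustrates the mechanics with `ForwardDiff.hessian` on a toy normal likelihood. The toy data, the function `toy_loglike`, and the closed-form MLE are stand-ins introduced here; they are not the search-model likelihood from Part 1.

```julia
using ForwardDiff, LinearAlgebra, Statistics

# Toy data (deterministic, purely illustrative).
x = collect(range(-1.0, 1.0, length = 201))
N = length(x)

# Total log-likelihood of an i.i.d. Normal(μ, σ) sample, with θ = [μ, σ].
function toy_loglike(θ, x)
    μ, σ = θ[1], θ[2]
    return sum(-log(σ) - 0.5*log(2π) - (xi - μ)^2 / (2σ^2) for xi in x)
end

# Closed-form MLE for the toy model.
μ_hat = mean(x)
σ_hat = sqrt(mean((x .- μ_hat).^2))
θ_hat = [μ_hat, σ_hat]

# Variance estimate for θ_hat: inverse of the negative Hessian at the MLE.
H = ForwardDiff.hessian(θ -> toy_loglike(θ, x), θ_hat)
V_hat = inv(-H)
se = sqrt.(diag(V_hat))
println(se)
```

Here `V_hat` approximates the variance of $\hat{\theta}$ itself; multiplying it by $N$ gives the corresponding estimate of the asymptotic variance of $\sqrt{N}(\hat{\theta}-\theta)$. For the assignment, the same two lines would be applied to your own log-likelihood and estimates.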

### **Part 5**

Recall that the delta method implies that if $\hat{\theta}$ is asymptotically normal with asymptotic variance $V$, then the vector-valued function $F(\hat{\theta})$ is also asymptotically normal with:
$$
\sqrt{N}(F(\hat{\theta})-F(\theta)) \xrightarrow{d} \mathcal{N}(0,\nabla_{\theta'}F\,V\,\nabla_{\theta}F')
$$
Use this fact to estimate the asymptotic variance of $(\hat{h},\hat{\delta},\hat{\mu},\hat{\sigma},\hat{w^*},\hat{\lambda},\hat{b})$.
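
A minimal sketch of the delta-method calculation with `ForwardDiff.jacobian` follows. The helper `delta_method`, the example transformation `G`, and all numbers are assumptions introduced here for illustration. The example stacks a two-parameter vector $(h,\delta)$ with the steady-state unemployment rate $\delta/(h+\delta)$, which is simpler than the mapping to $(\hat{\lambda},\hat{b})$ but has the same structure.

```julia
using ForwardDiff, LinearAlgebra

# Hypothetical helper: delta-method variance of G(θ_hat) given the variance V of θ_hat.
function delta_method(G, θ_hat, V)
    J = ForwardDiff.jacobian(G, θ_hat)   # rows: outputs of G, columns: elements of θ
    return J * V * J'
end

# Illustration with θ = (h, δ): keep θ and append δ/(h+δ).
G(θ) = vcat(θ, θ[2] / (θ[1] + θ[2]))

θ_hat = [0.3, 0.03]             # made-up values, not estimates
V = [4e-4 1e-5; 1e-5 1e-6]      # made-up variance matrix
V_G = delta_method(G, θ_hat, V)
se_G = sqrt.(diag(V_G))
println(se_G)
```

Because `G` returns the original parameters followed by the derived ones, the upper-left block of `V_G` reproduces `V`, and the remaining rows and columns hold the variances and covariances of the derived quantities.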

### **Part 6**

Now report all of your estimates and standard errors for this group. Repeat this exercise for each group.

If we thought that the parametric relationship using $\gamma$ from **Assignment 2** described the true values of the parameters for each group, how might we use these group-specific estimates to derive estimates of each $\gamma$?