# **Assignment 2**

## **ECON8208 - Applied Econometrics**

### *Conor Bayliss*

#### **Setup**

Consider the following extension of the undirected search model. Let $X_n$ be a vector of demographics for person $n$:
$$
X_n = [1, C_n, F_n, R_n]
$$
where $C_n$ is a dummy variable that indicates if an individual has a college degree, $F_n$ is a dummy variable indicating that an individual is female, and $R_n$ is a dummy that indicates if person $n$ reports their race as not "white". Define a new set of parameters which depend on these observables:
* The flow value of unemployment is $b(X) = X\gamma_b$
* The probability of job destruction is 
$$
\delta(X) = \frac{\exp(X\gamma_\delta)}{1+\exp(X\gamma_\delta)}
$$
* The probability of a job offer is
$$
\lambda(X) = \frac{\exp(X\gamma_\lambda)}{1+\exp(X\gamma_\lambda)}
$$
* $\beta$ takes a value of 0.995
* Wage offers are drawn from a log normal distribution with mean $\mu(X) = X\gamma_\mu$ and standard deviation $\sigma(X) = \exp(X\gamma_\sigma)$
* Log wages are observed with measurement error
$$
\log(W_n^o) = \log(W_n) + \xi_n, \quad \xi_n \sim N(0,\sigma^2_\xi)
$$
Hence, the parameters of the model which we are going to estimate are:
$$
\theta = (\gamma_b, \gamma_\delta, \gamma_\lambda, \gamma_\mu, \gamma_\sigma,\sigma^2_\xi)
$$

We are going to estimate this model on CPS data. The following code follows Jo's setup. We import the data, impute wages for workers who are not paid by the hour, limits the observations to January, and convert weekly unemployment durations to monthly.

In [30]:
using CSV, DataFrames, DataFramesMeta, Statistics

data = CSV.read("C:\\Users\\bayle\\Documents\\Github\\metrics\\hw2\\data\\cps_00019.csv", DataFrame)
data = @chain data begin
    @transform :E = :EMPSTAT .< 21
    @transform @byrow :wage = begin
        if :PAIDHOUR == 0
            return missing
        elseif :PAIDHOUR == 2
            if :HOURWAGE < 99.99 && :HOURWAGE > 0
                return :HOURWAGE
            else
                return missing
            end
        elseif :PAIDHOUR == 1
            if :EARNWEEK>0 && :UHRSWORKT.<997
                return :EARNWEEK / :UHRSWORKT
            else
                return missing
            end
        end
    end
    @subset :MONTH.==1
    @select :AGE :SEX :RACE :EDUC :wage :E :DURUNEMP
    @transform :DURUNEMP = round.(:DURUNEMP.*12/52)
end

Row,AGE,SEX,RACE,EDUC,wage,E,DURUNEMP
Unnamed: 0_level_1,Int64,Int64,Int64,Int64,Float64?,Bool,Float64
1,72,1,100,81,missing,true,231.0
2,66,1,100,111,missing,true,231.0
3,61,2,100,111,missing,true,231.0
4,52,2,200,73,20.84,true,231.0
5,19,2,200,73,10.0,true,231.0
6,56,2,200,111,25.0,true,231.0
7,22,2,200,81,9.5,true,231.0
8,23,2,100,124,missing,true,231.0
9,24,2,100,124,missing,true,231.0
10,59,2,200,111,missing,true,231.0


Recall that the optimal decision rule of the worker is characterised by a reservation wage. In class, we derived the following reservation wage equation:
$$
w^* = b + \beta \lambda \int_{w^*} \frac{1-F_W(w)}{1-\beta(1-\lambda)}dw
$$
and we showed that the steady state unemployment rate is:
$$
P[E=0] = \frac{h}{h+\delta}
$$
where $h=\lambda(1-F_W(w^*))$ is the rate at which workers exit unemployment. In steady state, the fraction of unemploymed agent who have been unemployed for $t$ periods is:
$$
P[t_U = t] = h(1-h)^t
$$
The following code follows the provided code from the `Models` section of the course website.

In [2]:
using Distributions, QuadGK

dS(x;F,β,δ) = (1-cdf(F,x)) / (1-β*(1-δ))
res_wage(wres; b,λ,δ,β,F::Distribution) = wres - b - β * λ * quadgk(x -> dS(x;F,β,δ),wres,Inf)[1]
pars = (;b = -5.,λ = 0.45, δ = 0.03, β=0.99,F = LogNormal(1,1))
res_wage(1. ; pars...)

-33.70656559385876

In [3]:
using Optim
function solve_res_wage(pars)
    (;F) = pars
    w_lb = quantile(F,0.001)
    w_ub = quantile(F,0.999)
    r = optimize(x -> res_wage(x; pars...)^2,w_lb,w_ub)
    return r.minimizer
end
rwage = solve_res_wage(pars)

7.2471320591661295

##### **Question 1**

**Following your notes from class, write a function that, given a set of parameters, solves for the reservation wage for each unique combination of the variable in X (there are 8 in total).**

In [33]:
data = @chain data begin
    @transform(:college = Int.((:EDUC .== 110) .| (:EDUC .== 111) .| (:EDUC .== 120) .| (:EDUC .== 121) .| (:EDUC .== 122) .| (:EDUC .== 123)))
    @transform(:nonwhite = Int.(:RACE .!= 100))
    @transform(:SEX = :SEX .- 1)
    @transform(:demographic = ifelse.(:nonwhite .== 0 .& :college .== 1 .& :SEX .== 0, 1,
                            ifelse.(:nonwhite .== 0 .& :college .== 0 .& :SEX .== 0, 2,
                            ifelse.(:nonwhite .== 0 .& :college .== 1 .& :SEX .== 1, 3,
                            ifelse.(:nonwhite .== 0 .& :college .== 0 .& :SEX .== 1, 4,
                            ifelse.(:nonwhite .== 1 .& :college .== 1 .& :SEX .== 0, 5,
                            ifelse.(:nonwhite .== 1 .& :college .== 0 .& :SEX .== 0, 6,
                            ifelse.(:nonwhite .== 1 .& :college .== 1 .& :SEX .== 1, 7, 8))))))))
end

Row,AGE,SEX,RACE,EDUC,wage,E,DURUNEMP,college,nonwhite,demographic
Unnamed: 0_level_1,Int64,Int64,Int64,Int64,Float64?,Bool,Float64,Int64,Int64,Int64
1,72,-1,100,81,missing,true,231.0,0,0,2
2,66,-1,100,111,missing,true,231.0,1,0,2
3,61,0,100,111,missing,true,231.0,1,0,1
4,52,0,200,73,20.84,true,231.0,0,1,8
5,19,0,200,73,10.0,true,231.0,0,1,8
6,56,0,200,111,25.0,true,231.0,1,1,8
7,22,0,200,81,9.5,true,231.0,0,1,8
8,23,0,100,124,missing,true,231.0,0,0,1
9,24,0,100,124,missing,true,231.0,0,0,1
10,59,0,200,111,missing,true,231.0,1,1,8


In [35]:
wage_missing = ismissing.(data.wage) ## Boolean vector of missing values
wage = coalesce.(data.wage,1.) ## Replace missing values with 1
## Create dictionary of necessary variables
### logwage, missing wages, employment status, demographic
d = (;logwage = log.(wage), wage_missing, E = data.E, demographic = data.demographic)

(logwage = [0.0, 0.0, 0.0, 3.0368742168851663, 2.302585092994046, 3.2188758248682006, 2.2512917986064953, 0.0, 0.0, 0.0  …  0.0, 0.0, 2.4849066497880004, 3.056356895370426, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], wage_missing = Bool[1, 1, 1, 0, 0, 0, 0, 1, 1, 1  …  1, 1, 0, 0, 1, 1, 1, 1, 1, 1], E = Bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1], demographic = [2, 2, 1, 8, 8, 8, 8, 1, 1, 8  …  2, 1, 1, 2, 2, 2, 1, 2, 1, 1])

# Work in progress from here onwards

First, let us create a `NamedTuple` for each different demographic category. This will allow us to update the parameters for each different demographic category separately.

In [3]:
using Distributions

# White men, college #

w_m_c_params = (;γb = 1., # unemployment benefits
                γδ = 0.03, # job destruction rate
                γλ = 0.45, # job finding rate
                γμ = 0.1, # mean of the wage distribution
                γσ = 0.5, # standard deviation of the wage distribution
                σζ = 0.5, # standard deviation of the wage measurement error
                β = 0.99, # discount factor
                F = LogNormal(0.1,0.5)) # wage distribution

# White men, non-college #

w_m_nc_params = (;γb = 1., # unemployment benefits
                γδ = 0.03, # job destruction rate
                γλ = 0.45, # job finding rate
                γμ = 0.1, # mean of the wage distribution
                γσ = 0.5, # standard deviation of the wage distribution
                σζ = 0.5, # standard deviation of the wage measurement error
                β = 0.99, # discount factor
                F = LogNormal(0.1,0.5)) # wage distribution

# White women, college #

w_w_c_params = (;γb = 1., # unemployment benefits
                γδ = 0.03, # job destruction rate
                γλ = 0.45, # job finding rate
                γμ = 0.1, # mean of the wage distribution
                γσ = 0.5, # standard deviation of the wage distribution
                σζ = 0.5, # standard deviation of the wage measurement error
                β = 0.99, # discount factor
                F = LogNormal(0.1,0.5)) # wage distribution

# White women, non-college #

w_w_nc_params = (;γb = 1., # unemployment benefits
                γδ = 0.03, # job destruction rate
                γλ = 0.45, # job finding rate
                γμ = 0.1, # mean of the wage distribution
                γσ = 0.5, # standard deviation of the wage distribution
                σζ = 0.5, # standard deviation of the wage measurement error
                β = 0.99, # discount factor
                F = LogNormal(0.1,0.5)) # wage distribution

# Non-white men, college #

nw_m_c_params = (;γb = 1., # unemployment benefits
                γδ = 0.03, # job destruction rate
                γλ = 0.45, # job finding rate
                γμ = 0.1, # mean of the wage distribution
                γσ = 0.5, # standard deviation of the wage distribution
                σζ = 0.5, # standard deviation of the wage measurement error
                β = 0.99, # discount factor
                F = LogNormal(0.1,0.5)) # wage distribution

# Non-white men, non-college #

nw_m_nc_params = (;γb = 1., # unemployment benefits
                γδ = 0.03, # job destruction rate
                γλ = 0.45, # job finding rate
                γμ = 0.1, # mean of the wage distribution
                γσ = 0.5, # standard deviation of the wage distribution
                σζ = 0.5, # standard deviation of the wage measurement error
                β = 0.99, # discount factor
                F = LogNormal(0.1,0.5)) # wage distribution

# Non-white women, college # 

nw_w_c_params = (;γb = 1., # unemployment benefits
                γδ = 0.03, # job destruction rate
                γλ = 0.45, # job finding rate
                γμ = 0.1, # mean of the wage distribution
                γσ = 0.5, # standard deviation of the wage distribution
                σζ = 0.5, # standard deviation of the wage measurement error
                β = 0.99, # discount factor
                F = LogNormal(0.1,0.5)) # wage distribution

# Non-white women, non-college #

nw_w_nc_params = (;γb = 1., # unemployment benefits
                γδ = 0.03, # job destruction rate
                γλ = 0.45, # job finding rate
                γμ = 0.1, # mean of the wage distribution
                γσ = 0.5, # standard deviation of the wage distribution
                σζ = 0.5, # standard deviation of the wage measurement error
                β = 0.99, # discount factor
                F = LogNormal(0.1,0.5)) # wage distribution

(γb = 1.0, γδ = 0.03, γλ = 0.45, γμ = 0.1, γσ = 0.5, σζ = 0.5, β = 0.99, F = LogNormal{Float64}(μ=0.1, σ=0.5))

In [25]:
function update!(params::NamedTuple) ### update parameters, needs changing once we have log-likelihood
    (; γb, γδ, γλ, γμ, γσ, σζ, β, F) = params
    γb = 0.2
    γδ = 0.5
    γλ = 0.7
    γμ = 0.9
    γσ = 1.1
    σζ = 0.8
    β = 0.99
    F = LogNormal(γμ, γσ)
    return return (; γb, γδ, γλ, γμ, γσ, σζ, β, F)
end

update!(w_m_c_params) ### test of the update function

(γb = 0.2, γδ = 0.5, γλ = 0.7, γμ = 0.9, γσ = 1.1, σζ = 0.8, β = 0.99, F = LogNormal{Float64}(μ=0.9, σ=1.1))

In [28]:
using Optim, QuadGK

dS(x;params) = (1-cdf(params.F,x)) / (1-params.β*(1-params.γδ))
res_wage(wres; params) = wres - params.γb - params.β * params.γλ * quadgk(x -> dS(x; params), wres, Inf)[1]


function solve_res_wage(params)
   (;γb, γδ, γλ, γμ, γσ, σζ, β, F) = params
    w_lb = quantile(F,0.001)
    w_ub = quantile(F,0.999)
    r = optimize(x -> res_wage(x; params)^2, w_lb, w_ub)
    return r.minimizer
end

solve_res_wage(w_m_c_params) ### test of the solve_res_wage function

1.924445520908056

##### **Question 2**

**Write a function that takes a single observation from the cross-section and calculates the log-likelihood of that observation given the model solution, current parameters, and observables, $X_n$. Show the output from a function call to prove that it works, then use the ``` @time ``` macro to test how long it takes.**

##### *Hint*: 

**Relative to your notes in class, you will need to integrate out the measurement error here for wages. Letting $\phi(x;\mu,\sigma)$ be the normal pdf with mean $\mu$ and standard error $\sigma$, the likelihood of observing a wage $W^o$ will be:**
$$
f(W^o|E,X) = \int_{w*} \frac{\phi(\log(w);\mu(X),\sigma(X))}{1-\Phi(log(w^*);\mu(X),\sigma(X))}\phi(\log(W^o)-w;\sigma_{\zeta}) dw
$$

**You will want to use a package like `QuadGK` to evaluate this integral numerically.**

In [53]:
function calculate_density(logwage, wres, params)
    (;γb, γδ, γλ, γμ, γσ, σζ, β, F) = params
    integrand = (w) -> pdf(Normal(γμ, γσ), log(w)) / (1 - cdf(Normal(γμ, γσ), log(wres))) * pdf(Normal(0, σζ), logwage - log(w))
    density = quadgk(integrand, wres, Inf)[1]
    return density
end

calculate_density (generic function with 1 method)

In [56]:
function log_likelihood_single_observation(d, i, params)
    # Extract the i-th elements from the dictionary
    logwage = d.logwage[i]
    wage_missing = d.wage_missing[i]
    # If the wage data is missing, return a log-likelihood of 0
    if wage_missing
        return 0.0
    end
    # Calculate the reservation wage
    wres = solve_res_wage(params)
    # Calculate the density of the observed wage given the reservation wage and other parameters
    density = calculate_density(logwage, wres, params)
    # Take the natural log of the density
    ll = log(density)
    return ll
end

log_likelihood_single_observation (generic function with 1 method)

In [61]:
### Note that I choose observation 900 as a test case because it has a non-missing wage value
@time begin
log_likelihood_single_observation(d, 900, w_m_c_params) ### test of the log_likelihood_single_observation function
end

  0.000133 seconds (33 allocations: 5.562 KiB)


-4.589271534806701

##### **Question 3**

**Write a function that iterates over every observation in the data and calculates the log-likelihood of the data given parameters.**

#### *Hint*: 

**You may find that these functions work faster if you pull the data you need out of the `DataFrame` format and save it as arrays or vectors with known type. For example, I would recommend creating a flag for missing wage data and a default value for those missing wages, and iterating over those objects.**

In [1]:
function log_likelihood(params)
    # Initialise the log-likelihood
    log_likelihood_val = 0.0
    # Loop over the observations
    for i in 1:length(d.logwage)
        # Add the log-likelihood contribution of each observation to the total log-likelihood
        log_likelihood_val += log_likelihood_single_observation(d, i, params)
    end
    return log_likelihood_val
end


log_likelihood (generic function with 1 method)

In [5]:
log_likelihood(w_m_c_params) ### test of the log_likelihood function

UndefVarError: UndefVarError: `d` not defined