# **ECON8502: Methods for Structural Microeconometrics**

## **Assignment 6**

### *Conor Bayliss*

#### **Introduction and Setup**

*Let's return to the simple model of labour supply and programme participation from Assignment 2.*

*Consider the following extension to the model. Suppose that each individual $n$ belongs to one of a finite number of types, $k(n) \in \{1,2,\dots,K\}$. Types determine differences in the cost of work and programme participation as well as differences in wages:*
$$
U_{ntj} = \log(\max\{50, Y_{nt}(W_{nt}H_j)\})+\alpha_{l,k}\log(112-H_j)-\alpha_{P,k,1}\mathbf{1}\{P_j>0\}-\alpha_{P,k,2}\mathbf{1}\{P_j>1\}
$$
*and*
$$
\log(W_{nt}) = \gamma_{k,0}+\gamma_{k,1}\text{Age}_{nt}+\gamma_{k,2}\text{Age}^2_{nt}.
$$
*So, the parameters $\alpha_l,\alpha_P,\gamma$ are now **heterogeneous by type**.*

In [1]:
using Clustering, Makie, CairoMakie

In [2]:
using CSV, DataFrames, DataFramesMeta, Optim

In [3]:
include("C:\\Users\\bayle\\Documents\\Github\\micro_labour\\src\\Transfers.jl")

Main.Transfers

In [5]:
data = @chain begin
    CSV.read("C:\\Users\\bayle\\Documents\\Github\\micro_labour\\hw1\\hw1_micro_labour_data\\MainPanelFile.csv",DataFrame,missingstring = "NA")
    #@select :MID :year :wage :hrs :earn :SOI :CPIU :WelfH :FSInd :num_child :age
    # keep the years 1985 to 2010
    @subset :year.>=1985 :year.<=2010
    # AFDC indicator: any positive welfare receipt
    @transform :AFDC = :WelfH.>0
    @rename :FS = :FSInd
    # P counts programmes (0, 1 or 2); H is annual hours / 1040 (52 weeks x 20 hours), rounded and capped at 2
    @transform :P  = :FS + :AFDC :H = min.(2,round.(Union{Int64, Missing},:hrs / (52*20)))
    @subset .!ismissing.(:P) .&& .!ismissing.(:H)
    # CPI-deflated hourly wage, defined only when hours and earnings are both positive
    @transform @byrow :wage = begin
        if :hrs>0 && :earn>0
            return :earn / :hrs / :CPIU
        else
            return missing
        end
    end
end

# missing wages are replaced by 1 (so logwage = 0) and flagged in wage_missing
data_mle = (;P = Int64.(data.P), H = Int64.(data.H), year = data.year, age = data.age,
            soi = data.SOI, num_kids = data.num_child, cpi = data.CPIU,
            logwage = log.(coalesce.(data.wage,1.)),wage_missing = ismissing.(data.wage))

(P = [2, 2, 2, 1, 0, 0, 0, 2, 1, 0  …  2, 2, 1, 2, 0, 0, 1, 1, 1, 1], H = [0, 0, 0, 0, 0, 0, 0, 0, 0, 2  …  0, 0, 2, 2, 2, 2, 0, 0, 0, 0], year = [1994, 1995, 1996, 1998, 2000, 2002, 2004, 2006, 2008, 1990  …  1991, 1992, 1991, 1992, 1991, 1992, 1991, 1992, 1991, 1992], age = [21, 22, 23, 25, 27, 29, 31, 33, 35, 17  …  21, 22, 23, 24, 39, 40, 30, 31, 25, 26], soi = [43, 43, 43, 43, 43, 43, 43, 43, 43, 17  …  44, 44, 7, 7, 5, 5, 39, 39, 39, 39], num_kids = [2, 2, 3, 3, 3, 3, 3, 3, 3, 1  …  2, 2, 2, 2, 3, 3, 4, 4, 2, 2], cpi = [0.860812349005761, 0.884959812302546, 0.910948243820851, 0.946664188812488, 1.0, 1.04457233785542, 1.09707768072849, 1.17054218546739, 1.25008130459022, 0.758792510685746  …  0.790785866939231, 0.8148346032336, 0.790785866939231, 0.8148346032336, 0.790785866939231, 0.8148346032336, 0.790785866939231, 0.8148346032336, 0.790785866939231, 0.8148346032336], logwage = [0.0, 0.0, 0.0, 0.0, 1.2039728043259361, 0.0, 0.0, 0.0, 0.0, 0.8230555833449215  …  0.0, 0.0, 2.651257

#### **Part 1**

*Write a routine to classify individuals in the data into one of $K=3$ types using K-means clustering. You may find the package `Clustering.jl` useful.*
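
One possible routine is sketched below. It is not the assignment's reference solution. It assumes that an individual identifier is available for every observation (here a vector called `id`, which is not part of `data_mle` above) and that individuals are clustered on their average programme participation and average hours category. The helper names `individual_features` and `classify_types` are hypothetical, and the panel at the bottom is synthetic so that the block runs on its own.

```julia
using Clustering, Random, Statistics

# Collapse the panel to one column of features per individual.
# Clustering.jl expects a (features x observations) matrix.
function individual_features(id, P, H)
    ids = unique(id)
    X = zeros(2, length(ids))
    for (j, i) in enumerate(ids)
        rows = id .== i
        X[1, j] = mean(P[rows])   # average programme participation
        X[2, j] = mean(H[rows])   # average hours category
    end
    return ids, X
end

# Assign each individual to one of K types by k-means on standardised features.
function classify_types(id, P, H; K = 3, seed = 1234)
    ids, X = individual_features(id, P, H)
    # Standardise each feature; this assumes no feature is constant across individuals.
    Z = (X .- mean(X, dims = 2)) ./ std(X, dims = 2)
    Random.seed!(seed)            # k-means starts from randomly chosen centres
    result = kmeans(Z, K; maxiter = 200)
    return Dict(zip(ids, assignments(result))), result
end

# Synthetic panel (an assumption for illustration): 30 individuals, 5 periods each,
# in three well-separated groups of 10.
id = repeat(1:30, inner = 5)
g  = cld.(id, 10)     # true group of each observation
P  = g .- 1           # group 1 never participates, group 3 always uses both programmes
H  = 3 .- g           # group 1 works full time, group 3 does not work
types, result = classify_types(id, P, H)
```

The returned dictionary maps each identifier to a type label in `1:K`. The labels themselves are arbitrary, so type 1 in one run need not correspond to type 1 in another. Other features (for example average log wages among those observed working) could be added as further rows of `X`.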

#### **Part 2**

*Calculate and plot average work and average programme participation over time for each of these types. Comment on what you are seeing.*
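
The averages can be computed with a small helper like the one below. This is a sketch under the assumption that a type label is available for every observation (a vector `ktype`, obtained for instance by looking up each observation's individual in the output of a clustering step). The names `mean_by_type_year` and `type_series` are hypothetical and the data are made up.

```julia
# Mean of `y` within each (type, year) cell, returned as a Dict keyed by (type, year).
function mean_by_type_year(y, ktype, year)
    sums = Dict{Tuple{Int,Int},Float64}()
    counts = Dict{Tuple{Int,Int},Int}()
    for (yi, k, t) in zip(y, ktype, year)
        key = (k, t)
        sums[key] = get(sums, key, 0.0) + yi
        counts[key] = get(counts, key, 0) + 1
    end
    return Dict(key => sums[key] / counts[key] for key in keys(sums))
end

# Extract the time series for one type, sorted by year, ready for plotting.
function type_series(avg, k)
    years = sort([t for (kk, t) in keys(avg) if kk == k])
    return years, [avg[(k, t)] for t in years]
end

# Made-up observations: type label, year and hours category.
ktype = [1, 1, 2, 2, 1, 2, 2]
year  = [1990, 1990, 1990, 1990, 1991, 1991, 1991]
H     = [2, 0, 1, 1, 2, 0, 1]

avgH = mean_by_type_year(H, ktype, year)
years2, series2 = type_series(avgH, 2)
```

Calling the same helper with `P` in place of `H` gives average programme participation. Each `(years, series)` pair can then be drawn as one line per type, for example with `lines!` from Makie.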

#### **Part 3**

*Write code to estimate this extended model. You could just make a small extension to the maximum likelihood estimator you used in Assignment 2, or you could try another approach if you prefer.*
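
One way to make the "small extension" concrete is sketched below. It assumes that the types from a clustering step are treated as known and that no parameter is shared across types. Under those assumptions the log-likelihood separates by type, and the Assignment 2 estimator can be run on each type's observations in turn. The function `negloglike_toy` is only a placeholder for that estimator's objective: it fits an intercept-only wage equation with a normal error, and the error standard deviation is an assumption not stated in the model above. The names `subset_data` and `estimate_by_type` are hypothetical, and the data are made up.

```julia
using Optim

# Keep only the selected rows in every field of a NamedTuple of vectors.
subset_data(d::NamedTuple, rows) = map(v -> v[rows], d)

# Placeholder objective: negative log-likelihood (up to a constant) of
# logwage ~ Normal(γ0, σ^2), using only observations with an observed wage.
function negloglike_toy(x, d)
    γ0, σ = x[1], exp(x[2])          # σ is parameterised in logs to keep it positive
    resid = d.logwage[.!d.wage_missing] .- γ0
    return sum(0.5 .* (resid ./ σ) .^ 2 .+ log(σ))
end

# Minimise the objective separately on each type's observations.
function estimate_by_type(negloglike, x0, d, ktype; K = 3)
    return map(1:K) do k
        dk = subset_data(d, ktype .== k)
        res = optimize(x -> negloglike(x, dk), x0, LBFGS(); autodiff = :forward)
        Optim.minimizer(res)
    end
end

# Made-up data: three types with mean log wages of 1, 2 and 3.
# The last observation has no wage and is coded as logwage = 0 with a missing flag.
toy = (logwage = [1.0, 1.2, 0.8, 2.0, 2.2, 1.8, 3.0, 3.2, 2.8, 0.0],
       wage_missing = [falses(9); true])
ktype = [1, 1, 1, 2, 2, 2, 3, 3, 3, 1]

estimates = estimate_by_type(negloglike_toy, [0.0, 0.0], toy, ktype)
```

Here `estimates[k]` holds the parameter vector for type `k`. Replacing `negloglike_toy` with the full objective (choice probabilities plus the wage equation) leaves the outer loop unchanged. If some parameters were instead restricted to be common across types, the problem would no longer separate and a single joint optimisation over all types would be needed.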