## The Probit model

Assume an agent $n \in N$ has to decide between two alternatives, working $(i=1)$ or leisure $(i=0)$. The utility can be written as

$U_{n1}=\beta x_n+\varepsilon_{n1} $ if the agent chooses to work

$U_{n0}=\mu +\varepsilon_{n0} $ if the agent chooses not to work, i.e., leisure.

The utility of working can be written as a linear component plus an idiosyncratic shock. If agent $n$ chooses to work, she achieves a utility that is a function of her educational attainment $x_n$ plus a working-related shock. The parameter $\beta$ is the premium for each year of education, and it is the parameter we want to recover.

The utility of not working can be written as a fixed component $\mu$ plus a leisure-related idiosyncratic shock. The fixed component can be interpreted as unemployment insurance; for simplicity, we assume it does not vary across agents. This assumption can easily be relaxed.

The shock $\varepsilon_{ni}$, $i\in\{0,1\}$, is i.i.d. and choice specific, meaning that the agent faces as many shocks as there are alternatives to choose from. In this model, we assume that these idiosyncratic shocks come from a standard Normal distribution.

### Decision
The agent has to decide between working and not working. To do so, she compares the utility of each alternative. The agent will work if

$$U_{n1}>U_{n0}$$

In terms of probabilities, the probability that the agent works is

$$Pr(U_{n1}>U_{n0})$$

Using the expression for the utilities, we have

$$Pr(\beta x_n+\varepsilon_{n1}>\mu +\varepsilon_{n0})$$

Rearranging so that the shocks are on the left-hand side, we have

$$Pr(\varepsilon_{n0}-\varepsilon_{n1}<\beta x_n-\mu)$$

Because the shocks are i.i.d. standard normal, their difference on the left-hand side is also normally distributed, with mean $0$ and variance $1+1=2$, that is, standard deviation $\sqrt{2}$. Writing the second argument as the standard deviation, we have

$$z \sim N(0,\sqrt{2})$$

where $z=\varepsilon_{n0}-\varepsilon_{n1}$.
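
Since $z/\sqrt{2}$ is standard normal, the probability of working can also be written in terms of the standard normal CDF, denoted here by $\Phi$ (a symbol introduced only for this remark):

$$Pr(z<\beta x_n-\mu)=\Phi\left(\frac{\beta x_n-\mu}{\sqrt{2}}\right)$$

As an illustration, with $\beta=0.5$, $\mu=2$ and $x_n=5$, the argument is $0.5/\sqrt{2}\approx 0.35$, so the probability of working is roughly $0.64$.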

### Numerical example

Let's simulate data on education for 10,000 agents. To do that in Julia, we need the `Distributions` and `Random` packages, plus `Optim` for the estimation step; the cell also loads `Plots` and `BenchmarkTools`. Let's set a seed so our results are replicable.

In [1]:
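using Plots, Distributions, Random, Optim, BenchmarkTools

Random.seed!(3)             # fix the seed for replicability
β=0.5                       # true education premium
μ=2.0                       # true fixed utility of leisure
n=10000                     # number of simulated agents
x=rand(Uniform(1, 10),n)    # years of education

dist=Normal()               # standard normal shocks
ϵw=rand(dist,n)             # working-related shock
ϵn=rand(dist,n)             # leisure-related shock

uw=β*x+ϵw                   # utility of working
un=ϵn.+μ                    # utility of leisure

# decision[i] is 1 if agent i works and 0 otherwise
decision=zeros(n)
for i=1:n
    if uw[i]>un[i]
        decision[i]=1
    end
end
mean(decision)              # share of agents who choose to work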

0.6366

Now, let's define the distribution of $z$. In Julia, this can be implemented as

In [2]:
z=Normal(0,sqrt(2));
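
As a quick sanity check on this choice of distribution, the sketch below draws two independent sets of standard normal shocks and compares their difference with `Normal(0, sqrt(2))`. The seed, the sample size `m`, and the names `e_work`, `e_leisure`, `zdraws` and `z_check` are assumptions made only for this illustration.

```julia
using Distributions, Random, Statistics

Random.seed!(1)                       # arbitrary seed, used only for this check
m = 100_000                           # number of simulated shock pairs (assumed)
e_work    = rand(Normal(), m)         # working-related shocks
e_leisure = rand(Normal(), m)         # leisure-related shocks
zdraws = e_leisure .- e_work          # draws of z = ε_n0 - ε_n1

z_check = Normal(0, sqrt(2))          # second argument is the standard deviation

# Gap between the sample and theoretical standard deviation
sd_gap = abs(std(zdraws) - std(z_check))
# Gap between the empirical and theoretical CDF, evaluated at 0.5
cdf_gap = abs(mean(zdraws .< 0.5) - cdf(z_check, 0.5))
println((sd_gap, cdf_gap))
```

Both gaps should be small and shrink as `m` grows, which is consistent with $z$ having standard deviation $\sqrt{2}$.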

And our `prob(β,μ)` function can be written as

In [3]:
function prob(β,μ)
    # probability of working for each agent: Pr(z < β*x_n - μ)
    return cdf.(z, β .* x .- μ)
end

prob (generic function with 1 method)

Finally, we need to write our log-likelihood function. It is unchanged from the two previous lectures.

In [4]:
function logL_fn(θ) # negative log-likelihood, to be minimized
    β=θ[1]
    μ=θ[2]
    logL=0.0
    n=1000 # only the first 1000 simulated observations enter the sum
    pr=prob(β,μ)
    for id=1:n
        logL=logL+log(pr[id])*decision[id]+log(1-pr[id])*(1-decision[id])
    end
    return -(logL)
end

logL_fn (generic function with 1 method)
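
When `pr[id]` gets very close to 0 or 1, `log(pr[id])` or `log(1-pr[id])` can evaluate to `-Inf`. One possible alternative, sketched below, works with `logcdf` and `logccdf` from `Distributions` and passes the data as arguments instead of reading global variables. The helper name `negloglik` and the toy data are hypothetical and not part of the lecture code.

```julia
using Distributions

# Hypothetical helper: the same probit negative log-likelihood, with the data
# passed in explicitly and log-probabilities taken from logcdf/logccdf.
function negloglik(θ, x, y; d = Normal(0, sqrt(2)))
    β, μ = θ
    v = β .* x .- μ                       # deterministic utility gap β*x_n - μ
    ll = sum(y .* logcdf.(d, v) .+ (1 .- y) .* logccdf.(d, v))
    return -ll
end

# Tiny made-up data set, for illustration only
x_toy = [2.0, 4.0, 6.0, 8.0]              # years of education
y_toy = [0.0, 0.0, 1.0, 1.0]              # 1 = works, 0 = leisure
nll_toy = negloglik([0.5, 2.0], x_toy, y_toy)
println(nll_toy)
```

For moderate parameter values this returns the same number as the loop version applied to the same data; the difference only matters for extreme values of `β*x_n - μ`.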

We then estimate the parameters by maximum likelihood, minimizing the negative log-likelihood with `optimize` from `Optim`

In [5]:
θguess=[0.4,1.0]
res=optimize(logL_fn, θguess)
θstar=Optim.minimizer(res)

2-element Vector{Float64}:
 0.5106298425303077
 2.0552833922712783

These estimates are very close to the true parameters, $\beta=0.5$ and $\mu=2$.
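
Note that `logL_fn` sets `n=1000`, so it sums over only the first 1000 simulated agents. The sketch below shows one way the estimation could be run on all observations, together with a convergence check. It re-simulates the data so that it is self-contained, and the names `β0`, `μ0`, `m`, `xs`, `works`, `nll` and `res_all` are assumptions made for this illustration.

```julia
using Distributions, Random, Optim

Random.seed!(3)
β0, μ0, m = 0.5, 2.0, 10_000               # true parameters and sample size
xs = rand(Uniform(1, 10), m)               # years of education
# works[i] is true if agent i's utility of working exceeds that of leisure
works = (β0 .* xs .+ rand(Normal(), m)) .> (μ0 .+ rand(Normal(), m))

d = Normal(0, sqrt(2))                     # distribution of ε_n0 - ε_n1

# Negative log-likelihood over all m observations
function nll(θ)
    v = θ[1] .* xs .- θ[2]
    return -sum(ifelse.(works, logcdf.(d, v), logccdf.(d, v)))
end

res_all = optimize(nll, [0.4, 1.0])        # no gradient given: Nelder-Mead
θ_all = Optim.minimizer(res_all)
println(Optim.converged(res_all), " ", θ_all)
```

Checking `Optim.converged(res_all)` before reading the minimizer helps distinguish a genuine optimum from a run that simply hit the iteration limit.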

Go to Lecture 4: [Dynamic Probit Model with two choices](https://github.com/ruedatesta/discrete_choice_models/blob/main/lec4_dynamic_probit.ipynb)