# The Logit Model through simulation


Arguably the most important feature of the Logit model is that the choice probability has a closed-form solution.
However, this is not the case for discrete choice models in general, so it is useful to re-solve the previous example through simulation.

Assume the same environment as before, where an agent $n \in N$ has to decide between two alternatives, working $(i=1)$ or leisure $(i=0)$. The utility can be written as

$U_{n1}=\beta x_n+\varepsilon_{n1} $ if the agent chooses to work

$U_{n0}=\mu +\varepsilon_{n0} $ if the agent chooses not to work, i.e., leisure.

We are interested in the probability that the agent chooses to work, $U_{n1}>U_{n0}$, which can be written as

$$Pr(\varepsilon_{n0}-\varepsilon_{n1}<\beta x_n-\mu)$$
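
For reference, this is the benchmark that the simulation should reproduce. Assuming, as in the code in this lecture, that both shocks are independent draws from a standard Gumbel distribution, their difference follows a standard logistic distribution, so the probability has the closed form

$$Pr(\varepsilon_{n0}-\varepsilon_{n1}<\beta x_n-\mu)=\frac{1}{1+e^{-(\beta x_n-\mu)}}=\frac{e^{\beta x_n}}{e^{\beta x_n}+e^{\mu}}$$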

Let's rewrite in Julia the previous steps up to the computation of the probability function `prob(β,μ)`.
Notice that in this step we need the package `StatsBase` to compute an empirical cdf.

In [6]:
using Plots, Distributions, Random, Optim, StatsBase
β=0.5; # premium for education
μ=2.0; # ui
n=1000; # Number of individuals
Random.seed!(3);
x=rand(Uniform(1, 10),n);
Random.seed!(3);
dist=Gumbel();
ϵw=rand(dist,n);
ϵn=rand(dist,n);

uw=β*x+ϵw;
un=ϵn.+μ;

decision=zeros(n)
for i=1:n
    if uw[i]>un[i]
        decision[i]=1
    else
        decision[i]=0
    end
end
mean(decision)

0.666

Unlike in the previous example, we are now going to use simulation to compute this probability. The starting point is setting the number of simulations `sim`.

In [4]:
sim=2000;

Note that `sim` is not the number of people we simulate; that was the object `n`. Here, `sim` sets the number of draws of the working and not-working shocks. Our goal is to compute the empirical cumulative distribution function of $z=\varepsilon_{n0}-\varepsilon_{n1}$.

In [12]:
ϵw_sim=rand(dist,sim);
ϵn_sim=rand(dist,sim);

z_sim=ϵn_sim-ϵw_sim;
z_sim=sort!(z_sim);

z_cdf=ecdf(z_sim); #This is the empirical cdf

In Julia, `ecdf` is a higher-order function, that is, a function that returns a function... Wait, what?

In plain English, this means that `ecdf` is a function that takes a vector of realizations as input and gives us back a function that we can evaluate at any point. In our case, `z_sim` is the vector of realizations and `z_cdf` is the empirical cdf that we can evaluate. Let's see an example.
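
As a toy illustration of this idea, separate from the lecture's data, the sketch below builds an empirical cdf from four made-up numbers and evaluates it at a few points. The name `F` and the sample values are chosen only for this illustration.

```julia
using StatsBase

# `F` is itself a function: the share of the sample that is ≤ its argument
F = ecdf([1.0, 2.0, 3.0, 4.0])

F(0.0)   # 0.0, no observation is ≤ 0
F(2.5)   # 0.5, two of the four observations are ≤ 2.5
F(4.0)   # 1.0, every observation is ≤ 4
```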

We know that `z_cdf` approximates the cdf of a standard logistic distribution, because the difference of two independent standard Gumbel shocks is standard logistic. Since the standard logistic distribution is symmetric around zero, its cdf evaluated at the mean, 0, is exactly 0.5. Let's check that our function is close to that value

In [14]:
z_cdf(0.0)

0.47

Our function is close to the true value of 0.5. If we want to improve the accuracy of our empirical cdf, we can increase the number of draws `sim`.
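
One way to see the effect of the number of draws is to compare the empirical cdf with the exact standard logistic cdf, which is available in `Distributions` as `Logistic()`. The sketch below is only an illustration: the helper `max_cdf_gap`, the evaluation grid, and the seed are assumptions introduced here, not part of the lecture's code.

```julia
using Distributions, Random, StatsBase

# Largest gap, over a grid of points, between the empirical cdf built from
# `S` simulated draws of z = ϵ0 - ϵ1 and the exact standard logistic cdf.
# `max_cdf_gap` and `grid` are illustrative names, not part of the lecture.
function max_cdf_gap(S; grid=-5:0.1:5)
    z = rand(Gumbel(), S) .- rand(Gumbel(), S)
    F_hat = ecdf(z)
    return maximum(abs(F_hat(g) - cdf(Logistic(), g)) for g in grid)
end

Random.seed!(1234)   # arbitrary seed, chosen only for reproducibility
for S in (2_000, 20_000, 200_000)
    println(S, " draws: max gap = ", max_cdf_gap(S))
end
```

The gap should tend to shrink as `S` grows, at the cost of more memory and time for each simulation.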

Now, let's focus on the main challenge of this example: computing the function `prob(β,μ)`.
Taking advantage of our function `z_cdf()`, we can define `prob(β,μ)` as a function that, for any pair `(β,μ)`, returns the empirical cdf evaluated at $\beta x_n-\mu$ for every individual. A possible implementation in Julia is

In [9]:
function prob(β,μ)
    # empirical cdf evaluated at β*x[i]-μ for each individual i
    pr=z_cdf.(β.*x.-μ)
end

prob (generic function with 1 method)
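
One practical caveat: an empirical cdf returns exactly 0 below the smallest draw and exactly 1 above the largest one, so for some values of `(β,μ)` the log-likelihood can involve `log(0)`. A common workaround is to keep the probabilities strictly inside the unit interval. The sketch below is an assumption-laden variant, not the lecture's implementation: the name `prob_clamped`, the extra arguments, and the bound `tol` are all introduced here for illustration.

```julia
using Distributions, Random, StatsBase

# Hypothetical variant of `prob` that takes the empirical cdf and the data as
# arguments and keeps every probability strictly inside (0, 1).
# The bound `tol` is an assumption; it is not part of the lecture's code.
function prob_clamped(β, μ, z_cdf, x; tol=1e-6)
    pr = z_cdf.(β .* x .- μ)          # empirical cdf at each individual's index
    return clamp.(pr, tol, 1 - tol)   # avoid log(0) in the log-likelihood
end

Random.seed!(1234)                     # arbitrary seed for the illustration
z_cdf = ecdf(rand(Gumbel(), 2000) .- rand(Gumbel(), 2000))
x = [1.0, 5.0, 10.0]
prob_clamped(50.0, 0.0, z_cdf, x)      # every entry is at most 1 - tol
```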

Now, we can copy-paste our log-likelihood function from the previous example

In [10]:
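function logL_fn(θ) # negative log-likelihood, to be minimized
    β=θ[1]
    μ=θ[2]
    logL=0.0
    n=length(decision) # number of individuals
    pr=prob(β,μ) # simulated probability of working for each individual
    for id=1:n
        logL=logL+log(pr[id])*decision[id]+log(1-pr[id])*(1-decision[id])
    end
    return -(logL)
end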

logL_fn (generic function with 1 method)

And the estimation of our parameters of interest reads

In [11]:
θguess=[0.4,1.0];
res=optimize(logL_fn, θguess)
θstar=Optim.minimizer(res)

2-element Vector{Float64}:
 0.5786890474613755
 2.359476813510991

The estimated parameters are close to the population parameters, although not as close as in the previous example. This can be improved by increasing the number of draws `sim`.
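
One way to explore how the estimates react to the number of draws is to wrap the whole procedure in a function of `sim` and run it for several values. The sketch below is self-contained and only illustrative: `estimate_with_draws`, the clamping bound `tol`, and the seed are assumptions introduced here, and the data are regenerated with a different seed, so the numbers will not match the ones above.

```julia
using Distributions, Optim, Random, StatsBase

# Simulated-likelihood estimate of (β, μ) using `sim` draws for the empirical cdf.
# `estimate_with_draws` and the clamping bound `tol` are illustrative assumptions.
function estimate_with_draws(x, decision, sim; θguess=[0.4, 1.0], tol=1e-6)
    z_cdf = ecdf(rand(Gumbel(), sim) .- rand(Gumbel(), sim))
    function negloglik(θ)
        # simulated probability of working, kept away from 0 and 1
        pr = clamp.(z_cdf.(θ[1] .* x .- θ[2]), tol, 1 - tol)
        return -sum(decision .* log.(pr) .+ (1 .- decision) .* log.(1 .- pr))
    end
    return Optim.minimizer(optimize(negloglik, θguess))
end

# Artificial data generated as in the lecture (β=0.5, μ=2.0), arbitrary seed
Random.seed!(1234)
n = 1000
x = rand(Uniform(1, 10), n)
decision = Float64.(0.5 .* x .+ rand(Gumbel(), n) .> 2.0 .+ rand(Gumbel(), n))

for sim in (2_000, 20_000, 200_000)
    println(sim, " draws: ", estimate_with_draws(x, decision, sim))
end
```

Because the empirical cdf is a step function, the objective is not smooth in `(β,μ)`; the call above relies on the derivative-free default that `optimize` uses when no gradient is supplied.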

Go to Lecture 3: [The Probit Model](https://github.com/ruedatesta/discrete_choice_models/blob/main/lec3_probit.ipynb)