# Question 2: Homogeneous Coefficients with Demographics

In [1]:
using Plots, DataFrames, CSV, GLM
using Optim, Distributions, Random, ForwardDiff
using LinearAlgebra,StatsFuns

In [2]:
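df = DataFrame(CSV.File("data/ps1_ex2.csv"));
# one row per product, sorted so that row j corresponds to choice j
products = sort(unique(df, ["choice"]));

# construct the data matrices
D = Array(df[:,["d.1", "d.2"]]);                 # individuals × demographics
X = Array(products[:, ["x.1", "x.2", "x.3"]]);   # products × characteristics
C = Array(df[:,["choice"]]);                     # chosen product for each individual
# one-hot choice matrix: Y[i, j] = 1 if individual i chose product j
Y = zeros(size(D)[1], size(X)[1]);
for i in 1:size(Y)[1]
    Y[i, C[i]] = 1;
end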

# Part 5: Estimating $(\delta, \Gamma)$
The log-likelihood function to maximize over $(\delta, \Gamma)$ is:
$$ \sum_i \sum_j y_{ij} \left[ \delta_j + d_i' \Gamma x_j - \log \left( {\sum_{k=1}^{31} \exp(\delta_k + d_i' \Gamma x_k)} \right) \right]  $$

Inputs are:
- $X$ is a $31 \times 3$ matrix of 31 products with 3 characteristics
- $D$ is a $4000 \times 2$ matrix of 4000 individuals with 2 demographic observables

The parameters to be estimated have the following dimensions:
- $\delta$ is a vector of length 31, with one entry per product
- $\Gamma$ is a $2 \times 3$ matrix of coefficients
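
The same objective can also be written with one matrix product for all utilities and a numerically stable log-sum-exp. The following is only a sketch under stated assumptions: `stable_logsumexp` and `negloglik` are hypothetical names, and the toy data are synthetic rather than the problem-set data. StatsFuns, which is already loaded above, exports a `logsumexp` that could replace the hand-written helper.

```julia
using LinearAlgebra, Random

# Hypothetical helper: log(sum(exp.(v))) computed without overflow
stable_logsumexp(v) = (m = maximum(v); m + log(sum(exp.(v .- m))))

# Negative log-likelihood; D is N×2, X is J×3, Y is an N×J one-hot matrix
function negloglik(δ, Γ, D, X, Y)
    V = δ' .+ D * Γ * X'              # V[i, j] = δ_j + d_i' Γ x_j
    total = zero(eltype(V))
    for i in axes(V, 1)
        v = view(V, i, :)
        total += dot(view(Y, i, :), v) - stable_logsumexp(v)
    end
    return -total
end

# Synthetic toy data (assumption: small sizes, for illustration only)
Random.seed!(1)
N, J = 50, 4
D_toy = randn(N, 2)
X_toy = randn(J, 3)
Y_toy = zeros(N, J)
for i in 1:N
    Y_toy[i, rand(1:J)] = 1
end

# With all parameters at zero every product has probability 1/J
nll0 = negloglik(zeros(J), zeros(2, 3), D_toy, X_toy, Y_toy)
```

With every parameter set to zero each choice probability is $1/J$, so `nll0` should equal `N * log(J)`, which gives a quick sanity check before running the optimizer.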
   

In [3]:
# Likelihood 
ll = function(δ, Γ)
    likelihood = 0
    for i in 1:size(D)[1]
        likelihood += (Y[i,:]' * (δ + (D[i, :]' * Γ * X')')) - log(sum(exp.(δ + (D[i, :]' * Γ * X')')))
    end
    return -likelihood # notice that we are returning the negative likelihood
end

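# Optim wrapper (optimize takes a single parameter vector as its argument)
# Layout: x[1:31] holds δ and x[33:38] holds Γ in column-major order; x[32] is never read
ll_wrap = function(x)
    δ = x[1:31]
    Γ = reshape(x[33:38],2,3)
    return ll(δ, Γ)
end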

#3 (generic function with 1 method)
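
The wrapper reads `x[1:31]` and `x[33:38]`, so position 32 of the 38-element vector never enters the objective. A contiguous layout with 31 + 6 = 37 parameters could look like the sketch below; `pack` and `unpack` are hypothetical helper names, not part of the notebook, and the numbers are placeholders.

```julia
# Hypothetical helpers for a contiguous parameter layout: [δ; vec(Γ)]
pack(δ, Γ) = vcat(δ, vec(Γ))

function unpack(x; J=31)
    δ = x[1:J]
    Γ = reshape(x[J+1:J+6], 2, 3)   # column-major, same convention as reshape above
    return δ, Γ
end

# Round trip on placeholder values
δ0 = collect(1.0:31.0)
Γ0 = [1.0 2.0 3.0; 4.0 5.0 6.0]
x0 = pack(δ0, Γ0)
δ1, Γ1 = unpack(x0)
```

Keeping the two helpers next to each other makes it harder for the index ranges used before and after optimization to drift apart.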

In [4]:
# Minimize the negative log-likelihood
params0 = zeros(38);
optimum = optimize(ll_wrap, params0, LBFGS(), autodiff=:forward)

 * Status: success

 * Candidate solution
    Final objective value:     8.721408e+03

 * Found with
    Algorithm:     L-BFGS

 * Convergence measures
    |x - x'|               = 1.02e-10 ≰ 0.0e+00
    |x - x'|/|x'|          = 4.03e-11 ≰ 0.0e+00
    |f(x) - f(x')|         = 3.64e-12 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 4.17e-16 ≰ 0.0e+00
    |g(x)|                 = 7.93e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   76  (vs limit Inf)
    Iterations:    165
    f(x) calls:    496
    ∇f(x) calls:   496


In [5]:
MLE = optimum.minimizer;
δ = MLE[1:31];
Γ = reshape(MLE[33:38],2,3);

## Double check coefficients
The first-order condition (FOC) for $\delta_j$ requires the predicted share of each product to equal its observed share, so it can be used to check the estimates:
$$ \frac{1}{N} \sum_i y_{ij} = \frac{1}{N} \sum_i \frac{\exp(\delta_j + d_i ' \Gamma x_j)}{\sum_k \exp(\delta_k + d_i ' \Gamma x_k)} $$
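
To see where this condition comes from, differentiate the log-likelihood with respect to $\delta_j$. Here $\ell$ denotes the log-likelihood and $P_{ij}$ the choice probability; both are shorthands introduced for this derivation only. Since each individual chooses exactly one product, $\sum_k y_{ik} = 1$, and so

$$ \frac{\partial \ell}{\partial \delta_j} = \sum_i \left[ y_{ij} - \Big( \sum_k y_{ik} \Big) P_{ij} \right] = \sum_i \left( y_{ij} - P_{ij} \right) = 0, \qquad P_{ij} = \frac{\exp(\delta_j + d_i ' \Gamma x_j)}{\sum_k \exp(\delta_k + d_i ' \Gamma x_k)} $$

Dividing both sides by $N$ gives the share-matching equation.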

In [10]:
# predicted shares
aa = [] 
for j in 1:31
    ss = 0
    for i in 1:size(D)[1]
        ss += exp.(δ[j] + (D[i,:]' * Γ * X[j,:])') / sum(exp.(δ + (D[i,:]' * Γ * X')'))
    end
    push!(aa, ss / 4000)
end

# actual shares
yy = []
for j in 1:31
    yij = 0
    for i in 1:size(D)[1]
        yij += Y[i,j]
    end
    push!(yy, yij / 4000)
end

# Largest signed gap between predicted and actual shares.
# Both vectors sum to 1, so a tiny maximum also bounds the negative gaps.
maximum(aa .- yy)

1.8013229796665087e-12

# Part 7: Obtaining an estimate of $\beta$
The proposed moment condition is exogeneity of the product-specific unobservable $\xi_j$ with respect to the observed characteristics:

$$E[x_j \xi_j] = 0$$

Under this condition, $\beta$ is identified by regressing the estimated $\delta_j$ on $x_j$ by OLS, with $\xi_j$ as the error term:

$$ \delta_j = x_j' \beta + \xi_j $$
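
As an illustration of how OLS enforces the sample analogue of this moment condition, the sketch below solves the least-squares problem directly on synthetic data. All names (`X_toy`, `Z`, `β_true`, `ξ_toy`, `δ_toy`) and all numbers are made up for the example, and an intercept column is included because `lm` adds one by default.

```julia
using LinearAlgebra, Random

Random.seed!(2)
J = 31
X_toy = randn(J, 3)                   # synthetic product characteristics
Z = [ones(J) X_toy]                   # regressors with an intercept column
β_true = [-1.5, 0.1, 1.0, 0.4]        # arbitrary values for the illustration
ξ_toy = 0.1 .* randn(J)               # synthetic unobserved product term
δ_toy = Z * β_true + ξ_toy

β_hat = Z \ δ_toy                     # least-squares solution
ξ_hat = δ_toy - Z * β_hat             # residuals play the role of ξ_j
moment = Z' * ξ_hat / J               # sample analogue of E[x_j ξ_j] = 0
```

By construction the residuals are orthogonal to every column of `Z`, so `moment` is zero up to floating-point error. This is the just-identified case where the moment estimator and OLS coincide.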

In [7]:
# build dataframe
products[!, :delta] = δ
rename!(products,[:choice,:x1, :x2, :x3, :d1, :d2, :delta])

# regression
lm(@formula(delta ~ x1 + x2 + x3), products)

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

delta ~ 1 + x1 + x2 + x3

Coefficients:
────────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error       t  Pr(>|t|)    Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────────
(Intercept)  -1.57627    0.119483   -13.19    <1e-12  -1.82143     -1.33111
x1            0.140275   0.0676787    2.07    0.0479   0.00140971   0.27914
x2            1.01255    0.0687109   14.74    <1e-13   0.87157      1.15354
x3            0.430487   0.0652463    6.60    <1e-06   0.296613     0.564362
────────────────────────────────────────────────────────────────────────────