# Logit and Nested Logit

In [1]:
import pyblp
import numpy as np

pyblp.options.digits = 3
pyblp.options.verbose = False
pyblp.__version__

'0.5.0'

We will compare two simple models, the plain (IIA) logit model and the nested logit (GEV) model, using the Nevo (2000) fake cereal dataset.

In [2]:
# First load the data
product_data = np.recfromcsv(pyblp.data.NEVO_PRODUCTS_LOCATION, encoding='utf-8')
product_data = {k: product_data[k] for k in product_data.dtype.names}

## Plain Logit

Let's start with the plain logit model under independence of irrelevant alternatives (IIA).

In this model, (indirect) utility is given by:
$$u_{ijt} = x_{jt} \beta - \alpha p_{jt} + \xi_{jt} + \varepsilon_{ijt}$$

where $\varepsilon_{ijt}$ is distributed IID with the Type I extreme value (Gumbel) distribution. It is common to normalize the mean utility of the outside good to zero so that $u_{i0t} = \varepsilon_{i0t}$.

This gives us aggregate market shares
$$s_{jt} = \frac{e^{x_{jt} \beta - \alpha p_{jt} + \xi_{jt}}}{\sum_k e^{x_{kt} \beta - \alpha p_{kt} + \xi_{kt}}}$$

where the sum over $k$ includes the outside good $k = 0$, whose term is $e^0 = 1$.

If we take logs we get that:

\begin{align}
\ln s_{jt} &= x_{jt} \beta - \alpha p_{jt} + \xi_{jt} &- \ln\left(\sum_k e^{x_{kt} \beta - \alpha p_{kt} + \xi_{kt}}\right)\\
\ln s_{0t} &= 0 &- \ln\left(\sum_k e^{x_{kt} \beta - \alpha p_{kt} + \xi_{kt}}\right)
\end{align}

By differencing the above we get a linear estimating equation:
$$\ln s_{jt} - \ln s_{0t} = x_{jt}\beta  - \alpha p_{jt}  + \xi_{jt} $$
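The inversion can be checked numerically. The following is a minimal NumPy sketch, not part of PyBLP; the mean utilities in `delta` are made-up values standing in for $x_{jt}\beta - \alpha p_{jt} + \xi_{jt}$ in a single market.

```python
import numpy as np

# Hypothetical mean utilities for three inside goods in one market.
# The outside good has mean utility zero by normalization.
delta = np.array([1.0, 0.5, -0.3])

# Logit shares: the denominator includes exp(0) = 1 for the outside good.
denominator = 1.0 + np.exp(delta).sum()
shares = np.exp(delta) / denominator
outside_share = 1.0 / denominator

# Differencing log shares recovers the mean utilities.
recovered_delta = np.log(shares) - np.log(outside_share)
print(np.allclose(recovered_delta, delta))
```

The common log-sum term cancels in the difference, which is why no fixed-point iteration is needed for the plain logit.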

Because the left-hand side is observed in the data, we can estimate this model using linear IV GMM.
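To see why instruments matter here, the sketch below simulates logit data in which prices are correlated with $\xi_{jt}$ and compares OLS with a just-identified IV estimator. It is an illustration under assumed values (the sample sizes, true parameters, and the instrument `z` are all hypothetical), and it uses plain 2SLS algebra in NumPy rather than PyBLP's own GMM routine.

```python
import numpy as np

rng = np.random.default_rng(0)
T, J = 200, 10                       # hypothetical numbers of markets and products
n = T * J
alpha_true, beta_true = 2.0, 1.0     # hypothetical true parameters

x = rng.normal(size=n)               # exogenous characteristic
z = rng.normal(size=n)               # excluded instrument (e.g. a cost shifter)
xi = rng.normal(scale=0.5, size=n)   # unobserved quality
# Prices respond to unobserved quality, which makes them endogenous.
prices = 2.0 + 0.5 * z + 0.5 * xi + rng.normal(scale=0.1, size=n)

# Market shares implied by the plain logit model, market by market.
delta = beta_true * x - alpha_true * prices + xi
exp_delta = np.exp(delta).reshape(T, J)
denominator = 1.0 + exp_delta.sum(axis=1, keepdims=True)
shares = (exp_delta / denominator).ravel()
outside_shares = np.repeat(1.0 / denominator.ravel(), J)

# Left-hand side of the estimating equation.
y = np.log(shares) - np.log(outside_shares)

constant = np.ones(n)
X = np.column_stack([constant, x, prices])   # regressors
Z = np.column_stack([constant, x, z])        # instruments

# Just-identified IV estimator: (Z'X)^{-1} Z'y.
iv_estimates = np.linalg.solve(Z.T @ X, Z.T @ y)
ols_estimates = np.linalg.lstsq(X, y, rcond=None)[0]

# The price coefficient is -alpha.
alpha_iv = -iv_estimates[2]
alpha_ols = -ols_estimates[2]
print(f"alpha (IV):  {alpha_iv:.3f}")
print(f"alpha (OLS): {alpha_ols:.3f}")
```

Because prices rise with unobserved quality in this simulation, OLS understates price sensitivity, while the IV estimate should land near the assumed true value.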

Comparing results from the full BLP model with results from the simpler Logit model is straightforward. A Logit :class:`Problem` can be created by simply excluding the formulation for $X_2$ along with any agent information. We'll set up and solve a simpler version of the fake cereal problem from :ref:`references:Nevo (2000)`. Since we won't include any nonlinear characteristics or parameters, we don't have to worry about configuring an optimization routine.

In [3]:
logit_formulation = pyblp.Formulation('0 + prices', absorb='C(product_ids)')
logit_formulation

prices + Absorb[C(product_ids)]

In [4]:
problem = pyblp.Problem(logit_formulation, product_data)
problem

Dimensions:
 N     T    K1    MD    ED 
----  ---  ----  ----  ----
2256  94    1     20    1  

Formulations:
       Column Indices:           0   
-----------------------------  ------
 X1: Linear Characteristics    prices

In [5]:
results = problem.solve()
results

Problem Results Summary:
Cumulative  GMM   Optimization   Objective   Total Fixed Point  Total Contraction  Objective    Gradient   
Total Time  Step   Iterations   Evaluations     Iterations         Evaluations       Value    Infinity Norm
----------  ----  ------------  -----------  -----------------  -----------------  ---------  -------------
 0:00:00     2         0             1              0.0                0.0         +4.23E+05       NA      

Linear Estimates (Robust SEs in Parentheses):
Beta:    prices   
-----  -----------
        -3.00E+01 
       (+1.01E+00)

## Nested Logit

We can extend the logit model to allow for correlation within a group $g$ so that:
$$u_{ijt} = x_{jt} \beta + \xi_{jt} +  \eta_{ig} +  (1-\rho) \varepsilon_{ijt}$$

Now, we require that $\eta_{ig} + (1-\rho) \varepsilon_{ijt}$ is distributed Type I extreme value (Gumbel). As $\rho \rightarrow 1$, all consumers stay within their group; as $\rho \rightarrow 0$, the model collapses to the plain logit.

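This gives us aggregate market shares as the product of a within-group logit and an across-group logit. Writing $\delta_{jt} = x_{jt} \beta + \xi_{jt}$ and $D_{gt} = \sum_{k \in g} e^{\delta_{kt} / (1-\rho)}$:
$$s_{jt} = s_{j|gt} \cdot s_{gt} = \frac{e^{\delta_{jt} / (1-\rho)}}{D_{gt}} \cdot \frac{D_{gt}^{1-\rho}}{1 + \sum_h D_{ht}^{1-\rho}}$$

If we take logs we get that:

\begin{align}
\ln s_{jt} &= \frac{\delta_{jt}}{1-\rho} - \rho \ln D_{gt} &- \ln\left(1 + \sum_h D_{ht}^{1-\rho}\right)\\
\ln s_{0t} &= 0 &- \ln\left(1 + \sum_h D_{ht}^{1-\rho}\right)
\end{align}

After some work (see Berry (1994) or Cardell (1991)) we again obtain a linear estimating equation:
$$\ln s_{jt} - \ln s_{0t} = x_{jt}\beta + \rho \ln s_{j|gt} + \xi_{jt}$$

Because the left-hand side is data, we can again estimate this model using linear IV GMM. The within-group share $\ln s_{j|gt}$ is endogenous, however, so it needs an instrument of its own; below we use the number of products in each nest.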
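In Berry (1994), the nested logit inversion differs from the plain logit one by a within-group share term: $\ln s_{jt} - \ln s_{0t} = \delta_{jt} + \rho \ln s_{j|gt}$, where $\delta_{jt}$ is mean utility. The NumPy sketch below checks that identity for one market. The mean utilities, nest assignments, and value of `rho` are hypothetical, and the code is independent of PyBLP.

```python
import numpy as np

rho = 0.7                                  # hypothetical nesting parameter
delta = np.array([1.0, 0.5, -0.3, 0.2])    # hypothetical mean utilities
nests = np.array([0, 0, 1, 1])             # hypothetical nest of each product

# D_g: sum of exp(delta / (1 - rho)) over the products in each nest.
scaled = np.exp(delta / (1.0 - rho))
nest_sums = np.array([scaled[nests == g].sum() for g in np.unique(nests)])

# Within-nest shares, nest shares, and the outside share.
within_shares = scaled / nest_sums[nests]
nest_values = nest_sums ** (1.0 - rho)
denominator = 1.0 + nest_values.sum()
nest_shares = nest_values / denominator
outside_share = 1.0 / denominator

# Product shares are the product of the two logits.
shares = within_shares * nest_shares[nests]

# Check the linear estimating equation.
lhs = np.log(shares) - np.log(outside_share)
rhs = delta + rho * np.log(within_shares)
print(np.allclose(lhs, rhs))
```

Since `within_shares` is built from the same shares that appear on the left-hand side, it is correlated with $\xi_{jt}$ in estimation, which is why it needs an instrument.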

In [11]:
import pandas as pd

# Case 1: a single nest containing every product.
df = pd.DataFrame(product_data)
df['nesting_ids'] = 1
# Extra instrument: the number of products in each market-nest pair.
df['demand_instruments20'] = df.groupby(['market_ids', 'nesting_ids'])['shares'].transform(lambda x: len(x))
problem = pyblp.Problem(logit_formulation, df)
results_1 = problem.solve(rho=0.7)

# Case 2: two nests, mushy and non-mushy cereals.
df2 = pd.DataFrame(product_data)
df2['nesting_ids'] = product_data['mushy']
df2['demand_instruments20'] = df2.groupby(['market_ids', 'nesting_ids'])['shares'].transform(lambda x: len(x))
problem = pyblp.Problem(logit_formulation, df2)
results_2 = problem.solve(rho=0.7)

# Case 3: three nests by sugar content (low, medium, high).
df3 = pd.DataFrame(product_data)
df3['nesting_ids'] = pd.cut(df3.sugar, [-0.5, 3.5, 12.5, 100], labels=False)
df3['demand_instruments20'] = df3.groupby(['market_ids', 'nesting_ids'])['shares'].transform(lambda x: len(x))
problem = pyblp.Problem(logit_formulation, df3)
results_3 = problem.solve(rho=0.7)

print("*" * 25)
print("All Products in Same Nest")
print("*" * 25)
print(results_1)
print("*" * 25)
print("Mushy vs. non-Mushy")
print("*" * 25)
print(results_2)
print("*" * 25)
print("Low/Medium/High Sugar Content")
print("*" * 25)
print(results_3)

*************************
All Products in Same Nest
*************************
Problem Results Summary:
Cumulative  GMM   Optimization   Objective   Total Fixed Point  Total Contraction  Objective    Gradient   
Total Time  Step   Iterations   Evaluations     Iterations         Evaluations       Value    Infinity Norm
----------  ----  ------------  -----------  -----------------  -----------------  ---------  -------------
 0:00:00     2         0             2               0                  0          +8.08E+05    +3.19E+05  

Linear Estimates (Robust SEs in Parentheses):
Beta:    prices   
-----  -----------
        -3.05E+00 
       (+1.77E+00)

Nonlinear Estimates (Robust SEs in Parentheses):
Rho:  All Groups 
----  -----------
       +9.50E-01 
      (+5.84E-02)
*************************
Mushy vs. non-Mushy
*************************
Problem Results Summary:
Cumulative  GMM   Optimization   Objective   Total Fixed Point  Total Contraction  Objective    Gradient   
Total Time  Ste

Logit :class:`ProblemResults` can be used to compute the same types of post-estimation outputs as :class:`ProblemResults` created by a full BLP problem.
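Price elasticities are one such post-estimation output, and for the plain logit they have a closed form: with utility $x_{jt}\beta - \alpha p_{jt} + \xi_{jt}$, the own-price elasticity is $-\alpha p_{jt}(1 - s_{jt})$ and the elasticity of $s_{jt}$ with respect to $p_{kt}$ is $\alpha p_{kt} s_{kt}$. The sketch below evaluates these textbook formulas in NumPy for made-up values; it does not call PyBLP, and `alpha`, `prices`, and `other_utility` are hypothetical.

```python
import numpy as np

alpha = 2.0                                 # hypothetical price coefficient
prices = np.array([1.0, 1.5, 2.0])          # hypothetical prices in one market
other_utility = np.array([2.5, 3.0, 3.5])   # hypothetical x*beta + xi


def logit_shares(p):
    """Plain logit shares with the outside good's utility normalized to zero."""
    exp_delta = np.exp(other_utility - alpha * p)
    return exp_delta / (1.0 + exp_delta.sum())


shares = logit_shares(prices)

# elasticities[j, k] is the elasticity of product j's share with respect to
# product k's price. Off-diagonal entries depend only on k.
elasticities = np.tile(alpha * prices * shares, (len(prices), 1))
np.fill_diagonal(elasticities, -alpha * prices * (1.0 - shares))
print(elasticities)
```

Each column has identical off-diagonal entries: a price change for product $k$ shifts every rival's share by the same percentage. This is the IIA substitution pattern that the nested logit relaxes.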