In [62]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm
%matplotlib inline

We first read the data.

In [108]:
data = pd.read_csv('../data/io_assignment1_data.txt')

We then compute the market share of the outside good $j=0$, $s_{0t} = 1 - \sum_{j} s_{jt}$, and the dependent variable $\log(s_{jt})-\log(s_{0t})$ for the regression.

In [109]:
# Outside-good share: 1 minus the total inside-good share in each market t
data['ms0'] = 1 - data.groupby('t')['ms'].transform('sum')

In [110]:
data['log_diff'] = np.log(data['ms'])-np.log(data['ms0'])
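As a quick sanity check of the two cells above, here is a minimal sketch that reproduces `ms0` and `log_diff` on the four rows of `data.head()` that cover complete markets ($t=1$ and $t=2$). The toy DataFrame `toy` is only an illustration built from those printed values, not part of the assignment data pipeline.

```python
import numpy as np
import pandas as pd

# Four rows copied from data.head(): two markets (t), each with firms j = 1, 2.
toy = pd.DataFrame({
    "t": [1, 1, 2, 2],
    "j": [1, 2, 1, 2],
    "ms": [0.226, 0.250, 0.163, 0.181],
})

# Outside-good share: what the inside goods leave over in each market.
# transform("sum") broadcasts each market's total back to every row of that market.
toy["ms0"] = 1 - toy.groupby("t")["ms"].transform("sum")

# Dependent variable of the logit inversion: log(s_jt) - log(s_0t).
toy["log_diff"] = np.log(toy["ms"]) - np.log(toy["ms0"])

print(toy.round(6))
```

The computed `ms0` and `log_diff` columns should match the corresponding entries of `data.head()`.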

Since we allow for systematic, unobserved differences in quality across firms, the model includes firm fixed effects. We therefore obtain the LSDV (least squares dummy variable) estimator by adding a dummy variable for each firm $j$.
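For reference, and assuming the standard logit specification (mean utility linear in the product characteristics, with the outside good's mean utility normalized to zero), the regression estimated below can be written as follows. The symbols $\alpha$, $\beta_1$, $\beta_2$, $\gamma_j$ and $\Delta\xi_{jt}$ are our own notation, not taken from the assignment:

$$
\log(s_{jt}) - \log(s_{0t}) = -\alpha\, p_{jt} + \beta_1\, \mathit{channels}_{jt} + \beta_2\, \mathit{channels\_spec}_{jt} + \gamma_1 \mathbb{1}\{j=1\} + \gamma_2 \mathbb{1}\{j=2\} + \Delta\xi_{jt}
$$

Here $\gamma_j$ plays the role of firm $j$'s unobserved quality, which LSDV recovers as the coefficient on the dummy `j_1` or `j_2`, and $\Delta\xi_{jt}$ is the remaining market-specific demand shock.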

In [111]:
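# One 0/1 indicator per firm; dtype=int yields integers rather than booleans,
# so the regressor matrix passed to statsmodels stays purely numeric
data[['j_1','j_2']] = pd.get_dummies(data['j'], prefix='j', dtype=int)[['j_1','j_2']]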
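Why do firm dummies handle fixed effects? A minimal sketch on hypothetical simulated data (two firms, 200 markets, made-up coefficients) shows that the LSDV slope on price is numerically identical to the "within" estimator that demeans the variables by firm, which is the Frisch-Waugh-Lovell result. Only NumPy is used here, not the statsmodels call from this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical panel: 2 firms x 200 markets, loosely mimicking the data's shape.
n_firms, n_markets = 2, 200
firm = np.repeat(np.arange(n_firms), n_markets)
fe = np.array([-0.5, 0.3])[firm]                     # unobserved firm quality
price = 35 + 3 * fe + rng.normal(0, 2, firm.size)    # price correlated with quality
y = -0.04 * price + fe + rng.normal(0, 0.3, firm.size)

# LSDV: regress y on price and one dummy per firm, with no constant
# (a constant plus both dummies would be perfectly collinear).
dummies = (firm[:, None] == np.arange(n_firms)).astype(float)
X_lsdv = np.column_stack([price, dummies])
beta_lsdv, *_ = np.linalg.lstsq(X_lsdv, y, rcond=None)

def demean(v):
    """Subtract each firm's mean from v."""
    means = np.bincount(firm, weights=v) / np.bincount(firm)
    return v - means[firm]

# Within estimator: regress demeaned y on demeaned price, no dummies needed.
beta_within, *_ = np.linalg.lstsq(demean(price)[:, None], demean(y), rcond=None)

print(beta_lsdv[0], beta_within[0])  # the two slopes agree up to floating-point error
```

The same logic explains why the regression below omits a constant: the two dummies already span it.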

In [112]:
data.head()

Unnamed: 0  t  r  j     ms  price  channels  channels_spec    ms0   log_diff  j_1  j_2
         0  1  1  1  0.226     39        17              3  0.524  -0.840957    1    0
         1  1  1  2  0.250     35        43              2  0.524  -0.740031    0    1
         2  2  1  1  0.163     37        21              2  0.656  -1.392411    1    0
         3  2  1  2  0.181     33        67              2  0.656  -1.287664    0    1
         4  3  1  1  0.221     39        50              5  0.521  -0.857587    1    0


With the data ready, we run an OLS regression of 'log_diff' on 'price', 'channels', 'channels_spec', and the two firm dummies, for now ignoring the possible endogeneity of price. Because the two dummies together already span a constant, no separate intercept is added.

In [113]:
end_model = sm.OLS(data['log_diff'],data[['price','channels','channels_spec','j_1','j_2']])

In [114]:
end_results = end_model.fit()

In [207]:
print(end_results.summary(title='Logit model without instruments'))

                        Logit model without instruments                        
Dep. Variable:               log_diff   R-squared:                       0.395
Model:                            OLS   Adj. R-squared:                  0.389
Method:                 Least Squares   F-statistic:                     64.47
Date:                Wed, 02 Feb 2022   Prob (F-statistic):           6.24e-42
Time:                        11:44:14   Log-Likelihood:                -341.22
No. Observations:                 400   AIC:                             692.4
Df Residuals:                     395   BIC:                             712.4
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
price            -0.0388      0.004     -9.244

As expected, the coefficient on 'price' is negative: a higher price lowers a firm's mean utility and hence its share relative to the outside good.
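To give the coefficient an economic scale, a small sketch can translate it into an own-price elasticity. This assumes the standard simple-logit formula $\eta_{jj} = -\alpha\, p_{jt}(1 - s_{jt})$, with $-\alpha$ equal to the estimated price coefficient ($-0.0388$); the helper `own_price_elasticity` is hypothetical, and the inputs are the first row of `data.head()` (price 39, share 0.226). Since this OLS estimate ignores price endogeneity, the resulting elasticity inherits any bias in the coefficient.

```python
# Own-price elasticity in the simple logit model (assumed formula):
#   eta_jj = d log s_j / d log p_j = coef * p_j * (1 - s_j), with coef = -alpha
price_coef = -0.0388  # estimated price coefficient from the OLS table above

def own_price_elasticity(price, share, coef=price_coef):
    """Hypothetical helper: logit own-price elasticity coef * price * (1 - share)."""
    return coef * price * (1 - share)

# First row of data.head(): firm 1 in market 1, price 39, market share 0.226.
eta = own_price_elasticity(39, 0.226)
print(round(eta, 3))
```

For this observation the elasticity is roughly -1.17, i.e. a 1% price increase would lower firm 1's share by about 1.17%.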

If we instead cluster the standard errors at the firm level (with only two firms, this means just two clusters), we get

In [205]:
end_results_rob = end_results.get_robustcov_results(cov_type = 'cluster', groups = data['j'])

In [206]:
print(end_results_rob.summary(title='Logit model without instruments'))

                        Logit model without instruments                        
Dep. Variable:               log_diff   R-squared:                       0.395
Model:                            OLS   Adj. R-squared:                  0.389
Method:                 Least Squares   F-statistic:                       nan
Date:                Wed, 02 Feb 2022   Prob (F-statistic):                nan
Time:                        11:41:12   Log-Likelihood:                -341.22
No. Observations:                 400   AIC:                             692.4
Df Residuals:                     395   BIC:                             712.4
Df Model:                           4                                         
Covariance Type:              cluster                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
price            -0.0388      0.003    -11.991
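A likely reason the clustered summary reports `nan` for the F-statistic (our reading; the output itself does not say so): OLS residuals satisfy $X'e = 0$, so with $G = 2$ clusters the two cluster score vectors $X_g' e_g$ sum to zero and the cluster-robust covariance matrix has rank at most $G - 1 = 1$, too low for a joint test of several coefficients. The sketch below checks this on hypothetical simulated data with the same shape (400 observations, 3 covariates plus 2 firm dummies), using the basic CR0 sandwich without the small-sample scaling statsmodels applies; that scaling multiplies the matrix by a constant and does not change its rank.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data shaped like the regression above:
# 400 observations, 2 clusters (firms), 3 covariates + 2 firm dummies.
n, n_firms = 400, 2
firm = np.repeat(np.arange(n_firms), n // n_firms)
dummies = (firm[:, None] == np.arange(n_firms)).astype(float)
X = np.column_stack([rng.normal(size=(n, 3)), dummies])
y = X @ np.array([-0.04, 0.01, 0.05, -1.0, -0.8]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# CR0 cluster-robust sandwich: (X'X)^-1 * meat * (X'X)^-1,
# where meat sums the outer products of each cluster's score X_g' e_g.
bread_inv = np.linalg.inv(X.T @ X)
meat = np.zeros((X.shape[1], X.shape[1]))
for g in range(n_firms):
    score_g = X[firm == g].T @ resid[firm == g]
    meat += np.outer(score_g, score_g)
cov_cluster = bread_inv @ meat @ bread_inv

# Singular values relative to the largest: one is 1, the rest are numerically zero.
sv = np.linalg.svd(cov_cluster, compute_uv=False)
rank = int(np.sum(sv > 1e-8 * sv[0]))
print(sv / sv[0], rank)
```

With only two clusters, the individual clustered t-statistics should also be read with caution.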