
**Packages.**

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm



---



**Data.**

In [2]:
!git clone https://github.com/floriandendorfer/demand-estimation.git
data = pd.read_csv('demand-estimation/data.csv',index_col=0)

Cloning into 'demand-estimation'...
remote: Enumerating objects: 29, done.
remote: Counting objects: 100% (29/29), done.
remote: Compressing objects: 100% (27/27), done.
remote: Total 29 (delta 5), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (29/29), 31.25 KiB | 2.60 MiB/s, done.
Resolving deltas: 100% (5/5), done.




---



**Variables.**

In [3]:
print(data.columns)

Index(['county', 'Häagen-Dazs', 'price', 'fuel cost', 'sales', 'county size',
       'top 10% income'],
      dtype='object')


Each row contains sales information for one ice cream brand (Häagen-Dazs or Ben & Jerry's) in one geographic market (a county).

*   `county` is the market identifier ($j$ in the slides).
*   `Häagen-Dazs` is 1 if the brand is Häagen-Dazs and 0 if it is Ben & Jerry's.
*   `price` is the dollar price at which one serving of that brand is sold.
*   `fuel cost` is the dollar price of a liter of gas.
*   `sales` is the number of ice cream servings of that brand sold per week.
*   `county size` is the *total* number of ice cream servings (of all brands) sold per week in the county, i.e., the market size.
*   `top 10% income` is 1 if the county is among the 10% highest-income counties and 0 otherwise.


---



**Data description.**

In [4]:
data.describe()

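|       | county    | Häagen-Dazs | price    | fuel cost | sales     | county size | top 10% income |
|-------|-----------|-------------|----------|-----------|-----------|-------------|----------------|
| count | 400.0     | 400.0       | 400.0    | 400.0     | 400.0     | 400.0       | 400.0          |
| mean  | 100.5     | 0.5         | 2.441204 | 0.300149  | 17.19     | 198.31      | 0.095          |
| std   | 57.806609 | 0.500626    | 0.385204 | 0.173953  | 11.592881 | 60.05406    | 0.293582       |
| min   | 1.0       | 0.0         | 1.653232 | -0.21432  | 2.0       | 100.0       | 0.0            |
| 25%   | 50.75     | 0.0         | 2.187071 | 0.179127  | 8.0       | 141.75      | 0.0            |
| 50%   | 100.5     | 0.5         | 2.382321 | 0.298173  | 14.0      | 200.5       | 0.0            |
| 75%   | 150.25    | 1.0         | 2.59344  | 0.428955  | 24.0      | 249.75      | 0.0            |
| max   | 200.0     | 1.0         | 3.768709 | 0.796064  | 58.0      | 300.0       | 1.0            |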



1.   How many counties are there?
2.   What is the average price per serving?
3.   What is the average market size in the sample? What is the largest market size?
4.   What is the median number of ice cream servings sold in a county?
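These questions can also be answered with direct pandas expressions instead of reading them off the `describe()` table. The sketch below runs on a hypothetical toy DataFrame (made-up values); the same expressions would apply to `data`. One subtlety: market size is a county-level variable, so it is cleaner to keep one row per county before averaging. The `describe()` output is consistent with each of the 200 counties appearing exactly twice, in which case the row-level mean coincides with the county-level mean, but deduplicating makes the unit of observation explicit. Likewise, the `sales` median in `describe()` is per brand-county row; if question 4 means total super-premium servings per county, sum by county first.

```python
import pandas as pd

# Hypothetical toy data in the same long format as data.csv (made-up values):
# one row per brand per county.
toy = pd.DataFrame({
    'county':      [1, 1, 2, 2, 3, 3],
    'Häagen-Dazs': [1, 0, 1, 0, 1, 0],
    'price':       [2.30, 2.50, 2.40, 2.60, 2.20, 2.70],
    'sales':       [10, 25, 6, 20, 8, 30],
    'county size': [200, 200, 150, 150, 250, 250],
})

n_counties = toy['county'].nunique()          # number of distinct markets
avg_price = toy['price'].mean()               # average price over all brand-county rows

# Market size is a county-level variable: keep one row per county first.
county_level = toy.drop_duplicates(subset='county')
avg_size = county_level['county size'].mean()
max_size = county_level['county size'].max()

median_sales = toy['sales'].median()          # median servings per brand-county row
# Alternative reading: median of total super-premium servings per county.
median_county_total = toy.groupby('county')['sales'].sum().median()

print(n_counties, round(avg_price, 2), avg_size, max_size, median_sales, median_county_total)
```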


---





**Market shares**

In [5]:
data['s'] = data['sales']/data['county size']

Calculate the market share of each brand (Häagen-Dazs, Ben & Jerry's) in each county as the number of servings it sells divided by the county size.
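In symbols, with $j$ the county and $k$ an illustrative brand index (not necessarily the slides' notation), the share computed above is given below. For example, a hypothetical brand selling 14 servings per week in a county of size 200 has a market share of 7%.

$$
s_{kj} = \frac{\text{sales}_{kj}}{\text{county size}_j},
\qquad \text{e.g.} \qquad \frac{14}{200} = 0.07 .
$$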


---



**Market share and price comparison**

In [14]:
data.groupby('Häagen-Dazs')[['price','s']].describe().loc[:, (slice(None), ['count', 'mean', 'std'])]

| Häagen-Dazs | price: count | price: mean | price: std | s: count | s: mean  | s: std   |
|-------------|--------------|-------------|------------|----------|----------|----------|
| 0           | 200.0        | 2.525192    | 0.380305   | 200.0    | 0.126976 | 0.040995 |
| 1           | 200.0        | 2.357216    | 0.372424   | 200.0    | 0.046823 | 0.019804 |


Compare Häagen-Dazs and Ben & Jerry's in terms of their prices and market shares across counties.

1.   In how many counties is Häagen-Dazs ice cream sold? Ben \& Jerry's?
2.   Which ice cream brand is more expensive? Which has the larger market share?
3.   Which ice cream brand do you think consumers prefer?
4.   For a given brand, do market shares vary across counties? If so, why do you think that is?
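As a sketch of how these comparisons can be computed, the cell below uses a hypothetical toy DataFrame (made-up values). `GroupBy.agg(['count', 'mean', 'std'])` produces the same (variable, statistic) column layout as the `describe()` plus `.loc` slicing above, without the slicing step (counts come out as integers). For question 1, `nunique` counts the distinct counties in which each brand is sold. For question 4, grouping shares by brand and `top 10% income` is just one possible way to explore cross-county variation, not the notebook's prescribed approach.

```python
import pandas as pd

# Hypothetical toy data (made-up values), one row per brand per county.
toy = pd.DataFrame({
    'county':         [1, 1, 2, 2, 3, 3],
    'Häagen-Dazs':    [1, 0, 1, 0, 1, 0],
    'price':          [2.30, 2.50, 2.40, 2.60, 2.20, 2.70],
    's':              [0.05, 0.125, 0.04, 0.13, 0.03, 0.12],
    'top 10% income': [0, 0, 1, 1, 0, 0],
})

# Same (variable, statistic) column layout as describe() + .loc slicing.
summary = toy.groupby('Häagen-Dazs')[['price', 's']].agg(['count', 'mean', 'std'])
print(summary)

# Question 1: number of distinct counties in which each brand is sold.
counties_per_brand = toy.groupby('Häagen-Dazs')['county'].nunique()
print(counties_per_brand)

# Question 4 (one way to explore it): mean share by brand and income group.
by_income = toy.groupby(['Häagen-Dazs', 'top 10% income'])['s'].mean()
print(by_income)
```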


---



**Market concentration**

Häagen-Dazs and Ben & Jerry's are the only products in the market for 'super-premium' ice cream. Let's calculate the Herfindahl-Hirschman Index (HHI) of this market from each brand's total sales across all counties.


In [19]:
hd = data[data['Häagen-Dazs'] == 1]['sales'].sum()  # total Häagen-Dazs servings, all counties
bj = data[data['Häagen-Dazs'] == 0]['sales'].sum()  # total Ben & Jerry's servings, all counties
10000*(hd**2 + bj**2)/(hd + bj)**2                   # HHI: sum of squared shares, in percent

6048.01552372434


Is the market highly concentrated, moderately concentrated or unconcentrated?
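A general version of the calculation above: the HHI is the sum of squared market shares expressed in percent, so with two firms selling $q_1$ and $q_2$ it equals $10000\,(q_1^2 + q_2^2)/(q_1 + q_2)^2$, which is what the cell computes. The sketch below defines a hypothetical helper `hhi` that works for any number of firms, plus a hypothetical `concentration_level` classifier using the cutoffs of the 2010 U.S. Horizontal Merger Guidelines (below 1,500 unconcentrated, 1,500 to 2,500 moderately concentrated, above 2,500 highly concentrated). Whether the course uses these or another set of thresholds is an assumption here.

```python
def hhi(quantities):
    """Herfindahl-Hirschman Index from firm quantities (e.g., servings sold).

    Shares are expressed in percent, so the index ranges from 10000/N
    (N equal-sized firms) up to 10000 (monopoly).
    """
    total = sum(quantities)
    return sum((100 * q / total) ** 2 for q in quantities)


def concentration_level(index):
    """Classify an HHI with the 2010 U.S. Horizontal Merger Guidelines cutoffs
    (an assumption; other guidelines use different thresholds)."""
    if index < 1500:
        return 'unconcentrated'
    if index <= 2500:
        return 'moderately concentrated'
    return 'highly concentrated'


# Hypothetical example: two firms selling 300 and 100 units (shares 75% and 25%).
example = hhi([300, 100])
print(example, concentration_level(example))   # 6250.0 highly concentrated
```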


---



**'Outside good' market share**

In [None]:
data['s0'] = 1 - data.groupby(['county'])['s'].transform('sum')

Define the 'outside good' market share for each county as one minus the combined market share of the two brands. Here the 'outside good' is any ice cream sold other than Ben & Jerry's or Häagen-Dazs.
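`transform('sum')` returns each county's total inside share aligned with every row of that county, so both brand rows of a county receive the same outside share $s_{0j} = 1 - \sum_k s_{kj}$ (with $k$ an illustrative brand index). A small check on hypothetical toy data with made-up numbers:

```python
import pandas as pd

# Hypothetical toy data: two counties, two brands each (made-up numbers).
toy = pd.DataFrame({
    'county':      [1, 1, 2, 2],
    'Häagen-Dazs': [1, 0, 1, 0],
    'sales':       [10, 30, 30, 60],
    'county size': [200, 200, 300, 300],
})
toy['s'] = toy['sales'] / toy['county size']

# transform('sum') broadcasts each county's total inside share back to its rows,
# so the outside share is identical for both rows of a county.
toy['s0'] = 1 - toy.groupby('county')['s'].transform('sum')
print(toy[['county', 's', 's0']])
```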

Transform the market shares to back out the **mean utilities**. The transformed market share is going to be the **dependent** variable in the OLS regression we run next.
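Why this transformation yields mean utilities: a minimal derivation, assuming a multinomial logit demand model in which the outside good's mean utility is normalized to zero. The notation is illustrative: $k$ indexes brands, $j$ counties, $\delta_{kj}$ is mean utility, $\text{HD}_k$ is the Häagen-Dazs dummy, and $\beta_0$, $\alpha$, $\gamma$, $\xi_{kj}$ are hypothetical symbols for the constant, price coefficient, Häagen-Dazs coefficient, and unobserved demand term. Dividing an inside share by the outside share cancels the common denominator, and taking logs leaves $\delta_{kj}$, which the regression models as linear in price and the brand dummy.

$$
s_{kj} = \frac{e^{\delta_{kj}}}{1 + \sum_{l} e^{\delta_{lj}}},
\qquad
s_{0j} = \frac{1}{1 + \sum_{l} e^{\delta_{lj}}}
\quad\Longrightarrow\quad
\ln s_{kj} - \ln s_{0j} = \delta_{kj},
$$

$$
\delta_{kj} = \beta_0 + \alpha\,\text{price}_{kj} + \gamma\,\text{HD}_k + \xi_{kj}.
$$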

In [None]:
Y = np.log(data['s']) - np.log(data['s0'])
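A quick numerical check of that inversion, assuming logit shares: pick hypothetical mean utilities for two brands in one county, compute the implied logit shares, and confirm that the same log-share difference used for `Y` returns the utilities.

```python
import numpy as np

# Hypothetical mean utilities for two brands in one county (made-up values).
delta = np.array([-2.5, -1.5])

# Logit shares with an outside good whose mean utility is normalized to 0.
denom = 1 + np.exp(delta).sum()
s = np.exp(delta) / denom      # inside-good shares
s0 = 1 / denom                 # outside-good share

# Same transformation as the dependent variable Y.
recovered = np.log(s) - np.log(s0)
print(recovered)               # approximately [-2.5 -1.5]
```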

In the OLS regression, the **independent** variables are going to be the price and the Häagen-Dazs dummy variable, plus a constant.

In [None]:
X = sm.add_constant(data[['price', 'Häagen-Dazs']])

In [None]:
ols = sm.OLS(Y, X)
ols_result = ols.fit(cov_type='HC3')  # heteroskedasticity-robust (HC3) standard errors
ols_result.summary()