cereal project

In [2]:
import pandas as pd
from linearmodels import OLS
import numpy as np

In [3]:
df_data = pd.read_csv("CerealData.csv", index_col=0)
df_data.head()

j,Manufacturer,Brand,Price,MarketShare,Adult,Kids,Calories,Fat,Sugar
1,Kellogg's,Corn Flakes,1.81,5.67,0,0,100,0.0,2.0
2,General Mills,Cheerios,3.16,4.38,0,0,110,2.0,1.0
3,Kellogg's,Rice Krispies,2.96,4.04,0,0,120,0.0,3.0
4,Kellogg's,Frosted Flakes,2.52,3.82,0,0,120,0.0,13.0
5,Kellogg's,Raisin Bran,2.34,2.73,0,0,200,1.5,18.0


In [4]:
covariates = ['Price', 'Adult', 'Kids', 'Calories', 'Fat', 'Sugar']

s0 = 100 - df_data.MarketShare.sum()
df_data['log_ratio_market_shares'] = np.log(df_data.MarketShare / s0)
df_data.head()

j,Manufacturer,Brand,Price,MarketShare,Adult,Kids,Calories,Fat,Sugar,log_ratio_market_shares
1,Kellogg's,Corn Flakes,1.81,5.67,0,0,100,0.0,2.0,-1.452402
2,General Mills,Cheerios,3.16,4.38,0,0,110,2.0,1.0,-1.710543
3,Kellogg's,Rice Krispies,2.96,4.04,0,0,120,0.0,3.0,-1.791347
4,Kellogg's,Frosted Flakes,2.52,3.82,0,0,120,0.0,13.0,-1.847341
5,Kellogg's,Raisin Bran,2.34,2.73,0,0,200,1.5,18.0,-2.18329


In [5]:
# Run regression of log(s_j / s_0) on a constant and the product characteristics
formula_str = 'log_ratio_market_shares ~ 1 + Price + Adult + Kids + Calories + Fat + Sugar'
model = OLS.from_formula(formula_str, df_data)

res = model.fit()
print(res.summary)
beta_price = res.params.Price

                               OLS Estimation Summary                              
Dep. Variable:     log_ratio_market_shares   R-squared:                      0.3486
Estimator:                             OLS   Adj. R-squared:                 0.2577
No. Observations:                       50   F-statistic:                    20.666
Date:                     Thu, Jan 27 2022   P-value (F-stat)                0.0021
Time:                             01:14:49   Distribution:                  chi2(6)
Cov. Estimator:                     robust                                         
                                                                                   
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept     -2.2186     0.6094    -3.6407     0.0003     -3.4130     -1.0242
Price       

The β parameter estimates are:
$$β_0=-2.2186$$
$$β_{Price}=-0.0804$$
$$β_{Adult}=-0.5748$$
$$β_{Kids}=-0.5037$$
$$β_{Calories}=0.0024$$
$$β_{Fat}=0.0113$$
$$β_{Sugar}=-0.0423$$

Two reasons this might not be true:
(1) OLS assumes that the residual is uncorrelated with the regressors, but our error term may be correlated with at least one of them. The prices we observe are not randomly chosen; they are the prices firms chose because they maximize profits. Price P is then a function of c, or of some subset of c, so c and P are correlated and the OLS assumption does not hold.
(2) Prices reflect costs, and manufacturers know more about demand than we do; that knowledge also drives price.
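The first point can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the true price coefficient, the unobserved demand shifter `xi`, and the pricing rule are all hypothetical and are not estimated from the cereal data.

```python
import numpy as np

# Hypothetical simulation: every number below is made up for illustration.
rng = np.random.default_rng(0)
n = 10_000
true_beta = -1.0

# Unobserved demand shifter that ends up in the regression error term.
xi = rng.normal(size=n)
# Firms observe xi and charge more for products with higher unobserved demand.
price = 2.0 + 0.5 * xi + 0.5 * rng.normal(size=n)
# Mean utility, the dependent variable of the regression.
y = true_beta * price + xi

# OLS of y on a constant and price.
X = np.column_stack([np.ones(n), price])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"true price coefficient: {true_beta:.2f}")
print(f"OLS estimate:           {beta_hat[1]:.2f}")
```

In this design price is positively correlated with the error, so the OLS slope is biased upward and demand looks less price-sensitive than it really is.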

$$\eta_{jj}=\beta_nX_{nj}(1-S_j)$$ 
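
As a sketch of where this formula comes from, take the characteristic n to be price and use the standard logit own-price derivative $\partial S_j/\partial P_j=\beta_{price}S_j(1-S_j)$:

$$\eta_{jj}=\frac{\partial S_j}{\partial P_j}\frac{P_j}{S_j}=\beta_{price}S_j(1-S_j)\frac{P_j}{S_j}=\beta_{price}P_j(1-S_j)$$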

In [7]:
# Creating vector of own-price demand elasticities
eta = (beta_price * df_data.Price * (1-df_data.MarketShare/100))
eta.mean()

-0.23646783421275863

Demand seems inelastic because the mean own-price elasticity, about -0.236, is less than one in absolute value.

In [8]:
# Creating the eV for each j, i.e. exp(V_j). Notice that exp(V_j) = sj/s0
df_data['eV'] = df_data.MarketShare / s0
df_data.head()

j,Manufacturer,Brand,Price,MarketShare,Adult,Kids,Calories,Fat,Sugar,log_ratio_market_shares,eV
1,Kellogg's,Corn Flakes,1.81,5.67,0,0,100,0.0,2.0,-1.452402,0.234007
2,General Mills,Cheerios,3.16,4.38,0,0,110,2.0,1.0,-1.710543,0.180768
3,Kellogg's,Rice Krispies,2.96,4.04,0,0,120,0.0,3.0,-1.791347,0.166735
4,Kellogg's,Frosted Flakes,2.52,3.82,0,0,120,0.0,13.0,-1.847341,0.157656
5,Kellogg's,Raisin Bran,2.34,2.73,0,0,200,1.5,18.0,-2.18329,0.11267


In [9]:
# create the eV for Kellogg's Raisin Bran
eV_KRB = df_data.eV.loc[5]

In [10]:
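# Counterfactual market shares (in percent) with Kellogg's Raisin Bran taken out of the denominator.
# Row 5 (Kellogg's Raisin Bran itself) is still in the Series; it is subtracted in the check below.
sj_NoKRB = 100 * df_data.eV / (1 + sum(df_data.eV) - eV_KRB)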
sj_NoKRB.sum()

77.89657653952914

In [11]:
df_data['ratio'] = sj_NoKRB / df_data.MarketShare
print(df_data.ratio.describe())

# Check that new market shares sum to 100, after including additional substitution to the outside option.
sum(sj_NoKRB)-sj_NoKRB[5] + np.mean(df_data.ratio) * s0 

count    5.000000e+01
mean     1.028066e+00
std      8.988764e-16
min      1.028066e+00
25%      1.028066e+00
50%      1.028066e+00
75%      1.028066e+00
max      1.028066e+00
Name: ratio, dtype: float64


99.99999999999997

minimum: 1.028066e+00
maximum: 1.028066e+00
mean: 1.028066e+00

In [12]:
# finding Post Raisin Bran
df_data[df_data.Manufacturer=='Post']

j,Manufacturer,Brand,Price,MarketShare,Adult,Kids,Calories,Fat,Sugar,log_ratio_market_shares,eV,ratio
9,Post,Grape Nuts,2.14,2.12,1,0,200,1.0,7.0,-2.436175,0.087495,1.028066
16,Post,Raisin Bran,2.23,1.46,0,0,190,1.0,20.0,-2.809155,0.060256,1.028066
34,Post,Honey Bunches of Oats,2.85,0.95,1,0,125,2.2,6.0,-3.238885,0.039208,1.028066
35,Post,Great Grains,2.9,0.89,1,0,215,5.5,10.5,-3.304125,0.036731,1.028066
44,Post,Fruity Pebbles,3.32,0.83,0,1,110,1.0,12.0,-3.373921,0.034255,1.028066
49,Post,Honeycomb,3.4,0.74,0,1,110,0.0,11.0,-3.488697,0.030541,1.028066


ratio for Post Raisin Bran is 1.028066.

In [13]:
# finding Kellogg's Corn Pops
kellogg = df_data[df_data.Manufacturer == "Kellogg's"] 
corn_pops = kellogg[kellogg.Brand=='Corn Pops'] 
corn_pops

j,Manufacturer,Brand,Price,MarketShare,Adult,Kids,Calories,Fat,Sugar,log_ratio_market_shares,eV,ratio
18,Kellogg's,Corn Pops,3.51,1.46,0,0,120,0.0,14.0,-2.809155,0.060256,1.028066


ratio for Kellogg's Corn Pops is 1.028066.

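The ratios for Post Raisin Bran and Kellogg's Corn Pops are the same. This is not realistic: Post Raisin Bran is a much closer substitute for Kellogg's Raisin Bran than Corn Pops is, so we would expect it to gain proportionally more. The logit model forces proportional substitution (the independence of irrelevant alternatives property), so every remaining brand gains the same percentage.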
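
The common ratio follows directly from the logit share formula: dropping a product with share s rescales every remaining share by 1 / (1 - s). The sketch below checks this on a hypothetical three-product market. The function name and the toy shares are assumptions for illustration; only the 2.73% share of Kellogg's Raisin Bran comes from the data.

```python
import numpy as np

def shares_after_removal(shares, removed):
    """Logit shares (as fractions) of the remaining products after dropping one."""
    shares = np.asarray(shares, dtype=float)
    s0 = 1.0 - shares.sum()              # outside-option share
    ev = shares / s0                     # exp(V_j) = s_j / s_0
    keep = np.ones(len(shares), dtype=bool)
    keep[removed] = False
    return ev[keep] / (1.0 + ev[keep].sum()), keep

# Hypothetical three-product market (shares are made up).
old = np.array([0.30, 0.20, 0.10])
new, keep = shares_after_removal(old, removed=2)
ratio = new / old[keep]

print(ratio)                  # every surviving product scales by the same factor
print(1.0 / (1.0 - old[2]))   # closed form: 1 / (1 - share of the removed product)

# Same closed form with the 2.73% share of Kellogg's Raisin Bran.
krb_ratio = 1.0 / (1.0 - 0.0273)
print(krb_ratio)
```

The last value agrees with the ratio of 1.028066 found for every brand.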

$$\max_{P_j}\{S_j(P_j)(P_j-c_j)\}$$
$$\frac{\partial \pi_j}{\partial P_j}=S_j(P_j)+\frac{\partial S_j}{\partial P_j}(P_j-c_j)=0$$
$$\frac{\partial S_j}{\partial P_j}=\frac{\exp(V_j)\beta_{price}[\exp(V_0)+\sum_k \exp(V_k)]-\exp(V_j)\exp(V_j)\beta_{price}}{[\exp(V_0)+\sum_k \exp(V_k)]^2}$$
$$=\beta_{price}S_j(1-S_j)$$
$$c_j=P_j+\frac{S_j}{\beta_{price}S_j(1-S_j)}$$

In [14]:
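# Convert market shares from percent to fractions (run this cell only once: it overwrites the column)
df_data.MarketShare = df_data.MarketShare / 100
# Marginal cost implied by the first-order condition: c_j = P_j + S_j / (beta_price * S_j * (1 - S_j))
cj = df_data.Price + df_data.MarketShare / (beta_price * df_data.MarketShare * (1 - df_data.MarketShare))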

print(cj.describe())

(df_data.Price-cj).describe()


count    50.000000
mean     -9.646590
std       0.548273
min     -11.376702
25%      -9.971412
50%      -9.569460
75%      -9.225750
max      -8.621183
dtype: float64


count    50.000000
mean     12.631790
std       0.134683
min      12.522920
25%      12.550717
50%      12.583729
75%      12.634215
max      13.186702
dtype: float64

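The implied marginal costs are negative because estimated demand is inelastic. Wherever demand is inelastic, marginal revenue is negative, so the first-order condition (marginal revenue equals marginal cost) can only hold at a negative marginal cost. We already found that the mean own-price elasticity is about -0.236.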
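
A short worked example may help. The first-order condition implies a markup of P - c = -1 / (β_price (1 - S)), which equals P / |η|, so the markup exceeds the price whenever |η| < 1. The sketch below uses the Corn Flakes row and the rounded price coefficient reported above; because of that rounding, its numbers only approximate the notebook output.

```python
# Assumed inputs: beta_price is the rounded estimate reported above;
# price and share come from the Corn Flakes row of the data.
beta_price = -0.0804
price = 1.81
share = 5.67 / 100

elasticity = beta_price * price * (1 - share)   # own-price elasticity
markup = -1.0 / (beta_price * (1 - share))      # P - c from the first-order condition
marginal_cost = price - markup

print(f"elasticity    = {elasticity:.3f}")
print(f"markup        = {markup:.2f}")
print(f"marginal cost = {marginal_cost:.2f}")
```

Because the elasticity is well below one in absolute value, the markup is several times the price and the implied marginal cost is negative.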

$$\max_{P_j}\ \pi=\sum_{k=1}^{\text{MarketTotal}}S_k(P_k)(P_k-c_k)$$

Each manufacturer must consider the market share of each cereal, the cost of producing each brand, and the appropriate price to charge.

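Prices will be lower when manufacturers set the price of each brand separately than when they set them jointly. This follows from comparing the two first-order conditions: under joint pricing, a manufacturer accounts for the fact that raising one brand's price shifts some buyers to its other brands, which adds a positive term and pushes the optimal price up.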
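
The comparison can be sketched as follows, assuming the logit cross-price derivative $\partial S_k/\partial P_j=-\beta_{price}S_jS_k$ for $k\neq j$ and writing $F$ for the set of brands owned by one manufacturer (notation introduced here for illustration). The joint first-order condition for $P_j$ is

$$\frac{\partial \pi}{\partial P_j}=S_j+\frac{\partial S_j}{\partial P_j}(P_j-c_j)+\sum_{k\in F,\,k\neq j}\frac{\partial S_k}{\partial P_j}(P_k-c_k)=0$$

With $\beta_{price}<0$ and positive markups, every term in the last sum is positive. The single-brand condition drops that sum, so at the separately set price the joint derivative is still positive and the jointly set price is higher.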