# Probability and Utility
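This notebook estimates multinomial logit models of the utilities and choice probabilities of four competing ketchup brands (heinz, hunts, delmonte, and stb). It then uses the best-fitting model to predict choice probabilities and compute price marginal effects.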

## Data Exploration

In [1]:
DF <- read.csv("ketchup_data.csv")
head(DF)

shopper_id,trip_id,choice,price.heinz,price.hunts,price.delmonte,price.stb,southeast
1,1,heinz,1.19,1.39,1.49,0.89,1
1,2,heinz,0.99,1.36,1.39,0.95,1
1,3,hunts,1.46,1.43,1.49,0.99,1
1,4,hunts,1.46,1.43,1.45,0.99,1
1,5,stb,1.46,1.36,1.39,0.95,1
2,1,heinz,0.99,1.36,1.47,0.95,0


In [2]:
# compute market shares
table(DF$choice) / nrow(DF)


  delmonte      heinz      hunts        stb 
0.05165456 0.50968523 0.20560936 0.23305085 

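Note that the data are in wide form (one row per shopping trip, with one price column per brand) and must be transformed to long form (one row per trip and brand) before estimation. The alternative-specific variable 'price' is stored in columns 4:7.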

In [6]:
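# transform wide data to long form: one row per (choice situation, brand)
library(mlogit)
# columns 4:7 are price.heinz, price.hunts, price.delmonte, price.stb;
# sep = "." splits each name into the variable ('price') and the alternative
DF_long <- mlogit.data(DF, shape = "wide", choice = "choice", sep = ".", varying = 4:7)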
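To make the wide-to-long step concrete, here is a base-R sketch of the same reshaping applied to the first two rows of `head(DF)`. It uses `stats::reshape()` rather than `mlogit.data()`, so it only illustrates the resulting layout; `mlogit.data()` also attaches index information that `mlogit()` relies on, so `DF_long` should still be built as above. The names `wide`, `long`, `obs`, `alt`, and `chosen` are our own.

```r
# First two rows of the ketchup data, copied from head(DF) above
wide <- data.frame(
  shopper_id = c(1, 1),
  trip_id = c(1, 2),
  choice = c("heinz", "heinz"),
  price.heinz = c(1.19, 0.99),
  price.hunts = c(1.39, 1.36),
  price.delmonte = c(1.49, 1.39),
  price.stb = c(0.89, 0.95),
  southeast = c(1, 1)
)

# Stack the four price columns (4:7) into a single 'price' column;
# 'alt' records the brand each row refers to, 'obs' the choice situation
long <- reshape(wide, direction = "long", varying = 4:7, sep = ".",
                timevar = "alt", idvar = "obs")

# TRUE on the row of the brand actually chosen in that situation
long$chosen <- long$choice == long$alt

# One block of four rows per choice situation
long <- long[order(long$obs, long$alt), ]
long[, c("obs", "alt", "price", "chosen", "southeast")]
```

Each trip becomes four rows, exactly one of which is marked as chosen; that is the structure `mlogit()` expects.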

## Modeling and Comparison

In [7]:
# model with intercepts only
logit1 <- mlogit(choice ~ 0 | 1 | 0, data = DF_long)
summary(logit1)


Call:
mlogit(formula = choice ~ 0 | 1 | 0, data = DF_long, method = "nr", 
    print.level = 0)

Frequencies of alternatives:
delmonte    heinz    hunts      stb 
0.051655 0.509685 0.205609 0.233051 

nr method
5 iterations, 0h:0m:0s 
g'(-H)^-1g = 4.02E-08 
gradient close to zero 

Coefficients :
                  Estimate Std. Error z-value  Pr(>|z|)    
heinz:(intercept) 2.289215   0.065591  34.901 < 2.2e-16 ***
hunts:(intercept) 1.381400   0.069911  19.759 < 2.2e-16 ***
stb:(intercept)   1.506678   0.069080  21.811 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -5755.1
McFadden R^2:  0 
Likelihood ratio test : chisq = 0 (p.value = 1)

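Delmonte is the baseline alternative, so its intercept is normalized to 0 and every other intercept is measured relative to it.

The heinz intercept (2.29) is the mean utility of heinz relative to delmonte. In this intercepts-only model it also equals the log-odds of choosing heinz over delmonte.

The hunts intercept (1.38) is the utility, and log-odds, of choosing hunts over delmonte.

The stb intercept (1.51) is the utility, and log-odds, of choosing stb over delmonte.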
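This interpretation can be checked by hand. In a logit with only alternative-specific intercepts, the maximum-likelihood intercept for each brand equals the log of its market share divided by the baseline (delmonte) share. A minimal sketch, using the shares printed by `table(DF$choice) / nrow(DF)` above:

```r
# Observed market shares, copied from the table() output above
shares <- c(delmonte = 0.05165456, heinz = 0.50968523,
            hunts = 0.20560936, stb = 0.23305085)

# Intercept-only logit: intercept_j = log(share_j / share_delmonte)
log_odds <- log(shares[-1] / shares["delmonte"])
round(log_odds, 4)
```

The results match the `logit1` estimates for heinz, hunts, and stb up to rounding.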

In [8]:
# model with intercepts and price
logit2 <- mlogit(choice ~ price | 1 | 0, data = DF_long)
summary(logit2)


Call:
mlogit(formula = choice ~ price | 1 | 0, data = DF_long, method = "nr", 
    print.level = 0)

Frequencies of alternatives:
delmonte    heinz    hunts      stb 
0.051655 0.509685 0.205609 0.233051 

nr method
5 iterations, 0h:0m:0s 
g'(-H)^-1g = 7.23E-07 
gradient close to zero 

Coefficients :
                   Estimate Std. Error  z-value  Pr(>|z|)    
heinz:(intercept)  1.743609   0.069922  24.9365 < 2.2e-16 ***
hunts:(intercept)  1.070434   0.074099  14.4459 < 2.2e-16 ***
stb:(intercept)   -0.520243   0.086088  -6.0431 1.511e-09 ***
price             -4.321122   0.112277 -38.4861 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -4896.7
McFadden R^2:  0.14914 
Likelihood ratio test : chisq = 1716.7 (p.value = < 2.22e-16)

Holding everything else fixed, a one-dollar increase in a brand's price lowers that brand's utility by 4.32, and therefore lowers the log-odds of choosing it over any other brand by 4.32.
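The log-odds claim can be illustrated with the `logit2` estimates. The sketch below computes logit probabilities as a softmax of the systematic utilities at the prices from the first row of `head(DF)`, raises the price of heinz by 10 cents, and shows that the log-odds of heinz against every rival fall by the same amount, 0.10 times the price coefficient. The helper name `mnl_prob` is our own.

```r
# Coefficients copied from summary(logit2); delmonte intercept fixed at 0
asc  <- c(delmonte = 0, heinz = 1.743609, hunts = 1.070434, stb = -0.520243)
beta <- -4.321122

# Prices from the first row of head(DF)
price <- c(delmonte = 1.49, heinz = 1.19, hunts = 1.39, stb = 0.89)

# Multinomial logit probabilities: softmax of V_j = asc_j + beta * price_j
mnl_prob <- function(price) {
  v <- asc + beta * price[names(asc)]
  exp(v) / sum(exp(v))
}

p0 <- mnl_prob(price)
p1 <- mnl_prob(replace(price, "heinz", price["heinz"] + 0.10))  # heinz +10 cents

# Change in log-odds of heinz versus delmonte, hunts, and stb
d <- log(p1["heinz"] / p1[-2]) - log(p0["heinz"] / p0[-2])
d
```

Every entry of `d` equals `beta * 0.10`, because the other brands' utilities cancel out of each log-odds ratio.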

In [9]:
# model with intercepts, price, and southeast
logit3 <- mlogit(choice ~ price | southeast + 1 | 0, data = DF_long)
summary(logit3)


Call:
mlogit(formula = choice ~ price | southeast + 1 | 0, data = DF_long, 
    method = "nr", print.level = 0)

Frequencies of alternatives:
delmonte    heinz    hunts      stb 
0.051655 0.509685 0.205609 0.233051 

nr method
7 iterations, 0h:0m:0s 
g'(-H)^-1g = 1.05E-06 
successive function values within tolerance limits 

Coefficients :
                   Estimate Std. Error  z-value  Pr(>|z|)    
heinz:(intercept)  1.732663   0.072985  23.7399 < 2.2e-16 ***
hunts:(intercept) -0.037445   0.090357  -0.4144    0.6786    
stb:(intercept)   -0.799171   0.092545  -8.6355 < 2.2e-16 ***
price             -4.583948   0.124274 -36.8859 < 2.2e-16 ***
heinz:southeast   -0.162705   0.309039  -0.5265    0.5986    
hunts:southeast    3.776106   0.299897  12.5914 < 2.2e-16 ***
stb:southeast      1.529874   0.299641   5.1057 3.296e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-Likelihood: -4206.1
McFadden R^2:  0.26915 
Likelihood ratio test : chisq = 3098 (p.value

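Consumers in southeastern states receive 3.78 more utility from hunts, relative to delmonte, than consumers in other states. Equivalently, southeastern consumers' log-odds of choosing hunts over delmonte are 3.78 higher.

The stb coefficient is interpreted the same way: southeastern consumers' log-odds of choosing stb over delmonte are 1.53 higher. The heinz:southeast coefficient (-0.16) is not statistically significant (p = 0.60), so there is no evidence of a regional difference for heinz.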
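To see how the region-specific coefficients enter, the sketch below uses the `logit3` estimates to compute choice probabilities for the same prices (first row of `head(DF)`) for a hypothetical shopper outside the Southeast and one inside it. The vector name `gamma_se` and helper `prob3` are our own.

```r
# Coefficients copied from summary(logit3); delmonte is the baseline (all zeros)
asc      <- c(delmonte = 0, heinz = 1.732663, hunts = -0.037445, stb = -0.799171)
gamma_se <- c(delmonte = 0, heinz = -0.162705, hunts = 3.776106, stb = 1.529874)
beta     <- -4.583948

# Prices from the first row of head(DF)
price <- c(delmonte = 1.49, heinz = 1.19, hunts = 1.39, stb = 0.89)

# Logit probabilities given the southeast dummy (0 or 1)
prob3 <- function(southeast) {
  v <- asc + gamma_se * southeast + beta * price[names(asc)]
  exp(v) / sum(exp(v))
}

p_other <- prob3(0)
p_south <- prob3(1)

# Regional shift in the log-odds of hunts over delmonte
shift <- log(p_south["hunts"] / p_south["delmonte"]) -
  log(p_other["hunts"] / p_other["delmonte"])
shift
round(rbind(other = p_other, southeast = p_south), 3)
```

The log-odds shift reproduces the hunts:southeast coefficient exactly, while the implied probability changes depend on the prices and on all the other brands.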

This is the best-fitting of the three models: it has the highest (least negative) log-likelihood, -4206.1, and the highest McFadden R^2, 0.269.
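Because `logit2` is nested in `logit3` (the latter adds three southeast coefficients), the comparison can also be made formal with a likelihood-ratio test. A minimal sketch using the log-likelihoods printed in the two summaries; with the fitted objects in memory, `lmtest::lrtest(logit2, logit3)` should run the same test directly, though that is worth verifying against your installed versions:

```r
# Log-likelihoods copied from the summaries above
ll2 <- -4896.7  # logit2: 3 intercepts + price (4 parameters)
ll3 <- -4206.1  # logit3: adds 3 southeast coefficients (7 parameters)

lr_stat <- 2 * (ll3 - ll2)  # likelihood-ratio statistic
k <- 7 - 4                  # number of restrictions (extra parameters)
p_value <- pchisq(lr_stat, df = k, lower.tail = FALSE)

c(LR = lr_stat, df = k, p = p_value)
```

The statistic is far beyond any conventional chi-squared critical value with 3 degrees of freedom, so the southeast terms improve fit significantly.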

## Predicting Choice Probabilities

In [11]:
# create a matrix of fitted probabilities
P1 <- fitted(logit3, outcome = FALSE)
round(colMeans(P1), 4)

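The average predicted choice probabilities equal the observed market shares. This is a known property of the multinomial logit with alternative-specific intercepts: at the maximum-likelihood estimates, the first-order condition for each intercept sets the sum of that brand's predicted probabilities equal to the number of times it was chosen.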
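The property does not depend on the ketchup data. The sketch below simulates hypothetical choices among three alternatives from assumed "true" parameters, fits a conditional logit with two intercepts and a price coefficient by maximizing the log-likelihood with `optim()` (a stand-in for `mlogit()`), and compares average fitted probabilities with observed shares.

```r
set.seed(42)
n <- 2000
J <- 3

# Hypothetical simulated prices: one per alternative and choice situation
price <- matrix(runif(n * J, min = 1, max = 2), nrow = n, ncol = J)

# Logit probabilities for theta = (asc2, asc3, beta); alternative 1 is the baseline
probs <- function(theta) {
  v <- sweep(theta[3] * price, 2, c(0, theta[1:2]), "+")
  exp(v) / rowSums(exp(v))
}

# Draw choices from assumed true parameters, then build 0/1 choice indicators
y <- apply(probs(c(0.5, -0.3, -2)), 1, function(pr) sample(J, 1, prob = pr))
D <- 1 * outer(y, seq_len(J), "==")

# Negative log-likelihood and its analytic gradient
neg_ll <- function(theta) -sum(D * log(probs(theta)))
neg_grad <- function(theta) {
  r <- D - probs(theta)                 # residuals d_ij - P_ij
  -c(colSums(r)[2:3], sum(r * price))   # intercept and price scores
}

fit <- optim(c(0, 0, 0), neg_ll, neg_grad, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 500))

avg_pred  <- colMeans(probs(fit$par))
obs_share <- colMeans(D)
round(rbind(avg_pred, obs_share), 4)
```

The intercept scores are exactly the column sums of the residuals, which is why setting them to zero forces the two rows to agree.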

In [14]:
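# average marginal effect of hunts' price on hunts' choice probability:
# dP_hunts / dp_hunts = beta * P_hunts * (1 - P_hunts); column 3 of P1 is hunts
me_hunt_PriceHunt <- mean(P1[,3] * (1 - P1[,3]) * logit3$coefficients['price'])
round(me_hunt_PriceHunt * 100, 2)

At the margin, raising the price of hunts lowers its choice probability at an average rate of 43.08 percentage points per dollar. Because this is a derivative, it is a linear approximation that is accurate only for small price changes.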
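The marginal effects in this section follow from differentiating the logit choice probability. In our own notation (not from the source), $i$ indexes choice situations, $j$ and $k$ index brands, and the delmonte intercept $\alpha$ and southeast coefficient $\gamma$ are normalized to 0:

$$
P_{ij} = \frac{e^{V_{ij}}}{\sum_{k} e^{V_{ik}}}, \qquad V_{ij} = \alpha_j + \beta\, p_{ij} + \gamma_j\, \text{southeast}_i
$$

$$
\frac{\partial P_{ij}}{\partial p_{ij}} = \beta\, P_{ij}\,(1 - P_{ij}), \qquad
\frac{\partial P_{ij}}{\partial p_{ik}} = -\beta\, P_{ij}\, P_{ik} \quad (k \neq j)
$$

$$
\widehat{\text{ME}}_{j} = \frac{1}{N}\sum_{i=1}^{N} \hat\beta\, \hat P_{ij}\,(1 - \hat P_{ij})
$$

The last expression, with $j$ = hunts, is the average own-price effect computed in cell 14; the cross-price derivative, averaged the same way, gives the effect of heinz's price on hunts.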

In [15]:
# average marginal effect of heinz's price on hunts' choice probability:
# dP_hunts / dp_heinz = -beta * P_heinz * P_hunts; column 2 of P1 is heinz, column 3 is hunts
me_hunt_PriceHeinz <- mean(-logit3$coefficients['price'] * P1[,2] * P1[,3])
round(me_hunt_PriceHeinz * 100, 2)

At the margin, raising the price of heinz increases the choice probability of hunts at an average rate of 20.47 percentage points per dollar.
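Both derivative formulas can be checked numerically. The sketch below uses the `logit3` estimates, the prices from the first row of `head(DF)`, and a hypothetical shopper outside the Southeast (so the southeast terms drop out), and compares central finite differences of the hunts probability with the analytic own- and cross-price expressions. Helper names are our own.

```r
# Coefficients copied from summary(logit3); delmonte intercept fixed at 0
asc  <- c(delmonte = 0, heinz = 1.732663, hunts = -0.037445, stb = -0.799171)
beta <- -4.583948

# Prices from the first row of head(DF); shopper assumed to be outside the Southeast
price <- c(delmonte = 1.49, heinz = 1.19, hunts = 1.39, stb = 0.89)

prob <- function(price) {
  v <- asc + beta * price[names(asc)]
  exp(v) / sum(exp(v))
}

# Move one brand's price by d dollars
shift <- function(brand, d) replace(price, brand, price[brand] + d)

p <- prob(price)
h <- 1e-5

# Central finite differences of P_hunts
own_fd   <- (prob(shift("hunts", h))["hunts"] - prob(shift("hunts", -h))["hunts"]) / (2 * h)
cross_fd <- (prob(shift("heinz", h))["hunts"] - prob(shift("heinz", -h))["hunts"]) / (2 * h)

# Analytic logit derivatives
own_formula   <- beta * p["hunts"] * (1 - p["hunts"])
cross_formula <- -beta * p["heinz"] * p["hunts"]

round(c(own_fd = unname(own_fd), own_formula = unname(own_formula),
        cross_fd = unname(cross_fd), cross_formula = unname(cross_formula)), 6)
```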

In [17]:
# approximate change in hunts' choice probability (percentage points) from a $0.25 increase in hunts' price
round(me_hunt_PriceHunt * 0.25 * 100, 2)

Increasing the price of hunts by $0.25 decreases its choice probability by approximately 10.77 percentage points (0.25 × 43.08).

In [18]:
# approximate change in hunts' choice probability (percentage points) from a $0.25 increase in heinz's price
round(me_hunt_PriceHeinz * 0.25 * 100, 2)

Increasing the price of heinz by $0.25 increases the choice probability of hunts by approximately 5.12 percentage points (0.25 × 20.47).
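Scaling a marginal effect by $0.25 is a linear approximation. For a single choice situation it can be compared with the exact change obtained by recomputing the probability at the new price. The sketch below does this with the `logit3` estimates, the prices from the first row of `head(DF)`, and a hypothetical shopper outside the Southeast.

```r
# Coefficients copied from summary(logit3); delmonte intercept fixed at 0
asc  <- c(delmonte = 0, heinz = 1.732663, hunts = -0.037445, stb = -0.799171)
beta <- -4.583948

# Prices from the first row of head(DF); shopper assumed to be outside the Southeast
price <- c(delmonte = 1.49, heinz = 1.19, hunts = 1.39, stb = 0.89)

prob <- function(price) {
  v <- asc + beta * price[names(asc)]
  exp(v) / sum(exp(v))
}

p0   <- prob(price)
p_up <- prob(replace(price, "hunts", price["hunts"] + 0.25))

linear <- beta * p0["hunts"] * (1 - p0["hunts"]) * 0.25  # marginal effect x $0.25
exact  <- p_up["hunts"] - p0["hunts"]                    # recomputed probability change

round(100 * c(linear = unname(linear), exact = unname(exact)), 2)  # percentage points
```

Here the hunts probability is below 0.5, where the logit curve is convex in own price, so the linear approximation overstates the drop; the gap grows with the size of the price change.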

In [19]:
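# approximate change in hunts' choice probability (percentage points) from a 1% increase
# in hunts' price, evaluated at the sample-mean price of hunts
round(mean(DF$price.hunts) * me_hunt_PriceHunt * 0.01 * 100, 2)

Increasing the price of hunts by 1% (evaluated at its sample-mean price) decreases its choice probability by approximately 0.58 percentage points.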
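The 1% calculation is a change in probability measured in percentage points, which differs from the own-price elasticity (the percent change in the probability itself). The sketch below computes both for one choice situation, again using the `logit3` estimates, the prices from the first row of `head(DF)`, and a hypothetical shopper outside the Southeast; the elasticity expression $\beta p (1 - P)$ is the standard logit result, not something computed in this notebook.

```r
# Coefficients copied from summary(logit3); delmonte intercept fixed at 0
asc  <- c(delmonte = 0, heinz = 1.732663, hunts = -0.037445, stb = -0.799171)
beta <- -4.583948

# Prices from the first row of head(DF); shopper assumed to be outside the Southeast
price <- c(delmonte = 1.49, heinz = 1.19, hunts = 1.39, stb = 0.89)

prob <- function(price) {
  v <- asc + beta * price[names(asc)]
  exp(v) / sum(exp(v))
}

p0 <- prob(price)
dp <- 0.01 * price["hunts"]  # a 1% price increase, in dollars

# Marginal effect times the price change, versus the recomputed change
approx_change <- beta * p0["hunts"] * (1 - p0["hunts"]) * dp
exact_change  <- prob(replace(price, "hunts", price["hunts"] + dp))["hunts"] - p0["hunts"]

# Own-price elasticity: % change in P_hunts per 1% change in hunts' price
elasticity <- beta * price["hunts"] * (1 - p0["hunts"])

round(100 * c(approx = unname(approx_change), exact = unname(exact_change)), 3)  # percentage points
round(unname(elasticity), 3)
```

Dividing the percentage-point change by the probability and by 1% recovers the elasticity, which is why the two measures can look very different when the probability is small.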