Suppose we have $i=1,\ldots,n$ consumers who each select exactly one product $j$ from a set of $J$ products. The outcome variable is the identity of the chosen product, $y_i \in \{1, \ldots, J\}$, or equivalently a vector of $J-1$ zeros and a single $1$ that marks the selected product. For example, if the third product was chosen out of 4 products, then either $y=3$ or $y=(0,0,1,0)$ depending on how we want to represent it. Suppose also that we have a vector of data on each product $x_j$ (e.g., size, price).

We model the consumer's decision as the selection of the product that provides the most utility, and we'll specify the utility function as a linear function of the product characteristics:

$$ U_{ij} = x_j'\beta + \epsilon_{ij} $$

where $\epsilon_{ij}$ is an i.i.d. extreme value error term. 

The choice of the i.i.d. extreme value error term leads to a closed-form expression for the probability that consumer $i$ chooses product $j$:

$$ \mathbb{P}_i(j) = \frac{e^{x_j'\beta}}{\sum_{k=1}^Je^{x_k'\beta}} $$

For example, if there are 4 products, the probability that consumer $i$ chooses product 3 is:

$$ \mathbb{P}_i(3) = \frac{e^{x_3'\beta}}{e^{x_1'\beta} + e^{x_2'\beta} + e^{x_3'\beta} + e^{x_4'\beta}} $$
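The choice-probability formula can be checked numerically with a minimal sketch. The function name `choice_probabilities`, the two characteristics (featured flag and price), and the coefficient values are all made up for illustration; they are not estimates from any dataset.

```python
import numpy as np

def choice_probabilities(X, beta):
    """Return MNL choice probabilities for J products.

    X    : (J, K) array, one row of characteristics per product
    beta : (K,) coefficient vector
    """
    v = X @ beta            # deterministic utilities x_j' beta
    v = v - v.max()         # subtracting the max leaves the ratios unchanged
    expv = np.exp(v)
    return expv / expv.sum()

# Hypothetical example: 4 products, characteristics = [featured, price per oz]
X_example = np.array([
    [0, 0.108],
    [0, 0.081],
    [1, 0.061],
    [0, 0.079],
])
beta_example = np.array([0.5, -30.0])   # assumed values, for illustration only

probs = choice_probabilities(X_example, beta_example)
print(probs)         # one probability per product
print(probs.sum())   # the probabilities sum to 1
```

Subtracting the largest utility before exponentiating is a common guard against overflow; it cancels in the ratio, so the probabilities are the same as in the formula.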

A clever way to write the individual likelihood function for consumer $i$ is the product of the $J$ probabilities, each raised to the power of an indicator variable ($\delta_{ij}$) that indicates the chosen product:

$$ L_i(\beta) = \prod_{j=1}^J \mathbb{P}_i(j)^{\delta_{ij}} = \mathbb{P}_i(1)^{\delta_{i1}} \times \ldots \times \mathbb{P}_i(J)^{\delta_{iJ}}$$

Notice that if the consumer selected product $j=3$, then $\delta_{i3}=1$ while $\delta_{i1}=\delta_{i2}=\delta_{i4}=0$ and the likelihood is:

$$ L_i(\beta) = \mathbb{P}_i(1)^0 \times \mathbb{P}_i(2)^0 \times \mathbb{P}_i(3)^1 \times \mathbb{P}_i(4)^0 = \mathbb{P}_i(3) = \frac{e^{x_3'\beta}}{\sum_{k=1}^Je^{x_k'\beta}} $$

The joint likelihood (across all consumers) is the product of the $n$ individual likelihoods:

$$ L_n(\beta) = \prod_{i=1}^n L_i(\beta) = \prod_{i=1}^n \prod_{j=1}^J \mathbb{P}_i(j)^{\delta_{ij}} $$

And the joint log-likelihood function is:

$$ \ell_n(\beta) = \sum_{i=1}^n \sum_{j=1}^J \delta_{ij} \log(\mathbb{P}_i(j)) $$
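To see the indicator trick at work, here is a small sketch with hypothetical choice probabilities for three consumers. The matrices `P` and `delta` are invented for illustration; in practice `P` would come from the choice-probability formula evaluated at some $\beta$.

```python
import numpy as np

# Hypothetical choice probabilities for n = 3 consumers and J = 4 products
P = np.array([
    [0.10, 0.20, 0.30, 0.40],
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
])

# One-hot indicators delta_ij: the consumers chose products 3, 2, and 1
delta = np.array([
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
])

# Individual likelihoods L_i = prod_j P_ij ** delta_ij
L_i = np.prod(P ** delta, axis=1)

# Joint log-likelihood: sum_i sum_j delta_ij * log(P_ij)
log_lik = np.sum(delta * np.log(P))

print(L_i)       # the probability of each consumer's chosen product
print(log_lik)   # equals the log of the product of the L_i
```

Raising each probability to a 0/1 power simply picks out the probability of the chosen product, which is why the double sum in the log-likelihood reduces to one term per consumer.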


We will use the `yogurt_data` dataset, which provides anonymized consumer identifiers (`id`), a vector indicating the chosen product (`y1`:`y4`), a vector indicating if any products were "featured" in the store as a form of advertising (`f1`:`f4`), and the products' prices (`p1`:`p4`). For example, consumer 1 purchased yogurt 4 at a price of 0.079/oz and none of the yogurts were featured/advertised at the time of consumer 1's purchase.  Consumers 2 through 7 each bought yogurt 2, etc.

_todo: import the data, maybe show the first few rows, and describe the data a bit._

In [1]:
import pandas as pd

yogurt = pd.read_csv('yogurt_data.csv')

yogurt.head()

Unnamed: 0,id,y1,y2,y3,y4,f1,f2,f3,f4,p1,p2,p3,p4
0,1,0,0,0,1,0,0,0,0,0.108,0.081,0.061,0.079
1,2,0,1,0,0,0,0,0,0,0.108,0.098,0.064,0.075
2,3,0,1,0,0,0,0,0,0,0.108,0.098,0.061,0.086
3,4,0,1,0,0,0,0,0,0,0.108,0.098,0.061,0.086
4,5,0,1,0,0,0,0,0,0,0.125,0.098,0.049,0.079


In [2]:
yogurt.shape

(2430, 13)

In [3]:
yogurt.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2430 entries, 0 to 2429
Data columns (total 13 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   id      2430 non-null   int64  
 1   y1      2430 non-null   int64  
 2   y2      2430 non-null   int64  
 3   y3      2430 non-null   int64  
 4   y4      2430 non-null   int64  
 5   f1      2430 non-null   int64  
 6   f2      2430 non-null   int64  
 7   f3      2430 non-null   int64  
 8   f4      2430 non-null   int64  
 9   p1      2430 non-null   float64
 10  p2      2430 non-null   float64
 11  p3      2430 non-null   float64
 12  p4      2430 non-null   float64
dtypes: float64(4), int64(9)
memory usage: 246.9 KB


In [4]:
yogurt.describe()

Unnamed: 0,id,y1,y2,y3,y4,f1,f2,f3,f4,p1,p2,p3,p4
count,2430.0,2430.0,2430.0,2430.0,2430.0,2430.0,2430.0,2430.0,2430.0,2430.0,2430.0,2430.0,2430.0
mean,1215.5,0.341975,0.401235,0.029218,0.227572,0.055556,0.039506,0.037449,0.037449,0.106248,0.081532,0.053622,0.079507
std,701.6249,0.474469,0.490249,0.168452,0.419351,0.229109,0.194836,0.189897,0.189897,0.020587,0.011047,0.008054,0.007714
min,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.012,0.0,0.025,0.004
25%,608.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.103,0.081,0.05,0.079
50%,1215.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.108,0.086,0.054,0.079
75%,1822.75,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.115,0.086,0.061,0.086
max,2430.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.193,0.111,0.086,0.104


The summary statistics show that the products' choice shares range from about 3% to 40%: yogurt 3 is chosen only 2.9% of the time, far less often than yogurt 4 (22.8%), yogurt 1 (34.2%), and yogurt 2 (40.1%). The standard deviations of the choice indicators are large relative to their means because each indicator is binary (1 if selected, 0 otherwise). The four means sum to 1, which is consistent with every consumer choosing exactly one product, and `info()` confirms that no column has missing values.

Each product was featured a similar share of the time (between 3.7% and 5.6%). Product 1 was featured most often (5.6%), yet it was not the most-selected product; that was product 2. The feature rates sum to well below 1, so on most purchase occasions no product was featured. The standard deviations are again large relative to the means because the indicators are binary and contain far more 0s than 1s.

Average prices per ounce range from approximately $0.05 (yogurt 3) to $0.11 (yogurt 1). Their standard deviations are small relative to the means, as expected for continuous rather than binary variables. The minimums include a negative price for yogurt 1 (-0.012) and a zero price for yogurt 2, which may be worth checking before modeling. Prices may be roughly bell-shaped, but the summary statistics alone cannot confirm this.

Let the vector of product features include brand dummy variables for yogurts 1-3 (we'll omit a dummy for product 4 to avoid multi-collinearity), a dummy variable to indicate if a yogurt was featured, and a continuous variable for the yogurts' prices:  

$$ x_j' = [\mathbf{1}_{\text{Yogurt 1}}, \mathbf{1}_{\text{Yogurt 2}}, \mathbf{1}_{\text{Yogurt 3}}, X_f, X_p] $$


The "hard part" of the MNL likelihood function is organizing the data, as we need to keep track of 3 dimensions (consumer $i$, covariate $k$, and product $j$) instead of the typical 2 dimensions for cross-sectional regression models (consumer $i$ and covariate $k$). 

What we would like to do is reorganize the data from a "wide" shape with $n$ rows and multiple columns for each covariate, to a "long" shape with $n \times J$ rows and a single column for each covariate.  As part of this re-organization, we'll add binary variables to indicate the first 3 products; the variables for featured and price are included in the dataset and simply need to be "pivoted" or "melted" from wide to long.  

_todo: reshape and prep the data_

In [5]:
# Melting and restructuring the data in one step
long_data = yogurt.melt(id_vars='id', 
                             value_vars=['y1', 'y2', 'y3', 'y4', 'f1', 'f2', 'f3', 'f4', 'p1', 'p2', 'p3', 'p4'],
                             var_name='variable', 
                             value_name='value')

# Adding separate columns for type and product number directly
long_data['type'] = long_data['variable'].str[0]
long_data['product_num'] = long_data['variable'].str[1].astype(int)

# Pivoting to have separate columns for each data type (purchase, featured, price)
long_data = long_data.pivot_table(index=['id', 'product_num'], columns='type', values='value', aggfunc='first').reset_index()

# Renaming the columns for clarity
long_data.columns = ['id', 'product_num', 'featured', 'price', 'purchase']

# Creating binary indicators for the first three products
long_data['yogurt_1'] = (long_data['product_num'] == 1).astype(int)
long_data['yogurt_2'] = (long_data['product_num'] == 2).astype(int)
long_data['yogurt_3'] = (long_data['product_num'] == 3).astype(int)


In [6]:
long_data.head()

Unnamed: 0,id,product_num,featured,price,purchase,yogurt_1,yogurt_2,yogurt_3
0,1,1,0.0,0.108,0.0,1,0,0
1,1,2,0.0,0.081,0.0,0,1,0
2,1,3,0.0,0.061,0.0,0,0,1
3,1,4,0.0,0.079,1.0,0,0,0
4,2,1,0.0,0.108,0.0,1,0,0


_todo: Code up the log-likelihood function._

In [102]:
from scipy.optimize import minimize
from sklearn.preprocessing import StandardScaler
import numpy as np

# Define the logistic regression likelihood function
def logistic_neg_log_likelihood(beta, X, y):
 
    # Linear combination: X * beta
    z = np.dot(X, beta)
    # Logistic function application
    probability = 1 / (1 + np.exp(-z))
    # Log-likelihood
    log_likelihood = np.sum(y * np.log(probability) + (1 - y) * np.log(1 - probability))
    # Return negative log-likelihood
    return -log_likelihood
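
The function above is a binary logistic likelihood applied row by row. For comparison, here is a minimal sketch of the multinomial log-likelihood $\ell_n(\beta)$ defined earlier, written for long-format data. The name `mnl_neg_log_likelihood` and its `n_products` argument are hypothetical, and the sketch assumes rows are sorted by consumer and then by product, with exactly `n_products` rows per consumer.

```python
import numpy as np

def mnl_neg_log_likelihood(beta, X, y, n_products):
    """Negative MNL log-likelihood for long-format data.

    X          : (n * J, K) array, rows sorted by consumer and then product
    y          : (n * J,) array of 0/1 purchase indicators (delta_ij)
    n_products : J, the number of alternatives per consumer
    """
    v = (X @ beta).reshape(-1, n_products)        # utilities, shape (n, J)
    v = v - v.max(axis=1, keepdims=True)          # stabilise the exponentials
    log_p = v - np.log(np.exp(v).sum(axis=1, keepdims=True))  # log P_i(j)
    delta = np.asarray(y).reshape(-1, n_products)
    return -np.sum(delta * log_p)

# Synthetic check (made-up data): at beta = 0 every product has
# probability 1/J, so the negative log-likelihood equals n * log(J).
rng = np.random.default_rng(0)
n, J, K = 5, 4, 3
X_demo = rng.normal(size=(n * J, K))
y_demo = np.zeros((n, J))
y_demo[np.arange(n), rng.integers(0, J, size=n)] = 1
y_demo = y_demo.ravel()

nll_at_zero = mnl_neg_log_likelihood(np.zeros(K), X_demo, y_demo, J)
print(nll_at_zero, n * np.log(J))
```

With the reshaped yogurt data, one option would be to pass the unscaled columns `yogurt_1`, `yogurt_2`, `yogurt_3`, `featured`, and `price` as `X` (with no separate intercept, since a common constant cancels across products) and minimize with `scipy.optimize.minimize(..., args=(X, y, 4))`.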

_todo: Use `optim()` in R or `optimize()` in Python to find the MLEs for the 5 parameters ($\beta_1, \beta_2, \beta_3, \beta_f, \beta_p$).  (Hint: you should find 2 positive and 1 negative product intercepts, a small positive coefficient estimate for featured, and a large negative coefficient estimate for price.)_

In [103]:
# Standardize features
scaler = StandardScaler()
features_scaled = scaler.fit_transform(long_data[['featured', 'price', 'yogurt_1', 'yogurt_2', 'yogurt_3']])

# Update the feature matrix X to include the scaled features and an intercept
X_logistic = np.hstack((np.ones((long_data.shape[0], 1)), features_scaled))  
y_logistic = long_data['purchase'].astype(int).values

# Initial beta estimates
initial_beta_logistic = np.zeros(X_logistic.shape[1])

# Running the optimization for logistic regression
result_logistic = minimize(logistic_neg_log_likelihood, initial_beta_logistic, args=(X_logistic, y_logistic), method='BFGS')
result_logistic

  message: Desired error not necessarily achieved due to precision loss.
  success: False
   status: 2
      fun: 4645.881282128891
        x: [-1.468e+00  9.508e-02 -7.251e-01  6.140e-01  3.903e-01
            -1.360e+00]
      nit: 19
      jac: [ 6.104e-05  6.104e-05  0.000e+00  0.000e+00  0.000e+00
             6.104e-05]
 hess_inv: [[ 2.432e-05 -1.092e-05 ...  7.449e-06 -3.793e-06]
            [-1.092e-05  5.426e-04 ... -2.932e-04  1.295e-05]
            ...
            [ 7.449e-06 -2.932e-04 ...  7.850e-04 -5.431e-04]
            [-3.793e-06  1.295e-05 ... -5.431e-04  1.323e-03]]
     nfev: 196
     njev: 28

In [104]:
coefficients = result_logistic.x

coefficients

array([-1.46768525,  0.09508497, -0.72514485,  0.61395611,  0.39025875,
       -1.35989235])

_todo: interpret the 3 product intercepts (which yogurt is most preferred?)._

The most preferred yogurt is Yogurt 1, with an estimated brand coefficient of 0.614. Next is Yogurt 2, with a coefficient of 0.390. Yogurt 4 is the omitted baseline brand, so its coefficient is normalized to 0. The least preferred is Yogurt 3, with a coefficient of -1.360. Because the features were standardized before fitting, these coefficients are measured per standard deviation of each dummy rather than on the raw 0/1 scale.

_todo: use the estimated price coefficient as a dollar-per-util conversion factor. Use this conversion factor to calculate the dollar benefit between the most-preferred yogurt (the one with the highest intercept) and the least preferred yogurt (the one with the lowest intercept). This is a per-unit monetary measure of brand value._

In [136]:
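# coefficients: [intercept, featured, price, yogurt_1, yogurt_2, yogurt_3]
# Utility gap between yogurt 1 (highest) and yogurt 3 (lowest), divided by |price coefficient|
dollar_benefit = (coefficients[3] - coefficients[5]) / abs(coefficients[2])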

dollar_benefit

2.722005784519816

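The dollar benefit between the most-preferred yogurt (Yogurt 1) and the least-preferred yogurt (Yogurt 3) is about 2.72, computed as (0.614 + 1.360) / 0.725. In other words, the most preferred yogurt is worth about 2.72 price units more than the least preferred one. Because the features were standardized before estimation, this ratio is expressed in standardized units; converting it to dollars per ounce would require rescaling by the features' standard deviations.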

One benefit of the MNL model is that we can simulate counterfactuals (e.g., what if the price of yogurt 1 were $0.10/oz instead of $0.08/oz?).

_todo: calculate the market shares in the market at the time the data were collected.  Then, increase the price of yogurt 1 by $0.10 and use your fitted model to predict p(y|x) for each consumer and each product (this should be a matrix of $N \times 4$ estimated choice probabilities).  Take the column averages to get the new, expected market shares that result from the $0.10 price increase to yogurt 1.  Do the yogurt 1 market shares decrease?_

In [106]:
y1 = yogurt['y1'].sum()
y2 = yogurt['y2'].sum()
y3 = yogurt['y3'].sum()
y4 = yogurt['y4'].sum()

y1_share = y1 / (y1 + y2 + y3 + y4)
y2_share = y2 / (y1 + y2 + y3 + y4)
y3_share = y3 / (y1 + y2 + y3 + y4)
y4_share = y4 / (y1 + y2 + y3 + y4)

y1_share, y2_share, y3_share, y4_share

(0.3419753086419753,
 0.4012345679012346,
 0.029218106995884775,
 0.22757201646090536)

In [107]:
# Consolidate the one-hot columns y1-y4 into a single column holding the chosen product number (1-4)
yogurt['yogurt_type'] = yogurt[['y1', 'y2', 'y3', 'y4']].idxmax(axis=1).str[1].astype(int)

yogurt.head()


Unnamed: 0,id,y1,y2,y3,y4,f1,f2,f3,f4,p1,p2,p3,p4,yogurt_type
0,1,0,0,0,1,0,0,0,0,0.108,0.081,0.061,0.079,4
1,2,0,1,0,0,0,0,0,0,0.108,0.098,0.064,0.075,2
2,3,0,1,0,0,0,0,0,0,0.108,0.098,0.061,0.086,2
3,4,0,1,0,0,0,0,0,0,0.108,0.098,0.061,0.086,2
4,5,0,1,0,0,0,0,0,0,0.125,0.098,0.049,0.079,2


In [108]:
from sklearn.linear_model import LogisticRegression

# Features: the wide-format featured flags and prices for all four yogurts
X = yogurt[['f1', 'f2', 'f3', 'f4', 'p1', 'p2', 'p3', 'p4']].values
# Target: the chosen product (1-4) from the 'yogurt_type' column built above
y = yogurt['yogurt_type'].astype(int).values


# Fit the multinomial logistic regression model
model = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=1000)
model.fit(X, y)

#  Predict probabilities
probabilities = model.predict_proba(X)

# Calculate original market shares
market_shares_original = probabilities.mean(axis=0)

market_shares_original

array([0.34197574, 0.40123432, 0.02921825, 0.22757169])

In [109]:
yogurt['p1_new'] = yogurt['p1'] + 0.10

X_new = yogurt[['f1', 'f2', 'f3', 'f4', 'p1_new', 'p2', 'p3', 'p4']].values

# Predict new probabilities with the adjusted prices
probabilities_new = model.predict_proba(X_new)

# Calculate new market shares
market_shares_new = probabilities_new.mean(axis=0)

# Output the results
print("Original Market Shares:", market_shares_original)
print("New Market Shares After Price Increase:", market_shares_new)


Original Market Shares: [0.34197574 0.40123432 0.02921825 0.22757169]
New Market Shares After Price Increase: [0.19483994 0.50127048 0.02768742 0.27620216]


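Yes. Yogurt 1's predicted market share falls from 34.2% to 19.5%, a drop of nearly 15 percentage points, when its price is increased by $0.10. Yogurts 2 and 4 absorb the lost share, rising to 50.1% and 27.6%, while yogurt 3 stays at roughly 3%.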
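
The same counterfactual can also be simulated directly from the utility specification $U_{ij} = x_j'\beta + \epsilon_{ij}$. The sketch below is illustrative only: the helper names `expected_shares` and `design`, the coefficient vector `beta_hyp`, and the two-consumer price matrix are assumptions chosen for the example, not estimates from the yogurt data.

```python
import numpy as np

def expected_shares(beta, X, n_products):
    """Average MNL choice probabilities over consumers (long-format X)."""
    v = (X @ beta).reshape(-1, n_products)
    v = v - v.max(axis=1, keepdims=True)
    p = np.exp(v)
    p /= p.sum(axis=1, keepdims=True)      # (n, J) choice probabilities
    return p.mean(axis=0)                  # column averages = expected shares

# Hypothetical coefficients: [yogurt_1, yogurt_2, yogurt_3, featured, price]
beta_hyp = np.array([1.0, 0.5, -2.0, 0.3, -30.0])

# Two illustrative consumers facing four prices each (dollars per oz)
prices = np.array([[0.108, 0.081, 0.061, 0.079],
                   [0.108, 0.098, 0.064, 0.075]])
n, J = prices.shape
brand = np.tile(np.eye(J)[:, :3], (n, 1))  # brand dummies, yogurt 4 is the baseline
featured = np.zeros((n * J, 1))            # assume nothing is featured

def design(price_matrix):
    """Stack brand dummies, featured flag, and price into long format."""
    return np.hstack([brand, featured, price_matrix.reshape(-1, 1)])

base_shares = expected_shares(beta_hyp, design(prices), J)

bumped = prices.copy()
bumped[:, 0] += 0.10                       # raise yogurt 1's price by $0.10
new_shares = expected_shares(beta_hyp, design(bumped), J)

print("Baseline shares:      ", base_shares)
print("Shares after increase:", new_shares)
```

With a negative price coefficient, raising one product's price lowers its utility and therefore its share, and the lost share is redistributed across the remaining products in proportion to their baseline probabilities.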

_todo: describe the data a bit. How many respondents took the conjoint survey?  How many choice tasks did each respondent complete?  How many alternatives were presented on each choice task? For each alternative._


In [126]:
conjoint = pd.read_csv('conjoint.csv')

conjoint

Unnamed: 0,resp.id,ques,alt,carpool,seat,cargo,eng,price,choice
0,1,1,1,yes,6,2ft,gas,35,0
1,1,1,2,yes,8,3ft,hyb,30,0
2,1,1,3,yes,6,3ft,gas,30,1
3,1,2,1,yes,6,2ft,gas,30,0
4,1,2,2,yes,7,3ft,gas,35,1
...,...,...,...,...,...,...,...,...,...
8995,200,14,2,no,7,3ft,gas,35,1
8996,200,14,3,no,7,3ft,hyb,35,0
8997,200,15,1,no,7,2ft,gas,35,0
8998,200,15,2,no,8,3ft,elec,40,0


In [111]:
# number of unique respondents

conjoint['resp.id'].nunique()

200

In [112]:
# number of choice tasks

conjoint[conjoint['resp.id'] == 1]['ques'].nunique()

15

In [113]:
# number of alternatives

conjoint['alt'].nunique()

3

The conjoint dataset is in long format: each row represents one alternative shown to one respondent in one choice task, and the rows are organized by respondent (`resp.id`), choice task (`ques`), and alternative (`alt`). There are 200 respondents, each of whom completed 15 choice tasks, and each choice task presented 3 alternatives. Each alternative is described by a set of attributes (carpool, number of seats, cargo space, engine type, and price), and `choice` marks the alternative the respondent selected.

_todo: estimate a MNL model omitting the following levels to avoid multicollinearity (6 seats, 2ft cargo, and gas engine). Include price as a continuous variable. Show a table of coefficients and standard errors.  You may use your own likelihood function from above, or you may use a function from a package/library to perform the estimation._

In [127]:
conjoint = pd.get_dummies(conjoint, columns=['seat', 'cargo', 'eng'], drop_first=False)

In [129]:
conjoint.drop(['seat_6', 'cargo_2ft', 'eng_gas'], axis=1, inplace=True)

In [130]:
X = conjoint[['price', 'seat_7', 'seat_8', 'cargo_3ft', 'eng_hyb', 'eng_elec']]  # predictors
y = conjoint['choice']  # response variable


# Fit a logistic regression on the 0/1 choice indicator
# (the target is binary, so each alternative row is treated as its own observation)
model = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=1000)
model.fit(X, y)

model.coef_


array([[-0.07952075, -0.26151586, -0.14596134,  0.21885267, -0.37897658,
        -0.71594544]])
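
`LogisticRegression` does not report standard errors. One way to obtain them is to maximize the MNL log-likelihood directly and invert the information matrix at the optimum. The sketch below does this with Newton-Raphson on synthetic data; the function names, the simulated attributes, and `true_beta` are all assumptions for illustration, and none of the printed numbers relate to the conjoint survey.

```python
import numpy as np

def score_and_information(beta, X3, d):
    """Score vector and information matrix of the MNL log-likelihood.

    X3 : (T, J, K) attributes for T choice tasks with J alternatives each
    d  : (T, J) one-hot indicators of the chosen alternative
    """
    v = X3 @ beta                                   # utilities, shape (T, J)
    v = v - v.max(axis=1, keepdims=True)            # numerical stability
    p = np.exp(v)
    p /= p.sum(axis=1, keepdims=True)               # choice probabilities
    score = np.einsum('tj,tjk->k', d - p, X3)
    xbar = np.einsum('tj,tjk->tk', p, X3)           # probability-weighted attributes
    info = np.einsum('tj,tjk,tjl->kl', p, X3, X3) - xbar.T @ xbar
    return score, info

def fit_mnl_newton(X3, d, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE; returns (beta_hat, standard_errors)."""
    beta = np.zeros(X3.shape[2])
    for _ in range(max_iter):
        score, info = score_and_information(beta, X3, d)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    _, info = score_and_information(beta, X3, d)
    se = np.sqrt(np.diag(np.linalg.inv(info)))      # from the inverse information
    return beta, se

# Synthetic demonstration (made-up data, not the conjoint survey)
rng = np.random.default_rng(42)
T, J, K = 2000, 3, 2
true_beta = np.array([1.0, -0.5])
X3_demo = rng.normal(size=(T, J, K))
utility = X3_demo @ true_beta + rng.gumbel(size=(T, J))
d_demo = np.eye(J)[utility.argmax(axis=1)]          # one-hot chosen alternative

beta_hat, se = fit_mnl_newton(X3_demo, d_demo)
for name, b, s in zip(['x1', 'x2'], beta_hat, se):
    print(f"{name}: coef = {b:.3f}, se = {s:.3f}")
```

For the conjoint data, a choice task would be one (`resp.id`, `ques`) pair with 3 alternatives, so the six predictor columns could be cast to float and reshaped to `(3000, 3, 6)` before calling `fit_mnl_newton`, assuming the rows stay grouped by task.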

_todo: Interpret the coefficients. Which features are more preferred?_

Price has a negative coefficient (-0.080), so lower prices are preferred, all else equal.

For seats, both 7 seats (-0.262) and 8 seats (-0.146) have negative coefficients relative to the 6-seat baseline, so 6 seats is the most preferred configuration.

For cargo space, 3ft has a positive coefficient (0.219) relative to the 2ft baseline, so 3ft is preferred.

Finally, for engine type, both hybrid (-0.379) and electric (-0.716) have negative coefficients relative to gas, so gas is the most preferred and electric the least.

_todo: Use the price coefficient as a dollar-per-util conversion factor. What is the dollar value of 3ft of cargo space as compared to 2ft of cargo space?_

In [143]:
cargo_3ft = model.coef_[0][3]
price = model.coef_[0][0]

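# Utility of 3ft cargo (vs. 2ft) divided by the utility lost per unit of price.
# The factor of 1000 assumes price is recorded in thousands of dollars.
dollar_value = (cargo_3ft / abs(price)) * 1000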

dollar_value.round(2)

2752.15

If price is measured in thousands of dollars, as the factor of 1,000 in the code assumes, respondents value 3ft of cargo space at roughly $2,752 more than 2ft.

_todo: assume the market consists of the following 6 minivans. Predict the market shares of each minivan in the market._

In [149]:
data = {
    "minivan": ['A', 'B', 'C', 'D', 'E', 'F'],
    "seat": [7, 6, 8, 7, 6, 7],
    "cargo": [2, 2, 2, 3, 2, 2],
    "eng": ['hyb', 'gas', 'gas', 'gas', 'elec', 'hyb'],
    "price": [30, 30, 30, 40, 40, 35]
}

minivans = pd.DataFrame(data)


minivans


Unnamed: 0,minivan,seat,cargo,eng,price
0,A,7,2,hyb,30
1,B,6,2,gas,30
2,C,8,2,gas,30
3,D,7,3,gas,40
4,E,6,2,elec,40
5,F,7,2,hyb,35


In [150]:
minivans = pd.get_dummies(minivans, columns=['seat', 'cargo', 'eng'], drop_first=False)

In [151]:
minivans.drop(['seat_6', 'cargo_2', 'eng_gas'], axis=1, inplace=True)

In [153]:
# rename cargo_3 to cargo_3ft

minivans.rename(columns={'cargo_3': 'cargo_3ft'}, inplace=True)

In [157]:
x_minivan = minivans[['price', 'seat_7', 'seat_8', 'cargo_3ft', 'eng_hyb', 'eng_elec']]

predictions = model.predict_proba(x_minivan)[:,1]

predictions

array([0.37160976, 0.68041243, 0.61390225, 0.28494289, 0.09392392,
       0.21073102])

In [158]:
# add predictions to the minivans dataframe

minivans['predictions'] = predictions

In [160]:
# calculate market share for each

minivans['market_share'] = minivans['predictions']/minivans['predictions'].sum()

In [163]:
minivans

Unnamed: 0,minivan,price,seat_7,seat_8,cargo_3ft,eng_elec,eng_hyb,predictions,market_share
0,A,30,True,False,False,False,True,0.37161,0.164756
1,B,30,False,False,False,False,False,0.680412,0.301665
2,C,30,False,True,False,False,False,0.613902,0.272177
3,D,40,True,False,True,False,False,0.284943,0.126331
4,E,40,False,False,False,True,False,0.093924,0.041642
5,F,35,True,False,False,False,True,0.210731,0.093429
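
Under the MNL formula, market shares are the exponentiated utilities divided by their sum across the six minivans, rather than rescaled binary probabilities. The sketch below applies that formula to the coefficients reported above; it is an illustration under the assumption that those coefficients can be read as MNL utility weights, even though they came from a binary logistic fit. Any intercept is the same for every minivan and cancels in the ratio, so it is omitted.

```python
import numpy as np

# Coefficients reported above, in the order
# [price, seat_7, seat_8, cargo_3ft, eng_hyb, eng_elec]
coef = np.array([-0.07952075, -0.26151586, -0.14596134,
                 0.21885267, -0.37897658, -0.71594544])

# Attribute rows for minivans A-F in the same column order
X_vans = np.array([
    [30, 1, 0, 0, 1, 0],   # A: 7 seats, 2ft cargo, hybrid
    [30, 0, 0, 0, 0, 0],   # B: 6 seats, 2ft cargo, gas
    [30, 0, 1, 0, 0, 0],   # C: 8 seats, 2ft cargo, gas
    [40, 1, 0, 1, 0, 0],   # D: 7 seats, 3ft cargo, gas
    [40, 0, 0, 0, 0, 1],   # E: 6 seats, 2ft cargo, electric
    [35, 1, 0, 0, 1, 0],   # F: 7 seats, 2ft cargo, hybrid
], dtype=float)

utility = X_vans @ coef                      # deterministic utility per minivan
exp_u = np.exp(utility - utility.max())      # subtract the max for stability
shares = exp_u / exp_u.sum()                 # MNL market shares

for name, s in zip("ABCDEF", shares):
    print(f"Minivan {name}: {s:.3f}")
```

The ranking of the minivans matches the table above (B highest, E lowest), but the share values differ because the softmax normalizes utilities jointly instead of rescaling six separate binary probabilities.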
