# Multinomial Logit Model
**Author:** Mohammed Alshaghathirah  
**Date:** May 28, 2025


## Introduction

Understanding how consumers make choices between competing products is a central question in marketing analytics. One widely used approach to modeling such decisions is **conjoint analysis**, which helps us infer preferences based on observed choices among different product configurations. In this assignment, we analyze simulated conjoint data using the **Multinomial Logit Model (MNL)** — a powerful tool for modeling discrete choices.

The MNL model assumes that each consumer chooses the product that provides the highest utility, where utility is a function of observable product attributes (e.g., brand, advertising, price) and a random error term. This framework enables us to estimate how much each attribute contributes to consumer preferences.

Our analysis proceeds in two main parts:

1. **Maximum Likelihood Estimation (MLE)** — We estimate the model by finding the parameters that maximize the likelihood of observing the choices in our data.
2. **Bayesian Estimation using Metropolis-Hastings MCMC** — We apply a Bayesian approach to estimate the posterior distribution of the parameters, allowing for uncertainty quantification and comparison with the MLE results.

The goal is not only to estimate consumer preferences but also to interpret the results in a meaningful, business-relevant context.


## 1. Likelihood for the Multinomial Logit (MNL) Model

We assume each consumer selects one product from a set of alternatives. The utility that consumer \( i \) derives from product \( j \) is:

$$
U_{ij} = x_j' \beta + \epsilon_{ij}
$$

where:
- \( x_j \) is a vector of product attributes (e.g., brand, ad, price),
- \( \beta \) is a vector of coefficients (part-worth utilities),
- \( \epsilon_{ij} \sim \text{i.i.d. Gumbel}(0, 1) \) (type I extreme value).

This leads to the well-known logit choice probability:

$$
\mathbb{P}_i(j) = \frac{e^{x_j' \beta}}{\sum_{k=1}^J e^{x_k' \beta}}
$$

Because the choice probabilities have this closed form, the likelihood of the observed choices can be written down directly. This lets us estimate \( \beta \) by maximum likelihood or by Bayesian methods.
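The sketch below illustrates the formula above. It computes choice probabilities for a single choice set from a vector of deterministic utilities \( x_j'\beta \). The function name `logit_probabilities` and the example utilities are hypothetical. The function subtracts the largest utility before exponentiating. That constant cancels between numerator and denominator, so the probabilities are unchanged, but it prevents overflow.

```python
import numpy as np

def logit_probabilities(v):
    """MNL choice probabilities for one choice set.

    v: deterministic utilities x_j' beta, one entry per alternative.
    """
    v = np.asarray(v, dtype=np.float64)
    # Subtracting max(v) cancels in the ratio but keeps np.exp from overflowing
    exp_v = np.exp(v - v.max())
    return exp_v / exp_v.sum()

# Hypothetical utilities for three alternatives
v = np.array([1.0, 0.5, 0.0])
p = logit_probabilities(v)
print(p.round(3))

# Adding the same constant to every utility leaves the probabilities unchanged
print(np.allclose(p, logit_probabilities(v + 10.0)))  # True
```

This shift invariance is why one brand level and one ad level must act as baselines. If all three brand dummies were included, adding a constant to every brand coefficient would leave every choice probability unchanged, so the coefficients would not be identified.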


## 2. Structure of the Simulated Dataset

Before estimating the Multinomial Logit model, we begin by exploring the structure of the conjoint dataset.

Each row in the dataset corresponds to a product profile evaluated by a respondent in a choice task. The dataset includes the following variables:

- `resp`: Respondent ID
- `task`: Choice task number
- `choice`: Indicator variable (1 if the product was chosen, 0 otherwise)
- `brand`: Product brand (categorical: N, H, P)
- `ad`: Whether the product had advertising support (Yes or No)
- `price`: Price of the product

We load the data and display the first few rows:


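```python
import pandas as pd

conjoint = pd.read_csv("/content/conjoint_data.csv")

conjoint.head()
```

| Unnamed: 0 | resp | task | choice | brand | ad  | price |
|------------|------|------|--------|-------|-----|-------|
| 0          | 1    | 1    | 1      | N     | Yes | 28    |
| 1          | 1    | 1    | 0      | H     | Yes | 16    |
| 2          | 1    | 1    | 0      | P     | Yes | 16    |
| 3          | 1    | 2    | 0      | N     | Yes | 32    |
| 4          | 1    | 2    | 1      | P     | Yes | 16    |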


The dataset preview above shows the structure of our conjoint experiment data. Each row represents a specific product profile shown to a respondent during a choice task.

- In the first task (`task = 1`) for `resp = 1`, the respondent was shown three alternatives (`brand = N`, `H`, `P`) and chose the product from brand `N` (`choice = 1`).
- In the second task (`task = 2`), the same respondent chose the product from brand `P`.

Each product is defined by its brand, whether it had advertising (`ad`), and its price. This structure will be key for estimating the utility values associated with each attribute using a Multinomial Logit model.


We can also check basic summary statistics and the number of respondents and tasks:


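```python
# describe() is wrapped in print(): in a notebook only a cell's last expression is displayed
print(conjoint.describe())

print("Unique respondents:", conjoint['resp'].nunique())
print("Unique tasks per respondent:", conjoint.groupby('resp')['task'].nunique().mean())
print("Unique brands:", conjoint['brand'].unique())
```

Output (the `describe()` summary table is not reproduced here):

```
Unique respondents: 100
Unique tasks per respondent: 10.0
Unique brands: ['N' 'H' 'P']
```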


From our summary inspection, we observe the following about the dataset:

- There are **100 unique respondents**.
- Respondents completed an average of **10.0 choice tasks**. This is consistent with every respondent completing exactly 10 tasks, although a mean alone does not guarantee it.
- The preview shows **3 product alternatives** per task, from which the respondent selects one.
- The product options span **3 different brands**: `'N'`, `'H'`, and `'P'`.

This balanced panel structure, in which every respondent faces the same number of tasks and the same number of alternatives per task, makes the data well-suited for estimating a Multinomial Logit model.
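The counts above rely on a mean, and an unbalanced panel could also average 10.0 tasks per respondent. A direct check counts three things: rows per (resp, task) pair, tasks per respondent, and chosen alternatives per task. The function below is a hypothetical helper, shown here on a small invented DataFrame. Calling `check_choice_design(conjoint)` would apply it to the real data.

```python
import pandas as pd

def check_choice_design(df):
    """Summarize the panel structure of long-format conjoint data.

    Assumes columns resp, task, choice, with one row per alternative.
    """
    alts_per_task = df.groupby(['resp', 'task']).size()
    tasks_per_resp = df.groupby('resp')['task'].nunique()
    chosen_per_task = df.groupby(['resp', 'task'])['choice'].sum()
    return {
        'alternatives_per_task': sorted(alts_per_task.unique().tolist()),
        'tasks_per_respondent': sorted(tasks_per_resp.unique().tolist()),
        # The MNL likelihood assumes exactly one chosen alternative per task
        'one_choice_per_task': bool((chosen_per_task == 1).all()),
    }

# Hypothetical toy data: 2 respondents x 2 tasks x 3 alternatives
toy = pd.DataFrame({
    'resp':   [1] * 6 + [2] * 6,
    'task':   [1, 1, 1, 2, 2, 2] * 2,
    'choice': [1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0],
})
print(check_choice_design(toy))
```

If the design is as balanced as the summary suggests, the real data should give `[3]` alternatives per task, `[10]` tasks per respondent, and `True`.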


## 3. Estimating the MNL Model via Maximum Likelihood

We begin by estimating the parameters of the Multinomial Logit model using Maximum Likelihood Estimation (MLE). The utility that respondent \( i \) derives from alternative \( j \) in task \( t \) is modeled as:

$$
U_{ijt} = x_{ijt}' \beta + \epsilon_{ijt}
$$

where:
- \( x_{ijt} \) includes product attributes such as brand, ad exposure, and price,
- \( \beta \) is a vector of coefficients (part-worth utilities),
- \( \epsilon_{ijt} \sim \text{i.i.d. Gumbel}(0, 1) \).

The probability that a consumer chooses alternative \( j \) is given by:

$$
\mathbb{P}_{ijt} = \frac{e^{x_{ijt}' \beta}}{\sum_{k=1}^{J} e^{x_{ikt}' \beta}}
$$

We now implement the log-likelihood function and use numerical optimization to estimate \( \beta \).


```python
import pandas as pd
import numpy as np
from scipy.optimize import minimize

conjoint = pd.read_csv("/content/conjoint_data.csv")

# One-hot encode brand and ad; brand_N and ad_No are left out of X below and act as baselines
conjoint_encoded = pd.get_dummies(conjoint, columns=['brand', 'ad'], drop_first=False)

feature_names = ['brand_H', 'brand_P', 'ad_Yes', 'price']
X = conjoint_encoded[feature_names].astype(np.float64).values
y = conjoint_encoded['choice'].astype(np.int32).values
# Integer id of each (respondent, task) choice set, as a NumPy array
groups = conjoint_encoded.groupby(['resp', 'task']).ngroup().to_numpy()


def neg_log_likelihood(beta):
    """Negative MNL log-likelihood summed over all choice sets."""
    beta = np.asarray(beta, dtype=np.float64)
    utilities = X @ beta

    probs = np.zeros_like(utilities)
    for g in np.unique(groups):
        mask = (groups == g)
        # Subtract the max utility in the set before exponentiating to avoid overflow
        exp_u = np.exp(utilities[mask] - utilities[mask].max())
        probs[mask] = exp_u / exp_u.sum()

    # Sum the log-probabilities of the alternatives actually chosen
    return -np.sum(np.log(probs[y == 1]))


initial_beta = np.zeros(X.shape[1], dtype=np.float64)

# Minimizing the negative log-likelihood maximizes the likelihood
result = minimize(neg_log_likelihood, initial_beta, method='BFGS')

estimated_betas = pd.Series(result.x, index=feature_names)
print(estimated_betas)
```

```
brand_H   -0.941195
brand_P   -0.439579
ad_Yes    -0.731994
price     -0.099480
dtype: float64
```
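`neg_log_likelihood` loops over every choice set in Python each time it is evaluated, and BFGS evaluates it many times. Suppose the rows are sorted by respondent and task and every task has the same number of alternatives (three, per the design described above). Then the utilities can be reshaped into a (tasks × alternatives) matrix, and the likelihood can be computed in one vectorized pass with `scipy.special.logsumexp`. This is a sketch under that sorting assumption; the function name is our own.

```python
import numpy as np
from scipy.special import logsumexp

def neg_log_likelihood_vectorized(beta, X, y, n_alts):
    """MNL negative log-likelihood without a Python loop over choice sets.

    Assumes rows are ordered by (resp, task) and that every task occupies
    exactly n_alts consecutive rows with one chosen alternative.
    """
    beta = np.asarray(beta, dtype=np.float64)
    # One row per choice set, one column per alternative
    utilities = (X @ beta).reshape(-1, n_alts)
    chosen = np.asarray(y).reshape(-1, n_alts).astype(bool)
    # log P(chosen) = u_chosen - log(sum_k exp(u_k)); logsumexp is overflow-safe
    log_probs = utilities[chosen] - logsumexp(utilities, axis=1)
    return -np.sum(log_probs)
```

With the data above, it could replace the loop version via `minimize(neg_log_likelihood_vectorized, initial_beta, args=(X, y, 3), method='BFGS')`. Confirm the row order first, for example by checking that `conjoint` is already sorted by `['resp', 'task']`.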



We estimated the parameters of the Multinomial Logit (MNL) model using maximum likelihood. The utility specification includes brand, advertisement exposure, and price. We treat `brand_N` and `ad_No` as baseline categories by leaving their dummy columns out of the design matrix `X`. Each coefficient therefore measures utility relative to a brand-N product without advertising.

### Estimated Coefficients:

| Variable   | Estimate   |
|------------|------------|
| brand_H    | -0.9412    |
| brand_P    | -0.4396    |
| ad_Yes     | -0.7320    |
| price      | -0.0995    |

### Interpretation:

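- Both **brand_H** and **brand_P** are less preferred than the baseline brand (brand_N). Brand_H is disfavored roughly twice as strongly as brand_P (-0.94 vs. -0.44). No standard errors are reported here, so these comparisons rest on point estimates rather than formal significance tests.
- The negative coefficient on **ad_Yes** means that, holding brand and price fixed, a product with advertising has lower utility than one without. Because the data are simulated, this sign reflects how the data were generated. In a real study it could point to poor ad quality or overexposure.
- The **price** coefficient is negative, consistent with economic intuition. Each one-unit increase in price lowers utility by about 0.10, which reduces the probability that the product is chosen.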
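The interpretation above compares coefficients by size only. A common way to attach uncertainty to MLE estimates is to invert the Hessian of the negative log-likelihood at the optimum (the observed information). SciPy's BFGS result carries an approximation in `result.hess_inv`, but that matrix is built from gradient updates and can be inaccurate. The sketch below therefore computes a central finite-difference Hessian instead. The helper names `numerical_hessian` and `mle_standard_errors` and the step size `h=1e-4` are our own choices, not part of the original analysis.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    x = np.asarray(x, dtype=np.float64)
    k = x.size
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            e_i = np.zeros(k)
            e_i[i] = h
            e_j = np.zeros(k)
            e_j[j] = h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

def mle_standard_errors(neg_log_lik, beta_hat, h=1e-4):
    """Standard errors from the inverse observed information matrix.

    The observed information is the Hessian of the *negative* log-likelihood
    at the MLE; its inverse approximates the sampling covariance of beta_hat.
    """
    cov = np.linalg.inv(numerical_hessian(neg_log_lik, beta_hat, h))
    return np.sqrt(np.diag(cov))
```

For this model it would be called as `se = mle_standard_errors(neg_log_likelihood, result.x)`, and `result.x / se` gives Wald z-statistics. We have not run this on the conjoint data, so no values are reported here.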

We now estimate the same model using Bayesian methods, which let us quantify uncertainty explicitly through the posterior distribution.


## 4. Bayesian Estimation using Metropolis-Hastings

To complement the MLE approach, we estimate the Multinomial Logit model using a Bayesian framework. We use the Metropolis-Hastings (MH) algorithm to sample from the posterior distribution of the coefficients.

We assume independent Normal(0, 1) priors for each parameter:

$$
\beta_k \sim \mathcal{N}(0, 1)
$$

Up to an additive constant, the log-posterior is:

$$
\log P(\beta \mid \text{data}) = \log \mathcal{L}(\beta) + \log \mathcal{P}(\beta) + \text{const}
$$

where \( \mathcal{L}(\beta) \) is the likelihood function and \( \mathcal{P}(\beta) \) is the prior density.
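Written out for the four coefficients, the independent standard Normal priors contribute a sum of squares once the normalizing constant is dropped. This is the quantity that `log_prior` in the following code cell returns:

$$
\log \mathcal{P}(\beta) = \sum_{k=1}^{4} \left( -\tfrac{1}{2}\beta_k^2 - \tfrac{1}{2}\log(2\pi) \right) = -\tfrac{1}{2}\sum_{k=1}^{4} \beta_k^2 + \text{const}
$$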


```python
import numpy as np
import pandas as pd

# Reuses X, y, and groups from the MLE cell above.

def log_prior(beta):
    # Independent N(0, 1) priors, up to an additive constant
    return -0.5 * np.sum(beta**2)

def log_likelihood(beta):
    utilities = X @ beta

    probs = np.zeros_like(utilities)
    for g in np.unique(groups):
        mask = (groups == g)
        # Max-subtraction keeps np.exp from overflowing
        exp_u = np.exp(utilities[mask] - utilities[mask].max())
        probs[mask] = exp_u / exp_u.sum()

    return np.sum(np.log(probs[y == 1]))

def log_posterior(beta):
    return log_likelihood(beta) + log_prior(beta)

def fast_mh_sampler(log_post, initial, steps=1000, proposal_scale=0.1):
    """Random-walk Metropolis-Hastings with a symmetric Normal proposal."""
    beta = np.array(initial, dtype=np.float64)
    lp_current = log_post(beta)  # cached so it is not recomputed every step
    samples = []
    accepted = 0

    for _ in range(steps):
        proposal = beta + np.random.normal(0, proposal_scale, size=beta.shape)
        lp_proposal = log_post(proposal)

        # Accept with probability min(1, exp(lp_proposal - lp_current)),
        # compared on the log scale to avoid overflow in np.exp
        if np.log(np.random.rand()) < lp_proposal - lp_current:
            beta, lp_current = proposal, lp_proposal
            accepted += 1
        samples.append(beta.copy())

    print(f"Acceptance Rate: {accepted / steps:.2f}")
    return np.array(samples)


initial_beta = np.zeros(X.shape[1])
samples_fast = fast_mh_sampler(log_posterior, initial_beta, steps=1000, proposal_scale=0.15)

# Discard the first 500 draws as burn-in and average the rest
posterior_means_fast = pd.Series(np.mean(samples_fast[500:], axis=0), index=['brand_H', 'brand_P', 'ad_Yes', 'price'])
print(posterior_means_fast)
```

```
Acceptance Rate: 0.02
brand_H   -0.987987
brand_P   -0.495108
ad_Yes    -0.676342
price     -0.097416
dtype: float64
```



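We estimated the parameters of the Multinomial Logit model using a Bayesian approach via the Metropolis-Hastings algorithm. Each coefficient was given a Normal(0, 1) prior. We ran 1,000 steps with a common proposal standard deviation of 0.15 for all four coefficients, discarded the first 500 draws as burn-in, and averaged the remaining 500.

The acceptance rate was very low (2%). Only roughly 20 of the 1,000 proposals were accepted, so the chain spent long stretches stuck at the same value. A likely cause is the shared proposal scale. The price coefficient is about -0.1 and prices in the data run into the tens, so a step of 0.15 in the price coefficient shifts utilities by several units and is likely to be rejected. The posterior means below should therefore be treated as rough estimates.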

### Posterior Mean Estimates

| Variable   | Posterior Mean |
|------------|----------------|
| brand_H    | -0.988         |
| brand_P    | -0.495         |
| ad_Yes     | -0.676         |
| price      | -0.097         |

These results are consistent with our MLE estimates from Section 3, reaffirming that consumers disfavor brands H and P relative to the base brand N. Ads had a clearly negative effect (about -0.68, roughly the utility loss from a 7-unit price increase given the price coefficient of about -0.097), and higher prices lowered choice probability.
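The low acceptance rate discussed above suggests giving each coefficient its own proposal scale. The sketch below is a variant of `fast_mh_sampler` that takes a vector of proposal standard deviations and uses a seeded NumPy `Generator`. The function name `mh_sampler` and the scales suggested below are hypothetical starting points to be tuned, not values validated on this data. A common tuning target is the roughly 20-50% acceptance range often recommended for random-walk samplers.

```python
import numpy as np

def mh_sampler(log_post, initial, steps, proposal_scales, seed=0):
    """Random-walk Metropolis-Hastings with a separate proposal SD per coefficient.

    Returns (samples, acceptance_rate); samples has shape (steps, n_params).
    """
    rng = np.random.default_rng(seed)
    beta = np.array(initial, dtype=np.float64)
    scales = np.asarray(proposal_scales, dtype=np.float64)
    lp_current = log_post(beta)
    samples = np.empty((steps, beta.size))
    accepted = 0
    for t in range(steps):
        # Symmetric Normal proposal, one standard deviation per coefficient
        proposal = beta + rng.normal(0.0, scales)
        lp_proposal = log_post(proposal)
        # Metropolis acceptance test on the log scale
        if np.log(rng.uniform()) < lp_proposal - lp_current:
            beta, lp_current = proposal, lp_proposal
            accepted += 1
        samples[t] = beta
    return samples, accepted / steps
```

For this model one might call `samples, rate = mh_sampler(log_posterior, np.zeros(4), steps=11_000, proposal_scales=[0.05, 0.05, 0.05, 0.005])` and discard the first 1,000 draws. These scales are untested guesses. The price scale is smaller because its coefficient is about an order of magnitude smaller than the others.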






## 5. Comparing MLE and Bayesian Estimates

Below we compare the coefficients estimated using Maximum Likelihood Estimation (MLE) and Bayesian estimation (via Metropolis-Hastings MCMC):

| Variable   | MLE Estimate | Bayesian Mean |
|------------|--------------|----------------|
| brand_H    | -0.941       | -0.988         |
| brand_P    | -0.440       | -0.495         |
| ad_Yes     | -0.732       | -0.676         |
| price      | -0.099       | -0.097         |

### Interpretation

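- The point estimates are **highly consistent** across the two methods. This suggests the data are informative enough for the likelihood to dominate the prior.
- The Bayesian method adds the benefit of capturing **uncertainty in the form of posterior distributions** (not just point estimates).
- The remaining differences are unlikely to come from the Normal(0, 1) priors. Such priors pull estimates toward zero, yet the Bayesian means for brand_H and brand_P lie further from zero than the MLEs. Monte Carlo error from a short chain with a 2% acceptance rate is a more plausible explanation.
- Because of that low acceptance rate, the Bayesian summaries should be treated as approximate. Their agreement with the MLE is encouraging but does not by itself show that the chain has converged.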

Both methods indicate that:
- Consumers prefer brand N over brands H and P.
- Advertising (as presented) has a negative effect on utility (about -0.7 under both methods).
- Higher prices reduce the likelihood of selection.
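One way to put these effects on a common scale is to divide each coefficient by the negative of the price coefficient. This expresses each attribute as an equivalent price change. The helper name `price_equivalents` is hypothetical, and the inputs are the MLE estimates reported in Section 3.

```python
def price_equivalents(betas, price_key='price'):
    """Express each non-price coefficient in price units: beta_k / -beta_price."""
    b_price = betas[price_key]
    return {k: v / -b_price for k, v in betas.items() if k != price_key}

# MLE estimates from Section 3
mle = {'brand_H': -0.941195, 'brand_P': -0.439579, 'ad_Yes': -0.731994, 'price': -0.099480}
for name, value in price_equivalents(mle).items():
    print(f"{name}: {value:.2f} price units")
```

With these estimates the values are about -9.46 (brand_H), -4.42 (brand_P), and -7.36 (ad_Yes). For example, a brand-H product would need to be roughly 9.5 price units cheaper than an otherwise identical brand-N product to be equally attractive under this model.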

This agreement supports confidence in our model's robustness.


## 6. Conclusions and Recommendations

This project used both Maximum Likelihood Estimation (MLE) and Bayesian methods to estimate a Multinomial Logit model on simulated conjoint data. The model captured how product attributes — including brand, advertising, and price — influence consumer choice.

### Key Findings:

- **Brand preference**: Brand N has the highest estimated utility, followed by P, with brand H the least favored. This suggests brand equity is strongest for N and weakest for H.
- **Advertising effect**: Surprisingly, advertising was associated with **negative utility** under both estimation approaches. In a real market this could indicate ineffective messaging, ad fatigue, or poor targeting. Further testing (e.g., A/B testing different ad creatives) is recommended.
- **Price sensitivity**: As expected, higher prices reduce the probability of choice. Combined with predicted choice probabilities, the estimated price coefficient can be used to compute **price elasticities** and to evaluate pricing strategy.
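Under the MNL model, the own-price elasticity of alternative \( j \)'s choice probability is \( \beta_{\text{price}}\, p_j (1 - P_j) \), where \( p_j \) is its price and \( P_j \) its choice probability. The sketch below evaluates this formula for the three alternatives in respondent 1's first task from the data preview, using the MLE estimates. The helper names are hypothetical. Like the model estimated here, the calculation assumes homogeneous preferences, with a single \( \beta \) for all respondents.

```python
import numpy as np

def mnl_shares(X, beta):
    """MNL choice probabilities for one choice set (rows = alternatives)."""
    v = X @ beta
    e = np.exp(v - v.max())
    return e / e.sum()

def own_price_elasticities(X, beta, price_col):
    """Own-price elasticity of each alternative: beta_price * p_j * (1 - P_j)."""
    P = mnl_shares(X, beta)
    return beta[price_col] * X[:, price_col] * (1.0 - P)

# MLE estimates; columns are brand_H, brand_P, ad_Yes, price
beta_hat = np.array([-0.941195, -0.439579, -0.731994, -0.099480])
# Respondent 1, task 1 from the data preview: N/Yes/28, H/Yes/16, P/Yes/16
choice_set = np.array([[0, 0, 1, 28],
                       [1, 0, 1, 16],
                       [0, 1, 1, 16]], dtype=float)
print(np.round(own_price_elasticities(choice_set, beta_hat, price_col=3), 2))
```

An elasticity of -2, for example, means a 1% price increase lowers that alternative's choice probability by about 2%.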

### Recommendations:

1. **Reallocate marketing spend** away from brand H and toward better-performing brands like N, unless brand H is being repositioned or tested.
2. **Redesign advertising strategy**, focusing on ad content, frequency, and targeting. Since ads currently reduce utility, improving creative quality or better segmentation could reverse this effect.
3. **Leverage the model for simulation** — use the estimated utilities to simulate market shares under new pricing or promotional strategies.
4. **Consider Bayesian modeling** in future analyses when quantifying uncertainty is critical — especially in A/B testing or market simulations.
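Recommendation 3 can be prototyped in a few lines. The sketch below predicts choice shares for a hypothetical three-brand market, then recomputes them after a hypothetical price cut by brand H. The prices (20 for every brand, a 4-unit cut) are invented for illustration. The calculation inherits the MNL's homogeneous-preference and IIA assumptions, so it is a what-if tool rather than a forecast.

```python
import numpy as np

def simulate_shares(X, beta):
    """Predicted MNL shares for one choice set (rows = alternatives)."""
    v = X @ beta
    e = np.exp(v - v.max())
    return e / e.sum()

beta_hat = np.array([-0.941195, -0.439579, -0.731994, -0.099480])  # MLE from Section 3
# Hypothetical market; columns are brand_H, brand_P, ad_Yes, price
baseline = np.array([
    [0, 0, 0, 20],   # brand N, no ads
    [1, 0, 0, 20],   # brand H, no ads
    [0, 1, 0, 20],   # brand P, no ads
], dtype=float)

scenario = baseline.copy()
scenario[1, 3] -= 4  # hypothetical: brand H cuts its price by 4 units

for label, X_set in [("baseline", baseline), ("H price cut", scenario)]:
    print(label, np.round(simulate_shares(X_set, beta_hat), 3))
```

Under IIA, the share brand H gains is drawn from N and P in proportion to their baseline shares. A model with heterogeneous preferences would relax that pattern.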

### Final Note:

The close agreement between MLE and Bayesian estimates reinforces the stability and robustness of the model. Conjoint analysis, paired with rigorous statistical modeling, offers powerful insights into consumer decision-making and can guide data-driven marketing strategies.
