# **Discrete Choice: Linear Models**

```python
import numpy as np
import pandas as pd
from choice_learn.models import SimpleMNL, ConditionalLogit, NestedLogit
from choice_learn.data import ChoiceDataset
from choice_learn.datasets.base import load_heating
```



```python
import tensorflow as tf
import tensorflow_probability as tfp

print(f"TensorFlow: {tf.__version__}")
print(f"TFP: {tfp.__version__}")
```

```text
TensorFlow: 2.16.1
TFP: 0.24.0
```


## **Random Utility Model (RUM)**

**Objective:**

We aim to model the probability that an agent $i$ chooses alternative $d$ among a discrete set of $J$ options.
This is done by specifying a utility model and computing the probability that alternative $d$ yields the highest utility for agent $i$.

**Definition (Random Utility Model or RUM):**

For every agent $i$, we define the utility vector $$\mathbf{U}_i = (U_{i1}, \dots, U_{iJ}) = \mathbf{v}_i + \boldsymbol{\varepsilon}_i,$$ where:

- $\mathbf{v}_i = (v_{i1}, \dots, v_{iJ}) \in \mathbb{R}^J$ is the deterministic (systematic) utility vector, with one component per alternative,
- $\boldsymbol{\varepsilon}_i$ is a random noise vector in $\mathbb{R}^J$ (distribution not specified at this stage),
- $\mathbf{U}_i \in \mathbb{R}^J$ is the total utility vector.

Then, the agent chooses the alternative with the highest utility:

$$
y_i = \arg\max_{j \in \{1, \dots, J\}} U_{ij}
$$
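The choice rule above can be sketched by simulation. This is a minimal sketch in plain NumPy, not choice-learn code: it assumes i.i.d. standard Gumbel noise (the case used in the properties below; the RUM definition itself leaves the distribution open), and `simulate_rum_choices` is a hypothetical helper name.

```python
import numpy as np


def simulate_rum_choices(v, n_draws, rng):
    """Draw choices y = argmax_j (v_j + eps_j) for one agent.

    v: deterministic utilities, shape (J,)
    returns: integer array of chosen alternatives, shape (n_draws,)
    """
    v = np.asarray(v, dtype=float)
    # Assumption: i.i.d. standard Gumbel noise; another distribution could be drawn here.
    eps = rng.gumbel(loc=0.0, scale=1.0, size=(n_draws, v.shape[0]))
    utilities = v + eps  # U = v + eps, broadcast over the draws
    return utilities.argmax(axis=1)


rng = np.random.default_rng(0)
v = np.array([1.0, 0.5, -0.2])
choices = simulate_rum_choices(v, n_draws=10, rng=rng)
print(choices)
```

The same deterministic utilities can yield different choices across draws; only the choice probabilities are pinned down by $\mathbf{v}_i$ and the noise distribution.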

**Property (Conditional Choice Probability or CCP):**

Let $\boldsymbol{\varepsilon}_i \sim \text{Gumbel}(\mathbf{0}_J, \mathbf{1}_J)$ with i.i.d. components. Then,

$$
\Pr(y_i = d \mid \mathbf{v}_i) = \frac{\exp(v_{id})}{\sum_{j=1}^{J} \exp(v_{ij})}
$$
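A quick numerical check of the CCP formula: draw many Gumbel noise vectors, record the argmax, and compare the empirical frequencies with the closed-form logit probabilities. This is a sketch in plain NumPy; `logit_probabilities` and `empirical_choice_frequencies` are hypothetical helper names and the utilities are illustrative.

```python
import numpy as np


def logit_probabilities(v):
    """Closed-form CCP: exp(v_d) / sum_j exp(v_j), computed stably."""
    v = np.asarray(v, dtype=float)
    z = np.exp(v - v.max())  # subtracting the max leaves the ratios unchanged
    return z / z.sum()


def empirical_choice_frequencies(v, n_draws, rng):
    """Monte Carlo estimate of Pr(y = d) under U = v + i.i.d. Gumbel(0, 1) noise."""
    v = np.asarray(v, dtype=float)
    eps = rng.gumbel(0.0, 1.0, size=(n_draws, v.shape[0]))
    chosen = (v + eps).argmax(axis=1)
    return np.bincount(chosen, minlength=v.shape[0]) / n_draws


rng = np.random.default_rng(42)
v = np.array([1.0, 0.5, -0.2])
p_formula = logit_probabilities(v)
p_simulated = empirical_choice_frequencies(v, n_draws=200_000, rng=rng)
print(np.round(p_formula, 3))
print(np.round(p_simulated, 3))
```

With 200,000 draws the two vectors should agree to roughly two decimal places.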

**Property (Independence of Irrelevant Alternatives or IIA):**

Let $\boldsymbol{\varepsilon}_i \sim \text{Gumbel}(\mathbf{0}_J, \mathbf{1}_J)$ with i.i.d. components. Then,

$$
\frac{\Pr(y_i = j \mid \mathbf{v}_i)}{\Pr(y_i = k \mid \mathbf{v}_i)} = \exp(v_{ij} - v_{ik})
$$

The ratio depends only on $v_{ij}$ and $v_{ik}$: adding, removing, or modifying any other alternative leaves it unchanged.
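IIA can be checked directly: adding a third alternative changes every probability but not the ratio between the first two. A plain NumPy sketch with illustrative utilities; `logit_probabilities` is a hypothetical helper.

```python
import numpy as np


def logit_probabilities(v):
    """Logit choice probabilities exp(v_d) / sum_j exp(v_j)."""
    v = np.asarray(v, dtype=float)
    z = np.exp(v - v.max())
    return z / z.sum()


v_small = np.array([1.0, 0.5])       # alternatives j = 0 and k = 1
v_large = np.array([1.0, 0.5, 2.0])  # same two plus a new, attractive alternative

p_small = logit_probabilities(v_small)
p_large = logit_probabilities(v_large)

# In both choice sets the ratio Pr(j) / Pr(k) equals exp(v_j - v_k).
ratio_small = p_small[0] / p_small[1]
ratio_large = p_large[0] / p_large[1]
print(ratio_small, ratio_large, np.exp(1.0 - 0.5))
```

The new alternative draws probability away from both original alternatives proportionally, which is exactly the behaviour IIA imposes.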

## **Conditional Logit**

**Definition:**

Conditional logit is a **RUM** with $\mathbf{v}_i = X_i \boldsymbol{\beta}$, where:

- $X_i \in \mathbb{R}^{J \times K}$ is the matrix of alternative features faced by individual $i$ (one row per alternative; the values may differ across individuals, e.g. prices quoted to each individual),
- $\boldsymbol{\beta} \in \mathbb{R}^{K}$ is the common coefficient vector applied to all alternatives,
- $\mathbf{v}_i \in \mathbb{R}^{J}$ is the deterministic utility vector for individual $i$,
- $\boldsymbol{\varepsilon}_i \sim \text{Gumbel}(\mathbf{0}_J, \mathbf{1}_J)$ is i.i.d. across alternatives,
- $J$ is the number of alternatives, $K$ is the number of features per alternative.

*Under the i.i.d. Gumbel noise assumption, the conditional logit model satisfies both the **CCP** and **IIA** properties.*
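A minimal NumPy sketch of conditional logit probabilities for a batch of individuals, with features stored in an array of shape (N, J, K) (individuals by alternatives by features) and a single coefficient vector shared by every alternative. The helper name and toy numbers are illustrative assumptions; this is not choice-learn's implementation.

```python
import numpy as np


def conditional_logit_probabilities(X, beta):
    """Choice probabilities for a conditional logit.

    X:    shape (N, J, K), alternative features faced by each individual
    beta: shape (K,), one coefficient per feature, shared by all alternatives
    returns: shape (N, J), each row sums to 1
    """
    v = X @ beta                          # v_i = X_i beta, shape (N, J)
    v = v - v.max(axis=1, keepdims=True)  # numerical stability only
    expv = np.exp(v)
    return expv / expv.sum(axis=1, keepdims=True)


# Toy data: 2 individuals, 3 alternatives, 2 features (e.g. two cost variables)
X = np.array([
    [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]],
    [[3.0, 0.5], [1.0, 1.0], [0.5, 2.5]],
])
beta = np.array([-0.5, -1.0])  # negative weights: higher costs lower utility
probs = conditional_logit_probabilities(X, beta)
print(probs.round(3))
```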


## **Multinomial Logit**

**Definition:**

Multinomial logit is a **RUM** with $\mathbf{v}_i = B \mathbf{x}_i$, where:

- $\mathbf{x}_i \in \mathbb{R}^{K}$ is the feature vector of individual $i$ (same across alternatives),
- $B \in \mathbb{R}^{J \times K}$ is the matrix of coefficients (one row per alternative),
- $\mathbf{v}_i \in \mathbb{R}^{J}$ is the deterministic utility vector for individual $i$,
- $\boldsymbol{\varepsilon}_i \sim \text{Gumbel}(\mathbf{0}_J, \mathbf{1}_J)$ is i.i.d. across alternatives,
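- $J$ is the number of alternatives, $K$ is the number of individual features.

*Under the i.i.d. Gumbel noise assumption, the multinomial logit model satisfies both the **CCP** and **IIA** properties.*

Since only utility differences enter the choice probabilities, $B$ is identified only up to a common shift of its rows; in practice the row of a reference alternative is fixed to $\mathbf{0}$.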
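Because only utility differences matter, adding the same vector $\mathbf{c}$ to every row of $B$ shifts each $v_{ij}$ by $\mathbf{c}^\top \mathbf{x}_i$ for all $j$ and leaves the probabilities unchanged. The NumPy sketch below checks this numerically on random illustrative data; `mnl_probabilities` is a hypothetical helper, not a choice-learn function.

```python
import numpy as np


def mnl_probabilities(x, B):
    """Multinomial logit probabilities.

    x: shape (N, K), individual features (same for every alternative)
    B: shape (J, K), one coefficient row per alternative
    returns: shape (N, J)
    """
    v = x @ B.T                           # v_i = B x_i for each individual, shape (N, J)
    v = v - v.max(axis=1, keepdims=True)  # numerical stability only
    expv = np.exp(v)
    return expv / expv.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))  # 4 individuals, K = 3 features
B = rng.normal(size=(5, 3))  # J = 5 alternatives

# Shift every row of B by the same vector c (broadcast over rows).
c = np.array([0.3, -1.2, 2.0])
p_original = mnl_probabilities(x, B)
p_shifted = mnl_probabilities(x, B + c)

# Fixing the first row to zero is one such shift (c = -B[0]).
B_normalized = B - B[0]
p_normalized = mnl_probabilities(x, B_normalized)
print(np.allclose(p_original, p_shifted), np.allclose(p_original, p_normalized))
```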


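```python
heating_df = load_heating(as_frame=True)

# Column names in the heating dataset
shared_feature_names = ["income", "agehed", "rooms"]  # household characteristics
choice_column = "depvar"                               # chosen heating system
item_feature_prefixes = ["ic.", "oc."]                 # installation cost, operating cost
items = ["hp", "gc", "gr", "ec", "er"]                 # heat pump, gas central/room, electric central/room

# Encode each observed choice as the index of the chosen alternative in `items`
choices = np.array([items.index(val) for val in heating_df[choice_column].to_numpy()])

# Household features, shape (n_choices, 3); prepared here but not passed to the model below
shared_features_by_choice = heating_df[shared_feature_names].to_numpy().astype("float32")

# Alternative features, shape (n_choices, n_items, n_features) = (n_choices, 5, 2)
items_features_by_choice = np.stack(
    [heating_df[[prefix + item for prefix in item_feature_prefixes]].to_numpy() for item in items],
    axis=1,
)

dataset = ChoiceDataset(items_features_by_choice=items_features_by_choice, choices=choices)

# One coefficient per item feature, shared by all alternatives, no intercept:
# the conditional logit specification above
model = SimpleMNL(intercept=None)
history = model.fit(dataset, get_report=True, verbose=0)

# evaluate() gives a per-choice average, so scale by the number of choices for the total
print("Estimation Negative LogLikelihood:", model.evaluate(dataset) * len(dataset))
```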



```text
Using L-BFGS optimizer, setting up .fit() function
Estimation Negative LogLikelihood: tf.Tensor(1095.2418, shape=(), dtype=float32)
```
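The printed figure is `model.evaluate(dataset)` scaled by the number of observed choices, i.e. a total negative log-likelihood $-\sum_i \log \Pr(y_i \mid \mathbf{v}_i)$. Assuming `evaluate` returns the per-choice average (which the multiplication by `len(dataset)` suggests), the quantity can be sketched in NumPy from predicted probabilities; the helper name and toy probabilities are hypothetical.

```python
import numpy as np


def total_negative_log_likelihood(probs, choices):
    """Sum over observations of -log Pr(y_i | v_i).

    probs:   shape (N, J), predicted choice probabilities
    choices: shape (N,), index of the chosen alternative for each observation
    """
    chosen = probs[np.arange(len(choices)), choices]  # probability of each observed choice
    return -np.log(chosen).sum()


probs = np.array([[0.7, 0.2, 0.1],
                  [0.3, 0.3, 0.4]])
choices = np.array([0, 2])
nll = total_negative_log_likelihood(probs, choices)
mean_nll = nll / len(choices)  # the per-choice average
print(nll, mean_nll)
```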


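```python
model.report
```

| Coefficient Name | Coefficient Estimation | Std. Err | z_value | P(.>z) |
|---|---|---|---|---|
| Weights_items_features_0 | -0.006232 | 0.000353 | -17.665276 | 0.0 |
| Weights_items_features_1 | -0.00458 | 0.000322 | -14.216596 | 0.0 |

Following the order in which the features were stacked, `Weights_items_features_0` is the coefficient on `ic.` (installation cost) and `Weights_items_features_1` the coefficient on `oc.` (operating cost). Both are negative: a more expensive heating system has lower utility.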
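The `z_value` column is consistent with a Wald statistic, coefficient divided by standard error. The sketch below recomputes it from the rounded table entries, so it matches only to about two decimals; the two-sided normal p-value is an assumption about how `P(.>z)` is defined, not something the report confirms.

```python
import math

# Rounded values copied from the report table (estimate, standard error).
coefficients = {
    "Weights_items_features_0": (-0.006232, 0.000353),
    "Weights_items_features_1": (-0.00458, 0.000322),
}

results = {}
for name, (estimate, std_err) in coefficients.items():
    z = estimate / std_err                          # Wald z-statistic
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|)), assumed convention
    results[name] = (z, p_two_sided)
    print(f"{name}: z = {z:.2f}, two-sided p = {p_two_sided:.1e}")
```

Both p-values are far below any usual threshold, which is why the report displays them as `0.0`.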


## **Nested Logit**