# Thurstonian Utility Model

## Introduction
A **Thurstonian utility model** is a framework used to explain how people make choices when comparing different options. It originates from L. L. Thurstone's law of comparative judgment (1927), a psychological theory of how individuals perceive differences between items.

---

## Core Concept
Imagine you have two kinds of food in front of you. You take a bite of each and decide which one you prefer. However, your decision isn't always 100% clear-cut—sometimes the difference between the two is small, and you might even pick differently if you tried again.

The Thurstonian model assumes that when comparing options, your brain assigns an **internal value** (or utility) to each one. This value, however, includes some randomness or uncertainty. As a result:
- Even if Food `A` is slightly better on average, you might sometimes pick Food `B` due to this inherent variability in perception.

---

## Key Ideas

### 1. Comparisons, Not Absolute Scores
- People don't judge things on a fixed scale; instead, they evaluate options relative to each other.
- The focus is on pairwise comparisons rather than assigning absolute ratings.

### 2. Randomness (Noise)
- Perception is not deterministic. There is always some level of uncertainty or "noise" in how we evaluate options.
- This randomness accounts for inconsistencies in decision-making.

### 3. Probability of Preference
- The greater the difference between two options, the more likely you are to consistently choose the better one.
- When options are very similar, the probability of choosing either becomes closer to 50%.
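
As a rough numerical illustration of this idea, the sketch below assumes Gaussian noise of equal size on both options (the noise model this document introduces further down) and shows how the chance of picking the better option grows with the gap in average utility. The function name `choice_probability` and the gap values are made up for illustration.

```python
from scipy.stats import norm


def choice_probability(mean_gap, noise_sd=1.0):
    """Probability of picking the better option, given the gap in mean utility.

    Assumes both options carry independent Gaussian noise with the same
    standard deviation, so the difference in perceived utility has
    standard deviation noise_sd * sqrt(2).
    """
    return norm.cdf(mean_gap / (noise_sd * 2 ** 0.5))


# A gap of 0 means the options are indistinguishable on average
for gap in [0.0, 0.5, 1.0, 2.0, 4.0]:
    print(f"gap = {gap:>3}: P(choose better) = {choice_probability(gap):.3f}")
```

With no gap the probability is exactly 50%, and it approaches 100% as the gap becomes large relative to the noise.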

---

## Applications
The Thurstonian model is widely used in various fields, including:
- **Psychology**: To study human perception and decision-making processes.
- **Marketing**: To predict consumer preferences and optimize product offerings.
- **AI**: To design algorithms that mimic human-like decision-making in scenarios involving uncertain or noisy data.

This model provides a robust framework for understanding and predicting how people choose between similar alternatives!

---

## Thurstonian Utility Model: Gaussian Formulation

In a **Thurstonian utility model**, a **Gaussian distribution** represents the uncertainty or variability in how people perceive the value of each option. Here's how it works:

### 1. Each Option Gets a Gaussian (Normal) Distribution
- Imagine you are comparing a **Burger** and an **Apple**.
- Each food item has an **underlying "true value"** (or utility), but you don’t perceive it perfectly. Instead, your brain assigns a utility that varies slightly each time due to randomness (e.g., mood, hunger, environment).
- This variation is modeled using a **Gaussian distribution** with:
  - A **mean** ($\mu$) representing the average perceived utility of that option.
  - A **standard deviation** ($\sigma$) representing the amount of randomness or noise in perception.

Mathematically, we say:

$$
U_{\text{Burger}} \sim \mathcal{N}(\mu_{\text{Burger}}, \sigma_{\text{Burger}}^2)
$$

$$
U_{\text{Apple}} \sim \mathcal{N}(\mu_{\text{Apple}}, \sigma_{\text{Apple}}^2)
$$

Where:
- $U_{\text{Burger}}$ and $U_{\text{Apple}}$ are the **perceived utilities**.
- $\mathcal{N}(\mu, \sigma^2)$ is a normal distribution.
- The means ($\mu_{\text{Burger}}, \mu_{\text{Apple}}$) are the "true" utilities.
- The standard deviations ($\sigma_{\text{Burger}}, \sigma_{\text{Apple}}$) represent uncertainty in perception.

### 2. Making a Choice: Pairwise Comparison
- When comparing the two options, you **draw** a sample utility from each distribution.
- You then choose the option with the higher sampled utility.
- This means if $U_{\text{Burger}} > U_{\text{Apple}}$, you choose **Burger**; otherwise, you choose **Apple**.
- The probability of choosing **Burger** depends on how much its utility distribution overlaps with **Apple**'s.
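
The draw-and-compare procedure above can be sketched as a small simulation. This is only an illustration: the parameter values are borrowed from the worked example in this document, the variable names are invented here, and the two draws are assumed to be independent.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so reruns give the same draws

# Parameters borrowed from the worked example in this document
mu_burger, sigma_burger = 7.0, 1.5
mu_apple, sigma_apple = 6.5, 2.0

n_trials = 200_000

# One draw per option per trial (independence between options is an assumption)
u_burger = rng.normal(mu_burger, sigma_burger, size=n_trials)
u_apple = rng.normal(mu_apple, sigma_apple, size=n_trials)

# On each trial, choose whichever option has the higher sampled utility
burger_chosen = u_burger > u_apple
p_burger_estimate = burger_chosen.mean()

print(f"Estimated P(Burger is chosen): {p_burger_estimate:.3f}")
```

The fraction of trials in which Burger wins is a Monte Carlo estimate of the choice probability, so it only approximates the exact value.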

### 3. Probability of Preference
- If the distributions overlap a lot, your choice is more uncertain.
- If they are far apart, you're almost always going to pick the better one.
- Mathematically, the probability of choosing **Burger** over **Apple** is given by:

$$
P(\text{Burger is chosen}) = P(U_{\text{Burger}} > U_{\text{Apple}})
$$

This can be computed using the **cumulative distribution function (CDF)** of a normal distribution.
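
To spell that step out: if the two perceived utilities are assumed to be independent (an assumption made here; more general Thurstonian models allow them to be correlated), their difference is itself normally distributed:

$$
U_{\text{Burger}} - U_{\text{Apple}} \sim \mathcal{N}\left(\mu_{\text{Burger}} - \mu_{\text{Apple}},\ \sigma_{\text{Burger}}^2 + \sigma_{\text{Apple}}^2\right)
$$

Burger is chosen whenever this difference is positive, which gives:

$$
P(U_{\text{Burger}} > U_{\text{Apple}}) = \Phi\left(\frac{\mu_{\text{Burger}} - \mu_{\text{Apple}}}{\sqrt{\sigma_{\text{Burger}}^2 + \sigma_{\text{Apple}}^2}}\right)
$$

where $\Phi$ is the CDF of the standard normal distribution.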

---

## Example in Action
Suppose your perceived utilities for two foods have these parameters:
- **Burger:** $\mu_{\text{Burger}} = 7$, $\sigma_{\text{Burger}} = 1.5$
- **Apple:** $\mu_{\text{Apple}} = 6.5$, $\sigma_{\text{Apple}} = 2$

Even though **Burger is slightly better on average**, the overlap in their distributions means you won't always pick it. Some days, the Apple might taste better due to small variations in perception!

This is why the Thurstonian model explains **probabilistic choices** rather than strict, fixed preferences. It's useful in psychology, marketing, and machine learning for modeling **how people make uncertain decisions**.
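
To put a number on this example, here is a minimal sketch that assumes the two utilities are independent. Under that assumption the difference $U_{\text{Burger}} - U_{\text{Apple}}$ is normal with mean $7 - 6.5 = 0.5$ and standard deviation $\sqrt{1.5^2 + 2^2} = 2.5$, so the choice probability is the standard normal CDF evaluated at $0.5 / 2.5 = 0.2$.

```python
import numpy as np
from scipy.stats import norm

# Parameters from the example above
mu_burger, sigma_burger = 7.0, 1.5
mu_apple, sigma_apple = 6.5, 2.0

# Standard deviation of the difference, assuming independent utilities
sd_diff = np.sqrt(sigma_burger**2 + sigma_apple**2)  # sqrt(2.25 + 4.0) = 2.5

# P(U_burger > U_apple) = Phi((mu_burger - mu_apple) / sd_diff) = Phi(0.2)
p_burger = norm.cdf((mu_burger - mu_apple) / sd_diff)

print(f"P(Burger is chosen) = {p_burger:.4f}")  # prints 0.5793
```

So Burger wins only about 58% of the time, even though it is better on average.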

# Code example with 3 choices

We consider three options: Apple (A), Burger (B), and Chocolate (C).

In [15]:
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

# Example data: (Option A, Option B, Chosen Option)
# A: Apple
# B: Burger
# C: Chocolate
data = [
    ("B", "A", "B"),  # Burger preferred over Apple
    ("A", "C", "C"),  # Chocolate preferred over Apple
    ("B", "C", "B"),  # Burger preferred over Chocolate
    ("A", "B", "A"),  # Apple preferred over Burger -----> "Inconsistency", hence the probabilistic approach
    ("C", "A", "C"),  # Chocolate preferred over Apple
    ("C", "B", "B"),  # Burger preferred over Chocolate
    ("B", "A", "B"),  # Burger preferred over Apple
    ("A", "C", "A"),  # Apple preferred over Chocolate -----> "Inconsistency", hence the probabilistic approach
    ("C", "B", "C"),  # Chocolate preferred over Burger
    ("A", "B", "B"),  # Burger preferred over Apple
]

# Step 1: Initialize parameters
options = list(set([item for sublist in data for item in sublist[:2]]))
if len(options) < 2:
    raise ValueError("Insufficient unique options in the data")

mu = {opt: 0 for opt in options}  # Initialize means to 0
sigma = {opt: 1.0 for opt in options}  # Initialize standard deviations to 1

# Step 2: Define likelihood function
def neg_log_likelihood(params):
    for i, opt in enumerate(options):
        mu[opt] = params[i]
        sigma[opt] = np.exp(params[i + len(options)])  # Ensure positive sigma

    likelihood = 0
    epsilon = 1e-10  # Small constant for numerical stability
    for A, B, chosen in data:
        # Probability that A beats B: Phi((mu_A - mu_B) / sqrt(sigma_A^2 + sigma_B^2))
        mu_diff = mu[A] - mu[B]
        sigma_sum = np.sqrt(sigma[A]**2 + sigma[B]**2)  # std dev of U_A - U_B
        prob_A = norm.cdf(mu_diff / (sigma_sum + epsilon))
        # Binary log-likelihood; epsilon guards both branches against log(0)
        prob_chosen = prob_A if chosen == A else 1 - prob_A
        likelihood += np.log(prob_chosen + epsilon)
    return -likelihood

# Step 3: Optimize
init_params = [0] * len(options) + [0] * len(options)  # Initialize mu and log(sigma)
bounds = [(-10, 10)] * len(options) + [(-10, 2)] * len(options)  # Bounds for mu and log(sigma)
result = minimize(neg_log_likelihood, init_params, method="L-BFGS-B", bounds=bounds)

# Extract results
mu_estimates = result.x[:len(options)]
sigma_estimates = np.exp(result.x[len(options):])

# Print estimated parameters
print("Estimated parameters:")
for i, opt in enumerate(options):
    print(f"{opt}: mu = {mu_estimates[i]:.2f}, sigma = {sigma_estimates[i]:.2f}")

# Optional: Calculate and print confidence intervals
# This would require computing the Hessian matrix at the optimum

print(f"\nLog-likelihood: {-result.fun:.2f}")
print(f"Converged: {result.success}")

Estimated parameters:
A: mu = -0.57, sigma = 1.20
C: mu = 0.00, sigma = 0.57
B: mu = 0.57, sigma = 1.20

Log-likelihood: -6.07
Converged: True


The model can represent the "inconsistencies" in the data. For example, the user generally chooses B over A (3 of the 4 comparisons), but there is one instance of A over B.

## Summary of Results:

1. Preference Ranking: Burger > Chocolate > Apple
   (Based on estimated means: B: 0.57, C: 0.00, A: -0.57)

2. Preference Consistency:
   - Chocolate has the lowest estimated noise (σ = 0.57)
   - Burger and Apple have equal, higher estimated noise (σ = 1.20 each)

3. Key Observations:
   - Burger is most preferred but with high variability
   - Chocolate sits in the middle, with the least variable perceived utility
   - Apple is least preferred, also with high variability
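
One caveat when reading these numbers: the choice probabilities depend only on differences in μ relative to the combined noise, so shifting every μ by a constant, or scaling every μ and σ by the same positive factor, leaves all predictions unchanged. The sketch below checks this on the rounded estimates. The `shift` and `scale` values are arbitrary and chosen purely for illustration.

```python
import numpy as np
from scipy.stats import norm


def pairwise_prob(mu_x, mu_y, sigma_x, sigma_y):
    """P(X is chosen over Y), assuming independent Gaussian utilities."""
    return norm.cdf((mu_x - mu_y) / np.sqrt(sigma_x**2 + sigma_y**2))


# Rounded estimates from the fit above
mu = {"A": -0.57, "B": 0.57, "C": 0.00}
sigma = {"A": 1.20, "B": 1.20, "C": 0.57}

# Arbitrary transformation (illustrative values only)
shift, scale = 5.0, 3.0
mu_alt = {k: scale * v + shift for k, v in mu.items()}
sigma_alt = {k: scale * v for k, v in sigma.items()}

pairs = [("A", "B"), ("A", "C"), ("B", "C")]
original = [pairwise_prob(mu[x], mu[y], sigma[x], sigma[y]) for x, y in pairs]
transformed = [pairwise_prob(mu_alt[x], mu_alt[y], sigma_alt[x], sigma_alt[y]) for x, y in pairs]

for (x, y), p, p_alt in zip(pairs, original, transformed):
    print(f"P({x} > {y}): original = {p:.4f}, transformed = {p_alt:.4f}")
```

The two columns agree, which is why the estimates are best read as a ranking with relative spacing rather than as absolute scores.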

# How to get the confidence of preferences?

I asked Sonnet 3.5 because I was not 100% sure. I think we could use the data to get **frequencies**, but I wanted to double-check using the model.

In [16]:
from scipy.stats import norm

mu_A, mu_B = -0.57, 0.57
sigma_A, sigma_B = 1.20, 1.20

prob_A_greater_B = norm.cdf((mu_A - mu_B) / np.sqrt(sigma_A**2 + sigma_B**2))
print(f"Probability of A > B: {prob_A_greater_B:.4f}")

Probability of A > B: 0.2509


In [17]:
A_wins = sum(1 for a, b, chosen in data if (a == 'A' and b == 'B' and chosen == 'A') or (a == 'B' and b == 'A' and chosen == 'A'))
total_AB_comparisons = sum(1 for a, b, _ in data if set([a, b]) == set(['A', 'B']))

raw_prob_A_greater_B = A_wins / total_AB_comparisons
print(f"Raw frequency of A > B: {raw_prob_A_greater_B:.4f}")

Raw frequency of A > B: 0.2500
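
The same check can be repeated for every pair, not just A vs. B. The following is a sketch that restates the choice data and the rounded parameter estimates so it runs on its own. The `wins`, `totals`, `model_probs`, and `raw_freqs` names are introduced here for illustration.

```python
from collections import Counter

import numpy as np
from scipy.stats import norm

# Same choice data as in the fitting cell: (Option A, Option B, Chosen Option)
data = [
    ("B", "A", "B"), ("A", "C", "C"), ("B", "C", "B"), ("A", "B", "A"),
    ("C", "A", "C"), ("C", "B", "B"), ("B", "A", "B"), ("A", "C", "A"),
    ("C", "B", "C"), ("A", "B", "B"),
]

# Rounded parameter estimates reported above
mu = {"A": -0.57, "B": 0.57, "C": 0.00}
sigma = {"A": 1.20, "B": 1.20, "C": 0.57}

wins, totals = Counter(), Counter()
for a, b, chosen in data:
    pair = tuple(sorted((a, b)))  # treat (A, B) and (B, A) as the same pair
    totals[pair] += 1
    if chosen == pair[0]:  # count wins of the alphabetically first option
        wins[pair] += 1

model_probs, raw_freqs = {}, {}
for x, y in sorted(totals):
    sd_diff = np.sqrt(sigma[x]**2 + sigma[y]**2)
    model_probs[(x, y)] = norm.cdf((mu[x] - mu[y]) / sd_diff)
    raw_freqs[(x, y)] = wins[(x, y)] / totals[(x, y)]
    print(f"{x} > {y}: model = {model_probs[(x, y)]:.4f}, raw = {raw_freqs[(x, y)]:.4f}")
```

If the model and the raw frequencies agree on every pair, the fitted parameters are essentially reproducing the observed win rates.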


## "Emergence" of utilities

We use cross-entropy here to measure how closely the model's predicted probabilities match the observed choices. For the cross-entropy to be really low, the data should not contain any "inconsistencies".
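
For reference, the quantity in question can be written as follows. The notation is introduced here: $N$ is the number of comparisons, $p_i$ is the model's probability that the first-listed option wins comparison $i$, and $y_i$ is 1 if that option was actually chosen and 0 otherwise.

$$
H = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]
$$

This is the negative log-likelihood of the observed choices divided by the number of comparisons.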



In [18]:
import numpy as np
from scipy.stats import norm

def calculate_pairwise_prob(mu_X, mu_Y, sigma_X, sigma_Y):
    return norm.cdf((mu_X - mu_Y) / np.sqrt(sigma_X**2 + sigma_Y**2))

def calculate_cross_entropy(data, mu, sigma):
    cross_entropy = 0
    for X, Y, chosen in data:
        p_X_over_Y = calculate_pairwise_prob(mu[X], mu[Y], sigma[X], sigma[Y])

        # Observed choice: 1 if X was chosen, 0 if Y was chosen
        y = 1 if chosen == X else 0

        # Cross-entropy formula
        cross_entropy -= (y * np.log(p_X_over_Y) + (1-y) * np.log(1 - p_X_over_Y))

    return cross_entropy / len(data)

# Using the fitted parameters from before
mu = {'A': -0.57, 'B': 0.57, 'C': 0.00}
sigma = {'A': 1.20, 'B': 1.20, 'C': 0.57}

cross_entropy = calculate_cross_entropy(data, mu, sigma)
print(f"Cross-entropy: {cross_entropy:.4f}")

Cross-entropy: 0.6068
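
To judge whether this value is low, it helps to have reference points. The sketch below computes two of them. The first is the cross-entropy of a model that predicts 50% for every pair, which is $\ln 2 \approx 0.693$. The second is the lowest value any pairwise model could reach on this dataset, obtained by predicting each pair's observed win frequency exactly. The `pair_counts` table is tallied by hand from the choice data above.

```python
import numpy as np

# Reference point 1: a model that predicts 50% for every pair
coin_flip_ce = -np.log(0.5)

# Reference point 2: predict each pair's observed win frequency exactly.
# Values are (wins of the first-named option, number of comparisons),
# tallied from the choice data used in this document.
pair_counts = {("A", "B"): (1, 4), ("A", "C"): (1, 3), ("B", "C"): (2, 3)}

n_total = sum(n for _, n in pair_counts.values())
floor_ce = 0.0
for wins, n in pair_counts.values():
    p = wins / n
    # Summed cross-entropy of n choices when the prediction equals the frequency
    floor_ce -= wins * np.log(p) + (n - wins) * np.log(1 - p)
floor_ce /= n_total

print(f"Coin-flip baseline:       {coin_flip_ce:.4f}")
print(f"Best achievable on data:  {floor_ce:.4f}")
```

A fitted cross-entropy close to the second number means the model matches the observed frequencies about as well as possible. How far that number sits below the coin-flip baseline reflects how decisive the choices themselves are.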


### What does it mean to have a low cross-entropy here?

A low cross-entropy in the context of the Thurstonian model has several important implications:

1. **Good Model Fit**: The model's predictions align closely with observed choices, indicating it captures the underlying preference structure well.

2. **Accurate Predictions**: The model predicts pairwise choices with high accuracy, closely matching actual observed choice frequencies.

3. **Consistent Preferences**: Suggests that preferences in the data are relatively consistent and well-captured by the model's assumptions (e.g., normally distributed utilities).

4. **Useful for Decision Making**: The model can more reliably predict preferences in new situations or for items not directly compared in the original dataset.

5. **Effective Parameter Estimation**: Indicates that estimated parameters (μ and σ for each item) likely represent true underlying preferences well.

6. **Minimal Information Loss**: Less additional information is needed to explain observed choices beyond what the model provides.

7. **Potential for Generalization**: The model is more likely (though not guaranteed) to generalize well to new, unseen data.

In simple terms, low cross-entropy means the model is doing a good job of "understanding" and representing the preferences expressed in the choice data. It's akin to having a friend who can accurately predict your food choices because they have a good grasp of your tastes.

**Note**: While low cross-entropy is generally desirable, it should be interpreted in context. Extremely low values might indicate overfitting, especially if the cross-entropy on a separate validation set is much higher. The relative value of cross-entropy (comparing across models or datasets) is often more informative than its absolute value.