# MaxEnt Mixed Logit

### Choice probabilities

Probability function:
$$
P(y_i | X_i, \alpha_j, \beta_i) = \frac{e^{\alpha_j + X_{ij}' \beta_i}}{\sum_{k \in C} e^{\alpha_k + X_{ik}' \beta_i}}
$$

- $P(y_i | X_i, \alpha_j, \beta_i)$: Probability of individual $i$ choosing alternative $j$.
- $y_i$: Choice outcome for individual $i$.
- $X_i$: Set of explanatory variables for individual $i$; $X_{ik}$ holds the attributes (e.g. price) of alternative $k$.
- $\alpha_j$: Alternative-specific intercept for choice $j$, assumed to be a random variable (see $\alpha_{ij}$ below).
- $\beta_i$: Coefficient vector for individual $i$, assumed to be a random variable varying across individuals.
- $C$: Set of available choices.
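A minimal Python sketch of this probability for one individual, assuming a single attribute (price) per alternative so that $X_{ik}'\beta_i$ reduces to $x_{ik}\beta_i$. The function name and the numbers are illustrative only.

```python
import math

def choice_probabilities(alpha, x, beta_i):
    """Logit choice probabilities for one individual i.

    alpha:  intercepts alpha_k, one per alternative
    x:      the attribute x_ik of each alternative (e.g. its price)
    beta_i: this individual's coefficient on that attribute
    """
    utilities = [a + xk * beta_i for a, xk in zip(alpha, x)]
    top = max(utilities)  # subtract the largest utility before exp() to avoid overflow
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

probs = choice_probabilities(alpha=[0.5, 0.0, -0.2], x=[10.0, 8.0, 6.0], beta_i=-0.3)
print(probs)
```

Subtracting the largest utility leaves the probabilities unchanged, because the common factor $e^{-\max_k U_k}$ cancels between numerator and denominator.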


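The price coefficients vary across customers, with the same mean $\beta$ and standard deviation $\sigma$ for every product; $v_i$ is a random draw for customer $i$, typically standard normal.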

$$ \beta_i = \beta + \sigma v_i$$

The intercepts $\alpha$ also vary across customers, but each product $j$ has its own mean $\alpha_j$ and standard deviation $\sigma_j$.

$$ \alpha_{ij} = \alpha_{j} + \sigma_j v_i $$
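A mixed-logit choice probability is the logit probability averaged over the distribution of the random coefficients, and a common way to approximate it is to average over simulated draws. The sketch below assumes $v \sim N(0,1)$ and, unlike the single $v_i$ written above, draws the price coefficient and each product's intercept independently (the independence case discussed next). All names and values are hypothetical.

```python
import math
import random

def logit_probs(utilities):
    """Softmax of a list of utilities, shifted by the max for numerical stability."""
    top = max(utilities)
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def simulated_mixed_logit(alpha_mean, alpha_sd, beta_mean, beta_sd, x,
                          n_draws=5000, seed=0):
    """Approximate mixed-logit probabilities by averaging logit probabilities over draws.

    Assumption (for illustration): every v is an independent N(0, 1) draw, one for
    the price coefficient and one per product intercept.
    """
    rng = random.Random(seed)
    avg = [0.0] * len(x)
    for _ in range(n_draws):
        beta_i = beta_mean + beta_sd * rng.gauss(0.0, 1.0)   # beta_i = beta + sigma v
        alpha_i = [a + s * rng.gauss(0.0, 1.0)                # alpha_ij = alpha_j + sigma_j v
                   for a, s in zip(alpha_mean, alpha_sd)]
        probs = logit_probs([a + xk * beta_i for a, xk in zip(alpha_i, x)])
        avg = [acc + pk / n_draws for acc, pk in zip(avg, probs)]
    return avg

p_mixed = simulated_mixed_logit(alpha_mean=[0.5, 0.0, -0.2], alpha_sd=[0.3, 0.3, 0.3],
                                beta_mean=-0.3, beta_sd=0.1, x=[10.0, 8.0, 6.0])
```

Setting every standard deviation to zero collapses the average back to the plain logit probabilities, which is a quick sanity check on the simulation.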

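Because each coefficient is a random variable, stacking several of them (e.g. the intercepts and the price coefficient) gives a random vector $\phi_i$ with a joint probability distribution.

Assuming the coefficients are independent (i.e. the $X$ are neither substitutes nor complements),

$$\phi_i = \phi + \Gamma v_i$$

where $\phi$ is the vector of means, $v_i$ is a vector of independent zero-mean, unit-variance draws, and $\Gamma$ is the standard deviation matrix, with the sigmas on the main diagonal and zeros elsewhere.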

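In general, the covariance matrix of $\phi_i$ is

$$ \Sigma = \Gamma \Gamma' $$

which is diagonal under independence. To allow correlation, $\Gamma$ is taken to be lower triangular (the Cholesky factor of $\Sigma$), so its elements below the main diagonal are $\neq 0$.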
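A small sketch of the two cases, assuming two coefficients $(\alpha, \beta)$ and standard-normal $v_i$: a diagonal $\Gamma$ gives a diagonal $\Sigma = \Gamma\Gamma'$, while a lower-triangular $\Gamma$ gives nonzero covariances. The matrices and means are made-up values.

```python
import random

def gamma_gamma_t(gamma):
    """Return Sigma = Gamma Gamma' for a square matrix stored as a list of rows."""
    n = len(gamma)
    return [[sum(gamma[r][k] * gamma[c][k] for k in range(n)) for c in range(n)]
            for r in range(n)]

def draw_phi(phi_mean, gamma, rng):
    """One draw phi_i = phi + Gamma v_i, with v_i a vector of independent N(0, 1) draws."""
    v = [rng.gauss(0.0, 1.0) for _ in phi_mean]
    return [m + sum(g * vk for g, vk in zip(row, v)) for m, row in zip(phi_mean, gamma)]

phi_mean = [1.0, -0.3]                    # hypothetical means of (alpha, beta)
gamma_indep = [[0.5, 0.0], [0.0, 0.2]]    # diagonal: independent coefficients
gamma_corr = [[0.5, 0.0], [0.1, 0.2]]     # lower triangular: correlated coefficients

sigma_indep = gamma_gamma_t(gamma_indep)  # off-diagonal entries are 0
sigma_corr = gamma_gamma_t(gamma_corr)    # off-diagonal entries are 0.5 * 0.1 = 0.05
phi_i = draw_phi(phi_mean, gamma_corr, random.Random(0))
```

Drawing many $\phi_i$ with `gamma_corr` and computing their sample covariance should approximately reproduce `sigma_corr`.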

**Dealing with $\alpha$**

We can rewrite the utility function $U = \alpha + \beta X$ as 

$$ U = \alpha \cdot 1 + \beta \cdot X $$

and then treat $\alpha$ as just another beta, with the 1 as a dummy variable that equals 1 in the utility of the product the intercept belongs to and 0 in the utilities of all other products.
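A minimal sketch of this trick, assuming three products and price as the only attribute: each alternative's covariate row gets one dummy per product plus its price, and the stacked coefficient vector $(\alpha_1, \alpha_2, \alpha_3, \beta)$ is applied to every row. The helper names are hypothetical.

```python
def design_row(j, n_alternatives, price):
    """Covariates for alternative j: one dummy per alternative, then the price.

    Only the dummy for alternative j is 1, so alpha_j enters only j's utility.
    """
    dummies = [1.0 if k == j else 0.0 for k in range(n_alternatives)]
    return dummies + [price]

def utility(phi, z):
    """Linear utility phi' z with phi = (alpha_1, ..., alpha_J, beta)."""
    return sum(p * zk for p, zk in zip(phi, z))

alpha = [0.5, 0.0, -0.2]
beta = -0.3
phi = alpha + [beta]          # stack intercepts and price coefficient into one vector
prices = [10.0, 8.0, 6.0]
U = [utility(phi, design_row(j, 3, prices[j])) for j in range(3)]
```

Since only utility differences affect the probabilities, one intercept is usually fixed at 0 for identification; here that is the second product.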

### Likelihood function
Likelihood function for the standard (fixed-coefficient) MNL model:
$$
L(\beta) = \prod_{i=1}^{N} \prod_{j \in C} \left( \frac{e^{X_{ij}' \beta}}{\sum_{k \in C} e^{X_{ik}' \beta}} \right)^{y_{ij}}
$$

- $L(\beta)$: Likelihood function for the coefficient vector $\beta$.
- $N$: Total number of observations (or individuals).
- $X_i$: Set of explanatory variables for individual $i$; $X_{ij}$ holds those of alternative $j$.
- $\beta$: Coefficient vector for the explanatory variables.
- $C$: Set of all available choices.
- $y_{ij}$: Indicator variable that is 1 if individual $i$ chooses alternative $j$, and 0 otherwise.

Explanation:
- For each individual $i$, the probability of choosing the observed alternative is calculated.
- These probabilities are then multiplied across all individuals to get the likelihood of observing the entire set of choices given the parameters $\beta$.
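- In practice its logarithm, $\ln L(\beta) = \sum_{i} \sum_{j \in C} y_{ij} \ln P_{ij}(\beta)$ with $P_{ij}$ the probability in parentheses, is maximized to estimate the $\beta$ coefficients in MNL models.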
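A minimal sketch of evaluating and maximizing this likelihood in plain Python, assuming one scalar attribute per alternative (so $\beta$ is a scalar) and no intercepts. It uses simple gradient ascent with the gradient $\sum_i \sum_j (y_{ij} - P_{ij}) x_{ij}$, as a stand-in for the Newton-type optimizers normally used; the data are invented.

```python
import math

def log_likelihood(beta, X, y):
    """MNL log-likelihood sum_i sum_j y_ij * log P_ij(beta).

    X[i][j]: scalar attribute of alternative j for individual i.
    y[i][j]: 1 for the chosen alternative, 0 otherwise.
    """
    ll = 0.0
    for x_i, y_i in zip(X, y):
        utils = [beta * x for x in x_i]
        top = max(utils)
        log_denom = top + math.log(sum(math.exp(u - top) for u in utils))
        ll += sum(yij * (u - log_denom) for yij, u in zip(y_i, utils))
    return ll

def gradient(beta, X, y):
    """d logL / d beta = sum_i sum_j (y_ij - P_ij) x_ij."""
    g = 0.0
    for x_i, y_i in zip(X, y):
        utils = [beta * x for x in x_i]
        top = max(utils)
        exps = [math.exp(u - top) for u in utils]
        total = sum(exps)
        g += sum((yij - e / total) * x for yij, e, x in zip(y_i, exps, x_i))
    return g

def fit_beta(X, y, lr=0.05, steps=2000):
    """Plain gradient ascent on the MNL log-likelihood, starting from beta = 0."""
    beta = 0.0
    for _ in range(steps):
        beta += lr * gradient(beta, X, y)
    return beta

# Hypothetical data: 4 individuals, 3 alternatives, x = price
X = [[10.0, 8.0, 6.0], [9.0, 7.0, 7.5], [12.0, 8.5, 6.0], [10.0, 9.0, 8.0]]
y = [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
beta_hat = fit_beta(X, y)
```

Because the MNL log-likelihood is concave in $\beta$, a small enough step converges to the unique maximum. The estimate is finite here only because the data are not perfectly separated (the fourth individual buys the most expensive option).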


# MaxEnt form

$$ y_{i} = F(X_{i}' \beta) + \epsilon_{i}$$

where 
- $y_i$ is the observed binary (1/0) choice, i.e. what actually happened.
- $F( \cdot )$ is a link function mapping $X_i'\beta$ to a predicted probability (the logit formula, for MNL).

$$ y_{i} = p_i + \epsilon_{i}, \qquad p_i = F(X_{i}' \beta) $$

In words, the choice that actually happened is just the predicted probability plus some error $\epsilon_i$.

Because $0 < p_i < 1$ and $y_i$ is either $0$ or $1$, the error term is bounded between $[-1,1]$.
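Spelling out the two cases of $\epsilon_i = y_i - p_i$ makes the bound explicit:

$$
\epsilon_i = y_i - p_i =
\begin{cases}
1 - p_i \in (0, 1) & \text{if } y_i = 1 \\
-p_i \in (-1, 0) & \text{if } y_i = 0
\end{cases}
$$

For instance, $p_i = 0.8$ gives $\epsilon_i = 0.2$ if the product was bought and $\epsilon_i = -0.8$ if it was not.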

### Revenue cross-moments

$\sum_i y_i x_i$ multiplies each observed choice (seat purchased) by its price and sums: the total actual revenue.

$\sum_i p_i x_i$ does the same with the predicted choice probabilities (fractional seats purchased): the total predicted revenue.

$\sum_i \epsilon_i x_i$ does the same with the choice errors: the total revenue error.

$$\sum_i y_i x_i = \sum_i p_i x_i + \sum_i \epsilon_i x_i $$

Or more intuitively, total revenue equals predicted revenue plus revenue error.
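A numeric check of the identity, with hypothetical prices, purchases, and predicted probabilities. Because $\epsilon_i$ is defined as $y_i - p_i$, the identity holds exactly for any predictions; what changes with the model is only how the total splits between predicted revenue and revenue error.

```python
prices = [120.0, 95.0, 150.0, 80.0]      # x_i: hypothetical prices
y = [1, 0, 1, 1]                         # y_i: observed purchases (1 = bought)
p = [0.7, 0.2, 0.6, 0.9]                 # p_i: hypothetical predicted probabilities
eps = [yi - pi for yi, pi in zip(y, p)]  # epsilon_i = y_i - p_i

actual_revenue = sum(yi * xi for yi, xi in zip(y, prices))
predicted_revenue = sum(pi * xi for pi, xi in zip(p, prices))
revenue_error = sum(ei * xi for ei, xi in zip(eps, prices))

print(actual_revenue, predicted_revenue + revenue_error)
```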

### Making probabilities and error random variables

Let $p_i$ be the expected value of a discrete random variable $s$ whose possible values (support points $s_m$) lie in $[0,1]$.

$$ \langle s \rangle  = p_i $$

The true (unknown) PMF of $s$ is $\pi$: multiplying each support point $s_m$ by its probability $\pi_m$ and summing gives the expected value. Each individual $i$ has its own PMF; the index $i$ on $\pi$ is suppressed for readability.

$$ \langle s \rangle  = \sum_m s_m \pi_m $$



Let $\epsilon_i$ be the expected value of a discrete random variable $u$ whose possible values (support points $u_h$) lie in $[-1,1]$.

$$  \langle u \rangle = \epsilon_i $$

The true PMF of $u$ is $w$, and the same logic holds:

$$ \langle u \rangle  = \sum_h u_h w_h $$
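A small illustration of this reparameterization, with made-up support points and PMFs. Any $p_i$ in $[0,1]$ (or $\epsilon_i$ in $[-1,1]$) can be written as such an expectation, so choosing the PMF is equivalent to choosing the value.

```python
s_support = [0.0, 0.25, 0.5, 0.75, 1.0]   # hypothetical support points s_m in [0, 1]
pi = [0.05, 0.10, 0.20, 0.30, 0.35]       # PMF pi_m over those points (sums to 1)
u_support = [-1.0, 0.0, 1.0]              # hypothetical support points u_h in [-1, 1]
w = [0.1, 0.6, 0.3]                       # PMF w_h (sums to 1)

p_i = sum(s * q for s, q in zip(s_support, pi))    # <s> = sum_m s_m pi_m
eps_i = sum(u * q for u, q in zip(u_support, w))   # <u> = sum_h u_h w_h
```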


### Revisiting revenue cross-moments

We substitute in our new definitions of $p_i$ and $\epsilon_i$.

$$\sum_i y_i x_i = \sum_i \left( \sum_m s_m \pi_m \right) x_i + \sum_i \left( \sum_h u_h w_h \right) x_i$$


Our goal is to estimate the probability distributions of $p_i$ ($\pi_m$) and $\epsilon_i$ ($w_h$) simultaneously by maximising their joint entropy, subject to the constraint above and the usual sum-to-one constraints.



$$ \max_{\pi, w} H(\pi, w) = H(\pi) + H(w) = -\sum_m \pi_m \ln \pi_m - \sum_h w_h \ln w_h $$

s.t. 

Actual revenue equals predicted revenue plus revenue error:

$$\sum_i y_i x_i = \sum_i \left( \sum_m s_m \pi_m \right) x_i + \sum_i \left( \sum_h u_h w_h \right) x_i$$

The probabilities of each PMF (for $p_i$ and for $\epsilon_i$) sum to 1:

$$ \sum_m \pi_m = 1$$

$$ \sum_h w_h = 1$$
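A minimal sketch of solving this program numerically, assuming each individual has its own PMFs (as in the standard generalized maximum entropy setup), positive prices, and the support points below. Setting the derivatives of the Lagrangian to zero gives PMFs of the exponential (Gibbs) form $\pi_{im} \propto e^{-\lambda x_i s_m}$ and $w_{ih} \propto e^{-\lambda x_i u_h}$, so only the single multiplier $\lambda$ on the revenue constraint has to be found; the sketch does this by bisection. Function names and data are hypothetical.

```python
import math

def gibbs(support, scale):
    """PMF proportional to exp(-scale * v) over the support points v (numerically stable)."""
    logits = [-scale * v for v in support]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(pmf):
    """Shannon entropy -sum q ln q (terms with q = 0 contribute 0)."""
    return -sum(q * math.log(q) for q in pmf if q > 0)

def solve_gme(x, y, s_support, u_support, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection on lambda so that actual revenue = predicted revenue + revenue error.

    Assumes every x_i > 0, which makes the constraint gap decreasing in lambda.
    """
    target = sum(yi * xi for yi, xi in zip(y, x))

    def gap(lam):
        total = 0.0
        for xi in x:
            p_i = sum(s * q for s, q in zip(s_support, gibbs(s_support, lam * xi)))
            e_i = sum(u * q for u, q in zip(u_support, gibbs(u_support, lam * xi)))
            total += (p_i + e_i) * xi
        return total - target

    for _ in range(200):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid  # predicted revenue + error too high: the root lies at larger lambda
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    pis = [gibbs(s_support, lam * xi) for xi in x]
    ws = [gibbs(u_support, lam * xi) for xi in x]
    return lam, pis, ws, gap(lam)

x = [1.0, 2.0, 1.5, 3.0]   # hypothetical prices
y = [1, 0, 1, 0]           # hypothetical purchases
lam, pis, ws, residual = solve_gme(x, y, [0.0, 0.5, 1.0], [-1.0, 0.0, 1.0])
total_entropy = sum(entropy(q) for q in pis + ws)
```

With only this one moment constraint, the recovered PMFs depend on $i$ only through $x_i$, so individuals facing the same price get the same $\hat{p}_i$.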



# Lagrangeans

$$ \hat{\pi} $$

$$ \hat{w} $$
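A sketch of the first-order conditions, assuming each individual $i$ has its own PMFs $\pi_{im}$ and $w_{ih}$ (the index is written explicitly here), one multiplier $\lambda$ on the revenue constraint, and multipliers $\mu_i, \nu_i$ on the sum-to-one constraints:

$$
\mathcal{L} = -\sum_i \sum_m \pi_{im} \ln \pi_{im} - \sum_i \sum_h w_{ih} \ln w_{ih}
+ \lambda \Big( \sum_i y_i x_i - \sum_i x_i \sum_m s_m \pi_{im} - \sum_i x_i \sum_h u_h w_{ih} \Big)
+ \sum_i \mu_i \Big( 1 - \sum_m \pi_{im} \Big) + \sum_i \nu_i \Big( 1 - \sum_h w_{ih} \Big)
$$

$$
\frac{\partial \mathcal{L}}{\partial \pi_{im}} = -\ln \pi_{im} - 1 - \lambda x_i s_m - \mu_i = 0
\quad \Rightarrow \quad
\hat{\pi}_{im} = \frac{e^{-\hat{\lambda} x_i s_m}}{\sum_{m'} e^{-\hat{\lambda} x_i s_{m'}}}
$$

$$
\hat{w}_{ih} = \frac{e^{-\hat{\lambda} x_i u_h}}{\sum_{h'} e^{-\hat{\lambda} x_i u_{h'}}}
$$

where $\hat{\lambda}$ is chosen so that $\sum_i y_i x_i = \sum_i x_i \left( \sum_m s_m \hat{\pi}_{im} + \sum_h u_h \hat{w}_{ih} \right)$.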

