# <center>Discrete choice estimation</center>
### <center>Alfred Galichon (NYU & Sciences Po)</center>
## <center>'math+econ+code' masterclass series</center>
#### <center>With python code examples</center>
© 2018–2023 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274 are acknowledged, as well as inputs from contributors listed [here](http://www.math-econ-code.org/team).

**If you reuse material from this masterclass, please cite as:**<br>
Alfred Galichon, 'math+econ+code' masterclass series. https://www.math-econ-code.org/

## References

* Savage, L. (1951). The theory of statistical decision. JASA.
* Bonnet, Fougère, Galichon, Poulhès (2021). Minimax estimation of hedonic models. Preprint.

## Loading the libraries

First, let's load the libraries we shall need.

In [1]:
# !python -m pip install -i https://pypi.gurobi.com gurobipy ## only if Gurobi not here

In [2]:
import numpy as np
import pandas as pd
import scipy.sparse as spr
from scipy import optimize, special
import gurobipy as grb
from sklearn import linear_model
from tabulate import tabulate


We will also import objects created in the previous lectures, which are now stored in the `objects_D3`module.

In [3]:
# from objects_D3 import *

## Our data
We will go back to the dataset of Greene and Hensher (1997). As a reminder, 210 individuals are surveyed about their choice of travel mode between Sydney, Canberra and Melbourne, and the various costs (time and money) associated with each alternative. More precisely, recall that the variable `wait` is the waiting time of the plane, train or bus (waiting time is zero for car). `vcost` is the in-vehicle cost, without accounting for the value of the time savings. `travel` is the in-vehicle travel time. `gcost` is the generalized cost, which is the sum of ` vcost` and the value associated with time savings. `income` is the income of the household, in thousands of dollars. `size` is the size of the travel group.

Therefore there are 840 = 4 x 210 observations, which we can stack into `travelmodedataset` a 3 dimensional array whose dimensions are mode,individual,dummy for choice+covariates.

Let's load the dataset and take a first glance at it:

In [4]:
thepath = 'https://raw.githubusercontent.com/math-econ-code/mec_optim_2021-01/master/data_mec_optim/demand_travelmode/'
travelmode =  pd.read_csv(thepath+'travelmodedata.csv')
travelmode.head()

Unnamed: 0,individual,mode,choice,wait,vcost,travel,gcost,income,size
0,1,air,no,69,59,100,70,35,1
1,1,train,no,34,31,372,71,35,1
2,1,bus,no,35,25,417,70,35,1
3,1,car,yes,0,10,180,30,35,1
4,2,air,no,64,58,68,68,30,2


# GLM to estimate discrete choice models

## Estimation with observed heterogeneity

We assume that we observe individual characteristics that are relevant for individual choices, that is $U_{iy}=\sum_k \Phi_{iyk} \beta_k$, or in matrix form<br>
$U = \Phi \beta,$<br>
where $\beta\in\mathbb{R}^{p}$ is a parameter, and $\Phi$ is a $\left(\left\vert \mathcal{I}\left\vert\right\vert\mathcal{Y}\right\vert \right) \times p$ matrix.

Assume $u_{iy}=U_{iy} + \varepsilon _{iy}= \sum_{k}\Phi _{iyk} \beta _{k}+\varepsilon _{iy}$.

Let $\hat{\mu}_{iy}$ be the indicator that $i$ chooses alternative $y$. 



We create a `discreteChoicePb` class where we store that data. An arc $a$ is a pair $iy$.

In [5]:
class DiscreteChoicePb():
    def __init__(self,Φ_i_y_k, μhat_i_y):
        self.nbi,self.nby,self.nbk = Φ_i_y_k.shape
        self.nba = self.nbi * self.nby
        self.Φ_a_k = Φ_i_y_k.reshape((self.nba,-1))
        self.μhat_a = μhat_i_y.flatten()

We build an object `travelEx` based on our travel data and a parametric model where the regressors are `travel`, `-travel*income` and `gcost`.

In [6]:
def prepare_travel_data():
    μhat_a = np.where(travelmode['choice'] =='yes' , 1, 0)
    #nobs,ncols = travelmode.shape
    nby = travelmode['mode'].nunique()
    nbi = travelmode.shape[0] // nby
    covariates = travelmode[['travel', 'income', 'gcost']].values
    Φ_a_k = np.column_stack([ covariates[:,0] , - (covariates[:,0] * covariates[:,1] ), - covariates[:,2] ])
    _,nbk = Φ_a_k.shape 
    Φbar_k = Φ_a_k.mean(axis = 0)
    Φstdev_k = Φ_a_k.std(axis = 0, ddof = 1)
    Φ_i_y_k = ((Φ_a_k - Φbar_k[None,:]) / Φstdev_k[None,:]).reshape((nbi,nby,nbk))
    return DiscreteChoicePb(Φ_i_y_k,μhat_a)

travelEx = prepare_travel_data()

## Maximum likelihood estimation

The probability $\mu_{iy}$ that individual $i$ chooses alternative $y$ is given by $\partial G_i / \partial U_y (\Phi \beta)$. 


The log-likelihood function is given by

$
l\left(  \beta\right)  =\sum_{y}\hat{\mu}_{iy}\log \mu_{iy}\left(\Phi \beta\right) = \sum_{y}\hat{\mu}_{iy}\log  
\frac {\partial G_i}  {\partial U_y} (\Phi \beta)
$

A common estimation method of $\beta$ is by maximum likelihood: $\max_{\beta}l\left(  \beta\right) $; MLE is statistically efficient; the problem is that the problem is not guaranteed to be convex, so there may be computational difficulties (e.g. local optima).



### MLE, logit case

In the logit case, the log-likelihood associated with observation $i\in\mathcal{I}$ is

$
l_{i}\left( \beta \right) =\sum_{y\in \mathcal{Y}}\hat{\mu}_{iy}\left( \Phi
\beta \right) _{iy}-\log \sum_{y\in \mathcal{Y}}\exp \left( \Phi \beta
\right) _{iy}$

and the max-likelihood rewrites as

$\max_{\beta }\left\{ \sum_{i\in \mathcal{I},y\in \mathcal{Y}}\hat{\mu}%
_{iy}\left( \Phi \beta \right) _{iy}-\sum_{i\in \mathcal{I}}\log \sum_{y\in 
\mathcal{Y}}\exp \left( \Phi \beta \right) _{iy}\right\}$

so that the max-likehood boils down to

\begin{align*}
\max_{\beta}\left\{  \hat{\mu}^{\intercal} \Phi \beta- \sum_i G_i\left( \Phi \beta\right)\right\}
\end{align*}

whose value is the Legendre-Fenchel transform of $\beta\rightarrow \sum_i G_i\left( \Phi \beta\right)$ evaluated at $\Phi ^{^{\intercal}}\hat{\mu}$.

Note that the vector $\Phi^{^{\intercal}}\hat{\mu}$ is the vector of empirical moments, which is a sufficient statistics in the logit model.

As a result, in the logit case, the MLE is a convex optimization problem, and it is therefore both statistically efficient and computationally efficient.



### Moment estimation

The previous remark will inspire an alternative procedure based on the moments statistics $\Phi^{^{\intercal}}\hat{\mu}$.

The social welfare is given in general by $W\left(  \beta\right) =\sum_i G_i\left(  \Phi\beta\right)  $. One has <br>$\partial_{\beta^{k}}W\left(\beta\right)  =\sum_{iy} \frac {\partial G_i} {\partial U_y}(\Phi\beta) \Phi_{yik}$, that is 

\begin{align*}
\nabla W\left(  \beta\right)  = \Phi^{\intercal}\nabla G_i\left(  \Phi\beta\right)  ,
\end{align*}

which is the vector of predicted moments.

Therefore the program

$\max_{\beta }\left\{ \hat{\mu}^{\top }\Phi \beta -\sum_{i\in \mathcal{I}%
}G\left( \left( \Phi \beta \right) _{i.}\right) \right\} ,$

(where G is the Emax operator associated with the distribution of the random utility), picks up the parameter $\beta$ which matches the empirical moments $\Phi^{^{\intercal}}\hat{q}$ with the predicted ones $\nabla W\left(\beta\right)  $. This procedure is not statistically efficient, but is computationally efficient becauses it arises as a convex optimization problem.

### Creating a DiscreteChoicePb class

In [7]:
def DiscreteChoicePb_mle_diy(self):
    def minus_log_likelihood(β_k, σ = 1):
        Φβ_i_y = (self.Φ_a_k.dot(β_k)).reshape((-1,self.nby)) 
        maxΦβ_i = Φβ_i_y.max(axis = 1)
        d_i = np.sum(np.exp((Φβ_i_y -maxΦβ_i[:,None])/σ ), axis = 1)
        return - ((Φβ_i_y.flatten()*self.μhat_a).sum() / σ  -  (maxΦβ_i / σ + np.log(d_i)).sum())

    def grad_minus_log_likelihood(β_k, σ = 1):
        Φβ_i_y = (self.Φ_a_k.dot(β_k)).reshape((-1,self.nby)) 
        maxΦβ_i = Φβ_i_y.max(axis = 1)
        d_i = np.sum(np.exp((Φβ_i_y - maxΦβ_i[:,None] )/σ ), axis = 1)
        μβ_iy = (np.exp((Φβ_i_y - maxΦβ_i[:,None] )/σ ) / d_i[:,None]).flatten()
        return  - ((self.μhat_a - μβ_iy).reshape((1,-1)) @ self.Φ_a_k).flatten()

    βtilde0_k = np.zeros(self.nbk)
    res = optimize.minimize(minus_log_likelihood,method = 'CG',jac = grad_minus_log_likelihood, x0 = βtilde0_k )
    βtilde_k = res['x']
    print(-minus_log_likelihood (βtilde_k))
    return βtilde_k[:-1] / βtilde_k[-1],  1 / βtilde_k[-1], - res['fun']

DiscreteChoicePb.mle_diy = DiscreteChoicePb_mle_diy


Compute using:

In [8]:
βmle_k,Tmle,llmle = travelEx.mle_diy()
print('DIY approach. βmle_k = ',βmle_k,'; Tmle = ',Tmle, ' ; ll=',llmle,'.')

-277.7052141445852
DIY approach. βmle_k =  [0.33826733 0.85179311] ; Tmle =  1.8162760072465682  ; ll= -277.7052141445852 .


### Computation as a Poisson regression using GLM in scikit-learn

As a reminder, this can be computed as 
\begin{align*}
\min_{\beta,u} \left\{ \sum_{iy} \hat{\mu}_{iy} \left(   \Phi\beta - (I_\mathcal{I} \otimes 1_\mathcal{Y}) u \right)  _{iy} - \sum_{iy} \exp\left(  \left(  \Phi\beta - (I_\mathcal{I} \otimes 1_\mathcal{Y}) u \right) _{iy} \right)  \right\}
\end{align*}

which leads to the following call to `scikit-learn`:


In [9]:
def DiscreteChoicePb_mle_glm(self, max_iter = 1000, tol=0.0001):
    poisson = linear_model.PoissonRegressor(alpha = 0, fit_intercept=False,max_iter= max_iter, tol = tol)
    X_a_l = spr.hstack([self.Φ_a_k, -spr.kron(spr.identity(self.nbi), np.ones((self.nby,1)))])
    poisson.fit(X_a_l, self.μhat_a)
    val =  self.μhat_a @ X_a_l @ poisson.coef_ - (np.exp(X_a_l @ poisson.coef_)).sum() + self.nbi
    return poisson.coef_[:self.nbk-1] / poisson.coef_[self.nbk-1],1 / poisson.coef_[self.nbk-1], val

DiscreteChoicePb.mle_glm = DiscreteChoicePb_mle_glm

We verify that both approaches are indeed equivalent:

In [10]:
βmleglm_k,Tmleglm,llmleglm = travelEx.mle_glm(max_iter = 10000, tol = 1e-9)
print('GLM approach. βmle_k = ',βmleglm_k,'; Tmle = ',Tmleglm, ' ; ll=',llmleglm,'.')
print('DIY approach. βmle_k = ',βmle_k,'; Tmle = ',Tmle, ' ; ll=',llmle,'.')

GLM approach. βmle_k =  [0.33826722 0.85179332] ; Tmle =  1.8162751318843122  ; ll= -277.7052141446431 .
DIY approach. βmle_k =  [0.33826733 0.85179311] ; Tmle =  1.8162760072465682  ; ll= -277.7052141445852 .


# Fixed temperature MLE

Back to the logit case. Recall we have

\begin{align*}
l\left(  \tilde{\beta}\right)  =N\left\{  \hat{\mu}^{\intercal}\Phi\tilde{\beta}-\sum_i\log\sum_{y} \exp\left(  \Phi\tilde{\beta}\right)  _{iy}\right\}
\end{align*}

Assume that we restrict ourselves to $\tilde{\beta}[k]>0$. Then we can define $\beta =  \tilde{\beta} / \tilde{\beta}[k]$ and $T=1/ \tilde{\beta}[k]$ so we have $\tilde{\beta}=\beta/T$  and $\beta[k]=1$. Letting $B=\left\{  \beta\in\mathbb{R}^{p},\beta[k]=1\right\}  $, so that $\beta\in B$. We have for $\beta \in B$ and $T>0$

\begin{align*}
l\left(  \beta,T\right)  =\frac{N}{T}\left\{  \hat{\mu}^{\intercal}
\Phi\beta-T\sum_i\log\sum_{y}\exp\left(  \frac{\left(  \Phi\beta\right)  _{iy}}{T}\right)  \right\}
\end{align*}

and we define the *fixed temperature maximum likelihood estimator* by

\begin{align*}
\beta\left(  T\right)  =\arg\max_{\beta \in B}l\left(  \beta,T\right)
\end{align*}

 Note that $\beta\left(  T\right)  =\arg\max_{\beta\in B}Tl\left(\beta,T\right)$ where

\begin{align*}
Tl\left(  \beta,T\right)  =N\left\{  \hat{\mu}^{\intercal}\Phi\beta-T\sum_i\log\sum _{y}\exp\left(  \frac{\left(  \Phi\beta\right)  _{iy}}{T}\right)  \right\}.
\end{align*}



### Normalization and implementation

We denote $\phi[a] = \Phi[k,a]$ and we have 
\begin{align*}
\min_{\beta,u} \left\{ \sum_{iy} \hat{\mu}_{iy} \left(  \frac{\left(  \phi + \Phi\beta - (I_\mathcal{I} \otimes 1_\mathcal{Y}) u \right)  }{T}\right)_{iy} - \sum_{iy} \exp\left(  \frac{\left(\phi+  \Phi\beta - (I_\mathcal{I} \otimes 1_\mathcal{Y}) u \right)  }{T}\right)_{iy}  \right\}
\end{align*}

which amounts to a weighted Poisson regression with weights $\exp(\phi/T)$ and dependent variable $\hat{\mu} \exp(-\phi/T)$.


We have

\begin{align*}
\frac{Tl\left(  \beta,T\right)  }{N}=\hat{\mu}^{\intercal}( \phi+ \Phi\beta)-T\sum_i \log\sum_{y}\exp\left(  \frac{ \phi_{iy} + \left(   \Phi\beta\right)  _{iy}}{T}\right)
\end{align*}

One has 

\begin{align*}
\beta\left(  T\right)  \in\arg\max\left\{  \hat{\mu}^{\intercal}(\phi+ \Phi\beta)- \sum_i u_i\left(  \beta\right)  -T\sum_i \log\sum_{y}\exp\left(  \frac{\left( \phi+ \Phi\beta\right)  _{iy}- u_i\left(  \beta\right)  }{T}\right)  \right\}
\end{align*}


Implement as follows:

In [11]:
def DiscreteChoicePb_mle_fixed_temp_diy (self, T=1):
    def minus_log_likelihood(β_k, *args):
        T = args[0]
        Φβ_i_y = (self.Φ_a_k.dot(np.append(β_k ,1))).reshape((-1,self.nby)) 
        maxΦβ_i = Φβ_i_y.max(axis = 1)
        d_i = np.sum(np.exp((Φβ_i_y -maxΦβ_i[:,None])/T ), axis = 1)
        return - ((Φβ_i_y.flatten()*self.μhat_a).sum() / T  -  (maxΦβ_i / T + np.log(d_i)).sum())

    def grad_minus_log_likelihood(β_k, *args):
        T = args[0]
        Φβ_i_y = (self.Φ_a_k.dot(np.append(β_k ,1))).reshape((-1,self.nby)) 
        maxΦβ_i = Φβ_i_y.max(axis = 1)
        d_i = np.sum(np.exp((Φβ_i_y - maxΦβ_i[:,None] )/T ), axis = 1)
        μβ_iy = (np.exp((Φβ_i_y - maxΦβ_i[:,None] )/T ) / d_i[:,None]).flatten()
        return  - ((self.μhat_a - μβ_iy).reshape((1,-1)) @ self.Φ_a_k).flatten()[:-1]

    β0_k = np.zeros(self.nbk-1)
    res = optimize.minimize(minus_log_likelihood,method = 'CG',jac = grad_minus_log_likelihood, args = (T,), x0 = β0_k )
    β_k = res['x']
    return β_k,T,-res['fun']

DiscreteChoicePb.mle_fixed_temp_diy = DiscreteChoicePb_mle_fixed_temp_diy

In [12]:
β2_k ,_,ll2 = travelEx.mle_fixed_temp_diy(2)
print('DIY approach. β2_k = ',β2_k, ' ; ll=',ll2,'.')

DIY approach. β2_k =  [0.35608773 0.9495755 ]  ; ll= -277.7440071232399 .


As before, we can also compute the problem using `glm` as follows:

In [13]:
def DiscreteChoicePb_mle_fixed_temp_glm(self, T=1,max_iter = 1000, tol=0.0001):
    X_a_l = spr.hstack([self.Φ_a_k[:,:-1], -spr.kron(spr.identity(self.nbi), np.ones((self.nby,1)))])
    poisson = linear_model.PoissonRegressor(alpha = 0, fit_intercept=False,max_iter= max_iter, tol = tol)
    Φ_a = self.Φ_a_k[:,-1]
    poisson.fit(X_a_l, self.μhat_a * np.exp(-Φ_a / T), sample_weight = np.exp(Φ_a / T) )
    β_k = T * poisson.coef_[:(self.nbk-1)]
    val =  self.μhat_a @ (X_a_l @ poisson.coef_ + Φ_a / T)- (np.exp( (X_a_l @ poisson.coef_+ Φ_a / T ) ) ).sum() + self.nbi
    return β_k, T,val

DiscreteChoicePb.mle_fixed_temp_glm = DiscreteChoicePb_mle_fixed_temp_glm

Compare the two:

In [14]:
β2glm_k,_ ,ll2glm = travelEx.mle_fixed_temp_glm(2) 
print('GLM approach. β2_k = ',β2glm_k, ' ; ll=',ll2glm,'.')
print('DIY approach. β2_k = ',β2_k, ' ; ll=',ll2,'.')

GLM approach. β2_k =  [0.3499578  0.94529618]  ; ll= -277.74802551270767 .
DIY approach. β2_k =  [0.35608773 0.9495755 ]  ; ll= -277.7440071232399 .


### Minimax-regret estimation

Let $\beta\left(  0\right)  =\lim_{T\rightarrow0}\beta\left(T\right)  $. Calling $u_i\left(  \beta\right)  =\max_{y\in\mathcal{Y}}\left\{\phi_{iy} + \left(   \Phi\beta\right)  _{iy}\right\}  $, we have

\begin{align*}
\beta\left(  0\right)  \in\arg\max_{\beta}\left\{  \hat{\mu}^{\intercal}(\phi+ \Phi\beta)-\sum_i u_i\left(  \beta\right)  \right\},
\end{align*}

or

\begin{align*}
\beta\left(  0\right)  \in\arg\min_{\beta}\left\{ \sum_i u_i\left(  \beta\right)-\hat{\mu}^{\intercal}(\phi+ \Phi\beta)\right\},
\end{align*}

Note that $Tl\left(  \beta,T\right)  \rightarrow N\left\{  \hat{\mu}^{\intercal}\Phi\beta-\sum_i\max_{y\in\mathcal{Y}}\left\{  \left(  \Phi\beta\right)_{iy}\right\}  \right\}  $ as $T\rightarrow0$. As a result,

\begin{align*}
\beta\left(  0\right)  \in\arg\max\left\{  \hat{\mu}^{\intercal} (\phi+ \Phi\beta)
-\sum_i u_i\left(  \beta\right)  \right\}  .
\end{align*}

Define $R_{i}\left(  \beta,y\right)  =\left(  \phi+ \Phi\beta\right)_{iy}-\left( \phi+ \Phi\beta\right)  _{iy_{i}}$ the regret associated with observation $i$ with respect to $y$. This is equal to the difference between the payoff given by $y$ and the payoff obtained under observation $i$, denoting $y_{i}$ the action taken in observation $i$. The max-regret associated with observation $i$ is therefore

\begin{align*}
\max_{y\in\mathcal{Y}}R_{i}\left(  \beta,y\right)  =\max_{y\in\mathcal{Y}}\left\{  \left(  \phi+ \Phi\beta\right)_{iy}-\left(  \phi+\Phi\beta\right)_{iy_{i}}  \right\}= u_i(\beta) - \sum_y \hat{\mu}_{iy} \left( \phi+ \Phi\beta\right)_{iy} 
\end{align*}

and the max-regret associated with the sample is $\frac{1}{N}\sum_i \max_{y\in\mathcal{Y}}\left\{  R_{i}\left(  \beta,y\right)  \right\}  $, that is $\sum_i u_i(\beta)  - \sum_{iy} \hat{\mu}_{iy} \left( \phi+ \Phi\beta\right)_{iy}$.

This leads to the minimax regret estimator

\begin{align*}
\hat{\beta}^{MMR}=\min_{\beta}\left\{  \sum_i u_i(\beta)  -\hat
{\mu}^{\intercal}(\phi+\Phi\beta)\right\}
\end{align*}

### Linear programming formulation

The minimax regret estimator

\begin{align*}
\hat{\beta}^{MMR}=\min_{\beta}\left\{  \sum_i u_i(\beta)  -\hat
{\mu}^{\intercal}\Phi\beta\right\}
\end{align*}

has a linear programming fomulation

\begin{align*}
&  \min_{u_i,\beta}\sum_i u_i -\hat{\mu}^{\intercal}(\phi+\Phi\beta)\\
s.t.~ &  u_i -\left( \Phi\beta\right)  _{iy}\geq  \phi_{iy} ~\forall i \in \mathcal{I} ~\forall y\in\mathcal{Y}
\end{align*}

dropping the unnecessary term from the objective function, this becomes

\begin{align*}
&  \min_{u_i,\beta}\sum_i u_i -\hat{\mu}^{\intercal}\Phi\beta\\
s.t.~ &  u_i -\left( \Phi\beta\right)  _{iy}\geq  \phi_{iy} ~\forall i \in \mathcal{I} ~\forall y\in\mathcal{Y}
\end{align*}

that is

\begin{align*}
 \min_{u_i,\beta}~ &  1^{\intercal}_\mathcal{I} u- \hat{\mu}^{\intercal}  \Phi\beta \\
s.t.~ &  (I_\mathcal{I} \otimes 1_\mathcal{Y}) u  - \Phi\beta \geq  \phi 
\end{align*}


In [15]:
def DiscreteChoicePb_minimax_regret(self,OutputFlag = False):
    Φ_a_k = self.Φ_a_k[:,:-1]
    φ_a = self.Φ_a_k[:,-1]
    nba,nbk = Φ_a_k.shape
    nbi,nby = self.nbi, self.nby
    μhat_a = self.μhat_a
    
    m = grb.Model()
    m.setParam( 'OutputFlag', OutputFlag )
    grb_β_k = m.addMVar(nbk , lb=-grb.GRB.INFINITY)
    grb_u_i = m.addMVar(nbi , lb=-grb.GRB.INFINITY)
    m.setObjective(np.ones((1,nbi)) @ grb_u_i -  μhat_a.reshape(1,nba) @ Φ_a_k @ grb_β_k  ,  grb.GRB.MINIMIZE)

    m.addConstr( spr.kron(spr.identity(nbi),np.ones((nby,1))) @ grb_u_i - Φ_a_k @ grb_β_k >= φ_a)
    m.optimize()
    if m.status == grb.GRB.Status.OPTIMAL:
        β_k = np.array(m.getAttr('x'))[:nbk]
        return β_k,0
    
DiscreteChoicePb.minimax_regret = DiscreteChoicePb_minimax_regret

In [16]:
travelEx.minimax_regret()

Set parameter Username
Academic license - for non-commercial use only - expires 2024-12-08


(array([0.13810992, 0.03058339]), 0)

### Set-identification

Note that the set of $\theta$ that enter the solution to the problem above is not unique, but is a convex set. Denoting $V$ the value of program, we can look for bounds of $\theta^{\intercal}d$ for a chosen direction $d$ by

\begin{align*}
& \min_{\beta,u}/\max_{\beta,u}   \beta^{\intercal}d\\
s.t.~  &  1^{\intercal}_\mathcal{I} u- \hat{\mu}^{\intercal}  \Phi\beta =V\\
&  (I_\mathcal{I} \otimes 1_\mathcal{Y}) u  - \Phi\beta \geq  \phi 
\end{align*}


# Plotting the whole path

In [17]:
indMax=100
tempMax= 2*Tmle
outcomemat = np.zeros((indMax+1,travelEx.nbk+1))

outcomemat[0,1:-1],_ = travelEx.minimax_regret()
outcomemat[0,-1] = -np.inf
iterMax = indMax+1
for k in range(2,iterMax+1,1):
    thetemp = tempMax * (k-1)/ indMax
    outcomeFixedTemp,_,ll = travelEx.mle_fixed_temp_diy(thetemp)
    outcomemat[k-1,0] = thetemp
    outcomemat[k-1,1:-1] = outcomeFixedTemp 
    outcomemat[k-1,-1] = ll 
df = pd.DataFrame(outcomemat, columns=['T', 'β_1', 'β_2' , 'logLikelihood'])
df.head(8)

Unnamed: 0,T,β_1,β_2,logLikelihood
0,0.0,0.13811,0.030583,-inf
1,0.036326,0.142615,0.027921,-2127.900288
2,0.072651,0.14524,0.034438,-1082.761589
3,0.108977,0.152885,0.047561,-743.861319
4,0.145302,0.162268,0.066417,-581.784325
5,0.181628,0.168505,0.074645,-489.953271
6,0.217953,0.175305,0.087783,-432.647199
7,0.254279,0.181414,0.10084,-394.515489
