<a href="https://colab.research.google.com/github/pmontman/tmp_choicemodels/blob/main/nb/tutorials/solutions/WK_04_sol_mnl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial 4: Multinomial logit using the python biogeme package

We will learn to specify multinomial logit models using the biogeme package.

"*Biogeme is a open source Python package designed for the maximum likelihood estimation of parametric models in general, with a special emphasis on discrete choice models.*"

# Preparing the environment

Google colab environment does not have biogeme installed by default,
so we need to install in every session. Hopefully, it will take less than one minute. Once installed it will be valid until the session expires (or we reset it).

In [32]:
!pip install biogeme

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


First load the packages typical packages, biogeme and the common one for python data analysis.

In [33]:
import pandas  as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

import biogeme.database as db
import biogeme.biogeme as bio
import biogeme.models as models
import biogeme.expressions as exp


# The dataset

We will use the example from the biogeme package, Swissmetro dataset:

"*This dataset consists of survey data collected on the trains between St. Gallen and Geneva, Switzerland, during March 1998. The respondents provided information in order to analyze the impact of
the modal innovation in transportation (a new mode of transport), represented by the Swissmetro, a revolutionary mag-lev underground system, against the usual transport modes represented by car and train.*"

---
---

#Loading the dataset

Biogeme can interact with the popular pandas package, so we can load the dataset in pandas first and then pass it to biogeme.

This specific dataset uses the format 'tab separated values' instead of the more common 'comma separated values'. We can specify this non-standard separator by the argument `sep` in `pandas.read_csv`.



In [34]:
swissmetro = pd.read_csv('http://transp-or.epfl.ch/data/swissmetro.dat', sep='\t')

We will take a look at it, using the head() method of to display the first few rows.

There is a detailed description of the dataset 
variables [here](http://transp-or.epfl.ch/documents/technicalReports/CS_SwissmetroDescription.pdf). **You will need to take a look at them later
to understand the data and create your own models.**

In [35]:
swissmetro.head(5)

Unnamed: 0,GROUP,SURVEY,SP,ID,PURPOSE,FIRST,TICKET,WHO,LUGGAGE,AGE,...,TRAIN_TT,TRAIN_CO,TRAIN_HE,SM_TT,SM_CO,SM_HE,SM_SEATS,CAR_TT,CAR_CO,CHOICE
0,2,0,1,1,1,0,1,1,0,3,...,112,48,120,63,52,20,0,117,65,2
1,2,0,1,1,1,0,1,1,0,3,...,103,48,30,60,49,10,0,117,84,2
2,2,0,1,1,1,0,1,1,0,3,...,130,48,60,67,58,30,0,117,52,2
3,2,0,1,1,1,0,1,1,0,3,...,103,40,30,63,52,20,0,72,52,2
4,2,0,1,1,1,0,1,1,0,3,...,130,36,60,63,42,20,0,90,84,2


We see some socioeconomic characteristics such as 'AGE', likely encoding by age groups, 'MALE' which we can assume refers to gender. There are also some attributes of the alternatives, such as 'CAR_TT' which would be travel time in car, 'TRAIN_CO' which refers to the cost of the fare by train.

An important variable is 'CHOICE' it is the result of the choice for each individual, coded as 1 for train, 2 for swissmetro, 3 for car. The value 0 indicates invalid response.

---
---

# Passing the dataset to biogeme

For now, we have loaded the dataset into a pandas dataframe, we need to transform it into a Biogeme database, the format that biogeme understands.
We pass first argument the name of the database that we want to give and second the pandas dataframe.

In [36]:
bgm_swissmetro = db.Database('swissmetro', swissmetro)

We can access the dictionary of variables in the biogeme database the following way:

In [37]:
bgm_swissmetro.variables['CHOICE']

CHOICE

Because this is too verbose, we can load them into the global variables
of the python environment, to make the symbolic manipulation less verbose, so we can refer to them just by writing `VARIABLE_NAME` instead of having to write `database.variables['VARIABLE_NAME']`. **Note that the use of global variables is discouraged in general, we use it here for the sake of simplicity, but if we are dealing with complex code, it can lead to confusion. For example when using several biogeme databases at the same time, that share variable names**

The following line of code adds the name of the variables in the biogeme to the global environment in python. We check that now, for example `CHOICE` is understood by python as a variable in the biogeme database.

In [38]:
globals().update(bgm_swissmetro.variables)
CHOICE

CHOICE

Before we begin, we will need to clean the dataset a bit, as suggested by the creators of the dataset.

In this case, we have people that did not respond to the survey, and their value assigned
to the choice is 0. The only valid values for the choice are 1,2,3 indicating
the alternatives, train, swissmetro and car.
We can remove them from the biogeme database using the `remove` method with the logical indicator for the row that have choice 0.

**It is recommended that all database manipulations/cleaning are applied directly on the pandas dataframe before passing it to biogeme.** The reason being that pandas is better designed for that purpose and makes the code more readable. Ideally we would like out interactions with biogeme to be minimized and do as much as possible with the standard frameworks such as pandas.  

In [39]:
bgm_swissmetro.remove( (CHOICE == 0) )


In [40]:
#If using pandas, we would call, for example:
#swissmetro[swissmetro.CHOICE !=0]
#BEFORE creating the biogeme database
#bgm_swissmetro = db.Database('swissmetro', swissmetro)

---
---

# Creating the model

What we usually need to define in the multinomial logit can be summarized as:
 * Which variables in the database are we going to include in the model linear model.
 * What is the variable in the database that specifies the choice made, the alternative selected by an individual. The 'target variable' or dependent variable.
 * What variables are used in the modelling of each alternative. Remember that we can define a utility function for each alternative, this means that different alternatives could use different variables.

We can connect this back to the utily theory view, we want to specify the functions that produce the observed component of the utility, the $V_{nj}$

For each alternative $j$ and observation $n$, we consider the vector $x_{nj}$
to be the joint vector of for both attributes and characteristics (to simplify things).  We try to find the vector of coefficients $\beta_j$ for each alternative. In other words, we try to find the linear relationship between the variables and utility for each alternative:
  $$V_{nj} = \beta_j x_n$$ 

Technical things to consider:

* Consider that some attributes or characteristics are not relevant for some alternatives: This would be equivalent to fixing some of the values for $\beta_j$ to 0 and not fitting them to data.
* Consider that some attributes or characteristics 'share' the value of the coefficient. For example, we can specify that age affects the same way to all alternatives.


---
---

# The alternative specific constants:
Just as in linear models we have the intercept, in choice models we have alternative specific constants. An important difference is that we cannot determine their 'true' value, because we have seen in the previous tutorial that the absolute level of utilities cannot be recovered, only relative differences among alternatives. In practice, what we will do is assume that the attribute specific characteristic of one of the alternatives is set to 0 and we do not fit it to the data. This will set a reference point.

Again, which alternative we use as reference and what value for the ASC is arbitrary, up to use. We often choose wich alternative is set its ASC to 0 for interpretability (e.g. positive numbers in other ASCs will mean more utilty than the reference, negative numbers less utility than the reference, **all other things being equal**).


---
---

# Definition of the model in biogeme
We define the parameters of the model through the function `exp.Beta`.
The function `exp.Beta` takes 5 arguments:
1. the name of the parameter.
2. the default value. We can use 0 for the default values unless we know a better starting value, for example when we have prior information.
3. The lower bound, if we want to restrict it to a range, `None` if we do not want to restrict the value of the parameter. For example, sometimes we might know or would like to for a parameter to be positive.
4. The upper bound, if we want to restrict it to a range, `None` if we do not want to restrict the value of the parameter.
5. A 0/1 argument, 0 if the parameter must be estimated and 1 if it remains fixed.

We will define a simple model with just the three ASCs and two parameters for two variables of interest, time and cost.
Note that one of the ASCs, `ASC_SM` is set to not be estimated, notice the 1 in the value of the last argument when it is created. This comes from the explanation above, we set one of the ASCs arbitrarily to 0 because we utility cannot be recovered up to changes in constants, so we will pick among the many possible solutions the one that makes the ASC for the SwissMetro alternative 0.


In [41]:
ASC_CAR = exp.Beta ( 'ASC_CAR' ,0, None , None ,0)
ASC_TRAIN = exp.Beta ( 'ASC_TRAIN' ,0, None , None ,0)
ASC_SM = exp.Beta ( 'ASC_SM' ,0, None , None ,1)
B_TIME = exp.Beta ( 'B_TIME' ,0, None , None ,0)
B_COST = exp.Beta ( 'B_COST' ,0, None , None ,0)

We will create an artificial variable, ussing the luggage variable but squared.
This variable will only be included as a parameter for the utility of the car alternative. This is a totally arbitrary variable for the purposes of exposition, it does not mean that it is a good one.

In [42]:
B_LUGGA_SQ = exp.Beta( 'B_LUGGA_SQ', 0, None, None, 0)
LUGGA_SQ = LUGGAGE**2

A warning from the creator of the biogeme package:
when we define the parameters of our model, we store them into python variables.
The author strongly recomments using the same name for the python variable
than for the parameter.
For example, while we could have define the variable following the code in the next cell, it is not recommended. I imagine that this could cause some confusion later on.



In [43]:
#doing this is not recommended!
car_cte = exp.Beta( 'ASC_CAR' ,0, None , None ,0)

We now define the utility functions for each alternative, more speficifically the linear relationship between the variables and the observed component of the utility. The are the $V_j$ in our model.

In [44]:
V1 = ASC_TRAIN + B_TIME * TRAIN_TT + B_COST * TRAIN_CO 

V2 = ASC_SM + B_TIME * SM_TT + B_COST * SM_CO 

V3 = ASC_CAR + B_TIME * CAR_TT + B_COST * CAR_CO + B_LUGGA_SQ*LUGGA_SQ


We have to create a dictionary that maps the utility functions to the numbers that identify the alternatives in the database.
In this case 1 for Train, 2 for Swissmetro, 3 for car.

In [45]:
V = {1: V1 ,
2: V2 ,
3: V3 }

We have to pass availabilities, these are the indicator variables signaling if the option is available for that individual. Remember that the multinomial does not need to have all alternatives present for all individuals, it can recover the model from data even if the full choice set is not available for all individuals. In most normal situations, availabilities can be set to one, becuase individuals observe all alternatives. **An example when this does npt happen can be a survey with many alternatives, and not all are presented to the individual at the same time.**

In [46]:
av = {1: TRAIN_AV,
2: SM_AV,
3: CAR_AV }

This is the definition of the model, in this case the multinomial logit (we will use other models later).

In [47]:
logprob = models.loglogit (V , av , CHOICE )

And finally we pack everything together in the biogeme object.

In [48]:
bgm_model = bio.BIOGEME ( bgm_swissmetro, logprob )

We can give a name to the model, this can help identifying the model when we come back to it later, for example when we save it to a file and want to use it in another report.

In [49]:
bgm_model.modelName = 'my first multinomial logit'

# Estimation of the model

Everythin is set, biogeme will kindly do maximum likelihood estimation for us.

In [50]:
results = bgm_model.estimate()

# Results of the model

We can check a basic summary of the estimated model, likelihoods, information
criterions, etc.

In [51]:
results.getGeneralStatistics()

{'Number of estimated parameters': GeneralStatistic(value=5, format=''),
 'Sample size': GeneralStatistic(value=10719, format=''),
 'Excluded observations': GeneralStatistic(value=9, format=''),
 'Init log likelihood': GeneralStatistic(value=-11434.723321665173, format='.7g'),
 'Final log likelihood': GeneralStatistic(value=-8901.037009384283, format='.7g'),
 'Likelihood ratio test for the init. model': GeneralStatistic(value=5067.37262456178, format='.7g'),
 'Rho-square for the init. model': GeneralStatistic(value=0.22157827880980363, format='.3g'),
 'Rho-square-bar for the init. model': GeneralStatistic(value=0.22114101418526078, format='.3g'),
 'Akaike Information Criterion': GeneralStatistic(value=17812.074018768566, format='.7g'),
 'Bayesian Information Criterion': GeneralStatistic(value=17848.472884502025, format='.7g'),
 'Final gradient norm': GeneralStatistic(value=0.060229431283387626, format='.4E'),
 'Nbr of threads': GeneralStatistic(value=2, format='')}

The value of the parameters and the p-values for their statistical significance.
**Note that ASC_SM is not shown, it was fixed to 0 by us.**

In [52]:
results.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_CAR,0.288146,0.042592,6.765302,1.330314e-11
ASC_TRAIN,-0.537436,0.053722,-10.004045,0.0
B_COST,0.000158,1.7e-05,9.170692,0.0
B_LUGGA_SQ,-0.107936,0.025667,-4.205187,2.608666e-05
B_TIME,-0.012348,0.00063,-19.593269,0.0


We can recover the values for the parameters in a dictionary.

In [53]:
results.getBetaValues()

{'ASC_CAR': 0.2881462968681277,
 'ASC_TRAIN': -0.5374357600867589,
 'B_COST': 0.00015829819061999233,
 'B_LUGGA_SQ': -0.10793620108633975,
 'B_TIME': -0.012348360228362132}

# Exercises




## 1)Do you notice something unexepected in the sign of the cost parameter? Perhaps it has to do with another variable, `GA` the seasonal pass for public transport, a indicator variable of whether the individual has bought the seasonal pass or not. Create new variables for the cost of train and swissmetro that make the cost 0 if the user has bought the seasonal pass and keep the cost intact otherwise. *Please remove the 'luggage squared' variable.* Then fit the same model using those variables for their corresponding alternatives and check the results. 

The way to do this is to create an indicator variable for 'did not buy a seasonal pass', a varible that takes the value 1 if the used did **not** buy the seasonal pass, 0 otherwise. We can do that several ways, one is by taking the condition 'equals 0' on the original GA, `GA == 0`. Boolean values False/True are transformed into 0/1 integers when computed. Another option, looking at the definition could be transform GA in (1-GA). The second step is to multiply the value of the original cost variables by this new indicator.

In the next cell we will also create an auxiliary function to avoid repeating the code that creates and estimates the biogeme model. We will reuse it in other exercises. This auxiliary function takes the utility functions as input and outputs the results of the biogeme estimation.

Finally, we will print the estimated parameters.

In [54]:
TRAIN_CO_GA = TRAIN_CO * (GA ==0 )
SM_CO_GA = TRAIN_CO * (GA == 0)

V1 = ASC_TRAIN + B_TIME * TRAIN_TT + B_COST * TRAIN_CO_GA 
V2 = ASC_SM + B_TIME * SM_TT + B_COST * SM_CO_GA 
V3 = ASC_CAR + B_TIME * CAR_TT + B_COST * CAR_CO # removing lugga+ B_LUGGA_SQ*LUGGA_SQ

def estimate_swissmetro_from_V(V1, V2, V3):
 V = {1: V1 ,
  2: V2 ,
  3: V3 }

 av = {1: TRAIN_AV,
  2: SM_AV,
  3: CAR_AV }

 logprob = models.loglogit (V , av , CHOICE )
 bgm_model = bio.BIOGEME ( bgm_swissmetro, logprob )
 bgm_model.modelName = 'my first multinomial logit'
 results = bgm_model.estimate()
 return results

resGA = estimate_swissmetro_from_V(V1, V2, V3)
resGA.getEstimatedParameters()



Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_CAR,0.121867,0.037615,3.239882,0.001196
ASC_TRAIN,-0.581335,0.054404,-10.685587,0.0
B_COST,-0.00897,0.000579,-15.479236,0.0
B_TIME,-0.012065,0.000641,-18.812258,0.0



##2) Create a new model by *including two new variables*, you can use the ones existing or create new ones. Then estimate the model and compare it to the previous one. How can we compare them? Which one is better?


We add the variables 
'MALE', the indicator of gender, and 'INCOME' a varible that approximates the income of the individual into 5 different sections. We will 'clean' the 'INCOME' variable by assigning the 'unknown' income section into the lower section. We can do this by creating an indicator variable for the unkown incomes and the subtracting it to the actual 'INCOME' multiplied by 4, this makes the rows with 'INCOME 4' into 'INCOME 0'. Values of INCOME 0 are turned into 1 to remove the ambiguety in the definition of the variable (0 and 1 imply the same section of income in the description of the dataset). *These variables are arbitrarily chosen, you could have chosen other variables, or other transformations of the same variables. One thing that is important, however, is to handle invalid values in some way. In this case the invalid values for income are assigned to the lower section of income ranges. Leaving the variable completely unchanged would be a small error.*

We will begin with an example of doing it **wrong, there will be a problem with the model.** 
What is wrong with the following modelling is that we will be adding the socioeconomical variables using one parameter for all alternatives, similar to cost and travel time. *The problem here is that socieconomical variables have the same value for all alternatives, so modelling like this does not help with distinguishing the utilities, it will add the same amout of utility to all alternatives!.*
We can see that the coefficients have values of 0.

In [55]:
#BAD example, socieconomic variables are shared among alternatives
B_MALE = exp.Beta( 'B_MALE', 0, None, None, 0)
B_INCOME = exp.Beta( 'B_INCOME', 0, None, None, 0)
INCOME_FIX = INCOME - 4*(INCOME==4)
INCOME_FIX = INCOME_FIX + (INCOME==0)


tV1 = ASC_TRAIN + B_TIME * TRAIN_TT + B_COST * TRAIN_CO_GA + B_MALE*MALE + B_INCOME*INCOME_FIX
tV2 = ASC_SM + B_TIME * SM_TT + B_COST * SM_CO_GA + B_MALE*MALE + B_INCOME*INCOME_FIX 
tV3 = ASC_CAR + B_TIME * CAR_TT + B_COST * CAR_CO + B_MALE*MALE + B_INCOME*INCOME_FIX

resTwoNewVar = estimate_swissmetro_from_V(tV1, tV2, tV3)
resTwoNewVar.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_CAR,0.121867,0.03761456,3.239882,0.001196
ASC_TRAIN,-0.581335,0.05440365,-10.685587,0.0
B_COST,-0.00897,0.0005794696,-15.479236,0.0
B_INCOME,0.0,5.730742e-20,0.0,1.0
B_MALE,0.0,1.797693e+308,0.0,1.0
B_TIME,-0.012065,0.000641326,-18.812258,0.0


We need to add one *parameter per alternative*, to signify that these variables have different influence depending on the alternative.

In [56]:
#one example, per alternative socieocon: 
B_MALE_TR = exp.Beta( 'B_MALE_TR', 0, None, None, 0)
B_MALE_SM = exp.Beta( 'B_MALE_SM', 0, None, None, 0)
B_MALE_CAR = exp.Beta( 'B_MALE_CAR', 0, None, None, 0)
B_INCOME_TR = exp.Beta( 'B_INCOME_TR', 0, None, None, 0)
B_INCOME_SM = exp.Beta( 'B_INCOME_SM', 0, None, None, 0)
B_INCOME_CAR = exp.Beta( 'B_INCOME_CAR', 0, None, None, 0)

tV1 = ASC_TRAIN + B_TIME * TRAIN_TT + B_COST * TRAIN_CO_GA + B_MALE_TR*MALE + B_INCOME_TR*INCOME_FIX
tV2 = ASC_SM + B_TIME * SM_TT + B_COST * SM_CO_GA + B_MALE_SM*MALE + B_INCOME_SM*INCOME_FIX
tV3 = ASC_CAR + B_TIME * CAR_TT + B_COST * CAR_CO + B_MALE_CAR*MALE+ B_INCOME_CAR*INCOME_FIX

resTwoNewVar = estimate_swissmetro_from_V(tV1, tV2, tV3)
resTwoNewVar.getEstimatedParameters()


Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_CAR,-0.040901,0.076764,-0.532808,0.5941663
ASC_TRAIN,0.557575,0.073494,7.586677,3.28626e-14
B_COST,-0.008979,0.000565,-15.904848,0.0
B_INCOME_CAR,0.13601,0.019583,6.945128,3.781198e-12
B_INCOME_SM,0.184551,0.016511,11.177324,0.0
B_INCOME_TR,-0.320561,0.021662,-14.798151,0.0
B_MALE_CAR,0.335318,0.044597,7.518932,5.528911e-14
B_MALE_SM,0.059968,0.036171,1.657906,0.09733635
B_MALE_TR,-0.395286,0.043449,-9.097735,0.0
B_TIME,-0.011541,0.000644,-17.926625,0.0


Note the coefficients look much better! We can look at the interpretation.
Incomne has a positive relationship with car and Swissmetro, meaning that the larger the income, the more likely it makes individuals to go for these two alternatives. The coefficient of income for train is negative, the larger the income, the less likel to go or car. The effect of income is larger for swissmetro than for car.

The influence of gender: Being male makes it more likely to choose car, and less likely to go for train. The effect of swismetro is positive, but small.

**Important: We could have gotten all positive coefficients for the 'is male' variable, and still get the same conclusion. Why is that?** *We go back to the notion  of choice probabilities not being affected by adding a constant to the  observed utilities of each alternative. For example, if being male adds $2, 1, -1$ respectively to car, swissmetro and train, adding $4, 3, 1$ can produce the same results in terms of choice probabilities. It will depend on the other variables in the model, particular initial/reference values, etc.* **This is important if we get all positive values and interpret it as 'being male makes taking car, swissmetro and train more likely', it is clearly nonsense in our scenario, because you cannot make all alternatives more likely. So we have to always take into account one of the alternatives as reference.**


---

The second part of the question is to compare to the previous model, the one that used only travel cost and travel time.

To do a **non rigorous** comparison of which model does better we can use the value of the likelihood, larger 'Final log likelihood' implies a better model.
Another comparison is the prediction accuracy. 

In [57]:
resTwoNewVar.getGeneralStatistics()

{'Number of estimated parameters': GeneralStatistic(value=10, format=''),
 'Sample size': GeneralStatistic(value=10719, format=''),
 'Excluded observations': GeneralStatistic(value=9, format=''),
 'Init log likelihood': GeneralStatistic(value=-8687.476203942184, format='.7g'),
 'Final log likelihood': GeneralStatistic(value=-8462.046217507232, format='.7g'),
 'Likelihood ratio test for the init. model': GeneralStatistic(value=450.8599728699046, format='.7g'),
 'Rho-square for the init. model': GeneralStatistic(value=0.02594884649383644, format='.3g'),
 'Rho-square-bar for the init. model': GeneralStatistic(value=0.024797764204199435, format='.3g'),
 'Akaike Information Criterion': GeneralStatistic(value=16944.092435014463, format='.7g'),
 'Bayesian Information Criterion': GeneralStatistic(value=17016.89016648138, format='.7g'),
 'Final gradient norm': GeneralStatistic(value=0.04666203481406509, format='.4E'),
 'Nbr of threads': GeneralStatistic(value=2, format='')}

In [58]:
resGA.getGeneralStatistics()

{'Number of estimated parameters': GeneralStatistic(value=4, format=''),
 'Sample size': GeneralStatistic(value=10719, format=''),
 'Excluded observations': GeneralStatistic(value=9, format=''),
 'Init log likelihood': GeneralStatistic(value=-8976.716285827873, format='.7g'),
 'Final log likelihood': GeneralStatistic(value=-8687.476203942184, format='.7g'),
 'Likelihood ratio test for the init. model': GeneralStatistic(value=578.480163771379, format='.7g'),
 'Rho-square for the init. model': GeneralStatistic(value=0.03222114553651778, format='.3g'),
 'Rho-square-bar for the init. model': GeneralStatistic(value=0.03177554829665463, format='.3g'),
 'Akaike Information Criterion': GeneralStatistic(value=17382.952407884368, format='.7g'),
 'Bayesian Information Criterion': GeneralStatistic(value=17412.071500471135, format='.7g'),
 'Final gradient norm': GeneralStatistic(value=0.04606492046595142, format='.4E'),
 'Nbr of threads': GeneralStatistic(value=2, format='')}

Because these models are **nested** (the second model includes all variables of the first), we can make the comparison more rigorous by doing a statistical test. The common test is the likelihood ratio test, the null hypothesis is that the more complex model is not significantly better than the reference model.
We test if the likelihood of the complex model is larger enough than the likelihood of the reference model. By construction a model that is more complex and includes all the variables of the reference will always have the same or better likelihood of the reference, because it just has more parameters to fit the same data. We check if it is sufficiently better by considering the number of extra parameters that the more complex model is using.
The test statistics is $2(\text{Loglikelihood}_\text{complex model} - \text{Loglikelihood}_\text{reference model})$ and it should be distributed as a Chi-squared distribution with the degrees of freedom equal to the difference in the number of parameters.

Below the code to do the tests, in two ways
 * Looking at the results of getGeneralStatistics() and manually calculating the p-value from the Chi-squared

 * Using the implementation of the likelihood ratio test on the biogeme package.
   We can access the values programmatically in the element .data of the biogeme results data structure. We have to pass loglikelihoods and number of parameters of the models to be compared, and then the significance level.

In [59]:
from scipy import stats
import biogeme.tools as tools

#manually using the chi-square distribution and copy-pasting the results from getGeneralStatistics()
1 - stats.chi2.cdf( 2*(-8462 - -8687), 6)

#using the implementation provided by biogeme, in the package biogeme.tools
#we take the values from the results object
tools.likelihood_ratio_test( (resTwoNewVar.data.logLike, resTwoNewVar.data.nparam),
                            (resGA.data.logLike, resGA.data.nparam), 0.05)

LRTuple(message='H0 can be rejected at level 5.0%', statistic=450.8599728699046, threshold=12.591587243743977)

An it seems that the second model is clearly better.

**One caveat of hypothesis tests:** To really trust the results of the test, we should have established the two models to be compared **before seeing the data**. If we have the dataset and try and try different models until we get one that is significantly better than the reference, we are falling into the problem of multiple hypothesis testing. This is a huge problem in practice! Can lead to spurious results, 'findings' than then will not translate to practice.

Another way is to use a holdout set, part of the data that is not used for fitting the model, only to check accuracies and likelihoods. This is the preferred approach (when available), but we must also be careful not to use it to find a good model by trial an error. Ideally should be used to compare models only once. When using a holdout set, the number of parameters can be ignored for the purposes of comparing likelihoods or accuracies. In summary, using holdout sets follows the 'scientific method', first generate the hypothsis using some data (the training data used to fit the models), then test the hypothesis using new data (the holdout). 

**Final observation: When should we use shared or per-alternative parameters for the attributes of the alternatives?**
As a general guideline, if we have enough data, it is better to use per-alternative parameters. If we require a very simple interpretation, we might simplify and use shared parameters.


##3) Create a model that calculates a specific coefficient for the price and travel time of each alternative. Check the model and calculate willingness to pay for the travel time of each alternative (*you can take a look at the definition in lecture 2)*.

We should know how to calcule per-alternative parameters by now.

In [60]:
B_TT_TR = exp.Beta( 'B_TT_TR', 0, None, None, 0)
B_TT_SM = exp.Beta( 'B_TT_SM', 0, None, None, 0)
B_TT_CAR = exp.Beta( 'B_TT_CAR', 0, None, None, 0)
B_COST_TR = exp.Beta( 'B_COST_TR', 0, None, None, 0)
B_COST_SM = exp.Beta( 'B_COST_SM', 0, None, None, 0)
B_COST_CAR = exp.Beta( 'B_COST_CAR', 0, None, None, 0)

wV1 = ASC_TRAIN + B_TT_TR * TRAIN_TT + B_COST_TR * TRAIN_CO_GA 
wV2 = ASC_SM + B_TT_SM * SM_TT + B_COST_SM * SM_CO_GA 
wV3 = ASC_CAR + B_TT_CAR * CAR_TT + B_COST_CAR * CAR_CO



reswtp = estimate_swissmetro_from_V(wV1, wV2, wV3)
reswtp.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
ASC_CAR,-0.542838,0.071853,-7.554822,4.196643e-14
ASC_TRAIN,0.009602,0.085123,0.112804,0.9101856
B_COST_CAR,-0.006144,0.000963,-6.378496,1.788354e-10
B_COST_SM,-0.008572,0.000591,-14.516322,0.0
B_COST_TR,-0.017624,0.001297,-13.587535,0.0
B_TT_CAR,-0.010222,0.000931,-10.979893,0.0
B_TT_SM,-0.013987,0.000998,-14.007978,0.0
B_TT_TR,-0.013847,0.000898,-15.412997,0.0


The willingness to pay, in linear utility (what we have now) can be summarized as the coefficient of the variable of interest for that alternative divided by the cost coefficient for that alternative.

We can compute them manually, here the tuple for WTP of car, swissmetro and train.

In [61]:
(-0.010222 / -0.006145,

-0.013987 / -0.008572,
 
 -0.013987 / -0.017623)


(1.66346623270952, 1.631707886140924, 0.7936787153152131)

The willingness to pay to reduce he travel time in car is he highest and train is the lowest. The inepretation is that indivuals are willing to pay 1.7 francs to reduce their travel time in one minute, it is the most 'annoying' of the methods of transport. I find surprising that the swissmetro, given that is is a 'luxury' fast train, is more annoying than the train. Of course the data comes from a survey so the individuals have not actually experienced the fast train. 

We can calculate them programmatrically by recovering the dictionary of the parameters.

In [62]:

Betas = reswtp.getBetaValues()
(Betas['B_TT_CAR'] / Betas['B_COST_CAR'],
 Betas['B_TT_SM'] / Betas['B_COST_SM'],
 Betas['B_TT_TR'] / Betas['B_COST_TR'])

(1.663587712938376, 1.6316073735990617, 0.7856905146678734)

Remember that there is a more general definition of willingess to pay, the ratio of the derivatives of utility w.r.t to the variable of interest and cost.

A final note, the WTP will depend on the specific model that we use, so it can vary wildly.