# class BIOGEME: examples of use of each function

This page is for programmers who need usage examples for the functions of the class. The examples illustrate the syntax only and do not correspond to any meaningful model. For examples of models, visit [biogeme.epfl.ch](http://biogeme.epfl.ch).

In [1]:
import datetime
print(datetime.datetime.now())

2020-04-28 19:48:27.581143


In [2]:
import biogeme.version as ver
print(ver.getText())

biogeme 3.2.6a [2020-04-28]
Version entirely written in Python
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)



In [3]:
import biogeme.biogeme as bio
import biogeme.database as db
import pandas as pd
import numpy as np
from biogeme.expressions import Beta, Variable, exp

Define the verbosity of Biogeme.

In [4]:
import biogeme.messaging as msg
logger = msg.bioMessage()
logger.setDetailed()

##  Definition of a database

In [5]:
df = pd.DataFrame({'Person':[1,1,1,2,2],
                   'Exclude':[0,0,1,0,1],
                   'Variable1':[1,2,3,4,5],
                   'Variable2':[10,20,30,40,50],
                   'Choice':[1,2,3,1,2],
                   'Av1':[0,1,1,1,1],
                   'Av2':[1,1,1,1,1],
                   'Av3':[0,1,1,1,1]})
myData = db.Database('test',df)

## Definition of various expressions

In [6]:
Variable1=Variable('Variable1')
Variable2=Variable('Variable2')
beta1 = Beta('beta1',-1.0,-3,3,0)
beta2 = Beta('beta2',2.0,-3,10,0)
likelihood = -beta1**2 * Variable1 - exp(beta2*beta1) * Variable2 - beta2**4
simul = beta1 / Variable1 + beta2 / Variable2
dictOfExpressions = {'loglike':likelihood,'beta1':beta1,'simul':simul}

## Creation of the BIOGEME object

In [7]:
myBiogeme = bio.BIOGEME(myData,dictOfExpressions)
myBiogeme.modelName = 'simpleExample'
print(myBiogeme)

[19:48:28] < General >   Remove 6 unused variables from the database as only 2 are used.
[19:48:28] < Detailed >  It is suggested to scale the following variables.
[19:48:28] < Detailed >  Multiply Variable2 by	0.01because the largest (abs) value is	50
[19:48:28] < Detailed >  To remove this feature, set the parametersuggestScales to False when creating theBIOGEME object.
simpleExample: database [test]{'loglike': ((((-(beta1(-1.0) ** `2`)) * Variable1) - (exp((beta2(2.0) * beta1(-1.0))) * Variable2)) - (beta2(2.0) ** `4`)), 'beta1': beta1(-1.0), 'simul': ((beta1(-1.0) / Variable1) + (beta2(2.0) / Variable2))}


Note that, by default, Biogeme removes the unused variables from the database to optimize space.

In [8]:
myBiogeme.database.data.columns

Index(['Person', 'Exclude', 'Variable1', 'Variable2', 'Choice', 'Av1', 'Av2',
       'Av3'],
      dtype='object')

## calculateInitLikelihood

In [9]:
myBiogeme.calculateInitLikelihood()

[19:48:28] < General >   Log likelihood (N = 5):  -115.3003


-115.30029248549191
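As a sanity check, the value above can be reproduced outside Biogeme. The sketch below is not part of the original notebook: it evaluates the `loglike` expression with plain NumPy at the initial values `beta1 = -1` and `beta2 = 2`. The helper name `loglike_per_observation` is an assumption made for illustration.

```python
import numpy as np

# Columns of the toy database that are used by the expressions.
variable1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
variable2 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])


def loglike_per_observation(beta1, beta2):
    """Evaluate the 'loglike' expression for every row (hypothetical helper)."""
    return -beta1**2 * variable1 - np.exp(beta2 * beta1) * variable2 - beta2**4


# Sum the contributions of the five observations.
initial_loglike = loglike_per_observation(-1.0, 2.0).sum()
print(initial_loglike)  # approximately -115.3003
```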

## calculateLikelihood

In [10]:
x = myBiogeme.betaInitValues
xplus = [v+1 for v in x]
print(xplus)

[0.0, 3.0]


In [11]:
myBiogeme.calculateLikelihood(xplus,scaled=True)

[19:48:28] < General >   Log likelihood (N = 5):       -555


-111.0
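The two numbers above are consistent with `scaled=True` returning the log likelihood divided by the number of observations: the log message reports the total (-555) and the call returns -555 / 5 = -111. A minimal NumPy check of that arithmetic, with no Biogeme calls, could look like this:

```python
import numpy as np

variable1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
variable2 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
beta1, beta2 = 0.0, 3.0  # the values stored in xplus

# Contribution of each observation to the log likelihood.
per_obs = -beta1**2 * variable1 - np.exp(beta2 * beta1) * variable2 - beta2**4

total = per_obs.sum()          # unscaled value
scaled = total / len(per_obs)  # value divided by the number of observations
print(total, scaled)
```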

It is possible to calculate the likelihood based only on a sample of the data.

In [12]:
myBiogeme.calculateLikelihood(xplus, scaled=True, batch=0.5)

[19:48:28] < Detailed >  Use 50.0% of the data.
[19:48:28] < General >   Log likelihood (N = 2):       -232


-116.0

In [13]:
myBiogeme.database.data

|   | Person | Exclude | Variable1 | Variable2 | Choice | Av1 | Av2 | Av3 |
|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | 3 | 30 | 3 | 1 | 1 | 1 |
| 3 | 2 | 0 | 4 | 40 | 1 | 1 | 1 | 1 |


In [14]:
myBiogeme.calculateLikelihood(xplus, scaled=True, batch=0.6)

[19:48:28] < Detailed >  Use 60.0% of the data.
[19:48:28] < General >   Log likelihood (N = 3):       -343


-114.33333333333333

In [15]:
myBiogeme.database.data

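|   | Person | Exclude | Variable1 | Variable2 | Choice | Av1 | Av2 | Av3 |
|---|---|---|---|---|---|---|---|---|
| 4 | 2 | 1 | 5 | 50 | 2 | 1 | 1 | 1 |
| 0 | 1 | 0 | 1 | 10 | 1 | 0 | 1 | 0 |
| 3 | 2 | 0 | 4 | 40 | 1 | 1 | 1 | 1 |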


By default, each observation has the same probability of being selected in the sample. The selection probability can instead be made proportional to the values of a column of the database, by setting the attribute `columnForBatchSamplingWeights` to the name of that column.

In [16]:
myBiogeme.columnForBatchSamplingWeights = 'Variable2'
myBiogeme.calculateLikelihood(xplus, scaled=True, batch=0.6)

[19:48:28] < Detailed >  Use 60.0% of the data.
[19:48:28] < General >   Log likelihood (N = 3):       -343


-114.33333333333333

In [17]:
myBiogeme.database.data

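|   | Person | Exclude | Variable1 | Variable2 | Choice | Av1 | Av2 | Av3 |
|---|---|---|---|---|---|---|---|---|
| 3 | 2 | 0 | 4 | 40 | 1 | 1 | 1 | 1 |
| 4 | 2 | 1 | 5 | 50 | 2 | 1 | 1 | 1 |
| 0 | 1 | 0 | 1 | 10 | 1 | 0 | 1 | 0 |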
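The idea of sampling rows with a probability proportional to a column can be sketched with NumPy alone. This is only an illustration of the principle: how Biogeme draws its batch internally is not shown in this page, and the seed and variable names below are assumptions.

```python
import numpy as np

variable2 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
batch = 0.6  # fraction of the data to keep

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

# Selection probabilities proportional to the values of Variable2.
probabilities = variable2 / variable2.sum()
sample_size = round(batch * len(variable2))  # 3 rows out of 5
rows = rng.choice(len(variable2), size=sample_size, replace=False, p=probabilities)

# At xplus = [0, 3], each row contributes -(Variable2 + 81) to the log likelihood.
batch_loglike = -(variable2[rows] + 81.0).sum()
print(rows, batch_loglike, batch_loglike / sample_size)
```

With the rows 0, 3 and 4 shown in the table above, the same formula gives -(10 + 40 + 50 + 3 × 81) = -343, the value reported in the log.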


## calculateLikelihoodAndDerivatives

In [18]:
f,g,h,bhhh = myBiogeme.calculateLikelihoodAndDerivatives(xplus,scaled=True,hessian=True,bhhh=True)
print(f'f={f}')
print(f'g={g}')
print(f'h={h}')
print(f'bhhh={bhhh}')

[19:48:28] < General >   Log likelihood (N = 5):       -555 Gradient norm:      7e+02 Hessian norm:       1e+03 BHHH norm:       1e+05
f=-111.0
g=[ -90. -108.]
h=[[-270.  -30.]
 [ -30. -108.]]
bhhh=[[ 9900.  9720.]
 [ 9720. 11664.]]


Now the unscaled version:

In [19]:
f,g,h,bhhh = myBiogeme.calculateLikelihoodAndDerivatives(xplus,scaled=False,hessian=True,bhhh=True)
print(f'f={f}')
print(f'g={g}')
print(f'h={h}')
print(f'bhhh={bhhh}')

[19:48:28] < General >   Log likelihood (N = 5):       -555 Gradient norm:      7e+02 Hessian norm:       1e+03 BHHH norm:       1e+05
f=-555.0
g=[-450. -540.]
h=[[-1350.  -150.]
 [ -150.  -540.]]
bhhh=[[49500. 48600.]
 [48600. 58320.]]
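The unscaled quantities can be cross-checked by differentiating the `loglike` expression by hand. The sketch below is not Biogeme code: it evaluates the hand-derived first and second derivatives with NumPy at `xplus = [0, 3]`, and builds the BHHH matrix as the sum of outer products of the per-observation gradients.

```python
import numpy as np

variable1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
variable2 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
beta1, beta2 = 0.0, 3.0  # xplus

e = np.exp(beta1 * beta2)

# Per-observation gradient: one row per observation, one column per parameter.
grad_obs = np.column_stack([
    -2.0 * beta1 * variable1 - beta2 * e * variable2,  # d/d beta1
    -beta1 * e * variable2 - 4.0 * beta2**3,           # d/d beta2
])

g = grad_obs.sum(axis=0)      # gradient of the total log likelihood
bhhh = grad_obs.T @ grad_obs  # sum of outer products of the rows

# Hand-derived second derivatives, summed over the observations.
cross = (-(1.0 + beta1 * beta2) * e * variable2).sum()
h = np.array([
    [(-2.0 * variable1 - beta2**2 * e * variable2).sum(), cross],
    [cross, (-beta1**2 * e * variable2 - 12.0 * beta2**2).sum()],
])
print(g)
print(bhhh)
print(h)
```

The gradient and BHHH matrix agree with the output above. For the Hessian, the hand derivation gives -1380 for the (beta1, beta1) entry, which agrees with the value returned by `likelihoodFiniteDifferenceHessian` in this page rather than with the -1350 printed here; the reason for that difference is not established in this page.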


Using only a sample of the data:

In [20]:
f,g,h,bhhh = myBiogeme.calculateLikelihoodAndDerivatives(xplus,scaled=True,batch=0.5,hessian=True,bhhh=True)
print(f'f={f}')
print(f'g={g}')
print(f'h={h}')
print(f'bhhh={bhhh}')

[19:48:28] < Detailed >  Use 50.0% of the data.
[19:48:28] < General >   Log likelihood (N = 2):       -252 Gradient norm:      3e+02 Hessian norm:       8e+02 BHHH norm:       6e+04
f=-126.0
g=[-135. -108.]
h=[[-405.  -45.]
 [ -45. -108.]]
bhhh=[[18450. 14580.]
 [14580. 11664.]]


## likelihoodFiniteDifferenceHessian

In [21]:
myBiogeme.likelihoodFiniteDifferenceHessian(xplus)

[19:48:28] < General >   Log likelihood (N = 5):       -555 Gradient norm:      7e+02  
[19:48:28] < General >   Log likelihood (N = 5):       -555 Gradient norm:      7e+02  
[19:48:28] < General >   Log likelihood (N = 5):  -555.0002 Gradient norm:      7e+02  


array([[-1380.00020229,  -150.        ],
       [ -150.0000451 ,  -540.00005396]])
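A finite-difference Hessian of this kind can be approximated by perturbing each parameter in turn and differencing the gradient. The sketch below uses forward differences on a hand-derived gradient; the step size and the function names are assumptions, and the scheme Biogeme actually uses may differ.

```python
import numpy as np

variable1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
variable2 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])


def gradient(beta):
    """Hand-derived gradient of the total log likelihood (hypothetical helper)."""
    beta1, beta2 = beta
    e = np.exp(beta1 * beta2)
    return np.array([
        (-2.0 * beta1 * variable1 - beta2 * e * variable2).sum(),
        (-beta1 * e * variable2 - 4.0 * beta2**3).sum(),
    ])


def finite_difference_hessian(beta, step=1e-7):
    """Approximate the Hessian column by column with forward differences."""
    beta = np.asarray(beta, dtype=float)
    g0 = gradient(beta)
    hessian = np.empty((beta.size, beta.size))
    for j in range(beta.size):
        shifted = beta.copy()
        shifted[j] += step  # perturb one parameter at a time
        hessian[:, j] = (gradient(shifted) - g0) / step
    return hessian


fd_hessian = finite_difference_hessian([0.0, 3.0])
print(fd_hessian)
```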

## checkDerivatives

In [22]:
f,g,h,gdiff,hdiff = myBiogeme.checkDerivatives(verbose=True)

[19:48:28] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[19:48:28] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[19:48:28] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[19:48:28] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[19:48:28] < Detailed >  x		Gradient	FinDiff		Difference
[19:48:28] < Detailed >  beta1          	-1.060058E+01	-1.060058E+01	-5.427932E-06
[19:48:28] < Detailed >  beta2          	-1.396997E+02	-1.396997E+02	+2.608000E-05
[19:48:28] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[19:48:28] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[19:48:28] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:

In [23]:
print(f'f={f}')
print(f'g={g}')
print(f'h={h}')
print(f'gdiff={gdiff}')
print(f'hdiff={hdiff}')
hdiff

f=-115.30029248549191
g=[ -10.60058497 -139.69970751]
h=[[-111.20116994   20.30029249]
 [  20.30029249 -260.30029249]]
gdiff=[-5.42793187e-06  2.60800035e-05]
hdiff=[[-8.04552172e-06  7.36597983e-09]
 [-1.61387920e-07  2.22928137e-05]]


array([[-8.04552172e-06,  7.36597983e-09],
       [-1.61387920e-07,  2.22928137e-05]])
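The kind of comparison reported by `checkDerivatives` can be imitated by hand: compute the analytical gradient, approximate it with finite differences of the function, and look at the difference. The sketch below does this at the initial values with NumPy only; the step size is a hypothetical choice, so the differences will not match the ones above digit for digit.

```python
import numpy as np

variable1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
variable2 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])


def loglike(beta):
    beta1, beta2 = beta
    return (-beta1**2 * variable1 - np.exp(beta2 * beta1) * variable2 - beta2**4).sum()


def analytical_gradient(beta):
    beta1, beta2 = beta
    e = np.exp(beta1 * beta2)
    return np.array([
        (-2.0 * beta1 * variable1 - beta2 * e * variable2).sum(),
        (-beta1 * e * variable2 - 4.0 * beta2**3).sum(),
    ])


beta0 = np.array([-1.0, 2.0])  # initial values of beta1 and beta2
step = 1e-7                    # hypothetical step size
f0 = loglike(beta0)

# Forward difference of the function along each coordinate direction.
findiff = np.array([
    (loglike(beta0 + step * np.eye(2)[k]) - f0) / step for k in range(2)
])

g = analytical_gradient(beta0)
for name, gk, fk in zip(["beta1", "beta2"], g, findiff):
    print(f"{name}\t{gk:+.6E}\t{fk:+.6E}\t{gk - fk:+.6E}")
```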

## estimate

During estimation, it is possible to save intermediate results, in case the estimation must be interrupted. 

In [24]:
results = myBiogeme.estimate(bootstrap=10,saveIterations=True)

[19:48:28] < General >   Log likelihood (N = 5):  -115.3003
[19:48:28] < Detailed >  ** Optimization: Newton with trust region for simple bounds
[19:48:28] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[19:48:28] < General >   Log likelihood (N = 5):  -69.98205
[19:48:28] < General >   Log likelihood (N = 5):  -69.98205 Gradient norm:      3e+01 Hessian norm:       2e+02 
[19:48:28] < Detailed >  1 f=  13.99641 projected rel. grad.=  0.28 delta=    10 rho=   1.1 ++
[19:48:28] < General >   Log likelihood (N = 5):  -67.07892
[19:48:28] < General >   Log likelihood (N = 5):  -67.07892 Gradient norm:          2 Hessian norm:       2e+02 
[19:48:28] < Detailed >  2 f=  13.41578 projected rel. grad.= 0.021 delta= 1e+02 rho=     1 ++
[19:48:28] < General >   Log likelihood (N = 5):  -67.06549
[19:48:28] < General >   Log likelihood (N = 5):  -67.06549 Gradient norm:      0.007 Hessian norm:       2e+02 
[19:48:28] < Detailed >  3 f=   1

[19:48:28] < General >   Results saved in file simpleExample~ 5.html
[19:48:28] < General >   Results saved in file simpleExample~ 5.pickle


In [25]:
results.getEstimatedParameters()

|   | Value | Std err | t-test | p-value | Rob. Std err | Rob. t-test | Rob. p-value | Bootstrap[10] Std err | Bootstrap t-test | Bootstrap p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| beta1 | -1.273264 | 0.115144 | -11.057997 | 0.0 | 0.013724 | -92.776664 | 0.0 | 0.012981 | -98.086429 | 0.0 |
| beta2 | 1.248769 | 0.08483 | 14.720836 | 0.0 | 0.059086 | 21.134794 | 0.0 | 0.055816 | 22.372835 | 0.0 |
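The t-test columns are the ratio of each estimate to the corresponding standard error. The sketch below recomputes them from the rounded values in the table, using only the standard library. The p-value is computed as the two-sided tail of a standard normal distribution; that Biogeme uses exactly this formula is an assumption here.

```python
import math

# Values copied from the table above (rounded to six decimals).
estimates = {
    "beta1": {"value": -1.273264, "std_err": 0.115144},
    "beta2": {"value": 1.248769, "std_err": 0.08483},
}

t_tests = {}
p_values = {}
for name, row in estimates.items():
    t = row["value"] / row["std_err"]       # t-test against zero
    p = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided p-value, normal approximation
    t_tests[name] = t
    p_values[name] = p
    print(f"{name}: t-test = {t:.3f}, p-value = {p:.3g}")
```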


The saved intermediate values can be restored as follows.

Formula before restoring the saved values:

In [26]:
myBiogeme.loglike

((((-(beta1(-1.0) ** `2`)) * Variable1) - (exp((beta2(2.0) * beta1(-1.0))) * Variable2)) - (beta2(2.0) ** `4`))

Retrieving the values

In [27]:
myBiogeme.loadSavedIteration()
myBiogeme.loglike

[19:48:28] < Detailed >  Parameter values restored from __savedIterations.txt


((((-(beta1(-1.264979774201306) ** `2`)) * Variable1) - (exp((beta2(1.2842631765105266) * beta1(-1.264979774201306))) * Variable2)) - (beta2(1.2842631765105266) ** `4`))

A file name can be given. If the file does not exist, the statement is ignored. 

In [28]:
myBiogeme.loadSavedIteration(filename='fileThatDoesNotExist.txt')



## simulate

In [29]:
# Simulate with the default values for the parameters
simulationWithDefaultBetas = myBiogeme.simulate()
simulationWithDefaultBetas

|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -6.290439 | -1.26498 | -1.136553 |
| 1 | -9.860583 | -1.26498 | -0.568277 |
| 2 | -13.430726 | -1.26498 | -0.378851 |
| 3 | -17.00087 | -1.26498 | -0.284138 |
| 4 | -20.571013 | -1.26498 | -0.227311 |


In [30]:
# Simulate with the estimated values for the parameters
print(results.getBetaValues())
simulationWithEstimatedBetas = myBiogeme.simulate(results.getBetaValues())
simulationWithEstimatedBetas

{'beta1': -1.273263987213694, 'beta2': 1.2487688099301162}


|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -6.092234 | -1.273264 | -1.148387 |
| 1 | -9.752666 | -1.273264 | -0.574194 |
| 2 | -13.413098 | -1.273264 | -0.382796 |
| 3 | -17.07353 | -1.273264 | -0.287097 |
| 4 | -20.733962 | -1.273264 | -0.229677 |
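Simulation evaluates each expression of the dictionary on every row of the database. The `loglike` and `simul` columns above can therefore be reproduced with NumPy from the estimated parameter values. The sketch below is an independent check, not Biogeme code.

```python
import numpy as np

variable1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
variable2 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Estimated values, as printed by results.getBetaValues() above.
betas = {"beta1": -1.273263987213694, "beta2": 1.2487688099301162}

# The 'simul' and 'loglike' expressions, evaluated row by row.
simul = betas["beta1"] / variable1 + betas["beta2"] / variable2
loglike = (
    -betas["beta1"] ** 2 * variable1
    - np.exp(betas["beta2"] * betas["beta1"]) * variable2
    - betas["beta2"] ** 4
)
print(np.round(simul, 6))
print(np.round(loglike, 6))
```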


## confidenceIntervals

In [31]:
drawsFromBetas = results.getBetasForSensitivityAnalysis(myBiogeme.freeBetaNames)
left, right = myBiogeme.confidenceIntervals(drawsFromBetas)
left

|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -6.72424 | -1.292933 | -1.176524 |
| 1 | -10.139517 | -1.292933 | -0.588262 |
| 2 | -13.651433 | -1.292933 | -0.392175 |
| 3 | -17.402913 | -1.292933 | -0.294131 |
| 4 | -21.294566 | -1.292933 | -0.235305 |


In [32]:
right

|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -5.727954 | -1.250048 | -1.115176 |
| 1 | -9.621623 | -1.250048 | -0.557588 |
| 2 | -13.413148 | -1.250048 | -0.371725 |
| 3 | -16.965789 | -1.250048 | -0.278794 |
| 4 | -20.385346 | -1.250048 | -0.223035 |
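The principle behind such intervals can be sketched as follows: simulate the quantity of interest for many draws of the parameters, then take a lower and an upper quantile observation by observation. Everything specific in the sketch is an assumption made for illustration: the draws are independent normals centred on the estimates with the robust standard errors from the results table (real draws would also account for the covariance between parameters), and the 90% interval size is a choice made here.

```python
import numpy as np

variable1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
variable2 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

rng = np.random.default_rng(seed=1)  # arbitrary seed

# Hypothetical draws of (beta1, beta2), one row per draw.
draws = rng.normal(
    loc=[-1.273264, 1.248769],
    scale=[0.013724, 0.059086],
    size=(100, 2),
)

# Simulate 'simul' for every draw: one row per draw, one column per observation.
simulated = draws[:, [0]] / variable1 + draws[:, [1]] / variable2

# 90% interval: 5% and 95% quantiles, taken observation by observation.
left = np.quantile(simulated, 0.05, axis=0)
right = np.quantile(simulated, 0.95, axis=0)
print(left)
print(right)
```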


## validate

Validation splits the data at random into several slices of about the same size.
Each slice is used in turn as a validation dataset. The model is re-estimated using all the data except the slice, and the estimated model is then applied to the validation set (i.e. the slice). The value of the log likelihood for each observation in the validation set is reported in a dataframe. As this is done for each slice, the output is a list of dataframes, one per slice.

In [33]:
logger.setSilent()
validationResults = myBiogeme.validate(results, slices = 2)
validationResults

[   Loglikelihood
 4     -20.571013
 1      -9.860583
 0      -6.290439,
    Loglikelihood
 2     -13.430726
 3     -17.000870]
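The slicing itself can be sketched with NumPy: shuffle the row indices, cut them into slices of about the same size, and for each slice estimate on the remaining rows. The sketch below only builds the index sets and does not estimate anything; the seed and variable names are assumptions, and Biogeme's own splitting may differ.

```python
import numpy as np

n_observations = 5
n_slices = 2

rng = np.random.default_rng(seed=2)  # arbitrary seed
shuffled = rng.permutation(n_observations)

# Slices of about the same size: here 3 rows and 2 rows.
slices = np.array_split(shuffled, n_slices)

for validation_rows in slices:
    # Estimation uses every row that is not in the validation slice.
    estimation_rows = np.setdiff1d(shuffled, validation_rows)
    print("validate on", validation_rows, "estimate on", estimation_rows)
```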