# Module biogeme.biogeme 

## Examples of use of each function

This page is for programmers who need usage examples for the functions of the module. The examples only illustrate the syntax; they do not correspond to any meaningful model. For examples of models, visit [biogeme.epfl.ch](http://biogeme.epfl.ch).

In [1]:
import datetime
print(datetime.datetime.now())

2022-08-09 11:12:23.423610


In [2]:
import biogeme.version as ver
print(ver.getText())

biogeme 3.2.9b [2022-08-09]
Version entirely written in Python
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)



In [3]:
import biogeme.biogeme as bio
import biogeme.database as db
import pandas as pd
import numpy as np
from biogeme.expressions import Beta, Variable, exp

Set the verbosity of Biogeme.

In [4]:
import biogeme.messaging as msg
logger = msg.bioMessage()
logger.setDetailed()

## Definition of a database

In [5]:
df = pd.DataFrame({'Person':[1,1,1,2,2],
                   'Exclude':[0,0,1,0,1],
                   'Variable1':[1,2,3,4,5],
                   'Variable2':[10,20,30,40,50],
                   'Choice':[1,2,3,1,2],
                   'Av1':[0,1,1,1,1],
                   'Av2':[1,1,1,1,1],
                   'Av3':[0,1,1,1,1]})
myData = db.Database('test', df)

## Definition of various expressions

In [6]:
Variable1=Variable('Variable1')
Variable2=Variable('Variable2')
beta1 = Beta('beta1', -1.0, -3, 3, 0)
beta2 = Beta('beta2', 2.0, -3, 10, 0)
likelihood = -beta1**2 * Variable1 - exp(beta2 * beta1) \
    * Variable2 - beta2**4
simul = beta1 / Variable1 + beta2 / Variable2
dictOfExpressions = {'loglike': likelihood, 
                     'beta1': beta1,
                     'simul': simul}

## Creation of the BIOGEME object

In [7]:
myBiogeme = bio.BIOGEME(myData, dictOfExpressions)
myBiogeme.modelName = 'simpleExample'
print(myBiogeme)

[11:12:24] < Detailed >  It is suggested to scale the following variables.
[11:12:24] < Detailed >  Multiply Variable2 by	0.01 because the largest (abs) value is	50
[11:12:24] < Detailed >  To remove this feature, set the parameter suggestScales to False when creating the BIOGEME object.
simpleExample: database [test]{'loglike': ((((-(beta1(init=-1.0)[elid:0 id:0] ** `2.0`)) * Variable1 [elid:4 id:2]) - (exp((beta2(init=2.0)[elid:1 id:1] * beta1(init=-1.0)[elid:0 id:0])) * Variable2 [elid:5 id:3])) - (beta2(init=2.0)[elid:1 id:1] ** `4.0`)), 'beta1': beta1(init=-1.0)[elid:0 id:0], 'simul': ((beta1(init=-1.0)[elid:0 id:0] / Variable1 [elid:4 id:2]) + (beta2(init=2.0)[elid:1 id:1] / Variable2 [elid:5 id:3]))}

Note that, by default, Biogeme removes the unused variables from the database to optimize space.

In [8]:
myBiogeme.database.data.columns

Index(['Person', 'Exclude', 'Variable1', 'Variable2', 'Choice', 'Av1', 'Av2',
       'Av3'],
      dtype='object')

## calculateInitLikelihood

In [9]:
myBiogeme.calculateInitLikelihood()

[11:12:24] < Detailed >  Log likelihood (N = 5):  -115.3003


-115.30029248549191
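
As a cross-check, the value above can be reproduced by hand. The sketch below is not part of Biogeme: it evaluates the `loglike` expression row by row with pandas and NumPy at the initial values of the parameters (`beta1 = -1`, `beta2 = 2`) and sums the contributions. The names `per_row` and `init_loglike` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same two columns as in the database defined earlier.
df = pd.DataFrame({'Variable1': [1, 2, 3, 4, 5],
                   'Variable2': [10, 20, 30, 40, 50]})

# Initial values of the parameters, as passed to Beta(...).
beta1, beta2 = -1.0, 2.0

# Contribution of each row to the 'loglike' expression.
per_row = (-beta1**2 * df['Variable1']
           - np.exp(beta2 * beta1) * df['Variable2']
           - beta2**4)

# Sum over the five observations.
init_loglike = per_row.sum()
print(init_loglike)
```

The printed value should agree with the result of `calculateInitLikelihood` up to floating-point rounding.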

## calculateLikelihood

In [10]:
x = myBiogeme.id_manager.free_betas_values
xplus = [v + 1 for v in x]
print(xplus)

[0.0, 3.0]


In [11]:
myBiogeme.calculateLikelihood(xplus, scaled=True)

[11:12:24] < Detailed >  Log likelihood (N = 5):       -555


-111.0

It is possible to calculate the likelihood using only a sample of the data, by setting the parameter `batch` to the fraction of observations to use.

In [12]:
myBiogeme.calculateLikelihood(xplus, scaled=True, batch=0.5)

[11:12:24] < Detailed >  Use 50.0% of the data.
[11:12:24] < Detailed >  Log likelihood (N = 2):       -555


-277.5

In [13]:
myBiogeme.database.data

Unnamed: 0,Person,Exclude,Variable1,Variable2,Choice,Av1,Av2,Av3
0,1,0,1,10,1,0,1,0
1,1,0,2,20,2,1,1,1


In [14]:
myBiogeme.calculateLikelihood(xplus, scaled=True, batch=0.6)

[11:12:24] < Detailed >  Use 60.0% of the data.
[11:12:24] < Detailed >  Log likelihood (N = 3):       -555


-185.0

In [15]:
myBiogeme.database.data

Unnamed: 0,Person,Exclude,Variable1,Variable2,Choice,Av1,Av2,Av3
4,2,1,5,50,2,1,1,1
3,2,0,4,40,1,1,1,1
2,1,1,3,30,3,1,1,1


By default, each observation has the same probability of being selected in the sample. The selection probability can instead be made proportional to the values of a column of the database, by setting the attribute `columnForBatchSamplingWeights` to the name of that column.

In [16]:
myBiogeme.columnForBatchSamplingWeights = 'Variable2'
myBiogeme.calculateLikelihood(xplus, scaled=True, batch=0.6)

[11:12:24] < Detailed >  Use 60.0% of the data.
[11:12:24] < Detailed >  Log likelihood (N = 3):       -555


-185.0

In [17]:
myBiogeme.database.data

Unnamed: 0,Person,Exclude,Variable1,Variable2,Choice,Av1,Av2,Av3
1,1,0,2,20,2,1,1,1
2,1,1,3,30,3,1,1,1
4,2,1,5,50,2,1,1,1
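
The idea of weighted batch sampling can be illustrated outside Biogeme with `pandas.DataFrame.sample`. This is only a sketch of the principle, and it is an assumption that it resembles what Biogeme does internally; the `random_state` value is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'Variable1': [1, 2, 3, 4, 5],
                   'Variable2': [10, 20, 30, 40, 50]})

# Draw 60% of the rows without replacement. Rows with a larger value
# of 'Variable2' are more likely to be picked.
batch = df.sample(frac=0.6, weights='Variable2', random_state=0)
print(batch)
```

With five rows and `frac=0.6`, the sample contains three rows, as in the Biogeme output above.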


## calculateLikelihoodAndDerivatives

In [18]:
f, g, h, bhhh = myBiogeme.\
    calculateLikelihoodAndDerivatives(xplus,
                                      scaled=True,
                                      hessian=True,
                                      bhhh=True)
print(f'f = {f}')
print(f'g = {g}')
print(f'h = {h}')
print(f'bhhh = {bhhh}')

[11:12:24] < General >   Log likelihood (N = 5):       -555 Gradient norm:      7e+02 Hessian norm:       1e+03 BHHH norm:       1e+05
f = -111.0
g = [ -90. -108.]
h = [[-270.  -30.]
 [ -30. -108.]]
bhhh = [[ 9900.  9720.]
 [ 9720. 11664.]]


Now the unscaled version

In [19]:
f, g, h, bhhh = myBiogeme.\
    calculateLikelihoodAndDerivatives(xplus,
                                      scaled=False,
                                      hessian=True,
                                      bhhh=True)
print(f'f = {f}')
print(f'g = {g}')
print(f'h = {h}')
print(f'bhhh = {bhhh}')

[11:12:24] < General >   Log likelihood (N = 5):       -555 Gradient norm:      7e+02 Hessian norm:       1e+03 BHHH norm:       1e+05
f = -555.0
g = [-450. -540.]
h = [[-1350.  -150.]
 [ -150.  -540.]]
bhhh = [[49500. 48600.]
 [48600. 58320.]]
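
Comparing the two outputs suggests that `scaled=True` divides each quantity by the number of observations (5 here). The check below uses only the numbers printed above; it is a reading of the outputs, not a statement about Biogeme's internals.

```python
import numpy as np

n_obs = 5  # number of observations in the database

# Values printed by the call with scaled=True.
f_scaled = -111.0
g_scaled = np.array([-90.0, -108.0])
h_scaled = np.array([[-270.0, -30.0],
                     [-30.0, -108.0]])

# Multiplying by the number of observations recovers the unscaled values.
f_unscaled = n_obs * f_scaled
g_unscaled = n_obs * g_scaled
h_unscaled = n_obs * h_scaled
print(f_unscaled, g_unscaled, h_unscaled, sep='\n')
```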


Using only a sample of the data

In [20]:
f, g, h, bhhh = myBiogeme.\
    calculateLikelihoodAndDerivatives(xplus,
                                      scaled=True,
                                      batch=0.5,
                                      hessian=True,
                                      bhhh=True)
print(f'f = {f}')
print(f'g = {g}')
print(f'h = {h}')
print(f'bhhh = {bhhh}')

[11:12:24] < Detailed >  Use 50.0% of the data.
[11:12:24] < General >   Log likelihood (N = 2):       -555 Gradient norm:      7e+02 Hessian norm:       1e+03 BHHH norm:       1e+05
f = -277.5
g = [-225. -270.]
h = [[-675.  -75.]
 [ -75. -270.]]
bhhh = [[24750. 24300.]
 [24300. 29160.]]


## likelihoodFiniteDifferenceHessian

In [21]:
myBiogeme.likelihoodFiniteDifferenceHessian(xplus)

[11:12:24] < General >   Log likelihood (N = 5):       -555 Gradient norm:      7e+02  
[11:12:24] < General >   Log likelihood (N = 5):       -555 Gradient norm:      7e+02  
[11:12:24] < General >   Log likelihood (N = 5):  -555.0002 Gradient norm:      7e+02  


array([[-1380.00020229,  -150.        ],
       [ -150.0000451 ,  -540.00005396]])
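
The principle of a finite-difference Hessian can be sketched with NumPy alone. The code below is not Biogeme's implementation: the gradient is derived by hand from the `likelihood` expression (summed over the five observations), and both the function name `finite_difference_hessian` and the step size `1e-7` are choices made for this illustration.

```python
import numpy as np

# Sums of Variable1 and Variable2 over the five observations, and sample size.
S1, S2, N = 15.0, 150.0, 5

def gradient(beta):
    """Hand-derived gradient of the summed 'loglike' expression."""
    b1, b2 = beta
    e = np.exp(b1 * b2)
    return np.array([-2.0 * b1 * S1 - b2 * e * S2,
                     -b1 * e * S2 - 4.0 * b2**3 * N])

def finite_difference_hessian(grad, x, step=1e-7):
    """Approximate the Hessian by forward differences of the gradient."""
    x = np.asarray(x, dtype=float)
    g0 = grad(x)
    hessian = np.zeros((x.size, x.size))
    for j in range(x.size):
        shifted = x.copy()
        shifted[j] += step
        # Column j: change of the gradient when x[j] moves by 'step'.
        hessian[:, j] = (grad(shifted) - g0) / step
    return hessian

# Same point as xplus in the example above.
approx_hessian = finite_difference_hessian(gradient, [0.0, 3.0])
print(approx_hessian)
```

Under these assumptions, the entries come out close to -1380, -150 and -540, in line with the array returned by `likelihoodFiniteDifferenceHessian`.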

## checkDerivatives

In [22]:
f, g, h, gdiff, hdiff = myBiogeme.checkDerivatives(verbose=True)

[11:12:24] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[11:12:24] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[11:12:24] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[11:12:24] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[11:12:24] < Detailed >  x		Gradient	FinDiff		Difference
[11:12:24] < Detailed >  beta1          	-1.060058E+01	-1.060058E+01	-5.427932E-06
[11:12:24] < Detailed >  beta2          	-1.396997E+02	-1.396997E+02	+2.608000E-05
[11:12:24] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[11:12:24] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[11:12:24] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:

In [23]:
print(f'f = {f}')
print(f'g = {g}')
print(f'h = {h}')
print(f'gdiff = {gdiff}')
print(f'hdiff = {hdiff}')
hdiff

f = -115.30029248549191
g = [ -10.60058497 -139.69970751]
h = [[-111.20116994   20.30029249]
 [  20.30029249 -260.30029249]]
gdiff = [-5.42793187e-06  2.60800035e-05]
hdiff = [[-8.04552171e-06  7.36597983e-09]
 [-1.61387920e-07  2.22928137e-05]]


array([[-8.04552171e-06,  7.36597983e-09],
       [-1.61387920e-07,  2.22928137e-05]])
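
The comparison performed by `checkDerivatives` can be illustrated with a small stand-alone gradient check. This is a sketch under stated assumptions, not Biogeme's code: the function and its gradient are written by hand from the `likelihood` expression, central differences are used, and the helper name `check_gradient` and the step size are hypothetical.

```python
import numpy as np

# Sums of Variable1 and Variable2 over the five observations, and sample size.
S1, S2, N = 15.0, 150.0, 5

def loglike(beta):
    """Summed 'loglike' expression as a function of (beta1, beta2)."""
    b1, b2 = beta
    return -b1**2 * S1 - np.exp(b1 * b2) * S2 - N * b2**4

def gradient(beta):
    """Hand-derived gradient of loglike."""
    b1, b2 = beta
    e = np.exp(b1 * b2)
    return np.array([-2.0 * b1 * S1 - b2 * e * S2,
                     -b1 * e * S2 - 4.0 * b2**3 * N])

def check_gradient(f, grad, x, step=1e-6):
    """Return the analytical gradient, a central-difference one, and their gap."""
    x = np.asarray(x, dtype=float)
    analytical = grad(x)
    numerical = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = step
        numerical[i] = (f(x + e) - f(x - e)) / (2.0 * step)
    return analytical, numerical, analytical - numerical

# Initial values of beta1 and beta2.
g, g_fd, gdiff = check_gradient(loglike, gradient, [-1.0, 2.0])
print(g, g_fd, gdiff, sep='\n')
```

A small difference between the two gradients indicates that the analytical derivatives are consistent with the function.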

## estimate

During estimation, it is possible to save intermediate results, in case the estimation must be interrupted. 

In [24]:
results = myBiogeme.estimate(bootstrap=10)

[11:12:24] < General >   *** Initial values of the parameters are obtained from the file __simpleExample.iter
[11:12:24] < Detailed >  Parameter values restored from __simpleExample.iter
[11:12:24] < Detailed >  Log likelihood (N = 5):  -115.3003
[11:12:24] < Detailed >  ** Optimization: Newton with trust region for simple bounds
[11:12:24] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[11:12:24] < Detailed >  Log likelihood (N = 5):  -69.98205
[11:12:24] < General >   Log likelihood (N = 5):  -69.98205 Gradient norm:      3e+01 Hessian norm:       2e+02 
[11:12:24] < Detailed >  1 f=  13.99641 projected rel. grad.=  0.28 rel. change=  0.38 delta=     2 rho=   1.1 ++
[11:12:24] < Detailed >  Log likelihood (N = 5):  -67.07892
[11:12:24] < General >   Log likelihood (N = 5):  -67.07892 Gradient norm:          2 Hessian norm:       2e+02 
[11:12:24] < Detailed >  2 f=  13.41578 projected rel. grad.= 0.021 rel. change=  0.15 delta=  

100%|██████████| 10/10 [00:00<00:00, 331.01it/s]

[11:12:24] < General >   Results saved in file simpleExample~03.html
[11:12:24] < General >   Results saved in file simpleExample~03.pickle





In [25]:
results.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
beta1,-1.273264,0.013724,-92.776664,0.0
beta2,1.248769,0.059086,21.134794,0.0


If the model has already been estimated, it is possible to recycle the estimation results. In that case, the other arguments are ignored, and the results are whatever is in the file.

In [26]:
recycled_results = myBiogeme.estimate(recycle=True, bootstrap=10)

[11:12:25] < General >   Estimation results read from simpleExample~03.pickle


In [27]:
print(recycled_results.shortSummary())

Results for model simpleExample
Nbr of parameters:		2
Sample size:			5
Excluded data:			0
Final log likelihood:		-67.06549
Akaike Information Criterion:	138.131
Bayesian Information Criterion:	137.3499
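
The two information criteria in this summary are consistent with the standard definitions, AIC = 2K - 2LL and BIC = K ln(N) - 2LL, where K is the number of parameters, N the sample size and LL the final log likelihood. The short check below uses only the figures printed above.

```python
import math

n_parameters = 2
sample_size = 5
final_loglike = -67.06549

# Standard definitions of the two criteria.
aic = 2 * n_parameters - 2 * final_loglike
bic = n_parameters * math.log(sample_size) - 2 * final_loglike
print(f'AIC = {aic:.3f}, BIC = {bic:.4f}')
```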



In [28]:
recycled_results.getEstimatedParameters()

Unnamed: 0,Value,Rob. Std err,Rob. t-test,Rob. p-value
beta1,-1.273264,0.013724,-92.776664,0.0
beta2,1.248769,0.059086,21.134794,0.0


## simulate

Simulate with the default values for the parameters

In [29]:
simulationWithDefaultBetas = myBiogeme.simulate()
simulationWithDefaultBetas

Unnamed: 0,loglike,beta1,simul
0,-23.060064,-1.0,-0.266667
1,-27.766769,-1.0,-0.16
2,-20.706712,-1.0,-0.4
3,-25.413417,-1.0,-0.2
4,-23.060064,-1.0,-0.266667


Simulate with the estimated values for the parameters

In [30]:
print(results.getBetaValues())
simulationWithEstimatedBetas =\
    myBiogeme.simulate(results.getBetaValues())
simulationWithEstimatedBetas

{'beta1': -1.2732639872136933, 'beta2': 1.2487688099301195}


Unnamed: 0,loglike,beta1,simul
0,-13.413098,-1.273264,-0.382796
1,-20.733962,-1.273264,-0.229677
2,-9.752666,-1.273264,-0.574194
3,-17.07353,-1.273264,-0.287097
4,-13.413098,-1.273264,-0.382796


## confidenceIntervals

In [31]:
drawsFromBetas =\
    results.getBetasForSensitivityAnalysis(myBiogeme.id_manager.free_betas.names)
left, right = myBiogeme.confidenceIntervals(drawsFromBetas)
left

Unnamed: 0,loglike,beta1,simul
0,-13.563315,-1.291034,-0.391319
1,-21.261787,-1.291034,-0.234791
2,-10.034047,-1.291034,-0.586978
3,-17.387198,-1.291034,-0.293489
4,-13.563315,-1.291034,-0.391319


In [32]:
right

Unnamed: 0,loglike,beta1,simul
0,-13.413098,-1.255223,-0.374218
1,-20.443982,-1.255223,-0.224531
2,-9.63802,-1.255223,-0.561327
3,-16.974003,-1.255223,-0.280663
4,-13.413098,-1.255223,-0.374218
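
The general idea behind such intervals is to evaluate the simulated quantities for many draws of the parameters and to report empirical quantiles. The NumPy sketch below illustrates this for the `simul` expression only. It rests on assumptions that may differ from Biogeme: the draws are independent normals centered on the estimates with the robust standard errors reported earlier (correlations are ignored), the original five rows of data are used, and a 90% interval is taken.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

variable1 = np.array([1, 2, 3, 4, 5], dtype=float)
variable2 = np.array([10, 20, 30, 40, 50], dtype=float)

# Hypothetical draws of the parameters (independence is an assumption).
n_draws = 1000
beta1_draws = rng.normal(-1.273264, 0.013724, size=n_draws)
beta2_draws = rng.normal(1.248769, 0.059086, size=n_draws)

# One row per draw, one column per observation.
simul = (beta1_draws[:, None] / variable1
         + beta2_draws[:, None] / variable2)

# Empirical 5% and 95% quantiles for each observation.
left_simul = np.quantile(simul, 0.05, axis=0)
right_simul = np.quantile(simul, 0.95, axis=0)
print(left_simul)
print(right_simul)
```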


## validate

Validation consists of organizing the data into several randomly defined slices of about the same size.
Each slice is used in turn as a validation dataset. The model is re-estimated using all the data except that slice, and the estimated model is then applied to the validation set (i.e., the slice). The value of the log likelihood for each observation in the validation set is reported in a dataframe. As this is done for each slice, the output is a list of dataframes, one per slice.

In [33]:
logger.setSilent()
validationData = myData.split(slices=5)
validation_results = myBiogeme.validate(results, validationData)

In [34]:
validation_results

[   Loglikelihood
 3     -17.145326,
    Loglikelihood
 0       -6.34111,
    Loglikelihood
 2     -13.413098,
    Loglikelihood
 4     -21.037421,
    Loglikelihood
 1      -9.817721]

In [35]:
for one_slice in validation_results:
    print(f'Log likelihood for {one_slice.shape[0]} '
          f'validation data: {one_slice["Loglikelihood"].sum()}')


Log likelihood for 1 validation data: -17.14532644602357
Log likelihood for 1 validation data: -6.34111029142699
Log likelihood for 1 validation data: -13.413098095892842
Log likelihood for 1 validation data: -21.037421362921453
Log likelihood for 1 validation data: -9.817721165500888
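
The slicing procedure described in this section can be sketched with pandas and NumPy. The helper `split_into_slices` is hypothetical and is not the implementation of `Database.split`; it only shows how a dataframe can be cut into random slices, each paired with the remaining data used for estimation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Variable1': [1, 2, 3, 4, 5],
                   'Variable2': [10, 20, 30, 40, 50]})

def split_into_slices(data, slices, seed=0):
    """Return a list of (estimation, validation) dataframe pairs."""
    # Shuffle the rows, then cut the shuffled index into 'slices' pieces.
    shuffled = data.sample(frac=1.0, random_state=seed)
    pieces = np.array_split(shuffled.index.to_numpy(), slices)
    pairs = []
    for validation_index in pieces:
        validation = data.loc[validation_index]
        # Everything outside the slice is used for estimation.
        estimation = data.drop(index=validation_index)
        pairs.append((estimation, validation))
    return pairs

folds = split_into_slices(df, slices=5)
for estimation, validation in folds:
    print(len(estimation), len(validation))
```

With five observations and five slices, each validation set contains a single row, which matches the one-row dataframes shown above.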


## files_of_type

In [36]:
myBiogeme.files_of_type('pickle')

['simpleExample.pickle',
 'simpleExample~02.pickle',
 'simpleExample~00.pickle',
 'simpleExample~03.pickle',
 'simpleExample~01.pickle']