# Module biogeme.biogeme 

## Examples of use of each function

This page is for programmers who need usage examples for the functions of this module. The examples illustrate the syntax only and do not correspond to any meaningful model. For examples of models, visit [biogeme.epfl.ch](http://biogeme.epfl.ch).

In [1]:
import datetime
print(datetime.datetime.now())

2020-06-03 09:55:38.814782


In [2]:
import biogeme.version as ver
print(ver.getText())

biogeme 3.2.6 [2020-06-03]
Version entirely written in Python
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)



In [3]:
import biogeme.biogeme as bio
import biogeme.database as db
import pandas as pd
import numpy as np
from biogeme.expressions import Beta, Variable, exp

Define the verbosity of Biogeme

In [4]:
import biogeme.messaging as msg
logger = msg.bioMessage()
logger.setDetailed()

## Definition of a database

In [5]:
df = pd.DataFrame({'Person':[1,1,1,2,2],
                   'Exclude':[0,0,1,0,1],
                   'Variable1':[1,2,3,4,5],
                   'Variable2':[10,20,30,40,50],
                   'Choice':[1,2,3,1,2],
                   'Av1':[0,1,1,1,1],
                   'Av2':[1,1,1,1,1],
                   'Av3':[0,1,1,1,1]})
myData = db.Database('test', df)

## Definition of various expressions

In [6]:
Variable1=Variable('Variable1')
Variable2=Variable('Variable2')
beta1 = Beta('beta1', -1.0, -3, 3, 0)
beta2 = Beta('beta2', 2.0, -3, 10, 0)
likelihood = -beta1**2 * Variable1 - exp(beta2 * beta1) \
    * Variable2 - beta2**4
simul = beta1 / Variable1 + beta2 / Variable2
dictOfExpressions = {'loglike': likelihood, 
                     'beta1': beta1,
                     'simul': simul}

## Creation of the BIOGEME object

In [7]:
myBiogeme = bio.BIOGEME(myData, dictOfExpressions)
myBiogeme.modelName = 'simpleExample'
print(myBiogeme)

[09:55:40] < General >   Remove 6 unused variables from the database as only 2 are used.
[09:55:40] < Detailed >  It is suggested to scale the following variables.
[09:55:40] < Detailed >  Multiply Variable2 by	0.01 because the largest (abs) value is	50
[09:55:40] < Detailed >  To remove this feature, set the parametersuggestScales to False when creating theBIOGEME object.
simpleExample: database [test]{'loglike': ((((-(beta1(-1.0) ** `2`)) * Variable1) - (exp((beta2(2.0) * beta1(-1.0))) * Variable2)) - (beta2(2.0) ** `4`)), 'beta1': beta1(-1.0), 'simul': ((beta1(-1.0) / Variable1) + (beta2(2.0) / Variable2))}


Note that, by default, Biogeme removes the unused variables from the database to optimize space.

In [8]:
myBiogeme.database.data.columns

Index(['Person', 'Exclude', 'Variable1', 'Variable2', 'Choice', 'Av1', 'Av2',
       'Av3'],
      dtype='object')

## calculateInitLikelihood

In [9]:
myBiogeme.calculateInitLikelihood()

[09:55:40] < Detailed >  Log likelihood (N = 5):  -115.3003


-115.30029248549191
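
The value above can be cross-checked by hand. The following is a minimal sketch in plain Python, independent of Biogeme, that evaluates the `loglike` formula of this example on the five rows of the database at the initial values of the parameters. The function names `loglike_row` and `loglike` are illustrative and are not part of the Biogeme API.

```python
import math

# Columns Variable1 and Variable2 of the example database.
variable1 = [1, 2, 3, 4, 5]
variable2 = [10, 20, 30, 40, 50]


def loglike_row(beta1, beta2, v1, v2):
    """Contribution of one observation to the 'loglike' formula."""
    return -beta1**2 * v1 - math.exp(beta2 * beta1) * v2 - beta2**4


def loglike(beta1, beta2):
    """Sum of the contributions over the five observations."""
    return sum(loglike_row(beta1, beta2, v1, v2)
               for v1, v2 in zip(variable1, variable2))


# Initial values of the example: beta1 = -1.0 and beta2 = 2.0.
init_value = loglike(-1.0, 2.0)
print(init_value)  # close to -115.3003
```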

## calculateLikelihood

In [10]:
x = myBiogeme.betaInitValues
xplus = [v + 1 for v in x]
print(xplus)

[0.0, 3.0]


In [11]:
myBiogeme.calculateLikelihood(xplus, scaled=True)

[09:55:40] < Detailed >  Log likelihood (N = 5):       -555


-111.0

It is possible to calculate the likelihood based only on a sample of the data. The parameter `batch` is the fraction of the observations to use.

In [12]:
myBiogeme.calculateLikelihood(xplus, scaled=True, batch=0.5)

[09:55:40] < Detailed >  Use 50.0% of the data.
[09:55:40] < Detailed >  Log likelihood (N = 2):       -222


-111.0

In [13]:
myBiogeme.database.data

|   | Person | Exclude | Variable1 | Variable2 | Choice | Av1 | Av2 | Av3 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 10 | 1 | 0 | 1 | 0 |
| 4 | 2 | 1 | 5 | 50 | 2 | 1 | 1 | 1 |


In [14]:
myBiogeme.calculateLikelihood(xplus, scaled=True, batch=0.6)

[09:55:40] < Detailed >  Use 60.0% of the data.
[09:55:40] < Detailed >  Log likelihood (N = 3):       -303


-101.0

In [15]:
myBiogeme.database.data

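|   | Person | Exclude | Variable1 | Variable2 | Choice | Av1 | Av2 | Av3 |
|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | 3 | 30 | 3 | 1 | 1 | 1 |
| 0 | 1 | 0 | 1 | 10 | 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 2 | 20 | 2 | 1 | 1 | 1 |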
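
The effect of the batch on the scaled value can be reproduced with a small sketch in plain Python, independent of Biogeme. It assumes that the sample size is the rounded fraction of the number of rows, which is consistent with the values N = 2 and N = 3 reported above but is not taken from the Biogeme source. The function `scaled_loglike_on_batch` is hypothetical, and the rows drawn by the standard `random` module will generally differ from those drawn by Biogeme.

```python
import random

# Column Variable2 of the example database.
variable2 = [10, 20, 30, 40, 50]


def loglike_row(v2):
    # With beta1 = 0 and beta2 = 3 (the vector xplus), the formula
    # reduces to -Variable2 - 3**4 for each row.
    return -v2 - 3**4


def scaled_loglike_on_batch(batch, rng):
    """Draw a fraction of the rows and return them with the scaled value."""
    # Assumption: the sample size is the rounded fraction of the rows.
    n = round(batch * len(variable2))
    rows = rng.sample(range(len(variable2)), n)
    total = sum(loglike_row(variable2[i]) for i in rows)
    # The scaled value is the total divided by the sample size.
    return rows, total / n


rng = random.Random(0)
rows, value = scaled_loglike_on_batch(0.6, rng)
print(rows, value)
```

The scaled value depends on which rows are drawn, which is why two calls with a similar batch size can return different values.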


By default, each observation has the same probability of being selected in the sample. The selection probability can instead be made proportional to the values of a column of the database, by setting the attribute `columnForBatchSamplingWeights` to the name of that column.

In [16]:
myBiogeme.columnForBatchSamplingWeights = 'Variable2'
myBiogeme.calculateLikelihood(xplus, scaled=True, batch=0.6)

[09:55:40] < Detailed >  Use 60.0% of the data.
[09:55:40] < Detailed >  Log likelihood (N = 3):       -353


-117.66666666666667

In [17]:
myBiogeme.database.data

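|   | Person | Exclude | Variable1 | Variable2 | Choice | Av1 | Av2 | Av3 |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 2 | 20 | 2 | 1 | 1 | 1 |
| 4 | 2 | 1 | 5 | 50 | 2 | 1 | 1 | 1 |
| 3 | 2 | 0 | 4 | 40 | 1 | 1 | 1 | 1 |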
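
One possible way to draw rows with a probability proportional to a column is sketched below in plain Python. This is an illustration of the idea only: the sequential scheme and the function `weighted_sample` are assumptions, not the procedure used internally by Biogeme or pandas.

```python
import random

# Column Variable2 of the example database, used as sampling weights.
variable2 = [10, 20, 30, 40, 50]


def weighted_sample(weights, n, rng):
    """Draw n distinct row indices, one at a time, with a probability
    proportional to the weights of the rows not yet selected."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(n):
        pick = rng.choices(remaining,
                           weights=[weights[i] for i in remaining])[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


rng = random.Random(1)
counts = [0] * len(variable2)
for _ in range(2000):
    for i in weighted_sample(variable2, 3, rng):
        counts[i] += 1
print(counts)  # rows with a larger Variable2 are selected more often
```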


## calculateLikelihoodAndDerivatives

In [18]:
f, g, h, bhhh = myBiogeme.\
    calculateLikelihoodAndDerivatives(xplus,
                                      scaled=True,
                                      hessian=True,
                                      bhhh=True)
print(f'f = {f}')
print(f'g = {g}')
print(f'h = {h}')
print(f'bhhh = {bhhh}')

[09:55:40] < General >   Log likelihood (N = 5):       -555 Gradient norm:      7e+02 Hessian norm:       1e+03 BHHH norm:       1e+05
f = -111.0
g = [ -90. -108.]
h = [[-270.  -30.]
 [ -30. -108.]]
bhhh = [[ 9900.  9720.]
 [ 9720. 11664.]]


Now the unscaled version, where the values are not divided by the number of observations.

In [19]:
f, g, h, bhhh = myBiogeme.\
    calculateLikelihoodAndDerivatives(xplus,
                                      scaled=False,
                                      hessian=True,
                                      bhhh=True)
print(f'f = {f}')
print(f'g = {g}')
print(f'h = {h}')
print(f'bhhh = {bhhh}')

[09:55:40] < General >   Log likelihood (N = 5):       -555 Gradient norm:      7e+02 Hessian norm:       1e+03 BHHH norm:       1e+05
f = -555.0
g = [-450. -540.]
h = [[-1350.  -150.]
 [ -150.  -540.]]
bhhh = [[49500. 48600.]
 [48600. 58320.]]
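
The gradient and the BHHH matrix above can be reproduced by hand. The sketch below, in plain Python and independent of Biogeme, uses the analytical gradient of the example formula and accumulates the outer product of each observation's gradient. The function names are illustrative only.

```python
import math

variable1 = [1, 2, 3, 4, 5]
variable2 = [10, 20, 30, 40, 50]


def gradient_row(beta1, beta2, v1, v2):
    """Analytical gradient of one observation's contribution."""
    e = math.exp(beta2 * beta1)
    return [-2 * beta1 * v1 - beta2 * e * v2,
            -beta1 * e * v2 - 4 * beta2**3]


def gradient_and_bhhh(beta1, beta2, scaled):
    g = [0.0, 0.0]
    bhhh = [[0.0, 0.0], [0.0, 0.0]]
    for v1, v2 in zip(variable1, variable2):
        gi = gradient_row(beta1, beta2, v1, v2)
        for a in range(2):
            g[a] += gi[a]
            for b in range(2):
                # BHHH accumulates the outer product of each row's gradient.
                bhhh[a][b] += gi[a] * gi[b]
    if scaled:
        # Scaling divides every quantity by the number of observations.
        n = len(variable1)
        g = [x / n for x in g]
        bhhh = [[x / n for x in row] for row in bhhh]
    return g, bhhh


g, bhhh = gradient_and_bhhh(0.0, 3.0, scaled=False)
print(g)     # [-450.0, -540.0]
print(bhhh)  # [[49500.0, 48600.0], [48600.0, 58320.0]]
```

Calling the function with `scaled=True` divides these values by 5, the number of observations.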


The same calculation, using only a sample of the data.

In [20]:
f, g, h, bhhh = myBiogeme.\
    calculateLikelihoodAndDerivatives(xplus,
                                      scaled=True,
                                      batch=0.5,
                                      hessian=True,
                                      bhhh=True)
print(f'f = {f}')
print(f'g = {g}')
print(f'h = {h}')
print(f'bhhh = {bhhh}')

[09:55:40] < Detailed >  Use 50.0% of the data.
[09:55:40] < General >   Log likelihood (N = 2):       -222 Gradient norm:      3e+02 Hessian norm:       6e+02 BHHH norm:       4e+04
f = -111.0
g = [ -90. -108.]
h = [[-270.  -30.]
 [ -30. -108.]]
bhhh = [[ 9000.  9720.]
 [ 9720. 11664.]]


## likelihoodFiniteDifferenceHessian

In [21]:
myBiogeme.likelihoodFiniteDifferenceHessian(xplus)

[09:55:40] < General >   Log likelihood (N = 5):       -555 Gradient norm:      7e+02  
[09:55:40] < General >   Log likelihood (N = 5):       -555 Gradient norm:      7e+02  
[09:55:40] < General >   Log likelihood (N = 5):  -555.0002 Gradient norm:      7e+02  


array([[-1380.00020229,  -150.        ],
       [ -150.0000451 ,  -540.00005396]])
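
The principle of a finite-difference Hessian can be sketched in plain Python, independent of Biogeme. The version below applies forward differences to the analytical gradient of the example formula. The step size and the function names are assumptions chosen for illustration, so the trailing digits will not match the array above exactly.

```python
import math

variable1 = [1, 2, 3, 4, 5]
variable2 = [10, 20, 30, 40, 50]


def gradient(x):
    """Analytical gradient of the unscaled log likelihood at x."""
    beta1, beta2 = x
    g = [0.0, 0.0]
    for v1, v2 in zip(variable1, variable2):
        e = math.exp(beta2 * beta1)
        g[0] += -2 * beta1 * v1 - beta2 * e * v2
        g[1] += -beta1 * e * v2 - 4 * beta2**3
    return g


def finite_difference_hessian(x, step=1e-6):
    """Forward differences of the gradient, one column per parameter."""
    g0 = gradient(x)
    n = len(x)
    hessian = [[0.0] * n for _ in range(n)]
    for j in range(n):
        shifted = list(x)
        shifted[j] += step
        gj = gradient(shifted)
        for i in range(n):
            hessian[i][j] = (gj[i] - g0[i]) / step
    return hessian


h = finite_difference_hessian([0.0, 3.0])
print(h)  # approximately [[-1380, -150], [-150, -540]]
```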

## checkDerivatives

In [22]:
f, g, h, gdiff, hdiff = myBiogeme.checkDerivatives(verbose=True)

[09:55:40] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[09:55:40] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[09:55:40] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[09:55:40] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[09:55:40] < Detailed >  x		Gradient	FinDiff		Difference
[09:55:40] < Detailed >  beta1          	-1.060058E+01	-1.060058E+01	-5.427932E-06
[09:55:40] < Detailed >  beta2          	-1.396997E+02	-1.396997E+02	+2.608000E-05
[09:55:40] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[09:55:40] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[09:55:40] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02 Hessian norm:

In [23]:
print(f'f = {f}')
print(f'g = {g}')
print(f'h = {h}')
print(f'gdiff = {gdiff}')
print(f'hdiff = {hdiff}')
hdiff

f = -115.30029248549191
g = [ -10.60058497 -139.69970751]
h = [[-111.20116994   20.30029249]
 [  20.30029249 -260.30029249]]
gdiff = [-5.42793187e-06  2.60800035e-05]
hdiff = [[-8.04552172e-06  7.36597983e-09]
 [-1.61387920e-07  2.22928137e-05]]


array([[-8.04552172e-06,  7.36597983e-09],
       [-1.61387920e-07,  2.22928137e-05]])
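
The idea behind this check can be illustrated without Biogeme: the analytical gradient is compared with a finite-difference approximation of the function, and the difference should be small. The sketch below is in plain Python, with an assumed step size and illustrative function names.

```python
import math

variable1 = [1, 2, 3, 4, 5]
variable2 = [10, 20, 30, 40, 50]


def loglike(x):
    """Unscaled log likelihood of the example at x = [beta1, beta2]."""
    beta1, beta2 = x
    return sum(-beta1**2 * v1 - math.exp(beta2 * beta1) * v2 - beta2**4
               for v1, v2 in zip(variable1, variable2))


def gradient(x):
    """Analytical gradient of the unscaled log likelihood."""
    beta1, beta2 = x
    g = [0.0, 0.0]
    for v1, v2 in zip(variable1, variable2):
        e = math.exp(beta2 * beta1)
        g[0] += -2 * beta1 * v1 - beta2 * e * v2
        g[1] += -beta1 * e * v2 - 4 * beta2**3
    return g


def finite_difference_gradient(x, step=1e-7):
    """Forward differences of the function, one parameter at a time."""
    f0 = loglike(x)
    result = []
    for j in range(len(x)):
        shifted = list(x)
        shifted[j] += step
        result.append((loglike(shifted) - f0) / step)
    return result


x0 = [-1.0, 2.0]  # initial values of the example
analytical = gradient(x0)
numerical = finite_difference_gradient(x0)
differences = [a - n for a, n in zip(analytical, numerical)]
print(analytical)   # close to [-10.60058497, -139.69970751]
print(differences)  # small values, dominated by the truncation error
```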

## estimate

During estimation, it is possible to save intermediate results, in case the estimation must be interrupted. 

In [24]:
results = myBiogeme.estimate(bootstrap=10, saveIterations=True)

[09:55:40] < Detailed >  Log likelihood (N = 5):  -115.3003
[09:55:40] < Detailed >  ** Optimization: BFGS with trust region for simple bounds
[09:55:40] < General >   Log likelihood (N = 5):  -115.3003 Gradient norm:      1e+02  
[09:55:40] < Detailed >  Log likelihood (N = 5):  -85.30029
[09:55:40] < General >   Log likelihood (N = 5):  -85.30029 Gradient norm:      4e+01  
[09:55:40] < Detailed >  1 f=  17.06006 projected rel. grad.=  0.43 delta=     1 rho=  0.21 +
[09:55:40] < Detailed >  Log likelihood (N = 5):  -88.49387
[09:55:40] < Detailed >  2 f=  17.06006 projected rel. grad.=  0.43 delta=   0.5 rho=  -0.1 -
[09:55:40] < Detailed >  Log likelihood (N = 5):  -71.59478
[09:55:40] < General >   Log likelihood (N = 5):  -71.59478 Gradient norm:      3e+01  
[09:55:40] < Detailed >  3 f=  14.31896 projected rel. grad.=  0.24 delta=   0.5 rho=  0.76 +
[09:55:40] < Detailed >  Log likelihood (N = 5):  -72.68032
[09:55:40] < Detailed >  4 f=  14.31896 projected rel. grad.=  0.24 del

In [25]:
results.getEstimatedParameters()

|   | Value | Std err | t-test | p-value | Rob. Std err | Rob. t-test | Rob. p-value | Bootstrap[10] Std err | Bootstrap t-test | Bootstrap p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| beta1 | -1.273263 | 0.115144 | -11.057993 | 0.0 | 0.013724 | -92.778009 | 0.0 | 0.008758 | -145.38657 | 0.0 |
| beta2 | 1.248769 | 0.08483 | 14.720833 | 0.0 | 0.059086 | 21.134787 | 0.0 | 0.037993 | 32.868254 | 0.0 |
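
As a sanity check of the estimated values, the example function can be maximized by hand. The sketch below uses a plain Newton iteration in pure Python. This is a swapped-in technique chosen for brevity: it is not the BFGS algorithm with trust region reported in the log, it ignores the bounds on the parameters, and it has no safeguard, so it should be read as an illustration that happens to work on this small example.

```python
import math

variable1 = [1, 2, 3, 4, 5]
variable2 = [10, 20, 30, 40, 50]


def derivatives(beta1, beta2):
    """Gradient and Hessian of the unscaled log likelihood."""
    g1 = g2 = h11 = h12 = h22 = 0.0
    for v1, v2 in zip(variable1, variable2):
        e = math.exp(beta2 * beta1)
        g1 += -2 * beta1 * v1 - beta2 * e * v2
        g2 += -beta1 * e * v2 - 4 * beta2**3
        h11 += -2 * v1 - beta2**2 * e * v2
        h12 += -(1 + beta1 * beta2) * e * v2
        h22 += -beta1**2 * e * v2 - 12 * beta2**2
    return (g1, g2), (h11, h12, h22)


beta1, beta2 = -1.0, 2.0  # initial values of the example
for _ in range(50):
    (g1, g2), (h11, h12, h22) = derivatives(beta1, beta2)
    if math.hypot(g1, g2) < 1e-10:
        break
    # Newton step: solve H d = -g for the 2x2 symmetric Hessian.
    det = h11 * h22 - h12 * h12
    step1 = (-g1 * h22 + g2 * h12) / det
    step2 = (-g2 * h11 + g1 * h12) / det
    beta1 += step1
    beta2 += step2

print(beta1, beta2)  # close to the values reported in the table
```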


The intermediate values that were saved can be retrieved as follows.

The formula before the saved values are loaded:

In [26]:
myBiogeme.loglike

((((-(beta1(-1.0) ** `2`)) * Variable1) - (exp((beta2(2.0) * beta1(-1.0))) * Variable2)) - (beta2(2.0) ** `4`))

Retrieving the values

In [27]:
myBiogeme.loadSavedIteration()
myBiogeme.loglike

[09:55:40] < Detailed >  Parameter values restored from __savedIterations.txt


((((-(beta1(-1.2925578840958771) ** `2`)) * Variable1) - (exp((beta2(1.1643222545676006) * beta1(-1.2925578840958771))) * Variable2)) - (beta2(1.1643222545676006) ** `4`))

A file name can be given. If the file does not exist, the statement is ignored. 

In [28]:
myBiogeme.loadSavedIteration(filename='fileThatDoesNotExist.txt')
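
The behavior described above, where a missing file is silently ignored, follows a common pattern that can be sketched in plain Python. The file format used here (one `name = value` line per parameter) and the functions `save_iteration` and `load_saved_iteration` are hypothetical. They do not describe the actual content of `__savedIterations.txt` or the Biogeme implementation.

```python
import os
import tempfile


def save_iteration(values, filename):
    """Write one 'name = value' line per parameter (hypothetical format)."""
    with open(filename, 'w') as f:
        for name, value in values.items():
            f.write(f'{name} = {value!r}\n')


def load_saved_iteration(values, filename):
    """Update the values from the file; do nothing if it is missing."""
    try:
        with open(filename) as f:
            for line in f:
                name, _, value = line.partition('=')
                values[name.strip()] = float(value)
    except FileNotFoundError:
        # The statement is ignored when the file does not exist.
        pass
    return values


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'saved.txt')
    save_iteration({'beta1': -1.29, 'beta2': 1.16}, path)
    restored = load_saved_iteration({'beta1': -1.0, 'beta2': 2.0}, path)
    untouched = load_saved_iteration({'beta1': -1.0, 'beta2': 2.0},
                                     os.path.join(folder, 'missing.txt'))

print(restored)   # values read from the file
print(untouched)  # initial values, as the file does not exist
```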



## simulate

Simulate with the default values for the parameters

In [29]:
simulationWithDefaultBetas = myBiogeme.simulate()
simulationWithDefaultBetas

|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -5.728758 | -1.292558 | -1.176126 |
| 1 | -9.619739 | -1.292558 | -0.588063 |
| 2 | -13.51072 | -1.292558 | -0.392042 |
| 3 | -17.401702 | -1.292558 | -0.294031 |
| 4 | -21.292683 | -1.292558 | -0.235225 |


Simulate with the estimated values for the parameters

In [30]:
print(results.getBetaValues())
simulationWithEstimatedBetas =\
    myBiogeme.simulate(results.getBetaValues())
simulationWithEstimatedBetas

{'beta1': -1.2732631104451704, 'beta2': 1.248768684068648}


|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -6.092233 | -1.273263 | -1.148386 |
| 1 | -9.752666 | -1.273263 | -0.574193 |
| 2 | -13.413098 | -1.273263 | -0.382795 |
| 3 | -17.073531 | -1.273263 | -0.287097 |
| 4 | -20.733963 | -1.273263 | -0.229677 |
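
Simulation evaluates each expression of the dictionary on each row of the database. The sketch below reproduces this in plain Python, independent of Biogeme, for the three expressions of this example and the estimated values of the parameters. The function `simulate` defined here is an illustration and not the Biogeme method of the same name.

```python
import math

variable1 = [1, 2, 3, 4, 5]
variable2 = [10, 20, 30, 40, 50]
betas = {'beta1': -1.2732631104451704, 'beta2': 1.248768684068648}


def simulate(betas):
    """Evaluate 'loglike', 'beta1' and 'simul' on each row."""
    b1, b2 = betas['beta1'], betas['beta2']
    rows = []
    for v1, v2 in zip(variable1, variable2):
        rows.append({
            'loglike': -b1**2 * v1 - math.exp(b2 * b1) * v2 - b2**4,
            'beta1': b1,
            'simul': b1 / v1 + b2 / v2,
        })
    return rows


rows = simulate(betas)
for row in rows:
    print(row)
```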


## confidenceIntervals

In [31]:
drawsFromBetas =\
    results.getBetasForSensitivityAnalysis(myBiogeme.freeBetaNames)
left, right = myBiogeme.confidenceIntervals(drawsFromBetas)
left

|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -6.827964 | -1.296711 | -1.181929 |
| 1 | -10.212145 | -1.296711 | -0.590964 |
| 2 | -13.681666 | -1.296711 | -0.393976 |
| 3 | -17.491057 | -1.296711 | -0.295482 |
| 4 | -21.429878 | -1.296711 | -0.236386 |


In [32]:
right

|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -5.674594 | -1.246905 | -1.11068 |
| 1 | -9.615192 | -1.246905 | -0.55534 |
| 2 | -13.413484 | -1.246905 | -0.370227 |
| 3 | -16.967242 | -1.246905 | -0.27767 |
| 4 | -20.364691 | -1.246905 | -0.222136 |
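
The general idea of simulation-based confidence intervals can be sketched as follows: the expression is simulated for many draws of the parameters, and quantiles of the simulated values are reported for each row. The sketch below, in plain Python, rests on several assumptions that are not taken from Biogeme: the draws are independent normal variates centered on the estimates, with the robust standard errors of the table as standard deviations (correlations are ignored), and the interval is a 90% interval. The values therefore differ from those above.

```python
import random

variable1 = [1, 2, 3, 4, 5]
variable2 = [10, 20, 30, 40, 50]


def simul(b1, b2):
    """The 'simul' expression of the example, for each row."""
    return [b1 / v1 + b2 / v2 for v1, v2 in zip(variable1, variable2)]


def quantile(sorted_values, q):
    """Simple quantile based on the nearest rank."""
    index = round(q * (len(sorted_values) - 1))
    return sorted_values[index]


rng = random.Random(0)
# Hypothetical draws: independent normals around the estimates.
draws = [(rng.gauss(-1.273263, 0.013724), rng.gauss(1.248769, 0.059086))
         for _ in range(1000)]
simulated = [simul(b1, b2) for b1, b2 in draws]

left, right = [], []
for row in range(len(variable1)):
    values = sorted(s[row] for s in simulated)
    left.append(quantile(values, 0.05))   # assumed 5% quantile
    right.append(quantile(values, 0.95))  # assumed 95% quantile

print(left)
print(right)
```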


## validate

Validation consists of organizing the data into several randomly defined slices of about the same size.
Each slice is used in turn as a validation dataset: the model is re-estimated using all the data except the slice, and the estimated model is then applied to the validation set (i.e., the slice). The value of the log likelihood for each observation in the validation set is reported in a dataframe. As this is done for each slice, the output is a list of dataframes, one per slice.

In [33]:
logger.setSilent()
validationResults = myBiogeme.validate(results, slices = 2)
validationResults

[   Loglikelihood
 4     -21.292683
 3     -17.401702
 1      -9.619739,
    Loglikelihood
 0      -5.728758
 2     -13.510720]
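
The slicing described above can be sketched in plain Python. The function `make_slices` is hypothetical and only illustrates one way of building random slices whose sizes differ by at most one. It is not the procedure implemented in Biogeme, and the estimation step is left out.

```python
import random


def make_slices(n_rows, n_slices, rng):
    """Shuffle the row indices and distribute them over the slices."""
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Slice k takes every n_slices-th shuffled index, starting at k.
    return [sorted(indices[k::n_slices]) for k in range(n_slices)]


rng = random.Random(0)
slices = make_slices(5, 2, rng)
for validation in slices:
    # The model would be re-estimated on the rows outside the slice,
    # then applied to the rows of the slice.
    estimation = [i for i in range(5) if i not in validation]
    print('estimate on', estimation, 'validate on', validation)
```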