# class BIOGEME: examples of use of each function

This page is for programmers who need examples of use of each function of the class BIOGEME. The examples are designed to illustrate the syntax; they do not correspond to any meaningful model. For examples of models, visit [biogeme.epfl.ch](http://biogeme.epfl.ch).

In [1]:
import datetime
print(datetime.datetime.now())

2020-03-10 16:27:38.266919


In [2]:
import biogeme.version as ver
print(ver.getText())

biogeme 3.2.6a [2020-03-10]
Version entirely written in Python
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)



In [3]:
import biogeme.biogeme as bio
import biogeme.database as db
import pandas as pd
import numpy as np
from biogeme.expressions import Beta, Variable, exp

Define the verbosity of Biogeme

In [4]:
import biogeme.messaging as msg
logger = msg.bioMessage()
logger.setDetailed()

## Definition of a database

In [5]:
df = pd.DataFrame({'Person':[1,1,1,2,2],
                   'Exclude':[0,0,1,0,1],
                   'Variable1':[1,2,3,4,5],
                   'Variable2':[10,20,30,40,50],
                   'Choice':[1,2,3,1,2],
                   'Av1':[0,1,1,1,1],
                   'Av2':[1,1,1,1,1],
                   'Av3':[0,1,1,1,1]})
myData = db.Database('test',df)

## Definition of various expressions

In [6]:
Variable1=Variable('Variable1')
Variable2=Variable('Variable2')
beta1 = Beta('beta1',-1.0,-3,3,0)
beta2 = Beta('beta2',2.0,-3,10,0)
likelihood = -beta1**2 * Variable1 - exp(beta2*beta1) * Variable2 - beta2**4
simul = beta1 / Variable1 + beta2 / Variable2
dictOfExpressions = {'loglike':likelihood,'beta1':beta1,'simul':simul}

## Creation of the BIOGEME object

In [7]:
myBiogeme = bio.BIOGEME(myData,dictOfExpressions)
myBiogeme.modelName = 'simpleExample'
print(myBiogeme)

[16:27:38] < General >   Remove 6 unused variables from the database as only 2 are used.
simpleExample: database [test]{'loglike': ((((-(beta1(-1.0) ** `2`)) * Variable1) - (exp((beta2(2.0) * beta1(-1.0))) * Variable2)) - (beta2(2.0) ** `4`)), 'beta1': beta1(-1.0), 'simul': ((beta1(-1.0) / Variable1) + (beta2(2.0) / Variable2))}


Note that, by default, Biogeme removes the unused variables from the database to optimize space.

In [8]:
myBiogeme.database.data.columns

Index(['Person', 'Exclude', 'Variable1', 'Variable2', 'Choice', 'Av1', 'Av2',
       'Av3'],
      dtype='object')

## calculateInitLikelihood

In [9]:
myBiogeme.calculateInitLikelihood()

[16:27:38] < General >   Log likelihood (N=5):  -115.3003


-115.30029248549191
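
As a check that does not depend on Biogeme, the value above can be reproduced in plain Python: the log likelihood is the sum, over the observations, of the `loglike` expression evaluated at the initial values of the parameters. The sketch hard-codes the two columns used by the model and re-implements the expression by hand; the helper names are hypothetical and not part of Biogeme.

```python
import math

# Toy data: the two columns of the DataFrame above that the model uses
VARIABLE1 = [1, 2, 3, 4, 5]
VARIABLE2 = [10, 20, 30, 40, 50]


def loglike_row(beta1, beta2, v1, v2):
    """'loglike' expression for one observation (re-implemented by hand)."""
    return -beta1**2 * v1 - math.exp(beta2 * beta1) * v2 - beta2**4


def total_loglike(beta1, beta2):
    """Sum of the 'loglike' expression over all observations."""
    return sum(loglike_row(beta1, beta2, v1, v2)
               for v1, v2 in zip(VARIABLE1, VARIABLE2))


# Initial values given to the Beta objects: beta1 = -1.0, beta2 = 2.0
print(total_loglike(-1.0, 2.0))  # approximately -115.3003
```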

## calculateLikelihood

In [10]:
x = myBiogeme.betaInitValues
xplus = [v+1 for v in x]
print(xplus)

[0.0, 3.0]


In [11]:
myBiogeme.calculateLikelihood(xplus,scaled=True)

[16:27:38] < General >   Log likelihood (N=5):       -555


-111.0
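
The log message reports the unscaled value (-555), while the function returns -111: with `scaled=True`, the log likelihood is divided by the number of observations (N=5). A minimal sketch of that relationship, re-implementing the toy model by hand (hypothetical helper, not Biogeme code):

```python
import math

VARIABLE1 = [1, 2, 3, 4, 5]
VARIABLE2 = [10, 20, 30, 40, 50]


def loglike(betas, scaled=False):
    """Log likelihood of the toy model; if scaled, divided by the number of observations."""
    beta1, beta2 = betas
    values = [-beta1**2 * v1 - math.exp(beta2 * beta1) * v2 - beta2**4
              for v1, v2 in zip(VARIABLE1, VARIABLE2)]
    total = sum(values)
    return total / len(values) if scaled else total


xplus = [0.0, 3.0]
print(loglike(xplus))               # -555.0
print(loglike(xplus, scaled=True))  # -111.0
```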

It is possible to calculate the likelihood based only on a sample of the data. The parameter `batch` gives the fraction of the observations to use.

In [12]:
myBiogeme.calculateLikelihood(xplus, scaled=True, batch=0.5)

[16:27:38] < Detailed >  Use 50.0% of the data.
[16:27:38] < General >   Log likelihood (N=2):       -212


-106.0

In [13]:
myBiogeme.database.data

|   | Person | Exclude | Variable1 | Variable2 | Choice | Av1 | Av2 | Av3 |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 2 | 20 | 2 | 1 | 1 | 1 |
| 2 | 1 | 1 | 3 | 30 | 3 | 1 | 1 | 1 |
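
The sampling step can be sketched as a uniform random draw of row indices without replacement, followed by the scaled log likelihood on the drawn rows only. This is an illustration under our own assumptions (in particular, that the sample size is the truncated product `batch * N`, which matches N=2 for 50% and N=3 for 60% here); the helper names are hypothetical and Biogeme's internal procedure may differ.

```python
import math
import random

VARIABLE1 = [1, 2, 3, 4, 5]
VARIABLE2 = [10, 20, 30, 40, 50]


def sample_rows(n_rows, batch, rng):
    """Uniform random sample of row indices, without replacement.

    Assumption: the sample size is the truncated product batch * n_rows.
    """
    size = int(batch * n_rows)
    return rng.sample(range(n_rows), size)


def scaled_loglike(betas, rows):
    """Log likelihood of the toy model on the given rows, divided by their number."""
    beta1, beta2 = betas
    values = [-beta1**2 * VARIABLE1[i] - math.exp(beta2 * beta1) * VARIABLE2[i] - beta2**4
              for i in rows]
    return sum(values) / len(values)


rng = random.Random(1)
print(sample_rows(5, 0.5, rng))  # two distinct indices among 0..4
# In the example above, Biogeme drew rows 1 and 2
print(scaled_loglike([0.0, 3.0], [1, 2]))  # -106.0
```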


In [14]:
myBiogeme.calculateLikelihood(xplus, scaled=True, batch=0.6)

[16:27:38] < Detailed >  Use 60.0% of the data.
[16:27:38] < General >   Log likelihood (N=3):       -303


-101.0

In [15]:
myBiogeme.database.data

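|   | Person | Exclude | Variable1 | Variable2 | Choice | Av1 | Av2 | Av3 |
|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | 3 | 30 | 3 | 1 | 1 | 1 |
| 0 | 1 | 0 | 1 | 10 | 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 2 | 20 | 2 | 1 | 1 | 1 |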


By default, each observation has the same probability of being selected in the sample. It is possible to make the selection probability proportional to the values of a column of the database, by assigning the name of that column to the attribute `columnForBatchSamplingWeights`.

In [16]:
myBiogeme.columnForBatchSamplingWeights = 'Variable2'
myBiogeme.calculateLikelihood(xplus, scaled=True, batch=0.6)

[16:27:38] < Detailed >  Use 60.0% of the data.
[16:27:38] < General >   Log likelihood (N=3):       -303


-101.0

In [17]:
myBiogeme.database.data

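|   | Person | Exclude | Variable1 | Variable2 | Choice | Av1 | Av2 | Av3 |
|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 1 | 3 | 30 | 3 | 1 | 1 | 1 |
| 0 | 1 | 0 | 1 | 10 | 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 2 | 20 | 2 | 1 | 1 | 1 |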
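
One standard way to draw a sample in which the probability of selecting an observation is proportional to a column (here `Variable2`) is sequential weighted sampling without replacement: at each draw, one of the remaining rows is picked with probability proportional to its weight. This is a sketch of that general technique under our own assumptions, not necessarily the scheme Biogeme uses internally; the function name is hypothetical.

```python
import random


def weighted_sample_without_replacement(weights, size, rng):
    """Draw `size` distinct indices. At each draw, an index that has not yet
    been selected is chosen with probability proportional to its weight."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(size):
        pick = rng.choices(remaining, weights=[weights[i] for i in remaining], k=1)[0]
        remaining.remove(pick)
        chosen.append(pick)
    return chosen


variable2 = [10, 20, 30, 40, 50]  # weights: the values of the column 'Variable2'
rng = random.Random(2020)
print(weighted_sample_without_replacement(variable2, 3, rng))
```

With these weights, the last row (`Variable2` = 50) is five times more likely than the first one to be picked at the first draw.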


## calculateLikelihoodAndDerivatives

In [18]:
f,g,h,bhhh = myBiogeme.calculateLikelihoodAndDerivatives(xplus,scaled=True,hessian=True,bhhh=True)
print(f'f={f}')
print(f'g={g}')
print(f'h={h}')
print(f'bhhh={bhhh}')

[16:27:38] < General >   Log likelihood (N=5):       -555 Gradient norm:      7e+02 Hessian norm:       1e+03 BHHH norm:       1e+05
f=-111.0
g=[ -90. -108.]
h=[[-270.  -30.]
 [ -30. -108.]]
bhhh=[[ 9900.  9720.]
 [ 9720. 11664.]]
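
For this toy model, the derivatives can be written by hand. The sketch below (hypothetical helper names, not Biogeme code) computes the gradient of each observation, sums them, and builds the BHHH matrix as the sum of the outer products of the per-observation gradients; with `scaled=True`, both are divided by the number of observations. It reproduces `g` and `bhhh` above.

```python
import math

VARIABLE1 = [1, 2, 3, 4, 5]
VARIABLE2 = [10, 20, 30, 40, 50]


def gradient_row(beta1, beta2, v1, v2):
    """Analytical gradient of the 'loglike' expression for one observation."""
    e = math.exp(beta2 * beta1)
    d_beta1 = -2 * beta1 * v1 - beta2 * e * v2
    d_beta2 = -beta1 * e * v2 - 4 * beta2**3
    return [d_beta1, d_beta2]


def gradient_and_bhhh(betas, scaled=True):
    beta1, beta2 = betas
    n = len(VARIABLE1)
    g = [0.0, 0.0]
    bhhh = [[0.0, 0.0], [0.0, 0.0]]
    for v1, v2 in zip(VARIABLE1, VARIABLE2):
        gi = gradient_row(beta1, beta2, v1, v2)
        for j in range(2):
            g[j] += gi[j]
            for k in range(2):
                # BHHH: sum of the outer products of the per-observation gradients
                bhhh[j][k] += gi[j] * gi[k]
    if scaled:
        g = [x / n for x in g]
        bhhh = [[x / n for x in row] for row in bhhh]
    return g, bhhh


g, bhhh = gradient_and_bhhh([0.0, 3.0])
print(g)     # [-90.0, -108.0]
print(bhhh)  # [[9900.0, 9720.0], [9720.0, 11664.0]]
```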


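Now the unscaled version. With `scaled=False`, the results are not divided by the number of observations, so each of them is 5 times (N=5) the corresponding scaled value above.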

In [19]:
f,g,h,bhhh = myBiogeme.calculateLikelihoodAndDerivatives(xplus,scaled=False,hessian=True,bhhh=True)
print(f'f={f}')
print(f'g={g}')
print(f'h={h}')
print(f'bhhh={bhhh}')

[16:27:38] < General >   Log likelihood (N=5):       -555 Gradient norm:      7e+02 Hessian norm:       1e+03 BHHH norm:       1e+05
f=-555.0
g=[-450. -540.]
h=[[-1350.  -150.]
 [ -150.  -540.]]
bhhh=[[49500. 48600.]
 [48600. 58320.]]


The same calculation, using only a sample of the data:

In [20]:
f,g,h,bhhh = myBiogeme.calculateLikelihoodAndDerivatives(xplus,scaled=True,batch=0.5,hessian=True,bhhh=True)
print(f'f={f}')
print(f'g={g}')
print(f'h={h}')
print(f'bhhh={bhhh}')

[16:27:39] < Detailed >  Use 50.0% of the data.
[16:27:39] < General >   Log likelihood (N=2):       -252 Gradient norm:      3e+02 Hessian norm:       8e+02 BHHH norm:       6e+04
f=-126.0
g=[-135. -108.]
h=[[-405.  -45.]
 [ -45. -108.]]
bhhh=[[18450. 14580.]
 [14580. 11664.]]


## likelihoodFiniteDifferenceHessian

In [21]:
myBiogeme.likelihoodFiniteDifferenceHessian(xplus)

[16:27:39] < General >   Log likelihood (N=5):       -555 Gradient norm:      7e+02  
[16:27:39] < General >   Log likelihood (N=5):       -555 Gradient norm:      7e+02  
[16:27:39] < General >   Log likelihood (N=5):  -555.0002 Gradient norm:      7e+02  


array([[-1380.00020229,  -150.        ],
       [ -150.0000451 ,  -540.00005396]])
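
A finite-difference Hessian can be obtained by perturbing each parameter in turn and differencing the analytical gradient. The sketch below uses central differences on the unscaled toy model; the step size, the central scheme, and the helper names are our assumptions, and Biogeme's own scheme may differ. Its result is close to the array above.

```python
import math

VARIABLE1 = [1, 2, 3, 4, 5]
VARIABLE2 = [10, 20, 30, 40, 50]


def gradient(betas):
    """Analytical gradient of the unscaled log likelihood of the toy model."""
    beta1, beta2 = betas
    e = math.exp(beta2 * beta1)
    g1 = sum(-2 * beta1 * v1 - beta2 * e * v2 for v1, v2 in zip(VARIABLE1, VARIABLE2))
    g2 = sum(-beta1 * e * v2 - 4 * beta2**3 for v2 in VARIABLE2)
    return [g1, g2]


def finite_difference_hessian(grad, x, step=1e-5):
    """Approximate the Hessian column by column, using central differences
    of the gradient: H[i][j] ~ (g_i(x + h e_j) - g_i(x - h e_j)) / (2h)."""
    n = len(x)
    hessian = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x_plus = list(x)
        x_plus[j] += step
        x_minus = list(x)
        x_minus[j] -= step
        g_plus, g_minus = grad(x_plus), grad(x_minus)
        for i in range(n):
            hessian[i][j] = (g_plus[i] - g_minus[i]) / (2 * step)
    return hessian


h = finite_difference_hessian(gradient, [0.0, 3.0])
print(h)  # close to [[-1380, -150], [-150, -540]]
```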

## checkDerivatives

In [22]:
f,g,h,gdiff,hdiff = myBiogeme.checkDerivatives(verbose=True)

[16:27:39] < General >   Log likelihood (N=5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[16:27:39] < General >   Log likelihood (N=5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[16:27:39] < General >   Log likelihood (N=5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[16:27:39] < General >   Log likelihood (N=5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[16:27:39] < Detailed >  x		Gradient	FinDiff		Difference
[16:27:39] < Detailed >  beta1          	-1.060058E+01	-1.060058E+01	-5.427932E-06
[16:27:39] < Detailed >  beta2          	-1.396997E+02	-1.396997E+02	+2.608000E-05
[16:27:39] < General >   Log likelihood (N=5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[16:27:39] < General >   Log likelihood (N=5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 
[16:27:39] < General >   Log likelihood (N=5):  -115.3003 Gradient norm:      1e+02 Hessian norm:       3e+02 


In [23]:
print(f'f={f}')
print(f'g={g}')
print(f'h={h}')
print(f'gdiff={gdiff}')
print(f'hdiff={hdiff}')
hdiff

f=-115.30029248549191
g=[ -10.60058497 -139.69970751]
h=[[-111.20116994   20.30029249]
 [  20.30029249 -260.30029249]]
gdiff=[-5.42793187e-06  2.60800035e-05]
hdiff=[[-8.04552172e-06  7.36597983e-09]
 [-1.61387920e-07  2.22928137e-05]]


array([[-8.04552172e-06,  7.36597983e-09],
       [-1.61387920e-07,  2.22928137e-05]])
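
`checkDerivatives` compares the analytical derivatives with finite-difference approximations, at the initial values of the parameters. The same idea applied to the gradient of the toy model, as a plain-Python sketch with central differences (the step size and the helper names are our assumptions, not Biogeme's):

```python
import math

VARIABLE1 = [1, 2, 3, 4, 5]
VARIABLE2 = [10, 20, 30, 40, 50]


def loglike(betas):
    """Unscaled log likelihood of the toy model."""
    beta1, beta2 = betas
    return sum(-beta1**2 * v1 - math.exp(beta2 * beta1) * v2 - beta2**4
               for v1, v2 in zip(VARIABLE1, VARIABLE2))


def gradient(betas):
    """Analytical gradient of the unscaled log likelihood."""
    beta1, beta2 = betas
    e = math.exp(beta2 * beta1)
    return [sum(-2 * beta1 * v1 - beta2 * e * v2 for v1, v2 in zip(VARIABLE1, VARIABLE2)),
            sum(-beta1 * e * v2 - 4 * beta2**3 for v2 in VARIABLE2)]


def check_gradient(f, grad, x, step=1e-6):
    """Compare the analytical gradient with central finite differences of f."""
    analytical = grad(x)
    numerical = []
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += step
        x_minus = list(x)
        x_minus[j] -= step
        numerical.append((f(x_plus) - f(x_minus)) / (2 * step))
    difference = [a - b for a, b in zip(analytical, numerical)]
    return analytical, numerical, difference


g, g_fd, g_diff = check_gradient(loglike, gradient, [-1.0, 2.0])
print(g)       # about [-10.6006, -139.6997], as in the output above
print(g_diff)  # small differences
```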

## estimate

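During estimation, it is possible to save intermediate results (`saveIterations=True`), so that they are not lost if the estimation must be interrupted. The parameter `bootstrap=10` requests 10 bootstrap re-estimations of the model, used to compute the bootstrap standard errors reported in the results.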

In [24]:
results = myBiogeme.estimate(bootstrap=10,saveIterations=True)

[16:27:39] < General >   Log likelihood (N=5):  -115.3003
[16:27:39] < General >   Minimize with tol 1e-07
[16:27:39] < General >   Log likelihood (N=5):  -115.3003 Gradient norm:      1e+02  
[16:27:39] < General >   Log likelihood (N=5):   -1216003 Gradient norm:      5e+06  
[16:27:39] < General >   Log likelihood (N=5):  -115.6422 Gradient norm:      1e+02  
[16:27:39] < General >   Log likelihood (N=5):  -67.11418 Gradient norm:          3  
[16:27:39] < General >   Log likelihood (N=5):  -67.07432 Gradient norm:          1  
[16:27:39] < General >   Log likelihood (N=5):   -67.0655 Gradient norm:       0.06  
[16:27:39] < General >   Log likelihood (N=5):  -67.06549 Gradient norm:      0.001  
[16:27:39] < General >   Log likelihood (N=5):  -67.06549 Gradient norm:      3e-07  
[16:27:39] < General >   Log likelihood (N=5):  -67.06549 Gradient norm:      3e-07 Hessian norm:       2e+02 BHHH norm:       7e+01
[16:27:39] < General >   Re-estimate the model 10 times for bootstrapping

[16:27:39] < General >   Log likelihood (N=5):   -1297042 Gradient norm:      6e+06  
[16:27:39] < General >   Log likelihood (N=5):  -123.3345 Gradient norm:      1e+02  
[16:27:39] < General >   Log likelihood (N=5):  -70.75787 Gradient norm:          3  
[16:27:39] < General >   Log likelihood (N=5):  -70.70991 Gradient norm:          1  
[16:27:39] < General >   Log likelihood (N=5):  -70.70227 Gradient norm:       0.05  
[16:27:39] < General >   Log likelihood (N=5):  -70.70226 Gradient norm:      0.004  
[16:27:39] < General >   Log likelihood (N=5):  -70.70226 Gradient norm:      7e-07  
[16:27:39] < General >   Log likelihood (N=5):  -70.70226 Gradient norm:      2e-09  
[16:27:39] < General >   Results saved in file simpleExample~11.html
[16:27:39] < General >   Results saved in file simpleExample~12.pickle
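
The general principle of a bootstrap standard error is to resample the observations with replacement, re-estimate on each resample, and take the standard deviation of the estimates. A generic sketch, using the sample mean as a stand-in estimator; Biogeme re-estimates the full model instead, and its resampling details may differ. The function name is hypothetical.

```python
import random
import statistics


def bootstrap_std_err(data, estimator, repetitions, rng):
    """Standard deviation of the estimates obtained on `repetitions`
    resamples of `data`, each drawn with replacement and of the same size."""
    estimates = []
    for _ in range(repetitions):
        resample = rng.choices(data, k=len(data))
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
rng = random.Random(42)
print(bootstrap_std_err(data, statistics.mean, 10, rng))
```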


In [25]:
results.getEstimatedParameters()

|   | Value | Std err | t-test | p-value | Rob. Std err | Rob. t-test | Rob. p-value | Bootstrap[10] Std err | Bootstrap t-test | Bootstrap p-value |
|---|---|---|---|---|---|---|---|---|---|---|
| beta1 | -1.273264 | 0.115144 | -11.057997 | 0.0 | 0.013724 | -92.776669 | 0.0 | 0.015274 | -83.361706 | 0.0 |
| beta2 | 1.248769 | 0.08483 | 14.720836 | 0.0 | 0.059086 | 21.134794 | 0.0 | 0.067467 | 18.509232 | 0.0 |
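
Each t-test column is the estimate divided by the corresponding standard error (for instance, -1.273264 / 0.115144 ≈ -11.058), and the p-values displayed as 0.0 are very small numbers rounded for display. A sketch assuming a two-sided test of the hypothesis that the parameter is zero, with a standard normal approximation; this is our assumption about how the p-value is obtained, and the function name is hypothetical.

```python
import math


def t_test_and_p_value(value, std_err):
    """t statistic against zero and its two-sided p-value, normal approximation."""
    t = value / std_err
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p


t, p = t_test_and_p_value(-1.273264, 0.115144)
print(t, p)  # t is about -11.058; p is far below 1e-20
```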


The saved intermediate values of the parameters can be retrieved as follows.

Formula before retrieving the values:

In [26]:
myBiogeme.loglike

((((-(beta1(-1.0) ** `2`)) * Variable1) - (exp((beta2(2.0) * beta1(-1.0))) * Variable2)) - (beta2(2.0) ** `4`))

Retrieving the saved values. The parameters in the formula now take the values restored from the file:

In [27]:
myBiogeme.loadSavedIteration()
myBiogeme.loglike

[16:27:39] < Detailed >  Parameter values restored from __savedIterations.txt


((((-(beta1(-1.269026040514541) ** `2`)) * Variable1) - (exp((beta2(1.2669668816397543) * beta1(-1.269026040514541))) * Variable2)) - (beta2(1.2669668816397543) ** `4`))

A file name can be given. If the file does not exist, the statement is ignored. 

In [28]:
myBiogeme.loadSavedIteration(filename='fileThatDoesNotExist.txt')
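
The mechanism can be sketched as follows: the current parameter values are written to a text file, and restoring them simply does nothing when the file is missing. The `name = value` file format and the helper names are hypothetical; the actual content of `__savedIterations.txt` may differ.

```python
import os
import tempfile


def save_iteration(filename, betas):
    """Write one 'name = value' line per parameter (hypothetical format)."""
    with open(filename, 'w') as f:
        for name, value in betas.items():
            f.write(f'{name} = {value!r}\n')


def load_saved_iteration(filename):
    """Return the saved values as a dict, or None if the file does not exist."""
    if not os.path.exists(filename):
        return None
    betas = {}
    with open(filename) as f:
        for line in f:
            name, value = line.split('=')
            betas[name.strip()] = float(value)
    return betas


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'savedIterations.txt')
    save_iteration(path, {'beta1': -1.269026040514541, 'beta2': 1.2669668816397543})
    restored = load_saved_iteration(path)
    missing = load_saved_iteration(os.path.join(directory, 'fileThatDoesNotExist.txt'))
print(restored)  # {'beta1': -1.269026040514541, 'beta2': 1.2669668816397543}
print(missing)   # None
```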



## simulate

In [29]:
# Simulate with the current values of the parameters (here, the values restored by loadSavedIteration)
simulationWithDefaultBetas = myBiogeme.simulate()
simulationWithDefaultBetas

|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -6.190361 | -1.269026 | -1.142329 |
| 1 | -9.804039 | -1.269026 | -0.571165 |
| 2 | -13.417716 | -1.269026 | -0.380776 |
| 3 | -17.031394 | -1.269026 | -0.285582 |
| 4 | -20.645071 | -1.269026 | -0.228466 |


In [30]:
# Simulate with the estimated values for the parameters
print(results.getBetaValues())
simulationWithEstimatedBetas = myBiogeme.simulate(results.getBetaValues())
simulationWithEstimatedBetas

{'beta1': -1.2732639841254711, 'beta2': 1.248768808907056}


|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -6.092234 | -1.273264 | -1.148387 |
| 1 | -9.752666 | -1.273264 | -0.574194 |
| 2 | -13.413098 | -1.273264 | -0.382796 |
| 3 | -17.07353 | -1.273264 | -0.287097 |
| 4 | -20.733962 | -1.273264 | -0.229677 |
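
Each row of a simulation is the expression evaluated on one observation. For instance, the `simul` column (`beta1 / Variable1 + beta2 / Variable2`) can be recomputed by hand from the estimated values printed above; this is a plain-Python check, not Biogeme code.

```python
VARIABLE1 = [1, 2, 3, 4, 5]
VARIABLE2 = [10, 20, 30, 40, 50]
betas = {'beta1': -1.2732639841254711, 'beta2': 1.248768808907056}

# simul = beta1 / Variable1 + beta2 / Variable2, evaluated for each observation
simul = [betas['beta1'] / v1 + betas['beta2'] / v2
         for v1, v2 in zip(VARIABLE1, VARIABLE2)]
print([round(s, 6) for s in simul])
```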


## confidenceIntervals

In [31]:
drawsFromBetas = results.getBetasForSensitivityAnalysis(myBiogeme.freeBetaNames)
left, right = myBiogeme.confidenceIntervals(drawsFromBetas)
left

|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -6.765699 | -1.293482 | -1.17731 |
| 1 | -10.168313 | -1.293482 | -0.588655 |
| 2 | -13.616007 | -1.293482 | -0.392437 |
| 3 | -17.415256 | -1.293482 | -0.294327 |
| 4 | -21.313712 | -1.293482 | -0.235462 |


In [32]:
right

|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -5.719889 | -1.248774 | -1.113354 |
| 1 | -9.619631 | -1.248774 | -0.556677 |
| 2 | -13.413488 | -1.248774 | -0.371118 |
| 3 | -16.967995 | -1.248774 | -0.278338 |
| 4 | -20.376152 | -1.248774 | -0.222671 |
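
`left` and `right` have the same shape as a simulation: for each observation and each expression, they give the lower and upper bounds of the values simulated with the parameter draws returned by `getBetasForSensitivityAnalysis`. A sketch of how such bounds can be obtained from simulated values with empirical quantiles; the 90% level, the interpolation rule, and the function names are assumptions, not necessarily Biogeme's defaults.

```python
def quantile(sorted_values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def confidence_interval(simulated_values, level=0.9):
    """Lower and upper bounds of a central interval of the simulated values."""
    values = sorted(simulated_values)
    alpha = (1 - level) / 2
    return quantile(values, alpha), quantile(values, 1 - alpha)


# One simulated value per draw of the parameters, for one observation
print(confidence_interval([5, 1, 4, 2, 3], level=0.5))  # (2.0, 4.0)
```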


## validate

In [33]:
validationResults = myBiogeme.validate(results, slices = 2)

TypeError: 'module' object is not callable
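
The call above raises a `TypeError`, so no validation results are shown. The idea behind `validate` with `slices=2` is out-of-sample validation: the data is split into slices and, for each slice, the model is estimated on the other observations and applied to the held-out slice. A sketch of the splitting step only, under our own assumptions (Biogeme may shuffle the data before splitting); the function names are hypothetical.

```python
def split_into_slices(n_rows, slices):
    """Partition the row indices into `slices` groups of (almost) equal size."""
    indices = list(range(n_rows))
    return [indices[k::slices] for k in range(slices)]


def validation_folds(n_rows, slices):
    """For each slice, return (estimation rows, validation rows)."""
    folds = []
    for validation in split_into_slices(n_rows, slices):
        held_out = set(validation)
        estimation = [i for i in range(n_rows) if i not in held_out]
        folds.append((estimation, validation))
    return folds


print(validation_folds(5, 2))  # [([1, 3], [0, 2, 4]), ([0, 2, 4], [1, 3])]
```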