# Module biogeme.biogeme 

## Examples of use of each function

This page is for programmers who need usage examples for the functions of this module. The examples only illustrate the syntax; they do not correspond to any meaningful model. For examples of models, visit [biogeme.epfl.ch](http://biogeme.epfl.ch).

In [1]:
import datetime
print(datetime.datetime.now())

2023-08-04 18:41:51.056996


In [2]:
import biogeme.version as ver
print(ver.getText())

biogeme 3.2.12 [2023-08-04]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)



In [3]:
import biogeme.biogeme as bio
import biogeme.database as db
import pandas as pd
import numpy as np
from biogeme.expressions import Beta, Variable, exp

Define the verbosity of Biogeme

In [4]:
import biogeme.logging as blog
logger = blog.get_screen_logger(level=blog.INFO)
logger.info('Logger initialized')


Logger initialized


##  Definition of a database

In [5]:
df = pd.DataFrame({'Person':[1,1,1,2,2],
                   'Exclude':[0,0,1,0,1],
                   'Variable1':[1,2,3,4,5],
                   'Variable2':[10,20,30,40,50],
                   'Choice':[1,2,3,1,2],
                   'Av1':[0,1,1,1,1],
                   'Av2':[1,1,1,1,1],
                   'Av3':[0,1,1,1,1]})
myData = db.Database('test', df)

## Definition of various expressions

In [6]:
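# Variables refer to columns of the database
Variable1 = Variable('Variable1')
Variable2 = Variable('Variable2')
# Beta(name, initial value, lower bound, upper bound, status);
# status 0 means that the parameter is estimated
beta1 = Beta('beta1', -1.0, -3, 3, 0)
beta2 = Beta('beta2', 2.0, -3, 10, 0)
# Toy formulas that only illustrate the syntax
likelihood = -beta1**2 * Variable1 - exp(beta2 * beta1) \
    * Variable2 - beta2**4
simul = beta1 / Variable1 + beta2 / Variable2
# Each entry of the dictionary is a named expression
dictOfExpressions = {'loglike': likelihood,
                     'beta1': beta1,
                     'simul': simul}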

## Creation of the BIOGEME object

In [7]:
myBiogeme = bio.BIOGEME(myData, dictOfExpressions)
myBiogeme.modelName = 'simple_example'
print(myBiogeme)

File biogeme.toml has been parsed. 


simple_example: database [test]{'loglike': ((((-(beta1(init=-1.0) ** `2.0`)) * Variable1) - (exp((beta2(init=2.0) * beta1(init=-1.0))) * Variable2)) - (beta2(init=2.0) ** `4.0`)), 'beta1': beta1(init=-1.0), 'simul': ((beta1(init=-1.0) / Variable1) + (beta2(init=2.0) / Variable2))}


In [8]:
myBiogeme.database.data.columns

Index(['Person', 'Exclude', 'Variable1', 'Variable2', 'Choice', 'Av1', 'Av2',
       'Av3'],
      dtype='object')

## calculateInitLikelihood

In [9]:
myBiogeme.calculateInitLikelihood()

-115.30029248549191
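As a sanity check, the value above can be reproduced with plain NumPy by evaluating the `loglike` formula at the initial values (`beta1 = -1.0`, `beta2 = 2.0`) on each row and summing. This is a minimal sketch that re-types the toy data; the helper name `loglike_per_row` is hypothetical and not part of Biogeme.

```python
import numpy as np

# Columns of the toy database defined earlier on this page.
variable1 = np.array([1, 2, 3, 4, 5], dtype=float)
variable2 = np.array([10, 20, 30, 40, 50], dtype=float)


def loglike_per_row(beta1, beta2):
    """Hypothetical helper: evaluate the 'loglike' formula on each row.

    -beta1^2 * Variable1 - exp(beta2 * beta1) * Variable2 - beta2^4
    """
    return -beta1**2 * variable1 - np.exp(beta2 * beta1) * variable2 - beta2**4


# Sum over the five observations, at the initial values of the parameters.
initial_loglike = loglike_per_row(-1.0, 2.0).sum()
print(initial_loglike)
```

The printed value should agree with the output of `calculateInitLikelihood` up to floating-point rounding.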

## calculateLikelihood

In [10]:
x = myBiogeme.id_manager.free_betas_values
xplus = [v + 1 for v in x]
print(xplus)

[0.0, 3.0]


In [11]:
myBiogeme.calculateLikelihood(xplus, scaled=True)

-111.0

In [12]:
myBiogeme.database.data

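|   | Person | Exclude | Variable1 | Variable2 | Choice | Av1 | Av2 | Av3 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 10 | 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 2 | 20 | 2 | 1 | 1 | 1 |
| 2 | 1 | 1 | 3 | 30 | 3 | 1 | 1 | 1 |
| 3 | 2 | 0 | 4 | 40 | 1 | 1 | 1 | 1 |
| 4 | 2 | 1 | 5 | 50 | 2 | 1 | 1 | 1 |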



By default, each observation has the same probability of being selected in the sample. The selection probability can instead be made proportional to the values of a column of the database, by assigning the name of that column to the attribute `columnForBatchSamplingWeights`.

In [14]:
myBiogeme.columnForBatchSamplingWeights = 'Variable2'
myBiogeme.calculateLikelihood(xplus, scaled=True)

-111.0
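To illustrate what "selection probability proportional to a column" means, here is a minimal sketch using pandas only. It is a generic illustration and not necessarily how Biogeme draws its batches; the batch size of 3 and the random seed are arbitrary assumptions.

```python
import pandas as pd

# Two columns of the toy database are enough for the illustration.
df = pd.DataFrame({'Variable1': [1, 2, 3, 4, 5],
                   'Variable2': [10, 20, 30, 40, 50]})

# Selection probabilities proportional to the values of 'Variable2'.
probabilities = df['Variable2'] / df['Variable2'].sum()
print(probabilities.tolist())

# Draw a batch of 3 rows without replacement. pandas normalizes the weights
# itself when a column name is passed as 'weights'.
batch = df.sample(n=3, weights='Variable2', random_state=0)
print(batch)
```

Rows with a larger value of `Variable2` are more likely to appear in the batch.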

In [15]:
myBiogeme.database.data

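|   | Person | Exclude | Variable1 | Variable2 | Choice | Av1 | Av2 | Av3 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 10 | 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 2 | 20 | 2 | 1 | 1 | 1 |
| 2 | 1 | 1 | 3 | 30 | 3 | 1 | 1 | 1 |
| 3 | 2 | 0 | 4 | 40 | 1 | 1 | 1 | 1 |
| 4 | 2 | 1 | 5 | 50 | 2 | 1 | 1 | 1 |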


## calculateLikelihoodAndDerivatives

In [16]:
f, g, h, bhhh = myBiogeme.calculateLikelihoodAndDerivatives(
    xplus,
    scaled=True,
    hessian=True,
    bhhh=True
)
print(f'f = {f}')
print(f'g = {g}')
print(f'h = {h}')
print(f'bhhh = {bhhh}')

f = -111.0
g = [ -90. -108.]
h = [[-270.  -30.]
 [ -30. -108.]]
bhhh = [[ 9900.  9720.]
 [ 9720. 11664.]]


Now the unscaled version

In [17]:
f, g, h, bhhh = myBiogeme.calculateLikelihoodAndDerivatives(
    xplus,
    scaled=False,
    hessian=True,
    bhhh=True
)
print(f'f = {f}')
print(f'g = {g}')
print(f'h = {h}')
print(f'bhhh = {bhhh}')

f = -555.0
g = [-450. -540.]
h = [[-1350.  -150.]
 [ -150.  -540.]]
bhhh = [[49500. 48600.]
 [48600. 58320.]]
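The function value, gradient and BHHH matrix above can be cross-checked by hand. The sketch below evaluates the `loglike` formula and its hand-derived first derivatives at `xplus = [0.0, 3.0]` with NumPy, and builds the BHHH matrix as the sum of outer products of the per-observation gradients. Dividing by the number of observations (5) gives numbers consistent with the `scaled=True` output. The variable names are illustrative and not part of Biogeme.

```python
import numpy as np

variable1 = np.array([1, 2, 3, 4, 5], dtype=float)
variable2 = np.array([10, 20, 30, 40, 50], dtype=float)
beta1, beta2 = 0.0, 3.0  # the values stored in xplus

e = np.exp(beta2 * beta1)

# Per-observation value of the formula.
f_rows = -beta1**2 * variable1 - e * variable2 - beta2**4

# Per-observation gradient; columns are (d/dbeta1, d/dbeta2).
g_rows = np.column_stack([
    -2 * beta1 * variable1 - beta2 * e * variable2,
    -beta1 * e * variable2 - 4 * beta2**3,
])

f = f_rows.sum()
g = g_rows.sum(axis=0)
bhhh = g_rows.T @ g_rows  # sum over observations of g_i g_i^T
n = len(variable1)

print('unscaled:', f, g, bhhh, sep='\n')
print('scaled:', f / n, g / n, bhhh / n, sep='\n')
```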


Using only a sample of the data

In [18]:
f, g, h, bhhh = myBiogeme.calculateLikelihoodAndDerivatives(
    xplus,
    scaled=True,
    hessian=True,
    bhhh=True
)
print(f'f = {f}')
print(f'g = {g}')
print(f'h = {h}')
print(f'bhhh = {bhhh}')

f = -111.0
g = [ -90. -108.]
h = [[-270.  -30.]
 [ -30. -108.]]
bhhh = [[ 9900.  9720.]
 [ 9720. 11664.]]


## likelihoodFiniteDifferenceHessian

In [19]:
myBiogeme.likelihoodFiniteDifferenceHessian(xplus)

array([[-1380.00020229,  -150.        ],
       [ -150.0000451 ,  -540.00005396]])
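For readers who want to see the idea behind a finite-difference Hessian, here is a minimal NumPy sketch that differentiates the hand-derived gradient of the toy formula with central differences. The step size and the use of central differences are assumptions made for the illustration; Biogeme's own scheme may differ. The function names are hypothetical.

```python
import numpy as np

variable1 = np.array([1, 2, 3, 4, 5], dtype=float)
variable2 = np.array([10, 20, 30, 40, 50], dtype=float)


def gradient(beta):
    """Hand-derived gradient of the summed toy formula."""
    beta1, beta2 = beta
    e = np.exp(beta1 * beta2)
    return np.array([
        np.sum(-2 * beta1 * variable1 - beta2 * e * variable2),
        np.sum(-beta1 * e * variable2 - 4 * beta2**3),
    ])


def finite_difference_hessian(beta, step=1e-5):
    """Approximate the Hessian column by column with central differences."""
    beta = np.asarray(beta, dtype=float)
    size = beta.size
    hessian = np.zeros((size, size))
    for j in range(size):
        shift = np.zeros(size)
        shift[j] = step
        hessian[:, j] = (gradient(beta + shift) - gradient(beta - shift)) / (2 * step)
    return hessian


h_fd = finite_difference_hessian([0.0, 3.0])
print(h_fd)
```

At this point, the hand-derived second derivative with respect to `beta1` is -2 ΣVariable1 - beta2² ΣVariable2 = -30 - 1350 = -1380, which matches the corresponding finite-difference entry shown above.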

## checkDerivatives

In [20]:
f, g, h, gdiff, hdiff = myBiogeme.checkDerivatives(xplus, verbose=True)

x		Gradient	FinDiff		Difference 


beta1          	-4.500000E+02	-4.500001E+02	+6.934970E-05 


beta2          	-5.400000E+02	-5.400001E+02	+8.087011E-05 


Row		Col		Hessian	FinDiff		Difference 


beta1          	beta1          	-1.350000E+03	-1.380000E+03	+3.000020E+01 


beta1          	beta2          	-1.500000E+02	-1.500000E+02	+2.425509E-10 


beta2          	beta1          	-1.500000E+02	-1.500000E+02	+4.509602E-05 


beta2          	beta2          	-5.400000E+02	-5.400001E+02	+5.396423E-05 


In [21]:
print(f'f = {f}')
print(f'g = {g}')
print(f'h = {h}')
print(f'gdiff = {gdiff}')
print(f'hdiff = {hdiff}')
hdiff

f = -555.0
g = [-450. -540.]
h = [[-1350.  -150.]
 [ -150.  -540.]]
gdiff = [6.93496986e-05 8.08701104e-05]
hdiff = [[3.00002023e+01 2.42550868e-10]
 [4.50960215e-05 5.39642255e-05]]


array([[3.00002023e+01, 2.42550868e-10],
       [4.50960215e-05, 5.39642255e-05]])

## estimate

During estimation, it is possible to save intermediate results, in case the estimation must be interrupted. 

In [22]:
myBiogeme.bootstrap_samples=10
results = myBiogeme.estimate(run_bootstrap=True)

*** Initial values of the parameters are obtained from the file __simple_example.iter 


Parameter values restored from __simple_example.iter 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0           -0.23             2.1      1.8e+02      0.019    1e+02      1.2   ++ 


    1            -0.6             1.5           93      0.013    1e+03      1.2   ++ 


    2              -1             1.3           70       0.01    1e+04      1.2   ++ 


    3            -1.2             1.3           67     0.0039    1e+05      1.1   ++ 


    4            -1.3             1.2           67    6.8e-05    1e+06        1   ++ 


    5            -1.3             1.2           67    1.2e-08    1e+06        1   ++ 


Re-estimate the model 10 times for bootstrapping 


  0%|          | 0/10 [00:00<?, ?it/s]

Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0            -1.3             1.3           74    0.00082    1e+02     0.99   ++ 


    1            -1.3             1.3           74    7.1e-08    1e+02        1   ++ 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0            -1.3             1.2           52     0.0069    1e+02        1   ++ 


    1            -1.3             1.2           52    6.2e-06    1e+03        1   ++ 


    2            -1.3             1.2           52    1.3e-07    1e+03        1   ++ 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0            -1.3             1.2           63    0.00029    1e+02        1   ++ 


    1            -1.3             1.2           63      1e-08    1e+02        1   ++ 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0            -1.3             1.1           44      0.021    1e+02        1   ++ 


    1            -1.3             1.1           44    5.9e-05    1e+03        1   ++ 


    2            -1.3             1.1           44    4.1e-08    1e+03        1   ++ 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0            -1.3             1.2           56     0.0034    1e+02        1   ++ 


    1            -1.3             1.2           56    1.4e-06    1e+02        1   ++ 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0            -1.3             1.3           81     0.0026    1e+02     0.99   ++ 


    1            -1.3             1.3           81    6.8e-07    1e+02        1   ++ 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0            -1.3             1.2           52     0.0069    1e+02        1   ++ 


    1            -1.3             1.2           52    6.2e-06    1e+03        1   ++ 


    2            -1.3             1.2           52    1.3e-07    1e+03        1   ++ 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0            -1.3             1.1           48      0.012    1e+02        1   ++ 


    1            -1.3             1.1           48    2.1e-05    1e+03        1   ++ 


    2            -1.3             1.1           48      6e-11    1e+03        1   ++ 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0            -1.3             1.2           63    0.00029    1e+02        1   ++ 


    1            -1.3             1.2           63      1e-08    1e+02        1   ++ 


100%|██████████| 10/10 [00:00<00:00, 374.45it/s]


Results saved in file simple_example.html 


Results saved in file simple_example.pickle 


In [23]:
results.getEstimatedParameters()

|   | Value | Rob. Std err | Rob. t-test | Rob. p-value |
|---|---|---|---|---|
| beta1 | -1.273264 | 0.013724 | -92.776769 | 0.0 |
| beta2 | 1.248769 | 0.059086 | 21.134795 | 0.0 |


If the model has already been estimated, the estimation results can be recycled: they are read from the pickle file instead of being re-estimated. In that case, the other arguments are ignored, and the results are whatever is in the file.

In [24]:
recycled_results = myBiogeme.estimate(recycle=True, run_bootstrap=True)

Estimation results read from simple_example.pickle. There is no guarantee that they correspond to the specified model. 


In [25]:
print(recycled_results.short_summary())


Results for model simple_example
Nbr of parameters:		2
Sample size:			5
Excluded data:			0
Final log likelihood:		-67.06549
Akaike Information Criterion:	138.131
Bayesian Information Criterion:	137.3499
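
The two information criteria in this summary can be checked from the other reported quantities using the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of parameters and n the sample size. The short sketch below only re-uses the numbers printed in the summary.

```python
import math

# Values copied from the summary.
final_loglike = -67.06549
n_parameters = 2
sample_size = 5

# Standard definitions of the two criteria.
aic = 2 * n_parameters - 2 * final_loglike
bic = n_parameters * math.log(sample_size) - 2 * final_loglike

print(round(aic, 3), round(bic, 4))
```

Both values agree with the summary after rounding.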



In [26]:
recycled_results.getEstimatedParameters()

|   | Value | Rob. Std err | Rob. t-test | Rob. p-value |
|---|---|---|---|---|
| beta1 | -1.273264 | 0.013724 | -92.776769 | 0.0 |
| beta2 | 1.248769 | 0.059086 | 21.134795 | 0.0 |


## simulate

Simulate with the default values for the parameters

In [27]:
simulationWithDefaultBetas = myBiogeme.simulate(myBiogeme.loglike.get_beta_values())
simulationWithDefaultBetas

|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -121.0 | 0.0 | 0.075 |
| 1 | -121.0 | 0.0 | 0.075 |
| 2 | -101.0 | 0.0 | 0.15 |
| 3 | -91.0 | 0.0 | 0.3 |
| 4 | -111.0 | 0.0 | 0.1 |


Simulate with the estimated values for the parameters

In [28]:
print(results.getBetaValues())
simulationWithEstimatedBetas =\
    myBiogeme.simulate(results.getBetaValues())
simulationWithEstimatedBetas

{'beta1': -1.273263915009374, 'beta2': 1.248768825523196}


|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -17.07353 | -1.273264 | -0.287097 |
| 1 | -17.07353 | -1.273264 | -0.287097 |
| 2 | -9.752666 | -1.273264 | -0.574194 |
| 3 | -6.092234 | -1.273264 | -1.148387 |
| 4 | -13.413098 | -1.273264 | -0.382796 |


## confidenceIntervals

In [29]:
drawsFromBetas = results.getBetasForSensitivityAnalysis(
    myBiogeme.id_manager.free_betas.names
)
left, right = myBiogeme.confidenceIntervals(drawsFromBetas)
left

|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -17.634327 | -1.301351 | -0.297227 |
| 1 | -17.634327 | -1.301351 | -0.297227 |
| 2 | -9.92983 | -1.301351 | -0.594453 |
| 3 | -6.40308 | -1.301351 | -1.188907 |
| 4 | -13.625745 | -1.301351 | -0.396302 |


In [30]:
right

|   | loglike | beta1 | simul |
|---|---|---|---|
| 0 | -16.983328 | -1.26081 | -0.282652 |
| 1 | -16.983328 | -1.26081 | -0.282652 |
| 2 | -9.615808 | -1.26081 | -0.565305 |
| 3 | -5.608581 | -1.26081 | -1.13061 |
| 4 | -13.415398 | -1.26081 | -0.37687 |
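
One common way to obtain such bounds is to simulate the quantity of interest for many draws of the parameters and to take empirical quantiles for each observation. The sketch below illustrates this generic idea with NumPy for the `simul` formula. It is not necessarily how `confidenceIntervals` is implemented: the draws are hypothetical (normal draws loosely based on the rounded estimates), and the 90% interval is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

variable1 = np.array([1, 2, 3, 4, 5], dtype=float)
variable2 = np.array([10, 20, 30, 40, 50], dtype=float)

# Hypothetical draws of (beta1, beta2), one row per draw.
draws = rng.normal(loc=[-1.27, 1.25], scale=[0.014, 0.059], size=(100, 2))

# simul = beta1 / Variable1 + beta2 / Variable2,
# evaluated for each draw (rows) and each observation (columns).
simulated = draws[:, [0]] / variable1 + draws[:, [1]] / variable2

# Empirical 5% and 95% quantiles across draws, for each observation.
left_bound = np.quantile(simulated, 0.05, axis=0)
right_bound = np.quantile(simulated, 0.95, axis=0)
print(left_bound)
print(right_bound)
```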


## validate

Validation consists of organizing the data into several randomly defined slices of about the same size.
Each slice is used in turn as a validation dataset: the model is re-estimated using all the data except that slice, and the estimated model is then applied to the validation set (i.e. the slice). The value of the log likelihood for each observation in the validation set is reported in a dataframe. As this is done for each slice, the output is a list of dataframes, one per slice.
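
The slicing procedure described above can be sketched with NumPy and pandas as follows. This is a generic illustration of the estimation/validation split, not the implementation used by `Database.split`; the random seed and the variable names are assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Variable1': [1, 2, 3, 4, 5],
                   'Variable2': [10, 20, 30, 40, 50]})

# Shuffle the row labels, then cut them into 5 slices of about the same size.
rng = np.random.default_rng(seed=0)
shuffled = rng.permutation(df.index.to_numpy())
slices = np.array_split(shuffled, 5)

# Each slice is the validation set once; the remaining rows are used for estimation.
folds = []
for validation_index in slices:
    validation_set = df.loc[validation_index]
    estimation_set = df.drop(index=validation_index)
    folds.append((estimation_set, validation_set))

for estimation_set, validation_set in folds:
    print(len(estimation_set), len(validation_set))
```

With five observations and five slices, each validation set contains a single row, which matches the one-row dataframes returned by the validation on this page.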

In [31]:
validationData = myData.split(slices=5)
validation_results = myBiogeme.validate(results, validationData)

File biogeme.toml has been parsed. 


*** Initial values of the parameters are obtained from the file __simple_example_val_est_1.iter 


Cannot read file __simple_example_val_est_1.iter. Statement is ignored. 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0            -1.3             1.2           46     0.0022    1e+02        1   ++ 


    1            -1.3             1.2           46      6e-07    1e+02        1   ++ 


Results saved in file simple_example_val_est_1.html 


Results saved in file simple_example_val_est_1.pickle 


File biogeme.toml has been parsed. 


File biogeme.toml has been parsed. 


*** Initial values of the parameters are obtained from the file __simple_example_val_est_2.iter 


Cannot read file __simple_example_val_est_2.iter. Statement is ignored. 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0            -1.3             1.3           61     0.0012    1e+02     0.99   ++ 


    1            -1.3             1.3           61    1.5e-07    1e+02        1   ++ 


Results saved in file simple_example_val_est_2.html 


Results saved in file simple_example_val_est_2.pickle 


<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>


File biogeme.toml has been parsed. 


File biogeme.toml has been parsed. 


*** Initial values of the parameters are obtained from the file __simple_example_val_est_3.iter 


Cannot read file __simple_example_val_est_3.iter. Statement is ignored. 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0            -1.3             1.3           57    0.00035    1e+02        1   ++ 


    1            -1.3             1.3           57    1.3e-08    1e+02        1   ++ 


Results saved in file simple_example_val_est_3.html 


Results saved in file simple_example_val_est_3.pickle 


File biogeme.toml has been parsed. 


File biogeme.toml has been parsed. 


*** Initial values of the parameters are obtained from the file __simple_example_val_est_4.iter 


Cannot read file __simple_example_val_est_4.iter. Statement is ignored. 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Iter.           beta1           beta2     Function    Relgrad   Radius      Rho      


    0            -1.3             1.2           50    0.00048    1e+02        1   ++ 


    1            -1.3             1.2           50    2.6e-08    1e+02        1   ++ 


Results saved in file simple_example_val_est_4.html 


Results saved in file simple_example_val_est_4.pickle 


File biogeme.toml has been parsed. 


File biogeme.toml has been parsed. 


*** Initial values of the parameters are obtained from the file __simple_example_val_est_5.iter 


Cannot read file __simple_example_val_est_5.iter. Statement is ignored. 


Optimization algorithm: hybrid Newton/BFGS with simple bounds [simple_bounds] 


** Optimization: Newton with trust region for simple bounds 


Results saved in file simple_example_val_est_5.html 


Results saved in file simple_example_val_est_5.pickle 


File biogeme.toml has been parsed. 


Simulation results saved in file simple_example_validation.pickle 


In [32]:
validation_results

[   Loglikelihood
 4     -21.037421,
    Loglikelihood
 0      -6.341109,
    Loglikelihood
 1       -9.81772,
    Loglikelihood
 3     -17.145326,
    Loglikelihood
 2     -13.413098]

In [33]:
for one_slice in validation_results:
    print(f'Log likelihood for {one_slice.shape[0]} '
          f'validation data: {one_slice["Loglikelihood"].sum()}')


Log likelihood for 1 validation data: -21.03742136293277
Log likelihood for 1 validation data: -6.341108765392212
Log likelihood for 1 validation data: -9.81771976465043
Log likelihood for 1 validation data: -17.145326446024075
Log likelihood for 1 validation data: -13.413098095892746


## files_of_type

In [34]:
myBiogeme.files_of_type('pickle')

['simple_example.pickle']