# Module biogeme.expressions 

## Examples of use of each function

This webpage is for programmers who need examples of how to use the functions of the module. The examples are designed to illustrate the syntax. They do not correspond to any meaningful model. For examples of models, visit [biogeme.epfl.ch](http://biogeme.epfl.ch).

In [1]:
import datetime
print(datetime.datetime.now())

2023-08-04 17:51:52.498248


In [2]:
import biogeme.version as ver
print(ver.getText())

biogeme 3.2.12 [2023-08-04]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)



In [3]:
import numpy as np
import pandas as pd

In [4]:
import biogeme.expressions as ex
import biogeme.database as db
import biogeme.exceptions as excep
import biogeme.models as models
from biogeme import tools
from biogeme.idmanager import IdManager
import biogeme.logging as blog
from biogeme.elementary_expressions import TypeOfElementaryExpression


In [5]:
logger = blog.get_screen_logger(level=blog.INFO)

We set the number of draws for Monte-Carlo integration. It should be a large number. For the sake of computational efficiency, as this notebook is designed to illustrate the various functions, we use a low value.

In [6]:
number_of_draws = 100

We first create a small database

In [7]:
df = pd.DataFrame({'Person': [1, 1, 1, 2, 2],
                   'Exclude': [0, 0, 1, 0, 1],
                   'Variable1': [10, 20, 30, 40, 50],
                   'Variable2': [100, 200, 300, 400, 500],
                   'Choice': [2, 2, 3, 1, 2],
                   'Av1': [0, 1, 1, 1, 1],
                   'Av2': [1, 1, 1, 1, 1],
                   'Av3': [0, 1, 1, 1, 1]})
df

Unnamed: 0,Person,Exclude,Variable1,Variable2,Choice,Av1,Av2,Av3
0,1,0,10,100,2,0,1,0
1,1,0,20,200,2,1,1,1
2,1,1,30,300,3,1,1,1
3,2,0,40,400,1,1,1,1
4,2,1,50,500,2,1,1,1


In [8]:
myData = db.Database('test', df)

The following type of expression is a literal called Variable that corresponds to an entry in the database.

In [9]:
Person = ex.Variable('Person')
Variable1 = ex.Variable('Variable1')
Variable2 = ex.Variable('Variable2')
Choice = ex.Variable('Choice')
Av1 = ex.Variable('Av1')
Av2 = ex.Variable('Av2')
Av3 = ex.Variable('Av3')

It is possible to add a new column to the database. This creates a new variable that can be used in expressions.

In [10]:
newvar_b = myData.DefineVariable('newvar_b',
                           Variable1 + Variable2)
print(myData)

biogeme database test:
   Person  Exclude  Variable1  Variable2  Choice  Av1  Av2  Av3  newvar_b
0       1        0         10        100       2    0    1    0     110.0
1       1        0         20        200       2    1    1    1     220.0
2       1        1         30        300       3    1    1    1     330.0
3       2        0         40        400       1    1    1    1     440.0
4       2        1         50        500       2    1    1    1     550.0


It is equivalent to the following Pandas statement

In [11]:
myData.data['newvar_p'] = myData.data['Variable1'] + myData.data['Variable2']
myData.data

Unnamed: 0,Person,Exclude,Variable1,Variable2,Choice,Av1,Av2,Av3,newvar_b,newvar_p
0,1,0,10,100,2,0,1,0,110.0,110
1,1,0,20,200,2,1,1,1,220.0,220
2,1,1,30,300,3,1,1,1,330.0,330
3,2,0,40,400,1,1,1,1,440.0,440
4,2,1,50,500,2,1,1,1,550.0,550


**Do not use chained comparison expressions with Biogeme. Not only do they fail to provide the expected expression, but they also do not trigger a warning or an exception.**

In [12]:
my_expression = (200 <= Variable2 <= 400)
print(my_expression)

(Variable2 <= `400.0`)


The reason is that Python executes `200 <= Variable2 <= 400` as `(200 <= Variable2) and (Variable2 <= 400)`. The `and` operator cannot be overloaded in Python: it returns its second operand whenever the first one is considered true, which is why only `Variable2 <= 400` appears in the output above. Therefore, it does not return the intended Biogeme expression. Note that Pandas does not allow chaining either, and has implemented a `between` function instead.

In [13]:
myData.data['chaining_p'] = myData.data['Variable2'].between(200, 400)

In [14]:
myData.data

Unnamed: 0,Person,Exclude,Variable1,Variable2,Choice,Av1,Av2,Av3,newvar_b,newvar_p,chaining_p
0,1,0,10,100,2,0,1,0,110.0,110,False
1,1,0,20,200,2,1,1,1,220.0,220,True
2,1,1,30,300,3,1,1,1,330.0,330,True
3,2,0,40,400,1,1,1,1,440.0,440,True
4,2,1,50,500,2,1,1,1,550.0,550,False
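The pitfall with chained comparisons can be reproduced without Biogeme. The sketch below uses a hypothetical toy class `Expr` (not part of Biogeme) that overloads the comparison operators and the `&` operator. It only illustrates the Python mechanism; whether a given library supports `&` for combining conditions should be checked in its documentation.

```python
class Expr:
    """Toy stand-in (hypothetical) for an object with overloaded comparisons."""

    def __init__(self, text):
        self.text = text

    def __le__(self, other):
        # Called for: expr <= other
        return Expr(f'({self.text} <= {other})')

    def __ge__(self, other):
        # Called for: expr >= other, and as the reflected form of other <= expr
        return Expr(f'({self.text} >= {other})')

    def __and__(self, other):
        # The operator & can be overloaded, unlike the keyword `and`
        return Expr(f'({self.text} & {other.text})')

    def __str__(self):
        return self.text


v = Expr('Variable2')

# Chained form: the first comparison is silently dropped by `and`
chained = 200 <= v <= 400

# Explicit form: both comparisons are kept
explicit = (v >= 200) & (v <= 400)

print(chained)
print(explicit)
```

The chained form keeps only the second comparison because the object returned by the first comparison is truthy, so `and` evaluates to its second operand.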


The following type of expression is another literal, corresponding to an unknown parameter. 

In [15]:
beta1 = ex.Beta('beta1', 0.2, None, None, 0)
beta2 = ex.Beta('beta2', 0.4, None, None, 0)
beta3 = ex.Beta('beta3', 1, None, None, 1)
beta4 = ex.Beta('beta4', 0, None, None, 1)

Arithmetic operators are overloaded to allow standard manipulations of expressions. The first expression is $$e_1 = 2  \beta_1 - \frac{\exp(-\beta_2)}{\beta_2 (\beta_3 \geq \beta_4) + \beta_1 (\beta_3 < \beta_4)},$$
where $(\beta_3 \geq \beta_4)$ equals 1 if $\beta_3 \geq \beta_4$ and 0 otherwise, and $(\beta_3 < \beta_4)$ equals 1 if $\beta_3 < \beta_4$ and 0 otherwise.

In [16]:
expr1 = 2 * beta1 - ex.exp(-beta2) / (beta2 * (beta3 >= beta4) + beta1 * (beta3 < beta4))
print(expr1)

((`2.0` * beta1(init=0.2)) - (exp((-beta2(init=0.4))) / ((beta2(init=0.4) * (beta3(fixed=1) >= beta4(fixed=0))) + (beta1(init=0.2) * (beta3(fixed=1) < beta4(fixed=0))))))


The evaluation of expressions can be done in two ways. For simple expressions, the function `getValue()`, implemented in Python, returns the value of the expression.

In [17]:
expr1.getValue()

-1.275800115089098

It is possible to modify the values of the parameters

In [18]:
newvalues = {'beta1': 1, 'beta2': 2, 'beta3': 3, 'beta4': 2}
expr1.change_init_values(newvalues)
expr1.getValue()

1.9323323583816936

The function `getValue_c()` is implemented in C++, and works for any expression. When used outside a specific context, the IDs must be explicitly prepared.

In [19]:
expr1.getValue_c(prepareIds=True)

1.9323323583816936

It actually calls the function `getValueAndDerivatives()`, and returns its first output (without calculating the derivatives).

In [20]:
f, g, h, bhhh = expr1.getValueAndDerivatives(prepareIds=True)

In [21]:
f

1.9323323583816936

In [22]:
g

array([2.        , 0.10150146])

In [23]:
h

array([[ 0.       ,  0.       ],
       [ 0.       , -0.1691691]])

In [24]:
bhhh

array([[4.        , 0.20300292],
       [0.20300292, 0.01030255]])

Note that the BHHH matrix is the outer product of the gradient with itself.

In [25]:
np.outer(g, g)

array([[4.        , 0.20300292],
       [0.20300292, 0.01030255]])

If the derivatives are not needed, their calculation can be skipped. Here, we calculate the gradient, but not the hessian.

In [26]:
expr1.getValueAndDerivatives(gradient=True, hessian=False, bhhh=False, prepareIds=True)

(1.9323323583816936, array([2.        , 0.10150146]), None, None)

It can also generate a function that takes the value of the parameters as argument, and provides a tuple with the value of the expression and its derivatives. By default, it returns the value of the function, its gradient and its hessian.

In [27]:
the_function = expr1.createFunction()

We evaluate it at one point...

In [28]:
the_function([1, 2])

(1.9323323583816936,
 array([2.        , 0.10150146]),
 array([[ 0.       ,  0.       ],
        [ 0.       , -0.1691691]]))

... and at another point.

In [29]:
the_function([10, -2])

(23.694528049465326,
 array([ 2.        , -1.84726402]),
 array([[0.        , 0.        ],
        [0.        , 1.84726402]]))

We can use it to check the derivatives

In [30]:
tools.checkDerivatives(the_function, [1, 2], logg=True)

x		Gradient	FinDiff		Difference 


x[0]           	+2.000000E+00	+2.000000E+00	-1.167734E-09 


x[1]           	+1.015015E-01	+1.015014E-01	+1.629049E-08 


Row		Col		Hessian	FinDiff		Difference 


x[0]           	x[0]           	+0.000000E+00	+0.000000E+00	+0.000000E+00 


x[0]           	x[1]           	+0.000000E+00	+0.000000E+00	+0.000000E+00 


x[1]           	x[0]           	+0.000000E+00	+0.000000E+00	+0.000000E+00 


x[1]           	x[1]           	-1.691691E-01	-1.691691E-01	-3.203118E-08 


(1.9323323583816936,
 array([2.        , 0.10150146]),
 array([[ 0.       ,  0.       ],
        [ 0.       , -0.1691691]]),
 array([-1.16773435e-09,  1.62904950e-08]),
 array([[ 0.00000000e+00,  0.00000000e+00],
        [ 0.00000000e+00, -3.20311803e-08]]))
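The idea behind such a derivative check can be sketched with NumPy alone. The code below is not the Biogeme implementation: it hard-codes $e_1 = 2\beta_1 - \exp(-\beta_2)/\beta_2$ (the form taken by `expr1` when $\beta_3 \geq \beta_4$), and the helper names are hypothetical. It compares the analytical gradient with a forward finite-difference approximation.

```python
import numpy as np


def e1(beta):
    """Value of 2*beta1 - exp(-beta2)/beta2 (expr1 when beta3 >= beta4)."""
    b1, b2 = beta
    return 2 * b1 - np.exp(-b2) / b2


def analytical_gradient(beta):
    """Gradient of e1 with respect to (beta1, beta2), derived by hand."""
    _, b2 = beta
    return np.array([2.0, np.exp(-b2) * (1 / b2 + 1 / b2**2)])


def finite_difference_gradient(f, x, step=1e-7):
    """Forward finite-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        shifted = x.copy()
        shifted[i] += step
        grad[i] = (f(shifted) - f(x)) / step
    return grad


point = np.array([1.0, 2.0])
g_exact = analytical_gradient(point)
g_approx = finite_difference_gradient(e1, point)
print(g_exact)
print(g_approx)
print(g_exact - g_approx)
```

The differences are small but not zero, as in the output of `checkDerivatives`, because finite differences are only an approximation of the derivatives.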

The function created by `createFunction` can also return the BHHH matrix, if requested.

In [31]:
the_function = expr1.createFunction(bhhh=True)
the_function([1, 2])

(1.9323323583816936,
 array([2.        , 0.10150146]),
 array([[ 0.       ,  0.       ],
        [ 0.       , -0.1691691]]),
 array([[4.        , 0.20300292],
        [0.20300292, 0.01030255]]))

The function `getValueAndDerivatives` can also take a database as input, and evaluate the expression and its derivatives for each entry in the database.
In the following example, as no variable of the database is involved in the expression, the output of the expression is the same for each entry.

In [32]:
expr1.getValueAndDerivatives(database=myData, aggregation=False)

(array([1.93233236, 1.93233236, 1.93233236, 1.93233236, 1.93233236]),
 array([[2.        , 0.10150146],
        [2.        , 0.10150146],
        [2.        , 0.10150146],
        [2.        , 0.10150146],
        [2.        , 0.10150146]]),
 array([[[ 0.       ,  0.       ],
         [ 0.       , -0.1691691]],
 
        [[ 0.       ,  0.       ],
         [ 0.       , -0.1691691]],
 
        [[ 0.       ,  0.       ],
         [ 0.       , -0.1691691]],
 
        [[ 0.       ,  0.       ],
         [ 0.       , -0.1691691]],
 
        [[ 0.       ,  0.       ],
         [ 0.       , -0.1691691]]]),
 array([[[4.        , 0.20300292],
         [0.20300292, 0.01030255]],
 
        [[4.        , 0.20300292],
         [0.20300292, 0.01030255]],
 
        [[4.        , 0.20300292],
         [0.20300292, 0.01030255]],
 
        [[4.        , 0.20300292],
         [0.20300292, 0.01030255]],
 
        [[4.        , 0.20300292],
         [0.20300292, 0.01030255]]]))

If `aggregation` is set to `True`, the results are accumulated as a sum.

In [33]:
expr1.getValueAndDerivatives(database=myData, aggregation=True)

(9.661661791908468,
 array([10.        ,  0.50750731]),
 array([[ 0.        ,  0.        ],
        [ 0.        , -0.84584552]]),
 array([[20.        ,  1.01501462],
        [ 1.01501462,  0.05151273]]))
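The aggregation can be mimicked with NumPy, as a minimal sketch that reuses the rounded gradient reported above for each of the five rows. It shows that the aggregated BHHH matrix is the sum of the per-row outer products, which differs from the outer product of the aggregated gradient.

```python
import numpy as np

# Gradient of expr1, identical for each of the 5 rows (rounded values from above)
g = np.array([2.0, 0.10150146])
gradients = np.tile(g, (5, 1))

# One outer product per row, stored as an array of shape (5, 2, 2)
bhhh_per_row = np.einsum('ni,nj->nij', gradients, gradients)

# Aggregation: sum over the rows
aggregated_gradient = gradients.sum(axis=0)
aggregated_bhhh = bhhh_per_row.sum(axis=0)

print(aggregated_gradient)
print(aggregated_bhhh)

# For comparison: outer product of the aggregated gradient (not the BHHH matrix)
print(np.outer(aggregated_gradient, aggregated_gradient))
```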

The following function scans the expression and extracts a set with the names of all free parameters.

In [34]:
expr1.set_of_elementary_expression(TypeOfElementaryExpression.FREE_BETA)

{'beta1', 'beta2'}

Options can be set to extract free parameters, fixed parameters, or both. 

In [35]:
expr1.set_of_elementary_expression(TypeOfElementaryExpression.FIXED_BETA)

{'beta3', 'beta4'}

In [36]:
expr1.set_of_elementary_expression(TypeOfElementaryExpression.BETA)

{'beta1', 'beta2', 'beta3', 'beta4'}

It is also possible to extract an elementary expression by its name.

In [37]:
expr1.getElementaryExpression('beta2')

beta2(init=2)

Let's consider an expression involving two variables $V_1$ and $V_2$: $$e_2 = 2  \beta_1 V_1 - \frac{\exp(-\beta_2 V_2)}{\beta_2 (\beta_3 \geq \beta_4) + \beta_1 (\beta_3 < \beta_4)},$$
where $(\beta_3 \geq \beta_4)$ equals 1 if $\beta_3 \geq \beta_4$ and 0 otherwise, and $(\beta_3 < \beta_4)$ equals 1 if $\beta_3 < \beta_4$ and 0 otherwise. Note that, in our example, the second term is numerically negligible with respect to the first one.

In [38]:
expr2 = 2 * beta1 * Variable1 - ex.exp(-beta2 * Variable2) / (beta2 * (beta3 >= beta4) + beta1 * (beta3 < beta4))
print(expr2)

(((`2.0` * beta1(init=1)) * Variable1) - (exp(((-beta2(init=2)) * Variable2)) / ((beta2(init=2) * (beta3(fixed=3) >= beta4(fixed=2))) + (beta1(init=1) * (beta3(fixed=3) < beta4(fixed=2))))))


It is not a simple expression anymore, and only the function `getValue_c` can be invoked. If we try the `getValue` function, it raises an exception.

In [39]:
try:
    expr2.getValue()
except excep.BiogemeError as e:
    print(f'Exception raised: {e}')

Exception raised: Evaluating Variable Variable1 requires a database. Use the function getValue_c instead.


As the expression is called out of a specific context, it should be instructed to prepare its IDs. Note that if no database is provided, an exception is raised when the formula contains variables. Indeed, the values of these variables cannot be found anywhere.

In [40]:
try:
    expr2.getValue_c(prepareIds=True)
except excep.BiogemeError as e:
    print(f'Exception raised: {e}')

Exception raised: No database is provided and an expression contains variables: {'Variable2', 'Variable1'}


In [41]:
expr2.getValue_c(database=myData, aggregation=False, prepareIds=True)

array([ 20.,  40.,  60.,  80., 100.])

The following function extracts the names of the parameters appearing in the expression.

In [42]:
expr2.set_of_elementary_expression(TypeOfElementaryExpression.BETA)

{'beta1', 'beta2', 'beta3', 'beta4'}

The list of parameters can also be obtained in the form of a dictionary.

In [43]:
expr2.dict_of_elementary_expression(TypeOfElementaryExpression.BETA)

{'beta1': beta1(init=1),
 'beta2': beta2(init=2),
 'beta3': beta3(fixed=3),
 'beta4': beta4(fixed=2)}

The list of variables can also be obtained in the form of a dictionary

In [44]:
expr2.dict_of_elementary_expression(TypeOfElementaryExpression.VARIABLE)

{'Variable1': Variable1, 'Variable2': Variable2}

or a set...

In [45]:
expr2.set_of_elementary_expression(TypeOfElementaryExpression.VARIABLE)

{'Variable1', 'Variable2'}

Expressions are defined recursively, using a tree representation. The following function describes the type of the uppermost node of the tree.

In [46]:
expr2.getClassName()

'Minus'

The signature is a formal representation of the expression, assigning identifiers to each node of the tree, and representing them starting from the leaves. It is easy to parse, and is passed to the C++ implementation. 

As the expression is used out of a specific context, it must be prepared before using it.

In [47]:
expr2.prepare(database=myData, numberOfDraws=0)

In [48]:
expr2.getStatusIdManager()

({'Variable1', 'Variable2', 'beta1', 'beta2', 'beta3', 'beta4'}, set())

In [49]:
print(expr2)

(((`2.0` * beta1(init=1)) * Variable1) - (exp(((-beta2(init=2)) * Variable2)) / ((beta2(init=2) * (beta3(fixed=3) >= beta4(fixed=2))) + (beta1(init=1) * (beta3(fixed=3) < beta4(fixed=2))))))


In [50]:
expr2.getSignature()

[b'<Numeric>{6127446992},2.0',
 b'<Beta>{4399541712}"beta1"[0],0,0',
 b'<Times>{6127437456}(2),6127446992,4399541712',
 b'<Variable>{6127297616}"Variable1",6,2',
 b'<Times>{6127441616}(2),6127437456,6127297616',
 b'<Beta>{6127298512}"beta2"[0],1,1',
 b'<UnaryMinus>{6127436752}(1),6127298512',
 b'<Variable>{6127297936}"Variable2",7,3',
 b'<Times>{6127358480}(2),6127436752,6127297936',
 b'<exp>{6127358608}(1),6127358480',
 b'<Beta>{6127298512}"beta2"[0],1,1',
 b'<Beta>{6127335888}"beta3"[1],2,0',
 b'<Beta>{6127329744}"beta4"[1],3,1',
 b'<GreaterOrEqual>{6127228304}(2),6127335888,6127329744',
 b'<Times>{6127437776}(2),6127298512,6127228304',
 b'<Beta>{4399541712}"beta1"[0],0,0',
 b'<Beta>{6127335888}"beta3"[1],2,0',
 b'<Beta>{6127329744}"beta4"[1],3,1',
 b'<Less>{6127444560}(2),6127335888,6127329744',
 b'<Times>{6127446736}(2),4399541712,6127444560',
 b'<Plus>{6127228688}(2),6127437776,6127446736',
 b'<Divide>{6127417616}(2),6127358608,6127228688',
 b'<Minus>{6127336592}(2),6127441616,612

The elementary expressions are
- free parameters,
- fixed parameters,
- random variables (for numerical integration),
- draws (for Monte-Carlo integration), and
- variables from the database.

The following function extracts all elementary expressions from a list of formulas, gives them a unique numbering, and returns them organized by group, as defined above (with the exception of the variables, which are directly available in the database).

In [51]:
collectionOfFormulas = [expr1, expr2]
formulas = IdManager(collectionOfFormulas, myData, None)


Unique numbering for all elementary expressions

In [52]:
formulas.elementary_expressions.indices

{'beta1': 0,
 'beta2': 1,
 'beta3': 2,
 'beta4': 3,
 'Person': 4,
 'Exclude': 5,
 'Variable1': 6,
 'Variable2': 7,
 'Choice': 8,
 'Av1': 9,
 'Av2': 10,
 'Av3': 11,
 'newvar_b': 12,
 'newvar_p': 13,
 'chaining_p': 14}

In [53]:
formulas.free_betas

ElementsTuple(expressions={'beta1': beta1(init=1), 'beta2': beta2(init=2)}, indices={'beta1': 0, 'beta2': 1}, names=['beta1', 'beta2'])

Each elementary expression has two ids: one index that is unique across all elementary expressions, and one that is unique within its specific group.

In [54]:
[(i.elementaryIndex, i.betaId) for k, i in formulas.free_betas.expressions.items()]

[(0, 0), (1, 1)]

In [55]:
formulas.free_betas.names

['beta1', 'beta2']

In [56]:
formulas.fixed_betas

ElementsTuple(expressions={'beta3': beta3(fixed=3), 'beta4': beta4(fixed=2)}, indices={'beta3': 0, 'beta4': 1}, names=['beta3', 'beta4'])

In [57]:
[(i.elementaryIndex, i.betaId) for k, i in formulas.fixed_betas.expressions.items()]

[(2, 0), (3, 1)]

In [58]:
formulas.fixed_betas.names

['beta3', 'beta4']

In [59]:
formulas.random_variables

ElementsTuple(expressions={}, indices={}, names=[])

Monte Carlo integration is based on draws. 

In [60]:
myDraws = ex.bioDraws('myDraws', 'UNIFORM')
expr3 = ex.MonteCarlo(myDraws * myDraws)

In [61]:
print(expr3)

MonteCarlo((bioDraws("myDraws", "UNIFORM") * bioDraws("myDraws", "UNIFORM")))


Note that draws are distinct from random variables, which are used for numerical integration.

In [62]:
expr3.set_of_elementary_expression(TypeOfElementaryExpression.RANDOM_VARIABLE)

set()

The following function reports the draws involved in an expression.

In [63]:
expr3.set_of_elementary_expression(TypeOfElementaryExpression.DRAWS)

{'myDraws'}

The following function checks if draws are defined outside MonteCarlo, and returns their names.

In [64]:
wrong_expression = myDraws + ex.MonteCarlo(myDraws * myDraws)
wrong_expression.check_draws()

{'myDraws'}

Checking the correct expression returns an empty set

In [65]:
expr3.check_draws()

set()

The expression is a Monte-Carlo integration.

In [66]:
expr3.getClassName()

'MonteCarlo'

Note that the draws are associated with a database. Therefore, the evaluation of expressions involving Monte Carlo integration can only be done on a database. If none is provided, an exception is raised.

In [67]:
try:
    expr3.getValue_c(numberOfDraws=number_of_draws)
except excep.BiogemeError as e:
    print(f'Exception raised: {e}')

Exception raised: An expression involving MonteCarlo integration must be associated with a database.


Here is its value. It is an approximation of $\int_0^1 x^2 dx=\frac{1}{3}$.

In [68]:
expr3.getValue_c(database=myData, numberOfDraws=number_of_draws, prepareIds=True)

array([0.3934398 , 0.298537  , 0.28521049, 0.34738013, 0.36951969])
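The Monte-Carlo approximation above can be mimicked with plain NumPy. This is only a sketch: it relies on NumPy's random generator with an arbitrary seed, not on the draws generated by Biogeme, so the numbers differ from the ones above. It illustrates why a low number of draws gives a rough approximation of $\frac{1}{3}$.

```python
import numpy as np

# NumPy generator with an arbitrary seed (assumption: not Biogeme's draws)
rng = np.random.default_rng(seed=0)

estimates = {}
for n in (100, 10_000, 1_000_000):
    draws = rng.uniform(0.0, 1.0, size=n)
    # Average of x^2 over uniform draws approximates the integral on [0, 1]
    estimates[n] = np.mean(draws * draws)
    print(n, estimates[n])
```

The accuracy improves slowly with the number of draws, which is why a large number is recommended in practice.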

Here is the signature of `expr3`.

In [69]:
expr3.prepare(database=myData, numberOfDraws=number_of_draws)
expr3.getSignature()

[b'<bioDraws>{6127498640}"myDraws",0,0',
 b'<bioDraws>{6127498640}"myDraws",0,0',
 b'<Times>{4369506192}(2),6127498640,6127498640',
 b'<MonteCarlo>{6127326416}(1),4369506192']

The same integral can be calculated using numerical integration, declaring a random variable. 

In [70]:
omega = ex.RandomVariable('omega')

Numerical integration calculates integrals between $-\infty$ and $+\infty$. Here, the interval being $[0,1]$, a change of variables is required.

In [71]:
a = 0
b = 1
x = a + (b - a) / ( 1 + ex.exp(-omega))
dx = (b - a) * ex.exp(-omega) * (1 + ex.exp(-omega))**(-2) 
integrand = x * x
expr4 = ex.Integrate(integrand * dx /(b - a), 'omega')

In this case, omega is a random variable.

In [72]:
expr4.dict_of_elementary_expression(TypeOfElementaryExpression.RANDOM_VARIABLE)

{'omega': omega}

In [73]:
print(expr4)

Integrate(((((`0.0` + (`1.0` / (`1.0` + exp((-omega))))) * (`0.0` + (`1.0` / (`1.0` + exp((-omega)))))) * ((`1.0` * exp((-omega))) * ((`1.0` + exp((-omega))) ** `-2.0`))) / `1.0`), "omega")


The following function checks if random variables are defined outside an Integrate statement.

In [74]:
wrong_expression =  x * x
wrong_expression.check_rv()

{'omega'}

The same function called from the correct expression returns an empty set.

In [75]:
expr4.check_rv()

set()

Calculating its value requires the C++ implementation.

In [76]:
expr4.getValue_c(myData, prepareIds=True)

array([0.33333231, 0.33333231, 0.33333231, 0.33333231, 0.33333231])
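The change of variables can be verified numerically with NumPy. The sketch below is not the integration scheme used by Biogeme: it applies a simple trapezoidal rule on a truncated interval $[-40, 40]$ for $\omega$ (both the rule and the truncation are assumptions made for illustration), with the same transformation $x = a + (b-a)/(1+e^{-\omega})$.

```python
import numpy as np

a, b = 0.0, 1.0

# Truncated grid for omega, standing in for the interval (-inf, +inf)
omega = np.linspace(-40.0, 40.0, 200_001)

# Same change of variables as in the notebook
x = a + (b - a) / (1 + np.exp(-omega))
dx = (b - a) * np.exp(-omega) * (1 + np.exp(-omega)) ** (-2)
values = x * x * dx / (b - a)

# Trapezoidal rule written explicitly
integral = np.sum((values[1:] + values[:-1]) / 2 * np.diff(omega))
print(integral)
```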

We now illustrate the Elem function. It takes two arguments: a dictionary, and a formula for the key. For each entry in the database, the formula is evaluated, and its result identifies which formula in the dictionary should be evaluated.
Here, if 'Person' is 1, the expression is $$e_1=2  \beta_1 - \frac{\exp(-\beta_2)}{\beta_2 (\beta_3 \geq \beta_4) + \beta_1 (\beta_3 < \beta_4)},$$ and if 'Person' is 2, the expression is $$e_2=2 \beta_1  V_1 - \frac{\exp(-\beta_2 V_2)}{\beta_2 (\beta_3 \geq \beta_4) + \beta_1 (\beta_3 < \beta_4)}.$$ As `Elem` is itself an expression, it can be included in any formula. Here, we illustrate it by dividing the result by 10.

In [77]:
elemExpr = ex.Elem({1: expr1, 2: expr2}, Person) 
expr5 =  elemExpr / 10
print(expr5)

({{1:((`2.0` * beta1(init=1)) - (exp((-beta2(init=2))) / ((beta2(init=2) * (beta3(fixed=3) >= beta4(fixed=2))) + (beta1(init=1) * (beta3(fixed=3) < beta4(fixed=2)))))), 2:(((`2.0` * beta1(init=1)) * Variable1) - (exp(((-beta2(init=2)) * Variable2)) / ((beta2(init=2) * (beta3(fixed=3) >= beta4(fixed=2))) + (beta1(init=1) * (beta3(fixed=3) < beta4(fixed=2))))))}[Person] / `10.0`)


In [78]:
expr5.dict_of_elementary_expression(TypeOfElementaryExpression.VARIABLE)

{'Person': Person, 'Variable1': Variable1, 'Variable2': Variable2}

Note that `Variable1` and `Variable2` have previously been involved in another formula. Therefore, they have been numbered according to this formula, and this numbering is invalid for the new expression `expr5`. An error is triggered.

In [79]:
try:
    expr5.getValue_c(database=myData)
except excep.BiogemeError as e:
    print(e)

Expression evaluated out of context. Set prepareIds to True.


In [80]:
expr5.getValue_c(database=myData, prepareIds=True)

array([ 0.19323324,  0.19323324,  0.19323324,  8.        , 10.        ])
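The row-by-row selection can be reproduced with NumPy, as a minimal sketch that hard-codes the data and the parameter values used above ($\beta_1=1$, $\beta_2=2$, and $\beta_3 \geq \beta_4$, so that the denominator reduces to $\beta_2$). It is meant to illustrate the logic only, not the Biogeme implementation.

```python
import numpy as np

person = np.array([1, 1, 1, 2, 2])
variable1 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
variable2 = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
beta1, beta2 = 1.0, 2.0

# Both candidate expressions evaluated for every row
e1 = np.full(person.shape, 2 * beta1 - np.exp(-beta2) / beta2)
e2 = 2 * beta1 * variable1 - np.exp(-beta2 * variable2) / beta2
candidates = {1: e1, 2: e2}

# For each row, the key (here: Person) selects the expression to use
selected = np.array([candidates[int(key)][row] for row, key in enumerate(person)])
result = selected / 10
print(result)
```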

In [81]:
testElem = ex.MonteCarlo(ex.Elem({1: myDraws * myDraws}, 1))

In [82]:
testElem.audit()

([], [])

The next expression is simply the sum of multiple expressions. The argument is a list of expressions. 

In [83]:
expr6 = ex.bioMultSum([expr1, expr2, expr4])

In [84]:
print(expr6)

bioMultSum(((`2.0` * beta1(init=1)) - (exp((-beta2(init=2))) / ((beta2(init=2) * (beta3(fixed=3) >= beta4(fixed=2))) + (beta1(init=1) * (beta3(fixed=3) < beta4(fixed=2)))))), (((`2.0` * beta1(init=1)) * Variable1) - (exp(((-beta2(init=2)) * Variable2)) / ((beta2(init=2) * (beta3(fixed=3) >= beta4(fixed=2))) + (beta1(init=1) * (beta3(fixed=3) < beta4(fixed=2)))))), Integrate(((((`0.0` + (`1.0` / (`1.0` + exp((-omega))))) * (`0.0` + (`1.0` / (`1.0` + exp((-omega)))))) * ((`1.0` * exp((-omega))) * ((`1.0` + exp((-omega))) ** `-2.0`))) / `1.0`), "omega"))


In [85]:
expr6.getValue_c(database=myData, numberOfDraws=number_of_draws, prepareIds=True)

array([ 22.26566467,  42.26566467,  62.26566467,  82.26566467,
       102.26566467])

We now illustrate how to calculate the logarithm of a logit model, that is $$ \ln \frac{y_1 e^{V_1}}{y_0 e^{V_0}+y_1 e^{V_1}+y_2 e^{V_2}}$$ where $V_0=-\beta_1$, $V_1=-\beta_2$ and $V_2=-\beta_1$, and $y_i = 1$, $i=0,1,2$.

In [86]:
V = {0: -beta1, 1: -beta2, 2: -beta1}
av = {0: 1, 1: 1, 2: 1}
expr7 = ex._bioLogLogit(V, av, 1)

In [87]:
expr7.getValue()

-1.861994804058251
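The value returned above can be reproduced with NumPy. The function `log_logit` below is a hypothetical helper written for this illustration, not a Biogeme function. It assumes that the chosen alternative is available, and uses the current values $\beta_1=1$ and $\beta_2=2$.

```python
import numpy as np


def log_logit(utilities, availability, chosen):
    """Log of the logit probability of the chosen alternative.

    Assumes that the chosen alternative is available.
    """
    if chosen not in utilities:
        raise KeyError(f'Alternative {chosen} has no utility function')
    denominator = sum(
        availability[i] * np.exp(v) for i, v in utilities.items()
    )
    return utilities[chosen] - np.log(denominator)


beta1, beta2 = 1.0, 2.0
V_values = {0: -beta1, 1: -beta2, 2: -beta1}
av_values = {0: 1, 1: 1, 2: 1}
value = log_logit(V_values, av_values, 1)
print(value)
```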

If the alternative is not in the choice set, an exception is raised.

In [88]:
expr7_wrong = ex.LogLogit(V, av, 3)
try:
    expr7_wrong.getValue()
except excep.BiogemeError as e:
    print(f'Exception: {e}')

Exception: Alternative 3 does not appear in the list of utility functions: dict_keys([0, 1, 2])


It is actually better to use the C++ implementation, available in the module models

In [89]:
expr8 = models.loglogit(V, av, 1)

In [90]:
expr8.getValue_c(database=myData, prepareIds=True)

array([-1.8619948, -1.8619948, -1.8619948, -1.8619948, -1.8619948])

As the result is a numpy array, it can be used for any calculation. Here, we show how to calculate the logsum

In [91]:
for v in V.values():
    print(v.getValue_c(database=myData, prepareIds=True))

[-1. -1. -1. -1. -1.]
[-2. -2. -2. -2. -2.]
[-1. -1. -1. -1. -1.]


In [92]:
# Sum over the alternatives (axis 0) to obtain one logsum per observation
logsum = np.log(np.sum([np.exp(v.getValue_c(database=myData, prepareIds=True)) 
                        for v in V.values()], axis=0))
logsum

array([-0.1380052, -0.1380052, -0.1380052, -0.1380052, -0.1380052])

It is possible to calculate the derivative of a formula with respect to a literal: $$e_9=\frac{\partial e_8}{\partial \beta_2}.$$

In [93]:
expr9 = ex.Derive(expr8, 'beta2')

In [94]:
expr9.getValue_c(database=myData, prepareIds=True)

array([-0.8446376, -0.8446376, -0.8446376, -0.8446376, -0.8446376])

In [95]:
expr9.elementaryName

'beta2'

Biogeme also provides an approximation of the CDF of the normal distribution: $$e_{10}= \frac{1}{\sigma \sqrt{2\pi}}\int_{-\infty}^t e^{-\frac{(x - \mu)^2}{2\sigma^2}}dx$$

In [96]:
expr10 = ex.bioNormalCdf(Variable1 / 10 - 1)

In [97]:
expr10.getValue_c(database=myData, prepareIds=True)

array([0.5       , 0.84134475, 0.97724987, 0.9986501 , 0.99996833])
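These values can be cross-checked with the Python standard library, using the identity $\Phi(t) = \frac{1}{2}\left(1 + \operatorname{erf}(t/\sqrt{2})\right)$ for the standard normal distribution ($\mu=0$, $\sigma=1$). This is a sketch for verification purposes. It says nothing about how Biogeme computes its approximation.

```python
import math


def standard_normal_cdf(t):
    """CDF of the standard normal distribution, based on the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Same argument as in expr10: Variable1 / 10 - 1
variable1 = [10, 20, 30, 40, 50]
values = [standard_normal_cdf(v / 10 - 1) for v in variable1]
print(values)
```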

Min and max operators are also available. To avoid any ambiguity with the Python built-in functions `min` and `max`, they are called bioMin and bioMax.

In [98]:
expr11 = ex.bioMin(expr5, expr10)
expr11.getValue_c(database=myData, prepareIds=True)

array([0.19323324, 0.19323324, 0.19323324, 0.9986501 , 0.99996833])

In [99]:
expr12 = ex.bioMax(expr5, expr10)
expr12.getValue_c(database=myData, prepareIds=True)

array([ 0.5       ,  0.84134475,  0.97724987,  8.        , 10.        ])

For the sake of efficiency, it is possible to specify explicitly a linear function, where each term is the product of a parameter and a variable.

In [100]:
terms = [(beta1, ex.Variable('Variable1')),
         (beta2, ex.Variable('Variable2')),
         (beta3, ex.Variable('newvar_b'))]

In [101]:
expr13 = ex.bioLinearUtility(terms)

In [102]:
expr13.getValue_c(database=myData, prepareIds=True)

array([ 540., 1080., 1620., 2160., 2700.])

In terms of specification, it is equivalent to the expression below. But the calculation of the derivatives is more efficient, as the linear structure of the specification is exploited.

In [103]:
expr13bis = beta1 * Variable1 + beta2 * Variable2 + beta3 * newvar_b

In [104]:
expr13bis.getValue_c(database=myData, prepareIds=True)

array([ 540., 1080., 1620., 2160., 2700.])
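One way to picture the linear structure is with a matrix-vector product in NumPy. This sketch hard-codes the data and the current parameter values (1, 2 and 3), and is not the Biogeme implementation. For a linear function, the derivative of each row with respect to the coefficients is simply the corresponding row of data, so no expression tree needs to be differentiated.

```python
import numpy as np

variable1 = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
variable2 = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
newvar_b_values = variable1 + variable2

# Current values of beta1, beta2 and beta3
coefficients = np.array([1.0, 2.0, 3.0])

# One column per variable, one row per observation
X = np.column_stack([variable1, variable2, newvar_b_values])

utility = X @ coefficients
print(utility)

# Derivatives of each row's utility with respect to the three coefficients
derivatives = X
print(derivatives[0])
```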

A Pythonic way to write a linear utility function

In [105]:
variables = ['v1', 'v2', 'v3', 'cost', 'time', 'headway']
coefficients = {f'{v}': ex.Beta(f'beta_{v}', 0, None, None, 0) 
                for v in variables}
terms = [coefficients[v] * ex.Variable(v) for v in variables]
util = sum(terms)
print(util)

((((((`0.0` + (beta_v1(init=0) * v1)) + (beta_v2(init=0) * v2)) + (beta_v3(init=0) * v3)) + (beta_cost(init=0) * cost)) + (beta_time(init=0) * time)) + (beta_headway(init=0) * headway))


If the data is organized as panel data, it means that several rows correspond to the same individual. The expression `PanelLikelihoodTrajectory` calculates the product of the expression evaluated for each row. If Monte Carlo integration is involved, the same draws are used for each of them.

Our database contains 5 observations.

In [106]:
myData.getSampleSize()

5

In [107]:
myData.panel('Person')

Once the data has been labeled as "panel", it is considered that there are only two observations, one for each person. Each of these observations is associated with several rows of the database.

In [108]:
myData.getSampleSize()

2
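The product over the rows of each individual can be pictured with pandas. In this sketch, the per-row probabilities in the column `prob` are made-up numbers chosen for illustration. They are not produced by any model in this notebook.

```python
import pandas as pd

# Hypothetical per-row probabilities, with the same 'Person' column as above
rows = pd.DataFrame({'Person': [1, 1, 1, 2, 2],
                     'prob': [0.5, 0.4, 0.9, 0.8, 0.25]})

# One value per individual: the product of the values of its rows
trajectory = rows.groupby('Person')['prob'].prod()
print(trajectory)
```

The result has one entry per person, which is consistent with the sample size of 2 reported for the panel database.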

If we try to evaluate again the integral $\int_0^1 x^2 dx=\frac{1}{3}$, an exception is raised.

In [109]:
try:
    expr3.getValue_c(database=myData)
except excep.BiogemeError as e:
    print(f'Exception: {e}')

As the database is panel, the argument of MonteCarlo must contain a PanelLikelihoodTrajectory: MonteCarlo((bioDraws("myDraws", "UNIFORM") * bioDraws("myDraws", "UNIFORM"))) 


Exception: As the database is panel, the argument of MonteCarlo must contain a PanelLikelihoodTrajectory: MonteCarlo((bioDraws("myDraws", "UNIFORM") * bioDraws("myDraws", "UNIFORM")))


This is detected by the `audit` function, called before the expression is evaluated.

In [110]:
expr3.audit(database=myData)

(['As the database is panel, the argument of MonteCarlo must contain a PanelLikelihoodTrajectory: MonteCarlo((bioDraws("myDraws", "UNIFORM") * bioDraws("myDraws", "UNIFORM")))'],
 [])

We now evaluate an expression for panel data.

In [111]:
c1 = ex.bioDraws('draws1', 'NORMAL_HALTON2')
c2 = ex.bioDraws('draws2', 'NORMAL_HALTON2')
U1 = ex.Beta('beta1', 0, None, None, 0) * Variable1 + 10 * c1
U2 = ex.Beta('beta2', 0, None, None, 0) * Variable2 + 10 * c2
U3 = 0
U = {1: U1, 2: U2, 3: U3}
av = {1: Av1, 2: Av2, 3: Av3}
expr14 = ex.log(ex.MonteCarlo(ex.PanelLikelihoodTrajectory(models.logit(U, av, Choice))))

In [112]:
expr14.prepare(database=myData, numberOfDraws=number_of_draws)
expr14

log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((beta1(init=0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((beta2(init=0) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))

In [113]:
expr14.getValue_c(database=myData, numberOfDraws=number_of_draws, prepareIds=True)

array([-3.91914292, -2.11209896])

In [114]:
expr14.getValueAndDerivatives(database=myData, numberOfDraws=number_of_draws, gradient=True, hessian=True, aggregation=False)

(array([-3.91914292, -2.11209896]),
 array([[-12.31921998,  76.80780015],
        [ -3.14130423,  68.58695767]]),
 array([[[  -165.65755306,   1546.42166536],
         [  1546.42166536, -16565.75530623]],
 
        [[  -987.62129533,   9777.04786414],
         [  9777.04786414, -98762.12953279]]]),
 array([[[ 151.76318103, -946.21218663],
         [-946.21218663, 5899.4381646 ]],
 
        [[   9.86779229, -215.45250047],
         [-215.45250047, 4704.17076195]]]))

In [115]:
expr14.getValueAndDerivatives(database=myData, numberOfDraws=number_of_draws, gradient=True, hessian=True, aggregation=True)

(-6.0312418791725335,
 array([-15.46052422, 145.39475782]),
 array([[  -1153.27884839,   11323.46952949],
        [  11323.46952949, -115327.88483902]]),
 array([[  161.63097331, -1161.6646871 ],
        [-1161.6646871 , 10603.60892655]]))

A Python function can also be obtained for this expression. Note that it is available only for the aggregated version, summing over the database.

In [116]:
the_function = expr14.createFunction(database=myData, numberOfDraws=number_of_draws, gradient=True, hessian=True)

In [117]:
the_function([0, 0])

(-6.0312418791725335,
 array([-15.46052422, 145.39475782]),
 array([[  -1153.27884839,   11323.46952949],
        [  11323.46952949, -115327.88483902]]))

In [118]:
the_function([0.1, 0.1])

(-49.645666583910895,
 array([  39.99999992, -553.04056325]),
 array([[-1.62802916e-06,  1.46098013e-05],
        [ 1.46098013e-05, -1.18518603e+03]]))

It is possible to fix the value of some (or all) beta parameters

In [119]:
print(expr14)

log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((beta1(init=0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((beta2(init=0) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))


In [120]:
expr14.fix_betas({'beta2': 0.123})

In [121]:
print(expr14)

log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((beta1(init=0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((beta2(fixed=0.123) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))


The name of the parameter can also be changed while fixing its value.

In [122]:
expr14.fix_betas({'beta2': 123}, prefix='prefix_', suffix='_suffix')

In [123]:
print(expr14)

log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((beta1(init=0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((prefix_beta2_suffix(fixed=123) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))


The parameter can also be renamed using the following function.

In [124]:
expr14.rename_elementary(['prefix_beta2_suffix'], prefix='PREFIX_', suffix='_SUFFIX')

In [125]:
print(expr14)

log(MonteCarlo(PanelLikelihoodTrajectory(exp(_bioLogLogit[choice=Choice]U=(1:((beta1(init=0) * Variable1) + (`10.0` * bioDraws("draws1", "NORMAL_HALTON2"))), 2:((PREFIX_prefix_beta2_suffix_SUFFIX(fixed=123) * Variable2) + (`10.0` * bioDraws("draws2", "NORMAL_HALTON2"))), 3:`0.0`)av=(1:Av1, 2:Av2, 3:Av3)))))
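The two renamings above compose: each call wraps the current name with its own prefix and suffix. The plain-Python sketch below reproduces only the naming rule visible in the printed expressions. `wrap_name` is a hypothetical helper written for this illustration, not a Biogeme function.

```python
def wrap_name(name, prefix=None, suffix=None):
    """Return the name wrapped with an optional prefix and suffix."""
    new_name = name
    if prefix is not None:
        new_name = f'{prefix}{new_name}'
    if suffix is not None:
        new_name = f'{new_name}{suffix}'
    return new_name


# First renaming, as done while fixing the value of beta2.
step1 = wrap_name('beta2', prefix='prefix_', suffix='_suffix')
# Second renaming, applied to the already renamed parameter.
step2 = wrap_name(step1, prefix='PREFIX_', suffix='_SUFFIX')
print(step1)
print(step2)
```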


# Signatures

The Python library communicates the expressions to the C++ library using a syntax called a "signature". We now describe and illustrate the signature of each expression. Each expression is identified by the identifier returned by the Python function 'id'.

In [126]:
id(expr1)

6125185744
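The value returned by 'id' identifies an object, not its content. Two names bound to the same object share the identifier, while two distinct objects alive at the same time do not, even if they look the same. The sketch below illustrates this with a hypothetical class `Node` standing in for an expression. The actual integer values change from one run to the next.

```python
class Node:
    """Hypothetical stand-in for an expression object."""

    def __init__(self, name):
        self.name = name


a = Node('beta1')
b = a              # second name bound to the same object
c = Node('beta1')  # distinct object with the same content

print(id(a) == id(b))
print(id(a) == id(c))
```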

## Numerical expression

&lt;Numeric&gt;{identifier},value

In [127]:
ex.Numeric(0).getSignature()

[b'<Numeric>{6129194448},0.0']
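All signatures shown in this notebook start with the same header, `<operator>{identifier}`. As an illustration, the sketch below splits one signature into these parts with a regular expression. `split_signature` is a hypothetical helper written for this example. It is not part of Biogeme, and the pattern is inferred from the outputs shown here.

```python
import re

# Header shared by the signatures in this notebook: <operator>{identifier}
HEADER = re.compile(
    rb'^<(?P<operator>\w+)>\{(?P<identifier>\d+)\}(?P<rest>.*)$'
)


def split_signature(signature):
    """Split one signature (bytes) into operator, identifier and remainder."""
    match = HEADER.match(signature)
    if match is None:
        raise ValueError(f'Not a recognized signature: {signature!r}')
    return (
        match.group('operator').decode(),
        int(match.group('identifier')),
        match.group('rest').decode(),
    )


operator, identifier, rest = split_signature(b'<Numeric>{6129194448},0.0')
# For a numerical expression, the remainder is a comma followed by the value.
value = float(rest.lstrip(','))
print(operator, identifier, value)
```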

## Beta parameters

&lt;Beta&gt;{identifier}"name"[status],uniqueId,betaId
where 
- status is 0 for free parameters, and non zero for fixed parameters,
- uniqueId is a unique index given by Biogeme to all elementary expressions,
- betaId is an index given by Biogeme that is unique among the free parameters and, separately, among the fixed parameters.

As the signature requires these ids, the expression must be prepared first.

In [128]:
beta1.prepare(database=myData, numberOfDraws=0)
beta1.getSignature()

[b'<Beta>{4399541712}"beta1"[0],0,0']

In [129]:
beta3.prepare(database=myData, numberOfDraws=0)
beta3.getSignature()

[b'<Beta>{6127335888}"beta3"[1],0,0']
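The fields of a Beta signature can be read back with a regular expression. The sketch below is a hypothetical helper (`parse_beta` is not a Biogeme function) applied to the two signatures above. It assumes that parameter names contain no double quote.

```python
import re

# <Beta>{identifier}"name"[status],uniqueId,betaId
BETA = re.compile(
    r'^<Beta>\{(?P<identifier>\d+)\}"(?P<name>[^"]+)"'
    r'\[(?P<status>\d+)\],(?P<unique_id>\d+),(?P<beta_id>\d+)$'
)


def parse_beta(signature):
    """Decode the signature (bytes) of a beta parameter into a dict."""
    match = BETA.match(signature.decode())
    if match is None:
        raise ValueError(f'Not a Beta signature: {signature!r}')
    return {
        'identifier': int(match.group('identifier')),
        'name': match.group('name'),
        # A status of 0 means free, any other value means fixed.
        'is_fixed': int(match.group('status')) != 0,
        'unique_id': int(match.group('unique_id')),
        'beta_id': int(match.group('beta_id')),
    }


free = parse_beta(b'<Beta>{4399541712}"beta1"[0],0,0')
fixed = parse_beta(b'<Beta>{6127335888}"beta3"[1],0,0')
print(free['name'], free['is_fixed'])
print(fixed['name'], fixed['is_fixed'])
```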

## Variables

&lt;Variable&gt;{identifier}"name",uniqueId,variableId 
where
- uniqueId is a unique index given by Biogeme to all elementary expressions,
- variableId is a unique index given by Biogeme to all variables.


In [130]:
Variable1.getSignature()

[b'<Variable>{6127297616}"Variable1",6,2']

## Random variables

&lt;RandomVariable&gt;{identifier}"name",uniqueId,randomVariableId
where
- uniqueId is a unique index given by Biogeme to all elementary expressions,
- randomVariableId is a unique index given by Biogeme to all random variables.

In [131]:
omega.prepare(database=myData, numberOfDraws=0)
omega.getSignature()

[b'<RandomVariable>{6127177744}"omega",0,0']

## Draws

&lt;bioDraws&gt;{identifier}"name",uniqueId,drawId
where
- uniqueId is a unique index given by Biogeme to all elementary expressions,
- drawId is a unique index given by Biogeme to all draws.


In [132]:
myDraws.prepare(database=myData, numberOfDraws=number_of_draws)
myDraws.getSignature()

[b'<bioDraws>{6127498640}"myDraws",0,0']

## General expression

<code>&lt;operator&gt;{identifier}(numberOfChildren),idFirstChild,idSecondChild,idThirdChild,</code> etc...
where the number of child identifiers listed after the first comma equals numberOfChildren.

Specific examples are reported below.

### Binary operator

<code>&lt;operator&gt;{identifier}(2),idFirstChild,idSecondChild </code>
where operator is one of:
- 'Plus'
- 'Minus'
- 'Times'
- 'Divide'
- 'Power'
- 'bioMin'
- 'bioMax'
- 'And'
- 'Or'
- 'Equal'
- 'NotEqual'
- 'LessOrEqual'
- 'GreaterOrEqual'
- 'Less'
- 'Greater'


In [133]:
the_sum = beta1 + Variable1

In [134]:
the_sum.getSignature()

[b'<Beta>{4399541712}"beta1"[0],0,0',
 b'<Variable>{6127297616}"Variable1",6,2',
 b'<Plus>{6129459728}(2),4399541712,6127297616']
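In the output above, the children are listed before the operator that combines them, and the operator reports as many child identifiers as announced between parentheses. The sketch below checks both properties on a list of signatures. `check_signatures` is a hypothetical helper written for this illustration. It handles only plain operator signatures of the form described in this section, and the ordering it verifies is what is observed in this notebook, not a documented guarantee.

```python
import re

# Operator signature: <operator>{identifier}(numberOfChildren),child ids...
OPERATOR = re.compile(r'^<(\w+)>\{(\d+)\}\((\d+)\),(.*)$')
# Any signature starts with <type>{identifier}.
HEADER = re.compile(r'^<(\w+)>\{(\d+)\}')


def check_signatures(signatures):
    """Check child counts and that each child precedes its parent."""
    seen = set()
    for raw in signatures:
        text = raw.decode()
        header = HEADER.match(text)
        if header is None:
            raise ValueError(f'Unrecognized signature: {text}')
        operator = OPERATOR.match(text)
        if operator is not None:
            name, _, count, children = operator.groups()
            child_ids = [int(child) for child in children.split(',')]
            if len(child_ids) != int(count):
                raise ValueError(f'{name}: expected {count} children')
            if any(child not in seen for child in child_ids):
                raise ValueError(f'{name}: child listed after its parent')
        seen.add(int(header.group(2)))
    return True


signatures = [
    b'<Beta>{4399541712}"beta1"[0],0,0',
    b'<Variable>{6127297616}"Variable1",6,2',
    b'<Plus>{6129459728}(2),4399541712,6127297616',
]
print(check_signatures(signatures))
```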

### Unary operator

&lt;operator&gt;{identifier}(1),idChild
where operator is one of:
- 'UnaryMinus'
- 'MonteCarlo'
- 'bioNormalCdf'
- 'PanelLikelihoodTrajectory'
- 'exp'
- 'log'

In [135]:
m = -beta1

In [136]:
m.getSignature()

[b'<Beta>{4399541712}"beta1"[0],0,0',
 b'<UnaryMinus>{6129462160}(1),4399541712']

## LogLogit

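&lt;_bioLogLogit&gt;{identifier}(nbrOfAlternatives),chosenAlt,altNumber,utility,availability,altNumber,utility,availability, etc.
where chosenAlt, utility and availability are the identifiers of the corresponding expressions.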

In [137]:
expr7.prepare(database=myData, numberOfDraws=number_of_draws)
expr7.getSignature()

[b'<Numeric>{6127334928},1.0',
 b'<Beta>{4399541712}"beta1"[0],0,0',
 b'<UnaryMinus>{6127463696}(1),4399541712',
 b'<Beta>{6127298512}"beta2"[0],1,1',
 b'<UnaryMinus>{6127522256}(1),6127298512',
 b'<Beta>{4399541712}"beta1"[0],0,0',
 b'<UnaryMinus>{6129189840}(1),4399541712',
 b'<Numeric>{6127358800},1.0',
 b'<Numeric>{6127325392},1.0',
 b'<Numeric>{6127325456},1.0',
 b'<_bioLogLogit>{6129203152}(3),6127334928,0,6127463696,6127358800,1,6127522256,6127325392,2,6129189840,6127325456']
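The last signature above can be decoded by reading the identifier of the chosen alternative first, and then one triple (altNumber, utility, availability) per alternative. The sketch below does this in plain Python. `decode_loglogit` is a hypothetical helper written for this illustration, not a Biogeme function.

```python
def decode_loglogit(signature):
    """Decode a _bioLogLogit signature (bytes).

    Returns the identifier of the expression of the chosen alternative,
    and a dict mapping each alternative number to the identifiers of
    its utility and availability expressions.
    """
    text = signature.decode()
    header, _, payload = text.partition('),')
    n_alternatives = int(header.rsplit('(', 1)[1])
    fields = [int(field) for field in payload.split(',')]
    chosen_id, rest = fields[0], fields[1:]
    if len(rest) != 3 * n_alternatives:
        raise ValueError('Expected three entries per alternative')
    alternatives = {
        rest[i]: {'utility': rest[i + 1], 'availability': rest[i + 2]}
        for i in range(0, len(rest), 3)
    }
    return chosen_id, alternatives


chosen_id, alternatives = decode_loglogit(
    b'<_bioLogLogit>{6129203152}(3),6127334928,'
    b'0,6127463696,6127358800,'
    b'1,6127522256,6127325392,'
    b'2,6129189840,6127325456'
)
print(chosen_id)
for number, ids in alternatives.items():
    print(number, ids['utility'], ids['availability'])
```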

## Derive

&lt;Derive&gt;{identifier},id of expression to derive,unique index of elementary expression

In [138]:
expr9.prepare(database=myData, numberOfDraws=number_of_draws)
expr9.getSignature()

[b'<Numeric>{6129249168},1.0',
 b'<Beta>{4399541712}"beta1"[0],0,0',
 b'<UnaryMinus>{6127463696}(1),4399541712',
 b'<Beta>{6127298512}"beta2"[0],1,1',
 b'<UnaryMinus>{6127522256}(1),6127298512',
 b'<Beta>{4399541712}"beta1"[0],0,0',
 b'<UnaryMinus>{6129189840}(1),4399541712',
 b'<Numeric>{6129201872},1.0',
 b'<Numeric>{6129203600},1.0',
 b'<Numeric>{6127530768},1.0',
 b'<_bioLogLogit>{6129192272}(3),6129249168,0,6127463696,6129201872,1,6127522256,6129203600,2,6129189840,6127530768',
 b'<Derive>{6129250256},6129192272,1']

## Integrate

&lt;Integrate&gt;{identifier},id of expression to integrate,index of random variable

In [139]:
expr4.prepare(database=myData, numberOfDraws=number_of_draws)
expr4.getSignature()

[b'<Numeric>{6127518480},0.0',
 b'<Numeric>{6059787920},1.0',
 b'<Numeric>{6125182096},1.0',
 b'<RandomVariable>{6127177744}"omega",0,0',
 b'<UnaryMinus>{6127524816}(1),6127177744',
 b'<exp>{6127415760}(1),6127524816',
 b'<Plus>{6059169616}(2),6125182096,6127415760',
 b'<Divide>{6127445584}(2),6059787920,6059169616',
 b'<Plus>{6059787728}(2),6127518480,6127445584',
 b'<Numeric>{6127518480},0.0',
 b'<Numeric>{6059787920},1.0',
 b'<Numeric>{6125182096},1.0',
 b'<RandomVariable>{6127177744}"omega",0,0',
 b'<UnaryMinus>{6127524816}(1),6127177744',
 b'<exp>{6127415760}(1),6127524816',
 b'<Plus>{6059169616}(2),6125182096,6127415760',
 b'<Divide>{6127445584}(2),6059787920,6059169616',
 b'<Plus>{6059787728}(2),6127518480,6127445584',
 b'<Times>{6127518928}(2),6059787728,6059787728',
 b'<Numeric>{6127525136},1.0',
 b'<RandomVariable>{6127177744}"omega",0,0',
 b'<UnaryMinus>{6127518800}(1),6127177744',
 b'<exp>{6127522512}(1),6127518800',
 b'<Times>{6127526672}(2),6127525136,6127522512',
 b'<Num

## Elem

&lt;Elem&gt;{identifier}(numberOfExpressions),keyId,value1,expression1,value2,expression2, etc...

where
- keyId is the identifier of the expression calculating the key,
- the number of pairs (value, expression) must be equal to numberOfExpressions.

In [140]:
elemExpr.prepare(database=myData, numberOfDraws=number_of_draws)
elemExpr.getSignature()

[b'<Variable>{6127300048}"Person",4,0',
 b'<Numeric>{6127330704},2.0',
 b'<Beta>{4399541712}"beta1"[0],0,0',
 b'<Times>{6127326224}(2),6127330704,4399541712',
 b'<Beta>{6127298512}"beta2"[0],1,1',
 b'<UnaryMinus>{6127322576}(1),6127298512',
 b'<exp>{6127177040}(1),6127322576',
 b'<Beta>{6127298512}"beta2"[0],1,1',
 b'<Beta>{6127335888}"beta3"[1],2,0',
 b'<Beta>{6127329744}"beta4"[1],3,1',
 b'<GreaterOrEqual>{4599328976}(2),6127335888,6127329744',
 b'<Times>{6127175632}(2),6127298512,4599328976',
 b'<Beta>{4399541712}"beta1"[0],0,0',
 b'<Beta>{6127335888}"beta3"[1],2,0',
 b'<Beta>{6127329744}"beta4"[1],3,1',
 b'<Less>{4610699536}(2),6127335888,6127329744',
 b'<Times>{6127325648}(2),4399541712,4610699536',
 b'<Plus>{4610700304}(2),6127175632,6127325648',
 b'<Divide>{4679416976}(2),6127177040,4610700304',
 b'<Minus>{6125185744}(2),6127326224,4679416976',
 b'<Numeric>{6127446992},2.0',
 b'<Beta>{4399541712}"beta1"[0],0,0',
 b'<Times>{6127437456}(2),6127446992,4399541712',
 b'<Variable>{612

## bioLinearUtility

&lt;bioLinearUtility&gt;{identifier}(numberOfTerms), beta1_exprId, beta1_uniqueId, beta1_name, variable1_exprId, variable1_uniqueId, variable1_name, etc...

where 6 entries are provided for each term:
- beta1_exprId is the expression id of the beta parameter,
- beta1_uniqueId is the unique id of the beta parameter,
- beta1_name is the name of the parameter,
- variable1_exprId is the expression id of the variable,
- variable1_uniqueId is the unique id of the variable,
- variable1_name is the name of the variable.


In [141]:
expr13.prepare(database=myData, numberOfDraws=number_of_draws)
expr13.getSignature()

[b'<Beta>{4399541712}"beta1"[0],0,0',
 b'<Beta>{6127298512}"beta2"[0],1,1',
 b'<Beta>{6127335888}"beta3"[1],2,0',
 b'<Variable>{6040895184}"Variable1",5,2',
 b'<Variable>{6127325968}"Variable2",6,3',
 b'<Variable>{6127358288}"newvar_b",11,8',
 b'<bioLinearUtility>{6129204240}(3),4399541712,0,beta1,6040895184,5,Variable1,6127298512,1,beta2,6127325968,6,Variable2,6127335888,2,beta3,6127358288,11,newvar_b']
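The six entries per term can be grouped back into one record per term. The sketch below decodes the last signature above in plain Python. `Term` and `decode_linear_utility` are hypothetical names introduced for this illustration, not Biogeme objects, and the code assumes that names contain no comma.

```python
from typing import NamedTuple


class Term(NamedTuple):
    """One term beta * variable of a linear utility."""

    beta_expr_id: int
    beta_unique_id: int
    beta_name: str
    variable_expr_id: int
    variable_unique_id: int
    variable_name: str


def decode_linear_utility(signature):
    """Decode a bioLinearUtility signature (bytes) into a list of terms."""
    text = signature.decode()
    header, _, payload = text.partition('),')
    n_terms = int(header.rsplit('(', 1)[1])
    fields = payload.split(',')
    if len(fields) != 6 * n_terms:
        raise ValueError('Expected six entries per term')
    terms = []
    for i in range(0, len(fields), 6):
        b_id, b_uid, b_name, v_id, v_uid, v_name = fields[i:i + 6]
        terms.append(
            Term(int(b_id), int(b_uid), b_name, int(v_id), int(v_uid), v_name)
        )
    return terms


terms = decode_linear_utility(
    b'<bioLinearUtility>{6129204240}(3),'
    b'4399541712,0,beta1,6040895184,5,Variable1,'
    b'6127298512,1,beta2,6127325968,6,Variable2,'
    b'6127335888,2,beta3,6127358288,11,newvar_b'
)
for term in terms:
    print(term.beta_name, '*', term.variable_name)
```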