In [1]:
import numpy as np
import pandas as pd
import biogeme.expressions as ex
import biogeme.exceptions as excep
import biogeme.database as db

# Simple expressions

Simple expressions can be evaluated with either `getValue` (implemented in Python) or `getValue_c` (implemented in C++). They do not require a database.
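In the cells below, an expression such as `x + y` is not a number: it is an expression object, and `getValue` evaluates it. The following toy sketch mimics that idea in plain Python by overloading arithmetic operators. All names in it (`Expr`, `Const`, `Add`, `Mul`, `wrap`, `get_value`) are hypothetical, and it is not Biogeme's implementation.

```python
# Hypothetical toy, NOT Biogeme's code: operators build an expression tree
# that is only evaluated when get_value() is called.
class Expr:
    """Base class: arithmetic operators build new nodes instead of numbers."""

    def __add__(self, other):
        return Add(self, wrap(other))

    def __radd__(self, other):
        return Add(wrap(other), self)

    def __mul__(self, other):
        return Mul(self, wrap(other))

    def __rmul__(self, other):
        return Mul(wrap(other), self)

    def get_value(self):
        raise NotImplementedError


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def get_value(self):
        return self.value


class Add(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def get_value(self):
        # Evaluate both children, then combine them
        return self.left.get_value() + self.right.get_value()


class Mul(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def get_value(self):
        return self.left.get_value() * self.right.get_value()


def wrap(obj):
    """Turn plain numbers into constant nodes so that 1 + x works."""
    return obj if isinstance(obj, Expr) else Const(obj)


x = Const(2)
y = Const(3)
z = x + y                        # builds Add(x, y); nothing is computed yet
print(z.get_value())             # 5
print((1 + x * y).get_value())   # 7
```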

In [2]:
x = ex.Beta('x', 2, None, None, 0)
y = ex.Beta('y', 3, None, None, 0)
one = ex.Numeric(1)

## Addition

In [3]:
z = x + y
z.getValue()

5

In [4]:
z.getValue_c()

5.0

## Subtraction

In [5]:
z = x - y
z.getValue()

-1

In [6]:
z.getValue_c()

-1.0

## Multiplication

In [7]:
z = x * y
z.getValue()

6

In [8]:
z.getValue_c()

6.0

## Division

In [9]:
z = x / y
z.getValue()

0.6666666666666666

In [10]:
z.getValue_c()

0.6666666666666666

## Power

In [11]:
z = x ** y
z.getValue()

8

In [12]:
z.getValue_c()

8.0

## Exponential

In [13]:
z = ex.exp(x)
z.getValue()

7.38905609893065

In [14]:
z.getValue_c()

7.38905609893065

## Logarithm

In [15]:
z = ex.log(x)
z.getValue()

0.6931471805599453

In [16]:
z.getValue_c()

0.6931471805599453

## Minimum

In [17]:
z = ex.bioMin(x, y)
z.getValue()

2

In [18]:
z.getValue_c()

2.0

## Maximum

In [19]:
z = ex.bioMax(x, y)
z.getValue()

3

In [20]:
z.getValue_c()

3.0

## And

In [21]:
z = x & y
z.getValue()

1.0

In [22]:
z.getValue_c()

1.0

In [23]:
z = x & 0
z.getValue()

0.0

In [24]:
z.getValue_c()

0.0

## Or

In [25]:
z = x | y
z.getValue()

1.0

In [26]:
z.getValue_c()

1.0

In [27]:
z = ex.Numeric(0) | ex.Numeric(0)
z.getValue()

0.0

In [28]:
z.getValue_c()

0.0
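The outputs of the `And` and `Or` cells above are consistent with treating any nonzero value as true and returning `1.0` or `0.0`. A plain-Python sketch of that reading follows; the helper names `logical_and` and `logical_or` are hypothetical and not part of Biogeme.

```python
def logical_and(a: float, b: float) -> float:
    """Return 1.0 if both operands are nonzero, 0.0 otherwise."""
    return 1.0 if (a != 0 and b != 0) else 0.0


def logical_or(a: float, b: float) -> float:
    """Return 1.0 if at least one operand is nonzero, 0.0 otherwise."""
    return 1.0 if (a != 0 or b != 0) else 0.0


x, y = 2, 3
print(logical_and(x, y))  # 1.0
print(logical_and(x, 0))  # 0.0
print(logical_or(x, y))   # 1.0
print(logical_or(0, 0))   # 0.0
```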

## Equal

In [29]:
z = x == y
z.getValue()

0

In [30]:
z.getValue_c()

0.0

In [31]:
z = (x + 1) == y
z.getValue()

1

In [32]:
z.getValue_c()

1.0

## Not equal

In [33]:
z = x != y
z.getValue()

1

In [34]:
z.getValue_c()

1.0

In [35]:
z = (x + 1) != y
z.getValue()

0

In [36]:
z.getValue_c()

0.0

## Less than or equal

In [37]:
z = x <= y
z.getValue()

1

In [38]:
z.getValue_c()

1.0

## Greater or equal

In [39]:
z = x >= y
z.getValue()

0

In [40]:
z.getValue_c()

0.0

## Less than

In [41]:
z = x < y
z.getValue()

1

In [42]:
z.getValue_c()

1.0

## Greater than

In [43]:
z = x > y
z.getValue()

0

In [44]:
z.getValue_c()

0.0

## Opposite

In [45]:
z = -x
z.getValue()

-2

In [46]:
z.getValue_c()

-2.0

## Sum of multiple expressions

In [47]:
listOfExpressions = [x, y, 1+x, 1+y]
z = ex.bioMultSum(listOfExpressions)
z.getValue()

12.0

In [48]:
z.getValue_c()

12.0

The result is the same as with the following expression, but `bioMultSum` implements the sum more efficiently.
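The notebook does not say where the efficiency of `bioMultSum` comes from. One plausible reason, offered here only as an assumption, is structural: chaining binary `+` creates a nested tree with one node per addition, whereas a multi-term sum keeps all terms in a single node. The sketch below illustrates that difference with plain tuples; the representation and the helper names (`chained_sum`, `flat_sum`, `evaluate`, `depth`) are hypothetical.

```python
from functools import reduce


def chained_sum(terms):
    """Nest binary additions: ((t0 + t1) + t2) + t3, one node per '+'."""
    return reduce(lambda left, right: ('+', left, right), terms)


def flat_sum(terms):
    """A single node holding every term, in the spirit of a multi-term sum."""
    return ('sum', list(terms))


def evaluate(node):
    if isinstance(node, tuple) and node[0] == '+':
        return evaluate(node[1]) + evaluate(node[2])
    if isinstance(node, tuple) and node[0] == 'sum':
        return sum(evaluate(term) for term in node[1])
    return node  # a plain number is a leaf


def depth(node):
    if isinstance(node, tuple) and node[0] == '+':
        return 1 + max(depth(node[1]), depth(node[2]))
    if isinstance(node, tuple) and node[0] == 'sum':
        return 1 + max(depth(term) for term in node[1])
    return 0


x, y = 2, 3
terms = [x, y, 1 + x, 1 + y]
print(evaluate(chained_sum(terms)), depth(chained_sum(terms)))  # 12 3
print(evaluate(flat_sum(terms)), depth(flat_sum(terms)))        # 12 1
```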

In [49]:
z = x + y + 1 + x + 1 + y
z.getValue()

12

In [50]:
z.getValue_c()

12.0

## Element

This expression takes a dictionary of expressions and an expression for the index. The index is evaluated, and the value of the corresponding expression in the dictionary is returned.
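The lookup logic can be sketched in plain Python. This is a hypothetical analogue of `Elem` (the function name `elem` is illustrative), and the dictionary holds plain numbers rather than Biogeme expressions. A missing key raises an error, mirroring the behavior shown further below.

```python
import math


def elem(dictionary, index):
    """Return the value stored under the (already evaluated) index.

    Hypothetical plain-Python analogue of Elem, not Biogeme's code.
    """
    if index not in dictionary:
        raise KeyError(f'Key {index} is not present in the dictionary. '
                       f'Available keys: {list(dictionary.keys())}')
    return dictionary[index]


my_dict = {1: math.exp(-1), 2: math.log(1.2), 3: 1234}
x = 2
print(elem(my_dict, x))      # log(1.2), about 0.18232
print(elem(my_dict, x - 1))  # exp(-1), about 0.36788
```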

In [51]:
my_dict = {1: ex.exp(-1), 2: ex.log(1.2), 3: 1234}

In [52]:
index = x
index.getValue()

2

In [53]:
z = ex.Elem(my_dict, index)
z.getValue()

0.1823215567939546

In [54]:
z.getValue_c()

0.18232159653038368

In [55]:
index = x - 1
index.getValue()

1

In [56]:
z = ex.Elem(my_dict, index)
z.getValue()

0.36787944117144233

In [57]:
z.getValue_c()

0.36787944117144233

In [58]:
index = x - 2
index.getValue()

0

If the value of the index does not correspond to an entry in the dictionary, an exception is raised.

In [59]:
z = ex.Elem(my_dict, index)
try:
    z.getValue()
except excep.biogemeError as e:
    print(f'Exception raised: {e}')

Exception raised: Key 0 is not present in the dictionary. Available keys: dict_keys([1, 2, 3])


In [60]:
z = ex.Elem(my_dict, index)
try:
    z.getValue_c()
except RuntimeError as e:
    print(f'Exception raised: {e}')

Exception raised: src/bioExprElem.cc:57: Biogeme exception: Key (-(x lit[0],free[0](2),2)=0) is not present in dictionary: 
  1: exp(-1)
  2: log(1.2)
  3: 1234



# Complex expressions

An expression is deemed complex in Biogeme when the `getValue` function is not available for it. Only `getValue_c` can then be used: it evaluates the expression using its C++ implementation.

## Normal CDF

It calculates the cumulative distribution function of the standard normal distribution, $$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{1}{2}\omega^2}d\omega.$$
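The same quantity can be cross-checked with the standard identity $\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$, using only Python's `math` module. The helper name `normal_cdf` is illustrative; this says nothing about how Biogeme computes it internally.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


print(normal_cdf(2.0))  # approx. 0.97725, as for bioNormalCdf(x) with x = 2
print(normal_cdf(0.0))  # 0.5
```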

In [61]:
z = ex.bioNormalCdf(x)
z.getValue_c()

0.9772498680518218

In [62]:
z = ex.bioNormalCdf(0)
z.getValue_c()

0.5

## Derivative

In [63]:
z = 30 * x + 20 * y

In [64]:
zx = ex.Derive(z, 'x')
zx.getValue_c()

30.0

In [65]:
zx = ex.Derive(z, 'y')
zx.getValue_c()

20.0
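For the linear expression $30x + 20y$, `Derive` returns 30 and 20. A central finite difference in plain Python gives a convenient, approximate cross-check. The helper `central_difference` and its step `h` are illustrative choices, not part of Biogeme.

```python
def z(x: float, y: float) -> float:
    return 30 * x + 20 * y


def central_difference(f, point, i, h=1e-6):
    """Approximate the partial derivative of f with respect to argument i."""
    up = list(point)
    down = list(point)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


point = (2.0, 3.0)  # x = 2, y = 3, as in the notebook
print(central_difference(z, point, 0))  # close to 30
print(central_difference(z, point, 1))  # close to 20
```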

## Integral

Let's calculate the integral of the pdf of the standard normal distribution: $$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{1}{2}\omega^2}d\omega = 1.$$
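The numerical method used by `Integrate` is not described here. As an independent check, the sketch below applies a truncated trapezoidal rule in plain Python; truncating the real line to $[-10, 10]$ and using 10,000 intervals are assumptions that work because the Gaussian tails beyond $|\omega| = 10$ are negligible.

```python
import math


def trapezoid(f, a, b, n=10_000):
    """Composite trapezoidal rule for f on [a, b] with n intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def kernel(omega):
    return math.exp(-omega * omega / 2)


# Tails beyond |omega| = 10 contribute less than 1e-20, so truncation is harmless.
value = trapezoid(kernel, -10.0, 10.0) / math.sqrt(2 * math.pi)
print(value)  # very close to 1
```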

In [66]:
omega = ex.RandomVariable('omega')
pdf = ex.exp(-omega * omega / 2)
z = ex.Integrate(pdf, 'omega') / np.sqrt(2 * np.pi)
z.getValue_c()

0.9999999998856622

As in the previous example, the integral is calculated over the whole real line. To integrate over other bounds, a change of variables must be performed. Let's calculate $$\int_0^1 x^2 dx=\frac{1}{3}.$$
If $a$ is the lower bound of integration and $b$ is the upper bound, the change of variables is $$x = a + \frac{b-a}{1+e^{-\omega}},$$ and $$dx = \frac{(b-a)e^{-\omega}}{(1+e^{-\omega})^2}d\omega.$$
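The change of variables can be checked numerically in plain Python. Writing $s = 1/(1+e^{-\omega})$, the Jacobian above equals $(b-a)\,s(1-s)$, and it already contains the factor $b-a$. The sketch integrates $f(x(\omega))\,\frac{dx}{d\omega}$ with a trapezoidal rule on $\omega \in [-40, 40]$; the function name `integrate_on_interval`, the truncation limit and the number of intervals are illustrative assumptions.

```python
import math


def integrate_on_interval(f, a, b, n=20_000, limit=40.0):
    """Integrate f over [a, b] via x = a + (b - a) / (1 + exp(-omega)).

    The transformed integrand f(x(omega)) * dx/domega is integrated over
    omega in [-limit, limit] with the trapezoidal rule; the truncation
    assumes the tails are negligible.
    """
    def transformed(omega):
        s = 1.0 / (1.0 + math.exp(-omega))   # logistic function
        x = a + (b - a) * s
        dx = (b - a) * s * (1.0 - s)         # = (b-a) e^{-w} / (1+e^{-w})^2
        return f(x) * dx

    h = 2 * limit / n
    total = 0.5 * (transformed(-limit) + transformed(limit))
    for k in range(1, n):
        total += transformed(-limit + k * h)
    return total * h


print(integrate_on_interval(lambda x: x * x, 0.0, 1.0))  # close to 1/3
print(integrate_on_interval(lambda x: x * x, 1.0, 3.0))  # close to 26/3
```

The second call uses an interval with $b - a \neq 1$, which confirms that no extra scaling by $b-a$ is needed once the Jacobian is included.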

In [67]:
a = 0
b = 1
t = a + (b - a) / (1 + ex.exp(-omega))
dt = (b - a) * ex.exp(-omega) * (1 + ex.exp(-omega)) ** -2 
integrand = t * t
# dt already contains the factor (b - a), so no further scaling is needed
z = ex.Integrate(integrand * dt, 'omega')
z.getValue_c()

0.3333323120662823

# Expressions using a database

In [68]:
df = pd.DataFrame({'Person': [1, 1, 1, 2, 2],
                   'Exclude': [0, 0, 1, 0, 1],
                   'Variable1': [10, 20, 30, 40, 50],
                   'Variable2': [100, 200, 300, 400, 500],
                   'Choice': [2, 2, 3, 1, 2],
                   'Av1': [0, 1, 1, 1, 1],
                   'Av2': [1, 1, 1, 1, 1],
                   'Av3': [0, 1, 1, 1, 1]})
myData = db.Database('test', df)

## Linear utility

It defines a linear combination of parameters and variables.
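Row by row, the linear utility is $\beta_1 \cdot \text{Variable1} + \beta_2 \cdot \text{Variable2}$. The plain-Python sketch below reproduces that computation on the same data and parameter values; the helper `linear_utility` and the dictionary layout are illustrative, not Biogeme's API.

```python
rows = {
    'Variable1': [10, 20, 30, 40, 50],
    'Variable2': [100, 200, 300, 400, 500],
}
betas = {'beta1': 10, 'beta2': 20}
terms = [('beta1', 'Variable1'), ('beta2', 'Variable2')]


def linear_utility(terms, betas, rows):
    """Sum of beta * variable over all terms, computed for each row."""
    n_rows = len(next(iter(rows.values())))
    return [sum(betas[b] * rows[v][i] for b, v in terms) for i in range(n_rows)]


print(linear_utility(terms, betas, rows))  # [2100, 4200, 6300, 8400, 10500]
```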

In [69]:
beta1 = ex.Beta('beta1', 10, None, None, 0)
beta2 = ex.Beta('beta2', 20, None, None, 0)
v1 = ex.Variable('Variable1')
v2 = ex.Variable('Variable2')

In [70]:
listOfTerms = [
    (beta1, v1),
    (beta2, v2),
]
z = ex.bioLinearUtility(listOfTerms)
z.getValue_c(database=myData)

array([ 2100.,  4200.,  6300.,  8400., 10500.])

It is equivalent to the following, but implemented in a more efficient way.

In [71]:
z = beta1 * v1 + beta2 * v2
z.getValue_c(database=myData)

array([ 2100.,  4200.,  6300.,  8400., 10500.])

## Monte Carlo

We approximate the integral $$\int_0^1 x^2 dx=\frac{1}{3}$$ using Monte Carlo integration. As draws require a database, the approximation is calculated for each entry in the database.
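Monte Carlo integration replaces the integral by the sample mean of $u^2$ over uniform draws $u$. The sketch below mimics one estimate per database row, each with its own draws, using Python's `random` module. The number of draws (1,000) and the seeds are assumptions; Biogeme's own draw generation and default number of draws are not shown here, so the individual values will differ from the output below, but all should be close to $1/3$.

```python
import random


def monte_carlo_mean_square(n_draws: int, seed: int) -> float:
    """Estimate the integral of x^2 over [0, 1] as the mean of u^2, u ~ U[0, 1)."""
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(n_draws)]
    return sum(u * u for u in draws) / n_draws


# One estimate per row of the five-row database, each with its own draws.
estimates = [monte_carlo_mean_square(1000, seed) for seed in range(5)]
print(estimates)  # each value close to 1/3
```

The spread between rows reflects sampling error, which shrinks as the number of draws grows.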

In [72]:
draws = ex.bioDraws('draws', 'UNIFORM')
z = ex.MonteCarlo(draws * draws)
z.getValue_c(database=myData)

array([0.32652819, 0.35369321, 0.33668803, 0.33242872, 0.31823826])

## Panel Trajectory

We first calculate a quantity for each entry in the database.

In [73]:
v1 = ex.Variable('Variable1')
v2 = ex.Variable('Variable2')
p = v1 / (v1 + v2)
p.getValue_c(database=myData)

array([0.09090909, 0.09090909, 0.09090909, 0.09090909, 0.09090909])

We now declare the data as "panel", based on the identifier `Person`. This means that the first three rows correspond to a sequence of three observations for individual 1, and the last two rows to a sequence of two observations for individual 2. The panel trajectory calculates the expression for each row associated with an individual, and returns the product of these values.
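The grouping and product can be sketched in plain Python on the same columns. This is an illustrative reimplementation of the idea, not Biogeme's code; `math.prod` requires Python 3.8 or later.

```python
import math
from collections import defaultdict

person = [1, 1, 1, 2, 2]
variable1 = [10, 20, 30, 40, 50]
variable2 = [100, 200, 300, 400, 500]

# Row-level quantity p = Variable1 / (Variable1 + Variable2)
p = [v1 / (v1 + v2) for v1, v2 in zip(variable1, variable2)]

# Group rows by individual, then multiply the values of each group
groups = defaultdict(list)
for pid, value in zip(person, p):
    groups[pid].append(value)
trajectory = {pid: math.prod(values) for pid, values in groups.items()}
print(trajectory)  # {1: about 0.00075131, 2: about 0.00826446}
```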

In [74]:
myData.panel('Person')

In this case, we expect the following for individual 1:

In [75]:
0.09090909 ** 3

0.0007513147783621339

And the following for individual 2:

In [76]:
0.09090909 ** 2

0.0082644626446281

We verify that it is indeed the case:

In [77]:
z = ex.PanelLikelihoodTrajectory(p)
z.getValue_c(database=myData)

array([0.00075131, 0.00826446])