# Module biogeme.tools

## Examples of use of each function

This page is for programmers who need usage examples for the functions of the module. The examples are designed to illustrate the syntax. They do not correspond to any meaningful model. For examples of models, visit [biogeme.epfl.ch](http://biogeme.epfl.ch).

In [1]:
import datetime
print(datetime.datetime.now())

2023-08-04 17:51:56.941883


In [2]:
import biogeme.version as ver
print(ver.getText())

biogeme 3.2.12 [2023-08-04]
Home page: http://biogeme.epfl.ch
Submit questions to https://groups.google.com/d/forum/biogeme
Michel Bierlaire, Transport and Mobility Laboratory, Ecole Polytechnique Fédérale de Lausanne (EPFL)



In [3]:
import numpy as np
import pandas as pd

In [4]:
import biogeme.tools as tools
import biogeme.logging as blog
import biogeme.exceptions as excep

In [5]:
logger = blog.get_screen_logger(level=blog.INFO)

Define a function and its derivatives: $$f = \log(x_0) + \exp(x_1),$$ $$g = \left( \begin{array}{c} \frac{1}{x_0} \\ \exp(x_1)\end{array}\right),$$ $$h=\left( \begin{array}{cc} -\frac{1}{x_0^2} & 0 \\ 0 & \exp(x_1)\end{array}\right).$$

In [6]:
def myFunction(x):
    f = np.log(x[0]) + np.exp(x[1])
    g = np.empty(2)
    g[0] = 1.0 / x[0]
    g[1] = np.exp(x[1])
    H = np.empty((2, 2))
    H[0,0] = - 1.0 / x[0]**2
    H[0,1] = 0.0
    H[1,0] = 0.0
    H[1,1] = np.exp(x[1])
    return f, g, H

Evaluate the function at the point $$x = \left( \begin{array}{c}1.1 \\ 1.1 \end{array}\right).$$

In [7]:
x = np.array([1.1, 1.1])
f, g, H = myFunction(x)

In [8]:
f

3.099476203750758

In [9]:
g

array([0.90909091, 3.00416602])

In [10]:
H

array([[-0.82644628,  0.        ],
       [ 0.        ,  3.00416602]])

Calculate an approximation of the gradient by finite differences.

In [11]:
g_fd = tools.findiff_g(myFunction, x)

In [12]:
g_fd

array([0.90909087, 3.00416619])

Check the precision of the approximation by comparing it with the analytical gradient.

In [13]:
g - g_fd

array([ 4.18595454e-08, -1.64594663e-07])
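To make the idea concrete, here is a minimal sketch of a forward-difference gradient. It is not the implementation used by `tools.findiff_g`: the function name `forward_difference_gradient`, the relative step `tau`, and the fact that the function passed in returns only the scalar value (rather than the tuple returned by `myFunction`) are all assumptions made for illustration.

```python
import numpy as np


def forward_difference_gradient(func, x, tau=1.0e-7):
    """Approximate the gradient of a scalar function by forward differences.

    Hypothetical helper for illustration only; the step-size rule is an
    assumption, not necessarily the one used by biogeme.
    """
    x = np.asarray(x, dtype=float)
    f0 = func(x)
    grad = np.empty_like(x)
    for i in range(x.size):
        # Scale the step with the magnitude of the coordinate.
        step = tau * max(abs(x[i]), 1.0)
        x_shifted = x.copy()
        x_shifted[i] += step
        grad[i] = (func(x_shifted) - f0) / step
    return grad


def example_function(x):
    # Same function as above: f = log(x_0) + exp(x_1), value only.
    return np.log(x[0]) + np.exp(x[1])


point = np.array([1.1, 1.1])
approx_gradient = forward_difference_gradient(example_function, point)
exact_gradient = np.array([1.0 / point[0], np.exp(point[1])])
print(exact_gradient - approx_gradient)
```

With a step of this size, the error is dominated by the truncation term, which is proportional to the step times the second derivative, so differences of the order of 1e-7 are expected.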

Calculate an approximation of the Hessian by finite differences.

In [14]:
H_fd = tools.findiff_H(myFunction, x)

In [15]:
H_fd

array([[-0.8264462 ,  0.        ],
       [ 0.        ,  3.00416619]])

Check the precision of the approximation by comparing it with the analytical Hessian.

In [16]:
H - H_fd

array([[-8.26465610e-08,  0.00000000e+00],
       [ 0.00000000e+00, -1.64594663e-07]])
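One common way to approximate a Hessian is to apply forward differences to the analytical gradient, one column per variable. The sketch below illustrates that technique under stated assumptions: `forward_difference_hessian`, `example_gradient`, and the step `tau` are hypothetical names and choices, and nothing here is claimed to match the internals of `tools.findiff_H`.

```python
import numpy as np


def forward_difference_hessian(grad_func, x, tau=1.0e-7):
    """Approximate the Hessian by forward differences on the gradient.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    g0 = np.asarray(grad_func(x), dtype=float)
    n = x.size
    hessian = np.empty((n, n))
    for j in range(n):
        step = tau * max(abs(x[j]), 1.0)
        x_shifted = x.copy()
        x_shifted[j] += step
        # Column j holds the derivatives of the gradient with respect to x_j.
        hessian[:, j] = (np.asarray(grad_func(x_shifted)) - g0) / step
    return hessian


def example_gradient(x):
    # Analytical gradient of f = log(x_0) + exp(x_1).
    return np.array([1.0 / x[0], np.exp(x[1])])


point = np.array([1.1, 1.1])
approx_hessian = forward_difference_hessian(example_gradient, point)
exact_hessian = np.array([[-1.0 / point[0] ** 2, 0.0],
                          [0.0, np.exp(point[1])]])
print(exact_hessian - approx_hessian)
```

The off-diagonal entries are exactly zero here because each component of the gradient depends on a single variable.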

There is a function that checks the analytical derivatives by comparing them to their finite difference approximation.

In [17]:
f, g, h, gdiff, hdiff = \
    tools.checkDerivatives(myFunction, x, names=None, logg=True)

x		Gradient	FinDiff		Difference 


x[0]           	+9.090909E-01	+9.090909E-01	+4.185955E-08 


x[1]           	+3.004166E+00	+3.004166E+00	-1.645947E-07 


Row		Col		Hessian	FinDiff		Difference 


x[0]           	x[0]           	-8.264463E-01	-8.264462E-01	-8.264656E-08 


x[0]           	x[1]           	+0.000000E+00	+0.000000E+00	+0.000000E+00 


x[1]           	x[0]           	+0.000000E+00	+0.000000E+00	+0.000000E+00 


x[1]           	x[1]           	+3.004166E+00	+3.004166E+00	-1.645947E-07 


To make the report easier to read, names can be given to the variables.

In [18]:
f, g, h, gdiff, hdiff = tools.checkDerivatives(myFunction,
                                               x,
                                               names=['First',
                                                      'Second'],
                                               logg=True)

x		Gradient	FinDiff		Difference 


First          	+9.090909E-01	+9.090909E-01	+4.185955E-08 


Second         	+3.004166E+00	+3.004166E+00	-1.645947E-07 


Row		Col		Hessian	FinDiff		Difference 


First          	First          	-8.264463E-01	-8.264462E-01	-8.264656E-08 


First          	Second         	+0.000000E+00	+0.000000E+00	+0.000000E+00 


Second         	First          	+0.000000E+00	+0.000000E+00	+0.000000E+00 


Second         	Second         	+3.004166E+00	+3.004166E+00	-1.645947E-07 


In [19]:
gdiff

array([ 4.18595454e-08, -1.64594663e-07])

In [20]:
hdiff

array([[-8.26465610e-08,  0.00000000e+00],
       [ 0.00000000e+00, -1.64594663e-07]])

## Prime numbers

Calculate the prime numbers less than or equal to an upper bound.

In [21]:
myprimes = tools.calculate_prime_numbers(10)

In [22]:
myprimes

[2, 3, 5, 7]

In [23]:
myprimes = tools.calculate_prime_numbers(100)

In [24]:
myprimes

[2,
 3,
 5,
 7,
 11,
 13,
 17,
 19,
 23,
 29,
 31,
 37,
 41,
 43,
 47,
 53,
 59,
 61,
 67,
 71,
 73,
 79,
 83,
 89,
 97]
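For readers who want to see how such a list can be produced, the following is a minimal sketch of the sieve of Eratosthenes. It is a swapped-in textbook technique, not necessarily the algorithm behind `tools.calculate_prime_numbers`, and the name `primes_up_to` is hypothetical.

```python
def primes_up_to(upper_bound):
    """Return the primes less than or equal to upper_bound.

    Hypothetical helper implementing the sieve of Eratosthenes.
    """
    if upper_bound < 2:
        return []
    is_prime = [True] * (upper_bound + 1)
    is_prime[0] = is_prime[1] = False
    candidate = 2
    while candidate * candidate <= upper_bound:
        if is_prime[candidate]:
            # Cross out the multiples, starting at the square of the candidate.
            for multiple in range(candidate * candidate,
                                  upper_bound + 1,
                                  candidate):
                is_prime[multiple] = False
        candidate += 1
    return [number for number, flag in enumerate(is_prime) if flag]


print(primes_up_to(10))
```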

Calculate the first prime numbers, up to a given count.

In [25]:
myprimes = tools.get_prime_numbers(7)
myprimes

[2, 3, 5, 7, 11, 13, 17]

## Counting groups of data

In [26]:
alist = [1, 2, 2, 3, 3, 3, 4, 1, 1]

In [27]:
df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 1, 2, 3], 
                   'value':[1000, 
                            2000, 
                            3000, 
                            4000, 
                            5000, 
                            5000, 
                            10000, 
                            20000]})

In [28]:
tools.countNumberOfGroups(df,'ID')

6

In [29]:
tools.countNumberOfGroups(df,'value')

7
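Both results above are consistent with counting runs of consecutive identical values in the column: the `ID` column splits into the runs `1, 1 | 2 | 3, 3 | 1 | 2 | 3`, which gives 6. The sketch below reproduces that interpretation with pandas. It is an assumption about the behavior, inferred from the two outputs, and `count_consecutive_groups` is a hypothetical name, not part of biogeme.

```python
import pandas as pd


def count_consecutive_groups(df, column):
    """Count runs of consecutive identical values in a column.

    Hypothetical helper: a new group starts whenever the value differs
    from the one in the previous row.
    """
    values = df[column]
    # The first row is compared with NaN, so it always starts a group.
    starts_new_group = values != values.shift()
    return int(starts_new_group.sum())


example_df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 1, 2, 3],
                           'value': [1000, 2000, 3000, 4000,
                                     5000, 5000, 10000, 20000]})
print(count_consecutive_groups(example_df, 'ID'))
print(count_consecutive_groups(example_df, 'value'))
```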

## Likelihood ratio test

In [30]:
model1 = (-1340.8, 5)
model2 = (-1338.49, 7)

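A likelihood ratio test is performed. Each model is described by a tuple containing its final log likelihood and its number of estimated parameters. The function returns the outcome of the test, the statistic, and the threshold.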

In [31]:
tools.likelihood_ratio_test(model1, model2)

LRTuple(message='H0 cannot be rejected at level 5.0%', statistic=4.619999999999891, threshold=5.991464547107979)

The default level of significance is 0.05. It can be changed.

In [32]:
tools.likelihood_ratio_test(model1, model2, significance_level=0.9)

LRTuple(message='H0 can be rejected at level 90.0%', statistic=4.619999999999891, threshold=0.21072103131565265)
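The numbers reported above can be checked by hand. The statistic is $-2(\mathcal{L}_R - \mathcal{L}_U)$, where $\mathcal{L}_R$ and $\mathcal{L}_U$ are the log likelihoods of the restricted and unrestricted models, and the threshold is the $1-\alpha$ quantile of a $\chi^2$ distribution whose number of degrees of freedom is the difference in the number of parameters. The sketch below is not biogeme's code, and its function names are hypothetical. It relies on the fact that, for exactly two degrees of freedom (7 - 5 here), this quantile has the closed form $-2\ln\alpha$; for any other number of degrees of freedom, a statistical library would be needed.

```python
import math


def lr_statistic(loglike_restricted, loglike_unrestricted):
    """Likelihood ratio statistic (hypothetical helper)."""
    return -2.0 * (loglike_restricted - loglike_unrestricted)


def chi2_threshold_two_dof(alpha):
    """Quantile 1 - alpha of a chi-square distribution with 2 degrees
    of freedom. The closed form is valid ONLY for 2 degrees of freedom.
    """
    return -2.0 * math.log(alpha)


statistic = lr_statistic(-1340.8, -1338.49)
threshold_5_percent = chi2_threshold_two_dof(0.05)
threshold_90_percent = chi2_threshold_two_dof(0.9)

# H0 is rejected when the statistic exceeds the threshold.
print(statistic, threshold_5_percent, statistic > threshold_5_percent)
print(statistic, threshold_90_percent, statistic > threshold_90_percent)
```

The statistic of about 4.62 lies below the threshold for a level of 5% and above the threshold for a level of 90%, which matches the two messages returned by the function.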

The order in which the models are presented is irrelevant.

In [33]:
tools.likelihood_ratio_test(model2, model1)

LRTuple(message='H0 cannot be rejected at level 5.0%', statistic=4.619999999999891, threshold=5.991464547107979)

However, the unrestricted model, that is, the one with more parameters, must have a higher log likelihood than the restricted one. Otherwise, an exception is raised.

In [34]:
model1 = (-1340.8, 7)
model2 = (-1338.49, 5)

In [35]:
try:
    tools.likelihood_ratio_test(model1, model2)
except excep.BiogemeError as e:
    print(e)

The unrestricted model (-1340.8, 7) has a lower log likelihood than the restricted one (-1338.49, 5)
