File 02-binary_netherlands-heterogeneity


Michel Bierlaire

Sun Aug 11 17:48:19 2024




In [None]:



import pandas as pd
import biogeme.database as db
import biogeme.biogeme as bio
from IPython.display import display
from biogeme.expressions import Beta, Variable, log, exp


The goal of this computer session is to investigate the
heterogeneity of taste in the population.

We are using the binary transportation mode choice data collected in
the Netherlands. The data set is available at
http://transp-or.epfl.ch/data/netherlands.dat
and its description is available at
http://transp-or.epfl.ch/documents/technicalReports/CS_NetherlandsDescription.pdf.

# Data preparation

In [None]:
df = pd.read_csv('netherlands.dat', sep='\t')
database = db.Database('netherlands', df)

sp = Variable('sp')
rail_ivtt = Variable('rail_ivtt')
rail_acc_time = Variable('rail_acc_time')
rail_egr_time = Variable('rail_egr_time')
car_ivtt = Variable('car_ivtt')
car_walk_time = Variable('car_walk_time')
car_cost = Variable('car_cost')
rail_cost = Variable('rail_cost')
choice = Variable('choice')

# Remove the observations where sp is nonzero.
exclude = sp != 0
database.remove(exclude)

# Total travel time for each mode.
rail_time = rail_ivtt + rail_acc_time + rail_egr_time
car_time = car_ivtt + car_walk_time

# Convert the costs from Dutch guilders to euros.
DUTCH_GUILDERS_TO_EUROS = 0.44378022
car_cost_euro = car_cost * DUTCH_GUILDERS_TO_EUROS
rail_cost_euro = rail_cost * DUTCH_GUILDERS_TO_EUROS


# Base model
We first define the base model, where no heterogeneity of taste is considered.
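
In formula terms, the specification coded in the next cell is a binary logit model with an alternative specific constant for car, a generic cost coefficient, and alternative specific time coefficients. The symbols below are notation introduced here for readability (they are not part of the original notebook); they mirror the code identifiers `asc_car`, `beta_cost`, `beta_time_car`, `beta_time_rail`, `car_cost_euro`, `rail_cost_euro`, `car_time` and `rail_time`.

```latex
\begin{aligned}
V_{\text{car}}  &= \text{ASC}_{\text{car}} + \beta_{\text{cost}}\, c_{\text{car}} + \beta_{\text{time,car}}\, t_{\text{car}}, \\
V_{\text{rail}} &= \beta_{\text{cost}}\, c_{\text{rail}} + \beta_{\text{time,rail}}\, t_{\text{rail}}, \\
P(\text{car})   &= \frac{1}{1 + e^{V_{\text{rail}} - V_{\text{car}}}}, \qquad
P(\text{rail}) = 1 - P(\text{car}).
\end{aligned}
```

The utility of rail has no constant, so it serves as the reference alternative for `asc_car`.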

In [None]:
asc_car = Beta('asc_car', 0, None, None, 0)
beta_cost = Beta('beta_cost', 0, None, None, 0)
beta_time_car = Beta('beta_time_car', 0, None, None, 0)
beta_time_rail = Beta('beta_time_rail', 0, None, None, 0)

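# Utility functions. Rail is the reference alternative for the constant.
v_car = asc_car + beta_cost * car_cost_euro + beta_time_car * car_time
v_rail = beta_cost * rail_cost_euro + beta_time_rail * rail_time

# Binary logit choice probabilities.
prob_car = 1 / (1 + exp(v_rail - v_car))
prob_rail = 1 - prob_car

# Contribution of each observation to the log likelihood:
# choice == 0 selects the car probability, choice == 1 the rail probability.
prob_observation = prob_car * (choice == 0) + prob_rail * (choice == 1)
logprob = log(prob_observation)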
biogeme = bio.BIOGEME(database, logprob)
biogeme.modelName = 'binary_netherlands_socio_eco_base'
results_base = biogeme.estimate()

print(results_base.print_general_statistics())


In [None]:
display(results_base.get_estimated_parameters())


# Questions
1. Split the database into two parts, one with only business trips and one with only non-business trips, and
re-estimate the base model on each of them.
2. Write a model that includes two sets of parameters, one for business trips and one for other purposes, and estimate it
on the full data set. Compare the results with the models estimated on the separate data sets.
3. Now constrain the coefficient of travel time by car to be the same for business trips and non-business trips. Which
model would you prefer?
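
For question 3, one common way to compare the restricted model (common car travel time coefficient) with the unrestricted model from question 2 is a likelihood ratio test. The sketch below is a minimal helper under stated assumptions, not part of the original notebook: the function name `likelihood_ratio_test_1df` is hypothetical, it uses only the standard library, and it covers only a single equality restriction, so the statistic is compared with a chi-square distribution with one degree of freedom. The log likelihood values in the usage lines are placeholders, not estimation results.

```python
import math


def likelihood_ratio_test_1df(loglike_restricted, loglike_unrestricted):
    """Likelihood ratio test for one linear restriction (hypothetical helper).

    Returns the test statistic and its p-value under the null hypothesis
    that the restriction holds.
    """
    statistic = -2.0 * (loglike_restricted - loglike_unrestricted)
    if statistic < 0:
        # The restricted model cannot fit better than the unrestricted one.
        raise ValueError('The restricted log likelihood exceeds the unrestricted one.')
    # With one degree of freedom, the chi-square variable is a squared
    # standard normal, so P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Placeholder values for illustration only; replace them with the final
# log likelihoods reported by the two estimated models.
statistic, p_value = likelihood_ratio_test_1df(
    loglike_restricted=-120.0, loglike_unrestricted=-118.0
)
print(f'LR statistic = {statistic:.3f}, p-value = {p_value:.4f}')
```

A small p-value (for instance below 0.05) suggests rejecting the restriction, that is, preferring the model with separate car travel time coefficients. The plausibility of the estimated signs and magnitudes should be examined alongside the test.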