# Using NP4VTT: Random valuation (RV) model

In this notebook, we show how to use an RV model to estimate the distribution of the Value of Travel Time (VTT) from the Norway data.

## Step 1: Load modules and data, and create arrays

We first import the NP4VTT modules that create the arrays and define the RV model, along with pandas to load the dataset:

In [11]:
import pandas as pd

from py_np4vtt.data_format import Vars
from py_np4vtt.model_rv import ModelRV, ConfigRV
from py_np4vtt.data_import import make_modelarrays, compute_descriptives


Now we read the dataset, which is stored as a tab-delimited text file:

In [12]:
df = pd.read_table('../data/Norway2009VTT_demodata.txt')
df.head()

Unnamed: 0,RespID,Mode,TravTime,BaseCost,Gender,AgeClass,IncClass,TravTimeClass,Purpose,CardID,...,TimeL,TimeR,Chosen,Quadrant2,Purpose2,Mode2,Income2,ExclGroup,Exclude_CDF,CS
0,88,1,25,27,1,3,5,1,1,1,...,32,25,1,4,1,1,3,4,1,1
1,88,1,25,27,1,3,5,1,1,2,...,25,28,2,2,1,1,3,4,1,2
2,88,1,25,27,1,3,5,1,1,3,...,29,25,1,4,1,1,3,4,1,3
3,88,1,25,27,1,3,5,1,1,4,...,32,25,1,2,1,1,3,4,1,4
4,88,1,25,27,1,3,5,1,1,5,...,29,32,2,2,1,1,3,4,1,5


The dataset contains 22 variables. Each row is a binary choice task. We will use:

* `RespID`: ID of each respondent.
* `Chosen`: Chosen alternative.
* `CostL` and `CostR`: Travel cost of alternatives 1 and 2, respectively.
* `TimeL` and `TimeR`: Travel time of alternatives 1 and 2, respectively.

NP4VTT automatically detects the _slow-cheap_ and the _fast-expensive_ alternative in each choice situation. If a choice situation does not fit this pattern (e.g., it contains a fast-cheap alternative), NP4VTT raises an error.
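The check described above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not NP4VTT's actual code: the function name `classify_alternatives` and the toy numbers are assumptions made for this example.

```python
import numpy as np

def classify_alternatives(cost1, cost2, time1, time2):
    """Return a boolean array that is True where alternative 1 is fast-expensive.

    Hypothetical helper (not part of NP4VTT). Raises ValueError when a choice
    situation offers no slow-cheap vs. fast-expensive trade-off.
    """
    cost1, cost2 = np.asarray(cost1, float), np.asarray(cost2, float)
    time1, time2 = np.asarray(time1, float), np.asarray(time2, float)

    # An alternative is fast-expensive if it is strictly faster AND strictly costlier.
    alt1_fast_exp = (time1 < time2) & (cost1 > cost2)
    alt2_fast_exp = (time2 < time1) & (cost2 > cost1)

    # Any row where neither alternative qualifies violates the required pattern.
    violations = ~(alt1_fast_exp | alt2_fast_exp)
    if violations.any():
        rows = np.flatnonzero(violations).tolist()
        raise ValueError(f"No slow-cheap/fast-expensive trade-off in rows {rows}")
    return alt1_fast_exp

# Toy data (made up): alternative 1 is fast-expensive in row 0, alternative 2 in row 1.
alt1_is_fast_exp = classify_alternatives(
    cost1=[5.0, 2.0], cost2=[3.0, 4.0], time1=[0.4, 0.6], time2=[0.5, 0.45]
)
print(alt1_is_fast_exp)
```

Passing a row in which one alternative is both faster and cheaper (for example `cost1=[1.0], cost2=[2.0], time1=[0.3], time2=[0.5]`) makes this sketch raise a `ValueError`.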

We first convert travel costs from Norwegian kroner (NOK) to euros and travel times from minutes to hours:

In [13]:
# Convert travel costs from NOK to euros (exchange rate of 9 NOK per euro)
NOK2euro_exchange_rate = 9
df[['CostL', 'CostR']] = df[['CostL', 'CostR']].div(NOK2euro_exchange_rate)

# Convert travel times from minutes to hours
df[['TimeL', 'TimeR']] = df[['TimeL', 'TimeR']].div(60)

Now we create a dictionary that maps the variables required by NP4VTT to the columns of the dataset:

In [14]:
columnarrays = {
    Vars.Id: 'RespID',
    Vars.ChosenAlt: 'Chosen',
    Vars.Cost1: 'CostL',
    Vars.Cost2: 'CostR',
    Vars.Time1: 'TimeL',
    Vars.Time2: 'TimeR',
}

And we create the required arrays:

In [15]:
model_arrays = make_modelarrays(df, columnarrays)

The function `make_modelarrays` creates six elements used by NP4VTT to estimate/train a model:

* `BVTT`: The Boundary VTT of each choice situation, computed from the costs and travel times.
* `Choice`: A matrix of dummy variables equal to one if the respondent chose the fast-expensive alternative in a given choice situation.
* `Accepts`: Number of times each respondent chose the fast-expensive alternative.
* `ID`: Unique identifier of each respondent.
* `NP`: Number of respondents in the dataset.
* `T`: Number of choice situations per respondent.
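To make the first three arrays concrete, here is a minimal sketch that builds them by hand for a toy dataset. It assumes the usual definition of the Boundary VTT as the extra cost per unit of time saved, and it is not the code inside `make_modelarrays`; all numbers and variable names below are made up for illustration.

```python
import numpy as np

# Toy data (made up): 2 respondents x 3 choice situations, in euros and hours.
cost1 = np.array([[5.0, 2.0, 6.0], [4.0, 3.0, 2.0]])
cost2 = np.array([[3.0, 4.0, 3.0], [2.0, 6.0, 5.0]])
time1 = np.array([[0.4, 0.6, 0.5], [0.5, 0.75, 1.0]])
time2 = np.array([[0.5, 0.5, 1.0], [1.0, 0.5, 0.5]])
chosen = np.array([[1, 1, 2], [1, 2, 2]])  # 1 = alternative 1, 2 = alternative 2

# Boundary VTT: cost difference divided by time saved (euros per hour).
# The minus sign makes it positive whenever one alternative is fast-expensive.
bvtt = -(cost1 - cost2) / (time1 - time2)

# Dummy equal to 1 when the respondent chose the fast-expensive alternative.
alt1_is_fast = time1 < time2
choice = np.where(alt1_is_fast, chosen == 1, chosen == 2).astype(int)

# Number of fast-expensive choices per respondent.
accepts = choice.sum(axis=1)

print(bvtt)
print(choice)
print(accepts)
```

A respondent who accepts the fast-expensive alternative reveals a VTT above the BVTT of that choice situation, which is why these three arrays are enough to estimate the model.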

## Step 2: Compute descriptives

The function `compute_descriptives` provides a small overview of the dataset characteristics:

In [16]:
descriptives = compute_descriptives(model_arrays)
print(descriptives)

Balanced panel: True
No. individuals: 5832
Sets per indiv.: 9

Number of non-traders:
Fast-exp. alt.: 144
Slow-cheap alt.: 808

BVTT statistics:
Mean chosen BVTT: 10.2977
Minimum of BVTT: 0.6667
Maximum of BVTT: 113.5632


## Step 3: Configure an RV model

The RV model requires the following parameters from the user:

* `startScale`: The starting value of the scale parameter.
* `startVTT`: The starting value of the VTT parameter.
* `maxIterations`: Maximum number of iterations of the maximum likelihood estimation routine.


`ConfigRV` takes the configuration parameters of the RV model and creates an object that is used by the optimisation routine:

In [17]:
config = ConfigRV(startScale=0, startVTT=1, maxIterations=10000)

Now we create the RV model object from the configuration parameters and the data arrays. Creating the object also initialises the arguments and the initial value of the likelihood function:

In [18]:
rv = ModelRV(config, model_arrays)

## Step 4: Estimate an RV model

Once the RV model is initialised, the `run` method starts the optimisation process:

In [19]:
x, se, init_ll, ll, exitflag, est_time = rv.run()

Initial F-value: 36381.91
Iter No. 1: F-value: 36028.77 / Step size: 0.0 / G-norm: 96071780265.34265
Iter No. 2: F-value: 31186.54 / Step size: 0.007812 / G-norm: 991353.857739
Iter No. 3: F-value: 30451.17 / Step size: 1 / G-norm: 838.343691
Iter No. 4: F-value: 28647.52 / Step size: 0.5 / G-norm: 5201.306707
Iter No. 5: F-value: 28633.81 / Step size: 1 / G-norm: 233.083029
Iter No. 6: F-value: 28571.24 / Step size: 1 / G-norm: 134.796506
Iter No. 7: F-value: 28565.15 / Step size: 1 / G-norm: 7.037818
Iter No. 8: F-value: 28558.26 / Step size: 1 / G-norm: 14.345597
Iter No. 9: F-value: 28558.24 / Step size: 1 / G-norm: 0.056745
Iter No. 10: F-value: 28558.24 / Step size: 1 / G-norm: 0.000256
Iter No. 11: F-value: 28558.24 / Step size: 1 / G-norm: 4e-06
Iter No. 12: F-value: 28558.24 / Step size: 0.25 / G-norm: 0.0

Local minimum found. G-norm below tolerance


The `run` method returns the following information:

* `x`: The estimated parameters, in this order: the scale and the VTT.
* `se`: The standard errors of the estimated parameters.
* `init_ll`: Value of the log-likelihood function evaluated at the starting values.
* `ll`: Value of the log-likelihood function at the optimum.
* `exitflag`: Exit flag of the optimisation routine. If `exitflag=0`, the optimisation routine succeeded.
* `est_time`: The estimation time in seconds.
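A quick sanity check on the initial value is possible under one assumption that this notebook does not state explicitly: that the RV model is a binary logit in which the scale multiplies the systematic part of the choice rule. With `startScale=0`, every choice would then have probability 0.5, so the starting log-likelihood should equal the number of observations times `log(0.5)`. The sketch below uses only the counts printed in Step 2.

```python
import math

n_individuals = 5832  # "No. individuals" from the descriptives
n_sets = 9            # "Sets per indiv." from the descriptives

# With a scale of zero, each of the N x T choices contributes log(0.5).
null_ll = n_individuals * n_sets * math.log(0.5)
print(round(null_ll, 2))  # -36381.91
```

This agrees with the `Initial F-value` of 36381.91 printed by the optimiser, which suggests that the F-value is the negative of the log-likelihood.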

The following lines present the estimated results:

In [20]:
import numpy as np

# Create dataframe
results = pd.DataFrame(np.c_[x, se], columns=['Estimate', 'Std.Err'], index=['Scale', 'VTT'])

print('Estimation results:\n')
print('Initial log-likelihood: ' + str(round(init_ll,2)))
print('Final log-likelihood: ' + str(round(ll,2)))
print('Estimation time: ' + str(round(est_time,4)) + ' seconds.'+'\n')
print('Estimates:')
print(results)

Estimation results:

Initial log-likelihood: -36381.91
Final log-likelihood: -28558.24
Estimation time: 0.1249 seconds.

Estimates:
       Estimate   Std.Err
Scale  0.084008  0.001039
VTT    7.958176  0.133105


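The mean VTT can be read directly from the estimated `VTT` parameter of the RV model. Because costs were converted to euros and travel times to hours, the estimate corresponds to a mean VTT of about 7.96 euros per hour.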
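The standard error can be turned into a rough measure of precision. The sketch below assumes the usual asymptotic normality of maximum likelihood estimates, which is an assumption added here and not a feature of NP4VTT; the two numbers are copied from the table of estimates above.

```python
# Estimate and standard error of the VTT parameter, copied from the results table.
vtt_hat = 7.958176
vtt_se = 0.133105

# Approximate 95% confidence interval under asymptotic normality (assumption).
z = 1.96
ci_low = vtt_hat - z * vtt_se
ci_high = vtt_hat + z * vtt_se

# Ratio of the estimate to its standard error (t-ratio against zero).
t_ratio = vtt_hat / vtt_se

print(f"95% CI for the mean VTT: [{ci_low:.2f}, {ci_high:.2f}] euros per hour")
print(f"t-ratio: {t_ratio:.1f}")
```

In a live session, the same quantities can be computed from the second row of the `results` dataframe instead of hard-coded values.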