## Conventional Modes - Choice Model

Variables:
    Travel time: cartime, transittime, rdtime, walktime, biketime
    Travel cost: carfeetotal, transitcost, rdcost (walk and bike have no cost)

Utility functions:

    ASC_CAR + B_CARTIME*cartime + B_COST*carfeetotal
    
    ASC_TRANSIT + B_TRANSITTIME*transittime + B_COST*transitcost
    
    ASC_RH + B_RHTIME*rdtime + B_COST*rdcost
    
    ASC_WALK + B_WALKTIME*walktime
    
    ASC_BIKE + B_BIKETIME*biketime
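
As a sketch of how these utilities become choice probabilities in a multinomial logit, the snippet below evaluates them for one trip. All coefficients and attribute values are hypothetical (they are not estimates from this notebook), and times are assumed to be in minutes and costs in dollars.

```python
import math

# Hypothetical coefficients, for illustration only
ASC = {"car": 0.0, "transit": -1.4, "rh": -2.0, "walk": -2.1, "bike": -0.2}
B_TIME = {"car": -0.02, "transit": -0.01, "rh": -0.01, "walk": -0.05, "bike": -0.03}
B_COST = -0.05  # one cost coefficient shared by car, transit and ridehailing

# Hypothetical trip attributes: time in minutes, cost in dollars (assumed units)
time = {"car": 20, "transit": 35, "rh": 18, "walk": 60, "bike": 30}
cost = {"car": 8.0, "transit": 2.5, "rh": 15.0}  # walk and bike have no cost term

# Systematic utility of each mode, following the utility functions above
V = {m: ASC[m] + B_TIME[m] * time[m] + B_COST * cost.get(m, 0.0) for m in ASC}

# Multinomial logit: P_i = exp(V_i) / sum_j exp(V_j)
v_max = max(V.values())  # subtracting the maximum avoids overflow in exp
exp_v = {m: math.exp(v - v_max) for m, v in V.items()}
total = sum(exp_v.values())
P = {m: e / total for m, e in exp_v.items()}

for mode, p in P.items():
    print(f"{mode:8s} V = {V[mode]:6.3f}  P = {p:.3f}")
```

Only the relative sizes of the utilities matter: the mode with the highest utility (car here) gets the highest probability, but every mode keeps a positive share.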

### Importing Packages and Data

```python
import math
from datetime import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import biogeme.biogeme as bio
import biogeme.database as db
import biogeme.distributions as dist
import biogeme.messaging as msg  # biogeme's logger (older biogeme releases)
from biogeme import models
from biogeme.expressions import (
    Beta,
    DefineVariable,
    bioDraws,
    PanelLikelihoodTrajectory,
    MonteCarlo,
    log,
    Derive,
)
```

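```python
# Conventional-mode SP data alone
# (the file name's case must match the file on disk on case-sensitive file systems)
con_sp = pd.read_csv('Data/ConventionalOnly.CSV')
df = con_sp  # same data, used for the exploratory summaries below
con_sp
```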

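```python
# biogeme database built from the pandas DataFrame (biogeme requires numeric columns)
database = db.Database('con_sp', con_sp)
# The data are organized as panel data; the variable `who` identifies each individual.
# To declare the panel structure, first sort the data by individual, then:
# database.panel("who")
# Make every column available as a biogeme Variable with the same name (cartime, choice, ...)
globals().update(database.variables)
```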

### Data Visualizations

```python
# Sample size: database.getSampleSize()

# Number of observations choosing each alternative (codes used in `choice`)
choice_counts = df['choice'].value_counts().sort_index()
print(choice_counts)
choice_counts.plot(kind='bar')
```


## Model Specification
### Defining the Parameters to be Estimated

Notes about the function Beta
- takes five arguments
- first argument = name of the parameter; it is advisable to use the same name as the Python variable
- second argument = starting value of the parameter estimate (usually 0)
- third argument = lower bound on the parameter (`None` if unbounded)
- fourth argument = upper bound on the parameter (`None` if unbounded)
- fifth argument = status, 0 or 1: 0 to estimate the parameter; 1 to keep it fixed at its starting value

Questions
- can you fix the starting value of a parameter to the value of another parameter?

```python
# Alternative-specific constants; ASC_CAR is fixed to 0 (status 1) as the reference.
# Note: 'ASC_Transit', 'ASC_Bike' and 'B_KIDS_Bike' differ in case from their Python
# variables; the results tables below use these names.
ASC_CAR = Beta('ASC_CAR', 0, None, None, 1)
ASC_TRANSIT = Beta('ASC_Transit', 0, None, None, 0)
ASC_RH = Beta('ASC_RH', 0, None, None, 0)
ASC_WALK = Beta('ASC_WALK', 0, None, None, 0)
ASC_BIKE = Beta('ASC_Bike', 0, None, None, 0)

# Travel-time coefficients, one per mode
B_CARTIME = Beta('B_CARTIME', 0, None, None, 0)
B_TRANSITTIME = Beta('B_TRANSITTIME', 0, None, None, 0)
B_RHTIME = Beta('B_RHTIME', 0, None, None, 0)
B_WALKTIME = Beta('B_WALKTIME', 0, None, None, 0)
B_BIKETIME = Beta('B_BIKETIME', 0, None, None, 0)

# Cost coefficients: B_COST is shared in the base model; B_COST2 and B_COST3
# are the transit and ridehailing cost coefficients in the varying-cost model
B_COST = Beta('B_COST', 0, None, None, 0)
B_COST2 = Beta('B_COST2', 0, None, None, 0)
B_COST3 = Beta('B_COST3', 0, None, None, 0)

# Age coefficients, one per mode
B_AGE_CAR = Beta('B_AGE_CAR', 0, None, None, 0)
B_AGE_TRANSIT = Beta('B_AGE_TRANSIT', 0, None, None, 0)
B_AGE_RH = Beta('B_AGE_RH', 0, None, None, 0)
B_AGE_WALK = Beta('B_AGE_WALK', 0, None, None, 0)
B_AGE_BIKE = Beta('B_AGE_BIKE', 0, None, None, 0)

# Child (kids) coefficients, one per mode
B_KIDS_CAR = Beta('B_KIDS_CAR', 0, None, None, 0)
B_KIDS_TRANSIT = Beta('B_KIDS_TRANSIT', 0, None, None, 0)
B_KIDS_RH = Beta('B_KIDS_RH', 0, None, None, 0)
B_KIDS_WALK = Beta('B_KIDS_WALK', 0, None, None, 0)
B_KIDS_BIKE = Beta('B_KIDS_Bike', 0, None, None, 0)
```
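
Logit probabilities depend only on differences in utility, so adding the same constant to every alternative changes nothing; this is why one constant (ASC_CAR) has status 1. The same argument applies to a variable such as age, whose value is identical for every alternative: shifting all five B_AGE_* coefficients by the same amount leaves every probability unchanged, so only their differences from a reference alternative can be estimated. A small check with hypothetical numbers:

```python
import math

def mnl(utilities):
    """Multinomial logit probabilities for a list of utilities."""
    v_max = max(utilities)
    exp_v = [math.exp(v - v_max) for v in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]

age = 40  # hypothetical respondent age
asc = [0.0, -1.4, -2.0, -2.1, -0.2]            # car, transit, rh, walk, bike (hypothetical)
b_age = [0.005, 0.002, -0.013, 0.010, -0.004]  # hypothetical age coefficients

def utilities(asc, b_age):
    """Utility of each alternative: constant plus age effect."""
    return [a + b * age for a, b in zip(asc, b_age)]

base = mnl(utilities(asc, b_age))
asc_shifted = mnl(utilities([a + 1.0 for a in asc], b_age))   # every ASC + 1
age_shifted = mnl(utilities(asc, [b + 0.01 for b in b_age]))  # every age coefficient + 0.01

print([round(p, 4) for p in base])
```

All three probability vectors are the same, so the data cannot tell these parameter values apart.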

### Defining the Utility Functions

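```python
# Systematic utilities: travel time and cost plus age and child effects.
# Walk and bike have no cost term. Because age and child take the same value
# for every alternative, only differences between the B_AGE_* (and B_KIDS_*)
# coefficients are identified; fixing one of each (status 1) would normalize them.
V1 = ASC_CAR + B_CARTIME*cartime + B_COST*carfeetotal + B_AGE_CAR*age + B_KIDS_CAR*child
V2 = ASC_TRANSIT + B_TRANSITTIME*transittime + B_COST*transitcost + B_AGE_TRANSIT*age + B_KIDS_TRANSIT*child
V3 = ASC_RH + B_RHTIME*rdtime + B_COST*rdcost + B_AGE_RH*age + B_KIDS_RH*child
V4 = ASC_WALK + B_WALKTIME*walktime + B_AGE_WALK*age + B_KIDS_WALK*child
V5 = ASC_BIKE + B_BIKETIME*biketime + B_AGE_BIKE*age + B_KIDS_BIKE*child
```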

2/14 - notes
ways of modeling heterogeneity:
1. age (as age increases, utility of ridehailing decreases)
2. trip purpose
3. gender?
4. kids?
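
For the first idea, the estimates of model `01conv_sp_alone` (results table further below) can be read as an odds ratio. In a logit model the log-odds of ridehailing versus car change by B_AGE_RH - B_AGE_CAR per year of age, holding everything else fixed; this difference is what the model identifies. A minimal worked calculation:

```python
import math

# Estimates from the results table of model '01conv_sp_alone'
B_AGE_CAR = 0.005360
B_AGE_RH = -0.013268

# Change in the log-odds of ridehailing vs car per extra year of age
delta_per_year = B_AGE_RH - B_AGE_CAR

# Odds ratio for a 10-year age difference, all else equal
odds_ratio_10y = math.exp(10 * delta_per_year)
print(f"{delta_per_year:.6f} per year, odds ratio over 10 years: {odds_ratio_10y:.3f}")
```

An odds ratio below 1 means the odds of choosing ridehailing over car fall as age increases, in line with the hypothesis in item 1.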

### Associating the Utility Functions with the Numbering of the Alternatives

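```python
# Keys must match the codes in the `choice` column (1 = car, 2 = transit, 3 = ridehailing, 4 = walk, 5 = bike)
V = {1: V1, 2: V2, 3: V3, 4: V4, 5: V5}
```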

### Associating the Availability Conditions with the Alternatives

```python
# 0/1 availability columns; an unavailable alternative is excluded from the choice set
av = {1: car_av, 2: transit_av, 3: rd_av, 4: walk_av, 5: bike_av}
```
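
Availability enters the logit formula by dropping unavailable alternatives from the denominator: they get probability zero and the remaining probabilities are rescaled. A sketch with hypothetical utilities, assuming 0/1 flags like car_av ... bike_av:

```python
import math

def mnl_with_availability(V, av):
    """Logit probabilities where av[i] == 0 removes alternative i from the choice set."""
    v_max = max(v for v, a in zip(V, av) if a)
    exp_v = [math.exp(v - v_max) if a else 0.0 for v, a in zip(V, av)]
    total = sum(exp_v)
    return [e / total for e in exp_v]

V = [-0.8, -1.9, -2.9, -5.1, -1.1]  # hypothetical utilities: car, transit, rh, walk, bike
all_available = mnl_with_availability(V, [1, 1, 1, 1, 1])
no_car = mnl_with_availability(V, [0, 1, 1, 1, 1])  # e.g. a respondent without a car

print([round(p, 3) for p in all_available])
print([round(p, 3) for p in no_car])
```

The ratio between any two available alternatives (for example transit and bike) is the same in both cases, which is the independence-from-irrelevant-alternatives property of the logit model.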

### Log Logit Model

```python
# Log of the logit probability of the chosen alternative, for each observation
logprob = models.loglogit(V, av, choice)
```
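
`models.loglogit` gives, for each observation, the log of the logit probability of the chosen alternative, and BIOGEME sums these terms over the sample to form the log-likelihood it maximizes. The same quantity can be written in log-sum-exp form, log P_i = V_i - log(sum_j av_j exp(V_j)), without computing the probability first. A numpy sketch with hypothetical utilities for three observations:

```python
import numpy as np

# Hypothetical utilities (rows = observations; columns = car, transit, rh, walk, bike)
V = np.array([
    [-0.8, -1.9, -2.9, -5.1, -1.1],
    [-1.5, -1.0, -2.2, -0.7, -1.8],
    [-0.3, -2.5, -1.9, -4.0, -2.0],
])
av = np.array([
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],  # car unavailable
    [1, 1, 1, 0, 1],  # walk unavailable
])
choice = np.array([1, 4, 1])  # chosen alternative, coded 1..5 as in this notebook

def log_logit(V, av, choice):
    """Log probability of the chosen alternative for each observation."""
    V_masked = np.where(av == 1, V, -np.inf)  # unavailable alternatives drop out
    v_max = V_masked.max(axis=1, keepdims=True)
    log_denom = v_max[:, 0] + np.log(np.exp(V_masked - v_max).sum(axis=1))
    v_chosen = V[np.arange(len(choice)), choice - 1]
    return v_chosen - log_denom

log_probs = log_logit(V, av, choice)
log_likelihood = log_probs.sum()  # the quantity the estimation maximizes
print(log_probs, log_likelihood)
```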

```python
biogeme = bio.BIOGEME(database, logprob)
biogeme.modelName = '01conv_sp_alone'

# Log-likelihood of the equal-probability model, reported with the results
biogeme.calculateNullLoglikelihood(av)
model_results = biogeme.estimate()
# Estimates with robust standard errors, t-tests and p-values
pandasResults = model_results.getEstimatedParameters()
print(pandasResults)
```

```
                   Value  Rob. Std err  Rob. t-test  Rob. p-value
ASC_Bike       -0.223636      0.165803    -1.348805  1.773995e-01
ASC_RH         -2.022559      0.237190    -8.527174  0.000000e+00
ASC_Transit    -1.412554      0.148706    -9.498969  0.000000e+00
ASC_WALK       -2.144009      0.289474    -7.406579  1.296740e-13
B_AGE_BIKE     -0.004126      0.003214    -1.283692  1.992499e-01
B_AGE_CAR       0.005360      0.002286     2.345104  1.902178e-02
B_AGE_RH       -0.013268      0.004787    -2.771355  5.582358e-03
B_AGE_TRANSIT   0.001712      0.002935     0.583256  5.597212e-01
B_AGE_WALK      0.010321      0.004269     2.417878  1.561133e-02
B_BIKETIME     -0.012239      0.002037    -6.008350  1.874209e-09
B_CARTIME      -0.001868      0.003144    -0.594319  5.522990e-01
B_COST         -0.053466      0.006595    -8.107270  4.440892e-16
B_KIDS_Bike     0.078405      0.031887     2.458870  1.393749e-02
B_KIDS_CAR     -0.084029      0.027567    -3.048225  2.301977e-03
B_KIDS_RH
```
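
`biogeme.calculateNullLoglikelihood(av)` computes the log-likelihood of a model in which every available alternative is equally likely, so each observation contributes -log(number of available alternatives). A common goodness-of-fit summary compares it with the final log-likelihood: rho-square = 1 - LL(final) / LL(null). A sketch with hypothetical availability data and a hypothetical final log-likelihood:

```python
import numpy as np

# Hypothetical availability flags (rows = observations; car, transit, rh, walk, bike)
av = np.array([
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1],
])

# Equal-probability model: each observation contributes -log(#available alternatives)
null_ll = -np.log(av.sum(axis=1)).sum()

final_ll = -4.5  # hypothetical final log-likelihood of an estimated model
rho_square = 1 - final_ll / null_ll
print(round(null_ll, 3), round(rho_square, 3))
```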

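```python
# t-test (Rob. t-test): the estimate divided by its robust standard error; it tests
# whether the coefficient differs from zero (|t| > 1.96: significant at the 5% level)
# coefficient (Value): change in utility per unit increase of the predictor variable
```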
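
Both columns can be checked against the first results table: the robust t-test is the estimate divided by its robust standard error, and the p-value is two-sided under a standard normal approximation, p = 2(1 - Phi(|t|)) = erfc(|t| / sqrt(2)). Using the ASC_Bike row:

```python
import math

# ASC_Bike row of the first results table
value, rob_std_err = -0.223636, 0.165803

t_stat = value / rob_std_err
# Two-sided p-value under the standard normal: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
p_value = math.erfc(abs(t_stat) / math.sqrt(2))
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")  # |t| < 1.96, so not significant at 5%
```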

### Logit Model

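```python
# Logit probability of the chosen alternative (a value between 0 and 1)
prob = models.logit(V, av, choice)
# Its natural log; `log` is imported from biogeme.expressions (there is no `ln`)
logprob2 = log(prob)
# try with exp(V)
```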

```python
# BIOGEME maximizes the sum over observations of the expression it is given,
# treating it as the log-likelihood. Passing `prob` rather than `logprob2`
# maximizes the sum of probabilities instead, so the estimates below are not
# maximum likelihood estimates. `logprob2` gives the same model as models.loglogit.
biogeme = bio.BIOGEME(database, prob)
biogeme.modelName = 'logit_model_test'

biogeme.calculateNullLoglikelihood(av)
model_results = biogeme.estimate()
pandasResults = model_results.getEstimatedParameters()
print(pandasResults)
```

```
                    Value  Rob. Std err  Rob. t-test  Rob. p-value
ASC_Bike       181.932874     31.529783     5.770191  7.918171e-09
ASC_RH         -10.331573      3.844559    -2.687323  7.202726e-03
ASC_Transit     35.333914      8.745035     4.040454  5.334777e-05
ASC_WALK       214.479276     37.516093     5.716994  1.084250e-08
B_BIKETIME       2.182438      0.412444     5.291473  1.213351e-07
B_CARTIME       34.662214      6.107442     5.675406  1.383599e-08
B_COST        -144.383835     25.490186    -5.664291  1.476336e-08
B_RHTIME         1.657408      0.890772     1.860641  6.279489e-02
B_TRANSITTIME   -1.868569      0.823605    -2.268768  2.328245e-02
B_WALKTIME     -11.086372      1.936895    -5.723785  1.041764e-08
```
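
To see why maximizing the sum of probabilities differs from maximum likelihood, consider a constant-only binary logit on hypothetical toy data in which alternative 1 is chosen in 3 of 4 observations. The log-likelihood peaks at asc = log(3), which reproduces the observed 75% share, while the sum of probabilities keeps increasing with asc, so its maximizer runs to the edge of the search grid. This may explain the extreme magnitudes in the table above, although the sketch does not reproduce that model.

```python
import numpy as np

# Hypothetical toy data: 4 binary choices, alternative 1 chosen 3 times.
# Utilities: V1 = asc, V2 = 0 (a constant-only binary logit).
chose_alt1 = np.array([1, 1, 1, 0])

def prob_of_chosen(asc):
    """Probability of the alternative actually chosen, for each observation."""
    p1 = 1.0 / (1.0 + np.exp(-asc))
    return np.where(chose_alt1 == 1, p1, 1.0 - p1)

grid = np.linspace(-10, 10, 2001)  # candidate values of asc, step 0.01
sum_log_prob = np.array([np.log(prob_of_chosen(b)).sum() for b in grid])
sum_prob = np.array([prob_of_chosen(b).sum() for b in grid])

asc_max_loglik = grid[np.argmax(sum_log_prob)]  # maximum likelihood, near log(3)
asc_max_sumprob = grid[np.argmax(sum_prob)]     # runs to the edge of the grid
print(asc_max_loglik, np.log(3), asc_max_sumprob)
```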


### Log Logit Model - Varying Cost Parameter for Each Mode

```python
# Base specification without age and child effects,
# with a separate cost coefficient for each motorized mode
V11 = ASC_CAR + B_CARTIME*cartime + B_COST*carfeetotal
V21 = ASC_TRANSIT + B_TRANSITTIME*transittime + B_COST2*transitcost
V31 = ASC_RH + B_RHTIME*rdtime + B_COST3*rdcost
V41 = ASC_WALK + B_WALKTIME*walktime
V51 = ASC_BIKE + B_BIKETIME*biketime

V61 = {1: V11, 2: V21, 3: V31, 4: V41, 5: V51}
av = {1: car_av, 2: transit_av, 3: rd_av, 4: walk_av, 5: bike_av}

logprob = models.loglogit(V61, av, choice)
biogeme = bio.BIOGEME(database, logprob)
biogeme.modelName = 'Test with Varying Cost Param'

biogeme.calculateNullLoglikelihood(av)
model_results = biogeme.estimate()
pandasResults = model_results.getEstimatedParameters()
print(pandasResults)
```

```
                  Value  Rob. Std err  Rob. t-test  Rob. p-value
ASC_Bike      -0.527579      0.080250    -6.574179  4.892242e-11
ASC_RH        -2.703210      0.088692   -30.478752  0.000000e+00
ASC_Transit   -1.182570      0.125677    -9.409572  0.000000e+00
ASC_WALK      -2.099000      0.180567   -11.624479  0.000000e+00
B_BIKETIME    -0.011184      0.001985    -5.635253  1.748016e-08
B_CARTIME      0.001348      0.003082     0.437448  6.617867e-01
B_COST        -0.068237      0.006463   -10.558643  0.000000e+00
B_COST2       -0.212781      0.046355    -4.590256  4.427030e-06
B_COST3       -0.008367      0.009069    -0.922573  3.562298e-01
B_RHTIME      -0.000249      0.006060    -0.041094  9.672210e-01
B_TRANSITTIME -0.003049      0.002105    -1.448298  1.475338e-01
B_WALKTIME    -0.004852      0.001690    -2.870906  4.092976e-03
```
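
With mode-specific cost coefficients, the ratio of a mode's time coefficient to its cost coefficient gives the implied value of travel time for that mode, VOT_m = B_TIME_m / B_COST_m. The sketch below applies this to the estimates above, assuming times are in minutes and costs in dollars (the units are not stated in this notebook). B_CARTIME, B_TRANSITTIME, B_RHTIME and B_COST3 are not significant at the 5% level here, so these ratios are illustrative only.

```python
# Estimates from the varying-cost model above
time_coef = {"car": 0.001348, "transit": -0.003049, "rh": -0.000249}
cost_coef = {"car": -0.068237, "transit": -0.212781, "rh": -0.008367}

# Value of time = (dV/dtime) / (dV/dcost); assumed units: dollars per minute
vot_per_min = {m: time_coef[m] / cost_coef[m] for m in time_coef}
vot_per_hour = {m: 60 * v for m, v in vot_per_min.items()}

for mode, v in vot_per_hour.items():
    print(f"{mode:8s} {v:7.2f} per hour")
# car comes out negative because B_CARTIME has an unexpected positive sign
```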


### Mixed Logit Model

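```python
from biogeme.expressions import RandomVariable, Integrate, exp, log

# Standard normal random variable, integrated out numerically
omega = RandomVariable('omega')
density = dist.normalpdf(omega)

# Lognormally distributed cost coefficient, negative for every respondent:
# B_COST_RND = -exp(B_LN_COST + S_LN_COST * omega)
B_LN_COST = Beta('B_LN_COST', -3, None, None, 0)  # exp(-3) is close in magnitude to the logit B_COST
S_LN_COST = Beta('S_LN_COST', 1, None, None, 0)
B_COST_RND = -exp(B_LN_COST + S_LN_COST * omega)

# Same utilities as the base model, with the random cost coefficient
V_rnd = {
    1: ASC_CAR + B_CARTIME*cartime + B_COST_RND*carfeetotal + B_AGE_CAR*age + B_KIDS_CAR*child,
    2: ASC_TRANSIT + B_TRANSITTIME*transittime + B_COST_RND*transitcost + B_AGE_TRANSIT*age + B_KIDS_TRANSIT*child,
    3: ASC_RH + B_RHTIME*rdtime + B_COST_RND*rdcost + B_AGE_RH*age + B_KIDS_RH*child,
    4: V4,
    5: V5,
}

# Logit probability conditional on omega, integrated over the density of omega
condprob = models.logit(V_rnd, av, choice)
prob = Integrate(condprob * density, 'omega')

biogeme = bio.BIOGEME(database, log(prob))
biogeme.modelName = 'Test2'
```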
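
A mixed logit replaces the logit probability by its expectation over the distribution of the random coefficient. That expectation can be approximated with simulation draws (the role of bioDraws and MonteCarlo in biogeme) or with numerical integration. The numpy sketch below does both for one hypothetical trip, with a lognormal cost coefficient -exp(mu + sigma * z) and hypothetical mu, sigma and attributes; the quadrature used here (Gauss-Hermite) is an illustration and may differ from biogeme's internal method.

```python
import numpy as np

# Hypothetical trip: costs for car, transit, rh, walk, bike and fixed parts of utility
costs = np.array([5.0, 2.5, 15.0, 0.0, 0.0])
v_fixed = np.array([0.0, -1.2, -2.0, -2.1, -0.5])

# Hypothetical lognormal cost coefficient: beta = -exp(mu + sigma * z), z ~ N(0, 1)
mu, sigma = -3.0, 1.0

def logit_probs(beta_cost):
    """Logit probabilities for each value of beta_cost: shape (R,) -> (R, 5)."""
    v = v_fixed + np.outer(beta_cost, costs)
    v = v - v.max(axis=1, keepdims=True)  # stabilize exp
    ev = np.exp(v)
    return ev / ev.sum(axis=1, keepdims=True)

# Monte Carlo: average the conditional logit probabilities over random draws
rng = np.random.default_rng(42)
z_draws = rng.standard_normal(100_000)
p_mc = logit_probs(-np.exp(mu + sigma * z_draws)).mean(axis=0)

# Gauss-Hermite quadrature with weight exp(-z**2 / 2); weights sum to sqrt(2 * pi)
nodes, weights = np.polynomial.hermite_e.hermegauss(40)
p_quad = (weights[:, None] * logit_probs(-np.exp(mu + sigma * nodes))).sum(axis=0) / np.sqrt(2 * np.pi)

# Plain logit at the median coefficient -exp(mu), for comparison
p_median = logit_probs(np.array([-np.exp(mu)]))[0]
print(p_mc.round(3), p_quad.round(3), p_median.round(3))
```

The two approximations of the mixed logit probabilities agree closely, and both generally differ from the plain logit evaluated at a single coefficient value.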
