# Exampville Notebook
This Jupyter Notebook demonstrates part of a typical workflow for a transportation model.


### Simulating the Travel Survey

We begin by generating a simulated travel survey for the people of Exampville. The latest version of Larch includes a simulation tool that lets us scale the size of Exampville to the amount of time we want to spend processing the example (and the power of our computer). A larger sample would give better model estimates, but here we are mostly concerned with the technical operation of Larch rather than with achieving the best possible model fit. We will therefore simulate only a small amount of data.

In [1]:
import larch, numpy, pandas, os
from larch.roles import P,X
from larch.examples.exampville import builder
numpy.set_printoptions(linewidth=160)

In [2]:
nZones = 15
transit_scope = (4,15)
n_HH = 2000

directory, omx, f_hh, f_pp, f_tour = builder(
    nZones=nZones, 
    transit_scope=transit_scope, 
    n_HH=n_HH,
)

f_hh.export_idco(os.path.join(directory,'exampville_hh.csv'))
f_pp.export_idco(os.path.join(directory,'exampville_person.csv'))
f_tour.export_idco(os.path.join(directory,'exampville_tours.csv'))

In [3]:
omx

<larch.OMX> /var/folders/gg/1fy5jt8j3m59tht8b3ft59l40000gn/T/tmpwxyznsyk/exampville.omx
 |  shape:(15, 15)
 |  data:
 |    AUTO_TIME (float64)
 |    DIST      (float64)
 |    RAIL_FARE (float64)
 |    RAIL_TIME (float64)
 |  lookup:
 |    EMPLOYMENT    (15 float64)
 |    EMP_NONRETAIL (15 float64)
 |    EMP_RETAIL    (15 float64)
 |    LAT           (15 int64)
 |    LON           (15 int64)
 |    TAZID         (15 int64)

In [4]:
f_hh

<larch.DT> …/T/tmpwxyznsyk/exampville_hh.h5
 |  > file is opened for read/write <
 |  nCases: 2000
 |  nAlts: <missing>
 |  idco:
 |    HHSIZE  	int64  
 |    HOMETAZ 	int64  
 |    INCOME  	int64  

In [5]:
f_pp

<larch.DT> …/T/tmpwxyznsyk/exampville_person.h5
 |  > file is opened for read/write <
 |  nCases: 3462
 |  nAlts: <missing>
 |  idco:
 |    AGE         	int64  
 |    HHID        	int64  
 |    N_OTHERTOURS	int64  
 |    N_TOTALTOURS	int64  
 |    N_WORKTOURS 	int64  
 |    WORKS       	int64  

In [6]:
f_tour

<larch.DT> …/T/tmpwxyznsyk/exampville_tours.h5
 |  > file is opened for read/write <
 |  nCases: 6123
 |  nAlts: <missing>
 |  idco:
 |    DTAZ    	int64  
 |    HHID    	int64  
 |    PERSONID	int64  
 |    TOURMODE	int64  
 |    TOURPURP	int64  

### Compiling Mode Choice Data

The Exampville data output contains a set of files similar to what we might find for a real travel survey: network skims, and tables of households, persons, and tours.  We'll need to merge these tables to create a composite dataset for mode choice model estimation.
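Conceptually, the `merge_into_idco` calls in the next cell attach person attributes to each tour, and then household attributes to each tour-person record, keeping one row per tour. The pandas sketch below performs the same left joins on hypothetical toy tables. It is a rough analogy only, not how Larch implements the merge.

```python
import pandas as pd

# Hypothetical miniature survey tables (NOT the simulated Exampville data);
# column names mirror those shown in the DT listings above.
hh = pd.DataFrame({
    "HHID": [1, 2],
    "HOMETAZ": [3, 7],
    "INCOME": [42000, 91000],
    "HHSIZE": [2, 1],
})
persons = pd.DataFrame({
    "PERSONID": [10, 11, 20],
    "HHID": [1, 1, 2],
    "AGE": [44, 15, 30],
    "WORKS": [1, 0, 1],
})
tours = pd.DataFrame({
    "TOURID": [100, 101, 102, 103],
    "HHID": [1, 1, 1, 2],
    "PERSONID": [10, 10, 11, 20],
    "DTAZ": [5, 2, 3, 9],
    "TOURMODE": [1, 2, 3, 5],
    "TOURPURP": [1, 2, 2, 1],
})

# Tours already carry HHID, so drop the person-level copy to avoid HHID_x/HHID_y.
# validate="many_to_one" raises if a PERSONID or HHID key is duplicated.
composite = (
    tours
    .merge(persons.drop(columns="HHID"), on="PERSONID", how="left", validate="many_to_one")
    .merge(hh, on="HHID", how="left", validate="many_to_one")
)
print(composite[["TOURID", "PERSONID", "AGE", "HOMETAZ", "INCOME"]])
```

The `validate` argument is a cheap way to catch duplicated person or household IDs, which would otherwise silently multiply tour rows.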

In [7]:
flog = larch.logging.flogger('ModeData')


### MODE CHOICE DATA

# Define numbers and names for modes
DA = 1
SR = 2
Walk = 3
Bike = 4
Transit = 5

d = larch.DT() 
# By omitting a filename here, we create a temporary HDF5 store.

d.set_alternatives( [DA,SR,Walk,Bike,Transit], ['DA','SR','Walk','Bike','Transit'] )

flog('merging survey datasets')
d.new_caseids( f_tour.caseids() )
d.merge_into_idco(f_tour, "caseid")
d.merge_into_idco(f_pp, "PERSONID")
d.merge_into_idco(f_hh, "HHID")

flog('merging skims')
# Create new variables with zero-based home and destination TAZ indexes
d.new_idco("HOMETAZi", "HOMETAZ-1", dtype=int)
d.new_idco("DTAZi", "DTAZ-1", dtype=int)

# Pull in plucked data from the matrix file.
d.pluck_into_idco(omx, "HOMETAZi", "DTAZi")
# This command is new as of Larch 3.3.15.
# It loads all of the matrix DATA from an OMX file, based on origin and destination
# zone-index columns (here HOMETAZi and DTAZi) that already exist in the DT.

flog('prep data')
d.choice_idco = {
    DA: 'TOURMODE==1',
    SR: 'TOURMODE==2',
    Walk: 'TOURMODE==3',
    Bike: 'TOURMODE==4',
    Transit: 'TOURMODE==5',
}
# Alternately:   d.choice_idco = {i:'TOURMODE=={}'.format(i) for i in [1,2,3,4,5]}



d.avail_idco = {
    DA: '(AGE>=16)',
    SR: '1',
    Walk: 'DIST<=3',
    Bike: 'DIST<=15',
    Transit: 'RAIL_TIME>0',
}

# Let's define some variables for clarity.
d.new_idco("WALKTIME", "DIST / 2.5 * 60 * (DIST<=3)") # 2.5 mph, 60 minutes per hour, max 3 miles
d.new_idco("BIKETIME", "DIST / 12 * 60 * (DIST<=15)")  # 12 mph, 60 minutes per hour, max 15 miles
d.new_idco("CARCOST", "DIST * 0.20")  # 20 cents per mile

d.info(1)

[12:51:23 PM]root: Connected log to stream <ipykernel.iostream.OutStream object at 0x10463cef0>
[12:51:23 PM]larch: merging survey datasets
[12:51:23 PM]larch: merging skims
[12:51:23 PM]larch: prep data


| Variable | dtype | Shape | Original Source |
|---|---|---|---|
| AGE | int64 | (6123,) | tmpwxyznsyk/exampville_person.h5 |
| AUTO_TIME | float64 | (6123,) | tmpwxyznsyk/exampville.omx |
| BIKETIME | float64 | (6123,) | `= DIST / 12 * 60 * (DIST<=15)` |
| CARCOST | float64 | (6123,) | `= DIST * 0.20` |
| DIST | float64 | (6123,) | tmpwxyznsyk/exampville.omx |
| DTAZ | int64 | (6123,) | tmpwxyznsyk/exampville_tours.h5 |
| DTAZi | int64 | (6123,) | `= DTAZ-1` |
| HHID | int64 | (6123,) | tmpwxyznsyk/exampville_tours.h5 |
| HHSIZE | int64 | (6123,) | tmpwxyznsyk/exampville_hh.h5 |
| HOMETAZ | int64 | (6123,) | tmpwxyznsyk/exampville_hh.h5 |
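The `pluck_into_idco` call picks one cell of every skim matrix per tour, at (home zone, destination zone). The `new_idco` expressions then derive travel times and cost from `DIST`. The NumPy sketch below reproduces that arithmetic on a hypothetical 3-zone skim. Paired fancy indexing stands in for Larch's plucking here and is not its implementation.

```python
import numpy as np

# Hypothetical 3-zone distance skim (miles); row = origin TAZ, column = destination TAZ.
dist_skim = np.array([
    [0.5, 2.0, 10.0],
    [2.0, 0.4, 20.0],
    [10.0, 20.0, 0.6],
])

# One-based TAZ numbers for four hypothetical tours, as stored in the survey.
hometaz = np.array([1, 1, 2, 3])
dtaz = np.array([2, 3, 3, 1])

# Zero-based indexes, analogous to the HOMETAZi / DTAZi columns created above.
hometazi = hometaz - 1
dtazi = dtaz - 1

# "Plucking": select one skim cell per tour using paired integer-array indexing.
DIST = dist_skim[hometazi, dtazi]

# Derived variables, using the same formulas as the new_idco calls above.
WALKTIME = DIST / 2.5 * 60 * (DIST <= 3)   # minutes at 2.5 mph, zero beyond 3 miles
BIKETIME = DIST / 12 * 60 * (DIST <= 15)   # minutes at 12 mph, zero beyond 15 miles
CARCOST = DIST * 0.20                      # dollars at 20 cents per mile

print(DIST, WALKTIME, BIKETIME, CARCOST, sep="\n")
```

Zeroing the walk and bike times beyond their distance limits is harmless only because those same limits also appear in `avail_idco`, so the zeroed values are never used for an available alternative.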


### Mode Choice Model Estimation

In [8]:
flog = larch.logging.flogger('ModeModel')

d.exclude_idco("TOURPURP != 1")
flog("nCases={}", d.nCases())

### MODE CHOICE MODEL

m = larch.Model(d)
m.title = "Exampville Work Tour Mode Choice v1"

# In order to create a shadow (alias) parameter, the source parameter must already exist,
# so we create it explicitly.
m.parameter("InVehTime")

# Then we can create the shadow, as a multiple of the original parameter.
m.shadow_parameter.NonMotorTime = P.InVehTime * 2



m.utility.co[DA] = (
    + P.InVehTime * X.AUTO_TIME
    + P.Cost * X.CARCOST # dollars (20 cents per mile)
)

m.utility[SR] = (
    + P.ASC_SR
    + P.InVehTime * X.AUTO_TIME
    + P.Cost * (X.CARCOST * 0.5) # dollars, half share for a shared ride
    + P("HighInc:SR") * X("INCOME>75000")
)

m.utility[Walk] = (
    + P.ASC_Walk
    + P.NonMotorTime * X.WALKTIME 
    + P("HighInc:Walk") * X("INCOME>75000")
)

m.utility[Bike] = (
    + P.ASC_Bike
    + P.NonMotorTime * X.BIKETIME
    + P("HighInc:Bike") * X("INCOME>75000")
)

m.utility[Transit] = (
    + P.ASC_Transit
    + P.InVehTime * X.RAIL_TIME
    + P.Cost * X.RAIL_FARE
    + P("HighInc:Transit") * X("INCOME>75000")
)

Car = m.new_nest('Nest:Car', children=[DA,SR])
NonMotor = m.new_nest('Nest:NonMotor', children=[Walk,Bike])
Motor = m.new_nest('Nest:Motorized', children=[Car,Transit])

from larch.util.categorize import Categorizer

m.parameter_groups = (
    Categorizer("Level of Service",
        ".*Time.*",
        ".*Cost.*",
    ),
    Categorizer("Alternative Specific Constants",
        "ASC.*",
    ),
    Categorizer("Income",
        ".*HighInc.*",
        ".*LowInc.*",
    ),
    Categorizer("Logsum Parameters",
        "Nest.*",
    ),
)

m.maximize_loglike()


| Parameter | Estimated Value | Std Error | t-Stat | Null Value |
|---|---|---|---|---|
| **Level of Service** | | | | |
| InVehTime | -0.1127 | 0.01356 | -8.31 | 0 |
| NonMotorTime | -0.2255 | `= InVehTime * 2` | | |
| Cost | -0.3407 | 0.1812 | -1.88 | 0 |
| **Alternative Specific Constants** | | | | |
| ASC_SR | -1.394 | 0.9541 | -1.46 | 0 |
| ASC_Walk | 0.7318 | 0.3015 | 2.43 | 0 |
| ASC_Bike | -1.667 | 0.2739 | -6.09 | 0 |
| ASC_Transit | -1.677 | 0.4977 | -3.37 | 0 |

| Statistic | Aggregate | Per Case |
|---|---|---|
| Number of Cases | 1897 | 1897 |
| Log Likelihood at Convergence | -963.84 | -0.51 |
| Log Likelihood at Null Parameters | -2592.7 | -1.37 |
| Rho Squared w.r.t. Null Parameters | 0.628 | 0.628 |

| Item | Value |
|---|---|
| Estimation Date | Saturday, March 11 2017, 12:51:24 PM |
| Results | success |
| Message | Optimization terminated successfully. [SLSQP] |
| Optimization Method | SLSQP |
| Number of Iterations | 42 |
| Running Time (total) | 2.829 seconds |
| Running Time (setup) | 0:00.21 |
| Running Time (null_likelihood) | 0:00 |
| Running Time (weight choice rebalance) | 0:00 |
| Running Time (weight autorescale) | 0:00 |


          messages: Optimization terminated successfully. [SLSQP]:
                  |     SLSQP:Optimization terminated successfully.
              ctol: 1.6786447329196825e-10
               fun: 963.8426415309914
  installed_memory: '16.0 GiB'
               jac: array([ -3.62717938e-04,  -2.89715275e-04,  -2.03759982e-05,  -3.39214049e-05,   1.10969699e-05,  -1.03181211e-04,  -4.70245877e-05,  -1.15846527e-06,
                  |          3.85895274e-06,  -1.88196022e-05,  -5.68615985e-06,  -4.56523512e-06,  -9.30196611e-06,   0.00000000e+00])
           loglike: -963.8426415309914
      loglike_null: -2592.698004024908
           message: 'Optimization terminated successfully. [SLSQP]'
               nit: 42
             niter: [('SLSQP', 42)]
 peak_memory_usage: '142.828125 MiB'
             stats: <larch.core.runstats, success in 0:02.83>
            status: 0
           success: True
                 x: array([-0.11274266,  0.74170258,  0.86969656,  0.84439805, -0.34073929, -1.
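The summary statistics above follow from standard formulas. The sketch below recomputes rho-squared and two t-statistics from the reported numbers. It also confirms that the shadow parameter `NonMotorTime` is simply twice `InVehTime`. The helper name `t_stat` is ours, not Larch's.

```python
# Values copied from the mode choice estimation output above.
ll_converged = -963.8426415309914
ll_null = -2592.698004024908

# Rho-squared with respect to the null parameters: 1 - LL(estimates) / LL(null).
rho_squared = 1 - ll_converged / ll_null

def t_stat(estimate, std_err, null_value=0.0):
    """t-statistic of an estimate against its null value."""
    return (estimate - null_value) / std_err

t_invehtime = t_stat(-0.112743, 0.0135618)
t_cost = t_stat(-0.340739, 0.181214)

# NonMotorTime is a shadow parameter: fixed at 2 x InVehTime, not estimated.
nonmotortime = 2 * -0.112743

print(f"rho^2={rho_squared:.3f}  t(InVehTime)={t_invehtime:.2f}  "
      f"t(Cost)={t_cost:.2f}  NonMotorTime={nonmotortime:.4f}")
```

Because `NonMotorTime` is derived rather than free, it has no standard error of its own. This is why the results table shows its formula in place of a standard error.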

In [9]:
## Save results to a report
m.xhtml_report(filename=os.path.join(directory,'mode_choice_model.html'), cats='**')



## View Mode Choice Model Results
# Here we lump together a bunch of report sections into a single report that will
# display neatly in a jupyter notebook.
sections = ['params','ll',
 'nesting_tree',
 'latest','UTILITYSPEC','PROBABILITYSPEC',
 'DATA', 
 'excludedcases','NOTES','options',
 'possible_overspecification']
m.jupyter(*sections)

| Parameter | Estimated Value | Std Error | t-Stat | Null Value |
|---|---|---|---|---|
| **Level of Service** | | | | |
| InVehTime | -0.1127 | 0.01356 | -8.31 | 0 |
| NonMotorTime | -0.2255 | `= InVehTime * 2` | | |
| Cost | -0.3407 | 0.1812 | -1.88 | 0 |
| **Alternative Specific Constants** | | | | |
| ASC_SR | -1.394 | 0.9541 | -1.46 | 0 |
| ASC_Walk | 0.7318 | 0.3015 | 2.43 | 0 |
| ASC_Bike | -1.667 | 0.2739 | -6.09 | 0 |
| ASC_Transit | -1.677 | 0.4977 | -3.37 | 0 |

| Statistic | Aggregate | Per Case |
|---|---|---|
| Number of Cases | 1897 | 1897 |
| Log Likelihood at Convergence | -963.84 | -0.51 |
| Log Likelihood at Null Parameters | -2592.7 | -1.37 |
| Rho Squared w.r.t. Null Parameters | 0.628 | 0.628 |

| Item | Value |
|---|---|
| Estimation Date | Saturday, March 11 2017, 12:51:24 PM |
| Results | success |
| Message | Optimization terminated successfully. [SLSQP] |
| Optimization Method | SLSQP |
| Number of Iterations | 42 |
| Running Time (total) | 2.829 seconds |
| Running Time (setup) | 0:00.21 |
| Running Time (null_likelihood) | 0:00 |
| Running Time (weight choice rebalance) | 0:00 |
| Running Time (weight autorescale) | 0:00 |


| Code | Alternative | Resolved Utility |
|---|---|---|
| 1 | DA | `- 0.1127*AUTO_TIME - 0.3407*CARCOST` |
| 2 | SR | `- 1.394 - 0.1127*AUTO_TIME - 0.3407*CARCOST*(0.5) - 1.756*INCOME>75000` |
| 3 | Walk | `0.7318 - 0.2255*WALKTIME - 0.7958*INCOME>75000` |
| 4 | Bike | `- 1.667 - 0.2255*BIKETIME - 1.053*INCOME>75000` |
| 5 | Transit | `- 1.677 - 0.1127*RAIL_TIME - 0.3407*RAIL_FARE - 1.241*INCOME>75000` |

| Code | Nest | Resolved Utility |
|---|---|---|
| 6 | Nest:Car | `0.7417 * log( exp(Utility[DA]/0.7417) + exp(Utility[SR]/0.7417) )` |
| 7 | Nest:NonMotor | `0.8697 * log( exp(Utility[Walk]/0.8697) + exp(Utility[Bike]/0.8697) )` |
| 8 | Nest:Motorized | `0.8444 * log( exp(Utility[Transit]/0.8444) + exp(Utility[Nest:Car]/0.8444) )` |
| 0 | ROOT | `log( exp(Utility[Nest:Motorized]) + exp(Utility[Nest:NonMotor]) )` |

| Code | Alternative | Formulaic Utility |
|---|---|---|
| 1 | DA | `InVehTime*AUTO_TIME + Cost*CARCOST` |
| 2 | SR | `ASC_SR + InVehTime*AUTO_TIME + Cost*CARCOST*(0.5) + HighInc:SR*INCOME>75000` |
| 3 | Walk | `ASC_Walk + NonMotorTime*WALKTIME + HighInc:Walk*INCOME>75000` |
| 4 | Bike | `ASC_Bike + NonMotorTime*BIKETIME + HighInc:Bike*INCOME>75000` |
| 5 | Transit | `ASC_Transit + InVehTime*RAIL_TIME + Cost*RAIL_FARE + HighInc:Transit*INCOME>75000` |

| Code | Nest | Formulaic Utility |
|---|---|---|
| 6 | Nest:Car | `Nest:Car * log( exp(Utility[DA]/Nest:Car) + exp(Utility[SR]/Nest:Car) )` |
| 7 | Nest:NonMotor | `Nest:NonMotor * log( exp(Utility[Walk]/Nest:NonMotor) + exp(Utility[Bike]/Nest:NonMotor) )` |
| 8 | Nest:Motorized | `Nest:Motorized * log( exp(Utility[Transit]/Nest:Motorized) + exp(Utility[Nest:Car]/Nest:Motorized) )` |
| 0 | ROOT | `log( exp(Utility[Nest:Motorized]) + exp(Utility[Nest:NonMotor]) )` |


| Code | Alternative | Resolved Probability |
|---|---|---|
| 1 | DA | `exp(Utility[DA]/0.7417)/exp(Utility[Nest:Car]/0.7417) * exp(Utility[Nest:Car]/0.8444)/exp(Utility[Nest:Motorized]/0.8444) * exp(Utility[Nest:Motorized])/exp(Utility[ROOT])` |
| 2 | SR | `exp(Utility[SR]/0.7417)/exp(Utility[Nest:Car]/0.7417) * exp(Utility[Nest:Car]/0.8444)/exp(Utility[Nest:Motorized]/0.8444) * exp(Utility[Nest:Motorized])/exp(Utility[ROOT])` |
| 3 | Walk | `exp(Utility[Walk]/0.8697)/exp(Utility[Nest:NonMotor]/0.8697) * exp(Utility[Nest:NonMotor])/exp(Utility[ROOT])` |
| 4 | Bike | `exp(Utility[Bike]/0.8697)/exp(Utility[Nest:NonMotor]/0.8697) * exp(Utility[Nest:NonMotor])/exp(Utility[ROOT])` |
| 5 | Transit | `exp(Utility[Transit]/0.8444)/exp(Utility[Nest:Motorized]/0.8444) * exp(Utility[Nest:Motorized])/exp(Utility[ROOT])` |

| Code | Alternative | Formulaic Probability |
|---|---|---|
| 1 | DA | `exp(Utility[DA]/Nest:Car)/exp(Utility[Nest:Car]/Nest:Car) * exp(Utility[Nest:Car]/Nest:Motorized)/exp(Utility[Nest:Motorized]/Nest:Motorized) * exp(Utility[Nest:Motorized])/exp(Utility[ROOT])` |
| 2 | SR | `exp(Utility[SR]/Nest:Car)/exp(Utility[Nest:Car]/Nest:Car) * exp(Utility[Nest:Car]/Nest:Motorized)/exp(Utility[Nest:Motorized]/Nest:Motorized) * exp(Utility[Nest:Motorized])/exp(Utility[ROOT])` |
| 3 | Walk | `exp(Utility[Walk]/Nest:NonMotor)/exp(Utility[Nest:NonMotor]/Nest:NonMotor) * exp(Utility[Nest:NonMotor])/exp(Utility[ROOT])` |
| 4 | Bike | `exp(Utility[Bike]/Nest:NonMotor)/exp(Utility[Nest:NonMotor]/Nest:NonMotor) * exp(Utility[Nest:NonMotor])/exp(Utility[ROOT])` |
| 5 | Transit | `exp(Utility[Transit]/Nest:Motorized)/exp(Utility[Nest:Motorized]/Nest:Motorized) * exp(Utility[Nest:Motorized])/exp(Utility[ROOT])` |
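The resolved probability formulas above multiply conditional probabilities down the nesting tree. Each nest's utility is its logsum scaled by its nesting parameter. A minimal pure-Python sketch of that calculation for this particular tree follows. The utilities in the usage lines are hypothetical, and the sketch assumes all five alternatives are available (availability handling is omitted).

```python
import math

def nest_logsum(mu, utilities):
    """Nest utility: mu * log(sum(exp(V / mu))) over the nest's members."""
    return mu * math.log(sum(math.exp(v / mu) for v in utilities))

def nested_logit_probs(V, mu_car, mu_nonmotor, mu_motor):
    """Choice probabilities for the tree ROOT -> {Nest:Motorized, Nest:NonMotor},
    Nest:Motorized -> {Nest:Car, Transit}, Nest:Car -> {DA, SR},
    Nest:NonMotor -> {Walk, Bike}. Assumes every alternative is available."""
    v_car = nest_logsum(mu_car, [V["DA"], V["SR"]])
    v_nonmotor = nest_logsum(mu_nonmotor, [V["Walk"], V["Bike"]])
    v_motor = nest_logsum(mu_motor, [V["Transit"], v_car])
    v_root = nest_logsum(1.0, [v_motor, v_nonmotor])   # root scale fixed at 1

    # Marginal probabilities of the two top-level nests.
    p_motor = math.exp(v_motor - v_root)
    p_nonmotor = math.exp(v_nonmotor - v_root)
    # Conditional probabilities inside Nest:Motorized.
    p_car_given_motor = math.exp((v_car - v_motor) / mu_motor)
    p_transit_given_motor = math.exp((V["Transit"] - v_motor) / mu_motor)

    return {
        "DA": math.exp((V["DA"] - v_car) / mu_car) * p_car_given_motor * p_motor,
        "SR": math.exp((V["SR"] - v_car) / mu_car) * p_car_given_motor * p_motor,
        "Walk": math.exp((V["Walk"] - v_nonmotor) / mu_nonmotor) * p_nonmotor,
        "Bike": math.exp((V["Bike"] - v_nonmotor) / mu_nonmotor) * p_nonmotor,
        "Transit": p_transit_given_motor * p_motor,
    }

# Hypothetical utilities for one tour; nesting parameters from the estimation above.
V = {"DA": -1.0, "SR": -2.5, "Walk": -1.5, "Bike": -3.0, "Transit": -3.2}
probs = nested_logit_probs(V, mu_car=0.7417, mu_nonmotor=0.8697, mu_motor=0.8444)
print({k: round(p, 4) for k, p in probs.items()})
```

Setting all three nesting parameters to 1 collapses the tree to a plain multinomial logit. This makes a quick sanity check for the function.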


| Code | Alternative | # Avail | # Chosen | Availability Condition |
|---|---|---|---|---|
| 1 | DA | 1897 | 1573 | `(AGE>=16)` |
| 2 | SR | 1897 | 171 | `1` |
| 3 | Walk | 662 | 47 | `DIST<=3` |
| 4 | Bike | 1882 | 28 | `DIST<=15` |
| 5 | Transit | 1245 | 78 | `RAIL_TIME>0` |

| Criteria | Data Source | # Cases Excluded | # Cases Remaining |
|---|---|---|---|
| `TOURPURP != 1` | idco | 4226 | 1897 |

| Option | Value |
|---|---|
| author | jpn |
| autocreate_parameters | True |
| calc_null_likelihood | True |
| calc_std_errors | True |
| enforce_bounds | True |
| enforce_constraints | True |
| enforce_network_constraints | False |
| force_finite_diff_grad | False |
| force_recalculate | False |
| gradient_diagnostic | 0 |


In [10]:
m.jupyter.choice_distributions

| Variable | Value(s) | DA | SR | Walk | Bike | Transit | Count |
|---|---|---|---|---|---|---|---|
| AUTO_TIME | > 8.6 | 83.31% | 10.81% | 0.00% | 0.16% | 5.72% | 629 |
| AUTO_TIME | 4.1 to 8.6 | 86.88% | 9.72% | 0.00% | 0.77% | 2.62% | 648 |
| AUTO_TIME | < 4.1 | 78.39% | 6.45% | 7.58% | 3.55% | 4.03% | 620 |
| CARCOST | > 1.3 | 84.31% | 11.89% | 0.00% | 0.00% | 3.80% | 631 |
| CARCOST | 0.39 to 1.3 | 86.80% | 8.50% | 0.00% | 0.61% | 4.10% | 659 |
| CARCOST | < 0.39 | 77.27% | 6.59% | 7.74% | 3.95% | 4.45% | 607 |
| `CARCOST*(0.5)` | > 0.64 | 84.31% | 11.89% | 0.00% | 0.00% | 3.80% | 631 |
| `CARCOST*(0.5)` | 0.2 to 0.64 | 86.80% | 8.50% | 0.00% | 0.61% | 4.10% | 659 |
| `CARCOST*(0.5)` | < 0.2 | 77.27% | 6.59% | 7.74% | 3.95% | 4.45% | 607 |
| `INCOME>75000` | Yes | 94.20% | 1.73% | 1.60% | 0.74% | 1.73% | 810 |


In [11]:
m.jupyter.datasummary

| Data | Mean | Std.Dev. | Minimum | Maximum | Zeros | Mean (NonZero) | Positives |
|---|---|---|---|---|---|---|---|
| AUTO_TIME | 7.7198 | 5.637 | 2.0 | 49.678 | 0 | 7.7198 | 1897 |
| CARCOST | 0.97796 | 0.71928 | 0.0040437 | 3.621 | 0 | 0.97796 | 1897 |
| `1` | 1.0 | 0.0 | 1.0 | 1.0 | 0 | 1.0 | 1897 |
| `CARCOST*(0.5)` | 0.48898 | 0.35964 | 0.0020218 | 1.8105 | 0 | 0.48898 | 1897 |
| `INCOME>75000` | 0.42699 | 0.49464 | 0.0 | 1.0 | 1087 | 1.0 | 810 |
| WALKTIME | 10.386 | 17.018 | 0.0 | 71.245 | 1235 | 29.763 | 662 |
| BIKETIME | 23.819 | 17.419 | 0.0 | 73.871 | 15 | 24.009 | 1882 |
| RAIL_TIME | 4.1426 | 5.4032 | 0.0 | 28.821 | 652 | 6.312 | 1245 |
| RAIL_FARE | 0.98445 | 0.71241 | 0.0 | 1.5 | 652 | 1.5 | 1245 |

| Alternative | Data | Filter | Mean | Std.Dev. | Minimum | Maximum | Mean (Nonzeros) | # Zeros | # Positives |
|---|---|---|---|---|---|---|---|---|---|
| DA | AUTO_TIME | Chosen | 7.8347 | 5.607 | 2.0 | 49.67825818294802 | 7.8347 | 0 | 1573 |
| DA | | Unchosen | 7.1621 | 5.7478 | 2.0 | 33.84280990076548 | 7.1621 | 0 | 324 |
| DA | CARCOST | Chosen | 1.0037 | 0.70675 | 0.0040436794880651 | 3.62103154879924 | 1.0037 | 0 | 1573 |
| DA | | Unchosen | 0.85277 | 0.765 | 0.0040436794880651 | 3.02732395233208 | 0.85277 | 0 | 324 |
| SR | AUTO_TIME | Chosen | 8.4195 | 5.3067 | 2.0 | 31.692757725069843 | 8.4195 | 0 | 171 |
| SR | | Unchosen | 7.6505 | 5.664 | 2.0 | 49.67825818294802 | 7.6505 | 0 | 1726 |
| SR | `CARCOST*(0.5)` | Chosen | 0.55414 | 0.3506 | 0.0118274425868933 | 1.4774221314319649 | 0.55414 | 0 | 171 |
| SR | | Unchosen | 0.48253 | 0.35988 | 0.0020218397440325 | 1.81051577439962 | 0.48253 | 0 | 1726 |
| SR | `INCOME>75000` | Chosen | 0.081871 | 0.27417 | 0.0 | 1.0 | 1.0 | 157 | 14 |
| SR | | Unchosen | 0.46118 | 0.49849 | 0.0 | 1.0 | 1.0 | 930 | 796 |


In [12]:
## Saving the model
m_filename = os.path.join(directory, 'exampville_workmodechoice_v1.larchmodel')
m.save(m_filename)
m_filename

'/var/folders/gg/1fy5jt8j3m59tht8b3ft59l40000gn/T/tmpwxyznsyk/exampville_workmodechoice_v1.larchmodel'

In [13]:
m2 = larch.Model.load(m_filename)
print(m2)

Exampville Work Tour Mode Choice v1
Model Parameter Estimates
--------------------------------------------------------------------------------------------
Parameter      	InitValue   	FinalValue  	StdError    	t-Stat      	NullValue   
~~ Level of Service ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
InVehTime      	 0          	-0.112743   	 0.0135618  	-8.31328    	 0          
Cost           	 0          	-0.340739   	 0.181214   	-1.88032    	 0          
~~ Alternative Specific Constants ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ASC_SR         	 0          	-1.39432    	 0.954061   	-1.46145    	 0          
ASC_Walk       	 0          	 0.731772   	 0.301454   	 2.42747    	 0          
ASC_Bike       	 0          	-1.66675    	 0.273898   	-6.08529    	 0          
ASC_Transit    	 0          	-1.67707    	 0.497688   	-3.36972    	 0          
~~ Income ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

### Building Mode Choice Logsums

We're going to build the mode choice logsums and store them in the DT
for use in the destination choice model.

One important thing to recall is that we applied an exclusion filter to
the DT to keep only work tours (as opposed to all tours). To remain
consistent, new arrays we add to the DT need to have the same number
of rows as existing DT arrays (and the same number of caseids).

In [14]:
print(d.nCases())
print(d.nAllCases())

1897
6123


The `nCases` method returns the number of active (non-excluded) cases, while the `nAllCases` method counts _all_ of the cases in the DT, whether they are active or not.

Another concern is that the non-excluded cases are not a contiguous set; they are spread throughout the list of all cases. Fortunately, Larch provides a method to extract the indexes of the currently active cases. We will use these indexes to expand the logsums we create and push them back into the DT in the correct rows.
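In plain array terms, the filtering and re-expansion amount to a boolean mask, an integer index array, and a scatter. Here is a NumPy sketch on hypothetical data. `np.flatnonzero` stands in for `get_screen_indexes()`, which, judging by its printout, returns the same kind of integer index array; that equivalence is an assumption.

```python
import numpy as np

# Hypothetical tour purposes for 8 cases; only work tours (purpose 1) stay active,
# mirroring exclude_idco("TOURPURP != 1").
tourpurp = np.array([1, 2, 1, 1, 3, 2, 1, 2])
n_all_cases = tourpurp.size                    # analogous to nAllCases()

# Positions of the active cases, the role played by get_screen_indexes().
screen_idx = np.flatnonzero(tourpurp == 1)
n_cases = screen_idx.size                      # analogous to nCases()

n_zones = 3
# Results computed only for the active cases (n_cases rows) ...
active_results = np.arange(n_cases * n_zones, dtype=np.float32).reshape(n_cases, n_zones)

# ... are scattered back into a full-length array; excluded rows remain zero.
full = np.zeros((n_all_cases, n_zones), dtype=np.float32)
full[screen_idx, :] = active_results
```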

In [15]:
screen_idx = d.get_screen_indexes()
print(screen_idx)
print(screen_idx.shape)

[   0    2    3 ..., 6112 6116 6119]
(1897,)


In [16]:
flog = larch.logging.flogger('modelogsum')

# First create a new blank array for the logsums.
d.new_idca('MODECHOICELOGSUM', numpy.zeros([d.nAllCases(), nZones], dtype=numpy.float32))

# We will also create a separate in-memory array to cache the logsums we calculate,
# so that we can push all the calculated values into the DT on disk in one pass at the end.
# Note the in-memory version is sized for nCases, while the on-disk version uses nAllCases.
modechoicelogsums = numpy.zeros([d.nCases(), nZones], dtype=numpy.float32)

m.setUp()
# If the model was just estimated, this call is unnecessary; otherwise it is needed
# to allocate the arrays used for storing data and results.

# Loop over all TAZ indexes based on the number of zones
for dtazi in range(nZones):
    # First we pull in replacement data from the skims,
    # replacing the data for the actually chosen destination with alternative
    # data from the proposed destination
    d.pluck_into_idco(omx, "HOMETAZi", dtazi, overwrite=True)
    
    # We need to refresh these derived variables too, as they are computed
    # from the data columns we just overwrote.
    d.new_idco("WALKTIME", "DIST / 2.5 * 60 * (DIST<=3)", overwrite=True) # 2.5 mph, 60 minutes per hour, max 3 miles
    d.new_idco("BIKETIME", "DIST / 12 * 60 * (DIST<=15)", overwrite=True)  # 12 mph, 60 minutes per hour, max 15 miles
    d.new_idco("CARCOST", "DIST * 0.20", overwrite=True)  # 20 cents per mile
    
    # Then we will load the relevant data into the Model
    m.provision()
    
    # Then we calculate and extract the logsums using the model and store
    # them in the correct indexes of the relevant column in the output array
    modechoicelogsums[:,dtazi] = m.logsums(hardway=True)
    
d.idca.MODECHOICELOGSUM[screen_idx, :] = modechoicelogsums



[12:51:50 PM]larch.Model: Provisioning UtilityCO data...
[12:51:50 PM]larch.Model: Provisioning Avail data...
[12:51:50 PM]larch.Model: Provisioning Choice data...
[12:51:50 PM]larch.Model: Provisioning Weight data...
[12:51:51 PM]larch.Model: Provisioning UtilityCO data...
[12:51:51 PM]larch.Model: Provisioning Avail data...
[12:51:51 PM]larch.Model: Provisioning Choice data...
[12:51:51 PM]larch.Model: Provisioning Weight data...
[12:51:51 PM]larch.Model: Provisioning UtilityCO data...
[12:51:51 PM]larch.Model: Provisioning Avail data...
[12:51:51 PM]larch.Model: Provisioning Choice data...
[12:51:51 PM]larch.Model: Provisioning Weight data...
[12:51:51 PM]larch.Model: Provisioning UtilityCO data...
[12:51:51 PM]larch.Model: Provisioning Avail data...
[12:51:51 PM]larch.Model: Provisioning Choice data...
[12:51:51 PM]larch.Model: Provisioning Weight data...
[12:51:52 PM]larch.Model: Provisioning UtilityCO data...
[12:51:52 PM]larch.Model: Provisioning Avail data...
[12:51:52 PM]larch
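Structurally, the loop above does three things for each candidate destination: it overwrites the skim-derived columns with values for that destination, re-provisions the model, and stores one column of logsums. The sketch below mirrors that pattern in plain NumPy for a hypothetical two-mode (drive/walk) multinomial logit, using a numerically stable log-sum-exp. It is not `m.logsums()`, which evaluates the full nested model, and the coefficients and skims are made up for illustration.

```python
import numpy as np

def logsumexp_rows(u):
    """Stable log(sum(exp(u))) along each row; -inf marks an unavailable mode.
    Assumes at least one mode is available in every row."""
    m = np.max(u, axis=1, keepdims=True)
    return m[:, 0] + np.log(np.sum(np.exp(u - m), axis=1))

# Hypothetical 3-zone skims: auto time (minutes) and distance (miles).
auto_time = np.array([[2.0, 5.0, 9.0], [5.0, 2.0, 6.0], [9.0, 6.0, 2.0]])
dist = np.array([[0.5, 2.0, 6.0], [2.0, 0.5, 4.0], [6.0, 4.0, 0.5]])
home_idx = np.array([0, 2, 1, 0])     # zero-based home zone of each active case
n_zones = auto_time.shape[1]

b_time = -0.11                        # hypothetical in-vehicle time coefficient
asc_walk = 0.7                        # hypothetical walk constant

logsums = np.zeros((home_idx.size, n_zones), dtype=np.float32)
for dz in range(n_zones):
    # Skim values for every case, as if each case's destination were zone dz.
    at = auto_time[home_idx, dz]
    d = dist[home_idx, dz]
    u_drive = b_time * at
    walk_time = d / 2.5 * 60          # minutes at 2.5 mph
    # Walk is only available up to 3 miles; non-motorized time is weighted 2x.
    u_walk = np.where(d <= 3, asc_walk + 2 * b_time * walk_time, -np.inf)
    logsums[:, dz] = logsumexp_rows(np.column_stack([u_drive, u_walk]))
```

Using `-inf` for an unavailable mode makes its `exp` term exactly zero. This mirrors how availability removes an alternative from the logsum.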

### Destination Choice Data

In [17]:
d.set_alternatives( numpy.arange(1,nZones+1) )

# Due to a limitation of HDF5, we are not able to 
# give a choice_idco dictionary for models with lots of alternatives,
# as there is a hard limit on the quantity of data in the header for
# each array.  This limitation will probably be circumvented in a
# future version of Larch, but for now we must manually create and 
# store an entire idca array.
ch = numpy.zeros([d.nAllCases(), nZones ], dtype=numpy.float32)
dtaz = d.idco.DTAZ[:]
ch[range(dtaz.shape[0]), dtaz-1] = 1
d.new_idca_from_array('_choice_', ch, overwrite=True)

# We will also overwrite the _avail_ array so that
# all destinations are available for all tours.
d.new_idca_from_array('_avail_', numpy.ones([d.nAllCases(), nZones ], dtype=bool), overwrite=True)
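The choice array built above is a one-hot encoding of `DTAZ`: row *i* gets a 1 in column `DTAZ[i]-1`. The sketch below applies the same indexing to a few hypothetical tours, using `numpy.arange`, which is equivalent to passing `range(...)`. A `DTAZ` value above `nZones` would raise an `IndexError`, and a value of 0 would silently wrap to the last column. The range check in the sketch is a cheap safeguard against both.

```python
import numpy as np

n_zones = 4
dtaz = np.array([2, 4, 1, 2, 3])   # hypothetical one-based chosen destinations

# Guard against out-of-range zone numbers: DTAZ == 0 would silently index column -1.
if not ((dtaz >= 1) & (dtaz <= n_zones)).all():
    raise ValueError("DTAZ values must lie in 1..n_zones")

ch = np.zeros((dtaz.size, n_zones), dtype=np.float32)
ch[np.arange(dtaz.size), dtaz - 1] = 1   # exactly one chosen destination per row

# Every destination available for every case, as with the _avail_ array.
avail = np.ones((dtaz.size, n_zones), dtype=bool)

print(ch.sum(axis=0))   # number of cases choosing each zone
```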

In [18]:
# First we purge all of the OMX columns we plucked in the past
for i in omx.data:
    d.delete_data(i.name)
for i in omx.lookup:
    d.delete_data(i.name)


# Then we can attach the OMX as an external data source, linking in
# entire rows from the skim matrices, plus the lookups which are OTAZ-generic
d.idca.add_external_omx(omx, rowindexnode=d.idco.HOMETAZi, n_alts=nZones)

d.info(1)

| Variable | dtype | Source File | Shape | Original Source |
|---|---|---|---|---|
| AGE | int64 | /var/folders/gg/1fy5jt8j3m59tht8b3ft59l40000gn/T/tmp94xaxc_1.h5f | (6123,) | tmpwxyznsyk/exampville_person.h5 |
| BIKETIME | float64 | /var/folders/gg/1fy5jt8j3m59tht8b3ft59l40000gn/T/tmp94xaxc_1.h5f | (6123,) | `= DIST / 12 * 60 * (DIST<=15)` |
| CARCOST | float64 | /var/folders/gg/1fy5jt8j3m59tht8b3ft59l40000gn/T/tmp94xaxc_1.h5f | (6123,) | `= DIST * 0.20` |
| DTAZ | int64 | /var/folders/gg/1fy5jt8j3m59tht8b3ft59l40000gn/T/tmp94xaxc_1.h5f | (6123,) | tmpwxyznsyk/exampville_tours.h5 |
| DTAZi | int64 | /var/folders/gg/1fy5jt8j3m59tht8b3ft59l40000gn/T/tmp94xaxc_1.h5f | (6123,) | `= DTAZ-1` |
| HHID | int64 | /var/folders/gg/1fy5jt8j3m59tht8b3ft59l40000gn/T/tmp94xaxc_1.h5f | (6123,) | tmpwxyznsyk/exampville_tours.h5 |
| HHSIZE | int64 | /var/folders/gg/1fy5jt8j3m59tht8b3ft59l40000gn/T/tmp94xaxc_1.h5f | (6123,) | tmpwxyznsyk/exampville_hh.h5 |
| HOMETAZ | int64 | /var/folders/gg/1fy5jt8j3m59tht8b3ft59l40000gn/T/tmp94xaxc_1.h5f | (6123,) | tmpwxyznsyk/exampville_hh.h5 |
| HOMETAZi | int64 | /var/folders/gg/1fy5jt8j3m59tht8b3ft59l40000gn/T/tmp94xaxc_1.h5f | (6123,) | `= HOMETAZ-1` |
| INCOME | int64 | /var/folders/gg/1fy5jt8j3m59tht8b3ft59l40000gn/T/tmp94xaxc_1.h5f | (6123,) | tmpwxyznsyk/exampville_hh.h5 |
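Our reading of `add_external_omx` with `rowindexnode=HOMETAZi` is that each case sees the entire row of every skim for its home zone. Under that reading, `DIST` becomes a cases-by-destinations (idca) array, while zone lookups such as `EMP_RETAIL` are identical for every case. This describes the data the model sees, not Larch's internals, which link the data rather than copying it. In NumPy terms, on hypothetical data:

```python
import numpy as np

# Hypothetical 3-zone distance skim and a zone-level lookup.
dist_skim = np.array([[0.5, 2.0, 6.0], [2.0, 0.5, 4.0], [6.0, 4.0, 0.5]])
emp_retail = np.array([120.0, 40.0, 300.0])

hometazi = np.array([2, 0, 0, 1])   # zero-based home zone per case

# Row selection: case i sees the full skim row for its home zone.
dist_idca = dist_skim[hometazi, :]                      # shape (n_cases, n_zones)

# Origin-independent lookups repeat across cases (a read-only broadcast view).
emp_retail_idca = np.broadcast_to(emp_retail, dist_idca.shape)
```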


### Destination Choice Model Estimation

In [19]:
### DEST CHOICE MODEL

m = larch.Model(d)

m.title = "Exampville Work Tour Destination Choice"

from larch.util.piecewise import log_and_linear_function
dist_func = log_and_linear_function('DIST', baseparam='Distance')

m.utility.ca = (
    + P.ModeChoiceLogSum * X.MODECHOICELOGSUM
    + dist_func
#    + P.Distance * X.DIST
#    + P.Log1pDist * X("log1p(DIST)")
)

m.quantity = (
    + P("EmpRetail_HighInc") * X('EMP_RETAIL * (INCOME>50000)')
    + P("EmpNonRetail_HighInc") * X('EMP_NONRETAIL') * X("INCOME>50000")
    + P("EmpRetail_LowInc") * X('EMP_RETAIL') * X("INCOME<=50000")
    + P("EmpNonRetail_LowInc") * X('EMP_NONRETAIL') * X("INCOME<=50000")

)

m.quantity_scale = P.Theta
m.parameter.Theta(value=0.5, min=0.001, max=1.0, null_value=1.0)

m.parameter.EmpRetail_HighInc(holdfast=1, value=0)
m.parameter.EmpRetail_LowInc(holdfast=1, value=0)

# m.computed_factor_figure_with_derivative(
#             dist_func.evaluator1d('U'),
#             max_x=20,
#             header="Utility by Distance",
#             xaxis_label="Freeflow Highway Distance (miles)",
#             yaxis_label="Utility",
#             short_header="U by Distance",
#         )

m.maximize_loglike()
m.xhtml_report(filename=os.path.join(directory,'destination_choice_model.html'), cats='**')

| Parameter | Estimated Value | Std Error | t-Stat | Null Value |
|---|---|---|---|---|
| ModeChoiceLogSum | 1.053 | 0.08195 | 12.85 | 0 |
| Distance | -0.000936 | 0.02497 | -0.04 | 0 |
| logDistanceP1 | -0.05208 | 0.1165 | -0.45 | 0 |
| Theta | 0.8115 | 0.04343 | -4.34 | 1 |
| EmpRetail_HighInc | 0.0 | fixed value | | 0 |
| EmpRetail_LowInc | 0.0 | fixed value | | 0 |
| EmpNonRetail_HighInc | 0.464 | 0.2182 | 2.13 | 0 |
| EmpNonRetail_LowInc | -0.8178 | 0.3048 | -2.68 | 0 |

| Statistic | Aggregate | Per Case |
|---|---|---|
| Number of Cases | 1897 | 1897 |
| Log Likelihood at Convergence | -3670.6 | -1.93 |
| Log Likelihood at Null Parameters | -4493.08 | -2.37 |
| Rho Squared w.r.t. Null Parameters | 0.183 | 0.183 |

| Item | Value |
|---|---|
| Estimation Date | Saturday, March 11 2017, 12:51:56 PM |
| Results | success |
| Message | Optimization terminated successfully. [SLSQP] |
| Optimization Method | SLSQP |
| Number of Iterations | 17 |
| Running Time (total) | 1.140 seconds |
| Running Time (setup) | 0:00.29 |
| Running Time (null_likelihood) | 0:00 |
| Running Time (weight choice rebalance) | 0:00 |
| Running Time (weight autorescale) | 0:00 |


In [20]:
m.svg_computed_factor_figure_with_derivative(dist_func)

In [21]:
m.jupyter('params','ll','latest','UTILITYSPEC','PROBABILITYSPEC','DATA', 'excludedcases', 'NOTES')

| Parameter | Estimated Value | Std Error | t-Stat | Null Value |
|---|---|---|---|---|
| ModeChoiceLogSum | 1.053 | 0.08195 | 12.85 | 0 |
| Distance | -0.000936 | 0.02497 | -0.04 | 0 |
| logDistanceP1 | -0.05208 | 0.1165 | -0.45 | 0 |
| Theta | 0.8115 | 0.04343 | -4.34 | 1 |
| EmpRetail_HighInc | 0.0 | fixed value | | 0 |
| EmpRetail_LowInc | 0.0 | fixed value | | 0 |
| EmpNonRetail_HighInc | 0.464 | 0.2182 | 2.13 | 0 |
| EmpNonRetail_LowInc | -0.8178 | 0.3048 | -2.68 | 0 |

| Statistic | Aggregate | Per Case |
|---|---|---|
| Number of Cases | 1897 | 1897 |
| Log Likelihood at Convergence | -3670.6 | -1.93 |
| Log Likelihood at Null Parameters | -4493.08 | -2.37 |
| Rho Squared w.r.t. Null Parameters | 0.183 | 0.183 |

| Item | Value |
|---|---|
| Estimation Date | Saturday, March 11 2017, 12:51:56 PM |
| Results | success |
| Message | Optimization terminated successfully. [SLSQP] |
| Optimization Method | SLSQP |
| Number of Iterations | 17 |
| Running Time (total) | 1.140 seconds |
| Running Time (setup) | 0:00.29 |
| Running Time (null_likelihood) | 0:00 |
| Running Time (weight choice rebalance) | 0:00 |
| Running Time (weight autorescale) | 0:00 |


| Code | Alternative | Resolved Utility |
|---|---|---|
| `*` | all elemental alternatives | `1.053*MODECHOICELOGSUM - 0.000936*DIST - 0.05208*log1p(DIST) + 0.8115 * log(  1*EMP_RETAIL * (INCOME>50000)  + 1.59*EMP_NONRETAIL*(INCOME>50000)  + 1*(EMP_RETAIL*(INCOME<=50000))  + 0.4414*(EMP_NONRETAIL*(INCOME<=50000)) )` |

| Code | Alternative | Formulaic Utility |
|---|---|---|
| `*` | all elemental alternatives | `ModeChoiceLogSum*MODECHOICELOGSUM + Distance*DIST + logDistanceP1*log1p(DIST) + Theta * log(  exp(EmpRetail_HighInc)*EMP_RETAIL * (INCOME>50000)  + exp(EmpNonRetail_HighInc)*EMP_NONRETAIL*(INCOME>50000)  + exp(EmpRetail_LowInc)*(EMP_RETAIL*(INCOME<=50000))  + exp(EmpNonRetail_LowInc)*(EMP_NONRETAIL*(INCOME<=50000)) )` |
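Reading the formulaic utility above, each destination's utility combines three parts: the mode choice logsum, the log-and-linear distance terms, and a size term `Theta * log(sum_k exp(gamma_k) * S_k)`. The income indicators select either the high-income or the low-income employment coefficients. The probabilities are then a plain multinomial logit over destinations. The sketch below applies the estimated values to hypothetical zone data for a single tour; the function and variable names are ours.

```python
import numpy as np

# Estimates from the destination choice model above (retail terms held fixed at 0).
params = {
    "ModeChoiceLogSum": 1.053, "Distance": -0.000936, "logDistanceP1": -0.05208,
    "Theta": 0.8115,
    "EmpRetail_HighInc": 0.0, "EmpNonRetail_HighInc": 0.464,
    "EmpRetail_LowInc": 0.0, "EmpNonRetail_LowInc": -0.8178,
}

def dest_utility(logsum, dist, emp_retail, emp_nonretail, income, p=params):
    """Utility of each destination for one tour, following the formulaic utility."""
    seg = "HighInc" if income > 50000 else "LowInc"
    size = (np.exp(p[f"EmpRetail_{seg}"]) * emp_retail
            + np.exp(p[f"EmpNonRetail_{seg}"]) * emp_nonretail)
    return (p["ModeChoiceLogSum"] * logsum
            + p["Distance"] * dist
            + p["logDistanceP1"] * np.log1p(dist)
            + p["Theta"] * np.log(size))   # size must be > 0 for every zone

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

# Hypothetical data for three candidate destinations.
logsum = np.array([-0.5, -0.8, -1.2])
dist = np.array([1.0, 4.0, 9.0])
emp_retail = np.array([50.0, 200.0, 80.0])
emp_nonretail = np.array([300.0, 150.0, 900.0])

probs = softmax(dest_utility(logsum, dist, emp_retail, emp_nonretail, income=80000))
print(probs.round(4))
```

Multiplying every zone's size by a common factor only adds a constant to all utilities, which leaves the probabilities unchanged. This is presumably why one employment coefficient per income segment is held fixed at zero: the size coefficients are only identified relative to one another.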


| Code | Alternative | Resolved Probability |
|---|---|---|
| 1 | a1 | `exp(Utility[a1])/exp(Utility[ROOT])` |
| 2 | a2 | `exp(Utility[a2])/exp(Utility[ROOT])` |
| 3 | a3 | `exp(Utility[a3])/exp(Utility[ROOT])` |
| 4 | a4 | `exp(Utility[a4])/exp(Utility[ROOT])` |
| 5 | a5 | `exp(Utility[a5])/exp(Utility[ROOT])` |
| 6 | a6 | `exp(Utility[a6])/exp(Utility[ROOT])` |
| 7 | a7 | `exp(Utility[a7])/exp(Utility[ROOT])` |
| 8 | a8 | `exp(Utility[a8])/exp(Utility[ROOT])` |
| 9 | a9 | `exp(Utility[a9])/exp(Utility[ROOT])` |
| 10 | a10 | `exp(Utility[a10])/exp(Utility[ROOT])` |

| Code | Alternative | Formulaic Probability |
|---|---|---|
| 1 | a1 | `exp(Utility[a1])/exp(Utility[ROOT])` |
| 2 | a2 | `exp(Utility[a2])/exp(Utility[ROOT])` |
| 3 | a3 | `exp(Utility[a3])/exp(Utility[ROOT])` |
| 4 | a4 | `exp(Utility[a4])/exp(Utility[ROOT])` |
| 5 | a5 | `exp(Utility[a5])/exp(Utility[ROOT])` |
| 6 | a6 | `exp(Utility[a6])/exp(Utility[ROOT])` |
| 7 | a7 | `exp(Utility[a7])/exp(Utility[ROOT])` |
| 8 | a8 | `exp(Utility[a8])/exp(Utility[ROOT])` |
| 9 | a9 | `exp(Utility[a9])/exp(Utility[ROOT])` |
| 10 | a10 | `exp(Utility[a10])/exp(Utility[ROOT])` |

| Code | Alternative | # Avail | # Chosen | Availability Condition |
|---|---|---|---|---|
| 1 | a1 | 1897 | 10 | |
| 2 | a2 | 1897 | 24 | |
| 3 | a3 | 1897 | 42 | |
| 4 | a4 | 1897 | 69 | |
| 5 | a5 | 1897 | 183 | |
| 6 | a6 | 1897 | 256 | |
| 7 | a7 | 1897 | 248 | |
| 8 | a8 | 1897 | 305 | |
| 9 | a9 | 1897 | 259 | |
| 10 | a10 | 1897 | 249 | |

| Criteria | Data Source | # Cases Excluded | # Cases Remaining |
|---|---|---|---|
| `TOURPURP != 1` | idco | 4226 | 1897 |
