# 200: Exampville Simulated Data

In [1]:
import larch, numpy, pandas, os
larch.__version__

'5.0.5'

Welcome to Exampville, the best simulated town in this here part of the internet!

Exampville is a simulation tool provided with Larch that can quickly simulate the
kind of data that a transportation planner might have available when building
a travel model.  We will use the first Exampville builder to generate some
simulated data.


In [2]:
import larch.exampville
from larch.roles import P,X
directory, skims, f_hh, f_pp, f_tour = larch.exampville.builder_1(
    nZones=15,
    transit_scope=(4,15),
    n_HH=2000,
    output_format='csv'
)

The builder creates a temporary directory and produces a set of files:
network skim matrices in open matrix (OMX) format, plus three tables
containing data on households, persons, and tours, respectively.  As the output
below shows, the three tables are available here as pandas DataFrames.
We can take a quick peek at what is inside each one:


In [3]:
skims

<larch.OMX> ⋯/T/tmpk50hcwwc/exampville.omx
 |  shape:(15, 15)
 |  data:
 |    AUTO_TIME (float64)
 |    DIST      (float64)
 |    RAIL_FARE (float64)
 |    RAIL_TIME (float64)
 |  lookup:
 |    EMPLOYMENT    (15 float64)
 |    EMP_NONRETAIL (15 float64)
 |    EMP_RETAIL    (15 float64)
 |    LAT           (15 int64)
 |    LON           (15 int64)
 |    TAZID         (15 int64)

In [4]:
f_hh.info()
f_hh.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 4 columns):
HHID       2000 non-null int64
INCOME     2000 non-null int64
HHSIZE     2000 non-null int64
HOMETAZ    2000 non-null int64
dtypes: int64(4)
memory usage: 62.6 KB


    HHID  INCOME  HHSIZE  HOMETAZ
0  50000  133000       3        6
1  50001   48000       1        8
2  50002   72000       1        4
3  50003  103000       3        5
4  50004   77000       1       10


In [5]:
f_pp.info()
f_pp.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3462 entries, 0 to 3461
Data columns (total 7 columns):
PERSONID        3462 non-null int64
HHID            3462 non-null int64
AGE             3462 non-null int64
WORKS           3462 non-null int64
N_WORKTOURS     3462 non-null int64
N_OTHERTOURS    3462 non-null int64
N_TOTALTOURS    3462 non-null int64
dtypes: int64(7)
memory usage: 189.4 KB


   PERSONID   HHID  AGE  WORKS  N_WORKTOURS  N_OTHERTOURS  N_TOTALTOURS
0     60000  50000   82      0            0             0             0
1     60001  50000   26      0            0             0             0
2     60002  50000   19      1            1             1             2
3     60003  50001   46      1            1             0             1
4     60004  50002   32      1            1             1             2


In [6]:
f_tour.info()
f_tour.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6123 entries, 0 to 6122
Data columns (total 6 columns):
TOURID      6123 non-null int64
HHID        6123 non-null int64
PERSONID    6123 non-null int64
DTAZ        6123 non-null int64
TOURMODE    6123 non-null int64
TOURPURP    6123 non-null int64
dtypes: int64(6)
memory usage: 287.1 KB


   TOURID   HHID  PERSONID  DTAZ  TOURMODE  TOURPURP
0       0  50000     60002     6         1         1
1       1  50000     60002     5         1         2
2       2  50001     60003     4         1         1
3       3  50002     60004     8         2         1
4       4  50002     60004     8         1         2


The Exampville data output contains a set of files similar to what we might
find for a real travel survey: network skims, and tables of households, persons,
and tours.  We'll need to merge these tables to create a composite dataset
for mode choice model estimation.

We can merge data from other tables using the usual pandas syntax for merging.


In [7]:
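# Attach household attributes to each tour, matching on the household ID.
f_tour = f_tour.merge(f_hh, on='HHID')
# Attach person attributes, matching on both keys (passed as a list of labels).
f_tour = f_tour.merge(f_pp, on=['HHID', 'PERSONID'])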
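
When merging tables like these, it can be worth checking that the merge did not
add or drop any tours.  The following is a minimal sketch on toy data, not part of
the Exampville builder: the tables hh and tours and their values are made up for
illustration, and the validate argument is an optional pandas safeguard that the
cell above does not use.

```python
import pandas as pd

# Toy stand-ins for the household and tour tables (illustrative values only).
hh = pd.DataFrame({"HHID": [50000, 50001], "INCOME": [133000, 48000]})
tours = pd.DataFrame({
    "TOURID": [0, 1, 2],
    "HHID": [50000, 50000, 50001],
})

# validate="many_to_one" makes pandas raise a MergeError if HHID is
# duplicated in hh, which would otherwise silently duplicate tour rows.
merged = tours.merge(hh, on="HHID", how="left", validate="many_to_one")

# A many-to-one left merge should leave the number of tours unchanged.
assert len(merged) == len(tours)
print(merged)
```

Each tour row picks up the attributes of its household, so the merged table has
one row per tour and one extra column per household attribute.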

Our zone numbering system starts with zone 1, as is common for TAZ numbering
systems seen in practice.  To look up data in the skim matrices, however, we need
the zero-based indexing that is standard in Python.  So we'll create two new
TAZ-index columns, each one less than the corresponding zone number.

In [8]:
f_tour["HOMETAZi"] = f_tour["HOMETAZ"] - 1
f_tour["DTAZi"] = f_tour["DTAZ"] - 1
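
To see why the zero-based columns matter, here is a small sketch of the kind of
lookup they enable.  It uses a made-up 3-zone distance matrix in plain NumPy rather
than the OMX skims, and the names dist and toy are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-zone distance matrix; row = origin zone, column = destination zone.
dist = np.array([
    [0.0, 1.5, 4.0],
    [1.5, 0.0, 2.0],
    [4.0, 2.0, 0.0],
])

# Toy tours with one-based zone numbers, as in the tour table.
toy = pd.DataFrame({"HOMETAZ": [1, 3, 2], "DTAZ": [2, 1, 2]})
toy["HOMETAZi"] = toy["HOMETAZ"] - 1
toy["DTAZi"] = toy["DTAZ"] - 1

# Integer-array indexing picks one (origin, destination) cell per tour.
toy["DIST"] = dist[toy["HOMETAZi"].to_numpy(), toy["DTAZi"].to_numpy()]
print(toy)
```

Using the one-based columns directly would shift every lookup by one row and one
column, and zone 3 would raise an IndexError in this 3-zone example.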

Let's define some variables for clarity.  We could in theory write out the complete formula
"DIST / 2.5 * 60 * (DIST<=3)" every time we want to refer to the walk time: distance in miles,
divided by 2.5 miles per hour, times 60 minutes per hour, with the (DIST<=3) term setting the
result to zero for any zone pair more than 3 miles apart.  But it's much easier, and potentially
faster, to pre-compute the walk time and use it directly.  We do the same for bike time
(12 miles per hour, up to 15 miles) and for car cost (0.20 per mile).


In [9]:
skims.change_mode('a')
skims["WALKTIME"] = skims.DIST[:] / 2.5 * 60 * (skims.DIST[:] <= 3)
skims["BIKETIME"] = skims.DIST[:] / 12  * 60 * (skims.DIST[:] <= 15)
skims["CARCOST"] = skims.DIST[:] * 0.20
skims.change_mode('r')

<larch.OMX> ⋯/T/tmpk50hcwwc/exampville.omx
 |  shape:(15, 15)
 |  data:
 |    AUTO_TIME (float64)
 |    BIKETIME  (float64)
 |    CARCOST   (float64)
 |    DIST      (float64)
 |    RAIL_FARE (float64)
 |    RAIL_TIME (float64)
 |    WALKTIME  (float64)
 |  lookup:
 |    EMPLOYMENT    (15 float64)
 |    EMP_NONRETAIL (15 float64)
 |    EMP_RETAIL    (15 float64)
 |    LAT           (15 int64)
 |    LON           (15 int64)
 |    TAZID         (15 int64)
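
The (DIST <= 3) term in the walk time formula works because NumPy treats True as 1
and False as 0 in arithmetic.  The sketch below shows this on a few made-up
distances.  Keeping the mask as a separate walk_avail array is an assumption of
this sketch, not something the cell above does; it can help distinguish "no walk
option" from a genuine walk time of zero.

```python
import numpy as np

# Illustrative distances in miles (not taken from the Exampville skims).
dist = np.array([0.5, 2.0, 3.0, 6.0])

# The comparison yields a boolean array; multiplying by it keeps the
# computed time where the condition holds and zeros it out elsewhere.
walk_avail = dist <= 3
walktime = dist / 2.5 * 60 * walk_avail

print(walktime)
print(walk_avail)
```

The 6-mile entry ends up with a walk time of zero minutes, which only makes sense
when read together with the mask.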

In [10]:
DA = 1
SR = 2
Walk = 3
Bike = 4
Transit = 5
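
One possible use of these named codes is to label the TOURMODE column for quick
tabulations.  The sketch below is hypothetical: the mode_names dictionary is not
part of Larch, and the toy series simply repeats the first five TOURMODE values
shown in the tour table above.

```python
import pandas as pd

DA, SR, Walk, Bike, Transit = 1, 2, 3, 4, 5

# Hypothetical lookup from mode code to a short label (reusing the variable names).
mode_names = {DA: "DA", SR: "SR", Walk: "Walk", Bike: "Bike", Transit: "Transit"}

# First five TOURMODE values from the tour table.
tourmode = pd.Series([1, 1, 1, 2, 1], name="TOURMODE")

# Map each integer code to its label, then count tours by mode.
labels = tourmode.map(mode_names)
counts = labels.value_counts()
print(counts)
```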