## 202: Exampville Mode Choice Logsums

Welcome to Exampville, the best simulated town in this here part of the internet!

Exampville is a demonstration provided with Larch that walks through some of the 
data and tools that a transportation planner might use when building a travel model. 

In [1]:
import larch, numpy, pandas, os
from larch import P, X

In [2]:
larch.__version__

'5.4.0'

In this example notebook, we will walk through the creation of logsums from
an existing tour mode choice model.  First, let's load the data files from
our example.

In [3]:
hh, pp, tour, skims = larch.example(200, ['hh', 'pp', 'tour', 'skims'])

We'll also load the saved model from the mode choice estimation.

In [4]:
exampville_mode_choice_file = larch.example(201, output_file='/tmp/exampville_mode_choice.html')
m = larch.read_metadata(exampville_mode_choice_file)

We'll replicate the pre-processing used in the mode choice estimation:
merge the household and person characteristics into the tours data,
add zero-based index values for the home and destination TAZs, filter
to include only work tours, and merge with the level of service skims.
(If this pre-processing were computationally expensive, it would probably
be better to save the results to disk and reload them as needed,
but for this model these commands run almost instantaneously.)

In [5]:
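# Merge household and person attributes into the tour records
raw = tour.merge(hh, on='HHID').merge(pp, on=['HHID', 'PERSONID'])
# Convert 1-based TAZ IDs to 0-based skim positions
raw["HOMETAZi"] = raw["HOMETAZ"] - 1
raw["DTAZi"] = raw["DTAZ"] - 1
# Keep only work tours (TOURPURP == 1)
raw = raw[raw.TOURPURP == 1]
# Attach level of service values for each home-destination pair
raw = raw.join(
    skims.get_rc_dataframe(
        raw.HOMETAZi, raw.DTAZi,
    )
)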
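
The reason for subtracting 1 can be seen in a minimal sketch, assuming zone IDs are consecutive integers starting at 1 while the skim arrays are addressed by zero-based position. The matrix `auto_time` and the zone lists are hypothetical toy data, and the lookup uses plain NumPy indexing. It is not meant to show how `get_rc_dataframe` works internally.

```python
import numpy

# Hypothetical 3-zone travel time matrix (assumption, not Exampville data);
# rows are origin zones, columns are destination zones.
auto_time = numpy.array([
    [ 2.0,  7.5, 12.0],
    [ 7.5,  3.0,  9.0],
    [12.0,  9.0,  2.5],
])

# TAZ identifiers as they appear in the survey data, numbered from 1.
home_taz = numpy.array([1, 3, 2])
dest_taz = numpy.array([2, 2, 3])

# Subtract 1 to turn identifiers into zero-based array positions.
home_i = home_taz - 1
dest_i = dest_taz - 1

# Paired integer indexing pulls one value per (origin, destination) pair.
times = auto_time[home_i, dest_i]
```

Here `times` holds one travel time per tour (7.5, 9.0 and 9.0), in the same order as the input zone pairs.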

Then we bundle the raw data into the `larch.DataFrames` structure,
as we did for estimation, and attach this structure to the model
as its `dataservice`.

In [6]:
# For clarity, give each numeric mode code a readable name
DA = 1
SR = 2
Walk = 3
Bike = 4
Transit = 5

In [7]:
dfs = larch.DataFrames(
    co=raw, 
    alt_codes=[DA,SR,Walk,Bike,Transit], 
    alt_names=['DA','SR','Walk','Bike','Transit'],
    ch_name='TOURMODE',
)

m.dataservice = dfs

We'll also initialize a DataFrame to hold the computed logsums.
This DataFrame will have one row for each case in our source data,
and one column for each possible destination zone.

In [8]:
logsums = pandas.DataFrame(
    data=0.0,
    index=raw.index, 
    columns=skims.TAZ_ID
)
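
The same layout can be tried on a toy scale. The case and zone identifiers below are hypothetical, chosen only to show that a scalar `data` value is broadcast to every cell and that assigning to a column fills one destination for all cases at once.

```python
import pandas

# Hypothetical identifiers (assumptions for illustration only).
case_index = pandas.Index([0, 1, 3], name="_caseid_")
zone_ids = pandas.Index([1, 2, 3, 4], name="TAZ_ID")

# A scalar `data` value is broadcast to every cell of the frame.
toy = pandas.DataFrame(data=0.0, index=case_index, columns=zone_ids)

# Assigning to a column writes one value per case for that destination.
toy[2] = [-1.5, -1.7, -2.0]
```

After the assignment, the column for zone 2 holds the new values and the other columns remain at 0.0.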

The logsums from a Model can be computed using the `Model.logsums` method,
which returns one value per case for the data currently loaded.  To get the
logsums for every possible destination, we need to replace the part of our
data that depends on the destination zone, writing in the appropriate values
for each zone in turn.  We can simply iterate over the zones, which is a
little slow but easy to code.  The speed is not a concern here, as the
logsums only need to be generated once, after the mode choice model is
finalized.

In [9]:
for destination_i, dtaz in enumerate(logsums.columns):
    # Get the LOS data for this destination
    new_data = skims.get_rc_dataframe(
        raw.HOMETAZi, destination_i,
    )
    # Write this data into the model's dataservice.
    dfs.data_co[new_data.columns] = new_data
    # Loading this data runs the pre-processing on
    # the dataservice, to create the arrays needed
    # for computation.
    m.load_data()
    # Lastly, compute the logsums and save them
    # to the new DataFrame.
    logsums[dtaz] = m.logsums()
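
For intuition about what each cell of the result represents, here is a minimal sketch of a logsum, assuming a plain multinomial logit model, where the logsum is the log of the summed exponentiated utilities across alternatives. The function `mnl_logsum` and the utility values are hypothetical and not part of Larch. If the model has nests, the value of interest is the logsum at the root of the nesting tree, which this sketch does not reproduce.

```python
import numpy

def mnl_logsum(utility):
    """Log of the summed exponentiated utilities, one value per row."""
    utility = numpy.asarray(utility, dtype=float)
    # Shift by the row maximum so that exp() cannot overflow.
    shift = utility.max(axis=1, keepdims=True)
    return shift[:, 0] + numpy.log(numpy.exp(utility - shift).sum(axis=1))

# Hypothetical utilities: 2 cases (rows) by 3 alternatives (columns).
V = numpy.array([
    [ 0.0,  0.0,  0.0],
    [-1.0, -2.0, -0.5],
])

toy_logsums = mnl_logsum(V)
```

With three equally attractive alternatives at zero utility, the first logsum equals log(3). Making any alternative better raises the logsum, which is why it serves as a composite measure of accessibility to a destination across all modes.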


In [10]:
logsums.head()

| _caseid_ / TAZ_ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -2.935611 | -2.660436 | -1.745654 | -2.229281 | -2.403117 | -1.624193 | -2.689783 | -2.704533 | -3.25863 | -2.718634 | ... | -1.70228 | -1.717749 | -2.340898 | -2.760539 | -1.02927 | -2.249755 | -2.048357 | -2.160251 | -2.286809 | -1.959727 |
| 1 | -2.935611 | -2.660436 | -1.745654 | -2.229281 | -2.403117 | -1.624193 | -2.689783 | -2.704533 | -3.25863 | -2.718634 | ... | -1.70228 | -1.717749 | -2.340898 | -2.760539 | -1.02927 | -2.249755 | -2.048357 | -2.160251 | -2.286809 | -1.959727 |
| 3 | -2.935611 | -2.660436 | -1.745654 | -2.229281 | -2.403117 | -1.624193 | -2.689783 | -2.704533 | -3.25863 | -2.718634 | ... | -1.70228 | -1.717749 | -2.340898 | -2.760539 | -1.02927 | -2.249755 | -2.048357 | -2.160251 | -2.286809 | -1.959727 |
| 7 | -3.046071 | -2.769949 | -1.823372 | -2.320338 | -2.494063 | -1.705313 | -2.785841 | -2.806107 | -3.3655 | -2.830964 | ... | -1.784186 | -1.797482 | -2.430807 | -2.863272 | -1.100473 | -2.343704 | -2.133276 | -2.247024 | -2.380627 | -2.047688 |
| 10 | -3.051611 | -2.775413 | -1.827178 | -2.324782 | -2.498425 | -1.709122 | -2.790498 | -2.811107 | -3.370804 | -2.836603 | ... | -1.788032 | -1.801228 | -2.435109 | -2.868261 | -1.103813 | -2.348381 | -2.137301 | -2.251151 | -2.385124 | -2.051987 |


Then we can persist the logsums dataframe to disk, for use in the next
example, where we will estimate a destination choice model.

In [11]:
logsums.to_pickle('/tmp/logsums.pkl.gz')
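
Reading the file back is the mirror image of writing it. The round trip below is a small sketch on a hypothetical toy frame, written to a temporary directory rather than `/tmp/logsums.pkl.gz`. It relies on pandas inferring gzip compression from the `.gz` suffix in both directions.

```python
import os
import tempfile

import pandas

# Hypothetical stand-in for the logsums table (values are made up).
toy = pandas.DataFrame(
    {1: [-2.9, -3.0], 2: [-2.7, -2.8]},
    index=pandas.Index([0, 7], name="_caseid_"),
)
toy.columns.name = "TAZ_ID"

# Write to a throwaway location; compression is inferred from ".gz".
path = os.path.join(tempfile.mkdtemp(), "toy_logsums.pkl.gz")
toy.to_pickle(path)

# Reload and confirm that values, index, and columns survive the round trip.
restored = pandas.read_pickle(path)
assert restored.equals(toy)
```

Pickle files are convenient for passing a DataFrame between notebooks in the same environment, but they should only be loaded from sources you trust.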