In [6]:
import os
import pickle
import numpy as np
import pandas as pd
import sklearn

### Download models

Download the folder at https://polybox.ethz.ch/index.php/s/U6Ge2Sb49rnRzV6 and copy the two files `xgb.p` and `xgb_simple.p` into the `trained_models` folder (the directory containing this notebook).

### Models can be loaded as a pickle file

In [54]:
print("Available models:")
print([m for m in os.listdir(".") if m.endswith(".p")])

Available models:
['xgb_simple.p', 'xgb.p']


#### Load model

In [55]:
with open("xgb.p", "rb") as infile:
    model = pickle.load(infile)

## Model inputs

#### Print the required input columns

In [68]:
model.feat_columns

Index(['feat_age', 'feat_sex', 'feat_caraccess', 'feat_employed',
       'feat_distance', 'feat_purpose_destination_home',
       'feat_purpose_destination_leisure', 'feat_purpose_destination_work',
       'feat_purpose_destination_shopping',
       'feat_purpose_destination_education', 'feat_purpose_origin_home',
       'feat_purpose_origin_leisure', 'feat_purpose_origin_work',
       'feat_purpose_origin_shopping', 'feat_purpose_origin_education',
       'feat_pt_accessibilityorigin', 'feat_pt_accessibilitydestination',
       'feat_distance_to_station_origin',
       'feat_distance_to_station_destination', 'feat_origin_hour',
       'feat_origin_day', 'feat_destination_hour', 'feat_destination_day'],
      dtype='object')

### Input format

The variables are not normalized. The units are the following:

* feat_age: Age in years
* feat_sex: 1 = female, 0 = male
* feat_caraccess: 1 if the person owns a car, else 0
* feat_employed: 1 if the person is employed, 0.5 if part time, 0 if unemployed
* feat_distance: Distance from origin to destination (in m)
* feat_purpose_destination_home: 1 if the destination is the home location, else 0
* feat_purpose_destination_leisure: 1 if the destination is visited for leisure, else 0
* feat_purpose_destination_work: 1 if the destination is visited for work, else 0
* feat_purpose_destination_shopping: 1 if the destination is visited for shopping, else 0
* feat_purpose_destination_education: 1 if the destination is visited for education, else 0
* feat_purpose_origin_home: 1 if the origin is the home location, else 0
* feat_purpose_origin_leisure: 1 if the origin is visited for leisure, else 0
* feat_purpose_origin_work: 1 if the origin is visited for work, else 0
* feat_purpose_origin_shopping: 1 if the origin is visited for shopping, else 0
* feat_purpose_origin_education: 1 if the origin is visited for education, else 0
* feat_pt_accessibilityorigin: Public transport accessibility at the origin location (Score from 0 to 4, 0=unknown, 1=bad PT accessibility (class D in Swiss system), 4=excellent accessibility (class A in Swiss system) - see [here](file:///Users/ninawiedemann/Downloads/oev-gueteklassen-berechnungsmethodikare.pdf) for the Swiss system)
* feat_pt_accessibilitydestination: Public transport accessibility at the destination location (Score from 0 to 4, 0=unknown, 1=bad PT accessibility, 4=excellent accessibility)
* feat_distance_to_station_origin: Distance of the origin location to the closest car sharing station (in m)
* feat_distance_to_station_destination: Distance of the destination location to the closest car sharing station (in m)
* feat_origin_hour: Hour of the day when starting the activity at the origin location - number from 0 to 24
* feat_origin_day: Weekday when at origin location (Number from 0 to 6)
* feat_destination_hour: Hour of the day when starting the activity at the destination location - number from 0 to 24
* feat_destination_day: Weekday when at destination location (Number from 0 to 6)
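
To make the encoding above concrete, the following sketch builds a single input row for the full model. The helper `make_trip_row` and all example values are hypothetical and not part of the trained models; the column order is copied from the `model.feat_columns` output above.

```python
import pandas as pd

# Column order copied from the `model.feat_columns` output above.
PURPOSES = ["home", "leisure", "work", "shopping", "education"]
FEAT_COLUMNS = (
    ["feat_age", "feat_sex", "feat_caraccess", "feat_employed", "feat_distance"]
    + [f"feat_purpose_destination_{p}" for p in PURPOSES]
    + [f"feat_purpose_origin_{p}" for p in PURPOSES]
    + [
        "feat_pt_accessibilityorigin",
        "feat_pt_accessibilitydestination",
        "feat_distance_to_station_origin",
        "feat_distance_to_station_destination",
        "feat_origin_hour",
        "feat_origin_day",
        "feat_destination_hour",
        "feat_destination_day",
    ]
)


def make_trip_row(origin_purpose, destination_purpose, **values):
    """Hypothetical helper: one-hot encode both purposes, then fill the other features."""
    for purpose in (origin_purpose, destination_purpose):
        if purpose not in PURPOSES:
            raise ValueError(f"unknown purpose: {purpose}")
    row = dict.fromkeys(FEAT_COLUMNS, 0)
    row[f"feat_purpose_origin_{origin_purpose}"] = 1
    row[f"feat_purpose_destination_{destination_purpose}"] = 1
    for name, value in values.items():
        key = f"feat_{name}"
        if key not in row:
            raise KeyError(f"unknown feature: {key}")
        row[key] = value
    return row


# Made-up example: a 35-year-old employed woman with a car, commuting 6 km
# from home to work on a Wednesday (day 2).
example_row = make_trip_row(
    "home", "work",
    age=35, sex=1, caraccess=1, employed=1,
    distance=6000,
    pt_accessibilityorigin=3, pt_accessibilitydestination=4,
    distance_to_station_origin=300, distance_to_station_destination=200,
    origin_hour=8, origin_day=2, destination_hour=12, destination_day=2,
)
full_input = pd.DataFrame([example_row], columns=FEAT_COLUMNS, index=["sample_0"])
```

A DataFrame built this way has the 23 columns listed above, in the same order, with unnormalized values.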

#### Simplest mode choice model: based only on distance, distance to car sharing stations, and time

In [57]:
with open("xgb_simple.p", "rb") as infile:
    simple_model = pickle.load(infile)
simple_model.feat_columns

Index(['feat_distance', 'feat_distance_to_station_origin',
       'feat_distance_to_station_destination', 'feat_origin_hour',
       'feat_origin_day', 'feat_destination_hour', 'feat_destination_day'],
      dtype='object')
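
Because each model exposes the columns it expects, a DataFrame with extra or differently ordered columns can be aligned before prediction. The sketch below is one way to do this; `align_features` and the `trips` data are hypothetical, and whether the model itself reorders or validates columns is not documented here, so the explicit check is an assumption on the safe side.

```python
import pandas as pd

# Copied from the `simple_model.feat_columns` output above.
feat_columns = pd.Index([
    "feat_distance", "feat_distance_to_station_origin",
    "feat_distance_to_station_destination", "feat_origin_hour",
    "feat_origin_day", "feat_destination_hour", "feat_destination_day",
])


def align_features(df, feat_columns):
    """Hypothetical helper: keep only the expected columns, in the expected order."""
    missing = [c for c in feat_columns if c not in df.columns]
    if missing:
        raise ValueError(f"missing input columns: {missing}")
    return df[list(feat_columns)]


# Made-up trip table with an extra column and a different column order.
trips = pd.DataFrame({
    "trip_id": ["a", "b"],
    "feat_origin_hour": [8, 17],
    "feat_origin_day": [2, 4],
    "feat_destination_hour": [12, 18],
    "feat_destination_day": [2, 4],
    "feat_distance": [60000, 2000],
    "feat_distance_to_station_origin": [300, 500],
    "feat_distance_to_station_destination": [200, 600],
})
aligned = align_features(trips, feat_columns)
```

With a loaded model, `feat_columns` would come from `simple_model.feat_columns` instead of the hard-coded index.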

## Model output

The output is an array with one string per input row, each describing a transport mode. The supported modes are listed in the `label_meanings` attribute:

In [58]:
simple_model.label_meanings

array(['Mode::Bicycle', 'Mode::Bus', 'Mode::Car',
       'Mode::CarsharingMobility', 'Mode::LightRail',
       'Mode::RegionalTrain', 'Mode::Train', 'Mode::Tram', 'Mode::Walk'],
      dtype=object)

## Example application

#### Create dataframe with example input (only two rows)

In [72]:
example_input = pd.DataFrame(np.array([
    # The first row is a 60 km trip with the start activity at 8am and the end activity at 12pm on a Wednesday (2),
    # starting 300m from a car sharing station and ending 200m away from the closest car sharing station
    [60000, 300, 200, 8, 2, 12, 2], 
    [20000, 500, 600, 8, 2, 12, 2]
]), columns=simple_model.feat_columns, index=["sample_0", "sample1"])
example_input

| | feat_distance | feat_distance_to_station_origin | feat_distance_to_station_destination | feat_origin_hour | feat_origin_day | feat_destination_hour | feat_destination_day |
|---|---|---|---|---|---|---|---|
| sample_0 | 60000 | 300 | 200 | 8 | 2 | 12 | 2 |
| sample1 | 20000 | 500 | 600 | 8 | 2 | 12 | 2 |


In [73]:
predicted_mode = simple_model(example_input)
predicted_mode

array(['Mode::Train', 'Mode::Car'], dtype=object)
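
The predicted labels carry a `Mode::` prefix. A minimal sketch of post-processing them into a table with plain mode names follows; the array is copied from the output above rather than produced by the model, and the `result` DataFrame and its column names are illustrative choices.

```python
import numpy as np
import pandas as pd

# Values copied from the prediction output above (not computed here).
predicted_mode = np.array(["Mode::Train", "Mode::Car"], dtype=object)

# Attach the predictions to the sample index used in the example input.
result = pd.DataFrame(
    {"predicted_mode": predicted_mode},
    index=["sample_0", "sample1"],
)

# Strip the literal "Mode::" prefix to obtain a plain mode name.
result["mode_name"] = result["predicted_mode"].str.replace(
    "Mode::", "", regex=False
)
```

In the notebook, `predicted_mode` and `example_input.index` can be used directly in place of the hard-coded values.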