## Transportation Forecasting
For the TRBAM 2019 TRANSFOR19 forecasting competition

### This is Part 2 of a 3-part series

Use the following links to navigate:

- [Part 1: data processing](01_processing.ipynb)
- [Part 2: data preparation](02_preparation.ipynb)
- [Part 3: model training](03_training.ipynb)

-----

### 5. Preparing Train, Validation and Test datasets

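- To train and evaluate our machine learning model, we create three datasets: train, valid and test. The validation and test peak-hour windows (06:00 to 10:59 and 16:00 to 20:59) are excluded from the training set.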
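
How the three splits relate to each other can be checked before touching the real data. The sketch below rebuilds the boolean filters used in the cells that follow on a synthetic index. The 5-minute sampling interval and the date range (1 November to 1 December 2016) are assumptions made for illustration, not values read from the CSV files. It uses `isocalendar().week` in place of `index.week`, which is deprecated in newer pandas releases.

```python
import pandas as pd

# Synthetic 5-minute index (assumed interval and date range, for illustration only).
idx = pd.date_range('2016-11-01 00:00', '2016-12-01 23:55',
                    freq='5min', tz='Asia/Shanghai')

peak = idx.hour.isin([6, 7, 8, 9, 10, 16, 17, 18, 19, 20])
night = idx.hour.isin([0, 1, 2, 3, 4, 5])
november = idx.month == 11
december = idx.month == 12
thursday = idx.day_name() == 'Thursday'
week = idx.isocalendar().week.astype(int).to_numpy()  # ISO week number

# Same logic as the train, valid and test filters in this notebook.
train = ~night & ~(december & peak) & ~(november & thursday & peak & (week > 45))
valid = peak & thursday & november & (week > 46)
test = (idx.hour >= 6) & december

# Count the timestamps that pass two filters at once.
overlaps = {
    'train/valid': int((train & valid).sum()),
    'valid/test': int((valid & test).sum()),
    'train/test, peak hours': int((train & test & peak).sum()),
    'train/test, off-peak hours': int((train & test & ~peak).sum()),
}
print(overlaps)
```

On this synthetic index, the validation and test peak-hour windows share no rows with the training filter, while December off-peak rows from 06:00 onward pass both the train and test filters. On the real frames, the same check can be made with `train_x.index.intersection(test_x.index)`.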

In [1]:
# read from csv file

import pandas as pd
import pytz

datatable = pd.read_csv('datatable_full.csv', index_col=0)
speedtable = pd.read_csv('speedtable_full.csv', index_col=0)
predictions = pd.read_csv('Predictions.csv', index_col=0)

# interpret each index as UTC, then convert it to Shanghai local time (UTC+8)
predictions.index = pd.to_datetime(predictions.index).tz_localize('UTC').tz_convert(pytz.timezone('Asia/Shanghai'))
datatable.index = pd.to_datetime(datatable.index).tz_localize('UTC').tz_convert(pytz.timezone('Asia/Shanghai'))
speedtable.index = pd.to_datetime(speedtable.index).tz_localize('UTC').tz_convert(pytz.timezone('Asia/Shanghai'))
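
The index conversion above works in two steps: `tz_localize('UTC')` attaches a timezone to naive timestamps without changing the clock reading, and `tz_convert` then shifts the clock reading to Shanghai local time. A minimal sketch with made-up timestamps (the values are hypothetical and only stand in for the CSV index, which the cell above assumes is stored in UTC):

```python
import pandas as pd

# Hypothetical naive timestamps, standing in for the CSV index values.
naive = pd.to_datetime(['2016-11-30 22:00:00', '2016-12-01 00:00:00'])

# Step 1: declare the naive values to be UTC (clock reading unchanged).
as_utc = naive.tz_localize('UTC')

# Step 2: express the same instants in Shanghai local time (UTC+8).
local = as_utc.tz_convert('Asia/Shanghai')

print(local)
```

After the conversion, filters such as `index.hour` and label slices such as `'2016-12-01 06:00:00+08:00'` refer to local time rather than UTC.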

In [2]:
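# training targets: drop night hours (00:00 to 05:59), the December peak hours
# and the peak hours of November Thursdays after ISO week 45
train_y = speedtable.loc[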
    ~speedtable.index.hour.isin([0, 1, 2, 3, 4, 5]) &
    ~(speedtable.index.month_name().isin(['December']) &
        speedtable.index.hour.isin([6, 7, 8, 9, 10, 16, 17, 18, 19, 20])) &
    ~(speedtable.index.month_name().isin(['November']) & 
      speedtable.index.day_name().isin(['Thursday']) &
      speedtable.index.hour.isin([6, 7, 8, 9, 10, 16, 17, 18, 19, 20]) &
      (speedtable.index.week > 45))
]

train_y.to_csv('train_y.csv')
print(train_y.shape)

(13032, 2)


In [3]:
# training features: same row filter as train_y, applied to datatable
train_x = datatable.loc[
    ~datatable.index.hour.isin([0, 1, 2, 3, 4, 5]) &
    ~(datatable.index.month_name().isin(['December']) &
        datatable.index.hour.isin([6, 7, 8, 9, 10, 16, 17, 18, 19, 20])) &
    ~(datatable.index.month_name().isin(['November']) & 
      datatable.index.day_name().isin(['Thursday']) &
      datatable.index.hour.isin([6, 7, 8, 9, 10, 16, 17, 18, 19, 20]) &
      (datatable.index.week > 45))
]

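# standardize each column with the mean and std of the full datatable;
# NaN results (e.g. from zero-variance columns) are set to 0
train_x = ((train_x - datatable.mean(axis=0))/datatable.std(axis=0)).fillna(0.)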
train_x.to_csv('train_x.csv')
print(train_x.shape)

(13032, 1024)
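
The scaling step in the cell above is a column-wise z-score followed by `fillna(0.)`. A minimal sketch on a toy frame (the frame and its column names are hypothetical) shows why the fill is needed: a constant column has zero standard deviation, so the division yields 0/0 = NaN.

```python
import pandas as pd

# Toy stand-in for datatable: column 'b' is constant, so its std is 0.
toy = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [5.0, 5.0, 5.0]})

# Column-wise z-score; pandas std uses the sample estimate (ddof=1) by default.
z = (toy - toy.mean(axis=0)) / toy.std(axis=0)

# The constant column is NaN before filling and 0.0 afterwards.
scaled = z.fillna(0.)
print(scaled)
```

Filling with 0 places missing or undefined entries at the column mean in the scaled space. Note that the statistics in this notebook are computed on the full `datatable`, so they also include the validation and test rows.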


In [4]:
# validation targets: peak hours of November Thursdays after ISO week 46
valid_y = speedtable.loc[
    speedtable.index.hour.isin([6, 7, 8, 9, 10, 16, 17, 18, 19, 20]) &
    speedtable.index.day_name().isin(['Thursday']) &
    speedtable.index.month_name().isin(['November']) &
    (speedtable.index.week > 46)
]
valid_y.to_csv('valid_y.csv')
print(valid_y.shape)

(120, 2)


In [5]:
# validation features: same row filter as valid_y, applied to datatable
valid_x = datatable.loc[
    datatable.index.hour.isin([6, 7, 8, 9, 10, 16, 17, 18, 19, 20]) &
    datatable.index.day_name().isin(['Thursday']) &
    datatable.index.month_name().isin(['November']) &
    (datatable.index.week > 46)
]
valid_x = ((valid_x - datatable.mean(axis=0))/datatable.std(axis=0)).fillna(0.)
valid_x.to_csv('valid_x.csv')
print(valid_x.shape)

(120, 1024)


In [6]:
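# test targets: rows of Predictions.csv from 06:00 on 1 December 2016 onward;
# missing values are set to 0
test_y = predictions.loc['2016-12-01 06:00:00+08:00':].fillna(0.)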
test_y.to_csv('test_y.csv')
print(test_y.shape)

(216, 2)


In [7]:
# test features: all December rows from 06:00 onward
test_x = datatable.loc[
    (datatable.index.hour >= 6) &
    datatable.index.month_name().isin(['December'])
]
test_x = ((test_x - datatable.mean(axis=0))/datatable.std(axis=0)).fillna(0.)
test_x.to_csv('test_x.csv')
print(test_x.shape)

(216, 1024)


[Continue](03_training.ipynb) to the next step: model training.