## Dynamic Training

A model can be trained in one of two ways:

A static model is trained offline: we train it exactly once and then use that trained model for a while.

A dynamic model is trained online: data continually enters the system, and we fold it into the model through continuous updates.
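To make the online case concrete, here is a minimal sketch of a learner that is updated one example at a time. It is plain logistic regression with stochastic gradient descent, not Vowpal Wabbit's actual update rule; the class name `OnlineLogisticRegression`, the toy `stream`, and the learning rate are assumptions made for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time (plain SGD)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def update(self, x, y):
        """Take one gradient step on a single example; y is 0 or 1."""
        error = self.predict_proba(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


# Hypothetical data stream: (features, label) pairs arriving one by one
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 50

model = OnlineLogisticRegression(n_features=2)
mistakes = 0
for x, y in stream:
    predicted = 1 if model.predict_proba(x) >= 0.5 else 0  # predict first
    mistakes += int(predicted != y)
    model.update(x, y)  # then learn from the example

print(f"mistakes while streaming: {mistakes} / {len(stream)}")
```

Each example is scored before the model learns from it, so the running mistake count doubles as an estimate of how the model behaves on unseen data. No separate training phase exists: the model is usable at any point in the stream.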

### Resources:
+ [Vowpal Wabbit wiki](https://github.com/JohnLangford/vowpal_wabbit/wiki)
+ [Data Science at the Command Line](https://github.com/jeroenjanssens/data-science-at-the-command-line)
+ [John Langford - Vowpal Wabbit, the Next Generation](https://www.youtube.com/watch?v=HEob9cRkpDk)
+ [Solving NLP problems with Vowpal Wabbit](https://github.com/hal3/vwnlp)

In [13]:
# Standard library
import logging
import pickle
import time
from datetime import datetime

# Third-party
import joblib
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_absolute_error

DATA_DIR = "../data/processed/"

# Load the preprocessed train/test splits and check their dimensions
train = pd.read_csv(DATA_DIR + "train.csv")
test = pd.read_csv(DATA_DIR + "test.csv")
print(train.shape)
print(test.shape)

(29465, 32)
(14513, 283)


In [14]:
train.head()

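|  | Unnamed: 0 | Year | Month | DayofMonth | DayOfWeek | DepTime | CRSDepTime | ArrTime | CRSArrTime | UniqueCarrier | ... | Cancelled | CancellationCode | Diverted | CarrierDelay | WeatherDelay | NASDelay | SecurityDelay | LateAircraftDelay | IsArrDelayed | IsDepDelayed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 31391 | 2002 | 1 | 6 | 7 | 1817.0 | 1755 | 2106.0 | 2012 | US | ... | 0 |  | 0 |  |  |  |  |  | YES | YES |
| 1 | 29297 | 2001 | 1 | 24 | 3 | 716.0 | 720 | 832.0 | 841 | US | ... | 0 |  | 0 |  |  |  |  |  | NO | NO |
| 2 | 29967 | 2001 | 1 | 17 | 3 | 758.0 | 800 | 1028.0 | 1007 | US | ... | 0 |  | 0 |  |  |  |  |  | YES | NO |
| 3 | 27492 | 2000 | 1 | 6 | 4 | 1648.0 | 1645 | 1843.0 | 1853 | DL | ... | 0 |  | 0 |  |  |  |  |  | NO | YES |
| 4 | 17146 | 1995 | 1 | 18 | 3 | 1402.0 | 1400 | 1508.0 | 1512 | US | ... | 0 |  | 0 |  |  |  |  |  | NO | YES |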


In [15]:
train.columns

Index(['Unnamed: 0', 'Year', 'Month', 'DayofMonth', 'DayOfWeek', 'DepTime',
       'CRSDepTime', 'ArrTime', 'CRSArrTime', 'UniqueCarrier', 'FlightNum',
       'TailNum', 'ActualElapsedTime', 'CRSElapsedTime', 'AirTime', 'ArrDelay',
       'DepDelay', 'Origin', 'Dest', 'Distance', 'TaxiIn', 'TaxiOut',
       'Cancelled', 'CancellationCode', 'Diverted', 'CarrierDelay',
       'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay',
       'IsArrDelayed', 'IsDepDelayed'],
      dtype='object')

In [16]:
!csv2vw --help

csv2vw version [unknown] calling Getopt::Std::getopts (version 1.07 [paranoid]),
running under Perl version 5.18.2.

Usage: csv2vw [-OPTIONS [-MORE_OPTIONS]] [--] [PROGRAM_ARG1 ...]

The following single-character options are accepted:
	With arguments: -s
	Boolean (without arguments): -v -h

Options may be merged together.  -- stops processing of options.
Space is not required between options and their arguments.
  [Now continuing due to backward compatibility and excessive paranoia.
   See 'perldoc Getopt::Std' about $Getopt::Std::STANDARD_HELP_VERSION.]
Usage: csv2vw [options] [<label_column>] files...
    Options:
        -v          verbose
        -h          first line is header
        -s<sep>     explicitly specify field separator (perl-regexp)
                    the default is: '(?^:[,\t])'

    Args:
        If a numeric arg <label_column> is specified, it will be
        used as the index (1st index is 0) of the label (target
        feature) column.

In [17]:
!csv2vw -h -- -1 ../data/processed/train.csv > ../data/processed/train.vw

In [18]:
!head ../data/processed/train.vw

1 1|f :31391 Year:2002 Month:1 DayofMonth:6 DayOfWeek:7 DepTime:1817.0 CRSDepTime:1755 ArrTime:2106.0 CRSArrTime:2012 UniqueCarrier=US FlightNum:781 TailNum=N435äâ ActualElapsedTime:169.0 CRSElapsedTime:137.0 AirTime:127.0 ArrDelay:54.0 DepDelay:22.0 Origin=PIT Dest=MCO Distance:834.0 TaxiIn:5.0 TaxiOut:37.0 Cancelled:0 CancellationCode= Diverted:0 CarrierDelay= WeatherDelay= NASDelay= SecurityDelay= LateAircraftDelay= IsArrDelayed=YES
2 2|f :29297 Year:2001 Month:1 DayofMonth:24 DayOfWeek:3 DepTime:716.0 CRSDepTime:720 ArrTime:832.0 CRSArrTime:841 UniqueCarrier=US FlightNum:426 TailNum=N375äâ ActualElapsedTime:76.0 CRSElapsedTime:81.0 AirTime:61.0 ArrDelay:-9.0 DepDelay:-4.0 Origin=CHS Dest=DCA Distance:444.0 TaxiIn:5.0 TaxiOut:10.0 Cancelled:0 CancellationCode= Diverted:0 CarrierDelay= WeatherDelay= NASDelay= SecurityDelay= LateAircraftDelay= IsArrDelayed=NO
2 3|f :29967 Year:2001 Month:1 DayofMonth:17 DayOfWeek:3 DepTime:758.0 CRSDepTime:800 ArrTime:1028.0 CRSArrTime:1007 UniqueCa
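Each line above follows Vowpal Wabbit's text input format: a label, a tag that touches the `|`, a namespace (`f`), and then the features, with numeric columns written as `name:value` and categorical ones as `name=value`. The sketch below reproduces that layout in Python to make the format easier to read. It is not how `csv2vw` is implemented; the function name `to_vw_line` and the choice to replace whitespace, `:` and `|` in feature names with `_` are assumptions for illustration (VW treats those characters as delimiters, so they cannot appear in a feature name).

```python
import re


def to_vw_line(row, label, tag, namespace="f"):
    """Format one record as a Vowpal Wabbit text example.

    Numeric values become `name:value`; everything else becomes a
    categorical `name=value` token.
    """
    tokens = []
    for name, value in row.items():
        # Spaces, ':' and '|' are delimiters in the VW format (assumed fix-up)
        safe_name = re.sub(r"[\s:|]", "_", name)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number:
            tokens.append(f"{safe_name}:{value}")
        else:
            tokens.append(f"{safe_name}={value}")
    return f"{label} {tag}|{namespace} " + " ".join(tokens)


# A few columns from the first training row shown earlier
row = {"Year": 2002, "Month": 1, "DepTime": 1817.0, "UniqueCarrier": "US", "Origin": "PIT"}
print(to_vw_line(row, label=1, tag=1))
# 1 1|f Year:2002 Month:1 DepTime:1817.0 UniqueCarrier=US Origin=PIT
```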

In [19]:
!sed 's/^2 /-1 /g' ../data/processed/train.vw > ../data/processed/train_transformed.vw

In [20]:
!head ../data/processed/train_transformed.vw

1 1|f :31391 Year:2002 Month:1 DayofMonth:6 DayOfWeek:7 DepTime:1817.0 CRSDepTime:1755 ArrTime:2106.0 CRSArrTime:2012 UniqueCarrier=US FlightNum:781 TailNum=N435äâ ActualElapsedTime:169.0 CRSElapsedTime:137.0 AirTime:127.0 ArrDelay:54.0 DepDelay:22.0 Origin=PIT Dest=MCO Distance:834.0 TaxiIn:5.0 TaxiOut:37.0 Cancelled:0 CancellationCode= Diverted:0 CarrierDelay= WeatherDelay= NASDelay= SecurityDelay= LateAircraftDelay= IsArrDelayed=YES
-1 2|f :29297 Year:2001 Month:1 DayofMonth:24 DayOfWeek:3 DepTime:716.0 CRSDepTime:720 ArrTime:832.0 CRSArrTime:841 UniqueCarrier=US FlightNum:426 TailNum=N375äâ ActualElapsedTime:76.0 CRSElapsedTime:81.0 AirTime:61.0 ArrDelay:-9.0 DepDelay:-4.0 Origin=CHS Dest=DCA Distance:444.0 TaxiIn:5.0 TaxiOut:10.0 Cancelled:0 CancellationCode= Diverted:0 CarrierDelay= WeatherDelay= NASDelay= SecurityDelay= LateAircraftDelay= IsArrDelayed=NO
-1 3|f :29967 Year:2001 Month:1 DayofMonth:17 DayOfWeek:3 DepTime:758.0 CRSDepTime:800 ArrTime:1028.0 CRSArrTime:1007 Unique
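The `sed` step is needed because `csv2vw` numbered the two classes 1 and 2, while VW's logistic loss expects binary labels of -1 and 1. The same rewrite can be sketched in Python; `remap_label` is a hypothetical helper, and the two sample lines are shortened versions of the examples above.

```python
def remap_label(line, mapping):
    """Replace the leading label of a VW example according to `mapping`."""
    # The label is everything before the first space; the rest is untouched
    label, sep, rest = line.partition(" ")
    return mapping.get(label, label) + sep + rest


lines = [
    "1 1|f Year:2002 Origin=PIT",
    "2 2|f Year:2001 Origin=CHS",
]
remapped = [remap_label(line, {"2": "-1"}) for line in lines]
print(remapped)
# ['1 1|f Year:2002 Origin=PIT', '-1 2|f Year:2001 Origin=CHS']
```

Only the label changes. The tag that follows it (the row counter before the `|`) is left alone, which matches the transformed file shown above.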

In [21]:
!vw --help

Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = 
num sources = 1

VW options:
  --ring_size arg                       size of example ring
  --onethread                           Disable parse thread

Update options:
  -l [ --learning_rate ] arg            Set learning rate
  --power_t arg                         t power value
  --decay_learning_rate arg             Set Decay factor for learning_rate 
                                        between passes
  --initial_t arg                       initial t value
  --feature_mask arg                    Use existing regressor to determine 
                                        which parameters may be updated.  If no
                                        initial_regressor given, also used for 
                                        initial weights.

Weight options:
  -i [ --initial_regressor ] arg        Initial regressor(s)
  --initial_weight arg          

                                        m=multiclass, c=cost sensitive] with 
                                        specified buffer size

  --replay_m_count arg (=1)             how many times (in expectation) should 
                                        each example be played (default: 1 = 
                                        permuting)

Binary loss:
  --binary                              report loss as binary classification on
                                        -1,1


Bootstrap:
  --bootstrap arg                       k-way bootstrap by online importance 
                                        resampling

  --bs_type arg                         prediction type {mean,vote}

scorer options:
  --link arg (=identity)                Specify the link function: identity, 
                                        logistic, glf1 or poisson

Stagewise polynomial options:
  --stage_poly                          use stagewise polynomial feature 
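The banner at the top of this output reports `Num weight bits = 18`. VW does not keep a dictionary of feature names; it hashes each name into a fixed table of 2^18 weights, which keeps memory bounded no matter how many distinct features the stream contains. The sketch below illustrates the idea only: `zlib.crc32` stands in for VW's own hash function, and `feature_index` and the `namespace^name` key layout are assumptions.

```python
import zlib

NUM_WEIGHT_BITS = 18
TABLE_SIZE = 1 << NUM_WEIGHT_BITS  # 262144 weight slots


def feature_index(namespace, name):
    """Map a feature name to a slot in the fixed-size weight table."""
    # crc32 is a stand-in here; VW uses a different hash internally
    key = f"{namespace}^{name}".encode("utf-8")
    return zlib.crc32(key) % TABLE_SIZE


weights = [0.0] * TABLE_SIZE

# One example: numeric features carry their value, categorical ones count as 1.0
features = {"Year": 2002.0, "UniqueCarrier=US": 1.0}
score = sum(weights[feature_index("f", name)] * value
            for name, value in features.items())
print(score)  # all weights are still zero, so this prints 0.0
```

Two different names can land in the same slot. A larger table makes such collisions rarer at the cost of more memory.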
          

In [22]:
!vw \
  --data=../data/processed/train_transformed.vw \
  --binary \
  --loss_function=logistic \
  --readable_model=model.vw \
  --kill_cache \
  --predictions=predictions_train

predictions = predictions_train
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = ../data/processed/train_transformed.vw
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.000000 1.000000            1            1.0   1.0000  -1.0000       30
1.000000 1.000000            2            2.0  -1.0000   1.0000       30
0.750000 0.500000            4            4.0   1.0000  -1.0000       30
0.625000 0.500000            8            8.0   1.0000   1.0000       30
0.562500 0.500000           16           16.0  -1.0000  -1.0000       25
0.437500 0.312500           32           32.0  -1.0000  -1.0000       30
0.375000 0.312500           64           64.0   1.0000   1.0000       30
0.343750 0.312500          128          128.0   1.0000   1.0000       30
0.300781 0.257812          256          256.0  -1.0000  -1.0000       30
0.26757