
Introduction

Vowpal Wabbit is an online machine learning system known for its speed and scalability; it is widely used in both research and industry.

This package aims to bring its functionality to R.

Installation

First you have to install Vowpal Wabbit itself; see the Vowpal Wabbit repository (https://github.com/VowpalWabbit/vowpal_wabbit) for build instructions.

Then install the rvw package using devtools:

library(devtools)
install_github("rvw-org/rvw")
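If devtools is not already installed, it is available from CRAN:

install.packages("devtools")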

Using rvw

The rvw package gives you access to various learning algorithms from Vowpal Wabbit. In this tutorial, you will see how to use rvw for a multiclass classification problem.

Data preparation

Here we will try to predict the age group of abalone (based on the number of shell rings) from physical measurements. We will use the Abalone Data Set from the UCI Machine Learning Repository.

library("rvw")
library("mltools")  # We need "mltools" for data preparation
library("magrittr") # We use "magrittr" for its "pipe" operator

set.seed(1)
data_url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"
data_names <- c("sex", "length", "diameter", "height", "weight.w", "weight.s", "weight.v", "weight.sh", "rings")
data_full <- read.table(data_url, header = FALSE, sep = ",", col.names = data_names)

# Split the number of rings into groups with (as nearly as possible) equal numbers of observations
classes <- 3 # Split into 3 groups
data_full$group <- bin_data(data_full$rings, bins=classes, binType = "quantile")
group_lvls <- levels(data_full$group)
levels(data_full$group) <- seq_len(classes)

# Prepare indices to split data
ind_train <- sample(1:nrow(data_full), 0.8*nrow(data_full))
# Split data into train and test subsets
df_train <- data_full[ind_train,]
df_test <- data_full[-ind_train,]
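A quick base-R check that the quantile binning produced roughly balanced classes in the training set:

# Counts per class; quantile binning should make these roughly equal
table(df_train$group)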

Vowpal Wabbit input format

In order to use Vowpal Wabbit we have to convert our data from a data.frame to the .vw plain-text input format.

Each example should be formatted as follows:

[Label] [Importance] [Base] [Tag]|Namespace Features |Namespace Features ... |Namespace Features

One row from df_train:

df_train[1,]
#>      sex length diameter height weight.w weight.s weight.v weight.sh rings group
#> 1110   M   0.52      0.4  0.145   0.7765   0.3525   0.1845     0.185     9     2     

will be converted to:

#> 2 |a sex^M |b diameter:0.4 length:0.52 height:0.145 |c weight.w:0.7765 weight.s:0.3525 weight.v:0.1845 weight.sh:0.185

Such a conversion can be done with the df2vw() function:

train_file_path <- file.path(tempdir(), "df_train.vw")
test_file_path <- file.path(tempdir(), "df_test.vw")
# For df_train
df2vw(data = df_train, file_path = train_file_path,
      namespaces = list(a = list("sex") ,
                        b = list("diameter", "length", "height"),
                        c = list("weight.w","weight.s","weight.v","weight.sh")),
      targets = "group")
# And for df_test
df2vw(data = df_test, file_path = test_file_path,
      namespaces = list(a = list("sex") ,
                        b = list("diameter", "length", "height"),
                        c = list("weight.w","weight.s","weight.v","weight.sh")),
      targets = "group")

Arguments we use here:

  • data - the data to convert, in data.frame format
  • file_path - the file path for the converted data in .vw format
  • namespaces - assigns features to namespaces. All features in one namespace are hashed into the same feature space; this is used extensively in Vowpal Wabbit (e.g. for feature interactions)
  • targets - the name of the column containing the label (a real number) we are trying to predict
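To verify the conversion, we can peek at the first lines of the generated file:

# Inspect the first two converted examples
readLines(train_file_path, n = 2)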

Basic usage

First we set up our Vowpal Wabbit model:

vwmodel <- vwsetup(general_params = list(random_seed=1),
                   feature_params = list(quadratic="bc"),
                   optimization_params = list(learning_rate=0.05, l1=1E-7),
                   option = "ect", num_classes = 3)

Arguments we use here:

  • general_params - a list of parameters that define the general behavior of vwmodel. Here we set random_seed=1, the seed for the random number generator
  • feature_params - a list of parameters for feature hashing and interactions. Here we use quadratic="bc", which creates an interaction feature for every pair of features from namespace b and namespace c. You can read more about feature interactions here
  • optimization_params - a list of parameters for the optimization process. learning_rate=0.05 sets the initial learning rate and l1=1E-7 sets the strength of L1 regularization
  • option - the learning algorithm/reduction to use. Here we use ect, the Error Correcting Tournament algorithm, to train a multiclass classification model (a sketch of an alternative reduction follows this list)
  • ... - additional arguments for option. num_classes = 3 is the number of classes in our data
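The same interface accepts other multiclass reductions. As a sketch only, assuming the one-against-all reduction is exposed under the option name "oaa" in the same way as "ect" above, a model could be set up like this:

# Hypothetical: assumes "oaa" is exposed like "ect" above
oaa_model <- vwsetup(general_params = list(random_seed = 1),
                     feature_params = list(quadratic = "bc"),
                     option = "oaa", num_classes = 3)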

It is also possible to specify parameters directly, as in the command-line version of Vowpal Wabbit:

vwsetup(
    params_str = "--random_seed 1 --quadratic bc --learning_rate 0.05 --l1 1E-7 --ect 3"
)

This mode is generally not recommended, as it performs no parameter checking. Also, parameters used by other rvw functions cannot be specified via params_str.


Now we are ready to start training our model:

vwtrain(vwmodel = vwmodel, data = train_file_path, passes = 100)
#> Starting VW training session
#> VW v8.6.1
#> Using data file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw
#> Using model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw
#> Command line parameters:
#> --random_seed 1 --quadratic bc --learning_rate 0.05 --l1 1e-07 --ect 3 -d /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw -f /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw --passes 100 --kill_cache --cache_file /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw.cache
#> creating quadratic features for pairs: bc
#> using l1 regularization = 1e-07
#> final_regressor = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw
#> Num weight bits = 18
#> learning rate = 0.05
#> initial_t = 0
#> power_t = 0.5
#> decay_learning_rate = 1
#> creating cache_file = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw.cache
#> Reading datafile = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw
#> num sources = 1
#> average  since         example        example  current  current  current
#> loss     last          counter         weight    label  predict features
#> 1.000000 1.000000            1            1.0        2        1       21
#> 1.000000 1.000000            2            2.0        1        2       21
#> 0.500000 0.000000            4            4.0        2        2       21
#> 0.500000 0.500000            8            8.0        2        2       21
#> 0.562500 0.625000           16           16.0        3        2       21
#> 0.531250 0.500000           32           32.0        3        2       21
#> 0.546875 0.562500           64           64.0        2        2       21
#> 0.500000 0.453125          128          128.0        1        2       21
#> 0.484375 0.468750          256          256.0        2        2       21
#> 0.476562 0.468750          512          512.0        2        2       21
#> 0.451172 0.425781         1024         1024.0        2        2       21
#> 0.417969 0.384766         2048         2048.0        1        2       21
#> 0.411894 0.411894         4096         4096.0        1        1       21 h
#> 0.405941 0.400000         8192         8192.0        2        2       21 h
#> 
#> finished run
#> number of examples per pass = 3007
#> passes used = 5
#> weighted example sum = 15035.000000
#> weighted label sum = 0.000000
#> average loss = 0.386228 h
#> total feature number = 315685

And finally we compute predictions using the trained model:

vw_pred <- predict(object = vwmodel, data = test_file_path)
#> Starting VW testing session
#> VW v8.6.1
#> Using data file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw
#> Using model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw
#> Command line parameters: 
#>  -t -d /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw -i /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw --passes 1 -p /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw
#> creating quadratic features for pairs: bc 
#> only testing
#> predictions = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw
#> Num weight bits = 18
#> learning rate = 0.5
#> initial_t = 0
#> power_t = 0.5
#> using no cache
#> Reading datafile = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw
#> num sources = 1
#> average  since         example        example  current  current  current
#> loss     last          counter         weight    label  predict features
#> 1.000000 1.000000            1            1.0        2        1       21
#> 1.000000 1.000000            2            2.0        3        2       21
#> 1.000000 1.000000            4            4.0        3        2       21
#> 0.500000 0.000000            8            8.0        2        2       21
#> 0.562500 0.625000           16           16.0        2        2       21
#> 0.343750 0.125000           32           32.0        2        2       21
#> 0.437500 0.531250           64           64.0        1        1       21
#> 0.453125 0.468750          128          128.0        3        2       21
#> 0.378906 0.304688          256          256.0        1        2       21
#> 0.337891 0.296875          512          512.0        3        3       21
#> 
#> finished run
#> number of examples = 836
#> weighted example sum = 836.000000
#> weighted label sum = 0.000000
#> average loss = 0.363636
#> total feature number = 17556

Arguments we use here:

  • vwmodel - our model object
  • data - the data to use. Can be of type character (interpreted as a file path) or of type data.frame (in which case df2vw() is used to convert it first)
  • passes - number of times the learning algorithm will cycle over the data
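With the predictions in hand, we can run a quick evaluation in base R. This is a minimal sketch assuming vw_pred is a numeric vector of predicted class labels (1, 2, 3), one per row of df_test:

# Assuming vw_pred holds numeric class labels aligned with the rows of df_test
actual <- as.integer(as.character(df_test$group))
mean(vw_pred == actual)                      # overall accuracy
table(predicted = vw_pred, actual = actual)  # confusion matrix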

Accessing the results

If we want to review the parameters of our model and the evaluation results, we can simply print vwmodel:

vwmodel
#>  Vowpal Wabbit model
#> Learning algorithm:   sgd 
#> Working directory:   /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//Rtmp6bFZCA 
#> Model file:   /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//Rtmp6bFZCA/vw_1534797504_mdl.vw 
#> General parameters: 
#>   random_seed :   1 
#>   ring_size :  Not defined
#>   holdout_off :   FALSE 
#>   holdout_period :   10 
#>   holdout_after :   0 
#>   early_terminate :   3 
#>   loss_function :   squared 
#>   link :   identity 
#>   quantile_tau :   0.5 
#> Feature parameters: 
#>   bit_precision :   18 
#>   quadratic :   bc 
#>   cubic :  Not defined
#>   interactions :  Not defined
#>   permutations :   FALSE 
#>   leave_duplicate_interactions :   FALSE 
#>   noconstant :   FALSE 
#>   feature_limit :  Not defined
#>   ngram :  Not defined
#>   skips :  Not defined
#>   hash :  Not defined
#>   affix :  Not defined
#>   spelling :  Not defined
#>   interact :  Not defined
#> Learning algorithms / Reductions: 
#>   ect :
#>       num_classes :   3 
#> Optimization parameters: 
#>   adaptive :   TRUE 
#>   normalized :   TRUE 
#>   invariant :   TRUE 
#>   adax :   FALSE 
#>   sparse_l2 :   0 
#>   l1_state :   0 
#>   l2_state :   1 
#>   learning_rate :   0.05 
#>   initial_pass_length :  Not defined
#>   l1 :   1e-07 
#>   l2 :   0 
#>   no_bias_regularization :  Not defined
#>   feature_mask :  Not defined
#>   decay_learning_rate :   1 
#>   initial_t :   0 
#>   power_t :   0.5 
#>   initial_weight :   0 
#>   random_weights :  off
#>   normal_weights :  off
#>   truncated_normal_weights :  off
#>   sparse_weights :   FALSE 
#>   input_feature_regularizer :  Not defined
#> Model evaluation. Training: 
#>   num_examples :   15035 
#>   weighted_example_sum :   15035 
#>   weighted_label_sum :   0 
#>   avg_loss :   0.3862275 
#>   total_feature :   315685 
#> Model evaluation. Testing: 
#>   num_examples :   836 
#>   weighted_example_sum :   836 
#>   weighted_label_sum :   0 
#>   avg_loss :   0.3636364 
#>   total_feature :   17556

For a more in-depth analysis of our model, we can inspect the weights of the final regressor:

vwaudit(vwmodel = vwmodel)
#>                     Names Hashes Model.values
#> 1                 a^sex^M  58028    0.0363983
#> 2              b^diameter  57998   -0.0403499
#> 3                b^length 145012   -0.0496383
#> 4                b^height 107988   -0.0787311
#> 5              c^weight.w  90732    0.0901030
#> 6              c^weight.s 212300    0.1253380
#> 7              c^weight.v  65196    0.3800930
#> 8             c^weight.sh  76534    0.4210850
#> 9                Constant 232120   -0.1986420
#> 10  b^diameter*c^weight.w 116710    0.2348970
#> 11  b^diameter*c^weight.s 235718    0.4218860
#> 12  b^diameter*c^weight.v  23334    1.0003500
#> 13 b^diameter*c^weight.sh 102268    1.0187700
#> 14    b^length*c^weight.w 187120    0.1662080
#> 15    b^length*c^weight.s  34256    0.2956240
#> 16    b^length*c^weight.v 214576    0.7064920
#> 17   b^length*c^weight.sh 168554    0.7264760
#> 18    b^height*c^weight.w  93904    0.6733480
#> 19    b^height*c^weight.s 209392    1.1952400
#> 20    b^height*c^weight.v  61968    2.8984900
#> 21   b^height*c^weight.sh  75338    2.9625199
#> 22                a^sex^I  64526   -0.3009660
#> 23                a^sex^F 244246    0.0278317
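Treating the vwaudit() result as the data.frame printed above (columns Names, Hashes and Model.values), we can, for example, rank features by the magnitude of their weights:

aud <- vwaudit(vwmodel = vwmodel)
# The largest absolute weights point to the most influential features/interactions
head(aud[order(-abs(aud$Model.values)), ], 5)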

It is also possible to access and modify the parameters of our model:

# Show the learning rate of our model
vwparams(vwmodel = vwmodel, name = "learning_rate")
#> [1] 0.05
# Change it to 0.1
vwparams(vwmodel = vwmodel, name = "learning_rate") <- 0.1
# And show again
vwparams(vwmodel = vwmodel, name = "learning_rate")
#> [1] 0.1

Extending our model

We can add more learning algorithms to our model. For example, to use a boosting algorithm with 100 "weak" learners, we simply add this option to our model and train again:

vwmodel <- add_option(vwmodel, option = "boosting", num_learners=100)
# We add quiet = TRUE to hide the diagnostic output.
vwtrain(vwmodel = vwmodel, data = train_file_path, passes = 100, quiet = TRUE)
vw_pred <- predict(object = vwmodel, data = test_file_path)
#> Starting VW testing session
#> VW v8.6.1
#> Using data file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw
#> Using model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw
#> Command line parameters: 
#>  -t -d /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw -i /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw --passes 1 -p /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw
#> creating quadratic features for pairs: bc 
#> only testing
#> predictions = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw
#> Number of weak learners = 100
#> Gamma = 0.100000
#> Num weight bits = 18
#> learning rate = 0.5
#> initial_t = 0
#> power_t = 0.5
#> using no cache
#> Reading datafile = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw
#> num sources = 1
#> average  since         example        example  current  current  current
#> loss     last          counter         weight    label  predict features
#> 1.000000 1.000000            1            1.0        2        1       21
#> 1.000000 1.000000            2            2.0        3        2       21
#> 1.000000 1.000000            4            4.0        3        2       21
#> 0.500000 0.000000            8            8.0        2        2       21
#> 0.500000 0.500000           16           16.0        2        2       21
#> 0.343750 0.187500           32           32.0        2        2       21
#> 0.406250 0.468750           64           64.0        1        1       21
#> 0.429688 0.453125          128          128.0        3        2       21
#> 0.375000 0.320312          256          256.0        1        2       21
#> 0.341797 0.308594          512          512.0        3        3       21
#> 
#> finished run
#> number of examples = 836
#> weighted example sum = 836.000000
#> weighted label sum = 0.000000
#> average loss = 0.348086
#> total feature number = 17556

Contextual Bandit algorithms in Vowpal Wabbit

Vowpal Wabbit is famous for its highly optimized Contextual Bandit algorithms, and we can use them in rvw as well:

cb_model <- vwsetup(general_params = list(random_seed=1),
                    feature_params = list(quadratic="bc"),
                    option = "cbify", num_classes = 3) %>%
    add_option(option = "cb_explore",
               num_actions=3, explore_type="bag", explore_arg=7)

vwtrain(vwmodel = cb_model, data = train_file_path, passes = 100, quiet = TRUE)
vw_pred <- predict(object = cb_model, data = test_file_path)
#> Starting VW testing session
#> VW v8.6.1
#> Using data file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw
#> Using model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534798468_mdl.vw
#> Command line parameters: 
#>  -t -d /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw -i /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534798468_mdl.vw --passes 1 -p /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw
#> creating quadratic features for pairs: bc 
#> only testing
#> predictions = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw
#> Num weight bits = 18
#> learning rate = 0.5
#> initial_t = 0
#> power_t = 0.5
#> using no cache
#> Reading datafile = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw
#> num sources = 1
#> average  since         example        example  current  current  current
#> loss     last          counter         weight    label  predict features
#> 1.000000 1.000000            1            1.0        2        1       21
#> 0.500000 0.000000            2            2.0        3        3       21
#> 0.250000 0.000000            4            4.0        3        3       21
#> 0.500000 0.750000            8            8.0        2        3       21
#> 0.312500 0.125000           16           16.0        2        3       21
#> 0.500000 0.687500           32           32.0        2        3       21
#> 0.406250 0.312500           64           64.0        1        1       21
#> 0.335938 0.265625          128          128.0        3        3       21
#> 0.378906 0.421875          256          256.0        1        1       21
#> 0.453125 0.527344          512          512.0        3        3       21
#> 
#> finished run
#> number of examples = 836
#> weighted example sum = 836.000000
#> weighted label sum = 0.000000
#> average loss = 0.470096
#> total feature number = 17556

New arguments we use here:

  • option = "cbify" - datasets for Contextual Bandit problems are highly proprietary and often hard to come by, but this option transforms any multiclass classification training set into a contextual bandit training set
  • num_classes - the number of classes in our multiclass training set
  • option = "cb_explore" - trains a Contextual Bandit algorithm with a fixed maximum number of actions. You can read more about Contextual Bandit algorithms in Vowpal Wabbit here
  • num_actions - the maximum number of actions for the algorithm
  • explore_type - the exploration strategy. We use the Bagging Explorer, an exploration rule based on an ensemble approach (a sketch with a different strategy follows this list)
  • explore_arg - an argument for the exploration algorithm. For the Bagging Explorer it is the number of policies to train
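Other exploration strategies plug in the same way. A sketch only, assuming epsilon-greedy is exposed as explore_type = "epsilon" (mirroring Vowpal Wabbit's --epsilon flag) with explore_arg as the exploration probability:

# Hypothetical epsilon-greedy variant; explore_type = "epsilon" and the
# meaning of explore_arg here are assumptions, not confirmed rvw API
eps_model <- vwsetup(general_params = list(random_seed = 1),
                     feature_params = list(quadratic = "bc"),
                     option = "cbify", num_classes = 3) %>%
    add_option(option = "cb_explore",
               num_actions = 3, explore_type = "epsilon", explore_arg = 0.05)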

We can see that, unsurprisingly, the Contextual Bandit model performs worse than the multiclass classification model trained under full information, as measured by the average testing loss:

  • 0.3480861 for the multiclass classification model
  • 0.4700957 for the Contextual Bandit model
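If we read the reported average loss as a misclassification rate (an interpretation that holds for multiclass 0/1 loss), the corresponding test accuracies are simply:

1 - 0.3480861  # multiclass classification model: ~0.652
1 - 0.4700957  # Contextual Bandit model:         ~0.530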

Acknowledgements

Development of the rvw package started as the R Vowpal Wabbit (Google Summer of Code 2018) project, with Dirk Eddelbuettel and James J Balamuta mentoring the project and the R Project for Statistical Computing as the mentor organization.
