<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

<center><h1>Using Vowpal Wabbit for Recommendations</h1>

<img src="https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/logo_assets/vowpal-wabbits-github-logo.png?raw=true" height="30%" width="30%" alt="Vowpal Wabbit">
</center>

[Vowpal Wabbit](https://github.com/VowpalWabbit/vowpal_wabbit) is a fast online machine learning library that implements several algorithms relevant to the recommendation use case.

The main advantage of Vowpal Wabbit (VW) is that training is done in an online fashion typically using Stochastic Gradient Descent or similar variants, which allows it to scale well to very large datasets. Additionally, it is optimized to run very quickly and can support distributed training scenarios for extremely large datasets.

In this notebook we demonstrate how to use the VW library to generate recommendations on the [Movielens](https://grouplens.org/datasets/movielens/) dataset.

Several things are worth noting about how VW is used in this notebook. On an Azure Data Science Virtual Machine ([DSVM](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/)), VW comes pre-installed and can be used directly from the command line. There are also Python bindings for using VW within a Python environment, and even a wrapper conforming to the scikit-learn Estimator API. However, the Python bindings must be installed as an additional Python package with Boost dependencies, so for simplicity VW is executed here via a subprocess call, mimicking how the model would be run from the command line.
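As a minimal sketch of the subprocess approach (not the helper defined later in this notebook), a command string can be tokenized with the standard library's `shlex.split` before it is handed to `subprocess.run`. Unlike splitting on single spaces, this keeps quoted paths that contain spaces intact. The function names `build_vw_args` and `run_vw_command` and the example paths are hypothetical, and the sketch assumes the `vw` executable is on the PATH.

```python
import shlex
from subprocess import run


def build_vw_args(command):
    """Tokenize a vw command line the way a POSIX shell would."""
    return shlex.split(command)


def run_vw_command(command):
    """Run a vw command and raise CalledProcessError on a non-zero exit code."""
    # Assumption: the vw executable is available on the PATH.
    return run(build_vw_args(command), check=True)


# Quoted paths survive tokenization as a single argument.
args = build_vw_args('vw -f "my models/vw.model" -d train.dat')
print(args)
```

Only `build_vw_args` is exercised here; `run_vw_command` would be called with a full training or testing command once the data files exist.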

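VW expects a specific [input format](https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format), and to_vw() below is a convenience function to convert the standard MovieLens dataset into that format. Each line holds the rating as the label, the DataFrame index as the example tag, and the user and item IDs in the namespaces 'u' and 'i'. Data files are then written to disk and passed to VW for training.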
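To make the format concrete, the sketch below builds a single line with the same layout that to_vw() writes: the label, then a tag, then one namespace per feature group. The helper name `format_vw_line` and the sample values are illustrative only.

```python
def format_vw_line(rating, tag, user_id, item_id):
    """Build one example: '<label> <tag>|u <user> |i <item>'."""
    # The label comes first. The tag touches the first '|' so that vw reads
    # it as the example's tag rather than as an importance weight.
    return '{} {}|u {} |i {}'.format(rating, tag, user_id, item_id)


line = format_vw_line(rating=5, tag=0, user_id=1, item_id=1193)
print(line)  # 5 0|u 1 |i 1193
```

Because the tag is echoed next to each prediction in VW's prediction file, it can later be used to join predictions back to the original DataFrame rows.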

The examples shown demonstrate functional capabilities; they are not intended to indicate performance advantages of one approach over another. Several hyper-parameters can greatly impact the performance of VW models (see [command line options](https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Command-Line-Arguments)). To compare approaches properly, it is helpful to learn about and tune these parameters for production workloads.
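One simple way to explore such parameters is a small grid of training commands. The sketch below is an illustration under stated assumptions rather than a tuning recipe: it combines the learning rate ('-l'), L2 regularization ('--l2'), and hash bit precision ('-b') options, and the helper name `vw_param_grid`, the file names, and the candidate values are all hypothetical.

```python
from itertools import product


def vw_param_grid(model_path, data_path, learning_rates, l2_values, bits):
    """Yield one vw training command per hyper-parameter combination."""
    for lr, l2, b in product(learning_rates, l2_values, bits):
        yield 'vw -l {lr} --l2 {l2} -b {b} -f {model} -d {data}'.format(
            lr=lr, l2=l2, b=b, model=model_path, data=data_path)


# Hypothetical candidate values, chosen only to show the mechanics.
commands = list(vw_param_grid('vw.model', 'train.dat',
                              learning_rates=[0.1, 0.5],
                              l2_values=[0, 1e-6],
                              bits=[18, 24]))
print(len(commands))  # 8
print(commands[0])    # vw -l 0.1 --l2 0 -b 18 -f vw.model -d train.dat
```

Each command would then be run and scored on held-out data, ideally writing each model to its own file so results are not overwritten.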

<h3>Environment Setup</h3>

In [1]:
import os
from subprocess import run
from tempfile import TemporaryDirectory

import pandas as pd

from reco_utils.dataset.movielens import load_pandas_df
from reco_utils.dataset.python_splitters import python_stratified_split
from reco_utils.evaluation.python_evaluation import rmse, mae, exp_var, rsquared, get_top_k_items

In [2]:
def to_vw(df, output, logistic=False):
    """Convert Pandas DataFrame to vw input format
    Args:
        df (pd.DataFrame): input DataFrame
        output (str): path to output file
        logistic (bool): if True, convert ratings to binary labels (1 or -1) for logistic regression
    """
    with open(output, 'w') as f:
        tmp = df.reset_index()

        # we need to reset the rating type to an integer to simplify the vw formatting
        tmp['rating'] = tmp['rating'].astype('int64')
        
        # convert rating to binary value
        if logistic:
            tmp['rating'] = tmp['rating'].apply(lambda x: 1 if x >= 3 else -1)
        
        for _, row in tmp.iterrows():
            f.write('{rating} {index}|u {userID} |i {itemID}\n'.format_map(row))

In [3]:
def run_vw(train_params, test_params, test_data, prediction_path):
    """Convenience function to train, test, and show metrics of interest
    Args:
        train_params (str): vw training parameters
        test_params (str): vw testing parameters
        test_data (pd.DataFrame): test data
        prediction_path (str): path to vw prediction output   
    """

    # train model
    run(train_params.split(' '), check=True)
    
    # test model
    run(test_params.split(' '), check=True)
    
    # read in predictions
    pred_data = pd.read_csv(prediction_path, delim_whitespace=True, names=['prediction'], index_col=1).join(test_data)
    
    # ensure results are integers in correct range
    pred_data['prediction'] = pred_data['prediction'].apply(lambda x: int(max(1, min(5, round(x)))))

    for f in [rmse, mae, rsquared, exp_var]:
        print('{name}: {metric}'.format(name=f.__name__.upper(), metric=f(test_data, pred_data)))

In [4]:
# create temp directory to maintain data files
tmpdir = TemporaryDirectory()

model_path = os.path.join(tmpdir.name, 'vw.model')
train_path = os.path.join(tmpdir.name, 'train.dat')
test_path = os.path.join(tmpdir.name, 'test.dat')
train_logistic_path = os.path.join(tmpdir.name, 'train_logistic.dat')
test_logistic_path = os.path.join(tmpdir.name, 'test_logistic.dat')
prediction_path = os.path.join(tmpdir.name, 'prediction.dat')

<h3>Load & Transform Data</h3>

In [5]:
# load movielens data (use the 1M dataset)
df = load_pandas_df('1m')

# split data into train and test sets; the default values take 75% of each user's ratings as train and 25% as test
train, test = python_stratified_split(df)

# save train and test data in vw format
to_vw(df=train, output=train_path)
to_vw(df=test, output=test_path)

# save data for logistic regression (requires adjusting the label)
to_vw(df=train, output=train_logistic_path, logistic=True)
to_vw(df=test, output=test_logistic_path, logistic=True)

<h3>Regression Based Recommendations</h3>

When considering different approaches for solving a problem with machine learning, it is helpful to establish a baseline against which more complex solutions can be compared in terms of predictive performance, time, and resource (memory or CPU) usage.

One of the most basic approaches for a recommendation engine is to simply learn a linear regression model that is trained on examples of ratings as the target variable and corresponding user ids and movie ids as independent features.

By passing each user-item rating in as an example the model will begin to learn weights based on average ratings for each user as well as average ratings per item.
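The intuition can be sketched with a tiny model of the form prediction = bias + user weight + item weight, fitted by plain stochastic gradient descent on squared loss. This is an illustration only: VW's default update rule is more sophisticated than plain SGD, and the function name `sgd_bias_model` and the toy ratings are assumptions made for the example.

```python
from collections import defaultdict


def sgd_bias_model(ratings, epochs=200, lr=0.1):
    """Fit r_hat = bias + w_user[u] + w_item[i] by plain SGD on squared loss."""
    bias = 0.0
    w_user = defaultdict(float)
    w_item = defaultdict(float)
    for _ in range(epochs):
        for user, item, rating in ratings:
            err = (bias + w_user[user] + w_item[item]) - rating
            # Each active feature has value 1, so the gradient is just the error.
            bias -= lr * err
            w_user[user] -= lr * err
            w_item[item] -= lr * err
    return bias, dict(w_user), dict(w_item)


# Toy data: user 1 rates 2 stars higher than user 2; item 10 gets 1 star more than item 20.
toy = [(1, 10, 5), (1, 20, 4), (2, 10, 3), (2, 20, 2)]
bias, w_user, w_item = sgd_bias_model(toy)
print(round(w_user[1] - w_user[2], 3))    # close to 2.0
print(round(w_item[10] - w_item[20], 3))  # close to 1.0
```

The learned weight differences recover the per-user and per-item offsets in the toy data, which is the sense in which such a model captures average rating tendencies.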

VW uses linear regression by default, so no extra command line options are needed beyond specifying where to locate the model and the data.

However, this generates predicted ratings that are no longer integers, so some additional adjustment can be made at prediction time to convert them back to the integer scale of 1 through 5 if necessary. Here this is done in the run_vw function.

In [6]:
%%time 

train_params = 'vw -f {model} -d {data}'.format(model=model_path, data=train_path)
test_params = 'vw -i {model} -d {data} -t -p {pred}'.format(model=model_path, data=test_path, pred=prediction_path)

run_vw(train_params=train_params, 
       test_params=test_params, 
       test_data=test, 
       prediction_path=prediction_path)

RMSE: 0.9606618288218944
MAE: 0.6868742202744634
RSQUARED: 0.2619950895104812
EXP_VAR: 0.2620255308898497
CPU times: user 1.67 s, sys: 47.3 ms, total: 1.72 s
Wall time: 2.63 s


A similar alternative is to leverage multinomial classification, which treats each rating value as a distinct class. 

This avoids any non-integer results, but it also reduces the training data available for each class, which could lead to poorer performance if the counts of the different rating levels are skewed.

Basic multiclass logistic regression can be accomplished using the One Against All approach, specified by the '--oaa N' option, where N is the number of classes, and by providing the logistic option for the loss function to be used.
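Conceptually, One Against All reduces an N-class problem to N binary problems: each class gets its own scorer that is trained to separate that class from all the others, and the predicted class is the one whose scorer is most confident. The sketch below illustrates only this reduction, not VW's internals; the helper names and the example scores are hypothetical.

```python
def oaa_targets(label, n_classes):
    """Binary targets seen by each of the N scorers for one example.

    Classes are numbered 1..N, which matches ratings 1 through 5.
    """
    return [1 if k == label else -1 for k in range(1, n_classes + 1)]


def oaa_predict(class_scores):
    """Return the 1-based class whose binary scorer gives the highest score."""
    best_index = max(range(len(class_scores)), key=lambda k: class_scores[k])
    return best_index + 1


print(oaa_targets(label=4, n_classes=5))          # [-1, -1, -1, 1, -1]
print(oaa_predict([-2.1, -0.3, 0.4, 1.7, 0.2]))   # 4
```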

In [7]:
%%time 

train_params = 'vw --loss_function logistic --oaa 5 -f {model} -d {data}'.format(model=model_path, data=train_path)
test_params = 'vw -i {model} -d {data} -t -p {pred}'.format(model=model_path, data=test_path, pred=prediction_path)

run_vw(train_params=train_params,
       test_params=test_params,
       test_data=test,
       prediction_path=prediction_path)

RMSE: 1.0615890476546246
MAE: 0.7146124564153418
RSQUARED: 0.09877954418265689
EXP_VAR: 0.12455611599414951
CPU times: user 2.11 s, sys: 31.7 ms, total: 2.14 s
Wall time: 4.61 s


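Additionally, one might simply be interested in whether the user likes or dislikes an item. We can adjust the input data to represent a binary outcome, where ratings of 1 or 2 are dislikes (negative results) and ratings of 3 to 5 are likes (positive results).

This framing allows a simple logistic regression model to be applied. To perform logistic regression, the loss_function parameter is changed to 'logistic' and the target label is switched to -1 (dislike) or 1 (like), as VW's logistic loss expects. Note that run_vw still rounds and clips the predictions to the 1 to 5 range and compares them against the original star ratings, so the metrics printed below are not a meaningful measure of the binary classifier.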
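The sketch below shows the label mapping used by to_vw() and how a raw logistic-regression score can be turned into a probability of "like" with the sigmoid function. It assumes the prediction file contains raw scores, which is our understanding of VW's default when no link function (such as '--link logistic') is requested; the helper names and the example scores are hypothetical.

```python
import math


def to_binary_label(rating, threshold=3):
    """Map a star rating to 1 (like) or -1 (dislike), as in to_vw()."""
    return 1 if rating >= threshold else -1


def sigmoid(score):
    """Convert a raw logistic-regression score into a probability."""
    return 1.0 / (1.0 + math.exp(-score))


print([to_binary_label(r) for r in [1, 2, 3, 4, 5]])  # [-1, -1, 1, 1, 1]
print(sigmoid(0.0))  # 0.5

# A positive score maps to a probability above 0.5, i.e. a predicted "like".
for score in (-2.0, 2.0):
    print(score, sigmoid(score) > 0.5)
```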

In [8]:
%%time 

train_params = 'vw --loss_function logistic -f {model} -d {data}'.format(model=model_path, data=train_logistic_path)
test_params = 'vw -i {model} -d {data} -t -p {pred}'.format(model=model_path, data=test_logistic_path, pred=prediction_path)

run_vw(train_params=train_params,
       test_params=test_params,
       test_data=test,
       prediction_path=prediction_path)

RMSE: 1.764643840355633
MAE: 1.4427961357602124
RSQUARED: -1.490189004814928
EXP_VAR: -0.07611382892527008
CPU times: user 2.05 s, sys: 43.4 ms, total: 2.09 s
Wall time: 2.9 s


So far we have treated the user features and item features independently, but taking interactions between features into account can provide a mechanism to learn more fine-grained user preferences.

To generate interaction features use the quadratic command line argument and specify the namespaces that should be combined: '-q ui' combines the user and item namespaces based on the first letter of each.

When generating interaction terms one thing to consider is the [hash space](https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Feature-Hashing-and-Extraction) used for the features. It can be beneficial to increase the size of the space to reduce unwanted collisions.
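A rough feel for the effect of the '-b' option can be had from a back-of-the-envelope estimate. The sketch below assumes features hash uniformly and independently into 2**bits buckets, which is an idealization, and uses one million distinct user-item interaction features as a round stand-in for the size of the 1M dataset. The function name `expected_collisions` is hypothetical.

```python
import math


def expected_collisions(n_features, bits):
    """Expected number of features that land in an already occupied bucket,
    assuming uniform hashing into 2**bits buckets."""
    buckets = 2 ** bits
    # Expected occupied buckets: m * (1 - (1 - 1/m) ** n), computed stably.
    occupied = -buckets * math.expm1(n_features * math.log1p(-1.0 / buckets))
    return n_features - occupied


n_pairs = 1000000  # assumption: roughly one interaction feature per rating
for bits in (18, 24):  # 18 is vw's default bit precision
    print(bits, 2 ** bits, round(expected_collisions(n_pairs, bits)))
```

Under these assumptions the default table is smaller than the number of interaction features, so collisions are unavoidable, while 24 bits leaves most features in their own bucket.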

In [9]:
%%time 

train_params = 'vw -b 24 -q ui -f {model} -d {data}'.format(model=model_path, data=train_path)
test_params = 'vw -i {model} -d {data} -t -p {pred}'.format(model=model_path, data=test_path, pred=prediction_path)

run_vw(train_params=train_params,
       test_params=test_params,
       test_data=test,
       prediction_path=prediction_path)

RMSE: 0.9639071277012671
MAE: 0.6913606410543489
RSQUARED: 0.2570004245847899
EXP_VAR: 0.25724987674764754
CPU times: user 1.92 s, sys: 39.5 ms, total: 1.96 s
Wall time: 3.17 s


<h3>Matrix Factorization Based Recommendations</h3>

All of the above approaches train a regression model, but VW also supports matrix factorization with two different approaches.

The first approach is called using the '--rank' command line argument and performs matrix factorization based on Singular Value Decomposition (SVD).

See the [Matrix Factorization Example](https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Matrix-factorization-example) for more detail.

In [10]:
%%time 

train_params = 'vw --rank 5 -qui -f {model} -d {data}'.format(model=model_path, data=train_path)
test_params = 'vw -i {model} -d {data} -t -p {pred}'.format(model=model_path, data=test_path, pred=prediction_path)

run_vw(train_params=train_params,
       test_params=test_params,
       test_data=test,
       prediction_path=prediction_path)

RMSE: 1.0159629286560183
MAE: 0.7659943699817664
RSQUARED: 0.17458205809330873
EXP_VAR: 0.22847638017481597
CPU times: user 2.45 s, sys: 35.4 ms, total: 2.48 s
Wall time: 4.51 s


An alternative approach based on [Rendle's factorization machines](https://cseweb.ucsd.edu/classes/fa17/cse291-b/reading/Rendle2010FM.pdf) is invoked using '--lrq' (low rank quadratic). More details on LRQ are available in this [demo](https://github.com/VowpalWabbit/vowpal_wabbit/tree/master/demo/movielens).

This learns two lower rank matrices which are multiplied to generate an approximation of the user-item rating matrix. Compressing the matrix in this way leads to learning generalizable factors which avoids some of the limitations of using regression models with extremely sparse interaction features. This can lead to better convergence and smaller on-disk models.
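The saving from a low-rank representation can be illustrated with simple parameter counts. In the sketch below, every user and every item gets a short vector of latent factors, and the interaction score for a pair is the dot product of the two vectors. This is a generic illustration of low-rank factorization rather than a description of VW's implementation; the function names and the user and item counts are hypothetical.

```python
def dense_params(n_users, n_items):
    """One weight per possible user-item interaction."""
    return n_users * n_items


def low_rank_params(n_users, n_items, rank):
    """One length-`rank` factor vector per user and per item."""
    return (n_users + n_items) * rank


def low_rank_score(user_factors, item_factors):
    """Interaction score as the dot product of the two factor vectors."""
    return sum(p * q for p, q in zip(user_factors, item_factors))


# Hypothetical sizes, chosen only to show the difference in scale.
print(dense_params(1000, 500))        # 500000
print(low_rank_params(1000, 500, 7))  # 10500
print(low_rank_score([1.0, 2.0], [3.0, 4.0]))  # 11.0
```

Because factors are shared across all of a user's or an item's ratings, a pair that was never observed still receives a score, which is what a separate weight per interaction cannot provide.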

An additional option that can improve performance is '--lrqdropout', which drops out columns during training. This, however, tends to increase the optimal rank size. Other parameters, such as L2 regularization, can help avoid overfitting.

In [11]:
%%time

train_params = 'vw --lrq ui7 --lrqdropout -f {model} -d {data}'.format(model=model_path, data=train_path)
test_params = 'vw -i {model} -d {data} -t -p {pred}'.format(model=model_path, data=test_path, pred=prediction_path)

run_vw(train_params=train_params,
       test_params=test_params,
       test_data=test,
       prediction_path=prediction_path)

RMSE: 0.9825091378096849
MAE: 0.7003134896516426
RSQUARED: 0.2280460770162016
EXP_VAR: 0.22805835662769547
CPU times: user 2 s, sys: 43.5 ms, total: 2.04 s
Wall time: 3.64 s


<h3>Scoring</h3>

After training a model with any of the above approaches, the model can be used to score potential user-item pairs in offline batch mode or in a real-time scoring mode. The example below shows how to leverage the utilities in the reco_utils directory to generate Top-K recommendations from offline scored output.

In [15]:
# store all data
data_path = os.path.join(tmpdir.name, 'all.dat')
to_vw(df=df, output=data_path)

# predict on the full set of users
train_params = 'vw --lrq ui7 --lrqdropout -f {model} -d {data}'.format(model=model_path, data=train_path)
test_params = 'vw -i {model} -d {data} -t -p {pred}'.format(model=model_path, data=data_path, pred=prediction_path)

run_vw(train_params=train_params,
       test_params=test_params,
       test_data=df,
       prediction_path=prediction_path)

RMSE: 0.9673615767747724
MAE: 0.6877662568523178
RSQUARED: 0.25011863284471003
EXP_VAR: 0.2501312700738213


In [16]:
# load predictions and filter to test set
test_users = [1, 2, 3]
pred_data = pd.read_csv(prediction_path, delim_whitespace=True, names=['prediction'], index_col=1).join(df)
test_user_data = pred_data[pred_data['userID'].isin(test_users)]

get_top_k_items(test_user_data, col_rating='prediction', k=5)

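| | level_0 | level_1 | prediction | userID | itemID | rating | timestamp |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 5.0 | 1 | 1193 | 5.0 | 978300760 |
| 1 | 0 | 2 | 5.0 | 1 | 914 | 3.0 | 978301968 |
| 2 | 0 | 3 | 5.0 | 1 | 3408 | 4.0 | 978300275 |
| 3 | 0 | 4 | 5.0 | 1 | 2355 | 5.0 | 978824291 |
| 4 | 0 | 5 | 5.0 | 1 | 1197 | 3.0 | 978302268 |
| 5 | 1 | 53 | 5.0 | 2 | 1357 | 5.0 | 978298709 |
| 6 | 1 | 54 | 5.0 | 2 | 3068 | 4.0 | 978299000 |
| 7 | 1 | 55 | 5.0 | 2 | 1537 | 4.0 | 978299620 |
| 8 | 1 | 57 | 5.0 | 2 | 2194 | 4.0 | 978299297 |
| 9 | 1 | 61 | 5.0 | 2 | 1103 | 3.0 | 978298905 |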
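The Top-K selection step itself is simple enough to sketch in plain Python: group the scored pairs by user, sort each group by predicted rating, and keep the first k items. This is an illustration of the idea, not the implementation of get_top_k_items; the function name `top_k_per_user` and the sample scores are hypothetical.

```python
from collections import defaultdict


def top_k_per_user(scored, k):
    """Return the k highest-scored items per user.

    scored: iterable of (user_id, item_id, prediction) tuples.
    """
    by_user = defaultdict(list)
    for user_id, item_id, prediction in scored:
        by_user[user_id].append((item_id, prediction))
    return {
        user_id: sorted(items, key=lambda pair: pair[1], reverse=True)[:k]
        for user_id, items in by_user.items()
    }


scored = [(1, 10, 4.2), (1, 20, 3.1), (1, 30, 4.9), (2, 10, 2.5), (2, 40, 3.8)]
top = top_k_per_user(scored, k=2)
print(top[1])  # [(30, 4.9), (10, 4.2)]
print(top[2])  # [(40, 3.8), (10, 2.5)]
```

In a production setting the candidate pairs would usually be items the user has not yet rated, rather than the already rated pairs scored in this notebook.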


<h3>Cleanup</h3>

In [17]:
tmpdir.cleanup()