# SpaceBandits Contextual Bandits Demo
This notebook demonstrates the basic usage of SpaceBandits. The package is currently in development.

## Building a Neural Model

Linear models are powerful but inherently limited. The Neural-Linear Bayesian Contextual Bandits model, which was named and explored in the 2018 research paper [Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling](https://arxiv.org/pdf/1802.09127.pdf), uses a neural network to map a feature vector to a latent feature representation. These learned features are then fed to a standard linear model identical to the one used above.<br><br>
SpaceBandits lets us deploy this model with the same API as above. In practice, designing the model can be somewhat complicated, because the neural network adds a large number of hyperparameters. SpaceBandits defaults to the parameters used in the research paper to give users a reasonable starting point; modifying them is easy.
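
To make the neural-linear idea concrete, here is a minimal NumPy sketch. It is not SpaceBandits code: the fixed random ReLU layer stands in for a trained network, the noise variance is assumed known and equal to 1 (the paper uses a richer prior), and names such as `represent`, `update`, `posterior_mean_rewards`, and `thompson_action` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_features, latent_dim, num_actions = 14, 8, 2

# Stand-in for a trained network: one fixed random ReLU layer (assumption).
W = rng.normal(size=(num_features, latent_dim))

def represent(context):
    """Map a raw context to latent features z, with a bias term appended."""
    z = np.maximum(context @ W, 0.0)
    return np.append(z, 1.0)

d = latent_dim + 1
# One Bayesian linear regression per action, with prior weights ~ N(0, I).
precision = [np.eye(d) for _ in range(num_actions)]
moment = [np.zeros(d) for _ in range(num_actions)]

def update(context, action, reward):
    """Fold one (context, action, reward) example into that action's posterior."""
    z = represent(context)
    precision[action] += np.outer(z, z)
    moment[action] += reward * z

def posterior(action):
    """Return the posterior mean and covariance of an action's weights."""
    cov = np.linalg.inv(precision[action])
    cov = (cov + cov.T) / 2.0  # guard against round-off asymmetry
    return cov @ moment[action], cov

def posterior_mean_rewards(context):
    """Expected reward of each action under the posterior mean weights."""
    z = represent(context)
    return np.array([posterior(a)[0] @ z for a in range(num_actions)])

def thompson_action(context):
    """Sample weights from each posterior and act greedily on the samples."""
    z = represent(context)
    sampled = [rng.multivariate_normal(*posterior(a)) @ z
               for a in range(num_actions)]
    return int(np.argmax(sampled))

# Toy data: action 1 always pays 1.0, action 0 always pays 0.0.
for _ in range(200):
    context = rng.random(num_features)
    action = int(rng.integers(num_actions))
    update(context, action, 1.0 if action == 1 else 0.0)

test_context = rng.random(num_features)
means = posterior_mean_rewards(test_context)
choice = thompson_action(test_context)
```

After these updates, the posterior mean reward for action 1 should exceed that of action 0, and Thompson sampling should pick action 1 most of the time while still exploring occasionally.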

In [1]:
from space_bandits import NeuralBandits

num_actions = 2    # number of possible actions
num_features = 14  # length of each context vector
num_user = 200     # number of distinct users
embed_dim = 64
output_size_wide = 1
model = NeuralBandits(num_actions,
                      num_features,
                      num_user=num_user,
                      embed_dim=embed_dim,
                      output_size_wide=output_size_wide,
                      layer_sizes_deep=[128, 64, 32])

We can update the model in the same way as before. To improve training efficiency, the neural network only trains after a pre-defined number of updates. By default, the network trains once every 50 updates (modify the training_freq_network argument to change this), and each training session runs for 100 epochs (modify the training_epochs argument to change this).

In [2]:
import numpy as np

# Update the model 100 times with random users, contexts, actions, and rewards.
for i in range(100):
    user_index = np.random.randint(0, num_user)
    context = np.random.random(num_features)
    action = np.random.randint(0, num_actions)
    reward = np.random.random() * 10
    model.update(user_index, context, action, reward)

Training neural_model-bnn for 100 steps...
Training neural_model-bnn for 100 steps...
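
The two log lines above follow from the schedule: 100 updates at a training frequency of 50 gives two training sessions. A rough sketch of that counting logic, using a hypothetical `PeriodicTrainer` class that is not part of SpaceBandits, might look like this:

```python
class PeriodicTrainer:
    """Hypothetical helper: retrain after every `training_freq_network` updates."""

    def __init__(self, training_freq_network=50, training_epochs=100):
        self.training_freq_network = training_freq_network
        self.training_epochs = training_epochs
        self.num_updates = 0
        self.training_sessions = 0

    def update(self):
        """Record one update and retrain when the schedule says so."""
        self.num_updates += 1
        if self.num_updates % self.training_freq_network == 0:
            self.train()

    def train(self):
        # A real model would run `training_epochs` epochs of training here;
        # this sketch only counts the sessions.
        self.training_sessions += 1


trainer = PeriodicTrainer()
for _ in range(100):
    trainer.update()

print(trainer.training_sessions)  # 2
```

Raising the training frequency value means fewer, more widely spaced training sessions, which trades freshness of the learned representation for speed.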


As with the linear model, the neural model records all examples by default; set the memory_size parameter (default value -1, meaning unlimited) in the constructor to manage memory and training time.
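
One common way to implement a bounded example memory is a fixed-length queue that discards the oldest entries. The sketch below assumes that eviction policy; whether SpaceBandits drops the oldest examples is not confirmed here, and `make_buffer` is a hypothetical helper.

```python
from collections import deque


def make_buffer(memory_size=-1):
    """Return an example buffer; -1 means unbounded (mirrors the default above)."""
    maxlen = None if memory_size == -1 else memory_size
    return deque(maxlen=maxlen)


bounded = make_buffer(memory_size=3)
unbounded = make_buffer()
for step in range(5):
    bounded.append(step)    # oldest entries fall off once 3 are stored
    unbounded.append(step)  # keeps everything

print(list(bounded))    # [2, 3, 4]
print(len(unbounded))   # 5
```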

## Saving a Neural Model

Neural models actually consist of two models: a neural network and a Bayesian linear regression model. To manage this for saving, SpaceBandits creates a .zip file that keeps your models together.

In [3]:
model.save('my_neural_model.pkl')

from space_bandits import load_model
# restore the model from the same path that was passed to save()
model = load_model('my_neural_model.pkl')
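
To illustrate the idea of keeping two model components together in one archive, here is a sketch using only the standard library. It is not how SpaceBandits saves models internally; `save_bundle`, `load_bundle`, and the member names `network.pkl` and `linear.pkl` are made up for this example.

```python
import os
import pickle
import tempfile
import zipfile


def save_bundle(path, network_state, linear_state):
    """Write both components into a single zip archive."""
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("network.pkl", pickle.dumps(network_state))
        archive.writestr("linear.pkl", pickle.dumps(linear_state))


def load_bundle(path):
    """Read both components back from the archive."""
    with zipfile.ZipFile(path) as archive:
        network_state = pickle.loads(archive.read("network.pkl"))
        linear_state = pickle.loads(archive.read("linear.pkl"))
    return network_state, linear_state


with tempfile.TemporaryDirectory() as tmp:
    bundle_path = os.path.join(tmp, "bundle.zip")
    save_bundle(bundle_path, {"weights": [0.1, 0.2]}, {"mu": [1.0, 2.0]})
    restored = load_bundle(bundle_path)
```

A single archive means the two halves cannot drift apart on disk, which matters because the linear model is only meaningful for the network that produced its input features.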

## Expected Values
We don't like black boxes. Model interpretation is critical for solid data science. Any SpaceBandits model will return its expected reward values for a given context using the .expected_values() method:

In [4]:
user_index

37

In [5]:
context

array([0.28149541, 0.51387642, 0.82579794, 0.41109477, 0.92164225,
       0.33567452, 0.58985152, 0.5867675 , 0.30294768, 0.12201099,
       0.98096452, 0.15628249, 0.16278306, 0.10798615])

In [6]:
model.expected_values(user_index, context)

array([[1.3768595],
       [1.3789343]])
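
A short sketch of how such an output could be read, assuming each row corresponds to one action (which the shape (2, 1) and num_actions = 2 suggest). The values are copied from the output above.

```python
import numpy as np

# Expected rewards copied from the output above; one row per action (assumption).
expected = np.array([[1.3768595],
                     [1.3789343]])

flat = expected.ravel()
best_action = int(np.argmax(flat))         # greedy choice
gap = float(flat[best_action] - flat.min())  # margin over the worst action
```

Here the greedy choice is action 1, but the margin is tiny, which is unsurprising for a model trained on random rewards.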

Neural models make use of a latent representation of the input features; this feature vector is called $z$ in the Google Brain research paper. You can retrieve the model's latent feature vector using the .get_representation() method.

In [8]:
model.get_representation(user_index, context)

tensor([[0.0000, 7.0656, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]])

## Model Evaluation
Evaluating bandit models in real-life situations is not easy. The only way to really tell if your model is doing well is to put it into production and compare its results to other decision-making policies. Simulations and toy problems where action/reward relationships are known are a great place to start. Unfortunately, public contextual bandits datasets are hard to come by!<br><br>
For a look at some toy problems, check out the [toy problem notebook](toy_problem.ipynb).
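
As a rough illustration of what such a simulation can look like, the sketch below invents a two-action problem with a known reward rule and compares a random policy against an oracle that knows the rule. Everything here (the reward rule, `true_weights`, the policy functions) is made up for the example; a bandit model would be evaluated by plugging its action choice in as a third policy.

```python
import numpy as np

rng = np.random.default_rng(42)
num_features, num_rounds = 14, 2000

# Hidden rule of the toy world (invented for this example).
true_weights = rng.normal(size=num_features)


def best_action_for(context):
    """Action 1 is correct when the context scores positive, else action 0."""
    return int(context @ true_weights > 0)


def reward(context, action):
    """Pay 1.0 for the correct action and 0.0 otherwise."""
    return 1.0 if action == best_action_for(context) else 0.0


def random_policy(context):
    return int(rng.integers(2))


def oracle_policy(context):
    return best_action_for(context)


def average_reward(policy):
    """Run the policy on fresh random contexts and average its rewards."""
    total = 0.0
    for _ in range(num_rounds):
        context = rng.normal(size=num_features)
        total += reward(context, policy(context))
    return total / num_rounds


random_score = average_reward(random_policy)
oracle_score = average_reward(oracle_policy)
```

The random policy should land near 0.5 and the oracle at exactly 1.0; a useful bandit model should fall between the two and move toward the oracle as it sees more data.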