# Customer Bucketing

In this notebook we'll train a simple feedforward neural network (a multilayer perceptron) to bucket customers into cohorts defined by product interest and demographic information.

First, some libraries:

In [None]:
import numpy as np
import pandas as pd

import torch
import torch.nn as nn
from torch.autograd import Variable

# interactive plotting by Bokeh
from bokeh.plotting import figure
from bokeh.io import show, output_notebook, push_notebook

# pretty progress by tqdm
from tqdm import tnrange

We're going to use PyViz's [Bokeh](https://bokeh.pydata.org/en/latest/) to build interactive plots in this notebook. Bokeh renders plots in the browser with its JavaScript library, BokehJS, so we need to load it into the notebook before plotting.

In [None]:
output_notebook()

### Load and Clean Data

Next, we'll use PyData's [Pandas](https://pandas.pydata.org/) to load in a `.csv` file of customers, their purchases, and a small amount of demographic information about them. We'll then split the resulting dataframe into three arrays:

1. A *[samples x 1]* vector of customer UUIDs.

2. A *[samples x 1]* vector of customer cohort buckets. These are our "labels" for training our model.

3. A *[samples x features]* matrix of feature vectors. The first columns count the items purchased by each customer, and the last columns hold the demographic information we have about that customer.

In [None]:
# read in .csv file as a Pandas DataFrame
customers = pd.read_csv('../dat/feature_vectors.csv')

# output number of customers and calculated number of buckets
num_cust = customers.shape[0]
num_bkts = customers['KMeanGrouping'].unique().shape[0]

print('number of customers:', num_cust)
print('number of customer buckets:', num_bkts)

# split off customer UUIDs
uuids = customers['CustomerID']
# split off bucket labels for training
buckets = customers['KMeanGrouping']
# relabel buckets as consecutive integers 0..num_bkts-1, since there are missing integers in the file
buckets.replace(to_replace=buckets.unique(), value=range(buckets.unique().shape[0]), inplace=True)
# drop the labels and the UUIDs from the feature vectors
customers.drop(['CustomerID','KMeanGrouping'], axis=1, inplace=True)

# calculate number of features
num_ftrs = customers.shape[1]
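
The relabeling step maps arbitrary cohort IDs onto consecutive integers starting at 0. As a minimal sketch of the same idea, `pd.factorize` does this in one call. The labels below are made up for illustration and are not taken from `feature_vectors.csv`:

```python
import pandas as pd

# hypothetical cohort labels with gaps (not from the real data set)
raw = pd.Series([7, 2, 7, 15, 2, 40])

# factorize assigns codes 0, 1, 2, ... in order of first appearance
codes, originals = pd.factorize(raw)

print(codes.tolist())      # [0, 1, 0, 2, 1, 3]
print(originals.tolist())  # [7, 2, 15, 40]
```

Keeping `originals` around makes it possible to map a predicted class index back to the cohort ID used in the file.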

### Build Neural Network

This model is slightly different from the one in the regression task of notebook `2.0-Simple-Neural-Network.ipynb` for two reasons:

1. The input and output of this network have multiple features, defined by `input_size` and `num_classes`. This is because we have a multifeature input, or "feature vector." In the regression task, we input one value and expected one output value. In this task, we input as many values as we have products and demographic fields.

2. The output of the model is a 30-element vector that represents the probability of the input sample belonging to each of the 30 "classes," or buckets. The `nn.Softmax` layer at the end of the model calculates these probabilities. Instead of "regressing" one input x value to one output y value, we "classify" one feature vector into one class.
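
To make the softmax step concrete, here is a small NumPy sketch of what such a layer computes. The scores are made up, and the `softmax` helper is written out here only for illustration; it is not part of the notebook's model:

```python
import numpy as np

def softmax(scores):
    """Turn each row of raw scores into probabilities that sum to 1."""
    # subtract the row maximum for numerical stability
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# made-up scores for two samples and three classes
scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 5.0]])
probs = softmax(scores)

print(probs.sum(axis=1))     # each row sums to 1
print(probs.argmax(axis=1))  # predicted class per sample: [0 2]
```

Taking the `argmax` of each row is how a probability vector becomes a single predicted class.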

We've also included a set of commented-out layers. By adding these layers back in at home, you'll add more depth to your network, which can increase the accuracy of your predictions. If you do, change the input size of `fc4` from 3000 to 1000 so that it matches the output of `fc3`. The cost of calculating and backpropagating gradients also increases dramatically, so don't expect to be able to train it in an hour on a single thread.
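
To get a feel for that cost, here is a rough parameter count for the shallow and deep variants. The feature and bucket counts are made-up placeholders, and the deep variant assumes the final layer is resized to take the 1000 outputs of `fc3`:

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

# hypothetical sizes for illustration; the real values come from the data
num_ftrs, num_bkts = 100, 30

shallow = linear_params(num_ftrs, 3000) + linear_params(3000, num_bkts)
deep = (linear_params(num_ftrs, 3000) + linear_params(3000, 2000)
        + linear_params(2000, 1000) + linear_params(1000, num_bkts))

print(shallow)  # 393030
print(deep)     # 8336030
```

Under these assumptions the deeper network has roughly twenty times as many parameters to update at every step.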

In [None]:
class FirstNet(nn.Module):
    def __init__(self, input_size, num_classes):
        super(FirstNet, self).__init__()
        self.fc1  = nn.Linear(input_size, 3000)
        self.relu1 = nn.ReLU()
        #self.fc2  = nn.Linear(3000, 2000)
        #self.relu2 = nn.ReLU()
        #self.fc3  = nn.Linear(2000, 1000)
        #self.relu3 = nn.ReLU()
        self.fc4  = nn.Linear(3000, num_classes)
        self.soft = nn.Softmax(dim=1)
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu1(out)
        #out = self.fc2(out)
        #out = self.relu2(out)
        #out = self.fc3(out)
        #out = self.relu3(out)
        out = self.fc4(out)
        out = self.soft(out)
        return out
    
net = FirstNet(input_size=num_ftrs, num_classes=num_bkts)

print(net)

### Test the Model

Below, we'll run the data through the untrained model to see what the output looks like. First we check for a GPU:

In [None]:
use_cuda = torch.cuda.is_available()
if use_cuda:
    net = net.cuda()

In [None]:
# DataFrame.as_matrix() was removed from pandas; .values returns the same NumPy array
X = customers.values
X = torch.FloatTensor(X)
X = Variable(X)

if use_cuda:
    Y = net.forward(X.cuda()).cpu()
else:
    Y = net.forward(X)

Now we plot the output from the untrained model against the ground truth. You can click the labels in the legend to hide the predictions or the true class values. If you enable the wheel zoom tool in the toolbar on the right side of the plot, you can zoom and pan at the same time.

In [None]:
# set up the plot
p1 = figure(plot_width=900, plot_height=500, title="Customer Cohort Buckets")
p1.title.text_font_size = '24pt'
p1.xaxis.axis_label = 'Customer UUID'
p1.yaxis.axis_label = 'Cohort #'

# plot the cohort bucket data
r1 = p1.circle(uuids, buckets, fill_alpha=0.6, line_alpha=0.6, legend='groundtruth')
# plot the predictions from the network
r2 = p1.circle(uuids, np.argmax(Y.data, axis=1), fill_alpha=0.2, line_alpha=0.2, 
               fill_color='red', line_color='red', legend='prediction')

# set up the legend
p1.legend.location = "top_left"
p1.legend.click_policy="hide"

# show the plot inline
show(p1, notebook_handle=True)
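
A single number can complement the plot. The sketch below computes classification accuracy from a matrix of class probabilities and compares it with chance. It uses random stand-in data rather than the notebook's `Y` and `buckets`, so the sizes and values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up stand-ins: 1000 samples, 30 classes
num_samples, num_classes = 1000, 30
labels = rng.integers(0, num_classes, size=num_samples)
probs = rng.random((num_samples, num_classes))
probs /= probs.sum(axis=1, keepdims=True)  # rows sum to 1, like softmax output

# the predicted class is the index of the largest probability in each row
predicted = probs.argmax(axis=1)
accuracy = (predicted == labels).mean()

print(f"accuracy: {accuracy:.3f}, chance: {1 / num_classes:.3f}")
```

An untrained network should land close to the chance level, which gives a baseline for judging the model after training.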

### Train the Model

Now let's train the model. Even the tiny neural network we defined above will take longer than we have time to train during this class, but let's kick it off, watch the loss, and see if it's learning anything:

In [None]:
%%time

# format the labels as PyTorch variables
Y = buckets.values
Y = Variable(torch.LongTensor(Y))

# define hyperparameters
learning_rate = 0.001
num_epochs = 2000
loss_hist = []

# build a multiclass cross entropy loss function
criterion = nn.CrossEntropyLoss()
# instantiate a stochastic gradient descent optimizer class
optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate)
# set the model parameters for training mode
net.train()

# build a loss plot
p2 = figure(plot_width=900, plot_height=500)
r2 = p2.line(range(len(loss_hist)), loss_hist)
p2.legend.location = "top_left"
p2.legend.click_policy="hide"
loss_plot = show(p2, notebook_handle=True)

# send data to GPU, if appropriate
if use_cuda:
    criterion = criterion.cuda()
    X = X.cuda()
    Y = Y.cuda()

# train for many epochs
for epoch in tnrange(num_epochs):
    # forward pass through the model
    pred = net.forward(X)
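    # calculate local value on the loss surface; nn.CrossEntropyLoss applies
    # log-softmax itself, so pass it log-probabilities, not the softmax output
    loss = criterion(torch.log(pred.clamp(min=1e-12)), Y)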

    # clear the gradient buffer
    optimizer.zero_grad()
    # backward pass through the model to calculate gradients
    loss.backward()
    # take one step towards a minimum in the loss surface
    optimizer.step()

    # replot the network loss for one epoch
    loss_hist.append(loss.item())
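    # update the existing line's data instead of adding a new line every epoch
    r2.data_source.data = {'x': list(range(len(loss_hist))), 'y': list(loss_hist)}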
    push_notebook(handle=loss_plot)
    
# set the model parameters for inference mode
net.eval()
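
A note on the loss function: `nn.CrossEntropyLoss` expects raw scores and applies log-softmax internally, whereas a model ending in `nn.Softmax` outputs probabilities. The sketch below uses made-up values to check that cross entropy from raw scores matches `nn.NLLLoss` on log-probabilities, and shows that passing probabilities straight to `nn.CrossEntropyLoss` gives a different value:

```python
import torch
import torch.nn as nn

# made-up scores for 4 samples and 3 classes, with made-up labels
torch.manual_seed(0)
logits = torch.randn(4, 3)
target = torch.tensor([0, 2, 1, 2])

# cross entropy computed directly from raw scores
ce_from_logits = nn.CrossEntropyLoss()(logits, target)

# the same quantity from explicit probabilities: take the log, then NLLLoss
probs = torch.softmax(logits, dim=1)
ce_from_probs = nn.NLLLoss()(torch.log(probs), target)

# passing probabilities to CrossEntropyLoss applies softmax a second time
ce_double_softmax = nn.CrossEntropyLoss()(probs, target)

print(ce_from_logits.item(), ce_from_probs.item(), ce_double_softmax.item())
```

Because probabilities lie between 0 and 1, the second softmax flattens them, which squeezes the loss into a narrow range and weakens the gradients.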

### Save Trained Model

Always (always!) save your trained model weights. You'll thank yourself later. We save the loss history alongside them.

In [None]:
np.save('loss.npy', np.array(loss_hist), allow_pickle=False)

torch.save(net.cpu().state_dict(), 'firstnet.bin')
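
Saved weights are only useful if they can be loaded again. Here is a minimal sketch of the round trip, using a small stand-in model and a temporary file rather than `FirstNet` and `firstnet.bin`; the layer sizes and file name are arbitrary:

```python
import os
import tempfile

import torch
import torch.nn as nn

def make_model():
    """Small stand-in network; the sizes are arbitrary."""
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))

trained = make_model()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'model.bin')  # hypothetical file name
    torch.save(trained.state_dict(), path)

    # rebuild the same architecture, then load the saved weights into it
    restored = make_model()
    restored.load_state_dict(torch.load(path))

restored.eval()

# both models should now produce identical outputs
x = torch.randn(2, 4)
with torch.no_grad():
    same = torch.allclose(trained(x), restored(x))
print(same)  # True
```

A `state_dict` stores only the parameters, so the class definition has to be available when the weights are loaded.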