# Training a commute prediction network, and visualizing learning!  
Latest version available from: https://github.com/miroenev/teach_DL

Prerequisites:
* Matplotlib, NumPy, MXNet, and <a href="https://github.com/K3D-tools/K3D-jupyter">K3D</a> for real-time 3D visualization of the surface during training

A video walkthrough of this notebook is <a href='https://youtu.be/HgbGJn9yz30'> available on YouTube</a>.

In [1]:
import warnings; warnings.filterwarnings("ignore")

import numpy as np
import matplotlib.pylab as plt
from mpl_toolkits.mplot3d import Axes3D
from k3d import K3D

#set default figure size
plt.rcParams['figure.figsize'] = [9.5, 5]

In [2]:
%matplotlib notebook

In [3]:
import mxnet as mx

# Define the problem

Let's try to predict commute duration from two observable independent variables: the time of day and the weather conditions.

<img src='figures/commute.png' width='400'/>
<img src='figures/target_distribution.PNG' width='1000'/>
In this toy example we'll first take on the role of the 'traffic gods' and decree that commute duration is the sum of two components, one driven by each independent variable. Later we'll sample from the resulting target distribution to generate a training dataset. This sampling procedure is analogous to keeping a journal of all of our commutes over some (long) period of time, where each log entry consists of a set of  
* <b>X</b>: [ time-of-departure, weather-condition ], and the associated  
* <b>Y</b>: [ commute-duration ].

<img src='figures/x_y_mapping.PNG' width='900'/>

Given such a journal [dataset], we'll split it into training (75%) and testing (25%) subsets, which we'll use to train and evaluate our model, respectively. Specifically, we'll build a neural network model whose weights are randomly initialized and then updated as we stream the training data through it (via the backpropagation learning algorithm). Each update should bring us closer to a model that has learned the relationship between X and Y, that is, from [ time-of-departure, weather-condition ] to [ commute-duration ].
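
The code cells in this notebook train and visualize on the full set of samples, so the split itself is not shown. A minimal sketch of one way to do it with NumPy follows; the helper name `split_train_test`, the fixed seed, and the toy arrays are assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np

def split_train_test(inputs, targets, train_fraction=0.75, seed=0):
    """Shuffle the rows once, then cut them into training and testing subsets.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(inputs))             # random row order
    n_train = int(round(train_fraction * len(inputs)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return inputs[train_idx], targets[train_idx], inputs[test_idx], targets[test_idx]

# toy journal: 8 commutes, each with [time-of-departure, weather-condition]
X = np.arange(16, dtype=float).reshape(8, 2)
Y = np.arange(8, dtype=float).reshape(8, 1)
X_train, Y_train, X_test, Y_test = split_train_test(X, Y)
print(X_train.shape, X_test.shape)   # (6, 2) (2, 2)
```

Shuffling before cutting matters when the journal is ordered (for example by date); otherwise the test subset would only cover the most recent commutes.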

<img src='figures/process.PNG' width='800'/>

During the training process we'll try to visualize the network's behavior by asking it to predict all the entries in our logbook using its current parameters/weights. As the training process unfolds, you should be able to see how the network adapts itself to the target surface/function that we determined for the commute duration.

<img src='figures/training_progress.PNG' width='700'/>

# Determine underlying relationship
We'll start by establishing (as traffic gods) the relationships between:  
* 1) the time a commute starts (time-of-departure variable) and commute-duration
* 2) the weather when a commute starts (weather-condition variable) and commute-duration

Note that as data scientists we never get to see this function, but we try to learn it from data.

In [4]:
# define data coordinates
xRange = [0,10]; 
yRange = [0,10]; numSteps = 100

x, y = np.meshgrid( np.linspace(xRange[0], xRange[1], numSteps),
                    np.linspace(yRange[0], yRange[1], numSteps), indexing='ij' )

def normalize_domain (x):
    x = x + np.abs(np.min(x))
    x = x / (np.max(x) + .001)
    return x
    
# define 1D relationships to target
xComponent = np.sin( x ) * 4
yComponent = np.exp( y / 4 )

# define 2D joint distribution
z = xComponent + yComponent
z = normalize_domain(z)


# plot independent variables
plt.figure( figsize = ( 7, 7) )
plt.subplots_adjust( left = 0.1, right = 0.9, top = 0.9, bottom = 0.1, wspace = 0.2 )
plt.subplot(2,1,1); plt.plot(normalize_domain(xComponent[:,0])); plt.xlabel('time-of-day'), plt.ylabel('commute duration')
plt.xticks([]), plt.yticks([])
plt.subplot(2,1,2); plt.plot(normalize_domain(yComponent[0,:])); plt.xlabel('weather [ severity ]'), plt.ylabel('commute duration')
plt.xticks([]), plt.yticks([])

# plot target [dependent] variable
plt.figure( figsize = (9, 9) )
plt.subplots_adjust( left = 0.1, right = 0.9, top = 0.9, bottom = 0.1 )
ax = plt.subplot(1,1,1, projection='3d');
ax.plot_surface ( x[0::1], y[0::1], z[0::1], color = 'blue', alpha = 1, antialiased = False )
ax.set_xlabel('time of day')
ax.set_ylabel('severity of weather')
ax.set_zlabel('commute length')
plt.show()
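
A note on `normalize_domain`: it shifts its input by the absolute value of the minimum, which moves the minimum to 0 only when that minimum is zero or negative (as it is for `z`). For an all-positive input such as `yComponent`, the result stays above 0. Because the tick labels are hidden, this does not change the look of the plots. If a strict [0, 1] range is wanted, a min-max variant could look like the sketch below; `min_max_normalize` is a hypothetical name, not a function from the notebook.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1.

    Illustrative variant; the notebook's normalize_domain is left unchanged.
    """
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)   # constant input: avoid dividing by zero
    return (values - values.min()) / span

weather_effect = np.exp(np.linspace(0, 10, 100) / 4)   # same form as yComponent
scaled = min_max_normalize(weather_effect)
print(scaled.min(), scaled.max())   # 0.0 1.0
```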




# Generate  dataset
Let's generate a dataset by randomly sampling from the target distribution [ with some noise ].

In [5]:
NSamples = 5000
noiseScaling = 1/8.

gpu_device=mx.gpu()

# draw random grid indices along each axis (1-D index arrays)
shuffledDataIndsX = np.random.randint(x.shape[0], size=NSamples)
shuffledDataIndsY = np.random.randint(y.shape[1], size=NSamples)

# uniform noise in [-noiseScaling/2, +noiseScaling/2]
noiseAmount = noiseScaling * ( np.random.rand(NSamples) - .5 )

# look up the inputs and the noisy targets for all samples at once
trainData = np.column_stack( ( x[ shuffledDataIndsX, 0 ], y[ 0, shuffledDataIndsY ] ) )
targetValues = ( z[ shuffledDataIndsX, shuffledDataIndsY ] + noiseAmount ).reshape(-1, 1)

trainDataGPU = mx.nd.array(trainData, ctx=gpu_device)
targetValuesGPU = mx.nd.array(targetValues, ctx=gpu_device)

# Plot dataset samples (red dots) overlaid onto target distribution (blue)

In [6]:
def plot_3D_data (k3dPlot):
    zScaling = 5

    offset = np.hstack( ( np.ones((trainData.shape[0], 1)) * -5, 
                          np.ones((trainData.shape[0], 1)) + 4, 
                          np.zeros((trainData.shape[0], 1)) ) ) * np.abs(xRange[1]-xRange[0])

    k3dPlot += K3D.points ( np.hstack( ( trainData, targetValues*zScaling) ) + offset, color=0xFF0000, point_size = .2, shader = 'flat' )
    k3dPlot += K3D.surface ( z*zScaling, color=0x0055FF, xmin=np.min(trainData[:,0]+offset[::,0]), xmax=np.max(trainData[:,0]+offset[::,0]), ymin=np.min(trainData[:,1]+offset[::,1]), ymax=np.max(trainData[:,1]+offset[::,1]))
    
    return zScaling, offset

plot = K3D()
_, _ = plot_3D_data(plot)
plot.display()

K3D(parameters={'antialias': True, 'backgroundColor': 16777215, 'height': 512})

# Create Data Iterators

In [7]:
data = trainDataGPU
label = targetValuesGPU[:,0]

trainIterator = mx.io.NDArrayIter( data = data, label = label, 
                                   data_name = 'data', 
                                   label_name = 'linearOutput_label', batch_size = 256)

predictIterator = mx.io.NDArrayIter( data = data, label = label, 
                                     data_name = 'data', 
                                     label_name = 'linearOutput_label', batch_size = NSamples)
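
The two iterators differ only in batch size: the training iterator walks through the data in mini-batches of 256, while the prediction iterator returns all samples in a single batch. The arithmetic can be sketched as below. The helper names are hypothetical, and the sketch assumes the iterator keeps the final partial batch and pads it, which is believed to be the default `NDArrayIter` behavior (`last_batch_handle='pad'`); check this against your MXNet version.

```python
import math

def batches_per_epoch(num_samples, batch_size):
    """Number of mini-batches in one pass when the final partial batch is kept."""
    return math.ceil(num_samples / batch_size)

def last_batch_real_samples(num_samples, batch_size):
    """How many genuine samples the final batch holds (the rest would be padding)."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size

print(batches_per_epoch(5000, 256))        # 20
print(last_batch_real_samples(5000, 256))  # 136
print(batches_per_epoch(5000, 5000))       # 1
```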


# Define model structure

In [8]:
import mxnet as mx

inputData = mx.sym.Variable('data')
targetLabelVar = mx.sym.Variable('linearOutput_label') 

layer1 = mx.sym.FullyConnected( data = inputData, name = 'fc1', num_hidden = 5)
layer1Activation = mx.sym.Activation( data = layer1, name = 'sig1', act_type = "sigmoid")

layer2 = mx.sym.FullyConnected( data = layer1Activation, name='fc2', num_hidden = 27)
layer2Activation = mx.sym.Activation( data = layer2, name='sig2', act_type = "sigmoid")

layer3 = mx.sym.FullyConnected( data = layer2Activation, name='fc3', num_hidden = 20)
layer3Activation = mx.sym.Activation( data = layer3, name='sig3', act_type = "sigmoid")

layer4 = mx.sym.FullyConnected( data = layer3Activation, name = 'fc4', num_hidden = 40)
layer4Activation = mx.sym.Activation( data = layer4, name = 'sig4', act_type = "sigmoid")

output = mx.sym.FullyConnected( data = layer4Activation, name='output', num_hidden=1)

loss = mx.sym.LinearRegressionOutput( data = output, label = targetLabelVar , name = 'linearOutput_label')
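
To get a feel for how small this network is, the number of trainable parameters can be counted by hand: each fully connected layer contributes a weight matrix plus a bias vector. The sketch below uses a hypothetical helper, `count_dense_parameters`, with the layer widths taken from the cell above and an input width of 2 (time of day and weather).

```python
def count_dense_parameters(layer_sizes):
    """Total weights plus biases for a stack of fully connected layers."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out   # weight matrix + bias vector
    return total

# 2 inputs -> fc1 (5) -> fc2 (27) -> fc3 (20) -> fc4 (40) -> output (1)
layer_sizes = [2, 5, 27, 20, 40, 1]
print(count_dense_parameters(layer_sizes))   # 1618
```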

In [9]:
mod = mx.mod.Module(symbol = loss,
                    context = mx.gpu(0),
                    data_names = ['data'],
                    label_names = ['linearOutput_label'])

In [10]:
# allocate memory given the input data and label shapes
mod.bind( data_shapes = trainIterator.provide_data, label_shapes = trainIterator.provide_label )

In [11]:
# initialize parameters with Xavier initialization (uniform random numbers scaled by layer size)
mod.init_params( initializer = mx.init.Xavier(), force_init = True)
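
As a rough illustration of what Xavier initialization does, the sketch below draws a weight matrix from a uniform distribution whose range shrinks as the layer gets wider. The function name `xavier_uniform` is hypothetical, and the `magnitude=3` and averaged fan-in/fan-out choices are assumed to mirror the defaults of `mx.init.Xavier`; treat it as a NumPy approximation, not MXNet's implementation.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, magnitude=3.0, seed=0):
    """Sample a (fan_out, fan_in) weight matrix from U(-scale, scale).

    scale = sqrt(magnitude / ((fan_in + fan_out) / 2)); the averaging factor
    and magnitude=3 are assumptions about MXNet's Xavier defaults.
    """
    scale = np.sqrt(magnitude / ((fan_in + fan_out) / 2.0))
    rng = np.random.default_rng(seed)
    return rng.uniform(-scale, scale, size=(fan_out, fan_in)), scale

weights, scale = xavier_uniform(fan_in=2, fan_out=5)   # shape of fc1's weights
print(weights.shape, round(float(scale), 4))           # (5, 2) 0.9258
```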

In [12]:
# use adam optimizer
mod.init_optimizer( optimizer = 'adam' )
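
For intuition, one textbook Adam update can be sketched in NumPy as follows. The function name `adam_step` and all hyperparameter values are illustrative assumptions. In particular, since no `optimizer_params` are passed above, the learning rate actually used is whatever `Module.init_optimizer` supplies by default (believed to be 0.01), and MXNet's own implementation may differ in details such as where the epsilon term is applied.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1.0 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for step t >= 1
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

w = np.array([0.5, -0.3])
g = np.array([2.0, -0.01])
w_new, m, v = adam_step(w, g, m=np.zeros(2), v=np.zeros(2), t=1)
print(w - w_new)   # close to [0.01, -0.01]: the first step is about lr * sign(grad)
```

The first step has nearly the same size for both parameters even though their gradients differ by a factor of 200, which is the per-parameter scaling that distinguishes Adam from plain gradient descent.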

In [13]:
# use root mean squared error as the metric
metric = mx.metric.create( 'rmse' )
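
Root mean squared error is the square root of the average squared difference between labels and predictions. A minimal NumPy sketch is shown below; `rmse` is a hypothetical helper, not the MXNet metric object. MXNet's metric is believed to average per-batch RMSE values over an epoch, so the number it reports can differ slightly from an RMSE computed over all samples at once.

```python
import numpy as np

def rmse(labels, predictions):
    """Root mean squared error between two equally sized arrays."""
    labels = np.asarray(labels, dtype=float).ravel()
    predictions = np.asarray(predictions, dtype=float).ravel()
    return float(np.sqrt(np.mean((labels - predictions) ** 2)))

print(rmse([0.2, 0.5, 0.9], [0.2, 0.5, 0.9]))   # 0.0
print(rmse([0.0, 0.0], [0.3, 0.4]))             # sqrt((0.09 + 0.16) / 2), about 0.3536
```

Because the targets are normalized to roughly [0, 1], an RMSE of 0.05 corresponds to a typical error of about 5% of the full commute-duration range.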

# Visualize network structure

In [14]:
# mx.viz.plot_network( loss )

In [15]:
import importlib
import sys
sys.path.append('utils')
import nnViz_mxnet
importlib.reload(nnViz_mxnet)

<module 'nnViz_mxnet' from 'utils/nnViz_mxnet.py'>

In [16]:
nnViz_mxnet.visualize_model(mod)


Model structure [loosely] inspired by NVIDIA's new HQ ;]  
http://c.ymcdn.com/sites/aiascv.org/resource/resmgr/meeting_images/2017/March/Nv2.jpeg

# Train Model Loop [ no visualization ]

In [17]:
# re-initialize parameters with Xavier initialization so training starts from scratch
mod.init_params( initializer = mx.init.Xavier(), force_init = True)

In [None]:
import time
startTime = time.time()
for epoch in range(300):
    
    trainIterator.reset()
    metric.reset()
    
    for batch in trainIterator:
        
        mod.forward( batch, is_train = True )       # compute predictions
        mod.update_metric( metric, batch.label )    # accumulate the RMSE metric
        mod.backward()                              # compute gradients
        mod.update()                                # update parameters
    
    print('Epoch %d, Training %s' % (epoch, metric.get()))
elapsedTime = time.time() - startTime
print(elapsedTime)

# Train Model + Visualize

In [17]:
# re-initialize parameters with Xavier initialization so training starts from scratch
mod.init_params( initializer = mx.init.Xavier(), force_init = True)

In [18]:

NEpochs = 350
displayUpdateInterval = 10


currentNN = {}
plotCount = 0

xOffset = np.zeros( (trainData.shape[0], 1))
yOffset = np.zeros( (trainData.shape[0], 1))
zOffset = np.zeros( (trainData.shape[0], 1))

xModifier = 1 * np.abs(xRange[1]-xRange[0])*1.2; yModifier = 0; zModifier = 0; 

evalLoss = np.empty((NEpochs))
evalLoss[:] = np.nan

predictIterator.reset()
nextIterData = predictIterator.next()
batchInputs = nextIterData.data[0].asnumpy()

# 3D plot
plot = K3D()
zScaling, offset = plot_3D_data(plot)
plot.display()

# train for NEpochs epochs; each epoch is one full pass over the training iterator
for iEpoch in range(NEpochs):
    
    trainIterator.reset()
    metric.reset()
    
    for batch in trainIterator:
        mod.forward( batch, is_train = True )     # compute predictions
        mod.update_metric( metric, batch.label )  # accumulate the RMSE metric
        mod.backward()                            # compute gradients
        mod.update()                              # update parameters
    
    evalLoss[iEpoch] = metric.get()[1]
    print('Epoch: %d, Training Loss: %s' % ( iEpoch, evalLoss[iEpoch] ))
    
    
    # plotting 
    if iEpoch % displayUpdateInterval == 0:
        
        mod.forward( nextIterData, is_train = False )   # inference-only pass over all samples
        
        currentNN[plotCount] = mod.get_outputs()[0].asnumpy()

        comboOffset = np.hstack( (xOffset + xModifier, yOffset + yModifier, zOffset + zModifier) )
        plot += K3D.points ( np.hstack( ( batchInputs, currentNN[plotCount] * zScaling) ) + comboOffset + offset, color=0xA9A9FF, point_size = .2, shader = 'flat' )        
        plot += K3D.text2d ( str( round( evalLoss[iEpoch], 4 )), comboOffset + offset + (0, 0, 3), color=0xff00ff, size=.5, reference_point='rb')
        
        
        plotCount += 1
        if plotCount % 8 == 0:
            xModifier = 1 * np.abs(xRange[1]-xRange[0])*1.2
            yModifier -= 1 * np.abs(yRange[1]-yRange[0])*1.2
        else:
            xModifier += 1 * np.abs(xRange[1]-xRange[0])*1.2

K3D(parameters={'antialias': True, 'backgroundColor': 16777215, 'height': 512})

Epoch: 0, Training Loss: 0.275697046518
Epoch: 1, Training Loss: 0.219039503485
Epoch: 2, Training Loss: 0.218044106662
Epoch: 3, Training Loss: 0.21097278744
Epoch: 4, Training Loss: 0.195589874685
Epoch: 5, Training Loss: 0.176874124259
Epoch: 6, Training Loss: 0.167491799593
Epoch: 7, Training Loss: 0.160059154779
Epoch: 8, Training Loss: 0.155063860118
Epoch: 9, Training Loss: 0.151865970343
Epoch: 10, Training Loss: 0.150664042681
Epoch: 11, Training Loss: 0.150333135575
Epoch: 12, Training Loss: 0.149954502285
Epoch: 13, Training Loss: 0.149535343796
Epoch: 14, Training Loss: 0.149266503006
Epoch: 15, Training Loss: 0.149105380476
Epoch: 16, Training Loss: 0.148975628614
Epoch: 17, Training Loss: 0.148851662874
Epoch: 18, Training Loss: 0.148711494356
Epoch: 19, Training Loss: 0.148505029827
Epoch: 20, Training Loss: 0.14814953059
Epoch: 21, Training Loss: 0.147606745362
Epoch: 22, Training Loss: 0.147014928609
Epoch: 23, Training Loss: 0.146504098177
Epoch: 24, Training Loss: 0.

Epoch: 200, Training Loss: 0.0464431788772
Epoch: 201, Training Loss: 0.0463520566002
Epoch: 202, Training Loss: 0.04626299683
Epoch: 203, Training Loss: 0.0461759457365
Epoch: 204, Training Loss: 0.0460908243433
Epoch: 205, Training Loss: 0.0460075745359
Epoch: 206, Training Loss: 0.0459263533354
Epoch: 207, Training Loss: 0.0458467880264
Epoch: 208, Training Loss: 0.0457690963522
Epoch: 209, Training Loss: 0.0456930521876
Epoch: 210, Training Loss: 0.0456185992807
Epoch: 211, Training Loss: 0.0455458180979
Epoch: 212, Training Loss: 0.0454743409529
Epoch: 213, Training Loss: 0.0454045124352
Epoch: 214, Training Loss: 0.0453360004351
Epoch: 215, Training Loss: 0.0452688539401
Epoch: 216, Training Loss: 0.0452031107619
Epoch: 217, Training Loss: 0.0451387409121
Epoch: 218, Training Loss: 0.0450755557045
Epoch: 219, Training Loss: 0.0450135977939
Epoch: 220, Training Loss: 0.044952741079
Epoch: 221, Training Loss: 0.0448931589723
Epoch: 222, Training Loss: 0.0448346653953
Epoch: 223, Tr
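
The training cell above records a snapshot of the network's predictions every `displayUpdateInterval` epochs and lays the snapshots out in rows of eight next to the target surface. That placement logic can be summarized as the sketch below; `snapshot_grid_position` is a hypothetical helper that reproduces the `xModifier` / `yModifier` bookkeeping in closed form, with the spacing of 12 taken from `|xRange[1] - xRange[0]| * 1.2`.

```python
import math

def snapshot_grid_position(snapshot_index, spacing=12.0, per_row=8):
    """(x, y) shift of the k-th prediction snapshot in the 3D scene.

    Snapshots fill a row left to right, then wrap to the next row.
    """
    column = snapshot_index % per_row
    row = snapshot_index // per_row
    return spacing * (column + 1), 0.0 - spacing * row

num_snapshots = math.ceil(350 / 10)      # NEpochs / displayUpdateInterval
print(num_snapshots)                     # 35
print(snapshot_grid_position(0))         # (12.0, 0.0)
print(snapshot_grid_position(7))         # (96.0, 0.0)
print(snapshot_grid_position(8))         # (12.0, -12.0)
```

With these settings, snapshot 0 is taken after the first epoch, snapshot 17 after epoch 170, and snapshot 34 after epoch 340.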

In [19]:
plt.figure()
plt.plot(evalLoss[:], 'b')
plt.plot(evalLoss[:], 'or')
plt.xlabel('epochs')
plt.ylabel('error')


Text(0,0.5,'error')

# Plot Predictions at the Start of Training (after the first epoch)

In [20]:
plot = K3D()
plot += K3D.points ( np.hstack( ( batchInputs, currentNN[0]*zScaling) ), color=0xFF00FF, point_size = .3, shader = 'flat' )        
plot += K3D.surface ( z*zScaling, color=0x888888, xmin=np.min(xRange), xmax=np.max(xRange), ymin=np.min(yRange), ymax=np.max(yRange))
plot.display()

K3D(parameters={'antialias': True, 'backgroundColor': 16777215, 'height': 512})

# Plot Predictions Midway Through Training

In [25]:
plot = K3D()
plot += K3D.points ( np.hstack( ( batchInputs, currentNN[int(plotCount/2)]*zScaling) ), color=0xFF00FF, point_size = .3, shader = 'flat' )        
plot += K3D.surface ( z*zScaling, color=0x888888, xmin=np.min(xRange), xmax=np.max(xRange), ymin=np.min(yRange), ymax=np.max(yRange))
plot.display()

K3D(parameters={'antialias': True, 'backgroundColor': 16777215, 'height': 512})

# Plot Predictions at End of Training

In [26]:
plot = K3D()
plot += K3D.points ( np.hstack( ( batchInputs, currentNN[int(plotCount-1)]*zScaling) ), color=0xFF00FF, point_size = .3, shader = 'flat' )        
plot += K3D.surface ( z*zScaling, color=0x888888, xmin=np.min(xRange), xmax=np.max(xRange), ymin=np.min(yRange), ymax=np.max(yRange))
plot.display()

K3D(parameters={'antialias': True, 'backgroundColor': 16777215, 'height': 512})

## thanks!