### Introduction
In this notebook we test our custom quantized backprop code and benchmark it against autograd, a Python library for automatic differentiation.

In [1]:
import autograd.numpy as np

from autograd import grad

from mlp4autograd import mlp4autograd


In [2]:
def error(params):
    # inference error wrapper that we will use with autograd

    ref.infer(x_in, params)

    return ref.error(y_truth)

def ref_v_autograd_error(ref_grad, autograd_grad):
    # mean absolute relative error between two gradients (usually custom Python vs autograd),
    # averaged per layer and then over weights and biases; autograd_grad is the reference

    ref_grad_w = ref_grad[0]
    ref_grad_b = ref_grad[1]

    autograd_grad_w = autograd_grad[0]
    autograd_grad_b = autograd_grad[1]

    N = len(ref_grad_w)

    delta_w = []
    delta_b = []

    for i in range(N):

        delta_w.append(np.average(np.absolute(np.divide(ref_grad_w[i] - autograd_grad_w[i], autograd_grad_w[i]))))

        delta_b.append(np.average(np.absolute(np.divide(ref_grad_b[i] - autograd_grad_b[i], autograd_grad_b[i]))))

    return (np.average(delta_w) + np.average(delta_b))/2
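
The metric above is a mean absolute relative error with the autograd gradient as the reference. A minimal sketch on toy arrays shows what it reports. The name `mean_relative_error` and the optional `eps` guard are hypothetical additions, not part of the notebook. Note that without such a guard, a reference entry that is exactly zero yields `inf` or `nan`.

```python
import numpy as np

def mean_relative_error(approx, reference, eps=0.0):
    """Mean of |approx - reference| / (|reference| + eps) over all entries.

    eps is a hypothetical guard against zero reference entries (off by default).
    """
    approx = np.asarray(approx, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(approx - reference) / (np.abs(reference) + eps)))

# toy "reference" gradient and an approximation that is 1% off in every entry
reference = np.array([[1.0, -2.0], [4.0, 0.5]])
approx = reference * 1.01

err = mean_relative_error(approx, reference)
print("Percentage difference: " + "{:2.2%}".format(err))
```

Because every entry is off by the same relative amount, the mean equals that amount regardless of the sign or magnitude of the entries.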

First, we create an instance of our custom backprop module with floating point precision. We can use it to verify our implementation of the backprop algorithm.

In [3]:
ref = mlp4autograd()

# no quantization, [3,7,7,5] architecture
# initialize weights uniformly in [0,1]
ref.set_quant(quant = False)
ref.net_shape([3,7,7,5])
ref.init_params(mode="uniform")
params = [ref._weight, ref._bias]

# generate some random input
x_in = np.random.random((3,1))
ref.infer(x_in, params)

# let the truth be the network output plus a random perturbation in [0, 1)
# we want some difference so the gradient is nonzero but not too large
y_truth = ref._activation[-1] + np.random.random((5,1))

# calculate gradient according to truth
ref.grad(y_truth, False)

Next, we calculate the gradient using the autograd library.

In [4]:
# create the gradient function
gradient = grad(error)

# evaluate the gradient function using the same set of parameters
grad_eval = gradient(params)

Finally, we can compare our own floating point gradient against autograd's gradient. We expect the difference to print as 0.00%, because both compute the same gradient of the same error function in floating point, so any discrepancy should be at the level of rounding error.

In [5]:
print("Percentage difference: " + "{:2.2%}".format(ref_v_autograd_error(ref.params_grad, grad_eval)))

Percentage difference: 0.00%


Next, we create an instance of the backprop module with (24,12) quantization (`D_BITS = 24`, `Q_BITS = 12`). We need to quantize the parameters, inputs, and outputs by scaling them by 2^`Q_BITS` and rounding. Then we evaluate the gradient with fixed point arithmetic.

In [6]:
D_BITS = 24
Q_BITS = 12

# (24,12) quantization, [3,7,7,5] architecture
ref_q = mlp4autograd()
ref_q.set_quant(D_BITS, Q_BITS, True)
ref_q.net_shape([3,7,7,5])

# quantize weights and biases
weight_q = []
bias_q = []

for w in ref._weight:
    weight_q.append(np.round(w * (2 ** Q_BITS)))

for b in ref._bias:
    bias_q.append(np.round(b * (2 ** Q_BITS)))

ref_q._weight = weight_q
ref_q._bias = bias_q

# quantize inputs and outputs
x_in_q = np.round(x_in * (2 ** Q_BITS))
y_truth_q = np.round(y_truth * (2 ** Q_BITS))

# perform inference and backprop
ref_q.infer(x_in_q, [weight_q, bias_q])
ref_q.grad(y_truth_q)

# de-quantize (fixed point back to floating point)
weight_grad_q = []
bias_grad_q = []

for wg in ref_q._weight_dif:
    weight_grad_q.append(np.divide(wg, (2 ** Q_BITS)))

for bg in ref_q._bias_dif:
    bias_grad_q.append(np.divide(bg, (2 ** Q_BITS)))

param_grad_q = [weight_grad_q, bias_grad_q]
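
The quantize and de-quantize steps in the cell above amount to scaling by 2^`Q_BITS`, rounding, and scaling back. A minimal sketch of that round trip follows, assuming only the scaling seen in the notebook. The helper names `quantize` and `dequantize` are hypothetical, and the sketch does not model the `D_BITS` word width (no overflow or saturation).

```python
import numpy as np

Q_BITS = 12
SCALE = 2 ** Q_BITS

def quantize(x, scale=SCALE):
    # floating point -> fixed point code (integer-valued float array)
    return np.round(x * scale)

def dequantize(x_q, scale=SCALE):
    # fixed point code -> floating point
    return np.divide(x_q, scale)

rng = np.random.default_rng(0)
x = rng.random((3, 1))        # values in [0, 1), like x_in in the notebook

x_q = quantize(x)
x_back = dequantize(x_q)

# rounding to the nearest code bounds the round-trip error by half a step
max_abs_err = float(np.max(np.abs(x_back - x)))
half_step = 0.5 / SCALE       # 2**-13 for Q_BITS = 12
print(max_abs_err <= half_step)
```

With `Q_BITS = 12`, each value is represented to within 2^-13 in absolute terms. The relative error of the final gradient can be larger, since these rounding errors accumulate through inference and backprop.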

Finally, we can calculate the mean percent error between the fixed point gradient and the autograd gradient. We should see a mean percent error below 1%.

In [7]:
print("Percentage difference: " + "{:2.2%}".format(ref_v_autograd_error(param_grad_q, grad_eval)))

Percentage difference: 0.25%
