# Build the model

This cell uses the model, layer and optimizer modules. A couple of things about the model class are worth mentioning. There are two cost functions. The cost function calculates a cost with softmax applied to the final layer; the options include cross entropy with and without L2 regularization. The logit cost function calculates the cost without applying a softmax function. Cross entropy will not work without softmax (the logarithm of a negative value is not defined), so only quadratic cost functions are available for the logits. 
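The difference between the two kinds of cost can be sketched in plain NumPy. This is only an illustration and not the parana implementation: the function names mirror the strings used in the model class, but the bodies, the `decay` argument and the exact form of the L2 penalty (here `decay * sum(w**2)`) are assumptions.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def l2_penalty(weights, decay):
    # Assumed form of the penalty: decay * sum of squared weights.
    return decay * sum(np.sum(w ** 2) for w in weights)

def cross_entropy_l2(logits, labels, weights, decay=0.0001):
    # Cross entropy needs probabilities, so softmax is applied first.
    probs = softmax(logits)
    data_cost = -np.mean(np.sum(labels * np.log(probs + 1e-12), axis=1))
    return data_cost + l2_penalty(weights, decay)

def quadratic_l2(logits, labels, weights, decay=0.0001):
    # The quadratic cost works directly on raw logits, which may be negative.
    data_cost = np.mean(np.sum((logits - labels) ** 2, axis=1))
    return data_cost + l2_penalty(weights, decay)

logits = np.array([[2.0, -1.0, 0.5], [0.1, 0.2, 3.0]])
labels = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
weights = [np.ones((3, 3))]
print(cross_entropy_l2(logits, labels, weights))
print(quadratic_l2(logits, labels, weights))
```

Taking `np.log` of the raw logits in the example would fail on the negative entry, which is why the logit cost is restricted to the quadratic form.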

Layers are added to the model by appending to the layers variable. Weight decay and bias decay are the L2 regularization constants. Setting weight_init to 'xavier' uses Xavier initialization, setting it to a small constant uses that constant as the standard deviation of a truncated normal distribution, and setting it to a numpy array initializes the weights from the array. 
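The three initialization options could be dispatched roughly as follows. This is a hypothetical helper for a fully connected layer of shape (fan_in, fan_out), not parana code. It assumes the uniform variant of Xavier initialization and truncation at two standard deviations; the notebook does not state either detail.

```python
import numpy as np

def init_weights(shape, weight_init, rng=None):
    """Hypothetical sketch of the three weight_init options."""
    rng = np.random.default_rng(0) if rng is None else rng
    if isinstance(weight_init, np.ndarray):
        # An explicit array is used directly as the initial weights.
        if weight_init.shape != tuple(shape):
            raise ValueError("array shape does not match layer shape")
        return weight_init.astype(np.float64)
    if isinstance(weight_init, str):
        if weight_init != 'xavier':
            raise ValueError("unknown initializer: " + weight_init)
        # Xavier/Glorot uniform: limit = sqrt(6 / (fan_in + fan_out)).
        fan_in, fan_out = shape[0], shape[-1]
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=shape)
    # Otherwise treat the value as the standard deviation of a truncated
    # normal: redraw anything further than 2 std from the mean.
    std = float(weight_init)
    w = rng.normal(0.0, std, size=shape)
    out_of_range = np.abs(w) > 2 * std
    while out_of_range.any():
        w[out_of_range] = rng.normal(0.0, std, size=out_of_range.sum())
        out_of_range = np.abs(w) > 2 * std
    return w

print(init_weights((4, 3), 'xavier').shape)
print(np.abs(init_weights((4, 3), 0.01)).max() <= 0.02)
```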

Adamopt here is the TensorFlow implementation of the Adam optimizer. The wrapper module only provides a standard API, because I made another optimizer for a different experiment. Using the TensorFlow optimizer directly would work exactly the same.

In [1]:
import tensorflow as tf
import numpy as np
from parana.Model import Model
from parana.Layers import fc_layer
from parana.Layers import softmax_layer
from parana.optimizer import optimizer
from parana.optimizer import adamopt
from parana.Layers import conv_layer
from parana.parameter_saver import saver
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True, reshape=False)
from IPython.display import clear_output
from parana.parameter_pruning import sparse_lobotomizer
from parana.sparse_selection import get_k_min
from parana.sparse_wiggles import get_mean_activations
from parana.sparse_wiggles import get_abs_values
path = 'c:/users/jim/tensorflowtrials/'
import pickle

class this_model(Model):
    
    def __init__(self, inputs, labels):
        self.inputs = inputs
        self.labels = labels
        self.cost_function = 'cross_entropy_l2'
        self.logit_cost_function = 'quadratic_l2'
        self.dropout = 0.7
        self.layers = [conv_layer(inputs = inputs,
                                 height = 7, 
                                 width = 7, 
                                 filters = 12, 
                                 padding = 4, 
                                 stride = 1,
                                 flatten = False,
                                 weight_init = 'xavier',
                                 weight_decay=0.0001, bias_decay=0.0001)]
        self.layers.append(conv_layer(inputs = self.layers[0].activate,
                                 height = 5, 
                                 width = 5, 
                                 filters = 15, 
                                 padding = 2, 
                                 stride = 1,
                                 flatten = False,
                                 weight_init = 'xavier',
                                 weight_decay=0.0001, bias_decay=0.0001))
        self.layers.append(conv_layer(inputs = self.layers[1].activate,
                                 height = 3, 
                                 width = 3, 
                                 filters = 25, 
                                 padding = 1, 
                                 stride = 1,
                                 flatten = False,
                                 weight_init = 'xavier',
                                 weight_decay=0.0001, bias_decay=0.0001))
        self.layers.append(conv_layer(inputs = self.layers[2].activate,
                                 height = 3, 
                                 width = 3, 
                                 filters = 20, 
                                 padding = 1, 
                                 stride = 1,
                                 flatten = True,
                                 weight_init = 'xavier',
                                 weight_decay=0.0001, bias_decay=0.0001))
        self.layers.append(fc_layer(inputs = self.layers[3].activate,
                                   size = 1500,
                                   weight_init = 'xavier',
                                   weight_decay=0.0001, bias_decay=0.0001))
        self.layers.append(fc_layer(inputs=self.layers[4].activate, 
                               weight_init = 'xavier',
                               size=500, 
                               weight_decay=0.0001, bias_decay=0.0001))
        self.layers.append(softmax_layer(inputs=self.layers[5].activate, 
                                          size=10, 
                                          weight_decay=0.0001, bias_decay=0.0001))

X = tf.placeholder('float', [None, 28, 28, 1], name = 'Inputs')
y = tf.placeholder('float', [None, 10], name = 'Labels')   

noise = tf.Variable(tf.zeros([28,28,1]), name='x_noise')
set_zero_noise = tf.assign(noise, tf.zeros([28,28,1]))
noise_placeholder = tf.placeholder('float', [28, 28,1], name = 'noise_placeholder')
assign_noise = tf.assign(noise, noise_placeholder)
X_noise = X + noise

sess = tf.Session()

mymodel = this_model(X_noise, y)

opt = adamopt(session = sess,
              learning_rate = 0.0001,
              cost_function = mymodel.cost, 
              model =  mymodel)
opt2 = adamopt(session = sess,
              learning_rate = 0.000005,
              cost_function = mymodel.cost, 
              model =  mymodel)
model_saver = saver(mymodel, sess)

sess.run(tf.global_variables_initializer())



Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


# Load the weights
Instead of training the model, load saved parameters and check the accuracy on the test set.

In [2]:
path = 'c:/users/jim/tensorflowtrials/'
model_saver.load_parameters('{}cnn99.34%.p'.format(path))
print(model_saver.split_accuracy(session = sess,
                       stages=20,
                       inputs = mnist.test.images, 
                       labels = mnist.test.labels))

Parameters loaded from  c:/users/jim/tensorflowtrials/cnn99.34%.p
0.9934869766235351


# Decouple weights and sparsemodel

sparse_decouple_weights unrolls each convolutional filter into a single row of a 2D matrix. The matrices are stored as scipy CSR matrices. It also assigns a bias to each activation, stored in a numpy array.
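To make the unrolling concrete, here is a minimal sketch of how a stride-1 convolution can be written as a single sparse matrix. It is not the parana implementation: `unroll_conv`, the kernel layout (height, width, in channels, out channels), the zero padding and the inputs-by-outputs orientation are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

def unroll_conv(kernel, in_h, in_w, pad=0):
    """Return an (n_inputs x n_outputs) CSR matrix equal to a stride-1 convolution."""
    kh, kw, in_c, out_c = kernel.shape
    out_h = in_h + 2 * pad - kh + 1
    out_w = in_w + 2 * pad - kw + 1
    rows, cols, vals = [], [], []
    for oy in range(out_h):
        for ox in range(out_w):
            for ky in range(kh):
                for kx in range(kw):
                    iy, ix = oy + ky - pad, ox + kx - pad
                    if not (0 <= iy < in_h and 0 <= ix < in_w):
                        continue  # this tap falls in the zero padding
                    for ci in range(in_c):
                        for co in range(out_c):
                            # Flattened (row, column, channel) indices.
                            rows.append((iy * in_w + ix) * in_c + ci)
                            cols.append((oy * out_w + ox) * out_c + co)
                            vals.append(kernel[ky, kx, ci, co])
    shape = (in_h * in_w * in_c, out_h * out_w * out_c)
    return csr_matrix((vals, (rows, cols)), shape=shape)

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3, 1, 2))
image = rng.normal(size=(5, 5, 1))
W = unroll_conv(kernel, 5, 5, pad=1)
# Applying the layer is now one sparse matrix product on the flat image.
out = (W.T @ image.reshape(-1)).reshape(5, 5, 2)
print(W.shape, out.shape)
```

With a 28x28x1 input, a 7x7x1x12 kernel and `pad=3`, this sketch produces a 784x9408 matrix. The `pad` argument here need not mean the same thing as the padding argument of parana's conv_layer.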

sparsemodel defines a new model that uses Python operations. TensorFlow's support for sparse operations is limited, and the full arrays will not fit in memory. This runs much slower than the TensorFlow model, but performance improves as parameters are pruned from the model.
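A forward pass over a list of sparse weight matrices might look like the sketch below. It is a simplified stand-in for sparsemodel, not its actual code. The ReLU on hidden layers is an assumption, as the notebook does not name the activation function. The final layer returns raw values without softmax, as described later in this notebook.

```python
import numpy as np
from scipy.sparse import csr_matrix

def relu(z):
    return np.maximum(z, 0.0)

def sparse_forward(x, weights, biases):
    """x: (batch, n_in) dense array; weights: list of (n_in x n_out) CSR matrices."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        # A sparse matrix times a dense array gives a dense array.
        z = (W.T @ h.T).T + b
        # Assumption: ReLU on hidden layers, raw values on the last layer.
        h = z if i == len(weights) - 1 else relu(z)
    return h

W1 = csr_matrix(np.array([[1.0, -1.0], [0.0, 2.0]]))
W2 = csr_matrix(np.array([[1.0], [1.0]]))
b1 = np.zeros(2)
b2 = np.array([0.5])
x = np.array([[1.0, 2.0]])
print(sparse_forward(x, [W1, W2], [b1, b2]))  # [[4.5]]
```

The cost of each product scales with the number of stored elements, which is why pruning speeds this model up.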

The input will now be a flat array, so I have reloaded the MNIST data with the TensorFlow loader, this time flattening the images.

In [3]:
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True, reshape=True)
mymodel.sparsemodel(mymodel.sparse_decouple_weights(sess))
sess.close()

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Close your tensorflow session, it is no longer useful


# Pruning by activation values here

The sparse_lobotomizer module is so named because it removes weights (or connections) from the model; a bad joke, but the name stuck. It performs two main functions: get_wigglyness computes a value for each parameter of the network, and prune_step prunes a ratio of the weights based on these values.

get_mean_activations multiplies each weight value by its corresponding input for each data point (100 here), and takes the mean.
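For a single layer, the score described above can be sketched as follows. The function name and signature are hypothetical and this is not the parana code. For deeper layers, `inputs` would be the activations of the previous layer. Whether the library takes absolute values before averaging is not stated, so the sketch uses the plain mean.

```python
import numpy as np
from scipy.sparse import csr_matrix

def mean_activation_scores(W, inputs):
    """W: (n_in x n_out) CSR weights; inputs: (n_points, n_in) inputs to the layer.

    Entry (i, j) of the result is the mean over data points of
    inputs[:, i] * W[i, j], computed only for stored weights.
    """
    mean_input = inputs.mean(axis=0)
    coo = W.tocoo()
    # mean_k(x_ki * w_ij) equals mean_k(x_ki) * w_ij, so one pass is enough.
    scores = coo.data * mean_input[coo.row]
    return csr_matrix((scores, (coo.row, coo.col)), shape=W.shape)

W = csr_matrix(np.array([[1.0, 0.0], [2.0, -3.0]]))
inputs = np.array([[1.0, 2.0], [3.0, 4.0]])
scores = mean_activation_scores(W, inputs)
print(scores.toarray())
```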

get_k_min returns the indices of the parameters with the smallest values.
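Selecting the k smallest values and pruning a ratio of the weights with them can be sketched like this. Both functions are hypothetical stand-ins rather than the parana versions. The sketch assumes `scores` is a 1-D array aligned with the stored values in `W.data`.

```python
import numpy as np
from scipy.sparse import csr_matrix

def k_min_indices(values, k):
    """Indices of the k smallest entries of a 1-D array, in no particular order."""
    if k <= 0:
        return np.array([], dtype=int)
    if k >= values.size:
        return np.arange(values.size)
    # argpartition places the k smallest entries in the first k positions.
    return np.argpartition(values, k - 1)[:k]

def prune_ratio(W, scores, ratio):
    """Remove the given ratio of stored weights in W with the lowest scores."""
    k = int(ratio * W.nnz)
    pruned = W.copy()
    pruned.data[k_min_indices(scores, k)] = 0.0
    # Drop the zeroed entries so they no longer cost time or memory.
    pruned.eliminate_zeros()
    return pruned

W = csr_matrix(np.array([[1.0, 0.0, 4.0], [2.0, 3.0, 0.0]]))
pruned = prune_ratio(W, np.abs(W.data), 0.5)
print(pruned.toarray())
```

Here the scores are the absolute weight values, so the two smallest weights (1 and 2) are removed and two stored elements remain.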

In [4]:
path = 'c:/users/jim/tensorflowtrials/'
activation_screwdriver = sparse_lobotomizer(model = mymodel,
                                 layers_list = mymodel.layers,
                                 wigglyness = get_mean_activations,
                                 iterations = 100,
                                 parameter_selection = get_k_min,
                                 data_function = mnist.train.next_batch)
#activation_screwdriver._arrays = pickle.load(open('{}activation_screwdriver.p'.format(path), 'rb'))
print('loaded')
activation_screwdriver.get_wigglyness(iterations = 1500)

1499


[<784x9408 sparse matrix of type '<class 'numpy.float64'>'
 	with 348684 stored elements in Compressed Sparse Row format>,
 <9408x11760 sparse matrix of type '<class 'numpy.float64'>'
 	with 3232080 stored elements in Compressed Sparse Row format>,
 <11760x19600 sparse matrix of type '<class 'numpy.float64'>'
 	with 2521500 stored elements in Compressed Sparse Row format>,
 <19600x15680 sparse matrix of type '<class 'numpy.float64'>'
 	with 3361920 stored elements in Compressed Sparse Row format>,
 <15680x1500 sparse matrix of type '<class 'numpy.float64'>'
 	with 23073000 stored elements in Compressed Sparse Row format>,
 <1500x500 sparse matrix of type '<class 'numpy.float64'>'
 	with 263000 stored elements in Compressed Sparse Row format>,
 <500x10 sparse matrix of type '<class 'numpy.float64'>'
 	with 4840 stored elements in Compressed Sparse Row format>]

# Save the parameter data
These models take a long time to work with, so I have saved the mean activation values for this model.

In [5]:
pickle.dump(activation_screwdriver._arrays, open('{}activation_screwdriver99.34%.p'.format(path), 'wb'))

In [6]:
abs_values = sparse_lobotomizer(model = mymodel,
                                 layers_list = mymodel.layers,
                                 wigglyness = get_abs_values,
                                 iterations = 100,
                                 parameter_selection = get_k_min,
                                 data_function = mnist.train.next_batch)
abs_values.get_wigglyness()
print('gotit')

gotit


# Prune step

I have split the pruning for this model into three sections: convolutional layers, fully connected layers, and the softmax layer. Even though I am not applying a softmax function to the final layer, I am working on the assumption that, being the final layer and smaller than the others, it will respond differently to pruning. Models could be split further, or even pruned layer by layer, but this is very time consuming, especially when not run on a GPU. 

I set up a single script and run it several times with different parameters. The run time is directly proportional to the number of parameters, so to prune efficiently, start with big steps and then move to a finer grain. In most experiments on this model and others that I have tested, removing 90% of the parameters has no real effect on accuracy, which means a 10X speed-up for sparse_model. Things will scale differently for TensorFlow or other GPU-optimized frameworks. 
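The 10X figure follows from the proportionality assumption, as a short calculation shows. The stored-element counts are copied from the CSR summary printed earlier in this notebook. The helper functions are hypothetical, and the estimate holds only if run time really is proportional to the total number of stored parameters.

```python
# Stored elements per layer, copied from the CSR summary printed earlier.
stored = [348684, 3232080, 2521500, 3361920, 23073000, 263000, 4840]

def remaining(stored_counts, ratios):
    """Stored elements left after removing ratios[i] of layer i."""
    return [round(n * (1.0 - r)) for n, r in zip(stored_counts, ratios)]

def estimated_speedup(before, after):
    # Assumes run time is proportional to the total number of stored parameters.
    return sum(before) / sum(after)

uniform = remaining(stored, [0.9] * len(stored))
print(sum(stored), sum(uniform), estimated_speedup(stored, uniform))
```

Because the first fully connected layer holds most of the stored elements, pruning that layer alone accounts for most of the estimated gain.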

Checking accuracy with the full test set of 10,000 images is slow, so I start by testing on 1,000 images. After the convolutional layers have been heavily pruned, things speed up.

In [7]:
model_saver.store_sparse()

In [25]:
model_saver.restore_sparse()

In [15]:
import time
import pickle
path = 'c:/users/jim/tensorflowtrials/cnn_adv_noise/'
noisevector = pickle.load(open('{}adv_class_5_limit_0.3.p'.format(path), 'rb'))
noisevector = np.reshape(noisevector, -1)
inputs = mnist.test.images[:1000]
noisy_inputs = inputs + noisevector
labels = mnist.test.labels[:1000]
tic = time.time()
print (mymodel.sparse_accuracy(inputs, labels))
print('that took', time.time() - tic)
print (mymodel.sparse_accuracy(noisy_inputs, labels))
print('that took', time.time() - tic)  # cumulative: tic is not reset, so this includes the first run


0.984
that took 9.984939575195312
0.226
that took 19.959836959838867


In [12]:
activation_screwdriver.prune_step(0.5, layers_list = mymodel.layers[:3])
print('PRunEd')

PRunEd


In [14]:
activation_screwdriver.prune_step(0.985, layers_list = [mymodel.layers[4]])
print('PRunEd')

PRunEd


In [15]:
abs_values.prune_step(0.9, layers_list = [mymodel.layers[4]])
print('prUNeD')

prUNeD
