# Sparana Demonstration

This short notebook demonstrates how to train a model using my sparse training library, sparana. It covers training, pruning, and retraining of a sparse model. I also briefly explain some planned features that I have not implemented yet.

### Dependencies, and training data

I have used numpy and cupy, both of which need to be installed; not having cupy installed will throw an error. I have used the tensorflow loader for the MNIST dataset because it is there and easy to use.

## Create the model

I am trying to keep the code as clean as possible. Cost functions and regularization are associated with training, so they are defined with the optimizer rather than with the model. comp_type is set to either GPU, to train on the graphics card using cupy, or CPU, to train using numpy, which I use for models that don't fit in GPU memory.
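
To illustrate the GPU/CPU switch, here is a minimal sketch of how one array module can be chosen from a comp_type string. The helper name `get_backend` is hypothetical and is not part of sparana; it only shows the general pattern of swapping numpy for cupy.

```python
import numpy as np

try:
    import cupy as cp  # only needed for the GPU path
except ImportError:
    cp = None


def get_backend(comp_type):
    """Return the array module for 'GPU' (cupy) or 'CPU' (numpy).

    Hypothetical helper for illustration, not sparana's API.
    """
    if comp_type == 'GPU':
        if cp is None:
            raise ImportError("comp_type='GPU' requires cupy to be installed")
        return cp
    if comp_type == 'CPU':
        return np
    raise ValueError("comp_type must be 'GPU' or 'CPU'")


xp = get_backend('CPU')
weights = xp.zeros((784, 1500))  # same call works with either module
```

Because cupy mirrors most of the numpy interface, layer code written against `xp` can usually run on either device without changes.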

mymodel.initialize_weights initializes the weights using what is known as Xavier initialization, described here (http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf). I have also implemented other types of initialization, but Xavier initialization seems to work best and is the only one I tend to use.
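
For reference, the uniform variant of Xavier initialization from the linked paper can be sketched in plain numpy as below. Whether sparana uses the uniform or a normal variant is not stated here, so treat the function `xavier_uniform` as an illustrative assumption rather than the library's code.

```python
import numpy as np


def xavier_uniform(fan_in, fan_out, rng=None):
    """Sample a (fan_in, fan_out) matrix from U(-limit, limit),
    with limit = sqrt(6 / (fan_in + fan_out))."""
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


# First layer of the model in this notebook: 784 inputs, 1500 units.
weights = xavier_uniform(784, 1500, rng=np.random.default_rng(0))
biases = np.full(1500, 0.1)  # constant bias, as with bias_constant = 0.1
```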

### Initialization ToDo:

I have not implemented initialization of sparse matrices yet. I have pruned trained models by removing more than 90% of parameters without losing performance (as have many others), and I intend to test whether the same can be achieved by training weight matrices that are initialized this sparse to begin with.
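
One possible shape for such an initializer, purely as a sketch of the idea: draw Xavier-style values and keep only a random fraction of them. The function `sparse_xavier`, the `density` parameter, and the choice to leave the scale unchanged are all assumptions for illustration; the right scaling for a sparse start is exactly the kind of thing that would need testing.

```python
import numpy as np


def sparse_xavier(fan_in, fan_out, density=0.1, rng=None):
    """Return (weights, mask) where only about `density` of the entries are non-zero.

    Hypothetical sketch, not an implemented sparana feature.
    """
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    dense = rng.uniform(-limit, limit, size=(fan_in, fan_out))
    mask = rng.random((fan_in, fan_out)) < density  # True where a weight is kept
    return dense * mask, mask


weights, mask = sparse_xavier(784, 1500, density=0.1, rng=np.random.default_rng(0))
print(mask.mean())  # fraction of weights kept, close to 0.1
```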

In [9]:
import numpy as np
import cupy as cp

from sparana.model import model
from sparana.layers import full_relu_layer
from sparana.layers import full_linear_layer
from sparana.optimizer import sgd_optimizer
from sparana.data_loader import loader
from sparana.lobotomizer import lobotomizer
path = 'c:/users/jim/tensorflowtrials'
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data/', one_hot = True)

mymodel = model(input_size = 784, 
                layers = [full_relu_layer(size = 1500), 
                          full_relu_layer(size = 1000),
                          full_relu_layer(size = 600),
                          full_linear_layer(size = 10)],
               comp_type = 'GPU')

mymodel.initialize_weights('Xavier', bias_constant = 0.1)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Initalizing GPU weights


# Loader, optimizer and lobotomizer

This loader takes numpy arrays and tracks how many minibatches and epochs have been used, which makes things easier when I re-run the same training loop. All data is stored in memory, so it will not suit much larger datasets, but it works fine for MNIST.
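
The bookkeeping described above can be sketched roughly as follows. The class `InMemoryLoader` is a hypothetical stand-in, not sparana's loader; in particular, reshuffling at the end of each epoch and dropping the leftover samples are assumptions made to keep the example short.

```python
import numpy as np


class InMemoryLoader:
    """Serve minibatches from in-memory arrays and count epochs and minibatches."""

    def __init__(self, train_data, train_labels):
        self.train_data = train_data
        self.train_labels = train_labels
        self.position = 0
        self.epochs = 0
        self.minibatches = 0

    def minibatch(self, batch_size):
        # Not enough samples left for a full batch: reshuffle and start a new epoch.
        if self.position + batch_size > len(self.train_data):
            order = np.random.permutation(len(self.train_data))
            self.train_data = self.train_data[order]
            self.train_labels = self.train_labels[order]
            self.position = 0
            self.epochs += 1
        start = self.position
        self.position += batch_size
        self.minibatches += 1
        return (self.train_data[start:self.position],
                self.train_labels[start:self.position])


demo = InMemoryLoader(np.arange(20).reshape(10, 2), np.arange(10))
for _ in range(3):
    images, labels = demo.minibatch(4)
print(demo.epochs, demo.minibatches)
```

Because the counters live on the loader object, re-running a training cell continues the count instead of resetting it.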

The optimizer is pretty basic. L2 regularization is set here rather than with the layers.
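
As a reminder of what L2 regularization does to a plain SGD update, here is a minimal sketch. It assumes the penalty is (l2_constant / 2) * ||w||^2, so its gradient is l2_constant * w; how sparana scales the term internally is not shown in this notebook, and the function `sgd_step` is illustrative only.

```python
import numpy as np


def sgd_step(weights, grad, learning_rate=0.0001, l2_constant=0.0001):
    """One SGD update with an L2 penalty added to the loss gradient."""
    return weights - learning_rate * (grad + l2_constant * weights)


w = np.array([1.0, -2.0])
g = np.array([0.5, 0.5])
# Larger constants than in the notebook so the effect is visible.
w_new = sgd_step(w, g, learning_rate=0.1, l2_constant=0.1)
```

The extra term pulls every weight toward zero in proportion to its size, which is part of why small-magnitude weights are natural candidates for pruning later.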

Lobotomizer is the name I use for the module that analyses the parameters and prunes them.

In [10]:
myloader = loader(mnist.train.images,
                 mnist.train.labels,
                 mnist.test.images,
                 mnist.test.labels)

opt = sgd_optimizer(mymodel, learning_rate = 0.0001, l2_constant = 0.0001)

lobo = lobotomizer(mymodel)

# Training loop

The loader returns a tuple of inputs and labels, and the optimizer takes inputs and labels. Yes, I know that using the loader instead of the tensorflow function for test data is redundant.

In [11]:
for i in range(100000):
    images, labels = myloader.minibatch(150)
    opt.train_step(images, labels)
    
print(mymodel.get_accuracy(myloader.test_data(), myloader.test_labels()))

0.9833


# Pruning

The 98.3% accuracy above is nothing special, but it is fine for a basic fully connected model trained using stochastic gradient descent. Here I prune the 80% of weights with the smallest absolute value. Pruning 80% does not give much of an advantage; it is just for demonstration. I have only built an SGD optimizer at this stage, but I have tested others: the Adam optimizer seems to encourage sparsity and allows models to be pruned more heavily.

In [12]:
lobo.get_absolute_values()
lobo.prune_smallest(0.8)

print(mymodel.get_accuracy(myloader.test_data(), myloader.test_labels()))

0.6696
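
The pruning step above can be sketched in plain numpy as magnitude pruning. This version uses one global threshold across all weight matrices; whether the lobotomizer thresholds globally or per layer is not stated here, so the function below is an assumption for illustration and is not the library's code.

```python
import numpy as np


def prune_smallest(weight_matrices, fraction):
    """Zero out the given fraction of weights with the smallest absolute value."""
    all_abs = np.concatenate([np.abs(w).ravel() for w in weight_matrices])
    threshold = np.quantile(all_abs, fraction)
    # Keep only weights strictly above the threshold.
    return [np.where(np.abs(w) > threshold, w, 0.0) for w in weight_matrices]


rng = np.random.default_rng(0)
layers = [rng.normal(size=(100, 50))]
pruned = prune_smallest(layers, 0.8)
print(np.mean(pruned[0] == 0.0))  # fraction of weights removed
```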


# Converting the model and retraining

Training on sparse matrices is slower than on full matrices, but it gets faster as more parameters are removed. With 80% removed, it is still slower than regular training. The performance drop from pruning is too large for this example to be of practical use, but it does let me show that a model can be trained on sparse parameters.

In [13]:
mymodel.convert_to_sparse()

for i in range(5000):
    images, labels = myloader.minibatch(150)
    opt.train_step(images, labels)
    
print(mymodel.get_accuracy(myloader.test_data(), myloader.test_labels()))

Model is now sparse
0.9138
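
To show what a sparse representation of a pruned layer looks like, the sketch below stores an 80% pruned weight matrix in CSR format with scipy.sparse and checks that the forward product matches the dense one. This is an assumption-laden illustration on the CPU: the storage format sparana uses after convert_to_sparse is not shown in this notebook.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# A dense layer with the smallest 80% of weights (by magnitude) set to zero.
dense_weights = rng.normal(size=(784, 1500))
threshold = np.quantile(np.abs(dense_weights), 0.8)
dense_weights[np.abs(dense_weights) <= threshold] = 0.0

# CSR stores only the non-zero values plus their column indices and row pointers.
sparse_weights = sparse.csr_matrix(dense_weights)

inputs = rng.normal(size=(150, 784))  # one minibatch of 150 examples
dense_out = inputs @ dense_weights
sparse_out = (sparse_weights.T @ inputs.T).T  # sparse-times-dense product

print(sparse_weights.nnz / dense_weights.size)  # fraction of weights stored
```

Sparse products skip the zeros but pay for index lookups on every stored value, which is consistent with sparse training only becoming competitive once most of the parameters are gone.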


In [14]:
myloader.print_stats()

Epochs:  286
Minibatches:  105000
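
These counts can be sanity-checked with a little arithmetic. The sketch assumes the training split served by the tensorflow MNIST loader holds 55,000 images; the variable names are illustrative.

```python
minibatches = 100000 + 5000   # initial training loop plus retraining loop
batch_size = 150
train_samples = 55000         # assumption: size of mnist.train

samples_seen = minibatches * batch_size
epochs = samples_seen // train_samples
print(samples_seen, epochs)   # 15750000 286
```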


# More work to be done.

From an experiment point of view, this run is a failure: 91% accuracy on MNIST for this much work is no good to anyone, and more tweaking needs to be done.

From a software point of view, the library is working as intended: retraining a pruned, sparse model over a limited number of parameters improved accuracy from 67% to 91%.