## Notes
* This notebook has implementations of `Backprop` with binary spike feature vectors obtained from the `SPIKEFLOW`. Here, we will use tf.data API. Here the shape of the inputs is `[None, n_input]` instead of `[n_input, None]`. We had to do this because nested elements in `from_tensor_slices` must have the same dimension in 0th rank [see](https://stackoverflow.com/questions/49579684/what-is-the-difference-between-dataset-from-tensors-and-dataset-from-tensor-slic). Everytime an iterator `iter = dataset.make_initializable_iterator()` gets initialized, the dataset is randomly shuffled so we need not shuffle again, [see](https://stackoverflow.com/questions/49579684/what-is-the-difference-between-dataset-from-tensors-and-dataset-from-tensor-slic). We also use `z_3 = tf.floor(z_3)`. Surrogate gradients with one step is used. (one sided)

* Here, error in the hidden layer, $\delta^{2}$ is implemented as:
$ \delta^{2} = W^{3T}\delta^{(3)}\odot\sigma^{'}(z^{(2)}) \tag{1}$
* $\sigma^{'}(z^{(2)})$ is approximated with a surrogate. (See section 6)
* It also takes care of catastrophic forgetting by using synaptic intelligence.
## References
* [Neural Nets](http://neuralnetworksanddeeplearning.com/chap3.html)
* [Randombackprop](https://github.com/xuexue/randombp/blob/master/randombp.py)
* [Randombackprop](https://github.com/sangyi92/feedback_alignment/blob/master/RFA.ipynb)
* [Backprop](http://blog.aloni.org/posts/backprop-with-tensorflow/)
* [Initializers](https://towardsdatascience.com/hyper-parameters-in-action-part-ii-weight-initializers-35aee1a28404)
* [Dropout](https://github.com/pinae/TensorFlow-MNIST-example/blob/master/fully-connected.py)
* [Softmax](https://stackoverflow.com/questions/34240703/what-is-logits-softmax-and-softmax-cross-entropy-with-logits)
* [SoftmaxLogits](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits)
* [TF memory leaks when  assigning in loop](https://github.com/tensorflow/tensorflow/issues/4151)

In [1]:
import os, time
#os.environ["CUDA_VISIBLE_DEVICES"]="-1"
import matplotlib.pyplot as plt
import matplotlib as mpl
import tensorflow as tf
from IPython.display import display, HTML
tf.compat.v2.random.set_seed(0)
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.python.client import timeline
import h5py, pickle
from keras.utils.np_utils import to_categorical 
import numpy as np
import pandas as pd
import MNIST_Loader
import seaborn as sb
import theano, random, sys, time

  from ._conv import register_converters as _register_converters
Using TensorFlow backend.


In [2]:
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
print(tf.__version__)
tf

1.14.0


<module 'tensorflow' from '/home/ruthvik/.local/lib/python2.7/site-packages/tensorflow/__init__.pyc'>

## Hide code

In [None]:
HTML('''<script>
code_show=true; 
function code_toggle() {
if (code_show){
$('div.input').hide();
} else {
$('div.input').show();
}
code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')

## Plotting chars

### Small 

In [3]:
mpl.rcParams['figure.figsize'] = 3.75,3
mpl.rcParams['axes.titlesize'] = 12
mpl.rcParams['axes.labelsize'] = 12
mpl.rcParams['lines.linewidth'] = 2
mpl.rcParams['lines.markersize'] = 10
mpl.rcParams['xtick.labelsize'] = 12
mpl.rcParams['ytick.labelsize'] = 12
mpl.rcParams['legend.fontsize'] = 12

### Large

mpl.rcParams['figure.figsize'] = 15,10
mpl.rcParams['axes.titlesize'] = 24
mpl.rcParams['axes.labelsize'] = 25
mpl.rcParams['lines.linewidth'] = 2
mpl.rcParams['lines.markersize'] = 10
mpl.rcParams['xtick.labelsize'] = 30
mpl.rcParams['ytick.labelsize'] = 22
mpl.rcParams['legend.fontsize'] = 25

## Load the data and separate set1

In [4]:
filename = '../../spiking_networks/train_pool1_spike_features_inh_False_conv1maps_30.h5'
with h5py.File(filename, 'r') as hf:
    emnist_train_images = hf['pool1_spike_features'][:].astype(np.int8)
emnist_train_images[np.where(emnist_train_images>=1)] = 1

filehandle = open('../../spiking_networks/train_y.pkl','rb')
emnist_train_labels = pickle.load(filehandle).astype(np.int).tolist()
filehandle.close()
emnist_train_labels = np.array(emnist_train_labels)
print('Total train features:{}'.format(emnist_train_images.shape[0]))

#### LOAD TEST IMAGES AND LABELS
filename = '../../spiking_networks/test_pool1_spike_features_inh_False_conv1maps_30.h5'
with h5py.File(filename, 'r') as hf:
    emnist_test_images = hf['pool1_spike_features'][:].astype(np.int8)
emnist_test_images[np.where(emnist_test_images>=1)] = 1
print('Total test features:{}'.format(emnist_test_images.shape[0]))

filehandle = open('../../spiking_networks/test_y.pkl','rb')
emnist_test_labels = pickle.load(filehandle).astype(np.int)
filehandle.close()
emnist_test_labels = np.array(emnist_test_labels)

#### LOAD TRAIN AND VALIDATION DATA AND LABELS
train_images = emnist_train_images
train_labels = emnist_train_labels
train_labels = np.array(train_labels)
test_images = emnist_test_images
test_labels = emnist_test_labels
num_classes=10

Total train features:60000
Total test features:10000


In [5]:
BATCH_SIZE = 10
#### EXTRACT REQUIRED LOCATIONS OF 0 TO 5 FOR TRAIN DATA
def extract_class_data(start=0, stop=1):
    set1_locs = np.where((train_labels>=start) & (train_labels<=stop))[0]
    train_labels_set1 = to_categorical(train_labels[set1_locs], num_classes=num_classes)
    train_images_set1 = train_images[set1_locs,:]
    n_images = len(train_images_set1)

    #### EXTRACT REQUIRED LOCATIONS OF 0 TO 5 FOR TEST DATA
    set1_locs = np.where((test_labels>=start) & (test_labels<=stop))[0]
    test_labels_set1 = to_categorical(test_labels[set1_locs], num_classes=num_classes)
    test_images_set1 = test_images[set1_locs,:]
    print('Test features:{}'.format(test_images_set1.shape))
    print('Length of test labels:{}'.format(test_labels_set1.shape[0]))
    test_data_set1 = (test_images_set1, test_labels_set1)
    


    train_images_set1 = train_images_set1[int(0.09*n_images):]
    train_labels_set1 = train_labels_set1[int(0.09*n_images):]
    print('Train features:{}'.format(train_images_set1.shape))
    print('Length of train labels:{}'.format(train_labels_set1.shape[0]))
    train_data_set1 = (train_images_set1, train_labels_set1)

    valid_labels_set1 = train_labels_set1[0:int(0.09*n_images)]
    valid_images_set1 = train_images_set1[0:int(0.09*n_images)]
    print('Valid features:{}'.format(valid_images_set1.shape))
    print('Length of valid labels:{}'.format(valid_labels_set1.shape[0]))
    valid_data_set1 = (valid_images_set1, valid_labels_set1)
    
    n_train_set1 = train_labels_set1.shape[0]
    n_test_set1 = test_labels_set1.shape[0]
    n_valid_set1 = valid_labels_set1.shape[0]

    return train_data_set1, valid_data_set1, test_data_set1

## Start a session

In [6]:
tf.reset_default_graph()
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess= tf.InteractiveSession(config=config)
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

$tanh(x) = \frac{(e^{x} – e^{-x})}{(e^{x} + e^{-x})}$

$d\frac{tanh(x)}{dx} = 1 – (tanh(x))^{2}$

$\sigma(x) = \frac{1.0}{1 + e^{-x}}$

$d\frac{\sigma(x)}{dx} = \sigma(x)*(1 - \sigma(x))$

## Setup the network graph

In [7]:
n_input = 3630
n_middle = 1024
n_out = 10
batch_size = tf.placeholder(tf.int64, name='batch_size') 
a_1 = tf.placeholder(tf.float32, [None, n_input], name = 'Input_batch')
y = tf.placeholder(tf.float32, [None, n_out], name = 'output_batch')
dataset = tf.data.Dataset.from_tensor_slices((a_1, y))
#dataset = dataset.shuffle(buffer_size=len(all_train_labels), reshuffle_each_iteration=True)
dataset = dataset.batch(batch_size)
iter = dataset.make_initializable_iterator()
features, labels = iter.get_next()

drop_out = tf.placeholder(tf.float32)
tau = tf.placeholder(tf.float32)
set1_mask = tf.placeholder(tf.float32, [10], name='mask')
eta = tf.placeholder(tf.float32)
n_tot = tf.placeholder(tf.float32)
lmbda = tf.placeholder(tf.float32, name='lambda')
with tf.name_scope('hid_lyr_w_b'):  ###havier or glorot initialization
    low = -4*tf.math.sqrt(6.0/(n_input + n_middle)) # use 4 for sigmoid, 1 for tanh activation 
    high = 4*tf.math.sqrt(6.0/(n_input + n_middle))
    
    low = -tf.math.sqrt(2.0/(n_input)) # use 4 for sigmoid, 1 for tanh activation 
    high = tf.math.sqrt(2.0/(n_input))
    
    w_2 = tf.Variable(tf.random_uniform(shape=[n_input,n_middle],minval=low,maxval=high), name = 'W_2')
    #w_2 = tf.Variable(tf.truncated_normal(shape=[n_input,n_middle], stddev=0.01),name = 'W_2')
    tf.summary.histogram('w_2', w_2)
    b_2 = tf.Variable(tf.zeros([1,n_middle]), name = 'b_2')
    tf.summary.histogram('b_2', b_2)
    
    w2_grad_accum = tf.Variable(np.zeros(shape=[n_input,n_middle], dtype=np.float32), name='w2_grad_accum')
    b2_grad_accum = tf.Variable(np.zeros(shape=[1,n_middle], dtype=np.float32), name='b2_grad_accum')
    
    big_omeg_w2 = tf.Variable(np.zeros(shape=[n_input,n_middle], dtype=np.float32), name='omeg_w2')
    tf.summary.histogram('big_omeg_w2', big_omeg_w2)
    big_omeg_b2 = tf.Variable(np.zeros(shape=[1,n_middle], dtype=np.float32), name='omeg_b2')
    tf.summary.histogram('big_omeg_b2', big_omeg_b2)
    
    star_w2 = tf.Variable(np.zeros(shape=[n_input,n_middle], dtype=np.float32), name='star_w2')
    star_b2 = tf.Variable(np.zeros(shape=[1,n_middle], dtype=np.float32), name='star_b2')
with tf.name_scope('op_lyr_w_b'):
    
    low = -tf.math.sqrt(2.0/(n_middle))
    high = tf.math.sqrt(2.0/(n_middle))
    w_3 = tf.Variable(tf.random_uniform(shape=[n_middle,10],minval=low,maxval=high), name = 'W_3')
    #w_3 = tf.Variable(tf.truncated_normal(shape=[n_middle,n_out], stddev=0.01),name = 'W_3')
    tf.summary.histogram('w_3', w_3)
    b_3 = tf.Variable(tf.zeros([1,n_out]), name = 'b_3')
    tf.summary.histogram('b_3', b_3)
    
    w3_grad_accum = tf.Variable(np.zeros(shape=[n_middle,n_out], dtype=np.float32), name='w3_grad_accum')
    b3_grad_accum = tf.Variable(np.zeros(shape=[1,n_out], dtype=np.float32), name='b3_grad_accum')
    
    big_omeg_w3 = tf.Variable(np.zeros(shape=[n_middle,n_out], dtype=np.float32), name='omeg_w3')
    tf.summary.histogram('big_omeg_w3', big_omeg_w3)
    big_omeg_b3 = tf.Variable(np.zeros(shape=[1,n_out], dtype=np.float32), name='omeg_b3')
    tf.summary.histogram('big_omeg_b3', big_omeg_b3)
    
    star_w3 = tf.Variable(np.zeros(shape=[n_middle,n_out], dtype=np.float32), name='star_w3')
    star_b3 = tf.Variable(np.zeros(shape=[1,n_out], dtype=np.float32), name='star_b3')

def sigma(x):
    return tf.math.divide(tf.constant(1.0),
                  tf.add(tf.constant(1.0), tf.exp(tf.negative(x))))
def tanh(x):
    return tf.math.divide(tf.subtract(tf.exp(x), tf.exp(tf.negative(x))), 
                          tf.add(tf.exp(x), tf.exp(tf.negative(x))) )

def sigmaprime(x):
    return tf.multiply(sigma(x), tf.subtract(tf.constant(1.0), sigma(x)))

def tanhprime(x):
    return tf.subtract(tf.constant(1.0),tf.square(tanh(x)))

def spkNeuron(x):
    return tf.where(tf.greater_equal(x,0.0), tf.ones_like(x), 
                        tf.zeros_like(x))

def ReLU(x):
    return tf.maximum(0.0, x)

def ReLUprime(x):
    return tf.where(tf.greater_equal(x,0.0), tf.ones_like(x), 
                        tf.zeros_like(x))

def spkPrime1(x):
    l1_bound_higher = tf.greater_equal(x,-0/4)
    r1_bound_lesser = tf.less_equal(x,tau/4) 
    grad_one = tf.where(tf.logical_and(l1_bound_higher,r1_bound_lesser), tf.ones_like(x), tf.zeros_like(x))
    return grad_one

def firstLyrSpks(x):
    return tf.where(tf.greater_equal(x,1.0), tf.ones_like(x), 
                        tf.zeros_like(x))

    
with tf.name_scope('hid_lyr_acti'):
    z_2 = tf.add(tf.matmul(features,w_2,name = 'w_2xa_1'), b_2, name = 'z_2')
    locs_to_drop = tf.random.categorical(tf.math.log([[1.0-drop_out, drop_out]]), tf.size(z_2))
    locs_to_drop = tf.reshape(locs_to_drop, tf.shape(z_2))
    z_2 = tf.where(locs_to_drop>0,-tf.ones_like(z_2),z_2, 'drop_out_app')
    tf.summary.histogram('z_2', z_2)
    #@a_2 = ReLU(z_2)
    a_2 = spkNeuron(z_2)
    tf.summary.histogram('a_2', a_2)
with tf.name_scope('op_lyr_acti'):
    z_3 = tf.add(tf.matmul(a_2,w_3, name = 'w_3xa_2'),b_3, name = 'z_3')
    z_3 = tf.floor(z_3)
    #z_3 = tf.subtract(tf.reduce_max(z_3),z_3, name = 'inhibition')
    tf.summary.histogram('z_3', z_3)
    #@a_3  = sigma(z_3) ##UNCOMMENT THIS LINE AND COMMENT ABOVE LINE IF YOU WANT spike SQUISHING
    a_3 = tf.cast(tf.nn.softmax(z_3,axis=1), tf.float32)
    a_3 = tf.multiply(a_3, set1_mask, name='masking')
    tf.summary.histogram('a_3', a_3)
    ##COMMENT THE ABOVE LINE AND UNCOMMENT BELOW LINE IF YOU WANT SOFTMAX
    #a_3 = tf.nn.softmax(z_3,axis=1) ##AXIS IS VERY IMPORTANT!!! axis=1 INDICATES THE CLASSES AS y IS [None,10]

#cost = tf.reduce_mean(-tf.reduce_sum((y*tf.log(a_3) +tf.log(1-a_3)*(1-y)) ,axis=0), name = 'cost_calc') WORKS, USE BELOW
with tf.name_scope('cost_calc'):
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels,logits=z_3,axis=1),
                          name = 'cost_calc')#WORKS
        ##COMMENT BELOW LINES IF YOU WANT quadratic
    #@dc_da = tf.multiply(-tf.subtract(labels,a_3, name = 'y_minus_a_3'), mask)
    #@cost = tf.reduce_mean(tf.reduce_sum((1/2.0)*tf.square(dc_da),axis=1), name = 'cost_calc')
    tf.summary.scalar('cost', cost)

with tf.name_scope('op_lyr_grad'):
    #@d_z_3 = tf.multiply(-tf.subtract(labels,a_3, name = 'delta3'), mask, name='masking')
    d_z_3 = -tf.subtract(labels,a_3, name = 'delta3')
    #d_z_3 = tf.multiply(dc_da,a_3, name = 'delta3')
    d_b_3 = tf.expand_dims(tf.reduce_mean(d_z_3, axis=[0]), axis=0)
    tf.summary.histogram('d_b_3', d_b_3)
    d_w_3 = tf.multiply(1/tf.cast(batch_size, tf.float32),
                        tf.matmul(tf.transpose(a_2),d_z_3), 
                        name='delta_w3')
    tf.summary.histogram('d_w_3', d_w_3)
    
with tf.name_scope('hid_lyr_grad'):
    #@d_z_2 = tf.multiply(tf.matmul(d_z_3,tf.transpose(w_3), name = 'w_3Txdelta3'), ReLUprime(z_2),
    #@                    name = 'delta2')
    d_z_2 = tf.multiply(tf.matmul(d_z_3,tf.transpose(w_3), name = 'w_3Txdelta3'), spkPrime1(z_2),
                        name = 'delta2')
    #d_z_2 = tf.matmul(d_z_3,tf.transpose(w_3), name = 'delta2')
    d_b_2 = tf.expand_dims(tf.reduce_mean(d_z_2, axis=[0]), axis=0)
    tf.summary.histogram('d_b_2', d_b_2)
    d_w_2 = tf.multiply(1/tf.cast(batch_size, tf.float32),
                        tf.matmul(tf.transpose(features),d_z_2), 
                        name='delta_w2')
    tf.summary.histogram('d_w_2', d_w_2)
    
omega_step=[tf.assign(w2_grad_accum,
                      tf.add(w2_grad_accum,tf.multiply(eta*eta*lmbda/n_tot, tf.square(d_w_2))),
                     name='update_omeg_w2'),
            tf.assign(b2_grad_accum,
                      tf.add(b2_grad_accum,tf.multiply(eta*eta*lmbda/n_tot,tf.square(d_b_2))),
                     name='update_omeg_b2'),
            
            tf.assign(w3_grad_accum,
                      tf.add(w3_grad_accum,tf.multiply(eta*eta*lmbda/n_tot,tf.square(d_w_3))),
                     name='update_omeg_w3'),
            tf.assign(b3_grad_accum, 
                      tf.add(b3_grad_accum,tf.multiply(eta*eta*lmbda/n_tot,tf.square(d_b_3))),
                     name='update_omeg_b3')
]

step = [tf.assign(w_2,
                  tf.subtract(w_2, (eta*d_w_2+big_omeg_w2*(w_2-star_w2))),name='update_w_2'),
        tf.assign(b_2,
                  tf.subtract(b_2,(eta*d_b_2+big_omeg_b2*(b_2-star_b2))),name='update_b_2'),
        
        tf.assign(w_3,
                  tf.subtract(w_3, (eta*d_w_3+big_omeg_w3*(w_3-star_w3))),name='update_w_3'),
        tf.assign(b_3,
                  tf.subtract(b_3,(eta*d_b_3+big_omeg_b3*(b_3-star_b3))),name='update_b_3')    
]
with tf.name_scope('acc_calc'):
    predictions = tf.argmax(a_3, 1)
    acct_mat = tf.equal(tf.argmax(a_3, 1), tf.argmax(labels, 1))
    acct_res = tf.reduce_mean(tf.cast(acct_mat, tf.float32))
    tf.summary.scalar('accuracy', acct_res)

init_op = tf.global_variables_initializer()

## Init the writer with SI

In [8]:
path = '/home/ruthvik/Desktop/Summer 2017/tf_graph_outputs/mnist/continual_learning/original_mnist_5sets'
merged = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(path + '/mh_spike_3lyrs_he1', sess.graph)

In [9]:
#INITIALIZE THE NETWORK
sess.run(init_op,options=run_options, run_metadata=run_metadata)
zeta = 1e-3
new_big_omeg_w2 = np.zeros(shape=[n_input,n_middle], dtype=np.float32)
new_big_omeg_b2 = np.zeros(shape=[1,n_middle], dtype=np.float32)
new_big_omeg_w3 = np.zeros(shape=[n_middle,n_out], dtype=np.float32)
new_big_omeg_b3 = np.zeros(shape=[1,n_out], dtype=np.float32)

reset_w2_grad_accum = np.zeros(shape=[n_input,n_middle], dtype=np.float32)
reset_b2_grad_accum = np.zeros(shape=[1,n_middle], dtype=np.float32)
reset_w3_grad_accum = np.zeros(shape=[n_middle,n_out], dtype=np.float32)
reset_b3_grad_accum = np.zeros(shape=[1,n_out], dtype=np.float32)
    
start_w2 = None
start_b2 = None
start_w3 = None
start_b3 = None

end_w2 = None
end_b2 = None
end_w3 = None
end_b3 = None

old_test_data = []
historical_cross_test_acc = {}
historical_train_accuracies = {}
historical_train_costs = {}
historical_val_accuracies = {}
historical_val_costs = {}
sets = [(0,1), (2,3), (4,5), (6,7), (8,9)]
#sets = [(0,4),(5,9)]
test_labels_set = []
logging_count = 0
n_test_samples = []
for a_set in range(len(sets)):
    current_set = sets[a_set]
    current_set_name = 'set'+str(a_set)
    mask_val = [0]*num_classes
    for i in range(current_set[0], current_set[1]+1):
        mask_val[i]=1
    set_mask_val = np.array(mask_val, dtype=np.float32)
    train_data_set, valid_data_set, test_data_set = extract_class_data(start=current_set[0],
                                                                  stop=current_set[1])
    train_images_set, train_labels_set = train_data_set[0], train_data_set[1]
    valid_images_set, valid_labels_set = valid_data_set[0], valid_data_set[1]
    test_images_set, test_labels_set = test_data_set[0], test_data_set[1]
    n_test_samples.append(len(test_labels_set))
    train_total = len(train_images_set)
    n_batches = len(train_images_set)/BATCH_SIZE
    print('Number of batches:{}'.format(n_batches))


    set_omegas = [tf.assign(big_omeg_w2, new_big_omeg_w2), tf.assign(big_omeg_b2, new_big_omeg_b2), 
                  tf.assign(big_omeg_w3, new_big_omeg_w3), tf.assign(big_omeg_b3, new_big_omeg_b3)]
    sess.run(set_omegas)
    
    reset_grad_accums = [tf.assign(w2_grad_accum, reset_w2_grad_accum),
                         tf.assign(b2_grad_accum, reset_b2_grad_accum),
                         tf.assign(w3_grad_accum, reset_w3_grad_accum),
                         tf.assign(b3_grad_accum, reset_b3_grad_accum)]
    sess.run(reset_grad_accums)
                                                                                  
    epochs = 60
    repeats = 1
    
    for repeat in range(repeats):
        tf.set_random_seed(repeat)
        print('Repeat:{}'.format(repeat))
        train_accuracies = []
        train_costs = []
        val_accuracies = []
        val_costs = []
        best_val = 0
        first_params_set = None
        last_params_set = None
        T1 = time.time()
        for i in range(epochs):
            if(i==0):
                start_w2, start_b2, start_w3, start_b3 = w_2.eval(), b_2.eval(), w_3.eval(), b_3.eval()
            sess.run(iter.initializer, feed_dict={a_1: train_images_set, y: train_labels_set,
                                                  batch_size: len(train_images_set)})
            print('Epoch:{}'.format((i)))
            t1 = time.time()

            ### CALCULATE TRAIN COSTS AND TRAIN ACCURACIES
            train_cost, train_accuracy = sess.run([cost, acct_res] ,feed_dict = {drop_out : 0.0, 
                                                                                 set1_mask:set_mask_val})
            train_costs.append(train_cost)
            train_accuracies.append(train_accuracy)
            #train_writer.add_summary(summary,logging_count)

            print('training cost:{} and training accuracy:{}'.format(train_costs[i], train_accuracies[i]))

            ### CALCULATE VALID COSTS AND VALID ACCURACIES
            sess.run(iter.initializer, feed_dict={a_1: valid_images_set, y: valid_labels_set,
                                                  batch_size: len(valid_images_set)})
            _, _, val_acc, val_cost, _ = sess.run([predictions,acct_mat,acct_res, cost, a_3],
                                                  feed_dict = {drop_out : 0.0,set1_mask:set_mask_val})
            val_costs.append(val_cost)
            val_accuracies.append(val_acc)

            if(val_acc>best_val):
                best_val = val_acc
                best_params_set1 = [(w_2.eval(),b_2.eval()),(w_3.eval(),b_3.eval())]
            print('validation cost:{} and validation accuracy:{}'.format(val_cost, val_acc))   
            sess.run(iter.initializer, feed_dict={a_1: train_images_set, y: train_labels_set,
                                                  batch_size: BATCH_SIZE})
            
            for d in range(len(old_test_data)):
                previous_set_name = 'set'+str(d)           
                prev_set = sets[d]
                prev_mask_val = [0]*num_classes
                for clas in range(prev_set[0], prev_set[1]+1):
                    prev_mask_val[clas]=1
                prev_set_mask_val = np.array(prev_mask_val, dtype=np.float32)
                sess.run(iter.initializer, feed_dict={a_1: old_test_data[d][0], y: old_test_data[d][1],
                                                  batch_size: len(old_test_data[d][0])})
                _, _, hist_test_acc, _, _ = sess.run([predictions,acct_mat,acct_res, cost, a_3],
                                                      feed_dict = {drop_out : 0.0,set1_mask:prev_set_mask_val})
                
                print('Testing accuracy on :{} while training :{} is :{}'.format(previous_set_name,
                                                                          current_set_name,
                                                                          hist_test_acc))
                if(current_set_name+'-'+previous_set_name in historical_cross_test_acc.keys()):
                    historical_cross_test_acc[current_set_name+'-'+previous_set_name].append(hist_test_acc)
                else:
                    historical_cross_test_acc[current_set_name+'-'+previous_set_name] = [hist_test_acc]
            sess.run(iter.initializer, feed_dict={a_1: train_images_set, y: train_labels_set,
                                              batch_size: BATCH_SIZE})
            print('Training on :{}'.format(current_set))
            for j in range(n_batches):
                
                if(not (np.isnan(w_2.eval().any() and np.isnan(w_3.eval()).any()))):
                    #if(a_set==1):
                    #    print(j, w_2.eval().sum(), w_3.eval().sum())
                    if(((j)% 1000 ==0)):
                        logging_count+=1
                        summary,_,_ = sess.run([merged,step, omega_step], 
                                             feed_dict = {drop_out:0.5,batch_size: BATCH_SIZE, tau:0.5,
                                                          set1_mask:set_mask_val, eta:0.001,
                                                          lmbda:1.0e4,n_tot:train_total})
                        #train_writer.add_summary(summary, (i+1)*j)
                        train_writer.add_summary(summary, logging_count)
                    else:
                        sess.run([step, omega_step], feed_dict = {drop_out:0.5,batch_size: BATCH_SIZE, tau:0.5,
                                                                 set1_mask:set_mask_val, eta:0.001,
                                                                  lmbda:1.0e4,n_tot:train_total})
                else:
                    print('Nan encountered in epoch:{} and batch:{}'.format(i, j))
            print('Epoch time:{}'.format(time.time()-t1))


        sess.run(iter.initializer, feed_dict={a_1: test_images_set, y: test_labels_set,
                                                  batch_size: len(test_images_set)})
        _,final_test_acc,_ = sess.run([predictions, acct_res, a_3], 
                                                              feed_dict = {drop_out:0.0, 
                                                                           set1_mask:set_mask_val})
        print('Final test accuracy is:{}'.format(final_test_acc))
        end_w2, end_b2, end_w3, end_b3 = w_2.eval(), b_2.eval(), w_3.eval(), b_3.eval()
        update_star_wbs = [tf.assign(star_w2,end_w2),tf.assign(star_b2,end_b2),tf.assign(star_w3,end_w3),
                          tf.assign(star_b3,end_b3)]
        sess.run(update_star_wbs)
        #all_final_test_accs_set1.append(final_test_acc)


        best_step = [tf.assign(w_2,best_params_set1[0][0]), tf.assign(b_2,best_params_set1[0][1]),
                     tf.assign(w_3,best_params_set1[1][0]),tf.assign(b_3,best_params_set1[1][1])]
        sess.run(best_step)
        sess.run(iter.initializer, feed_dict={a_1: test_images_set, y: test_labels_set,
                                                  batch_size: len(test_images_set)})
        _,test_acc_corresp_best_val,_ = sess.run([predictions, acct_res, a_3],
                                                 feed_dict = {drop_out:0.0,set1_mask:set_mask_val})

        print('Test accuracy corresp to best val acc:{}'.format(test_acc_corresp_best_val))
        print('Time taken:{}'.format(time.time()-T1))
        if(i==epochs-1):
            if(test_acc_corresp_best_val>final_test_acc):
                end_w2, end_b2, end_w3, end_b3 = w_2.eval(), b_2.eval(), w_3.eval(), b_3.eval()
                #all_final_test_accs_set1[-1] = test_acc_corresp_best_val
                update_star_wbs = [tf.assign(star_w2,end_w2),tf.assign(star_b2,end_b2),tf.assign(star_w3,end_w3),
                          tf.assign(star_b3,end_b3)]
                sess.run(update_star_wbs)
            
            best_step = [tf.assign(w_2,end_w2), tf.assign(b_2,end_b2),
                     tf.assign(w_3,end_w3),tf.assign(b_3,end_b3)]
            sess.run(best_step)
            
            first_params_set = [(start_w2, start_b2), (start_w3, start_b3)]
            last_params_set = [(end_w2, end_b2), (end_w3, end_b3)]
            
            small_omegas = [(w2_grad_accum.eval(), b2_grad_accum.eval()), (w3_grad_accum.eval(),
                           b3_grad_accum.eval())]
            
            delta_ws = map(lambda x,y: np.square(x-y)+zeta,[item[0] for item in last_params_set],
                       [item[0] for item in first_params_set])
            
            delta_bs = map(lambda x,y: np.square(x-y)+zeta,[item[1] for item in last_params_set],
                       [item[1] for item in first_params_set])
            delta_wbs = zip(delta_ws, delta_bs)
            
            big_omegas_ws = map(lambda x,y: (x/y),[item[0] for item in small_omegas],
                       [item[0] for item in delta_wbs])
            
            big_omegas_bs = map(lambda x,y: (x/y),[item[1] for item in small_omegas],
                       [item[1] for item in delta_wbs])
            
            big_omegas = zip(big_omegas_ws, big_omegas_bs)
            if(a_set != len(sets)-1):     
                new_big_omeg_w2 += big_omegas[0][0]
                new_big_omeg_b2 += big_omegas[0][1]
                new_big_omeg_w3 += big_omegas[1][0]
                new_big_omeg_b3 += big_omegas[1][1]
            
            for d in range(len(old_test_data)):
                previous_set_name = 'set'+str(d)
                prev_set = sets[d]
                prev_mask_val = [0]*num_classes
                for clas in range(prev_set[0], prev_set[1]+1):
                    prev_mask_val[clas]=1
                prev_set_mask_val = np.array(prev_mask_val, dtype=np.float32)
                sess.run(iter.initializer, feed_dict={a_1: old_test_data[d][0], y: old_test_data[d][1],
                                                  batch_size: len(old_test_data[d][0])})
                _, _, hist_test_acc, _, _ = sess.run([predictions,acct_mat,acct_res, cost, a_3],
                                                      feed_dict = {drop_out : 0.0,set1_mask:prev_set_mask_val})
                
                historical_cross_test_acc[current_set_name+'-'+previous_set_name].append(hist_test_acc)
                print('Testing accuracy on :{} after training :{} is :{}'.format(previous_set_name,
                                                                          current_set_name,
                                                                          hist_test_acc))
                historical_cross_test_acc[current_set_name+'-'+current_set_name]=[test_acc_corresp_best_val]
            old_test_data.append(test_data_set)
            print('omegW2-MAXIMUM:{},MEAN:{},STD:{}'.format(new_big_omeg_w2.max(),
                                                            new_big_omeg_w2.mean(),
                                                            new_big_omeg_w2.std()))
            print('omegb2-MAXIMUM:{},MEAN:{},STD:{}'.format(new_big_omeg_b2.max(),
                                                            new_big_omeg_b2.mean(),
                                                            new_big_omeg_b2.std()))
            print('omegW3-MAXIMUM:{},MEAN:{},STD:{}'.format(new_big_omeg_w3.max(),
                                                            new_big_omeg_w3.mean(),
                                                            new_big_omeg_w3.std()))
            print('omegb3-MAXIMUM:{},MEAN:{},STD:{}'.format(new_big_omeg_b3.max(),
                                                            new_big_omeg_b3.mean(),
                                                            new_big_omeg_b3.std()))
            #sys.exit()
    historical_train_accuracies[current_set_name]=train_accuracies
    historical_train_costs[current_set_name]=train_costs
    historical_val_accuracies[current_set_name]=val_accuracies
    historical_val_costs[current_set_name]=val_costs
            
            
train_writer.close()
#valid_writer.close()

Test features:(2115, 3630)
Length of test labels:2115
Train features:(11526, 3630)
Length of train labels:11526
Valid features:(1139, 3630)
Length of valid labels:1139
Number of batches:1152
Repeat:0
Epoch:0
training cost:2.18432164192 and training accuracy:0.473104298115
validation cost:2.18295145035 and validation accuracy:0.467954337597
Training on :(0, 1)
Epoch time:45.1382958889
Epoch:1
training cost:0.017300978303 and training accuracy:0.99514144659
validation cost:0.0186073817313 and validation accuracy:0.994732201099
Training on :(0, 1)


KeyboardInterrupt: 

### Examining final accuracy

In [None]:
set_accs = []
for i in range(0,num_classes/2):
    set_acc =  historical_cross_test_acc['set4-set'+str(i)][-1]*100
    print('Accuracy on set {}:{} after training set {}:{} is:{}'.format(i, sets[i],\
                                    4, sets[2], set_acc))
    set_accs.append(set_acc)
n_test_samples = np.array(n_test_samples)
final_acc = (n_test_samples*set_accs).sum()/n_test_samples.sum()
print('Final accuracy on all sets:{}'.format(final_acc))

### Some histograms

#### W_2

In [None]:
plt.hist(w_2.eval().flatten(), 100)
plt.grid()
plt.show()

#### Omega_W_2

In [None]:
plt.hist(new_big_omeg_w2.flatten(),bins=100,log=True)
#plt.yscale('log')
plt.grid()
plt.show()

#### b_2

In [None]:
plt.hist(b_2.eval().flatten(), 100)
plt.grid()
plt.show()

#### Omega_b_2

In [None]:
plt.hist(new_big_omeg_b2.flatten(),100,log=True)
plt.grid()
plt.show()

#### W_3

In [None]:
plt.hist(w_3.eval().flatten(), 100)
plt.grid()
plt.show()

#### Omega_W_3

In [None]:
plt.hist(new_big_omeg_w3.flatten(),100,log=True)
plt.grid()
plt.show()

#### b_3

In [None]:
plt.hist(b_3.eval().flatten(), 10)
plt.grid()
plt.show()

#### Omega_b_3

In [None]:
plt.hist(new_big_omeg_b3.flatten(),10)
plt.grid()
plt.show()

## Init the writer without SI

In [None]:
path = '/home/ruthvik/Desktop/Summer 2017/tf_graph_outputs/mnist/continual_learning/original_mnist_5sets'
merged = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(path + '/mh_spike_without_SI_3lyrs_trunc1', sess.graph)

In [None]:
#INITIALIZE THE NETWORK
sess.run(init_op,options=run_options, run_metadata=run_metadata)
zeta = 1e-3
new_big_omeg_w2 = np.zeros(shape=[n_input,n_middle], dtype=np.float32)
new_big_omeg_b2 = np.zeros(shape=[1,n_middle], dtype=np.float32)
new_big_omeg_w3 = np.zeros(shape=[n_middle,n_out], dtype=np.float32)
new_big_omeg_b3 = np.zeros(shape=[1,n_out], dtype=np.float32)

reset_w2_grad_accum = np.zeros(shape=[n_input,n_middle], dtype=np.float32)
reset_b2_grad_accum = np.zeros(shape=[1,n_middle], dtype=np.float32)
reset_w3_grad_accum = np.zeros(shape=[n_middle,n_out], dtype=np.float32)
reset_b3_grad_accum = np.zeros(shape=[1,n_out], dtype=np.float32)
    
start_w2 = None
start_b2 = None
start_w3 = None
start_b3 = None

end_w2 = None
end_b2 = None
end_w3 = None
end_b3 = None

old_test_data = []
historical_cross_test_acc = {}
historical_train_accuracies = {}
historical_train_costs = {}
historical_val_accuracies = {}
historical_val_costs = {}
sets = [(0,1), (2,3), (4,5), (6,7), (8,9)]
#sets = [(0,4),(5,9)]
test_labels_set = []
logging_count = 0
n_test_samples = []
for a_set in range(len(sets)):
    current_set = sets[a_set]
    current_set_name = 'set'+str(a_set)
    mask_val = [0]*num_classes
    for i in range(current_set[0], current_set[1]+1):
        mask_val[i]=1
    set_mask_val = np.array(mask_val, dtype=np.float32)
    train_data_set, valid_data_set, test_data_set = extract_class_data(start=current_set[0],
                                                                  stop=current_set[1])
    train_images_set, train_labels_set = train_data_set[0], train_data_set[1]
    valid_images_set, valid_labels_set = valid_data_set[0], valid_data_set[1]
    test_images_set, test_labels_set = test_data_set[0], test_data_set[1]
    n_test_samples.append(len(test_labels_set))
    train_total = len(train_images_set)
    n_batches = len(train_images_set)/BATCH_SIZE
    print('Number of batches:{}'.format(n_batches))


    set_omegas = [tf.assign(big_omeg_w2, new_big_omeg_w2), tf.assign(big_omeg_b2, new_big_omeg_b2), 
                  tf.assign(big_omeg_w3, new_big_omeg_w3), tf.assign(big_omeg_b3, new_big_omeg_b3)]
    sess.run(set_omegas)
    
    reset_grad_accums = [tf.assign(w2_grad_accum, reset_w2_grad_accum),
                         tf.assign(b2_grad_accum, reset_b2_grad_accum),
                         tf.assign(w3_grad_accum, reset_w3_grad_accum),
                         tf.assign(b3_grad_accum, reset_b3_grad_accum)]
    sess.run(reset_grad_accums)
                                                                                  
    epochs = 60
    repeats = 1
    
    for repeat in range(repeats):
        tf.set_random_seed(repeat)
        print('Repeat:{}'.format(repeat))
        train_accuracies = []
        train_costs = []
        val_accuracies = []
        val_costs = []
        best_val = 0
        first_params_set = None
        last_params_set = None
        T1 = time.time()
        for i in range(epochs):
            if(i==0):
                start_w2, start_b2, start_w3, start_b3 = w_2.eval(), b_2.eval(), w_3.eval(), b_3.eval()
            sess.run(iter.initializer, feed_dict={a_1: train_images_set, y: train_labels_set,
                                                  batch_size: len(train_images_set)})
            print('Epoch:{}'.format((i)))
            t1 = time.time()

            ### CALCULATE TRAIN COSTS AND TRAIN ACCURACIES
            train_cost, train_accuracy = sess.run([cost, acct_res] ,feed_dict = {drop_out : 0.0, 
                                                                                 set1_mask:set_mask_val})
            train_costs.append(train_cost)
            train_accuracies.append(train_accuracy)
            #train_writer.add_summary(summary,logging_count)

            print('training cost:{} and training accuracy:{}'.format(train_costs[i], train_accuracies[i]))

            ### CALCULATE VALID COSTS AND VALID ACCURACIES
            sess.run(iter.initializer, feed_dict={a_1: valid_images_set, y: valid_labels_set,
                                                  batch_size: len(valid_images_set)})
            _, _, val_acc, val_cost, _ = sess.run([predictions,acct_mat,acct_res, cost, a_3],
                                                  feed_dict = {drop_out : 0.0,set1_mask:set_mask_val})
            val_costs.append(val_cost)
            val_accuracies.append(val_acc)

            if(val_acc>best_val):
                best_val = val_acc
                best_params_set1 = [(w_2.eval(),b_2.eval()),(w_3.eval(),b_3.eval())]
            print('validation cost:{} and validation accuracy:{}'.format(val_cost, val_acc))   
            sess.run(iter.initializer, feed_dict={a_1: train_images_set, y: train_labels_set,
                                                  batch_size: BATCH_SIZE})
            
            for d in range(len(old_test_data)):
                previous_set_name = 'set'+str(d)           
                prev_set = sets[d]
                prev_mask_val = [0]*num_classes
                for clas in range(prev_set[0], prev_set[1]+1):
                    prev_mask_val[clas]=1
                prev_set_mask_val = np.array(prev_mask_val, dtype=np.float32)
                sess.run(iter.initializer, feed_dict={a_1: old_test_data[d][0], y: old_test_data[d][1],
                                                  batch_size: len(old_test_data[d][0])})
                _, _, hist_test_acc, _, _ = sess.run([predictions,acct_mat,acct_res, cost, a_3],
                                                      feed_dict = {drop_out : 0.0,set1_mask:prev_set_mask_val})
                
                print('Testing accuracy on :{} while training :{} is :{}'.format(previous_set_name,
                                                                          current_set_name,
                                                                          hist_test_acc))
                if(current_set_name+'-'+previous_set_name in historical_cross_test_acc.keys()):
                    historical_cross_test_acc[current_set_name+'-'+previous_set_name].append(hist_test_acc)
                else:
                    historical_cross_test_acc[current_set_name+'-'+previous_set_name] = [hist_test_acc]
            sess.run(iter.initializer, feed_dict={a_1: train_images_set, y: train_labels_set,
                                              batch_size: BATCH_SIZE})
            print('Training on :{}'.format(current_set))
            for j in range(n_batches):
                
                if(not (np.isnan(w_2.eval().any() and np.isnan(w_3.eval()).any()))):
                    #if(a_set==1):
                    #    print(j, w_2.eval().sum(), w_3.eval().sum())
                    if(((j)% 1000 ==0)):
                        logging_count+=1
                        summary,_,_ = sess.run([merged,step, omega_step], 
                                             feed_dict = {drop_out:0.5,batch_size: BATCH_SIZE, tau:0.5,
                                                          set1_mask:set_mask_val, eta:0.001,
                                                          lmbda:0.0e4,n_tot:train_total})
                        #train_writer.add_summary(summary, (i+1)*j)
                        train_writer.add_summary(summary, logging_count)
                    else:
                        sess.run([step, omega_step], feed_dict = {drop_out:0.5,batch_size: BATCH_SIZE, tau:0.5,
                                                                 set1_mask:set_mask_val, eta:0.001,
                                                                  lmbda:0.0e4,n_tot:train_total})
                else:
                    print('Nan encountered in epoch:{} and batch:{}'.format(i, j))
            print('Epoch time:{}'.format(time.time()-t1))


        sess.run(iter.initializer, feed_dict={a_1: test_images_set, y: test_labels_set,
                                                  batch_size: len(test_images_set)})
        _,final_test_acc,_ = sess.run([predictions, acct_res, a_3], 
                                                              feed_dict = {drop_out:0.0, 
                                                                           set1_mask:set_mask_val})
        print('Final test accuracy is:{}'.format(final_test_acc))
        end_w2, end_b2, end_w3, end_b3 = w_2.eval(), b_2.eval(), w_3.eval(), b_3.eval()
        update_star_wbs = [tf.assign(star_w2,end_w2),tf.assign(star_b2,end_b2),tf.assign(star_w3,end_w3),
                          tf.assign(star_b3,end_b3)]
        sess.run(update_star_wbs)
        #all_final_test_accs_set1.append(final_test_acc)


        best_step = [tf.assign(w_2,best_params_set1[0][0]), tf.assign(b_2,best_params_set1[0][1]),
                     tf.assign(w_3,best_params_set1[1][0]),tf.assign(b_3,best_params_set1[1][1])]
        sess.run(best_step)
        sess.run(iter.initializer, feed_dict={a_1: test_images_set, y: test_labels_set,
                                                  batch_size: len(test_images_set)})
        _,test_acc_corresp_best_val,_ = sess.run([predictions, acct_res, a_3],
                                                 feed_dict = {drop_out:0.0,set1_mask:set_mask_val})

        print('Test accuracy corresp to best val acc:{}'.format(test_acc_corresp_best_val))
        print('Time taken:{}'.format(time.time()-T1))
        if(i==epochs-1):
            if(test_acc_corresp_best_val>final_test_acc):
                end_w2, end_b2, end_w3, end_b3 = w_2.eval(), b_2.eval(), w_3.eval(), b_3.eval()
                #all_final_test_accs_set1[-1] = test_acc_corresp_best_val
                update_star_wbs = [tf.assign(star_w2,end_w2),tf.assign(star_b2,end_b2),tf.assign(star_w3,end_w3),
                          tf.assign(star_b3,end_b3)]
                sess.run(update_star_wbs)
            
            best_step = [tf.assign(w_2,end_w2), tf.assign(b_2,end_b2),
                     tf.assign(w_3,end_w3),tf.assign(b_3,end_b3)]
            sess.run(best_step)
            
            first_params_set = [(start_w2, start_b2), (start_w3, start_b3)]
            last_params_set = [(end_w2, end_b2), (end_w3, end_b3)]
            
            small_omegas = [(w2_grad_accum.eval(), b2_grad_accum.eval()), (w3_grad_accum.eval(),
                           b3_grad_accum.eval())]
            
            delta_ws = map(lambda x,y: np.square(x-y)+zeta,[item[0] for item in last_params_set],
                       [item[0] for item in first_params_set])
            
            delta_bs = map(lambda x,y: np.square(x-y)+zeta,[item[1] for item in last_params_set],
                       [item[1] for item in first_params_set])
            delta_wbs = zip(delta_ws, delta_bs)
            
            big_omegas_ws = map(lambda x,y: (x/y),[item[0] for item in small_omegas],
                       [item[0] for item in delta_wbs])
            
            big_omegas_bs = map(lambda x,y: (x/y),[item[1] for item in small_omegas],
                       [item[1] for item in delta_wbs])
            
            big_omegas = zip(big_omegas_ws, big_omegas_bs)
            if(a_set != len(sets)-1):     
                new_big_omeg_w2 += big_omegas[0][0]
                new_big_omeg_b2 += big_omegas[0][1]
                new_big_omeg_w3 += big_omegas[1][0]
                new_big_omeg_b3 += big_omegas[1][1]
            
            for d in range(len(old_test_data)):
                previous_set_name = 'set'+str(d)
                prev_set = sets[d]
                prev_mask_val = [0]*num_classes
                for clas in range(prev_set[0], prev_set[1]+1):
                    prev_mask_val[clas]=1
                prev_set_mask_val = np.array(prev_mask_val, dtype=np.float32)
                sess.run(iter.initializer, feed_dict={a_1: old_test_data[d][0], y: old_test_data[d][1],
                                                  batch_size: len(old_test_data[d][0])})
                _, _, hist_test_acc, _, _ = sess.run([predictions,acct_mat,acct_res, cost, a_3],
                                                      feed_dict = {drop_out : 0.0,set1_mask:prev_set_mask_val})
                
                historical_cross_test_acc[current_set_name+'-'+previous_set_name].append(hist_test_acc)
                print('Testing accuracy on :{} after training :{} is :{}'.format(previous_set_name,
                                                                          current_set_name,
                                                                          hist_test_acc))
                historical_cross_test_acc[current_set_name+'-'+current_set_name]=[test_acc_corresp_best_val]
            old_test_data.append(test_data_set)
            print('omegW2-MAXIMUM:{},MEAN:{},STD:{}'.format(new_big_omeg_w2.max(),
                                                            new_big_omeg_w2.mean(),
                                                            new_big_omeg_w2.std()))
            print('omegb2-MAXIMUM:{},MEAN:{},STD:{}'.format(new_big_omeg_b2.max(),
                                                            new_big_omeg_b2.mean(),
                                                            new_big_omeg_b2.std()))
            print('omegW3-MAXIMUM:{},MEAN:{},STD:{}'.format(new_big_omeg_w3.max(),
                                                            new_big_omeg_w3.mean(),
                                                            new_big_omeg_w3.std()))
            print('omegb3-MAXIMUM:{},MEAN:{},STD:{}'.format(new_big_omeg_b3.max(),
                                                            new_big_omeg_b3.mean(),
                                                            new_big_omeg_b3.std()))
            #sys.exit()
    historical_train_accuracies[current_set_name]=train_accuracies
    historical_train_costs[current_set_name]=train_costs
    historical_val_accuracies[current_set_name]=val_accuracies
    historical_val_costs[current_set_name]=val_costs
            
            
train_writer.close()
#valid_writer.close()

### Examining final accuracy

In [None]:
set_accs = []
for i in range(0,num_classes/2):
    set_acc =  historical_cross_test_acc['set4-set'+str(i)][-1]*100
    print('Accuracy on set {}:{} after training set {}:{} is:{}'.format(i, sets[i],\
                                    4, sets[2], set_acc))
    set_accs.append(set_acc)
n_test_samples = np.array(n_test_samples)
final_acc = (n_test_samples*set_accs).sum()/n_test_samples.sum()
print('Final accuracy on all sets:{}'.format(final_acc))

### Some histograms

#### W_2

In [None]:
plt.hist(w_2.eval().flatten(), 100)
plt.grid()
plt.show()

#### Omega_w2

In [None]:
plt.hist(new_big_omeg_w2.flatten(),bins=100,log=True)
#plt.yscale('log')
plt.grid()
plt.show()

#### b_2

In [None]:
plt.hist(b_2.eval().flatten(), 100)
plt.grid()
plt.show()

#### Omega_b2

In [None]:
plt.hist(new_big_omeg_b2.flatten(),100,log=True)
plt.grid()
plt.show()

#### W_3

In [None]:
plt.hist(w_3.eval().flatten(), 100)
plt.grid()
plt.show()

#### Omega_w3

In [None]:
plt.hist(new_big_omeg_w3.flatten(),100)
plt.show()

#### b_3

In [None]:
plt.hist(b_3.eval().flatten(), 10)
plt.grid()
plt.show()

#### Omega_b3

In [None]:
plt.hist(new_big_omeg_b3.flatten(),10)
plt.grid()
plt.show()