**<font color='red'>DISCLAIMER</font>: This notebook takes 5 hours to run on a high-powered machine rented from Amazon.**

**The first part of this story notebook describes how we designed our Neural Network (NN) library. Because we wanted the flexibility to assemble new NN architectures from any combination of many different design choices, we had to code everything from scratch. The novelty of these specialized architectures, which are readily constructed with the suite of functions in our library, is a significant part of what our final project is about. Our library file, NN.py, is submitted along with the rest of our final project files, and all of its functions are imported into this story notebook from the NN module.**

**The second part of this story notebook picks up once our library is fully developed, and runs our final models on different data sets to compare them against a standard Feed Forward Neural Network (FFNN).**



In [1]:
from NN import * #import our Neural Network library

**We started by testing our simple child FFNN on a toy problem with a very simple rule: out of 300 randomly generated binary features, a single one (the second-to-last) determines the class of each example.**

In [2]:
# Toy problem: the label is the second-to-last of 300 random binary features
x_train = np.random.randint(2, size=(5000, 300))
y_train = x_train[:,-2:-1]

x_test = np.random.randint(2, size=(5000, 300))
y_test = x_test[:,-2:-1]

In [3]:
epochs = 2
alpha = 0.1
layer_sizes = [x_train.shape[1], 3, 1]
reg = 0
validation_frac = 0.1

w, b, errors, accs, best_epoch = FFNN(x_train, y_train, layer_sizes, alpha, reg, 
                    epochs, [], [], 0, 0, np.ones([len(layer_sizes)]), 0, 0, validation_frac)
acc, conf_mat = FFNN_acc(x_test, y_test, layer_sizes, w, b)

In [4]:
print('child FFNN accuracy on toy problem:', acc)

child FFNN accuracy on toy problem: 1.0


**We then tested the famous XOR problem. A network without a hidden layer (a single-layer perceptron) cannot represent XOR, because the two classes are not linearly separable; at least one hidden layer is required. Our results below show that a single hidden layer of just 2 neurons, the minimum needed (XOR is "OR but not AND", one hidden neuron for each), is enough for our NN to reach 100% accuracy, while the NN without a hidden layer can never learn the rule and reach 100% accuracy, regardless of the number of training examples or epochs.**
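**A hand-wired network makes the 2-neuron claim concrete. This sketch is not produced by our library: the weights are chosen by hand, and it uses a step activation for readability rather than whatever activation NN.py uses. One hidden neuron computes OR, the other computes AND, and the output fires when OR is on but AND is off, which is exactly XOR.**

```python
import numpy as np

def step(z):
    """Heaviside step activation: 1 where z > 0, else 0."""
    return (z > 0).astype(int)

# Hidden layer (2 neurons): column 0 computes OR(a, b), column 1 computes AND(a, b)
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])

# Output neuron: fires when OR is on and AND is off (h_or - h_and > 0.5)
W_out = np.array([1.0, -1.0])
b_out = -0.5

def xor_net(X):
    h = step(X @ W_hidden + b_hidden)
    return step(h @ W_out + b_out)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(xor_net(X))  # [0 1 1 0]
```

**With no hidden layer, the output is a single thresholded linear function of (a, b), and no line separates {(0,1), (1,0)} from {(0,0), (1,1)}.**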

In [5]:
def xor(a, b):
    if a != b:
        return 1
    else:
        return 0 

# XOR data set: the label is the XOR of two random binary features
x_train = np.random.randint(2, size=(500, 2))
y_train = np.array([xor(ex[0], ex[1]) for ex in x_train])

x_test = np.random.randint(2, size=(500, 2))
y_test = np.array([xor(ex[0], ex[1]) for ex in x_test])

y_train = np.reshape(y_train, (500, 1))
y_test = np.reshape(y_test, (500, 1))

In [6]:
# test one FFNN without hidden layer, and one with a single hidden layer of only 2 neurons
layer_sizes_lst = [[x_train.shape[1], 1], [x_train.shape[1], 2, 1]]
epochs = 50
alpha = 0.1
reg = 0
validation_frac = 0.1

w, b, errors, accs, best_epoch = FFNN(x_train, y_train, layer_sizes_lst[0], alpha, reg, 
                    epochs, [], [], 0, 0, np.ones([len(layer_sizes_lst[0])]), 0, 0, validation_frac)
acc, conf_mat = FFNN_acc(x_test, y_test, layer_sizes_lst[0], w, b)  # evaluate with the matching layer sizes
print("xor without hidden layer:", acc)

w, b, errors, accs, best_epoch = FFNN(x_train, y_train, layer_sizes_lst[1], alpha, reg, 
                    epochs, [], [], 0, 0, np.ones([len(layer_sizes_lst[1])]), 0, 0, validation_frac)
acc, conf_mat = FFNN_acc(x_test, y_test, layer_sizes_lst[1], w, b)
print("xor with a single hidden layer of 2 neurons:", acc)

xor without hidden layer: 0.492
xor with a single hidden layer of 2 neurons: 1.0


**As a small but interesting detour, we explored our NN's ability to derive the XOR rule in the presence of "noise features": we generated 20 random binary features and kept the label as the XOR of the first two, so the other 18 features carry no information. Below, we see that even with these 18 noise features, our NN is still able to infer the XOR rule given enough examples and training epochs.**

In [7]:
# 20 binary features: the label depends only on the first two, the other 18 are noise
x_train = np.random.randint(2, size=(5000, 20))
y_train = np.array([xor(ex[0], ex[1]) for ex in x_train])

x_test = np.random.randint(2, size=(5000, 20))
y_test = np.array([xor(ex[0], ex[1]) for ex in x_test])

y_train = np.reshape(y_train, (5000, 1))
y_test = np.reshape(y_test, (5000, 1))

In [8]:
layer_sizes = [x_train.shape[1], 10, 1]
epochs = 20
alpha = 0.1
reg = 0
validation_frac = 0.1

w, b, errors, accs, best_epoch = FFNN(x_train, y_train, layer_sizes, alpha, reg, 
                    epochs, [], [], 0, 0, np.ones([len(layer_sizes)]), 0, 0, validation_frac)
acc, conf_mat = FFNN_acc(x_test, y_test, layer_sizes, w, b)
print('xor with 18 "noise" features:', acc)

xor with 18 "noise" features: 1.0


**Once we were confident that our basic FFNN implementation was working correctly, we started implementing the design options we wanted: feature or class specialization, stripping or not stripping layers from the children NNs, adding or not adding virgin layers to the parent NN, making the weights mutable or immutable, and selective backpropagation for fully connected layers.**

**Here we test our Child_Spawner function, which generates the children NNs that make up the final synthesizer parent NN, with different parameters. Our idea of specializing learning in sub-networks before synthesizing them in a parent NN initially focused on the features of a data set, but we thought it would be interesting to see what results we could get with class specialization using our design options. While one-versus-rest (OvR) classification already exists, our design parameters allow us to build very different models: for example, removing the output layers of the children NNs, keeping the weights in certain layers of the parent NN (or even within a specific layer) mutable or immutable, adding virgin synthesis layers above the original output layers, or any combination of these options.**
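**The parent network's wiring can be illustrated with a small numpy sketch. This is a hypothetical illustration under our own assumptions, not the code in NN.py: the helpers `block_diag`, `synthesize_layer` and `masked_update` are invented here. It assumes each child's layer becomes one diagonal block of a parent layer, that the off-diagonal entries act as cross-weights (interconnections) between children, and that "immutable" weights are implemented by masking their gradient updates.**

```python
import numpy as np

def block_diag(blocks):
    """Place 2-D arrays along the diagonal of an otherwise zero matrix."""
    n_rows = sum(b.shape[0] for b in blocks)
    n_cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((n_rows, n_cols))
    r = c = 0
    for blk in blocks:
        out[r:r + blk.shape[0], c:c + blk.shape[1]] = blk
        r += blk.shape[0]
        c += blk.shape[1]
    return out

def synthesize_layer(child_blocks, rand_cross=False, seed=0):
    """Combine per-child weight blocks into one parent layer (hypothetical helper).

    Diagonal blocks hold each child's trained weights; off-diagonal entries
    are cross-weights between children, set to zero or small random values.
    """
    W = block_diag(child_blocks)
    child_mask = block_diag([np.ones_like(b) for b in child_blocks]).astype(bool)
    if rand_cross:
        rng = np.random.default_rng(seed)
        W[~child_mask] = rng.normal(scale=0.01, size=int((~child_mask).sum()))
    return W, child_mask

def masked_update(W, grad, alpha, update_mask):
    """Gradient step that only changes entries where update_mask is True."""
    return W - alpha * grad * update_mask

# Three children, each mapping 2 hidden units to its own 1-unit output
children = [np.full((2, 1), k + 1.0) for k in range(3)]
W, child_mask = synthesize_layer(children, rand_cross=True)
print(W.shape)  # (6, 3)

# Immutable children: only the cross-weights change during this step
grad = np.ones_like(W)
W_new = masked_update(W, grad, alpha=0.1, update_mask=~child_mask)
```

**Masking the update, rather than copying frozen weights back after each step, keeps trainable and frozen entries in the same matrix, which is one simple way to make mutability selectable even within a single layer.**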

**After experimenting with different data sets (the Iris data set, a banking data set, a poker hands data set and others), we chose to focus our efforts on the MNIST data set for class specialization and on the Abalone data set for feature specialization in our children NNs. Iris had too few examples and reached maximum accuracy too quickly for the results to be meaningful, and the features in the poker hands data set were too similar to one another. Ultimately we felt that MNIST and Abalone best addressed those concerns.**

**Other lower-level design options we implemented were regularization and whether to balance our data sets so that each class has the same number of examples. Balancing took on an interesting meaning in the context of what we were trying to do. When we specialize our children NNs on classes, each child effectively faces a 2-class classification problem: the class it is trained on, versus all the other classes merged into a second class. This creates a strong imbalance even if the original data set is balanced (which ours was): with the 10 classes of the MNIST data set, for example, each child's training data is imbalanced 9 to 1. We gave ourselves the option to balance the data for that case, and found through experimentation that for large enough data sets, the value of having many more training examples outweighed the disadvantage of an imbalanced data set.**
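**The 9-to-1 imbalance follows directly from the one-versus-rest relabeling. The sketch below assumes integer class labels and balances by undersampling the "rest" class; `one_vs_rest_labels` and `undersample_rest` are hypothetical helpers, and the library's `balance` option may use a different strategy. At 9 to 1, a child that always answers "rest" would already score 90% accuracy, which is worth keeping in mind when reading the child accuracies below.**

```python
import numpy as np

def one_vs_rest_labels(y, k):
    """1 for examples of class k, 0 for every other class."""
    return (y == k).astype(int)

def undersample_rest(X, y_bin, rng):
    """Keep all positives plus an equal-size random subset of negatives."""
    pos = np.flatnonzero(y_bin == 1)
    neg = np.flatnonzero(y_bin == 0)
    keep_neg = rng.choice(neg, size=len(pos), replace=False)
    idx = rng.permutation(np.concatenate([pos, keep_neg]))
    return X[idx], y_bin[idx]

rng = np.random.default_rng(0)
y = np.repeat(np.arange(10), 100)      # balanced: 100 examples per class
X = rng.normal(size=(len(y), 5))

y3 = one_vs_rest_labels(y, 3)          # child specialized on class 3
print(y3.mean())                       # 0.1 -> a 9-to-1 imbalance

Xb, yb = undersample_rest(X, y3, rng)
print(len(yb), yb.mean())              # 200 0.5
```

**Undersampling throws away 800 of the 1000 examples here, which illustrates the trade-off we observed: balance comes at the cost of far fewer training examples.**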

**Another low-level design feature we implemented was to hold out part of the training data as a validation set, on which we test our NNs after each training epoch. This allowed us to track the optimal number of training epochs for each child (specialized in a single class or feature).**
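**Choosing the best epoch from per-epoch validation accuracies amounts to an argmax. In this sketch, `split_validation` and `best_epoch` are hypothetical helpers, not NN.py functions. Taking the 0-indexed epoch with the highest validation accuracy (the first one on ties) reproduces the "Best Epoch" values printed in this notebook, for example 13 for the accuracies output by In [15], although we have not confirmed that this is exactly how the library computes it.**

```python
import numpy as np

def split_validation(X, y, validation_frac, seed=0):
    """Hold out a random validation_frac of the rows as a validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(round(validation_frac * len(X)))
    val, train = idx[:n_val], idx[n_val:]
    return X[train], y[train], X[val], y[val]

def best_epoch(val_accs):
    """0-indexed epoch with the highest validation accuracy (first one on ties)."""
    return int(np.argmax(val_accs))

# Per-epoch validation accuracies copied from the output of In [15]
val_accs = [0.96, 0.96346939, 0.96367347, 0.96408163, 0.96408163, 0.96367347,
            0.96367347, 0.96408163, 0.96387755, 0.96387755, 0.96428571, 0.96428571,
            0.96428571, 0.9644898, 0.96408163, 0.96428571, 0.96428571, 0.96408163,
            0.96428571, 0.96428571]
print(best_epoch(val_accs))  # 13

X = np.arange(100).reshape(50, 2)
y = np.arange(50)
x_tr, y_tr, x_val, y_val = split_validation(X, y, validation_frac=0.1)
print(len(x_tr), len(x_val))  # 45 5
```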

**Below we test a couple of models on both the balanced and imbalanced data sets for class specialization. The imbalanced data gave higher child accuracies and the best synthesis result, which we attribute to its much greater number of examples outweighing the imbalance, so we use only the imbalanced data set to run our later models.**

In [10]:
'''Test Child Spawner Using MNIST'''

dataset = 'mnist'
x_train, y_train, x_test, y_test = preprocess_mldata(dataset, .7)

In [11]:
'''BALANCE TEST: Create Children that are Specialized by Class and with balanced training datasets'''
balance = 1
alpha = 0.1
reg = 0
specialize_by = 'class'
children_hidden_layers = [20]
epochs = 20
validation_frac = .1

w_children, b_children, acc_children, ep_accs, best_epochs = Child_Spawner(x_train, y_train, children_hidden_layers, 
                                    specialize_by, balance, alpha, reg, epochs, validation_frac, x_test, y_test)
print('children accuracies are:', acc_children)
print('best epochs for each child are:', best_epochs)


children accuracies are: [0.9887137482737273, 0.9878565645983142, 0.977713224439259, 0.9700938139911425, 0.9800942902042954, 0.9713795895042621, 0.983856374113053, 0.979475213105386, 0.9661888661364827, 0.9728558502785847]
best epochs for each child are: [8, 8, 10, 14, 6, 4, 7, 3, 17, 17]


In [12]:
'''Try a Synthesis Network with 1 layer stripped away, 1 virgin layer, no interconnections, immutable weights'''
virgin_hidden_layers = [200]
strip_layers = 1
skip_layers = 1
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 1]
epochs = 20

ac4, er4, acs4, e4 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers, skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9554740701938188
validation test accuracy per epoch: [ 0.95306122  0.95326531  0.95408163  0.95530612  0.95530612  0.95591837
  0.95612245  0.95653061  0.95693878  0.95632653  0.95653061  0.95591837
  0.95591837  0.95612245  0.95571429  0.95591837  0.9555102   0.9555102
  0.95530612  0.9555102 ]
Best Epoch: 8


In [13]:
'''Try a Synthesis Network with no layers stripped away, 1 virgin layer, no interconnections, mutable weights'''
virgin_hidden_layers = [20]
strip_layers = 0
skip_layers = 0
selective_weights = 0
reset_zeros = 1
layer_weights_rand = [0, 0, 1]
epochs = 30

ac6, er6, acs6, e6 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers, skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.961569598552312
validation test accuracy per epoch: [ 0.95265306  0.95591837  0.95714286  0.95857143  0.95714286  0.95979592
  0.95795918  0.95795918  0.95857143  0.96244898  0.96163265  0.96285714
  0.96204082  0.96        0.9622449   0.9622449   0.9622449   0.96244898
  0.96367347  0.96285714  0.96367347  0.96326531  0.96408163  0.96306122
  0.96367347  0.96285714  0.96285714  0.96285714  0.96163265  0.96204082]
Best Epoch: 22


In [14]:
'''IMBALANCE TEST: Create Children that are Specialized by Class and with imbalanced training datasets'''
balance = 0
alpha = 0.1
reg = 0
specialize_by = 'class'
children_hidden_layers = [20]
epochs = 20
validation_frac = .1

w_children, b_children, acc_children, ep_accs, best_epochs = Child_Spawner(x_train, y_train, children_hidden_layers, 
                                    specialize_by, balance, alpha, reg, epochs, validation_frac, x_test, y_test)
print('children accuracies for each class:', acc_children)
print('best epochs for each child:', best_epochs)


children accuracies for each class: [0.9962855374065431, 0.9953331111005286, 0.9900471451021478, 0.9869993809229011, 0.9923329682365827, 0.9894280680032382, 0.993856850326206, 0.9925234534977856, 0.9863326825086909, 0.9885232630125244]
best epochs for each child: [1, 2, 8, 17, 14, 12, 2, 15, 15, 15]


In [15]:
'''Try a Synthesis Network with 1 layer stripped away, 1 virgin layer, no interconnections, immutable weights'''
virgin_hidden_layers = [200]
strip_layers = 1
skip_layers = 1
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 1]
epochs = 20

ac4, er4, acs4, e4 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers, skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9632839659031383
validation test accuracy per epoch: [ 0.96        0.96346939  0.96367347  0.96408163  0.96408163  0.96367347
  0.96367347  0.96408163  0.96387755  0.96387755  0.96428571  0.96428571
  0.96428571  0.9644898   0.96408163  0.96428571  0.96428571  0.96408163
  0.96428571  0.96428571]
Best Epoch: 13


In [16]:
'''Try a Synthesis Network with no layers stripped away, 1 virgin layer, no interconnections, mutable weights'''
virgin_hidden_layers = [20]
strip_layers = 0
skip_layers = 0
selective_weights = 0
reset_zeros = 1
layer_weights_rand = [0, 0, 1]
epochs = 30

ac6, er6, acs6, e6 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers, skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9573789228058479
validation test accuracy per epoch: [ 0.9644898   0.96306122  0.96102041  0.95938776  0.96        0.95938776
  0.95979592  0.95795918  0.95632653  0.95693878  0.95918367  0.95857143
  0.9577551   0.95714286  0.95938776  0.95897959  0.96040816  0.96142857
  0.95938776  0.95795918  0.95877551  0.95877551  0.95938776  0.95877551
  0.96        0.96081633  0.95918367  0.96102041  0.96102041  0.96061224]
Best Epoch: 0


**Something that boosted our accuracy considerably was realizing that we had been feeding examples to our NNs in class order: for a data set of 1000 examples and 10 classes, for example, 100 examples of class 1, then 100 examples of class 2, and so on. This prevented the NNs from learning to differentiate between classes as well as they could when seeing examples of different classes interleaved. Once we started shuffling the data sets that were not already shuffled, our accuracy went up significantly.**
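**The key detail when shuffling is to apply one permutation to both the features and the labels, so that every example keeps its label. A minimal numpy sketch (the helper name `shuffle_together` is ours, for illustration):**

```python
import numpy as np

def shuffle_together(X, y, seed=0):
    """Shuffle the rows of X and y with the same random permutation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    return X[perm], y[perm]

# Class-ordered data like that described above: 100 examples of class 0, then class 1, ...
y = np.repeat(np.arange(10), 100)
X = np.arange(len(y)).reshape(-1, 1)    # the single feature records the original row index
X_shuf, y_shuf = shuffle_together(X, y)

print(y[:10])        # all class 0 before shuffling
print(y_shuf[:10])   # a mix of classes after shuffling
```

**Shuffling X and y independently would silently destroy the feature-label pairing, which is why both are indexed with the same `perm`.**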

**From here on we trained many class-specialized models on the MNIST data set, using the most interesting combinations of design options: stripping or not stripping the output layers of the children, adding virgin synthesis layers to the parent NN, making weights mutable or immutable, and adding or omitting crossweights (interconnections). We do not run cross-validation here because of its computational cost, but we tuned the parameters of our models on powerful Amazon Web Services instances throughout our final project, and found that the same optimal parameters applied to almost all of them.**

**We start by training standard 3-layer and 4-layer FFNNs, without our architectures, as baselines for comparison.**

In [17]:
'''Test a standard 3 Layer FFNN'''
epochs = 20
layer_sizes = [784, 200, 10]
w, b, errors, accs, best_epoch = FFNN(x_train, y_train, layer_sizes, alpha, reg, 
                    epochs, [], [], 0, 0, np.ones([len(layer_sizes)]), 0, 0, validation_frac)
acc, conf_mat = FFNN_acc(x_test, y_test, layer_sizes, w, b)
print(acc)
print(accs)

0.9681889613791133
[ 0.93816327  0.95        0.95612245  0.95857143  0.96081633  0.96285714
  0.96408163  0.96653061  0.96693878  0.96693878  0.96816327  0.96897959
  0.96918367  0.96836735  0.96673469  0.9677551   0.96795918  0.96714286
  0.96714286  0.96816327]


In [18]:
'''Test a standard 4 Layer FFNN'''
epochs = 30
layer_sizes = [784, 200, 10, 10]
w, b, errors, accs, best_epoch = FFNN(x_train, y_train, layer_sizes, alpha, reg, 
                    epochs, [], [], 0, 0, np.ones([len(layer_sizes)]), 0, 0, validation_frac)
acc, conf_mat = FFNN_acc(x_test, y_test, layer_sizes, w, b)
print(acc)
print(accs)

0.9640935282632507
[ 0.92408163  0.9422449   0.94693878  0.95571429  0.95489796  0.95877551
  0.95836735  0.96122449  0.96326531  0.96510204  0.96408163  0.96469388
  0.96510204  0.96653061  0.96530612  0.9655102   0.96693878  0.96612245
  0.96489796  0.9655102   0.96612245  0.96673469  0.96612245  0.96489796
  0.96591837  0.9655102   0.9655102   0.96571429  0.96571429  0.96530612]


In [19]:
'''Try a Synthesis Network with no layers stripped away, 1 virgin layer, mutable interconnections, immutable weights'''
virgin_hidden_layers = [10]
strip_layers = 0
skip_layers = 1
selective_weights = 1
reset_zeros = 0
layer_weights_rand = [0, 0, 1]
epochs = 30

ac3B, er3B, acs3B, e3B = SNN(x_train, y_train, alpha, reg, epochs,w_children,b_children,strip_layers,skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9597599885708843
validation test accuracy per epoch: [ 0.96081633  0.95938776  0.96        0.95938776  0.95918367  0.95979592
  0.96        0.96020408  0.96102041  0.96102041  0.96061224  0.96081633
  0.96040816  0.96040816  0.96020408  0.96040816  0.96040816  0.96
  0.96020408  0.96020408  0.96040816  0.96061224  0.96061224  0.96081633
  0.96081633  0.96020408  0.96040816  0.96040816  0.96020408  0.96020408]
Best Epoch: 8


In [20]:
'''Try a Synthesis Network with no layers stripped away, 1 virgin layer, no interconnections, mutable weights'''
virgin_hidden_layers = [10]
strip_layers = 0
skip_layers = 0
selective_weights = 0
reset_zeros = 1
layer_weights_rand = [0, 0, 1]
epochs = 30

ac6B, er6B, acs6B, e6B = SNN(x_train, y_train, alpha, reg, epochs,w_children,b_children,strip_layers,skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9579027572741559
validation test accuracy per epoch: [ 0.96346939  0.96265306  0.96244898  0.95979592  0.95938776  0.95897959
  0.95714286  0.95693878  0.95428571  0.95836735  0.95795918  0.95755102
  0.95755102  0.95857143  0.96        0.95571429  0.9577551   0.95857143
  0.95877551  0.95979592  0.96061224  0.95959184  0.95918367  0.96020408
  0.95877551  0.95877551  0.95795918  0.95795918  0.95857143  0.96020408]
Best Epoch: 0


**Below we reimplement the voting and stacking ensemble methods as special cases of our library in order to see how they fare against our architectures.**
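**For reference, here is a minimal numpy sketch of each ensemble method, independent of our library. It assumes each class-specialized child outputs one confidence score for its class. Voting predicts the class whose child is most confident; stacking trains a second-level model on the children's outputs. For brevity, the stacker here is a linear least-squares model (a simplification we chose), whereas the stacking cell below trains a virgin layer of 10 neurons. All function names are hypothetical.**

```python
import numpy as np

def vote(child_scores):
    """Voting: pick the class whose specialist child gives the highest score.

    child_scores has shape (n_examples, n_classes); column k is child k's output.
    """
    return np.argmax(child_scores, axis=1)

def _with_bias(child_scores):
    return np.hstack([child_scores, np.ones((len(child_scores), 1))])

def fit_stacker(child_scores, y_onehot):
    """Stacking (simplified): least-squares linear map from child outputs to one-hot targets."""
    W, *_ = np.linalg.lstsq(_with_bias(child_scores), y_onehot, rcond=None)
    return W

def stack_predict(child_scores, W):
    return np.argmax(_with_bias(child_scores) @ W, axis=1)

# Toy outputs of 3 class-specialized children on 3 examples
scores = np.array([[0.9, 0.2, 0.1],
                   [0.1, 0.3, 0.8],
                   [0.2, 0.7, 0.4]])
y = np.array([0, 2, 1])
print(vote(scores))  # [0 2 1]

W = fit_stacker(scores, np.eye(3)[y])
print(stack_predict(scores, W))
```

**Voting has nothing to train, which is consistent with the voting cell below running for a single epoch, while stacking learns how much to trust each child.**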

In [21]:
'''Try a Synthesis Network with no layers stripped away, no virgin layers, no interconnections - i.e. Voting'''
virgin_hidden_layers = []
strip_layers = 0
skip_layers = 2
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 0]
epochs = 1

ac1, er1, acs1, e1 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers,skip_layers, validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9614743559217106
validation test accuracy per epoch: [ 0.96346939]
Best Epoch: 0


In [22]:
'''Try a Synthesis Network with no layers stripped away, 1 virgin layer, no interconnections - i.e. Stacking'''
virgin_hidden_layers = [10]
strip_layers = 0
skip_layers = 2
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 0, 1]
epochs = 30

ac2, er2, acs2, e2 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers,skip_layers, validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9596647459402828
validation test accuracy per epoch: [ 0.96346939  0.96367347  0.96326531  0.96326531  0.96326531  0.96306122
  0.96285714  0.96285714  0.96265306  0.96285714  0.96285714  0.96285714
  0.96285714  0.96265306  0.9622449   0.9622449   0.96204082  0.96204082
  0.96163265  0.96163265  0.96163265  0.96163265  0.96163265  0.96163265
  0.96142857  0.96163265  0.96163265  0.96163265  0.96163265  0.96163265]
Best Epoch: 1


In [23]:
'''Try a Synthesis Network with 1 layer stripped away, 1 virgin layer, no interconnections, immutable weights'''
virgin_hidden_layers = [200]
strip_layers = 1
skip_layers = 1
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 1]
epochs = 20

ac4, er4, acs4, e4 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers, skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9629982380113339
validation test accuracy per epoch: [ 0.96122449  0.96285714  0.96285714  0.96387755  0.96346939  0.96326531
  0.96367347  0.96346939  0.96346939  0.96367347  0.96367347  0.96326531
  0.96346939  0.96346939  0.96387755  0.96387755  0.96387755  0.96387755
  0.96387755  0.96346939]
Best Epoch: 3


In [24]:
'''Try a Synthesis Network with 1 layer stripped away, 2 virgin layers, no interconnections, immutable weights'''
virgin_hidden_layers = [200, 50]
strip_layers = 1
skip_layers = 1
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 1, 1]
epochs = 30

ac5, er5, acs5, e5 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers, skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9629982380113339
validation test accuracy per epoch: [ 0.95836735  0.96265306  0.96428571  0.96428571  0.96408163  0.96428571
  0.96469388  0.96489796  0.96571429  0.96612245  0.9655102   0.9655102
  0.9655102   0.96591837  0.96591837  0.96571429  0.96571429  0.96632653
  0.96612245  0.96571429  0.96571429  0.9655102   0.96571429  0.96591837
  0.96591837  0.96591837  0.96571429  0.9655102   0.96571429  0.96632653]
Best Epoch: 17


In [25]:
'''TEST EVERYTHING MUTABLE'''
'''Try a Synthesis Network with no layers stripped away, 1 virgin layer, with interconnections, mutable weights'''
virgin_hidden_layers = [10]
strip_layers = 0
skip_layers = 0
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 0, 1]
epochs = 30

ac62, er62, acs62, e62 = SNN(x_train, y_train, alpha, reg, epochs,w_children,b_children,strip_layers,skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9608076575075004
validation test accuracy per epoch: [ 0.95918367  0.96061224  0.96122449  0.95959184  0.9544898   0.96020408
  0.96122449  0.96183673  0.96163265  0.95755102  0.96        0.96163265
  0.96244898  0.96163265  0.96020408  0.95857143  0.9577551   0.96061224
  0.96265306  0.96183673  0.96265306  0.96244898  0.9622449   0.96244898
  0.96306122  0.96265306  0.96265306  0.96244898  0.96306122  0.96163265]
Best Epoch: 24


In [26]:
'''TEST INTERCONNECTIONS INITIALIZED TO RAND'''
'''Try a Synthesis Network with no layers stripped away, 1 virgin layer, with interconnections, mutable weights'''
virgin_hidden_layers = [10]
strip_layers = 0
skip_layers = 0
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 1, 1]
epochs = 30

ac63, er63, acs63, e63 = SNN(x_train, y_train, alpha, reg, epochs,w_children,b_children,strip_layers,skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9606171722462974
validation test accuracy per epoch: [ 0.96163265  0.96        0.95938776  0.95979592  0.95857143  0.95632653
  0.95632653  0.95938776  0.95897959  0.9577551   0.95938776  0.9622449
  0.96102041  0.96142857  0.95979592  0.96040816  0.96020408  0.96265306
  0.96326531  0.96244898  0.96183673  0.96183673  0.96285714  0.9622449
  0.96183673  0.96183673  0.96204082  0.96102041  0.96142857  0.96040816]
Best Epoch: 18


In [27]:
'''Create Deep Children with 2 hidden layers that are Specialized by Class'''
balance = 0
alpha = 0.1
reg = 0
specialize_by = 'class'
children_hidden_layers = [20, 20]
epochs = 20
validation_frac = .1

w_children, b_children, acc_children, ep_accs, best_epochs = Child_Spawner(x_train, y_train, children_hidden_layers, 
                                    specialize_by, balance, alpha, reg, epochs, validation_frac, x_test, y_test)
print('children accuracies for each class:', acc_children)
print('best epochs for each child:', best_epochs)


children accuracies for each class: [0.9962855374065431, 0.9954759750464308, 0.9903804943092528, 0.9857136054097814, 0.9919519977141769, 0.9886185056431258, 0.9937616076956045, 0.9917138911376733, 0.9853802562026763, 0.9879994285442164]
best epochs for each child: [3, 4, 9, 17, 10, 7, 9, 2, 11, 16]


In [28]:
'''Try a DEEP Synthesis Network with 1 layer stripped away, 2 virgin layers, with interconnections, immutable weights'''
virgin_hidden_layers = [200, 200]
strip_layers = 1
skip_layers = 2
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 0, 1, 1]
epochs = 30

ac61, er61, acs61, e61 = SNN(x_train, y_train, alpha, reg, epochs,w_children,b_children,strip_layers,skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9593313967331778
validation test accuracy per epoch: [ 0.96061224  0.96040816  0.96040816  0.96061224  0.96081633  0.96102041
  0.96102041  0.96102041  0.96081633  0.96081633  0.96081633  0.96122449
  0.96122449  0.96102041  0.96102041  0.96122449  0.96122449  0.96122449
  0.96142857  0.96142857  0.96142857  0.96102041  0.96102041  0.96102041
  0.96081633  0.96061224  0.96040816  0.96061224  0.96081633  0.96081633]
Best Epoch: 18


In [29]:
'''Try a DEEP Synthesis Network with 1 layer stripped away, 2 virgin layers, with RAND interconnections, mutable weights'''
virgin_hidden_layers = [200, 200]
strip_layers = 1
skip_layers = 0
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 1, 1, 1]
epochs = 30

ac64, er64, acs64, e64 = SNN(x_train, y_train, alpha, reg, epochs,w_children,b_children,strip_layers,skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9638078003714463
validation test accuracy per epoch: [ 0.96183673  0.96183673  0.96367347  0.96408163  0.96285714  0.96265306
  0.96122449  0.96428571  0.96510204  0.96408163  0.96489796  0.96346939
  0.96387755  0.96571429  0.96510204  0.96469388  0.96469388  0.96387755
  0.96408163  0.96469388  0.96489796  0.96469388  0.96306122  0.96469388
  0.96367347  0.96408163  0.96469388  0.96489796  0.96367347  0.96489796]
Best Epoch: 13


**Here we tried applying dimensionality reduction to the MNIST data set in order to test some of our feature specialization architectures. We once again ran a standard FFNN for comparison purposes; however, the results of all the models on the reduced data, including the standard FFNN, were subpar compared to using class specialization on the original data set.**
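**The internals of `dim_reduce` are not shown in this notebook, and with `num_pca = 10` it returns 19 features (see `layer_sizes` in In [31]), so it likely does more than plain 10-component PCA. As a hedged illustration of the core idea only, here is principal component analysis via numpy's SVD; `pca_fit_transform` is a hypothetical helper. The components are fit on the training set alone and the same projection is applied to the test set, so no test information leaks into the preprocessing.**

```python
import numpy as np

def pca_fit_transform(x_train, x_test, n_components):
    """Project train and test data onto the top principal components of the training set."""
    mean = x_train.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(x_train - mean, full_matrices=False)
    components = Vt[:n_components]            # shape (n_components, n_features)
    z_train = (x_train - mean) @ components.T
    z_test = (x_test - mean) @ components.T   # reuse the training mean and directions
    return z_train, z_test

rng = np.random.default_rng(0)
x_train = rng.normal(size=(200, 784))   # stand-in for flattened 28x28 MNIST images
x_test = rng.normal(size=(50, 784))
z_train, z_test = pca_fit_transform(x_train, x_test, n_components=10)
print(z_train.shape, z_test.shape)      # (200, 10) (50, 10)
```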

In [30]:
'''Use the reduced dimensionality set before running feature specialized children networks'''
num_pca = 10
x_train, y_train, x_test, y_test = preprocess_mldata(dataset, .7)
x_train, x_test = dim_reduce(x_train, y_train, x_test, num_pca)



In [31]:
'''Test a normal 3 Layer FFNN'''
epochs = 20
layer_sizes = [19, 190, 10]
w, b, errors, accs, best_epoch = FFNN(x_train, y_train, layer_sizes, alpha, reg, 
                    epochs, [], [], 0, 0, np.ones([len(layer_sizes)]), 0, 0, validation_frac)
acc, conf_mat = FFNN_acc(x_test, y_test, layer_sizes, w, b)
print(acc)
print(accs)

0.9446164103052527
[ 0.91142857  0.92306122  0.92857143  0.93285714  0.93346939  0.93591837
  0.93755102  0.93877551  0.93877551  0.94122449  0.94040816  0.93938776
  0.94020408  0.94061224  0.94020408  0.94163265  0.94081633  0.94204082
  0.94102041  0.94306122]


In [32]:
'''Create Children that are Specialized by Features'''
specialize_by = 'features'
children_hidden_layers = [10]
epochs = 20

w_children, b_children, acc_children, _, best_epochs = Child_Spawner(x_train, y_train, children_hidden_layers, specialize_by, 
                                                     balance, alpha, reg, epochs, validation_frac, x_test, y_test)
print('children accuracy is', acc_children)
print('best epochs for children are', best_epochs)

children accuracy is [0.3036811276727463, 0.2112481546740321, 0.29810943378256105, 0.3475879803800181, 0.284442116291252, 0.22586789847135577, 0.18700890518596125, 0.2244392590123339, 0.21734368303252535, 0.18215153102528692, 0.4193533025382161, 0.360064764988809, 0.4194961664841183, 0.326729844278299, 0.2989189961426735, 0.2927758464688795, 0.26791751988189916, 0.24329729987142246, 0.2369160436211248]
best epochs for children are [18, 19, 5, 10, 11, 9, 11, 1, 17, 6, 3, 8, 3, 1, 0, 13, 5, 18, 2]


In [33]:
'''Try a Synthesis Network with 1 layer stripped away, 1 virgin layer, no interconnections, immutable weights'''
virgin_hidden_layers = [190]
strip_layers = 1
skip_layers = 1
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 1]
epochs = 20

ac9, er9, acs9, e9 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers,skip_layers, validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.910519548549931
validation test accuracy per epoch: [ 0.90387755  0.90857143  0.91        0.91        0.91081633  0.91061224
  0.91122449  0.91204082  0.91244898  0.91306122  0.91326531  0.91387755
  0.9144898   0.9144898   0.91489796  0.91489796  0.91489796  0.91510204
  0.91510204  0.91510204]
Best Epoch: 17


In [34]:
'''Try a Synthesis Network with no layers stripped away, 1 virgin layer, no interconnections, mutable weights'''
virgin_hidden_layers = [10]
strip_layers = 0
skip_layers = 0
selective_weights = 0
reset_zeros = 1
layer_weights_rand = [0, 0, 1]
epochs = 30

ac10, er10, acs10, e10 = SNN(x_train, y_train, alpha, reg, epochs,w_children,b_children,strip_layers,skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9099957140816229
validation test accuracy per epoch: [ 0.89122449  0.90346939  0.90653061  0.91081633  0.91244898  0.91367347
  0.91346939  0.91387755  0.91306122  0.91326531  0.91428571  0.9144898
  0.91367347  0.9144898   0.91367347  0.91326531  0.91367347  0.9144898
  0.9144898   0.91591837  0.91653061  0.91591837  0.91653061  0.91632653
  0.91673469  0.91714286  0.91734694  0.91673469  0.91673469  0.91653061]
Best Epoch: 26


**We then used the Abalone data set to test our feature-specialized architectures.**
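**In feature specialization each child sees a single input feature; the numbers of children printed in this notebook (19 for the reduced MNIST data, 8 for Abalone) match the numbers of features. Below is a minimal sketch of how stripped children could feed a parent network, assuming one hidden layer per child, a sigmoid activation, and the children's output layers removed. The helper `children_hidden_layer` is hypothetical and NN.py may differ. With 8 features and 20 hidden units per child, the concatenated layer has 8 × 20 = 160 units, the same width used for the baseline FFNN and the virgin layer on Abalone (likewise 19 × 10 = 190 for reduced MNIST).**

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def children_hidden_layer(X, child_W, child_b):
    """Concatenate every child's hidden activations; child j only sees feature column j."""
    parts = [sigmoid(X[:, [j]] @ W + b)            # (n_examples, n_hidden)
             for j, (W, b) in enumerate(zip(child_W, child_b))]
    return np.hstack(parts)                        # (n_examples, n_features * n_hidden)

rng = np.random.default_rng(0)
n_features, n_hidden = 8, 20                       # Abalone features, children_hidden_layers = [20]
X = rng.normal(size=(5, n_features))
child_W = [rng.normal(size=(1, n_hidden)) for _ in range(n_features)]
child_b = [np.zeros(n_hidden) for _ in range(n_features)]

H = children_hidden_layer(X, child_W, child_b)
print(H.shape)  # (5, 160)
```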

In [35]:
abalone = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data", header=None)

# Encode the non-numerical sex column as integers. pandas orders the inferred
# categories alphabetically, so F (female) -> 0, I (infant) -> 1, M (male) -> 2
abalone[0] = abalone[0].astype('category').cat.codes

abalone = abalone.rename(columns = {8:'class'})
# Remove the classes with fewer than 10 examples (ring counts below 3 or above 21)
abalone = abalone.drop(abalone[abalone['class'] < 3].index)
abalone = abalone.drop(abalone[abalone['class'] > 21].index)
# Shift the remaining labels so the 19 classes run from 0 to 18
abalone['class'] = abalone['class'] - 3

x_train, y_train, x_test, y_test = split(abalone, .7)

# Standardize the training and test data
scaler = preprocessing.StandardScaler().fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)

# Binarize (one-hot encode) the class labels
y_train = y_bin(y_train).astype(np.int64)
y_test = y_bin(y_test).astype(np.int64)
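**A quick check of how pandas assigns category codes: when the categories are inferred with `astype('category')`, they are sorted, so the sex labels map alphabetically rather than in the order they appear in the file.**

```python
import pandas as pd

sex = pd.Series(['M', 'F', 'I', 'M'])
as_cat = sex.astype('category')
print(list(as_cat.cat.categories))  # ['F', 'I', 'M']
print(as_cat.cat.codes.tolist())    # [2, 0, 1, 2]
```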

In [36]:
balance = 0
alpha = 0.01
reg = 0
validation_frac = .1

**As before, we first run a standard FFNN on the data set for comparison purposes before moving on to testing our architectures.**

In [37]:
'''Test a standard 3 Layer FFNN'''
epochs = 100
layer_sizes = [8, 160, 19]
w, b, errors, accs, best_epoch = FFNN(x_train, y_train, layer_sizes, alpha, reg, 
                    epochs, [], [], 0, 0, np.ones([len(layer_sizes)]), 0, 0, validation_frac)
acc, conf_mat = FFNN_acc(x_test, y_test, layer_sizes, w, b)
print(acc)
print(accs)


0.23897353648757017
[ 0.18900344  0.17525773  0.19243986  0.18556701  0.18900344  0.19243986
  0.19931271  0.21305842  0.23367698  0.23367698  0.24398625  0.24398625
  0.24054983  0.24054983  0.2371134   0.23024055  0.23024055  0.22680412
  0.23024055  0.23024055  0.22680412  0.22680412  0.2233677   0.2233677
  0.2233677   0.2233677   0.23024055  0.23024055  0.23024055  0.23024055
  0.23024055  0.22680412  0.22680412  0.22680412  0.22680412  0.22680412
  0.2233677   0.2233677   0.2233677   0.2233677   0.2233677   0.21993127
  0.21993127  0.2233677   0.2233677   0.2233677   0.2233677   0.2233677
  0.2233677   0.2233677   0.2233677   0.2233677   0.2233677   0.2233677
  0.22680412  0.23024055  0.23024055  0.23024055  0.23024055  0.22680412
  0.22680412  0.22680412  0.22680412  0.22680412  0.22680412  0.22680412
  0.22680412  0.22680412  0.22680412  0.22680412  0.2233677   0.2233677
  0.2233677   0.2233677   0.2233677   0.2233677   0.22680412  0.22680412
  0.22680412  0.22680412  0.2268041
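**For reference, `layer_sizes = [8, 160, 19]` describes 8 inputs, one hidden layer of 160 units and 19 outputs, i.e. one weight matrix and bias vector per pair of adjacent layers. The sketch below is a generic NumPy forward pass for such a network, not the code in NN.py; the sigmoid hidden activation, softmax output and the helper names `init_params` and `forward` are assumptions made for illustration.**

```python
import numpy as np

def init_params(layer_sizes, seed=0):
    """Hypothetical initializer: one (n_in, n_out) weight matrix and bias per layer pair."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, 0.1, size=(n_in, n_out))
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]
    return weights, biases

def forward(x, weights, biases):
    """Sigmoid hidden layers followed by a softmax output (assumed activations)."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))
    z = a @ weights[-1] + biases[-1]
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for a stable softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

weights, biases = init_params([8, 160, 19])
probs = forward(np.zeros((4, 8)), weights, biases)
n_params = sum(W.size for W in weights) + sum(b.size for b in biases)
```

Counting 8 x 160 + 160 weights and 160 x 19 + 19 weights gives 4,499 trainable parameters for this baseline.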

In [38]:
'''Create Children that are Specialized by Features'''
specialize_by = 'features'
children_hidden_layers = [20]
epochs = 50

w_children, b_children, acc_children, accs, best_epochs = Child_Spawner(x_train, y_train, children_hidden_layers, 
                                    specialize_by, balance, alpha, reg, epochs, validation_frac, x_test, y_test)
print('children accuracy is', acc_children)
print('best epochs for children are', best_epochs)

children accuracy is [0.16599839615076184, 0.16680032076984763, 0.1740176423416199, 0.16599839615076184, 0.23817161186848437, 0.20609462710505214, 0.18203688853247796, 0.21331194867682438]
best epochs for children are [0, 0, 38, 0, 14, 19, 5, 47]


In [39]:
'''Try a Synthesis Network with 1 layer stripped away, 1 virgin layer, with interconnections, mutable weights'''
virgin_hidden_layers = [160]
strip_layers = 1
skip_layers = 0
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [1, 1]
epochs = 100

ac1F, er1F, acs1F, e1F = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers,skip_layers, validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.25340817963111467
validation test accuracy per epoch: [ 0.21993127  0.20618557  0.20962199  0.2233677   0.24742268  0.25773196
  0.27147766  0.26804124  0.27491409  0.26804124  0.26460481  0.26116838
  0.25773196  0.25773196  0.25085911  0.24742268  0.24398625  0.24742268
  0.24742268  0.24742268  0.24742268  0.24742268  0.24742268  0.24054983
  0.23367698  0.23367698  0.23367698  0.23024055  0.23367698  0.23367698
  0.23367698  0.23367698  0.23024055  0.23367698  0.23367698  0.23024055
  0.22680412  0.22680412  0.2233677   0.2233677   0.22680412  0.22680412
  0.2233677   0.2233677   0.22680412  0.22680412  0.22680412  0.22680412
  0.22680412  0.22680412  0.22680412  0.22680412  0.2233677   0.22680412
  0.22680412  0.22680412  0.23024055  0.23024055  0.23024055  0.23024055
  0.23024055  0.2233677   0.2233677   0.21993127  0.21649485  0.21993127
  0.21993127  0.21993127  0.21993127  0.21993127  0.21993127  0.21993127
  0.21993127  0.21993127  0.21993127  0.22680412 
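**The feature-specialized run produced 8 children with 20 hidden units each, and the virgin layer here has 160 units, the same as 8 x 20. One way to picture "1 layer stripped away" is that each child's output layer is removed and the children's hidden layers are placed side by side to form one wide layer, with the fresh (virgin) layer and a new output layer stacked on top. The sketch below illustrates that picture under two assumptions we have not confirmed against NN.py: every child sees the full 8-feature input, and all layers use sigmoids. The child weights are random placeholders.**

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_children, child_hidden, n_classes = 8, 8, 20, 19

# Placeholder "trained" children: input->hidden weights and biases only
# (their hidden->output layers are the ones assumed to be stripped away).
child_W1 = [rng.normal(size=(n_features, child_hidden)) for _ in range(n_children)]
child_b1 = [rng.normal(size=child_hidden) for _ in range(n_children)]

# Place the children side by side: one wide layer of 8 * 20 = 160 units.
W_children = np.hstack(child_W1)        # shape (8, 160)
b_children = np.concatenate(child_b1)   # shape (160,)

# Freshly initialized ("virgin") hidden layer and a new output layer on top.
W_virgin = rng.normal(scale=0.1, size=(W_children.shape[1], 160))
W_out = rng.normal(scale=0.1, size=(160, n_classes))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(5, n_features))
h_children = sigmoid(x @ W_children + b_children)  # stacked hidden activations of all children
h_virgin = sigmoid(h_children @ W_virgin)
scores = h_virgin @ W_out
```

Because matrix multiplication distributes over the column blocks, the first 20 columns of `h_children` are exactly what the first child alone would compute.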

In [40]:
'''Try a Synthesis Network with 1 layer stripped away, 1 virgin layer, with interconnections, immutable weights'''
virgin_hidden_layers = [160]
strip_layers = 1
skip_layers = 0
selective_weights = 1
reset_zeros = 0
layer_weights_rand = [1, 1]
epochs = 100

ac1F, er1F, acs1F, e1F = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers,skip_layers, validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.2550120288692863
validation test accuracy per epoch: [ 0.16838488  0.19243986  0.18900344  0.20274914  0.23024055  0.2371134
  0.24742268  0.26116838  0.26460481  0.26460481  0.26804124  0.26804124
  0.26460481  0.26116838  0.26116838  0.26116838  0.25773196  0.25773196
  0.25429553  0.25773196  0.25773196  0.25773196  0.25773196  0.26116838
  0.26116838  0.25773196  0.25429553  0.25429553  0.25085911  0.25085911
  0.24742268  0.24742268  0.24742268  0.24742268  0.24398625  0.24742268
  0.24398625  0.24398625  0.24398625  0.24054983  0.24054983  0.24054983
  0.24054983  0.2371134   0.2371134   0.2371134   0.2371134   0.2371134
  0.23367698  0.2371134   0.23367698  0.2371134   0.24054983  0.24054983
  0.24398625  0.24398625  0.24054983  0.2371134   0.2371134   0.2371134
  0.2371134   0.2371134   0.2371134   0.2371134   0.2371134   0.2371134
  0.2371134   0.2371134   0.2371134   0.2371134   0.2371134   0.2371134
  0.2371134   0.23024055  0.23024055  0.23367698  0.233
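**The two runs above differ only in `selective_weights`, which the docstrings describe as switching the inherited child weights between mutable and immutable. A common general technique for keeping some weights fixed while the rest train is to zero their gradient with a boolean mask. The sketch below shows that technique; the helper `masked_update` is hypothetical, and we are not claiming this is how NN.py implements it.**

```python
import numpy as np

def masked_update(W, grad, frozen_mask, alpha):
    """Gradient step that leaves entries marked True in frozen_mask unchanged."""
    return W - alpha * grad * (~frozen_mask)

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 6))
grad = rng.normal(size=(4, 6))

# Suppose the first 3 columns were inherited from a child network and must stay fixed.
frozen = np.zeros_like(W, dtype=bool)
frozen[:, :3] = True

W_new = masked_update(W, grad, frozen, alpha=0.01)
```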

In [41]:
'''Try a Synthesis Network with 1 layer stripped away, 1 virgin layer, no interconnections, immutable weights'''
virgin_hidden_layers = [160]
strip_layers = 1
skip_layers = 1
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 1]
epochs = 100

ac4, er4, acs4, e4 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers, skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.2574178027265437
validation test accuracy per epoch: [ 0.18556701  0.18556701  0.21305842  0.23367698  0.24054983  0.23024055
  0.23024055  0.22680412  0.21993127  0.22680412  0.24054983  0.25085911
  0.25773196  0.25085911  0.25085911  0.25085911  0.25085911  0.24742268
  0.24398625  0.24742268  0.24742268  0.24742268  0.25085911  0.25085911
  0.25085911  0.25085911  0.25085911  0.24742268  0.25085911  0.24742268
  0.25085911  0.25085911  0.25085911  0.25085911  0.24742268  0.24742268
  0.24742268  0.24398625  0.24398625  0.24398625  0.24398625  0.24398625
  0.24398625  0.24398625  0.24054983  0.2371134   0.2371134   0.2371134
  0.2371134   0.2371134   0.23367698  0.23367698  0.23367698  0.23367698
  0.23024055  0.23024055  0.23024055  0.23024055  0.22680412  0.22680412
  0.2233677   0.2233677   0.2233677   0.2233677   0.2233677   0.2233677
  0.22680412  0.22680412  0.22680412  0.22680412  0.22680412  0.22680412
  0.2233677   0.2233677   0.2233677   0.2233677   0.

In [45]:
'''TEST EVERYTHING MUTABLE'''
'''Try a Synthesis Network with no layers stripped away, 1 virgin layer, with interconnections, mutable weights'''
virgin_hidden_layers = [19]
strip_layers = 0
skip_layers = 0
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 1, 1]
epochs = 100

ac62, er62, acs62, e62 = SNN(x_train, y_train, alpha, reg, epochs,w_children,b_children,strip_layers,skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.16599839615076184
validation test accuracy per epoch: [ 0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058 

In [43]:
'''Try a Synthesis Network with no layers stripped away, 1 virgin layer, mutable interconnections, immutable weights'''
virgin_hidden_layers = [19]
strip_layers = 0
skip_layers = 1
selective_weights = 1
reset_zeros = 0
layer_weights_rand = [0, 0, 1]
epochs = 100

ac3B, er3B, acs3B, e3B = SNN(x_train, y_train, alpha, reg, epochs,w_children,b_children,strip_layers,skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.16599839615076184
validation test accuracy per epoch: [ 0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058  0.18213058
  0.18213058  0.18213058  0.18213058  0.18213058 
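**Several Abalone results above report exactly the same test accuracy, 0.16599839615076184, and the last two runs show a validation accuracy that never changes across 100 epochs. That pattern suggests the network may be predicting a single class for every example. A quick check is the majority-class baseline: the accuracy of always predicting the most frequent training class. The helper `majority_baseline` below is our own illustration and assumes one-hot labels like those produced by `y_bin`; the toy labels are made up.**

```python
import numpy as np

def majority_baseline(y_train_onehot, y_test_onehot):
    """Accuracy of always predicting the most frequent training class."""
    majority = np.argmax(y_train_onehot.sum(axis=0))
    return np.mean(np.argmax(y_test_onehot, axis=1) == majority)

# Toy one-hot labels over 3 classes (not the Abalone data).
y_tr = np.eye(3, dtype=np.int64)[[0, 1, 1, 2, 1]]
y_te = np.eye(3, dtype=np.int64)[[1, 0, 1, 2]]
baseline = majority_baseline(y_tr, y_te)
```

Calling `majority_baseline(y_train, y_test)` on the Abalone split would show whether 0.166 is simply the share of the most common ring count in the test set.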

**After seeing very encouraging results with some of our architectures, especially in how quickly the parent NNs train and reach maximum accuracy compared to a standard FFNN, we decided to test how our architectures would fare on very small data sets. We reasoned that class or feature specialization might be especially effective in those cases, making the most of the small number of examples available to learn from.
We decided to train our models on 5% of the MNIST data set: a small fraction of the original 70,000 images, but still a total of 3,500 examples, about 350 examples per class.**
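**The arithmetic behind the subset size, assuming `preprocess_mldata` draws its 5% from the full 70,000-image MNIST set and the FFNN/SNN calls then hold out `validation_frac = .1` of it: 3,500 examples, 350 of them for validation. A consistency check on the logs below is that every printed validation accuracy should then be a whole number of correct answers out of 350. The handful of values used here are copied from the outputs of In [67] and In [68].**

```python
import numpy as np

# MNIST has 70,000 images in total (60,000 train + 10,000 test).
n_total = 70_000
n_subset = round(n_total * 0.05)    # 3,500 examples
n_val = round(n_subset * 0.1)       # 350 held out for validation
per_class = n_subset // 10          # about 350 per digit if classes are balanced

# Validation accuracies copied from the logs; times 350 they should be whole counts.
printed = np.array([0.88857143, 0.91142857, 0.92285714, 0.91714286])
correct = printed * n_val
```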

In [65]:
'''Setup MNIST'''
dataset = 'mnist'
balance = 0
alpha = 0.1
reg = 0
validation_frac = .1

x_train, y_train, x_test, y_test = preprocess_mldata(dataset, .05)

In [66]:
'''IMBALANCE TEST: Create Children that are Specialized by Class and with imbalanced training datasets'''
balance = 0
specialize_by = 'class'
children_hidden_layers = [20]
epochs = 20

w_children, b_children, acc_children, ep_accs, best_epochs = Child_Spawner(x_train, y_train, children_hidden_layers, specialize_by, 
                                                     balance, alpha, reg, epochs, validation_frac, x_test, y_test)
print('children accuracy is', acc_children)
print('best epochs for children are', best_epochs)

children accuracy is [0.9919998796974391, 0.9912479886915593, 0.9800448127039504, 0.9754282019278485, 0.9818042376577092, 0.9775786102046647, 0.9873231176408668, 0.9819245402186499, 0.9737138904344427, 0.9761049038331403]
best epochs for children are [4, 5, 0, 12, 3, 7, 2, 4, 0, 7]
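**With `balance = 0`, each class-specialized child is trained on imbalanced data: for 10 roughly equal classes, about one positive example for every nine negatives. If the children accuracies above are binary one-vs-rest accuracies (an assumption about `Child_Spawner`), a child that always answered "not my class" would already score about 0.9, which is the more telling reference point for values between 0.97 and 0.99. The helper `one_vs_rest` below is hypothetical and the labels are synthetic.**

```python
import numpy as np

def one_vs_rest(labels, k):
    """Binary targets for a child specialized on class k: 1 for class k, 0 otherwise."""
    return (labels == k).astype(np.int64)

# Synthetic labels: 10 equally frequent classes, 35 examples each (not MNIST itself).
labels = np.repeat(np.arange(10), 35)
y_child = one_vs_rest(labels, 3)

positive_rate = y_child.mean()               # share of positives seen by child 3
always_negative_acc = 1.0 - positive_rate    # accuracy of always predicting "not class 3"
```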


**Once again, we start by running a standard FFNN for comparison purposes.**

In [67]:
'''Test a standard 3 Layer FFNN'''
epochs = 40
layer_sizes = [784, 200, 10]
w, b, errors, accs, best_epoch = FFNN(x_train, y_train, layer_sizes, alpha, reg, 
                    epochs, [], [], 0, 0, np.ones([len(layer_sizes)]), 0, 0, validation_frac)
acc, conf_mat = FFNN_acc(x_test, y_test, layer_sizes, w, b)
print(acc)
print(accs)

0.9114873907878314
[ 0.88857143  0.88571429  0.9         0.89714286  0.90571429  0.90571429
  0.90285714  0.91142857  0.90857143  0.90857143  0.90857143  0.90857143
  0.91142857  0.91142857  0.91428571  0.91142857  0.90857143  0.90857143
  0.90857143  0.90857143  0.90857143  0.90571429  0.90285714  0.90285714
  0.90285714  0.90285714  0.9         0.90285714  0.90571429  0.90285714
  0.9         0.9         0.9         0.9         0.9         0.9         0.9
  0.90285714  0.90285714  0.90285714]


In [68]:
'''Try a Synthesis Network with 1 layer stripped away, 1 virgin layer, no interconnections, immutable weights'''
virgin_hidden_layers = [200]
strip_layers = 1
skip_layers = 1
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 1]
epochs = 40

ac4, er4, acs4, e4 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers, skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.919276981608746
validation test accuracy per epoch: [ 0.91714286  0.91714286  0.91714286  0.92        0.91714286  0.91714286
  0.91714286  0.92        0.92        0.92285714  0.92285714  0.92285714
  0.92        0.92        0.92        0.92        0.92        0.92        0.92
  0.92        0.92        0.92        0.92        0.92        0.92        0.92
  0.92        0.92        0.92        0.92        0.92        0.92        0.92
  0.92        0.92        0.92        0.92        0.92        0.92        0.92      ]
Best Epoch: 9
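**Every Best Epoch value printed in this notebook matches the index (counting from 0) of the first epoch at which validation accuracy reaches its maximum. A minimal sketch of that selection rule, assuming it is how the library picks the epoch, using the first 12 validation accuracies of the run above (the later epochs never exceed 0.92285714):**

```python
import numpy as np

# First 12 validation accuracies from the In [68] run above.
val_accs = np.array([
    0.91714286, 0.91714286, 0.91714286, 0.92, 0.91714286, 0.91714286,
    0.91714286, 0.92, 0.92, 0.92285714, 0.92285714, 0.92285714,
])

# np.argmax returns the first index of the maximum, i.e. the earliest best epoch.
best_epoch = int(np.argmax(val_accs))
```

Taking the first maximum favors the earliest epoch among ties, which is why the runs with flat validation curves report a best epoch of 0.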


In [69]:
'''Try a Synthesis Network with no layers stripped away, no virgin layers, with interconnections, immutable weights'''
virgin_hidden_layers = []
strip_layers = 0
skip_layers = 1
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 1]
epochs = 40

ac4, er4, acs4, e4 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers, skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9180438803591031
validation test accuracy per epoch: [ 0.91714286  0.91714286  0.92        0.92        0.92        0.92        0.92
  0.92        0.92285714  0.92285714  0.92285714  0.92285714  0.92285714
  0.92285714  0.92285714  0.92285714  0.92285714  0.92285714  0.92285714
  0.92285714  0.92285714  0.92285714  0.92        0.92        0.92        0.92
  0.92        0.92        0.92        0.92        0.92        0.92        0.92
  0.92        0.92        0.92        0.92        0.92        0.92        0.92      ]
Best Epoch: 8


In [70]:
'''Try a Synthesis Network with no layers stripped away, no virgin layers, no interconnections, immutable weights'''
virgin_hidden_layers = []
strip_layers = 0
skip_layers = 0
selective_weights = 0
reset_zeros = 0
layer_weights_rand = [0, 0]
epochs = 40

ac4, er4, acs4, e4 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers, skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9193371328892164
validation test accuracy per epoch: [ 0.91428571  0.91714286  0.91714286  0.92        0.92        0.92        0.92
  0.92        0.92        0.92        0.92        0.91714286  0.91714286
  0.92        0.92        0.92        0.91714286  0.91714286  0.91714286
  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286
  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286
  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286
  0.91714286  0.91714286  0.91714286]
Best Epoch: 3


In [71]:
'''Try a Synthesis Network with no layers stripped away, no virgin layers, no interconnections, immutable weights'''
virgin_hidden_layers = []
strip_layers = 0
skip_layers = 0
selective_weights = 1
reset_zeros = 1
layer_weights_rand = [0, 0]
epochs = 40

ac4, er4, acs4, e4 = SNN(x_train, y_train, alpha, reg, epochs, w_children, b_children, strip_layers, skip_layers,validation_frac,
    layer_weights_rand, selective_weights, reset_zeros, children_hidden_layers, virgin_hidden_layers, x_test, y_test)

test set accuracy: 0.9179987668987504
validation test accuracy per epoch: [ 0.91714286  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286
  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286
  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286
  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286
  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286
  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286  0.91714286
  0.91714286  0.91714286  0.91714286  0.91714286]
Best Epoch: 0
