# Introduction
In this document, I create a neural network and explore methods of training it.
A neural network consists of layers of nodes. Each node passes its output to every node in the next layer. We call the connections between nodes in adjacent layers "edges". The number of edges connected to each node is equal to the sum of the number of nodes in the two adjacent layers. Each edge is assigned its own weight. To get the output of a node, each input is multiplied by the weight of the edge it arrives on, and the products are summed. The sum is then passed through a function, usually referred to as an activation function. I used the hyperbolic tangent function.
After the inputs have passed all the way through the network, the final output is compared to the known output. Then, the accuracy of the network is determined. Finally, a program adjusts the weights to improve the accuracy (or minimize the error).
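
As a concrete illustration of the node computation described above, here is a minimal sketch of a single node with two inputs. The function name `node_output` and the example numbers are made up for illustration; they are not part of the notebook's code.

```python
import math

def node_output(inputs, weights):
    # multiply each input by the weight of its edge and sum the products
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    # squash the sum into (-1, 1) with the hyperbolic tangent
    return math.tanh(weighted_sum)

# two inputs, two weights: 0.5*1.0 + (-0.25)*2.0 = 0.0, and tanh(0.0) = 0.0
print(node_output([1.0, 2.0], [0.5, -0.25]))
```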

## Creating a Neural Network
Below are two functions that I use to create a randomly generated neural network. The first function, createLayer, creates a layer of nodes. Each node is a list of edges (randomly generated weights between -1 and 1) connecting it to the nodes in the previous layer. The second function, createNetwork, creates a list of layers. The final structure is a list (the network) of lists (the layers) of lists (the nodes) of floats (the weights).
Although createNetwork creates a network whose layers all contain the same number of nodes (apart from the single-node output layer), the createLayer function could be used to create networks with layers containing different numbers of nodes.

In [2]:
import math
import random
import copy

#creates a layer of random node weights
def createLayer(node_number, input_number):
    layer = []
    #creates each node
    for i in range(node_number):
        node = []
        #creates weights for each node (number of weights = number of inputs node gets = number of nodes in prev. layer)
        for j in range(input_number): 
            node.append((0.5-random.random())*2)
        layer.append(node)
    return layer
    
#creates a network of arrays
def createNetwork(hid_layer_number, init_input_number, layer_size):
    layers = []
    #creates an input layer 
    layers.append(createLayer(layer_size, init_input_number))
    for i in range(hid_layer_number):
        layers.append(createLayer(layer_size, layer_size))
    layers.append(createLayer(1, layer_size))
    return layers
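
Since createLayer takes the node count and the input count separately, a network whose layers differ in size can be built by chaining calls. The sketch below is one possible way to do that. `createNetworkFromSizes` and its `layer_sizes` parameter are hypothetical names, not part of the original code, and createLayer is repeated in compact form so the block runs on its own.

```python
import random

def createLayer(node_number, input_number):
    # same idea as createLayer above: random weights between -1 and 1
    return [[(0.5 - random.random()) * 2 for _ in range(input_number)]
            for _ in range(node_number)]

def createNetworkFromSizes(init_input_number, layer_sizes):
    # hypothetical helper: layer_sizes lists the node count of every layer,
    # e.g. [4, 3, 1] means 4 nodes, then 3 nodes, then a single output node
    layers = []
    input_number = init_input_number
    for size in layer_sizes:
        layers.append(createLayer(size, input_number))
        # the next layer receives one input per node in this layer
        input_number = size
    return layers

network = createNetworkFromSizes(2, [4, 3, 1])
print([len(layer) for layer in network])     # nodes per layer
print([len(layer[0]) for layer in network])  # weights per node
```

Each node ends up with as many weights as there are nodes in the layer before it, which is the same rule createNetwork follows.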

## Getting the Output of the Network
Each node is passed a value from every node in the previous layer (the first layer is passed the input). To compute a node's output, you multiply each input by its weight and sum the products. This sum is then passed through an activation function. In this case, I used the hyperbolic tangent function, but other bounded functions could be used.
The function createOutput computes this value. 
The next function, runNetwork, creates the final output of the network. For the first layer, runNetwork passes in the input data, and gets a list of outputs from each node in that layer. It then runs createOutput for the next layer, this time with the outputs of the previous layer as inputs. The final layer has a single node, and produces a single output. 

In [3]:
def createOutput(input_list, weight_list):
    weighted_sum = 0
    #takes the dot product of the input list and weight list, then feeds it through the hyperbolic tangent function
    for i in range(len(input_list)):
        weighted_sum+=(weight_list[i]*input_list[i])
    #takes the hyperbolic tangent of the weighted sum
    return math.tanh(weighted_sum)

def runNetwork(Input, network):
    #iterates through layers
    for i in range(len(network)): 
        #iterates through nodes in layers
        output_list = []
        #creates the outputs of each layer, then feeds them into the next layer
        for j in range(len(network[i])):
            output_list.append(createOutput(Input, network[i][j]))
        Input = output_list
    return output_list[0]
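
To make the forward pass concrete, here is a small sketch that runs a hand-written network through the same layer-by-layer logic. The weights and the helper name `forward` are illustrative assumptions; the logic is meant to mirror createOutput and runNetwork above, written compactly so the block runs on its own.

```python
import math

def forward(inputs, network):
    # each layer's outputs become the next layer's inputs
    for layer in network:
        inputs = [math.tanh(sum(w * x for w, x in zip(node, inputs)))
                  for node in layer]
    # the last layer has a single node, so return its one output
    return inputs[0]

# hand-picked weights: 2 inputs -> 2 nodes -> 1 output node
tiny_network = [
    [[1.0, -1.0], [-1.0, 1.0]],   # first layer: each node sees x and y
    [[1.0, -1.0]],                # output layer: one node, two inputs
]

print(forward([0.8, 0.2], tiny_network))  # positive, since x > y
print(forward([0.2, 0.8], tiny_network))  # negative, since x < y
```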

## Starting the Project
When I started the project, I used some simple data. I wanted my network to take 2 inputs, x and y, and compute whether x was greater than y. To do this, I created a dataset that had an output of -1 if x < y, and an output of 1 if x > y. 
I had my neural network round the final output so it would give me a binary result. Then, I could feed all the data through the network and compute the percent of results that were accurate, and try to maximize that. 
I tried a few methods to maximize accuracy. First, I simply replaced each weight with a new, randomly generated weight. If the accuracy of the network was greater than before, I kept the weight. If not, I threw it out and tried again. 

In [4]:
def runNetworkBinary(Input, network):
    #iterates through layers
    for i in range(len(network)): 
        #iterates through nodes in layers
        output_list = []
        #creates the outputs of each layer, then feeds them into the next layer
        for j in range(len(network[i])):
            output_list.append(createOutput(Input, network[i][j]))
        Input = output_list
    #ROUNDS the output to get a binary output (-1, 0, or 1)
    return round(output_list[0])

def createDataBinary(length): 
    data_list = []
    output_list = []
    for i in range(length):
        x = random.random()
        y = random.random()
        #compares x and y directly (dividing x by y would fail if y were 0)
        if x > y: 
            above_line = 1
        elif x == y: 
            above_line = 0
        else: 
            above_line = -1
            
        data_list.append([x, y])
        output_list.append(above_line)
    return data_list, output_list

def getAccuracy(input_list, output_list, network):
    accurate_total = 0
    predicted_output = []
    for i in range(len(input_list)):
        output = runNetworkBinary(input_list[i], network)
        predicted_output.append(output)
        if output == output_list[i]:
            accurate_total+=1
    accuracy = accurate_total/len(output_list) * 100
    return accuracy, predicted_output

    
#randomly adjusts the weights until it finds a better one  
def trainRandom(input_list, output_list, network):
    oldAccuracy = getAccuracy(input_list, output_list, network)[0]
    print("Initial Accuracy:", oldAccuracy, "%")
    for i in range(len(network)):
        for j in range(len(network[i])):
            for k in range(len(network[i][j])):
                #stores the old weight (a float is immutable, so plain assignment is enough)
                old_weight = network[i][j][k]
                #picks a new random value for network[i][j][k]
                network[i][j][k] = 2*(random.random()-0.5) #assigns a new random value to the edge
                newAccuracy = getAccuracy(input_list, output_list, network)[0] 
                tries=0
                while newAccuracy <= oldAccuracy:
                    tries+=1
                    network[i][j][k] = 2*(random.random()-0.5)
                    newAccuracy = getAccuracy(input_list, output_list, network)[0]
                    if tries>250:
                        tries=0
                        network[i][j][k] = old_weight
                        print("no better weight found")
                        break
                tries=0
                if newAccuracy > oldAccuracy:
                    oldAccuracy = newAccuracy
                    print(oldAccuracy)
    return network, oldAccuracy


Below, I run the function that randomly changes each weight until it finds a better one. If it has tried 250 times without improvement, the function restores the old weight and moves on to the next weight. Interestingly, if the neural network is particularly small, sometimes the function doesn't get past 0.0% accuracy. Other times, it moves past 0.0% fairly soon and then improves quickly. This also depends on the size of the network: the larger it is, the more accurate (or at least closer to random guessing) it is at the start. (You should run the cell below a few times to see for yourself.)
This system didn't work very well.
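
One possible contributor to the 0.0% accuracies is the rounding step. Because tanh outputs lie strictly between -1 and 1, round() maps any output with a magnitude below 0.5 to 0, which can never equal a label of -1 or 1. The sketch below illustrates this and shows a sign-based threshold as one alternative. `to_label` is a hypothetical helper, not something this notebook uses.

```python
def to_label(output):
    # hypothetical alternative to round(): classify by sign only
    # (an output of exactly 0 is arbitrarily assigned to -1 here)
    return 1 if output > 0 else -1

for output in [0.9, 0.4, -0.3, -0.7]:
    # round() sends 0.4 and -0.3 to 0; to_label sends them to 1 and -1
    print(output, round(output), to_label(output))
```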

In [12]:
input_list, output_list = createDataBinary(100)
Network = createNetwork(3, 2, 5)
trainRandom(input_list, output_list, Network)

Initial Accuracy: 47.0 %
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better weight found
no better

([[[-0.9848969856379275, -0.4288850071224177],
   [-0.31590984652007115, -0.8431560176025068],
   [-0.45636020210272776, -0.8023920065529486],
   [0.7015539133813553, 0.9238309808234924],
   [-0.20830101968296133, -0.8996568231111568]],
  [[0.6015437212329693,
    0.6723850132964759,
    0.38968648703624376,
    -0.9351444817056864,
    0.6198161392084254],
   [-0.3760584101711655,
    0.08510160093227381,
    -0.22266831273422638,
    0.5343761437248851,
    -0.9224251725869177],
   [-0.9961571383554302,
    0.1062217094511202,
    -0.48440131146000565,
    -0.9111369978264869,
    -0.6063941839122344],
   [-0.06397099787648175,
    -0.5673572953642756,
    0.27067494493054434,
    -0.5713073498018966,
    0.7615428214008892],
   [0.17876598067810834,
    0.5807275017104541,
    -0.44354361870954384,
    -0.287176341114068,
    0.3986534287446135]],
  [[-0.6182693883947992,
    -0.6037925503498205,
    0.5009123786558063,
    -0.22336929417896156,
    0.24855874578678194],
   [-0.9792

Next, I decided to find a network that was doing OK initially, and then change each weight by a little bit until the function was more accurate. 

In [9]:
def getOKNetwork(input_list, output_list):
    network = createNetwork(1, 2, 2) #one hidden layer, 2 inputs, 2 nodes/layer
    accuracy = getAccuracy(input_list, output_list, network)[0]
    print(accuracy)
    for i in range(200):
        new_network = createNetwork(1, 2, 4)
        new_accuracy = getAccuracy(input_list, output_list, new_network)[0]
        if new_accuracy > accuracy:
            print("found better network")
            network = new_network
            accuracy = new_accuracy
    print("Best Accuracy:", accuracy)
    return network, accuracy


def trainStep(input_list, output_list, network, scale_init):
    scale = 1
    for x in range(3):
        #makes the scale smaller and smaller with every repeat
        scale *= scale_init
        for y in range(4):
            oldAccuracy = getAccuracy(input_list, output_list, network)[0]
            print("Initial Accuracy:", oldAccuracy, "%")
            for i in range(len(network)):
                for j in range(len(network[i])):
                    for k in range(len(network[i][j])):
                        #copies the old weight
                        old_weight = copy.copy(network[i][j][k])
                        #adjusts the weight by a little bit
                        network[i][j][k] += scale * (0.5-random.random())*2.0
                        #gets the new accuracy
                        newAccuracy = getAccuracy(input_list, output_list, network)[0]
                        tries=0
                        while newAccuracy <= oldAccuracy:
                            tries+=1
                            network[i][j][k] = old_weight
                            network[i][j][k] += scale * (0.5-random.random())*2.0
                            newAccuracy = getAccuracy(input_list, output_list, network)[0]
                            if tries>250:
                                network[i][j][k] = old_weight
                                print("no better weight found")
                                break
                        tries=0
                        if newAccuracy > oldAccuracy: 
                            oldAccuracy = newAccuracy
                            print(oldAccuracy)
    return network, oldAccuracy
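
The accept-if-better loop with a shrinking step size can be seen in isolation on a one-variable problem. The sketch below is a simplified illustration, not a replacement for trainStep: `hill_climb`, its parameters, and the quadratic `loss` are all assumptions chosen for the example, and it minimizes a loss rather than maximizing accuracy.

```python
import random

def hill_climb(loss, start, scale_init=0.5, rounds=3, tries=50, seed=0):
    rng = random.Random(seed)  # seeded so the example is repeatable
    best, best_loss = start, loss(start)
    scale = 1.0
    for _ in range(rounds):
        # makes the perturbations smaller with every round
        scale *= scale_init
        for _ in range(tries):
            # nudges the current best value by a random amount within the scale
            candidate = best + scale * (0.5 - rng.random()) * 2.0
            candidate_loss = loss(candidate)
            # keeps the candidate only if it is strictly better
            if candidate_loss < best_loss:
                best, best_loss = candidate, candidate_loss
    return best, best_loss

loss = lambda w: (w - 0.3) ** 2   # toy loss with its minimum at w = 0.3
best, best_loss = hill_climb(loss, start=1.0)
print(best, best_loss)
```

Because a candidate is only kept when it is strictly better, the loss can never get worse from one step to the next.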

In the cell below, I get some OK networks. Jupyter is behaving strangely: if I run this in Spyder, I get different accuracies and networks every time I run the function. This is true for trainRandom as well. Here, you only get a new one if you restart the kernel. I don't know why, but I think it has something to do with the way Jupyter stores variables or produces random numbers.
The Spyder files that are also in the GitHub repository work, though they're not formatted nicely.


In [15]:
Network2, acc = getOKNetwork(input_list, output_list)
Network3, acc = getOKNetwork(input_list, output_list)
#print("Start Step Training")
#trainStep(input_list, output_list, Network, 0.5)


## Documenting how the accuracy changes when weights are changed
This function cycles through a network. It adds a little bit to a weight, records the resulting change in accuracy in another array of the same structure, then resets the network and does it again for the next weight.
I had A LOT of trouble with this. It kept giving me different accuracies when I called the function multiple times with the same input.
I spent probably four hours trying to debug the program. Finally, I figured out that the problem was that when I said networkCopy = network, I wasn't actually making a copy of the network, I was just making a second reference to the same object. I fixed it by saying networkCopy = copy.deepcopy(network).
The function returns an array of the change in accuracy, in the same structure as the original network, and an array of the derivative of accuracy (change in accuracy divided by step size).
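
The aliasing problem described above is easy to reproduce on a tiny nested list. This is a standalone illustration with made-up values; it also shows why copy.copy would not be enough for a nested structure like the network.

```python
import copy

network = [[[0.1, 0.2]], [[0.3]]]

alias = network                  # same object, just a second name
shallow = copy.copy(network)     # new outer list, but the inner lists are shared
deep = copy.deepcopy(network)    # fully independent copy

network[0][0][0] = 9.9

print(alias[0][0][0])    # 9.9, the alias sees the change
print(shallow[0][0][0])  # 9.9, the inner lists are still shared
print(deep[0][0][0])     # 0.1, the deep copy is unaffected
```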

In [None]:
def accArray(input_list, output_list, network, step):
    initial_acc = getAccuracy(input_list, output_list, network)[0]
    print("Initial Accuracy:", initial_acc)
    #both result arrays start as deep copies so they have the same structure as the network
    acc_array = copy.deepcopy(network)
    deriv_array = copy.deepcopy(network)
    for i in range(len(network)):
        for j in range(len(network[i])):
            for k in range(len(network[i][j])):
                #initializes networkCopy to be the same as network
                networkCopy = copy.deepcopy(network)
                networkCopy[i][j][k] += step
                #change in accuracy = new accuracy - initial accuracy
                acc_array[i][j][k] = getAccuracy(input_list, output_list, networkCopy)[0] - initial_acc
                deriv_array[i][j][k] = acc_array[i][j][k]/step
    return acc_array, deriv_array

#0.1 is an arbitrary example step size
print(accArray(input_list, output_list, Network2, 0.1))
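
The derivative array is a forward finite-difference estimate: the change in the measured quantity divided by the step size. Here is that idea on a smooth one-variable function where the true derivative is known. The names `finite_difference` and `f` are illustrative assumptions. Note that accuracy only changes in jumps of one example at a time, so for small steps many entries of the array can be exactly zero.

```python
def finite_difference(f, w, step):
    # forward difference: (f(w + step) - f(w)) / step
    return (f(w + step) - f(w)) / step

f = lambda w: w ** 2          # true derivative is 2*w
estimate = finite_difference(f, 3.0, 1e-6)
print(estimate)               # close to 6.0
```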

## I am switching back to spyder because the problem discussed above with generating networks is annoying me, and I do not want to fix it. Please look in the backpropogation.py file (not all the code above is in that file, only new stuff). More writing is below. 

Some writing about what is in that other file:
- I started looking more closely into backpropagation, then decided not to do it. However, I decided to change from calculating accuracy to calculating a squared error value. I also stopped rounding my outputs. So, even though the actual outputs of my data set are binary, the network produces floats that are close to the binary values.
- The function getSquaredError gets the average squared error over the data set.
- The function errArray creates an array of the difference between squared errors after changing a weight (it works exactly like accArray, but with squared errors instead of accuracies).
- The function scaleWeight (the last function in the file) takes each weight and adds to it the step size used in errArray times the change in error. It repeats this until the error gets worse or it has done so five times. Then, it moves on to the next weight.
- The code at the very bottom creates a dataset. Then, it creates a randomly generated neural network that is pretty good (using the getOKNetwork function). Next, it trains the network using the scaleError function 100 times, printing the new accuracy each time. Finally, the program creates a new dataset, which is then fed through the improved network. The program then prints the actual and predicted values for the new dataset.
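
Since the code in backpropogation.py isn't shown in this notebook, here is only a rough sketch of what an average squared error computation could look like. It is not the actual getSquaredError from that file; the name `mean_squared_error` and the sample numbers are assumptions for illustration.

```python
def mean_squared_error(predicted, actual):
    # average of (prediction - target)^2 over the data set
    total = 0.0
    for p, a in zip(predicted, actual):
        total += (p - a) ** 2
    return total / len(actual)

# unrounded network outputs compared with binary targets
predicted = [0.5, -0.5, 1.0]
actual = [1, -1, 1]
print(mean_squared_error(predicted, actual))
```

Unlike accuracy, this value changes whenever an unrounded output moves closer to or further from its target, even if the rounded prediction would stay the same.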