# Artificial neural networks 
In this section we will look at what is arguably the most common type of artificial neural network, a feed-forward network with backpropagation. Feed-forward means the signal is generally moving in one direction through the network. Backpropagation means we will determine errors at the end of each signal’s traversal through the network and try to distribute fixes for those errors back through the network, especially affecting the neurons that were most responsible for them.
## Neurons
The smallest unit in an artificial neural network is a neuron. It holds a vector of weights, which are just floating-point numbers. A vector of inputs (also just floating-point numbers) is passed to the neuron. It combines those inputs with its weights using a dot product. It then runs an activation function on that product and spits the result out as its output. This action can be thought of as analogous to a real neuron firing.
An activation function transforms the neuron’s weighted sum into its output. It is almost always nonlinear, which allows neural networks to represent solutions to nonlinear problems.
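
As a minimal sketch of the computation just described, the following standalone snippet pushes two inputs through one neuron. The helper name `neuron_output`, the weights, the inputs, and the choice of a sigmoid as the activation function are all illustrative assumptions, not values from a trained network.

```python
from math import exp
from typing import List

def neuron_output(inputs: List[float], weights: List[float]) -> float:
    # hypothetical helper: weighted sum (dot product) followed by an activation
    weighted_sum: float = sum(i * w for i, w in zip(inputs, weights))
    return 1.0 / (1.0 + exp(-weighted_sum))  # sigmoid squashes the sum into (0, 1)

# 1.0 * 0.4 + 0.5 * -0.8 = 0.0, and sigmoid(0.0) = 0.5
print(neuron_output([1.0, 0.5], [0.4, -0.8]))
```
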
## Layers
In a typical feed-forward artificial neural network, neurons are organized in layers. Each layer consists of a certain number of neurons lined up in a row or column. In a feed-forward network, which is what we will be building, signals always pass in a single direction from one layer to the next. The neurons in each layer send their output signal to be used as input to the neurons in the next layer. Every neuron in each layer is connected to every neuron in the next layer. 
The first layer is known as the input layer, and it receives its signals from some external entity. The last layer is known as the output layer, and its output typically must be interpreted by an external actor to get an intelligent result. The layers between the input and output layers are known as hidden layers. 
Imagine that the network was designed to classify small black-and-white images of animals. Perhaps the input layer has 100 neurons representing the grayscale intensity of each pixel in a 10 x 10 pixel animal image, and the output layer has 5 neurons representing the likelihood that the image is of a mammal, reptile, amphibian, fish, or bird. The final classification could be determined by the output neuron with the highest floating-point output. If the output numbers were 0.24, 0.65, 0.70, 0.12, and 0.21, respectively, the image would be determined to be an amphibian.
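
The interpretation step in this example can be sketched in a few lines. The names `LABELS` and `classify` are assumptions made for illustration; the output values are the ones from the paragraph above.

```python
from typing import List

LABELS: List[str] = ["mammal", "reptile", "amphibian", "fish", "bird"]

def classify(outputs: List[float]) -> str:
    # pick the label whose output neuron produced the largest value
    best_index: int = max(range(len(outputs)), key=lambda i: outputs[i])
    return LABELS[best_index]

print(classify([0.24, 0.65, 0.70, 0.12, 0.21]))  # amphibian
```
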
## Backpropagation
Backpropagation finds the error in a neural network’s output and uses it to modify the weights of neurons. The neurons most responsible for the error are most heavily modified.
Before they can be used, most neural networks must be trained. We must know the right outputs for some inputs so that we can use the difference between expected outputs and actual outputs to find errors and modify weights. In other words, neural networks know nothing until they are told the right answers for a certain set of inputs, so that they can prepare themselves for other inputs. Backpropagation only occurs during training.
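
For reference, the update scheme that the code in this section follows can be written compactly. The symbols are notation introduced here for illustration: $z$ is a neuron’s weighted sum before activation, $o = f(z)$ is its output, $t$ is the expected output, $\eta$ is the learning rate, $x_i$ is the input arriving on weight $w_i$, and $w_{kj}$ is the weight that neuron $k$ in the next layer applies to the output of neuron $j$ in the current layer.

$$
\begin{aligned}
\delta_{\text{output}} &= f'(z)\,(t - o) \\
\delta_{j} &= f'(z_j) \sum_{k} w_{kj}\,\delta_{k} \\
w_i &\leftarrow w_i + \eta\, x_i\, \delta
\end{aligned}
$$

The first line applies to output-layer neurons, the second to hidden-layer neurons, and the third to every weight once all deltas are known.
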
## Dot product
Dot products are required both for the feed-forward phase and for the backpropagation phase. Luckily, a dot product is simple to implement using the Python built-in functions zip() and sum().

In [1]:
from typing import List
from math import exp

# dot product of two vectors
def dot_product(xs: List[float], ys: List[float]) -> float:
    return sum(x * y for x, y in zip(xs, ys))
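
A short usage sketch follows, with the function repeated so the snippet runs on its own. One behavior worth knowing is that zip() stops at the shorter of its arguments, so vectors of unequal length do not raise an error. The stricter `checked_dot_product` is a hypothetical variant, shown only as one way to guard against that.

```python
from typing import List

def dot_product(xs: List[float], ys: List[float]) -> float:
    return sum(x * y for x, y in zip(xs, ys))

# 1*4 + 2*5 + 3*6 = 32
print(dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0

# zip() stops at the shorter argument, so only the first two pairs are used
print(dot_product([1.0, 2.0, 3.0], [4.0, 5.0]))  # 14.0

def checked_dot_product(xs: List[float], ys: List[float]) -> float:
    # hypothetical stricter variant that rejects mismatched lengths
    if len(xs) != len(ys):
        raise ValueError("vectors must be the same length")
    return dot_product(xs, ys)
```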

## The activation function 
The activation function has two purposes: It allows the neural network to represent solutions that are not just linear transformations (as long as the activation function itself is not just a linear transformation), and it can keep the output of each neuron within a certain range. An activation function should have a computable derivative so that it can be used for backpropagation. 

In [2]:
# the classic sigmoid activation function
def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))

def derivative_sigmoid(x: float) -> float:
    sig: float = sigmoid(x)
    return sig * (1 - sig) 
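
The sketch below checks a few properties of these two functions, which are repeated so the snippet runs on its own. It also shows that exp() overflows for very negative inputs. The `stable_sigmoid` variant is an assumption added for illustration and is not used by the rest of the code in this section.

```python
from math import exp

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))

def derivative_sigmoid(x: float) -> float:
    sig: float = sigmoid(x)
    return sig * (1 - sig)

def stable_sigmoid(x: float) -> float:
    # hypothetical variant: never calls exp() on a large positive number
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    e: float = exp(x)
    return e / (1.0 + e)

print(sigmoid(0.0), derivative_sigmoid(0.0))  # 0.5 0.25

# compare the analytic derivative with a central finite difference
h: float = 1e-6
numeric: float = (sigmoid(1.0 + h) - sigmoid(1.0 - h)) / (2 * h)
print(abs(numeric - derivative_sigmoid(1.0)) < 1e-8)

try:
    sigmoid(-1000.0)  # exp(1000.0) is too large for a float
except OverflowError:
    print("overflow; stable_sigmoid(-1000.0) =", stable_sigmoid(-1000.0))
```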

## Implementing neurons 
An individual neuron will store many pieces of state, including its weights, its delta, its learning rate, a cache of its last output, and its activation function, along with the derivative of that activation function. Some of these elements could be more efficiently stored up a level (in the future Layer class), but they are included in the following Neuron class for illustrative purposes. 

In [3]:
from typing import List, Callable

class Neuron:
    def __init__(self, weights: List[float], learning_rate: float,
                 activation_function: Callable[[float], float],
                 derivative_activation_function: Callable[[float], float]) -> None:
        self.weights: List[float] = weights
        self.activation_function: Callable[[float], float] = activation_function
        self.derivative_activation_function: Callable[[float], float] = derivative_activation_function
        self.learning_rate: float = learning_rate
        self.output_cache: float = 0.0
        self.delta: float = 0.0

    def output(self, inputs: List[float]) -> float:
        # cache the weighted sum (before activation); backpropagation needs it
        self.output_cache = dot_product(inputs, self.weights)
        return self.activation_function(self.output_cache)

## Implementing layers 
A layer in our network will need to maintain three pieces of state: its neurons, the layer that preceded it, and an output cache. The output cache is similar to that of a neuron, but up one level. It caches the outputs (after activation functions are applied) of every neuron in the layer.

At creation time, a layer’s main responsibility is to initialize its neurons. Our Layer class’s __init__() method therefore needs to know how many neurons it should be initializing, what their activation functions should be, and what their learning rates should be. In this simple network, every neuron in a layer has the same activation function and learning rate. 

In [4]:
from __future__ import annotations
from typing import List, Callable, Optional
from random import random

class Layer:
    def __init__(self, previous_layer: Optional[Layer], num_neurons: int, learning_rate: float, 
                 activation_function: Callable[[float], float], 
                 derivative_activation_function: Callable[[float], float]) -> None:
        
        self.previous_layer: Optional[Layer] = previous_layer
        self.neurons: List[Neuron] = []
        
            
        # the following could all be one large list comprehension 
        for i in range(num_neurons):
            if previous_layer is None:
                random_weights: List[float] = []
            else:
                random_weights = [random() for _ in range(len(previous_layer.neurons))]
            neuron: Neuron = Neuron(random_weights, learning_rate, activation_function, 
                                    derivative_activation_function)
            self.neurons.append(neuron)
            
        self.output_cache: List[float] = [0.0 for _ in range(num_neurons)]
            
    def outputs(self, inputs: List[float]) -> List[float]:
        if self.previous_layer is None:
            self.output_cache = inputs
        else:
            self.output_cache = [n.output(inputs) for n in self.neurons]
        return self.output_cache
    
    # should only be called on output layer
    def calculate_deltas_for_output_layer(self, expected: List[float]) -> None:
        for n in range(len(self.neurons)):
            self.neurons[n].delta = self.neurons[n].derivative_activation_function(
                    self.neurons[n].output_cache) * (expected[n] - self.output_cache[n])

    # should not be called on output layer
    def calculate_deltas_for_hidden_layer(self, next_layer: Layer) -> None:
        for index, neuron in enumerate(self.neurons):
            next_weights: List[float] = [n.weights[index] for n in next_layer.neurons]
            next_deltas: List[float] = [n.delta for n in next_layer.neurons]
            sum_weights_and_deltas: float = dot_product(next_weights, next_deltas)
            neuron.delta = neuron.derivative_activation_function(neuron.output_cache) * sum_weights_and_deltas
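
The indexing in calculate_deltas_for_hidden_layer() is easier to follow with concrete numbers. Each neuron in the next layer holds one weight per neuron in the current layer, so the hidden neuron at position `index` collects the weight at that position from every next-layer neuron. The sketch below uses plain lists; the function name `hidden_delta` and all values are made up for illustration.

```python
from typing import List

def hidden_delta(index: int, derivative_at_cache: float,
                 next_layer_weights: List[List[float]],
                 next_layer_deltas: List[float]) -> float:
    # hypothetical standalone version of the per-neuron step in
    # calculate_deltas_for_hidden_layer(), using plain lists
    next_weights: List[float] = [ws[index] for ws in next_layer_weights]
    weighted_error: float = sum(w * d for w, d in zip(next_weights, next_layer_deltas))
    return derivative_at_cache * weighted_error

# two neurons in the next layer, each with one weight per hidden neuron
weights: List[List[float]] = [[0.5, -1.0],  # next-layer neuron 0
                              [2.0, 4.0]]   # next-layer neuron 1
deltas: List[float] = [0.5, 0.25]

# hidden neuron 0: 0.25 * (0.5 * 0.5 + 2.0 * 0.25) = 0.25 * 0.75 = 0.1875
print(hidden_delta(0, 0.25, weights, deltas))
```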

## Implementing the network 
The network itself has only one piece of state: the layers that it manages. The Network class is responsible for initializing its constituent layers. 

In [5]:
from __future__ import annotations
from typing import List, Callable, TypeVar, Tuple
from functools import reduce

T = TypeVar('T') # output type of interpretation of neural network

class Network:
    def __init__(self, layer_structure: List[int], learning_rate: float,
                 activation_function: Callable[[float], float] = sigmoid,
                 derivative_activation_function: Callable[[float], float] = derivative_sigmoid) -> None:
        if len(layer_structure) < 3:
            raise ValueError("Error: Should be at least 3 layers (1 input, 1 hidden, 1 output)")
        self.layers: List[Layer] = []
        # input layer
        input_layer: Layer = Layer(None, layer_structure[0], learning_rate, 
                                   activation_function, derivative_activation_function)
        self.layers.append(input_layer)
        # hidden layers and output layer
        for previous, num_neurons in enumerate(layer_structure[1:]):
            next_layer = Layer(self.layers[previous], num_neurons, learning_rate, 
                               activation_function, derivative_activation_function)
            self.layers.append(next_layer)
            
    # Pushes input data to the first layer, then output from the first
    # as input to the second, second to the third, etc.
    def outputs(self, input: List[float]) -> List[float]:
        return reduce(lambda inputs, layer: layer.outputs(inputs), self.layers, input)
    
    # Figure out each neuron's changes based on the errors of the output
    # versus the expected outcome
    def backpropagate(self, expected: List[float]) -> None:
        # calculate delta for output layer neurons
        last_layer: int = len(self.layers) - 1
        self.layers[last_layer].calculate_deltas_for_output_layer(expected)
        # calculate delta for hidden layers in reverse order
        for l in range(last_layer - 1, 0, -1):
            self.layers[l].calculate_deltas_for_hidden_layer(self.layers[l + 1])
            
    # backpropagate() doesn't actually change any weights
    # this function uses the deltas calculated in backpropagate() to
    # actually make changes to the weights
    def update_weights(self) -> None:
        for layer in self.layers[1:]: # skip input layer
            for neuron in layer.neurons:
                for w in range(len(neuron.weights)):
                    neuron.weights[w] = neuron.weights[w] + (neuron.learning_rate
                        * (layer.previous_layer.output_cache[w]) * neuron.delta)
                    
    # train() uses the results of outputs() run over many inputs and compared
    # against expecteds to feed backpropagate() and update_weights()
    def train(self, inputs: List[List[float]], expecteds: List[List[float]]) -> None:
        for location, xs in enumerate(inputs):
            ys: List[float] = expecteds[location]
            self.outputs(xs) # forward pass; fills the output caches used below
            self.backpropagate(ys)
            self.update_weights()
            
    # for generalized results that require classification
    # this function will return the number of correct trials, the total
    # number of trials, and the fraction correct (from 0.0 to 1.0)
    def validate(self, inputs: List[List[float]], expecteds: List[T],
                 interpret_output: Callable[[List[float]], T]) -> Tuple[int, int, float]:
        correct: int = 0
        for input, expected in zip(inputs, expecteds):
            result: T = interpret_output(self.outputs(input))
            if result == expected:
                correct += 1
        percentage: float = correct / len(inputs)
        return correct, len(inputs), percentage
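
To see the forward pass, delta calculation, and weight update working together, here is a sketch that repeats them for one sigmoid output neuron with no hidden layers. The helper `train_step` and the numbers are assumptions for illustration; the arithmetic mirrors calculate_deltas_for_output_layer() and update_weights().

```python
from math import exp
from typing import List

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))

def train_step(weights: List[float], inputs: List[float],
               expected: float, learning_rate: float) -> float:
    # hypothetical helper: one forward pass, delta calculation, and weight
    # update for a single sigmoid output neuron; returns the output it saw
    z: float = sum(w * i for w, i in zip(weights, inputs))
    out: float = sigmoid(z)
    delta: float = out * (1 - out) * (expected - out)  # f'(z) * error
    for w in range(len(weights)):
        weights[w] += learning_rate * inputs[w] * delta
    return out

weights: List[float] = [0.0, 0.0]
first: float = train_step(weights, [1.0, 1.0], 1.0, 0.5)
last: float = first
for _ in range(100):
    last = train_step(weights, [1.0, 1.0], 1.0, 0.5)
print(first, last)  # the output moves from 0.5 toward the expected 1.0
```

Each call nudges the weights in the direction that shrinks the error, which is why repeated training passes are needed before the output gets close to the target.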

## Classification problems 
There are many machine-learning techniques that can be used for classification problems. Perhaps you have heard of support vector machines, decision trees, or naive Bayes classifiers. (There are others, too.) Recently, neural networks have become widely deployed in the classification space. They are more computationally intensive than some of the other classification algorithms, but their ability to classify seemingly arbitrary kinds of data makes them a powerful technique. 

### Normalizing data 
The data sets that we want to work with generally require some “cleaning” before they are input into our algorithms. Cleaning may involve removing extraneous characters, deleting duplicates, fixing errors, and other menial tasks. The aspect of cleaning we will need to perform for the two data sets we are working with is normalization. Normalization is about taking attributes recorded on different scales and converting them to a common scale.

Every neuron in our network outputs values between 0 and 1 due to the sigmoid activation function. It sounds logical that a scale between 0 and 1 would make sense for the attributes in our input data set as well. Converting a scale from some range to a range between 0 and 1 is not challenging. For any value, V, in a particular attribute range with maximum, max, and minimum, min, the formula is just newV = (oldV - min) / (max - min). This operation is known as feature scaling.

In [6]:
# assume all rows are of equal length
# and feature scale each column to be in the range 0 - 1
# (the dataset is modified in place)
def normalize_by_feature_scaling(dataset: List[List[float]]) -> None:
    for col_num in range(len(dataset[0])):
        column: List[float] = [row[col_num] for row in dataset]
        maximum = max(column)
        minimum = min(column)
        # rescale every value in this column before moving to the next one
        for row_num in range(len(dataset)):
            dataset[row_num][col_num] = (dataset[row_num][col_num] - minimum) / (maximum - minimum)
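
A small usage sketch follows, with the function repeated so the snippet runs on its own. The guard for a constant column (where max equals min and the formula would divide by zero) is an assumption added here, and the sample data is made up.

```python
from typing import List

def normalize_by_feature_scaling(dataset: List[List[float]]) -> None:
    for col_num in range(len(dataset[0])):
        column: List[float] = [row[col_num] for row in dataset]
        maximum = max(column)
        minimum = min(column)
        if maximum == minimum:
            continue  # assumption: leave a constant column unchanged
        for row_num in range(len(dataset)):
            dataset[row_num][col_num] = (dataset[row_num][col_num] - minimum) / (maximum - minimum)

data: List[List[float]] = [[1.0, 10.0, 7.0],
                           [2.0, 30.0, 7.0],
                           [5.0, 20.0, 7.0]]
normalize_by_feature_scaling(data)  # modifies the list in place
print(data)  # [[0.0, 0.0, 7.0], [0.25, 1.0, 7.0], [1.0, 0.5, 7.0]]
```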

### The classic iris data set 
Originally collected in the 1930s, the data set consists of 150 samples of iris plants (pretty flowers), split amongst three different species (50 of each). Each plant is measured on four different attributes: sepal length, sepal width, petal length, and petal width. 

The iris data set is from the University of California’s UCI Machine Learning Repository: M. Lichman, UCI Machine Learning Repository (Irvine, CA: University of California, School of Information and Computer Science, 2013).

Each line of the data file represents one data point. The four numbers represent the four attributes (sepal length, sepal width, petal length, and petal width); for our purposes, what they physically measure is irrelevant. The name at the end of each line represents the particular iris species.

In [7]:
import csv
from typing import List
from random import shuffle

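def read_lines(file_name: str, **fmtparams) -> List:
    # newline='' lets the csv module handle line endings itself
    with open(file_name, mode='r', newline='') as csv_file:
        # forward any formatting parameters (such as quoting) to csv.reader
        lines: List = list(csv.reader(csv_file, **fmtparams))
    shuffle(lines) # get our lines of data in random order
    return lines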
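
The keyword arguments accepted by read_lines() are csv formatting parameters. As a sketch of what one of them does, the snippet below reads the same text with and without quoting=csv.QUOTE_NONNUMERIC, which tells the reader to convert unquoted fields to floats. An in-memory StringIO stands in for a data file, and its values are made up for illustration.

```python
import csv
from io import StringIO
from typing import List

# stand-in for a data file; the values are made up for illustration
sample = StringIO("1,14.23,1.71\n2,12.37,0.94\n")

plain: List[List[str]] = list(csv.reader(sample))
sample.seek(0)  # rewind so the same text can be read again
numeric: List[List[float]] = list(csv.reader(sample, quoting=csv.QUOTE_NONNUMERIC))

print(plain[0])    # ['1', '14.23', '1.71']
print(numeric[0])  # [1.0, 14.23, 1.71]
```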

In [8]:
iris_parameters: List[List[float]] = []
iris_classifications: List[List[float]] = []
iris_species: List[str] = []
irises: List = read_lines('iris.csv')
for iris in irises:
    parameters: List[float] = [float(n) for n in iris[0:4]]
    iris_parameters.append(parameters)
    species: str = iris[4]
    if species == "Iris-setosa":
        iris_classifications.append([1.0, 0.0, 0.0])
    elif species == "Iris-versicolor":
        iris_classifications.append([0.0, 1.0, 0.0])
    else:
        iris_classifications.append([0.0, 0.0, 1.0])
    iris_species.append(species)

normalize_by_feature_scaling(iris_parameters)

iris_network: Network = Network([4, 6, 3], 0.3)
    
def iris_interpret_output(output: List[float]) -> str:
    if max(output) == output[0]:
        return "Iris-setosa"
    elif max(output) == output[1]:
        return "Iris-versicolor"
    else:
        return "Iris-virginica"
    
# train over the first 140 irises in the data set 50 times
iris_trainers: List[List[float]] = iris_parameters[0:140]
iris_trainers_corrects: List[List[float]] = iris_classifications[0:140]
for _ in range(50):
    iris_network.train(iris_trainers, iris_trainers_corrects)
    
# test over the last 10 of the irises in the data set
iris_testers: List[List[float]] = iris_parameters[140:150]
iris_testers_corrects: List[str] = iris_species[140:150]
iris_results = iris_network.validate(iris_testers, iris_testers_corrects,
     iris_interpret_output)
print(f"{iris_results[0]} correct of {iris_results[1]} = {iris_results[2] * 100}%")

9 correct of 10 = 90.0%
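
The if/elif chains that encode and decode species can also be driven by a single list of names. The sketch below is one possible alternative, not the approach used above; `SPECIES`, `one_hot`, and `interpret` are hypothetical names. Unlike the else branch in the cell above, `one_hot` returns all zeros for a name it does not recognize.

```python
from typing import List

SPECIES: List[str] = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

def one_hot(species: str) -> List[float]:
    # hypothetical helper: 1.0 in the slot for this species, 0.0 elsewhere
    return [1.0 if name == species else 0.0 for name in SPECIES]

def interpret(output: List[float]) -> str:
    # index() returns the first position of the maximum, so ties go to the
    # earlier species, matching the order of an if/elif chain
    return SPECIES[output.index(max(output))]

print(one_hot("Iris-versicolor"))  # [0.0, 1.0, 0.0]
print(interpret([0.1, 0.2, 0.9]))  # Iris-virginica
```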


### Classifying wine 
We are going to test our neural network with another data set, one based on the chemical analysis of wine cultivars from Italy. There are 178 samples in the data set.

The layer configuration for the wine-classification network needs 13 input neurons, one for each parameter in the data set. It also needs three output neurons. (There are three cultivars of wine, just as there were three species of iris.)

In [9]:
wine_parameters: List[List[float]] = []
wine_classifications: List[List[float]] = []
wine_species: List[int] = []
wines: List = read_lines('wine.csv', quoting=csv.QUOTE_NONNUMERIC)
for wine in wines:
    parameters: List[float] = [float(n) for n in wine[1:14]]
    wine_parameters.append(parameters)
    species: int = int(wine[0])
    if species == 1:
        wine_classifications.append([1.0, 0.0, 0.0])
    elif species == 2:
        wine_classifications.append([0.0, 1.0, 0.0])
    else:
        wine_classifications.append([0.0, 0.0, 1.0])
    wine_species.append(species)
        
normalize_by_feature_scaling(wine_parameters)

wine_network: Network = Network([13, 13, 7, 3], 0.5)

def wine_interpret_output(output: List[float]) -> int:
    if max(output) == output[0]:
        return 1
    elif max(output) == output[1]:
        return 2
    else:
        return 3

# train over the first 150 wines in the data set 20 times
MAX = 150
wine_trainers: List[List[float]] = wine_parameters[0:MAX]
wine_trainers_corrects: List[List[float]] = wine_classifications[0:MAX]
for _ in range(20):
    wine_network.train(wine_trainers, wine_trainers_corrects)

# test over the last 28 of the wines in the data set
wine_testers: List[List[float]] = wine_parameters[150:178]
wine_testers_corrects: List[int] = wine_species[150:178]
wine_results = wine_network.validate(wine_testers, wine_testers_corrects, wine_interpret_output)
print(f"{wine_results[0]} correct of {wine_results[1]} = {wine_results[2] * 100}%")

13 correct of 28 = 46.42857142857143%
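
Both examples split their shuffled rows into training and testing portions with hard-coded slice indices. As a sketch of one way to keep the two slices in agreement, the hypothetical helper `split_at` below takes the boundary once; the list of integers stands in for 178 shuffled samples.

```python
from typing import List, Tuple, TypeVar

T = TypeVar('T')

def split_at(items: List[T], num_training: int) -> Tuple[List[T], List[T]]:
    # hypothetical helper: the first num_training items train, the rest test
    return items[:num_training], items[num_training:]

rows: List[int] = list(range(178))  # stand-in for 178 shuffled samples
trainers, testers = split_at(rows, 150)
print(len(trainers), len(testers))  # 150 28
```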
