%run ../common/import_all.py

from common.setup_notebook import set_css_style, setup_matplotlib, config_ipython
config_ipython()
setup_matplotlib()
set_css_style()

# Artificial Neural Networks in a nutshell

## The gist of what they are

An artificial neural network (often shortened to ANN) is an attempt at representing in software the network of neurons in the brain and the way they function. It consists of neural units (*artificial neurons*) which are connected together, each capable of receiving input and producing output.

The fun thing about ANNs, and the whole point of them, is that they mimic "learning". Inputs to the network are weighted, and the learning mechanism consists of an iterative self-adjustment of these weights so as to achieve the best possible correspondence to the desired result on training data. In this way, the network is meant to emulate the synapses of the brain in their capability to carry information from one neuron to another.

### A bit on the biological link

The biological metaphor linking these algorithms to actual neural networks in the brain is more of an inspiration than a grounded reality. ANNs were conceived with the idea of mimicking how the human brain works, but in reality they are a far cry from doing this well and comprehensively, and in any case we don't know the human brain well enough yet. In fact, it is confusing to say that neural networks "mimic the brain", as the brain doesn't really work as they do. On this, the discussion in the opening chapter of [Chollet's book](#chollet) is a very good one.

Following this [page](#biology), we can say that, in general:

* real neurons are slower than artificial ones, but there are a great many of them and the way they communicate is non-trivial
* real networks use energy very efficiently
* real networks can do several highly complex operations at a time

## Artificial neurons and networks 

### The artificial neuron

<img src="../../imgs/ann.jpg" width="500" align="left" style="margin:20px"/>

The figure shows a schematic model of an artificial neuron.

Several inputs $\mathbf{x} = (x_1, \ldots, x_n)$, represented by the $i$ in the figure, are streamed into the neuron and combined with *weights* $\mathbf{w} = (w_1, w_2, \ldots, w_n)$ in a linear combination $\mathbf{w} \cdot \mathbf{x}$. The neuron is also equipped with a threshold $t$ which this combination has to exceed for the neuron to fire, so that the quantity of interest becomes $\mathbf{w} \cdot \mathbf{x} - t$, or $\mathbf{w} \cdot \mathbf{x} + b$ where $b = -t$ is the bias. Note that if we assume a further input point $x_0=1$ and use the bias as a further weight, we can write the combination as $\mathbf{w} \cdot \mathbf{x}$ where $\mathbf{w} = (b, w_1, w_2, \ldots, w_n)$ and $\mathbf{x} = (x_0, x_1, x_2, \ldots, x_n)$.
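As a quick check of the bias trick, here is a minimal sketch in plain Python, assuming the bias enters the combination additively as in the augmented form. The function names and the numbers are made up for illustration.

```python
def weighted_sum(weights, inputs, bias):
    """Linear combination w . x + b for a single neuron."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def weighted_sum_augmented(weights, inputs, bias):
    """Same quantity, with the bias folded in as a weight on a constant input x_0 = 1."""
    aug_weights = [bias] + list(weights)
    aug_inputs = [1.0] + list(inputs)
    return sum(w * x for w, x in zip(aug_weights, aug_inputs))


# Illustrative values (hypothetical)
w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
b = 0.25

z = weighted_sum(w, x, b)
z_aug = weighted_sum_augmented(w, x, b)
```

Both calls give the same number, which is why many presentations drop the explicit bias and just talk about weights.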

This weighted combination of inputs is passed to the *activation function*, which determines the output of the neuron, that is, how the neuron fires. There are several possible choices for the activation function of a neuron/network. The simplest would be a linear function, but it can be shown that this isn't a smart choice: a network of linear neurons is itself just a linear function of its inputs, so it cannot learn anything more complex than that. This point and more about activations will be explored in other sections.
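To make this concrete, here is a small sketch of a neuron with two common activation choices, a hard step and a sigmoid. The weights and bias are hypothetical, picked by hand so that the step neuron behaves like a logical AND on binary inputs.

```python
import math


def step(z):
    """Heaviside step: the neuron fires (1) only if the weighted input is positive."""
    return 1.0 if z > 0 else 0.0


def sigmoid(z):
    """Smooth, S-shaped alternative to the step, with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(weights, inputs, bias, activation):
    """Output of one neuron: the activation applied to the weighted input."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


w, b = [1.0, 1.0], -1.5  # hypothetical, hand-picked weights and bias
outputs = [neuron_output(w, [x1, x2], b, step) for x1 in (0, 1) for x2 in (0, 1)]
smooth_outputs = [neuron_output(w, [x1, x2], b, sigmoid) for x1 in (0, 1) for x2 in (0, 1)]
```

With the step, only the input (1, 1) makes the neuron fire. With the sigmoid, the same ordering is kept but the output varies smoothly.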

### A network of neurons

<figure style="float:right;">
  <img src="../../imgs/nielsen_ann.png" width="400" align="center" style="margin:30px 70px"/>
  <figcaption>A (feedforward, see later) neural network. Image from <a href="#nielsen">Nielsen's book</a>.</figcaption>
</figure>

To build a network of neurons, what you have to do is put several of them together in such a way that they can communicate. Neurons are grouped into *layers*, groups that reside at the same level, so that communication is passed from one layer of neurons to the next. In a network, there are

* an *input* layer: the one at the start of the communication process
* an *output* layer: the one that spits out the final result at the end of the process
* one or more *hidden* layers: the layers in between that constitute the intermediate steps

Each layer can be composed of however many neurons you wish. In a fully connected network, if there are $n$ neurons in a given layer, each neuron in the following layer receives $n$ inputs: just as a single neuron combines its weighted inputs, the input to any neuron in a layer is a weighted sum of the outputs of all the neurons in the previous layer.
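A forward pass through fully connected layers can be sketched as follows. This is an illustration only: the sigmoid activation, the 3-2-1 layout and all the numbers are assumptions, not something prescribed by the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(weights, biases, inputs):
    """Outputs of one fully connected layer.

    weights[j][i] is the weight from input i to neuron j, so each neuron
    holds as many weights as there are outputs in the previous layer.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def network_forward(layers, inputs):
    """Pass the inputs through each (weights, biases) pair in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(weights, biases, activations)
    return activations


# Hypothetical 3-2-1 network: 3 inputs, a hidden layer of 2 neurons, 1 output neuron
hidden = ([[0.1, 0.2, 0.3], [-0.3, 0.2, 0.1]], [0.0, 0.1])
output = ([[0.5, -0.5]], [0.0])
result = network_forward([hidden, output], [1.0, 0.0, -1.0])
```

Each hidden neuron holds 3 weights (one per input), while the output neuron holds 2 (one per hidden neuron).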

The network learns by minimising a *loss function*, which measures how far its outputs are from the desired results on the training data. This is the job of the *optimiser*, which regulates how exactly learning is performed: typically it applies gradient descent or one of its variants, though other optimisation methods are possible. Training starts with the weights initialised in some way, say randomly, and consists of letting the network learn the combination of weights that best fits the training data.
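The idea of minimising a loss by gradient descent can be sketched on the smallest possible case, a single linear neuron with one weight and a bias. The mean squared error loss, the toy data, the learning rate and the number of iterations are all assumptions chosen for illustration.

```python
def mse_loss(w, b, data):
    """Mean squared error of the linear model w * x + b over the data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient_step(w, b, data, learning_rate):
    """One gradient descent update of (w, b) on the MSE loss."""
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b


# Toy training data generated from y = 2x + 1 (hypothetical)
data = [(x, 2 * x + 1) for x in (-1.0, 0.0, 1.0, 2.0)]

w, b = 0.0, 0.0  # arbitrary initial weights
initial_loss = mse_loss(w, b, data)
for _ in range(500):
    w, b = gradient_step(w, b, data, learning_rate=0.1)
final_loss = mse_loss(w, b, data)
```

After the loop, `w` and `b` end up very close to the values 2 and 1 used to generate the data, and the loss has dropped accordingly.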

The optimiser is paired with a mechanism called *backpropagation*, which supplies gradient descent with what it needs to adjust the weights continuously and so iteratively improve the results (that is, reduce the loss function). In practice, backpropagation computes the derivatives of the loss function with respect to the weights, propagating them from the last layer back to the first via the chain rule. At each iteration, gradient descent then modifies each weight by an amount proportional to the derivative of the loss function with respect to it, in the direction that decreases the loss.
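Here is a minimal sketch of backpropagation on a tiny network with 2 inputs, 2 hidden neurons and 1 output, followed by a single gradient descent update. The sigmoid activations, the squared-error loss (or cost) on one example, the parameter values and the learning rate are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Forward pass of a 2-2-1 network; returns hidden activations and output."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    out = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, out


def loss(params, x, y):
    """Squared-error loss for a single training example."""
    _, out = forward(params, x)
    return 0.5 * (out - y) ** 2


def backprop(params, x, y):
    """Derivatives of the loss w.r.t. every weight and bias, via the chain rule."""
    W1, b1, W2, b2 = params
    h, out = forward(params, x)
    # Error signal at the output neuron (derivative w.r.t. its weighted input)
    delta_out = (out - y) * out * (1.0 - out)
    grad_W2 = [delta_out * hj for hj in h]
    grad_b2 = delta_out
    # Propagate the error signal back to the hidden layer
    delta_h = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(W2, h)]
    grad_W1 = [[d * xi for xi in x] for d in delta_h]
    grad_b1 = delta_h
    return grad_W1, grad_b1, grad_W2, grad_b2


def sgd_update(params, grads, learning_rate):
    """Move each parameter against its derivative, scaled by the learning rate."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = grads
    return (
        [[w - learning_rate * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)],
        [b - learning_rate * g for b, g in zip(b1, gb1)],
        [w - learning_rate * g for w, g in zip(W2, gW2)],
        b2 - learning_rate * gb2,
    )


# Hypothetical initial parameters and a single training example
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.6], 0.2)
x, y = [1.0, 0.5], 1.0

before = loss(params, x, y)
params = sgd_update(params, backprop(params, x, y), learning_rate=0.5)
after = loss(params, x, y)
```

The error signal is computed once at the output and reused for the hidden layer. This reuse is what makes backpropagation cheaper than differentiating with respect to each weight separately.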

## Universality of neural networks

It has been mathematically [proven](#universality) that neural networks can approximate any continuous function on a bounded (compact) domain to any desired accuracy, given enough neurons in a hidden layer: the more neurons, the better the approximation that can be achieved. This result is called the *Universal Approximation Theorem*. Note that it guarantees that suitable weights exist, not that training will find them. A very good blog post on the topic is [this one](#universality-blog), and the references below contain some other good reads as well.
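To get a feel for this result, here is a hand-built illustration (not the proof, and no training involved). A single hidden layer of very steep sigmoids forms a staircase that follows a target function on $[0, 1]$, and using more steps gives a smaller error. The construction, the target function and the steepness value are all choices made for this sketch.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_staircase_network(f, n_steps, steepness=50.0):
    """Hand-pick weights so that steep sigmoids form a staircase following f on [0, 1].

    The returned function computes
        f(c_0) + sum_k (f(c_k) - f(c_{k-1})) * sigmoid(s * (x - k / n_steps)),
    i.e. one hidden layer of n_steps - 1 sigmoid neurons and a linear output.
    """
    centres = [(k + 0.5) / n_steps for k in range(n_steps)]
    s = steepness * n_steps  # shared, very large hidden weight
    jumps = [f(centres[k]) - f(centres[k - 1]) for k in range(1, n_steps)]  # output weights
    offset = f(centres[0])  # output bias

    def network(x):
        hidden = (sigmoid(s * (x - k / n_steps)) for k in range(1, n_steps))
        return offset + sum(a * h for a, h in zip(jumps, hidden))

    return network


def max_error(f, g, n_points=1000):
    """Largest |f(x) - g(x)| over an evenly spaced grid on [0, 1]."""
    return max(abs(f(i / n_points) - g(i / n_points)) for i in range(n_points + 1))


def target(x):
    return math.sin(2 * math.pi * x)  # arbitrary smooth target function


error_10 = max_error(target, build_staircase_network(target, 10))
error_100 = max_error(target, build_staircase_network(target, 100))
```

Going from 10 to 100 steps makes the staircase finer, so `error_100` comes out smaller than `error_10`.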

## Types of ANNs and pills of history

This is no more than a super-quick and very high-level introduction to several types of neural networks one can build, the details of which are explored elsewhere in this chapter. You can find a more comprehensive outline of the different types of networks in this [article](#nets), which contains a very cool illustration by F Van Veen. The article also lists some important papers about the networks it mentions.

The categories listed here describe different properties of a network, so they can be combined: for example, you can build a deep convolutional network. What is loosely described here will be expanded in the respective places in the chapter.

### Deepness

A network is called *deep* simply when it contains more than one hidden layer (usually many in today's applications), which allows it to learn incredibly complex patterns in the data. Deep neural networks are the beasts that perform *deep learning*, the (relatively) new trend in machine learning/artificial intelligence which is tackling very complicated problems with impressive results.

Deep learning as a field is not new: it dates back to the 1980s, but its big resurgence started in the mid-2000s, when deep networks were finally shown to be capable of learning efficiently. Before then, research on deep architectures hadn't reached the point where these tools could be put to practical use, due to time complexity, hardware bottlenecks and an overall lack of efficiency.

### Learning architecture

#### Feedforward 

In a feedforward network, communication flows in one direction only: the output of the neurons in a layer is passed on to the neurons in the next layer, and there is no going backwards. Feedforward networks were conceived along with the perceptron (the first ANN), so in the 1950s.

#### Recurrent 

Recurrent networks have loops, so the output of a neuron can be fed back to the neuron itself, allowing for the dynamics which are missing in the feedforward model. These networks are implemented with a time factor built in, meaning neurons fire only within a specific window of time, so that feedback is not propagated instantaneously (which would be difficult to control). They have a concept of *memory*, and there are several types of them.

Recurrent networks were born in the 1980s. They are particularly suited for problems which involve the temporal component, like those dealing with natural language. 
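The feedback loop can be sketched with a single recurrent unit that receives, at each time step, the current input together with its own previous output. The tanh activation and the weights are assumptions made for illustration, and real recurrent architectures are considerably richer than this.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One time step of a single recurrent unit: the previous output feeds back in."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(sequence, w_x, w_h, b, h0=0.0):
    """Process a sequence one element at a time, carrying the state h along."""
    h = h0
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


# Hypothetical weights; a single input pulse followed by silence
states = run_rnn([1.0, 0.0, 0.0, 0.0], w_x=1.0, w_h=0.8, b=0.0)
```

Even though the input is zero after the first step, the state stays positive and fades gradually. This is a very basic form of the *memory* mentioned above.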
 
### Topology

#### Convolutional networks

Convolutional networks have one or more convolutional layers, where a neuron is not connected to all the neurons of the previous layer and the output is obtained via a convolution operation on the input data. Unlike in fully connected networks, each neuron in such a layer is only connected to a small group of neurons in the previous layer, which greatly reduces the number of weights to learn.

Convolutional networks are well suited for tasks related to vision, that is, where the input data consists of images or video: for these sorts of tasks, in most typical cases, a fully connected network would have to perform too many operations and be too large to be of any practical use (a single neuron there, for an RGB image of size 32x32, so with 3 colour channels, would have $32\cdot32\cdot3 = 3072$ weights), while the use of convolutions saves complexity.
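The saving can be put in numbers with a quick back-of-the-envelope comparison. The 5x5 filter size is an assumption made for illustration; the image size is the one quoted above.

```python
# Weights of a single fully connected neuron looking at a 32x32 RGB image
height, width, channels = 32, 32, 3
fully_connected_weights = height * width * channels

# Weights of a single convolutional filter (the 5x5 size is a hypothetical choice)
filter_size = 5
conv_filter_weights = filter_size * filter_size * channels

ratio = fully_connected_weights / conv_filter_weights
```

Under these assumptions the fully connected neuron needs 3072 weights against 75 for the filter, which is reused at every position of the image.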

The inspiration for this category of networks came from the vision systems of the biological world, which is why they have been designed specifically for machine vision tasks. A convolutional layer processes an image patch by patch: at the very start, a first patch of $n$ pixels is taken in, then the window is shifted (by one pixel, say) and the second patch of $n$ pixels is taken in, and so on. This mechanism is loosely borrowed from what the neurons in the visual cortex do: they only deal with a certain part of the visual field at once, that is, with a pixel and its neighbours.
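The sliding-window mechanism can be sketched in plain Python. This is a bare-bones version for a single-channel image with no padding and a shift of one pixel, and the image and kernel values are made up for illustration.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image one pixel at a time (no padding, stride 1).

    Strictly speaking this is a cross-correlation (the kernel is not flipped),
    which is what deep learning libraries usually call a convolution.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Weighted sum over the patch of pixels currently under the kernel
            row.append(sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(k_rows)
                for j in range(k_cols)
            ))
        output.append(row)
    return output


# Hypothetical 4x4 greyscale image with a vertical edge, and a 2x2 edge-detecting kernel
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
```

The resulting 3x3 feature map is non-zero only in its middle column, where the window straddles the edge. The same four weights are reused at every position.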

The first convolutional networks date from the 1990s (even though the concepts are decades older), but they became ubiquitous in the 2010s with the many visual applications they serve nowadays. In fact, they are particularly suited to image tasks as they exhibit a natural ability to capture spatial structures.

## References

### General

1. <a name="chollet"></a> F Chollet, **Deep Learning with Python**, *Manning*, 2017
2. <a name="nielsen"></a> M Nielsen, [**Neural networks and deep learning**](http://neuralnetworksanddeeplearning.com), Determination Press, 2015
3. <a name="nets"></a> [**The neural network zoo**](http://www.asimovinstitute.org/neural-network-zoo/), an article + illustration by F Van Veen at the Asimov Institute


### About the biological link

1. <a name="biology"></a> [The biological inspiration for ANNs](http://read.pudn.com/downloads164/doc/747044/L01.pdf), lecture from a course in Machine Learning by A. Papliński

### About the universality theorem

1. <a name="universality"></a> G Cybenko, [**Approximation by superpositions of a sigmoidal function**](http://www.dartmouth.edu/~gvc/Cybenko_MCSS.pdf), *Mathematics of Control, Signals, and Systems*, 2, 1989
2. <a name="universality-blog"></a> D McNeela, [**The Universal Approximation Theorem for Neural Networks**](http://mcneela.github.io/machine_learning/2017/03/21/Universal-Approximation-Theorem.html)
3. J Klein, [**A Simple Proof of the Universal Approximation Theorem**](https://blog.goodaudience.com/neural-networks-part-1-a-simple-proof-of-the-universal-approximation-theorem-b7864964dbd3)
4. B Fortuner, [**Can neural networks really learn any function?**](https://towardsdatascience.com/can-neural-networks-really-learn-any-function-65e106617fc6)

### About types of networks

1. [**Convolutional Neural Networks for visual recognition**](http://cs231n.github.io/convolutional-networks/), a Stanford CS class

### Something funky

1. <a name="series"></a> [**Comparison of artificial neural networks and human brains on solving number series**](http://www.cogsys.wiai.uni-bamberg.de/teaching/ws1112/km/practice/ANNs.pdf), a students' project at the University of Bamberg