# Neural Networks

An Artificial Neural Network (ANN) is a computational model that is inspired by the way biological neural networks in the human brain process information. Artificial Neural Networks have generated a lot of excitement in Machine Learning research and industry, thanks to many breakthrough results in speech recognition, computer vision and text processing.

## The biological neuron
<img src="img/biological_neuron.png">


### Books

* [The Deep Learning Book](http://www.deeplearningbook.org/)
* [Deep Learning (Russian-language book, "Глубокое обучение")](https://books.google.by/books/about/%D0%93%D0%BB%D1%83%D0%B1%D0%BE%D0%BA%D0%BE%D0%B5_%D0%BE%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5.html?id=Zi48DwAAQBAJ&redir_esc=y)

### Courses

* [cs231n](http://cs231n.github.io)
* [Fast.ai](http://course.fast.ai/)
* [Deep Learning Specialization (by Andrew Ng)](https://www.coursera.org/specializations/deep-learning)

## Structure of a Neural Network

### A single neuron
<img src="img/single_neuron.png">

The neuron above takes numerical **inputs** X1 and X2 and has **weights** w1 and w2 associated with those inputs. Additionally, there is another input, fixed at 1, with weight b (called the **bias**) associated with it. We will learn more about the role of the bias later.

The **output** Y from the neuron is computed as shown in the figure. The function f is non-linear and is called the **activation function**. Its purpose is to introduce **non-linearity** into the output of a neuron. This is important because most real-world data is non-linear, and we want neurons to learn these non-linear representations.
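As a minimal sketch of the computation described above, the output of a two-input neuron can be written as Y = f(w1·X1 + w2·X2 + b). The choice of a sigmoid for f and the numeric values below are assumptions made for illustration; the text does not fix a particular activation function.

```python
import math

def sigmoid(z):
    """Sigmoid activation, one common choice for f (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(x1, x2, w1, w2, b, f=sigmoid):
    """Compute Y = f(w1*x1 + w2*x2 + b) for a two-input neuron."""
    z = w1 * x1 + w2 * x2 + b  # weighted sum of the inputs plus the bias
    return f(z)

# Illustrative values only: the weighted sum is 0.5 - 0.5 + 0.0 = 0.0
y = neuron_output(x1=1.0, x2=2.0, w1=0.5, w2=-0.25, b=0.0)
print(y)  # 0.5, since sigmoid(0) = 0.5
```

Passing a different function as `f` (for example `math.tanh`) swaps the activation without changing the rest of the computation.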

## Layer-wise organization

### Neural Networks as neurons in graphs

Neural Networks are modeled as collections of neurons that are connected in an acyclic graph. In other words, the outputs of some neurons can become inputs to other neurons. Cycles are not allowed since that would imply an infinite loop in the forward pass of a network. Instead of amorphous blobs of connected neurons, Neural Network models are often organized into distinct layers of neurons. For regular neural networks, the most common layer type is the **fully-connected layer**, in which neurons between two adjacent layers are fully pairwise connected, but neurons within a single layer share no connections. Below are two example Neural Network topologies that use a stack of fully-connected layers:

<img src="img/neural_net2.jpeg">

### Naming conventions

Notice that when we say N-layer neural network, we do not count the input layer. Therefore, a single-layer neural network describes a network with no hidden layers (input directly mapped to output). In that sense, you can sometimes hear people say that logistic regression or SVMs are simply special cases of single-layer Neural Networks. You may also hear these networks interchangeably referred to as “Artificial Neural Networks” (ANN) or “Multi-Layer Perceptrons” (MLP). Many people do not like the analogies between Neural Networks and real brains and prefer to refer to neurons as units.
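To make the single-layer case concrete, the sketch below treats binary logistic regression as a network with no hidden layers: one output unit computes a weighted sum of the inputs, and a sigmoid turns that score into a probability. The function name and the numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def logistic_regression_forward(x, w, b):
    """Single-layer network: the inputs are mapped directly to one output unit."""
    score = np.dot(w, x) + b              # raw (linear) score of the output unit
    return 1.0 / (1.0 + np.exp(-score))   # sigmoid turns the score into a probability

x = np.array([1.0, -2.0, 0.5])   # three input features (illustrative)
w = np.array([0.2, 0.1, -0.4])   # one weight per input
b = 0.2                          # bias
p = logistic_regression_forward(x, w, b)
print(p)
```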

### Output layer

Unlike all other layers in a Neural Network, the output layer neurons most commonly do not have an activation function (or you can think of them as having a linear identity activation function). This is because the output layer is usually taken to represent the class scores (e.g. in classification), which are arbitrary real-valued numbers, or some kind of real-valued target (e.g. in regression).

### Sizing neural networks

Two metrics are commonly used to measure the size of a neural network: the number of neurons or, more commonly, the number of parameters. Working with the two-layer example network in the picture above (3 inputs, 4 hidden neurons, 2 outputs):

It has 4 + 2 = 6 neurons (not counting the inputs), [3 x 4] + [4 x 2] = 20 weights and 4 + 2 = 6 biases, for a total of 26 learnable parameters.
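The same arithmetic can be sketched as a small helper. The function name `count_parameters` and the list-of-layer-sizes representation are assumptions introduced for this example; it only covers stacks of fully-connected layers with one bias per neuron.

```python
def count_parameters(layer_sizes):
    """Count neurons, weights, and biases of a fully-connected network.

    layer_sizes lists the number of units per layer, input layer first,
    e.g. [3, 4, 2] for 3 inputs, 4 hidden neurons, and 2 outputs.
    """
    neurons = sum(layer_sizes[1:])  # the input layer is not counted
    # Each pair of adjacent layers contributes n_in * n_out weights
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    biases = neurons  # one bias per non-input neuron
    return neurons, weights, biases, weights + biases

print(count_parameters([3, 4, 2]))     # (6, 20, 6, 26)
print(count_parameters([3, 4, 4, 1]))  # (9, 32, 9, 41)
```

The second call uses the layer sizes of the three-layer network from the feed-forward example (3 inputs, two hidden layers of 4 neurons, 1 output).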

To give you some context, modern Convolutional Networks contain on the order of 100 million parameters and are usually made up of approximately 10-20 layers (hence deep learning). However, as we will see, the number of effective connections is significantly greater due to parameter sharing. More on this in the Convolutional Neural Networks module.

## Example feed-forward computation

A forward pass consists of repeated matrix multiplications interwoven with the activation function. One of the primary reasons that Neural Networks are organized into layers is that this structure makes it very simple and efficient to evaluate Neural Networks using matrix-vector operations. Working with the example three-layer neural network in the diagram above, the input would be a [3x1] vector. All connection strengths for a layer can be stored in a single matrix. For example, the first hidden layer’s weights **W1** would be of size [4x3], and the biases for all units would be in the vector **b1**, of size [4x1]. Here, every single neuron has its weights in a row of **W1**, so the matrix-vector multiplication **np.dot(W1, x)** evaluates the weighted input sums of all neurons in that layer at once; adding **b1** and applying the activation function then gives the layer's activations. Similarly, **W2** would be a [4x4] matrix that stores the connections of the second hidden layer, and **W3** a [1x4] matrix for the last (output) layer. The full forward pass of this 3-layer neural network is then simply three matrix multiplications, interwoven with the application of the activation function:

```python

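import numpy as np

# forward-pass of a 3-layer neural network:
f = lambda x: 1.0/(1.0 + np.exp(-x)) # activation function (use sigmoid)
# randomly initialized parameters, only so the example runs (in practice they are learned)
W1, b1 = np.random.randn(4, 3), np.zeros((4, 1)) # first hidden layer (4x3), (4x1)
W2, b2 = np.random.randn(4, 4), np.zeros((4, 1)) # second hidden layer (4x4), (4x1)
W3, b3 = np.random.randn(1, 4), np.zeros((1, 1)) # output layer (1x4), (1x1)
x = np.random.randn(3, 1) # random input vector of three numbers (3x1)
h1 = f(np.dot(W1, x) + b1) # calculate first hidden layer activations (4x1)
h2 = f(np.dot(W2, h1) + b2) # calculate second hidden layer activations (4x1)
out = np.dot(W3, h2) + b3 # output neuron (1x1), no activation function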
```

In the above code, **W1, W2, W3, b1, b2, b3** are the learnable parameters of the network. Notice also that instead of having a single input column vector, the variable **x** could hold an entire batch of training data (where each input example would be a column of **x**), and then all examples would be efficiently evaluated in parallel. As discussed for the output layer, the final Neural Network layer usually doesn’t have an activation function (e.g. it represents a real-valued class score in a classification setting).
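The batching remark can be sketched as follows: when each column of the input holds one example, the same three matrix multiplications evaluate the whole batch. The random parameter values, the fixed seed, and the batch size of 5 are assumptions made so the example runs on its own; in a trained network the parameters would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialized parameters with the shapes given in the text (illustrative values)
W1, b1 = rng.standard_normal((4, 3)), np.zeros((4, 1))
W2, b2 = rng.standard_normal((4, 4)), np.zeros((4, 1))
W3, b3 = rng.standard_normal((1, 4)), np.zeros((1, 1))

def forward(X):
    """Forward pass for a batch X of shape (3, N), one example per column."""
    h1 = sigmoid(np.dot(W1, X) + b1)   # (4, N); b1 broadcasts across the columns
    h2 = sigmoid(np.dot(W2, h1) + b2)  # (4, N)
    return np.dot(W3, h2) + b3         # (1, N); no activation on the output layer

X = rng.standard_normal((3, 5))  # batch of 5 input examples
out = forward(X)
print(out.shape)  # (1, 5)
```

Each column of `out` matches what the single-vector forward pass would return for the corresponding column of `X`.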


[More info](http://cs231n.github.io/neural-networks-1/)