# Introduction to Neural Networks

# 1. Why neural networks?

- Artificial neural networks are the foundation of deep learning: deep learning uses very large ("deep") neural networks.
- Deep learning and neural networks are transforming every major industry.
- For that reason, deep learning skills are highly sought after.

# 2. What is a neural network?

Let's start with an easy example to get an idea of what a neural network is. Imagine a city has 10 ice cream vendors. We would like to predict what the sales amount is for an ice cream vendor given certain input features. Imagine you have several features to predict the sales for each ice cream vendor: the location, the way the ice cream is priced, and the variety in the ice cream offerings.

Let's look at the input feature *location*. You know that one of the things that really affects sales is how many people walk by the ice cream shop, as these are all potential customers. And realistically, the volume of people passing is largely driven by the *location*.

Next, let's look at the input feature *pricing*. How the ice cream is priced really tells us something about the affordability, which will affect sales as well. 

Last, let's look at the *variety in offering*. When an ice cream shop offers a lot of different ice cream flavors, this might be perceived as a higher quality shop just because customers have more flavors to choose from (and might really like that!). On the other hand, *pricing* might also affect perceived quality: customers might feel that the quality is higher when the prices are too. This shows that several inputs might affect a single "hidden feature", which is what we call a feature in the so-called "hidden layer".

In reality, all features will be connected with all nodes in the hidden layer, and weights will be assigned to the edges (more about this later), as you can see in the network below. That's why networks like this are also referred to as **densely connected neural networks**.

![title](figures/Ice_cream_network_smaller.jpg)

When we generalize all this, a neural network looks like the configuration below. 

As you can see, to implement a neural network, we need to feed it the inputs $x_i$ (location, pricing, and variety in the example) and the outcome $y$ (sales in the example); all the features in the middle will be figured out automatically by the network. That's why this layer is called the **hidden layer**, with the nodes representing **hidden units**.

![title](figures/First_network.jpg)
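
As a rough sketch of what happens inside such a densely connected network, the snippet below pushes one ice cream vendor's three input features through a hidden layer of four units to a single output. The feature values are made up, the weights are random rather than learned, and the `tanh` activation in the hidden layer is an assumption chosen only for illustration; a trained network would learn its weights from data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical, already-scaled features for one vendor:
# [location score, pricing score, variety score]
x = np.array([0.8, 0.3, 0.5])

# Dense layer: every input is connected to every hidden unit,
# so the weight matrix has one row per hidden unit and one column per input.
W1 = rng.normal(size=(4, 3))
b1 = np.zeros(4)

# Output layer: every hidden unit is connected to the single output node.
W2 = rng.normal(size=(1, 4))
b2 = np.zeros(1)

hidden = np.tanh(W1 @ x + b1)  # values of the 4 hidden units (tanh is an assumption)
y_hat = W2 @ hidden + b2       # untrained sales prediction, in arbitrary units

print(hidden.shape, y_hat.shape)  # (4,) (1,)
```

Each of the 4 x 3 entries in `W1` corresponds to one edge between an input and a hidden unit, which is what "densely connected" refers to.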

# 3. The power of deep learning

In our previous example, we have 3 input units, a hidden layer with 4 units, and 1 output unit. Notice that networks come in all shapes and sizes. This is only one example of what deep learning is capable of! The network described above can be extended almost endlessly:

- We can add more features (nodes) in the input layer.
- We can add more nodes in the hidden layer. Also, we can simply add more hidden layers. This is what turns a neural network into a "deep" neural network (hence, deep learning).
- We can have several nodes in the output layer.

![title](figures/Deeper_network.jpg)

And there is one more thing that makes deep learning extremely powerful: unlike many other statistical and machine learning techniques, deep learning can deal extremely well with **unstructured data**.


In the ice cream vendor example, the input features can be seen as **structured data**. The input features very much take the form of a "classical" data set: observations are rows, features are columns. Examples of **unstructured data**, however, are images, audio files, text data, etc. Historically, and unlike humans, machines had a very hard time interpreting unstructured data. Deep learning was able to drastically improve machine performance on unstructured data!

To illustrate the power of deep learning, we describe some applications of deep learning below:

| x | y |
|---|---|
| features of an ice cream shop  | sales |
| Pictures of cats vs dogs | cat or dog? |
| Pictures of presidents | which president is it? |
| Dutch text | English text |
| audio files | text |
|  ... | ... |         



Types of neural networks:
- Standard neural networks
- Convolutional neural networks (input = images, video)
- Recurrent neural networks (input = audio files, text, time series data)
- Generative adversarial networks

# 4. An introductory example

![title](santa/data/train/santa/00000022.jpg)

You'll see that there is quite a bit of theory and mathematical notation needed when using neural networks. We'll introduce all this for the first time by using an example.
Imagine we have a data set with images. Some of them have Santa in them, others don't. We'll train a neural network so it can detect whether Santa is in a picture or not.

As mentioned before, this is a kind of problem where the input data is composed of images. Now how does Python read images? To store an image, your computer stores 3 matrices which correspond with 3 color channels: red, green and blue (also referred to as RGB). The numbers in each of the three matrices correspond with the pixel intensity values in each of the three colors. The picture below shows a hypothetical representation of a 4 x 4 pixel image (note that 4 x 4 is tiny; generally you'll have much bigger dimensions). Generally, pixel intensity values are on the scale [0, 255].

![title](figures/RGB_sm.png)
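
To make the three-matrix representation concrete, here is a minimal NumPy sketch. The pixel values are random stand-ins rather than the ones in the figure, and the height x width x channel layout of the array is an assumption (it is a common convention, but not the only one).

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# A made-up 4 x 4 RGB image: height x width x 3 color channels,
# with integer pixel intensities in [0, 255].
image = rng.integers(low=0, high=256, size=(4, 4, 3))

# Split the image into its three color matrices.
red = image[:, :, 0]
green = image[:, :, 1]
blue = image[:, :, 2]

print(image.shape)  # (4, 4, 3)
print(red.shape)    # (4, 4)
print(red)          # the 4 x 4 matrix of red intensities
```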

Having 3 matrices associated with one image, we'll need to modify this shape to get to one input feature vector. You'll want to "unrow" your input feature values into one so-called "feature vector". You should start with unrowing the red pixel matrix, then the green one, then the blue one. Unrowing the RGB matrices  in the image above would result in:

 $x = \begin{bmatrix} 35  \\ 19 \\  \vdots \\ 9 \\7 \\\vdots \\ 4 \\ 6 \\ \vdots \end{bmatrix}$

The resulting feature vector is a matrix with 1 column and 4 x 4 x 3 = 48 rows. Let's introduce some more notation to formalize this all.
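
The unrowing step described above can be sketched as follows. The image here is a hypothetical 4 x 4 array filled with the numbers 0 to 47 so the ordering is easy to inspect, and the height x width x channel layout is again an assumption.

```python
import numpy as np

# Hypothetical 4 x 4 RGB image stored as height x width x channel.
image = np.arange(48).reshape(4, 4, 3)

red, green, blue = image[:, :, 0], image[:, :, 1], image[:, :, 2]

# Unrow each color matrix row by row, then stack red, green, blue
# into a single column: 4 x 4 x 3 = 48 rows, 1 column.
x = np.concatenate([red.ravel(), green.ravel(), blue.ravel()]).reshape(-1, 1)

print(x.shape)  # (48, 1)
```

With this layout, `image.transpose(2, 0, 1).reshape(-1, 1)` produces the same vector in one step, because moving the channel axis to the front makes NumPy walk through all red pixels first, then green, then blue.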

$(x,y)$ = a training sample, where $x \in  \mathbb{R}^n , y \in \{0,1\}$. Note that $n$ is the number of inputs in the feature vector (48 in the example).

Let's say we have $l$ training samples. Your training set then looks like this: $\{(x^{(1)},y^{(1)}), \ldots, (x^{(l)},y^{(l)})\}$
Similarly, let's say the test set has $m$ test samples.

Note that the resulting matrix $x$ has dimensions ($n$ x $l$), and looks like this:

 $ \hspace{1.1cm} x^{(1)} \hspace{0.4cm} x^{(2)} \hspace{1.4cm} x^{(l)} $
 
 $x $= $\begin{bmatrix} 35 & 23 & \cdots & 1\\ 19 & 88 &\cdots & 230\\  \vdots & \vdots & \ddots & \vdots \\ 9 & 3 &\cdots & 222 \\7 &166 &\cdots  &43 \\ \vdots & \vdots & \ddots & \vdots  \\ 4 & 202 & \cdots & 98 \\ 6 & 54 & \cdots & 100 \\ \vdots & \vdots & \ddots & \vdots \end{bmatrix}$

The matrix of training labels $y$ has dimensions ($1$ x $l$), and would look something like this:

$y $= $\begin{bmatrix} 1 & 0 & \cdots & 1 \end{bmatrix}$

where 1 means that the image contains a Santa, 0 means there is no Santa in the image.
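
As an illustration of these shapes, the sketch below builds an ($n$ x $l$) matrix `x` and a ($1$ x $l$) matrix `y` from a handful of made-up images. The number of samples, the pixel values, the labels, and the helper name `unrow` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

l = 5          # number of training samples (made up)
n = 4 * 4 * 3  # 48 inputs per image, as in the 4 x 4 example

# Made-up stand-ins for l training images (height x width x channel) and labels.
images = rng.integers(0, 256, size=(l, 4, 4, 3))
labels = [1, 0, 0, 1, 1]  # 1 = Santa, 0 = no Santa


def unrow(image):
    """Unrow one image into a flat vector: red first, then green, then blue."""
    return image.transpose(2, 0, 1).reshape(-1)


# Each unrowed image becomes one column of the (n x l) matrix x.
x = np.column_stack([unrow(img) for img in images])

# The labels form a (1 x l) row matrix y.
y = np.array(labels).reshape(1, -1)

print(x.shape, y.shape)  # (48, 5) (1, 5)
```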


# 5. Logistic regression as a neural network

So how will we be able to predict whether $y$ is 0 or 1 for a certain image? You might remember from logistic regression models that the eventual predictor, $\hat y$, is generally never exactly 0 or 1, but some value in between.

Formally, you'll denote that $ \hat y = P(y=1 \mid x) $. Remember that $x \in  \mathbb{R}^n $. As in classical (logistic) regression we'll need some parameters. 

The parameters here are $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$. We'll need some expression that turns $x$, $w$, and $b$ into a prediction $\hat y$. One candidate is $\hat y = w^T x + b$. The problem, however, is that this type of expression does not ensure that the eventual outcome $\hat y$ will be between zero and one: it could be much bigger than one, or even negative!
A popular solution is to apply the sigmoid function $\sigma(z) = \displaystyle\frac{1}{1 + e^{-z}}$, which maps any real number to a value between zero and one.

This is why $\hat y = \sigma(w^T x + b)$. Let's denote $z = w^T x + b$, then $\hat y = \sigma(z)$.



![title](figures/sigmoid_smaller.png)

![title](figures/log_reg.png)
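
A minimal sketch of this prediction step, assuming the standard sigmoid $\sigma(z) = 1 / (1 + e^{-z})$: the feature vector and the weights below are random, untrained stand-ins, and scaling the pixel values to [0, 1] is an extra assumption made here to keep $z$ small.

```python
import numpy as np


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(seed=3)

n = 48                                         # inputs per image, as in the 4 x 4 example
x = rng.integers(0, 256, size=(n, 1)) / 255.0  # made-up feature vector, scaled to [0, 1]
w = rng.normal(scale=0.01, size=(n, 1))        # hypothetical, untrained weights
b = 0.0                                        # hypothetical, untrained bias

z = (w.T @ x).item() + b  # linear part: can be any real number
y_hat = sigmoid(z)        # estimate of P(y = 1 | x), between 0 and 1

print(z, y_hat)
```

Whatever value `z` takes, `sigmoid(z)` stays between 0 and 1: large negative values of `z` map close to 0, large positive values map close to 1, and `sigmoid(0)` is exactly 0.5.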

