A neural network is built from perceptrons:
 - a perceptron takes a set of inputs $[x_0 ... x_n]$ and turns them into one output $y$
 
![image.png](attachment:image.png)

 - Rosenblatt proposed a simple rule to compute the output. He introduced weights $[w_0 ... w_n]$, real numbers expressing the importance of the respective inputs to the output
 - The neuron's output, 0 or 1, is determined by whether the weighted sum $\sum_i w_i x_i$ is less than or greater than some **threshold** value. The rule that maps the weighted sum to the output is called an **activation** function (here, a step function)
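
The threshold rule above can be sketched in a few lines of NumPy. This is only an illustration: the function name `perceptron`, the weights and the threshold are assumptions picked by hand so that the unit behaves like a logical AND; nothing here is learned.

```python
import numpy as np

def perceptron(x, w, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = np.dot(w, x)
    return 1 if weighted_sum > threshold else 0

# Hypothetical hand-picked parameters: both inputs matter equally,
# and the threshold is only exceeded when both inputs are 1.
w = np.array([1.0, 1.0])
threshold = 1.5

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_outputs = [perceptron(np.array(x), w, threshold) for x in inputs]
print(and_outputs)  # [0, 0, 0, 1]
```

Changing `w` or `threshold` changes which inputs fire the neuron, which is what training adjusts in a real network.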

Convolutional neural networks turn images into high-level features using four main operations:

 - Convolution
 - Non Linearity (ReLU)
 - Pooling or Sub Sampling
 - Classification (Fully Connected Layer)
 
 
Convolution preserves the spatial relationship between pixels by learning image features from small squares of input data. It applies a convolution filter (kernel) to a neighbourhood of pixels and summarises it as one value. The matrix formed by sliding the filter over the image and computing the **dot product** at each position is called the 'Convolved Feature', 'Activation Map' or 'Feature Map'. Note that filters act as feature detectors on the original input image. For a 3x3 pixel neighbourhood $a$ and a convolution filter $b$ of the same size, both flattened to $n = 9$ entries, the dot product is:

${\displaystyle \mathbf {a} \cdot \mathbf {b} =\sum _{i=1}^{n}a_{i}b_{i}=a_{1}b_{1}+a_{2}b_{2}+\cdots +a_{n}b_{n}}$

And visually:

[https://ujwlkarn.files.wordpress.com/2016/07/convolution_schematic.gif?w=268&h=196](https://ujwlkarn.files.wordpress.com/2016/07/convolution_schematic.gif?w=268&h=196)

So a convolution layer converts an image of pixels into a feature map $I\rightarrow F$.
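
As a rough sketch of the sliding dot product described above, the following NumPy code computes a feature map with stride 1 and no padding. The helper name `convolve2d_valid`, the 5x5 binary image and the 3x3 filter are assumptions chosen for illustration. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it is a cross-correlation.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and take
    the dot product of the kernel with each neighbourhood it covers."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Element-wise product summed up == dot product of the flattened arrays
            out[i, j] = np.sum(patch * kernel)
    return out

# Illustrative 5x5 binary image and 3x3 filter (assumed values)
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

feature_map = convolve2d_valid(image, kernel)
print(feature_map)
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]
```

A 5x5 image and a 3x3 filter give a 3x3 feature map, since the filter fits in 5 - 3 + 1 = 3 positions along each axis.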

Here are some resources:
 - [THE DEEP LEARNING BOOK](http://www.deeplearningbook.org/)
 - [dimensionality and 1x1 convs](https://stats.stackexchange.com/questions/194142/what-does-1x1-convolution-mean-in-a-neural-network)
 - [explaining convnets with intuition and diagrams](https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/)
 - [diff between neural and deep neural. explains connections, feedforward nets](https://stats.stackexchange.com/questions/182734/what-is-the-difference-between-a-neural-network-and-a-deep-neural-network)
 - [visual overview of convnets tools and concepts](https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2/)
 - [invariance in image recog](https://www.quora.com/How-is-a-convolutional-neural-network-able-to-learn-invariant-features)
 - "A nonlinear system is a system in which the change of the output is not proportional to the change of the input."
 - NNs and [universal approx](https://en.wikipedia.org/wiki/Universal_approximation_theorem), [activation func](https://en.wikipedia.org/wiki/Activation_function), [perceptron](https://en.wikipedia.org/wiki/Perceptron)
 - [visually: why activations must be nonlinear](http://www.kdnuggets.com/2016/08/role-activation-function-neural-network.html)
 - [how neuronal connections work](http://swanintelligence.com/first-steps-with-neural-nets-in-keras.html)
 
https://qph.ec.quoracdn.net/main-qimg-6718d32785c4b612b6182f52752f24f8?convert_to_webp=true

![image.png](attachment:image.png)

https://qph.ec.quoracdn.net/main-qimg-11fc98286eff9cd494a5b4614ba4a01d?convert_to_webp=true

![image.png](attachment:image.png)

http://swanintelligence.com/images/2016q1/neuron.png

![image.png](attachment:image.png)

## Visualising
Keras conv filter https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py

http://ankivil.com/visualizing-deep-neural-networks-classes-and-features/

new variant of deconv approach https://arxiv.org/pdf/1412.6806v3.pdf

example of deconv  https://datascience.stackexchange.com/questions/20469/keras-visualizing-the-output-of-an-intermediate-layer

class activation maps for visualising where they pay attn https://jacobgil.github.io/deeplearning/class-activation-maps

visualising in tensorflow simply https://medium.com/@awjuliani/visualizing-neural-network-layer-activation-tensorflow-tutorial-d45f8bf7bbc4

visualising intermediate output https://datascience.stackexchange.com/questions/20469/keras-visualizing-the-output-of-an-intermediate-layer

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

https://github.com/fchollet/keras-resources

### Batch normalization
[Batch normalization invention](https://arxiv.org/pdf/1502.03167.pdf)

> We define Internal Covariate Shift as the change in the
> distribution of network activations due to the change in
> network parameters during training. To improve the training,
> we seek to reduce the internal covariate shift. By
> fixing the distribution of the layer inputs x as the training
> progresses, we expect to improve the training speed. It has
> been long known (LeCun et al., 1998b; Wiesler & Ney,
> 2011) that the network training converges faster if its inputs
> are whitened – i.e., linearly transformed to have zero
> means and unit variances, and decorrelated.
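
To make "zero means and unit variances" concrete, here is a minimal sketch of the normalisation step applied to one mini-batch. It is a simplification under stated assumptions: the name `normalize_batch` and the `eps` value are illustrative, each feature is standardised independently (no decorrelation, so this is not full whitening), and the learned scale and shift parameters and the running statistics used at inference time are left out.

```python
import numpy as np

def normalize_batch(x, eps=1e-5):
    """Standardise each feature (column) of a mini-batch to zero mean
    and unit variance. eps avoids division by zero for constant features."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

# Assumed toy mini-batch: 64 samples, 4 features, deliberately shifted and scaled
rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))

normalized = normalize_batch(batch)
print(normalized.mean(axis=0))  # each entry close to 0
print(normalized.var(axis=0))   # each entry close to 1
```

The statistics are computed over the batch axis, so every feature ends up on a comparable scale regardless of how the preceding layer's parameters have drifted.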


[SELU Paper](https://arxiv.org/pdf/1706.02515.pdf) [SELU](https://github.com/bioinf-jku/SNNs/blob/master/selu.py), [parameters](https://github.com/bioinf-jku/SNNs/blob/master/getSELUparameters.ipynb)



The insight is that you want your neural network to have uniform activations: the activations of a layer should be of similar magnitude and not vary too much.

The inputs to a neural network can be treated as random variables, and so can the weights of the NN.

Scaled Exponential Linear Units (SELUs) do precisely this, but at the activation level. Their shape pushes activations toward zero mean and unit variance as they propagate through the layers.
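
A small sketch of the SELU function, using the two constants given in the SELU paper (rounded to double precision). The check below is an assumption-laden illustration rather than a proof: it only feeds standard normal pre-activations through the function and looks at the empirical mean and variance of the result.

```python
import numpy as np

# Constants from the SELU paper
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit:
    SCALE * x                      for x > 0
    SCALE * ALPHA * (exp(x) - 1)   for x <= 0
    """
    x = np.asarray(x, dtype=float)
    # np.minimum keeps exp() from overflowing on large positive inputs
    negative_part = ALPHA * np.expm1(np.minimum(x, 0.0))
    return SCALE * np.where(x > 0, x, negative_part)

# Assumed input: pre-activations that already have zero mean and unit variance
rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)
a = selu(z)

print(a.mean())  # close to 0
print(a.var())   # close to 1
```

Zero mean and unit variance going in gives roughly zero mean and unit variance coming out, which is the fixed point the constants were chosen for.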

When we refer to normalizing the activations, what we really are talking about is normalizing the mean and variance of the weight matrix.