# Deep Learning week 1 notes

### Books to read
- [Deep Learning](http://www.deeplearningbook.org/)
- [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/)
- [Grokking Deep Learning](https://www.manning.com/books/grokking-deep-learning)

### Python lib tutorials
- [Pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html#min)
- [scikit-learn](http://scikit-learn.org/stable/tutorial/basic/tutorial.html)
- [Pyplot](http://matplotlib.org/users/pyplot_tutorial.html)

## Linear regression
**Linear regression** finds the best-fitting values *m* and *b* for a line (*y = mx + b*) through a given set of points, i.e. the values that minimize the mean squared error between the line and the points. Here the fit is found with **gradient descent**: at every iteration it computes the partial derivatives of the error with respect to *m* and *b* and moves both a small step downhill, closing in on the minimum. As the line approaches the minimum error, the derivatives shrink, and so does each update step:
```
new_b = b_current - (learningRate * b_gradient)
new_m = m_current - (learningRate * m_gradient)
```
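Assuming the error is the mean squared error over *N* points (*x_i*, *y_i*), the `m_gradient` and `b_gradient` in the update rule above are its partial derivatives:

$$
E(m, b) = \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - (m x_i + b)\bigr)^2
$$

$$
\frac{\partial E}{\partial m} = -\frac{2}{N}\sum_{i=1}^{N} x_i\bigl(y_i - (m x_i + b)\bigr),
\qquad
\frac{\partial E}{\partial b} = -\frac{2}{N}\sum_{i=1}^{N}\bigl(y_i - (m x_i + b)\bigr)
$$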

The learning rate and the number of iterations are chosen manually. A badly chosen learning rate, too small or too large, slows down reaching the minimum. If the learning rate is too small, every step is tiny, so the descent slows down drastically and may not reach the minimum within the given number of iterations. If the learning rate is too large, the first iterations get near the bottom quickly but then overshoot it, jumping from one slope to the other in zig-zags, so they may not settle precisely at the bottom in any reasonable number of iterations; a large enough rate makes them diverge entirely.
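A minimal sketch of the whole procedure, assuming mean squared error and a small hypothetical data set lying exactly on y = 2x + 1. The helper names `step_gradient` and `gradient_descent`, the data, and the three learning rates are illustrative choices, not taken from the linked code. Running the same descent with different rates makes the trade-off above visible.

```python
def step_gradient(b, m, points, learning_rate):
    """One gradient descent step on the mean squared error of y = m*x + b."""
    n = len(points)
    b_gradient = 0.0
    m_gradient = 0.0
    for x, y in points:
        residual = y - (m * x + b)
        # Accumulate the partial derivatives of the MSE w.r.t. b and m
        b_gradient += -(2 / n) * residual
        m_gradient += -(2 / n) * x * residual
    new_b = b - learning_rate * b_gradient
    new_m = m - learning_rate * m_gradient
    return new_b, new_m


def gradient_descent(points, learning_rate, num_iterations, b=0.0, m=0.0):
    """Apply step_gradient a fixed number of times, starting from b = m = 0."""
    for _ in range(num_iterations):
        b, m = step_gradient(b, m, points, learning_rate)
    return b, m


# Hypothetical toy data lying exactly on y = 2x + 1, so the best fit is m = 2, b = 1
points = [(x, 2 * x + 1) for x in range(5)]

results = {}
for learning_rate in (0.001, 0.1, 0.16):
    b, m = gradient_descent(points, learning_rate, num_iterations=300)
    results[learning_rate] = (b, m)
    print(f"learning_rate={learning_rate}: m = {m:.3g}, b = {b:.3g}")
```

On this data, 0.001 is still noticeably off after 300 steps, 0.1 recovers m = 2 and b = 1, and 0.16 overshoots by more on every step and diverges. The exact thresholds depend on the data, which is why the rate is tuned by hand.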

[Code](https://github.com/llSourcell/linear_regression_live) from Siraj's [live video](https://www.youtube.com/watch?v=uwwWVAgJBcM).

## Neural networks

**Neural network**: a network of neurons that converts input data into output data.
A **neuron** (or **perceptron**) is modeled on a biological neuron, with a body, dendrites (inputs), and an axon (output). As signals enter the neuron through its inputs, each one is multiplied by a **weight**, a measure of how much that signal matters to the output. The weighted signals are then summed, together with a **bias** term. The result is fed through an **activation function**, often a **sigmoid**, which squashes the output into the range (0, 1).
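```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))  # squashes any real number into (0, 1)

# weights and inputs are equal-length NumPy arrays, bias is a scalar
output = sigmoid(np.dot(weights, inputs) + bias)
```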
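A concrete forward pass through one sigmoid neuron. The function name `neuron_output` and the input, weight, and bias values are arbitrary examples chosen for illustration. With these numbers the weighted input is 0.1·0.7 + 0.8·(−0.3) − 0.1 = −0.27.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def neuron_output(inputs, weights, bias):
    """Forward pass of a single sigmoid neuron."""
    h = np.dot(weights, inputs) + bias  # weighted sum of the inputs plus bias
    return sigmoid(h)


inputs = np.array([0.7, -0.3])  # example values, chosen arbitrarily
weights = np.array([0.1, 0.8])
bias = -0.1
print(neuron_output(inputs, weights, bias))  # ≈ 0.4329
```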

### Learning
A neural network learns by adjusting its weights with gradient descent on the error, i.e. the difference between the target output and the network's output.
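Assuming the error being minimized is the squared error ½(y − ŷ)², the chain rule gives the weight update used in the code that follows. Here *h* is the neuron's weighted input (including the bias, as in the forward pass), *f* is the sigmoid, η is `learnrate`, *y* is `target`, and ŷ is `output`:

$$
h = \sum_i w_i x_i + b, \qquad \hat{y} = f(h), \qquad E = \tfrac{1}{2}\,(y - \hat{y})^2
$$

$$
\frac{\partial E}{\partial w_i} = -(y - \hat{y})\, f'(h)\, x_i
\quad\Longrightarrow\quad
\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta\,(y - \hat{y})\, f'(h)\, x_i,
\qquad f'(h) = f(h)\bigl(1 - f(h)\bigr)
$$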
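```python
def sigmoid_derivative(x):
    # Derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x))
    return sigmoid(x) * (1 - sigmoid(x))

h = np.dot(weights, inputs) + bias  # same weighted input as in the forward pass
output = sigmoid(h)
error = target - output
# Delta rule: move each weight against the gradient of the squared error
delta_weights = learnrate * error * sigmoid_derivative(h) * inputs
weights += delta_weights
```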
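A self-contained sketch that applies this update repeatedly to a single training example. The function name `train_neuron`, the example values, the learning rate, and the bias update (treating the bias as a weight on a constant input of 1) are assumptions for illustration, not part of the original notes.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))


def train_neuron(inputs, target, weights, bias, learnrate=0.5, epochs=1000):
    """Repeatedly apply the delta rule to one training example."""
    weights = weights.astype(float)  # astype copies, so the caller's array is untouched
    for _ in range(epochs):
        h = np.dot(weights, inputs) + bias
        output = sigmoid(h)
        error = target - output
        # Shared factor of the gradient for every weight and the bias
        error_term = error * sigmoid_derivative(h)
        weights += learnrate * error_term * inputs
        # Assumption: the bias is updated like a weight whose input is always 1
        bias += learnrate * error_term
    return weights, bias


inputs = np.array([1.0, 2.0])
weights, bias = train_neuron(inputs, target=0.9, weights=np.array([0.5, -0.5]), bias=0.0)
print(sigmoid(np.dot(weights, inputs) + bias))  # approaches the target 0.9
```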
Starting weights are chosen randomly, so depending on where the descent starts, the weights can get stuck in a local minimum. One method to help avoid this is [momentum](http://sebastianruder.com/optimizing-gradient-descent/index.html#momentum).
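A minimal sketch of a common formulation of momentum: a velocity term keeps a decaying sum of past gradients, and the parameter moves by the velocity instead of by the raw gradient. The names `momentum_descent` and `gamma`, the hyperparameter values, and the toy function are assumptions for illustration. The toy function is convex, so this only demonstrates the update rule, not escaping a local minimum.

```python
def momentum_descent(grad, w, learning_rate=0.1, gamma=0.9, num_iterations=500):
    """Gradient descent with momentum on a single parameter w."""
    velocity = 0.0
    for _ in range(num_iterations):
        # Velocity keeps a fraction gamma of the previous step and adds the new gradient
        velocity = gamma * velocity + learning_rate * grad(w)
        w = w - velocity
    return w


# Toy example: f(w) = (w - 3)**2 has gradient 2 * (w - 3) and its minimum at w = 3
w = momentum_descent(lambda w: 2 * (w - 3), w=0.0)
print(round(w, 4))  # 3.0
```

Because the velocity carries over part of the previous step, the parameter can keep moving through flat or slightly uphill stretches where plain gradient descent would stall; whether that is enough to leave a particular local minimum depends on `gamma` and the learning rate.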

### Supporting materials
- [Khan Academy's Gradient and directional derivatives course](https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/gradient-and-directional-derivatives/v/gradient)
- [Khan Academy's Chain rule course](https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/gradient-and-directional-derivatives/v/gradient)
- [An overview of gradient descent optimization algorithms](http://sebastianruder.com/optimizing-gradient-descent/index.html#momentum)
- [Khan Academy's Matrices course](https://www.khanacademy.org/math/precalculus/precalc-matrices)
- [Khan Academy's Vectors course](https://www.khanacademy.org/math/linear-algebra/vectors-and-spaces/vectors/v/vector-introduction-linear-algebra)