Everyone wants to make learning deep learning easier... what if we try to learn two things at once to keep it more interesting?

## 1 Assumptions

* You already know a programming language like Python, R or JavaScript
* You know basic linear algebra
* You have installed [Anaconda](https://www.anaconda.com/) (an easy way to get Jupyter and its dependencies)
* You have installed [Julia 1.1](https://julialang.org/)

Then:

* Start with `jupyter notebook`
* Open this notebook or create a new Julia 1.1 notebook.

Jupyter notebook has two modes: edit and command. Esc enters command mode from edit mode. Enter does the opposite. Print this [cheat sheet](jupyter-shortcuts.pdf) (for Mac) or hit `h` in command mode to see shortcuts.

## 2 Basic Julia

(It works mostly like you'd expect)

In [1]:
print("hello world")

hello world

In [2]:
4 + 3

7

Julia supports concise, clear function definitions.

In [4]:
f(x) = x + 1
f(5)

6

And anonymous functions:

In [24]:
x -> x + 1

#3 (generic function with 1 method)
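Anonymous functions are most useful when handed to another function. A minimal sketch using `map` and `filter` from Base; the variable names `incremented`, `evens` and `add_one` are only illustrative:

```julia
# Pass anonymous functions straight to higher-order functions from Base.
incremented = map(x -> x + 1, [1, 2, 3])        # apply to every element
evens = filter(x -> x % 2 == 0, [1, 2, 3, 4])   # keep elements where the test is true

# An anonymous function can also be bound to a name and reused.
add_one = x -> x + 1

println(incremented)
println(evens)
println(add_one(5))
```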

Julia has built-in support for maths on vectors and matrices (elementwise operations put a `.` in front of the standard operator) and a nice literal syntax for writing them: spaces separate columns and `;` separates rows.

In [26]:
[1 2 3] .* [3 4 5]

1×3 Array{Int64,2}:
 3  8  15

In [27]:
[1 2; 4 5]

2×2 Array{Int64,2}:
 1  2
 4  5

In [28]:
[1 3; 6 7] * [-1 0; 1 2]

2×2 Array{Int64,2}:
 2   6
 1  14
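It is easy to mix up `.*` and `*`, and rows with columns. The following sketch puts them side by side, reusing the matrices from the cell above; the variable names are illustrative:

```julia
A = [1 3; 6 7]
B = [-1 0; 1 2]

elementwise = A .* B   # multiplies matching entries
matprod = A * B        # rows-times-columns matrix product

row = [1 2 3]          # spaces: a 1×3 matrix (one row)
col = [1, 2, 3]        # commas: a 3-element Vector (a column)

println(elementwise)
println(matprod)
println(size(row), " ", size(col))
```

Note that `[1 2 3]` is a 1×3 matrix rather than a plain vector, which is why the outputs above are reported as `Array{Int64,2}`.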

Julia has the awesome pipeline operator `|>` for clearly combining functions. No(ridiculous(nested(brackets))). Take an input, pass it to a function, then pass the output to the next function, and so on. Even JavaScript is (hopefully) [adding this](https://github.com/tc39/proposal-pipeline-operator).

In [25]:
5 |> f |> f |> x -> x * 2 

14
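To see what the pipeline is doing, here is a small sketch comparing it with the nested-call form and with the composition operator `∘` from Base. `f` is redefined so the block runs on its own, and `double` is an illustrative name:

```julia
f(x) = x + 1
double(x) = x * 2

piped = 5 |> f |> f |> double      # read left to right
nested = double(f(f(5)))           # same calls, read inside out
composed = (double ∘ f ∘ f)(5)     # build one function first, then call it

println(piped, " ", nested, " ", composed)
```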

## 3 A very simple neural network

Neural networks basically take an input of some numbers and produce an output of some numbers.

In [8]:
using LinearAlgebra

In [16]:
function simple_nn(input)
   dot(input, [0.1 0.3 0.2]) 
end


simple_nn([0.4 0.2 0.1])

0.12000000000000001

Yes, it is just a weighted sum (dot product of some input with some hardcoded values). The input could be some observed characteristics and the output a prediction of membership of a group. 
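To make the "weighted sum" concrete, this sketch computes it by hand and compares it with `dot`. The names `weights`, `by_hand` and `via_dot` are illustrative:

```julia
using LinearAlgebra

weights = [0.1 0.3 0.2]
input = [0.4 0.2 0.1]

# Multiply each input by its weight, then add everything up.
by_hand = sum(input[i] * weights[i] for i in 1:length(input))
via_dot = dot(input, weights)

println(by_hand, " ", via_dot)
```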

We can also do a multi-input, multi-output network (yes this is just matrix multiplication):

In [20]:
function simple_nn2(input)
   input * [0.1 0.2 0.3; 0.4 -0.1 0.1; 0.1 0.4 0.2] 
end

simple_nn2([0.4 0.2 0.1])

1×3 Array{Float64,2}:
 0.13  0.1  0.16
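One way to read the matrix version: each output is its own weighted sum, using one column of the weight matrix. A sketch under that reading, where `W`, `full` and `per_output` are illustrative names:

```julia
using LinearAlgebra

W = [0.1 0.2 0.3; 0.4 -0.1 0.1; 0.1 0.4 0.2]
input = [0.4 0.2 0.1]

full = input * W   # all three outputs at once (1×3)

# The same numbers, one output at a time: dot the input with column j of W.
per_output = [dot(vec(input), W[:, j]) for j in 1:size(W, 2)]

println(full)
println(per_output)
```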

Typically we will stack several of these simple networks (often called layers), feeding the output of one into the next. Let's combine the two we have used so far (using the pipeline operator for clarity):

In [23]:
[0.1 0.2 0.3] |> simple_nn2 |> simple_nn

0.07
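It can help to look at the intermediate values when stacking. This sketch redefines both functions so it runs on its own and splits the pipeline into two steps; `hidden` and `output` are illustrative names:

```julia
using LinearAlgebra

simple_nn(input) = dot(input, [0.1 0.3 0.2])
simple_nn2(input) = input * [0.1 0.2 0.3; 0.4 -0.1 0.1; 0.1 0.4 0.2]

input = [0.1 0.2 0.3]
hidden = simple_nn2(input)   # three intermediate numbers (1×3)
output = simple_nn(hidden)   # one final number

println(hidden)
println(output)
```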

## 4 Gradient Descent (i)

So we can make very simple neural networks. But how do they learn? How do we update the weights? We'll start with a very simple gradient descent process for a trivial model.

In [29]:
prediction(x, weight) = x * weight

prediction (generic function with 1 method)

We want a way of calculating the error. A standard approach is the squared error; averaged over many examples it is called the Mean Squared Error (MSE). With a single example the two coincide, so we call the function `mse`:

In [36]:
mse(pred, actual) = (pred - actual) ^ 2

mse (generic function with 1 method)

In [65]:
input = 0.4
goal = 0.32
weight = 0.5 # 'correct' weight is 0.8, want to learn this

p = prediction(input, weight)
mse(p, goal)

0.0144

Okay so we have an error for this trivial, fake situation. But how would we update the weight to reduce the error? Let's multiply the signed error (prediction minus goal) by the input. The sign tells us which direction to move the weight and the size tells us how far.

In [66]:
error_times_input = (p - goal) * input
error_times_input

-0.048

Now let's adjust the weight:

In [67]:
new_weight = weight - error_times_input

0.548

In [68]:
input = 0.4

p2 = prediction(input, new_weight)
round(mse(p2, goal), digits=6)

0.010161

The error dropped from 0.0144 to about 0.0102. Let's try iterating:

In [72]:
w = new_weight

for i = 1:10
    p = prediction(input, w)
    delta = (p - goal) # step by the raw error (no input factor); the input is positive, so the direction is the same
    w -= delta
    np = prediction(input, w)
    print("($(i)) Weight $(w) MSE $(round(mse(np, goal), sigdigits=3))\n")
end

(1) Weight 0.6488 MSE 0.00366
(2) Weight 0.70928 MSE 0.00132
(3) Weight 0.745568 MSE 0.000474
(4) Weight 0.7673407999999999 MSE 0.000171
(5) Weight 0.78040448 MSE 6.14e-5
(6) Weight 0.788242688 MSE 2.21e-5
(7) Weight 0.7929456128 MSE 7.96e-6
(8) Weight 0.7957673676799999 MSE 2.87e-6
(9) Weight 0.7974604206079999 MSE 1.03e-6
(10) Weight 0.7984762523647999 MSE 3.71e-7


As you can see, at each step we are reducing the error and getting closer to the correct weight of 0.8.
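Why does "error times input" point in the right direction? For a squared error `(x * w - goal)^2`, the derivative with respect to `w` is `2 * (x * w - goal) * x`, so the update above is a step against that derivative. The sketch below makes this explicit. It is only one possible way to write it: `squared_error`, `gradient`, `descend`, and the `lr` (learning rate) and `steps` parameters are illustrative names and defaults, not something defined earlier in this notebook.

```julia
prediction(x, weight) = x * weight
squared_error(pred, actual) = (pred - actual)^2

# Derivative of squared_error(prediction(x, w), goal) with respect to w.
gradient(x, w, goal) = 2 * (prediction(x, w) - goal) * x

# Repeatedly step the weight against the gradient.
# lr scales the step size; steps is the number of updates (both assumed defaults).
function descend(x, goal, w; lr = 1.0, steps = 50)
    for _ in 1:steps
        w -= lr * gradient(x, w, goal)
    end
    return w
end

learned = descend(0.4, 0.32, 0.5)
println(learned)
println(squared_error(prediction(0.4, learned), 0.32))
```

Wrapping the loop in a function also avoids relying on the notebook's handling of global variables inside `for` loops.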