In [1]:
using Flux

## Approximate A Function by A Neural Network
Here is the simple function we are trying to approximate.

In [2]:
actual(x) = 4x + 2

actual (generic function with 1 method)

In [3]:
x_train, x_test = hcat(0:5...), hcat(6:10...)

([0 1 … 4 5], [6 7 … 9 10])

Note that Flux arranges the input data differently from Aurélien Géron/`keras`/`tensorflow`:

- Flux: `(n_features, n_instances)`
- `keras`: `(n_instances, n_features)`
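To make the two layouts concrete, here is a minimal sketch in plain Julia (no Flux needed) that stores the same six one-feature samples both ways. The names `samples`, `flux_layout` and `keras_layout` are illustrative only.

```julia
# Six samples with one feature each (same values as x_train above).
samples = collect(0:5)

# Flux layout: one column per instance -> (n_features, n_instances)
flux_layout = reshape(samples, 1, :)

# keras layout: one row per instance -> (n_instances, n_features)
keras_layout = reshape(samples, :, 1)

size(flux_layout)    # (1, 6)
size(keras_layout)   # (6, 1)

# permutedims swaps the two axes, converting one layout into the other.
permutedims(keras_layout) == flux_layout   # true
```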

In [4]:
y_train, y_test = actual.(x_train), actual.(x_test)

([2 6 … 18 22], [26 30 … 38 42])

In [5]:
model = Dense(1,1)

Dense(1, 1)         # 2 parameters

In [6]:
model.weight

1×1 Matrix{Float32}:
 -0.8380664

In [7]:
model.bias

1-element Vector{Float32}:
 0.0

In [8]:
model(x_train)

1×6 Matrix{Float32}:
 0.0  -0.838066  -1.67613  -2.5142  -3.35227  -4.19033

In [9]:
model.weight * x_train .+ model.bias

1×6 Matrix{Float32}:
 0.0  -0.838066  -1.67613  -2.5142  -3.35227  -4.19033

Let's create the inputs the way Aurélien Géron does, i.e. with shape `(n_instances, n_features)`.

In [10]:
X_train = hcat(0:5)

6×1 Matrix{Int64}:
 0
 1
 2
 3
 4
 5

In [11]:
model(X_train)

LoadError: DimensionMismatch("matrix A has dimensions (1,1), matrix B has dimensions (6,1)")

That fails: `Dense` expects `(n_features, n_instances)`, so the 6×1 input does not match the 1×1 weight. We have to follow Flux's convention, unless we do the computation manually as below.

In [12]:
X_train * model.weight .+ model.bias

6×1 Matrix{Float32}:
  0.0
 -0.8380664
 -1.6761328
 -2.5141993
 -3.3522656
 -4.190332

But doing this every time would be tedious. So, when in Flux's land, follow Flux's convention.
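The manual product above works without any transposition only because the weight is 1×1. As a rough sketch of the general case, the block below uses a hypothetical layer with 2 inputs and 3 outputs (all values made up) and plain Julia arrays rather than a Flux model. It assumes only that the weight is stored as `(n_out, n_in)`, as in `Dense`.

```julia
# Hypothetical layer: 2 input features, 3 outputs.
W = Float32[1 2; 3 4; 5 6]      # (n_out, n_in) = (3, 2)
b = Float32[0.5, -1, 2]         # (n_out,)

# Four instances, one row per instance: (n_instances, n_features) = (4, 2)
X = Float32[0 1; 1 0; 2 2; 3 1]

# Column-per-instance convention: transpose the data, keep W and b as stored.
out_cols = W * permutedims(X) .+ b                # size (3, 4)

# Row-per-instance convention: transpose the weight and bias instead.
out_rows = X * permutedims(W) .+ permutedims(b)   # size (4, 3)

# Both give the same numbers, just laid out along swapped axes.
out_rows ≈ permutedims(out_cols)   # true
```

Either convention is workable, but the row-per-instance one requires transposing the parameters on every call, which is the tedium the text refers to.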

In [13]:
loss(x, y) = Flux.Losses.mse(model(x), y)

loss (generic function with 1 method)

In [14]:
loss(x_train, y_train)

266.94382f0

## Optimizer, (Trainable) Parameters and `train!`

In [15]:
parameters = params(model)

Params([Float32[-0.8380664], Float32[0.0]])

In [16]:
model.weight in parameters, model.bias in parameters

(true, true)

In [17]:
0.0 in parameters

false

In [18]:
[0.0] in parameters

true

In [19]:
optimizer = Descent()

Descent(0.1)

In [20]:
data = [(x_train, y_train)]

1-element Vector{Tuple{Matrix{Int64}, Matrix{Int64}}}:
 ([0 1 … 4 5], [2 6 … 18 22])

In [21]:
# Before one train!
parameters, loss(x_train, y_train)

(Params([Float32[-0.8380664], Float32[0.0]]), 266.94382f0)

In [22]:
train!(loss, parameters, data, optimizer)

LoadError: UndefVarError: train! not defined

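**(?)** Why does only `train!` need an explicit import, when the other Flux names used so far do not? The `UndefVarError` above shows that `using Flux` does not bring `train!` into scope, i.e. Flux does not export it, so it has to be imported by name (or called as `Flux.train!`):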
```julia
using Flux: train!
```
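The scoping behaviour behind this error can be reproduced without Flux. In the sketch below, `Toy`, `visible` and `hidden!` are made-up names: the module exports one function and not the other, which mirrors how `using Flux` exposes `Dense` but not `train!`.

```julia
module Toy
export visible                 # only this name is exported

visible() = "exported"
hidden!() = "not exported"     # defined, but not in the export list
end

using .Toy                     # brings exported names into scope

visible()                      # works: the name was exported
Toy.hidden!()                  # works: fully qualified access
# hidden!()                    # would throw UndefVarError at this point

using .Toy: hidden!            # explicit import by name
hidden!()                      # now works unqualified
```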

In [23]:
using Flux: train!
train!(loss, parameters, data, optimizer)

In [24]:
# After one train!
parameters, loss(x_train, y_train)

(Params([Float32[9.031723], Float32[2.8190334]]), 253.3604f0)

One call to `train!` makes one pass (one epoch) over the training set `data`. Since `data` holds a single batch here, that pass amounts to a single gradient step.
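To see what that one step does, here is a hand-written sketch of it in plain Julia. It assumes that `Descent(0.1)` applies the plain update `p = p - 0.1 * gradient` and that `mse` is the mean of the squared residuals. The initial values are copied from the output above, and `lr`, `grad_w` and `grad_b` are illustrative names.

```julia
# Training data from above, flattened to vectors for readability.
x = Float32.(0:5)
y = 4 .* x .+ 2

# Initial parameters as printed before training.
w, b = -0.8380664f0, 0.0f0
lr = 0.1f0                     # learning rate, matching Descent(0.1)

# Mean squared error of the line w*x + b against y.
mse(w, b) = sum(abs2, w .* x .+ b .- y) / length(x)

# Analytic gradients of the MSE with respect to w and b.
residual = w .* x .+ b .- y
grad_w = 2 * sum(residual .* x) / length(x)
grad_b = 2 * sum(residual) / length(x)

# One gradient-descent update.
w_new = w - lr * grad_w
b_new = b - lr * grad_b

(w_new, b_new)        # ≈ (9.0317, 2.8190)
mse(w_new, b_new)     # ≈ 253.36
```

The results agree with the parameters and loss printed after the first `train!` call, which supports the assumption about the update rule.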

Let's run another 200 epochs.

In [25]:
for i in 1:200
  train!(loss, parameters, data, optimizer)
end

In [26]:
# After another 200 epochs
parameters, loss(x_train, y_train)

(Params([Float32[4.02637], Float32[2.0074282]]), 0.0074088597f0)

The parameters have come pretty close to those of `actual`: a weight of about 4.03 and a bias of about 2.01, against the true values 4 and 2.

In [27]:
model(x_test)

1×5 Matrix{Float32}:
 26.1656  30.192  34.2184  38.2448  42.2711

In [28]:
y_test

1×5 Matrix{Int64}:
 26  30  34  38  42
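As a rough way to quantify the gap between the two outputs above, the sketch below recomputes the predictions from the learned parameters (copied from the training output) and measures the absolute error on the test inputs. It uses plain Julia only, and `pred` and `abs_err` are illustrative names.

```julia
# Learned parameters, copied from the output after 201 epochs.
w, b = 4.02637f0, 2.0074282f0

x_test = Float32.(6:10)
y_test = 4 .* x_test .+ 2        # true values: 26, 30, 34, 38, 42

pred = w .* x_test .+ b
abs_err = abs.(pred .- y_test)

maximum(abs_err)                 # ≈ 0.27, reached at x = 10
sum(abs_err) / length(abs_err)   # ≈ 0.22, mean absolute error
```

The error grows with `x` because the small excess in the weight (about 0.026) is multiplied by the input, so predictions further outside the training range drift more.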