### Flux

In [1]:
using Flux
using Flux.Tracker: gradient

In [11]:
# 一些基本的操作
(1:5) .* (1:5)'

5×5 Array{Int64,2}:
 1   2   3   4   5
 2   4   6   8  10
 3   6   9  12  15
 4   8  12  16  20
 5  10  15  20  25

In [12]:
x = [1, 2, 3]
x = [1 2; 3 4]
x = rand(5, 3)
x = rand(BigFloat, 5, 3)
x = rand(Float32, 5, 3)
length(x)  # 长度
size(x)    # 尺寸
x[2, 3]    # 索引
x[:, 3]    # 获取整行，像ode那种就不行
x + x      # 矩阵相加
x .+ 1     # 矩阵加常数
zeros(5,5) .+ (1:5)   # 理解这是怎么加的
(1:5) .* (1:5)'
W = randn(5, 10)      # 矩阵乘法
x = rand(10)
W * x


5-element Array{Float64,1}:
 -1.5280229671582855
 -0.8907901449186502
  2.4692493354385805
 -0.7573421933228726
  1.0693267200698322

》》》在原本的tutorial中，这个是为了derivate函数来设计的，不过这个函数现在已经被丢弃了

In [30]:
f(x) = 3 .*x.^2 .+ 2 .*x .+ 1
println(f([1,2,3]))
df(x) = gradient(f, x, nest=true)[1]    # 可以直接生成导函数
println(df(6))
ddf(x) = gradient(df, x)
ddf(5)

[6, 17, 34]
38.0 (tracked)


(6.0 (tracked),)

》》》考虑当输入x为张量（矩阵）时，如下所示

In [31]:
# Tracker应该是类似tf中graph的存在
using Flux.Tracker: gradient
myloss(W, b, x) = sum(W * x .+ b)
W = randn(3, 5)
b = zeros(3)
x = rand(5)
# W, b, x 分别对应myloss的三个参数，计算结果分别对应三个参数的导数
gradient(myloss, W, b, x) 

([0.9391502998225114 0.7674645991546505 … 0.5224331036148129 0.0626184715521072; 0.9391502998225114 0.7674645991546505 … 0.5224331036148129 0.0626184715521072; 0.9391502998225114 0.7674645991546505 … 0.5224331036148129 0.0626184715521072] (tracked), [1.0, 1.0, 1.0] (tracked), [0.33835051365464414, 0.4639248567104407, -1.0556493345309845, -0.49252737911456956, -1.301568942977636] (tracked))

### 图的形式
上一小节的表达直接用了函数来计算

In [36]:
using Flux.Tracker: param, back!, grad

W = param(randn(3, 5))   # 使用param来标注出不同
b = param(zeros(3))
x = rand(5)

y = sum(W * x .+ b)     # 不同于之前的

back!(y)                # 从结果y开始，反向传播
println(grad(W))
println(grad(b))        # 很显然这部分应该全是1

[0.11254548320266577 0.1430648854859362 0.3318530809523499 0.9815254552559451 0.790644575891239; 0.11254548320266577 0.1430648854859362 0.3318530809523499 0.9815254552559451 0.790644575891239; 0.11254548320266577 0.1430648854859362 0.3318530809523499 0.9815254552559451 0.790644575891239]
[1.0, 1.0, 1.0]


### 使用成型的层
一般我们在构造网络的时候会直接用已经封装好的`layers`，所以下面介绍一下对layers的操作，主要包括：
1. 前向计算
2. `layers`的连接
3. 获取模型中的参数，以及对指定参数求导

In [40]:
using Flux

m = Dense(10, 5)             # 一个10输入，5输出的Dense模块

x = rand(Float32, 10)

m(x)                         # 直接计算输出

m(x) == m.W * x .+ m.b      # m.W 和 m.b为Dense对象的参数

params(m)                    # 这个很重要，获取参数并且对参数进行修改

m = Chain(Dense(10, 5, relu), Dense(5, 2), softmax)     # 使用Chain连接不同的模型

l = sum(Flux.crossentropy(m(x), [0.5, 0.5]))            # 将结果进行交叉熵计算
back!(l)                     # 反向传播计算

grad.(params(m))             # 使用params(m)获取是哪些参数，并计算导数

4-element Array{Array{Float32,N} where N,1}:
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.010352412 0.05010504 … 0.093890734 0.05208994; 0.0 0.0 … 0.0 0.0]
 [0.0, 0.0, 0.0, 0.11023957, 0.0]                                                                              
 [0.0 0.0 … 0.08931084 0.0; 0.0 0.0 … -0.089310855 0.0]                                                        
 [0.13391215, -0.13391218]                                                                                     

### 更新权重参数
不是一定要使用层，当层不满足需求的时候我们也可以自定义，不过使用`layers`的确可以加快开发迭代

下面是一个手写的更新过程

In [41]:
using Flux.Tracker: update!

η = 0.1
for p in params(m)
  update!(p, -η * grad(p))       # 对每一个layers中的参数迭代计算新的数值
end

### 使用库中的迭代方法
随着系统变得复杂，更新值的方法也更复杂，就没必要全部自己手写了

好像这里举例说明的两个已经被抛弃了！

In [42]:
opt = SGD(params(m), 0.01)
opt()     # 更新权值

│   caller = top-level scope at In[42]:1
└ @ Core In[42]:1
│   caller = ip:0x0
└ @ Core :-1


In [43]:
?fill

search: [0m[1mf[22m[0m[1mi[22m[0m[1ml[22m[0m[1ml[22m [0m[1mf[22m[0m[1mi[22m[0m[1ml[22m[0m[1ml[22m! [0m[1mf[22m[0m[1mi[22mna[0m[1ml[22m[0m[1ml[22my [0m[1mf[22m[0m[1mi[22mnda[0m[1ml[22m[0m[1ml[22m [0m[1mf[22m[0m[1mi[22m[0m[1ml[22mter [0m[1mf[22m[0m[1mi[22m[0m[1ml[22mter! [0m[1mf[22m[0m[1mi[22m[0m[1ml[22mesize [0m[1mf[22m[0m[1mi[22m[0m[1ml[22memode is[0m[1mf[22m[0m[1mi[22m[0m[1ml[22me



```
fill(x, dims)
```

Create an array filled with the value `x`. For example, `fill(1.0, (5,5))` returns a 5×5 array of floats, with each element initialized to `1.0`.

# Examples

```jldoctest
julia> fill(1.0, (5,5))
5×5 Array{Float64,2}:
 1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0
```

If `x` is an object reference, all elements will refer to the same object. `fill(Foo(), dims)` will return an array filled with the result of evaluating `Foo()` once.


In [49]:
data, labels = rand(10, 100), fill(0.5, 2, 100)
loss(x, y) = sum(Flux.crossentropy(m(x), y))
Flux.train!(loss, [(data,labels)], opt)
# Flux.train!(loss, params(m),[(data,labels)], opt)

│   caller = top-level scope at In[49]:3
└ @ Core In[49]:3


In [47]:
rand(2,3)

2×3 Array{Float64,2}:
 0.162184  0.113996  0.330483
 0.358728  0.884668  0.311931

### 一个完整训练过程
下面是例子训练的图片
![title](https://pytorch.org/tutorials/_images/cifar10.png)
实际数据为`32X32`，RGB三通道

In [74]:
using Statistics
# using CuArrays
using Flux, Flux.Tracker, Flux.Optimise
using Metalhead, Images
using Metalhead: trainimgs
using Images.ImageCore
using Flux: onehotbatch, onecold
using Base.Iterators: partition

In [75]:
# 先下载 CIFAR10数据集，并且将其分batch
Metalhead.download(CIFAR10)
X = trainimgs(CIFAR10)
labels = onehotbatch([X[i].ground_truth.class for i in 1:50000],1:10)

# 随机显示其中的图片
image(x) = x.img # handy for use later
ground_truth(x) = x.ground_truth
image.(X[rand(1:end, 10)])

In [76]:
getarray(X) = float.(permutedims(channelview(X), (2, 3, 1)))
imgs = [getarray(X[i].img) for i in 1:50000]

train = gpu.([(cat(imgs[i]..., dims = 4), labels[:,i]) for i in partition(1:49000, 1000)])
valset = 49001:50000
valX = cat(imgs[valset]..., dims = 4) |> gpu
valY = labels[:, valset] |> gpu

10×1000 Flux.OneHotMatrix{Array{Flux.OneHotVector,1}}:
 0  0  0  0  1  0  1  0  0  0  0  0  0  …  0  0  0  0  1  0  1  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  1  0  0  0  1  0  0  0  0  1  1
 0  0  0  0  0  0  0  0  1  0  0  0  0     0  0  0  1  0  0  0  1  0  0  0  0
 0  0  0  0  0  0  0  0  0  1  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  1  0  0  0  0  0  0  0  0  0  0     0  0  1  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  1  0  0  0  0  0  0  0  …  1  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  1  0  0  0
 0  0  0  0  0  0  0  0  0  0  1  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 1  0  0  0  0  0  0  1  0  0  0  1  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  1  0  1  0  0  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  1  0  0

In [77]:
m = Chain(
  Conv((5,5), 3=>16, relu),
  x -> MaxPool((2,2)),
  Conv((5,5), 16=>8, relu),
  x -> MaxPool((2,2)),
  x -> reshape(x, :, size(x, 4)),
  Dense(200, 120),
  Dense(120, 84),
  Dense(84, 10),
  softmax) |> gpu

Chain(Conv((5, 5), 3=>16, relu), #47, Conv((5, 5), 16=>8, relu), #48, #49, Dense(200, 120), Dense(120, 84), Dense(84, 10), softmax)

In [78]:
using Flux: crossentropy, Momentum

loss(x, y) = sum(crossentropy(m(x), y))
opt = Momentum(params(m), 0.01)

│   caller = top-level scope at In[78]:4
└ @ Core In[78]:4


#24 (generic function with 1 method)

In [79]:
accuracy(x, y) = mean(onecold(m(x), 1:10) .== onecold(y, 1:10))

accuracy (generic function with 1 method)

In [80]:
epochs = 10

for epoch = 1:epochs
  for d in train
    l = loss(d...)
    back!(l)
    opt()
    @show accuracy(valX, valY)
  end
end

MethodError: MethodError: no method matching (::Conv{2,4,typeof(relu),TrackedArray{…,Array{Float32,4}},TrackedArray{…,Array{Float32,1}}})(::MaxPool{2,4})
Closest candidates are:
  Conv(!Matched::AbstractArray) at /root/.julia/packages/Flux/dkJUV/src/layers/conv.jl:53