In [5]:
using Flux

In [3]:
f(x) = 3x^2 + 2x + 1;



In [7]:
df(x) = Flux.gradient(f, x; nest = true)[1]; # df/dx = 6x + 2; nest = true keeps the result differentiable
df(2)

14.0 (tracked)

In [8]:
d2f(x) = Flux.gradient(df, x; nest = true)[1] # d²f/dx² = 6
d2f(2)

6.0 (tracked)

In [9]:
W = rand(2, 5)
b = rand(2)

predict(x) = W*x .+ b


predict (generic function with 1 method)

In [10]:
function loss(x, y)
  ŷ = predict(x)
  sum((y .- ŷ).^2)
end

x, y = rand(5), rand(2) # Dummy data
loss(x, y) # value varies from run to run because W, b, x and y are random

11.807289289127482

# Building Layers

In [11]:
W1 = rand(3, 5)
b1 = rand(3)
layer1(x) = W1 * x .+ b1

W2 = rand(2, 3)
b2 = rand(2)
layer2(x) = W2 * x .+ b2

model(x) = layer2(σ.(layer1(x)))

model(rand(5)) # => 2-element vector

2-element Array{Float64,1}:
 2.22418481693637
 2.433389591664276

In [12]:
?foldl

search: foldl mapfoldl foldr mapfoldr



```
foldl(op, itr; [init])
```

Like [`reduce`](@ref), but with guaranteed left associativity. If provided, the keyword argument `init` will be used exactly once. In general, it will be necessary to provide `init` to work with empty collections.

# Examples

```jldoctest
julia> foldl(=>, 1:4)
((1 => 2) => 3) => 4

julia> foldl(=>, 1:4; init=0)
(((0 => 1) => 2) => 3) => 4
```
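A left fold is a convenient way to push a value through a list of functions in order, which is how a stack of layers can be applied as one model. The sketch below uses plain functions as stand-ins for layers; `double`, `addone`, `steps` and `apply_all` are hypothetical names chosen for illustration.

```julia
# Hypothetical stand-ins for layers: any callable works here.
double(x) = 2 .* x
addone(x) = x .+ 1

steps = [double, addone, sum]

# foldl threads the running value through each function, left to right,
# so this computes sum(addone(double(x))).
apply_all(x) = foldl((acc, f) -> f(acc), steps; init = x)

apply_all([1, 2, 3])  # => 15
```

Passing `init` matters here: it supplies the input that the first function receives, so the fold starts from the data rather than from the first element of `steps`.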


In [14]:
?softmax

search: softmax softmax! ∇softmax ∇softmax! logsoftmax logsoftmax! ∇logsoftmax



```
softmax(x; dims=1)
```

[Softmax](https://en.wikipedia.org/wiki/Softmax_function) turns input array `x`  into probability distributions that sum to 1 along the dimensions specified by `dims`. It is semantically equivalent to the following:

```
softmax(x; dims=1) = exp.(x) ./ sum(exp.(x), dims=dims)
```

with additional manipulations enhancing numerical stability.

For a matrix input `x` it will by default (`dims=1`) treat it as a batch of vectors, with each column independent. Keyword `dims=2` will instead treat rows independently,  etc...

```julia-repl
julia> softmax([1, 2, 3])
3-element Array{Float64,1}:
  0.0900306
  0.244728
  0.665241
```

See also [`logsoftmax`](@ref).
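The docstring does not spell out the "additional manipulations". One common trick, shown here as an assumption and not as Flux's actual implementation, is to subtract the maximum before exponentiating. `stable_softmax` and `naive_softmax` are hypothetical helper names.

```julia
# Hypothetical helper, not Flux's implementation.
function stable_softmax(x; dims = 1)
    # Subtracting the maximum leaves the result mathematically unchanged
    # but keeps exp from overflowing on large inputs.
    shifted = x .- maximum(x, dims = dims)
    e = exp.(shifted)
    return e ./ sum(e, dims = dims)
end

# The direct formula from the docstring, for comparison.
naive_softmax(x; dims = 1) = exp.(x) ./ sum(exp.(x), dims = dims)

naive_softmax([1000.0, 1001.0])   # exp overflows to Inf, so the entries are NaN
stable_softmax([1000.0, 1001.0])  # finite entries that sum to 1
```

On moderate inputs such as `[1.0, 2.0, 3.0]` the two versions agree up to floating-point rounding; they differ only when `exp` would overflow.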


In [15]:
layers = [Dense(10, 5, σ), Dense(5, 2), softmax]

model(x) = foldl((x, m) -> m(x), layers, init = x)

model(rand(10)) # => 2-element vector

Tracked 2-element Array{Float32,1}:
 0.5703416f0
 0.4296584f0

In [16]:
model2 = Chain(
  Dense(10, 5, σ),
  Dense(5, 2),
  softmax)

model2(rand(10)) # => 2-element vector

Tracked 2-element Array{Float32,1}:
 0.4832265f0
 0.5167735f0

In [17]:
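# Note: @functor is not defined in this Tracker-based Flux release; it arrived in later
# versions, where it replaced Flux.@treelike. Affine is also not defined in the cells shown.
Flux.@functor Affine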

LoadError: UndefVarError: @functor not defined

In [18]:
m = Chain(Conv((3, 3), 3 => 16), Conv((3, 3), 16 => 32))

Chain(Conv((3, 3), 3=>16), Conv((3, 3), 16=>32))

In [21]:
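# Note: outdims is not available in this Flux release; it appears to have been added in a later version.
Flux.outdims(m, (10, 10))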

UndefVarError: UndefVarError: outdims not defined
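When a size helper is not available, the spatial output size of the chain can be worked out by hand. The sketch below assumes the standard convolution size formula and the `Conv` defaults of no padding and stride 1; `conv_out` is a hypothetical helper, not part of Flux.

```julia
# Hypothetical helper: output length of a convolution along one dimension,
# assuming symmetric padding `pad` and step `stride`.
conv_out(n, k; pad = 0, stride = 1) = (n + 2pad - k) ÷ stride + 1

# Two 3×3 convolutions (no padding, stride 1) applied to a 10×10 input.
after_first  = conv_out.((10, 10), 3)     # => (8, 8)
after_second = conv_out.(after_first, 3)  # => (6, 6)
```

Each unpadded 3×3 convolution trims one pixel from every border, so the 10×10 input shrinks to 8×8 and then to 6×6.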