## Softmax

$$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^{n} \exp(x_j)}$$
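As a worked check of the formula, take the two-element input $x = (0.1, 0.2)$ used later in this notebook. Dividing numerator and denominator by $e^{0.1}$ gives:

$$\mathrm{softmax}(x)_1 = \frac{e^{0.1}}{e^{0.1} + e^{0.2}} = \frac{1}{1 + e^{0.1}} \approx \frac{1}{2.10517} \approx 0.47502, \qquad \mathrm{softmax}(x)_2 = 1 - \mathrm{softmax}(x)_1 \approx 0.52498$$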

Exponential

In [4]:
exp(32)

7.896296018268069e13

Summation

In [5]:
test = [1,2,3]

3-element Array{Int64,1}:
 1
 2
 3

In [6]:
sum(test)

6
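The softmax denominator combines these two steps: exponentiate each element, then sum. A minimal sketch, where the variable names `exps` and `denominator` are illustrative only. The dot in `exp.(test)` broadcasts `exp` over every element. Without the dot, `exp(test)` raises a `MethodError`, because `exp` on arrays is only defined as the matrix exponential of a square matrix.

```julia
test = [1, 2, 3]

# exp.(v) applies exp to each element of v (broadcasting)
exps = exp.(test)

# The softmax denominator: the sum over j of exp(x_j)
denominator = sum(exps)

println(exps)
println(denominator)
```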

In [10]:
den_softmax(x) = exp.(x)./sum(exp.(x))   # naive softmax: exponentiate elementwise, then divide by the sum

den_softmax (generic function with 1 method)

In [11]:
den_softmax([0.1, 0.2])

2-element Array{Float64,1}:
 0.4750208125210601
 0.52497918747894  
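Two properties of softmax can be checked directly on `den_softmax`. First, the outputs are nonnegative and sum to 1. Second, adding the same constant $c$ to every input leaves the result unchanged, because $\exp(x_i + c) = e^{c}\exp(x_i)$ and the factor $e^{c}$ cancels between numerator and denominator. The shift of 5.0 below is an arbitrary illustrative choice.

```julia
# Same definition as in the notebook cell above
den_softmax(x) = exp.(x) ./ sum(exp.(x))

p = den_softmax([0.1, 0.2])

# Outputs form a probability distribution: nonnegative and summing to 1
println(sum(p))

# Shifting every input by the same constant does not change the result
q = den_softmax([0.1, 0.2] .+ 5.0)
println(p ≈ q)
```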

In [21]:
using Pkg;
Pkg.add("Knet")
Pkg.build("SpecialFunctions")
Pkg.build("CodecZlib")

[32m[1m Resolving[22m[39m package versions...
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.0/Project.toml`
[90m [no changes][39m
[32m[1m  Updating[22m[39m `~/.julia/environments/v1.0/Manifest.toml`
[90m [no changes][39m
[32m[1m  Building[22m[39m SpecialFunctions → `~/.julia/packages/SpecialFunctions/fvheQ/deps/build.log`
[32m[1m  Building[22m[39m CodecZlib → `~/.julia/packages/CodecZlib/DAjXH/deps/build.log`


In [23]:
using Knet

In [24]:
softmax([0.1, 0.2])

2-element Array{Float64,1}:
 0.47502081252106
 0.52497918747894

In [28]:
den_softmax([0.000000000000000000000000001, 0.000000000000000000000000000000002])

2-element Array{Float64,1}:
 0.5
 0.5

In [29]:
softmax([0.000000000000000000000000001, 0.000000000000000000000000000000002])

2-element Array{Float64,1}:
 0.5
 0.5
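Both implementations agree here, but very small inputs do not stress the naive formula. The weak point of `den_softmax` is large inputs. `exp(1000.0)` overflows to `Inf` in Float64, so the ratio becomes `Inf/Inf = NaN`. A common remedy relies on the shift invariance of softmax: subtract the maximum input before exponentiating. The sketch below uses `stable_softmax`, a hypothetical name and not a Knet function. This notebook does not show whether Knet's `softmax` uses exactly this trick.

```julia
# Naive version from the notebook
den_softmax(x) = exp.(x) ./ sum(exp.(x))

# Hypothetical numerically stable variant: shift the inputs so the largest is 0.
# Mathematically identical, since softmax(x) == softmax(x .- c) for any constant c.
function stable_softmax(x)
    z = x .- maximum(x)      # every exponent is now <= 0, so exp cannot overflow
    e = exp.(z)
    return e ./ sum(e)
end

large = [1000.0, 1000.0]
println(den_softmax(large))     # [NaN, NaN]: exp(1000.0) is Inf, and Inf/Inf is NaN
println(stable_softmax(large))  # [0.5, 0.5]
```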

Housing example to illustrate gradient descent
https://denizyuret.github.io/Knet.jl/latest/backprop.html#Stochastic-Gradient-Descent-1
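In its simplest full-batch form, each gradient descent step moves the parameters $w$ against the gradient of the loss $J$. The step is scaled by a learning rate $\eta$, which is `lr` in the training loop of this notebook:

$$w \leftarrow w - \eta \, \nabla_w J(w)$$

Stochastic gradient descent, described at the link above, applies the same update with the gradient computed on a subset (minibatch) of the examples. The loop in this notebook uses all examples at every step.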

In [30]:
include(Knet.dir("data","housing.jl"))


In [31]:
x,y = housing()  # x is (13,506); y is (1,506)


┌ Info: Downloading https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data to /home/barton/.julia/packages/Knet/pfgZS/data/housing/housing.data
└ @ Main /home/barton/.julia/packages/Knet/pfgZS/data/housing.jl:27


([-0.419367 -0.416927 … -0.407361 -0.41459; 0.284548 -0.48724 … -0.48724 -0.48724; … ; 0.440616 0.440616 … 0.402826 0.440616; -1.0745 -0.491953 … -0.864446 -0.668397], [24.0 21.6 … 22.0 11.9], [-0.419367 -0.416927 … -0.407361 -0.41459; 0.284548 -0.48724 … -0.48724 -0.48724; … ; 0.440616 0.440616 … 0.402826 0.440616; -1.0745 -0.491953 … -0.864446 -0.668397], [24.0 21.6 … 22.0 11.9])
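The printed value is a tuple of four arrays, yet `x,y = housing()` binds only two names. Julia destructuring assigns the leading elements in order and silently ignores the rest, so `x` and `y` receive the first two arrays. A minimal sketch with a hypothetical stand-in function `four_arrays`:

```julia
# Hypothetical stand-in for a function that returns four arrays,
# as the output of housing() above suggests
four_arrays() = ([1.0 2.0], [3.0], [4.0 5.0], [6.0])

# Destructuring assigns the leading elements in order;
# the remaining elements are ignored
a, b = four_arrays()
println(a)
println(b)
```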

# Dataset description https://www.kaggle.com/c/boston-housing

In [32]:
x

13×506 Array{Float64,2}:
 -0.419367  -0.416927  -0.416929  …  -0.413038  -0.407361  -0.41459 
  0.284548  -0.48724   -0.48724      -0.48724   -0.48724   -0.48724 
 -1.28664   -0.592794  -0.592794      0.115624   0.115624   0.115624
 -0.272329  -0.272329  -0.272329     -0.272329  -0.272329  -0.272329
 -0.144075  -0.73953   -0.73953       0.157968   0.157968   0.157968
  0.413263   0.194082   1.28145   …   0.983986   0.724955  -0.362408
 -0.119895   0.366803  -0.265549      0.796661   0.736268   0.434302
  0.140075   0.556609   0.556609     -0.772919  -0.667776  -0.61264 
 -0.981871  -0.867024  -0.867024     -0.981871  -0.981871  -0.981871
 -0.665949  -0.986353  -0.986353     -0.802418  -0.802418  -0.802418
 -1.45756   -0.302794  -0.302794  …   1.1753     1.1753     1.1753  
  0.440616   0.440616   0.396035      0.440616   0.402826   0.440616
 -1.0745    -0.491953  -1.20753      -0.982076  -0.864446  -0.668397

In [33]:
y

1×506 Array{Float64,2}:
 24.0  21.6  34.7  33.4  36.2  28.7  …  16.8  22.4  20.6  23.9  22.0  11.9

Playing with grad

In [48]:
funny(x) = x^2

funny (generic function with 1 method)

In [49]:
funny(3)

9

In [50]:
grad(funny)(3)

6
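The analytic derivative confirms this result: $\frac{d}{dx}x^2 = 2x$, which equals 6 at $x = 3$. A quick numerical cross-check that needs no automatic differentiation is a central finite difference, $(f(x+h) - f(x-h))/2h$. The helper `central_diff` and the step `h = 1e-6` are illustrative assumptions.

```julia
funny(x) = x^2

# Central finite difference: approximates f'(x) for small h
central_diff(f, x; h = 1e-6) = (f(x + h) - f(x - h)) / (2h)

# Should be close to the value returned by grad(funny)(3)
println(central_diff(funny, 3.0))
```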

In [54]:
d_funny = grad(funny)

(::getfield(AutoGrad, Symbol("#gradfun#8")){getfield(AutoGrad, Symbol("##gradfun#6#7")){typeof(funny),Int64,Bool}}) (generic function with 1 method)

In [55]:
d_funny(3)

6

Continuing with the housing example

In [56]:
predict(w,x) = w[1]*x .+ w[2]

predict (generic function with 1 method)
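`predict` is a linear model. `w[1]` is a 1×13 weight row, so `w[1]*x` maps a 13×N feature matrix to a 1×N row of predictions, and `.+ w[2]` adds the scalar bias to each entry. A shape check on hypothetical random data with 5 examples:

```julia
# predict as defined above: w[1] is a 1×13 weight row, w[2] a scalar bias
predict(w, x) = w[1] * x .+ w[2]

# Hypothetical data laid out like the housing features:
# 13 features (rows) by 5 examples (columns)
x_demo = randn(13, 5)
w_demo = Any[0.1 * rand(1, 13), 0.0]

yhat = predict(w_demo, x_demo)
println(size(yhat))   # one prediction per column of x_demo
```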

In [57]:
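using Statistics   # mean is provided by the Statistics standard library in Julia 1.0+
loss(w,x,y) = mean(abs2,y-predict(w,x))   # mean squared error
lossgradient = grad(loss)   # grad returns a function computing the gradient of loss w.r.t. its first argument, w
w = [ 0.1*rand(1,13), 0.0 ]   # initialize the 1×13 weight row and the scalar bias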

2-element Array{Any,1}:
  [0.0948516 0.0765842 … 0.0626769 0.0850844]
 0.0                                         
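For this linear model, the gradient that `lossgradient` computes can also be written out by hand. Let $W$ denote `w[1]` (1×13) and $b$ denote `w[2]`. Let $x_n$ be the $n$-th column of `x`, $y_n$ the $n$-th entry of `y`, and $N$ the number of examples (506 here). The loss and its gradients are then:

$$J(W, b) = \frac{1}{N}\sum_{n=1}^{N}\left(y_n - W x_n - b\right)^2$$

$$\nabla_W J = -\frac{2}{N}\sum_{n=1}^{N}\left(y_n - W x_n - b\right)x_n^{\top}, \qquad \frac{\partial J}{\partial b} = -\frac{2}{N}\sum_{n=1}^{N}\left(y_n - W x_n - b\right)$$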

In [58]:
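lr = 0.1   # learning rate (step size); assumed value, must be defined before the loop
for epoch in 1:10
    dw = lossgradient(w, x, y)   # gradients w.r.t. the weights and the bias
    for i in 1:length(w)
        w[i] -= lr * dw[i]       # gradient descent step
    end
end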
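The following self-contained sketch runs the whole loop without Knet or the downloaded data. It replaces `grad(loss)` with the hand-derived mean-squared-error gradient of the linear model. The synthetic data, the names `x_syn`, `y_syn`, `w_syn`, `manual_gradient` and `lr_syn`, and the learning rate are assumptions for illustration, not part of the Knet example.

```julia
using Random, Statistics

# Hypothetical synthetic stand-in for the housing data: 13 features × 200
# examples, targets from a known linear rule plus a little noise
Random.seed!(1)
x_syn = randn(13, 200)
w_true = randn(1, 13)
y_syn = w_true * x_syn .+ 2.0 .+ 0.1 .* randn(1, 200)

# Same model and loss as in the notebook
predict(w, x) = w[1] * x .+ w[2]
loss(w, x, y) = mean(abs2, y - predict(w, x))

# Hand-derived gradient of the mean squared error, standing in for grad(loss)
function manual_gradient(w, x, y)
    r = y - predict(w, x)          # residuals, 1×N
    N = size(x, 2)
    dW = -(2 / N) * r * x'         # 1×13, same shape as w[1]
    db = -(2 / N) * sum(r)         # scalar, same as w[2]
    return Any[dW, db]
end

w_syn = Any[0.1 * rand(1, 13), 0.0]
lr_syn = 0.1                       # assumed learning rate
initial_loss = loss(w_syn, x_syn, y_syn)

for epoch in 1:100
    dw = manual_gradient(w_syn, x_syn, y_syn)
    for i in 1:length(w_syn)
        w_syn[i] -= lr_syn * dw[i] # gradient descent step
    end
end

final_loss = loss(w_syn, x_syn, y_syn)
println("loss: ", initial_loss, " -> ", final_loss)
```

Writing the gradient by hand is practical only for simple models like this one. `grad` from Knet/AutoGrad derives the same quantity automatically for arbitrary Julia code.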