---
title       : "Multiple Layer Networks" 
subtitle    : ""
author      : Paul Schrimpf
date        : `j using Dates; print(Dates.today())`
bibliography: "../ml.bib"
options:
      out_width : 100%
      wrap : true
      fig_width : 800
      dpi : 192
---

[![](https://i.creativecommons.org/l/by-sa/4.0/88x31.png)](http://creativecommons.org/licenses/by-sa/4.0/)

This work is licensed under a [Creative Commons Attribution-ShareAlike
4.0 International
License](http://creativecommons.org/licenses/by-sa/4.0/) 


### About this document 

This document was created using Weave.jl. The code is available
[on GitHub](https://github.com/schrimpf/NeuralNetworkEconomics.jl). The same
document generates both static webpages and an associated [Jupyter
notebook](slp.ipynb).

$$
\def\indep{\perp\!\!\!\perp}
\def\Er{\mathrm{E}}
\def\R{\mathbb{R}}
\def\En{{\mathbb{E}_n}}
\def\Pr{\mathrm{P}}
\newcommand{\norm}[1]{\left\Vert {#1} \right\Vert}
\newcommand{\abs}[1]{\left\vert {#1} \right\vert}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
$$

```julia
markdown = try
  "md" in keys(WEAVE_ARGS) && WEAVE_ARGS["md"]
catch
  false
end

if !("DISPLAY" ∈ keys(ENV))
  # Make gr and pyplot backends for Plots work without a DISPLAY
  ENV["GKSwstype"]="nul"
  ENV["MPLBACKEND"]="Agg"
end
# Make gr backend work with λ and other unicode
ENV["GKS_ENCODING"] = "utf-8"

using NeuralNetworkEconomics
docdir = joinpath(dirname(Base.pathof(NeuralNetworkEconomics)), "..","docs")

using Pkg
Pkg.activate(docdir)
Pkg.instantiate()
```

# Introduction

[The previous notes](slp.md) discussed single layer neural
networks. These notes will look at multiple layer networks.

## Additional Reading

- @goodfellow2016 [*Deep Learning*](http://www.deeplearningbook.org)
- [`Knet.jl`
  documentation](https://denizyuret.github.io/Knet.jl/latest/)
  especially the textbook
- @klok2019 *Statistics with Julia: Fundamentals for Data Science,
  Machine Learning and Artificial Intelligence*
    
  
# Multiple Layer Neural Networks

- Many hidden layers
    - $x^{(0)} = x$
    - $x^{(\ell)}_j = \psi(a_j^{(\ell)} x^{(\ell-1)} + b_j^{(\ell)})$
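
Stacking the rows $a_j^{(\ell)}$ into a matrix $A^{(\ell)}$ gives the layer recursion in matrix form, $x^{(\ell)} = \psi(A^{(\ell)} x^{(\ell-1)} + b^{(\ell)})$, with $\psi$ applied elementwise. The sketch below evaluates this recursion in plain Julia. Beyond what these notes specify, it assumes a ReLU activation in every hidden layer and a linear output $\beta' x^{(L)} + \beta_0$; `mlp_forward` and `relu_act` are hypothetical names, not functions from Flux or NeuralNetworkEconomics.jl.

```julia
# Assumed activation (ReLU); named relu_act to avoid clashing with Flux.relu
relu_act(z) = max(z, zero(z))

"""
    mlp_forward(x, As, bs, β, β0; ψ = relu_act)

Hypothetical helper: evaluate hidden layers with weight matrices `As`
(row j of As[ℓ] is a_j^(ℓ)) and bias vectors `bs`, then the assumed
linear output β' x^(L) + β0.
"""
function mlp_forward(x::AbstractVector, As, bs, β, β0; ψ = relu_act)
    h = x                          # x^(0) = x
    for (A, b) in zip(As, bs)      # ℓ = 1, …, L
        h = ψ.(A * h .+ b)         # x^(ℓ) = ψ(A^(ℓ) x^(ℓ-1) + b^(ℓ))
    end
    return β' * h + β0             # scalar prediction
end

# One input and three hidden layers of width 3, the same shape as the
# deeper Flux model trained in the next section
As = [randn(3, 1), randn(3, 3), randn(3, 3)]
bs = [randn(3), randn(3), randn(3)]
β, β0 = randn(3), 0.0
ŷ = mlp_forward([0.5], As, bs, β, β0)
```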

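There is a gap between the applied use of neural networks and the
statistical theory discussed in the previous notes. The rate results
there are for networks with a single hidden layer. In prediction
applications, the best performance is typically achieved by deep neural
networks with many hidden layers. Intuitively, multiple hidden layers
should do at least as well as a single hidden layer, because a deep
network can mimic a shallow one: with ReLU-type activations, for
example, the extra layers can pass their inputs through unchanged.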
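
For ReLU activations this intuition can be checked directly, at least as a statement about which functions a network can represent. ReLU outputs are nonnegative and $\max(h, 0) = h$ for $h \geq 0$, so a second hidden layer with identity weights and zero biases passes the first layer's output through unchanged. The sketch below assumes ReLU throughout; `shallow` and `deep` are hypothetical helpers. It only shows that the deeper architecture contains the shallower one; it says nothing about whether training finds such parameters or about statistical rates.

```julia
using LinearAlgebra  # for the identity I

# Assumed activation: ReLU
relu_act(z) = max(z, zero(z))

# One hidden layer: g(x) = β' ψ(A x + b) + β0
shallow(x, A, b, β, β0) = β' * relu_act.(A * x .+ b) + β0

# Two hidden layers; the second has identity weights and zero biases
function deep(x, A, b, β, β0)
    h1 = relu_act.(A * x .+ b)                  # same first layer as shallow
    k = length(b)
    W2 = Matrix{Float64}(I, k, k)               # identity weight matrix
    h2 = relu_act.(W2 * h1 .+ zeros(k))         # equals h1 because h1 ≥ 0
    return β' * h2 + β0
end

A, b, β, β0 = randn(15, 1), randn(15), randn(15), randn()
xs = [[xi] for xi in range(0, stop = π, length = 50)]
maxgap = maximum(abs(shallow(x, A, b, β, β0) - deep(x, A, b, β, β0)) for x in xs)
```

The notes' own models use leaky ReLU, whose outputs can be negative, so a single identity unit no longer suffices. A pair of units still works, since $\mathrm{leakyrelu}(h) - \mathrm{leakyrelu}(-h) = (1+\alpha)h$ when the negative-part slope is $\alpha$.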

There are some recent theoretical results that formalize this intuition.
FIXME: ADD CITATIONS.

# Training

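```julia
using Plots, Flux, Statistics, ColorSchemes
# Function to estimate
f(x) = sin(x^x)/2^((x^x-pi/2)/pi)
# Simulate n observations: x ~ U(0, π), y = f(x) + noise with standard deviation s
function simulate(n,s=1)
  x = rand(n,1).*pi
  y = f.(x) .+ randn(n).*s
  (x,y)
end
x, y = simulate(1000, 0.5)
# Flux expects one observation per column, so reshape to 1 × n
xt = reshape(x, 1, length(x))
yt = reshape(y, 1, length(y))
xg = 0:0.01:pi   # grid for plotting fitted functions
units = [5, 7, 9]
cscheme = colorschemes[:BrBG_4];
```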

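```julia
# This cell uses the Tracker-based Flux API (Tracker.data), which predates
# Flux's switch to Zygote for automatic differentiation.
dimx = 1
# Float32 data to match Flux's default parameter type
xt = reshape(Float32.(x), 1, length(x))
yt = reshape(Float32.(y), 1, length(y))
# A wide network with one hidden layer and a narrow network with three.
# Flux.normalise standardizes each batch with its own mean and standard
# deviation; x and the plotting grid xg both cover [0, π], so these are similar.
models = [ Chain(x->Flux.normalise(x, dims=2),
                 Dense(dimx, 15, Flux.leakyrelu),
                 Dense(15, 1)),
           Chain(x->Flux.normalise(x, dims=2),
                 Dense(dimx, 3, Flux.leakyrelu),
                 Dense(3, 3, Flux.leakyrelu),
                 Dense(3, 3, Flux.leakyrelu),
                 Dense(3, 1))
           ]

figs = Array{typeof(plot(0)),1}(undef,length(models))
initmfigs = Array{typeof(plot(0)),1}(undef,length(models))

for r in eachindex(models)
  m = models[r]
  # outputs of the last hidden layer at the initial parameter values
  initmfigs[r] = plot(xg, Tracker.data(m[1:(end-1)](xg'))', lab="", legend=false)
  figs[r]=plot(xg, f.(xg), lab="", title="Model $m", color=:red)
  figs[r]=scatter!(x,y, alpha=0.4, markersize=1, markerstrokewidth=0, lab="")
  maxiter = 300
  # create the optimizer once so its moment estimates persist across iterations
  opt = Flux.AMSGrad()
  for i = 1:maxiter
    # one full-batch gradient step on the mean squared error
    Flux.train!((x,y)->Flux.mse(m(x),y), Flux.params(m), [(xt, yt)], opt) #,
                #cb = Flux.throttle(()->@show(Flux.mse(m(xt),yt)),100))
    if i==1 || (i % (div(maxiter,5))==0)
      # report the loss and add the current fit to the plot
      l=Tracker.data(Flux.mse(m(xt), yt))
      println("Model $(m), $i iterations, loss=$l")
      yg = Tracker.data(m(xg'))'
      loc=Int64.(ceil(length(xg)*i/maxiter))
      figs[r]=plot!(xg,yg, lab="", color=get(cscheme, i/maxiter), alpha=1.0,
                    annotations=(xg[loc], yg[loc],
                                 Plots.text("i=$i", i<maxiter/2 ? :left : :right, pointsize=10,
                                            color=get(cscheme, i/maxiter)) )
                    )
    end
  end
  #display(figs[r])
end
```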

```
Model Chain(#3, Dense(1, 15, leakyrelu), Dense(15, 1)), 1 iterations, loss=0.46546507
Model Chain(#3, Dense(1, 15, leakyrelu), Dense(15, 1)), 60 iterations, loss=0.28682372
Model Chain(#3, Dense(1, 15, leakyrelu), Dense(15, 1)), 120 iterations, loss=0.26754394
Model Chain(#3, Dense(1, 15, leakyrelu), Dense(15, 1)), 180 iterations, loss=0.26247495
Model Chain(#3, Dense(1, 15, leakyrelu), Dense(15, 1)), 240 iterations, loss=0.2610244
Model Chain(#3, Dense(1, 15, leakyrelu), Dense(15, 1)), 300 iterations, loss=0.26012135
Model Chain(#4, Dense(1, 3, leakyrelu), Dense(3, 3, leakyrelu), Dense(3, 3, leakyrelu), Dense(3, 1)), 1 iterations, loss=0.70414084
Model Chain(#4, Dense(1, 3, leakyrelu), Dense(3, 3, leakyrelu), Dense(3, 3, leakyrelu), Dense(3, 1)), 60 iterations, loss=0.27061445
Model Chain(#4, Dense(1, 3, leakyrelu), Dense(3, 3, leakyrelu), Dense(3, 3, leakyrelu), Dense(3, 1)), 120 iterations, loss=0.26338327
Model Chain(#4, Dense(1, 3, leakyrelu), Dense(3, 3, leakyrelu), Dense(3, 3, l
```
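
The two models differ in their number of parameters as well as their depth. A `Dense(n_in, n_out)` layer has an $n_{out} \times n_{in}$ weight matrix and $n_{out}$ biases, and the normalisation step has no trainable parameters. The hypothetical helper `nparams` below counts parameters from a list of layer widths; for a Flux model `m`, `sum(length, Flux.params(m))` should give the same count.

```julia
# Hypothetical helper: parameters in a chain of dense layers with the given
# widths, counting n_in * n_out weights plus n_out biases per layer
nparams(widths) = sum(widths[i] * widths[i+1] + widths[i+1] for i in 1:length(widths)-1)

shallow_widths = [1, 15, 1]        # Dense(1, 15), Dense(15, 1)
deep_widths    = [1, 3, 3, 3, 1]   # Dense(1, 3), Dense(3, 3), Dense(3, 3), Dense(3, 1)

println("one hidden layer:    ", nparams(shallow_widths), " parameters")  # 46
println("three hidden layers: ", nparams(deep_widths), " parameters")     # 34
```

Since the simulated noise has standard deviation $s = 0.5$, the noise variance $s^2 = 0.25$ is a natural benchmark for the mean squared error. In the run logged above, both networks reach in-sample losses of about 0.26, even though the deeper network has fewer parameters.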

# References