In [1]:
using Flux

## <span style="color:orange"> Create a 'Scale' layer that scales each input value linearly (element-wise) </span>

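- Computes `σ.(scale .* x .+ bias)`: each feature is multiplied by its own learnable factor and shifted by its own bias; unlike `Dense`, it does not mix features with one another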
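To make "element-wise" concrete, here is a plain-Julia sketch (no Flux) of the computation `Scale` performs with its default identity activation. It is the same as a `Dense` layer whose weight matrix is diagonal, which is why output `i` depends only on input `i`. The names `scale_layer`, `s` and `b` are illustrative, not Flux API.

```julia
using LinearAlgebra

# Hypothetical stand-in for a Scale layer with identity activation:
# each feature gets its own factor and its own bias.
scale_layer(x, s, b) = s .* x .+ b

s = [0.1, 10.0, 100.0]   # one factor per feature
b = zeros(3)             # zero bias, as the outputs in this notebook indicate
x = [1.0, 2.0, 3.0]

y_scale = scale_layer(x, s, b)

# Equivalent Dense-style computation with a diagonal weight matrix:
# the off-diagonal zeros mean no feature is mixed with another.
y_dense = Diagonal(s) * x + b

println(y_scale)            # [0.1, 20.0, 300.0]
println(y_scale ≈ y_dense)  # true
```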

In [2]:
m1 = Chain( Dense(1=>3) )

Chain(
  Dense(1 => 3),                        # 6 parameters
) 

In [5]:
fieldnames(typeof(m1.layers[1]))

(:weight, :bias, :σ)

In [6]:
m1.layers[1].weight

3×1 Matrix{Float32}:
 -1.0532591
 -1.0067676
  0.01625458

In [13]:
m1( [1] ) # with input 1 and a zero bias, the output equals the weight column

3-element Vector{Float32}:
 -1.0532591
 -1.0067676
  0.01625458

In [18]:
m1s = Flux.Scale( [0.1, 10, 100] ) #scaling layer

Scale(3)            # 6 parameters

In [19]:
m1scaled = Chain( m1 , m1s )

Chain(
  Chain(
    Dense(1 => 3),                      # 6 parameters
  ),
  Scale(3),                             # 6 parameters
)                   # Total: 4 arrays, 12 parameters, 360 bytes.

In [20]:
m1scaled( [1] )

3-element Vector{Float64}:
  -0.10532591342926026
 -10.067676305770874
   1.6254579648375511

In [24]:
m1s2 = Flux.Scale( repeat( [0.01] , 3 ) ) #scaling layer

Scale(3)            # 6 parameters

In [25]:
m1scaled2 = Chain( m1 , m1s2 )

Chain(
  Chain(
    Dense(1 => 3),                      # 6 parameters
  ),
  Scale(3),                             # 6 parameters
)                   # Total: 4 arrays, 12 parameters, 360 bytes.

In [26]:
m1scaled2( [1] )

3-element Vector{Float64}:
 -0.010532591342926025
 -0.010067676305770875
  0.0001625457964837551

In [28]:
m1s3 = Flux.Scale( [1,0,1] ) #scaling layer that masks out (zeroes) the second neuron's output

Scale(3)            # 6 parameters

In [29]:
m1scaled3 = Chain( m1 , m1s3 )

Chain(
  Chain(
    Dense(1 => 3),                      # 6 parameters
  ),
  Scale(3),                             # 6 parameters
)                   # Total: 4 arrays, 12 parameters, 360 bytes.

In [30]:
m1scaled3( [1] )

3-element Vector{Float32}:
 -1.0532591
  0.0
  0.01625458
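Because the input is `[1]` and both biases are zero, each chained model's output is just the `Dense` weights multiplied element-wise by the `Scale` factors. The sketch below reproduces the three printed results from the weight values shown earlier (copied by hand, so compare with `≈`). Assuming the zero bias is added after the multiplication, it also explains why the masked entry prints as `0.0` rather than `-0.0`. `w` and `chained` are illustrative names.

```julia
# Dense weights printed by m1.layers[1].weight above (copied by hand)
w = Float32[-1.0532591, -1.0067676, 0.01625458]

# With input [1], Chain(m1, Scale(s)) reduces to w .* s plus a zero bias
# whose element type follows the scale vector.
chained(s) = w .* s .+ zeros(eltype(s), length(s))

out1 = chained([0.1, 10, 100])   # compare with m1scaled([1])
out2 = chained(fill(0.01, 3))    # compare with m1scaled2([1]); same as repeat([0.01], 3)
out3 = chained([1, 0, 1])        # compare with m1scaled3([1]): middle neuron zeroed

println(eltype(out1))  # Float64 (Float64 factors promote the Float32 weights)
println(eltype(out3))  # Float32 (Int factors keep the weights' Float32 type)
```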

## <span style="color:orange"> Convolution Models </span>

- Images (2D data) fed into CNN layers must be in <u>W-H-C-N</u> order: Width-Height-ChannelNum-BatchSize. For example, a batch of 100 RGB images (3 channels) of width 28 and height 28 has size (28,28,3,100). If the images were monochromatic (grayscale), the size would be (28,28,1,100).
- Data can also be 1D, for 1D convolutions over time series or mono audio, in which case the format is W-C-N. For example, a batch of 100 mono (1-channel) audio samples, each 250 elements long, has size (250,1,100). If the audio were stereo (2 channels), the size would be (250,2,100).
- If the data is 3D (like data from a magnetic resonance scan), it could look like (28,28,28,1,100), where the last two dimensions are the channel number and the batch size.
- The convolutional (CNN) layer takes a <u>filter size</u>: the window that is scanned over the data.
- The convolutional (CNN) layer also takes a <u>channel mapping</u> `in => out`, which transforms the number of input channels into the number of output channels.
- Key parameters are <u>stride</u>, <u>padding</u> and <u>dilation</u>.
- Padding specifies the number of pixels (elements) added on the borders (boundaries) of the data in each dimension. A single integer pads every boundary uniformly; with N spatial dimensions, a tuple of N integers gives the padding for each dimension (applied to both of its sides); and a tuple of 2N integers (e.g. 2×3 = 6 for 3D data) specifies each boundary separately. **Note: padding changes the size of the output after the CNN filter is applied.**
- <u>SamePad()</u> is a useful helper that applies exactly the padding needed for the spatial dimensions to stay the same after the filter is applied (with stride 1): a 28x28 image passed through a layer with SamePad() comes out as 28x28 and is fed as such into the subsequent layer.
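The effect of kernel size `k`, padding `p` (per side), stride `s` and dilation `d` on one spatial dimension of length `n` follows the standard formula out = ⌊(n + 2p − d(k − 1) − 1) / s⌋ + 1. A plain-Julia sketch (the function name `conv_out` is hypothetical, not part of Flux) that reproduces sizes printed in the cells below:

```julia
# Output length along one spatial dimension of a convolution.
# `pad` is the padding on each side (symmetric).
function conv_out(n; k, pad=0, stride=1, dilation=1)
    k_eff = dilation * (k - 1) + 1             # span covered by the (dilated) kernel
    return fld(n + 2pad - k_eff, stride) + 1   # number of window positions that fit
end

println(conv_out(100; k=5))                       # 96   no padding, no stride
println(conv_out(100; k=5, pad=2))                # 100  pad = 2 keeps the size
println(conv_out(100; k=4, stride=5))             # 20   1D data with stride 5
println(conv_out(200; k=4, pad=4, stride=5))      # 41   padding and stride
println(conv_out(200; k=5, pad=2, dilation=4))    # 188  dilation widens the kernel
```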


#### Convolution GIFs from Vincent Dumoulin, Francesco Visin, *A guide to convolution arithmetic for deep learning*

### <span style="color:orange"> No padding, no strides </span>

![image info](./demoPics/no_padding_no_strides.gif)

In [58]:
x = rand(Float32, 100, 100, 1, 10); #batch of monochrome images
size( x )

(100, 100, 1, 10)

In [59]:
cnn_layer = Conv( (5,5) , 1 => 1 )

Conv((5, 5), 1 => 1)  # 26 parameters

In [60]:
cnn_output = cnn_layer(x)
size( cnn_output )

(96, 96, 1, 10)

In [61]:
size( x ) == size( cnn_output )

false

In [62]:
cnn_layer2 = Conv( (5,5) , 1 => 1 , pad = 2 ) #apply some padding

Conv((5, 5), 1 => 1, pad=2)  # 26 parameters

In [64]:
cnn_output2 = cnn_layer2( x )
size( cnn_output2 )

(100, 100, 1, 10)

In [65]:
size( x ) == size( cnn_output2 )

true

#### <span style="color:orange"> try SamePad() </span>

In [67]:
cnn_layer3 = Conv( (5,5) , 1 => 1 , pad = SamePad() ) #apply some padding using the SamePad() function helper

Conv((5, 5), 1 => 1, pad=2)  # 26 parameters

In [68]:
cnn_output3 = cnn_layer3( x )
size( cnn_output3 )

(100, 100, 1, 10)

In [69]:
size( x ) == size( cnn_output3 )

true
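At stride 1, keeping the length unchanged requires a total padding of `dilation * (k - 1)` per dimension, which is why SamePad chose `pad=2` for a 5×5 kernel. When that total is odd it cannot be split evenly; this sketch puts the extra element on the low side, which we believe matches Flux's `SamePad` but should be treated as an assumption. `same_pad` and `out_len` are hypothetical helpers.

```julia
# Hypothetical helper: (low, high) padding that keeps the length unchanged at stride 1.
function same_pad(k; dilation=1)
    total = dilation * (k - 1)             # total padding needed in this dimension
    return (cld(total, 2), fld(total, 2))  # odd totals put the extra element on the low side
end

# Output length with possibly asymmetric padding `pads = (lo, hi)`.
function out_len(n, k, pads; stride=1, dilation=1)
    lo, hi = pads
    return fld(n + lo + hi - dilation * (k - 1) - 1, stride) + 1
end

println(same_pad(5))                # (2, 2): matches the pad=2 Flux printed
println(same_pad(4))                # (2, 1): the 1D, 4-element filter case
println(same_pad(5; dilation=4))    # (8, 8)
println(out_len(100, 4, same_pad(4)))                           # 100
println(out_len(200, 5, same_pad(5; dilation=4); dilation=4))   # 200
```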

#### <span style="color:orange"> What happens to the size when a filter is applied repeatedly? </span>

In [70]:
x = rand(Float32, 100, 100, 1, 10); #batch of monochrome images

In [71]:
cnn_layer = Chain( Conv( (5,5) , 1=>3 ) , Conv( (5,5) , 3=>5 ) , Conv( (5,5) , 5=>8 ) ) 

Chain(
  Conv((5, 5), 1 => 3),                 # 78 parameters
  Conv((5, 5), 3 => 5),                 # 380 parameters
  Conv((5, 5), 5 => 8),                 # 1_008 parameters
)                   # Total: 6 arrays, 1_466 parameters, 6.781 KiB.

In [73]:
cnn_output = cnn_layer( x )
size( cnn_output )

(88, 88, 8, 10)

In [74]:
cnn_layer_same_pad = Chain( Conv((5,5),1=>3,pad=SamePad()),Conv((5,5),3=>5,pad=SamePad()),Conv((5,5),5=>8,pad=SamePad()) ) 

Chain(
  Conv((5, 5), 1 => 3, pad=2),          # 78 parameters
  Conv((5, 5), 3 => 5, pad=2),          # 380 parameters
  Conv((5, 5), 5 => 8, pad=2),          # 1_008 parameters
)                   # Total: 6 arrays, 1_466 parameters, 6.781 KiB.

In [75]:
cnn_output_same_pad = cnn_layer_same_pad( x ) #store the output separately so the model is not overwritten
size( cnn_output_same_pad )

(100, 100, 8, 10)
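Without padding, each 5×5 layer trims `k - 1 = 4` pixels per dimension, so three layers take 100 down to 88; with `pad = 2` on each side (what SamePad chose) each layer adds those 4 back. A small stride-1 sketch with hypothetical helper names:

```julia
# Output length of one stride-1 conv layer with symmetric padding `pad`.
layer_out(n, k; pad=0) = n + 2pad - k + 1

# Apply the same size rule once per layer in the chain, recording each result.
sizes_through(n, nlayers, k; pad=0) =
    accumulate((m, _) -> layer_out(m, k; pad=pad), 1:nlayers; init=n)

println(sizes_through(100, 3, 5))          # [96, 92, 88]
println(sizes_through(100, 3, 5; pad=2))   # [100, 100, 100]
```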

### <span style="color:orange"> Example matching the GIF above (4×4 input, 3×3 kernel) </span>

In [78]:
x = rand(Float32, 4, 4, 1, 10); #batch of monochrome images
cnn_layer = Conv( (3,3) , 1 => 1 )
cnn_layer( x ) |> size

(2, 2, 1, 10)

### <span style="color:orange"> Add arbitrary padding (zeros) </span>

![image info](./demoPics/arbitrary_padding_no_strides.gif)

In [83]:
x = rand(Float32, 4, 4, 1, 1); #a single 4×4 monochrome image (batch size 1)
cnn_layer = Conv( (3,3) , 1 => 1 , pad = 5)
println( cnn_layer( x ) |> size )
cnn_layer( x )

(12, 12, 1, 1)


12×12×1×1 Array{Float32, 4}:
[:, :, 1, 1] =
 0.0  0.0  0.0   0.0        0.0        0.0       …   0.0        0.0  0.0  0.0
 0.0  0.0  0.0   0.0        0.0        0.0           0.0        0.0  0.0  0.0
 0.0  0.0  0.0   0.0        0.0        0.0           0.0        0.0  0.0  0.0
 0.0  0.0  0.0  -0.249219  -0.262202  -0.196942     -0.0217773  0.0  0.0  0.0
 0.0  0.0  0.0   0.25504    0.980319   0.295236     -0.109904   0.0  0.0  0.0
 0.0  0.0  0.0   0.784704   0.342678  -0.433007  …  -0.217018   0.0  0.0  0.0
 0.0  0.0  0.0   0.401285   0.048833  -0.18198      -0.444861   0.0  0.0  0.0
 0.0  0.0  0.0   0.749026   0.830429   0.137994     -0.24143    0.0  0.0  0.0
 0.0  0.0  0.0   0.382977  -0.223528  -0.338113     -0.3098     0.0  0.0  0.0
 0.0  0.0  0.0   0.0        0.0        0.0           0.0        0.0  0.0  0.0
 0.0  0.0  0.0   0.0        0.0        0.0       …   0.0        0.0  0.0  0.0
 0.0  0.0  0.0   0.0        0.0        0.0           0.0        0.0  0.0  0.0

In [88]:
x = rand(Float32, 4, 4, 1, 1); 
cnn_layer = Conv( (3,3) , 1 => 1 , pad = (0,5) ) #non-uniform padding: 0 in the first spatial dimension, 5 on both sides of the second
println( cnn_layer( x ) |> size )
cnn_layer( x )

(2, 12, 1, 1)


2×12×1×1 Array{Float32, 4}:
[:, :, 1, 1] =
 0.0  0.0  0.0  -0.579736    0.527731  …  0.493209  0.296352  0.0  0.0  0.0
 0.0  0.0  0.0  -0.0546827  -0.698075     0.397847  0.345251  0.0  0.0  0.0
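Judging from the output sizes above, `pad` accepts a single integer, one integer per spatial dimension (applied to both of its sides), or one integer per side. The helper below (`expand_pad` is hypothetical, written to mirror that behavior as we understand it) normalizes all three forms to explicit (low, high) pairs and computes the resulting size for stride 1.

```julia
# Hypothetical helper: normalize a padding spec to (lo1, hi1, lo2, hi2, ...).
function expand_pad(pad, nspatial)
    if pad isa Integer
        return ntuple(_ -> pad, 2nspatial)          # same padding on every boundary
    elseif length(pad) == nspatial
        return Tuple(p for p in pad for _ in 1:2)   # each dimension padded on both sides
    elseif length(pad) == 2nspatial
        return Tuple(pad)                           # every boundary given explicitly
    else
        throw(ArgumentError("pad must have length 1, $nspatial or $(2nspatial)"))
    end
end

# Output size for stride 1 and no dilation, given a kernel size per dimension.
function padded_conv_size(insize, k, pad)
    p = expand_pad(pad, length(insize))
    return ntuple(i -> insize[i] + p[2i-1] + p[2i] - k[i] + 1, length(insize))
end

println(expand_pad(5, 2))                           # (5, 5, 5, 5)
println(expand_pad((0, 5), 2))                      # (0, 0, 5, 5)
println(padded_conv_size((4, 4), (3, 3), 5))        # (12, 12)
println(padded_conv_size((4, 4), (3, 3), (0, 5)))   # (2, 12)
```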

### <span style="color:orange"> On 1D data </span>

In [97]:
x = rand(Float32, 100, 1, 1); #100 elements, 1 channel, 1 sample in the batch 
cnn_layer = Conv( (4,) , 1 => 1 ) #4 element filter
cnn_layer( x ) |> size

(97, 1, 1)

In [98]:
x = rand(Float32, 100, 1, 1); #100 elements, 1 channel, 1 sample in the batch 
cnn_layer = Conv( (4,) , 1 => 1 , pad = SamePad() ) #4 element filter
cnn_layer( x ) |> size

(100, 1, 1)

### <span style="color:orange"> Now apply 'Strides' </span>
![image info](./demoPics/no_padding_strides.gif)

(no padding but with strides)
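With a stride, the filter jumps `stride` elements between windows instead of one, and only windows that fit entirely inside the data are kept. A minimal 1D sliding-window sketch (plain Julia; `conv1d` is a hypothetical name). It computes a cross-correlation; as we understand it, Flux's `Conv` flips the kernel (true convolution) while `CrossCor` does not, which changes the values but not the output length.

```julia
# Minimal 1D sliding-window sketch (cross-correlation, no padding).
function conv1d(x::AbstractVector, w::AbstractVector; stride::Int=1)
    k = length(w)
    starts = 1:stride:(length(x) - k + 1)   # start positions of windows that fit
    return [sum(x[i:i+k-1] .* w) for i in starts]
end

x = collect(1.0:100.0)   # 100 elements
w = ones(4)              # 4-element filter that sums its window

println(length(conv1d(x, w; stride=5)))   # 20
println(length(conv1d(x, w; stride=1)))   # 97
println(conv1d(x, w; stride=5)[1:3])      # [10.0, 30.0, 50.0]
```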

In [96]:
x = rand(Float32, 100, 1, 1); #100 elements, 1 channel, 1 sample in the batch 
cnn_layer = Conv( (4,) , 1 => 1 , stride=5 ) #4 element filter
cnn_layer( x ) |> size

(20, 1, 1)

In [99]:
x = rand(Float32, 100, 1, 1); #100 elements, 1 channel, 1 sample in the batch 
cnn_layer = Conv( (4,) , 1 => 1 , stride=1 ) 
cnn_layer( x ) |> size

(97, 1, 1)

In [100]:
x = rand(Float32, 50, 50, 2, 10); #batch of 10 two-channel 50×50 images
cnn_layer = Conv( (3,3) , 2 => 3 , stride = 5 )
cnn_layer( x ) |> size

(10, 10, 3, 10)

### <span style="color:orange"> Padding and Strides </span>

![image info](./demoPics/padding_strides.gif)

In [106]:
x = rand(Float32, 200, 200, 3, 10); #batch of 10 three-channel (RGB) 200×200 images
cnn_layer = Conv( (4,4) , 3 => 3 , pad=4 , stride = 5 )
cnn_layer( x ) |> size

(41, 41, 3, 10)

### <span style="color:orange"> Dilation inserts gaps (skipped elements) between the elements of the filter </span>

![image info](./demoPics/dilation.gif)
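Dilation `d` is equivalent to inserting `d - 1` zeros between the kernel taps, so a kernel of size `k` spans `d * (k - 1) + 1` elements. A plain-Julia sketch (hypothetical helper names) of the dilated kernel and of the output lengths in the cells below:

```julia
# Insert (d - 1) zeros between kernel taps to form the dilated kernel.
function dilate_kernel(w::AbstractVector, d::Int)
    wd = zeros(eltype(w), d * (length(w) - 1) + 1)
    wd[1:d:end] .= w        # the original taps land every d positions
    return wd
end

println(dilate_kernel([1.0, 2.0, 3.0, 4.0], 2))   # [1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 4.0]

# Effective span, and the resulting output length at stride 1
span(k, d) = d * (k - 1) + 1
println(200 - span(4, 1) + 1)          # 197
println(200 - span(4, 2) + 1)          # 194
println(200 + 2*2 - span(5, 4) + 1)   # 188  (pad = 2 on each side)
```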

In [108]:
x = rand(Float32, 200, 200, 3, 10); 
cnn_layer = Conv( (4,4) , 3 => 3 , dilation = 1 )
cnn_layer( x ) |> size

(197, 197, 3, 10)

In [109]:
x = rand(Float32, 200, 200, 3, 10); 
cnn_layer = Conv( (4,4) , 3 => 3 , dilation = 2 )
cnn_layer( x ) |> size

(194, 194, 3, 10)

In [118]:
x = rand(Float32, 200, 200, 3, 10); 
cnn_layer = Conv( (5,5) , 3 => 3 , pad = 2 , dilation=1 )
cnn_layer( x ) |> size

(200, 200, 3, 10)

In [120]:
x = rand(Float32, 200, 200, 1, 10); 
cnn_layer = Conv( (5,5) , 1 => 1 , pad = 2 , dilation=4 )
cnn_layer( x ) |> size

(188, 188, 1, 10)

In [123]:
x = rand(Float32, 200, 200, 1, 10); 
cnn_layer = Conv( (5,5) , 1 => 1 , pad = SamePad() , dilation=4 )
cnn_layer( x ) |> size

(200, 200, 1, 10)

## <span style="color:orange"> Transposed convolution </span>

- Transposed convolutional layers 'upsample' the data, so that the output feature map goes from 'low' resolution to 'high' resolution.
- The convolutions above, in contrast, 'downsample' and reduce resolution unless padding compensates.
- As a result, a transposed convolution increases the size of the output.
- Each input element multiplies the whole kernel, and the resulting patch is projected onto (added into) the surrounding output region; overlapping patches are summed.
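The patch-projection view can be written as a scatter-add in one dimension. In this sketch (plain Julia; `conv_transpose1d` is hypothetical) padding *crops* the full output, which is why `pad` shrinks a transposed convolution instead of growing it. Flux's `ConvTranspose` may orient the kernel differently, so values can differ, but the sizes match the cells below.

```julia
# Minimal 1D transposed convolution: every input element scales the whole kernel,
# and the scaled copies are added into the output `stride` positions apart.
function conv_transpose1d(x::AbstractVector, w::AbstractVector; stride::Int=1, pad::Int=0)
    n, k = length(x), length(w)
    full = zeros(promote_type(eltype(x), eltype(w)), (n - 1) * stride + k)
    for i in 1:n
        start = (i - 1) * stride + 1
        full[start:start+k-1] .+= x[i] .* w   # overlapping patches are summed
    end
    return full[1+pad:end-pad]                # padding removes elements from each end
end

println(conv_transpose1d([1.0, 2.0], [1.0, 1.0, 1.0]))   # [1.0, 3.0, 3.0, 2.0]

x = rand(10); w = rand(5)
println(length(conv_transpose1d(x, w)))                   # 14
println(length(conv_transpose1d(x, w; pad=6)))            # 2
println(length(conv_transpose1d(x, w; stride=2, pad=1)))  # 21
```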

![image info](./demoPics/no_padding_no_strides_transposed.gif)

(transposed convolution, no padding no strides)

![image info](./demoPics/full_padding_no_strides_transposed.gif)

(transposed convolution, full padding no strides)

![image info](./demoPics/padding_strides_transposed.gif)

(transposed convolution, padding with strides)

In [124]:
x = rand(Float32, 10, 10, 1, 10); 
cnn_layer = ConvTranspose( (5,5) , 1 => 1  )
cnn_layer( x ) |> size

(14, 14, 1, 10)

In [128]:
x = rand(Float32, 10, 10, 1, 10); 
cnn_layer = ConvTranspose( (5,5) , 1 => 1 , pad=6 ) #in a transposed convolution, padding crops the output
cnn_layer( x ) |> size

(2, 2, 1, 10)

In [130]:
x = rand(Float32, 10, 10, 1, 10); 
cnn_layer = ConvTranspose( (5,5) , 1 => 1 , pad=1 , stride=2 )
cnn_layer( x ) |> size

(21, 21, 1, 10)

In [131]:
x = rand(Float32, 10, 10, 1, 10); 
cnn_layer = ConvTranspose( (5,5) , 1 => 1 , pad=SamePad() , stride=2 )
cnn_layer( x ) |> size

(20, 20, 1, 10)

In [133]:
x = rand(Float32, 10, 10, 1, 1); 
cnn_layer = ConvTranspose( (5,5) , 1 => 1 , pad=SamePad() , stride=4 )
cnn_layer( x ) 

40×40×1×1 Array{Float32, 4}:
[:, :, 1, 1] =
 -0.0446279     0.00479112    0.0992612   …   0.12348     -0.149842
 -0.0602849    -0.0472154     0.186933        0.232544     0.186137
  0.186731      0.11041       0.114736        0.14273     -0.0278084
 -0.0270291     0.0811776    -0.0489352      -0.18712     -0.113562
 -0.00543167    0.000583127   0.0120811       0.133592    -0.162113
 -0.00733727   -0.00574659    0.0227516   …   0.251587     0.201381
  0.022727      0.0134381     0.0139645       0.154419    -0.0300857
 -0.242599      0.226335     -0.155518       -0.190078    -0.13336
 -0.0638132     0.00685079    0.141933        0.13292     -0.161297
 -0.0862009    -0.067513      0.267295        0.250321     0.200367
  0.267005      0.157875      0.16406     …   0.153641    -0.0299342
 -0.20779       0.269063     -0.175681       -0.141707    -0.172945
 -0.0524018     0.0056257     0.116552        0.0877207   -0.106448
  ⋮                                       ⋱               
 -0.0365235

#### <span style="color:orange"> Depthwise convolutional layers apply a separate kernel to each channel, so no kernel spans all channels </span>

In [135]:
x = rand(Float32, 10, 10, 5, 10); 
cnn_layer = DepthwiseConv( (5,5) , 5 => 5 , pad=SamePad() )
cnn_layer( x ) |> size

(10, 10, 5, 10)
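The main payoff of applying kernels per channel is fewer parameters. A back-of-the-envelope count (plain Julia; `conv_params` is hypothetical) that reproduces the counts Flux printed for the `Conv` chain earlier, and gives what we would expect for this `DepthwiseConv`, assuming one bias per output channel and one input channel per kernel:

```julia
# Parameters of a 2D convolution: kernel weights plus one bias per output channel.
# groups = 1 is an ordinary Conv; groups = cin is depthwise, where each kernel
# only sees cin ÷ groups = 1 input channel.
conv_params(k::NTuple{2,Int}, cin, cout; groups=1) = prod(k) * (cin ÷ groups) * cout + cout

println(conv_params((5, 5), 1, 3))             # 78
println(conv_params((5, 5), 3, 5))             # 380
println(conv_params((5, 5), 5, 8))             # 1008
println(conv_params((5, 5), 5, 5))             # 630  ordinary Conv((5,5), 5 => 5)
println(conv_params((5, 5), 5, 5; groups=5))   # 130  depthwise
```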

### <span style="color:orange"> Common layers often used 'after' a CNN layer </span>

- They reduce the spatial size of the output and have no trainable parameters.
- MaxPool takes the maximum element in each window; MeanPool is the averaging counterpart. By default the stride equals the window size, so the windows do not overlap.
- AdaptiveMaxPool takes a tuple with the desired output spatial size (for each channel and batch sample) and works out the pooling window needed to reach that target size; AdaptiveMeanPool is the averaging counterpart.
- GlobalMaxPool produces a single value for each channel of each batch sample; GlobalMeanPool is the averaging counterpart.

In [140]:
x = rand(Float32, 100, 100, 1, 10); 
m = Chain( Conv( (5,5) , 1 => 2 ) , MaxPool((5,5)) )
m( x ) |> size

(19, 19, 2, 10)

In [141]:
x = rand(Float32, 100, 100, 1, 10); 
m = Chain( Conv( (5,5) , 1 => 2 , pad=SamePad() ) , MaxPool((5,5)) )
m( x ) |> size

(20, 20, 2, 10)

In [143]:
x = rand(Float32, 100, 100, 1, 10); 
m = Chain( Conv( (5,5) , 1 => 2  ) , MaxPool( (5,5), pad=SamePad() ) )
m( x ) |> size

(20, 20, 2, 10)

In [144]:
x = rand(Float32, 100, 100, 1, 10); 
m = Chain( Conv( (5,5) , 1 => 2  ) , AdaptiveMaxPool( (40,40) ) ) #we want a 40x40 output from MaxPool
m( x ) |> size

(40, 40, 2, 10)

In [145]:
x = rand(Float32, 100, 100, 1, 10); 
m = Chain( Conv( (5,5) , 1 => 3  ) , GlobalMaxPool() ) #reduce each channel of each sample to a single value
m( x ) |> size

(1, 1, 3, 10)
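The pooling sizes above follow the same arithmetic as a convolution, with the stride defaulting to the window size. For the adaptive case, the sketch picks a window and stride that land exactly on the requested size; this is one common choice and an assumption here, since Flux's internal choice may differ. `pool_out` and `adaptive_window` are hypothetical names.

```julia
# Pooling output length: stride defaults to the window size (non-overlapping windows).
pool_out(n, k; stride=k, pad=0) = fld(n + 2pad - k, stride) + 1

println(pool_out(96, 5))          # 19  (the Conv first trims 100 -> 96)
println(pool_out(100, 5))         # 20  (a SamePad Conv keeps 100)
println(pool_out(96, 5; pad=2))   # 20  (SamePad() on the pool itself)
println(pool_out(96, 96))         # 1   (a global pool: one window over everything)

# One way to pick window and stride so that `n` elements pool down to `out`.
function adaptive_window(n, out)
    stride = n ÷ out
    k = n - (out - 1) * stride    # window wide enough that the last one ends at n
    return k, stride
end

k, s = adaptive_window(96, 40)
println((k, s))                     # (18, 2)
println(pool_out(96, k; stride=s))  # 40
```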

### <span style="color:orange"> Upsampling </span>

- Upsampling layers do the opposite of pooling. Unlike a transposed convolution, they have no trainable parameters: they increase the resolution by interpolating between the existing values across the domain.
- Different interpolation methods are available, such as `:nearest` (copy the nearest value) and `:bilinear` (linear interpolation along both spatial dimensions).
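For an integer scale factor, `:nearest` amounts to copying every pixel into a `scale × scale` block. The sketch below shows that with `repeat` on a WHCN array; it illustrates the idea and is not Flux's implementation, and `upsample_nearest` is a hypothetical name. `:bilinear` instead blends neighbouring values.

```julia
# Nearest-neighbour upsampling by an integer factor on a WHCN array:
# only the two spatial dimensions are repeated; channels and batch are untouched.
upsample_nearest(x::AbstractArray{T,4}, scale::Int) where {T} =
    repeat(x; inner=(scale, scale, 1, 1))

img = reshape(Float32[1 2; 3 4], 2, 2, 1, 1)   # one 2×2 single-channel image
up = upsample_nearest(img, 2)

println(size(up))   # (4, 4, 1, 1)
println(up[:, :, 1, 1] == Float32[1 1 2 2; 1 1 2 2; 3 3 4 4; 3 3 4 4])   # true
```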

In [151]:
x = rand(Float32, 100, 100, 2, 10); 
m = Upsample( :nearest, size= (200,200) )
m( x ) |> size

(200, 200, 2, 10)

In [152]:
x = rand(Float32, 100, 100, 2, 10); 
m = Upsample( :bilinear, scale=4 )
m( x ) |> size

(400, 400, 2, 10)

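Another important utility function: `Flux.flatten`, which reshapes an array into a matrix by collapsing every dimension except the last (batch) one, e.g. before feeding CNN features into a `Dense` layer.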
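`Flux.flatten` keeps the last (batch) dimension and merges everything else into one feature dimension. A plain-`reshape` equivalent (the name `flatten_batch` is hypothetical) shows the resulting shape, for example on the 19×19×2×10 output of the `Conv` + `MaxPool` chain above:

```julia
# Collapse every dimension except the last (batch) into one feature dimension,
# giving a (features, batch) matrix suitable for a Dense layer.
flatten_batch(x::AbstractArray) = reshape(x, :, size(x, ndims(x)))

x = rand(Float32, 19, 19, 2, 10)
println(size(flatten_batch(x)))   # (722, 10)
```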
