# Fully-Connected (Linear) Layer
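In this notebook, we look at the forward and backward passes of the `nn.Linear` layer. We also derive by hand the gradient of the output with respect to the layer's input, $\frac{\partial O}{\partial I}$, with respect to its weights, $\frac{\partial O}{\partial W}$, and with respect to its bias, $\frac{\partial O}{\partial b}$. Throughout, $O$ denotes the final scalar output of the network (for example, the loss), $I^{l}$ is the input of layer $l$, and the layer computes $I^{l+1} = W^{l} I^{l} + b^{l}$, which becomes the input of the next layer.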
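The relations used in the cells below follow from the chain rule. Assuming the layer computes $I^{l+1} = W^{l} I^{l} + b^{l}$ with $W^{l}$ of shape (outputs x inputs), which is how `nn.Linear` stores its `weight` (4x5 here), and that $O$ is the final scalar output, an index-level derivation is:

$$
I^{l+1}_{i} = \sum_{k} W^{l}_{ik}\, I^{l}_{k} + b^{l}_{i}
\quad\Longrightarrow\quad
\frac{\partial I^{l+1}_{i}}{\partial I^{l}_{k}} = W^{l}_{ik}, \qquad
\frac{\partial I^{l+1}_{i}}{\partial W^{l}_{jk}} = \delta_{ij}\, I^{l}_{k}, \qquad
\frac{\partial I^{l+1}_{i}}{\partial b^{l}_{j}} = \delta_{ij}
$$

$$
\begin{aligned}
\frac{\partial O}{\partial I^{l}_{k}} &= \sum_{i} \frac{\partial O}{\partial I^{l+1}_{i}}\, W^{l}_{ik}
&&\Longleftrightarrow\quad \frac{\partial O}{\partial I^{l}} = (W^{l})^{T}\, \frac{\partial O}{\partial I^{l+1}} \\
\frac{\partial O}{\partial W^{l}_{jk}} &= \frac{\partial O}{\partial I^{l+1}_{j}}\, I^{l}_{k}
&&\Longleftrightarrow\quad \frac{\partial O}{\partial W^{l}} = \frac{\partial O}{\partial I^{l+1}}\, (I^{l})^{T} \\
\frac{\partial O}{\partial b^{l}_{j}} &= \frac{\partial O}{\partial I^{l+1}_{j}}
&&\Longleftrightarrow\quad \frac{\partial O}{\partial b^{l}} = \frac{\partial O}{\partial I^{l+1}}
\end{aligned}
$$

Here $\delta_{ij}$ is the Kronecker delta; summing over $i$ collapses it, which is why the weight and bias gradients need no sum while the input gradient does.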

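```lua
require 'nn';
n = torch.rand(5)        -- random input vector of size 5
lin = nn.Linear(5,4)     -- fully-connected layer: 5 inputs -> 4 outputs
m = lin:forward(n)       -- output m = W*n + b, a vector of size 4
```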

#### Input

```lua
n
```

```
 0.9052
 0.7486
 0.2017
 0.2014
 0.0126
[torch.DoubleTensor of size 5]
```



#### Output

```lua
m
```

```
-0.6340
-0.4447
-0.0561
 0.1812
[torch.DoubleTensor of size 4]
```
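To make the forward pass concrete without Torch, here is a minimal plain-Lua sketch of what `nn.Linear` computes, $I^{l+1} = W I^{l} + b$, with vectors as Lua arrays and the weight as an array of rows (one row per output, like `lin.weight`). The function name `linear_forward` and the small example values are hypothetical, chosen only for illustration; the real layer operates on `torch.Tensor`s.

```lua
-- Hypothetical plain-Lua sketch of a linear layer's forward pass: out = W * x + b
-- W is an array of rows (one row per output unit, one column per input), like lin.weight.
local function linear_forward(W, b, x)
  local out = {}
  for i = 1, #W do
    local s = b[i]                -- start from the bias of output unit i
    for k = 1, #x do
      s = s + W[i][k] * x[k]      -- accumulate row i of W dotted with x
    end
    out[i] = s
  end
  return out
end

-- Small example with 2 outputs and 3 inputs (made-up values)
local W = { {1, 2, 3}, {0, -1, 1} }
local b = { 0.5, -0.5 }
local x = { 1, 1, 2 }
local y = linear_forward(W, b, x)
print(y[1], y[2])  --> 9.5   0.5
```

Each output unit is just a dot product of one weight row with the input plus its own bias, which is why the weight matrix has as many rows as the layer has outputs.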



#### Gradient of the output with respect to the input of the next layer: $\frac{\partial O}{\partial I^{l+1}}$

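```lua
nextgrad = torch.rand(4)    -- random stand-in for the gradient arriving from the next layer
lin:zeroGradParameters()    -- backward accumulates into gradWeight/gradBias, so start from zero
lin:backward(n, nextgrad)   -- fills lin.gradInput and accumulates lin.gradWeight, lin.gradBias
```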

#### Gradient of the output with respect to the input of this layer: $\frac{\partial O}{\partial I^{l}}$

```lua
lin.gradInput
```

```
-0.6341
-0.1365
 0.4231
 0.0697
 0.2226
[torch.DoubleTensor of size 5]
```



#### Relation for calculating this layer's gradient with respect to its input: $\frac{\partial O}{\partial I^{l}} = (W^{l})^{T} \frac{\partial O}{\partial I^{l+1}}$

```lua
lin.weight:t()*nextgrad   -- W^T (5x4) times the incoming gradient (size 4) gives a vector of size 5
```

```
-0.6341
-0.1365
 0.4231
 0.0697
 0.2226
[torch.DoubleTensor of size 5]
```
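The same relation in plain Lua: each input receives the incoming gradients weighted by the column of $W$ it feeds into, i.e. $(W^{l})^{T} \frac{\partial O}{\partial I^{l+1}}$. `linear_grad_input` is a hypothetical helper for illustration, not part of `nn`, and the numbers are made up.

```lua
-- Hypothetical helper: gradInput = W^T * gradOutput
-- W has #W rows (outputs) and #W[1] columns (inputs).
local function linear_grad_input(W, gradOutput)
  local gradInput = {}
  for k = 1, #W[1] do
    local s = 0
    for i = 1, #W do
      s = s + W[i][k] * gradOutput[i]   -- column k of W dotted with gradOutput
    end
    gradInput[k] = s
  end
  return gradInput
end

local W = { {1, 2, 3}, {0, -1, 1} }
local g = { 1, 2 }                      -- made-up gradient from the next layer
local gi = linear_grad_input(W, g)
print(gi[1], gi[2], gi[3])              --> 1  0  5
```

Looping over columns of `W` instead of materialising the transpose is a design choice for the sketch; Torch's `lin.weight:t()` is also just a transposed view, not a copy.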



#### This layer's gradient of the output with respect to the weights: $\frac{\partial O}{\partial W^{l}}$

```lua
lin.gradWeight
```

```
 0.7481  0.6187  0.1667  0.1664  0.0104
 0.8719  0.7211  0.1943  0.1940  0.0122
 0.5630  0.4656  0.1255  0.1252  0.0078
 0.4919  0.4068  0.1096  0.1094  0.0069
[torch.DoubleTensor of size 4x5]
```



#### Relation for calculating this layer's gradient with respect to the weights: $\frac{\partial O}{\partial W^{l}} = \frac{\partial O}{\partial I^{l+1}} \, (I^{l})^{T}$ (an outer product, hence the 4x5 shape)

```lua
(n:reshape(5,1) * nextgrad:reshape(1,4)):t()   -- (5x1)*(1x4) is 5x4; transpose to match the 4x5 gradWeight
```

```
 0.7481  0.6187  0.1667  0.1664  0.0104
 0.8719  0.7211  0.1943  0.1940  0.0122
 0.5630  0.4656  0.1255  0.1252  0.0078
 0.4919  0.4068  0.1096  0.1094  0.0069
[torch.DoubleTensor of size 4x5]
```
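Element-wise, the weight gradient is $\frac{\partial O}{\partial W^{l}_{ik}} = \frac{\partial O}{\partial I^{l+1}_{i}} \, I^{l}_{k}$. The plain-Lua sketch below also mirrors the fact that `nn` modules *accumulate* parameter gradients during `backward` rather than overwrite them. `linear_acc_grad_weight` is a hypothetical helper and the values are made up.

```lua
-- Hypothetical helper mirroring the accumulation done during backward:
-- gradWeight[i][k] += gradOutput[i] * input[k]   (outer product)
local function linear_acc_grad_weight(gradWeight, input, gradOutput)
  for i = 1, #gradOutput do
    gradWeight[i] = gradWeight[i] or {}
    for k = 1, #input do
      gradWeight[i][k] = (gradWeight[i][k] or 0) + gradOutput[i] * input[k]
    end
  end
  return gradWeight
end

local x = { 1, 2, 3 }          -- made-up layer input (3 inputs)
local g = { 1, -1 }            -- made-up gradient from the next layer (2 outputs)
local gw = linear_acc_grad_weight({}, x, g)
print(gw[1][3], gw[2][2])      --> 3  -2

-- Calling it again without zeroing doubles the entries, which is why
-- gradients are normally reset before each backward pass.
linear_acc_grad_weight(gw, x, g)
print(gw[1][3], gw[2][2])      --> 6  -4
```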



#### This layer's gradient of the output with respect to the bias: $\frac{\partial O}{\partial b^{l}}$

```lua
lin.gradBias
```

```
 0.8265
 0.9632
 0.6219
 0.5434
[torch.DoubleTensor of size 4]
```



#### Relation for calculating this layer's gradient with respect to the bias: $\frac{\partial O}{\partial b^{l}} = \frac{\partial O}{\partial I^{l+1}}$

```lua
nextgrad:reshape(1,4):t()   -- same values as nextgrad, displayed as a 4x1 column
```

```
 0.8265
 0.9632
 0.6219
 0.5434
[torch.DoubleTensor of size 4x1]
```
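All three relations can also be checked numerically. The sketch below is plain Lua and assumes a hypothetical final output $O = \sum_i g_i\, y_i$ with $y = W x + b$, so that $\frac{\partial O}{\partial y} = g$ plays the role of `nextgrad`. It compares each analytic gradient against a central finite difference; the helper names (`forward`, `objective`, `numeric`) and all values are made up for illustration.

```lua
-- Finite-difference check of the three linear-layer gradients.
-- Assumption: the "final output" is O = sum_i g[i] * y[i] with y = W*x + b,
-- so dO/dy = g, which plays the role of nextgrad in the notebook.
local function forward(W, b, x)
  local y = {}
  for i = 1, #W do
    local s = b[i]
    for k = 1, #x do s = s + W[i][k] * x[k] end
    y[i] = s
  end
  return y
end

local function objective(W, b, x, g)
  local y, o = forward(W, b, x), 0
  for i = 1, #y do o = o + g[i] * y[i] end
  return o
end

-- Central difference of `objective` with respect to t[j] (t is a row of W, b or x).
local function numeric(t, j, W, b, x, g, eps)
  local old = t[j]
  t[j] = old + eps; local plus = objective(W, b, x, g)
  t[j] = old - eps; local minus = objective(W, b, x, g)
  t[j] = old                                   -- restore the original value
  return (plus - minus) / (2 * eps)
end

local W = { {0.2, -0.4, 0.1}, {0.5, 0.3, -0.2} }
local b = { 0.1, -0.3 }
local x = { 0.9, 0.7, 0.2 }
local g = { 0.8, 0.6 }
local eps, maxErr = 1e-6, 0

for k = 1, #x do        -- dO/dx_k = sum_i W[i][k] * g[i]   (W^T g)
  local analytic = 0
  for i = 1, #W do analytic = analytic + W[i][k] * g[i] end
  maxErr = math.max(maxErr, math.abs(analytic - numeric(x, k, W, b, x, g, eps)))
end
for i = 1, #W do
  for k = 1, #x do      -- dO/dW_ik = g[i] * x[k]            (outer product)
    maxErr = math.max(maxErr, math.abs(g[i] * x[k] - numeric(W[i], k, W, b, x, g, eps)))
  end
  -- dO/db_i = g[i]
  maxErr = math.max(maxErr, math.abs(g[i] - numeric(b, i, W, b, x, g, eps)))
end
print(maxErr)           -- tiny (round-off level), since O is linear in every parameter
```

Because $O$ is affine in each of $x$, $W$ and $b$, the central difference is exact up to floating-point round-off, so any mismatch well above that level would point to a wrong formula.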

