# Fully-Connected (Linear) Layer
In this notebook, we look at the forward and backward passes of the ```nn.Linear``` layer. We also manually derive the expressions for the gradient of the loss with respect to the input of this layer, $\frac{\partial L}{\partial I}$, the gradient of the loss with respect to the weights, $\frac{\partial L}{\partial W}$, and the gradient of the loss with respect to the bias, $\frac{\partial L}{\partial b}$.

In [1]:
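require 'nn';
n = torch.rand(5)     -- random input vector with 5 elements
lin = nn.Linear(5,4)  -- fully-connected layer: 5 inputs, 4 outputs
m = lin:forward(n)    -- forward pass; m is the 4-element output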

#### Input

In [2]:
n

 0.7926
 0.5777
 0.9319
 0.1711
 0.2915
[torch.DoubleTensor of size 5]



#### Output

In [3]:
m

 0.4889
-0.1365
 0.2820
-0.2786
[torch.DoubleTensor of size 4]
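
The output above comes from the affine map that `nn.Linear` computes. As a short sketch in this notebook's notation (with $I$ the input `n`, $O$ the output `m`, and $W$, $b$ the layer's weight and bias; the shapes are read off from `nn.Linear(5,4)`):

$$O = W I + b, \qquad O_i = \sum_{j=1}^{5} W_{ij} I_j + b_i, \quad i = 1,\dots,4$$

with $W \in \mathbb{R}^{4\times5}$, $I \in \mathbb{R}^{5}$ and $b, O \in \mathbb{R}^{4}$. Because the output of this layer is the input of the next one, $O^{l} = I^{l+1}$, so $\frac{\partial L}{\partial O}$ and $\frac{\partial L}{\partial I^{l+1}}$ denote the same vector in what follows.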



#### Set the gradient of the loss with respect to the input of the next layer, $\frac{\partial L}{\partial I^{l+1}}$, to random values

In [4]:
nextgrad = torch.rand(4)
lin:backward(n, nextgrad)

#### Gradient of the loss with respect to the input of this layer $\frac{\partial L}{\partial I}$

In [5]:
lin.gradInput

 0.3529
-0.4070
 0.0661
 0.2786
-0.0447
[torch.DoubleTensor of size 5]



#### Relation for calculating the gradient of the loss with respect to the input: $\frac{\partial L}{\partial I^{l}} = \frac{\partial L}{\partial I^{l+1}} \times W^{l}$

In [6]:
nextgrad:reshape(1,4)*lin.weight

 0.3529 -0.4070  0.0661  0.2786 -0.0447
[torch.DoubleTensor of size 1x5]
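
This relation follows from the chain rule. A short sketch, writing $g = \frac{\partial L}{\partial I^{l+1}}$ (a symbol introduced here only for brevity, corresponding to `nextgrad`) and using $O_i = \sum_j W_{ij} I_j + b_i$:

$$\frac{\partial O_i}{\partial I_j} = W_{ij} \quad\Longrightarrow\quad \frac{\partial L}{\partial I_j} = \sum_{i=1}^{4} g_i \, W_{ij}, \qquad\text{i.e.}\qquad \frac{\partial L}{\partial I^{l}} = g^{\top} W^{l}$$

The product is a $1\times4$ row vector times the $4\times5$ weight matrix, giving a $1\times5$ row vector. Written as a column vector it is $(W^{l})^{\top} g$, which is why reshaping `nextgrad` to $1\times4$ and multiplying by `lin.weight` reproduces `lin.gradInput`.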



#### This layer's gradient of the loss with respect to the weights: $\frac{\partial L}{\partial W^{l}}$

In [7]:
lin.gradWeight

 0.4952  0.3610  0.5823  0.1069  0.1821
 0.7512  0.5476  0.8832  0.1622  0.2763
 0.5335  0.3889  0.6272  0.1152  0.1962
 0.4709  0.3433  0.5537  0.1017  0.1732
[torch.DoubleTensor of size 4x5]



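#### Relation for calculating the gradient of the loss with respect to the weights of this layer: $\frac{\partial L}{\partial W^{l}} = \frac{\partial L}{\partial O} \frac{\partial O}{\partial W^{l}}$

Here $\frac{\partial L}{\partial O}$ is the same vector as $\frac{\partial L}{\partial I^{l+1}}$, because the output of this layer is the input of the next. Let us first calculate $\frac{\partial O}{\partial W^{l}}$, which is a Jacobian of size $4\times20$: one row per output and one column per entry of the $4\times5$ weight matrix, flattened row by row.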

In [8]:
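-- Jacobian dO/dW: row i holds the input n in the columns that belong to row i of W
dodw = torch.zeros(4,20)  -- start from zeros; torch.Tensor(4,20) would be uninitialized
st = 1                    -- running column index over the flattened weights
for i = 1, 4 do
    for j = 1, 5 do
        dodw[i][st]=n[j]
        st = st + 1
    end
end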
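
As a sketch of what this loop builds, assuming the weights are flattened row by row (which is what the running index `st` does), the Jacobian has the block layout

$$\frac{\partial O}{\partial W^{l}} = \begin{bmatrix} I^{\top} & 0 & 0 & 0 \\ 0 & I^{\top} & 0 & 0 \\ 0 & 0 & I^{\top} & 0 \\ 0 & 0 & 0 & I^{\top} \end{bmatrix}$$

where each $I^{\top}$ is the $1\times5$ input row `n` and each $0$ is a $1\times5$ block of zeros. This follows from $O_k = \sum_j W_{kj} I_j + b_k$: the derivative $\frac{\partial O_k}{\partial W_{ij}}$ equals $I_j$ when $k = i$ and $0$ otherwise, so every entry outside the diagonal blocks must be zero.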

Finally, we can calculate $\frac{\partial L}{\partial W^{l}} = \frac{\partial L}{\partial O} \frac{\partial O}{\partial W^{l}}$ and reshape the $1\times20$ result back to the $4\times5$ shape of the weight matrix:

In [9]:
(nextgrad:reshape(1,4) * dodw):reshape(4,5)

 0.4952  0.3610  0.5823  0.1069  0.1821
 0.7512  0.5476  0.8832  0.1622  0.2763
 0.5335  0.3889  0.6272  0.1152  0.1962
 0.4709  0.3433  0.5537  0.1017  0.1732
[torch.DoubleTensor of size 4x5]
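
Because each row of the Jacobian contains only one copy of the input, the product above collapses to an outer product. A short sketch of that step, element by element:

$$\frac{\partial L}{\partial W_{ij}} = \sum_{k=1}^{4} \frac{\partial L}{\partial O_k} \frac{\partial O_k}{\partial W_{ij}} = \frac{\partial L}{\partial O_i} \, I_j \qquad\Longrightarrow\qquad \frac{\partial L}{\partial W^{l}} = \frac{\partial L}{\partial O} \, I^{\top}$$

where $\frac{\partial L}{\partial O}$ is treated as a $4\times1$ column and $I^{\top}$ as a $1\times5$ row. As a check against the printed values, the first entry is the first element of `nextgrad` (0.6248, printed in the last cell of this notebook) times the first element of `n` (0.7926): $0.6248 \times 0.7926 \approx 0.4952$, which matches the top-left entry of `lin.gradWeight`.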



#### This layer's gradient of the loss with respect to the bias: $\frac{\partial L}{\partial b^{l}}$

In [10]:
lin.gradBias

 0.6248
 0.9478
 0.6731
 0.5942
[torch.DoubleTensor of size 4]



#### Relation for calculating this layer's gradient of the loss with respect to the bias: $\frac{\partial L}{\partial b^{l}} = \frac{\partial L}{\partial I^{l+1}}$

In [11]:
nextgrad:reshape(1,4):t()

 0.6248
 0.9478
 0.6731
 0.5942
[torch.DoubleTensor of size 4x1]
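
The bias relation can be sketched the same way. Since $O_i = \sum_j W_{ij} I_j + b_i$, each output depends only on its own bias term:

$$\frac{\partial O_i}{\partial b_j} = \delta_{ij} \qquad\Longrightarrow\qquad \frac{\partial L}{\partial b_j} = \sum_{i=1}^{4} \frac{\partial L}{\partial O_i} \, \delta_{ij} = \frac{\partial L}{\partial O_j}$$

Here $\delta_{ij}$ is $1$ when $i = j$ and $0$ otherwise, so the Jacobian of the output with respect to the bias is the $4\times4$ identity matrix and the incoming gradient passes through unchanged. That is why `lin.gradBias` equals `nextgrad` element for element.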

