## Essentials to implement a Neural Net using PyTorch

From theory, we know that we need:
- The neural net definition, i.e. the input, the hidden layers that propagate the input, and the output (with a little more work for an RNN, or convolutional, pooling, and flattening layers for a CNN)
- An initialization of the weights or filters
- A loss function and desired outputs

To run our computation, we need to keep doing:
- Push input after input forward along the net
- Compute a loss
- Propagate backwards and get gradients for each weight or filter
- Alter the weights in the direction opposite to the gradient to decrease the loss

... until the weights are optimized enough that the loss is within an acceptable range
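
The four steps above can be sketched with nothing but a weight tensor and autograd. This is a minimal illustration, not the net built later in this notebook: the toy data, the single weight matrix standing in for "the net", the learning rate, and the step count are all assumptions, and it uses the current tensor API rather than the `Variable` wrapper.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Hypothetical toy data: 3 samples with 8 coordinates each, one target per sample
inputs = torch.randn(3, 8)
targets = torch.randn(3, 1)

# A single weight matrix stands in for the net; requires_grad asks autograd to track it
weights = torch.randn(8, 1, requires_grad=True)
learning_rate = 0.01  # assumed step size

losses = []
for step in range(200):
    predictions = inputs @ weights                 # push the inputs forward
    loss = ((predictions - targets) ** 2).mean()   # compute a loss (mean squared error)
    loss.backward()                                # propagate backwards to get weights.grad
    with torch.no_grad():
        weights -= learning_rate * weights.grad    # step against the gradient
    weights.grad.zero_()                           # clear the gradient buffer for the next step
    losses.append(loss.item())

print('first loss:', losses[0], 'last loss:', losses[-1])
```

In practice the loop would stop once the loss drops below a chosen threshold instead of running a fixed number of steps.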

### Defining the Neural Net Architecture

In [135]:
# Imports
# Torch package
import torch
# Variable Class
from torch.autograd import Variable
# Neural Net Sub-package from Torch
import torch.nn as nn
# Functions from the Neural Net Sub-Package such as RELU
import torch.nn.functional as F

# input_size will dictate size of first hidden layer
# Here, have 3 samples with 8 coordinates each
inputs = Variable(torch.randn(3,8), requires_grad=True)
input_size = inputs.size()

# Implement the Neural Net class from torch.nn.Module Class
# Has all the useful stuff needed for a Neural Net
class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        # Basic net architecture, i.e. all the layers needed
        # How they are chained, and where non-linearity is added, is defined in forward()
        # A layer is defined by number of inputs and outputs
        # Input layer already specified by size of input fed when instantiating the net
        self.hidden = nn.Linear(input_size[1], 5)
        self.output = nn.Linear(5, 2)

    def forward(self, inputs):
        # How layers link to each other is defined here
        # Non-linearity added in between this chaining definition
        
        # Input 3 samples of size 8 into the hidden layer
        x = self.hidden(inputs)
        x = F.tanh(x)
        x = self.output(x)
        x = F.log_softmax(x)
        predicted = torch.max(x, 1)
        return predicted

    # Can also define any other helpers to be used by this NN here
    
# Instantiate Net architecture first
net = Net()
print('This net looks like: ', net)
# Calling the net on a set of inputs calls forward(), i.e. forward props on all of them
# Useful if, say, you have a trained net and just need to classify test data
result = net(inputs)
print('Classifying the inputs as if it is a trained model gives: ', result)

This net looks like:  Net (
  (hidden): Linear (8 -> 5)
  (output): Linear (5 -> 2)
)
Classifying the inputs as if it is a trained model gives:  (Variable containing:
-0.6546
-0.6232
-0.5901
[torch.FloatTensor of size 3]
, Variable containing:
 0
 0
 0
[torch.LongTensor of size 3]
)
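
For readers on a current PyTorch release, the same architecture might be written as in the sketch below. It is an assumption-laden variant rather than the cell above: it uses plain tensors in place of `Variable`, `torch.tanh` in place of `F.tanh`, an explicit `dim=1` for `log_softmax`, and hypothetical constructor arguments (`n_inputs`, `n_hidden`, `n_classes`) in place of the global `input_size`. It also has `forward()` return the log-probabilities and takes the maximum outside the net, so that a loss can later be attached to the full output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self, n_inputs=8, n_hidden=5, n_classes=2):
        super().__init__()
        self.hidden = nn.Linear(n_inputs, n_hidden)
        self.output = nn.Linear(n_hidden, n_classes)

    def forward(self, inputs):
        x = torch.tanh(self.hidden(inputs))
        # dim=1 normalizes across the classes of each sample
        return F.log_softmax(self.output(x), dim=1)


torch.manual_seed(0)
inputs = torch.randn(3, 8)   # 3 samples with 8 coordinates each
net = Net()

log_probs = net(inputs)      # one row of class log-probabilities per sample
# torch.max along dim=1 returns (largest value, index of that value) for each row
best_log_prob, predicted = torch.max(log_probs, dim=1)
print(log_probs.shape, predicted)
```
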


## Training the Neural Net
So far we have just defined the neural net architecture and seen how to forward propagate; the weights are randomly initialized because none have been specified.

To train we still need:
- Initialized weights
- The desired output for each input
- A loss function to gauge how far we are from those

We need to proceed as follows:
- Do full epochs through the training data
- Calculate the loss and backpropagate to minimize it
- Repeat until the loss is satisfactory
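
The procedure above might look like the following loop in a current PyTorch release. It is a sketch under stated assumptions: the class labels, the learning rate, the epoch limit, and the `acceptable_loss` threshold are all hypothetical, the net is rebuilt with `nn.Sequential` to keep the block self-contained, and `nn.NLLLoss` is used because it pairs with log-softmax outputs (the cell below uses `nn.MSELoss` instead).

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
inputs = torch.randn(3, 8)                   # 3 training samples with 8 coordinates each
desired_outputs = torch.tensor([0, 1, 0])    # hypothetical class label for each sample

# Same shape as the net above: 8 -> 5 -> 2, tanh in between, log-softmax at the end
net = nn.Sequential(nn.Linear(8, 5), nn.Tanh(), nn.Linear(5, 2), nn.LogSoftmax(dim=1))
loss_function = nn.NLLLoss()
optimizer = optim.SGD(net.parameters(), lr=0.1)   # assumed learning rate
acceptable_loss = 0.05                            # assumed stopping threshold

loss_history = []
for epoch in range(100):
    optimizer.zero_grad()                             # clear gradients from the previous epoch
    log_probs = net(inputs)                           # forward propagate
    loss = loss_function(log_probs, desired_outputs)  # measure distance from desired outputs
    loss.backward()                                   # backpropagate
    optimizer.step()                                  # update the weights
    loss_history.append(loss.item())
    if loss.item() < acceptable_loss:                 # stop once the loss is satisfactory
        break

print('epochs run:', len(loss_history), 'final loss:', loss_history[-1])
```
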

In [136]:
# In one epoch:
#  The net forward propagates on the input
#  Calculates loss using the loss function on outputs obtained and desired_output
#  Optimizes the loss function using the optimizer
#    That is, finds gradient of loss wrt. each weight (Recall the Variable Class that wraps around a Tensor)
#    (SGD is an example seen in theory)
#  Updates the weights based on those gradients

def feed_forward_one_time(net, inputs, desired_outputs):
    # Forward prop
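    # Output from net is a tuple of:
    #  1. the log-softmax value of the most probable class for each sample
    #  2. the index of that class, i.e. the predicted label
    net_result = net(inputs)
    # Note: rebuilding a Variable from .data starts a new graph, so gradients of this loss
    # cannot flow back to the net's weights
    output = Variable(net_result[1].data.float(), requires_grad=True)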
    
    # Calculate loss
    loss_function = nn.MSELoss()
    loss = loss_function(output, desired_outputs)
    print('Loss is now valued at: ', loss)

net = Net()
feed_forward_one_time(net, inputs, Variable(torch.arange(1, 4)))

# Note: Both output and desired output need to be Variables holding the same type of tensor
# Cast a tensor with that_tensor_name.float(), .double(), .long(), etc.

('Loss is now valued at: ', Variable containing:
 1.6667
[torch.FloatTensor of size 1]
)
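
The casting note can be illustrated with a small standalone example. The sketch below assumes a current PyTorch release, where `torch.arange(1, 4)` yields an integer tensor and therefore has to be cast before it is compared with float predictions. The prediction values are hypothetical, chosen so that the mean squared error works out to 5/3.

```python
import torch
import torch.nn as nn

output = torch.tensor([1.0, 1.0, 1.0])   # hypothetical float32 predictions
desired = torch.arange(1, 4)             # values 1, 2, 3 stored as int64

print(output.dtype, desired.dtype)       # the two dtypes differ

loss_function = nn.MSELoss()
# Cast the integer targets to float so both arguments hold the same tensor type
loss = loss_function(output, desired.float())
# Squared errors are 0, 1 and 4, so the mean is 5/3
print(loss.item())
```
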


In [150]:
# A loss built from the net's outputs is a Variable whose grad_fn chain reaches back through every layer
# This is what allows backpropagation wrt. every parameter
# Each parameter is in the net.parameters() generator
# (each Linear layer contributes two entries: its weight matrix, then its bias vector)
layer_count = 0
for x in net.parameters():
    print('layer #', layer_count, 'Parameters')
    print(x)
    layer_count += 1
    print(x.data)
        
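# Clear all gradient buffers for params to get fresh gradients
net.zero_grad()
# Backprop and get gradient for each and every param
# Note: `loss` only exists inside feed_forward_one_time(), which does not return it,
# so this line raises the NameError shown at the end of the output
loss.backward()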
    
    

layer # 0 Parameters
Parameter containing:
-0.3197 -0.3518  0.0701  0.0383  0.0509 -0.0275  0.2857  0.1449
-0.1431  0.2345  0.2526 -0.2436  0.1383 -0.0528  0.1218  0.0756
-0.1567 -0.2732  0.3497  0.2894  0.1846 -0.1207 -0.2182  0.2501
 0.2786  0.0375  0.2465 -0.0741 -0.1640  0.0972  0.1692  0.3066
 0.2091  0.0810 -0.2323  0.2969 -0.0808 -0.1355 -0.1512  0.3311
[torch.FloatTensor of size 5x8]


-0.3197 -0.3518  0.0701  0.0383  0.0509 -0.0275  0.2857  0.1449
-0.1431  0.2345  0.2526 -0.2436  0.1383 -0.0528  0.1218  0.0756
-0.1567 -0.2732  0.3497  0.2894  0.1846 -0.1207 -0.2182  0.2501
 0.2786  0.0375  0.2465 -0.0741 -0.1640  0.0972  0.1692  0.3066
 0.2091  0.0810 -0.2323  0.2969 -0.0808 -0.1355 -0.1512  0.3311
[torch.FloatTensor of size 5x8]

layer # 1 Parameters
Parameter containing:
 0.0632
 0.2503
 0.2324
-0.1708
 0.2452
[torch.FloatTensor of size 5]


 0.0632
 0.2503
 0.2324
-0.1708
 0.2452
[torch.FloatTensor of size 5]

layer # 2 Parameters
Parameter containing:
-0.1124  0.3807 -0.00

NameError: name 'loss' is not defined
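
The `NameError` arises because `loss` is local to `feed_forward_one_time()`. One possible fix, sketched here for a current PyTorch release, is to return the loss from the helper and call `backward()` on the returned value. The helper name `forward_and_loss`, the labels, and the use of `nn.NLLLoss` on the full log-softmax output are assumptions made for this illustration; they keep the loss connected to the net's graph so that every parameter receives a gradient.

```python
import torch
import torch.nn as nn


def forward_and_loss(net, inputs, desired_outputs):
    """Run one forward pass and return the loss so the caller can backpropagate."""
    log_probs = net(inputs)
    return nn.NLLLoss()(log_probs, desired_outputs)


torch.manual_seed(0)
# Same shape as the net above: 8 -> 5 -> 2, tanh in between, log-softmax at the end
net = nn.Sequential(nn.Linear(8, 5), nn.Tanh(), nn.Linear(5, 2), nn.LogSoftmax(dim=1))
inputs = torch.randn(3, 8)
desired_outputs = torch.tensor([1, 0, 1])   # hypothetical class labels

loss = forward_and_loss(net, inputs, desired_outputs)

net.zero_grad()    # clear all gradient buffers
loss.backward()    # fill .grad on every parameter

# Each Linear layer shows up twice: once for its weight, once for its bias
for name, param in net.named_parameters():
    print(name, tuple(param.shape), tuple(param.grad.shape))
```
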