
# 3. Know Your Network

Since this is the third notebook in the series, you should already be comfortable with **tensors** and with data represented as **PyTorch tensors**. In this notebook we will learn about **deep neural networks** in PyTorch.

## What will we be doing?

---


Any deep learning pipeline has four components:


1.   Prepare the data
2.   **Build the model** (our concern in this notebook)
3.   Train the model
4.   Analyse the model's results


---

## What do we mean when we say network (or neural network)?



By network we simply mean **a function** that maps inputs to outputs.

Examples:

1. Classification

   input image ----**f(x)**---> What does the image contain? (cats, dogs or elephants)

2. Regression

   input data -------**f(x)**----> What is the price of the house?


Here **f(x)** is our network, which is supposed to approximate a function that does the required job (predicting house prices, deciding whether a tumour is malignant or benign, identifying the object in an image, and so on).
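To make the idea of "a function that maps inputs to outputs" concrete, here is a minimal sketch of a hand-written f(x) for the regression case. The feature layout, the weights and the bias are all made-up values chosen for illustration; a real network would learn them rather than have them typed in.

```python
import torch

def f(x, w, b):
    """A hypothetical one-step 'network': a weighted sum of the inputs plus an offset."""
    return x @ w + b

# Made-up house features: [area in hundreds of sq. ft., number of bedrooms]
x = torch.tensor([[10.0, 2.0],
                  [20.0, 3.0]])
w = torch.tensor([3.0, 5.0])   # made-up weights, one per feature
b = torch.tensor(1.0)          # made-up bias

prices = f(x, w, b)
print(prices)  # tensor([41., 76.])
```

Each row of x is one house, and each entry of the result is the predicted price for that house.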


![alt text](https://www.pyimagesearch.com/wp-content/uploads/2016/08/simple_neural_network_header-768x377.jpg)

But in real life it is difficult to hand-build a function that takes an image as a tensor and tells whether it shows a cat, a dog or a horse. Such a function depends on so many variables that writing one by hand with good accuracy is nearly impossible.

---

## How does deep learning help us with this problem?



This is where **deep learning** comes in. It is a highly iterative process in which your job is only to decide what this function should look like, without worrying about the exact values that would yield the required result.


Researchers have been working hard to find a suitable form for this function, mostly by intelligent **trial and error**. There is always a possibility that a better **network architecture** (the form of the function) will emerge from the enormous amount of time being spent on research.

Here are some examples of **architectures** built to improve prediction accuracy.

![alt text](http://kim.hfg-karlsruhe.de/wp-content/uploads/2017/12/neural-network-chart-768x506.png)

Once you are comfortable with the basics of designing a network and getting good results, you should start reading research papers (search for them online) and bring their tweaks into your network to improve prediction accuracy further.

The tweaks you will implement to improve your network's accuracy were discovered by researchers in a highly iterative manner: trial and error, though not entirely, since experience gives them an approximate idea of what is likely to work better.

Example: the accuracy of a plain **Deep Neural Network** (DNN) on images is not that great, so someone asked why the image should be treated as a whole (as if all locations in the image were equally important). Preserving the spatial information instead gave birth to **Convolutional Neural Networks** (CNNs).



---

## Components of each network


In this notebook we will take the network apart and look at each component closely.

A network architecture is just a collection of **layers**. Each layer transforms the incoming tensor in some way using **weights**, **biases**, **activations** and other operations such as **pooling**, **regularization** and **normalization**, and yields an output tensor that is passed to the next layer.

A tensor entering a layer and being transformed into another tensor is called a **forward pass** of that tensor through that layer. Each forward pass yields some transformation (except for a dummy layer, which leaves the tensor unchanged).

We can think of our network as one **big layer**, in which the input data tensor undergoes one **big transformation** (the composition of the transformations of all the sub-layers). The forward pass through the complete network is called **forward propagation**.

Our input tensor flows through these layers, undergoing various transformations, and comes out as an output tensor.
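The flow of a tensor through a stack of layers can be sketched as below. The layer sizes are arbitrary choices for illustration, and nn.Linear and nn.ReLU are used only as stand-ins for "some layer"; the point is simply to watch the shape change after each forward pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(4, 8)                  # a batch of 4 samples with 8 features each

# An arbitrary stack of layers, chosen only to illustrate the flow
layers = [nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3)]

shapes = [tuple(x.shape)]
for layer in layers:
    x = layer(x)                      # forward pass through one layer
    shapes.append(tuple(x.shape))     # record the shape of the transformed tensor

print(shapes)  # [(4, 8), (4, 16), (4, 16), (4, 3)]
```

Running all three forward passes one after another is the forward propagation of this small stack.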

![alt text](http://www.wildml.com/wp-content/uploads/2015/11/Screen-Shot-2015-11-07-at-7.26.20-AM-1024x279.png)


Here we can see the boat image passing through a series of transformations that yield the probability of it being a dog, cat, boat or bird.



---

The process of turning a network from a **look-alike** function into an **accurate** function is called **learning**. You will hear people say that their network is learning.

**Look-Alike**----------------------------------------->**Accurate** (something must be changing in the network to account for this change)

The only things that change in a network while *the network is learning* are the **weights and biases**.

We have seen that weights and biases are responsible for transforming a tensor during the forward pass. So if the weights and biases change, the forward pass will yield a different transformation than before. The purpose of *learning* is to find appropriate values for these *weights and biases*, so that after learning the forward pass yields more appropriate transformations.

Let's see what weights and biases are:

**Weights and biases** are attributes of each layer and are stored inside the layer in PyTorch. They are also called **learnable parameters** because they are the only things that change while the network is learning.
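As a rough illustration that learning only touches the weights and biases, the sketch below takes a single gradient step on one linear layer and compares the weights before and after. Training itself is outside the scope of this notebook; the random data, the squared-error loss and the learning rate are assumptions made purely for the demonstration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
x = torch.rand(8, 3)          # made-up inputs
target = torch.rand(8, 1)     # made-up targets
x_before = x.clone()

weight_before = layer.weight.detach().clone()

# One step of plain gradient descent on a squared-error loss
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
loss = ((layer(x) - target) ** 2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

weight_after = layer.weight.detach().clone()
weights_changed = not torch.equal(weight_before, weight_after)
data_changed = not torch.equal(x_before, x)
print("weights changed:", weights_changed)
print("input data changed:", data_changed)
```

The weights are expected to differ after the step, while the input data stays exactly as it was.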


## How do we build neural networks in PyTorch?

We are talking about building our own custom network as a combination of the *already implemented layers* available in PyTorch's torch.nn module.

**We use a deep learning library for exactly this purpose: all the standard transformations (forward passes through standard types of layers) come predefined.**

Let's see how PyTorch implements its layers. We will use the same pattern later to implement our **big layer**/neural network.

Example: the Linear layer (edited PyTorch source code)

```
class Linear(Module):
    r"""Applies a linear transformation to the incoming data: :math:`y = xA^T + b`

    Examples::

        >>> m = nn.Linear(20, 30)
        >>> input = torch.randn(128, 20)
        >>> output = m(input)
        >>> print(output.size())
        torch.Size([128, 30])
    """
    __constants__ = ['in_features', 'out_features']
    in_features: int
    out_features: int
    weight: Tensor

    def __init__(self, in_features: int, out_features: int, bias: bool = True) -> None:
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def forward(self, input: Tensor) -> Tensor:
        return F.linear(input, self.weight, self.bias)

```
Let's understand the PyTorch mechanics.

Every layer essentially does these three things:

1. 

```
class Linear(Module)
``` 
It extends torch.nn.Module so that it can use functionality such as forward() and parameters().



2. 

```
self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
```
Here the layer wraps its attributes in the torch.nn.Parameter class; in other words, it **registers** them as learnable parameters of the torch.nn.Module. Parameters are Tensor subclasses with a special property when used with a Module: when they are assigned as Module attributes (here, the layer's attributes), they are automatically added to the module's list of parameters (our custom network inherits this behaviour from Module) and will appear, for example, in the parameters() iterator. Assigning a plain Tensor has no such effect. Also note that the **dimension of weight is [out_features, in_features]**.

3. 
```
    def forward(self, input: Tensor) -> Tensor:
        return F.linear(input, self.weight, self.bias)
```
We can see that the weights and biases are attributes of the Linear class (torch.nn.Linear), which inherits from torch.nn.Module and therefore has to override the **forward function**. It does so by calling a function from the torch.nn.functional module and returning its result.
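The registration behaviour described in point 2 can be checked with a small sketch. The Demo class and its attribute names are made up for this illustration; only nn.Module, nn.Parameter and named_parameters() come from PyTorch.

```python
import torch
import torch.nn as nn

class Demo(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.registered = nn.Parameter(torch.zeros(3))  # wrapped: becomes a learnable parameter
        self.plain = torch.zeros(3)                     # plain tensor: not tracked as a parameter

demo = Demo()
names = [name for name, _ in demo.named_parameters()]
print(names)  # ['registered']
```

Only the attribute wrapped in nn.Parameter shows up; the plain tensor is an ordinary attribute that the module does not track as learnable.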

Let's see what a forward pass through a linear layer actually does.

```
def linear(input, weight, bias=None):
    # type: (Tensor, Tensor, Optional[Tensor]) -> Tensor
    """
        Applies a linear transformation to the incoming data: :math:`y = xA^T + b`.
    """
    tens_ops = (input, weight)
    if not torch.jit.is_scripting():
        if any([type(t) is not Tensor for t in tens_ops]) and has_torch_function(tens_ops):
            return handle_torch_function(linear, tens_ops, input, weight, bias=bias)
    if input.dim() == 2 and bias is not None:
        # fused op is marginally faster
        ret = torch.addmm(bias, input, weight.t())
    else:
        output = input.matmul(weight.t())
        if bias is not None:
            output += bias
        ret = output
    return ret
```

The above code can be understood through the following diagram.

![alt text](https://rgl.s3.eu-central-1.amazonaws.com/media/uploads/wjakob/2017/12/05/neuralnetwork_single_neuron.png)

input, weight, bias: passed as arguments (all are tensors).

**output = input.matmul(weight.t()) + bias** ----> this output, which is also a tensor, is what gets returned.
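A small worked example may help here. The numbers below are hand-picked (not taken from any real layer) so that the arithmetic can be followed on paper, and the result is compared against torch.nn.functional.linear.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[1.0, 2.0]])             # shape [1, 2]: one sample, in_features = 2
weight = torch.tensor([[1.0, 0.0],
                       [0.0, 1.0],
                       [1.0, 1.0]])        # shape [3, 2]: [out_features, in_features]
bias = torch.tensor([0.0, 1.0, 2.0])       # shape [3]: one bias per output feature

manual = x.matmul(weight.t()) + bias       # [1, 2] x [2, 3] -> [1, 3], then add the bias
library = F.linear(x, weight, bias)

print(manual)                              # tensor([[1., 3., 5.]])
print(torch.allclose(manual, library))     # True
```

Each output entry is the dot product of the input row with one row of weight, plus the matching bias: 1 + 0, 2 + 1 and 3 + 2.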



---


In [None]:
import torch
import torch.nn as nn

## Let's make our own **big layer** (Neural Network)



Now we will build our custom network, i.e. the **big layer** (a combination of layers from torch.nn such as Linear, Conv2d, AvgPool2d etc.).

We have seen that a layer (in this case our network/**big layer**) must do three things:
1. **Extend torch.nn.Module** - nn.Module has methods such as **forward()**, which defines the forward pass, and **parameters()**, which returns an iterator over the learnable parameters.

2. **Wrap its learnable parameters in the Parameter class** - The learnable parameters of the **big layer** are just the collection of learnable parameters of the small layers that make up this **big layer**/network. All of these *learnable parameters* are therefore already wrapped in the torch.nn.Parameter class, so we don't have to do this ourselves as long as we use the layers already implemented in torch.nn.

3. **Override the forward method** - The forward pass of each layer is already overridden within that layer, but the *forward pass of the big layer* (**forward propagation**) is not defined yet. So we must define how the tensor flows through our network (sequentially, or jumping and skipping layers, and so on).


Let's go ahead and implement it in PyTorch:




In [None]:
class network(nn.Module): # 1. Extending the nn.Module class
  def __init__(self):     # The constructor: sets up the attributes (here, the layers) of each new instance
    super().__init__()    # Initialise nn.Module first so that it can register the layers assigned below
    self.layer1 = nn.Linear(in_features = 5, out_features = 10)  # 2. Linear layer 1 (weights and biases are registered automatically by nn.Module)
    self.layer2 = nn.Linear(in_features = 10, out_features = 2)  # 2. Linear layer 2 (weights and biases are registered automatically by nn.Module)
                                                                 # nn.Module registers every nn.Parameter instance assigned to it,
                                                                 # and each Linear layer wraps its weights and biases in nn.Parameter

  def forward(self, x): # 3. Overriding forward to define how the tensor flows through our network
    x = self.layer1(x)  # Transformation through layer 1
    x = self.layer2(x)  # Transformation through layer 2
    return x

In [None]:
model = network() # Making an instance of our network; different data may flow through different instances
print(model)      # nn.Module overrides __repr__(), which determines what print() shows

network(
  (layer1): Linear(in_features=5, out_features=10, bias=True)
  (layer2): Linear(in_features=10, out_features=2, bias=True)
)
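The forward method above sends the tensor through the layers sequentially. Point 3 also mentioned "jumping and skipping layers"; a minimal sketch of that idea follows. The class name SkipNetwork and its layer sizes are hypothetical and not part of this notebook's network; the first layer keeps the feature size at 5 so that the input can be added back.

```python
import torch
import torch.nn as nn

class SkipNetwork(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(in_features=5, out_features=5)
        self.layer2 = nn.Linear(in_features=5, out_features=2)

    def forward(self, x):
        hidden = self.layer1(x)      # transformation through layer 1
        hidden = hidden + x          # skip connection: the original input jumps over layer 1
        return self.layer2(hidden)   # transformation through layer 2

skip_model = SkipNetwork()
skip_out = skip_model(torch.rand(100, 5))
print(skip_out.shape)  # torch.Size([100, 2])
```

The layers are the same kind as before; only forward changes, which is why defining the flow is left to us.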


In [None]:
input = torch.rand(100,5)                            # Generating a random tensor to see the transformation that our network yields
output = model.forward(input)                        # Passing a tensor of shape (100,5) through our network for forward propagation
print("The shape of input tensor is ",input.shape)   
print("The shape of output tensor is ",output.shape) # We expect it to be of size [100,2]

The shape of input tensor is  torch.Size([100, 5])
The shape of output tensor is  torch.Size([100, 2])


In [None]:
input2 = torch.rand(500,5)
output2 = model(input2) # Notice how we are calling the forward function of our network.
                        # This works because nn.Module defines __call__(), the Python method
                        # that makes instances of a class callable.
                        # In this case, when the instance is called, the forward function is invoked.
print("The shape of input tensor is ",input2.shape) 
print("The shape of output tensor is ",output2.shape) 

The shape of input tensor is  torch.Size([500, 5])
The shape of output tensor is  torch.Size([500, 2])
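Calling the instance and calling forward directly give the same output here, but they are not identical in general: \_\_call\_\_ also runs any hooks registered on the module. The sketch below, which uses a standalone nn.Linear and a throwaway list named calls (both chosen just for this illustration), shows the difference with register_forward_hook.

```python
import torch
import torch.nn as nn

layer = nn.Linear(5, 2)
calls = []

# A forward hook is run by nn.Module.__call__ after forward() has finished.
handle = layer.register_forward_hook(
    lambda module, inputs, output: calls.append("hook ran")
)

x = torch.rand(3, 5)
layer(x)            # goes through __call__, so the hook fires
layer.forward(x)    # bypasses __call__, so the hook does not fire
handle.remove()     # clean up the hook

print(len(calls))   # 1
```

This is one reason the usual advice is to call the model instance rather than its forward method.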


Let's go ahead and inspect the weights and biases, which we expect to find in each layer.

In [None]:
print("Weight for layer l is of dimension [out_feature,in_feature]")
print("Layer 1 weights:- ",model.layer1.weight.shape)
print("Layer 2 weights:- ",model.layer2.weight.shape)

Weight for layer l is of dimension [out_feature,in_feature]
Layer 1 weights:-  torch.Size([10, 5])
Layer 2 weights:-  torch.Size([2, 10])


In [None]:
# We have seen that weights are registered as parameters of nn.Module; let's see how to access them without going through the individual layers
param_iter = model.parameters()  # A method of nn.Module that returns an iterator over its parameters
first_param = next(param_iter)   # Parameters are special tensors that nn.Module keeps track of
print(first_param)               # The Parameter class extends the Tensor class (so every parameter of the network is a tensor, but a special one)
                                 # The Parameter class also overrides __repr__(), which is why printing the weight does not give the plain tensor representation
                                 # Note the extra line "Parameter containing:", which shows that weights are special tensors.

Parameter containing:
tensor([[ 0.0804, -0.1029,  0.2460,  0.0537, -0.1454],
        [ 0.0417,  0.2886, -0.3899,  0.0274,  0.0948],
        [ 0.0091,  0.1687, -0.3315,  0.1854, -0.4071],
        [ 0.3136, -0.0224,  0.2963, -0.3472, -0.1303],
        [ 0.1397,  0.2461, -0.0257, -0.3231, -0.0745],
        [-0.3723, -0.3247, -0.2792,  0.3368,  0.3552],
        [-0.2218, -0.1265,  0.1949, -0.4301, -0.0656],
        [-0.4380, -0.4365, -0.3661, -0.3785, -0.4132],
        [ 0.1279,  0.3702,  0.3009,  0.1315,  0.0290],
        [ 0.3986, -0.0950,  0.3000,  0.4077,  0.1804]], requires_grad=True)
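Besides parameters(), nn.Module also offers named_parameters(), which pairs each parameter with its name. The sketch below rebuilds the same layer sizes as our network with nn.Sequential (the variable name model_sketch is made up for this example), lists every parameter's shape and counts the learnable values.

```python
import torch.nn as nn

# Same layer sizes as the network class above, built with nn.Sequential for brevity
model_sketch = nn.Sequential(nn.Linear(5, 10), nn.Linear(10, 2))

shapes = {name: tuple(p.shape) for name, p in model_sketch.named_parameters()}
total = sum(p.numel() for p in model_sketch.parameters())

print(shapes)
# {'0.weight': (10, 5), '0.bias': (10,), '1.weight': (2, 10), '1.bias': (2,)}
print(total)  # 82
```

The total is 5*10 + 10 for the first layer plus 10*2 + 2 for the second. With the network class defined earlier, the names would start with layer1 and layer2 instead of 0 and 1.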


In [None]:
# Let's do something interesting: forward propagation in multiple ways.
input2 = torch.rand(100,5)

# Way 1 - Let the model handle the transformation
outputModel = model(input2)

# Way 2 - Extract the required tensors from the model and do the mathematical operations ourselves
outputL1 = input2.matmul(model.layer1.weight.t()) + model.layer1.bias.t() # Broadcasting
outputL2 = outputL1.matmul(model.layer2.weight.t()) + model.layer2.bias.t()
print(outputL2.eq(outputModel).sum() == outputModel.numel()) 

# Way 3 - Use the iterator, which provides the necessary parameters (special tensors = weights and biases)
# parameters() yields them in registration order: layer1.weight, layer1.bias, layer2.weight, layer2.bias
# Try to work out what the iterator yields at each step
for i,params in enumerate(model.parameters()):
  if i%2 == 0:
    input2 = input2.matmul(params.t())
  else: 
    input2 = input2.add(params.t())
print(input2.eq(outputModel).sum() == outputModel.numel()) 

tensor(True)
tensor(True)
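The checks above compare floating-point tensors for exact equality, which happened to hold in this run but can be fragile when two code paths round differently. A more tolerant comparison, sketched here on a single standalone layer with an assumed tolerance of 1e-6, uses torch.allclose.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(5, 10)
x = torch.rand(100, 5)

from_layer = layer(x)                                  # let the layer do the work
by_hand = x.matmul(layer.weight.t()) + layer.bias      # same maths, written out

# allclose tolerates tiny rounding differences between the two code paths
match = torch.allclose(from_layer, by_hand, atol=1e-6)
print(match)
```

torch.allclose returns a plain Python bool, so it can be used directly in an if statement or an assert.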
