# Build PyTorch CNN

__ML Pipeline__: Prepare data -> __build model__ -> train model -> analyze model's results

To build neural networks in PyTorch, we extend PyTorch's `torch.nn.Module` class. This means we need to use a little bit of object-oriented programming (OOP) in Python.

## OOP Review

When we’re writing programs or building software, there are two key components: code and data. With object-oriented programming, we orient our program design and structure around objects.

Objects are defined in code using classes. A class defines the object's specification or spec, which specifies what data and code each object of the class should have.

When we create an object of a class, we call the object an instance of the class, and all instances of a given class have two core components:
- Methods (code)
- Attributes (data)

The methods represent the code and the attributes represent the data, and both are defined by the class.

In a given program, many objects, a.k.a. instances of a given class, can exist simultaneously, and all of the instances have the same attributes and the same methods available. They are uniform from this perspective.

The difference between objects of the same class lies in the values each object holds for its attributes. Each object has its own attribute values, and these values determine the internal state of the object. The code and data of each object are said to be encapsulated within the object.

```python
class Lizard:
    def __init__(self, name):
        self.name = name

    def set_name(self, name):
        self.name = name
```

```python
lizard = Lizard('Deer')
print(lizard.name)
```

```
Deer
```


```python
lizard.set_name('DL')
print(lizard.name)
```

```
DL
```
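Creating a second instance shows the point about per-object state: both lizards share the same methods and attributes, but each keeps its own `name`. The class is repeated here so the snippet runs on its own; the variable names `deer` and `dl` are just for illustration.

```python
class Lizard:
    def __init__(self, name):
        self.name = name

    def set_name(self, name):
        self.name = name

# two instances of the same class: same methods and attributes, separate state
deer = Lizard('Deer')
dl = Lizard('DL')

dl.set_name('Gecko')       # changes only dl's internal state
print(deer.name, dl.name)  # Deer Gecko
```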


## `torch.nn`

As we know, deep neural networks are built using multiple layers. This is what makes the network deep. Each layer in a neural network has two primary components:

* A transformation (code)
* A collection of weights (data)

Within the `nn` package, there is a class called `Module`, and it is the __base class__ for all neural network modules, including layers.

This means that all of the layers in PyTorch extend the `nn.Module` class and inherit all of PyTorch’s built-in functionality within the `nn.Module` class. 

In OOP this concept is known as __inheritance__.
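As a quick sanity check of this inheritance, the sketch below asks Python whether a few built-in layer classes are subclasses of `nn.Module`, then calls `parameters()`, a method that `nn.Linear` inherits from `nn.Module`. The layer sizes are arbitrary.

```python
import torch.nn as nn

# built-in layer classes all extend nn.Module
for layer_cls in (nn.Conv2d, nn.Linear, nn.ReLU):
    print(layer_cls.__name__, issubclass(layer_cls, nn.Module))

# so they inherit nn.Module's methods, such as parameters()
fc = nn.Linear(in_features=4, out_features=2)
print([tuple(p.shape) for p in fc.parameters()])  # [(2, 4), (2,)]
```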

```python
import torch.nn as nn
```


### `forward()` method

When we pass a tensor to our network as input, the __tensor flows__ forward through each layer transformation until it reaches the output layer. This process of a tensor flowing forward through the network is known as a __forward pass__.

Each layer has its own transformation (code), and the tensor passes forward through each layer. The composition of all the individual layer forward passes defines the overall forward pass transformation for the network. The goal of the overall transformation is to map the input to the correct prediction output class. During training, the layer weights (data) are updated in a way that adjusts this mapping so that the output moves closer to the correct prediction. The gradients needed for these updates are computed efficiently by __backpropagation__.

What this all means is that every PyTorch `nn.Module` has a `forward()` method, so when we build layers and networks, we must provide an implementation of `forward()`. The `forward()` method is the actual transformation.
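To make this concrete, here is a toy module (the name `AddOne` is hypothetical, chosen for illustration) whose `forward()` adds one to its input. Calling the module as `m(x)` rather than `m.forward(x)` is the usual idiom, because `nn.Module.__call__` runs `forward()` along with any registered hooks.

```python
import torch
import torch.nn as nn

class AddOne(nn.Module):  # hypothetical toy module
    def forward(self, t):
        # the transformation this module applies
        return t + 1

m = AddOne()
x = torch.tensor([1.0, 2.0])
y = m(x)   # calling the module runs forward() via nn.Module.__call__
print(y)   # tensor([2., 3.])
```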

### `torch.nn.functional`

When we implement the `forward()` method of our `nn.Module` subclass, we will typically use functions from the `nn.functional` package. This package provides us with many neural network operations that we can use for building layers. In fact, many of the `nn.Module` layer classes use `nn.functional` functions to perform their operations.

The `nn.functional` package contains functions that __subclasses__ of `nn.Module` use to implement their `forward()` methods. One reason this works well is that during backpropagation, the gradient of the loss with respect to the weights must be calculated for the operations involved in the layers. PyTorch's automatic differentiation engine (autograd) records the `nn.functional` operations as they run and differentiates them for us, so we don't have to derive these gradients by hand.
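A small sketch of both ideas: the `nn.ReLU` layer class and the `F.relu` function give the same result, and gradients flow back through functional operations such as `F.linear` without us writing any derivative code. The tensor values and shapes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.tensor([[-1.0, 0.5, 2.0]])

# the ReLU layer class and the functional op produce the same result
relu_layer = nn.ReLU()
print(torch.equal(relu_layer(x), F.relu(x)))  # True

# autograd records the functional ops, so gradients reach the weights
w = torch.randn(2, 3, requires_grad=True)
b = torch.zeros(2, requires_grad=True)
loss = F.relu(F.linear(x, w, b)).sum()
loss.backward()
print(w.grad.shape)  # torch.Size([2, 3])
```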

## Building a Neural Network in PyTorch

We now have enough information to provide an outline for building neural networks in PyTorch. The steps are as follows:

Short version:

- Extend the `nn.Module` base class.
- Define layers as attributes in the constructor.
- Implement the `forward()` method.


More detailed version:

- Create a neural network class that extends the `nn.Module` base class.
- In the class constructor, define the network’s layers as attributes (assigned on `self`) using pre-built layers from `torch.nn`.
- Use the network’s layer attributes as well as operations from the `nn.functional` API to define the network’s forward pass.

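```python
# a trivial neural network with a placeholder layer (no real layers yet)
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()  # run nn.Module's constructor first
        self.layer = nn.Identity()       # placeholder: returns its input unchanged

    def forward(self, t):
        t = self.layer(t)
        return t
```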


Let’s replace this now with some real layers that come pre-built for us from PyTorch's `nn` library. We’re building a CNN, so the two types of layers we'll use are linear layers and convolutional layers.

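```python
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        # linear (fully connected) layers; 12*4*4 is the flattened size of conv2's
        # output, assuming 28x28 inputs and 2x2 max pooling after each conv layer
        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, t):
        # placeholder: the forward pass is not implemented yet
        return t
```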

```python
network = Network()
network
```

```
Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)
```
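Each of these layers carries its own weights (data). As a sketch, we can count the learnable parameters of each layer by summing `numel()` over `parameters()`. The layers are rebuilt standalone here, with the same arguments as the `Network` class, so the snippet runs on its own.

```python
import torch.nn as nn

# same layer definitions as in the Network class
layers = {
    "conv1": nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),
    "conv2": nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5),
    "fc1": nn.Linear(in_features=12*4*4, out_features=120),
    "fc2": nn.Linear(in_features=120, out_features=60),
    "out": nn.Linear(in_features=60, out_features=10),
}

# weights + biases per layer
counts = {name: sum(p.numel() for p in layer.parameters())
          for name, layer in layers.items()}
for name, n in counts.items():
    print(name, n)
print("total", sum(counts.values()))  # total 32998
```

For example, `conv1` has 6 filters of shape 1x5x5 plus 6 biases, giving 6*25 + 6 = 156 parameters.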

We used the abbreviation `fc` in `fc1` and `fc2` because linear layers are also called fully connected layers. We may sometimes hear a third name for them: dense. So linear, dense, and fully connected all refer to the same type of layer. PyTorch uses the word linear, hence the `nn.Linear` class name.
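Why "fully connected"? Every output is a weighted sum of every input feature. The sketch below checks that `nn.Linear` computes `x @ W.T + b` using its own `weight` and `bias` attributes; the sizes and random input are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(in_features=3, out_features=2)
x = torch.randn(4, 3)  # batch of 4 inputs with 3 features each

# every input feature contributes to every output via the weight matrix
manual = x @ fc.weight.T + fc.bias
print(torch.allclose(fc(x), manual))  # True
```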

We used the name `out` for the last linear layer because the last layer in the network is the output layer.

The network above uses three kinds of hyperparameters whose values we specify manually:
* `kernel_size`: the size of each convolutional filter
* `out_channels`: the number of filters in a convolutional layer
* `out_features`: the size of the output tensor, i.e. the number of neurons in a linear (dense) layer

The value `out_features=10` on the final output layer, however, is a data-dependent hyperparameter: it is fixed by the nature of the problem (the number of prediction classes).

### CNN Layer Parameters

### Parameters vs Arguments

Parameters are used in function definitions as placeholders, while arguments are the actual values passed to the function. The parameters can be thought of as local variables that live inside a function.

In our network's case, the names are the parameters and the values that we have specified are the arguments. For example, in `nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)`, `in_channels` is a parameter and `1` is the argument passed for it.
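The same distinction in plain Python: `inspect.signature(...).bind(...)` shows which argument gets bound to which parameter. The function `make_layer` is hypothetical and exists only for this illustration.

```python
import inspect

def make_layer(in_features, out_features):  # hypothetical helper
    # in_features and out_features are parameters: local placeholders
    return (in_features, out_features)

# 60 and 10 are arguments: the actual values bound to those parameters
layer = make_layer(in_features=60, out_features=10)
bound = inspect.signature(make_layer).bind(60, 10)

print(list(inspect.signature(make_layer).parameters))  # ['in_features', 'out_features']
print(dict(bound.arguments))  # {'in_features': 60, 'out_features': 10}
```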

### Two Types of Parameters

To better understand the argument values for these parameters, let's consider two categories or types of parameters that we used when constructing our layers.

- Hyperparameters
- Data dependent hyperparameters

A lot of terms in deep learning are used loosely, and the word parameter is one of them. Try not to let it throw you off. The main thing to remember about any type of parameter is that it is a placeholder that will eventually hold a value.

The goal of these particular categories is to help us remember how each parameter's value is decided.

When we construct a layer, we pass values for each parameter to the layer’s constructor. Our convolutional layers have three parameters, and our linear layers have two parameters.

- Convolutional layers
    - `in_channels`
    - `out_channels`: sets the number of filters. One filter produces one output channel.
    - `kernel_size`: sets the filter size. The words kernel and filter are interchangeable.

- Linear layers
    - `in_features`
    - `out_features`: sets the size of the output tensor.
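These parameters also determine the shapes of each layer's learnable weights. A sketch using the `conv1` and `fc1` argument values from our network:

```python
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
fc1 = nn.Linear(in_features=192, out_features=120)

# conv weight: (out_channels, in_channels, kernel_height, kernel_width)
print(conv1.weight.shape)  # torch.Size([6, 1, 5, 5])
# linear weight: (out_features, in_features)
print(fc1.weight.shape)    # torch.Size([120, 192])
# one bias per output channel / output feature
print(conv1.bias.shape, fc1.bias.shape)  # torch.Size([6]) torch.Size([120])
```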

#### Hyperparameters
In general, hyperparameters are parameters whose values are chosen manually rather than derived from the data or learned during training.

As neural network programmers, we choose hyperparameter values mainly based on trial and error and increasingly by utilizing values that have proven to work well in the past. For building our CNN layers, these are the parameters we choose manually.

- `kernel_size`
- `out_channels`
- `out_features`

This means we simply choose the values for these parameters. In neural network programming, this is pretty common, and we usually test and tune these parameters to find values that work best.

One pattern that shows up quite often is that we increase `out_channels` as we add more conv layers, and after we switch to linear layers, we shrink `out_features` as we filter down to the number of output classes.

#### Data Dependent Hyperparameters
Data-dependent hyperparameters are parameters whose values depend on the data. The first two data-dependent hyperparameters that stick out are the `in_channels` of the first convolutional layer and the `out_features` of the output layer.



| Layer 	| Param name   	| Param value 	| The param value is                                      	|
|-------	|--------------	|-------------	|---------------------------------------------------------	|
| conv1 	| in_channels  	| 1           	| the number of color channels in the input image.        	|
| conv1 	| kernel_size  	| 5           	| a hyperparameter.                                       	|
| conv1 	| out_channels 	| 6           	| a hyperparameter.                                       	|
| conv2 	| in_channels  	| 6           	| the number of out_channels in previous layer.           	|
| conv2 	| kernel_size  	| 5           	| a hyperparameter.                                       	|
| conv2 	| out_channels 	| 12          	| a hyperparameter (higher than previous conv layer).     	|
| fc1   	| in_features  	| `12*4*4`    	| the length of the flattened output from previous layer. 	|
| fc1   	| out_features 	| 120         	| a hyperparameter.                                       	|
| fc2   	| in_features  	| 120         	| the number of out_features of previous layer.           	|
| fc2   	| out_features 	| 60          	| a hyperparameter (lower than previous linear layer).    	|
| out   	| in_features  	| 60          	| the number of out_features of previous layer.           	|
| out   	| out_features 	| 10          	| the number of prediction classes.                       	|
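The `12*4*4` value for `fc1` comes from tracking the spatial size of the image through the layers. Below is a sketch of that arithmetic, assuming 28x28 single-channel inputs and a 2x2 max-pooling step (stride 2) after each convolution. Neither the input size nor the pooling appears in the `Network` class above, so both are assumptions here; `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    # standard output-size formula for convolution / pooling along one dimension
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                       # assumed 28x28 input
size = conv_out(size, kernel_size=5)            # conv1: 28 -> 24
size = conv_out(size, kernel_size=2, stride=2)  # assumed 2x2 max pool: 24 -> 12
size = conv_out(size, kernel_size=5)            # conv2: 12 -> 8
size = conv_out(size, kernel_size=2, stride=2)  # assumed 2x2 max pool: 8 -> 4

out_channels = 12                               # conv2 out_channels
print(size, out_channels * size * size)         # 4 192
```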