# 3.7 Neural Networks

A neural network is a collection of connected layers of nodes that loosely models the neurons in the brain.

### How does a Neural Network work?

- The input layer receives the input data.
- The hidden layers transform the input through weighted sums and activation functions.
- The output layer produces the prediction.
- The loss function measures how far the prediction is from the target.
- Backpropagation computes the gradient of the loss with respect to each weight and bias.
- Gradient descent uses these gradients to update the weights and biases so that the loss decreases.
- Hyperparameters, such as the learning rate and the number of layers, are tuned to improve the model.

## High-Level Model

$ \text{Input} \rightarrow \text{Hidden Layer} \rightarrow \text{Output} $

$ \text{Input} \rightarrow \Sigma f(x) \rightarrow \text{Output} $


## Neuron (Perceptron)

A neural network is built from neurons, its basic building blocks.
Each neuron is a single node in the network.
The output of the $k$-th neuron is defined as:

$y_k = \varphi(\sum_{i=1}^n w_{ki} x_i + b_k)$

where:
- $y_k$ is the output of the $k$-th neuron.
- $x_i$ is the $i$-th input to the neuron.
- $w_{ki}$ is the weight applied to the $i$-th input of the $k$-th neuron.
- $b_k$ is the bias of the $k$-th neuron.
- $n$ is the number of inputs.
- $\varphi$ is the activation (or transfer) function.
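
The neuron formula can be sketched in a few lines of plain Python. This is a minimal illustration, not part of the demo below: the function name `neuron_output` and the input, weight, and bias values are assumptions chosen only to make the arithmetic easy to follow.

```python
def neuron_output(inputs, weights, bias, activation):
    """Compute y = phi(sum_i w_i * x_i + b) for a single neuron."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

# Illustrative values (assumptions, not taken from the text)
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.25]
b = 0.1

# The identity activation returns the weighted sum itself: 0.5 - 0.5 + 0.75 + 0.1
y = neuron_output(x, w, b, activation=lambda u: u)
print(y)
```

Passing a different `activation` callable changes only the final step; the weighted sum is computed the same way.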


### Activation Function

The activation function transforms a neuron's weighted input sum into its output.
For a single neuron:

$u = \sum_{i=1}^n w_i x_i + b$

$y = \varphi(u)$

where:
- $u$ is the weighted sum of the inputs.
- $y$ is the output of the neuron.
- $x_i$ is the $i$-th input.
- $w_i$ is the weight applied to the $i$-th input.
- $b$ is the bias of the neuron.

#### Step Function

The step function is a simple activation function that returns 1 if the input is greater than 0 and 0 otherwise.

$y = \begin{cases} 1 & \text{if } u > 0 \\ 0 & \text{otherwise} \end{cases}$
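
A minimal sketch of the step function in plain Python; the name `step` and the sample inputs are illustrative assumptions.

```python
def step(u):
    """Return 1 when the weighted sum u is strictly positive, else 0."""
    return 1 if u > 0 else 0

# Negative and zero inputs map to 0; positive inputs map to 1
print([step(u) for u in (-2.0, 0.0, 0.5)])  # [0, 0, 1]
```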

#### ReLU Function

ReLU stands for Rectified Linear Unit.
It is one of the most popular activation functions.
ReLU is piecewise linear: it passes positive inputs through unchanged and sets negative inputs to 0, which makes the function as a whole nonlinear.

$y = \begin{cases} u & \text{if } u > 0 \\ 0 & \text{otherwise} \end{cases}$
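
A minimal sketch of ReLU in plain Python (the name `relu` and the sample values are illustrative assumptions). The second print shows that ReLU is not additive, so the kink at zero is enough to make it a nonlinear function overall.

```python
def relu(u):
    """Return u for positive inputs and 0 otherwise."""
    return u if u > 0 else 0.0

print([relu(u) for u in (-2.0, 0.0, 3.0)])  # [0.0, 0.0, 3.0]

# A linear map would satisfy f(a + b) == f(a) + f(b); ReLU does not
a, b = -1.0, 2.0
print(relu(a + b), relu(a) + relu(b))  # 1.0 2.0
```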

#### Sigmoid Function

The sigmoid function is a smooth activation function that maps any real input to a value strictly between 0 and 1.

$y = \frac{1}{1 + e^{-u}}$
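
A minimal sketch of the sigmoid in plain Python. The branch on the sign of `u` is an implementation choice assumed here to avoid overflow in `math.exp` for large negative inputs; both branches evaluate the same formula.

```python
import math

def sigmoid(u):
    """Evaluate 1 / (1 + e^{-u}) without overflowing for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    # For negative u, rewrite as e^{u} / (1 + e^{u}) so exp never sees a large positive argument
    e = math.exp(u)
    return e / (1.0 + e)

print(sigmoid(0.0))  # 0.5
print(sigmoid(-5.0), sigmoid(5.0))
```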

#### Softmax Function

The softmax function is a smooth activation function that maps a vector of $n$ scores to a probability distribution: every output is positive and the outputs sum to 1.

$y_k = \frac{e^{u_k}}{\sum_{i=1}^n e^{u_i}}$

It is often used in the final output layer, especially with classification problems.
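
A minimal sketch of softmax in plain Python. Subtracting the maximum score before exponentiating is an assumed numerical-stability trick; it does not change the result because the common factor cancels in the ratio. The scores are made up for illustration.

```python
import math

def softmax(scores):
    """Map a list of real scores to probabilities that sum to 1."""
    m = max(scores)                             # shift so the largest exponent is 0
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # illustrative scores
print(probs)
print(sum(probs))
```

The largest score receives the largest probability, which is why the index of the maximum output is typically taken as the predicted class.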


## Demo

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

### Define Neural Network

In [2]:
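class Net(nn.Module):

    def __init__(self):
        super().__init__()

        # Two convolutional layers: 1 input channel -> 6 -> 16 channels, 5x5 kernels
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)

        # Three fully connected layers; 16 * 5 * 5 is the flattened conv output
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Convolution -> ReLU -> 2x2 max pooling, applied twice
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # Flatten every dimension except the batch dimension
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # No activation on the final layer: it returns raw scores
        x = self.fc3(x)
        return x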
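
The `16 * 5 * 5` input size of `fc1` follows from the layer shapes. Below is a minimal sketch of that arithmetic, assuming a 32×32 single-channel input (the size of the random tensor used later in this demo), unpadded stride-1 convolutions, and 2×2 max pooling with the default stride. The helpers `conv_out` and `pool_out` are illustrative names, not PyTorch functions.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    """Spatial output size of pooling whose stride equals its kernel size."""
    return size // kernel

size = 32                                # assumed input height and width
size = pool_out(conv_out(size, 5), 2)    # conv1: 32 -> 28, pool: 28 -> 14
size = pool_out(conv_out(size, 5), 2)    # conv2: 14 -> 10, pool: 10 -> 5

flat_features = 16 * size * size         # 16 channels from conv2
print(size, flat_features)               # 5 400
```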

In [3]:
net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)


In [4]:
params = list(net.parameters())
print(len(params))
print(params[0].size())

10
torch.Size([6, 1, 5, 5])
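
The 10 entries returned by `net.parameters()` are the weight and bias tensors of the five layers. As a rough check, the sketch below writes those shapes out by hand from the layer definitions (with `nn.Linear` weights stored as `(out_features, in_features)`) and counts the individual scalar parameters. The dictionary is an illustration, not something read from the model.

```python
import math

# Parameter shapes written out by hand from the Net definition (an assumption, not queried from PyTorch)
shapes = {
    "conv1.weight": (6, 1, 5, 5),   "conv1.bias": (6,),
    "conv2.weight": (16, 6, 5, 5),  "conv2.bias": (16,),
    "fc1.weight":   (120, 400),     "fc1.bias":   (120,),
    "fc2.weight":   (84, 120),      "fc2.bias":   (84,),
    "fc3.weight":   (10, 84),       "fc3.bias":   (10,),
}

num_tensors = len(shapes)
total_params = sum(math.prod(shape) for shape in shapes.values())
print(num_tensors, total_params)  # 10 61706
```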


In [5]:
input = torch.randn(1, 1, 32, 32)  # one random 32x32 single-channel image (batch of 1)
out = net(input)
print(out)

tensor([[-0.1158,  0.0415,  0.0989,  0.1751, -0.0513,  0.0245,  0.1228, -0.0626,
          0.1342, -0.0330]], grad_fn=<AddmmBackward0>)


In [6]:
net.zero_grad()  # clear any previously accumulated gradients
out.backward(torch.randn(1, 10))  # backpropagate a random upstream gradient

### Loss Function

In [7]:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(1.0853, grad_fn=<MseLossBackward0>)
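
With its default `reduction='mean'`, `nn.MSELoss` averages the squared differences between output and target. A minimal sketch of the same computation in plain Python, with made-up values chosen for illustration:

```python
def mse(output, target):
    """Mean of the squared element-wise differences."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)

# Illustrative values (assumptions): errors are 0, 2 and -2
example_output = [1.0, 2.0, 3.0]
example_target = [1.0, 0.0, 5.0]

example_loss = mse(example_output, example_target)
print(example_loss)  # (0 + 4 + 4) / 3
```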


In [8]:
print(loss.grad_fn)
print(loss.grad_fn.next_functions[0][0])
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])

<MseLossBackward0 object at 0x000001FDED70A680>
<AddmmBackward0 object at 0x000001FDED709E70>
<AccumulateGrad object at 0x000001FDED70A680>


### Backpropagation

In [9]:
net.zero_grad()  # clear accumulated gradients (recent PyTorch versions reset them to None instead of zeros)

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 2.3788e-02, -2.7721e-03,  8.8365e-03,  5.4275e-03, -1.7853e-06,
        -2.6267e-02])
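
What `loss.backward()` computes can be illustrated on a single linear neuron with a squared-error loss. The sketch below applies the chain rule by hand and compares the result with a finite-difference estimate. The function names and numeric values are assumptions made for this example, and it does not use PyTorch's autograd.

```python
def loss_fn(w, b, x, t):
    """Squared error of a one-input linear neuron y = w * x + b."""
    y = w * x + b
    return (y - t) ** 2

def gradients(w, b, x, t):
    """Chain rule: dL/dw = dL/dy * dy/dw and dL/db = dL/dy * dy/db."""
    y = w * x + b
    dL_dy = 2.0 * (y - t)
    return dL_dy * x, dL_dy

w, b, x, t = 0.5, 0.1, 2.0, 3.0   # illustrative values
dw, db = gradients(w, b, x, t)

# Central finite difference as an independent check of dL/dw
eps = 1e-6
dw_numeric = (loss_fn(w + eps, b, x, t) - loss_fn(w - eps, b, x, t)) / (2 * eps)

print(dw, db, dw_numeric)
```

Backpropagation applies the same chain rule layer by layer, reusing the upstream gradient at each step instead of differentiating the whole network at once.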


### Update weights

In [10]:
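learning_rate = 0.01
# Plain gradient descent: p <- p - learning_rate * grad
with torch.no_grad():  # keep the update itself out of the autograd graph
    for p in net.parameters():
        p.sub_(p.grad * learning_rate)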
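
The update rule `weight = weight - learning_rate * gradient` can be seen in isolation on a one-parameter problem. The sketch below minimizes the assumed toy loss $(w - 3)^2$, whose minimum is at $w = 3$. The loss, starting point, learning rate, and step count are all illustrative choices.

```python
def grad(w):
    """Derivative of the toy loss (w - 3)^2 with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0               # illustrative starting point
learning_rate = 0.1   # illustrative step size

for _ in range(100):
    w -= learning_rate * grad(w)   # step against the gradient

print(w)  # approaches 3.0
```

Each step shrinks the distance to the minimum by a constant factor here; a learning rate that is too large would instead overshoot and diverge.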

### Optimize

In [11]:
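import torch.optim as optim

# Create the optimizer once, outside the training loop
optimizer = optim.SGD(net.parameters(), lr=0.01)

# One training step
optimizer.zero_grad()              # clear gradients from the previous step
output = net(input)                # forward pass
loss = criterion(output, target)   # compute the loss
loss.backward()                    # backpropagate to fill the .grad fields
optimizer.step()                   # update the parameters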