<a href="https://colab.research.google.com/github/jiyewise/ML-with-PyTorch-Tutorials/blob/main/Introduction_to_torch_nn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

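### Typical Training Procedure for a Neural Network

A typical training procedure for a neural network:
* Define a neural network that has learnable parameters (weights).
* Feed the dataset in and propagate each input through the network.
   
   The input is passed through the layers in sequence, and the last layer produces the final output.
* Compute the loss.
* Backpropagate the gradients.
* Update the weights.
   
   new weight = weight - lr * gradient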
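
The steps above can be sketched as a short loop. This is a minimal illustration, not the tutorial's own training code: the tiny `nn.Linear` model, the random data, the `train_step` helper, and the learning rate `lr = 0.1` are all assumptions chosen to keep the example small.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the run reproducible

# Hypothetical stand-in model and data, for illustration only.
model = nn.Linear(4, 2)          # a network with learnable weights
loss_fn = nn.MSELoss()
lr = 0.1                         # assumed learning rate

x = torch.randn(8, 4)            # a mini-batch of 8 inputs
target = torch.randn(8, 2)       # dummy targets


def train_step():
    output = model(x)                  # propagate the input through the network
    loss = loss_fn(output, target)     # compute the loss
    model.zero_grad()                  # clear gradients from the previous step
    loss.backward()                    # backpropagate the gradients
    with torch.no_grad():              # update outside of autograd tracking
        for p in model.parameters():
            p -= lr * p.grad           # new weight = weight - lr * gradient
    return loss.item()


losses = [train_step() for _ in range(20)]
print(losses[0], losses[-1])           # the loss should shrink over the steps
```

In practice the manual update is usually delegated to an optimizer such as `torch.optim.SGD`, which applies the same rule.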



In [2]:
# Example 1: CNN

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
      super(Net, self).__init__()
      # conv1: 1 input channel (grayscale), 6 output channels, 3x3 convolution kernel
      self.conv1 = nn.Conv2d(1, 6, 3)
      # conv2: 6 input channels, 16 output channels, 3x3 convolution kernel
      self.conv2 = nn.Conv2d(6, 16, 3)
      # affine operations: y = Wx + b
      # the conv output is flattened before fc1; 6*6 is the feature-map size after the second pooling
      self.fc1 = nn.Linear(16*6*6, 120)
      self.fc2 = nn.Linear(120, 84)
      self.fc3 = nn.Linear(84, 10)
  
    def forward(self, x):
      # max pooling on 2*2 size window
      print((self.conv1(x)).size())
      x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
      print(x.size())
      print((self.conv2(x)).size())
      # for a square n x n window, a single number n is enough
      x = F.max_pool2d(F.relu(self.conv2(x)), 2)
      print(x.size())
      x = x.view(-1, self.num_flat_features(x)) # flatten the vector
      x = F.relu(self.fc1(x))
      x = F.relu(self.fc2(x))
      x = self.fc3(x)
      return x

    def num_flat_features(self, x):
      size = x.size()[1:] # all dimensions except the first (batch) one, e.g. [16, 6, 6] for torch.Size([1, 16, 6, 6])
      num_features = 1
      for s in size:
          num_features *= s
      return num_features

net = Net()
print(net)
input = torch.randn(1, 1, 32, 32) # input image size: 32 * 32
net(input) # call the module itself rather than net.forward(), so that registered hooks also run

Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
torch.Size([1, 6, 30, 30])
torch.Size([1, 6, 15, 15])
torch.Size([1, 16, 13, 13])
torch.Size([1, 16, 6, 6])


tensor([[-0.1049, -0.0720,  0.0236, -0.0371,  0.0264, -0.0132, -0.0282, -0.0623,
          0.0011,  0.0080]], grad_fn=<AddmmBackward>)

### Explanation of the Example CNN

##### **Image Size**
1. input = torch.Size([1, 1, 32, 32]): a batch of 1, 1 color channel, 32x32 image

2. After conv1 (6 output channels, 3x3 kernel), the image is 30x30, since 32 - 3 + 1 = 30

3. After max_pool2d with a 2x2 window, the image is 15x15, since 30 / 2 = 15

4. Likewise, after conv2 (3x3 kernel), the image is 13x13, since 15 - 3 + 1 = 13

5. Likewise, after max_pool2d with a 2x2 window, the image is 6x6, since floor(13 / 2) = 6
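
The size arithmetic in the list above can be checked with two small helpers. The function names `conv_out` and `pool_out` are hypothetical and are not part of PyTorch. They assume stride 1 and no padding for the convolution, and a pooling stride equal to the window size, which matches the defaults used in the example network.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window):
    """Output width/height of max pooling whose stride equals the window size."""
    return (size - window) // window + 1


size = 32
size = conv_out(size, 3)   # conv1: 32 - 3 + 1 = 30
size = pool_out(size, 2)   # pool:  30 / 2 = 15
size = conv_out(size, 3)   # conv2: 15 - 3 + 1 = 13
size = pool_out(size, 2)   # pool:  floor(13 / 2) = 6

print(size)                # 6
print(16 * size * size)    # 576, the in_features of fc1
```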

##### **Flattening the vector**
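After max_pool2d, the 2D feature maps have to be flattened into a 1D vector before they can pass through the fully connected layers. The num_flat_features function computes the length of that vector.

For a given tensor, e.g. [1, 16, 6, 6], the first entry (1) is the batch size, so it is excluded, and the remaining dimensions are flattened into a vector of length 16x6x6 = 576.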
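
As a small sketch of the same flattening step, `torch.flatten(x, 1)` keeps the batch dimension and collapses everything after it, so it gives the same result as `view` with a manually computed feature count. The random tensor below is only a stand-in for the output of the second pooling layer.

```python
import torch

x = torch.randn(1, 16, 6, 6)          # stand-in for the pooled feature maps

# Manual version: multiply all dimensions except the batch dimension.
num_features = 1
for s in x.size()[1:]:
    num_features *= s
flat_view = x.view(-1, num_features)

# Built-in version: flatten every dimension from index 1 onward.
flat = torch.flatten(x, 1)

print(num_features)                   # 576
print(flat.shape)                     # torch.Size([1, 576])
print(torch.equal(flat_view, flat))   # True
```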

##### **Mini-Batches in `torch.nn`**
`torch.nn` is built around mini-batches: inputs to `torch.nn` layers are expected to be mini-batches of samples rather than single samples. In the code above, for example, `nn.Conv2d` takes a 4D tensor of shape nSamples x nChannels x Height x Width as input.
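
When only a single sample is available, one common way to get the expected 4D shape is to add a batch dimension of size 1 with `unsqueeze(0)`. The sketch below assumes a single 1-channel 32x32 image, mirroring the example network's input.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)

single = torch.randn(1, 32, 32)   # one sample: nChannels x Height x Width
batch = single.unsqueeze(0)       # add a batch dimension at position 0

out = conv(batch)
print(batch.shape)                # torch.Size([1, 1, 32, 32])
print(out.shape)                  # torch.Size([1, 6, 30, 30])
```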

### Learning
##### **Parameters**
`net.parameters()` returns the learnable parameters of the model.
##### **Backpropagation**
Only the `forward` function has to be defined; the `backward` function, which computes the gradients, is defined automatically by `autograd`.

In [4]:
# parameters of the neural network
params = list(net.parameters())
# print(params)
print(len(params))
for p in params:
  print(p.size())

# 10
# torch.Size([6, 1, 3, 3])
# torch.Size([6])
# torch.Size([16, 6, 3, 3])
# torch.Size([16])
# torch.Size([120, 576])
# torch.Size([120])
# torch.Size([84, 120])
# torch.Size([84])
# torch.Size([10, 84])
# torch.Size([10])

# backpropagation with .backward
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

net.zero_grad() # zero the gradient buffers of all parameters
out.backward(torch.randn(1, 10)) # backprop with random gradients

10
torch.Size([6, 1, 3, 3])
torch.Size([6])
torch.Size([16, 6, 3, 3])
torch.Size([16])
torch.Size([120, 576])
torch.Size([120])
torch.Size([84, 120])
torch.Size([84])
torch.Size([10, 84])
torch.Size([10])
torch.Size([1, 6, 30, 30])
torch.Size([1, 6, 15, 15])
torch.Size([1, 16, 13, 13])
torch.Size([1, 16, 6, 6])
tensor([[-0.1063, -0.0624,  0.0514, -0.0618,  0.0289,  0.0026, -0.0120, -0.0517,
         -0.0046,  0.0198]], grad_fn=<AddmmBackward>)
tensor([[-0.9720, -0.6361,  0.4538,  0.9858,  0.5082,  0.8630,  0.8153,  1.0377,
          0.8112, -0.1919]])
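
The reason for calling `zero_grad()` before `backward()` is that gradients accumulate in each parameter's `.grad` attribute. A minimal sketch of that behavior, using a small `nn.Linear` layer as a hypothetical stand-in for the full network:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)            # stand-in for the network (illustration only)
x = torch.randn(1, 3)

# First backward pass: .grad is populated.
layer(x).sum().backward()
first = layer.weight.grad.clone()

# Second backward pass without zero_grad(): gradients are added on top.
layer(x).sum().backward()
accumulated = layer.weight.grad.clone()

# After zero_grad(), a backward pass yields a fresh gradient again.
layer.zero_grad()
layer(x).sum().backward()
fresh = layer.weight.grad.clone()

print(torch.allclose(accumulated, 2 * first))  # True
print(torch.allclose(fresh, first))            # True
```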


### Loss 
A loss function takes the (output, target) pair as input and computes a value that estimates how far the output is from the target.

Following `loss` in the backward direction, the graph of computations is as follows:
```
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss
```

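Therefore, when we call `loss.backward()`, the whole graph is differentiated with respect to the network's parameters, and every tensor in the graph that has `requires_grad=True` accumulates the gradient in its `.grad` attribute.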
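
A minimal sketch of computing a loss and backpropagating it is shown below. To stay self-contained it uses a small `nn.Sequential` model and random tensors as assumptions, rather than the `Net` class and data from the example above. `nn.MSELoss` matches the loss named in the graph.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical small network standing in for the CNN.
net = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 10))

output = net(torch.randn(1, 8))
target = torch.randn(1, 10)        # dummy target with the same shape as output

criterion = nn.MSELoss()
loss = criterion(output, target)   # scalar tensor attached to the graph
print(loss.grad_fn)                # the graph node that produced the loss

net.zero_grad()                    # clear any existing gradients
grad_before = net[0].bias.grad     # None: no backward pass has run yet
loss.backward()                    # backpropagate through the whole graph
grad_after = net[0].bias.grad      # now holds d(loss)/d(bias)

print(grad_before)
print(grad_after.shape)            # torch.Size([4])
```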