<a href="https://colab.research.google.com/github/suyeon-9706/MNIST/blob/master/Update_MNIST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MNIST
- Distributed on Yann LeCun's website
- A simple computer vision dataset
- It consists of images of handwritten digits.
- Each image is 28 * 28 * 1: 28 x 28 pixels, a single grayscale channel, showing one digit from 0 to 9
- Each image is labeled with the digit it shows.






In [71]:
#@title Install pytorch
!pip install torch
!pip install torchvision



In [0]:
import torch
import torch.nn as nn # Building blocks for neural networks (layers, Module)
import torch.nn.functional as F # Stateless functions such as relu, max_pool2d, log_softmax

import numpy

# torch.autograd: the package at the core of neural network training; it provides automatic differentiation for all Tensor operations
# Note: since PyTorch 0.4, Variable is deprecated; Tensors track gradients directly and Variable(x) simply returns a Tensor
from torch.autograd import Variable

# torchvision: datasets and image transforms for computer vision tasks
import torchvision.datasets as dsets # Ready-made datasets such as CIFAR10, MNIST, etc.
import torchvision.transforms as transf # Image transforms, e.g. converting a PIL image into a torch tensor

In [0]:
#@title Hyperparameter
batch_size = 100
num_epochs = 20
num_classes = 10 # discrete range [0,9]
learning_rate = 1e-3


*batch_size*: number of samples processed in one iteration

*num_epochs*: number of times the entire training set is passed through the model

*num_classes*: number of output classes (MNIST labels are the digits 0 to 9)

*learning_rate*: step size used by the optimizer when updating the parameters



In [0]:
#@title Loading MNIST data
mnist_train = dsets.MNIST(root='data/',
                          train=True, # train set
                          transform=transf.ToTensor(), # image to Tensor 
                          download=True) # If MNIST image does not exist in root, download data
mnist_test = dsets.MNIST(root='data/',
                         train=False, # val(test set)
                         transform=transf.ToTensor(), # image to Tensor
                         download=True) # If MNIST image does not exist in root, download data

In [0]:
#@title Create batch operator to enter data in batch units
# Mini-batch gradient descent: the model parameters are updated once after each batch of batch_size samples
train_data = torch.utils.data.DataLoader(dataset=mnist_train,
                                         batch_size=batch_size, 
                                         shuffle=True) # shuffle data
test_data = torch.utils.data.DataLoader(dataset=mnist_test, 
                                        batch_size=batch_size, 
                                        shuffle=False) # don't shuffle data

In [76]:
#@title Define model(★Update★)
print("Define model...")

class Net(nn.Module):
  # Instantiate all layers here
  def __init__(self, num_classes):
    super(Net, self).__init__() # Initialize the nn.Module base class before registering layers

    # input = 28*28*1
    # a 5x5 kernel with padding=2 keeps the spatial size ("same" padding)
    self.conv1 = nn.Conv2d(1, 32, 5, padding=2) # 1 input channel, 32 output channels
    # after 2x2 max pooling the feature map is 14*14
    # padding=2 again keeps the spatial size
    self.conv2 = nn.Conv2d(32, 64, 5, padding=2) # 32 input channels, 64 output channels
    # after 2x2 max pooling the feature map is 7*7
    self.fc1 = nn.Linear(7*7*64, 1024)
    self.fc2 = nn.Linear(1024, num_classes) # one output per class (was hard-coded to 10)

  # Forward propagation: maps a batch of input images to log-probabilities
  def forward(self, x):
    # Conv -> ReLU -> MaxPool
    out = F.max_pool2d(F.relu(self.conv1(x)), 2)
    out = F.max_pool2d(F.relu(self.conv2(out)), 2)

    out = out.view(-1, 7*7*64) # Flatten the feature maps for the fully connected layers

    # Fully connected
    out = self.fc1(out)
    out = F.relu(out)
    out = self.fc2(out)
    return F.log_softmax(out, dim=1) # log-probabilities over the classes

Define model...
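
The model returns `F.log_softmax` outputs, which the training code feeds to `F.nll_loss`. As a rough illustration of what that pair computes for a single sample, here is a plain-Python sketch. The helper names `log_softmax` and `nll` are made up for this example and are not the PyTorch functions; note also that `F.nll_loss` averages over the batch by default, which this one-sample sketch does not show.

```python
import math

def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]

def nll(log_probs, target):
    # Negative log-likelihood of the correct class for one sample
    return -log_probs[target]

logits = [0.0] * 10              # a model that scores all 10 digits equally
log_probs = log_softmax(logits)
loss = nll(log_probs, target=3)
print(loss)                      # equals log(10), about 2.3026
```

A loss near log(10) therefore corresponds to guessing uniformly over the ten digits, which puts the much smaller per-epoch losses in the training log into perspective.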


*ReLU function*: returns 0 for negative inputs and the input itself otherwise, i.e. max(0, x)
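
A minimal plain-Python sketch of that definition, applied element by element. The function name `relu` here is a stand-in written for illustration, not `torch.nn.functional.relu` itself.

```python
def relu(x):
    # Negative inputs become 0, everything else passes through unchanged
    return max(0.0, x)

values = [-2.0, -0.5, 0.0, 1.5, 3.0]
activated = [relu(v) for v in values]
print(activated)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```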


---

How to build a convolution layer

Apply these three steps in order: Conv -> ReLU -> MaxPool
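
To see why the first fully connected layer expects 7*7*64 inputs, the spatial size can be traced through two Conv -> ReLU -> MaxPool stages. The sketch below assumes the standard output-size formula for a convolution, floor((size + 2*padding - kernel) / stride) + 1, and non-overlapping 2x2 pooling; `conv_out` and `pool_out` are hypothetical helper names.

```python
def conv_out(size, kernel, padding=0, stride=1):
    # Output width of a convolution along one spatial dimension
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window):
    # Non-overlapping max pooling divides the width by the window size
    return size // window

size = 28                                   # MNIST input width
size = conv_out(size, kernel=5, padding=2)  # conv1 keeps 28 ("same" padding)
size = pool_out(size, 2)                    # 14
size = conv_out(size, kernel=5, padding=2)  # conv2 keeps 14
size = pool_out(size, 2)                    # 7

flat_features = size * size * 64            # 64 channels after conv2
print(size, flat_features)                  # 7 3136
```

ReLU does not appear in the trace because it is applied element-wise and leaves the shape unchanged.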

In [0]:
#@title Define Train

def train(net):
  print('\nStart training...')
  optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
  device = next(net.parameters()).device # run on whichever device the model lives on (CPU or GPU)

  for epoch in range(num_epochs):
    avg_loss = 0

    for i, (images, labels) in enumerate(train_data):
      images = images.to(device)
      labels = labels.to(device)

      # reset the gradients accumulated in the previous step
      optimizer.zero_grad()
      # forward propagation
      outputs = net(images)
      # calculate loss (negative log-likelihood on the log-softmax outputs)
      loss = F.nll_loss(outputs, labels)
      # backpropagation (compute gradients)
      loss.backward()
      # weight update (apply gradients)
      optimizer.step()

      avg_loss += loss.item() / len(train_data)

    print('Epoch [%d/%d], Loss: %f' %(epoch+1, num_epochs, avg_loss))

  print('Finish training')

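*epoch*: one forward pass and one backward pass over all the training examples

*step*: Ideally all 60,000 MNIST training images would be used in a single update, but for memory and speed reasons the data is split into batches. With the current batch size of 100, one epoch consists of 600 batches, i.e. 600 steps.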
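
The arithmetic behind those numbers, as a small sketch. The 60,000 figure is the MNIST training-set size mentioned above, and the other values are the hyperparameters defined earlier. Rounding up reflects a loader that keeps a final partial batch, which does not occur here because 60,000 divides evenly by 100.

```python
import math

num_train_images = 60000   # MNIST training-set size
batch_size = 100
num_epochs = 20

steps_per_epoch = math.ceil(num_train_images / batch_size)
total_steps = steps_per_epoch * num_epochs   # parameter updates over the whole run
print(steps_per_epoch, total_steps)          # 600 12000
```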

In [0]:
#@title Define Test(Evaluating accuracy of the model)

def test():
  correct = 0
  total = 0
  device = next(net.parameters()).device # evaluate on the model's device (CPU or GPU)
  with torch.no_grad(): # gradients are not needed for evaluation
    for images, labels in test_data:
      images = images.to(device)
      labels = labels.to(device)

      output = net(images)
      _, predicted = torch.max(output, 1) # index of the largest log-probability = predicted class
      correct += (predicted == labels).sum().item()
      total += labels.size(0)
  print('\nAccuracy of the model: %f %%' %(100*correct/total))
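
The accuracy computation reduces to "take the arg-max class per sample and count matches". A plain-Python sketch of that logic follows; the output rows and labels are made-up numbers for illustration, and `argmax` is a hypothetical helper standing in for the index returned by `torch.max(output, 1)`.

```python
def argmax(scores):
    # Index of the largest score, i.e. the predicted class
    return max(range(len(scores)), key=lambda i: scores[i])

# Made-up log-probabilities for 3 samples over 4 classes
outputs = [
    [-0.1, -3.0, -4.0, -5.0],
    [-2.5, -0.2, -3.0, -4.0],
    [-1.9, -2.0, -0.4, -3.0],
]
labels = [0, 1, 3]

predicted = [argmax(row) for row in outputs]
correct = sum(p == y for p, y in zip(predicted, labels))
accuracy = 100 * correct / len(labels)
print(predicted, accuracy)  # two of three predictions match the labels
```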

*torch.optim*: a package implementing various optimization algorithms.

*torch.optim.Adam(params, lr=...)*: an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

- *params (iterable)*: iterable of parameters to optimize, or dicts defining parameter groups

- *lr (float, optional)*: learning rate (default: 1e-3, i.e. 1∗10^−3 = 0.001)

- Advantages: (1) straightforward to implement, (2) computationally efficient, (3) low memory requirements, (4) well suited for problems that are large in terms of data and/or parameters
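
To make "adaptive estimates of lower-order moments" more concrete, here is a scalar sketch of one Adam update in plain Python. It follows the standard update rule with the usual defaults (beta1=0.9, beta2=0.999, eps=1e-8). The function `adam_step` and the toy objective f(x) = x^2 are assumptions made for illustration, and this is not how `torch.optim.Adam` is implemented internally.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running estimates of the first moment (mean) and second moment (uncentered variance)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction: both estimates start at 0 and are too small early on
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Step scaled by the gradient's recent magnitude
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

x, m, v = 5.0, 0.0, 0.0
grad = 2 * x                     # derivative of the toy objective f(x) = x^2
x1, m, v = adam_step(x, grad, m, v, t=1)
print(x1)                        # close to 4.999
```

On the first step the bias-corrected ratio is about sign(grad), so the parameter moves by roughly `lr` regardless of how large the gradient is.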

In [0]:
#@title Count trainable parameters
class count_params():
  def count_parameters(self, model):
    print('Count trainable parameters...')
    print('[Each layer of CNN has a weight parameter and a bias parameter.]')

    total_param = 0
    # nn.Module.named_parameters(): returns an iterator over the module's parameters, yielding both the name of each parameter and the parameter itself.
    for name, param in model.named_parameters():
        if param.requires_grad: # Only parameters tracked by autograd are trainable
            num_param = param.numel() # number of elements = product of all dimension sizes
            if param.dim() > 1:
                print(name, ':', 'x'.join(str(x) for x in list(param.size())), '=', num_param)
            else:
                print(name, ':', num_param)
            total_param += num_param
    return total_param

In [80]:
#@title Build the model

net = Net(num_classes)
print(net)

if torch.cuda.is_available():
  net.cuda()

Net(
  (conv1): Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (fc1): Linear(in_features=3136, out_features=1024, bias=True)
  (fc2): Linear(in_features=1024, out_features=10, bias=True)
)


*torch.cuda.is_available() function*: Returns a bool indicating whether CUDA is currently available, i.e. whether a GPU can be used in the current environment.

*cuda()*: Moves the model's parameters and buffers (or a tensor) to GPU memory so that subsequent operations run on the GPU.


In [81]:
#@title Train & Test

# count parameter
p = count_params()
print('number of trainable parameters = ', p.count_parameters(net))

# train
net.train() # switch the model to training mode
train(net)

# test
net.eval() # switch the model to evaluation mode
test()

Count trainable parameters...
[Each layer of CNN has a weight parameter and a bias parameter.]
conv1.weight : 32x1x5x5 = 800
conv1.bias : 32
conv2.weight : 64x32x5x5 = 51200
conv2.bias : 64
fc1.weight : 1024x3136 = 3211264
fc1.bias : 1024
fc2.weight : 10x1024 = 10240
fc2.bias : 10
number of trainable parameters =  3274634

Start training...
Epoch [1/20], Loss: 0.135739
Epoch [2/20], Loss: 0.040027
Epoch [3/20], Loss: 0.026287
Epoch [4/20], Loss: 0.017917
Epoch [5/20], Loss: 0.012644
Epoch [6/20], Loss: 0.011869
Epoch [7/20], Loss: 0.008842
Epoch [8/20], Loss: 0.007742
Epoch [9/20], Loss: 0.007229
Epoch [10/20], Loss: 0.006825
Epoch [11/20], Loss: 0.004909
Epoch [12/20], Loss: 0.004994
Epoch [13/20], Loss: 0.004073
Epoch [14/20], Loss: 0.005742
Epoch [15/20], Loss: 0.003609
Epoch [16/20], Loss: 0.005891
Epoch [17/20], Loss: 0.003650
Epoch [18/20], Loss: 0.002003
Epoch [19/20], Loss: 0.005269
Epoch [20/20], Loss: 0.002056
Finish training

Accuracy of the model: 99.310000 %



# Conv layer's parameters

*Each symbol is defined as follows*

* Wc: Number of weights of the Conv layer

* Bc: Number of biases of the Conv layer

* Pc: Number of parameters of the Conv layer

* K: Size (width) of the square kernels used in the Conv layer

* N: Number of kernels (= number of output channels)

* C: Number of channels of the input to the Conv layer


---



> **Wc =  N  x  C  x  K^2** 

> **Bc = N**

> **Pc = Wc + Bc**


---

- The total number of parameters in the network is the sum of the parameters of all its layers (Conv and FC).

- In a Conv layer, the depth of every kernel always equals the number of channels of its input.

- So every kernel has K^2 x C weights, and there are N such kernels.
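
As a check, the formulas Wc = N x C x K^2, Bc = N and Pc = Wc + Bc can be applied to the two Conv layers of the model above (5x5 kernels, 1 -> 32 and 32 -> 64 channels). The helper name `conv_params` is hypothetical.

```python
def conv_params(n_kernels, in_channels, kernel_size):
    weights = n_kernels * in_channels * kernel_size ** 2  # Wc = N x C x K^2
    biases = n_kernels                                    # Bc = N
    return weights + biases                               # Pc = Wc + Bc

conv1 = conv_params(n_kernels=32, in_channels=1, kernel_size=5)   # 800 + 32
conv2 = conv_params(n_kernels=64, in_channels=32, kernel_size=5)  # 51200 + 64
print(conv1, conv2)  # 832 51264
```

These agree with the `conv1.weight`/`conv1.bias` and `conv2.weight`/`conv2.bias` counts printed by the notebook.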

# FC layer's parameters

*Each symbol is defined as follows*

- Wcf: Number of weights of an FC layer that is connected to a Conv layer
- Bcf: Number of biases of an FC layer that is connected to a Conv layer
- Pcf: Number of parameters of an FC layer that is connected to a Conv layer
- O: Size (width) of the output feature map of the previous Conv layer
- N: Number of kernels in the previous Conv layer
- F: Number of neurons in the FC layer
<br></br>
- Wff: Number of weights of an FC layer that is connected to an FC layer
- Bff: Number of biases of an FC layer that is connected to an FC layer
- Pff: Number of parameters of an FC layer that is connected to an FC layer
- F: Number of neurons in the FC layer
- F_prev: Number of neurons in the previous FC layer


---


**Case1: FC layer connected to a Conv layer**


> **Wcf = F  x  O^2  x  N** 

> **Bcf = F**

> **Pcf = Wcf + Bcf**
<br></br>

**Case2: FC layer connected to a FC Layer**


> **Wff = F  x  F_prev** 

> **Bff = F**

> **Pff = Wff + Bff**
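
Applying both cases to this network (7x7 feature maps with 64 channels feeding 1024 neurons, then 1024 neurons feeding 10) and adding the Conv-layer counts gives the total reported by the notebook. The helper names below are hypothetical, and O is taken after pooling, since that is the size the FC layer actually sees.

```python
def fc_after_conv_params(neurons, out_size, n_kernels):
    # Pcf = F x O^2 x N + F
    return neurons * out_size ** 2 * n_kernels + neurons

def fc_after_fc_params(neurons, prev_neurons):
    # Pff = F x F_prev + F
    return neurons * prev_neurons + neurons

fc1 = fc_after_conv_params(neurons=1024, out_size=7, n_kernels=64)  # 3211264 + 1024
fc2 = fc_after_fc_params(neurons=10, prev_neurons=1024)             # 10240 + 10

# Conv layers, using Pc = N x C x K^2 + N
conv_total = (32 * 1 * 5 ** 2 + 32) + (64 * 32 * 5 ** 2 + 64)

total = conv_total + fc1 + fc2
print(fc1, fc2, total)  # 3212288 10250 3274634
```

Almost all of the parameters sit in `fc1`, which is typical when a wide FC layer follows the last Conv layer directly.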