# Learning rate Schedule


<div align='right'> Hoe Sung Ryu ( 류 회 성 ) </div>
    
    
    
<img src=https://cs231n.github.io/assets/nn3/learningrates.jpeg width=50%>    
    
    
> Author: Hoe Sung Ryu, Minsuk Sung  <p>
> Tel: 010-6636-7275 / skainf23@gamil.com // 010-5134-3621 / mssung94@gmail.com  <p>
> 본 내용은 파이토치를 활용한 딥러닝 과외 자료입니다. 본 내용을 제작자의 동의없이 무단으로 복제하는 행위는 금합니다.
    
refer: https://www.deeplearningwizard.com/deep_learning/boosting_models_pytorch/lr_scheduling/
    

---

Syllabus
    
|Event Type|Date|Topic|
|--:|:---:|:---|
|1 |July 27| Environment setting and Python basic|
|2 |July 28| Pytorch basic and Custom Data load |
|3 |July 29| Traditional Machine Learning(1) |
|4 |July 30| Traditional Machine Learning(2) |
|5 |July 31| CNN(Convolutional Neural Network)(1)  |
|6 |Aug 03| CNN(Convolutional NeuralNetwork)(2) |
|7 |Aug 04|  RNN(Recurrent Neural Networks)(1) |
|8 |Aug 05|  RNN(Recurrent Neural Networks)(2) |
|9 |Aug 06|  Transfer learning(VGG pertained on ImageNEt for CIfar-10)| 
|10|Aug 07|**Mini_Kaggle**: Facial Expression Recognition on `AffectNet` | 
|11|Aug 08|`Awards` and `Closing`| 

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Learning-Rate-Scheduling" data-toc-modified-id="Learning-Rate-Scheduling-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Learning Rate Scheduling</a></span></li><li><span><a href="#Step-wise-Decay" data-toc-modified-id="Step-wise-Decay-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Step-wise Decay</a></span></li><li><span><a href="#Step-wise-Decay:-Every-Epoch" data-toc-modified-id="Step-wise-Decay:-Every-Epoch-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Step-wise Decay: Every Epoch</a></span></li></ul></div>

## Learning Rate Scheduling 

- If we set `learning rate` to be a large value $\rightarrow$ learn too much(rapid leanring)

- If we set `learning rate` to be a small value $\rightarrow$ learn too little(slow learning)

<img src=https://www.deeplearningwizard.com/deep_learning/boosting_models_pytorch/images/lr1.png>

---

## Step-wise Decay

 You would want to decay your `Learning Rate` gradually add new import as blow:
 
```python
from torch.optim.lr_scheduler import StepLR
```

1. At every epoch,
 - $\eta_t = \eta_{t-1}*\gamma$, where $\gamma = 1e-1$
 
 
2. Practical example 
 - Given $\eta_t = 1e-4$ and $\gamma = 1e-1$
 - Epoch 0: $\eta_0= 1e-4$
 - Epoch 1: $\eta_1= 1e-4* (1e-1) = 1e-5 $ 
 - Epoch 2: $\eta_2= 1e-4* (1e-1)^2 = 1e-7 $  


In [None]:
model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=1.)
steps = 10
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, steps)

for epoch in range(5):
    #  update the learning rate for start_epoch timesb
    for idx in range(steps):
        scheduler.step()
        print(scheduler.get_lr())
    
    print('Reset scheduler')
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, steps)

In [1]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets

# Set seed
torch.manual_seed(42)
from torch.optim.lr_scheduler import StepLR

In [2]:
'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

In [3]:
'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

In [5]:
'''
STEP 3: CREATE MODEL CLASS
'''
class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function
        self.fc1 = nn.Linear(input_dim, hidden_dim) 
        # Non-linearity
        self.relu = nn.ReLU()
        # Linear function (readout)
        self.fc2 = nn.Linear(hidden_dim, output_dim)  

    def forward(self, x):
        # Linear function
        out = self.fc1(x)
        # Non-linearity
        out = self.relu(out)
        # Linear function (readout)
        out = self.fc2(out)
        return out

In [6]:
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

In [7]:
'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()

In [19]:
'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 1e-4

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

## Step-wise Decay: Every Epoch

In [20]:
'''
STEP 7: INSTANTIATE STEP LEARNING SCHEDULER CLASS
'''
# step_size: at how many multiples of epoch you decay
# step_size = 1, after every 1 epoch, new_lr = lr*gamma 
# step_size = 2, after every 2 epoch, new_lr = lr*gamma 

# gamma = decaying factor
scheduler = StepLR(optimizer, step_size=1, gamma=1e-1)

In [21]:
'''
STEP 7: TRAIN THE MODEL
'''
iteration = 0

In [22]:
for epoch in range(num_epochs):
    # Decay Learning Rate
    scheduler.step()
    # Print Learning Rate
    print('Epoch:', epoch,'LR:', scheduler.get_lr())
    for i, (images, labels) in enumerate(train_loader):
        # Load images
        images = images.view(-1, 28*28).requires_grad_()

        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()

        # Forward pass to get output/logits
        outputs = model(images)

        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)

        # Getting gradients w.r.t. parameters
        loss.backward()

        # Updating parameters
        optimizer.step()

        iteration += 1

        if iteration % 500 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = images.view(-1, 28*28)

                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)

                # Total number of labels
                total += labels.size(0)

                # Total correct predictions
                correct += (predicted == labels).sum()

            accuracy = 100 * correct / total

            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iteration, loss.item(), accuracy))

Epoch: 0 LR: [1.0000000000000002e-06]
Iteration: 500. Loss: 2.290219306945801. Accuracy: 12
Epoch: 1 LR: [1.0000000000000002e-07]
Iteration: 1000. Loss: 2.2748613357543945. Accuracy: 12
Epoch: 2 LR: [1.0000000000000004e-08]
Iteration: 1500. Loss: 2.3049936294555664. Accuracy: 12
Epoch: 3 LR: [1.0000000000000005e-09]
Iteration: 2000. Loss: 2.2977685928344727. Accuracy: 12
Epoch: 4 LR: [1.0000000000000006e-10]
Iteration: 2500. Loss: 2.2933199405670166. Accuracy: 12
Iteration: 3000. Loss: 2.3006865978240967. Accuracy: 12


<!-- ### Type of  Learning rate schedules 
- Reduce on Loss Plateau Decay 
Reduce on Loss Plateau Decay, Patience=0, Factor=0.1
Reduce learning rate whenever loss plateaus
- Patience: number of epochs with no improvement after which learning rate will be reduced
a -->