In [14]:
import paddle
import paddle.vision.transforms as T
from paddle.static import InputSpec

inputs = [InputSpec([-1, 1, 28, 28], 'float32', 'image')]
labels = [InputSpec([None, 1], 'int64', 'label')]

transform = T.Compose([
    T.Transpose(),
    T.Normalize([127.5], [127.5])
])
train_dataset = paddle.vision.datasets.MNIST(mode='train', transform=transform)
eval_dataset = paddle.vision.datasets.MNIST(mode='test', transform=transform)
lenet = paddle.vision.models.LeNet()
model = paddle.Model(lenet,
                     inputs, labels)

base_lr = 1e-3
boundaries = [5, 8]
wamup_steps = 4

def make_optimizer(parameters=None):
    momentum = 0.9
    weight_decay = 5e-4
    values = [base_lr * (0.1**i) for i in range(len(boundaries) + 1)]
    print('values',values)
    print('boundaries',boundaries)
    learning_rate = paddle.optimizer.lr.PiecewiseDecay(boundaries=boundaries, values=values,verbose=True)
    learning_rate = paddle.optimizer.lr.LinearWarmup(
        learning_rate=learning_rate,
        warmup_steps=wamup_steps,
        start_lr=base_lr / 5.,
        end_lr=base_lr,
        verbose=True)
    optimizer = paddle.optimizer.Momentum(
        learning_rate=learning_rate,
        weight_decay=weight_decay,
        momentum=momentum,
        parameters=parameters)
    return optimizer

optim = make_optimizer(parameters=lenet.parameters())
model.prepare(optimizer=optim,
              loss=paddle.nn.CrossEntropyLoss(),
              metrics=paddle.metric.Accuracy())




values [0.001, 0.0001, 1.0000000000000003e-05]
boundaries [5, 8]
Epoch 0: PiecewiseDecay set learning rate to 0.001.
Epoch 0: LinearWarmup set learning rate to 0.0002.


In [15]:
# If no LRScheduler callback is set, one that updates the learning rate
# at every step is created automatically.
callbacks=[]
model.fit(train_dataset, batch_size=128)

The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/1
Epoch 1: LinearWarmup set learning rate to 0.0004.
Epoch 2: LinearWarmup set learning rate to 0.0006000000000000001.
Epoch 3: LinearWarmup set learning rate to 0.0008.
Epoch 0: PiecewiseDecay set learning rate to 0.001.
Epoch 4: LinearWarmup set learning rate to 0.001.
Epoch 1: PiecewiseDecay set learning rate to 0.001.
Epoch 5: LinearWarmup set learning rate to 0.001.
Epoch 2: PiecewiseDecay set learning rate to 0.001.
Epoch 6: LinearWarmup set learning rate to 0.001.
Epoch 3: PiecewiseDecay set learning rate to 0.001.
Epoch 7: LinearWarmup set learning rate to 0.001.
Epoch 4: PiecewiseDecay set learning rate to 0.001.
Epoch 8: LinearWarmup set learning rate to 0.001.
Epoch 5: PiecewiseDecay set learning rate to 0.0001.
Epoch 9: LinearWarmup set learning rate to 0.0001.
step  10/469 - loss: 2.5466 - acc: 0.0867 - 39ms/step
Epoch 6: PiecewiseDecay se

# Explaining the output above
At first I did not understand this output either. The relevant setup is:
```python
base_lr = 1e-3
boundaries = [5, 8]
wamup_steps = 4

def make_optimizer(parameters=None):
    momentum = 0.9
    weight_decay = 5e-4
    values = [base_lr * (0.1**i) for i in range(len(boundaries) + 1)]
    print('values',values)
    print('boundaries',boundaries)
    learning_rate = paddle.optimizer.lr.PiecewiseDecay(boundaries=boundaries, values=values,verbose=True)
    learning_rate = paddle.optimizer.lr.LinearWarmup(
        learning_rate=learning_rate,
        warmup_steps=wamup_steps,
        start_lr=base_lr / 5.,
        end_lr=base_lr,
        verbose=True)
```
The output is as follows:
```plain

values [0.001, 0.0001, 1.0000000000000003e-05]
boundaries [5, 8]
Epoch 0: PiecewiseDecay set learning rate to 0.001.
Epoch 0: LinearWarmup set learning rate to 0.0002.

The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/1
Epoch 1: LinearWarmup set learning rate to 0.0004.
Epoch 2: LinearWarmup set learning rate to 0.0006000000000000001.
Epoch 3: LinearWarmup set learning rate to 0.0008.
Epoch 0: PiecewiseDecay set learning rate to 0.001.
Epoch 4: LinearWarmup set learning rate to 0.001.
Epoch 1: PiecewiseDecay set learning rate to 0.001.
Epoch 5: LinearWarmup set learning rate to 0.001.
Epoch 2: PiecewiseDecay set learning rate to 0.001.
Epoch 6: LinearWarmup set learning rate to 0.001.
Epoch 3: PiecewiseDecay set learning rate to 0.001.
Epoch 7: LinearWarmup set learning rate to 0.001.
Epoch 4: PiecewiseDecay set learning rate to 0.001.
Epoch 8: LinearWarmup set learning rate to 0.001.
Epoch 5: PiecewiseDecay set learning rate to 0.0001.
Epoch 9: LinearWarmup set learning rate to 0.0001.
step  10/469 - loss: 2.5466 - acc: 0.0867 - 39ms/step
Epoch 6: PiecewiseDecay set learning rate to 0.0001.
Epoch 10: LinearWarmup set learning rate to 0.0001.
Epoch 7: PiecewiseDecay set learning rate to 0.0001.
Epoch 11: LinearWarmup set learning rate to 0.0001.
Epoch 8: PiecewiseDecay set learning rate to 1.0000000000000003e-05.
Epoch 12: LinearWarmup set learning rate to 1.0000000000000003e-05.
```
The result looks odd. Why does it come out this way?
The `get_lr` method of LinearWarmup is as follows:
```python
def get_lr(self):
    if self.last_epoch < self.warmup_steps:
        # Warmup phase: interpolate linearly from start_lr towards end_lr.
        return (self.end_lr - self.start_lr) * float(
            self.last_epoch) / float(self.warmup_steps) + self.start_lr
    else:
        if isinstance(self.learning_rate, LRScheduler):
            # After warmup: read the wrapped scheduler, then advance it.
            lr_value = self.learning_rate()
            self.learning_rate.step()
            return lr_value

        return self.learning_rate
```
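From `get_lr` we can see that while the scheduler's epoch counter (`last_epoch`) is below the number of warmup steps, here 4, the learning rate follows the linear warmup: 2e-4 at epoch 0, then 4e-4, 6e-4, and 8e-4 at epochs 1, 2, and 3. Because the default callback updates the scheduler at every step, each "Epoch" in the log is really one training step. Note that PiecewiseDecay is not stepped during warmup, so it stays at epoch 0 while LinearWarmup advances; the two schedulers keep separate epoch counters.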
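As a quick check of the warmup values, the interpolation in the first branch of `get_lr` can be evaluated by hand. The sketch below is plain Python, not Paddle code; `warmup_lr` is a hypothetical helper name, and the constants are copied from the setup above.

```python
base_lr = 1e-3
start_lr = base_lr / 5.0   # 2e-4, as in the setup above
end_lr = base_lr
warmup_steps = 4


def warmup_lr(last_epoch):
    """Linear interpolation used while last_epoch < warmup_steps (hypothetical helper)."""
    return (end_lr - start_lr) * float(last_epoch) / float(warmup_steps) + start_lr


# Evaluate the warmup branch for every epoch it covers.
warmup_values = [warmup_lr(t) for t in range(warmup_steps)]
for t, lr in enumerate(warmup_values):
    print(f"epoch {t}: lr = {lr:.4g}")
```

Each epoch adds (end_lr - start_lr) / warmup_steps = 2e-4, which matches the warmup lines in the log up to floating-point rounding.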
From epoch 4 onward, execution takes the `else` branch and the learning rate comes from PiecewiseDecay, whose schedule is:
```plain
values [0.001, 0.0001, 1.0000000000000003e-05]
boundaries [5, 8]
```
The rate is 0.001 for epochs below 5, 1e-4 for epochs in \[5, 8), and 1e-5 from epoch 8 onward.
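The boundary rule can be written as a lookup with `bisect`. This is an illustrative sketch rather than Paddle's implementation; `piecewise_lr` is a hypothetical name.

```python
from bisect import bisect_right

base_lr = 1e-3
boundaries = [5, 8]
values = [base_lr * (0.1 ** i) for i in range(len(boundaries) + 1)]


def piecewise_lr(epoch):
    """Return values[k], where k is the number of boundaries <= epoch."""
    return values[bisect_right(boundaries, epoch)]


# Probe both sides of each boundary.
for epoch in (0, 4, 5, 7, 8, 9):
    print(epoch, piecewise_lr(epoch))
```

A boundary epoch belongs to the interval it starts, which is why the log already shows 1e-4 at PiecewiseDecay epoch 5 and 1e-5 at epoch 8.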
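When LinearWarmup reaches epoch 4, PiecewiseDecay is at epoch 0 and its learning rate is 0.001. From then on, PiecewiseDecay's epoch lags LinearWarmup's by 4, which gives the following output: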
```plain
Epoch 0: PiecewiseDecay set learning rate to 0.001.
Epoch 4: LinearWarmup set learning rate to 0.001.

Epoch 1: PiecewiseDecay set learning rate to 0.001.
Epoch 5: LinearWarmup set learning rate to 0.001.

Epoch 2: PiecewiseDecay set learning rate to 0.001.
Epoch 6: LinearWarmup set learning rate to 0.001.

Epoch 3: PiecewiseDecay set learning rate to 0.001.
Epoch 7: LinearWarmup set learning rate to 0.001.

Epoch 4: PiecewiseDecay set learning rate to 0.001.
Epoch 8: LinearWarmup set learning rate to 0.001.

Epoch 5: PiecewiseDecay set learning rate to 0.0001.
Epoch 9: LinearWarmup set learning rate to 0.0001.

Epoch 6: PiecewiseDecay set learning rate to 0.0001.
Epoch 10: LinearWarmup set learning rate to 0.0001.

Epoch 7: PiecewiseDecay set learning rate to 0.0001.
Epoch 11: LinearWarmup set learning rate to 0.0001.

Epoch 8: PiecewiseDecay set learning rate to 1.0000000000000003e-05.
Epoch 12: LinearWarmup set learning rate to 1.0000000000000003e-05.
```
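The pairing in this log can be reproduced with a small model of the combined schedule. The sketch below is a simplification that only reproduces the learning rates, not Paddle's internal bookkeeping: it assumes that after warmup the wrapped PiecewiseDecay is evaluated at the LinearWarmup epoch minus `warmup_steps`, which is the pairing the log shows. `combined_lr` is a hypothetical helper.

```python
from bisect import bisect_right

base_lr = 1e-3
boundaries = [5, 8]
warmup_steps = 4
start_lr, end_lr = base_lr / 5.0, base_lr
values = [base_lr * (0.1 ** i) for i in range(len(boundaries) + 1)]


def combined_lr(step):
    """Learning rate at a given LinearWarmup epoch (hypothetical helper)."""
    if step < warmup_steps:
        # Warmup phase: the wrapped scheduler is not consulted.
        return (end_lr - start_lr) * step / warmup_steps + start_lr
    # After warmup: the wrapped schedule is indexed from its own epoch 0.
    inner_epoch = step - warmup_steps
    return values[bisect_right(boundaries, inner_epoch)]


schedule = [combined_lr(step) for step in range(13)]
for step, lr in enumerate(schedule):
    inner = max(step - warmup_steps, 0)
    print(f"LinearWarmup epoch {step:2d} (PiecewiseDecay epoch {inner}): lr = {lr:.4g}")
```

In effect the first drop to 1e-4 happens at LinearWarmup epoch 9 (5 + 4) and the second at epoch 12 (8 + 4), so the boundaries are shifted by the warmup length.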

In [16]:
optim = make_optimizer(parameters=lenet.parameters())
model.prepare(optimizer=optim,
              loss=paddle.nn.CrossEntropyLoss(),
              metrics=paddle.metric.Accuracy())
# Create an LRScheduler callback that updates the learning rate once per epoch
callback = [paddle.callbacks.LRScheduler(by_step=False, by_epoch=True),
            paddle.callbacks.VisualDL(log_dir='logdir')]
model.fit(train_dataset, eval_data=eval_dataset,batch_size=64, callbacks=callback)

values [0.001, 0.0001, 1.0000000000000003e-05]
boundaries [5, 8]
Epoch 0: PiecewiseDecay set learning rate to 0.001.
Epoch 0: LinearWarmup set learning rate to 0.0002.
The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/1
step  10/938 - loss: 1.9463 - acc: 0.3438 - 25ms/step
step  20/938 - loss: 1.8681 - acc: 0.3430 - 23ms/step
step  30/938 - loss: 1.8197 - acc: 0.3557 - 22ms/step
step  40/938 - loss: 1.7500 - acc: 0.3730 - 21ms/step
step  50/938 - loss: 1.6359 - acc: 0.3922 - 21ms/step
step  60/938 - loss: 1.3407 - acc: 0.4125 - 21ms/step
step  70/938 - loss: 1.5479 - acc: 0.4317 - 21ms/step
step  80/938 - loss: 1.4480 - acc: 0.4465 - 21ms/step
step  90/938 - loss: 1.3629 - acc: 0.4663 - 21ms/step
step 100/938 - loss: 1.2381 - acc: 0.4808 - 21ms/step
step 110/938 - loss: 1.3117 - acc: 0.4950 - 21ms/step
step 120/938 - loss: 1.2857 - acc: 0.5076 - 22ms/step
step 130/938 - loss: 1.1125 - acc: 0.5206 - 22

In [None]:
!pip show paddlepaddle-gpu