Loss plateau detection #7

Closed
dereyly opened this issue Oct 5, 2016 · 11 comments

dereyly commented Oct 5, 2016

Hello!
Sorry if my question is off-topic.
I thought about many interesting things while reading your paper.
One of these things is how to train my own architecture from scratch.
The idea of detecting plateaus is good, and you write that it gives a significant improvement. I analyzed my accuracy graphs and I want to try to create a "plateau detector".
I know that you use an in-house library, but maybe you can give some advice on how to create it in Caffe:

  1. What interval (period, number of iterations) is good to analyze?
  2. Should I analyze the training loss, or a fixed validation subset?
  3. Which step policy is preferable: a 0.1 step, 0.2 step, or 0.5 step?
@zimenglan-sysu-512

Hi @dereyly, can you share some experience about how to set the plateau configuration? Thanks.

@xiaoxiongli

@dereyly How can I train in plateau mode?

sanghoon (Owner) commented Oct 12, 2016

Hi,
The caffe submodule in this repository contains the source of a plateau detector.
For usage, please refer to BVLC/caffe#4606.

Let me answer @dereyly's questions one by one:

  1. Detection windows start from 20000 iterations and are doubled with every x1/10 LR drop (a rough sketch of this behaviour is given after this list).
  2. For simplicity, we monitor the training loss only.
  3. If you're talking about 'gamma' (= the LR decay ratio), we've used 0.1 and 0.3165 (roughly the square root of 1/10).
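
For anyone who wants to see the idea without reading the solver code, here is a minimal Python sketch of such a plateau schedule. It is only an illustration under the assumptions above (a smoothed training-loss value available every iteration); the class name and interface are hypothetical, not the actual code in the caffe submodule.

# Hypothetical sketch of a plateau LR schedule: decay the LR by `gamma`
# once the smoothed training loss has not reached a new minimum for
# `winsize` consecutive iterations, with a larger window after each decay.
class PlateauLR:
    def __init__(self, base_lr, gamma, winsizes):
        self.lr = base_lr
        self.gamma = gamma
        self.winsizes = list(winsizes)    # e.g. [20000, 40000, 80000, ...]
        self.step = 0                     # index of the current window
        self.best_loss = float('inf')
        self.iters_since_best = 0

    def update(self, smoothed_loss):
        """Call once per iteration; returns the LR to use next."""
        if self.step >= len(self.winsizes):
            return self.lr                    # no more decays scheduled
        if smoothed_loss < self.best_loss:
            self.best_loss = smoothed_loss    # new minimum: reset the counter
            self.iters_since_best = 0
        else:
            self.iters_since_best += 1
        if self.iters_since_best >= self.winsizes[self.step]:
            self.lr *= self.gamma             # plateau detected: decay the LR
            self.step += 1                    # move on to the next (larger) window
            self.best_loss = float('inf')
            self.iters_since_best = 0
        return self.lr

With winsizes = [20000, 40000, 80000, ...] and gamma = 0.1 (or 0.3165) this mirrors the numbers above; in the real solver the corresponding knobs are gamma and the repeated plateau_winsize fields in the solver prototxt.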

xiaoxiongli commented Oct 17, 2016

@sanghoon Dear sanghoon, I wonder why you set the "plateau_winsize" values to double with every x1/10 LR drop?

plateau_winsize: 10000    # suppose that here lr = 0.01
plateau_winsize: 20000    # lr = 0.001
plateau_winsize: 40000    # lr = 0.0001
plateau_winsize: 80000    # lr = 0.00001

Another confusion: if plateau_winsize is 40000, does it mean that we do at least 40000 training iterations (with the same lr, 0.0001) regardless of whether the training loss decreases or not?

dereyly (Author) commented Oct 17, 2016

@sanghoon Thank you!
The plateau detector from caffe-fast-rcnn seems good enough.
@zimenglan-sysu-512 , @xiaoxiongli
lr_policy: "plateau" from caffe-fast-rcnn is better and simpler than my Python layer that decreases the gradient (by scaling the bottom blob right after the loss function).
I am trying:
lr_policy: "plateau"
gamma: 0.33
plateau_winsize: 10000
plateau_winsize: 20000
plateau_winsize: 20000
plateau_winsize: 20000
plateau_winsize: 20000

but the training process has not finished yet.

@sanghoon (Owner)

Hi @xiaoxiongli,
As the loss converges, it seemed to me that fluctuations cover up the slight improvements in the training loss. That's why I wanted to increase the window size as training continues.

Answering your second question:
That's true. For example, at least 40k iterations of training will be done with lr=0.0001.
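
In other words, each window is a hard lower bound on how long its LR is kept. A quick back-of-the-envelope check, reusing the example window values from the config quoted above (hypothetical numbers, not a recommendation):

# Each LR is held for at least its plateau_winsize iterations before the
# next decay, no matter what the loss does, so the windows add up to the
# minimum number of iterations needed to reach the final LR.
winsizes = [10000, 20000, 40000, 80000]              # example values from above
lrs = [0.01 * 0.1 ** i for i in range(len(winsizes))]
for lr, w in zip(lrs, winsizes):
    print('lr %g is used for at least %d iterations' % (lr, w))
print('at least %d iterations before the last decay' % sum(winsizes))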

@zimenglan-sysu-512

Hi @dereyly, have you finished your training? What about the performance? Hi @sanghoon, I found that using the plateau LR policy needs a very large number of iterations (e.g. more than 300k), and that increasing the plateau_winsize values, with the first one set to 40k, works better.

dereyly (Author) commented Oct 19, 2016

@zimenglan-sysu-512
Yes, one of my experiments has finished. I lost about 0.5% accuracy in plateau mode vs. the scheduled mode. Maybe I will do some other experiments with a smaller window size, or continue experimenting with the validation loss. Right now my logs are broken and it is hard to compare models across iterations :(

@xiaoxiongli

@sanghoon thank you very much! I got it~^_^

@zimenglan-sysu-512 @dereyly When I use the plateau training mode (VOC2007), the mAP is 0.6983; when I do not use this mode, the mAP is about 0.714.

Total iterations = 200k.

train_net: "models/pvanet/example_train_384/train.prototxt"

base_lr: 0.001
lr_policy: "plateau"
gamma: 0.1
plateau_winsize: 10000
plateau_winsize: 20000
plateau_winsize: 40000
plateau_winsize: 80000

display: 20
average_loss: 100
momentum: 0.9
weight_decay: 0.0002

# We disable standard caffe solver snapshotting and implement our own snapshot function
snapshot: 0
# We still use the snapshot prefix, though
snapshot_prefix: "pvanet_frcnn_384"
iter_size: 2

@sanghoon (Owner)

Hi all,
I'd like to share how the networks were trained.
I hope this helps you.

One more thing...
I've found a bug in the existing py-faster-rcnn code related to 'average_loss'
(the feature doesn't work with the current code).

If you want to train a network with 'plateau',
please check out the 'develop' branch, which contains a hotfix for the issue.

ImageNet pre-training (1000 classes)

  • The one you can get by running 'download_imagenet_models.sh'

COCO + VOC2007 + VOC2012 (80 classes)

  • The resulting model is not currently available online
iter_size: 3
base_lr: 0.003
gamma: 0.3165
plateau_winsize:  50000    # 0.003165
plateau_winsize:  70700    # 0.001
plateau_winsize: 100000    # 0.0003165
# No significant improvement after this
plateau_winsize: 141400    # 0.0001
plateau_winsize: 200000    # 0.0000317

# Expected number of iterations: 1.2~2M

VOC2007 + VOC2012 (20 classes)

  • The one you can get by running 'download_models.sh'
iter_size: 3
base_lr: 0.001
gamma: 0.1
plateau_winsize: 50000     # 0.001
# No significant improvement after this
plateau_winsize: 100000    # 0.0001

# Expected number of iterations: 0.5~1M

@Po-Hsuan-Huang

@sanghoon
Dear sanghoon,
Thanks for the good work. I am trying to use PVANet to detect other classes. Do you also suggest using 'plateau' for fine-tuning, or is Adam preferable according to your experience?

Thank you.
