Loss plateau detection #7

Closed
dereyly opened this issue Oct 5, 2016 · 11 comments

dereyly commented Oct 5, 2016

Hello!
Sorry if my question is off-topic.
I thought about many interesting things while reading your paper.
One of these things is how to train my own architecture from scratch.
The idea of detecting plateaus is good, and you write that it gives a significant improvement. I analyzed my accuracy graphs and I want to try to create a "plateau detector".
I know that you use an in-house library, but maybe you can give some advice on how to create it in Caffe:

  1. What interval (period, number of iterations) is good to analyze?
  2. Should I analyze the training loss, or a fixed validation subset?
  3. Which step policy is preferable: a 0.1 step, 0.2 step, or 0.5 step?
@zimenglan-sysu-512

Hi @dereyly, can you share some experience about how to set the plateau configuration? Thanks.

@xiaoxiongli

@dereyly How can I train in plateau mode?

sanghoon (Owner) commented Oct 12, 2016

Hi,
The caffe submodule in this repository contains the source of a plateau detector.
For usage, please refer to BVLC/caffe#4606.

Let me answer @dereyly's questions one by one:

  1. Detection windows start from 20000 iterations and are doubled with every x1/10 LR drop (a rough sketch of this behaviour is given after this list).
  2. For simplicity, we monitor the training loss only.
  3. If you're talking about 'gamma' (= the LR decay ratio), we've used 0.1 and 0.3165 (roughly the square root of 1/10).
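
For anyone who wants to see the idea without reading the solver code, here is a minimal Python sketch of such a plateau schedule. It is only an illustration under the assumptions above (a smoothed training-loss value available every iteration); the class name and interface are hypothetical, not the actual code in the caffe submodule.

# Hypothetical sketch of a plateau LR schedule: decay the LR by `gamma`
# once the smoothed training loss has not reached a new minimum for
# `winsize` consecutive iterations, with a larger window after each decay.
class PlateauLR:
    def __init__(self, base_lr, gamma, winsizes):
        self.lr = base_lr
        self.gamma = gamma
        self.winsizes = list(winsizes)    # e.g. [20000, 40000, 80000, ...]
        self.step = 0                     # index of the current window
        self.best_loss = float('inf')
        self.iters_since_best = 0

    def update(self, smoothed_loss):
        """Call once per iteration; returns the LR to use next."""
        if self.step >= len(self.winsizes):
            return self.lr                    # no more decays scheduled
        if smoothed_loss < self.best_loss:
            self.best_loss = smoothed_loss    # new minimum: reset the counter
            self.iters_since_best = 0
        else:
            self.iters_since_best += 1
        if self.iters_since_best >= self.winsizes[self.step]:
            self.lr *= self.gamma             # plateau detected: decay the LR
            self.step += 1                    # move on to the next (larger) window
            self.best_loss = float('inf')
            self.iters_since_best = 0
        return self.lr

With winsizes = [20000, 40000, 80000, ...] and gamma = 0.1 (or 0.3165) this mirrors the numbers above; in the real solver the corresponding knobs are gamma and the repeated plateau_winsize fields in the solver prototxt.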

xiaoxiongli commented Oct 17, 2016

@sanghoon Dear sanghoon, I wonder why you set the "plateau_winsize" values to double with every x1/10 LR drop?

plateau_winsize: 10000    # suppose that here lr = 0.01
plateau_winsize: 20000    # lr = 0.001
plateau_winsize: 40000    # lr = 0.0001
plateau_winsize: 80000    # lr = 0.00001

Another confusion: if plateau_winsize is 40000, does it mean that we do at least 40000 training iterations (with the same lr, 0.0001) regardless of whether the training loss decreases or not?

dereyly (Author) commented Oct 17, 2016

@sanghoon Thank you!
The plateau detector from caffe-fast-rcnn seems good enough.
@zimenglan-sysu-512 , @xiaoxiongli
lr_policy: "plateau" from caffe-fast-rcnn is better and simpler than my Python layer that decreases the gradient (by scaling the bottom blob right after the loss function).
I am trying:
lr_policy: "plateau"
gamma: 0.33
plateau_winsize: 10000
plateau_winsize: 20000
plateau_winsize: 20000
plateau_winsize: 20000
plateau_winsize: 20000

but the training process has not finished yet.

@sanghoon (Owner)

Hi @xiaoxiongli,
As the loss converges, it seemed to me that fluctuations cover up the slight improvements in the training loss. That's why I wanted to increase the window size as training continues.

Answering your second question:
That's true. For example, at least 40k iterations of training will be done with lr=0.0001.
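
In other words, each window is a hard lower bound on how long its LR is kept. A quick back-of-the-envelope check, reusing the example window values from the config quoted above (hypothetical numbers, not a recommendation):

# Each LR is held for at least its plateau_winsize iterations before the
# next decay, no matter what the loss does, so the windows add up to the
# minimum number of iterations needed to reach the final LR.
winsizes = [10000, 20000, 40000, 80000]              # example values from above
lrs = [0.01 * 0.1 ** i for i in range(len(winsizes))]
for lr, w in zip(lrs, winsizes):
    print('lr %g is used for at least %d iterations' % (lr, w))
print('at least %d iterations before the last decay' % sum(winsizes))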

@zimenglan-sysu-512

Hi @dereyly, have you finished your training? What about the performance? Hi @sanghoon, I found that using the plateau LR policy needs a very large number of iterations (e.g. more than 300k), and that increasing the plateau_winsize values, with the first one set to 40k, works better.

dereyly (Author) commented Oct 19, 2016

@zimenglan-sysu-512
Yes, one of my experiments has finished. I lost about 0.5% accuracy in plateau mode vs. the scheduled mode. Maybe I will do some other experiments with a smaller window size, or continue experimenting with the validation loss. Right now my logs are broken and it is hard to compare models across iterations :(

@xiaoxiongli

@sanghoon thank you very much! I got it~^_^

@zimenglan-sysu-512 @dereyly When I use the plateau training mode (VOC2007), the mAP is 0.6983; when I do not use this mode, the mAP is about 0.714.

Total iterations = 200k.

train_net: "models/pvanet/example_train_384/train.prototxt"

base_lr: 0.001
lr_policy: "plateau"
gamma: 0.1
plateau_winsize: 10000
plateau_winsize: 20000
plateau_winsize: 40000
plateau_winsize: 80000

display: 20
average_loss: 100
momentum: 0.9
weight_decay: 0.0002

# We disable standard caffe solver snapshotting and implement our own snapshot function
snapshot: 0
# We still use the snapshot prefix, though
snapshot_prefix: "pvanet_frcnn_384"
iter_size: 2

@sanghoon (Owner)

Hi all,
I'd like to share how the networks were trained.
I hope this helps you.

One more thing...
I've found a bug in the existing py-faster-rcnn code related to 'average_loss'
(the feature doesn't work with the current code).

If you want to train a network with 'plateau',
please check out the 'develop' branch, which contains a hotfix for the issue.

ImageNet pre-training (1000 classes)

  • The one you can get by running 'download_imagenet_models.sh'

COCO + VOC2007 + VOC2012 (80 classes)

  • The resulting model is not currently available online
iter_size: 3
base_lr: 0.003
gamma: 0.3165
plateau_winsize:  50000    # 0.003165
plateau_winsize:  70700    # 0.001
plateau_winsize: 100000    # 0.0003165
# No significant improvement after this
plateau_winsize: 141400    # 0.0001
plateau_winsize: 200000    # 0.0000317

# Expected number of iterations: 1.2~2M

VOC2007 + VOC2012 (20 classes)

  • The one you can get by running 'download_models.sh'
iter_size: 3
base_lr: 0.001
gamma: 0.1
plateau_winsize: 50000     # 0.001
# No significant improvement after this
plateau_winsize: 100000    # 0.0001

# Expected number of iterations: 0.5~1M

@Po-Hsuan-Huang

@sanghoon
Dear sanghoon,
Thanks for the good work. I am trying to use PVANet to detect other classes. Do you also suggest using 'plateau' for fine-tuning, or is Adam preferable according to your experience?

Thank you.
