
start decay after resume without "-continue" #76

Closed
vince62s opened this issue Jan 15, 2017 · 6 comments

Comments

@vince62s
Member

vince62s commented Jan 15, 2017

Here is the scenario.

First training: 5 epochs, lr 1.0, start_decay 5, decay_rate 0.5.

Resuming that training with -continue works fine:
start_epoch 6, end_epoch 9 ==> lr of epoch 6 = 0.5

If I don't use -continue, the lr of epoch 6 starts at 1 and does not decay right away, even though I am passing the decay options.
I could always use -continue, but I guess one might want to change the decay rate after a resume.

Is that clear?
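
To make the arithmetic concrete, a minimal sketch of the decay rule being described (illustrative Python, not OpenNMT's actual code), where lr is multiplied by decay_rate at the end of every epoch from start_decay onward:

```python
# Minimal sketch of the decay rule described above (illustrative only, not OpenNMT's code).
# Assumption: lr is multiplied by decay_rate at the end of every epoch >= start_decay.
def lr_for_epoch(epoch, initial_lr=1.0, start_decay=5, decay_rate=0.5):
    decays = max(0, epoch - start_decay)
    return initial_lr * decay_rate ** decays

print(lr_for_epoch(6))  # 0.5: the lr that resuming with -continue restores for epoch 6
# Without -continue the schedule is reset, so epoch 6 runs at initial_lr = 1.0
# regardless of the decay options passed.
```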

@guillaumekln
Collaborator

The learning rate is only updated at the end of an epoch. So it will complete epoch 6 first, then the first decay will be applied.
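
As a sketch of that update order (illustrative Python, not the actual training loop), assuming one decay per completed epoch from start_decay onward:

```python
# Illustrative end-of-epoch update order (a sketch, not the actual code).
def train_one_epoch(epoch, lr):
    print(f"epoch {epoch}: training with lr = {lr}")

lr = 1.0                        # without -continue, lr restarts at the initial value
start_decay, decay_rate = 5, 0.5

for epoch in range(6, 10):      # start_epoch 6, end_epoch 9
    train_one_epoch(epoch, lr)  # epoch 6 runs entirely at lr = 1.0
    if epoch >= start_decay:
        lr *= decay_rate        # the first decay is only applied once epoch 6 has finished
```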

@vince62s
Member Author

Well, if I specify start_epoch 6, end_epoch 9, start_decay 5, decay_rate 0.5 without -continue, it should decay right away, no?
Otherwise we have to manually adapt the initial learning rate.

@guillaumekln
Collaborator

Why are you not using -continue in this case?

I believe the current behavior is correct. For example, if -start_decay 2 -start_epoch 6 are set, do you expect the code to replay the entire decay history?
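
To illustrate the difference (a hypothetical sketch, not the current implementation):

```python
# Hypothetical comparison, assuming one decay per completed epoch >= start_decay.
initial_lr, decay_rate = 1.0, 0.5
start_decay, start_epoch = 2, 6

# Current behavior: the resumed run starts at initial_lr and only decays going forward.
lr_epoch6_current = initial_lr                                               # 1.0

# "Replayed" history: epochs 2..5 would each have already contributed one decay.
lr_epoch6_replayed = initial_lr * decay_rate ** (start_epoch - start_decay)  # 0.0625

print(lr_epoch6_current, lr_epoch6_replayed)
```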

@vince62s
Member Author

Then, to avoid confusion, if -continue is not set and you load from an existing model, I would suggest throwing an error when start_decay_at is less than or equal to start_epoch.
If the lr and decay are reset, I don't exactly understand the point of starting from epoch X when loading an existing model. Do you see my point?

@guillaumekln
Collaborator

Unless I'm missing something, that is the reason we introduced the -continue option: to continue exactly where a checkpoint left off.

When you don't use -continue, it is actually a new training run that uses the parameters from a checkpoint independently of their optimization history. This has other use cases, for example if you want to change the data and set a higher learning rate, or change the optimization method.

@vince62s
Member Author

OK, best to discuss this on the forum.
