
Best practices question: decreasing learning rates between epochs #898

Closed
sergeyf opened this issue Oct 26, 2015 · 26 comments

Comments

@sergeyf commented Oct 26, 2015

Howdy,

In published papers I often see that the learning rate is decreased after some hundreds of epochs when learning stalls. What is the best way to do this in Keras? Thus far I have been recompiling, but (not knowing whether there is a better way) that seems foolish.

An example:

First, I build some model and train it.

from keras.models import Sequential
from keras.optimizers import Adagrad

model = Sequential()
# insert model layers here
optimizer = Adagrad(lr=0.01)
model.compile(optimizer=optimizer, loss='categorical_crossentropy')  # loss added so compile() is complete
model.fit(X, y, nb_epoch=50)

UPDATE -- the following works without having to recompile:

import keras.backend as K

K.set_value(model.optimizer.lr, 0.001)
model.fit(X, y, nb_epoch=50)

Thank you to @EderSantana for the quick reply.

@EderSantana (Contributor)

@sergeyf check the solution in #888.
Please leave this issue open, even if it solves the problem for you. This is the second time we have gotten this question, which means we need better documentation. Since I'm already working on something else, would anybody else volunteer to write the documentation? We should close this after somebody writes the docs.

@sergeyf (Author) commented Oct 26, 2015

Thank you very much! I will leave this open.

@jiumem commented Oct 27, 2015

This code might work; it has yet to be tested...

from keras.callbacks import Callback

class LrReducer(Callback):
    def __init__(self, patience=0, reduce_rate=0.5, reduce_nb=10, verbose=1):
        super(LrReducer, self).__init__()
        self.patience = patience
        self.wait = 0
        self.best_score = -1.
        self.reduce_rate = reduce_rate
        self.current_reduce_nb = 0
        self.reduce_nb = reduce_nb
        self.verbose = verbose

    def on_epoch_end(self, epoch, logs={}):
        current_score = logs.get('val_acc')
        if current_score is None:  # no validation accuracy logged, nothing to monitor
            return
        if current_score > self.best_score:
            self.best_score = current_score
            self.wait = 0
            if self.verbose > 0:
                print('---current best val accuracy: %.3f' % current_score)
        else:
            if self.wait >= self.patience:
                self.current_reduce_nb += 1
                if self.current_reduce_nb <= self.reduce_nb:
                    # Theano backend: lr is a shared variable
                    lr = self.model.optimizer.lr.get_value()
                    self.model.optimizer.lr.set_value(lr * self.reduce_rate)
                else:
                    if self.verbose > 0:
                        print("Epoch %d: early stopping" % epoch)
                    self.model.stop_training = True
            self.wait += 1
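
A usage sketch for the callback above (untested; model, X, y as in the opening post). Note that val_acc must be present in the logs, e.g. via show_accuracy=True in older Keras or metrics=['accuracy'] at compile time in later versions:

model.fit(X, y, nb_epoch=100, validation_split=0.1, show_accuracy=True,
          callbacks=[LrReducer(patience=3, reduce_rate=0.5, reduce_nb=10)])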

@sergeyf (Author) commented Oct 27, 2015

Thanks @jiumem! This might make a good pull request to Keras?

@NickShahML (Contributor)

@sergeyf I just saw this thread, and I thought I'd throw in my own function that I made to address this. I always use nb_epoch=1 because I'm interested in generating text:

def set_learning_rate(hist, learning_rate=0, activate_halving_learning_rate=False,
                      new_loss=0, past_loss=0, counter=0, save_model_dir=''):
    if activate_halving_learning_rate and (learning_rate >= 0.0001):
        if counter == 0:
            new_loss = hist.history['loss'][0]
            if new_loss >= past_loss:  # halve the rate if the loss did not improve on the previous iteration
                learning_rate = float(learning_rate) / float(2)
                print('you readjusted the learning rate to', learning_rate)
                with open('models/' + save_model_dir + '/' + 'history_report.txt', 'a+') as history_file:
                    history_file.write('For next Iteration, Learning Rate has been reduced to ' + str(learning_rate) + '\n\n')
                with open('history_reports/' + save_model_dir + '_' + 'history_report.txt', 'a+') as history_file:
                    history_file.write('For next Iteration, Learning Rate has been reduced to ' + str(learning_rate) + '\n\n')
            past_loss = new_loss
    return (learning_rate, new_loss, past_loss)
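
A usage sketch of the manual loop this is meant for (untested; model, X_train, y_train and the 'run1' directory name are hypothetical, and the models/run1 and history_reports directories must already exist for the log writes):

import keras.backend as K

learning_rate, new_loss, past_loss = 0.002, 0.0, float('inf')
for iteration in range(100):
    hist = model.fit(X_train, y_train, nb_epoch=1)
    learning_rate, new_loss, past_loss = set_learning_rate(
        hist, learning_rate, activate_halving_learning_rate=True,
        new_loss=new_loss, past_loss=past_loss, counter=0,
        save_model_dir='run1')
    # apply the (possibly halved) rate to the optimizer
    K.set_value(model.optimizer.lr, learning_rate)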

@sergeyf (Author) commented Nov 6, 2015

Awesome, thanks!


@fenstea commented Feb 23, 2016

I have a problem using the solution like

model.optimizer.lr.set_value(0.01)
model.fit(X, y, nb_epoch=50)

with the TensorFlow backend. I can't do set_value and get_value as discussed here and in another thread, because with TensorFlow the learning rate is a tensor rather than a Theano shared variable:

return model_systole.optimizer.lr.get_value()
AttributeError: 'Tensor' object has no attribute 'get_value'

Any suggestions, please?

@entron (Contributor) commented Apr 1, 2016

@Rusianka I just found that one can do this to get and set the value:

import keras.backend as K
from keras.optimizers import SGD

sgd = SGD(lr=0.1, decay=0, momentum=0.9, nesterov=True)
K.set_value(sgd.lr, 0.5 * K.get_value(sgd.lr))

@jayinai commented Jun 11, 2016

@entron when does the .set_value() happen? after every epoch?

@entron (Contributor) commented Jun 11, 2016

@shuaiw you can put the line K.set_value(sgd.lr, 0.5 * K.get_value(sgd.lr)) inside the epoch loop to set lr at each epoch.

@jayinai commented Jun 11, 2016

@entron thanks for your response.

my model doesn't have an epoch loop; instead, it looks like this:

N_EPOCH = 100

model = Model(..)
model.compile(...)
model.fit(X, y, batch_size=64, nb_epoch=N_EPOCH, verbose=1, shuffle=True, callbacks=...)

Any way to fit it in?

@entron (Contributor) commented Jun 12, 2016

Maybe you can set N_EPOCH=1 and loop outside, along the lines of the sketch below.
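
A minimal sketch of that outer loop (untested; model, X, y as in the snippet above, with an arbitrary decay factor of 0.95 per epoch):

import keras.backend as K

N_EPOCH = 100
for epoch in range(N_EPOCH):
    model.fit(X, y, batch_size=64, nb_epoch=1, verbose=1, shuffle=True)
    # shrink the learning rate between the per-epoch fit() calls
    K.set_value(model.optimizer.lr, 0.95 * K.get_value(model.optimizer.lr))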

@greg-robinson commented Jul 5, 2016

I had problems with the model.fit_generator function, so I decided to use model.fit instead and put it inside a for loop like so:

for x, y in generate_arrays_from_file():
    hist = model.fit(x, y, batch_size=16, nb_epoch=1, verbose=1)

Here are my questions:

  1. I am using Adam as my optimizer, and I saw on another thread that it is impossible to directly get the current learning rate; you have to calculate it yourself indirectly.
    http://stackoverflow.com/questions/37091751/keras-learning-rate-not-changing-despite-decay-in-sgd
    Since nb_epoch is only 1 in the call above and model.fit is inside a loop, is the learning rate guaranteed to decrease (that's my understanding of how Adam works), or do I have to write a separate function to manually decrease the learning rate myself?

  2. My loss initially decreases quite rapidly, but then appears to fluctuate and stop decreasing even after several days of training. This is true both when I train on single images and on multi-channel images. Since the loss fluctuates after every call of model.fit, will that screw up Adam's calculations, since it relies on the number of iterations and previous losses, or is all of this taken care of by Theano?

I am unfamiliar with Theano, and I do not have time at this point to learn about it, so any information you can provide on both these questions is much appreciated.

Thanks.

@ishank26 commented Sep 8, 2016

@sergeyf @shuaiw A simpler solution for decay after a specified number of epochs:

import keras.backend as K
from keras.callbacks import Callback

class decay_lr(Callback):
    '''
        n_epoch = number of epochs after which decay should happen.
        decay = decay value
    '''
    def __init__(self, n_epoch, decay):
        super(decay_lr, self).__init__()
        self.n_epoch = n_epoch
        self.decay = decay

    def on_epoch_begin(self, epoch, logs={}):
        if epoch > 1 and epoch % self.n_epoch == 0:
            old_lr = K.get_value(self.model.optimizer.lr)
            K.set_value(self.model.optimizer.lr, self.decay * old_lr)

decaySchedule = decay_lr(10, 0.95)

You can use this directly without the epoch loop.
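
A usage sketch for the callback above (hypothetical model/data names):

# decay lr by a factor of 0.95 every 10 epochs, applied via the callbacks list
model.fit(X, y, nb_epoch=100, callbacks=[decaySchedule])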

@anewlearner commented Oct 21, 2016

In my case, the learning rate was supposed to be decayed every specific number of iterations. With a Theano-backend Keras, I can do it with the following code:

import numpy as np
from keras.callbacks import Callback

class SGDLearningRateTracker(Callback):
    def on_batch_begin(self, batch, logs={}):
        optimizer = self.model.optimizer
        rate = 0.95
        iteration = 5000
        lr = optimizer.lr.get_value()
        iterations = optimizer.iterations.get_value()
        if iterations % iteration == 1:
            lr_now = np.array(lr * rate, dtype='float32')
            optimizer.lr.set_value(lr_now)
            print('lr reduced from %f to %f' % (lr, lr_now))

But when I changed to a TensorFlow-backend Keras, I cannot use the code above, because optimizer.lr is a TensorFlow variable and there is no get_value(). So I changed the code as follows:

import numpy as np
import tensorflow as tf
from keras.callbacks import Callback

class SGDLearningRateTracker(Callback):
    def on_batch_begin(self, batch, logs={}):
        optimizer = self.model.optimizer
        rate = 0.95
        iteration = 5
        lr = optimizer.lr

        # opens a fresh session and re-initializes all variables on every batch
        init_op = tf.initialize_all_variables()
        sess = tf.Session()
        sess.run(init_op)
        iterations = optimizer.iterations  # a TF variable, not a Python int
        lr_ori = sess.run(lr)
        # print('iter:', sess.run(iterations))  # this always prints 0
        if iterations % iteration == 0:
            a = np.array(lr * rate, dtype='float32')
            optimizer.lr.set_value(a)
            lr_now = sess.run(optimizer.lr)
            print('lr reduced from %f to %f' % (lr_ori, lr_now))

But it did not work: I found that optimizer.iterations was always 0, so the learning rate will not change. Could someone help me solve this?
Thanks!

Carol

@alalbiol

Another option could be to use the LearningRateScheduler that I found in the Keras documentation: https://keras.io/callbacks/

You can use the schedule function that best fits your needs.

@FedericoMuciaccia

@sergeyf please update the answer inside your initial question, because model.optimizer.lr.set_value() is no longer valid. The current method is model.optimizer.lr.assign(your_learning_rate). This also solves the problem of @Rusianka.

@sergeyf (Author) commented Jan 21, 2017

@FedericoMuciaccia Thanks, I did as you suggested.

@marc-moreaux commented Mar 13, 2017

With the TF backend, I did this (for Inception-V3):

import keras.backend as K
from keras.callbacks import LearningRateScheduler

def scheduler(epoch):
    # multiply the lr by 0.9 every second epoch
    if epoch % 2 == 0 and epoch != 0:
        lr = K.get_value(model.optimizer.lr)
        K.set_value(model.optimizer.lr, lr * .9)
        print("lr changed to {}".format(lr * .9))
    return K.get_value(model.optimizer.lr)

lr_decay = LearningRateScheduler(scheduler)

model.fit_generator(train_gen, (nb_train_samples // batch_size) * batch_size,
                    nb_epoch=100, verbose=1,
                    validation_data=valid_gen, nb_val_samples=val_size,
                    callbacks=[lr_decay])

EDIT

I'm happy it helped. What I use now is the following:

from keras.callbacks import LearningRateScheduler

def lr_decay_callback(lr_init, lr_decay):
    def step_decay(epoch):
        return lr_init * (lr_decay ** (epoch + 1))
    return LearningRateScheduler(step_decay)

lr_decay = lr_decay_callback(lr_init, lr_decay)  # e.g. lr_init=0.01, lr_decay=0.9

# callbacks=[lr_decay, ]

@stale stale bot added the stale label Jun 11, 2017

stale bot commented Jun 11, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@stale stale bot closed this as completed Jul 11, 2017
@ViaFerrata commented Sep 13, 2017

@FedericoMuciaccia @sergeyf
I think the syntax has changed again (using the TF backend):

adam = ks.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
model.optimizer.lr.assign(100)
# -> still trains perfectly, lr is not changed

Keras doesn't throw an exception, but the lr doesn't change anyway.

Fortunately, the backend method still works:

K.set_value(model.optimizer.lr, 100)

Edit:

By the way, is it also possible to change the learning rate during an epoch (e.g. after 1000 batches) while looping over fit_generator? E.g. like:

from keras.callbacks import LearningRateScheduler

def scheduler(batch_number):
    if batch_number % 1000 == 0:
        lr = K.get_value(model.optimizer.lr)
        K.set_value(model.optimizer.lr, lr * .9)
        print("lr changed to {}".format(lr * .9))
    return K.get_value(model.optimizer.lr)

lr_decay = LearningRateScheduler(scheduler)

while 1:
    model.fit_generator(gen, epochs=1, steps_per_epoch=int(filesize / batchsize), callbacks=[lr_decay])
    # do some other stuff in between epochs

This would be very useful when the data sample is large and/or the network is deep, such that one epoch takes about 24h.
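
For what it's worth, LearningRateScheduler is only invoked once per epoch, so a per-batch change needs a custom Callback instead. A minimal, untested sketch (BatchLRDecay is a hypothetical name, not part of Keras; the interval and factor match the example above):

import keras.backend as K
from keras.callbacks import Callback

class BatchLRDecay(Callback):  # hypothetical helper, not a Keras built-in
    def __init__(self, every_n_batches=1000, factor=0.9):
        super(BatchLRDecay, self).__init__()
        self.every_n_batches = every_n_batches
        self.factor = factor
        self.batch_count = 0  # the 'batch' argument resets each epoch, so keep a global counter

    def on_batch_begin(self, batch, logs={}):
        self.batch_count += 1
        if self.batch_count % self.every_n_batches == 0:
            lr = K.get_value(self.model.optimizer.lr)
            K.set_value(self.model.optimizer.lr, lr * self.factor)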

@ghShu commented Dec 4, 2017

The "decay" option in the optimizer seems to be designed for learning rate decay. I did not see any suggestion on using this in the discussion. Could someone please comment on the use (or not use) the "decay" option?
adam = ks.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

@eduardramon commented Dec 5, 2017

It does learning rate decay:
https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L420

This functionality was introduced roughly one year ago within this commit:
b2e8d5a
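
For reference, the time-based decay applied there amounts to roughly the following (a paraphrase of the linked code, not the exact source):

# effective learning rate after `iterations` parameter updates,
# given the `decay` passed to the optimizer constructor
lr = initial_lr * (1. / (1. + decay * iterations))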

@Issam28 commented Jan 5, 2018

When using rate decay in SGD, does optimizer.iterations reset at each epoch?

@Ostnie commented Aug 6, 2018

@marc-moreaux
Why did you change your code? Can the version from before your edit still be used correctly? I don't clearly understand the later one.

@casperdcl commented Apr 16, 2020

FYI updated issue and much simpler solution at #5724 (comment)

fchollet pushed a commit that referenced this issue Sep 22, 2023