
Best practices question: decreasing learning rates between epochs #898

Closed
sergeyf opened this issue Oct 26, 2015 · 26 comments

Comments

@sergeyf commented Oct 26, 2015

Howdy,

In published papers I often see that the learning rate is decreased after some hundreds of epochs when learning stalls. What is the best way to do this in Keras? Thus far I have been recompiling, but (not knowing whether there is a better way) that seems foolish.

An example:

First, I build some model and train it.

from keras.models import Sequential
from keras.optimizers import Adagrad

model = Sequential()
# insert model layers here
optimizer = Adagrad(lr=0.01)
model.compile(optimizer=optimizer, loss='categorical_crossentropy')  # loss added so compile() is complete
model.fit(X, y, nb_epoch=50)

UPDATE -- the following works without having to recompile:

import keras.backend as K

K.set_value(model.optimizer.lr, 0.001)
model.fit(X, y, nb_epoch=50)

Thank you to @EderSantana for the quick reply.

@EderSantana (Contributor)

@sergeyf check the solution in #888.
Please leave this issue open, even if it solves the problem for you. This is the second time we have gotten this question, which means we need better documentation. Since I'm already working on something else, would anybody else volunteer to write the documentation? We should close this after somebody writes the docs.

@sergeyf (Author) commented Oct 26, 2015

Thank you very much! I will leave this open.

@jiumem commented Oct 27, 2015

This code might work; it has yet to be tested...

from keras.callbacks import Callback

class LrReducer(Callback):
    def __init__(self, patience=0, reduce_rate=0.5, reduce_nb=10, verbose=1):
        super(LrReducer, self).__init__()
        self.patience = patience
        self.wait = 0
        self.best_score = -1.
        self.reduce_rate = reduce_rate
        self.current_reduce_nb = 0
        self.reduce_nb = reduce_nb
        self.verbose = verbose

    def on_epoch_end(self, epoch, logs={}):
        current_score = logs.get('val_acc')
        if current_score is None:  # no validation accuracy logged, nothing to monitor
            return
        if current_score > self.best_score:
            self.best_score = current_score
            self.wait = 0
            if self.verbose > 0:
                print('---current best val accuracy: %.3f' % current_score)
        else:
            if self.wait >= self.patience:
                self.current_reduce_nb += 1
                if self.current_reduce_nb <= self.reduce_nb:
                    # Theano backend: lr is a shared variable
                    lr = self.model.optimizer.lr.get_value()
                    self.model.optimizer.lr.set_value(lr * self.reduce_rate)
                else:
                    if self.verbose > 0:
                        print("Epoch %d: early stopping" % epoch)
                    self.model.stop_training = True
            self.wait += 1
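
A usage sketch for the callback above (untested; model, X, y as in the opening post). Note that val_acc must be present in the logs, e.g. via show_accuracy=True in older Keras or metrics=['accuracy'] at compile time in later versions:

model.fit(X, y, nb_epoch=100, validation_split=0.1, show_accuracy=True,
          callbacks=[LrReducer(patience=3, reduce_rate=0.5, reduce_nb=10)])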

@sergeyf (Author) commented Oct 27, 2015

Thanks @jiumem! This might make a good pull request to Keras?

@NickShahML (Contributor)

@sergeyf I just saw this thread, and I thought I'd throw in my own function that I made to address this. I always use nb_epoch=1 because I'm interested in generating text:

def set_learning_rate(hist, learning_rate=0, activate_halving_learning_rate=False,
                      new_loss=0, past_loss=0, counter=0, save_model_dir=''):
    if activate_halving_learning_rate and (learning_rate >= 0.0001):
        if counter == 0:
            new_loss = hist.history['loss'][0]
            if new_loss >= past_loss:  # halve the rate if the loss did not improve on the previous iteration
                learning_rate = float(learning_rate) / float(2)
                print('you readjusted the learning rate to', learning_rate)
                with open('models/' + save_model_dir + '/' + 'history_report.txt', 'a+') as history_file:
                    history_file.write('For next Iteration, Learning Rate has been reduced to ' + str(learning_rate) + '\n\n')
                with open('history_reports/' + save_model_dir + '_' + 'history_report.txt', 'a+') as history_file:
                    history_file.write('For next Iteration, Learning Rate has been reduced to ' + str(learning_rate) + '\n\n')
            past_loss = new_loss
    return (learning_rate, new_loss, past_loss)
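
A usage sketch of the manual loop this is meant for (untested; model, X_train, y_train and the 'run1' directory name are hypothetical, and the models/run1 and history_reports directories must already exist for the log writes):

import keras.backend as K

learning_rate, new_loss, past_loss = 0.002, 0.0, float('inf')
for iteration in range(100):
    hist = model.fit(X_train, y_train, nb_epoch=1)
    learning_rate, new_loss, past_loss = set_learning_rate(
        hist, learning_rate, activate_halving_learning_rate=True,
        new_loss=new_loss, past_loss=past_loss, counter=0,
        save_model_dir='run1')
    # apply the (possibly halved) rate to the optimizer
    K.set_value(model.optimizer.lr, learning_rate)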

@sergeyf (Author) commented Nov 6, 2015

Awesome, thanks!


@fenstea commented Feb 23, 2016

I have a problem using the solution like

model.optimizer.lr.set_value(0.01)
model.fit(X, y, nb_epoch=50)

with the TensorFlow backend. I can't do set_value and get_value as discussed here and in another thread, because with TensorFlow the learning rate is a tensor rather than a Theano shared variable:

return model_systole.optimizer.lr.get_value()
AttributeError: 'Tensor' object has no attribute 'get_value'

Any suggestions, please?

@entron (Contributor) commented Apr 1, 2016

@Rusianka I just found that one can do this to get and set the value:

import keras.backend as K
from keras.optimizers import SGD

sgd = SGD(lr=0.1, decay=0, momentum=0.9, nesterov=True)
K.set_value(sgd.lr, 0.5 * K.get_value(sgd.lr))

@jayinai commented Jun 11, 2016

@entron when does the .set_value() happen? after every epoch?

@entron (Contributor) commented Jun 11, 2016

@shuaiw you can put the line K.set_value(sgd.lr, 0.5 * K.get_value(sgd.lr)) inside the epoch loop to set lr at each epoch.

@jayinai commented Jun 11, 2016

@entron thanks for your response.

my model doesn't have an epoch loop; instead, it looks like this:

N_EPOCH = 100

model = Model(..)
model.compile(...)
model.fit(X, y, batch_size=64, nb_epoch=N_EPOCH, verbose=1, shuffle=True, callbacks=...)

Any way to fit it in?

@entron (Contributor) commented Jun 12, 2016

Maybe you can set N_EPOCH=1 and loop outside, along the lines of the sketch below.
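
A minimal sketch of that outer loop (untested; model, X, y as in the snippet above, with an arbitrary decay factor of 0.95 per epoch):

import keras.backend as K

N_EPOCH = 100
for epoch in range(N_EPOCH):
    model.fit(X, y, batch_size=64, nb_epoch=1, verbose=1, shuffle=True)
    # shrink the learning rate between the per-epoch fit() calls
    K.set_value(model.optimizer.lr, 0.95 * K.get_value(model.optimizer.lr))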

@greg-robinson commented Jul 5, 2016

I had problems with the model.fit_generator function, so I decided to use model.fit instead and put it inside a for loop like so:

for x, y in generate_arrays_from_file():
    hist = model.fit(x, y, batch_size=16, nb_epoch=1, verbose=1)

Here are my questions:

  1. I am using Adam as my optimizer, and I saw on another thread that it is impossible to directly get the current learning rate; you have to calculate it yourself indirectly.
    http://stackoverflow.com/questions/37091751/keras-learning-rate-not-changing-despite-decay-in-sgd
    Since nb_epoch is only 1 in the call above and model.fit is inside a loop, is the learning rate guaranteed to decrease (that's my understanding of how Adam works), or do I have to write a separate function to manually decrease the learning rate myself?

  2. My loss initially decreases quite rapidly, but then appears to fluctuate and stop decreasing even after several days of training. This is true both when I train on single images and on multi-channel images. Since the loss fluctuates after every call of model.fit, will that screw up Adam's calculations, since it relies on the number of iterations and previous losses, or is all of this taken care of by Theano?

I am unfamiliar with Theano, and I do not have time at this point to learn about it, so any information you can provide on both these questions is much appreciated.

Thanks.

@ishank26 commented Sep 8, 2016

@sergeyf @shuaiw A simpler solution for decay after a specified number of epochs:

import keras.backend as K
from keras.callbacks import Callback

class decay_lr(Callback):
    '''
        n_epoch = number of epochs after which decay should happen.
        decay = decay value
    '''
    def __init__(self, n_epoch, decay):
        super(decay_lr, self).__init__()
        self.n_epoch = n_epoch
        self.decay = decay

    def on_epoch_begin(self, epoch, logs={}):
        if epoch > 1 and epoch % self.n_epoch == 0:
            old_lr = K.get_value(self.model.optimizer.lr)
            K.set_value(self.model.optimizer.lr, self.decay * old_lr)

decaySchedule = decay_lr(10, 0.95)

You can use this directly without the epoch loop.
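
A usage sketch for the callback above (hypothetical model/data names):

# decay lr by a factor of 0.95 every 10 epochs, applied via the callbacks list
model.fit(X, y, nb_epoch=100, callbacks=[decaySchedule])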

@anewlearner commented Oct 21, 2016

In my case, the learning rate was supposed to be decayed every specific number of iterations. With a Theano-backend Keras, I can do it with the following code:

import numpy as np
from keras.callbacks import Callback

class SGDLearningRateTracker(Callback):
    def on_batch_begin(self, batch, logs={}):
        optimizer = self.model.optimizer
        rate = 0.95
        iteration = 5000
        lr = optimizer.lr.get_value()
        iterations = optimizer.iterations.get_value()
        if iterations % iteration == 1:
            lr_now = np.array(lr * rate, dtype='float32')
            optimizer.lr.set_value(lr_now)
            print('lr reduced from %f to %f' % (lr, lr_now))

But when I changed to a TensorFlow-backend Keras, I cannot use the code above, because optimizer.lr is a TensorFlow variable and there is no get_value(). So I changed the code as follows:

import numpy as np
import tensorflow as tf
from keras.callbacks import Callback

class SGDLearningRateTracker(Callback):
    def on_batch_begin(self, batch, logs={}):
        optimizer = self.model.optimizer
        rate = 0.95
        iteration = 5
        lr = optimizer.lr

        # opens a fresh session and re-initializes all variables on every batch
        init_op = tf.initialize_all_variables()
        sess = tf.Session()
        sess.run(init_op)
        iterations = optimizer.iterations  # a TF variable, not a Python int
        lr_ori = sess.run(lr)
        # print('iter:', sess.run(iterations))  # this always prints 0
        if iterations % iteration == 0:
            a = np.array(lr * rate, dtype='float32')
            optimizer.lr.set_value(a)
            lr_now = sess.run(optimizer.lr)
            print('lr reduced from %f to %f' % (lr_ori, lr_now))

But it did not work: I found that optimizer.iterations was always 0, so the learning rate will not change. Could someone help me solve this?
Thanks!

Carol

@alalbiol

Another option could be to use the LearningRateScheduler that I found in the Keras documentation: https://keras.io/callbacks/

You can use the schedule function that best fits your needs.

@FedericoMuciaccia

@sergeyf please update the answer inside your initial question, because model.optimizer.lr.set_value() is no longer valid. The current method is model.optimizer.lr.assign(your_learning_rate). This also solves the problem of @Rusianka.

@sergeyf (Author) commented Jan 21, 2017

@FedericoMuciaccia Thanks, I did as you suggested.

@marc-moreaux commented Mar 13, 2017

With the TF backend, I did this (for Inception-V3):

import keras.backend as K
from keras.callbacks import LearningRateScheduler

def scheduler(epoch):
    # multiply the lr by 0.9 every second epoch
    if epoch % 2 == 0 and epoch != 0:
        lr = K.get_value(model.optimizer.lr)
        K.set_value(model.optimizer.lr, lr * .9)
        print("lr changed to {}".format(lr * .9))
    return K.get_value(model.optimizer.lr)

lr_decay = LearningRateScheduler(scheduler)

model.fit_generator(train_gen, (nb_train_samples // batch_size) * batch_size,
                    nb_epoch=100, verbose=1,
                    validation_data=valid_gen, nb_val_samples=val_size,
                    callbacks=[lr_decay])

EDIT

I'm happy it helped. What I use now is the following:

from keras.callbacks import LearningRateScheduler

def lr_decay_callback(lr_init, lr_decay):
    def step_decay(epoch):
        return lr_init * (lr_decay ** (epoch + 1))
    return LearningRateScheduler(step_decay)

lr_decay = lr_decay_callback(lr_init, lr_decay)  # e.g. lr_init=0.01, lr_decay=0.9

# callbacks=[lr_decay, ]

@stale stale bot added the stale label Jun 11, 2017

stale bot commented Jun 11, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@stale stale bot closed this as completed Jul 11, 2017
@ViaFerrata commented Sep 13, 2017

@FedericoMuciaccia @sergeyf
I think the syntax has changed again (using the TF backend):

adam = ks.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
model.optimizer.lr.assign(100)
# -> still trains perfectly, lr is not changed

Keras doesn't throw an exception, but the lr doesn't change anyway.

Fortunately, the backend method still works:

K.set_value(model.optimizer.lr, 100)

Edit:

By the way, is it also possible to change the learning rate during an epoch (e.g. after 1000 batches) while looping over fit_generator? E.g. like:

from keras.callbacks import LearningRateScheduler

def scheduler(batch_number):
    if batch_number % 1000 == 0:
        lr = K.get_value(model.optimizer.lr)
        K.set_value(model.optimizer.lr, lr * .9)
        print("lr changed to {}".format(lr * .9))
    return K.get_value(model.optimizer.lr)

lr_decay = LearningRateScheduler(scheduler)

while 1:
    model.fit_generator(gen, epochs=1, steps_per_epoch=int(filesize / batchsize), callbacks=[lr_decay])
    # do some other stuff in between epochs

This would be very useful when the data sample is large and/or the network is deep, such that one epoch takes about 24h.
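
For what it's worth, LearningRateScheduler is only invoked once per epoch, so a per-batch change needs a custom Callback instead. A minimal, untested sketch (BatchLRDecay is a hypothetical name, not part of Keras; the interval and factor match the example above):

import keras.backend as K
from keras.callbacks import Callback

class BatchLRDecay(Callback):  # hypothetical helper, not a Keras built-in
    def __init__(self, every_n_batches=1000, factor=0.9):
        super(BatchLRDecay, self).__init__()
        self.every_n_batches = every_n_batches
        self.factor = factor
        self.batch_count = 0  # the 'batch' argument resets each epoch, so keep a global counter

    def on_batch_begin(self, batch, logs={}):
        self.batch_count += 1
        if self.batch_count % self.every_n_batches == 0:
            lr = K.get_value(self.model.optimizer.lr)
            K.set_value(self.model.optimizer.lr, lr * self.factor)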

@ghShu commented Dec 4, 2017

The "decay" option in the optimizer seems to be designed for learning rate decay. I did not see any suggestion on using this in the discussion. Could someone please comment on the use (or not use) the "decay" option?
adam = ks.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

@eduardramon commented Dec 5, 2017

It does learning rate decay:
https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L420

This functionality was introduced roughly one year ago within this commit:
b2e8d5a
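
For reference, the time-based decay applied there amounts to roughly the following (a paraphrase of the linked code, not the exact source):

# effective learning rate after `iterations` parameter updates,
# given the `decay` passed to the optimizer constructor
lr = initial_lr * (1. / (1. + decay * iterations))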

@Issam28 commented Jan 5, 2018

When using rate decay in SGD, does optimizer.iterations reset at each epoch?

@Ostnie commented Aug 6, 2018

@marc-moreaux
Why did you change your code? Can the version from before your edit still be used correctly? I don't clearly understand the later one.

@casperdcl commented Apr 16, 2020

FYI updated issue and much simpler solution at #5724 (comment)

fchollet pushed a commit that referenced this issue Sep 22, 2023