# Learning Rate Finder

In [None]:
from fastai.gen_doc.nbdoc import *
from fastai import *
from fastai.vision import *

The learning rate finder plots the loss against the learning rate for a [`Learner`](/basic_train.html#Learner). The goal is to reduce the guesswork involved in picking a good starting learning rate.

**Overview:**  
1. Run the finder: `learn.lr_find()`
2. Plot the loss against the learning rate: `learn.recorder.plot()`
3. Pick a learning rate somewhat below the point where the loss diverges, then start training

**Technical Details:** (first [described](https://arxiv.org/abs/1506.01186) by Leslie Smith)  
>Train the [`Learner`](/basic_train.html#Learner) for a few iterations. Start with a very low `start_lr` and increase it at each mini-batch until it reaches a very high `end_lr`. The [`Recorder`](/basic_train.html#Recorder) records the loss at each iteration. Plot those losses against the learning rate to find a good value, which lies before the point where the loss diverges.
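
The sweep described above can be sketched in plain Python. This is only an illustration, not the fastai implementation: the helper name `exp_lr_schedule`, the geometric (log-evenly spaced) growth, and the example values for `start_lr`, `end_lr` and the number of iterations are assumptions chosen for the sketch.

```python
def exp_lr_schedule(start_lr, end_lr, num_it):
    """Return num_it learning rates growing geometrically from start_lr to end_lr.

    Hypothetical helper for illustration only; it is not part of fastai.
    """
    if num_it < 2:
        raise ValueError("num_it must be at least 2")
    ratio = end_lr / start_lr
    # Each step multiplies the learning rate by the same factor, so the
    # values are evenly spaced on a log scale.
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]


# Assumed example values: sweep from 1e-7 up to 10 over 100 mini-batches.
lrs = exp_lr_schedule(1e-7, 10, 100)
```

One learning rate from this list would be used per mini-batch, with the loss recorded after each one. Spacing the values evenly on a log scale is what makes a log-scale plot of the results readable.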

## Choosing a good learning rate

For a more intuitive explanation, see [Sylvain Gugger's post](https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html).

In [None]:
data = URLs.get_mnist()
def simple_learner(): return Learner(data, simple_cnn((3,16,16,2)), metrics=[accuracy])
learn = simple_learner()

First we run this command to launch the search:

In [None]:
show_doc(Learner.lr_find)

In [None]:
learn.lr_find()

Then we plot the loss against the learning rate. We're interested in finding a good order of magnitude for the learning rate, so the learning rate axis uses a log scale.

In [None]:
learn.recorder.plot()

Then we choose a value about an order of magnitude below the minimum. The learning rate at the minimum is on the edge of diverging, so it is too high. A value an order of magnitude lower is still aggressive (for quicker training) but much less likely to explode. In this example, 1e-1 is a good choice.
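
The rule of thumb above can be written as a small helper. This is a minimal sketch: `pick_lr` is a hypothetical function, not a fastai API, and the `lrs` and `losses` values are made up to resemble a typical finder curve rather than taken from a real run.

```python
def pick_lr(lrs, losses, factor=10.0):
    """Return the learning rate `factor` times smaller than the one with the lowest loss.

    Hypothetical helper for illustration only; it is not part of fastai.
    """
    if not lrs or len(lrs) != len(losses):
        raise ValueError("lrs and losses must be non-empty and of equal length")
    # Index of the lowest recorded loss.
    best = min(range(len(losses)), key=losses.__getitem__)
    return lrs[best] / factor


# Made-up values: the loss bottoms out at lr=1e0, then blows up.
lrs = [1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1]
losses = [2.30, 2.28, 2.10, 1.20, 0.90, 7.50]
chosen = pick_lr(lrs, losses)
```

With these made-up numbers the helper returns a value of about 1e-1, one order of magnitude below the learning rate at the minimum. On a real, noisy curve it is worth checking the plot by eye rather than relying on the raw minimum alone.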

Let's start training with this optimal value:

In [None]:
simple_learner().fit_one_cycle(2, 1e-1)

Picking the minimum isn't a good idea because training will diverge.

In [None]:
learn = simple_learner()  # fresh, untrained learner
learn.fit_one_cycle(2, 1e-0)

Picking a value too far below the minimum isn't optimal either, because training is too slow.

In [None]:
learn = simple_learner()  # fresh, untrained learner
learn.fit_one_cycle(2, 1e-2)
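
The three regimes above (a good rate, a rate that is too high, a rate that is too low) can be reproduced on a toy problem without any deep learning library. The sketch below runs plain gradient descent on f(w) = w², which is an assumption made purely for illustration; the specific learning rates are only meaningful for this quadratic and do not carry over to the MNIST example.

```python
def final_loss(lr, steps=20, w=1.0):
    """Run `steps` gradient descent updates on f(w) = w**2 and return the final loss."""
    for _ in range(steps):
        grad = 2.0 * w       # derivative of w**2
        w = w - lr * grad    # plain gradient descent update
    return w * w


loss_too_high = final_loss(1.1)    # each step overshoots: |w| grows, training diverges
loss_good = final_loss(0.4)        # large but stable steps: fast convergence
loss_too_low = final_loss(0.001)   # stable but tiny steps: barely any progress
```

Starting from a loss of 1.0, the too-high rate ends far above the starting loss, the good rate ends very close to zero, and the too-low rate ends only slightly below where it started.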

In [None]:
show_doc(LRFinder)

In [None]:
show_doc(LRFinder.on_train_end)

In [None]:
show_doc(LRFinder.on_batch_end)

In [None]:
show_doc(LRFinder.on_train_begin)

In [None]:
show_doc(LRFinder.on_epoch_end)