In [None]:
#|hide
#|skip
#! [ -e /content ] && pip install -Uqq fastai  # upgrade fastai on colab

In [1]:
from fastai.text.all import *

In [None]:
#|all_slow

# Transfer learning in text

> How to fine-tune a language model and train a classifier

In this tutorial, we will see how we can train a model to classify text (here, based on its sentiment). First we will see how to do this quickly in a few lines of code, then how to get state-of-the-art results using the approach of the [ULMFiT paper](https://arxiv.org/abs/1801.06146).

We will use the IMDb dataset from the paper [Learning Word Vectors for Sentiment Analysis](https://ai.stanford.edu/~amaas/data/sentiment/), which contains 50,000 labeled movie reviews (25,000 for training and 25,000 for testing).

## Train a text classifier from a pretrained model

We will try to train a classifier using a pretrained model, a bit like we do in the [vision tutorial](http://docs.fast.ai/tutorial.vision). To get our data ready, we will first use the high-level API:

### Using the high-level API

We can download the data and decompress it with the following command:

In [2]:
path = untar_data(URLs.IMDB)
path.ls()

(#8) [Path('C:/Users/ethan/.fastai/data/imdb/imdb.vocab'),Path('C:/Users/ethan/.fastai/data/imdb/models'),Path('C:/Users/ethan/.fastai/data/imdb/README'),Path('C:/Users/ethan/.fastai/data/imdb/test'),Path('C:/Users/ethan/.fastai/data/imdb/tmp_clas'),Path('C:/Users/ethan/.fastai/data/imdb/tmp_lm'),Path('C:/Users/ethan/.fastai/data/imdb/train'),Path('C:/Users/ethan/.fastai/data/imdb/unsup')]

In [3]:
(path/'train').ls()

(#4) [Path('C:/Users/ethan/.fastai/data/imdb/train/labeledBow.feat'),Path('C:/Users/ethan/.fastai/data/imdb/train/neg'),Path('C:/Users/ethan/.fastai/data/imdb/train/pos'),Path('C:/Users/ethan/.fastai/data/imdb/train/unsupBow.feat')]

The data follows an ImageNet-style organization: in the `train` folder, we have two subfolders, `pos` and `neg` (for positive and negative reviews). We can gather the data with the `TextDataLoaders.from_folder` method. The only thing we need to specify is the name of the validation folder, which is "test" (and not the default "valid").

In [10]:
dls = TextDataLoaders.from_folder(path, valid='test')

Due to IPython and Windows limitation, python multiprocessing isn't available now.
So `number_workers` is changed to 0 to avoid getting stuck



We can then have a look at the data with the `show_batch` method:

In [11]:
dls.show_batch()



,text,category
0,"xxbos xxmaj match 1 : xxmaj tag xxmaj team xxmaj table xxmaj match xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley vs xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley started things off with a xxmaj tag xxmaj team xxmaj table xxmaj match against xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit . xxmaj according to the rules of the match , both opponents have to go through tables in order to get the win . xxmaj benoit and xxmaj guerrero heated up early on by taking turns hammering first xxmaj spike and then xxmaj bubba xxmaj ray . a xxmaj german xxunk by xxmaj benoit to xxmaj bubba took the wind out of the xxmaj dudley brother . xxmaj spike tried to help his brother , but the referee restrained him while xxmaj benoit and xxmaj guerrero",pos
1,"xxbos xxmaj berlin - born in 1942 xxmaj margarethe von xxmaj trotta was an actress and now she is a very important director and writer . xxmaj she has been described , perhaps even unfairly caricatured , as a director whose commitment to bringing a woman 's sensibility to the screen outweighs her artistic strengths . "" rosenstrasse , "" which has garnered mixed and even strange reviews ( the xxmaj new xxmaj york xxmaj times article was one of the most negatively aggressive reviews xxmaj i 've ever read in that paper ) is not a perfect film . xxmaj it is a fine movie and a testament to a rare xxunk of successful opposition to the genocidal xxmaj nazi regime by , of all peoples , generically powerless xxmaj germans demonstrating in a xxmaj berlin street . \n\n xxmaj co - writer von xxmaj trotta uses the actual",pos
2,"xxbos xxmaj prior to this release , xxmaj neil labute had this to say about the 1973 original : "" it 's surprising how many people say it 's their favorite soundtrack . xxmaj i 'm like , come on ! xxmaj you may not like the new one , but if that 's your favorite soundtrack , i do n't know if i * want * you to like my film . "" \n\n xxmaj neil , a word . xxmaj you might want to sit down for this too ; as xxmaj lord xxmaj xxunk says , shocks are so much better absorbed with the knees bent . xxmaj see , xxmaj neil , the thing about the original , is that xxmaj paul xxmaj giovanni 's soundtrack is one of the most celebrated things about it . xxmaj the filmmakers themselves consider it a virtual musical .",neg
3,"xxbos xxmaj how strange the human mind is ; this center of activity wherein perceptions of reality are formed and stored , and in which one 's view of the world hinges on the finely tuned functioning of the brain , this most delicate and intricate processor of all things sensory . xxmaj and how much do we really know of it 's inner - workings , of it 's depth or capacity ? xxmaj what is it in the mind that allows us to discern between reality and a dream ? xxmaj or can we ? xxmaj perhaps our sense of reality is no more than an impression of what we actually see , like looking at a painting by xxmaj monet , in which the vanilla sky of his vision becomes our reality . xxmaj it 's a concept visited by filmmaker xxmaj cameron xxmaj crowe in his",pos
4,"xxbos xxup spoilers xxup herein \n\n xxmaj my xxmaj high xxmaj school did all they could to try and motivate us for exams . xxmaj but the most memorable method they used to get us into the right state of mind was a guest speaker , who was none other than xxmaj australian xxmaj kickboxing 's favorite son , xxmaj stan "" the xxmaj man "" xxmaj xxunk . xxmaj the first mistake they made was giving this guy a microphone , because he was screaming half the time despite us sitting no more than 3 or 4 feet away from him . xxmaj now , his speech was full of the usual "" if you fail to prepare , then prepare to fail "" stuff , but there were various instances where i got really worked up . xxmaj the guy stood there in front of us preaching how",neg
5,"xxbos xxmaj beat a path to this important documentary that looks like an attractive feature . xxmaj forbidden xxmaj xxunk ) is simply a better ( cinematic ) version of xxmaj norma xxmaj khouri 's book xxmaj forbidden xxmaj love , and xxup that was a best - seller . xxmaj an onion - peeling of literary fraud and of a pretty woman , xxmaj xxunk is the very best in xxunk reality xxup tv . \n\n xxmaj cleverly edited and colourful , xxmaj broinowski 's storytelling is xxunk by moving silhouettes of xxmaj norma xxmaj khouri meaningfully blowing smoke . i disagree ( with xxmaj variety ) that it 's overlong ; instead my one slight problem was with the episodic nature of its key players commenting on others ' just - recorded testimonials . xxmaj on a single watching your sense of narrative becomes mired … .. so",pos
6,"xxbos * xxmaj some spoilers * \n\n xxmaj this movie is sometimes subtitled "" life xxmaj everlasting . "" xxmaj that 's often taken as reference to the final scene , but more accurately describes how dead and buried this once - estimable series is after this sloppy and illogical send - off . \n\n xxmaj there 's a "" hey kids , let 's put on a show air "" about this telemovie , which can be endearing in spots . xxmaj some fans will feel like insiders as they enjoy picking out all the various cameo appearances . xxmaj co - writer , co - producer xxmaj tom xxmaj fontana and his pals pack the goings - on with friends and favorites from other shows , as well as real xxmaj baltimore personages . \n\n xxmaj that 's on top of the returns of virtually all the members",neg
7,"xxbos i saw this movie during a xxmaj tolkien - themed xxmaj interim class during my sophomore year of college . i was seated unfortunately close to the screen and my professor chose me to serve as a whipping boy- everyone else was laughing , but they were n't within constant eyesight . \n\n xxmaj let 's get it out of the way : the xxmaj peter xxmaj jackson ' lord of the xxmaj rings ' films do owe something to the xxmaj bakshi film . xxmaj in xxmaj jackson 's version of xxmaj the xxmaj fellowship of the xxmaj ring , for instance , the scene in which the xxmaj black xxmaj riders assault the empty inn beds is almost a complete carbon copy of the scene in xxmaj bakshi 's film , shot by shot . xxmaj you could call this plagiarism or homage , depending on your",neg
8,"xxbos "" the xxmaj blob "" qualifies as a cult sci - fi film not only because it launched 27 - year old xxmaj steve mcqueen on a trajectory to superstardom , but also because it exploited the popular themes both of alien invasion and teenage delinquency that were inseparable in the 1950s . xxmaj interestingly , nobody in the xxmaj kay xxmaj xxunk & xxmaj theodore xxmaj simonson screenplay ever refers to the amorphous , scarlet - red protoplasm that plummeted to xxmaj earth in a meteor and menaced everybody in the small town of xxmaj xxunk xxmaj pennsylvania on a xxmaj friday night as "" the xxmaj blob . "" xxmaj steve mcqueen won the role of xxmaj josh xxmaj randall , the old xxmaj west bounty hunter in "" wanted : xxmaj dead or xxmaj alive , "" after producer xxmaj dick xxmaj powell saw this xxmaj",pos


We can see that the library automatically processed all the texts to split them into *tokens*, adding some special tokens like:

- `xxbos` to indicate the beginning of a text
- `xxmaj` to indicate the next word is capitalized
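To make the `xxmaj` rule concrete (along with the `xxup` marker visible in the batch above, as in `xxup spoilers`), here is a minimal plain-Python sketch. `mark_case` is a hypothetical helper written for illustration, not fastai's implementation; it assumes the text has already been split into word tokens and only approximates the library's case rules.

```python
def mark_case(tokens):
    """Hypothetical, simplified case marking in the spirit of fastai's rules."""
    out = ["xxbos"]  # beginning-of-text marker
    for tok in tokens:
        if len(tok) > 1 and tok.isupper():
            # whole word in capitals, e.g. "SPOILERS" -> xxup spoilers
            out += ["xxup", tok.lower()]
        elif len(tok) > 1 and tok[0].isupper() and tok[1:].islower():
            # capitalized word, e.g. "Blob" -> xxmaj blob
            out += ["xxmaj", tok.lower()]
        else:
            # everything else is just lowercased (e.g. "I", "McQueen")
            out.append(tok.lower())
    return out

print(mark_case("I saw THE Blob with Steve McQueen".split()))
# ['xxbos', 'i', 'saw', 'xxup', 'the', 'xxmaj', 'blob', 'with', 'xxmaj', 'steve', 'mcqueen']
```

Lowercasing every word keeps the vocabulary smaller, while the marker tokens preserve the capitalization information for the model.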

Then, we can define a `Learner` suitable for text classification in one line:

In [16]:
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)

We use the [AWD LSTM](https://arxiv.org/abs/1708.02182) architecture; `drop_mult` is a parameter that controls the magnitude of all dropouts in that model, and we use `accuracy` to track how well we are doing. We can then fine-tune our pretrained model:
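As a rough illustration of what `drop_mult` means, the sketch below scales every dropout probability in a configuration dictionary by the same factor. The key names are modeled on fastai's AWD-LSTM configuration, but the values and the `scale_dropouts` helper are illustrative assumptions, not fastai's exact defaults or code.

```python
# Illustrative configuration: the values are assumptions, not fastai's defaults.
BASE_CONFIG = {
    "emb_sz": 400,      # embedding size (not a dropout, so left untouched)
    "input_p": 0.4,
    "hidden_p": 0.3,
    "embed_p": 0.05,
    "weight_p": 0.5,
    "output_p": 0.4,
}

def scale_dropouts(config, drop_mult):
    """Multiply every dropout probability (keys ending in '_p') by drop_mult."""
    return {k: v * drop_mult if k.endswith("_p") else v for k, v in config.items()}

print(scale_dropouts(BASE_CONFIG, 0.5))
```

A single multiplier makes it easy to regularize more (values above 1) or less (values below 1) without tuning each dropout probability separately.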

In [None]:
learn.fine_tune(4, 1e-2)


Not too bad! To see how well our model is doing, we can use the `show_results` method:

In [None]:
learn.show_results()

And we can predict on new texts quite easily:

In [None]:
learn.predict("That honestly was the worst movie I've ever seen.")

The result has three parts: the predicted label (for this review, we would expect "neg"), the index of that label in our data vocabulary, and the probabilities attributed to each class.
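The three-part structure of the result can be mimicked in plain Python. The sketch below assumes the classifier outputs one raw score (logit) per class, turns the scores into probabilities with a softmax, and picks the most likely class. `decode_prediction` and the example scores are hypothetical, and the vocabulary order `["neg", "pos"]` assumes the labels are sorted alphabetically.

```python
import math

VOCAB = ["neg", "pos"]  # assumed label order (alphabetical)

def decode_prediction(logits):
    """Hypothetical helper: return (label, index, probabilities) from raw class scores."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return VOCAB[idx], idx, probs

label, idx, probs = decode_prediction([2.0, -1.0])  # made-up scores
print(label, idx, [round(p, 3) for p in probs])  # neg 0 [0.953, 0.047]
```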

Now it's your turn! Write your own mini movie review, or copy one from the Internet, and we can see what this model thinks about it. 

### Using the data block API

We can also use the data block API to get our data in a `DataLoaders`. This is a bit more advanced, so feel free to skip this part if you are not comfortable with learning new APIs just yet.

A datablock is built by giving the fastai library a bunch of information:

- the types used, through an argument called `blocks`: here we have texts and categories, so we pass `TextBlock` and `CategoryBlock`. To tell the library that our texts are files in a folder, we use the `from_folder` class method.
- how to get the raw items, here with the function `get_text_files`.
- how to label those items, here with the parent folder.
- how to split those items, here with the grandparent folder.

In [None]:
imdb = DataBlock(blocks=(TextBlock.from_folder(path), CategoryBlock),
                 get_items=get_text_files,
                 get_y=parent_label,
                 splitter=GrandparentSplitter(valid_name='test'))
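To see what `get_y=parent_label` and `GrandparentSplitter(valid_name='test')` do with each file, here is a plain-Python sketch on a few made-up paths. `label_from_parent` and `split_by_grandparent` are hypothetical stand-ins written for illustration, not fastai's source; they only assume the `train`/`test` folder layout shown earlier.

```python
from pathlib import PurePosixPath

def label_from_parent(path):
    """Label = name of the folder that directly contains the file."""
    return PurePosixPath(path).parent.name

def split_by_grandparent(paths, train_name="train", valid_name="test"):
    """Return (train_idxs, valid_idxs) based on the folder two levels up.

    Files under any other grandparent folder land in neither split."""
    train, valid = [], []
    for i, p in enumerate(paths):
        grandparent = PurePosixPath(p).parent.parent.name
        if grandparent == train_name:
            train.append(i)
        elif grandparent == valid_name:
            valid.append(i)
    return train, valid

files = [
    "imdb/train/pos/0_9.txt",  # hypothetical file names
    "imdb/train/neg/1_2.txt",
    "imdb/test/pos/5_10.txt",
    "imdb/unsup/7_0.txt",
]
print([label_from_parent(f) for f in files])  # ['pos', 'neg', 'pos', 'unsup']
print(split_by_grandparent(files))            # ([0, 1], [2])
```

In this sketch, the file under `unsup` belongs to neither split, so it would be used for neither training nor validation.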

This only gives a blueprint for how to assemble the data. To actually create it, we need to use the `dataloaders` method:

In [None]:
dls = imdb.dataloaders(path)


## The ULMFiT approach

The pretrained model we used in the previous section is called a language model. It was pretrained on Wikipedia on the task of guessing the next word, after reading all the words before. We got great results by directly fine-tuning this language model to a movie review classifier, but with one extra step, we can do even better: the Wikipedia English is slightly different from the IMDb English. So instead of jumping directly to the classifier, we could fine-tune our pretrained language model to the IMDb corpus and *then* use that as the base for our classifier.

Understanding the language model is useful in its own right: it helps to know the foundations of the models you are using. But there is also a very practical reason: you get even better results if you fine-tune the (sequence-based) language model before fine-tuning the classification model. For instance, in the IMDb sentiment analysis task, the dataset includes 50,000 additional movie reviews in the `unsup` folder that do not have any positive or negative labels attached. We can use all of these reviews to fine-tune the pretrained language model; this will result in a language model that is particularly good at predicting the next word of a movie review. In contrast, the pretrained model was trained only on Wikipedia articles.

The whole process is summarized by this picture:

![ULMFit process](https://github.com/fastai/fastai/blob/master/nbs/images/ulmfit.png?raw=1)

### Fine-tuning a language model on IMDb

We can get our texts in a `DataLoaders` suitable for language modeling very easily:

In [None]:
dls_lm = TextDataLoaders.from_folder(path, is_lm=True, valid_pct=0.1)

We need to pass something for `valid_pct`; otherwise, this method will try to split the data using the grandparent folder names. By passing `valid_pct=0.1`, we tell it to use a random 10% of the reviews for the validation set.
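The `valid_pct` split can be sketched as a seeded shuffle followed by a cut. This mirrors the general idea of a random splitter rather than fastai's exact code; `random_split` is a hypothetical helper.

```python
import random

def random_split(n_items, valid_pct=0.1, seed=None):
    """Shuffle item indices and send the first valid_pct of them to validation."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]  # (train indices, valid indices)

train_idxs, valid_idxs = random_split(100, valid_pct=0.1, seed=42)
print(len(train_idxs), len(valid_idxs))  # 90 10
```

Fixing the seed makes the split reproducible across runs.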

We can have a look at our data using `show_batch`. Here the task is to guess the next word, so the targets are the inputs shifted one word to the right.

In [None]:
dls_lm.show_batch(max_n=5)
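The shift between inputs and targets can be illustrated with a toy token list. `lm_pairs` is a hypothetical helper that cuts a single token stream into fixed-length chunks; fastai's language-model loader also concatenates many texts and spreads them across the batch, which this sketch leaves out.

```python
def lm_pairs(tokens, seq_len):
    """Split a token stream into (input, target) chunks; each target is its input shifted by one."""
    pairs = []
    for start in range(0, len(tokens) - 1, seq_len):
        x = tokens[start:start + seq_len]
        y = tokens[start + 1:start + 1 + seq_len]
        n = min(len(x), len(y))  # the last chunk may be one token short of a target
        pairs.append((x[:n], y[:n]))
    return pairs

tokens = "xxbos xxmaj this movie was great".split()
for x, y in lm_pairs(tokens, seq_len=3):
    print(x, "->", y)
# ['xxbos', 'xxmaj', 'this'] -> ['xxmaj', 'this', 'movie']
# ['movie', 'was'] -> ['was', 'great']
```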

Then we have a convenience method to directly grab a `Learner` from it, using the `AWD_LSTM` architecture like before. We use accuracy and perplexity as metrics (the latter is the exponential of the loss) and we set a default weight decay of 0.1. `to_fp16` puts the `Learner` in mixed precision, which helps speed up training on GPUs that have Tensor Cores.

In [None]:
learn = language_model_learner(dls_lm, AWD_LSTM, metrics=[accuracy, Perplexity()], path=path, wd=0.1).to_fp16()
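Since perplexity is the exponential of the cross-entropy loss, a small worked example helps build intuition. The sketch assumes we already have the probability the model gave to each correct next token (the numbers are made up); a model that spreads its probability evenly over four candidate words has a perplexity of 4.

```python
import math

def cross_entropy(target_probs):
    """Average negative log-probability assigned to the correct next tokens."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)

def perplexity(target_probs):
    return math.exp(cross_entropy(target_probs))

print(perplexity([0.25, 0.25, 0.25, 0.25]))  # ~4.0: like picking uniformly among 4 words
print(perplexity([0.9, 0.8, 0.95]))          # much lower: confident, correct predictions
```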

By default, a pretrained `Learner` is in a frozen state, meaning that only the head of the model will train while the body stays frozen. To show what happens behind the `fine_tune` method, we call `fit_one_cycle` directly here:

In [None]:
learn.fit_one_cycle(1, 1e-2)
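`fit_one_cycle` changes the learning rate during training: it warms up from a low value to the maximum learning rate, then anneals down to a much smaller one. The sketch below computes such a schedule with cosine segments. The defaults (`div=25`, `div_final=1e5`, `pct_start=0.25`) reflect our understanding of fastai's defaults and should be checked against its documentation; `one_cycle_lr` is a hypothetical helper.

```python
import math

def cos_anneal(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(pos, lr_max, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress pos in [0, 1]."""
    if pos < pct_start:
        # warm-up: lr_max/div -> lr_max
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    # annealing: lr_max -> lr_max/div_final
    return cos_anneal(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))

for pos in (0.0, 0.25, 0.5, 1.0):
    print(f"{pos:.2f}: {one_cycle_lr(pos, lr_max=1e-2):.2e}")
```

Starting low avoids large, destabilizing updates early on, and ending very low lets the weights settle at the end of training.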

This model takes a while to train, so it's a good opportunity to talk about saving intermediate results.

You can easily save the state of your model like so:

In [None]:
learn.save('1epoch')

It will create a file in `learn.path/models/` named "1epoch.pth". If you want to load your model on another machine after creating your `Learner` the same way, or resume training later, you can load the content of this file with:

In [None]:
learn = learn.load('1epoch')

We can then fine-tune the model after unfreezing:

In [None]:
learn.unfreeze()
learn.fit_one_cycle(10, 1e-3)

Once this is done, we save all of our model except the final layer that converts activations to probabilities of picking each token in our vocabulary. The model not including the final layer is called the *encoder*. We can save it with `save_encoder`:

In [None]:
learn.save_encoder('finetuned')

> Jargon: Encoder: The model not including the task-specific final layer(s). It means much the same thing as *body* when applied to vision CNNs, but tends to be used more for NLP and generative models.

Before using this to fine-tune a classifier on the reviews, we can use our model to generate random reviews: since it's trained to guess the next word of a sentence, we can use it to write new ones:

In [None]:
TEXT = "I liked this movie by Robert Downey Jr. because"
N_WORDS = 30
N_SENTENCES = 1
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75) 
         for _ in range(N_SENTENCES)]

In [None]:
print("\n".join(preds))
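The `temperature` argument controls how adventurous the generated text is. A common way to apply it, sketched below under that assumption (not necessarily fastai's exact code), is to raise each next-word probability to the power `1/temperature` and renormalize: values below 1 sharpen the distribution toward the likeliest words, values above 1 flatten it. `apply_temperature`, `sample_next`, and the toy vocabulary are hypothetical.

```python
import random

def apply_temperature(probs, temperature):
    """Reshape a probability distribution: p**(1/T), then renormalize."""
    scaled = [p ** (1 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

def sample_next(vocab, probs, temperature=0.75, rng=random):
    """Draw one next token from the temperature-adjusted distribution."""
    weights = apply_temperature(probs, temperature)
    return rng.choices(vocab, weights=weights, k=1)[0]

vocab = ["great", "fun", "boring"]   # toy next-word candidates
probs = [0.5, 0.3, 0.2]              # made-up model probabilities
print(apply_temperature(probs, 0.75))  # the most likely word gains probability
print(sample_next(vocab, probs, rng=random.Random(0)))
```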


### Training a text classifier

We can gather our data for text classification almost exactly like before:

In [None]:
dls_clas = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test', text_vocab=dls_lm.vocab)

The main difference is that we have to use the exact same vocabulary as when we were fine-tuning our language model, or the weights learned won't make any sense. We pass that vocabulary with `text_vocab`.
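Why the vocabulary must match: each token is turned into an index, and that index selects a row of the embedding matrix learned by the language model. The toy sketch below (hypothetical vocabularies and `numericalize` helper) shows that the same words get different indices under a differently ordered vocabulary, so they would look up the wrong embeddings.

```python
def numericalize(tokens, vocab, unk="xxunk"):
    """Map tokens to integer ids; unknown tokens map to the id of `unk`."""
    stoi = {tok: i for i, tok in enumerate(vocab)}
    return [stoi.get(tok, stoi[unk]) for tok in tokens]

lm_vocab    = ["xxunk", "xxpad", "xxbos", "xxmaj", "the", "movie", "great"]
other_vocab = ["xxunk", "xxpad", "xxbos", "xxmaj", "movie", "the", "great"]

tokens = ["xxbos", "the", "movie", "rocks"]
print(numericalize(tokens, lm_vocab))     # [2, 4, 5, 0]
print(numericalize(tokens, other_vocab))  # [2, 5, 4, 0]
```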

Then we can define our text classifier like before:

In [None]:
learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5, metrics=accuracy)

The difference is that before training it, we load the previous encoder:

In [None]:
learn = learn.load_encoder('finetuned')

The last step is to train with discriminative learning rates and *gradual unfreezing*. In computer vision, we often unfreeze the model all at once, but for NLP classifiers, we find that unfreezing a few layers at a time makes a real difference.

In [None]:
learn.fit_one_cycle(1, 2e-2)

In just one epoch we get the same result as our training in the first section. Not too bad! We can pass `-2` to `freeze_to` to freeze all except the last two parameter groups:

In [None]:
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-2/(2.6**4),1e-2))
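Two things happen in the cell above: `freeze_to(-2)` leaves only the last two parameter groups trainable, and `slice(1e-2/(2.6**4), 1e-2)` asks for learning rates spread from the first group to the last. The sketch below assumes the classifier has five parameter groups (an embedding group, three LSTM layers, and the head; the names are illustrative) and that the rates are spaced geometrically, using a hypothetical `geometric_lrs` helper. Under those assumptions, dividing by `2.6**4` works out to a factor of 2.6 between consecutive groups, the ratio suggested in the ULMFiT paper.

```python
# Illustrative group names; assumes five parameter groups, ordered input -> output.
GROUPS = ["embedding", "lstm_1", "lstm_2", "lstm_3", "head"]

def geometric_lrs(start, stop, n):
    """n learning rates spaced geometrically from start to stop."""
    if n == 1:
        return [stop]
    step = (stop / start) ** (1 / (n - 1))
    return [start * step**i for i in range(n)]

def trainable_after_freeze_to(groups, n):
    """Groups left trainable by freeze_to(n): everything from index n onward."""
    return groups[n:]

lrs = geometric_lrs(1e-2 / 2.6**4, 1e-2, len(GROUPS))
for name, lr in zip(GROUPS, lrs):
    print(f"{name:>9}: {lr:.2e}")
print("trainable:", trainable_after_freeze_to(GROUPS, -2))  # ['lstm_3', 'head']
```

Lower layers, which capture more general language features, get smaller learning rates so that fine-tuning changes them less.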

Then we can unfreeze a bit more, and continue training:

In [None]:
learn.freeze_to(-3)
learn.fit_one_cycle(1, slice(5e-3/(2.6**4),5e-3))

And finally, the whole model!

In [None]:
learn.unfreeze()
learn.fit_one_cycle(2, slice(1e-3/(2.6**4),1e-3))