In [None]:
from fastai.text.all import *
from nbdev.showdoc import show_doc

# ULMFiT

## Finetune a pretrained Language Model

First we get our data and tokenize it.

In [None]:
path = untar_data(URLs.IMDB)

In [None]:
texts = get_files(path, extensions=['.txt'], folders=['unsup', 'train', 'test'])
len(texts)

100000

Then we put it in a `Datasets`. For a language model, we don't have targets, so there is only one pipeline of transforms, which tokenizes and then numericalizes the texts. Note that `tokenize_df` returns the count of the words in the corpus to make it easy to create a vocabulary.

In [None]:
def read_file(f): return L(f.read_text().split(' '))

In [None]:
splits = RandomSplitter(valid_pct=0.1)(texts)
tfms = [Tokenizer.from_folder(path), Numericalize()]
dsets = Datasets(texts, [tfms], splits=splits, dl_type=LMDataLoader)
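
To make the numericalization step concrete, here is a minimal pure-Python sketch of what building a vocabulary from word counts and mapping tokens to ids looks like. The helper names `build_vocab` and `numericalize`, the `min_freq` value and the short list of special tokens are assumptions made for the illustration; this is not the `Numericalize` implementation.

```python
from collections import Counter

def build_vocab(token_lists, min_freq=2, specials=("xxunk", "xxpad", "xxbos")):
    """Return the vocabulary: special tokens first, then corpus tokens by descending count."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    kept = [tok for tok, c in counts.most_common()
            if c >= min_freq and tok not in specials]
    return list(specials) + kept

def numericalize(tokens, vocab):
    """Map tokens to integer ids; tokens outside the vocabulary become 'xxunk'."""
    stoi = {tok: i for i, tok in enumerate(vocab)}
    unk = stoi["xxunk"]
    return [stoi.get(tok, unk) for tok in tokens]

# Toy corpus (hypothetical), already tokenized.
corpus = [
    ["xxbos", "i", "saw", "this", "movie"],
    ["xxbos", "i", "liked", "this", "movie"],
]
vocab = build_vocab(corpus)
ids = numericalize(["xxbos", "i", "hated", "this", "movie"], vocab)
print(vocab)
print(ids)
```

Words seen fewer than `min_freq` times, and words never seen at all, share the id of `xxunk`, which is why that token shows up in the batches.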

Then we use that `Datasets` to create a `DataLoaders`. Here the class of `TfmdDL` we need to use is `LMDataLoader`, which concatenates all the texts in a source (with a shuffle at each epoch for the training set), splits the result into `bs` chunks, then reads continuously through them.

In [None]:
bs,sl=256,80
dbunch_lm = dsets.dataloaders(bs=bs, seq_len=sl, val_bs=bs)
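
The batching scheme described above can be sketched in plain Python. This is a simplified illustration under the assumption that the corpus is already one flat list of token ids; `lm_batches` is a hypothetical helper, not the `LMDataLoader` code, and it leaves out the per-epoch shuffle.

```python
def lm_batches(token_ids, bs, seq_len):
    """Yield (x, y) batches where y is x shifted one token to the right."""
    # Each of the `bs` rows reads its own contiguous stream. One spare token
    # is kept at the end so that the last input still has a target.
    stream_len = (len(token_ids) - 1) // bs
    for start in range(0, stream_len, seq_len):
        width = min(seq_len, stream_len - start)
        xs, ys = [], []
        for row in range(bs):
            lo = row * stream_len + start
            xs.append(token_ids[lo:lo + width])
            ys.append(token_ids[lo + 1:lo + width + 1])
        yield xs, ys

stream = list(range(21))  # stand-in for the concatenated, numericalized corpus
batches = list(lm_batches(stream, bs=2, seq_len=5))
first_x, first_y = batches[0]
print(first_x)  # inputs of the first batch
print(first_y)  # targets: the same windows moved one token forward
```

Row `i` of each batch continues where row `i` of the previous batch stopped, which lets a recurrent model carry its hidden state from one batch to the next.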

In [None]:
dbunch_lm.show_batch()

,text,text_
0,"xxbos i saw this before ' bubba ho - tep ' at the fantasia film festival in montreal . everything about it is either tipping the hat to ( or completely ripping off ) tim burton . i enjoyed it nonetheless , even if it is extremely derivative . what most impressed me was the quality of the visuals given the obvious shoe - string budget . the set design and the props were inventive and original , although the","i saw this before ' bubba ho - tep ' at the fantasia film festival in montreal . everything about it is either tipping the hat to ( or completely ripping off ) tim burton . i enjoyed it nonetheless , even if it is extremely derivative . what most impressed me was the quality of the visuals given the obvious shoe - string budget . the set design and the props were inventive and original , although the script"
1,"climax 25 minutes of sloppy wrap - up with a character and her dad that we do n't give a crap about anyway … xxunk line , xxup save xxup yourself … xxunk xxup from xxup this xxup movie xxrep 3 ! xxbos i never get tired of watching this movie . i am a die - hard chick - flick fan : fluff all the way , all that meaningless dime - a - dozen stuff . xxmaj but","25 minutes of sloppy wrap - up with a character and her dad that we do n't give a crap about anyway … xxunk line , xxup save xxup yourself … xxunk xxup from xxup this xxup movie xxrep 3 ! xxbos i never get tired of watching this movie . i am a die - hard chick - flick fan : fluff all the way , all that meaningless dime - a - dozen stuff . xxmaj but this"
2,"not hard for me to imagine a dedicated xxmaj mr . xxmaj xxunk teaching kids in xxmaj sunday xxmaj school about the good book . xxmaj nor is it hard to understand why they might picture xxunk 's guards in double - breasted suits like the gangsters in the news of their youth , or relating any number of other scenes to what was familiar to them . \n\n xxmaj connelly was not trying to convert viewers to religion …","hard for me to imagine a dedicated xxmaj mr . xxmaj xxunk teaching kids in xxmaj sunday xxmaj school about the good book . xxmaj nor is it hard to understand why they might picture xxunk 's guards in double - breasted suits like the gangsters in the news of their youth , or relating any number of other scenes to what was familiar to them . \n\n xxmaj connelly was not trying to convert viewers to religion … he"
3,"get better , but it never did . xxmaj from the start to the end , it was one big cliché , extremely predictable with not one surprise in the entire film . xxmaj from the over the top ridiculous boyfriend of xxmaj dawn and the wedding in the pie shop , the wife of the doctor being in the delivery room . i even found the scene where the husband finds the money and wants her to tell him","better , but it never did . xxmaj from the start to the end , it was one big cliché , extremely predictable with not one surprise in the entire film . xxmaj from the over the top ridiculous boyfriend of xxmaj dawn and the wedding in the pie shop , the wife of the doctor being in the delivery room . i even found the scene where the husband finds the money and wants her to tell him it"
4,"up and sings a happy tune , but his mother comes in and tells him to shut up again and gives him a dope slap that leaves a dent in his forehead . i mention this commercial , because it was considered funny , and i did n't hear any objections to it while i was there . xxmaj there is a lot more bloodshed and physical cruelty on screen in "" the xxmaj great xxmaj yokai xxmaj war ""","and sings a happy tune , but his mother comes in and tells him to shut up again and gives him a dope slap that leaves a dent in his forehead . i mention this commercial , because it was considered funny , and i did n't hear any objections to it while i was there . xxmaj there is a lot more bloodshed and physical cruelty on screen in "" the xxmaj great xxmaj yokai xxmaj war "" than"
5,"this film portray our countries xxmaj special xxmaj forces . xxmaj gomer xxmaj pile could have probably survived longer than the "" spec xxmaj ops "" soldiers in this film . xxmaj for crying out loud they should have called them the xxmaj special xxmaj education xxmaj forces instead . xxmaj if you are going to write a script where you send in an elite team to deal with an outbreak of zombies , at least have the soldiers be","film portray our countries xxmaj special xxmaj forces . xxmaj gomer xxmaj pile could have probably survived longer than the "" spec xxmaj ops "" soldiers in this film . xxmaj for crying out loud they should have called them the xxmaj special xxmaj education xxmaj forces instead . xxmaj if you are going to write a script where you send in an elite team to deal with an outbreak of zombies , at least have the soldiers be smarter"
6,"of high school ( they have to show at least 10 doors in the high school labeled "" debate xxmaj club , "" "" german xxmaj club , "" etc . ) and they tend to make fun of things like teen pregnancy and teen sex , which really has xxup nothing to do with making fun of horror films . xxmaj to say the least , i probably laughed once or twice through the entire 90 minutes , and","high school ( they have to show at least 10 doors in the high school labeled "" debate xxmaj club , "" "" german xxmaj club , "" etc . ) and they tend to make fun of things like teen pregnancy and teen sex , which really has xxup nothing to do with making fun of horror films . xxmaj to say the least , i probably laughed once or twice through the entire 90 minutes , and that"
7,"the xxmaj secret xxmaj xxunk . \n\n xxmaj however , xxmaj i 'm a little perplexed about how people have perceived her diary and of her as a person , seeing her as a little saint or having a message of hope for the world . i do n't think that was the original intention of her diary . xxmaj she wrote it mainly for herself , even though she did make some rigorous rewrites before the occupants of the","xxmaj secret xxmaj xxunk . \n\n xxmaj however , xxmaj i 'm a little perplexed about how people have perceived her diary and of her as a person , seeing her as a little saint or having a message of hope for the world . i do n't think that was the original intention of her diary . xxmaj she wrote it mainly for herself , even though she did make some rigorous rewrites before the occupants of the xxmaj"
8,"lied . xxmaj other facts are brought to light that , finally , result in xxmaj dillon 's release . xxmaj the killer is never found , though the movie gives us a thorough xxunk as a plausible perp . \n\n xxmaj this is a weeper from beginning to end . xxmaj nothing seems to go right for the couple . xxmaj oh , there are a few happy moment , maybe a party where everyone is glad to be",". xxmaj other facts are brought to light that , finally , result in xxmaj dillon 's release . xxmaj the killer is never found , though the movie gives us a thorough xxunk as a plausible perp . \n\n xxmaj this is a weeper from beginning to end . xxmaj nothing seems to go right for the couple . xxmaj oh , there are a few happy moment , maybe a party where everyone is glad to be together"
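
The batch above contains special tokens such as `xxbos`, `xxmaj`, `xxup` and `xxrep`. The sketch below shows, in a much simplified form, the kind of rule that produces them: mark the start of a text, replace capitalization with a marker followed by the lowercased word, and compress repeated characters. The function names and the whitespace splitting are assumptions made for the illustration; the actual tokenizer applies more rules and uses a real word tokenizer.

```python
import re

def mark_repeats(text):
    """Replace a character repeated 3+ times with ' xxrep <count> <char> '."""
    return re.sub(r"(\S)\1{2,}",
                  lambda m: f" xxrep {len(m.group(0))} {m.group(1)} ", text)

def mark_case(text):
    """Lowercase every word, keeping the casing information as a marker token."""
    out = []
    for word in text.split():
        if len(word) > 1 and word.isupper():
            out += ["xxup", word.lower()]   # word written in all caps
        elif word[0].isupper():
            out += ["xxmaj", word.lower()]  # word starting with a capital
        else:
            out.append(word)
    return out

def preprocess(text):
    """Prefix the text with xxbos, then apply the two rules above."""
    return ["xxbos"] + mark_case(mark_repeats(text))

tokens = preprocess("SAVE yourself from This movie!!!")
print(tokens)
```

Encoding case and repetition as separate tokens keeps the vocabulary small, since `Movie`, `MOVIE` and `movie` all map to the same word id.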


Then we have a convenience method to directly grab a `Learner` from it, using the `AWD_LSTM` architecture.

In [None]:
opt_func = partial(Adam, wd=0.1)
learn = language_model_learner(dbunch_lm, AWD_LSTM, opt_func=opt_func, metrics=[accuracy, Perplexity()], path=path)
learn = learn.to_fp16(clip=0.1)

In [None]:
learn.fit_one_cycle(1, 2e-2, moms=(0.8,0.7,0.8))

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.426135,3.984901,0.292371,53.779987,07:00


In [None]:
learn.save('stage1')

In [None]:
learn.load('stage1');

In [None]:
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3, moms=(0.8,0.7,0.8))

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.163227,3.870354,0.30684,47.959347,07:24
1,4.055693,3.790802,0.316436,44.291908,07:41
2,3.979279,3.729021,0.323357,41.638317,07:22
3,3.919654,3.688891,0.32777,40.000469,07:22
4,3.889432,3.660633,0.330762,38.885933,07:22
5,3.842923,3.637397,0.333315,37.992798,07:26
6,3.813823,3.619074,0.335308,37.303013,07:25
7,3.793213,3.60801,0.336566,36.892574,07:20
8,3.766456,3.60214,0.337257,36.676647,07:22
9,3.759768,3.600955,0.33745,36.633202,07:23
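
The `perplexity` column in the tables above is the exponential of the validation loss (the mean cross-entropy, in nats). A quick check with the reported numbers, where `perplexity` is a small helper written for this example:

```python
import math

def perplexity(cross_entropy_loss):
    """Perplexity is the exponential of the mean cross-entropy loss."""
    return math.exp(cross_entropy_loss)

ppl_first = perplexity(3.984901)  # valid_loss after the first, frozen epoch
ppl_last = perplexity(3.600955)   # valid_loss after the last unfrozen epoch
print(round(ppl_first, 2), round(ppl_last, 2))
```

Both values agree with the `perplexity` column up to the rounding of the printed losses.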


Once we have fine-tuned the pretrained language model to this corpus, we save the encoder since we will use it for the classifier.

In [None]:
learn.save_encoder('finetuned1')

## Use it to train a classifier

In [None]:
texts = get_files(path, extensions=['.txt'], folders=['train', 'test'])

In [None]:
splits = GrandparentSplitter(valid_name='test')(texts)

For classification, we need to use two sets of transforms: one to numericalize the texts and the other to encode the labels as categories.

In [None]:
x_tfms = [Tokenizer.from_folder(path), Numericalize(vocab=dbunch_lm.vocab)]
dsets = Datasets(texts, [x_tfms, [parent_label, Categorize()]], splits=splits, dl_type=SortedDL)
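
Both the split and the labels come from the folder layout. The sketch below reproduces the idea with plain `pathlib`, assuming files are laid out as `<split>/<label>/<file>.txt`. The helper names and the example file names are hypothetical; the helpers only mimic what `GrandparentSplitter`, `parent_label` and `Categorize` do here.

```python
from pathlib import PurePosixPath

def label_from_parent(path):
    """The label is the name of the folder that contains the file."""
    return PurePosixPath(path).parent.name

def split_by_grandparent(paths, train_name="train", valid_name="test"):
    """Return (train indices, validation indices) based on the grandparent folder."""
    names = [PurePosixPath(p).parent.parent.name for p in paths]
    train = [i for i, n in enumerate(names) if n == train_name]
    valid = [i for i, n in enumerate(names) if n == valid_name]
    return train, valid

def categorize(labels):
    """Encode string labels as integers, using the sorted unique labels as vocabulary."""
    vocab = sorted(set(labels))
    o2i = {label: i for i, label in enumerate(vocab)}
    return vocab, [o2i[label] for label in labels]

# Hypothetical file names following the train/test and pos/neg layout.
paths = ["imdb/train/pos/a.txt", "imdb/train/neg/b.txt", "imdb/test/pos/c.txt"]
train_idx, valid_idx = split_by_grandparent(paths)
label_vocab, codes = categorize([label_from_parent(p) for p in paths])
print(train_idx, valid_idx, label_vocab, codes)
```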

In [None]:
bs = 64

In [None]:
dls = dsets.dataloaders(before_batch=pad_input_chunk, bs=bs)

In [None]:
dls.show_batch(max_n=2)

,text,category
0,"xxbos * * attention xxmaj spoilers * * \n\n xxmaj first of all , let me say that xxmaj rob xxmaj roy is one of the best films of the 90 's . xxmaj it was an amazing achievement for all those involved , especially the acting of xxmaj liam xxmaj neeson , xxmaj jessica xxmaj lange , xxmaj john xxmaj hurt , xxmaj brian xxmaj cox , and xxmaj tim xxmaj roth . xxmaj michael xxmaj canton xxmaj jones painted a wonderful portrait of the honor and dishonor that men can represent in themselves . xxmaj but alas … \n\n it constantly , and unfairly gets compared to "" braveheart "" . xxmaj these are two entirely different films , probably only similar in the fact that they are both about xxmaj scots in historical xxmaj scotland . xxmaj yet , this comparison frequently bothers me because it seems",pos
1,"xxbos xxmaj by now you 've probably heard a bit about the new xxmaj disney dub of xxmaj miyazaki 's classic film , xxmaj laputa : xxmaj castle xxmaj in xxmaj the xxmaj sky . xxmaj during late summer of 1998 , xxmaj disney released "" kiki 's xxmaj delivery xxmaj service "" on video which included a preview of the xxmaj laputa dub saying it was due out in "" 1 xxrep 3 9 "" . xxmaj it 's obviously way past that year now , but the dub has been finally completed . xxmaj and it 's not "" laputa : xxmaj castle xxmaj in xxmaj the xxmaj sky "" , just "" castle xxmaj in xxmaj the xxmaj sky "" for the dub , since xxmaj laputa is not such a nice word in xxmaj spanish ( even though they use the word xxmaj laputa many times",pos


Then we once again have a convenience function to create a classifier from this `DataLoaders` with the `AWD_LSTM` architecture.

In [None]:
opt_func = partial(Adam, wd=0.1)
learn = text_classifier_learner(dls, AWD_LSTM, metrics=[accuracy], path=path, drop_mult=0.5, opt_func=opt_func)

We load our pretrained encoder.

In [None]:
learn = learn.load_encoder('finetuned1')
learn = learn.to_fp16(clip=0.1)

Then we can train with gradual unfreezing and differential learning rates.

In [None]:
lr = 1e-1 * bs/128

In [None]:
learn.fit_one_cycle(1, lr, moms=(0.8,0.7,0.8), wd=0.1)

epoch,train_loss,valid_loss,accuracy,time
0,0.328318,0.20065,0.92212,01:08


In [None]:
learn.freeze_to(-2)
lr /= 2
learn.fit_one_cycle(1, slice(lr/(2.6**4),lr), moms=(0.8,0.7,0.8), wd=0.1)

epoch,train_loss,valid_loss,accuracy,time
0,0.20812,0.166004,0.93744,01:15
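
The `slice(lr/(2.6**4), lr)` passed above gives the earliest layers the smallest learning rate and the classifier head the largest. The sketch below shows how such a range can be spread over the parameter groups, assuming five groups and a geometric (constant-ratio) spacing, which is why the lower bound divides by `2.6**4`. `geometric_lrs` is a hypothetical helper written for this illustration.

```python
def geometric_lrs(start, stop, n):
    """Return n values from start to stop, each a constant multiple of the previous one."""
    if n == 1:
        return [stop]
    step = (stop / start) ** (1 / (n - 1))
    return [start * step ** i for i in range(n)]

bs = 64
lr = 1e-1 * bs / 128  # 0.05, used for the first epoch with only the head trainable
lr /= 2               # 0.025, used after freeze_to(-2)

# One learning rate per parameter group, from the embeddings to the head
# (five groups is an assumption of this sketch).
group_lrs = geometric_lrs(lr / 2.6 ** 4, lr, 5)
print([f"{v:.6f}" for v in group_lrs])
```

Each group trains 2.6 times faster than the one below it, so the pretrained lower layers move slowly while the new head adapts quickly.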


In [None]:
learn.freeze_to(-3)
lr /= 2
learn.fit_one_cycle(1, slice(lr/(2.6**4),lr), moms=(0.8,0.7,0.8), wd=0.1)

epoch,train_loss,valid_loss,accuracy,time
0,0.162498,0.154959,0.9424,01:35


In [None]:
learn.unfreeze()
lr /= 5
learn.fit_one_cycle(2, slice(lr/(2.6**4),lr), moms=(0.8,0.7,0.8), wd=0.1)

epoch,train_loss,valid_loss,accuracy,time
0,0.1338,0.163456,0.94056,01:34
1,0.095326,0.154301,0.94512,01:34
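
The four training calls above follow a gradual unfreezing schedule. The sketch below lists, for each stage, which parameter groups train and which maximum learning rate is used. It assumes five parameter groups and that the freshly created learner starts with only its last group trainable; `trainable_mask` is a hypothetical helper, not part of the library.

```python
def trainable_mask(n_groups, first_trainable):
    """Mark the groups from index `first_trainable` on as trainable (negative indices allowed)."""
    start = range(n_groups)[first_trainable]  # resolve a negative index
    return [i >= start for i in range(n_groups)]

n_groups = 5  # assumption: embeddings, three recurrent layers, classifier head
lr = 1e-1 * 64 / 128

# (first trainable group, divisor applied to lr before the stage)
stages = [(-1, 1), (-2, 2), (-3, 2), (0, 5)]
schedule = []
for first_trainable, divisor in stages:
    lr /= divisor
    schedule.append((trainable_mask(n_groups, first_trainable), lr))

for mask, stage_lr in schedule:
    print(sum(mask), "trainable groups, max lr", round(stage_lr, 6))
```

The learning rate shrinks as more pretrained layers are released, so the language-model weights are only adjusted gently once the head has settled.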
