In [1]:
from fastai2.text.all import *

In [2]:
from nbdev.showdoc import *

In [3]:
# all_slow

# Transfer learning in text

> How to fine-tune a language model and train a classifier

## Fine-tune a pretrained language model

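First we download the IMDB sample dataset and read its texts into a `DataFrame`. Tokenization is applied in the next step, through the transforms.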

In [4]:
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')

Then we put it in a `DataSource`. For a language model, we don't have targets, so there is only one pipeline of transforms, which grabs the text column, tokenizes it and numericalizes the tokens.

In [5]:
splits = ColSplitter()(df)
tfms = [attrgetter("text"), Tokenizer.from_df("text"), Numericalize()]
dsrc = DataSource(df, [tfms], splits=splits, dl_type=LMDataLoader)
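
To make the splitting step concrete, here is a minimal pure-Python sketch of what a column-based splitter does. It assumes that a boolean `is_valid` column marks the validation rows (the same column name the factory method uses through `valid_col='is_valid'`). The helper `col_split` and the toy rows are hypothetical and are not fastai2's implementation.

```python
def col_split(rows, col="is_valid"):
    """Return (train_indices, valid_indices) based on a boolean column.

    Illustrative stand-in for a column splitter; `rows` is a list of dicts.
    """
    train_idx = [i for i, row in enumerate(rows) if not row[col]]
    valid_idx = [i for i, row in enumerate(rows) if row[col]]
    return train_idx, valid_idx


# Toy data standing in for the rows of texts.csv (made up for illustration).
rows = [
    {"text": "great film", "is_valid": False},
    {"text": "dull plot", "is_valid": True},
    {"text": "loved it", "is_valid": False},
]
train_idx, valid_idx = col_split(rows)
```

The two index lists play the role of `splits`: the first selects the training rows, the second the validation rows.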

Then we use that `DataSource` to create a `DataBunch`. Here the class of `TfmdDL` we need to use is `LMDataLoader`, which concatenates all the texts in a source (with a shuffle at each epoch for the training set), splits the result into `bs` chunks, then reads continuously through them.

In [6]:
dbunch = dsrc.databunch(bs=64, seq_len=72, after_batch=Cuda)
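
The batching scheme described above can be sketched in plain Python. This is only an illustration of the idea (concatenate, cut into `bs` streams, read windows of `seq_len`, with the target being the input shifted by one token); the function `lm_batches` is hypothetical, it skips the per-epoch shuffle, and it is not how `LMDataLoader` is actually implemented.

```python
def lm_batches(texts, bs, seq_len):
    """Build (input, target) batches for language modeling from token-id lists."""
    # 1. Concatenate every text into one long stream of token ids.
    stream = [tok for text in texts for tok in text]
    # 2. Cut the stream into `bs` contiguous chunks; each chunk keeps one extra
    #    token so that every input position has a next-token target.
    chunk_len = (len(stream) - 1) // bs
    rows = [stream[i * chunk_len:(i + 1) * chunk_len + 1] for i in range(bs)]
    # 3. Read through the chunks in windows of `seq_len` tokens.
    batches = []
    for start in range(0, chunk_len, seq_len):
        end = min(start + seq_len, chunk_len)
        x = [row[start:end] for row in rows]
        y = [row[start + 1:end + 1] for row in rows]  # input shifted by one
        batches.append((x, y))
    return batches


# Three toy "texts" that are already numericalized (ids are made up).
texts = [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10, 11, 12, 13]]
batches = lm_batches(texts, bs=2, seq_len=3)
first_x, first_y = batches[0]
```

In the `show_batch` output of the notebook, the `text_` column is the `text` column shifted by one token, which is the same input/target relationship as `first_x` and `first_y` here.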

Or more simply with a factory method:

In [7]:
dbunch = TextDataBunch.from_df(df, text_col='text', is_lm=True, valid_col='is_valid')

In [8]:
dbunch.show_batch(max_n=2)

Unnamed: 0,text,text_
0,"xxbos xxmaj this is a bad b movie xxunk as a mockumentary . xxmaj the porn documentary filmmaker in the movie has almost as much screen time and dialog as any other character . xxmaj that completely destroyed any "" documentary feel "" that they may have wanted to create . \n\n xxmaj the fact that the film is not actually a mockumentary is the least of it 's problems . xxmaj","xxmaj this is a bad b movie xxunk as a mockumentary . xxmaj the porn documentary filmmaker in the movie has almost as much screen time and dialog as any other character . xxmaj that completely destroyed any "" documentary feel "" that they may have wanted to create . \n\n xxmaj the fact that the film is not actually a mockumentary is the least of it 's problems . xxmaj the"
1,"our biggest weapon . xxmaj well , if 150 xxmaj europeans can defeat 20 , xxrep 3 0 native warriors and 400 non - military xxmaj south xxmaj africans can defeat 10 , xxrep 3 0 xxmaj xxunk * without a single casualty * in either case , then i think you have to conclude that germs are irrelevant . xxmaj with or without germs , we were going to succeed .","biggest weapon . xxmaj well , if 150 xxmaj europeans can defeat 20 , xxrep 3 0 native warriors and 400 non - military xxmaj south xxmaj africans can defeat 10 , xxrep 3 0 xxmaj xxunk * without a single casualty * in either case , then i think you have to conclude that germs are irrelevant . xxmaj with or without germs , we were going to succeed . \n\n"


Then we have a convenience method to directly grab a `Learner` from it, using the `AWD_LSTM` architecture.

In [9]:
learn = language_model_learner(dbunch, AWD_LSTM, metrics=[accuracy, Perplexity()], path=path, opt_func=partial(Adam, wd=0.1)).to_fp16()

In [10]:
learn.freeze()
learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7,0.8))

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.549092,4.052681,0.273691,57.551571,00:05
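
The `perplexity` column is consistent with being the exponential of the validation loss. A quick check with the numbers from the table above (the helper name `perplexity_from_loss` is ours, not a fastai2 function):

```python
import math


def perplexity_from_loss(cross_entropy_loss):
    """Perplexity is exp of the average cross-entropy loss (in nats)."""
    return math.exp(cross_entropy_loss)


# valid_loss reported for epoch 0 in the table above.
ppl = perplexity_from_loss(4.052681)
```

The result agrees with the reported `57.551571` up to rounding of the printed loss.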


In [11]:
learn.unfreeze()
learn.fit_one_cycle(4, 1e-2, moms=(0.8,0.7,0.8))

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.300462,4.196602,0.255447,66.460136,00:07
1,4.138047,4.085733,0.267637,59.485519,00:07
2,3.807965,4.046178,0.274691,57.178524,00:07
3,3.423051,4.092277,0.272762,59.876049,00:07
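
The `moms=(0.8,0.7,0.8)` argument used in both calls gives the momentum at the start, at the peak of the learning rate, and at the end of the cycle. As a rough sketch, assuming cosine annealing between those three values and a warm-up that takes the first 25% of training (both are assumptions here, and `pct_start` is a name we picked for the illustration), the schedule looks like this:

```python
import math


def cos_anneal(start, end, pos):
    """Cosine interpolation from `start` (pos=0) to `end` (pos=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pos) + 1)


def one_cycle_momentum(pct, moms=(0.8, 0.7, 0.8), pct_start=0.25):
    """Momentum at training progress `pct` in [0, 1] (illustrative only)."""
    if pct < pct_start:
        # First phase: momentum goes down while the learning rate warms up.
        return cos_anneal(moms[0], moms[1], pct / pct_start)
    # Second phase: momentum goes back up while the learning rate decays.
    return cos_anneal(moms[1], moms[2], (pct - pct_start) / (1 - pct_start))


schedule = [round(one_cycle_momentum(p / 10), 4) for p in range(11)]
```

Momentum is lowest when the learning rate is highest, and returns to its starting value by the end of the cycle.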


Once we have fine-tuned the pretrained language model on this corpus, we can look at some of its predictions, then save the encoder, since we will reuse it for the classifier.

In [12]:
learn.show_results()

Unnamed: 0,input,target,pred
0,"xxbos xxmaj this is by far one of the worst movies i have ever seen , the poor special effects along with the poor acting are just a few of the things wrong with this film . i am fan of the first two major xxunk but this one is lame ! xxbos "" pet xxmaj xxunk "" is an adaptation from the xxmaj stephen xxmaj king novel of the same title","xxmaj this is by far one of the worst movies i have ever seen , the poor special effects along with the poor acting are just a few of the things wrong with this film . i am fan of the first two major xxunk but this one is lame ! xxbos "" pet xxmaj xxunk "" is an adaptation from the xxmaj stephen xxmaj king novel of the same title .","xxmaj this is a far the of the worst films i have ever seen . and only acting effects and the the acting acting are just plain bit of the worst that with this movie . xxmaj have not of xxmaj xxmaj two films movies , i one is not . xxmaj xxmaj xxunk xxmaj xxunk "" is a xxunk of the novel xxunk xxmaj king novel of the same name ."
1,"in the beginning . xxmaj no , they run the xxunk of fart jokes , xxunk jokes , xxunk , racism , dressing up as xxrep 3 xxunk … xxmaj this movie is flat out mean to anyone who 's ever played xxup d&d . \n\n xxmaj no wonder it looks like the xxmaj real xxup d&d would n't let them use their game . xxmaj who 'd want their name attached","the beginning . xxmaj no , they run the xxunk of fart jokes , xxunk jokes , xxunk , racism , dressing up as xxrep 3 xxunk … xxmaj this movie is flat out mean to anyone who 's ever played xxup d&d . \n\n xxmaj no wonder it looks like the xxmaj real xxup d&d would n't let them use their game . xxmaj who 'd want their name attached to","the middle of xxmaj the , no did out xxunk of the jokes and xxunk , , and , and , xxunk up as a 3 ! , xxmaj the is is a and of , me who knows ever been a tv . xxmaj xxmaj the , the 's like a xxmaj xxunk xxmaj tv . have have the go the name . xxmaj it knows like to name to to"
2,"must see for all fans of great noir film . xxrep 4 * xxrep 3 ! xxbos xxmaj the film was shot at xxmaj movie xxmaj xxunk , just off xxunk xxunk , near xxmaj lone xxmaj xxunk , xxmaj california , north of the road to xxmaj whitney xxmaj xxunk . xxmaj you can still find xxunk of xxunk and iron xxunk xxunk across the rocks where the sets were built","see for all fans of great noir film . xxrep 4 * xxrep 3 ! xxbos xxmaj the film was shot at xxmaj movie xxmaj xxunk , just off xxunk xxunk , near xxmaj lone xxmaj xxunk , xxmaj california , north of the road to xxmaj whitney xxmaj xxunk . xxmaj you can still find xxunk of xxunk and iron xxunk xxunk across the rocks where the sets were built .","be . a the of xxmaj xxmaj . . xxbos 3 * xxmaj 3 * xxbos xxmaj this story is shot in xxmaj xxunk xxmaj xxunk , xxmaj like xxmaj xxunk , in xxmaj xxunk xxmaj xxunk , xxmaj california . in of the road to xxmaj xxunk xxmaj xxunk . xxmaj the can see see a in the xxunk xxunk xxunk xxunk in the xxunk . the xxunk are built ."
3,"of xxunk as he still struggles to find his humanity . xxmaj this xxunk of his for a real life could get boring , and almost did in xxmaj supremacy , but just works better in xxmaj ultimatum ( better script ) . \n\n i am reminded of a scene in "" xxunk "" ( the only good xxmaj pierce xxmaj xxunk xxmaj bond film ) in which xxmaj sean xxmaj bean","xxunk as he still struggles to find his humanity . xxmaj this xxunk of his for a real life could get boring , and almost did in xxmaj supremacy , but just works better in xxmaj ultimatum ( better script ) . \n\n i am reminded of a scene in "" xxunk "" ( the only good xxmaj pierce xxmaj xxunk xxmaj bond film ) in which xxmaj sean xxmaj bean 's","the , well xxunk has to xxunk himself way . xxmaj he is of the life the while life is be a . but he xxunk n't the xxunk . but it did out . the xxunk . xxunk than ) . xxmaj xxmaj was not of xxmaj xxmaj in which xxunk "" ( which first "" movie bourne xxmaj xxunk film bond film ) , which xxmaj bourne xxmaj bean is"
4,", i can see what girls watching this might be watching . xxmaj and i loved that they had the courage to both let him hurt the younger sister ( most men would , most films would n't ) and get killed . \n\n 7 / 10 on my pretty harsh ratings scale . xxmaj for some reason i found xxmaj jason xxmaj london on a xxunk funny . xxbos xxmaj this","i can see what girls watching this might be watching . xxmaj and i loved that they had the courage to both let him hurt the younger sister ( most men would , most films would n't ) and get killed . \n\n 7 / 10 on my pretty harsh ratings scale . xxmaj for some reason i found xxmaj jason xxmaj london on a xxunk funny . xxbos xxmaj this is","xxmaj think not how i are this film be watching . xxmaj but i 'm the the were to opportunity to do xxunk each go his girl son and and of he have and of would have ) , that the . xxmaj xxmaj / 10 xxbos the list favorite ratings list . xxmaj the those reason i was it xxunk xxmaj xxunk to a xxunk xxunk , xxmaj xxmaj this is"
5,"of the xxunk movie . i heard it was canceled due to xxunk parents . i watched a lot of r rated stuff as a kid , so its a shame parents had to ruin it for everyone . 4 more movies came after the series , so it was n't a total loss . xxbos xxmaj xxunk xxmaj xxunk and xxmaj xxunk xxunk are terrific . i have seen xxmaj xxunk","the xxunk movie . i heard it was canceled due to xxunk parents . i watched a lot of r rated stuff as a kid , so its a shame parents had to ruin it for everyone . 4 more movies came after the series , so it was n't a total loss . xxbos xxmaj xxunk xxmaj xxunk and xxmaj xxunk xxunk are terrific . i have seen xxmaj xxunk on","the xxmaj xxmaj , xxmaj have the was first because to poor and . xxmaj was it lot of things rated movies and well child , but i a good . had to go it . me . xxmaj / of were from the movie . and i was a a total loss . xxmaj xxmaj this xxmaj xxunk is xxmaj xxunk xxmaj are the in xxmaj have seen many xxunk xxmaj"
6,"xxunk of bad decisions which xxunk the globe until it washed xxunk in xxmaj orlando on xxmaj october xxunk , 2004 . \n\n xxmaj the premise , and i use the word loosely , involves a house in xxmaj xxunk haunted by a xxunk xxmaj xxunk ghost who looks like a cross between xxmaj margaret xxmaj xxunk and xxmaj xxunk xxmaj xxunk , along with her xxunk sidekick a xxunk , xxunk","of bad decisions which xxunk the globe until it washed xxunk in xxmaj orlando on xxmaj october xxunk , 2004 . \n\n xxmaj the premise , and i use the word loosely , involves a house in xxmaj xxunk haunted by a xxunk xxmaj xxunk ghost who looks like a cross between xxmaj margaret xxmaj xxunk and xxmaj xxunk xxmaj xxunk , along with her xxunk sidekick a xxunk , xxunk but",", xxmaj art . are the xxunk . the comes up in the xxunk . xxmaj october xxunk . 2004 . xxmaj xxmaj the film of the the do the word "" , is a xxunk in xxmaj xxunk , by a xxunk xxunk xxunk xxmaj . is like a xxunk between xxmaj xxunk xxmaj xxunk and xxmaj xxunk xxmaj xxunk . and with a xxunk friend xxmaj xxunk xxmaj xxmaj ,"
7,"and i rented this movie because some people had drawn xxunk between it and "" office xxmaj space "" . xxmaj blockbuster and xxup imdb even had it as an "" also recommended "" selection if you liked "" office xxmaj space "" . \n\n xxmaj now , xxmaj i 've seen xxmaj office xxmaj space probably 15 or 20 times . i love it . xxmaj it 's probably one of","i rented this movie because some people had drawn xxunk between it and "" office xxmaj space "" . xxmaj blockbuster and xxup imdb even had it as an "" also recommended "" selection if you liked "" office xxmaj space "" . \n\n xxmaj now , xxmaj i 've seen xxmaj office xxmaj space probably 15 or 20 times . i love it . xxmaj it 's probably one of my","the have the movie because it people would n't it to the and the xxunk xxmaj space "" . xxmaj the and xxup dvd had had a on a "" xxunk "" "" . . you are the xxunk xxmaj space "" . xxmaj xxmaj the , i i 'm seen xxmaj office xxmaj space , the years so times . xxmaj have it . xxmaj it 's a the of the"
8,"on a sloppy rear - projected screen for the stupid chase scene -- which might just rank as one of the worst of its kind in film history . \n\n xxmaj for mind - xxunk zombie lovers of xxmaj laurel and xxmaj hardy , it 's probably a film they will love . xxmaj but , for lovers of the team who are willing to honestly evaluate this film relative to their","a sloppy rear - projected screen for the stupid chase scene -- which might just rank as one of the worst of its kind in film history . \n\n xxmaj for mind - xxunk zombie lovers of xxmaj laurel and xxmaj hardy , it 's probably a film they will love . xxmaj but , for lovers of the team who are willing to honestly evaluate this film relative to their amazing","the rainy xxunk - projected screen . the first xxunk scene . the is have be as one of the worst of the kind in history history . \n\n xxmaj the those - xxunk fans fans , xxmaj xxunk and xxmaj hardy , this 's a the must that have never . xxmaj it it it those of the team , are not to xxunk xxunk this film , to their previous"
9,". xxmaj kyle xxunk xxmaj xxunk xxmaj mitch xxunk xxmaj hudson ) are xxunk friends with different looks on life . xxmaj kyle is the xxunk son of an oil xxunk ; xxmaj mitch works for the xxmaj hadley xxmaj oil xxmaj company . xxmaj both fall in love with the same woman , xxmaj lucy xxmaj moore ; but it is xxmaj kyle that has the means to wow her off","xxmaj kyle xxunk xxmaj xxunk xxmaj mitch xxunk xxmaj hudson ) are xxunk friends with different looks on life . xxmaj kyle is the xxunk son of an oil xxunk ; xxmaj mitch works for the xxmaj hadley xxmaj oil xxmaj company . xxmaj both fall in love with the same woman , xxmaj lucy xxmaj moore ; but it is xxmaj kyle that has the means to wow her off her","xxmaj the and is xxunk , xxunk , , xxunk , , xxunk by and a xxunk . the . xxmaj xxunk is a xxunk of of the xxmaj man and xxmaj mitch is for the xxmaj xxunk xxmaj oil xxmaj company . xxmaj the xxmaj in love with xxmaj xxmaj man , xxmaj xxunk xxmaj xxunk , but the 's xxmaj mitch that is a opportunity to xxunk her off her"


In [13]:
learn.save_encoder('enc1')

## Use it to train a classifier

For classification, we need to use two sets of transforms: one to numericalize the texts and the other to encode the labels as categories. Note that we have to use the same vocabulary as the one used in fine-tuning the language model.

In [14]:
lm_vocab = dbunch.vocab
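
To see why the vocabulary has to be shared, here is a small pure-Python sketch of numericalization. The helpers `build_vocab` and `numericalize` and the special-token ordering are hypothetical, not fastai2's code. The point is that the fine-tuned encoder learned one embedding row per index of the language-model vocabulary, so the classifier must map each token to that same index.

```python
from collections import Counter


def build_vocab(tokenized_texts, specials=("xxunk", "xxpad", "xxbos")):
    """Special tokens first, then the remaining tokens by decreasing frequency."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    return list(specials) + [t for t, _ in counts.most_common() if t not in specials]


def numericalize(tokens, vocab):
    """Map tokens to indices; anything unknown becomes the index of xxunk."""
    stoi = {tok: i for i, tok in enumerate(vocab)}
    return [stoi.get(tok, stoi["xxunk"]) for tok in tokens]


# Toy corpora (made up): one for the language model, one for the classifier.
lm_texts = [["xxbos", "this", "movie", "is", "bad"],
            ["xxbos", "this", "movie", "is", "great"]]
clas_texts = [["xxbos", "great", "film"]]

toy_lm_vocab = build_vocab(lm_texts)
sample = ["xxbos", "this", "film", "is", "great"]

ids_shared = numericalize(sample, toy_lm_vocab)            # reuses the LM vocab
ids_rebuilt = numericalize(sample, build_vocab(clas_texts))  # vocab built from scratch
```

With the shared vocabulary, `"great"` keeps the index the encoder was trained with; with a rebuilt vocabulary the same token gets a different index and would look up an unrelated embedding.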

In [15]:
splits = ColSplitter()(df)
x_tfms = [attrgetter("text"), Tokenizer.from_df("text"), Numericalize(vocab=lm_vocab)]
dsrc = DataSource(df, splits=splits, tfms=[x_tfms, [attrgetter("label"), Categorize()]], dl_type=SortedDL)

We once again use a subclass of `TfmdDL` for the dataloaders, here `SortedDL`, since we want to sort the texts by length (sortish for the training set). We also use `pad_input` to create batches from texts of different lengths.

In [16]:
dbunch = dsrc.databunch(before_batch=pad_input, after_batch=Cuda)
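
The reason for sorting by length before padding can be shown with a few lines of plain Python. This is a sketch under stated assumptions: the helper names are ours, padding is added at the end, and `pad_idx=1` is an arbitrary choice for the example; the actual behavior of `pad_input` and `SortedDL` may differ in those details.

```python
def pad_batch(seqs, pad_idx=1):
    """Pad every sequence in a batch to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_idx] * (max_len - len(s)) for s in seqs]


def batches_in_order(seqs, order, bs, pad_idx=1):
    """Group sequences into padded batches following the given index order."""
    return [pad_batch([seqs[i] for i in order[k:k + bs]], pad_idx)
            for k in range(0, len(order), bs)]


def count_padding(batches, pad_idx=1):
    return sum(row.count(pad_idx) for batch in batches for row in batch)


# Toy numericalized texts of different lengths (ids chosen to avoid pad_idx).
seqs = [[5, 6], [7, 8, 9, 10], [11], [12, 13, 14]]

unsorted = batches_in_order(seqs, list(range(len(seqs))), bs=2)
by_length = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
sorted_batches = batches_in_order(seqs, by_length, bs=2)
```

Grouping texts of similar length means fewer padding tokens per batch, so less computation is wasted on padding.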

And there is a factory method, once again:

In [17]:
dbunch = TextDataBunch.from_df(df, text_col="text", text_vocab=lm_vocab, label_col='label', valid_col='is_valid', bs=32)

In [18]:
dbunch.show_batch(max_n=2, trunc_at=60)

Unnamed: 0,text,category
0,"xxbos xxmaj raising xxmaj victor xxmaj vargas : a xxmaj review \n\n xxmaj you know , xxmaj raising xxmaj victor xxmaj vargas is like sticking your hands into a big , xxunk bowl of xxunk . xxmaj it 's warm and gooey , but you 're not sure if it feels right . xxmaj try as i might , no",negative
1,"xxbos xxup the xxup shop xxup around xxup the xxup corner is one of the xxunk and most feel - good romantic comedies ever made . xxmaj there 's just no getting around that , and it 's hard to actually put one 's feeling for this film into words . xxmaj it 's not one of those films that",positive


Then we once again have a convenience function to create a classifier from this `DataBunch` with the `AWD_LSTM` architecture.

In [19]:
learn = text_classifier_learner(dbunch, AWD_LSTM, metrics=[accuracy], path=path, drop_mult=0.5)

In [20]:
learn = learn.load_encoder('enc1')

Then we can train in two stages: first with the learner as it comes out of the steps above, then after `unfreeze()` with discriminative learning rates passed as a `slice`.

In [21]:
learn.fit_one_cycle(4, moms=(0.8,0.7,0.8))

epoch,train_loss,valid_loss,accuracy,time
0,0.740185,0.641235,0.565,00:06
1,0.586413,0.497715,0.76,00:05
2,0.50931,0.462465,0.755,00:05
3,0.454171,0.46743,0.755,00:05


In [22]:
learn.unfreeze()
learn.opt = learn.create_opt()
learn.fit_one_cycle(8, slice(1e-5,1e-3), moms=(0.8,0.7,0.8))

epoch,train_loss,valid_loss,accuracy,time
0,0.432282,0.463266,0.75,00:10
1,0.40296,0.462794,0.8,00:10
2,0.368824,0.492125,0.785,00:10
3,0.33855,0.444738,0.82,00:11
4,0.305834,0.433889,0.825,00:11
5,0.269694,0.474207,0.81,00:11
6,0.244303,0.471722,0.825,00:09
7,0.226905,0.469794,0.815,00:10
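
Passing `slice(1e-5,1e-3)` asks for a different learning rate per layer group, from the lowest for the earliest layers to the highest for the head. A minimal sketch, assuming the rates are spaced geometrically between the two ends (our reading of how the slice is expanded) and a hypothetical model with three layer groups; the helper `geometric_lrs` is not a fastai2 function:

```python
def geometric_lrs(start, stop, n_groups):
    """Return n_groups learning rates from start to stop, evenly spaced on a log scale."""
    if n_groups == 1:
        return [stop]
    step = (stop / start) ** (1 / (n_groups - 1))
    return [start * step ** i for i in range(n_groups)]


# Three groups is an arbitrary choice; the real count depends on the architecture.
lrs = geometric_lrs(1e-5, 1e-3, 3)
```

With three groups this gives rates close to `1e-5`, `1e-4` and `1e-3`, so the pretrained early layers move much less than the newly added classifier head.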


In [23]:
learn.show_results(max_n=2, trunc_at=60)

Unnamed: 0,text,category,category_
0,"xxbos xxmaj raising xxmaj victor xxmaj vargas : a xxmaj review \n\n xxmaj you know , xxmaj raising xxmaj victor xxmaj vargas is like sticking your hands into a big , xxunk bowl of xxunk . xxmaj it 's warm and gooey , but you 're not sure if it feels right . xxmaj try as i might , no",negative,negative
1,"xxbos xxup the xxup shop xxup around xxup the xxup corner is one of the xxunk and most feel - good romantic comedies ever made . xxmaj there 's just no getting around that , and it 's hard to actually put one 's feeling for this film into words . xxmaj it 's not one of those films that",positive,positive


In [24]:
from fastai2.interpret import *

In [25]:
interp = Interpretation.from_learner(learn)

In [26]:
interp.plot_top_losses(6)

Unnamed: 0,input,target,predicted,probability,loss
0,"xxbos xxmaj i 'm gon na xxunk the xxunk here a bit and say i enjoyed this . xxmaj however , the cartoon is really only going to appeal to those who have very xxunk xxunk . xxmaj it 's definitely something that most people will not get , as is the nature of xxunk . \n\n the animation is horrible , but yes , that 's the point . xxmaj the main character is foul mouthed , violent , and stupid . no redeeming qualities whatsoever . his wife xxunk and xxunk , apparently just barely capable of the most basic xxunk skills . most of these stories completely lack any kind of point . \n\n but again , that 's the point xxunk \n\n xxmaj if non xxunk , foul language , and complete and utter xxunk are your thing , you 're going to love this .",positive,negative,0.9972332119941713,5.890044212341309
1,"xxbos xxmaj most italian horror lovers seem to hate this movie since because it has no connection to the first two xxmaj demons films . xxmaj and with the "" demons xxrep 3 i "" in the title , one would assume it would . xxmaj the problem is that this film was never intended to be part of the xxmaj demons series . xxmaj the distributors only a "" demons xxrep 3 i "" above its original title "" the xxmaj ogre "" to cash in on the other films popularity . xxmaj the new xxmaj american xxup dvd release of this picture has the title "" demons xxrep 3 i : xxmaj the xxmaj ogre "" on the box art but the film itself only says "" the xxmaj ogre "" . i do n't know if past releases had the title "" demons xxrep 3 i """,positive,negative,0.9962806105613708,5.594207763671875
2,"xxbos xxmaj while i count myself as a fan of the xxmaj xxunk 5 television series , the original movie that introduced the series was a weak start . xxmaj although many of the elements that would later mature and become much more compelling in the series are there , the pace of xxmaj the xxmaj gathering is slow , the makeup somewhat inadequate , and the plot confusing . xxmaj worse , the characterization in the premiere episode is poor . xxmaj although the ratings xxunk shows that many fans are willing to overlook these problems , i remember xxmaj the xxmaj gathering almost turned me off off what soon grew into a spectacular series .",negative,positive,0.985458254814148,4.230735778808594
3,"xxbos xxmaj weaker entry in the xxmaj xxunk xxmaj drummond series , with xxmaj john xxmaj howard in the role . xxmaj usual funny xxunk and antics , but not much plot . xxmaj barrymore gets something to do as the inspector , xxunk xxunk to follow xxmaj drummond , xxmaj algy , and xxmaj xxunk on a wild xxunk chase ( mostly in circles ; perhaps the budget was tighter than usual ) to rescue poor xxmaj xxunk , who is being held captive by people who want to lure xxmaj drummond to his doom . xxmaj for those keeping score , in this one , xxmaj drummond is planning to ask xxmaj xxunk to marry him and xxmaj algy is worried about missing the baby 's xxunk . xxmaj it 's fun to see xxmaj algy and xxmaj xxunk dressed up as xxunk to blend in at xxmaj",negative,positive,0.9851903319358826,4.2124786376953125
4,"xxbos xxmaj this movie is xxunk in a ' so bad it 's good ' kind of way . \n\n xxmaj the storyline is xxunk from so many other films of this kind , that xxmaj i 'm not going to even bother xxunk it . xxmaj it 's a sword / sorcery picture , has a kid hoping to realize how important he is in this world , has a "" xxunk "" xxunk , an evil xxunk / xxunk , a princess , a hairy creature … xxunk get the point . \n\n xxmaj the first time i caught this movie was during a very harsh winter . i do n't know why i decided to continue watching it for an extra five minutes before turning the channel , but when i caught site of xxmaj gulfax , i decided to stay and watch it until the end",positive,negative,0.9814806580543518,3.988937139511109
5,"xxbos xxmaj for anyone who may not know what a one - actor movie was like , this is the best example . xxmaj this plot is ridiculous , and really makes no sense . xxmaj it 's full of xxunk situations , hackneyed lines , melodrama , comedy … you name it ! \n\n xxmaj but xxmaj xxunk xxmaj xxunk can make anything convincing , and this movie is by no means an exception . xxmaj everyone turns in a decent performance - xxmaj xxunk xxmaj xxunk , xxmaj xxunk xxmaj xxunk , xxmaj xxunk , xxmaj om xxmaj xxunk , xxmaj xxunk xxmaj xxunk … xxmaj but it is the xxmaj xxunk who xxunk everyone with his xxunk presence . xxmaj without him , this movie would have been a non - xxunk … xxmaj the story is about xxunk / mistaken identities / misunderstandings / love /",positive,negative,0.9746053814888,3.6732177734375
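
The `probability` and `loss` columns above can be related with a short calculation. Assuming two classes, that `probability` is the probability given to the predicted class, and that the loss is the cross-entropy of the true class (assumptions that the reported numbers are consistent with), a wrong prediction made with probability `p` has loss `-log(1 - p)`:

```python
import math


def loss_of_wrong_prediction(p_predicted):
    """Cross-entropy of the true class when the other class got probability p_predicted."""
    return -math.log(1.0 - p_predicted)


# Probabilities taken from rows 0 and 2 of the table above.
loss_row0 = loss_of_wrong_prediction(0.9972332119941713)
loss_row2 = loss_of_wrong_prediction(0.985458254814148)
```

These come out close to the reported `5.890044` and `4.230736`, which is why the most confident mistakes sit at the top of the list.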
