In [None]:
from fastai2.text.all import *

In [None]:
from nbdev.showdoc import *

In [None]:
# all_slow

# Transfer learning in text

> How to fine-tune a language model and train a classifier

## Finetune a pretrained Language Model

First we download the IMDB sample and load it into a DataFrame. Tokenization happens in the next step, as part of the transforms.

In [None]:
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')

Then we put it in a `DataSource`. For a language model, we don't have targets, so there is only one pipeline of transforms, which tokenizes and numericalizes the texts.

In [None]:
splits = ColSplitter()(df)
tfms = [attrgetter("text"), Tokenizer.from_df("text"), Numericalize()]
dsrc = DataSource(df, [tfms], splits=splits, dl_type=LMDataLoader)

Then we use that `DataSource` to create a `DataBunch`. Here the subclass of `TfmdDL` we need is `LMDataLoader`, which concatenates all the texts in a source (with a shuffle at each epoch for the training set), splits the result into `bs` chunks, then reads continuously through them, `seq_len` tokens at a time.

In [None]:
dbunch = dsrc.databunch(bs=64, seq_len=72, after_batch=Cuda)

Or more simply with a factory method:

In [None]:
dbunch = TextDataBunch.from_df(df, text_col='text', is_lm=True, valid_col='is_valid')

In [None]:
dbunch.show_batch(max_n=2)

Unnamed: 0,text,text_
0,"xxbos xxmaj if you loved xxmaj long xxmaj way xxmaj round you will enjoy this nearly as much . xxmaj it is educational , funny , interesting and tense . xxmaj xxunk shares the screen with two interesting xxunk , two tired mechanics , two excellent xxunk and too much xxmaj russ . xxmaj ewan makes a few appearances but xxmaj xxunk really pulls it off alone . xxmaj he is funny","xxmaj if you loved xxmaj long xxmaj way xxmaj round you will enjoy this nearly as much . xxmaj it is educational , funny , interesting and tense . xxmaj xxunk shares the screen with two interesting xxunk , two tired mechanics , two excellent xxunk and too much xxmaj russ . xxmaj ewan makes a few appearances but xxmaj xxunk really pulls it off alone . xxmaj he is funny ,"
1,", and xxup then he finally dies . ) i guess it 's not going to be perfect , since it 's an independent movie , but it still could be better . xxmaj not worth watching , honestly , even for kids . xxmaj might as well watch something good , like xxmaj the xxmaj lion xxmaj king or xxmaj toy xxmaj story if you 're going to see anything you","and xxup then he finally dies . ) i guess it 's not going to be perfect , since it 's an independent movie , but it still could be better . xxmaj not worth watching , honestly , even for kids . xxmaj might as well watch something good , like xxmaj the xxmaj lion xxmaj king or xxmaj toy xxmaj story if you 're going to see anything you 'll"
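
In the batch above, the `text_` column is the `text` column shifted by one token: at every position the model has to predict the next token. The batching described earlier can be sketched in plain Python as follows. This is a simplified illustration, not the actual `LMDataLoader` code: the function name `lm_batches` and the toy stream are made up, and the real loader also handles shuffling and the last incomplete window.

```python
def lm_batches(token_ids, bs, seq_len):
    """Yield (inputs, targets) batches from one long stream of token ids."""
    # Each of the `bs` rows keeps one extra token so the last input has a target.
    chunk_len = (len(token_ids) - 1) // bs
    rows = [token_ids[i * chunk_len:(i + 1) * chunk_len + 1] for i in range(bs)]
    for start in range(0, chunk_len, seq_len):
        end = min(start + seq_len, chunk_len)
        xs = [row[start:end] for row in rows]
        ys = [row[start + 1:end + 1] for row in rows]  # same window, shifted by one
        yield xs, ys

# Stand-in for the concatenated, numericalized texts.
stream = list(range(21))
batches = list(lm_batches(stream, bs=2, seq_len=4))
first_x, first_y = batches[0]
second_x, _ = batches[1]
print(first_x, first_y)
```

Row `i` of each batch continues exactly where row `i` of the previous batch stopped, which is what lets a recurrent model carry its hidden state from one batch to the next.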


Then we have a convenience method to directly grab a `Learner` from it, using the `AWD_LSTM` architecture.

In [None]:
learn = language_model_learner(dbunch, AWD_LSTM, metrics=[accuracy, Perplexity()], path=path, opt_func=partial(Adam, wd=0.1)).to_fp16()

In [None]:
learn.freeze()
learn.fit_one_cycle(1, 1e-2)

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.55299,4.052593,0.27499,57.546467,00:09


In [None]:
learn.unfreeze()
learn.fit_one_cycle(4, 1e-2)

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.306663,4.1495,0.258906,63.402313,00:11
1,4.136341,4.089919,0.269491,59.735058,00:11
2,3.815114,4.048334,0.273331,57.301918,00:10
3,3.44184,4.085623,0.273609,59.47897,00:11
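
The `perplexity` column in these tables is the exponential of the validation loss (the average cross-entropy, in nats). A quick check with the values reported above, using a small helper whose name is ours:

```python
import math

def perplexity(cross_entropy_loss):
    """Perplexity is the exponential of the average cross-entropy loss."""
    return math.exp(cross_entropy_loss)

# valid_loss values copied from the two training tables above
valid_losses = [4.052593, 4.1495, 4.089919, 4.048334, 4.085623]
perplexities = [perplexity(loss) for loss in valid_losses]
print([round(p, 2) for p in perplexities])
```

Since the relationship is monotonic, the epoch with the lowest validation loss is also the one with the lowest perplexity.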


Once we have fine-tuned the pretrained language model on this corpus, we can look at a few predictions, then save the encoder, since we will reuse it for the classifier.

In [None]:
learn.show_results()

Unnamed: 0,input,target,pred
0,"xxbos xxmaj ah , a xxmaj kelly / xxmaj sinatra sailor - suit musical . xxmaj so familiar , right ? xxmaj yes , but this is n't the one you usually hear about . xxmaj on xxmaj the xxmaj town 's that - a - way . xxmaj but if you stick around , you might learn something . xxmaj okay , probably not . xxmaj anyway , xxmaj xxunk xxmaj","xxmaj ah , a xxmaj kelly / xxmaj sinatra sailor - suit musical . xxmaj so familiar , right ? xxmaj yes , but this is n't the one you usually hear about . xxmaj on xxmaj the xxmaj town 's that - a - way . xxmaj but if you stick around , you might learn something . xxmaj okay , probably not . xxmaj anyway , xxmaj xxunk xxmaj xxunk","xxmaj this , i xxmaj japanese xxmaj xxmaj kelly fan - xxunk who . xxmaj it , , that ? xxmaj and , i it is n't a best that 're hear about . xxmaj it the the xxmaj town 's xxmaj xxmaj a - way to xxmaj it it you want to the you 'll be that . xxmaj it , it because a xxmaj but , i i xxmaj xxunk"
1,"only reason you may not like it is because it is set in the future where xxmaj xxunk has gone to hell . that and you my not like it cause the future they show could very well happen . xxbos xxmaj stephen xxmaj king was raised on flicks like this . xxunk xxup not films . \n\n xxmaj movies like this and ' jeepers xxmaj creepers ' are "" xxunk ""","reason you may not like it is because it is set in the future where xxmaj xxunk has gone to hell . that and you my not like it cause the future they show could very well happen . xxbos xxmaj stephen xxmaj king was raised on flicks like this . xxunk xxup not films . \n\n xxmaj movies like this and ' jeepers xxmaj creepers ' are "" xxunk "" to","one why can not like this . because it is a in the xxmaj . it xxunk xxmaj been to work . xxmaj the when can xxunk - the . the movie to will it be well happen . xxmaj xxmaj this xxmaj king is a by a like this . xxmaj xxmaj xxunk xxup like xxmaj xxmaj the like this are xxmaj xxunk xxmaj xxunk ' are so xxunk "" ."
2,"one truly understand the xxunk of this film . xxmaj personally i enjoy the narrator for his intelligent , no subject left xxunk , style of narration . xxmaj the introduction grips you right away , and holds you at the edge of your seat throughout the film . xxmaj he provides wonderful insight into the world of the xxunk and allows the audience to really ' connect ' with internal horror","truly understand the xxunk of this film . xxmaj personally i enjoy the narrator for his intelligent , no subject left xxunk , style of narration . xxmaj the introduction grips you right away , and holds you at the edge of your seat throughout the film . xxmaj he provides wonderful insight into the world of the xxunk and allows the audience to really ' connect ' with internal horror this","of loved the xxunk of the film . xxmaj it , was the xxunk 's his xxunk and xxunk - , xxunk , and of writing , xxmaj the xxunk of the out away from and the you in the right of your seat . the entire . xxmaj the is us insight into the world of the xxmaj and the the viewer to xxunk xxunk xxunk ' with the xxunk ."
3,"xxmaj paperhouse yet available in the xxup u.s . ( only in xxmaj europe ) , here 's hoping one of my wishes will come true as i truly cherish this beautiful film and a xxup dvd of it would be very welcome ! \n\n xxmaj it 's satisfying watching the girl work out her thoughts like a xxunk game trying to make the dream world work for her and her xxunk","paperhouse yet available in the xxup u.s . ( only in xxmaj europe ) , here 's hoping one of my wishes will come true as i truly cherish this beautiful film and a xxup dvd of it would be very welcome ! \n\n xxmaj it 's satisfying watching the girl work out her thoughts like a xxunk game trying to make the dream world work for her and her xxunk friend","xxunk . . on xxmaj xxup u.s . . and the the europe ) . and 's where that might the friends will be out . i am have the film film . it new dvd of it . be worth welcome . xxbos xxmaj the 's a to this film in out the xxunk on this xxunk , . to make a girl world feel . her . her xxunk xxunk"
4,"when i first saw this short , i was really laughing so hard , that like with a lot of other films that i have seen , no sound came out ! xxmaj curly is really great at "" singing "" opera in this one , i am surprised that he did not consider a career as a professional singer , because he was really good ! \n\n xxmaj if you noticed","i first saw this short , i was really laughing so hard , that like with a lot of other films that i have seen , no sound came out ! xxmaj curly is really great at "" singing "" opera in this one , i am surprised that he did not consider a career as a professional singer , because he was really good ! \n\n xxmaj if you noticed ,","xxmaj saw saw the movie , i was surprised surprised . hard . that i xxmaj the lot of other movies , were have seen , this one was out . xxmaj the was a a as this xxunk "" this , this movie . and was surprised that he did n't have himself career as a professional actor . because he was a good at xxmaj xxmaj the you are the"
5,"but not in an amusingly inept way , simply incredibly tedious . xxmaj this footage has clearly been knocked together quickly and without any effort . xxmaj it serves as a framing device for the endless clips from the first ( and possibly second ) movies . xxmaj and boy , do they milk those clips from the earlier films ; sometimes xxunk sequences over and over again . xxmaj the only","not in an amusingly inept way , simply incredibly tedious . xxmaj this footage has clearly been knocked together quickly and without any effort . xxmaj it serves as a framing device for the endless clips from the first ( and possibly second ) movies . xxmaj and boy , do they milk those clips from the earlier films ; sometimes xxunk sequences over and over again . xxmaj the only new","not as the attempt inept way . and because xxunk . xxmaj the is is a been xxunk together by and without any regard . xxmaj the 's as a xxunk device for the movie xxunk and the movie and and second second ) movie . xxmaj the the , i n't get the films from the first films . they they the are and over again . xxmaj the xxunk redeeming"
6,"he has to go to the theater . xxmaj that xxmaj april xxunk must have been very busy for xxmaj xxunk - in "" xxunk xxmaj city "" he xxunk a pardon to xxmaj errol xxmaj flynn at the xxunk of xxmaj xxunk xxmaj hopkins on the same date . \n\n xxmaj actually , while xxmaj lincoln was concerned about the xxmaj west , his immediate thoughts on the last day of","has to go to the theater . xxmaj that xxmaj april xxunk must have been very busy for xxmaj xxunk - in "" xxunk xxmaj city "" he xxunk a pardon to xxmaj errol xxmaj flynn at the xxunk of xxmaj xxunk xxmaj hopkins on the same date . \n\n xxmaj actually , while xxmaj lincoln was concerned about the xxmaj west , his immediate thoughts on the last day of his","'s been xxunk to a xxunk and xxmaj he 's xxunk is is have been a xxunk with the xxunk xxmaj xxunk the xxunk "" park "" , was to xxunk to xxmaj xxunk xxmaj flynn . the same of xxmaj xxunk xxmaj xxunk . the xxmaj date . xxmaj xxmaj the , the xxmaj xxunk 's still with the xxmaj west , he xxunk reaction on the xxmaj day of the"
7,"point to this film . \n\n xxmaj for content that 's supposed to be so ' xxunk ' and ' controversial ' the things that xxmaj xxunk xxunk to the students are awfully lame . xxmaj students seem to be easily xxunk by xxunk anti - xxunk sentiments and xxunk of words xxunk xxunk onto xxunk . xxmaj rebel , everybody . \n\n i suppose it would have been too much to","to this film . \n\n xxmaj for content that 's supposed to be so ' xxunk ' and ' controversial ' the things that xxmaj xxunk xxunk to the students are awfully lame . xxmaj students seem to be easily xxunk by xxunk anti - xxunk sentiments and xxunk of words xxunk xxunk onto xxunk . xxmaj rebel , everybody . \n\n i suppose it would have been too much to ask","of the : . xxmaj xxmaj the those that is not to be a bad xxunk ' , ' xxunk ' , film that xxmaj xxunk xxunk to the film are xxunk xxunk . xxmaj the are to be xxunk xxunk by the xxunk - xxunk xxunk and xxunk xxunk xxunk . xxunk . the . xxmaj the xxunk xxunk , xxmaj xxmaj think that 's have been a bad to be"
8,"best films of the 70 's . i love the type of humor in this film , it just makes me laugh so hard . \n\n i got this movie on xxup vhs 3 days ago ( yes , xxup vhs because it was cheaper - only $ 3 ) . i watched it as soon as i got home , but i had to watch it again because i kept missing","films of the 70 's . i love the type of humor in this film , it just makes me laugh so hard . \n\n i got this movie on xxup vhs 3 days ago ( yes , xxup vhs because it was cheaper - only $ 3 ) . i watched it as soon as i got home , but i had to watch it again because i kept missing a","of of the year 's . xxmaj have the xxunk of movie that this film , but 's makes me laugh so hard . xxmaj xxmaj was the movie because xxup vhs , days ago , i , i vhs , it was xxup to only $ 1 ) , xxmaj was it with well as i was to , and i was to watch it because because i was watching the"
9,"themselves into their respective roles . \n\n xxmaj music , which was used so powerfully in xxup bsg , also plays a significant role in xxmaj caprica . xxmaj battlestar 's powerful rolling drums and mournful xxunk served it 's themes very well . xxmaj caprica uses a more xxunk sound , which gives the show it 's own feeling quite distinct from either of it 's predecessors . \n\n xxmaj the","into their respective roles . \n\n xxmaj music , which was used so powerfully in xxup bsg , also plays a significant role in xxmaj caprica . xxmaj battlestar 's powerful rolling drums and mournful xxunk served it 's themes very well . xxmaj caprica uses a more xxunk sound , which gives the show it 's own feeling quite distinct from either of it 's predecessors . \n\n xxmaj the new",". a own roles . xxmaj xxmaj the is music is used in well in the wwii , is plays a major role in the caprica . xxmaj it 's xxunk , drums and xxunk xxunk xxunk as 's xxunk . well . xxmaj it is a more xxunk atmosphere effects which makes the show a 's own feel . xxunk from the of the 's predecessors . xxmaj xxmaj the overall"


In [None]:
learn.save_encoder('enc1')

## Use it to train a classifier

For classification, we need two sets of transforms: one to numericalize the texts and the other to encode the labels as categories. Note that we have to use the same vocabulary as the one used when fine-tuning the language model.

In [None]:
lm_vocab = dbunch.vocab
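
To see why the vocabulary has to be shared, here is a minimal sketch of numericalization. The toy vocabularies and the `numericalize` function are made up for illustration and are not the fastai implementation. The encoder's embedding rows are indexed by the ids of the language model vocabulary, so a different token-to-id mapping would make the classifier look up the wrong rows.

```python
UNK = "xxunk"

def numericalize(tokens, vocab):
    """Map tokens to ids; tokens missing from the vocab get the id of xxunk."""
    stoi = {tok: i for i, tok in enumerate(vocab)}
    unk_idx = stoi[UNK]
    return [stoi.get(tok, unk_idx) for tok in tokens]

# Two toy vocabularies with the same tokens in a different order (hypothetical).
toy_lm_vocab = ["xxunk", "xxpad", "xxbos", "this", "movie", "was", "good"]
toy_other_vocab = ["xxunk", "xxpad", "xxbos", "good", "was", "movie", "this"]

tokens = ["xxbos", "this", "movie", "was", "great"]
ids_lm = numericalize(tokens, toy_lm_vocab)
ids_other = numericalize(tokens, toy_other_vocab)
print(ids_lm, ids_other)
```

The same sentence gets different ids under the two vocabularies, and the out-of-vocabulary token `great` falls back to `xxunk` in both.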

In [None]:
splits = ColSplitter()(df)
x_tfms = [attrgetter("text"), Tokenizer.from_df("text"), Numericalize(vocab=lm_vocab)]
dsrc = DataSource(df, splits=splits, tfms=[x_tfms, [attrgetter("label"), Categorize()]], dl_type=SortedDL)

We once again use a subclass of `TfmdDL` for the dataloaders, since we want to sort the texts by length (sortish for the training set). We also use `pad_input` to create batches from texts of different lengths.

In [None]:
dbunch = dsrc.databunch(before_batch=pad_input, after_batch=Cuda)
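
The sorting and padding described above can be sketched in plain Python. This is a simplified stand-in for what `SortedDL` and `pad_input` do, not their actual code: the helper names are ours, and the padding index of 1 is an assumption about where `xxpad` sits in the vocabulary.

```python
PAD_IDX = 1  # assumption: index of the padding token in the vocab

def pad_batch(samples, pad_idx=PAD_IDX):
    """Pad every text in a batch to the length of the longest one."""
    max_len = max(len(ids) for ids, _ in samples)
    return [(ids + [pad_idx] * (max_len - len(ids)), label) for ids, label in samples]

def sorted_batches(samples, bs):
    """Sort samples by text length (longest first), then cut and pad batches."""
    ordered = sorted(samples, key=lambda s: len(s[0]), reverse=True)
    return [pad_batch(ordered[i:i + bs]) for i in range(0, len(ordered), bs)]

# (token ids, label) pairs of different lengths
samples = [([2, 5, 6], 0), ([2, 7, 8, 9, 10, 11], 1), ([2, 4], 1), ([2, 3, 4, 5, 6], 0)]
batches = sorted_batches(samples, bs=2)
n_pad = sum(ids.count(PAD_IDX) for batch in batches for ids, _ in batch)
print(batches, n_pad)
```

Grouping texts of similar length keeps the amount of padding low: here only two padding tokens are needed, whereas batching the samples in their original order would need six.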

And there is a factory method, once again:

In [None]:
dbunch = TextDataBunch.from_df(df, text_col="text", text_vocab=lm_vocab, label_col='label', valid_col='is_valid', bs=32)

In [None]:
dbunch.show_batch(max_n=2, trunc_at=60)

Unnamed: 0,text,category
0,"xxbos xxmaj raising xxmaj victor xxmaj vargas : a xxmaj review \n\n xxmaj you know , xxmaj raising xxmaj victor xxmaj vargas is like sticking your hands into a big , xxunk bowl of xxunk . xxmaj it 's warm and gooey , but you 're not sure if it feels right . xxmaj try as i might , no",negative
1,"xxbos xxup the xxup shop xxup around xxup the xxup corner is one of the xxunk and most feel - good romantic comedies ever made . xxmaj there 's just no getting around that , and it 's hard to actually put one 's feeling for this film into words . xxmaj it 's not one of those films that",positive


Then we once again have a convenience function to create a classifier from this `DataBunch` with the `AWD_LSTM` architecture.

In [None]:
learn = text_classifier_learner(dbunch, AWD_LSTM, metrics=[accuracy], path=path, drop_mult=0.5)

In [None]:
learn = learn.load_encoder('enc1')

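Then we can train with a simple form of gradual unfreezing (first with the pretrained encoder frozen, then with the whole model unfrozen) and discriminative learning rates, where earlier layers get smaller learning rates than the head.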

In [None]:
learn.fit_one_cycle(4)

epoch,train_loss,valid_loss,accuracy,time
0,0.740185,0.641235,0.565,00:06
1,0.586413,0.497715,0.76,00:05
2,0.50931,0.462465,0.755,00:05
3,0.454171,0.46743,0.755,00:05


In [None]:
learn.unfreeze()
learn.opt = learn.create_opt()
learn.fit_one_cycle(8, slice(1e-5,1e-3))

epoch,train_loss,valid_loss,accuracy,time
0,0.432282,0.463266,0.75,00:10
1,0.40296,0.462794,0.8,00:10
2,0.368824,0.492125,0.785,00:10
3,0.33855,0.444738,0.82,00:11
4,0.305834,0.433889,0.825,00:11
5,0.269694,0.474207,0.81,00:11
6,0.244303,0.471722,0.825,00:09
7,0.226905,0.469794,0.815,00:10
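
Passing `slice(1e-5,1e-3)` gives the first layer group the smallest learning rate and the last group the largest. As a rough sketch, assuming the values in between are spaced geometrically (our understanding of fastai's behaviour) and a hypothetical count of five parameter groups:

```python
def spread_lrs(lo, hi, n_groups):
    """Geometrically spaced learning rates from `lo` (first group) to `hi` (last group)."""
    if n_groups == 1:
        return [hi]
    ratio = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * ratio ** i for i in range(n_groups)]

# n_groups=5 is an assumption made for this illustration
lrs = spread_lrs(1e-5, 1e-3, n_groups=5)
print(["%.2e" % lr for lr in lrs])
```

The layers closest to the input, which hold the most general features from pretraining, change the least, while the freshly initialized classifier head trains at the full rate.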


In [None]:
learn.show_results(max_n=2, trunc_at=60)

Unnamed: 0,text,category,category_
0,"xxbos xxmaj raising xxmaj victor xxmaj vargas : a xxmaj review \n\n xxmaj you know , xxmaj raising xxmaj victor xxmaj vargas is like sticking your hands into a big , xxunk bowl of xxunk . xxmaj it 's warm and gooey , but you 're not sure if it feels right . xxmaj try as i might , no",negative,negative
1,"xxbos xxup the xxup shop xxup around xxup the xxup corner is one of the xxunk and most feel - good romantic comedies ever made . xxmaj there 's just no getting around that , and it 's hard to actually put one 's feeling for this film into words . xxmaj it 's not one of those films that",positive,positive


In [None]:
learn.predict("This was a good movie")

('positive', tensor(1), tensor([0.4949, 0.5051]))
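
The three values returned are the decoded label, its index in the category vocabulary, and the probabilities for each class. A small sketch of how they relate, assuming the vocabulary order `['negative', 'positive']` (consistent with index 1 being `positive` above) and a helper name of our own:

```python
def decode_prediction(probs, vocab):
    """Return (label, index, probs) for the most probable class."""
    idx = max(range(len(probs)), key=probs.__getitem__)
    return vocab[idx], idx, probs

vocab = ["negative", "positive"]   # assumed category order
probs = [0.4949, 0.5051]           # probabilities copied from the output above
label, idx, _ = decode_prediction(probs, vocab)
margin = abs(probs[1] - probs[0])
print(label, idx, round(margin, 4))
```

With a margin of about 0.01 between the two classes, the model is close to undecided on this very short review, which is not surprising for a classifier trained on a small sample.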

In [None]:
from fastai2.interpret import *

In [None]:
interp = Interpretation.from_learner(learn)

In [None]:
interp.plot_top_losses(6)

Unnamed: 0,input,target,predicted,probability,loss
0,"xxbos xxmaj i 'm gon na xxunk the xxunk here a bit and say i enjoyed this . xxmaj however , the cartoon is really only going to appeal to those who have very xxunk xxunk . xxmaj it 's definitely something that most people will not get , as is the nature of xxunk . \n\n the animation is horrible , but yes , that 's the point . xxmaj the main character is foul mouthed , violent , and stupid . no redeeming qualities whatsoever . his wife xxunk and xxunk , apparently just barely capable of the most basic xxunk skills . most of these stories completely lack any kind of point . \n\n but again , that 's the point xxunk \n\n xxmaj if non xxunk , foul language , and complete and utter xxunk are your thing , you 're going to love this .",positive,negative,0.9972332119941713,5.890044212341309
1,"xxbos xxmaj most italian horror lovers seem to hate this movie since because it has no connection to the first two xxmaj demons films . xxmaj and with the "" demons xxrep 3 i "" in the title , one would assume it would . xxmaj the problem is that this film was never intended to be part of the xxmaj demons series . xxmaj the distributors only a "" demons xxrep 3 i "" above its original title "" the xxmaj ogre "" to cash in on the other films popularity . xxmaj the new xxmaj american xxup dvd release of this picture has the title "" demons xxrep 3 i : xxmaj the xxmaj ogre "" on the box art but the film itself only says "" the xxmaj ogre "" . i do n't know if past releases had the title "" demons xxrep 3 i """,positive,negative,0.9962806105613708,5.594207763671875
2,"xxbos xxmaj while i count myself as a fan of the xxmaj xxunk 5 television series , the original movie that introduced the series was a weak start . xxmaj although many of the elements that would later mature and become much more compelling in the series are there , the pace of xxmaj the xxmaj gathering is slow , the makeup somewhat inadequate , and the plot confusing . xxmaj worse , the characterization in the premiere episode is poor . xxmaj although the ratings xxunk shows that many fans are willing to overlook these problems , i remember xxmaj the xxmaj gathering almost turned me off off what soon grew into a spectacular series .",negative,positive,0.985458254814148,4.230735778808594
3,"xxbos xxmaj weaker entry in the xxmaj xxunk xxmaj drummond series , with xxmaj john xxmaj howard in the role . xxmaj usual funny xxunk and antics , but not much plot . xxmaj barrymore gets something to do as the inspector , xxunk xxunk to follow xxmaj drummond , xxmaj algy , and xxmaj xxunk on a wild xxunk chase ( mostly in circles ; perhaps the budget was tighter than usual ) to rescue poor xxmaj xxunk , who is being held captive by people who want to lure xxmaj drummond to his doom . xxmaj for those keeping score , in this one , xxmaj drummond is planning to ask xxmaj xxunk to marry him and xxmaj algy is worried about missing the baby 's xxunk . xxmaj it 's fun to see xxmaj algy and xxmaj xxunk dressed up as xxunk to blend in at xxmaj",negative,positive,0.9851903319358826,4.2124786376953125
4,"xxbos xxmaj this movie is xxunk in a ' so bad it 's good ' kind of way . \n\n xxmaj the storyline is xxunk from so many other films of this kind , that xxmaj i 'm not going to even bother xxunk it . xxmaj it 's a sword / sorcery picture , has a kid hoping to realize how important he is in this world , has a "" xxunk "" xxunk , an evil xxunk / xxunk , a princess , a hairy creature … xxunk get the point . \n\n xxmaj the first time i caught this movie was during a very harsh winter . i do n't know why i decided to continue watching it for an extra five minutes before turning the channel , but when i caught site of xxmaj gulfax , i decided to stay and watch it until the end",positive,negative,0.9814806580543518,3.988937139511109
5,"xxbos xxmaj for anyone who may not know what a one - actor movie was like , this is the best example . xxmaj this plot is ridiculous , and really makes no sense . xxmaj it 's full of xxunk situations , hackneyed lines , melodrama , comedy … you name it ! \n\n xxmaj but xxmaj xxunk xxmaj xxunk can make anything convincing , and this movie is by no means an exception . xxmaj everyone turns in a decent performance - xxmaj xxunk xxmaj xxunk , xxmaj xxunk xxmaj xxunk , xxmaj xxunk , xxmaj om xxmaj xxunk , xxmaj xxunk xxmaj xxunk … xxmaj but it is the xxmaj xxunk who xxunk everyone with his xxunk presence . xxmaj without him , this movie would have been a non - xxunk … xxmaj the story is about xxunk / mistaken identities / misunderstandings / love /",positive,negative,0.9746053814888,3.6732177734375
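
The `probability` and `loss` columns are two views of the same number. Assuming `probability` is the probability of the predicted class and `loss` is the cross-entropy on the true class, a wrong prediction made with probability `p` in a two-class problem has a loss of `-log(1 - p)`. The helper below is ours; the values are copied from rows 0 and 2 of the table:

```python
import math

def loss_from_row(predicted_prob, is_correct):
    """Cross-entropy of a two-class prediction, given the predicted class probability."""
    p_target = predicted_prob if is_correct else 1.0 - predicted_prob
    return -math.log(p_target)

loss0 = loss_from_row(0.9972332119941713, is_correct=False)
loss2 = loss_from_row(0.985458254814148, is_correct=False)
print(round(loss0, 3), round(loss2, 3))
```

This is why the top losses are all confident mistakes: the closer the probability of the wrong class gets to 1, the faster the loss grows.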
