A simple fastai notebook that fine-tunes a language-model encoder on the IMDB review dataset. The notebook then uses the encoder in two ways:
1. Random text generation.
2. Text sentiment classification.

In [1]:
from fastai.text.all import *
path = untar_data(URLs.IMDB)

In [2]:
from IPython.display import display,HTML

In [3]:
files = get_text_files(path, folders = ['train', 'test', 'unsup'])

In [4]:
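# Fine-tune the language model on every review, labelled or not.
get_imdb = partial(get_text_files, folders=['train', 'test', 'unsup'])

# is_lm=True builds language-model batches: the target is the input shifted by one token.
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_imdb, splitter=RandomSplitter(0.1)  # hold out a random 10% for validation
).dataloaders(path, path=path, bs=32, seq_len=80)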

In [5]:
dls_lm.show_batch(max_n=2)

,text,text_
0,"xxbos xxmaj in xxmaj oklahoma , a gas company employee is killed by insects in the housing development xxmaj oasis xxmaj plans . xxmaj dean and xxmaj sam decide to drive to the compound to investigate the event . xxmaj they go to a open barbecue in the house of the owner of the lands , xxmaj larry , and meet a real state agent , pretending they were interested in buying a house . xxmaj sam becomes close to","xxmaj in xxmaj oklahoma , a gas company employee is killed by insects in the housing development xxmaj oasis xxmaj plans . xxmaj dean and xxmaj sam decide to drive to the compound to investigate the event . xxmaj they go to a open barbecue in the house of the owner of the lands , xxmaj larry , and meet a real state agent , pretending they were interested in buying a house . xxmaj sam becomes close to xxmaj"
1,"\n\n xxmaj i 've seen movies that try to scare by cranking up the wind machine and having the cast yell before . "" screams "" is just about the only one where i really felt fear for the characters . xxmaj these actors may have been amateurs , but when called upon , they really do make the ending of this one sing with apocalyptic passion . i almost expected at least one person to survive only to throw","xxmaj i 've seen movies that try to scare by cranking up the wind machine and having the cast yell before . "" screams "" is just about the only one where i really felt fear for the characters . xxmaj these actors may have been amateurs , but when called upon , they really do make the ending of this one sing with apocalyptic passion . i almost expected at least one person to survive only to throw open"
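The two columns in the batch above differ by a single token: `text_` is `text` shifted one position to the left, so the model is trained to predict each next token. Below is a minimal sketch of that offset in plain Python. The helper `make_lm_pair` and the hand-written token list are illustrative assumptions, not fastai's own implementation.

```python
def make_lm_pair(tokens):
    """Split a token stream into (inputs, targets), with targets shifted by one."""
    return tokens[:-1], tokens[1:]

# Hand-written sample in the style of the tokenized output above (hypothetical data).
tokens = ["xxbos", "xxmaj", "in", "xxmaj", "oklahoma", ",", "a", "gas", "company"]
inputs, targets = make_lm_pair(tokens)

# Each input token is paired with the token that follows it in the stream.
for x, y in zip(inputs, targets):
    print(f"{x:>10} -> {y}")
```

Every input position therefore contributes one next-token prediction to the loss.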


In [6]:
# Start from the pretrained AWD_LSTM weights; drop_mult scales every dropout rate in the model.
learn = language_model_learner(
    dls_lm, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy, Perplexity()]).to_fp16()

In [7]:
# The pretrained layers start frozen, so this epoch trains only the last parameter group.
learn.fit_one_cycle(1, 2e-2)

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.173192,3.96706,0.295814,52.829002,18:28
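The `perplexity` column is the exponential of the validation cross-entropy loss, so the two columns carry the same information on different scales. A minimal sketch that checks this against the row above (the function name `perplexity` is only illustrative):

```python
import math

def perplexity(cross_entropy_loss):
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(cross_entropy_loss)

# valid_loss taken from the training output above.
valid_loss = 3.96706
print(f"perplexity = {perplexity(valid_loss):.3f}")
```

The small difference from the reported 52.829002 comes from the loss being printed with limited precision.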


In [10]:
# Unfreeze all layers and continue fine-tuning at a lower learning rate.
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3)

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,3.90464,3.793264,0.315647,44.401085,20:16
1,3.893327,3.772758,0.319904,43.499893,20:14
2,3.844523,3.729158,0.325296,41.644024,20:15
3,3.758991,3.693784,0.329126,40.196655,20:13
4,3.722609,3.658558,0.333201,38.805359,20:20
5,3.675596,3.625907,0.336954,37.558762,20:19
6,3.615715,3.598427,0.340349,36.540714,20:19
7,3.531554,3.580827,0.342962,35.903236,20:19
8,3.483516,3.575092,0.343828,35.697906,20:18
9,3.417306,3.57796,0.343694,35.800449,20:20


In [11]:
# Save only the encoder (the model without its output head) for reuse by the classifier.
learn.save_encoder('finetuned')

In [12]:
learn.export()

In [7]:
path = learn.path

In [19]:
learn = load_learner(path/'export.pkl')

In [26]:
TEXT = "This has shown a very interesting journey"
N_WORDS = 45
N_SENTENCES = 3
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75) 
         for _ in range(N_SENTENCES)]

In [27]:
print("\n".join(preds))

This has shown a very interesting journey into the mind of an actor . It is a well directed and acted film that is true to life and very well made . The actors were very good , and the script was very good . The fact that it
This has shown a very interesting journey : a giant Greek God , a Greek , a German , a Greek and an Indian ! The Greek monster " human " is a man who can do everything and take his place in the
This has shown a very interesting journey through time . If you look closely , this is not TV news . This is clearly a movie made to show how much TV stars can be the most important . The movie itself is extremely powerful , and
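The three continuations differ because each next word is sampled rather than chosen greedily, and `temperature=0.75` makes that sampling somewhat more conservative. The sketch below illustrates the general technique in plain Python; it is not fastai's exact code. The probabilities are made up, and `apply_temperature` and `sample_index` are hypothetical helpers. Raising probabilities to the power `1/temperature` and renormalizing is equivalent to dividing the logits by the temperature before the softmax.

```python
import random

def apply_temperature(probs, temperature):
    """Rescale a probability distribution; temperature < 1 sharpens it."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

def sample_index(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical next-token probabilities over a four-word vocabulary.
probs = [0.5, 0.3, 0.15, 0.05]
sharpened = apply_temperature(probs, 0.75)

rng = random.Random(0)  # seeded only to make the sketch repeatable
print([round(p, 3) for p in sharpened])
print("sampled index:", sample_index(sharpened, rng))
```

A temperature below 1 shifts probability mass toward the likeliest tokens, while a temperature of 1 leaves the distribution unchanged.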


In [15]:
# Reuse the language model's vocabulary so token ids match the fine-tuned encoder.
dls_clas = DataBlock(
    blocks=(TextBlock.from_folder(path, vocab=dls_lm.vocab), CategoryBlock),
    get_y=parent_label,
    get_items=partial(get_text_files, folders=['train', 'test']),
    splitter=GrandparentSplitter(valid_name='test')  # the IMDB test folder serves as the validation set
).dataloaders(path, path=path, bs=32, seq_len=72)

In [16]:
dls_clas.show_batch(max_n=3)

,text,category
0,"xxbos xxmaj match 1 : xxmaj tag xxmaj team xxmaj table xxmaj match xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley vs xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley started things off with a xxmaj tag xxmaj team xxmaj table xxmaj match against xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit . xxmaj according to the rules of the match , both opponents have to go through tables in order to get the win . xxmaj benoit and xxmaj guerrero heated up early on by taking turns hammering first xxmaj spike and then xxmaj bubba xxmaj ray . a xxmaj german xxunk by xxmaj benoit to xxmaj bubba took the wind out of the xxmaj dudley brother . xxmaj spike tried to help his brother , but the referee restrained him while xxmaj benoit and xxmaj guerrero",pos
1,xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad,pos
2,xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad,pos


In [17]:
learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5,
                                metrics=accuracy).to_fp16()

In [18]:
# Load the encoder that was fine-tuned on IMDB and saved by the language model.
learn = learn.load_encoder('finetuned')

In [19]:
learn.fit_one_cycle(1, 2e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.397091,0.182046,0.934,01:28


In [20]:
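# Gradual unfreezing: train the last two parameter groups, with lower learning rates for earlier groups.
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-2/(2.6**4),1e-2))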

epoch,train_loss,valid_loss,accuracy,time
0,0.244173,0.165306,0.938,01:38
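The `slice(lr/(2.6**4), lr)` argument requests discriminative learning rates: the lowest rate goes to the earliest parameter group and the highest to the last, with the groups in between spaced evenly on a log scale. The sketch below shows the arithmetic, assuming the classifier is split into five parameter groups (an assumption about the AWD_LSTM split, not something shown in this notebook). With five groups, the factor `2.6**4` makes each group's rate 2.6 times that of the group before it. `spread_lrs` is a hypothetical helper, not a fastai function.

```python
def spread_lrs(lo, hi, n_groups):
    """Return n_groups learning rates evenly spaced on a log scale from lo to hi."""
    if n_groups == 1:
        return [hi]
    step = (hi / lo) ** (1.0 / (n_groups - 1))
    return [lo * step ** i for i in range(n_groups)]

N_GROUPS = 5  # assumption: embedding, three LSTM layers, classifier head
lrs = spread_lrs(1e-2 / 2.6**4, 1e-2, N_GROUPS)

for i, lr in enumerate(lrs):
    print(f"group {i}: lr = {lr:.6f}")
```

Earlier layers hold more general language features, so they are updated more gently than the task-specific head.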


In [21]:
# Unfreeze one more parameter group and halve the peak learning rate.
learn.freeze_to(-3)
learn.fit_one_cycle(1, slice(5e-3/(2.6**4),5e-3))

epoch,train_loss,valid_loss,accuracy,time
0,0.205566,0.141739,0.94728,02:07


In [22]:
# Unfreeze the whole model for the final two epochs at a still lower learning rate.
learn.unfreeze()
learn.fit_one_cycle(2, slice(1e-3/(2.6**4),1e-3))

epoch,train_loss,valid_loss,accuracy,time
0,0.163323,0.138108,0.94988,02:40
1,0.148316,0.14133,0.94976,02:40
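The classifier cells above follow a gradual unfreezing schedule: first only the head is trained, then progressively more of the encoder, and finally the whole model. The sketch below models which parameter groups are trainable at each stage. The group names are hypothetical and the five-group split is an assumption; `trainable_after_freeze_to` only mimics the indexing behaviour of `freeze_to` and does not call fastai.

```python
# Hypothetical names for the classifier's parameter groups, earliest layer first.
GROUPS = ["embedding", "rnn_1", "rnn_2", "rnn_3", "head"]

def trainable_after_freeze_to(groups, n):
    """Groups from index n onward are trainable; earlier groups stay frozen."""
    start = n if n >= 0 else len(groups) + n
    return groups[start:]

# freeze() corresponds to freeze_to(-1) and unfreeze() to freeze_to(0).
schedule = [("freeze()", -1), ("freeze_to(-2)", -2),
            ("freeze_to(-3)", -3), ("unfreeze()", 0)]

for name, n in schedule:
    print(f"{name:>14}: {trainable_after_freeze_to(GROUPS, n)}")
```

In the run above, validation accuracy rises from 0.934 with only the head trained to about 0.95 once all groups are unfrozen.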
