# Sentiment analysis: Movie reviews, deep dive

In [1]:
from fastai.text.all import *

Back in Lesson 1, we got up to about 93% accuracy for sentiment analysis of the IMDB movie review dataset (in just 3 lines of code). The model used was `fastai.text.models.awdlstm.AWD_LSTM`

In [2]:
AWD_LSTM

fastai.text.models.awdlstm.AWD_LSTM

We fine-tuned this pre-trained language model. A language model is a model that tries to predict the next word of a sentence; this one was pre-trained on Wikipedia.

In this notebook, we will start from a pretrained Wikipedia language model and fine-tune it into an IMDb-specific language model. From that, we will then build a classifier. This is known as the [Universal Language Model Fine-tuning (ULMFiT)](https://arxiv.org/abs/1801.06146) approach.

<img src="./figures/ulmfit.png" width="500">

## Text preprocessing

We've already seen from tabular data how to deal with categorical variables -- embeddings. Words are essentially categorical variables. Sentences are, however, more than a bag of words: they are ordered sequences of words (or "tokens"). 

When generating a new vocab, we begin with the vocab of the pre-trained model, and we'll add new words specific to our corpus. Our embedding matrix will be built accordingly: for words that are in the pre-trained vocabulary, use the corresponding row of the embedding matrix. Otherwise, we'll initialize a new row randomly.
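As a rough illustration of that merging step, here is a minimal pure-Python sketch. The function name `merge_embeddings`, the toy vocabularies and the Gaussian initialisation are all assumptions made for illustration; this is not fastai's implementation.

```python
import random

def merge_embeddings(new_vocab, old_vocab, old_emb, seed=0):
    """Build an embedding matrix for new_vocab, reusing pretrained rows where possible."""
    rng = random.Random(seed)
    dim = len(old_emb[0])
    old_index = {word: i for i, word in enumerate(old_vocab)}
    new_emb = []
    for word in new_vocab:
        if word in old_index:
            # Word known to the pretrained model: copy its row
            new_emb.append(list(old_emb[old_index[word]]))
        else:
            # Word only seen in our corpus: start from a small random row (assumed init)
            new_emb.append([rng.gauss(0.0, 0.1) for _ in range(dim)])
    return new_emb

old_vocab = ["the", "movie"]
old_emb = [[1.0, 2.0], [3.0, 4.0]]
new_emb = merge_embeddings(["movie", "gringo", "the"], old_vocab, old_emb)
print(new_emb[0], new_emb[2])  # rows copied from the pretrained matrix
```

Only the row for the hypothetical new word `gringo` starts from scratch; everything else keeps what the pretrained model already learned.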

The steps for building the language model will therefore be:

1. **Tokenization**: Convert text into a list of words (or characters, or substrings, depending on the granularity of the model).
1. **Numericalization**: Make a list of all the unique words (vocab), and convert each word into a number, by looking up its index in the vocab.
1. **Language model data loader creation**: Use `LMDataLoader`, which automatically handles creating a dependent variable that is offset from the independent variable by one token. It also handles some important details, such as how to shuffle the training data such that the dependent and independent variables maintain their structure.
1. **Language model creation**: Here we'll look at using _recurrent neural networks_ as the language model.
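The first three steps can be sketched on a toy sentence in plain Python. The whitespace tokenizer and the sorted vocab are simplifying assumptions; the fastai versions shown later in this notebook do considerably more.

```python
text = "the movie was good . the plot was thin ."

# 1. Tokenization (naive whitespace split, purely for illustration)
tokens = text.split()

# 2. Numericalization: replace each token by its index in the vocab
vocab = sorted(set(tokens))
stoi = {tok: i for i, tok in enumerate(vocab)}
ids = [stoi[tok] for tok in tokens]

# 3. Language-model targets: the same sequence, offset by one token
x, y = ids[:-1], ids[1:]

for xi, yi in list(zip(x, y))[:3]:
    print(f"{vocab[xi]!r} -> {vocab[yi]!r}")
```

Each input token is paired with the token that follows it, which is exactly what the model is trained to predict.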

### Word tokenization with fastai

Converting text into a list of words isn't as simple as it seems. For instance, what do we do with punctuation? How do we deal with words like "don't"? Is it one word or two? What about long medical or chemical words? Hyphenated words? In languages like German, Polish, Chinese, and Japanese, there are even further considerations to convert a sentence into a list of tokens with atomic meaning. 

There is no one correct answer. There are 3 main approaches:

- Word-based: Split a sentence on spaces, and apply language-specific rules to separate parts of meaning even when there are no spaces, e.g. don't -> do, n't
- Subword-based: Split words into smaller parts, based on commonly occurring substrings, e.g. occasion -> o, c, ca, sion
- Character-based: Split a sentence into its individual characters
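To make the difference between the word-based and character-based approaches concrete, here is a small sketch using only the standard library. The regular expression is a hand-rolled assumption that handles the "n't" case; it is far cruder than a real tokenizer such as spaCy.

```python
import re

def word_tokenize(text):
    """Very rough word-level split: words, punctuation, and a special case for n't."""
    return re.findall(r"n't|\w+(?=n't)|\w+|[^\w\s]", text)

def char_tokenize(text):
    """Character-level split: every character is its own token."""
    return list(text)

print(word_tokenize("I don't know."))
print(char_tokenize("don't"))
```

Word-level tokens are fewer but need a bigger vocab; character-level tokens need a tiny vocab but make every sequence much longer. Subword tokenization sits between the two.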


fastai doesn't implement any tokenizers itself; it just provides a consistent interface to a range of existing tokenizers.

In [3]:
path = untar_data(URLs.IMDB)

In [4]:
path

Path('/home/jupyter/.fastai/data/imdb')

In [7]:
Path.BASE_PATH = path

In [8]:
path.ls()

(#7) [Path('unsup'),Path('tmp_clas'),Path('imdb.vocab'),Path('README'),Path('tmp_lm'),Path('train'),Path('test')]

In [9]:
files = get_text_files(path, folders=['train', 'test', 'unsup'])

In [10]:
txt = files[0].open().read(); txt[:75]

"I know it's an action film but it would help if there was a vague plot, rat"

fastai currently uses a tokenizer library called _spaCy_ as default. `WordTokenizer` will always point to fastai's current default word tokenizer.

In [11]:
WordTokenizer

fastai.text.core.SpacyTokenizer

Still looks like it's _spaCy_

In [18]:
spacy = WordTokenizer()
toks = first(spacy([txt]))
print(coll_repr(toks, 30))

(#65) ['I','know','it',"'s",'an','action','film','but','it','would','help','if','there','was','a','vague','plot',',','rather','than','just','one','long','drawn','-','out','joke','about','muddling','twins'...]


In [14]:
first(spacy(['The U.S. dollar $1 is $1.00.']))

(#9) ['The','U.S.','dollar','$','1','is','$','1.00','.']

In [15]:
first(spacy(['It is. It is it.']))

(#7) ['It','is','.','It','is','it','.']

fastai provides a `Tokenizer` wrapper, which adds some extra processing on top of the word tokenizer:

In [19]:
tkn = Tokenizer(spacy)
print(coll_repr(tkn(txt), 30))

(#72) ['xxbos','i','know','it',"'s",'an','action','film','but','it','would','help','if','there','was','a','vague','plot',',','rather','than','just','one','long','drawn','-','out','joke','about','muddling'...]


fastai has introduced new tokens beginning with "xx". For example, `xxbos` means "beginning of stream". By recognizing this token, the model can learn that it needs to "forget" what was said previously and focus on upcoming words.

In [23]:
print(coll_repr(tkn('It is. It is it.'), 30))

(#10) ['xxbos','xxmaj','it','is','.','xxmaj','it','is','it','.']


The main special tokens introduced by fastai are:

- `xxbos`: Indicates the beginning of a text (here, a review)
- `xxmaj`: Indicates the next word begins with a capital (since we lowercased everything)
- `xxunk`: Stands in for a word that is unknown (not in the vocab)
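To make `xxbos` and `xxmaj` concrete, here is a simplified sketch of the idea. The function `add_special_tokens` is hypothetical and written for this note; fastai's real rules operate on raw strings with regular expressions and cover more cases.

```python
BOS, MAJ, UP = "xxbos", "xxmaj", "xxup"

def add_special_tokens(tokens):
    """Lowercase tokens, recording the lost capitalisation as special tokens."""
    out = [BOS]                      # mark the start of the text
    for tok in tokens:
        if len(tok) > 1 and tok.isupper():
            out += [UP, tok.lower()]     # whole word was in capitals
        elif len(tok) > 1 and tok[0].isupper():
            out += [MAJ, tok.lower()]    # only the first letter was a capital
        else:
            out.append(tok.lower())
    return out

print(add_special_tokens(["It", "is", ".", "It", "is", "it", "."]))
```

The benefit is that `It` and `it` share a single vocab entry, while the capitalisation is still available to the model as a separate token.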

The full set of default rules fastai applies is:

In [24]:
defaults.text_proc_rules

[<function fastai.text.core.fix_html(x)>,
 <function fastai.text.core.replace_rep(t)>,
 <function fastai.text.core.replace_wrep(t)>,
 <function fastai.text.core.spec_add_spaces(t)>,
 <function fastai.text.core.rm_useless_spaces(t)>,
 <function fastai.text.core.replace_all_caps(t)>,
 <function fastai.text.core.replace_maj(t)>,
 <function fastai.text.core.lowercase(t, add_bos=True, add_eos=False)>]

In [25]:
coll_repr(tkn('&copy;   Fast.ai www.fast.ai/INDEX'), 31)

"(#11) ['xxbos','©','xxmaj','fast.ai','xxrep','3','w','.fast.ai','/','xxup','index'...]"

### Subword tokenization

In [26]:
txts = L(o.open().read() for o in files[:2000])

In [31]:
def subword(sz):
    sp = SubwordTokenizer(vocab_sz=sz)
    sp.setup(txts)
    return " ".join(first(sp([txt]))[:40])
subword(1000)

"▁I ▁know ▁it ' s ▁an ▁action ▁film ▁but ▁it ▁would ▁help ▁if ▁there ▁was ▁a ▁v a g ue ▁plot , ▁rather ▁than ▁just ▁one ▁long ▁d ra w n - o ut ▁joke ▁about ▁mu d d l"

Transforms in fastai always have a `setup` method. Tokenization requires some initial text in order to define the vocab (according to the most commonly occurring tokens).

The special character `▁` represents the space character in the original text.

With a smaller vocab size, each token represents fewer characters:

In [32]:
subword(200)

"▁I ▁ k n o w ▁it ' s ▁ an ▁a c t i on ▁film ▁b u t ▁it ▁w o u l d ▁he l p ▁ i f ▁the re ▁was ▁a ▁ v a g"

In [33]:
subword(10000)

"▁I ▁know ▁it ' s ▁an ▁action ▁film ▁but ▁it ▁would ▁help ▁if ▁there ▁was ▁a ▁vague ▁plot , ▁rather ▁than ▁just ▁one ▁long ▁drawn - out ▁joke ▁about ▁mudd ling ▁twins . ▁yes , ▁the ▁fights ▁are ▁good ▁but"

Jeremy thinks that subword tokenization is likely to have better performance in the future than word tokenization. 
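To see why the vocab size changes how finely words are split, here is a toy sketch of byte-pair encoding (BPE), one common way of learning subword units. This is a deliberately different and much simpler technique than whatever `SubwordTokenizer` runs internally; all names below are hypothetical.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` by the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_merges(words, n_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent pair of symbols."""
    corpus = Counter(tuple(w) for w in words)   # word as symbols -> frequency
    merges = []
    for _ in range(n_merges):
        pairs = Counter()
        for symbols, count in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = Counter()
        for symbols, count in corpus.items():
            merged[merge_pair(symbols, best)] += count
        corpus = merged
    return merges

def encode(word, merges):
    """Split a word into subword tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

words = "the movie was the best movie of the year".split()
print(encode("the", learn_merges(words, 0)))  # no merges: single characters
print(encode("the", learn_merges(words, 2)))  # two merges: one token
```

Each merge adds one entry to the vocab, so a larger vocab means more merges and longer tokens, which mirrors the behaviour of `subword(200)` versus `subword(10000)` above.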

### Numericalization with fastai

In [34]:
toks = tkn(txt)
print(coll_repr(tkn(txt), 31))

(#72) ['xxbos','i','know','it',"'s",'an','action','film','but','it','would','help','if','there','was','a','vague','plot',',','rather','than','just','one','long','drawn','-','out','joke','about','muddling','twins'...]


Tokenization takes a while and is done in parallel by fastai. But for this manual walkthrough, we'll use a small subset.

In [35]:
toks200 = txts[:200].map(tkn)
toks200[0]

(#72) ['xxbos','i','know','it',"'s",'an','action','film','but','it'...]

And pass this to `setup` of `Numericalize` to create our vocab:

In [37]:
num = Numericalize()
num.setup(toks200)  # creates vocab
coll_repr(num.vocab, 20)

"(#2024) ['xxunk','xxpad','xxbos','xxeos','xxfld','xxrep','xxwrep','xxup','xxmaj','the','.',',','and','a','of','to','is','in','it','i'...]"

`Numericalize` defaults to `max_vocab=60_000`. If the corpus has more unique tokens than that, the rarest ones are replaced by `xxunk`.
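A minimal sketch of how such a vocab could be built and applied, assuming a frequency cut-off as well as a size cap. The helper names `make_vocab` and `numericalize` are hypothetical, and the short list of special tokens is a trimmed-down assumption.

```python
from collections import Counter

SPECIAL = ["xxunk", "xxpad", "xxbos"]   # index 0 is reserved for unknown tokens

def make_vocab(token_lists, min_freq=2, max_vocab=60_000):
    """Keep the most frequent tokens, after the special tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    common = [tok for tok, c in counts.most_common(max_vocab)
              if c >= min_freq and tok not in SPECIAL]
    return SPECIAL + common

def numericalize(tokens, vocab):
    """Map tokens to vocab indices; anything outside the vocab becomes xxunk (0)."""
    index = {tok: i for i, tok in enumerate(vocab)}
    return [index.get(tok, 0) for tok in tokens]

docs = [["xxbos", "a", "b", "a"], ["xxbos", "a", "c"]]
vocab = make_vocab(docs)
print(vocab)
print(numericalize(["xxbos", "a", "b"], vocab))
```

Here `b` and `c` each appear only once, so they fall below the cut-off and map to index 0.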

Once we've created our `Numericalize` object, we can use it as if it were a function:

In [39]:
nums = num(toks)[:20]; nums

tensor([   2,   19,  135,   18,   23,   48,  213,   34,   30,   18,   74,  417,
          55,   53,   26,   13, 1482,  144,   11,  291])

In [40]:
' '.join(num.vocab[o] for o in nums)

"xxbos i know it 's an action film but it would help if there was a vague plot , rather"

## Creating text batches for language model

We need to reshape the list of tokens into rectangular arrays to feed into the GPU. This is a bit tricky.

If we have, e.g., a batch size of 6, we need to reshape the list into 6 rows:

In [42]:
from IPython.display import display,HTML

In [43]:
stream = "In this chapter, we will go back over the example of classifying movie reviews we studied in chapter 1 and dig deeper under the surface. First we will look at the processing steps necessary to convert text into numbers and how to customize it. By doing this, we'll have another example of the PreProcessor used in the data block API.\nThen we will study how we build a language model and train it for a while."
tokens = tkn(stream)
bs,seq_len = 6,15
d_tokens = np.array([tokens[i*seq_len:(i+1)*seq_len] for i in range(bs)])
df = pd.DataFrame(d_tokens)
display(HTML(df.to_html(index=False,header=None)))

0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
xxbos,xxmaj,in,this,chapter,",",we,will,go,back,over,the,example,of,classifying
movie,reviews,we,studied,in,chapter,1,and,dig,deeper,under,the,surface,.,xxmaj
first,we,will,look,at,the,processing,steps,necessary,to,convert,text,into,numbers,and
how,to,customize,it,.,xxmaj,by,doing,this,",",we,'ll,have,another,example
of,the,preprocessor,used,in,the,data,block,xxup,api,.,\n,xxmaj,then,we
will,study,how,we,build,a,language,model,and,train,it,for,a,while,.


We're not done yet. For the IMDB corpus, `stream` is several million tokens. If we stop here, the data won't all fit into the GPU.

We therefore need to split this array more finely **whilst retaining the order within and across arrays**.

If, for example, we want the sequence length to be 5 within a single batch, then we want minibatches which look like this:

In [44]:
bs,seq_len = 6,5
d_tokens = np.array([tokens[i*15:i*15+seq_len] for i in range(bs)])
df = pd.DataFrame(d_tokens)
display(HTML(df.to_html(index=False,header=None)))

0,1,2,3,4
xxbos,xxmaj,in,this,chapter
movie,reviews,we,studied,in
first,we,will,look,at
how,to,customize,it,.
of,the,preprocessor,used,in
will,study,how,we,build


In [45]:
bs,seq_len = 6,5
d_tokens = np.array([tokens[i*15+seq_len:i*15+2*seq_len] for i in range(bs)])
df = pd.DataFrame(d_tokens)
display(HTML(df.to_html(index=False,header=None)))

0,1,2,3,4
",",we,will,go,back
chapter,1,and,dig,deeper
the,processing,steps,necessary,to
xxmaj,by,doing,this,","
the,data,block,xxup,api
a,language,model,and,train


In [46]:
bs,seq_len = 6,5
d_tokens = np.array([tokens[i*15+10:i*15+15] for i in range(bs)])
df = pd.DataFrame(d_tokens)
display(HTML(df.to_html(index=False,header=None)))

0,1,2,3,4
over,the,example,of,classifying
under,the,surface,.,xxmaj
convert,text,into,numbers,and
we,'ll,have,another,example
.,\n,xxmaj,then,we
it,for,a,while,.


**So the first row, across mini-batches, is a contiguous stream of tokens.** The second row of the first mini-batch therefore picks up where the first row of the final mini-batch ends, and so on.
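The slicing in the cells above can be wrapped into one small generator. This is a sketch under simplifying assumptions (no shuffling, leftover tokens are dropped) and `lm_batches` is a hypothetical name.

```python
def lm_batches(stream, bs, seq_len):
    """Yield (x, y) mini-batches; y is x shifted one token ahead."""
    # Tokens per row, keeping one spare token so the last input has a target
    row_len = (len(stream) - 1) // bs
    rows = [stream[i * row_len:(i + 1) * row_len + 1] for i in range(bs)]
    for start in range(0, row_len, seq_len):
        end = min(start + seq_len, row_len)
        x = [row[start:end] for row in rows]
        y = [row[start + 1:end + 1] for row in rows]
        yield x, y

stream = list(range(91))               # stand-in for a numericalized corpus
batches = list(lm_batches(stream, bs=6, seq_len=5))
x0, y0 = batches[0]
print(x0[0], y0[0])                    # first row of the first mini-batch
print(batches[1][0][0])                # first row of the second mini-batch
```

Reading the first row of each mini-batch in turn gives tokens 0 to 14 in order, which is the contiguity property described above.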

All this fiddly stuff is handled for us by the `LMDataLoader`.

First numericalize:

In [48]:
nums200 = toks200.map(num)

... then pass into the `LMDataLoader`

In [49]:
dl = LMDataLoader(nums200)

and confirm this works by getting the first batch

In [51]:
x,y = first(dl)
x.shape, y.shape

(torch.Size([64, 72]), torch.Size([64, 72]))

64 is the default batch size and 72 is the default sequence length.

In [54]:
' '.join(num.vocab[o] for o in x[0][:20])

"xxbos i know it 's an action film but it would help if there was a vague plot , rather"

In [55]:
' '.join(num.vocab[o] for o in y[0][:20])

"i know it 's an action film but it would help if there was a vague plot , rather than"

So `y` is indeed `x` shifted forward by 1 token.

## Creating a language model

When given a `TextBlock`, fastai's `DataBlock` does tokenization and numericalization automatically.

In [57]:
get_imdb = partial(get_text_files, folders=['train', 'test', 'unsup'])

In [58]:
dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True), 
    # the method here has a few optimizations since 
    # tokenization/numericalization are expensive
    get_items=get_imdb, splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)

In [73]:
dls_lm.show_batch(max_n=2)

Unnamed: 0,text,text_
0,"xxbos xxmaj along with the town undertaker , a xxmaj man follows his brother 's killer ( george xxmaj eastman ) to xxmaj mexico , where sinister ( and i mean sinister ! ) gringo xxmaj horst xxmaj frank forces his xxmaj mexican slaves to fight to the death using steel claws . xxmaj frank then sets his sights on xxmaj eastman 's secret goldmine , holding him captive and torturing him until the justice seeking brother comes to take","xxmaj along with the town undertaker , a xxmaj man follows his brother 's killer ( george xxmaj eastman ) to xxmaj mexico , where sinister ( and i mean sinister ! ) gringo xxmaj horst xxmaj frank forces his xxmaj mexican slaves to fight to the death using steel claws . xxmaj frank then sets his sights on xxmaj eastman 's secret goldmine , holding him captive and torturing him until the justice seeking brother comes to take him"
1,". xxmaj if you feel curious and you 're open - minded , give it a try , you might like it . xxbos xxmaj after watching the slick and creative promos and finding out about the gargantuan star cast , my expectations for this movie were sky - high . xxmaj having six / seven different stories , you ca n't judge the movie as a whole . xxmaj the first story was extremely scary and had very effectively","xxmaj if you feel curious and you 're open - minded , give it a try , you might like it . xxbos xxmaj after watching the slick and creative promos and finding out about the gargantuan star cast , my expectations for this movie were sky - high . xxmaj having six / seven different stories , you ca n't judge the movie as a whole . xxmaj the first story was extremely scary and had very effectively used"


In [126]:
learn = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.3, metrics=[accuracy, Perplexity()]).to_fp16()

In [75]:
learn.fit_one_cycle(1, 2e-2)

epoch,train_loss,valid_loss,accuracy,perplexity,time
0,4.129025,3.914229,0.299565,50.110428,32:29
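The perplexity column is the exponential of the cross-entropy loss, so it can be checked by hand against the validation loss in the table above:

```python
import math

valid_loss = 3.914229              # validation loss reported above
perplexity = math.exp(valid_loss)  # perplexity = exp(cross-entropy loss)
print(round(perplexity, 2))        # roughly 50.11, matching the table
```

Loosely, a perplexity of about 50 means the model is as uncertain as if it were choosing uniformly among about 50 tokens at each step.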


In [76]:
learn.save('1epoch')

Path('models/1epoch.pth')

In [128]:
learn.load('1epoch');

In [129]:
learn.save_encoder('1epoch-encoder');

If I could be bothered, we could unfreeze the language model and train it for longer:

```python
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3)
```

We would then only need to save the encoder, which is all of the model except for the final layer:

```python
learn.save_encoder('finetuned')
```

### Text generation

Not necessary for our application, but just because we can:

In [77]:
TEXT = 'I liked this movie because'
N_WORDS = 40
N_SENTENCES = 2
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75)
         for _ in range(N_SENTENCES)
        ]

In [78]:
print("\n".join(preds))

i liked this movie because i loved the movie , and i had to enjoy it with a good director . The movie was bad . In the end , however , i was surprised that i did n't have any chemistry and
i liked this movie because the plot of this film was so much more interesting than the first movie . But the movie itself was not a good movie at times way over the top ( if you 're looking for a sequel though


Jeremy's is better :(

There are more sophisticated ways of getting predictions out of a model; `learn.predict` uses the simplest.
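The `temperature=0.75` argument used above controls how random the sampling is. A common way to apply a temperature, assumed here rather than taken from fastai's source, is to divide the scores by it before the softmax. The function names and the toy scores are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [score / temperature for score in logits]
    top = max(scaled)                            # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index at random according to the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]                         # toy scores for a 3-token vocab
warm = softmax_with_temperature(logits, 1.0)
cool = softmax_with_temperature(logits, 0.75)
print([round(p, 3) for p in warm])
print([round(p, 3) for p in cool])
print(sample_token(logits, 0.75, random.Random(0)))
```

A temperature below 1 puts more weight on the likeliest token, giving safer but more repetitive text; a temperature above 1 does the opposite.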

## Creating a classifier

First we need to make another `DataBlock`:

In [79]:
(path/'train').ls()

(#4) [Path('train/neg'),Path('train/pos'),Path('train/labeledBow.feat'),Path('train/unsupBow.feat')]

In [80]:
(path/'train'/'pos').ls()

(#12500) [Path('train/pos/629_9.txt'),Path('train/pos/5149_8.txt'),Path('train/pos/10402_10.txt'),Path('train/pos/9896_8.txt'),Path('train/pos/8203_7.txt'),Path('train/pos/6437_9.txt'),Path('train/pos/4465_7.txt'),Path('train/pos/9748_10.txt'),Path('train/pos/4783_10.txt'),Path('train/pos/11140_9.txt')...]

In [110]:
db_clas = DataBlock(
    blocks=(TextBlock.from_folder(path, vocab=dls_lm.vocab),CategoryBlock),  
    # use language model vocab. is_lm = False, because we're not making a language model
    get_y = parent_label,
    get_items=partial(get_text_files, folders=['train', 'test']),
    splitter=GrandparentSplitter(valid_name='test')
)

dls_clas = db_clas.dataloaders(path, path=path, bs=128, seq_len=72)

In [136]:
dls_clas.show_batch(max_n=5)

Unnamed: 0,text,category
0,"xxbos xxmaj match 1 : xxmaj tag xxmaj team xxmaj table xxmaj match xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley vs xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley started things off with a xxmaj tag xxmaj team xxmaj table xxmaj match against xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit . xxmaj according to the rules of the match , both opponents have to go through tables in order to get the win . xxmaj benoit and xxmaj guerrero heated up early on by taking turns hammering first xxmaj spike and then xxmaj bubba xxmaj ray . a xxmaj german xxunk by xxmaj benoit to xxmaj bubba took the wind out of the xxmaj dudley brother . xxmaj spike tried to help his brother , but the referee restrained him while xxmaj benoit and xxmaj guerrero",pos
1,xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad,neg
2,xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad,neg
3,xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad,pos
4,xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad xxpad,neg


For classification, we need to associate each example with its category, so we can't split a single review up across distant mini-batches. Reviews have heterogeneous lengths, so fastai uses the padding token `xxpad` to ensure that all items in a batch have consistent dimensions.

I'm quite concerned about these items which are entirely `xxpad` though... I've posted [a question on the forum](https://forums.fast.ai/t/lesson-8-official-topic/70494/283?u=jaryaman). Let's see what happens when we train, and whether we get nonsense results.
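As a sketch of what padding a batch involves, here is a plain-Python version. Padding at the front and the pad index of 1 (the position of `xxpad` in the vocab printed earlier) are assumptions made for illustration, and `pad_batch` is a hypothetical helper.

```python
def pad_batch(seqs, pad_id=1, pad_first=True):
    """Pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        padding = [pad_id] * (max_len - len(s))
        padded.append(padding + list(s) if pad_first else list(s) + padding)
    return padded

batch = [[2, 19, 135, 18, 23, 48], [2, 74, 417]]
print(pad_batch(batch))
```

With padding at the front, a short review batched with a much longer one starts with a long run of `xxpad`, so a display that truncates each text could show nothing but padding. That is one possible explanation for the rows above, not a confirmed one.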

Create the learner:

In [130]:
learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5, 
                                metrics=accuracy).to_fp16()

In [131]:
learn = learn.load_encoder('1epoch-encoder')  # has to be the encoder that you load, not the rest

In [132]:
learn.fit_one_cycle(1, 2e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.486559,0.324411,0.86008,00:58


Ok, so if all my batches were entirely `xxpad` then it'd be impossible to predict the category. So I guess all that padding isn't affecting the prediction accuracy...(?)

For NLP it's found empirically that it's better to unfreeze gradually, a layer group at a time, rather than unfreezing the whole model at once:

In [137]:
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-2/(2.6**4),1e-2))

epoch,train_loss,valid_loss,accuracy,time
0,0.331934,0.246476,0.90212,01:08
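A negative index in `freeze_to` counts layer groups from the end, like Python list indexing. The sketch below shows which groups end up trainable; the count of 5 layer groups is an assumption for illustration, and `trainable_groups` is a hypothetical helper rather than a fastai function.

```python
def trainable_groups(n_groups, freeze_to):
    """Indices of the layer groups left trainable after freezing everything before `freeze_to`."""
    first = freeze_to if freeze_to >= 0 else n_groups + freeze_to
    return list(range(first, n_groups))

n_groups = 5                              # assumed number of layer groups
print(trainable_groups(n_groups, -2))     # last two groups train
print(trainable_groups(n_groups, -3))     # last three groups train
print(trainable_groups(n_groups, 0))      # fully unfrozen
```

So each step in this section lets gradient updates reach a little further back towards the embeddings.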


In [138]:
learn.freeze_to(-3)
learn.fit_one_cycle(1, slice(5e-3/(2.6**4),5e-3))

epoch,train_loss,valid_loss,accuracy,time
0,0.241276,0.197537,0.92324,01:32


In [139]:
learn.unfreeze()
learn.fit_one_cycle(2, slice(1e-3/(2.6**4),1e-3))

epoch,train_loss,valid_loss,accuracy,time
0,0.2107,0.19317,0.92536,01:46
1,0.196802,0.191539,0.9264,01:51
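The `slice(lo, hi)` learning rates used above give the earliest layers the smallest rate and the final layers the largest. Assuming the rates are spaced geometrically across 5 layer groups (both the spacing and the group count are assumptions here), the `2.6**4` divisor makes each group's rate 2.6 times that of the group before it:

```python
def even_multiples(lo, hi, n):
    """n values from lo to hi, each a constant multiple of the previous one."""
    if n == 1:
        return [hi]
    step = (hi / lo) ** (1 / (n - 1))
    return [lo * step**i for i in range(n)]

lrs = even_multiples(1e-2 / 2.6**4, 1e-2, 5)
for group, lr in enumerate(lrs):
    print(f"group {group}: lr = {lr:.6f}")
```

The intuition is that the early layers hold general language knowledge that needs only small adjustments, while the new classifier head needs to learn quickly.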


So even with my crummy language model that was only fit for one epoch, it's doing pretty well. A better language model will probably improve things.