# Combined the following:
- https://docs.fast.ai/tutorial.data.html#Text
- https://docs.fast.ai/text.html#Building-a-classifier

<br>
<br>
<br>
<br>

# Notes:

Fastai's text module has three steps:
1. Pre-process data
2. Fine-tune a pre-trained model
3. Create other models (e.g. classifiers) on top of the encoder of the fine-tuned model

<br>
<br>
Standard fastai training pipeline (according to https://gilberttanner.com/blog/fastai-sentiment-analysis):
1. Finding the best learning rate
2. Training the top layers
3. Unfreezing all layers

<br>

Regarding `fit_one_cycle`

- https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html
- https://sgugger.github.io/the-1cycle-policy.html
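
As a rough picture of what the 1cycle policy does to the learning rate, here is a minimal sketch assuming cosine warm-up followed by cosine annealing. The function name `one_cycle_lr` and the default values (`div_factor=25`, `pct_start=0.3`, a final rate of `lr_max / (div_factor * 1e4)`) are assumptions for illustration and should be checked against the fastai version in use.

```python
import math

def one_cycle_lr(step, total_steps, lr_max, div_factor=25.0, pct_start=0.3):
    """Learning rate at `step` for a simplified 1cycle schedule (illustrative only)."""
    lr_start = lr_max / div_factor          # assumed starting rate
    lr_end = lr_max / (div_factor * 1e4)    # assumed final rate
    warm_steps = int(total_steps * pct_start)

    def cos_interp(a, b, pct):
        # Moves smoothly from a (pct=0) to b (pct=1) along half a cosine wave.
        return b + (a - b) / 2 * (math.cos(math.pi * pct) + 1)

    if step < warm_steps:
        return cos_interp(lr_start, lr_max, step / warm_steps)
    return cos_interp(lr_max, lr_end, (step - warm_steps) / (total_steps - warm_steps))

# Sample the schedule at a few points of a hypothetical 100-step run.
schedule = [one_cycle_lr(s, 100, 1e-2) for s in (0, 15, 30, 65, 100)]
```

The rate rises to `lr_max` during the first 30% of the steps and then decays for the remainder, which is why `fit_one_cycle` only needs the maximum learning rate as input.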
<br>
<br>

In [1]:
from fastai.text import *

In [2]:
imdb = untar_data(URLs.IMDB_SAMPLE)

In [3]:
imdb

PosixPath('/Users/p787144/.fastai/data/imdb_sample')

In [4]:
df = pd.read_csv(imdb/'texts.csv')
df.head()

Unnamed: 0,label,text,is_valid
0,negative,Un-bleeping-believable! Meg Ryan doesn't even ...,False
1,positive,This is a extremely well-made film. The acting...,False
2,negative,Every once in a long while a movie will come a...,False
3,positive,Name just says it all. I watched this movie wi...,False
4,negative,This movie succeeds at being one of the most u...,False


The data can also be loaded from DataFrames, according to https://gilberttanner.com/blog/fastai-sentiment-analysis:

`train_df, valid_df = df.loc[:12000,:], df.loc[12000:,:]`

`data_lm = TextLMDataBunch.from_df(path, train_df, valid_df, text_cols=10, bs=32)`
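
Note that `.loc` slices by label and includes both endpoints, so with a default integer index the two slices above share row 12000. The sketch below uses a hypothetical 10-row DataFrame (the column values and the `cut` variable are made up for illustration) to show the overlap and a non-overlapping alternative with `.iloc`:

```python
import pandas as pd

# Hypothetical toy data standing in for the real reviews.
df = pd.DataFrame({
    "label": ["negative", "positive"] * 5,
    "text": [f"review {i}" for i in range(10)],
})
cut = 8

# Label-based slicing: row `cut` ends up in BOTH frames.
loc_train, loc_valid = df.loc[:cut, :], df.loc[cut:, :]

# Position-based slicing: the end point is exclusive, so there is no overlap.
train_df, valid_df = df.iloc[:cut], df.iloc[cut:]

shared_rows = set(loc_train.index) & set(loc_valid.index)
```

With `.loc`, one review would appear in both the training and the validation set; with `.iloc` the two sets are disjoint.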

### Prepare data for fine-tuning the language model (lm)

In [5]:
data_lm = (
    TextList
    .from_csv(imdb, 'texts.csv', cols='text')
    .split_by_rand_pct()
    .label_for_lm()
    .databunch()
)

data_lm.save()

`data_lm.show_batch()` shows the beginning of each sequence of text along the batch dimension (the target being to guess the next word).

You may notice quite a few strange tokens starting with `xx`. These are special fastai tokens with the following meanings:

- xxunk: Token used instead of unknown words (words not found in the vocabulary).
- xxbos: Beginning of a text.
- xxfld: Represents separate parts of your document (several columns in a dataframe) like headline, body, summary, etc.
- xxmaj: Indicates that the next word starts with a capital, e.g. “House” will be tokenized as “xxmaj house”.
- xxup: Indicates that the next word is written in all caps, e.g. “WHY” will be tokenized as “xxup why”.
- xxrep: Indicates that a character is repeated n times, e.g. 10 $ in a row will be tokenized as “xxrep 10 $” (in general “xxrep n {char}”).
- xxwrep: Indicates that a word is repeated n times.
- xxpad: Token used as padding (so every text has the same length).
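
To make the rules above concrete, here is a toy re-implementation of three of them (`xxmaj`, `xxup`, `xxrep`) in plain Python. It is a simplified sketch, not fastai's actual tokenizer: the helper names `mark_repeats`, `mark_caps` and `toy_tokenize` are hypothetical, it splits on whitespace only, and the threshold of three repeated characters is an assumption.

```python
import re

def mark_repeats(text):
    """Replace runs of 3+ identical non-space characters with ' xxrep n char '."""
    def _sub(match):
        return f" xxrep {len(match.group(0))} {match.group(1)} "
    return re.sub(r"(\S)\1{2,}", _sub, text)

def mark_caps(tokens):
    """Lowercase tokens and prefix them with xxup (all caps) or xxmaj (capitalized)."""
    out = []
    for tok in tokens:
        if tok.isupper() and len(tok) > 1:
            out += ["xxup", tok.lower()]
        elif tok[:1].isupper():
            out += ["xxmaj", tok.lower()]
        else:
            out.append(tok)
    return out

def toy_tokenize(text):
    # Whitespace splitting only; the real tokenizer is far more careful.
    return mark_caps(mark_repeats(text).split())

print(toy_tokenize("House"))       # ['xxmaj', 'house']
print(toy_tokenize("WHY"))         # ['xxup', 'why']
print(toy_tokenize("$$$$$$$$$$"))  # ['xxrep', '10', '$']
```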

In [6]:
data_lm.show_batch()

idx,text
0,"cast does an excellent job with the script . \n \n xxmaj but it is hard to watch , because there is no good end to a situation like the one presented . xxmaj it is now xxunk to blame the xxmaj british for setting xxmaj hindus and xxmaj muslims against each other , and then xxunk xxunk them into two countries . xxmaj there is some merit in"
1,'s pretty much about some high school xxunk acting xxunk and doing drugs and speaking in a language that became outdated decades ago . xxmaj one of the female students has a crush on her teacher . xxmaj the teacher has a xxunk wife ( whom he indeed refers to as an xxunk ) so he is xxunk to the girl 's advances . \n \n xxmaj there 's
2,"movie magic as i have ever seen outside of the "" xxmaj star xxmaj wars "" movies , and , given what those films are like , that means this film deserves a high rating indeed . xxmaj ashley xxmaj xxunk ' acting , xxmaj mr. xxmaj xxunk , and its ' great simple worthwhile story make this a fine coming - of - age story and a wonderful movie"
3,"gordon xxmaj xxunk . xxmaj judging by those two efforts already mentioned , xxmaj xxunk was no xxunk  and , this one having already received its share of xxunk over here , is certainly no better ! xxmaj the film , in fact , is quite xxunk of the xxunk which xxunk xxmaj mexican horror xxunk from the era , but given an added xxunk by virtue of the"
4,will love this . \n \n xxmaj the series features stunning photography as well as a few interviews of peoples xxunk . xxmaj xxunk . xxmaj there is another extremely catchy theme song like xxup xxunk but this one is not nearly as good as the xxmaj xxunk . \n \n xxmaj if you live in the xxup us god knows when it will be released so buy


### Prepare data for training the classification model (clas)

In [7]:
data_clas = (
    TextList
    .from_csv(imdb, 'texts.csv', cols='text', vocab=data_lm.vocab)
    .split_from_df(col='is_valid')
    .label_from_df(cols='label')
    .databunch(bs=32)
)

In [8]:
data_clas.show_batch()

text,target
"xxbos xxmaj raising xxmaj victor xxmaj vargas : a xxmaj review \n \n xxmaj you know , xxmaj raising xxmaj victor xxmaj vargas is like sticking your hands into a big , xxunk bowl of xxunk . xxmaj it 's warm and gooey , but you 're not sure if it feels right . xxmaj try as i might , no matter how warm and gooey xxmaj raising xxmaj",negative
"xxbos xxup the xxup shop xxup around xxup the xxup corner is one of the xxunk and most feel - good romantic comedies ever made . xxmaj there 's just no getting around that , and it 's hard to actually put one 's feeling for this film into words . xxmaj it 's not one of those films that tries too hard , nor does it come up with",positive
"xxbos xxmaj now that xxmaj xxunk ) has finished its relatively short xxmaj australian cinema run ( extremely limited xxunk screen in xxmaj xxunk , after xxunk ) , i can xxunk join both xxunk of "" xxmaj at xxmaj the xxmaj movies "" in taking xxmaj steven xxmaj xxunk to task . \n \n xxmaj it 's usually satisfying to watch a film director change his style /",negative
"xxbos xxmaj this film sat on my xxmaj xxunk for weeks before i watched it . i xxunk a self - indulgent xxunk flick about relationships gone bad . i was wrong ; this was an xxunk xxunk into the xxunk - up xxunk of xxmaj new xxmaj xxunk . \n \n xxmaj the format is the same as xxmaj max xxmaj xxunk ' "" xxmaj la xxmaj xxunk",positive
"xxbos xxmaj many neglect that this is n't just a classic due to the fact that it 's the first xxup 3d game , or even the first xxunk - up . xxmaj it 's also one of the first xxunk games , one of the xxunk definitely the first ) truly claustrophobic games , and just a pretty well - xxunk xxunk experience in general . xxmaj with graphics",positive


<br>
<br>

# 1. Basic Method:
### 1.1 Fine-tune

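I believe it does not matter whether `learn.unfreeze()` is called in the cell below: the "Total trainable params" reported by `learn.summary()` is the same with or without it. This is only what `learn.summary()` reports; section 2 still calls `learn.unfreeze()` explicitly before its second training cycle.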

In [9]:
learn = language_model_learner(data_lm, AWD_LSTM)
learn.summary()

SequentialRNN
Layer (type)         Output Shape         Param #    Trainable 
RNNDropout           [70, 400]            0          False     
______________________________________________________________________
RNNDropout           [70, 1152]           0          False     
______________________________________________________________________
RNNDropout           [70, 1152]           0          False     
______________________________________________________________________
Linear               [70, 6088]           2,441,288  True      
______________________________________________________________________
RNNDropout           [70, 400]            0          False     
______________________________________________________________________

Total params: 2,441,288
Total trainable params: 2,441,288
Total non-trainable params: 0
Optimized with 'torch.optim.adam.Adam', betas=(0.9, 0.99)
Using true weight decay as discussed in https://www.fast.ai/2018/07/02/adam-weight-decay/ 
Loss func

In [7]:
learn = language_model_learner(data_lm, AWD_LSTM)
learn.fit_one_cycle(2, 1e-2)
learn.save('mini_train_lm')
learn.save_encoder('mini_train_encoder')

epoch,train_loss,valid_loss,accuracy,time
0,4.291222,3.774262,0.288795,03:53
1,4.009931,3.746215,0.292292,04:07


To evaluate the language model, you can call the `Learner.predict` method and specify the number of words you want it to guess.

`learn.predict("This is a review about", n_words=10)`

Or

In [8]:
learn.show_results()

text,target,pred
xxbos xxmaj this movie is a must - see movie for all . xxmaj xxunk should see this xxunk documentary,"from the point - of - view of the soldier , as should everyone in xxmaj america . xxmaj the",", the start of of - view . the xxunk who and well the in the xxunk . xxmaj the"
"xxunk as a xxunk lady xxunk on top of him and later his xxunk xxunk xxunk were hysterical , but",then i remembered it was n't supposed to be a comedy . i 'm xxunk xxunk my brain to find,the the was the as a a to be a xxunk . xxmaj was not to to xxunk . see
the lives of a multi - ethnic mix of not so ordinary people in the rural xxmaj xxunk xxmaj xxunk,". xxmaj solid directing and writing along with fine acting , especially the performances by xxmaj xxunk xxmaj xxunk and",", xxmaj the xxunk , xxunk , with a cinematography , and in xxunk of xxmaj xxunk xxmaj xxunk and"
"friday and xxup the xxup xxunk xxup story . xxmaj both of these movies features women with a strong ,","xxunk screen presence and who played independent , xxunk - feminist characters . xxmaj in both movies , both women","xxunk personality . . a is a women xxunk xxunk xxunk xxunk . xxmaj the the the , xxmaj are"
a single interesting or xxunk thing xxmaj james said during the course of the show . xxmaj he is xxup,"that boring and forgettable . xxmaj in fact , one of the xxunk flat out xxunk him he was n't","ok xxunk , xxunk . xxmaj the fact , he of the best of - of is in has in"


### 1.2 Build a classifier

The available architectures are currently `AWD_LSTM`, `Transformer` and `TransformerXL`.

In [16]:
learn = text_classifier_learner(data_clas, AWD_LSTM)
learn.summary()

SequentialRNN
Layer (type)         Output Shape         Param #    Trainable 
RNNDropout           [24, 400]            0          False     
______________________________________________________________________
RNNDropout           [24, 1152]           0          False     
______________________________________________________________________
RNNDropout           [24, 1152]           0          False     
______________________________________________________________________
BatchNorm1d          [1200]               2,400      True      
______________________________________________________________________
Dropout              [1200]               0          False     
______________________________________________________________________
Linear               [50]                 60,050     True      
______________________________________________________________________
ReLU                 [50]                 0          False     
________________________________________________

In [17]:
learn.freeze()
learn.summary()

SequentialRNN
Layer (type)         Output Shape         Param #    Trainable 
RNNDropout           [24, 400]            0          False     
______________________________________________________________________
RNNDropout           [24, 1152]           0          False     
______________________________________________________________________
RNNDropout           [24, 1152]           0          False     
______________________________________________________________________
BatchNorm1d          [1200]               2,400      True      
______________________________________________________________________
Dropout              [1200]               0          False     
______________________________________________________________________
Linear               [50]                 60,050     True      
______________________________________________________________________
ReLU                 [50]                 0          False     
________________________________________________

In [10]:
del learn

In [11]:
learn = text_classifier_learner(data_clas, AWD_LSTM)
learn.load_encoder('mini_train_encoder_improved')
# learn.unfreeze()
# learn.summary()

RuntimeError: Error(s) in loading state_dict for AWD_LSTM:
	size mismatch for encoder.weight: copying a param with shape torch.Size([6056, 400]) from checkpoint, the shape in current model is torch.Size([6104, 400]).
	size mismatch for encoder_dp.emb.weight: copying a param with shape torch.Size([6056, 400]) from checkpoint, the shape in current model is torch.Size([6104, 400]).
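
The shapes in the error (6056 rows in the checkpoint versus 6104 in the current model) suggest that the encoder was saved with a different vocabulary than the one used by `data_clas`. One plausible cause, stated here as an assumption, is that `split_by_rand_pct()` drew a different random training set in the session that saved `mini_train_encoder_improved`, and the vocabulary depends on which texts end up in the training set. The toy sketch below (plain Python, hypothetical `build_vocab` and `vocab_without` helpers, no fastai) shows how two different splits of the same texts give vocabularies of different sizes:

```python
from collections import Counter

def build_vocab(train_texts, min_freq=2):
    """Keep every token that occurs at least `min_freq` times in the training texts."""
    counts = Counter(tok for text in train_texts for tok in text.split())
    return sorted(tok for tok, n in counts.items() if n >= min_freq)

# Hypothetical mini-corpus.
texts = ["good good film", "bad bad film", "slow slow plot plot", "great cast"]

def vocab_without(valid_idx):
    """Build the vocabulary from all texts except the held-out validation indices."""
    train = [t for i, t in enumerate(texts) if i not in valid_idx]
    return build_vocab(train)

vocab_a = vocab_without({3})  # hold out "great cast"
vocab_b = vocab_without({2})  # hold out "slow slow plot plot"

# An embedding matrix has one row per vocabulary entry, so these two
# vocabularies would need embedding matrices of different shapes.
print(len(vocab_a), len(vocab_b))  # 5 3
```

If this is the cause, building the language-model data once and reusing that same saved `data_lm` (and its vocabulary) for both the encoder and the classifier should keep the sizes consistent.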

Regarding `slice()`

- If you pass `slice(start,end)`, the first group's learning rate is `start`, the last is `end`, and the remaining ones are evenly geometrically spaced.

- If you pass just `slice(end)`, the last group's learning rate is `end` and all the other groups use `end/10`.

For instance, for a learner that has 3 layer groups:

- `learn.lr_range(slice(1e-5,1e-3)), learn.lr_range(slice(1e-3))`
- `(array([1.e-05, 1.e-04, 1.e-03]), array([0.0001, 0.0001, 0.001 ]))`
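
The two rules can be reproduced with a few lines of NumPy. This is a sketch of the behaviour described above, not fastai's own code; the function name `lr_range_sketch` and the `n_groups` parameter are hypothetical.

```python
import numpy as np

def lr_range_sketch(lr, n_groups=3):
    """Per-layer-group learning rates for a `slice`, following the two rules above."""
    if lr.start is not None:
        # slice(start, end): geometric spacing from start to end.
        return np.geomspace(lr.start, lr.stop, n_groups)
    # slice(end): last group gets end, all earlier groups get end / 10.
    return np.array([lr.stop / 10] * (n_groups - 1) + [lr.stop])

full = lr_range_sketch(slice(1e-5, 1e-3))   # 1e-05, 1e-04, 1e-03
short = lr_range_sketch(slice(1e-3))        # 1e-04, 1e-04, 1e-03
```

Earlier layer groups therefore always train with a smaller learning rate than the head, which is the idea behind discriminative fine-tuning.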

In [11]:
learn = text_classifier_learner(data_clas, AWD_LSTM)
learn.load_encoder('mini_train_encoder')
learn.fit_one_cycle(2, slice(1e-3,1e-2))
learn.save('mini_train_clas')

epoch,train_loss,valid_loss,accuracy,time
0,0.683442,0.653294,0.64,05:39
1,0.627569,0.57125,0.705,05:46


In [14]:
learn.show_results()

text,target,prediction
"xxbos \n \n i 'm sure things did n't exactly go the same way in the real life of xxmaj homer xxmaj hickam as they did in the film adaptation of his book , xxmaj rocket xxmaj boys , but the movie "" xxmaj october xxmaj sky "" ( an xxunk of the book 's title ) is good enough to stand alone . i have not read xxmaj",positive,positive
"xxbos xxmaj to review this movie , i without any doubt would have to quote that memorable scene in xxmaj tarantino 's "" xxmaj pulp xxmaj fiction "" ( xxunk ) when xxmaj jules and xxmaj vincent are talking about xxmaj xxunk xxmaj xxunk and what she does for a living . xxmaj jules tells xxmaj vincent that the "" xxmaj only thing she did worthwhile was pilot "" .",negative,negative
"xxbos xxmaj how viewers react to this new "" adaption "" of xxmaj shirley xxmaj jackson 's book , which was promoted as xxup not being a remake of the original 1963 movie ( true enough ) , will be based , i suspect , on the following : those who were big fans of either the book or original movie are not going to think much of this one",negative,negative
"xxbos xxmaj the trouble with the book , "" xxmaj memoirs of a xxmaj geisha "" is that it had xxmaj japanese xxunk but underneath the xxunk it was all an xxmaj american man 's way of thinking . xxmaj reading the book is like watching a magnificent ballet with great music , sets , and costumes yet performed by xxunk animals dressed in those xxunk far from xxmaj japanese",negative,negative
"xxbos xxmaj bonanza had a great cast of wonderful actors . xxmaj xxunk xxmaj xxunk , xxmaj pernell xxmaj whitaker , xxmaj michael xxmaj xxunk , xxmaj dan xxmaj blocker , and even xxmaj guy xxmaj williams ( as the cousin who was brought in for several episodes during 1964 to replace xxmaj adam when he was leaving the series ) . xxmaj the cast had chemistry , and they",positive,positive


<br>
<br>

# 2. Better Method:
### 2.1 Fine-tune

In [17]:
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5)
learn.fit_one_cycle(1, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,4.124669,3.786375,0.289747,04:25


In [18]:
learn.unfreeze()
learn.fit_one_cycle(1, 1e-3)

epoch,train_loss,valid_loss,accuracy,time
0,3.786723,3.713934,0.298958,04:38


In [19]:
learn.predict("This is a review about", n_words=10)

'This is a review about one of the first ones to say that the film'

In [20]:
learn.save_encoder('mini_train_encoder_improved')

### 2.2 Build a classifier

In [21]:
learn = text_classifier_learner(data_clas, AWD_LSTM) #, drop_mult=0.5)
learn.load_encoder('mini_train_encoder_improved')
learn.fit_one_cycle(1, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.608099,0.612671,0.665,06:19


Gradually unfreeze the model and fine-tune it: first the last two layer groups with `freeze_to(-2)`, then all layers with `unfreeze()`.

In [22]:
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(5e-3/2., 5e-3))

epoch,train_loss,valid_loss,accuracy,time
0,0.581895,0.492237,0.765,06:09


In [23]:
learn.unfreeze()
learn.fit_one_cycle(1, slice(2e-3/100, 2e-3))

epoch,train_loss,valid_loss,accuracy,time
0,0.47798,0.431328,0.815,06:21
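
The gradual unfreezing in the three classifier cells above can be pictured with a small sketch. It assumes that `freeze_to(n)` freezes the layer groups before index `n` and leaves the rest trainable, so that `freeze()` behaves like `freeze_to(-1)` and `unfreeze()` like `freeze_to(0)`. The helper `trainable_groups` is hypothetical, the count of 5 layer groups is an assumption, and any special handling of batch-norm layers is ignored.

```python
def trainable_groups(n_groups, n):
    """Return one boolean per layer group: True if it is trained after freeze_to(n)."""
    groups = list(range(n_groups))
    frozen = set(groups[:n])  # everything before index n is frozen
    return [g not in frozen for g in groups]

head_only = trainable_groups(5, -1)   # after creating the learner: only the head
last_two = trainable_groups(5, -2)    # learn.freeze_to(-2)
everything = trainable_groups(5, 0)   # learn.unfreeze()

print(head_only)   # [False, False, False, False, True]
print(last_two)    # [False, False, False, True, True]
print(everything)  # [True, True, True, True, True]
```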


In [24]:
learn.predict("This was a great movie!")

(Category positive, tensor(1), tensor([0.0050, 0.9950]))
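
The returned tuple holds the predicted category, its class index, and the per-class probabilities. A minimal sketch of how the three relate, assuming the classes are ordered `negative`, `positive` (which matches index 1 being `positive` in the output above):

```python
classes = ["negative", "positive"]  # assumed class order
probs = [0.0050, 0.9950]            # probabilities copied from the output above

# The predicted index is the position of the largest probability.
pred_idx = max(range(len(probs)), key=lambda i: probs[i])
pred_label = classes[pred_idx]

print(pred_label, pred_idx, probs[pred_idx])  # positive 1 0.995
```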