# Library

In [None]:
from fastai.text.all import *
import warnings
warnings.filterwarnings('ignore')

## Set a seed

In [None]:
set_seed(42)

# Get data

In [None]:
path = Path('/kaggle/input/imdb-dataset-of-50k-movie-reviews')

In [None]:
path.ls()

In [None]:
train_df = pd.read_csv(path/'IMDB Dataset.csv')

In [None]:
train_df.head(3)

## To classify the reviews, we'll use the ULMFiT (Universal Language Model Fine-Tuning) approach. The first step is to fine-tune a pretrained language model on our corpus of IMDb reviews. The model we use here, AWD-LSTM (ASGD Weight-Dropped LSTM, where ASGD stands for Averaged Stochastic Gradient Descent and LSTM for Long Short-Term Memory), was pretrained on Wikipedia articles. Fine-tuning it on the reviews is valuable because it lets the model adapt to the style and vocabulary of the corpus we are targeting.

# Fine-tuning the language model

### Text preprocessing in fastai

To train a language model, we have to preprocess the texts in our dataset, because a neural network works on numbers, not on raw text. We can treat the words like categorical variables, with the added idea of a sequence. The steps are:
1. Tokenization: Split each document into a list of words (or tokens). Concatenating the tokenized documents gives one very long list of tokens.
2. Numericalization: Build the vocabulary, which is the list of distinct tokens that appear in the corpus, and replace each token with its index in the vocabulary.
3. Create an embedding matrix containing one row for each item of the vocabulary.
4. Use this embedding matrix as the first layer of a neural network.

Note: In the tokenization process, words are not the only things in the vocabulary. Some special tokens are added, like *xxbos*, which indicates the beginning of a text. There are also rules, for example removing all repetitions of the space character.
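
To make tokenization and numericalization concrete, here is a minimal sketch in plain Python. It is a toy, not fastai's tokenizer: the helper names (`toy_tokenize`, `build_vocab`, `numericalize`) are hypothetical, and the only rules it applies are collapsing repeated spaces, lowercasing, and prepending `xxbos`. The `xxunk` token is used here as a stand-in for words missing from the vocabulary.

```python
import re
from collections import Counter

BOS = "xxbos"  # marks the beginning of a text
UNK = "xxunk"  # stand-in for tokens that are not in the vocabulary

def toy_tokenize(text):
    """Collapse repeated spaces, lowercase, and split words from punctuation."""
    text = re.sub(r" {2,}", " ", text.strip())
    return [BOS] + re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(token_lists, min_freq=1):
    """Special tokens first, then the other tokens by descending frequency."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    words = [tok for tok, n in counts.most_common()
             if n >= min_freq and tok not in (BOS, UNK)]
    return [UNK, BOS] + words

def numericalize(tokens, vocab):
    """Replace each token with its index in the vocabulary."""
    index = {tok: i for i, tok in enumerate(vocab)}
    return [index.get(tok, index[UNK]) for tok in tokens]

docs = ["This movie was great.", "This  movie was   bad!"]
tokens = [toy_tokenize(d) for d in docs]
vocab = build_vocab(tokens)
ids = [numericalize(t, vocab) for t in tokens]

print(tokens[1])  # tokens of the second review
print(vocab)      # distinct tokens, special tokens first
print(ids[1])     # the same review as vocabulary indices
```

Each index in `ids` would then select one row of the embedding matrix in the first layer of the network.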

For the language model, the independent variable is the token sequence running from the first token in our list to the second-to-last one. The dependent variable is the same sequence shifted by one position, running from the second token to the last. At every position, the model is therefore trained to predict the next token.
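
The one-token offset between inputs and targets can be sketched in a few lines. The function name `lm_pairs` and the token ids are made up for illustration.

```python
def lm_pairs(ids):
    """Return (x, y), where y is x shifted one position to the left."""
    return ids[:-1], ids[1:]

stream = [1, 2, 3, 4, 7, 8]  # hypothetical token ids for one short review
x, y = lm_pairs(stream)

for inp, target in zip(x, y):
    print(f"after token {inp} the model should predict token {target}")
```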

## In fastai, all of these steps are done automatically, but can be customized to suit our specific needs.

### Create DataLoaders from our DataFrame. We set is_lm (is language model) to True to indicate that we're going to fine-tune a language model, not a text classifier. We set the batch size to 128 and the sequence length to 80, meaning that each batch feeds 128 sequences of 80 tokens to the model.

In [None]:
dls_lm = TextDataLoaders.from_df(train_df, is_lm=True, text_col='review', label_col='sentiment', bs=128, seq_len=80)
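
To see what a batch of `bs` sequences of length `seq_len` looks like, here is a simplified sketch of language-model batching in plain Python. It is an illustration under assumptions, not fastai's implementation: the token stream is cut into `bs` contiguous rows, and each batch takes the next `seq_len` tokens from every row, so row `i` of one batch continues row `i` of the previous batch. The function name `lm_batches` is hypothetical, and details such as shuffling documents between epochs are left out.

```python
def lm_batches(stream, bs, seq_len):
    """Yield (x, y) batches; y is x shifted by one token."""
    # Each of the bs rows reads from its own contiguous slice of the stream.
    row_len = (len(stream) - 1) // bs
    for start in range(0, row_len, seq_len):
        stop = min(start + seq_len, row_len)
        x = [stream[r * row_len + start: r * row_len + stop] for r in range(bs)]
        y = [stream[r * row_len + start + 1: r * row_len + stop + 1] for r in range(bs)]
        yield x, y

stream = list(range(25))  # stand-in for a numericalized corpus
batches = list(lm_batches(stream, bs=4, seq_len=3))
first_x, first_y = batches[0]

print(first_x)  # 4 rows of 3 tokens each
print(first_y)  # the same rows, shifted by one token
```

With `bs=128` and `seq_len=80`, each batch would hold 128 such rows of 80 tokens.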

### Create the model using the AWD-LSTM architecture. To monitor progress, we choose accuracy and perplexity (the exponential of our loss function, here cross-entropy) as our metrics. We make the model use mixed-precision floats to lower the computation cost and make training faster.

In [None]:
learn = language_model_learner(dls_lm, AWD_LSTM, drop_mult=0.3, metrics=[accuracy, Perplexity()]).to_fp16()
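
The relationship between cross-entropy and perplexity can be checked with a small worked example. The helper names below are illustrative and are not part of fastai.

```python
import math

def cross_entropy(probs_for_correct_token):
    """Mean negative log-probability assigned to the correct next tokens."""
    n = len(probs_for_correct_token)
    return -sum(math.log(p) for p in probs_for_correct_token) / n

def perplexity(loss):
    """Perplexity is the exponential of the cross-entropy loss."""
    return math.exp(loss)

# A model that always gives the correct token a probability of 1/4
uniform_loss = cross_entropy([0.25, 0.25, 0.25])
print(uniform_loss)
print(perplexity(uniform_loss))
```

A perplexity close to 4 here means the model is as uncertain as if it were choosing uniformly among four tokens; lower is better, and 1 would be a perfect prediction.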

### Fine-tune the model to IMDb reviews, using the one-cycle approach, where the learning rate first warms up to lr_max and is then annealed back down in a single cycle over the whole training run. The pretrained body is frozen at this point, so this first epoch only fine-tunes the head of the model, which is specific to our task and different from the one used in training on Wikipedia pages.

In [None]:
learn.fit_one_cycle(1, lr_max=2e-2)
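
The shape of a one-cycle learning-rate schedule can be sketched as follows. This is a simplified illustration, not fastai's code: the function names are hypothetical, and the default values for `div`, `div_final`, and `pct_start` are assumptions meant to resemble fastai's defaults. The momentum schedule that usually accompanies one-cycle training is left out.

```python
import math

def cos_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))

def one_cycle_lr(pct, lr_max, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress pct, a number between 0 and 1."""
    if pct < pct_start:
        # Warm-up: rise from lr_max/div to lr_max
        return cos_anneal(lr_max / div, lr_max, pct / pct_start)
    # Annealing: fall from lr_max to lr_max/div_final
    return cos_anneal(lr_max, lr_max / div_final, (pct - pct_start) / (1 - pct_start))

for pct in (0.0, 0.25, 0.5, 1.0):
    print(f"progress {pct:.2f}: lr = {one_cycle_lr(pct, lr_max=2e-2):.7f}")
```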

### Now unfreeze the remaining layers and fine-tune the whole model.

In [None]:
learn.unfreeze()
learn.fit_one_cycle(7, lr_max=2e-3)

### Save the encoder to use in the text classifier.

In [None]:
learn.save_encoder('lm_encoder')

# Text generation

## Since our model is trained to predict next words in a sequence, we can use it to generate text.

In [None]:
TEXT = 'I really disliked this movie because'
N_WORDS = 50
N_SENTENCES = 2
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)]
print('\n'.join(preds))
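
The `temperature` argument controls how random the generated text is. One common formulation, sketched here in plain Python with made-up scores, divides the model's scores by the temperature before turning them into probabilities: values below 1 sharpen the distribution toward the most likely token, and values above 1 flatten it. The function names are hypothetical and this is not fastai's internal code.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn scores into probabilities; a lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(logits, temperature, rng):
    """Draw one token index according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
rng = random.Random(0)

print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.75))
print([sample_next(logits, 0.75, rng) for _ in range(10)])
```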

# Fine-tuning the text classifier

### Create DataLoaders for text classification. is_lm is False by default, which indicates that we're training a classifier. We also pass the language model's vocabulary so that both tasks use the same mapping from tokens to indices.

In [None]:
dls_clas = TextDataLoaders.from_df(train_df, text_vocab=dls_lm.vocab, text_col='review', label_col='sentiment', bs=128, seq_len=80)

### Create the classifier.

In [None]:
learn = text_classifier_learner(dls_clas, AWD_LSTM, drop_mult=0.5, metrics=accuracy).to_fp16()

### Load saved encoder from our language model.

In [None]:
learn = learn.load_encoder('lm_encoder')

### Fine-tune using the same one-cycle approach. This time we start by training only the head and then unfreeze a few more layer groups at a time, instead of unfreezing the whole model at once. This gradual unfreezing tends to make a real difference when training NLP classifiers. We pass a slice object to the lr_max parameter to use discriminative learning rates: the first value of the slice is the learning rate of the earliest layer group, the second value is that of the last layer group, and the learning rates of the groups in between are multiplicatively equidistant throughout that range.
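
To illustrate what "multiplicatively equidistant" means, the sketch below spreads learning rates geometrically between the two ends of a slice. The function name is hypothetical, and the number of layer groups (five) is an assumption about how the classifier is split; with five groups and a ratio of 2.6**4 between the ends, each group's learning rate is 2.6 times that of the group before it.

```python
def spread_learning_rates(lo, hi, n_groups):
    """Geometric spacing from lo (earliest group) to hi (last group)."""
    if n_groups == 1:
        return [hi]
    step = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * step ** i for i in range(n_groups)]

# Five groups is an assumption made for this illustration
lrs = spread_learning_rates(1e-2 / 2.6 ** 4, 1e-2, n_groups=5)
for i, lr in enumerate(lrs):
    print(f"group {i}: lr = {lr:.6f}")
```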

In [None]:
learn.fit_one_cycle(1, lr_max=2e-2)

In [None]:
learn.freeze_to(-2)
learn.fit_one_cycle(1, lr_max=slice(1e-2/(2.6**4), 1e-2))

In [None]:
learn.freeze_to(-3)
learn.fit_one_cycle(1, lr_max=slice(5e-3/(2.6**4), 5e-3))

In [None]:
learn.unfreeze()
learn.fit_one_cycle(2, lr_max=slice(1e-3/(2.6**4), 1e-3))
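
The gradual unfreezing schedule used in the cells above can be summarized with a small sketch. It assumes the model is split into five layer groups (an assumption for illustration) and that freezing to a negative index leaves only the last groups trainable; the helper `trainable_groups` is hypothetical, not a fastai function.

```python
def trainable_groups(n_groups, freeze_to):
    """Indices of the layer groups that remain trainable."""
    return list(range(n_groups))[freeze_to:]

n_groups = 5  # assumed number of layer groups
schedule = [
    ("head only (frozen body)", -1),
    ("freeze_to(-2)", -2),
    ("freeze_to(-3)", -3),
    ("unfreeze()", 0),
]
for name, idx in schedule:
    print(f"{name}: trainable groups {trainable_groups(n_groups, idx)}")
```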

## We achieved 94% accuracy on the validation set, which is a strong result for this sentiment classification task.