# WARNING
**Please make sure to "COPY AND EDIT NOTEBOOK" to use compatible library dependencies! DO NOT CREATE A NEW NOTEBOOK AND COPY+PASTE THE CODE - this will use latest Kaggle dependencies at the time you do that, and the code will need to be modified to make it work. Also make sure internet connectivity is enabled on your notebook**

Note that this version of the notebook uses fast.ai version 1. For version 2 code, please see https://www.kaggle.com/azunre/tlfornlp-chapter9-ulmfit-adaptation-fast-aiv2

Also note that while fast.ai version 2 documentation is available at https://docs.fast.ai/, the fast.ai version 1 documentation is available at https://fastai1.fast.ai/ 

# Preliminaries
Write the installed requirements to a file every time the notebook runs, in case you need to go back and recover the dependencies. **MOST OF THESE REQUIREMENTS WOULD NOT BE NECESSARY FOR LOCAL INSTALLATION**

Requirements are hosted for each notebook in the companion GitHub repo and can be pulled down and installed here if needed. The companion GitHub repo is located at https://github.com/azunre/transfer-learning-for-nlp

In [None]:
!pip freeze > kaggle_image_requirements.txt

# Read and Preprocess Fake News Data

The data preprocessing steps are the same as those in sections 4.2/4.4

Read in the "true" and "fake" data

The labels are in quotes because a dataset like this can simply replicate the biases of whoever labeled it, so it should be evaluated carefully.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Read the data into pandas DataFrames
DataTrue = pd.read_csv("/kaggle/input/fake-and-real-news-dataset/True.csv")
DataFake = pd.read_csv("/kaggle/input/fake-and-real-news-dataset/Fake.csv")

print("Data labeled as True:")
print(DataTrue.head())
print("\n\n\nData labeled as Fake:")
print(DataFake.head())

Assemble the two different kinds of data (1000 samples from each of the two classes)

In [None]:
Nsamp = 1000 # number of samples to generate in each class - 'true', 'fake'
DataTrue = DataTrue.sample(Nsamp)
DataFake = DataFake.sample(Nsamp)
raw_data = pd.concat([DataTrue,DataFake], axis=0).values

# combine title, body text and topics into one string per document
#raw_data = [sample[0].lower() + sample[1].lower() + sample[3].lower() for sample in raw_data]

print("Length of combined data is:")
print(len(raw_data))
print("Data represented as numpy array (first 5 samples) is:")
print(raw_data[:5])

# corresponding labels: 1 for 'true' articles, 0 for 'fake' articles
Categories = ['True','False']
header = ([1]*Nsamp)
header.extend(([0]*Nsamp))

Shuffle data, split into train and test sets...

In [None]:
# function for shuffling data
def unison_shuffle(a, b):
    p = np.random.permutation(len(b))
    data = np.asarray(a)[p]
    header = np.asarray(b)[p]
    return data, header

raw_data, header = unison_shuffle(raw_data, header)

# split into independent 70% training and 30% testing sets
idx = int(0.7*raw_data.shape[0])

# 70% of data for training
train_x = raw_data[:idx]
train_y = header[:idx]
# remaining 30% for testing
test_x = raw_data[idx:]
test_y = header[idx:]

print("train_x/train_y list details, to make sure it is of the right form:")
print(len(train_x))
#print(train_x)
print(train_y[:5])
print(train_y.shape)
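
To see why the documents and labels must be shuffled with one shared permutation, here is a minimal sketch on toy data. It uses only the Python standard library instead of NumPy, and the toy documents, the `seed` argument and the variable names are assumptions made for illustration; they are not part of the notebook's pipeline.

```python
import random

def unison_shuffle(samples, labels, seed=None):
    """Shuffle two equal-length sequences with one shared permutation."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    return [samples[i] for i in order], [labels[i] for i in order]

# Toy data: each document's text encodes its own label, so alignment is easy to check
docs = [f"true-{i}" for i in range(5)] + [f"fake-{i}" for i in range(5)]
labels = [1] * 5 + [0] * 5

docs, labels = unison_shuffle(docs, labels, seed=0)

# 70/30 split, mirroring the cell above
idx = int(0.7 * len(docs))
train_docs, train_labels = docs[:idx], labels[:idx]
test_docs, test_labels = docs[idx:], labels[idx:]

# Every document still carries its original label after shuffling
aligned = all(d.startswith("true") == (l == 1) for d, l in zip(docs, labels))
print(aligned, len(train_docs), len(test_docs))
```

Shuffling the two sequences independently would break this alignment and silently assign random labels to the articles.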

# ULMFiT Experiments

Import the fast.ai library, written by the ULMFiT authors

In [None]:
from fastai.text import *

## Data Bunch Class for Language Model/Task Classifier Consumption

We prepare train and test/validation dataframes first.

In [None]:
train_df = pd.DataFrame(data=[train_y,train_x]).T
test_df = pd.DataFrame(data=[test_y,test_x]).T

Check their shape:

In [None]:
# print both shapes explicitly; a notebook cell only echoes its last expression
print(train_df.shape)
print(test_df.shape)

Data in fast.ai is consumed using the *TextLMDataBunch* class. Construct an instance of this class for language model consumption.

In [None]:
data_lm = TextLMDataBunch.from_df(train_df = train_df, valid_df = test_df, path = "")

Construct an instance of the *TextClasDataBunch* class for task-specific classifier consumption, reusing the vocabulary of the language model data.

In [None]:
data_clas = TextClasDataBunch.from_df(path = "", train_df = train_df, valid_df = test_df, vocab=data_lm.train_ds.vocab, bs=32)
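
The `vocab=data_lm.train_ds.vocab` argument matters because the classifier reuses the language model's encoder, and the encoder's embedding rows are indexed by token id. The sketch below illustrates the idea with a toy whitespace tokenizer. The helper names `build_vocab` and `numericalize` are hypothetical and the tokenization is far simpler than what fast.ai does; only the `xxunk` name for unknown tokens is borrowed from fast.ai.

```python
from collections import Counter

def build_vocab(texts, min_freq=1):
    """Map each whitespace token to an integer id; id 0 is reserved for unknown tokens."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    itos = ["xxunk"] + sorted(tok for tok, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(itos)}

def numericalize(text, stoi):
    """Convert a text into the list of ids defined by `stoi`."""
    return [stoi.get(tok, 0) for tok in text.lower().split()]

lm_texts = ["the senate passed the bill", "the bill was fake"]
clas_texts = ["the bill passed"]

lm_vocab = build_vocab(lm_texts)     # vocabulary seen by the language model
own_vocab = build_vocab(clas_texts)  # what a classifier would build on its own

shared_ids = numericalize("the bill passed", lm_vocab)
own_ids = numericalize("the bill passed", own_vocab)
print(shared_ids, own_ids)
```

The same sentence maps to different ids under the two vocabularies, so an encoder fine-tuned with one vocabulary would look up the wrong embeddings if it were fed ids from the other.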

## Fine-Tune Language Model

In ULMFiT, language models are trained using a learner created by the *language_model_learner* function.

We initialize such a learner, opting for the ASGD Weight-Dropped LSTM (AWD_LSTM) model architecture. This is essentially the usual LSTM with some of its hidden-to-hidden weights randomly set to 0 during training, analogous to what Dropout layers do to activations. More info can be found here - https://docs.fast.ai/text.models.awdlstm

In [None]:
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)

Note that the initialization of this model also loads weights pretrained on the WikiText-103 benchmark dataset (The WikiText Long Term Dependency Language Modeling Dataset - https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/). You can see the execution log of the cell above for confirmation of this.
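
To make the weight-dropping idea behind AWD_LSTM concrete, here is a minimal sketch in plain Python. It zeroes entries of a toy weight matrix at random, whereas ordinary Dropout would zero activations. Rescaling the surviving weights by 1/(1-p) follows the usual inverted-dropout convention and is an assumption of this sketch, as are the function name `weight_drop` and the toy matrix; this is not fast.ai's implementation.

```python
import random

def weight_drop(weights, p, rng):
    """Zero each weight with probability p and rescale the survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    return [[0.0 if rng.random() < p else w * scale for w in row] for row in weights]

rng = random.Random(0)
# Toy stand-in for an LSTM hidden-to-hidden weight matrix
hidden_to_hidden = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.6], [-0.4, 0.7, 0.9]]

dropped = weight_drop(hidden_to_hidden, p=0.5, rng=rng)
for row in dropped:
    print(row)
```

A fresh mask would be sampled on each training pass, and no weights are dropped at prediction time.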

We can find a suggested maximum learning rate using the following commands. Note that the suggested point is not the lowest point on the curve, but the point where the loss is decreasing the fastest.

In [None]:
learn.lr_find() # find best rate
learn.recorder.plot(suggestion=True) # plot it

Fetch the suggested rate as follows.

In [None]:
rate = learn.recorder.min_grad_lr
print(rate)
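
The difference between "lowest loss" and "fastest decrease" can be sketched on a synthetic learning rate sweep. The loss curve below is made up purely for illustration, and the forward-difference rule in `steepest_descent_lr` (a hypothetical helper) is a simplification of the idea, not the exact computation fast.ai performs.

```python
import math

def steepest_descent_lr(lrs, losses):
    """Return the learning rate at which the loss falls fastest between neighbouring points."""
    if len(lrs) != len(losses) or len(lrs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    slopes = [losses[i + 1] - losses[i] for i in range(len(losses) - 1)]
    steepest = min(range(len(slopes)), key=slopes.__getitem__)
    return lrs[steepest]

# Synthetic sweep: learning rates spaced evenly on a log scale from 1e-6 to 1
lrs = [10 ** (-6 + 6 * i / 60) for i in range(61)]
# Toy loss: flat at first, a drop centred near lr = 1e-3, then divergence at large rates
losses = [
    5.0 - 2.0 / (1.0 + math.exp(-4.0 * (math.log10(lr) + 3.0))) + 4.0 * lr ** 2
    for lr in lrs
]

suggested = steepest_descent_lr(lrs, losses)
lowest = lrs[losses.index(min(losses))]
print(f"steepest descent at {suggested:.1e}, lowest loss at {lowest:.1e}")
```

On this toy curve the steepest point sits well below the rate with the lowest loss. The lowest point is already close to where the loss starts to diverge, which is why the steeper, smaller rate is the safer suggestion.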

We fine-tune with the *fit_one_cycle()* method in fast.ai. Its one-cycle schedule plays the role of ULMFiT's slanted triangular learning rates: the rate first ramps up and then decays over the course of training.

In [None]:
learn.fit_one_cycle(1, rate)
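
For reference, the slanted triangular schedule as defined in the ULMFiT paper can be sketched as follows. The defaults `cut_frac=0.1` and `ratio=32` are the values suggested in the paper; the function name and the 100-step example are assumptions for illustration, and this is not the code that runs inside *fit_one_cycle()*.

```python
import math

def slanted_triangular_lr(t, total_steps, lr_max, cut_frac=0.1, ratio=32):
    """Learning rate at step t under the slanted triangular schedule.

    The rate rises linearly for the first cut_frac of training, then decays
    linearly; the smallest rate is lr_max / ratio.
    """
    cut = math.floor(total_steps * cut_frac)
    if cut < 1:
        raise ValueError("total_steps * cut_frac must be at least 1")
    if t < cut:
        p = t / cut                                      # short linear warm-up
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))   # long linear decay
    return lr_max * (1 + p * (ratio - 1)) / ratio

total_steps = 100
schedule = [slanted_triangular_lr(t, total_steps, lr_max=0.01) for t in range(total_steps + 1)]
peak_step = max(range(len(schedule)), key=schedule.__getitem__)
print(peak_step, schedule[0], schedule[peak_step], schedule[-1])
```

The short warm-up lets the model move quickly to a suitable region of parameter space, and the long decay then refines the parameters.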

### Discriminative Fine-Tuning

The call *learn.unfreeze()* makes all the layers trainable. Passing a *slice()* to *fit_one_cycle()* trains the last layer group at the upper rate, while the layer groups below it receive progressively smaller learning rates. We set the lower bound of the range two orders of magnitude smaller, i.e., we divide the maximum rate by 100.

In [None]:
learn.unfreeze()

In [None]:
learn.fit_one_cycle(1, slice(rate/100,rate))

As you can see, the accuracy slightly increased!
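
To get a feel for what a range such as `slice(rate/100, rate)` means per layer group, the sketch below spreads rates geometrically between the two bounds. Geometric spacing is our understanding of how fast.ai version 1 interprets a slice and should be treated as an assumption; the helper name, the example rate and the number of layer groups are hypothetical.

```python
def spread_learning_rates(lr_low, lr_high, n_groups):
    """Geometrically interpolate n_groups rates from lr_low (first group) to lr_high (last)."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if n_groups == 1:
        return [lr_high]
    step = (lr_high / lr_low) ** (1 / (n_groups - 1))
    return [lr_low * step ** i for i in range(n_groups)]

rate = 1e-2    # hypothetical value standing in for learn.recorder.min_grad_lr
n_groups = 4   # hypothetical number of layer groups in the model

for group, lr in enumerate(spread_learning_rates(rate / 100, rate, n_groups)):
    print(f"layer group {group}: lr = {lr:.2e}")
```

The earliest layers, which hold the most general features, change the least, while the layers closest to the output adapt the most.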

We can use the resulting language model to predict the next 10 words of a sequence using the following command.

In [None]:
learn.predict("This is a news article about", n_words=10)

Plausible!

Save the fine-tuned language model!

In [None]:
learn.save_encoder('fine-tuned_language_model')

## Target Task Classifier Fine-tuning

In ULMFiT, target task classifier fine-tuning is carried out using a learner created by the *text_classifier_learner* function. Recall that the target task here is predicting whether a given article is "fake news" or not.

We create the learner below, using the same settings as the language model we fine-tuned above, so we can load that fine-tuned model without issues. We also load the encoder of the fine-tuned language model into the learner.

In [None]:
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.3) # use the same settings as the language model we fine-tuned, so we can load without problems
learn.load_encoder('fine-tuned_language_model')

Figure out the best learning rate as before.

In [None]:
learn.lr_find() # find best rate
learn.recorder.plot(suggestion=True) # plot it

In [None]:
rate = learn.recorder.min_grad_lr
print(rate)

Train the fake news classifier

In [None]:
learn.fit_one_cycle(1, rate)

A nearly perfect score is achieved!

### Gradual Unfreezing
The idea is to keep the initial layers of the model frozen (untrainable) at the beginning, and to gradually reduce the number of frozen layers as the training process proceeds.

We can use the following command to freeze everything except the last layer group:

In [None]:
learn.freeze_to(-1)

We can use the following command to freeze everything except the last two layer groups:

In [None]:
learn.freeze_to(-2)
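
The negative index works like Python list slicing over the model's layer groups: everything from that index onward stays trainable. Here is a toy sketch of that behaviour. The group names and the stand-alone `freeze_to` function are hypothetical and only mimic the idea; the real method operates on the learner's own layer groups.

```python
def freeze_to(layer_groups, n):
    """Return a {group: trainable} map in which only layer_groups[n:] are trainable."""
    trainable = set(layer_groups[n:])
    return {group: group in trainable for group in layer_groups}

# Hypothetical layer groups, ordered from input to output
groups = ["embedding", "lstm_1", "lstm_2", "lstm_3", "classifier_head"]

for n in (-1, -2):
    state = freeze_to(groups, n)
    print(n, [g for g, on in state.items() if on])
```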

Thus, gradual unfreezing to a depth of 2 would involve doing something like this:

In [None]:
depth = 2
for i in range(1,depth+1): # freeze progressively fewer layers, up to a depth of 2, training for one cycle each time
    learn.freeze_to(-i)
    learn.fit_one_cycle(1, rate)

Looks like we actually achieved the perfect score here! These results speak for themselves!