# Ironic Corpus - ULMFiT technique

Briefly, this notebook uses data from the [_Ironic Corpus_](http://www.byronwallace.com/static/articles/wallace-irony-acl-2014.pdf). For a longer introduction and a short exploratory data analysis, see [this notebook](https://www.kaggle.com/melissarajaram/ironic-corpus-understanding-the-data). The goal with this corpus is to classify internet comments as either "ironic" or "unironic".

In the original study, the authors used [Support Vector Machines](https://en.wikipedia.org/wiki/Support-vector_machine) (SVM) to classify the ironic and unironic comments. Their results are reported with respect to the F1 score, precision and recall using a five-fold cross-validation. When interpreting these outcome metrics, scores closer to 1 are best.
- average [F1 score](https://en.wikipedia.org/wiki/F1_score): 0.383 (range 0.330 - 0.412)
- average [recall](https://en.wikipedia.org/wiki/Precision_and_recall): 0.496 (range 0.446 - 0.548)
- average [precision](https://en.wikipedia.org/wiki/Precision_and_recall): 0.315 (range 0.261 - 0.380)
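
As a reminder of how these three metrics relate, the sketch below computes them from confusion-matrix counts. The function name `prf1` and the counts are hypothetical and are not taken from the paper.

```python
def prf1(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts, for illustration only
precision, recall, f1 = prf1(tp=30, fp=60, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of precision and recall, so a model cannot reach a high F1 on recall alone.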

The goal of this notebook is to reproduce or improve on these results using more recent techniques that include transfer learning. One of the first widely adopted **transfer learning** methods for Natural Language Processing (NLP) was [Universal Language Model Fine-tuning for Text Classification](https://medium.com/r/?url=https%3A%2F%2Farxiv.org%2Fpdf%2F1801.06146.pdf) (ULMFiT). This method starts with a pre-trained language model (LM), for example one trained on the Wikitext 103 dataset, and fine-tunes it on a new dataset. The fine-tuned language model can then be used in a classification task with a different set of data. The method incorporates other techniques such as discriminative learning rates, gradual model unfreezing, and slanted triangular learning rates; a video demonstration is in the [fast.ai course](https://course.fast.ai/videos/?lesson=4). A [text-based example](https://docs.fast.ai/text.html) can be found in the fastai docs. In this tutorial, I will use a language model that is pretrained on Wikitext 103 and fine-tune it on a subset of the IMDB movie reviews dataset and the Ironic Corpus.
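
To make "slanted triangular learning rates" concrete, here is a minimal sketch of the schedule as described in the ULMFiT paper: a short linear warm-up followed by a long linear decay. The function name `stlr` is my own, and the default values (`lr_max=0.01`, `cut_frac=0.1`, `ratio=32`) are the ones suggested in the paper, not values used later in this notebook.

```python
import math

def stlr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular learning rate at training step t."""
    cut = math.floor(total_steps * cut_frac)  # step at which the rate peaks
    if t < cut:
        p = t / cut                            # linear warm-up
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # linear decay
    return lr_max * (1 + p * (ratio - 1)) / ratio

# Learning rate at a few points of a hypothetical 1000-step run
for step in (0, 50, 100, 500, 1000):
    print(step, stlr(step, total_steps=1000))
```

The rate starts at `lr_max / ratio`, peaks at `lr_max` after the first 10% of steps, and falls back to `lr_max / ratio` by the end.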

This notebook proceeds in the following sections:
1. Data Loading
1. Create a Language Model to predict IMDB and Ironic Corpus words
1. Train a text classification model to predict the IMDB class
1. Retrain the classification model to detect irony class
1. Compare the results of the ULMFiT and SVM classification


# 1. Data Loading

After importing all the Python packages we'll need, the Ironic Corpus and IMDB Sample are loaded and combined.

In [None]:
from fastai import *
from fastai.text import *
from fastai.metrics import Precision, Recall, FBeta
import random
import re
random.seed(42) # set the random seed

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


## Ironic Corpus

The CSV file contains one column with the comment text and one column with the label. A label value of _-1_ corresponds to "not ironic", and a label value of _1_ corresponds to "ironic".

In [None]:
irony_data = pd.read_csv('/kaggle/input/ironic-corpus/irony-labeled.csv')
irony_data.head()

## IMDB database

Here, we're using the sample version of the IMDB movie reviews found in the fastai datasets. If you haven't previously downloaded this dataset, it is automatically downloaded from the Amazon server. The sample contains 1,000 movie reviews labeled as either _positive_ or _negative_. In addition, an IMDB validation set is already designated in the `is_valid` column.

In [None]:
imdb_path = untar_data(URLs.IMDB_SAMPLE)
imdb_path.ls()
imdb = pd.read_csv(imdb_path/'texts.csv')

In [None]:
imdb.head()

## Combining the Ironic and IMDB sample datasets

Since the goal is to first predict words by fine-tuning the pretrained language model on IMDB and Ironic Corpus text, the two datasets are combined into one dataframe. From the resulting columns of the dataframe, we will only use the `text` column.

In [None]:
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
combined = pd.concat([imdb, irony_data.rename(columns={'comment_text':'text'})], sort=False)
combined.columns

# 2. Create a Language Model to predict IMDB and Ironic Corpus words

Here, we create the data we will use to fine-tune the language model. It is created as a `DataBunch` and saved for later. Fastai does a lot of processing 'under the hood' to tokenize and numericalize the data.

In [None]:
bs = 48
data_lm = (TextList.from_df(df=combined, cols='text')
            .split_by_rand_pct(0.1)
            .label_for_lm()           
            .databunch(bs=bs))
data_lm.save('data_lm.pkl')

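When we look at a batch of this data, we can see that the tokenizer has inserted special tokens. For example, `xxmaj` marks that the next word started with a capital letter, and `xxunk` replaces words that are not in the vocabulary.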

In [None]:
data_lm.show_batch()
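
To illustrate what tokenizing and numericalizing mean, here is a toy sketch in plain Python. It is not fastai's implementation: `toy_tokenize` and the tiny vocabulary are hypothetical, and only the `xxmaj` and `xxunk` conventions are borrowed from fastai.

```python
import re

def toy_tokenize(text, vocab):
    """Lowercase each word, flag capitalized words with xxmaj,
    and replace out-of-vocabulary words with xxunk."""
    tokens = []
    for word in re.findall(r"\w+|[^\w\s]", text):
        if word[0].isupper():
            tokens.append("xxmaj")
        word = word.lower()
        tokens.append(word if word in vocab else "xxunk")
    return tokens

vocab = {"the", "movie", "was", "great", "."}   # hypothetical vocabulary
tokens = toy_tokenize("The movie was Zorptastic.", vocab)

# Numericalize: map every token to its integer position in the vocabulary
itos = ["xxunk", "xxmaj"] + sorted(vocab)
stoi = {tok: i for i, tok in enumerate(itos)}
ids = [stoi[tok] for tok in tokens]

print(tokens)
print(ids)
```

The model only ever sees the integer ids, which is why the classifier `DataBunch` objects later in the notebook are built with `vocab=data_lm.vocab`: the ids have to mean the same words everywhere.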

### Training a language model with combined data

In [None]:
bs=48
path = "."
data_lm = load_data(path, 'data_lm.pkl', bs=bs)

Here, we're creating a language model learner. 

In [None]:
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)

Find a good learning rate. With `suggestion=True`, the plot also marks a heuristic suggestion for which learning rate to choose.

In [None]:
learn.lr_find()
learn.recorder.plot(suggestion=True)
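
As I understand it, the suggested value corresponds to the point where the loss is falling fastest during the learning rate sweep. The sketch below shows that idea on made-up numbers; `suggest_lr` is a hypothetical helper, and fastai's own implementation smooths the losses and may differ in detail.

```python
def suggest_lr(lrs, losses):
    """Return the learning rate at which the loss decreases most steeply,
    using central differences over the sweep."""
    slopes = [losses[i + 1] - losses[i - 1] for i in range(1, len(losses) - 1)]
    # +1 because slopes[0] belongs to index 1 of the original lists
    steepest = min(range(len(slopes)), key=slopes.__getitem__) + 1
    return lrs[steepest]

# Made-up sweep: loss falls, bottoms out, then diverges
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [4.9, 4.8, 4.5, 3.6, 3.9, 8.0]
print(suggest_lr(lrs, losses))
```

Note that the steepest point lies before the minimum of the curve; choosing the rate at the minimum itself usually means the loss is already about to diverge.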

Now, train the model to predict words.

In [None]:
learn.fit_one_cycle(4, 1e-2, moms=(0.8,0.7))
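
The `moms=(0.8,0.7)` argument belongs to the one-cycle policy: while the learning rate climbs to its maximum, momentum drops from 0.8 to 0.7, and then both return. The sketch below shows the shape of such a schedule with cosine interpolation. The function names are my own, and `pct_start=0.3` and `div_factor=25` reflect my understanding of the fastai v1 defaults, so treat them as assumptions.

```python
import math

def annealing_cos(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def one_cycle(step, total_steps, lr_max=1e-2, moms=(0.8, 0.7),
              pct_start=0.3, div_factor=25.0):
    """Return (learning_rate, momentum) at a given step."""
    warm = max(1, int(total_steps * pct_start))
    if step < warm:
        pct = step / warm
        # warm-up: learning rate rises while momentum falls
        return (annealing_cos(lr_max / div_factor, lr_max, pct),
                annealing_cos(moms[0], moms[1], pct))
    pct = (step - warm) / max(1, total_steps - warm)
    # annealing: learning rate decays to near zero while momentum recovers
    return (annealing_cos(lr_max, lr_max / (div_factor * 1e4), pct),
            annealing_cos(moms[1], moms[0], pct))

for step in (0, 15, 30, 65, 100):
    lr, mom = one_cycle(step, total_steps=100)
    print(step, f"lr={lr:.6f}", f"mom={mom:.3f}")
```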

In [None]:
learn.save('fit_head')

In [None]:
learn.load('fit_head');

In [None]:
learn.unfreeze()

In [None]:
learn.lr_find()
learn.recorder.plot(suggestion=True)

In [None]:
learn.fit_one_cycle(2, slice(1e-2/2,1e-3), moms=(0.8,0.7))

In [None]:
learn.save('fine_tuned')

In [None]:
learn.load('fine_tuned');

Since this is a language model, we can use it to complete sentences. Because it has been trained on a combination of Wikipedia, IMDB and Ironic Corpus text, the output might sound somewhat sensationalized.

In [None]:
TEXT = "I think that"
N_WORDS = 25
N_SENTENCES = 2

In [None]:
print("\n".join(learn.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)))
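
The `temperature` argument controls how adventurous the sampling is. A common way to implement it, sketched below in plain Python, is to divide the model's scores (logits) by the temperature before the softmax. The logits and the three-word vocabulary here are made up for illustration.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; temperature < 1 sharpens the
    distribution, temperature > 1 flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

words = ["movie", "film", "banana"]      # hypothetical next-word candidates
logits = [2.0, 1.0, 0.1]                 # hypothetical model scores

cool = softmax_with_temperature(logits, 0.75)
neutral = softmax_with_temperature(logits, 1.0)
print("T=0.75:", [round(p, 3) for p in cool])
print("T=1.00:", [round(p, 3) for p in neutral])

random.seed(42)
print(random.choices(words, weights=cool, k=5))  # sample next words
```

With a temperature below 1, the most likely word gains probability mass, so generated text is more predictable but less varied.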

Remember that the model has two parts: an encoder and a decoder. We've just trained the model to predict words, and we now want to use that encoder with a different 'head' to turn it into a text classifier.

In [None]:
learn.save_encoder('fine_tuned_enc')

# 3. Train a text classification model to predict the IMDB class

Now that the language model is trained, we can create a classifier. We first train it on the IMDB sentiment labels, and then retrain it for irony detection in the next section.

In [None]:
imdb = imdb[['text','label']]

In [None]:
imdb_clas = (TextList.from_df(df=imdb,cols='text',vocab=data_lm.vocab)
             .split_by_rand_pct(.2)
             #split by random 20% 
             .label_from_df(cols='label')
             #label from the csv file
             .databunch(bs=bs))

imdb_clas.save('imdb_clas.pkl')

In [None]:
imdb_clas = load_data(path, 'imdb_clas.pkl', bs=bs)

In [None]:
imdb_clas.show_batch()

Now, instead of using `language_model_learner`, we use the `text_classifier_learner`. We pass the `DataBunch` with the IMDB data, and then load the previously fine tuned encoder.

In [None]:
learn = text_classifier_learner(imdb_clas, AWD_LSTM, drop_mult=0.2)
learn.load_encoder('fine_tuned_enc');

In [None]:
learn.lr_find()
learn.recorder.plot(suggestion=True)

In [None]:
learn.fit_one_cycle(4, 1e-3, moms=(0.8,0.7))

In [None]:
learn.save('froze_imdb')

In [None]:
learn.load('froze_imdb');

When fine tuning the entire model, we need to have a smaller batch size. Here, we create another `DataBunch` with a smaller batch size, and then reload the model parameters. 

In [None]:
bs = 24 # was previously 48
imdb_clas = load_data(path, 'imdb_clas.pkl', bs=bs)
learn = text_classifier_learner(imdb_clas, AWD_LSTM, drop_mult=0.5)
learn.load('froze_imdb');

In [None]:
learn.unfreeze()
learn.fit_one_cycle(1, slice(1e-3/(2.6**4),1e-3), moms=(0.8,0.7))
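
Passing a `slice` as the learning rate applies discriminative learning rates: the earliest layers get the smallest rate and the classifier head gets the largest. As I understand fastai v1, the rates are spread geometrically across the layer groups. The sketch below reproduces that spacing; the name `even_mults` mirrors a fastai helper but this is my own reimplementation, and the assumption of five layer groups is mine.

```python
def even_mults(start, stop, n):
    """Return n values from start to stop, each a constant multiple
    of the previous one (geometric spacing)."""
    step = (stop / start) ** (1 / (n - 1))
    return [start * step ** i for i in range(n)]

lr_max = 1e-3
n_groups = 5  # assumption: number of layer groups in the classifier
lrs = even_mults(lr_max / 2.6 ** 4, lr_max, n_groups)

for group, lr in enumerate(lrs):
    print(f"layer group {group}: lr={lr:.2e}")
```

Under that assumption, dividing by `2.6**4` means each layer group trains 2.6 times faster than the one below it, which is the ratio suggested in the ULMFiT paper.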

In [None]:
learn.save('unfroze_imdb')

# 4. Retrain the classification model to detect irony class

## Creating Cross Validation Folds

To be able to compare these results with the previously published paper, we need to create five-fold cross-validation splits before the training and testing cycles.

In [None]:
from sklearn.model_selection import KFold # import KFold

irony_data.head()
X = irony_data['comment_text']
y = irony_data['label']
kf = KFold(n_splits=5)
kf.get_n_splits(X) # returns the number of splitting iterations in the cross-validator

In [None]:
print(kf)

In [None]:
trains = list()
tests = list()
for train_index, test_index in kf.split(X):
    trains.append(train_index)
    tests.append(test_index)
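
Without shuffling, `KFold` cuts the data into consecutive blocks, with the first few folds one sample larger when the data does not divide evenly. The sketch below mimics that behaviour in plain Python; `kfold_indices` is a hypothetical stand-in, not the scikit-learn implementation.

```python
def kfold_indices(n_samples, n_splits=5):
    """Return a list of (train_indices, test_indices) pairs using
    contiguous, unshuffled folds."""
    remainder = n_samples % n_splits
    fold_sizes = [n_samples // n_splits + (1 if i < remainder else 0)
                  for i in range(n_splits)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds

folds = kfold_indices(12, n_splits=5)
for train, test in folds:
    print("test:", test)
```

Because the folds are contiguous, any ordering in the CSV file carries over into the folds. If that is a concern, scikit-learn's `KFold(shuffle=True, random_state=...)` or `StratifiedKFold` are alternatives, although they would no longer match the splitting used here.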

In [None]:
def create_validation(valnum):

    train = {'comment_text': X[trains[valnum]], 'label': y[trains[valnum]]}
    dftrain = pd.DataFrame(data=train)
    
    valid = {'comment_text': X[tests[valnum]], 'label': y[tests[valnum]]}
    dfvalid = pd.DataFrame(data=valid)
    
    return dftrain, dfvalid

In [None]:
fold1_train, fold1_valid = create_validation(0)
fold2_train, fold2_valid = create_validation(1)
fold3_train, fold3_valid = create_validation(2)
fold4_train, fold4_valid = create_validation(3)
fold5_train, fold5_valid = create_validation(4)

## Looping through the cross validation folds

In [None]:
bs=48
path = "."
data_lm = load_data(path, 'data_lm.pkl', bs=bs)

trains = [fold1_train, fold2_train, fold3_train, fold4_train, fold5_train]
valids = [fold1_valid, fold2_valid, fold3_valid, fold4_valid, fold5_valid]
n_reps = 1
# to hold precision, recall and f1 values across reps
metrics = np.zeros([len(trains),n_reps,3]) 

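Next, we define class weights for the loss function so that errors on the "ironic" class count three times as much as errors on the "not ironic" class. This is important to account for the class imbalance in the Ironic Corpus.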

In [None]:
weights = [1., 3.]
class_weights=torch.FloatTensor(weights).cuda()
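
To show what the weights do, here is a plain-Python sketch of weighted cross-entropy, following my understanding of how PyTorch's `nn.CrossEntropyLoss(weight=...)` averages: each example's loss is scaled by the weight of its true class, and the sum is divided by the total weight. The logits are made up, and I am assuming class index 0 is "not ironic" (-1) and index 1 is "ironic" (1).

```python
import math

def weighted_cross_entropy(logits_batch, targets, class_weights):
    """Weighted mean of per-example cross-entropy losses."""
    total, weight_sum = 0.0, 0.0
    for logits, target in zip(logits_batch, targets):
        m = max(logits)
        # log of the softmax denominator, computed stably
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        log_prob = logits[target] - log_z
        w = class_weights[target]
        total += -w * log_prob
        weight_sum += w
    return total / weight_sum

# Made-up batch: the model favours class 0 for both comments,
# but the second comment is actually ironic (class 1)
logits_batch = [[2.0, 0.0], [2.0, 0.0]]
targets = [0, 1]

unweighted = weighted_cross_entropy(logits_batch, targets, [1.0, 1.0])
weighted = weighted_cross_entropy(logits_batch, targets, [1.0, 3.0])
print(f"unweighted={unweighted:.4f} weighted={weighted:.4f}")
```

The missed ironic comment dominates the weighted loss, which pushes the model away from simply predicting the majority class.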

In [None]:
foldx = TextDataBunch.from_df(".",fold1_train,fold1_valid,text_cols=0,label_cols=1,vocab=data_lm.vocab,bs=bs)
learn = text_classifier_learner(foldx, AWD_LSTM, drop_mult=0.2,
                                loss_func = nn.CrossEntropyLoss(weight=class_weights))
learn.load('unfroze_imdb');
learn.lr_find()
learn.recorder.plot(suggestion=True)

In [None]:
for reps in range(n_reps):
    for fold in range(0,len(trains)):
        foldx = TextDataBunch.from_df(".",trains[fold],valids[fold],text_cols=0,label_cols=1,vocab=data_lm.vocab,bs=bs)
        learn = text_classifier_learner(foldx, AWD_LSTM, drop_mult=0.2,metrics=[Precision(),Recall(),FBeta(beta=1)],
                                       loss_func = nn.CrossEntropyLoss(weight=class_weights))
        learn.load('unfroze_imdb');
        learn.fit_one_cycle(1, 1e-2, moms=(0.8,0.7))
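        # store precision, recall and F1 from the final epoch for this fold and repetition
        metrics[fold, reps, :] = [float(m) for m in learn.recorder.metrics[-1]]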
    

# 5. Compare the results of the ULMFiT and SVM classification

In [None]:
avg_per_fold = np.mean(metrics,axis=1);avg_per_fold

In [None]:
def format_scores(avg_metrics):
    def print_line(name,arr):
        # mean and range across folds, formatted like the scores reported in the paper
        print(f"{name}: {np.mean(arr):.3f} (range {np.min(arr):.3f} - {np.max(arr):.3f})")
    
    print_line('F1 score',avg_metrics[:,2])
    print_line('recall',avg_metrics[:,1])
    print_line('precision',avg_metrics[:,0])
    

In [None]:
format_scores(avg_per_fold)

Scores presented in the paper:
- average [F1 score](https://en.wikipedia.org/wiki/F1_score): 0.383 (range 0.330 - 0.412)
- average [recall](https://en.wikipedia.org/wiki/Precision_and_recall): 0.496 (range 0.446 - 0.548)
- average [precision](https://en.wikipedia.org/wiki/Precision_and_recall): 0.315 (range 0.261 - 0.380)

## Interpretation:

The ULMFiT technique is able to get close to the best scores from the paper.