# Twitter US Airlines Sentiment with ULMFiT

## Introduction

This notebook trains the [AWD-LSTM](https://arxiv.org/pdf/1708.02182.pdf) architecture, following the [ULMFiT](https://arxiv.org/pdf/1801.06146.pdf) paper, on the [Twitter US Airline Sentiment Dataset](https://www.kaggle.com/crowdflower/twitter-airline-sentiment) using the fastai library. The aim of the model is to determine the sentiment of each tweet.

First, we import the required libraries.

In [None]:
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sn

from fastai.text import *
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

%matplotlib inline 

We read the file containing tweets into a DataFrame object and have a look at the first few records.

In [None]:
df = pd.read_csv('../input/Tweets.csv')
df.head()

We see that our DataFrame contains numerous columns that describe each tweet. For training a language model and a classifier, the most important columns are 'airline_sentiment' and 'text'. Let's check that they contain no missing values.

In [None]:
df[['airline_sentiment', 'text']].isnull().sum()

There are no missing values, so we can proceed. Let's visualise the counts of tweets by sentiment category.

In [None]:
df['airline_sentiment'].value_counts().plot(kind='bar')

We see that neutral and positive tweets are underrepresented compared to negative ones. We will not balance this dataset; instead we evaluate the performance of the model on a test set with roughly the same proportion of sentiments.

Next let's investigate the relationship between a tweet's length and its sentiment.

In [None]:
df['tweet_length'] = df['text'].apply(len)
df.groupby(['tweet_length', 'airline_sentiment']).size().unstack().plot(kind='line')

We see that for negative sentiment the distribution is highly skewed towards longer tweets.

We split our dataset into train and test parts. The test part is held out until the model is trained and is used only for evaluation. We then save both parts as .csv files for later use.

In [None]:
rnd_state = 111
df_train, df_test = train_test_split(df, test_size=0.15, random_state = rnd_state)
df_train[['airline_sentiment', 'text']].to_csv('Tweets_train.csv', index=False, encoding='utf-8')
df_test[['airline_sentiment', 'text']].to_csv('Tweets_test.csv', index=False, encoding='utf-8')
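
The split above is purely random, so the sentiment proportions in the two parts match only approximately. If exact proportions matter, a stratified split can be used instead; scikit-learn's `train_test_split` accepts a `stratify` argument for this. The idea can be sketched in plain Python as follows. The function name `stratified_split` and the toy labels are our own illustration, not part of the notebook's pipeline.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.15, seed=111):
    """Return (train_idx, test_idx) so each label keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)                      # shuffle within each class
        n_test = round(len(indices) * test_size)  # per-class test quota
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Made-up labels for illustration only (not the real dataset counts).
toy_labels = ['negative'] * 60 + ['neutral'] * 20 + ['positive'] * 20
train_idx, test_idx = stratified_split(toy_labels)
test_counts = Counter(toy_labels[i] for i in test_idx)
print(test_counts)
```

Each class contributes the same fraction of its rows to the test part, so the class ratios of the full dataset carry over to both parts.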



Now we create TextDataBunch objects for the language model and the classifier and save them, so that in future iterations we can load them straight away and skip the previous steps.

We set aside 15% of our training data for validation so that we can experiment with hyperparameters and adjust them based on performance on the validation data.

In [None]:
data_lm = TextLMDataBunch.from_csv('.', 'Tweets_train.csv', valid_pct=0.15)
data_clas = TextClasDataBunch.from_csv('.', 'Tweets_train.csv', valid_pct=0.15, vocab=data_lm.train_ds.vocab, bs=32)
data_lm.save('data_lm_export.pkl')
data_clas.save('data_clas_export.pkl')

Load the previously created data for the language model and the classifier.

In [None]:
data_lm = load_data('.', 'data_lm_export.pkl')
data_clas = load_data('.', 'data_clas_export.pkl', bs=32)

Now we have a look at what our TextDataBunch objects contain. The texts of the tweets have gone through an automatic tokenization stage that includes, but is not limited to:

* Splitting the text on spaces and punctuation.
* Transforming the text to lower case.
* Introducing special tokens that encode specific information, e.g. a marker for the beginning of a text.

After that the tokens are numericalised, meaning each token is replaced by its index in the vocabulary. The resulting numerical sequences serve as inputs for the language model.
We print an example of a preprocessed text together with the encoding it was transformed to.

In [None]:
print("Preprocessed text:", data_lm.x[0])
print("\n")
print("Corresponding numerical sequence:", data_lm.x[0].data)
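
To make the two preprocessing stages concrete, here is a heavily simplified sketch of tokenization and numericalisation in plain Python. It is not fastai's tokenizer: the rules are reduced to the three listed above, the helper names (`tokenize`, `build_vocab`, `numericalize`) and the two example texts are made up for illustration, and only the special-token spellings (`xxbos`, `xxunk`, `xxmaj`) are borrowed from fastai.

```python
import re
from collections import Counter

BOS, UNK, MAJ = 'xxbos', 'xxunk', 'xxmaj'  # special-token spellings borrowed from fastai

def tokenize(text):
    """Split on whitespace and punctuation, lower-case, and mark capitalised words."""
    tokens = [BOS]                       # marks the beginning of a text
    for tok in re.findall(r"\w+|[^\w\s]", text):
        if tok.istitle():
            tokens.append(MAJ)           # remember that the next word was capitalised
        tokens.append(tok.lower())
    return tokens

def build_vocab(token_lists):
    """Map every token to an integer index; index 0 is reserved for unknown tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    itos = [UNK] + sorted(counts)        # index -> token
    stoi = {tok: i for i, tok in enumerate(itos)}  # token -> index
    return itos, stoi

def numericalize(tokens, stoi):
    """Replace each token by its vocabulary index, falling back to the unknown token."""
    return [stoi.get(tok, stoi[UNK]) for tok in tokens]

# Made-up example texts, not taken from the dataset.
texts = ["Thanks for the great flight!", "The flight was late."]
token_lists = [tokenize(t) for t in texts]
itos, stoi = build_vocab(token_lists)
ids = numericalize(token_lists[0], stoi)
print(token_lists[0])
print(ids)
```

Mapping the indices back through `itos` recovers the token sequence, which is exactly the relationship between the two printed outputs in the cell above.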


## 1. Language model

Now we implement our language model. The ULMFiT approach consists of three main steps:
1. Pretraining the language model. We will download a model pretrained on a large corpus of English text.
2. Fine-tuning the language model. This is necessary to adapt the language model to the specifics of the dataset we are going to work with.
3. Using our language model as an encoder for the classifier, which will infer the sentiment of the tweets.

Now we are ready to create the learner for our model. It comes with the pretrained [AWD_LSTM](https://arxiv.org/pdf/1708.02182.pdf) architecture, which will serve as our language model.

In [None]:
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5)


Next we explore the space of possible learning rates.

In [None]:
learn.lr_find()
learn.recorder.plot()
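
Conceptually, a learning rate finder trains for a short while with an exponentially growing learning rate and records the loss at every step. A common way to read the resulting plot is to take the learning rate at the lowest loss and divide it by ten. The sketch below illustrates both ideas in plain Python; `lr_sweep`, `suggest_lr` and the loss values are hypothetical and do not reproduce fastai's implementation.

```python
def lr_sweep(start_lr, end_lr, num_it):
    """Exponentially spaced learning rates from start_lr to end_lr (inclusive)."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]

def suggest_lr(lrs, losses, divisor=10.0):
    """Learning rate at the lowest recorded loss, divided by `divisor`."""
    best = min(range(len(losses)), key=lambda i: losses[i])
    return lrs[best] / divisor

# Hypothetical sweep: 6 steps from 1e-5 to 1 (a real finder uses many more).
toy_lrs = lr_sweep(1e-5, 1.0, 6)
# Made-up losses shaped like a typical finder curve: slow descent, then divergence.
toy_losses = [5.2, 5.1, 4.6, 4.1, 3.9, 7.5]
suggested = suggest_lr(toy_lrs, toy_losses)
print(f"suggested learning rate: {suggested:.4g}")
```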

Next we fit the last layer of our model for 1 epoch using the 1cycle policy. This approach was described as working best in the original paper. As recommended, we use a learning rate one order of magnitude lower than the one corresponding to the lowest loss on the previous plot. The values for the cyclical momentum and the other parameters follow the guidelines of the fastai course, where they were found to work well.

In [None]:
learn.fit_one_cycle(1, 1e-02, moms=(0.8, 0.7))
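
To give an intuition for what the 1cycle policy does with the learning rate and the momentum, here is a simplified sketch. It assumes cosine annealing, a warm-up over the first 30% of the steps, a starting learning rate of `lr_max / 25`, and a final learning rate far below the starting one; these values are assumptions chosen for illustration and may differ from what `fit_one_cycle` uses in a given fastai version.

```python
import math

def cos_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def one_cycle(step, total_steps, lr_max=1e-2, moms=(0.8, 0.7),
              pct_start=0.3, div_factor=25.0):
    """Return (learning rate, momentum) at a given step of a one-cycle schedule."""
    warmup_steps = int(total_steps * pct_start)
    if step < warmup_steps:
        # Phase 1: learning rate rises to lr_max while momentum falls.
        pct = step / warmup_steps
        lr = cos_anneal(lr_max / div_factor, lr_max, pct)
        mom = cos_anneal(moms[0], moms[1], pct)
    else:
        # Phase 2: learning rate decays to a very small value while momentum recovers.
        pct = (step - warmup_steps) / (total_steps - warmup_steps)
        lr = cos_anneal(lr_max, lr_max / (div_factor * 1e4), pct)
        mom = cos_anneal(moms[1], moms[0], pct)
    return lr, mom

for step in (0, 30, 100):
    lr, mom = one_cycle(step, total_steps=100)
    print(f"step {step:3d}: lr={lr:.2e}, momentum={mom:.2f}")
```

The learning rate and the momentum move in opposite directions: when the learning rate is at its peak the momentum is at its lowest value, which is why `moms=(0.8, 0.7)` is given as a (high, low) pair.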

Next we unfreeze the whole model and fine-tune it for 10 epochs.

In [None]:
learn.unfreeze()
learn.fit_one_cycle(10,1e-03, moms=(0.8, 0.7))

In the last few epochs we see that the accuracy on the validation set stagnates, which suggests that further training would only lead to overfitting.

Next we ask our trained language model to finish a phrase.

In [None]:
learn.predict("My experience was", n_words=10)

We see that our model follows basic rules of grammar. 

Finally we save our trained model to use it as an encoder for the classifier.

In [None]:
learn.save_encoder('ft_enc')

## 2. Classifier

Now it's time to implement the final stage of ULMFiT: the classifier. For this we create an appropriate learner and load our fine-tuned language model as its encoder.

In [None]:
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('ft_enc')

Again, let's explore different values for the learning rate.

In [None]:
learn.lr_find()
learn.recorder.plot()

Now we train our classifier, gradually unfreezing layers from the top. Training all layers straight away may result in the loss of information gained through fine-tuning of the language model.

In [None]:
learn.fit_one_cycle(8, 1e-2, moms=(0.8, 0.7))

In [None]:
learn.freeze_to(-2)
learn.fit_one_cycle(5, slice(1e-2/(2.6**4),1e-2), moms=(0.8, 0.7))
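
The `slice(lr/(2.6**4), lr)` argument requests discriminative learning rates: the last layer group trains with `lr`, and each earlier group uses a rate 2.6 times smaller, the ratio suggested in the ULMFiT paper. The sketch below shows how such a range could be expanded into one rate per group. It assumes five layer groups (hence the exponent 4) and geometric spacing; `discriminative_lrs` is a hypothetical helper, not a fastai function.

```python
def discriminative_lrs(lr_min, lr_max, n_groups):
    """Geometrically spaced learning rates, lowest for the earliest layer group."""
    if n_groups == 1:
        return [lr_max]
    step = (lr_max / lr_min) ** (1 / (n_groups - 1))  # constant ratio between neighbours
    return [lr_min * step ** i for i in range(n_groups)]

lr = 1e-2
# Assumption: the classifier is split into 5 layer groups.
group_lrs = discriminative_lrs(lr / 2.6 ** 4, lr, n_groups=5)
for i, group_lr in enumerate(group_lrs):
    print(f"layer group {i}: lr={group_lr:.2e}")
```

Lower rates for the early groups keep the general language knowledge in those layers largely intact, while the later, more task-specific groups adapt faster.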

In [None]:
learn.unfreeze()
learn.fit_one_cycle(5, slice(5e-3/(2.6**4),5e-3), moms=(0.8, 0.7))
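
The three training stages above follow a gradual unfreezing schedule: first only the classifier head is trained, then the last two layer groups, then the whole model. The toy function below shows which groups are trainable at each stage. It assumes five layer groups and that a freshly created classifier learner starts with only the last group unfrozen; `trainable_groups` is our own illustrative helper and only mimics the indexing of `freeze_to`.

```python
def trainable_groups(n_groups, freeze_to):
    """Flags per layer group: True if the group is trained, False if it is frozen.

    Groups before index `freeze_to` are frozen; negative values count from the end.
    """
    start = freeze_to if freeze_to >= 0 else n_groups + freeze_to
    return [i >= start for i in range(n_groups)]

# Assumption: 5 layer groups, with the classifier head as the last one.
stages = [('head only', -1), ('last two groups', -2), ('all groups (unfreeze)', 0)]
for name, n in stages:
    print(f"{name}: {trainable_groups(5, n)}")
```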

## 3. Evaluating the model

Now we evaluate the performance of the classifier. For this we first read the previously saved test part of our dataset and then predict the sentiment of each tweet.

In [None]:
test_df = pd.read_csv("Tweets_test.csv", encoding="utf-8")
test_df['pred_sentiment'] = test_df['text'].apply(lambda row: str(learn.predict(row)[0]))
test_df['airline_sentiment'].value_counts().plot(kind='bar')

We see that the test set contains roughly the same proportion of sentiments as the train set.
Finally we calculate the accuracy on the test set.

In [None]:
print("Test Accuracy: ", accuracy_score(test_df['airline_sentiment'], test_df['pred_sentiment']))
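
Because the classes are imbalanced, it helps to compare the accuracy against a trivial baseline that always predicts the most frequent class. A minimal sketch, using made-up labels rather than the real test set:

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Made-up labels for illustration only (not the real dataset counts).
toy_labels = ['negative'] * 6 + ['neutral'] * 2 + ['positive'] * 2
baseline = majority_baseline_accuracy(toy_labels)
print("Majority-class baseline accuracy:", baseline)
```

On the real data the same function could be applied to `test_df['airline_sentiment']`; a model is only useful to the extent that it clearly beats this number.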

Now let's investigate the mistakes made by our model using a confusion matrix.

In [None]:
labels = ['negative', 'neutral', 'positive']
conf_matrix = confusion_matrix(y_true=test_df['airline_sentiment'].values, y_pred=test_df['pred_sentiment'].values, labels=labels)
sn.heatmap(conf_matrix, annot=True, fmt='g', xticklabels=labels, yticklabels=labels)
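
The heatmap can also be summarised numerically. In scikit-learn's convention rows are true labels and columns are predicted labels, so per-class recall is the diagonal divided by the row sum and per-class precision is the diagonal divided by the column sum. The sketch below uses a made-up 3x3 matrix, not the notebook's results; `per_class_metrics` is a hypothetical helper.

```python
def per_class_metrics(conf, labels):
    """Precision and recall per class from a confusion matrix (rows: true, columns: predicted)."""
    metrics = {}
    for k, label in enumerate(labels):
        tp = conf[k][k]
        actual = sum(conf[k])                    # row sum: tweets that truly have this label
        predicted = sum(row[k] for row in conf)  # column sum: tweets predicted as this label
        metrics[label] = {
            'precision': tp / predicted if predicted else 0.0,
            'recall': tp / actual if actual else 0.0,
        }
    return metrics

# Made-up confusion matrix for illustration only.
toy_labels = ['negative', 'neutral', 'positive']
toy_conf = [[8, 1, 1],
            [2, 2, 0],
            [1, 0, 3]]
toy_metrics = per_class_metrics(toy_conf, toy_labels)
for label, m in toy_metrics.items():
    print(f"{label}: precision={m['precision']:.2f}, recall={m['recall']:.2f}")
```

The function indexes the matrix row by row, so it should also accept the NumPy array returned by `confusion_matrix`, or its `.tolist()` form.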

We see that our model classified a significant number of neutral tweets as negative, which is understandable because even for humans the distinction is sometimes unclear. It is more interesting to look at the positive tweets that were classified as negative.

In [None]:
pd.set_option('display.max_colwidth', -1)
test_df.loc[(test_df['airline_sentiment'] == 'positive') & (test_df['pred_sentiment'] == 'negative')].head()

From here we see that these tweets are indeed confusing, as they are often responses to other users' tweets.

## Conclusion
In this notebook we explored the application of the ULMFiT strategy to detecting the sentiment of tweets and achieved a level of accuracy similar to that of humans. Performance might be improved by tuning the parameters of the model, as well as by taking into account additional information such as the relations between particular tweets.