# US Airlines Sentiment Analysis (ULMFiT)

### Introduction

Objective: Apply a supervised or semi-supervised ULMFiT model to the Twitter US Airline Sentiment dataset.

Twitter US Airline Sentiment dataset: https://www.kaggle.com/crowdflower/twitter-airline-sentiment#Tweets.csv 

ULMFiT model description: http://nlp.fast.ai/classification/2018/05/15/introducing-ulmfit.html

### Methodology

The steps taken are outlined below:

1) Data wrangling: We inspect the dataset and check that no important columns or values are missing. We also count the labels per class for each airline and find that the sentiment distribution differs from airline to airline. For that reason we use a regular expression to replace the airline name in every tweet, so that the model cannot use the airline name as a shortcut for predicting sentiment.

2) Language model fine-tuning: We first load a pretrained language model. For this project we used an AWD-LSTM pretrained on WikiText-103. We then fine-tune it on our Twitter dataset so that it adapts to the vocabulary and style of the tweets.

3) Classifier fine-tuning: At this point we have a language model that is fairly good at predicting the next word in a sentence, but it cannot yet perform sentiment analysis. We therefore train it as a classifier, using the tweets and the "positive", "negative", or "neutral" labels provided in the dataset.

4) Results: In this last step we analyze the performance of the classifier. It predicts the sentiment of the tweets with 82.7 percent accuracy, which is fairly good given that tweets can be ambiguous and that things like sarcasm are hard to account for. Inspecting the misclassified tweets suggests that the model finds it difficult to distinguish between positive and neutral tweets.

In [None]:
# Import necessary libraries 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
from sklearn.model_selection import train_test_split


# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
# import fast.ai libraries for nlp

from fastai import *
from fastai.text import *

In [None]:

import fastai.utils.collect_env

fastai.utils.collect_env.show_install()

In [None]:
# bs=48
# bs=24
bs=192

> ### Data Wrangling

The following cells load our data into a dataframe. From there, we select the columns relevant to the project and use a regular expression to mask the airline names in all of our tweets.

In [None]:
path = Path('../input/twitter-airline-sentiment/')
file_name = 'Tweets.csv'
path.ls()

In [None]:
file_path = path / file_name
df_airline = pd.read_csv(file_path)
df_airline.head()

In [None]:
# Take an explicit copy so that later column assignments do not act on a view of df_airline
df_final = df_airline[['airline_sentiment', 'text']].copy()
pd.set_option('display.max_colwidth',0)
df_final.head()

In [None]:
#check for missing values in data
df_final.isna().sum()

In [None]:
#Data is skewed more towards the negative sentiment
df_final['airline_sentiment'].value_counts()

In [None]:
#Visualize the sentiment for each airline
sns.set(style="darkgrid")
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.countplot(x="airline", hue='airline_sentiment', data=df_airline)
plt.title("Airline Sentiment For Each Airline");

As the plot above shows, sentiment varies greatly from airline to airline. Virgin America is the most positive, while United is the most negative overall. If the airline names stayed in the tweets, the model could learn to associate a name with a sentiment instead of reading the rest of the text, so we replace the airline handle in each tweet with a generic token.

In [None]:

import re
regex = r"@(VirginAmerica|united|SouthwestAir|Delta|USAirways|AmericanAir)"
def text_replace(text):
    return re.sub(regex, '@airline', text, flags=re.IGNORECASE)

df_final['text'] = df_final['text'].apply(text_replace)

In [None]:
df_final.head(10)
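
As a quick sanity check on the masking step, the sketch below applies the same pattern to a few tweets. The sample tweets and the helper name `mask_airline` are invented for illustration; only the regular expression comes from the cell above.

```python
import re

# Same pattern as in the notebook cell above.
AIRLINE_HANDLES = r"@(VirginAmerica|united|SouthwestAir|Delta|USAirways|AmericanAir)"

def mask_airline(text):
    """Replace any known airline handle with the generic token '@airline'."""
    return re.sub(AIRLINE_HANDLES, "@airline", text, flags=re.IGNORECASE)

# Made-up sample tweets, not rows from the dataset.
samples = [
    "@united my bag is lost again",
    "Thanks @VirginAmerica for the smooth flight!",
    "@AMERICANAIR why is the flight delayed?",
]

masked = [mask_airline(t) for t in samples]
for before, after in zip(samples, masked):
    print(f"{before!r} -> {after!r}")

# Any tweet that still matches the pattern would indicate a masking failure.
leftover = [t for t in masked if re.search(AIRLINE_HANDLES, t, flags=re.IGNORECASE)]
print("tweets still containing a known handle:", len(leftover))
```

Running the same `leftover` check on `df_final['text']` would be one way to confirm that none of the listed handles survive. It would not catch handles that are missing from the pattern.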

> ### Language Model Fine Tuning

The following cells turn our cleaned dataframe into a databunch. A databunch is a fastai object that bundles the training and validation data, already tokenized and converted to integer ids, into the batches the training loop expects. Then the pretrained language model is downloaded and fine-tuned on our tweets dataset.
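
To make the idea of "tokenized and converted to integer ids" concrete, here is a minimal sketch in plain Python. It is not fastai's implementation: the whitespace tokenizer, the toy tweets, and the names `itos`, `stoi`, and `numericalize` are simplifications chosen for illustration.

```python
from collections import Counter

# Toy corpus (invented); fastai's real tokenizer is much richer than a whitespace split.
tweets = ["@airline flight was late", "@airline great flight", "late again"]

# 1) Tokenize each tweet.
tokenized = [t.lower().split() for t in tweets]

# 2) Build a vocabulary, most frequent tokens first; index 0 is reserved for unknown words.
counts = Counter(tok for toks in tokenized for tok in toks)
itos = ["xxunk"] + [tok for tok, _ in counts.most_common()]
stoi = {tok: i for i, tok in enumerate(itos)}

# 3) Numericalize: map every token to its integer id (0 if it is not in the vocabulary).
def numericalize(tokens):
    return [stoi.get(tok, 0) for tok in tokens]

ids = [numericalize(toks) for toks in tokenized]

# 4) A language model is trained to predict the next token, so the targets
#    are simply the inputs shifted by one position.
stream = [i for seq in ids for i in seq]
inputs, targets = stream[:-1], stream[1:]

print("vocabulary:", itos)
print("ids:", ids)
print("unseen word maps to 0:", numericalize(["terrible", "flight"]))
```

The same vocabulary has to be reused when the classifier is built later, otherwise the integer ids would no longer line up with the fine-tuned encoder's embeddings.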

In [None]:
#Split the dataframe randomly into train set and valid set. 
#TextLMDataBunch only accepts two separate dataframes for train and valid
train, valid = train_test_split(df_final, test_size=0.1)
moms = (0.8,0.7)
wd = 0.1

In [None]:
data_lm = TextLMDataBunch.from_df(path, train_df = train, valid_df = valid)

In [None]:
data_lm.show_batch()


In [None]:
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5, model_dir='/tmp/models')
learn.freeze()




In [None]:
#learn.model_dir='/kaggle/working/'

In [None]:
# To choose a learning rate, pick the x value where the loss curve falls most steeply.
learn.lr_find()

In [None]:
learn.recorder.plot()
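
The "steepest downward slope" heuristic can also be expressed numerically. The sketch below is an illustration only: the (learning rate, loss) pairs are invented, and `steepest_descent_lr` is a hypothetical helper, not a fastai function.

```python
import math

# Hypothetical (learning rate, loss) pairs such as a learning-rate range test might record.
lrs = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [5.00, 4.98, 4.90, 4.50, 3.80, 4.20, 9.00]

def steepest_descent_lr(lrs, losses):
    """Return the log-space midpoint of the segment where the loss drops fastest."""
    best_slope, best_lr = 0.0, None
    for i in range(len(lrs) - 1):
        lo, hi = math.log10(lrs[i]), math.log10(lrs[i + 1])
        slope = (losses[i + 1] - losses[i]) / (hi - lo)  # loss change per decade of lr
        if slope < best_slope:
            best_slope, best_lr = slope, 10 ** ((lo + hi) / 2)
    return best_lr

suggested = steepest_descent_lr(lrs, losses)
print(f"suggested learning rate: {suggested:.2e}")
```

In practice the curve is noisy, so the plot is usually read by eye and the chosen value is rounded to a convenient power of ten.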

In [None]:
moms = (0.8,0.7)
wd = 0.1
lr = 1.0E-02
learn.fit_one_cycle(1, lr, moms=moms, wd=wd)

In [None]:
learn.unfreeze()

In [None]:
learn.fit_one_cycle(3, lr, moms=moms, wd=wd)


In [None]:
learn.predict('This flight sucks!', n_words=20)


In [None]:
learn.save_encoder('ft_enc')

> ### Classifier Fine Tuning

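We repeat the process of creating a databunch, this time with the sentiment labels attached and with the vocabulary of the language model reused. Then we load the fine-tuned encoder and train the model to classify sentiment using the labeled tweets.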

In [None]:
train_valid, test = train_test_split(df_final, test_size=0.1)
train, valid = train_test_split(train_valid, test_size=0.1)
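
The two successive 10 percent splits above give roughly 81 percent training, 9 percent validation, and 10 percent test data. The arithmetic is sketched below for a hypothetical row count. The helper `split_sizes` is invented for illustration and assumes that the held-out size is rounded up, which is an assumption about how the split is computed.

```python
import math

def split_sizes(n_rows, test_frac=0.1, valid_frac=0.1):
    """Sizes of (train, valid, test) after two successive hold-out splits."""
    n_test = math.ceil(n_rows * test_frac)            # first split: hold out the test set
    n_train_valid = n_rows - n_test
    n_valid = math.ceil(n_train_valid * valid_frac)   # second split: hold out validation
    n_train = n_train_valid - n_valid
    return n_train, n_valid, n_test

n_rows = 1000  # hypothetical row count, not the size of the real dataset
n_train, n_valid, n_test = split_sizes(n_rows)
for name, n in [("train", n_train), ("valid", n_valid), ("test", n_test)]:
    print(f"{name}: {n} rows ({n / n_rows:.0%})")
```

Because `train_test_split` shuffles randomly, passing a fixed `random_state` to both calls would make the splits, and therefore the reported accuracy, repeatable between runs.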


In [None]:
data_clas = TextClasDataBunch.from_df(path,train_df=train, valid_df = valid,test_df = test, vocab=data_lm.train_ds.vocab, 
                                      text_cols='text', label_cols='airline_sentiment', bs=48)



In [None]:
data_clas.show_batch()


In [None]:
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5, model_dir='/tmp/models')
learn.load_encoder('ft_enc')
learn.freeze()

In [None]:
learn.lr_find()
learn.recorder.plot()

In [None]:
# ULMFiT trains the classifier with gradual unfreezing: one epoch per stage,
# unfreezing one more layer group each time before the final full fine-tuning.
lr = 1.0E-03
learn.fit_one_cycle(1, lr, moms=moms, wd=wd)

In [None]:
learn.freeze_to(-2)
lr /= 2
learn.fit_one_cycle(1, slice(lr/(2.6**4), lr), moms=moms, wd=wd)

In [None]:
learn.freeze_to(-3)
lr /= 2
learn.fit_one_cycle(1, slice(lr/(2.6**4), lr), moms=moms, wd=wd)

In [None]:
learn.unfreeze()
lr /= 5
learn.fit_one_cycle(2, slice(lr/(2.6**4), lr), moms=moms, wd=wd)
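
The `slice(lr/(2.6**4), lr)` arguments above request discriminative learning rates: the earliest layers train with the smallest rate and the last layers with the largest. The sketch below shows how such rates could be spread geometrically and how the maximum rate shrinks across the stages. The helper `layer_group_lrs` is hypothetical, and the count of five layer groups is an assumption chosen to match the exponent 4.

```python
def layer_group_lrs(lr_max, n_groups=5, factor=2.6):
    """Geometrically spaced rates from lr_max / factor**(n_groups - 1) up to lr_max."""
    return [lr_max / factor ** (n_groups - 1 - i) for i in range(n_groups)]

lr = 1e-3  # rate used for the first, fully frozen epoch
schedule = []
# Divisors mirror the `lr /= 2`, `lr /= 2`, `lr /= 5` steps in the cells above.
for stage, divisor in [("freeze_to(-2)", 2), ("freeze_to(-3)", 2), ("unfreeze()", 5)]:
    lr /= divisor
    schedule.append((stage, lr, layer_group_lrs(lr)))

for stage, lr_max, rates in schedule:
    formatted = ", ".join(f"{r:.2e}" for r in rates)
    print(f"{stage:<14} max lr {lr_max:.1e} -> per-group rates [{formatted}]")
```

With a factor of 2.6 between neighbouring groups, the first group learns about 46 times more slowly than the last, which helps preserve the general language knowledge stored in the early layers.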

In [None]:
learn.predict('this airline sucks!')


> ### Results

Finally, we quantify the performance of our classifier by computing its accuracy on the validation set (the default dataset used by the interpretation object). We plot the confusion matrix to see which classes the classifier confuses, and then inspect the misclassified tweets to form a hypothesis about why they are misclassified.

In [None]:
interp = TextClassificationInterpretation.from_learner(learn)
acc = accuracy(interp.preds, interp.y_true)
print('Accuracy: {0:.3f}'.format(acc))

In [None]:
interp.plot_confusion_matrix()
plt.title('Classification Confusion Matrix')

In [None]:
#test_df = df_final
#test_df['pred_sentiment'] = test_df['text'].apply(lambda row: str(learn.predict(row)[0]))
#pred_sent_df = test_df.loc[(test_df['airline_sentiment'] == 'positive') & (test_df['pred_sentiment'] == 'negative')]
#pred_sent_df.head(20)


interp.show_top_losses(20)

The confusion matrix above shows how many tweets of each true sentiment were predicted correctly and incorrectly. We then look at the top losses, that is, the predictions the model got most confidently wrong, to understand where our model fell short.
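
For readers unfamiliar with how a confusion matrix and accuracy relate, here is a minimal sketch in plain Python. The labels and predictions are invented and do not come from the model above.

```python
from collections import Counter

labels = ["negative", "neutral", "positive"]

# Invented true labels and predictions for seven tweets.
y_true = ["negative", "negative", "neutral", "positive", "neutral", "positive", "negative"]
y_pred = ["negative", "negative", "negative", "positive", "neutral", "neutral", "negative"]

# Count every (true, predicted) pair; rows are true classes, columns are predicted classes.
pairs = Counter(zip(y_true, y_pred))
matrix = [[pairs[(t, p)] for p in labels] for t in labels]

# Accuracy is the share of tweets on the diagonal (true label equals predicted label).
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)

print("rows = true, columns = predicted:", labels)
for label, row in zip(labels, matrix):
    print(f"{label:>8}: {row}")
print(f"accuracy: {accuracy:.3f}")
```

Off-diagonal cells point to the confusions worth investigating. In this toy example, a neutral tweet predicted as negative appears in the second row, first column.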

> ### Conclusion

We applied the ULMFiT method to classify the sentiment of tweets directed at various airlines. The method performed well, reaching roughly 81 percent accuracy. The exact figure varies between runs because the train/validation split is random.

The table above lists the tweets with the highest losses, including positive tweets that were misclassified as negative. Although our method is very accurate, arguably more so than other methods, we still want to understand why it got these predictions wrong.

The cause is still unclear, but a few patterns stand out. Some of the tweets carry negative sentiment aimed at other users rather than at the airline. Informal writing also brings complications such as sarcasm. Finally, some tweets appear to be mislabeled in the dataset, perhaps because judging sentiment is somewhat subjective. Take, for example, this tweet from the table above: "@airline # epicfail on connections in # xxmaj chicago today , extremely disappointed w / xxunk customer service , rethinking loyalty 😐." This tweet is labeled neutral in the dataset, but our model predicted it to be negative. I agree more with the model's prediction for this tweet than with the label it came with.
