# Analyzing airline sentiment from tweets

Even though this dataset is quite rich in metadata, the goal here is to explore how far we can get if we discard all of it and use just the raw tweet text.

We're going to use a standard fastai language model, pretrained on the English Wikipedia, and fine-tuned on the raw tweets with minimal pre-processing.

In [None]:
from pathlib import Path
from fastai.text import *
import pandas as pd

First let's load the data and have a quick look:

In [None]:
csv_path = Path('/kaggle/input/twitter-airline-sentiment')
path = '/kaggle/output'
df = pd.read_csv(csv_path / 'Tweets.csv')
df.head()

## Pre-processing
Keeping it to a minimum, we are only going to strip Twitter handles and URLs. We'll leave everything else as it is.

In [None]:
import re

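# Remove @handles and any whitespace after them (also matches a handle at the end of the tweet).
strip_handles = lambda text: re.sub(r'@\S+\s*', '', text)
# Remove http(s) URLs and the whitespace before them, so the surrounding words stay separated.
remove_urls = lambda text: re.sub(r'\s*https?:\S+', '', text)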

tweet = df.at[42, 'text']
tweet, strip_handles(tweet), remove_urls('visit this site http://www.google.com')
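
To see what the two substitutions do at the edges of a tweet, here is a small standalone sketch. The sample strings and the `clean` helper are made up for illustration, and the patterns are just one possible formulation (using `\S+` for a run of non-whitespace), not necessarily the exact regexes used in this notebook.

```python
import re

# Illustrative patterns (an assumption, not necessarily the notebook's exact regexes).
HANDLE = re.compile(r'@\S+\s*')      # an @handle plus any whitespace after it
URL = re.compile(r'\s*https?:\S+')   # an http(s) URL plus any whitespace before it

def clean(text: str) -> str:
    """Strip handles first, then URLs, and trim leftover whitespace."""
    return URL.sub('', HANDLE.sub('', text)).strip()

# Made-up sample tweets covering a leading handle, a mid-text URL and a trailing handle.
samples = [
    '@SomeAirline thanks for the quick reply!',
    'delayed again http://example.com/status please help',
    'great flight @SomeAirline',
]
cleaned = [clean(s) for s in samples]
for before, after in zip(samples, cleaned):
    print(repr(before), '->', repr(after))
```

Checking a handful of edge cases like these before tokenizing is cheap, and it catches patterns that silently skip a handle at the end of a tweet or glue two words together.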

We will turn it into a fastai preprocessor and prepend it to the default spaCy-based tokenizing and numericalizing steps:

In [None]:
class StripWeirdThings(PreProcessor):
    def process_one(self, item: str) -> str:
        return remove_urls(strip_handles(item))
    
processor = [StripWeirdThings(), TokenizeProcessor(), NumericalizeProcessor()]
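
To make the two default steps a bit more concrete, here is a toy sketch of what tokenizing and numericalizing amount to. It is not fastai's implementation: a whitespace split stands in for the spaCy tokenizer, only two special tokens are used (named after fastai's `xxunk` and `xxpad`), and the helper names and sample texts are made up.

```python
from collections import Counter

# Two placeholder special tokens; fastai has a richer set.
UNK, PAD = 'xxunk', 'xxpad'

def tokenize(text):
    # Whitespace split stands in for the spaCy tokenizer (assumption for brevity).
    return text.lower().split()

def build_vocab(token_lists, min_freq=1):
    # Count every token and keep those seen at least `min_freq` times.
    counts = Counter(tok for toks in token_lists for tok in toks)
    itos = [UNK, PAD] + sorted(t for t, c in counts.items() if c >= min_freq)
    stoi = {tok: i for i, tok in enumerate(itos)}
    return itos, stoi

def numericalize(tokens, stoi):
    # Tokens missing from the vocabulary map to the UNK index.
    return [stoi.get(tok, stoi[UNK]) for tok in tokens]

texts = ['my flight was late', 'my bag was lost']   # made-up examples
tokens = [tokenize(t) for t in texts]
itos, stoi = build_vocab(tokens)
ids = [numericalize(toks, stoi) for toks in tokens]
unseen = numericalize(tokenize('my flight was cancelled'), stoi)
print(itos)
print(ids, unseen)
```

The point to take away is that the model only ever sees integer ids, so any word outside the vocabulary collapses to the same unknown token.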

We're ready to create the databunch for the language model. We'll reserve a random 20% of the tweets as a validation set.

In [None]:
data_lm = (TextList.from_df(df, path, cols=['text'], 
            processor=processor)
           .split_by_rand_pct(0.2, seed=42)
           .label_for_lm()
           .databunch(bs=42, num_workers=1))
data_lm.show_batch()
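
The idea behind the seeded 20% split can be sketched in a few lines of plain Python. This is a toy stand-in, not fastai's implementation: `random_split` and the sample data are hypothetical, and the only point is that a fixed seed makes the held-out set reproducible.

```python
import random

def random_split(items, valid_pct=0.2, seed=42):
    """Return (train, valid) lists; a toy stand-in, not fastai's implementation."""
    rng = random.Random(seed)            # local RNG so the split is reproducible
    idx = list(range(len(items)))
    rng.shuffle(idx)
    cut = int(valid_pct * len(items))    # number of held-out items
    valid_idx = set(idx[:cut])
    train = [x for i, x in enumerate(items) if i not in valid_idx]
    valid = [x for i, x in enumerate(items) if i in valid_idx]
    return train, valid

tweets = [f'tweet {i}' for i in range(10)]   # made-up data
train, valid = random_split(tweets)
print(len(train), len(valid))
```

Reusing the same seed matters later on: if the classifier is split with the same seed over the same rows, it is validated on tweets that were also held out from language model fine-tuning.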

Let's use a standard AWD_LSTM with weights pretrained on English Wikipedia. Mixed precision training (calling `.to_fp16()` on the learner) can speed up the process on a supported GPU.

In [None]:
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.lr_find()
learn.recorder.plot(suggestion=True)
min_grad_lr = learn.recorder.min_grad_lr
min_grad_lr

In [None]:
learn.fit_one_cycle(10, 1e-02, moms=(0.8, 0.7))
learn.recorder.plot_losses()
learn.save('lm_head.model')
learn.freeze_to(-2)
learn.fit_one_cycle(10, 1e-3, moms=(0.9, 0.8))
learn.recorder.plot_losses()

In [None]:
learn.save('lm_head_2.model')

Let's unfreeze the backbone and do some more training.

In [None]:
learn.show_results()

It seems that validation loss is starting to go up. Let's stop here, reload the trained head and roll back to 3 epochs of backbone training.

In [None]:
learn.save('lm_head_2.model')
learn.save_encoder('lm_encoder')

## Training a multi-class classifier with the LM encoder

Now that we've trained a language model to predict the next word of a tweet, we'll use the encoder part of the language model as the input to our classifier.

We'll approach the problem as a multi-class classification problem: each tweet gets exactly one label, which is either the positive or neutral sentiment or, for negative tweets, the specific negative reason.

In [None]:
def create_label(row):
    # Negative tweets are labelled with their specific reason;
    # positive and neutral tweets keep the sentiment itself as the label.
    if row['airline_sentiment'] == 'negative':
        return row['negativereason']
    return row['airline_sentiment']

df['label'] = df[['airline_sentiment', 'negativereason']].apply(create_label, axis=1)
df['label'].unique()

We have a total of 12 classes.
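
The labelling rule is easy to check on a few hand-written rows before trusting it on the full dataframe. The sketch below uses plain dictionaries instead of pandas, and the rows themselves are made up; only the two column names come from the dataset.

```python
from collections import Counter

# Made-up rows mimicking the two columns used for labelling.
rows = [
    {'airline_sentiment': 'positive', 'negativereason': None},
    {'airline_sentiment': 'negative', 'negativereason': 'Late Flight'},
    {'airline_sentiment': 'neutral',  'negativereason': None},
    {'airline_sentiment': 'negative', 'negativereason': 'Lost Luggage'},
    {'airline_sentiment': 'negative', 'negativereason': 'Late Flight'},
]

def to_label(row):
    # Negative tweets take their reason as the label, the rest keep the sentiment.
    if row['airline_sentiment'] == 'negative':
        return row['negativereason']
    return row['airline_sentiment']

labels = [to_label(r) for r in rows]
counts = Counter(labels)
print(counts)
```

Looking at the per-label counts this way (or with `df['label'].value_counts()` on the real data) also shows how unbalanced the classes are, which is worth keeping in mind when reading accuracy numbers.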

In [None]:
data_clas = (TextList.from_df(df, path, cols='text', vocab=data_lm.vocab, processor=processor)
             .split_by_rand_pct(0.2, seed=42)
             .label_from_df(cols='label')
             .databunch(bs=64))

data_clas.save('data_clas_export.pkl')

In [None]:
data_clas.show_batch()

In [None]:
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.3)
learn.load_encoder('lm_encoder')
learn.lr_find()
learn.recorder.plot(suggestion=True)
min_grad_lr = learn.recorder.min_grad_lr

In [None]:
learn.fit_one_cycle(6, 1e-02)
learn.recorder.plot_losses()

In [None]:
learn.freeze_to(-2)
learn.fit_one_cycle(4, slice(5e-3, 2e-3), moms=(0.8,0.7))
learn.recorder.plot_losses()

In [None]:
learn.unfreeze()
learn.fit_one_cycle(4, slice(2e-3/100, 2e-3), moms=(0.8,0.7))
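
As far as we understand fastai v1, passing `slice(lo, hi)` as the learning rate spreads the rates geometrically across the layer groups, with the earliest group getting `lo` and the head getting `hi`. The sketch below reproduces that spacing in plain Python. The helper mirrors the idea rather than the library code, and the number of layer groups (5) is an assumption.

```python
def even_mults(start, stop, n):
    """n values from start to stop, each a constant multiple of the previous one."""
    if n == 1:
        return [stop]
    step = (stop / start) ** (1 / (n - 1))
    return [start * step ** i for i in range(n)]

n_groups = 5  # assumption: number of layer groups in the classifier
lrs = even_mults(2e-3 / 100, 2e-3, n_groups)
for i, lr in enumerate(lrs):
    print(f'group {i}: lr = {lr:.2e}')
```

With a factor of 100 between the two ends, the embedding and early LSTM layers barely move while the classifier head keeps learning at the full rate, which is the usual motivation for discriminative learning rates when fine-tuning.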

In [None]:
learn.recorder.plot_losses()

We got around 60% accuracy on a 12-class classification problem. Not bad!

## Predicting and interpreting classification results

In [None]:
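def predict(learn, tweet):
    """Show the intrinsic attention over the tokens of `tweet` and return the prediction."""
    learn.freeze()
    learn = learn.to_fp32()  # make sure inference runs in full precision
    interp = TextClassificationInterpretation.from_learner(learn)
    interp.show_intrinsic_attention(tweet)
    return learn.predict(tweet)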

In [None]:
predict(learn, df.at[8992, 'text'])

In [None]:
predict(learn, df.at[1000, 'text'])