Emotion Classification
===

Multilabel emotion classification: subtask 'E-c' of [SemEval 2018 Task 1](https://competitions.codalab.org/competitions/17751).

We use spaCy for preprocessing and Vowpal Wabbit for training and evaluating classifiers.

To get started with Vowpal Wabbit, see https://github.com/hal3/vwnlp

In [62]:
%matplotlib inline

from pathlib import Path
import os
import subprocess

import pandas as pd
import numpy as np

from collections import Counter
from tqdm import tqdm

import matplotlib.pyplot as plt
import matplotlib.dates as md
import matplotlib
import pylab as pl
from IPython.core.display import display, HTML

In [2]:
# data is stored relative to the root of the git repository
git_root_dir = !git rev-parse --show-toplevel
git_root_dir = Path(git_root_dir[0].strip())
git_root_dir

PosixPath('/home/levon003/repos/nlp-for-hci-workshop')

In [3]:
ec_dir = git_root_dir / 'data' / 'SemEval2018-Task1' / 'E-c'
train = ec_dir / "2018-E-c-En-train.txt"
valid = ec_dir / "2018-E-c-En-dev.txt"
test = ec_dir / "2018-E-c-En-test-gold.txt"
assert train.exists() and valid.exists() and test.exists()

In [8]:
train_df = pd.read_csv(train, sep='\t')
valid_df = pd.read_csv(valid, sep='\t')
test_df = pd.read_csv(test, sep='\t')
len(train_df), len(valid_df), len(test_df)

(6838, 886, 3259)

In [7]:
train_df.head()

| | ID | Tweet | anger | anticipation | disgust | fear | joy | love | optimism | pessimism | sadness | surprise | trust |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-En-21441 | “Worry is a down payment on a problem you may ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 1 | 2017-En-31535 | Whatever you decide to do make sure it makes y... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 2 | 2017-En-21068 | @Max_Kellerman it also helps that the majorit... | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 3 | 2017-En-31436 | Accept the challenges so that you can literall... | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4 | 2017-En-22195 | My roommate: it's okay that we can't spell bec... | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


### Preprocessing with spaCy

In [11]:
import spacy
nlp = spacy.load('en')

In [17]:
doc = nlp(train_df.iloc[0].Tweet)
print([token.text for token in doc])

['“', 'Worry', 'is', 'a', 'down', 'payment', 'on', 'a', 'problem', 'you', 'may', 'never', 'have', "'", '.', '\xa0', 'Joyce', 'Meyer', '.', ' ', '#', 'motivation', '#', 'leadership', '#', 'worry']


In [18]:
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)

“ " PUNCT `` compound “ False False
Worry worry NOUN NN nsubj Xxxxx True False
is be VERB VBZ ROOT xx True True
a a DET DT det x True True
down down ADJ JJ amod xxxx True True
payment payment NOUN NN attr xxxx True False
on on ADP IN prep xx True True
a a DET DT det x True True
problem problem NOUN NN pobj xxxx True False
you -PRON- PRON PRP nsubj xxx True True
may may VERB MD aux xxx True True
never never ADV RB neg xxxx True True
have have VERB VB relcl xxxx True True
' ' PUNCT '' punct ' False False
. . PUNCT . punct . False False
     SPACE     False False
Joyce joyce PROPN NNP compound Xxxxx True False
Meyer meyer PROPN NNP ROOT Xxxxx True False
. . PUNCT . punct . False False
    SPACE     False False
# # NOUN NN nmod # False False
motivation motivation NOUN NN compound xxxx True False
# # NOUN NN compound # False False
leadership leadership NOUN NN compound xxxx True False
# # NOUN NN nsubj # False False
worry worry VERB VBP ROOT xxxx True False


In [22]:
from spacy import displacy
displacy.render(nlp("Lana Yarosh visited Kazakhstan last week."), style='ent', jupyter=True)

In [23]:
displacy.render(nlp("Lana Yarosh visited Kazakhstan last week."), style='dep', jupyter=True)

In [27]:
%%time
# the default english model is relatively small... but using the larger model (850MB+) gives us word vectors
nlp = spacy.load('en_core_web_lg')

CPU times: user 10.7 s, sys: 796 ms, total: 11.5 s
Wall time: 32.3 s


In [31]:
train_df['test'] = False
valid_df['test'] = False
test_df['test'] = True
df = pd.concat([train_df, valid_df, test_df])
len(df)

10983

In [37]:
tokens = []
embedding = []
for doc in tqdm(nlp.pipe(df.Tweet, batch_size=1000, n_threads=3)):
    tokens.append([token.text for token in doc])
    embedding.append(doc.vector)

10983it [11:21, 16.12it/s]


In [41]:
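# NOTE: embedding[0] is one vector for the whole tweet (doc.vector), so this
# pairs each token with one dimension of that vector, not with its own embedding
list(zip(tokens[0], embedding[0]))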

[('“', -0.15849355),
 ('Worry', 0.28258097),
 ('is', -0.061569724),
 ('a', 0.01659168),
 ('down', 0.007948801),
 ('payment', -0.039756116),
 ('on', 0.11326242),
 ('a', -0.17677411),
 ('problem', 0.012483919),
 ('you', 1.7204615),
 ('may', -0.2550403),
 ('never', -0.011027425),
 ('have', 0.08156482),
 ("'", 0.033338226),
 ('.', -0.20137876),
 ('\xa0', -0.03994038),
 ('Joyce', -0.041898027),
 ('Meyer', 1.0718476),
 ('.', -0.1999955),
 (' ', 0.102360114),
 ('#', 0.014081122),
 ('motivation', 0.060670152),
 ('#', -0.059404768),
 ('leadership', 0.025720209),
 ('#', 0.08948427),
 ('worry', 0.018438809)]
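
Note that `doc.vector` is a single vector for the whole tweet (in spaCy, by default, the average of its token vectors), so the pairs above line up each token with one *dimension* of the document vector rather than with a per-token embedding. A minimal sketch of that averaging, using made-up 3-dimensional vectors (the values and the name `token_vectors` are illustrative assumptions, not output from the model):

```python
import numpy as np

# Hypothetical 3-dimensional token vectors for a 4-token document
# (the real en_core_web_lg vectors have 300 dimensions).
token_vectors = np.array([
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [0.0, 4.0, 2.0],
    [0.0, 2.0, 0.0],
])

# The document vector is the element-wise mean over the tokens.
doc_vector = token_vectors.mean(axis=0)
print(doc_vector.shape)  # (3,): one value per dimension, not per token
print(doc_vector)        # [1. 2. 1.]
```

So the length of `embedding[0]` does not depend on how many tokens the tweet has, and `zip` simply stops at the shorter of the two sequences.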

In [38]:
df['tokens'] = tokens
df['embedding'] = embedding

In [45]:
df.reset_index(inplace=True)
df.to_pickle(git_root_dir / 'data' / 'SemEval2018-Task1' / 'E-c' / 'tokenized.pkl')

In [47]:
df = pd.read_pickle(git_root_dir / 'data' / 'SemEval2018-Task1' / 'E-c' / 'tokenized.pkl')
len(df)

10983

In [48]:
train_df = df[~df.test]
test_df = df[df.test]
len(train_df), len(test_df)

(7724, 3259)

In [52]:
train_df.head(n=3)

| | level_0 | index | ID | Tweet | anger | anticipation | disgust | fear | joy | love | optimism | pessimism | sadness | surprise | trust | test | tokens | embedding |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 2017-En-21441 | “Worry is a down payment on a problem you may ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | False | [“, Worry, is, a, down, payment, on, a, proble... | [-0.15849355, 0.28258097, -0.061569724, 0.0165... |
| 1 | 1 | 1 | 2017-En-31535 | Whatever you decide to do make sure it makes y... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | False | [Whatever, you, decide, to, do, make, sure, it... | [-0.04233078, 0.2347169, -0.28917667, -0.03889... |
| 2 | 2 | 2 | 2017-En-21068 | @Max_Kellerman it also helps that the majorit... | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | False | [@Max_Kellerman, , it, also, helps, that, the... | [-0.12489866, 0.25080636, -0.047889024, -0.047... |


### Classification with Vowpal Wabbit

In [91]:
EMOTION_LIST = ['anger', 'anticipation', 'disgust', 'fear', 'joy', 'love', 'optimism', 'pessimism', 'sadness', 'surprise', 'trust']

# Write the dataframe to a file in the Vowpal Wabbit input format:
#   <label> <tag>|<namespace> <feature> <feature> ...
def to_vw_format(df, filename, emotion='anger'):
    with open(filename, 'w') as outfile:
        for _, row in df.iterrows():
            if emotion == 'all':
                # Cost-sensitive labels: cost 0 if the emotion is present, 1 if absent
                costs = [1 - int(row[name]) for name in EMOTION_LIST]
                label = " ".join(f"{name}:{cost}" for name, cost in zip(EMOTION_LIST, costs))
            else:  # binary classification of a single emotion
                label = "+1" if row[emotion] == 1 else "-1"
            # ':' separates a feature from its value in VW, so escape it in tokens
            # (note: '|' is also special in VW input and is not escaped here)
            features = " ".join(token.replace(":", "COLON") for token in row.tokens)
            outfile.write(f"{label} {row.ID}|T {features}\n")
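
In VW's text input format, `:` separates a feature from its value, `|` starts a namespace, and whitespace separates features, so tokens containing these characters can change how a line is parsed. The function above only escapes `:`. A minimal sketch of a stricter escaping step follows; the helper name `escape_vw_token` and the replacement strings `PIPE` and `SPACE` are illustrative assumptions, not part of VW:

```python
def escape_vw_token(token):
    """Make a token safe to use as a single VW feature name (hypothetical helper)."""
    token = token.replace(":", "COLON").replace("|", "PIPE")
    # Whitespace-only tokens (e.g. ' ' or '\xa0') would otherwise vanish
    # from the feature list, so give them an explicit name.
    if token.strip() == "":
        return "SPACE"
    # Join any internal whitespace so the token stays one feature.
    return "_".join(token.split())


tokens = ["Worry", "a|b", "10:30", "\xa0", "New York"]
print(" ".join(escape_vw_token(t) for t in tokens))
# Worry aPIPEb 10COLON30 SPACE New_York
```

Whether whitespace tokens should become a feature or be dropped is a modelling choice; the sketch keeps them so that no information is silently lost.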

#### Binary Classification

In [67]:
%%time
train_filename = ec_dir / 'vw_train_anger.txt'
test_filename = ec_dir / 'vw_test_anger.txt'
to_vw_format(train_df, train_filename, emotion='anger')
to_vw_format(test_df, test_filename, emotion='anger')

CPU times: user 1.97 s, sys: 8 ms, total: 1.98 s
Wall time: 1.98 s


In [68]:
!head -n 3 {train_filename}

-1 2017-En-21441|T “ Worry is a down payment on a problem you may never have ' .   Joyce Meyer .   # motivation # leadership # worry
-1 2017-En-31535|T Whatever you decide to do make sure it makes you # happy .
+1 2017-En-21068|T @Max_Kellerman   it also helps that the majority of NFL coaching is inept . Some of Bill O'Brien 's play calling was wow , ! # GOPATS


In [72]:
!vw -k -c -b 28 --passes 20 --binary {train_filename} -f {ec_dir}/sentiment.model

final_regressor = /home/levon003/repos/nlp-for-hci-workshop/data/SemEval2018-Task1/E-c/sentiment.model
Num weight bits = 28
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
creating cache_file = /home/levon003/repos/nlp-for-hci-workshop/data/SemEval2018-Task1/E-c/vw_train_anger.txt.cache
Reading datafile = /home/levon003/repos/nlp-for-hci-workshop/data/SemEval2018-Task1/E-c/vw_train_anger.txt
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
0.000000 0.000000            1            1.0  -1.0000  -1.0000       26
0.000000 0.000000            2            2.0  -1.0000  -1.0000       14
0.250000 0.500000            4            4.0  -1.0000  -1.0000       22
0.500000 0.750000            8            8.0   1.0000  -1.0000       17
0.437500 0.375000           16           16.0   1.0000   1.0000        9
0.625000 0.812500           32           32.0  -1.0000

In [73]:
!vw --binary -t -i {ec_dir}/sentiment.model -p {test_filename}.pred {test_filename}

only testing
predictions = /home/levon003/repos/nlp-for-hci-workshop/data/SemEval2018-Task1/E-c/vw_test_anger.txt.pred
Num weight bits = 28
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = /home/levon003/repos/nlp-for-hci-workshop/data/SemEval2018-Task1/E-c/vw_test_anger.txt
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.000000 1.000000            1            1.0   1.0000  -1.0000       20
0.500000 0.000000            2            2.0  -1.0000  -1.0000       24
0.250000 0.000000            4            4.0  -1.0000  -1.0000       12
0.250000 0.250000            8            8.0  -1.0000  -1.0000       16
0.250000 0.250000           16           16.0  -1.0000  -1.0000       21
0.187500 0.125000           32           32.0  -1.0000  -1.0000        4
0.187500 0.187500           64           64.0  -1.0000  -1.0000       13


In [78]:
test_preds_filename = str(test_filename) + ".pred"
with open(test_preds_filename, 'r') as infile:
    lines = infile.readlines()
    preds_raw = [line.split()[0] for line in lines]
    preds = [0 if pred == '-1' else 1 for pred in preds_raw]
len(preds), preds[:5]

(3259, [0, 0, 1, 0, 0])

In [79]:
assert len(preds) == len(test_df)
# test_df is a slice of df, so take an explicit copy before adding a column
# (this avoids pandas' SettingWithCopyWarning)
test_df = test_df.copy()
test_df['pred_anger'] = preds


In [80]:
# Compute simple accuracy
np.sum(test_df.pred_anger == test_df.anger) / len(test_df)

0.7898128260202516

In [88]:
import sklearn.metrics
print(sklearn.metrics.classification_report(test_df.anger, test_df.pred_anger, 
                                            target_names=["anger not present", "anger present"]))

                   precision    recall  f1-score   support

anger not present       0.83      0.86      0.84      2158
    anger present       0.70      0.65      0.68      1101

        micro avg       0.79      0.79      0.79      3259
        macro avg       0.77      0.76      0.76      3259
     weighted avg       0.79      0.79      0.79      3259



#### Multiclass

We use VW's cost-sensitive one-against-all (`--csoaa`) reduction, treating each of the 11 emotions as a class.
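
In VW's cost-sensitive format each label is written as `class:cost`, and the learner tries to predict the lowest-cost class. A minimal sketch of turning one row of 0/1 emotion indicators into such a label string, giving cost 0 to emotions that are present and cost 1 to those that are absent (the helper name `to_csoaa_label` and the three-emotion example are illustrative assumptions):

```python
def to_csoaa_label(indicators):
    """Map {emotion: 0 or 1} to a 'class:cost' string (hypothetical helper)."""
    # Present emotions (1) get cost 0; absent emotions (0) get cost 1.
    return " ".join(
        f"{emotion}:{1 - present}" for emotion, present in indicators.items()
    )


row = {"anger": 0, "anticipation": 1, "disgust": 0}
print(to_csoaa_label(row))  # anger:1 anticipation:0 disgust:1
```

Because several emotions can have cost 0 for the same tweet, this encoding lets a multiclass reduction carry multilabel information.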

In [92]:
%%time
train_filename = ec_dir / 'vw_train_all.txt'
test_filename = ec_dir / 'vw_test_all.txt'
to_vw_format(train_df, train_filename, emotion='all')
to_vw_format(test_df, test_filename, emotion='all')

CPU times: user 2.78 s, sys: 12 ms, total: 2.79 s
Wall time: 2.79 s


In [93]:
!head -n 3 {train_filename}

anger:1 anticipation:0 disgust:1 fear:1 joy:1 love:1 optimism:0 pessimism:1 sadness:1 surprise:1 trust:0 2017-En-21441|T “ Worry is a down payment on a problem you may never have ' .   Joyce Meyer .   # motivation # leadership # worry
anger:1 anticipation:1 disgust:1 fear:1 joy:0 love:0 optimism:0 pessimism:1 sadness:1 surprise:1 trust:1 2017-En-31535|T Whatever you decide to do make sure it makes you # happy .
anger:0 anticipation:1 disgust:0 fear:1 joy:0 love:1 optimism:0 pessimism:1 sadness:1 surprise:1 trust:1 2017-En-21068|T @Max_Kellerman   it also helps that the majority of NFL coaching is inept . Some of Bill O'Brien 's play calling was wow , ! # GOPATS


In [94]:
num_classes = len(EMOTION_LIST)
named_labels = ",".join(EMOTION_LIST)
num_classes

11

In [96]:
!vw -k -c -b 28 --csoaa {num_classes} -d {train_filename} -f {ec_dir}/sentiment_all.model --passes 20 --named_labels {named_labels} --ngram 2 --skips 1
!vw -t -i {ec_dir}/sentiment_all.model -d {test_filename} -p {test_filename}.pred -r {test_filename}.pred.raw

Generating 2-grams for all namespaces.
Generating 1-skips for all namespaces.
parsed 11 named labels
final_regressor = /home/levon003/repos/nlp-for-hci-workshop/data/SemEval2018-Task1/E-c/sentiment_all.model
Num weight bits = 28
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
creating cache_file = /home/levon003/repos/nlp-for-hci-workshop/data/SemEval2018-Task1/E-c/vw_train_all.txt.cache
Reading datafile = /home/levon003/repos/nlp-for-hci-workshop/data/SemEval2018-Task1/E-c/vw_train_all.txt
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
1.000000 1.000000            1            1.0    known    anger       73
1.000000 1.000000            2            2.0    known anticipation       37
0.500000 0.000000            4            4.0    known optimism       61
0.750000 1.000000            8            8.0    known optimism       46
0.625000 0.500000     

In [99]:
test_preds_filename = str(test_filename) + ".pred"
test_raw_preds_filename = str(test_filename) + ".pred.raw"
!head -n 3 {test_preds_filename}
!echo
!head -n 3 {test_raw_preds_filename}

optimism 2018-En-01559
optimism 2018-En-03739
anger 2018-En-00385

1:0.829435 2:0.675799 3:0.679984 4:0.710117 5:0.832285 6:0.964882 7:0.62584 8:0.941957 9:0.772772 10:1.12944 11:0.896213 2018-En-01559
1:0.789542 2:0.548801 3:0.729407 4:0.884375 5:0.505027 6:0.871636 7:0.485595 8:0.803483 9:0.612237 10:1.00483 11:0.870229 2018-En-03739
1:0.372602 2:0.664358 3:0.55658 4:0.616096 5:0.502329 6:0.769712 7:0.644689 8:0.571755 9:0.54561 10:0.772304 11:0.808059 2018-En-00385


In [107]:
with open(test_raw_preds_filename, 'r') as infile:
    lines = infile.readlines()
predictions = []
for line in lines:
    if line.strip() == "":
        continue
    tokens = line.split()[:-1]
    assert len(tokens) == num_classes
    prediction = {}
    for i, token in enumerate(tokens):
        raw_pred = float(token.split(":")[1])
        prediction[EMOTION_LIST[i]] = raw_pred
    predictions.append(prediction)
assert len(predictions) == len(test_df)
len(predictions)

3259

In [108]:
test_preds = pd.DataFrame(predictions)
test_preds.head(n=2)

| | anger | anticipation | disgust | fear | joy | love | optimism | pessimism | sadness | surprise | trust |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.829435 | 0.675799 | 0.679984 | 0.710117 | 0.832285 | 0.964882 | 0.62584 | 0.941957 | 0.772772 | 1.12944 | 0.896213 |
| 1 | 0.789542 | 0.548801 | 0.729407 | 0.884375 | 0.505027 | 0.871636 | 0.485595 | 0.803483 | 0.612237 | 1.00483 | 0.870229 |


In [106]:
# identify the ground truth
n_samples = len(test_df)
n_classes = num_classes
y_true = test_df.loc[:, EMOTION_LIST].values
y_true.shape

(3259, 11)

In [110]:
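# The raw predictions are costs (lower = emotion more likely present),
# so convert them to scores where higher = more likely present.
y_score_raw = test_preds.loc[:, EMOTION_LIST].values
# Option 1: clip costs to [0, 1] and invert
y_score_1 = 1 - np.clip(y_score_raw, 0, 1)
# Option 2: min-max scale each emotion's costs to [0, 1] and invert
y_score_2 = 1 - ((y_score_raw - y_score_raw.min(0)) / np.ptp(y_score_raw, axis=0))
print(y_score_raw.shape, y_score_1.shape, y_score_2.shape)
assert y_score_1.shape == y_score_2.shape
y_score = y_score_1
assert np.isfinite(y_score).all()
print(sklearn.metrics.roc_auc_score(y_true, y_score_1), sklearn.metrics.roc_auc_score(y_true, y_score_2))
y_score.shape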

(3259, 11) (3259, 11) (3259, 11)
0.7416444989753931 0.7427934072169889


(3259, 11)
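
The two conversions above can be compared on a small example. A minimal sketch on a made-up 3x2 cost matrix (the values and variable names are illustrative assumptions):

```python
import numpy as np

# Hypothetical predicted costs: rows are tweets, columns are emotions.
costs = np.array([
    [0.25, 1.50],
    [0.75, 0.50],
    [0.50, 1.00],
])

# Option 1: clip costs to [0, 1], then invert so a high score means "present".
score_clip = 1 - np.clip(costs, 0, 1)

# Option 2: min-max scale each column to [0, 1], then invert.
col_min = costs.min(axis=0)
col_range = np.ptp(costs, axis=0)  # max - min per column
score_minmax = 1 - (costs - col_min) / col_range

print(score_clip)
print(score_minmax)
```

Min-max scaling preserves the ordering of costs within each emotion, while clipping ties together every cost above 1 (here 1.50 and 1.00 both become a score of 0), which may be why the two AUC values differ slightly.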

In [124]:
y_pred = (y_score > 0.5).astype(int)
print(f"{'Emotion':15} # Predicted Present")
print("="*40)
for i, total_predicted in enumerate(np.sum(y_pred, axis=0)):
    print(f"{EMOTION_LIST[i]:15} {total_predicted:4}")

Emotion         # Predicted Present
anger            946
anticipation     124
disgust         1090
fear             342
joy             1175
love             192
optimism         794
pessimism        108
sadness          681
surprise          19
trust             20


In [126]:
print(sklearn.metrics.classification_report(y_true, y_pred, target_names=EMOTION_LIST))

              precision    recall  f1-score   support

       anger       0.69      0.59      0.64      1101
anticipation       0.28      0.08      0.13       425
     disgust       0.60      0.59      0.60      1099
        fear       0.65      0.46      0.54       485
         joy       0.81      0.66      0.73      1442
        love       0.70      0.26      0.38       516
    optimism       0.67      0.47      0.55      1143
   pessimism       0.41      0.12      0.18       375
     sadness       0.67      0.47      0.55       960
    surprise       0.42      0.05      0.08       170
       trust       0.25      0.03      0.06       153

   micro avg       0.67      0.47      0.55      7869
   macro avg       0.56      0.34      0.40      7869
weighted avg       0.64      0.47      0.53      7869
 samples avg       0.60      0.49      0.51      7869



In [127]:
# Strict (exact-match) accuracy: all 11 labels must be correct for a tweet to count
sklearn.metrics.accuracy_score(y_true, y_pred)

0.17213869285056765

Seems like a pretty hard task!

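The winning team (from the National Technical University of Athens) achieved accuracy, micro F1, and macro F1 of 0.588, 0.701, and 0.528 respectively. We achieved 0.17, 0.55, and 0.40. The two accuracy figures may not be directly comparable: ours is the strict exact-match accuracy, while the task, as far as we know, reports multi-label (Jaccard) accuracy, which gives partial credit.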
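
For reference, the multi-label (Jaccard) accuracy used in the task's evaluation, as we understand it, is the size of the intersection of the predicted and gold label sets divided by the size of their union, averaged over tweets. It is more lenient than the exact-match accuracy computed above. A minimal NumPy sketch on made-up labels (the function name `jaccard_accuracy` and the convention of scoring 1 when both label sets are empty are assumptions):

```python
import numpy as np

def jaccard_accuracy(y_true, y_pred):
    """Mean over samples of |gold & pred| / |gold | pred| for 0/1 label matrices."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    intersection = (y_true & y_pred).sum(axis=1)
    union = (y_true | y_pred).sum(axis=1)
    # Assumed convention: a sample with no gold and no predicted labels scores 1.
    scores = np.where(union == 0, 1.0, intersection / np.maximum(union, 1))
    return scores.mean()


toy_true = [[1, 0, 1], [0, 1, 0], [0, 0, 0]]
toy_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
print(jaccard_accuracy(toy_true, toy_pred))  # (0.5 + 1 + 1) / 3
```

On this toy data the exact-match accuracy would be 2/3, since the first tweet is only partially correct, while the Jaccard accuracy gives it half credit.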

See also [all results](https://docs.google.com/spreadsheets/d/1yzyBC7uf4Di38QooN7QbUGqsokK51FShUXCfUJFZmLs/edit#gid=1360997704) for this task.

The winning team also made their code available: https://github.com/cbaziotis/ntua-slp-semeval2018

Their approach is described in [this technical paper](https://arxiv.org/pdf/1804.06658.pdf).