# SemEval-2018 Task 1: Affect in Tweets (AIT-2018)

## An emotion intensity ordinal classification task

*Joana Ferreira |
joanaferreira0011@gmail.com |
Faculty of Engineering, University of Porto, R. Dr. Roberto Frias, 4200-465 Porto, Portugal *

### Abstract:
This notebook presents a solution for SemEval-2018 Task 1: Affect in Tweets (AIT-2018). Given a tweet and an emotion (sadness, anger, fear or joy), the model should output an intensity class (0, 1, 2 or 3). Two preprocessing approaches were implemented: the first is knowledge-based and the second uses word embeddings (BERT). Several classification models were then trained on each representation and compared.


### 1. Introduction
Given a tweet and an emotion (sadness, anger, fear or joy), the proposed task is to classify the tweet into one of four classes according to the intensity of that emotion (0: no, 1: low, 2: moderate, 3: high). The task was solved only for the English language.

This task was proposed in (Mohammad et al., 2018).

### 2. Description of the dataset
There are four training and test sets of labeled data – one for each emotion. The data creation is described in (Mohammad and Kiritchenko, 2018).

The training and test sets were merged and then re-split (80% for training, 20% for testing) in order to increase the amount of data available for training.

In [16]:
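import pandas as pd

# Use a list (not a set) so the files are always read in the same order
all_files = ['../datasets/2018-EI-oc-En-anger-dev.txt', '../datasets/EI-oc-En-anger-train.txt']

# Read each tab-separated file, skipping its header row
li = []
for filename in all_files:
    df = pd.read_csv(filename, sep="\t", header=None, skiprows=1)
    li.append(df)

dataset = pd.concat(li, axis=0, ignore_index=True)

# Note: the first column holds the tweet ID (e.g. 2018-En-01895), although it is named 'date'
dataset.columns = ['date', 'text', 'emotion', 'level']
dataset = dataset.sample(frac=1)  # shuffle the rows
print(dataset)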


date                                               text  \
2015  2018-En-01895  Sarah is a complete lunatic. How Chad is still...   
158   2017-En-11404  @Lowetide there is no room for jokes in hockey...   
1495  2017-En-10107  Leave it on there, rule,nimber 1 of carpet cle...   
1884  2018-En-01755  What does Amelia want?! Sarah was v grateful  ...   
1348  2017-En-11047  Threat factors in respect to provocation bulb ...   
...             ...                                                ...   
594   2017-En-10927  Sting is just too damn earnest for early morni...   
1691  2017-En-10397  @iamsrk what's up w the gender bias? #indignan...   
1730  2018-En-01152  @Argos_Online customer service is dreadful, ph...   
674   2017-En-10760  Tiangong 1, China's first space laboratory, wi...   
793   2017-En-10419  Ok but I just got called a 'White Devil' on th...   

     emotion                                        level  
2015   anger      3: high amount of anger can be inferred  
158   

### 3. Approach 
Two alternatives were implemented for preprocessing: a knowledge-based approach (section 3.1.1) and one using word embeddings (BERT) (section 3.1.2). After preprocessing, both representations were fed to the same models: Naive Bayes, SVM, Logistic Regression, Multi-layer Perceptron classifier, Perceptron, Decision Tree and Random Forest.

#### 3.1.1. Preprocessing using a knowledge-based approach
In this approach, the following processes were implemented:
* **Tokenization**: tokens were created using Python's `str.split()` method. Other tokenization methods, such as NLTK's `word_tokenize()`, were tested, but no significant improvements were observed.
* **Lower case**: tokens were converted to lower case.
* **Stop words**: English stop words were removed using NLTK's stop-word list.
* **Lemmatization**: tokens were lemmatized using NLTK's `WordNetLemmatizer`.
* **User**: all user mentions were replaced with '@user'.
* **Emojis**: all emojis were converted to keywords using the `emoji` library.
* **TF-IDF**: scikit-learn's `TfidfVectorizer` was used to compute TF-IDF features.
* **Bag of Words (BoW)**: a bag-of-words representation was tried first but eventually replaced with TF-IDF because of its better overall accuracy.
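
The text-normalization steps in the list above can be sketched with the standard library alone. This is a minimal illustration rather than the notebook's exact pipeline: the helper name `normalize_tweet` and the mention pattern `@\w+` are assumptions, and lemmatization, emoji conversion and TF-IDF are left out because they need NLTK, `emoji` and scikit-learn.

```python
import re

# Assumed pattern: '@' followed by one or more word characters.
MENTION = re.compile(r"@\w+")


def normalize_tweet(text):
    """Hypothetical helper: mask mentions, lower-case, split on whitespace."""
    text = MENTION.sub("@user", text)  # replace every user mention
    return text.lower().split()        # lower case, then tokenize


print(normalize_tweet("@Lowetide there is NO room for jokes"))
```

Matching only word characters after `@` keeps the rest of the tweet intact, whereas a greedy pattern such as `@.*` would replace everything from the first mention to the end of the line.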


*Skip to section 3.1.2. and do not run the following 3 cells if you want to see the results from using word embeddings (BERT).*


In [None]:
import re
import nltk
import emoji
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
ps = PorterStemmer()  # stemmer instance (the loop below uses the lemmatizer)
stop_words = set(stopwords.words('english'))  # build the set once, not per token

corpus = []

for index, row in dataset.iterrows():
    tweet = row['text']

    # handle users: replace each mention with '@user'
    tweet = re.sub(r'@\w+', '@user', tweet)

    # lower case and whitespace tokenization
    tweet = tweet.lower().split()
    #tweet = nltk.word_tokenize(tweet)

    # lemmatization and stop word removal
    tweet = ' '.join([lemmatizer.lemmatize(w) for w in tweet if w not in stop_words])

    # emojis: convert to keywords
    tweet = emoji.demojize(tweet)

    corpus.append(tweet)

print(corpus)


In [None]:
#TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer=TfidfVectorizer(use_idf=True)
tfidf_vectorizer_vectors=tfidf_vectorizer.fit_transform(corpus)

print(tfidf_vectorizer_vectors)
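
To make the TF-IDF weighting more concrete, here is a small pure-Python sketch. It assumes the settings scikit-learn documents as defaults for `TfidfVectorizer` (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2 normalization) and uses plain whitespace tokenization, which is simpler than the vectorizer's own tokenizer. The toy documents are hypothetical and not taken from the dataset.

```python
import math
from collections import Counter

# Toy documents (hypothetical, not taken from the dataset).
docs = ["angry angry tweet", "calm tweet", "angry rant"]
tokenized = [d.split() for d in docs]
vocab = sorted({t for doc in tokenized for t in doc})
n_docs = len(tokenized)

# Document frequency: number of documents that contain each term.
df = Counter(t for doc in tokenized for t in set(doc))

# Smoothed inverse document frequency.
idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}


def tfidf_vector(tokens):
    """Raw term count times IDF, scaled to unit Euclidean length."""
    counts = Counter(tokens)
    raw = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]


vectors = [tfidf_vector(doc) for doc in tokenized]
for doc, vec in zip(docs, vectors):
    print(doc, [round(v, 3) for v in vec])
```

Terms that occur in fewer documents (here `calm` and `rant`) receive a larger IDF than terms shared by several documents (`angry`, `tweet`), so they dominate the vector of the document they appear in.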

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np


vectorizer = CountVectorizer(max_features = 1500)
#X = vectorizer.fit_transform(corpus).toarray()
X = tfidf_vectorizer_vectors.toarray()
y = []
for index, row in dataset.iterrows():
    y.append(int(row['level'][0]))
    '''
    if(int(row['level'][0])==3):
        y.append(1) 
    elif(int(row['level'][0])>0):
        y.append(1) 
    else:
        y.append(0)
    '''
    
    
y = np.array(y)
#print(vectorizer.get_feature_names())
#print(type(X[0]), y.shape)
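
The label extraction in the cell above relies on the class digit being the first character of the `level` string. A slightly more defensive variant, sketched below, reads everything before the first colon instead. The helper name `parse_level` is hypothetical; the example string is the label shown in the dataset printout.

```python
def parse_level(label):
    """Return the integer class encoded before the first colon of a label."""
    return int(label.split(":", 1)[0].strip())


print(parse_level("3: high amount of anger can be inferred"))  # prints 3
```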

#### 3.1.2. Preprocessing using BERT
In this approach, word embeddings were used to represent the data. BERT was chosen, and the *bert_embedding* library was used because of its simplicity. The model was pre-trained on the following dataset: *book_corpus_wiki_en_cased*.

*Skip to section 3.2 and do not run the following 3 cells if you want to see the results from using the knowledge-based approach described in 3.1.1.*


In [17]:
from bert_embedding import BertEmbedding
import re
import nltk
import emoji
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer 
import mxnet as mx


corpus = []

for index, row in dataset.iterrows():
    tweet = row['text']
    corpus.append(tweet)

bert = BertEmbedding(model='bert_24_1024_16', dataset_name='book_corpus_wiki_en_cased')
results = bert(corpus)
print(results)

at32), array([ 0.61834633,  0.7299718 , -0.39379093, ...,  0.86459017,
       -0.05611638, -0.10752675], dtype=float32), array([-0.0568656 , -1.0629956 , -0.33967328, ..., -0.8862806 ,
        0.0664607 , -0.27232605], dtype=float32), array([-0.36749876, -0.12939635,  0.35977873, ...,  0.1538951 ,
        0.52668357, -0.43332773], dtype=float32), array([-0.14321527,  0.28332147,  0.2589766 , ...,  0.10303332,
        0.4076267 ,  0.8468303 ], dtype=float32), array([-0.5571995 ,  0.05310187, -0.3118394 , ...,  0.13199309,
        0.38863054,  0.04157293], dtype=float32), array([-0.5891458 , -0.52483165, -0.51016456, ..., -0.17108464,
       -0.32084063, -0.16606271], dtype=float32)]), (['@', 'paulsherard', '@', 'margarethaynie', 'wow', 'I', 'just', 'realized', 'how', 'much', 'of', 'a', 'sexist', 'pig'], [array([-0.37625355, -1.0494866 , -0.19031039, ..., -0.6830996 ,
        0.10930118, -0.43802965], dtype=float32), array([-0.36003363,  0.33860287,  0.08261   , ..., -0.85542107,
       

In [18]:
import numpy as np
averaged = []
for sent in results:
    averaged.append(np.mean(sent[1], axis = 0, dtype=np.float64))

corpus=averaged
X=np.array(corpus)
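
The averaging step above turns a variable number of token vectors into one fixed-size vector per tweet. The sketch below shows the same component-wise mean in pure Python on hypothetical 3-dimensional vectors (the real BERT vectors here have 1024 dimensions); `mean_pool` is an illustrative name, not a function from the notebook.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors component-wise."""
    n_tokens = len(token_vectors)
    return [sum(component) / n_tokens for component in zip(*token_vectors)]


# Hypothetical token vectors for two tweets of different lengths.
tweet_a = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
tweet_b = [[0.0, 0.0, 6.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0]]

print(mean_pool(tweet_a))  # [2.0, 3.0, 4.0]
print(mean_pool(tweet_b))  # [1.0, 1.0, 2.0]
```

Both tweets end up with a vector of the same length regardless of how many tokens they contain, which is what the classifiers in section 3.2 require.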

In [19]:
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np


vectorizer = CountVectorizer(max_features = 1500)
y = []
for index, row in dataset.iterrows():
    y.append(int(row['level'][0]))
    '''
    if(int(row['level'][0])==3):
        y.append(1) 
    elif(int(row['level'][0])>0):
        y.append(1) 
    else:
        y.append(0)
    '''
    
    
y = np.array(y)
#print(vectorizer.get_feature_names())
#print(type(X[0]), y.shape)

### 3.2. Classification
After preprocessing, the data was fed to the following models (all from scikit-learn):
* Naive Bayes 
* Support Vector Machine(SVM)
* Logistic regression
* Multi-layer Perceptron classifier
* Perceptron
* Decision Tree
* Random Forest

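To evaluate and compare them, accuracy, precision, recall and F1 were measured. Precision, recall and F1 are averaged over the four classes, weighted by the number of true instances of each class (`average='weighted'`).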


In [20]:
# Split dataset into training and test sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

(1671, 1024) (1671,)
(418, 1024) (418,)
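
The shapes printed above are consistent with a 20% test share that is rounded up, with the remainder used for training. The arithmetic below is a sketch under that assumption about `train_test_split`; the notebook itself does not state the rounding rule.

```python
import math

n_samples = 1671 + 418  # total rows implied by the printed shapes
test_size = 0.20

n_test = math.ceil(test_size * n_samples)  # assumed: test part rounded up
n_train = n_samples - n_test               # remainder goes to training

print(n_samples, n_train, n_test)  # prints 2089 1671 418
```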


In [21]:
# Naive Bayes

from sklearn.naive_bayes import GaussianNB

from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score

classifier = GaussianNB()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print('Accuracy: ', accuracy_score(y_test, y_pred))
print('Precision: ', precision_score(y_test, y_pred, average='weighted'))
print('Recall: ', recall_score(y_test, y_pred, average='weighted'))
print('F1: ', f1_score(y_test, y_pred, average='weighted'))

[[56 36 16 19]
 [30 15 25  7]
 [21 31 36 29]
 [12  9 11 65]]
Accuracy:  0.41148325358851673
Precision:  0.4135462119735973
Recall:  0.41148325358851673
F1:  0.40855124453720665
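
As a check on how the four metrics relate to the confusion matrix, the sketch below recomputes them by hand from the Naive Bayes matrix printed above, assuming rows are true classes and columns are predicted classes. The variable and function names are illustrative.

```python
# Confusion matrix printed by the Naive Bayes cell
# (rows: true class, columns: predicted class).
cm = [
    [56, 36, 16, 19],
    [30, 15, 25, 7],
    [21, 31, 36, 29],
    [12, 9, 11, 65],
]

n_classes = len(cm)
total = sum(sum(row) for row in cm)
support = [sum(row) for row in cm]  # true instances per class
predicted = [sum(cm[i][j] for i in range(n_classes)) for j in range(n_classes)]
correct = [cm[k][k] for k in range(n_classes)]

accuracy = sum(correct) / total
precision_k = [correct[k] / predicted[k] for k in range(n_classes)]
recall_k = [correct[k] / support[k] for k in range(n_classes)]
f1_k = [2 * p * r / (p + r) for p, r in zip(precision_k, recall_k)]


def weighted(values):
    """Average per-class values weighted by the number of true instances."""
    return sum(v * s for v, s in zip(values, support)) / total


print("Accuracy: ", accuracy)
print("Precision:", weighted(precision_k))
print("Recall:   ", weighted(recall_k))
print("F1:       ", weighted(f1_k))
```

With support weighting, the weighted recall reduces to the overall accuracy, which is why those two values are identical in every output in this section.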


In [22]:
# SVM

from sklearn.svm import SVC

classifier = SVC()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print('Accuracy: ', accuracy_score(y_test, y_pred))
print('Precision: ', precision_score(y_test, y_pred, average='weighted'))
print('Recall: ', recall_score(y_test, y_pred, average='weighted'))
print('F1: ', f1_score(y_test, y_pred, average='weighted'))

[[84  1 35  7]
 [39  1 36  1]
 [41  1 61 14]
 [ 8  0 34 55]]
Accuracy:  0.48086124401913877
Precision:  0.47839620255968635
Recall:  0.48086124401913877
F1:  0.4426862707090956


In [23]:
# Logistic Regression

from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print('Accuracy: ', accuracy_score(y_test, y_pred))
print('Precision: ', precision_score(y_test, y_pred, average='weighted'))
print('Recall: ', recall_score(y_test, y_pred, average='weighted'))
print('F1: ', f1_score(y_test, y_pred, average='weighted'))

[[73 23 19 12]
 [28 13 32  4]
 [34 25 38 20]
 [ 6  6 21 64]]
Accuracy:  0.44976076555023925
Precision:  0.438254194677727
Recall:  0.44976076555023925
F1:  0.44326921488042503


In [24]:
# SGD classifier

from sklearn.linear_model import SGDClassifier

classifier = SGDClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print('Accuracy: ', accuracy_score(y_test, y_pred))
print('Precision: ', precision_score(y_test, y_pred, average='weighted'))
print('Recall: ', recall_score(y_test, y_pred, average='weighted'))
print('F1: ', f1_score(y_test, y_pred, average='weighted'))

[[40 23 39 25]
 [15 15 38  9]
 [16 22 45 34]
 [ 3  9 22 63]]
Accuracy:  0.38995215311004783
Precision:  0.40334721063460127
Recall:  0.38995215311004783
F1:  0.3835389287892652


In [25]:
# Multi-layer Perceptron classifier

from sklearn.neural_network import MLPClassifier

classifier = MLPClassifier(random_state=1, max_iter=300)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print('Accuracy: ', accuracy_score(y_test, y_pred))
print('Precision: ', precision_score(y_test, y_pred, average='weighted'))
print('Recall: ', recall_score(y_test, y_pred, average='weighted'))
print('F1: ', f1_score(y_test, y_pred, average='weighted'))

[[78 22 17 10]
 [30 20 25  2]
 [35 23 41 18]
 [ 7  8 19 63]]
Accuracy:  0.48325358851674644
Precision:  0.47816980764358763
Recall:  0.48325358851674644
F1:  0.47892663637526395


In [26]:
# Perceptron
from sklearn.linear_model import Perceptron

classifier = Perceptron() 
classifier.fit(X_train, y_train) 
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred)) 
print('Accuracy: ', accuracy_score(y_test, y_pred)) 
print('Precision: ', precision_score(y_test, y_pred, average='weighted')) 
print('Recall: ', recall_score(y_test, y_pred, average='weighted')) 
print('F1: ', f1_score(y_test, y_pred, average='weighted'))

[[82 24  8 13]
 [41 15 12  9]
 [46 33 16 22]
 [ 9 11 11 66]]
Accuracy:  0.42822966507177035
Precision:  0.4077776743114188
Recall:  0.42822966507177035
F1:  0.40050326637946165


In [31]:
# Decision Tree

from sklearn.tree import DecisionTreeClassifier

classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print('Accuracy: ', accuracy_score(y_test, y_pred))
print('Precision: ', precision_score(y_test, y_pred, average='weighted'))
print('Recall: ', recall_score(y_test, y_pred, average='weighted'))
print('F1: ', f1_score(y_test, y_pred, average='weighted'))

[[51 30 32 14]
 [29 20 16 12]
 [30 24 42 21]
 [15 12 25 45]]
Accuracy:  0.37799043062200954
Precision:  0.3825336452170043
Recall:  0.37799043062200954
F1:  0.38003113057949217


In [32]:
# Random Forest

from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print('Accuracy: ', accuracy_score(y_test, y_pred))
print('Precision: ', precision_score(y_test, y_pred, average='weighted'))
print('Recall: ', recall_score(y_test, y_pred, average='weighted'))
print('F1: ', f1_score(y_test, y_pred, average='weighted'))

[[88 10 22  7]
 [41  3 28  5]
 [49  1 55 12]
 [14  4 28 51]]
Accuracy:  0.47129186602870815
Precision:  0.44350508448153875
Recall:  0.47129186602870815
F1:  0.4400367924065352


In [29]:
# Test here with your input
# Assumes the BERT path (section 3.1.2) was run, so `bert` and `classifier` exist
# and the classifier was trained on averaged BERT vectors.

import numpy as np

tweet = input("Enter tweet: ")

# Embed the tweet and average its token vectors, as done for the training data
tokens, token_vectors = bert([tweet])[0]
X_new = np.mean(token_vectors, axis=0, dtype=np.float64).reshape(1, -1)

print(X_new.shape)
print("Sentiment level: ", classifier.predict(X_new))

### 4. Experimental evaluation

The following accuracy values were obtained as the median of 3 runs for each model.
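
Taking the median of three runs simply keeps the middle value, which makes the reported figure less sensitive to one unusually good or bad run. A minimal sketch with hypothetical accuracies (not values from this notebook):

```python
from statistics import median

# Hypothetical accuracies from three runs of one model.
runs = [0.47, 0.44, 0.48]

print(median(runs))  # prints 0.47
```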

#### 4.1 Joy

| Models | Naive Bayes | SVM | Logistic Regression | MLPClassifier | Perceptron | Decision tree | Random forest |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Knowledge | 0.346 | **0.419** | 0.408 | 0.369 | 0.397 | 0.390 | 0.408 |
| BERT | 0.471 | 0.440 | 0.424 | 0.458 | 0.400 | 0.372 | **0.476** |

#### 4.2 Sadness

| Models | Naive Bayes | SVM | Logistic Regression | MLPClassifier | Perceptron | Decision tree | Random forest |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Knowledge | 0.326 | 0.436 | 0.471 | 0.443 | **0.474** | 0.463 | **0.474** |
| BERT | 0.409 | **0.489** | 0.437 | 0.472 | 0.409 | 0.319 | 0.432 |

#### 4.3 Anger

| Models | Naive Bayes | SVM | Logistic Regression | MLPClassifier | Perceptron | Decision tree | Random forest |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Knowledge | 0.284 | 0.430 | **0.447** | 0.411 | 0.406 | 0.409 | **0.447** |
| BERT | 0.411 | 0.480 | 0.450 | **0.483** | 0.428 | 0.378 | 0.443 |
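
The best model per row can be read off programmatically. The sketch below copies the accuracies from the Anger table above and lists the top-scoring model or models for each approach; the dictionary keys and the helper name `best_models` are illustrative.

```python
models = ["Naive Bayes", "SVM", "Logistic Regression", "MLPClassifier",
          "Perceptron", "Decision tree", "Random forest"]

# Accuracies copied from the Anger table (section 4.3).
anger = {
    "knowledge_based": [0.284, 0.430, 0.447, 0.411, 0.406, 0.409, 0.447],
    "bert": [0.411, 0.480, 0.450, 0.483, 0.428, 0.378, 0.443],
}


def best_models(scores):
    """Return every model whose accuracy equals the row maximum."""
    top = max(scores)
    return [m for m, s in zip(models, scores) if s == top]


for approach, scores in anger.items():
    print(approach, max(scores), best_models(scores))
```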

#### 4.4 Fear

| Models | Naive Bayes | SVM | Logistic Regression | MLPClassifier | Perceptron | Decision tree | Random forest |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Knowledge | 0.319 | 0.677 | 0.674 | 0.633 | 0.686 | 0.640 | **0.691** |
| BERT | 0.539 | **0.681** | 0.621 | 0.671 | 0.639 | 0.533 | 0.665 |


### 5. Conclusions



### References

Priban, Pavel & Hercig, Tomáš & Lenc, Ladislav. (2018). UWB at SemEval-2018 Task 1: Emotion Intensity Detection in Tweets. 133-140. 10.18653/v1/S18-1018.