# CS289 Final Project: Irony Detection in English Tweets

Team: Jayanth, Sudharsan Krishnaswamy, Debleena Sengupta, Shadi Shahsavari

# Abstract

The rise of social media such as Twitter and Facebook has led people to use more creative and figurative language, such as irony, sarcasm, and hyperbole, to catch their network's attention and earn more likes and retweets. Natural language processing tasks on social media data, such as sentiment analysis, opinion mining, and argument analysis, struggle to maintain high performance when applied to ironic text. We tackle this hard problem of irony detection using recent advances in deep learning. We base our work on the SemEval 2018 dataset and address its first task: detecting whether a tweet is ironic or not (binary classification, 0 or 1).

# SemEval Dataset Preprocessing and Corpus Analysis with Visual Plots


# Baseline Implementations of traditional ML algorithms

# DNN Model to Capture Linguistic Property of Irony

# Deep Learning Algorithm Implementation and Model Tuning

# Boosting Implementation and Model Tuning

In addition to the deep learning model, we explored a boosting implementation. The idea was to feed the same word embedding features used in the DNN to a boosting algorithm to see whether it could achieve better performance. Below is the implementation of our boosting algorithm.

In [None]:
from xgboost import XGBClassifier
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.metrics import accuracy_score
import re
import numpy as np
import os

#preprocess data files that contain word_embedding features.
#May need to change path based on where data is
data_y_fp = "./labels.txt"
directory = "word_embeddings"
MAX_LENGTH = 310
data_x = []
data_y = []

#go through labels file:
with open(data_y_fp) as f:
    for label in f:
        label = int(label)
        data_y.append(label)
data_y = np.array(data_y)

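#go through the feature files. Each file has 25 features
#paths are built with os.path.join so the working directory is left unchanged
files = os.listdir(directory)
for filename in files:
    example = []
    name = filename.split(".")
    file_num = int(name[0])
    label = data_y[file_num-1]
    with open(os.path.join(directory, filename)) as f:
        for line in f:
            #split on spaces and drop the last token of each line
            tmp = line.split(" ")
            tmp = tmp[:len(tmp)-1]
            #map() returns an iterator in Python 3, so build a list before concatenating
            tmp = list(map(float, tmp))
            example = example + tmp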
    example = np.array(example)
    if len(example)>MAX_LENGTH:
        example = example[:MAX_LENGTH]
    if len(example)<MAX_LENGTH:
        example = np.pad(example, (0,MAX_LENGTH-len(example)), 'constant')
    example = np.append(example, label)
    data_x.append(example)
data_x = np.array(data_x)
X = data_x[:,0:MAX_LENGTH]
Y = data_x[:,MAX_LENGTH]

seed = 7
test_size = 0.1
X_train, X_test, y_train, y_test = train_test_split(X,Y, test_size=test_size, random_state=seed)
# fit model on training data
model = XGBRegressor(max_depth=3) #gave 56.51%
model.fit(X_train, y_train)
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))



For the implementation, we decided to use the XGBRegressor from the xgboost library. It is a gradient-boosted tree regressor; we round its continuous output to 0 or 1 to perform the binary classification task. Based on the track record of boosting, we hypothesized that with proper feature selection it would achieve a significantly high score. We tested a variety of feature sets: different combinations of sentiment score extractions, and word embeddings. Of these, the word embedding features performed best, resulting in 58.33% accuracy. The code above corresponds to this best-performing feature set. Although this score was better than random guessing, it fell short of the DNN approach and of our initial hypothesis. After careful deliberation, we concluded that boosting only works well when the right features are selected. For irony detection, sentiment scores or word embeddings alone were not strong enough features to characterize a tweet as ironic or not ironic.
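Because `XGBRegressor` returns continuous scores rather than class labels, the way those scores are mapped to 0/1 affects the reported accuracy. The sketch below is plain Python on made-up scores (not outputs of our model) and contrasts `round` with an explicit 0.5 threshold. The helper names `scores_to_labels` and `accuracy` are hypothetical and introduced only for illustration.

```python
def scores_to_labels(scores, threshold=0.5):
    """Map continuous regression scores to 0/1 labels with a fixed threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up regression scores; a regressor is free to leave the [0, 1] range.
scores = [-0.7, 0.2, 0.49, 0.5, 0.93, 1.6]
y_true = [0, 0, 1, 1, 1, 1]

# round() can produce labels outside {0, 1} (here -1 and 2), and Python 3
# rounds exact halves to the nearest even integer, so 0.5 becomes 0.
rounded = [round(s) for s in scores]
thresholded = scores_to_labels(scores)

print("rounded:    ", rounded, "accuracy: %.2f%%" % (100.0 * accuracy(y_true, rounded)))
print("thresholded:", thresholded, "accuracy: %.2f%%" % (100.0 * accuracy(y_true, thresholded)))
```

On real data the two mappings may well agree; the point is only that an explicit threshold guarantees every prediction is a valid class label. A classifier such as `XGBClassifier` avoids the issue by returning labels directly.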

Describe code, describe tuning, describe 10 fold testing
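As a rough illustration of the 10-fold testing mentioned above, the following sketch splits the data into ten folds and averages the per-fold accuracy. It is a minimal standard-library version, not the code we ran: `k_fold_indices`, `cross_validate`, and `majority_class` are hypothetical names, the data is synthetic, and the majority-class predictor stands in for the boosting model.

```python
import random


def k_fold_indices(n_samples, k=10, seed=7):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(X, y, fit_predict, k=10, seed=7):
    """Return per-fold accuracies; fit_predict(X_tr, y_tr, X_te) returns labels."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        X_te = [X[i] for i in test_idx]
        y_te = [y[i] for i in test_idx]
        preds = fit_predict(X_tr, y_tr, X_te)
        correct = sum(1 for t, p in zip(y_te, preds) if t == p)
        accuracies.append(correct / len(y_te))
    return accuracies


def majority_class(X_tr, y_tr, X_te):
    """Placeholder learner: always predicts the most common training label."""
    label = 1 if 2 * sum(y_tr) >= len(y_tr) else 0
    return [label] * len(X_te)


# Synthetic data, for illustration only.
X = [[float(i)] for i in range(25)]
y = [i % 2 for i in range(25)]
fold_scores = cross_validate(X, y, majority_class, k=10)
print("Mean accuracy: %.2f%%" % (100.0 * sum(fold_scores) / len(fold_scores)))
```

To evaluate the boosting model this way, `fit_predict` would wrap the model's `fit` and `predict` calls. scikit-learn's `KFold` and `cross_val_score` in `sklearn.model_selection` provide the same functionality.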

# Result Analysis and Test Data Evaluation Submission on SemEval Dataset