# CS289 Final Project: Irony Detection in English Tweets

Team: Jayanth, Sudharsan Krishnaswamy, Debleena Sengupta, Shadi Shahsavari

# Abstract

The advent of social media such as Twitter and Facebook has led more people to use creative, figurative language such as irony, sarcasm and hyperbole to catch their social network's attention and earn more likes and retweets. Natural language processing tasks on such social media data, such as sentiment analysis, opinion mining and argument analysis, struggle to maintain high performance when applied to ironic texts. We tackle this hard problem of irony detection using recent advances in deep learning. Our work addresses the first task of the SemEval 2018 irony detection dataset: deciding whether a tweet is "ironic" or not (binary classification with labels 1 and 0).

# SemEval Dataset Preprocessing and Corpus Analysis


The dataset was preprocessed with regular expressions that replace URLs with the token URL and usernames with the token USR. Emojis were also converted to textual aliases (e.g. :black_heart:) with the emoji package's demojize function, which uses the emoji names defined by the [Unicode Consortium](http://www.unicode.org/emoji/charts/full-emoji-list.html).

In [37]:
import re,emoji
sent = u'@SincerelyTumblr: One day I want to travel with my bestfriend 🌏 ✈️ http://t.co/AXD3Ax5qC1 DONE DID TRAVELED DA WORLD!! @Bethanycsmithh 🖤 '
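# Replace emojis with their textual aliases, e.g. 🖤 -> :black_heart:
sent = emoji.demojize(sent)
# Replace URLs with the token URL and @mentions with the token USR
sent = re.sub(r'https?:\/\/[^ \n\t\r]*', 'URL', sent)
sent = re.sub(r'@[a-zA-Z0-9_]+', 'USR', sent)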
print(sent)

USR: One day I want to travel with my bestfriend :globe_showing_Asia-Australia: :airplane:️ URL DONE DID TRAVELED DA WORLD!! USR :black_heart: 


Collocations are expressions of multiple words that commonly co-occur, and they give insight into the common patterns in both classes. NLTK provides a number of measures for scoring collocations and other word associations; the bigram collocations in the corpus were scored with these measures and then ranked.

In [None]:
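from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
from nltk.tokenize import TweetTokenizer

# corpus: list of tweet strings loaded earlier
tokenizer = TweetTokenizer(preserve_case=False, reduce_len=True, strip_handles=True).tokenize
tokens = tokenizer('\n'.join(corpus))
finder = BigramCollocationFinder.from_words(tokens)
bigram_measures = BigramAssocMeasures()
# score_ngrams returns (bigram, score) pairs sorted by descending score
scored = finder.score_ngrams(bigram_measures.chi_sq)
# map() is lazy in Python 3, so an explicit loop is needed to actually print
for bigram, score in scored[:10]:
    print(' '.join(bigram), score)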

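## Collocation Scoring

Top 10 bigrams under each association measure. The chi-squared, Dice, Jaccard, phi-squared and PMI measures produced the same top 10, so they share a column.

| Raw frequency | Chi-squared / Dice / Jaccard / Phi-squared / PMI | Likelihood ratio |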
|----------|------------------------------------|---------------------------|
| ! !      | #034i #100                         | ! !                       |
| in the   | #100 #glitter                      | i love                    |
| . i      | #1stphoto #rektek                  | I I                       |
| i love   | #2003 2003                         | going to                  |
| I I      | #2015isthenewturnup #myboos        | to be                     |
| to be    | #2015season #2014sucks             | face_with_tears_of_joy :: |
| of the   | #2am http://t.co/49XwyrlADo        | in the                    |
| . I      | #2n1edition http://t.co/oRT6ZYfGhx | can't wait                |
| for the  | #2o14 #bestie                      | :: face_with_tears_of_joy |
| to the   | #2of6 #6daystretch                 | : ️                        |
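
The shared chi-squared/PMI column is dominated by rare hashtag pairs and reads like an alphabetical list, while raw frequency and likelihood ratio favour common phrases. A minimal pure-Python sketch of why: it recomputes raw frequency, PMI and chi-squared from the contingency counts that NLTK's `BigramAssocMeasures` works with (bigram count, count of each word, total number of tokens). Bigrams whose words never occur apart all reach the maximum score and therefore tie. The function name `bigram_scores` and the toy sentence are our own, for illustration only.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Score each adjacent bigram by raw frequency, PMI and chi-squared."""
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), n_ii in bigrams.items():
        n_ix, n_xi = unigrams[w1], unigrams[w2]
        # PMI: log2 of how much more often the pair occurs than independence predicts
        pmi = math.log2(n_ii * n / (n_ix * n_xi))
        # Remaining cells of the 2x2 contingency table
        n_io = n_ix - n_ii  # other occurrences of w1
        n_oi = n_xi - n_ii  # other occurrences of w2
        n_oo = n - n_ii - n_io - n_oi
        denom = (n_ii + n_io) * (n_ii + n_oi) * (n_io + n_oo) * (n_oi + n_oo)
        chi_sq = n * (n_ii * n_oo - n_io * n_oi) ** 2 / denom if denom else 0.0
        scores[(w1, w2)] = {"freq": n_ii / n, "pmi": pmi, "chi_sq": chi_sq}
    return scores


tokens = "i love it i love you so much i love it".split()
scores = bigram_scores(tokens)
# "you so" occurs once, but its words never occur elsewhere, so its PMI (log2 11)
# beats the far more frequent "i love" (log2 11/3)
for bigram, s in sorted(scores.items(), key=lambda kv: kv[1]["pmi"], reverse=True)[:3]:
    print(" ".join(bigram), round(s["pmi"], 3))
```

Both "i love" and "you so" reach the maximum chi-squared value (the token count), since in each case the two words only ever appear together.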


# Baseline Implementations of Traditional ML Algorithms

In [10]:
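from nltk.tokenize import TweetTokenizer
from sklearn.feature_extraction.text import TfidfVectorizer

def featurize(corpus):
    """Turn a list of tweets into a sparse TF-IDF bag-of-words matrix."""
    # Lowercase, shorten runs of 3+ repeated characters to 3 and drop @handles
    tokenizer = TweetTokenizer(preserve_case=False, reduce_len=True, strip_handles=True).tokenize
    vectorizer = TfidfVectorizer(strip_accents="unicode", analyzer="word", tokenizer=tokenizer, stop_words="english")
    X = vectorizer.fit_transform(corpus)
    return X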
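
`featurize` relies on scikit-learn's `TfidfVectorizer`. As a rough sketch of what it computes with its default settings (raw term counts, smoothed idf `ln((1 + n) / (1 + df)) + 1`, L2-normalised rows), the pure-Python function below reproduces the weighting on already-tokenized text. It is an illustration only: the name `tfidf` is our own, and it omits the tokenizer, accent stripping and English stop-word removal that `featurize` also configures.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf, matching scikit-learn's default smooth_idf=True
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    rows = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        # L2-normalise so every tweet vector has unit length
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()})
    return rows


docs = [["love", "mondays"], ["love", "rain"], ["hate", "rain"]]
for row in tfidf(docs):
    print({t: round(w, 3) for t, w in row.items()})
```

A term that appears in every document gets the minimum idf of 1, so rarer, more distinctive words dominate each tweet's vector.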

In [1]:
def most_informative_feature_for_binary_classification(vectorizer, classifier, n=10):
    """Print the n most negative and n most positive feature weights of a fitted linear classifier."""
    class_labels = classifier.classes_
    # get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
    feature_names = vectorizer.get_feature_names_out()
    ranked = sorted(zip(classifier.coef_[0], feature_names))
    # Most negative weights point towards classes_[0], most positive towards classes_[1]
    for coef, feat in ranked[:n]:
        print(class_labels[0], coef, feat)
    print()
    for coef, feat in reversed(ranked[-n:]):
        print(class_labels[1], coef, feat)

In [None]:
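import numpy as np
from sklearn import metrics
from sklearn.model_selection import cross_val_predict
from sklearn.svm import LinearSVC

# parse_dataset, DATASET_FP, TASK and PREDICTIONSFILE are defined elsewhere (not shown here)
K_FOLDS = 10 # 10-fold crossvalidation
CLF = LinearSVC() # the default, non-parameter optimized linear-kernel SVM
corpus, y = parse_dataset(DATASET_FP) # load the dataset
X = featurize(corpus) # simple TF-IDF bag-of-words features

# List of [label, count] pairs, useful for checking class balance
class_counts = np.asarray(np.unique(y, return_counts=True)).T.tolist()

# Returns an array of the same size as 'y' where each entry is the prediction for that
# example from a model trained on the other folds (cross-validated prediction)
predicted = cross_val_predict(CLF, X, y, cv=K_FOLDS)

# Modify F1-score calculation depending on the task:
# Task A scores the ironic class (label 1); Task B macro-averages over all classes
if TASK.lower() == 'a':
    score = metrics.f1_score(y, predicted, pos_label=1)
elif TASK.lower() == 'b':
    score = metrics.f1_score(y, predicted, average="macro")
else:
    raise ValueError("TASK must be 'A' or 'B'")
print("F1-score Task", TASK, score)
# Write one prediction per line
for p in predicted:
    PREDICTIONSFILE.write("{}\n".format(p))
PREDICTIONSFILE.close()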
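
The task-dependent scoring calls scikit-learn's `f1_score` with `pos_label=1` for Task A and `average="macro"` for Task B. A small pure-Python sketch of the difference, with helper names of our own (hypothetical, not part of the project code): binary F1 scores only the ironic class (label 1), while macro F1 averages the per-class F1 scores with equal weight, so a model that neglects a minority class is penalised.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(f1_for_label(y_true, y_pred, 1))  # ≈ 0.667 (Task A style)
print(macro_f1(y_true, y_pred))         # ≈ 0.583 (Task B style)
```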

# DNN Model to Capture Linguistic Property of Irony

# Deep Learning Algorithm Implementation and Model Tuning

# Boosting Implementation and Model Tuning

In addition to the deep learning model, we explored a boosting implementation. The idea was to feed the same word-embedding features used in the DNN to a boosting algorithm to see whether it could achieve better performance. Below is the implementation of our boosting algorithm.

In [13]:
import os
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.metrics import accuracy_score

# Preprocess the data files that contain the word-embedding features.
# You may need to change these paths depending on where the data is.
data_y_fp = "./labels.txt"
directory = "word_embeddings"
MAX_LENGTH = 310  # every example is truncated or zero-padded to this many values

# Go through the labels file: one integer label per line, line i belongs to example file i
with open(data_y_fp) as f:
    data_y = np.array([int(label) for label in f])

# Go through the feature files. Each file has 25 features; all values in a file
# are concatenated into a single vector.
data_x, labels = [], []
for filename in os.listdir(directory):
    file_num = int(filename.split(".")[0])  # files are numbered from 1
    example = []
    with open(os.path.join(directory, filename)) as f:
        for line in f:
            # split() also discards the trailing space and newline at the end of each line;
            # extend() works in Python 3, where list + map(...) raises a TypeError
            example.extend(float(value) for value in line.split())
    example = np.array(example[:MAX_LENGTH])
    if len(example) < MAX_LENGTH:
        example = np.pad(example, (0, MAX_LENGTH - len(example)), 'constant')
    data_x.append(example)
    labels.append(data_y[file_num - 1])
X = np.array(data_x)
Y = np.array(labels)

seed = 7
test_size = 0.1
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
# Fit the model on the training data
model = XGBRegressor(max_depth=3)  # gave 56.51%
model.fit(X_train, y_train)
# The regressor outputs continuous scores; threshold at 0.5 to get 0/1 labels
# (same as rounding for scores in [0, 1], but never produces labels outside {0, 1})
y_pred = model.predict(X_test)
predictions = (y_pred >= 0.5).astype(int)
# Evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))




For the implementation, we used XGBRegressor from the xgboost library, a gradient-boosted ensemble of decision trees. With its default squared-error objective it outputs a continuous score for each tweet, which is then converted to a 0/1 label. Given boosting's strong track record, we hypothesized that with the right feature selection it would achieve a high score. We tested a variety of feature sets: different combinations of extracted sentiment scores, and word embeddings. Of all these, the word-embedding features performed best, reaching 58.33% accuracy; the code above is the version behind our most successful result. Although this score is better than random guessing, it did not outperform the DNN approach, as we had expected. Our tentative conclusion is that boosting only works well when sufficiently informative features are selected: for irony detection, sentiment scores or word embeddings alone were not strong enough features to distinguish ironic from non-ironic tweets.
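
XGBoost fits each new tree using second-order gradient information and regularisation. Stripped of those details, the core idea of gradient boosting under squared error can be sketched in pure Python with one-split regression trees (stumps): every round fits a stump to the current residuals and adds a damped copy of it to the model. This is a toy illustration of the general technique, not XGBoost's implementation; `fit_stump`, `gradient_boost` and the default `n_rounds`/`learning_rate` values are our own choices. As in the experiment above, the model outputs a continuous score that is turned into a 0/1 label.

```python
def fit_stump(X, residuals):
    """Regression stump: the single (feature, threshold) split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not right:
                continue  # the largest value would send every example left
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:  # every feature is constant: fall back to the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, j, threshold, lm, rm = best
    return lambda row: lm if row[j] <= threshold else rm


def gradient_boost(X, y, n_rounds=20, learning_rate=0.3):
    """Additive model of stumps minimising squared error, starting from the mean label."""
    base = sum(y) / len(y)
    stumps = []
    preds = [base] * len(y)
    for _ in range(n_rounds):
        # For the loss (1/2)(y - p)^2 the negative gradient is the residual y - p
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(row) for p, row in zip(preds, X)]
    return lambda row: base + learning_rate * sum(s(row) for s in stumps)


# Toy one-feature data: low values have label 0, high values label 1
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]
model = gradient_boost(X, y)
# Threshold the continuous score at 0.5 to obtain a class label
labels = [1 if model(row) >= 0.5 else 0 for row in X]
print(labels)
```

Each round shrinks the remaining error by a factor of `1 - learning_rate` on this separable toy data; on real tweet embeddings the stumps (or XGBoost's deeper trees) can only exploit whatever signal the features actually carry.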

Describe code, describe tuning, describe 10 fold testing
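
The boosting experiment scores a single random 90/10 split (`test_size=0.1`), whereas the baseline uses 10-fold cross-validation through `cross_val_predict`. As a rough pure-Python sketch of how 10-fold out-of-fold predictions are produced, the helpers below split the data into folds and, for each fold, train on the remaining folds and predict the held-out one. Assumptions: the folds are contiguous and neither shuffled nor stratified (scikit-learn's `cross_val_predict` uses stratified folds for classifiers), and `majority_fit_predict` is a hypothetical stand-in for a real model such as `LinearSVC` or `XGBRegressor`.

```python
from collections import Counter


def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(fit_predict, X, y, k=10):
    """One prediction per example, each from a model trained without that example's fold."""
    predicted = [None] * len(y)
    for test_idx in kfold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            predicted[i] = p
    return predicted


def majority_fit_predict(X_train, y_train, X_test):
    """Hypothetical stand-in model: always predicts the most frequent training label."""
    majority = Counter(y_train).most_common(1)[0][0]
    return [majority] * len(X_test)


X = list(range(10))
y = [1] * 7 + [0] * 3
predicted = out_of_fold_predictions(majority_fit_predict, X, y, k=10)
print(predicted)
```

Every example receives exactly one prediction from a model that never saw it, so the full prediction list can be scored against `y` (for example with F1) just like the output of `cross_val_predict`.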

# Result Analysis and Test Data Evaluation Submission on SemEval Dataset