In this study, conducted as a course project for CSC2515: Introduction to Machine Learning (Fall 2021), we compared several machine learning classifiers on their ability to detect whether a tweet that mentions depression actually shows signs of depression. We collected tweets from Twitter using hashtag-based keyword matching and used a pre-trained BERT sentiment model to separate candidate tweets about depression into two classes. Because prior research indicates that including emojis can improve classification performance on social media text, we used Word2Vec and Emoji2Vec to create embeddings for both the text and the emojis in the tweets. Our best-performing model was a Gaussian kernel support vector machine (SVM), which reached a test accuracy of around 85% both with and without emojis. Contrary to our expectations, including emojis did not noticeably improve performance, which we attribute primarily to our limited dataset.
Emoji2Vec repository: https://github.com/uclnlp/emoji2vec
nn10000: Pretrained sentiment model on 10000 tweets
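
To illustrate how text and emoji embeddings can feed a Gaussian kernel classifier, the following is a minimal sketch and not the implementation used in this study. It assumes that a tweet is represented by the average of its token vectors, which the abstract does not state, and it uses tiny hypothetical lookup tables (`WORD_VECTORS`, `EMOJI_VECTORS`) in place of pretrained Word2Vec and Emoji2Vec vectors. The helper names `embed_tweet` and `gaussian_kernel` are likewise illustrative.

```python
import math

# Hypothetical toy lookup tables standing in for pretrained Word2Vec and
# Emoji2Vec vectors (real embeddings have far more dimensions).
WORD_VECTORS = {
    "feeling": [0.1, 0.3, -0.2],
    "empty": [-0.4, 0.2, 0.5],
    "today": [0.0, 0.1, 0.1],
}
EMOJI_VECTORS = {
    "😞": [-0.5, 0.4, 0.6],
}
DIM = 3  # dimensionality of the toy vectors above


def embed_tweet(tokens, use_emojis=True):
    """Average the vectors of all known tokens; unknown tokens are skipped.

    Averaging is an assumption made for this sketch.
    """
    vectors = []
    for token in tokens:
        if token in WORD_VECTORS:
            vectors.append(WORD_VECTORS[token])
        elif use_emojis and token in EMOJI_VECTORS:
            vectors.append(EMOJI_VECTORS[token])
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


tokens = ["feeling", "empty", "today", "😞"]
with_emojis = embed_tweet(tokens, use_emojis=True)
without_emojis = embed_tweet(tokens, use_emojis=False)

# Kernel similarity between the two representations of the same tweet.
similarity = gaussian_kernel(with_emojis, without_emojis)
```

A kernel SVM compares tweets only through such kernel values, so the emoji vectors matter to the classifier only insofar as they shift a tweet's averaged representation relative to other tweets.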