# Poetry Sentiment Analysis: Comparative Text Classification Using Statistical and Embedding-Based Models

## Introduction

### Introduction to the Domain-Specific Area



*The first step of the coursework is to identify and describe the problem
or challenge. This is an area of industry or science where text
classification methods can contribute. Include relevant literature to
support the significance of the chosen area.*

Sentiment analysis, a sub-domain of natural language processing (NLP), involves the computational classification of texts according to their emotional connotations and meanings. This project focuses on the multi-class classification of lines of poetry, taken from the [google-research-datasets/poem-sentiment corpus](https://huggingface.co/datasets/google-research-datasets/poem_sentiment), into four categories: 0 for "negative" sentiment, 1 for "positive" sentiment, 2 for "neutral" (no impact) and 3 for "mixed" sentiment (both negative and positive). The project aims to evaluate and compare the performance of classical statistical approaches to text classification with that of a deep learning model on the same dataset.
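The four-class label scheme can be made explicit in code before any model is trained. The sketch below is a minimal illustration under stated assumptions: the names `ID2LABEL`, `LABEL2ID`, `decode_labels` and `label_distribution` are hypothetical, and the string class names (such as `"no_impact"`) are illustrative rather than copied from the dataset card. It maps integer labels to readable names and computes the share of each class, which is a useful first check for class imbalance when comparing classifiers.

```python
from collections import Counter

# Hypothetical integer-to-name mapping mirroring the four classes described above.
# The string names are illustrative; check the dataset card for the official ones.
ID2LABEL = {0: "negative", 1: "positive", 2: "no_impact", 3: "mixed"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_labels(label_ids):
    """Convert a sequence of integer labels into readable class names."""
    return [ID2LABEL[i] for i in label_ids]


def label_distribution(label_ids):
    """Return the fraction of examples in each class (all zeros for empty input)."""
    counts = Counter(label_ids)
    total = sum(counts.values())
    if total == 0:
        return {ID2LABEL[i]: 0.0 for i in sorted(ID2LABEL)}
    return {ID2LABEL[i]: counts.get(i, 0) / total for i in sorted(ID2LABEL)}


if __name__ == "__main__":
    toy_labels = [0, 1, 2, 2, 3, 2]  # made-up example, not real corpus data
    print(decode_labels(toy_labels[:3]))
    print(label_distribution(toy_labels))
```

Keeping the mapping in one place means the statistical and deep learning pipelines can report results with the same class names.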

The unprecedented availability of text data on the Web and social media since the early 2000s has led to the emergence of sentiment analysis as a highly valuable and impactful research field [2]. As Liu summarizes, computational analysis of customer reviews has become crucial to gaining a competitive edge for many businesses: online reviews, forum posts and social media comments drive much of the data-driven decision-making in many firms. For example, analyzing whether customers' social media posts about a new product are largely positive or negative can help a company decide whether to discontinue that product line. Studying the polarity of opinions published on the Web can also be used to predict election results and broader social trends, and can help political research organizations detect which issues the electorate feels most strongly about, whether positively or negatively. While the benefits of sentiment analysis for measuring customer satisfaction or political moods from Web reviews and social media posts have been extensively documented over the last two decades (**insert ref**), comparatively little research has addressed sentiment in literary and poetic texts.

The rationale for sentiment analysis of poetic texts is admittedly less apparent than the clear financial advantage businesses gain from analyzing customer reviews. Nonetheless, there are several reasons why applying sentiment polarity detection to poetry can be valuable. First, poetic language poses a specific challenge for any text classification task because of the prevalence of figurative language and of unconventional uses of words in unexpected contexts, known as "catachresis" [2]. Second, the positive or negative connotations of a literary text often depend on the author's and the reader's implicit world knowledge, and this awareness of social context is difficult to model computationally. As Kim and Klinger note, writers often avoid obvious emotion-bearing words, preferring to convey emotion through figures of speech [1]. Identifying the sentiment of a poem is therefore an interesting and important test of how NLP models handle complex, indirect language. Metaphorical and figurative expressions also occur, albeit less frequently, in customer reviews and social media posts; a model unable to handle this kind of complexity may therefore also perform worse in these business-oriented applications.

Kim and Klinger also survey the growing research fields known as the "digital humanities" and "computational literary studies", in which computational techniques are used to explore and corroborate literary scholars' theories [1]. The authors point out that "the stylistic properties of texts can be defined on the basis of their emotional interest", and not merely on the basis of their linguistic characteristics. For instance, Ethan Reed's sentiment analysis of American poetry from the 1960s and 1970s was used to explore how feelings of injustice were coded in terms of race and gender (**insert ref**). Reagan et al. (**insert ref**) split novels into equal-length sections and computed a happiness score for each section in order to analyze narrative patterns as fluctuations in sentiment, and to identify which patterns were associated with higher download counts. Sentiment analysis can thus offer new perspectives on research topics in literary studies, such as the evolution of the literary expression of emotions across historical periods, the affective qualities that distinguish one poet from another, or the automatic analysis of narrative structure (e.g. does the text conclude with a happy ending?).

Sentiment analysis of literary texts can also help detect bias towards certain geographic locations or demographic groups over time. Indeed, the authors who compiled and annotated the dataset used here did so in order to counteract bias in a poetry collaboration tool that generates the "next verse" of a poem in a particular style in response to user input. Sheng and Uthus point out that the bias inherent in machine learning and natural language processing applications can propagate and amplify societal bias; rather than merely filtering out negative descriptions of societal groups, they aim to recognize and counter such negative associations using data augmentation techniques such as style transfer.

Sentiment analysis of poetic language is therefore valuable in three respects: it enables a more rigorous assessment of models' capabilities in handling figurative language, it opens up new perspectives in literary research, and it helps diagnose stereotypes and biases in literary texts.

## References 

[1] Evgeny Kim and Roman Klinger. *A Survey on Sentiment and Emotion Analysis for Computational Literary Studies*. Zeitschrift für digitale Geisteswissenschaften, Herzog August Bibliothek, 2019. CC BY-SA 4.0. DOI: https://doi.org/10.17175/2019_008. URL: https://zfdg.de/2019_008_v1. Preprint: [arXiv:1808.03137](https://arxiv.org/abs/1808.03137).

[2] Bing Liu. *Sentiment Analysis and Opinion Mining*. 1st ed. Springer Cham, 23 May 2012. XIV, 167 pp. ISBN 978-3-031-01017-0; eBook ISBN 978-3-031-02145-9. DOI: [https://doi.org/10.1007/978-3-031-02145-9](https://link.springer.com/book/10.1007/978-3-031-02145-9).