In [1]:
# IMPORTS
import pandas as pd
import numpy as np
import spacy
from transformers import RobertaTokenizerFast, RobertaForSequenceClassification, pipeline

### **Process Corpus Text**

In [2]:
# IMPORT NOVEL TEXT
with open("MV_Abrv_Full.txt", "r", encoding="utf-8") as file:
    content = file.readlines()

In [3]:
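# SEPARATE SENTENCES
content = [c.rstrip() + " " for c in content]              # Strip line breaks, keeping a space between lines
content = "".join(content)
content = content.split(". ")                              # Naive split on a period followed by a space
content = [c.strip() + "." for c in content if c.strip()]  # Drop empty chunks and restore the removed period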
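The split on `". "` above misses sentences that end with `?` or `!`. A minimal regex-based alternative is sketched below, assuming that a sentence ends at `.`, `?` or `!` followed by whitespace; `split_sentences` is a hypothetical helper, and abbreviations such as "Mr." would still be split incorrectly:

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences at '.', '?' or '!' followed by whitespace."""
    # Collapse line breaks and repeated spaces into single spaces
    text = " ".join(text.split())
    # Split after terminal punctuation (kept in the sentence via the lookbehind)
    parts = re.split(r"(?<=[.?!])\s+", text)
    # Drop empty chunks (e.g., when the text is empty or only whitespace)
    return [p for p in parts if p]


sample = "Shylock lived at Venice.\nWas he rich? He was!  "
print(split_sentences(sample))
```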

### **Initialize SpaCy and BERT Models**

This notebook uses the SpaCy model for Named Entity Recognition (NER).

In [4]:
# INITIALIZE the small English SpaCy pipeline
nlp = spacy.load('en_core_web_sm')

BERT-based models are used for Emotion Detection (ED):
* the emotion is detected from the sentence as a whole (later referred to as "Sentence Emotion")
* not from a deeper understanding of its meaning (e.g., who feels what toward whom)
* 2 models are explored: AG (EmoRoBERTa, a RoBERTa model) and BS (a DistilBERT model), abbreviated by their authors' names

In [5]:
# Emotion Detector 1 (AG): EmoRoBERTa, loaded from its TF 2.0 checkpoint
tokenizer = RobertaTokenizerFast.from_pretrained("arpanghoshal/EmoRoBERTa")
model = RobertaForSequenceClassification.from_pretrained("arpanghoshal/EmoRoBERTa", from_tf=True)
emotion = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)  # Reuse the loaded model and tokenizer

In [6]:
# Emotion Detector 2 (BS); top_k=None returns the scores of all labels
emotion2 = pipeline("text-classification", model="bhadresh-savani/distilbert-base-uncased-emotion", top_k=None)

### **Analysis**

#### **#1:** Identification of Characters and Sentence Emotions

Each sentence is treated as a discrete timestep, in which:
* at most one character is identified, if any (no Entity Disambiguation)
* one overall sentence emotion is detected

In [7]:
# Initializing Dataframe
df = pd.DataFrame(columns=["Sentence", "Entities", "Persons", "SynDep", "Emotion_AG", "Emotion_BS"])
df["Sentence"] = content

In [8]:
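# Populating Dataframe
entities_col, persons_col, syndep_col, emotion_ag_col, emotion_bs_col = [], [], [], [], []

for sentence in df["Sentence"]:
    sentence_nlp = nlp(sentence)
    entities = sentence_nlp.ents
    persons = [e for e in entities if e.label_ == "PERSON"]  # Only keep Named Entities who are Persons

    entities_col.append({str(e): e.label_ for e in entities})
    persons_col.append([e.lemma_ for e in persons])
    # Dependency label of each person; for a compound name, use the label of the following token
    syndep_col.append([sentence_nlp[e.start + 1].dep_ if sentence_nlp[e.start].dep_ == "compound"
                       else sentence_nlp[e.start].dep_
                       for e in persons])
    emotion_ag_col.append(emotion(sentence)[0]["label"])  # ED model 1 (AG)
    emotion_bs_col.append(max(emotion2(sentence)[0], key=lambda s: s["score"])["label"])  # ED model 2 (BS): highest-scoring label

# Assign whole columns at once (avoids chained assignment such as df["Entities"][i] = ...)
df["Entities"] = entities_col
df["Persons"] = persons_col
df["SynDep"] = syndep_col
df["Emotion_AG"] = emotion_ag_col
df["Emotion_BS"] = emotion_bs_col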

In [9]:
df.head(50)

| | Sentence | Entities | Persons | SynDep | Emotion_AG | Emotion_BS |
|---|---|---|---|---|---|---|
| 0 | Shylock, the Jew, lived at Venice: he was an u... | {'Shylock': 'PERSON', 'Jew': 'NORP', 'Venice':... | [Shylock] | [nsubj] | neutral | joy |
| 1 | Shylock, being a hard-hearted man, exacted the... | {'Shylock': 'PERSON', 'Antonio': 'PERSON', 'Ve... | [Shylock, Antonio, Shylock, Antonio, Antonio] | [nsubj, pobj, nsubj, dobj, appos] | anger | anger |
| 2 | Whenever Antonio met Shylock on the Rialto (or... | {'Antonio': 'PERSON', 'Shylock': 'PERSON', 'Je... | [Antonio, Shylock] | [nsubj, dobj] | neutral | anger |
| 3 | Antonio was the kindest man that lived, the b... | {'Antonio': 'PERSON', 'Roman': 'NORP', 'Italy'... | [Antonio] | [nsubj] | admiration | anger |
| 4 | He was greatly beloved by all his fellow-citiz... | {'Bassanio': 'PERSON', 'Venetian': 'NORP'} | [Bassanio] | [attr] | love | joy |
| 5 | Whenever Bassanio wanted money, Antonio assist... | {'Bassanio': 'PERSON', 'Antonio': 'PERSON', 'o... | [Bassanio, Antonio] | [nsubj, nsubj] | neutral | joy |
| 6 | One day Bassanio came to Antonio, and told hi... | {'One day': 'DATE', 'Bassanio': 'GPE', 'Antoni... | [Antonio, Antonio] | [pobj, dobj] | neutral | sadness |
| 7 | Antonio had no money by him at that time to l... | {'Antonio': 'PERSON', 'Shylock': 'PERSON'} | [Antonio, Shylock] | [nsubj, pobj] | neutral | joy |
| 8 | Antonio and Bassanio went together to Shylock... | {'Antonio': 'PERSON', 'Bassanio': 'GPE', 'Shyl... | [Antonio, Shylock, Antonio] | [nsubj, pobj, nsubj] | neutral | anger |
| 9 | On this, Shylock thought within himself: 'If I... | {'Shylock': 'PERSON', 'Jewish': 'NORP'} | [Shylock] | [nsubj] | neutral | anger |
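A quick way to compare the two detectors is to cross-tabulate their labels. A minimal sketch with pandas `crosstab`; the toy frame below copies `Emotion_AG` and `Emotion_BS` from the first five rows of the output above (on the full notebook, `df` would be passed instead). Because the two label sets differ, exact agreement is only possible on the labels they share:

```python
import pandas as pd

# Toy stand-in for df: Emotion_AG / Emotion_BS of timesteps 0-4 above
toy = pd.DataFrame({
    "Emotion_AG": ["neutral", "anger", "neutral", "admiration", "love"],
    "Emotion_BS": ["joy", "anger", "anger", "anger", "joy"],
})

# Count how often each AG label co-occurs with each BS label
ct = pd.crosstab(toy["Emotion_AG"], toy["Emotion_BS"])
print(ct)

# Share of sentences where both models output the same label string
agreement = (toy["Emotion_AG"] == toy["Emotion_BS"]).mean()
print(f"Exact agreement: {agreement:.0%}")
```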


In [10]:
df["Person1"] = [p[0] if p else "" for p in df["Persons"]] # Assign emotion to 1st Named Entity identified (simplification by this baseline notebook model)

Because the BS model has no "neutral" label, it must assign a non-neutral emotion even to sentences that express none (e.g., timestep 0 is "neutral" for AG but "joy" for BS).
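One possible workaround, sketched under the assumption that `emotion2` keeps `top_k=None` so that every label score is returned, is to treat a low top score as "neutral". The helper `label_or_neutral` and its 0.5 threshold are hypothetical and would need tuning on hand-labelled sentences; the scores below are made up for illustration:

```python
def label_or_neutral(scores: list[dict], threshold: float = 0.5) -> str:
    """Return the top label, or "neutral" if no label reaches the threshold.

    `scores` is a list of {"label": ..., "score": ...} dicts, the shape a
    text-classification pipeline returns per input when top_k=None.
    """
    best = max(scores, key=lambda s: s["score"])
    return best["label"] if best["score"] >= threshold else "neutral"


# Made-up scores for illustration only
confident = [{"label": "anger", "score": 0.91}, {"label": "joy", "score": 0.05}]
uncertain = [{"label": "joy", "score": 0.38}, {"label": "sadness", "score": 0.33}]
print(label_or_neutral(confident))  # anger
print(label_or_neutral(uncertain))  # neutral
```

In the notebook this would be called as `label_or_neutral(emotion2(sentence)[0])`.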

#### **#2:** Emotional Timeline of Characters

In [11]:
dfp = df.reset_index().pivot(index=["index"], columns=["Person1"], values="Emotion_AG")
dfp1 = df.reset_index().pivot(index=["index"], columns=["Person1"], values="Emotion_BS")

In [12]:
#dfp = dfp.ffill(axis=0) # Forward fill the detected emotion of a character
dfp = dfp.fillna("") # Replace NaN with ""
dfp = dfp.drop(columns="") # Drop the column of sentences without an identified character
dfp.index.names = ["TimeStep"]

In [13]:
#dfp1 = dfp1.ffill(axis=0) # Forward fill the detected emotion of a character
dfp1 = dfp1.fillna("") # Replace NaN with ""
dfp1 = dfp1.drop(columns="") # Drop the column of sentences without an identified character
dfp1.index.names = ["TimeStep"]
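Cells [12] and [13] repeat the same steps for each model. A sketch of a hypothetical `build_timeline` helper that covers both, with the commented-out forward fill exposed as a flag; the toy frame stands in for `df`:

```python
import pandas as pd


def build_timeline(df: pd.DataFrame, emotion_col: str, ffill: bool = False) -> pd.DataFrame:
    """Pivot sentence emotions into one column per character, indexed by timestep."""
    timeline = df.reset_index().pivot(index="index", columns="Person1", values=emotion_col)
    if ffill:
        # Carry each character's last detected emotion forward until a new one appears
        timeline = timeline.ffill(axis=0)
    timeline = timeline.fillna("")                         # Replace NaN with ""
    timeline = timeline.drop(columns="", errors="ignore")  # Drop the column of sentences without a character
    timeline.index.name = "TimeStep"
    return timeline


toy = pd.DataFrame({
    "Person1": ["Shylock", "Shylock", "Antonio", "", "Bassanio"],
    "Emotion_AG": ["neutral", "anger", "neutral", "joy", "love"],
})
print(build_timeline(toy, "Emotion_AG", ffill=True))
```

In the notebook, `dfp = build_timeline(df, "Emotion_AG")` and `dfp1 = build_timeline(df, "Emotion_BS")` would replace cells [11] to [13]. With `ffill=True`, a character keeps their last detected emotion in later sentences where they do not appear, which fills gaps but may also carry an emotion past the point where it applies.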

In [14]:
dfp["Sentence"] = df["Sentence"]
dfp.head()

| TimeStep | Antonio | Bassanio | Daniel | Jew | Nerissa | Portia | Shylock | shylock | Sentence |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | | | | | | neutral | | Shylock, the Jew, lived at Venice: he was an u... |
| 1 | | | | | | | anger | | Shylock, being a hard-hearted man, exacted the... |
| 2 | neutral | | | | | | | | Whenever Antonio met Shylock on the Rialto (or... |
| 3 | admiration | | | | | | | | Antonio was the kindest man that lived, the b... |
| 4 | | love | | | | | | | He was greatly beloved by all his fellow-citiz... |


In [15]:
dfp1["Sentence"] = df["Sentence"]
dfp1.head()

| TimeStep | Antonio | Bassanio | Daniel | Jew | Nerissa | Portia | Shylock | shylock | Sentence |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | | | | | | joy | | Shylock, the Jew, lived at Venice: he was an u... |
| 1 | | | | | | | anger | | Shylock, being a hard-hearted man, exacted the... |
| 2 | anger | | | | | | | | Whenever Antonio met Shylock on the Rialto (or... |
| 3 | anger | | | | | | | | Antonio was the kindest man that lived, the b... |
| 4 | | joy | | | | | | | He was greatly beloved by all his fellow-citiz... |


In [16]:
dfp.to_csv("MV_emotional_timeline_AG.csv")
dfp1.to_csv("MV_emotional_timeline_BS.csv")

Potential Improvements:
* Add logic to extract emotions from sentences without a named character
* Add logic to assign the emotion to multiple persons mentioned in the same sentence
* Add logic to determine who has which emotion (subject/object)
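The last two ideas could start from the `Persons` and `SynDep` columns already computed in cell [8]. A minimal sketch, assuming that a person in the `nsubj` role is the one feeling the sentence emotion and falling back to every mentioned person when no subject is found; `assign_emotion` and this rule are hypothetical, not the notebook's method:

```python
def assign_emotion(persons: list[str], syndeps: list[str], emotion: str,
                   subject_deps: tuple[str, ...] = ("nsubj",)) -> dict[str, str]:
    """Map each person assumed to experience the emotion to the sentence emotion.

    Persons whose dependency label is in `subject_deps` are treated as experiencers;
    if there are none, the emotion is assigned to every person mentioned.
    """
    subjects = [p for p, dep in zip(persons, syndeps) if dep in subject_deps]
    experiencers = subjects or persons
    # dict.fromkeys removes duplicate names while keeping first-mention order
    return {p: emotion for p in dict.fromkeys(experiencers)}


# Persons / SynDep / Emotion_AG of timestep 1 in the output of cell [9]
result = assign_emotion(["Shylock", "Antonio", "Shylock", "Antonio", "Antonio"],
                        ["nsubj", "pobj", "nsubj", "dobj", "appos"], "anger")
print(result)  # {'Shylock': 'anger'}
```

On timestep 1 this rule still misses Antonio's dislike of Shylock, because Antonio is the agent of a passive verb ("disliked ... by Antonio", parsed as `pobj`), which the rule does not treat as an experiencer.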

### **Observations on Shortcomings of Initial Model**

#### **#1:** Long Sentences
* are more likely to contain multiple Named Entities (NEs)
* with different emotions
* that this initial model does not capture

In [24]:
dfp[dfp.index==1]

| TimeStep | Antonio | Bassanio | Daniel | Jew | Nerissa | Portia | Shylock | shylock | Sentence |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | | | | anger | | Shylock, being a hard-hearted man, exacted the... |


In [25]:
print(dfp["Sentence"][1])

Shylock, being a hard-hearted man, exacted the payment of the money he lent with such severity that he was much disliked by all good men, and particularly by Antonio, a young merchant of Venice; and Shylock as much hated Antonio, because he used to lend money to people in distress, and would never take any interest for the money he lent; therefore there was great enmity between this covetous Jew and the generous merchant Antonio.


* A long sentence with many ideas; **splitting at the semicolons** may help
* Shylock feels **hatred, not anger** (a limitation of the chosen models: neither AG nor BS has a "hatred" label)
* The emotion is **assigned to only one person, while several persons have emotions** (e.g., Antonio also "dislikes" (hatred) Shylock)
* "all good men" also exhibit dislike/hatred, but remain unidentified because they are not a Named Entity
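A minimal sketch of the semicolon split suggested above, assuming each clause becomes its own row while keeping the index of the sentence it came from, so that clauses still map onto the original timesteps; `split_clauses` is a hypothetical helper:

```python
import pandas as pd


def split_clauses(sentences: pd.Series) -> pd.DataFrame:
    """Split each sentence at semicolons into one row per clause."""
    # explode() keeps the original index, i.e. the sentence's timestep
    clauses = sentences.str.split(";").explode().str.strip()
    clauses = clauses[clauses != ""]  # Drop empty clauses (e.g., from a trailing semicolon)
    return clauses.rename("Clause").rename_axis("TimeStep").reset_index()


sentences = pd.Series([
    "Shylock lent money; Antonio disliked him; there was great enmity.",
    "Antonio was kind.",
])
print(split_clauses(sentences))
```

Each clause could then be passed through `nlp`, `emotion` and `emotion2` in the same way whole sentences are in cell [8].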

#### **#2:** Emotion Assigned to Wrong Entity/Person

In [26]:
dfp[dfp.index==3]

| TimeStep | Antonio | Bassanio | Daniel | Jew | Nerissa | Portia | Shylock | shylock | Sentence |
|---|---|---|---|---|---|---|---|---|---|
| 3 | admiration | | | | | | | | Antonio was the kindest man that lived, the b... |


In [27]:
print(dfp["Sentence"][3])

 Antonio was the kindest man that lived, the best conditioned, and had the most unwearied spirit in doing courtesies; indeed, he was one in whom the ancient Roman honour more appeared than in any that drew breath in Italy.


* The emotion (admiration) is assigned to Antonio, but it **belongs to the author** or the Venetian people (implied, but not explicitly written); Antonio is the target of the emotion, not the one feeling it

#### **#3:** Emotion is Unassigned due to Lack of Entity Disambiguation in the absence of a Named Entity

In [30]:
dfp[dfp.index==9]

| TimeStep | Antonio | Bassanio | Daniel | Jew | Nerissa | Portia | Shylock | shylock | Sentence |
|---|---|---|---|---|---|---|---|---|---|
| 9 | | | | | | | neutral | | On this, Shylock thought within himself: 'If I... |


In [31]:
print(dfp["Sentence"][9])

On this, Shylock thought within himself: 'If I can once catch him on the hip, I will feed fat the ancient grudge I bear him; he hates our Jewish nation; he lends out money gratis, and among merchants he rails at me and my well-earned bargains, which he calls interest.


* No emotion (hatred) is assigned to Antonio, who appears only **implicitly as "him" and "he"**, because he is not mentioned as a Named Entity in this sentence
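Linking "him" and "he" to Antonio properly would require coreference resolution. As a much cruder placeholder, one could guess that such a pronoun refers to the most recently mentioned person other than the sentence's subject. A sketch of that heuristic, with `guess_referent` as a hypothetical helper; it will often be wrong when several characters are in play:

```python
from typing import Optional


def guess_referent(previous_persons: list[list[str]], subject: str) -> Optional[str]:
    """Return the most recently mentioned person, other than `subject`, in earlier sentences."""
    for persons in reversed(previous_persons):  # Most recent sentence first
        for person in reversed(persons):        # Last mention within that sentence first
            if person != subject:
                return person
    return None  # No other person has been mentioned so far


# Persons of timesteps 7 and 8 in the output of cell [9]; timestep 9's subject is Shylock
history = [["Antonio", "Shylock"], ["Antonio", "Shylock", "Antonio"]]
print(guess_referent(history, "Shylock"))  # Antonio
```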

#### **#4:** Misidentification of Named Entities

In [23]:
dfp.columns

Index(['Antonio', 'Bassanio', 'Daniel', 'Jew', 'Nerissa', 'Portia', 'Shylock',
       'shylock', 'Sentence'],
      dtype='object', name='Person1')

* 'again shylock' refers to the same character as 'Shylock'
* This error occurs only once, so it can be fixed manually
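A sketch of such a manual fix, assuming a hand-written alias table applied to `Person1` before the pivot in cell [11]; the `ALIASES` mapping is hypothetical and contains only the lowercase variant seen in `dfp.columns`:

```python
import pandas as pd

# Hypothetical hand-maintained alias table: variant -> canonical character name
ALIASES = {"shylock": "Shylock"}

toy = pd.DataFrame({"Person1": ["Shylock", "shylock", "Antonio", ""]})
# Series.replace with a dict only rewrites exact matches, so other names are untouched
toy["Person1"] = toy["Person1"].replace(ALIASES)
print(toy["Person1"].tolist())  # ['Shylock', 'Shylock', 'Antonio', '']
```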