# Sentiment-based graph of speakers

In this notebook, we try to implement one of our ideas from Milestone 1: designing a weighted, directed graph where each node represents an individual (a speaker from the Quotebank dataset), and the edge between two nodes depends on the sentiment with which they talk about each other in news articles. For example, if Donald Trump and Joe Biden are nodes in the graph, then the Trump -> Biden edge depends on the sentiment in Trump's mentions of Biden, and vice versa. In the end, this idea turned out to be less feasible than we had hoped, with sentiment analysis being the main villain. Since we still hope to use sentiment analysis at some point, we plan to find other ways of applying it for Milestone 3.

## Installing dependencies and importing packages

In [20]:
!pip install aspect_based_sentiment_analysis

In [2]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt 
import json
import os
import bz2
import itertools 

import nltk 
nltk.download('vader_lexicon')

from nltk.sentiment import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()

import aspect_based_sentiment_analysis as absa
nlp = absa.load()

from google.colab import drive
drive.mount('/content/drive')


[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


Some layers from the model checkpoint at absa/classifier-rest-0.2 were not used when initializing BertABSClassifier: ['dropout_379']
- This IS expected if you are initializing BertABSClassifier from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertABSClassifier from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of BertABSClassifier were not initialized from the model checkpoint at absa/classifier-rest-0.2 and are newly initialized: ['dropout_37']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Inspecting sentiment analysis models

Let's first try out a few pre-trained sentiment analysis models that can be used for this task and see how they behave. We'll do this on a made-up quote that is used only to illustrate model behaviours and justify our logic. If we imagine that this is a quote by Donald Trump, and we're computing the edge weight for the Trump -> Hillary relation, the sentiment should clearly be negative.

In [3]:
sample_quote = 'I love myself because I am pretty but I really hate Hillary. '
sample_quote += 'My brother is a very smart man.'

### NLTK Sentiment Intensity Analyzer 

Basic sentiment analysis that takes into account the whole given sentence and computes the overall sentiment.

In [4]:
ABBREVIATIONS = {'neu': 'neutral', 'neg': 'negative', 'pos': 'positive'}

# Compute sentiment polarity scores using the NLTK model
nltk_scores = sia.polarity_scores(sample_quote)

for abbr, sentiment_str in ABBREVIATIONS.items():
  print(f'{sentiment_str}: {nltk_scores[abbr]}') 

neutral: 0.437
negative: 0.218
positive: 0.345


### Aspect Based Sentiment Analysis

A modified version of the sentiment analysis model that also takes as input the 'aspects', or target words on which we would like to focus our analysis (in this case 'Hillary').

In [5]:
SENTIMENT_INDEXING = {'neutral': 0, 'negative': 1, 'positive': 2}
ASPECTS = ['Hillary']

# Compute sentiment scores using the ABSA model, 
# taking 'Hillary' as the aspect word
task = nlp(text=(sample_quote), aspects=ASPECTS)
absa_scores = task.examples[0].scores

for sentiment_str, ind in SENTIMENT_INDEXING.items():
  print(f'{sentiment_str}: {absa_scores[ind]:.3f}')

neutral: 0.002
negative: 0.991
positive: 0.007


The previous example shows why we opted for aspect-based sentiment analysis. As one could expect, a quote doesn't necessarily mention only one person, nor does it have to carry a single 'type' of sentiment. Given our use-case, where we'd like to assess the sentiment of the speaker towards another person (the object), basic sentiment analysis over the whole sentence doesn't make much sense. The made-up quote above illustrates how the overall sentiment of a quote can be mostly positive (it contains a couple of positive thoughts on other topics) while still being strongly negative towards our person of interest. That is where aspect-based sentiment analysis really excels compared to the basic approach.

### Problem with ABSA

Sadly, the model that showed more promise turned out to scale very poorly with the number of mentions, to the point where it is impractical to use. We demonstrate this by passing the same made-up quote through each model $N=20$ times and measuring the execution times of the NLTK model and the ABSA model.


In [6]:
import time 
N_ITERS = 20

# ABSA
# Start time measurement
start_time = time.time()

for _ in range(N_ITERS):
  task = nlp(text=(sample_quote), aspects=ASPECTS)

# End time measurement and compute elapsed time
end_time = time.time()
elapsed = np.round(end_time - start_time, 2)
print(f'ABSA model execution time for {N_ITERS} iterations: {elapsed} s')


ABSA model execution time for 20 iterations: 17.71 s


In [7]:
N_ITERS = 20

# NLTK
# Start time measurement
start_time = time.time()

for _ in range(N_ITERS):
  scores = sia.polarity_scores(sample_quote)

# End time measurement and compute elapsed time
end_time = time.time()
elapsed = np.round(end_time - start_time, 2)
print(f'NLTK model execution time for {N_ITERS} iterations: {elapsed} s')


NLTK model execution time for 20 iterations: 0.01 s
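
To put these timings in perspective, here is a rough sketch that turns a measured per-call cost into a projected total runtime. It assumes the cost per call stays constant as the workload grows; the helper names (`time_per_call`, `projected_hours`) and the workload size of one million mentions are hypothetical and chosen only for illustration.

```python
import time


def time_per_call(fn, n_iters=20):
    """Return the average wall-clock seconds per call of fn over n_iters calls."""
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    return (time.perf_counter() - start) / n_iters


def projected_hours(seconds_per_call, n_calls):
    """Linear extrapolation of the total runtime, in hours."""
    return seconds_per_call * n_calls / 3600


# Per-call cost taken from the ABSA timing printed above (17.71 s / 20 calls)
absa_per_call = 17.71 / 20

# HYPOTHETICAL workload size, not a figure from our dataset
n_mentions = 1_000_000

print(f'{projected_hours(absa_per_call, n_mentions):.0f} hours')
```

In the notebook itself, `time_per_call(lambda: nlp(text=sample_quote, aspects=ASPECTS))` would replace the hard-coded per-call cost.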


Given these timings, we are stepping away from the ABSA model ( 🐌 ), though with some hope of finding an alternative for Milestone 3. In the meantime, let's see what the basic sentiment analyzer from NLTK gives us.


## Constructing the person-person graph 

Let's now construct the aforementioned graph of individuals and see what the edges look like based on sentiment analysis between speakers. The graph contains just 3 nodes: Donald Trump, Hillary Clinton, and Barack Obama. As shown later, the year we focus on in this example is 2016, the year in which Donald Trump won the presidential election, running against Clinton.

In [8]:
# Define the nodes as a dictionary of format {QID: name}
nodes_qids = {
    'Q22686': 'Donald Trump', 
    'Q6294': 'Hillary Clinton', 
    'Q6279': 'Barack Obama',
    }
nodes_qids_set = set(nodes_qids.keys())

# Initialize the node-quote dictionary
quotes_per_node = {k: [] for k in nodes_qids}


In [21]:
# Take the quotes of interest (assigned to the individuals defined as nodes)
SAMPLES_TO_PROCESS = 3e6
year = 2016
path_to_file = f'/content/drive/MyDrive/Quotebank_limunADA/quotes-no-nones-{year}.json.bz2' 

print(f'\nExtracting quotes per node for year {year}\n')

with bz2.open(path_to_file, 'rb') as s_file:
  for instance_cnt, instance in enumerate(s_file):
    # Loading a sample
    instance = json.loads(instance) 
    
    # Take the intersection of the set of QIDs that we defined as nodes, 
    # and the speaker QIDs of the current quote. If there is an intersection,
    # take its first element (there should only be one anyways)
    qids_intersect = nodes_qids_set.intersection(set(instance['qids']))
    if len(qids_intersect) > 0:
      curr_qid = qids_intersect.pop()
      quotes_per_node[curr_qid].append(instance['quotation'])

    if instance_cnt % 100000 == 0:
      print(f'Instance {instance_cnt}')
    
    if instance_cnt == int(SAMPLES_TO_PROCESS):
      break
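
The extraction loop above can be tried without access to the full dump. The following is a minimal sketch, assuming only the `qids` and `quotation` fields used above; the function name `collect_quotes`, the toy file, and its made-up quotes are hypothetical. Unlike the loop above, it stores a quote under every matching node QID rather than only one.

```python
import bz2
import json
import os
import tempfile


def collect_quotes(path, wanted_qids, max_lines=None):
    """Stream a bz2 JSON-lines file and group quotations by speaker QID."""
    quotes = {qid: [] for qid in wanted_qids}
    with bz2.open(path, 'rt', encoding='utf-8') as f:
        for line_no, line in enumerate(f):
            if max_lines is not None and line_no >= max_lines:
                break
            record = json.loads(line)
            # Keep the quote under every node QID attributed to it
            for qid in wanted_qids.intersection(record['qids']):
                quotes[qid].append(record['quotation'])
    return quotes


# Tiny made-up file standing in for the real Quotebank dump
rows = [
    {'qids': ['Q22686'], 'quotation': 'First made-up quote.'},
    {'qids': ['Q1'], 'quotation': 'Quote by someone outside the graph.'},
    {'qids': ['Q6294'], 'quotation': 'Second made-up quote.'},
]
tmp_path = os.path.join(tempfile.mkdtemp(), 'toy-quotes.json.bz2')
with bz2.open(tmp_path, 'wt', encoding='utf-8') as f:
    for row in rows:
        f.write(json.dumps(row) + '\n')

toy_quotes = collect_quotes(tmp_path, {'Q22686', 'Q6294', 'Q6279'})
print(toy_quotes)
```

Reading the file line by line keeps memory usage flat regardless of the size of the dump, which is why the loop above never loads the whole file at once.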

### Computing graph edges

In order to compute the edge weights between certain nodes, we need to define the ways of quantifying the sentiments between them. We assign weights to each of the types of sentiments and aggregate that into a final score.

In [11]:
SENTIMENT_KEYS_NLTK = {'neutral': 'neu', 'negative': 'neg', 'positive': 'pos'}
SENTIMENT_INDEXING = {'neutral': 0, 'negative': 1, 'positive': 2}
SENTIMENT_WEIGHTS = {'neutral': 0, 'negative': -2, 'positive': 1}

def get_sentiment_score_nltk(text, nlp_model):
  """
  Computes the sentiment score for the given text, weighting each of the 
  sentiment types by a predefined weight.
  """
  nltk_scores = nlp_model.polarity_scores(text)
  score = 0
  for sentiment_str, nltk_key in SENTIMENT_KEYS_NLTK.items():
    score += nltk_scores[nltk_key] * SENTIMENT_WEIGHTS[sentiment_str] 

  return score


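def get_sentiment_score_absa(text, aspect, nlp_model):
  """
  Computes the sentiment score of the given text towards the given aspect
  word, weighting each of the sentiment types by a predefined weight.
  """
  # Use the model passed as an argument instead of the global one
  task = nlp_model(text=text, aspects=[aspect])
  absa_scores = task.examples[0].scores

  score = 0
  for sentiment_str, ind in SENTIMENT_INDEXING.items():
    score += absa_scores[ind] * SENTIMENT_WEIGHTS[sentiment_str]

  return score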
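
As a worked example of this weighting, the sketch below applies the same weights to the NLTK proportions printed earlier for the sample quote. The helper name `weighted_score` is hypothetical; the weights and the three proportions are the ones used in this notebook.

```python
SENTIMENT_WEIGHTS = {'neutral': 0, 'negative': -2, 'positive': 1}


def weighted_score(proportions, weights=SENTIMENT_WEIGHTS):
    """Weighted sum of sentiment proportions keyed by sentiment name."""
    return sum(weights[name] * value for name, value in proportions.items())


# Proportions printed earlier by the NLTK analyzer for the sample quote
sample = {'neutral': 0.437, 'negative': 0.218, 'positive': 0.345}

# 0 * 0.437 - 2 * 0.218 + 1 * 0.345
print(round(weighted_score(sample), 3))  # -0.091
```

Because negative sentiment is weighted twice as heavily as positive sentiment, the score comes out slightly negative even though the positive proportion is larger than the negative one. With these weights, the score always lies between -2 and 1.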



Now that we have defined the score functions, we need to extract the quotes for each of the node-node pairs depending on their mentions. For example, when computing the edge weight between nodes $i$ and $j$ (in that direction), we take into account all the quotes in which node $i$ is recognized as the speaker and in which node $j$ is mentioned (either by first or last name). The features that we collect are the mean sentiment score over all the quotes from $i$ mentioning $j$, as well as their count.


In [18]:
def get_mentions_from_list(quotes_list, look_for):
  """
  Finds all the quotes that contain at least one of the words in the look_for
  list (case-insensitive substring match).
  It returns a list of tuples, each of format (quote_text, target_word).
  If a quote contains more than 1 word from the look_for list, only the first
  one that was found is taken into account.
  """

  # List of tuples of format (quote_text, target_word)
  mentions = []
  for quote in quotes_list:
    for target_word in look_for:
      if target_word.lower() in quote.lower():
        mentions.append((quote, target_word))
        break

  return mentions


def compute_edge_features(nodes_qids, quotes_per_node):
  """
  Function that computes features for all the existing edges.
  As input, it takes a mapping of the speaker(node) QIDs and the speaker names,
  as well as the dictionary that contains all quotes for each node.
  """

  # Predefining edges as tuples of node QIDs, resulting in 
  # the list of edges called edges_qids
  qids_list = list(nodes_qids.keys())
  edges_qids = []
  for pair in itertools.product(qids_list, qids_list):
    if pair[0] != pair[1]:
      edges_qids.append(pair)

  # For each edge (pair of nodes), we find all the quotes of interest
  edge_features = {}
  for (speaker_qid, mention_qid) in edges_qids:
    print(f'\nComputing features for edge {speaker_qid} - {mention_qid}')

    # Split the full name of the target person into its first and last name
    # (or however many parts there are) and look for each of those in the 
    # quotes of the current speaker
    mention_name = nodes_qids[mention_qid]
    look_for = mention_name.split()

    # Find the quotes mentioning the names
    curr_mentions = get_mentions_from_list(
        quotes_per_node[speaker_qid], look_for
        )

    if len(curr_mentions) == 0:
      print('No mentions for this edge. Skipping... \n')
      continue 

    print(f'Number of mentions {len(curr_mentions)}')

    # Iterate through current mentions of interest, compute the above-defined 
    # sentiment score (for the NLTK model) and assign it to the edge 
    all_scores = []
    for mention_iter, mention in enumerate(curr_mentions):
      if mention_iter % 20 == 0:
        print(f'Mention number {mention_iter}')

      curr_score = get_sentiment_score_nltk(mention[0], sia)
      all_scores.append(curr_score)
      
    # Once all mentions are scored, assign the mean sentiment score and 
    # the quote count to the edge features
    mean_score = np.mean(all_scores)
    num_mentions = len(curr_mentions)

    edge_features[(speaker_qid, mention_qid)] = {
        'mean_sentiment': mean_score,
        'num_mentions': num_mentions,
        }

  return edge_features
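
One caveat of the substring test in `get_mentions_from_list` is that it also fires inside longer words (for example, 'Donald' inside "McDonald's"). The sketch below shows a whole-word variant based on regular expressions. It is an alternative we did not use for the results in this notebook; the function name `find_mentions_whole_word` and the toy quotes are hypothetical.

```python
import re


def find_mentions_whole_word(quotes, look_for):
    """Return (quote, target_word) pairs, matching targets as whole words only."""
    patterns = [
        (word, re.compile(rf'\b{re.escape(word)}\b', re.IGNORECASE))
        for word in look_for
    ]
    mentions = []
    for quote in quotes:
        for word, pattern in patterns:
            if pattern.search(quote):
                mentions.append((quote, word))
                break  # keep only the first matching target word
    return mentions


# Made-up quotes used only to illustrate the difference
toy_quotes = [
    "I had lunch at McDonald's today.",
    'Donald said it would rain.',
    'Nothing to see here.',
]
print(find_mentions_whole_word(toy_quotes, ['Donald', 'Trump']))
```

Only the second toy quote is returned, whereas the substring test would also return the first one. Possessive forms such as "Trump's" still match, since the apostrophe counts as a word boundary.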
  

In [19]:
edge_features = compute_edge_features(nodes_qids, quotes_per_node)

print('\n\nEdge features: \n')
# Print the edge features
for key, val in edge_features.items():
  print(nodes_qids[key[0]], nodes_qids[key[1]])
  print(val, '\n')


Computing features for edge Q22686 - Q6294
Number of mentions 87
Mention number 0
Mention number 20
Mention number 40
Mention number 60
Mention number 80

Computing features for edge Q22686 - Q6279
Number of mentions 31
Mention number 0
Mention number 20

Computing features for edge Q6294 - Q22686
Number of mentions 43
Mention number 0
Mention number 20
Mention number 40

Computing features for edge Q6294 - Q6279
Number of mentions 14
Mention number 0

Computing features for edge Q6279 - Q22686
No mentions for this edge. Skipping... 


Computing features for edge Q6279 - Q6294
Number of mentions 2
Mention number 0


Edge features: 

Donald Trump Hillary Clinton
{'mean_sentiment': -0.07831034482758623, 'num_mentions': 87} 

Donald Trump Barack Obama
{'mean_sentiment': -0.013967741935483861, 'num_mentions': 31} 

Hillary Clinton Donald Trump
{'mean_sentiment': -0.13325581395348837, 'num_mentions': 43} 

Hillary Clinton Barack Obama
{'mean_sentiment': -0.13135714285714287, 'num_mentions'
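
As a sketch of how these features could be turned into the weighted, directed graph itself, the snippet below builds a plain adjacency dictionary from the three edges whose output is fully printed above. The helper name `to_adjacency` is hypothetical, and using the mean sentiment as the edge weight is only one possible choice.

```python
from collections import defaultdict

# Edge features copied from the fully printed outputs above
edge_features = {
    ('Donald Trump', 'Hillary Clinton'): {
        'mean_sentiment': -0.07831034482758623, 'num_mentions': 87},
    ('Donald Trump', 'Barack Obama'): {
        'mean_sentiment': -0.013967741935483861, 'num_mentions': 31},
    ('Hillary Clinton', 'Donald Trump'): {
        'mean_sentiment': -0.13325581395348837, 'num_mentions': 43},
}


def to_adjacency(features, weight_key='mean_sentiment'):
    """Build a {source: {target: weight}} mapping from the edge features."""
    adjacency = defaultdict(dict)
    for (source, target), feats in features.items():
        adjacency[source][target] = feats[weight_key]
    return dict(adjacency)


graph = to_adjacency(edge_features)
for source, targets in graph.items():
    for target, weight in targets.items():
        print(f'{source} -> {target}: {weight:+.3f}')
```

Since the graph is directed, the Trump -> Clinton and Clinton -> Trump edges are stored separately and carry different weights.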

## Conclusion 
After implementing our idea and examining it thoroughly, we conclude that (so far) it hasn't shown much promise, because the sentiment analysis models haven't worked as well as we hoped they would. Namely, the model from NLTK is not specific enough for our use-case (given that it analyzes the whole sentence and is not aspect-based). On the other hand, the ABSA model that seemed like a great fit in theory is extremely slow and thus impractical in this setting, with this many data points (quotes).

**ToDo** - find alternative models that also leverage the aspect-based sentiment analysis, but scale much better to the size of our dataset.

### Too good to go 🍞

We also tried to create a very naive and simple alternative to aspect-based sentiment analysis that would work at least partly as well as ABSA itself while using the NLTK model, thus fixing the execution time problem. What we did was the following:
* Given a quote and a target word, we found all the occurrences of the target word in the quote (let's say that it occurs $n$ times)
* We split the quote into $n$ subquotes, each being an *epsilon-neighborhood* of one occurrence of the target word (the $\epsilon$ words before and after it)
* Finally, we computed the sentiment score for each of those subquotes and averaged them, arriving at the final score for the given quote

Sadly, it didn't give us much improvement, but here's a part of that code anyways.


In [None]:
def split_quote_into_subquotes(quote, target, eps=5):
  """
  Function that splits the original quote into as many subquotes as 
  the number of occurrences of the target word in the original quote.
  """
  words = quote.split(' ')
  # Find the indexes of the target word in the given quote
  target_inds = [i for i, word in enumerate(words) if target in word]

  # Iterate through the occurrences of the target word and take
  # the epsilon neighborhoods
  subquotes = []
  for target_ind in target_inds:
    # Making sure we avoid index out of range
    start_ind = max(0, target_ind - eps)
    end_ind = min(len(words) - 1, target_ind + eps)

    subquotes.append(' '.join(words[start_ind : end_ind + 1]))

  return subquotes


def get_sentiment_score_for_quote(quote, target, eps=4):
  """
  Function that computes the final score for the given quote, first 
  splitting into subquotes and then averaging all the individual scores.
  """
  subquotes = split_quote_into_subquotes(quote, target, eps=eps)

  subquote_scores = []
  for subquote in subquotes:
    # The NLTK analyzer must be passed explicitly as the model argument
    curr_score = get_sentiment_score_nltk(subquote, sia)
    subquote_scores.append(curr_score)
      
  return np.mean(subquote_scores)
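
To see what the epsilon-neighborhood looks like in practice, here is a compact, self-contained sketch of the same windowing idea applied to the made-up quote from the beginning of the notebook. The helper name `window_around` is hypothetical, and `eps=2` is chosen only to keep the output short.

```python
def window_around(words, index, eps):
    """Return the words within eps positions of index, clipped to the list bounds."""
    start = max(0, index - eps)
    return words[start:index + eps + 1]


quote = ('I love myself because I am pretty but I really hate Hillary. '
         'My brother is a very smart man.')
words = quote.split()

# Positions of the words containing the target (case-sensitive, as above)
hits = [i for i, word in enumerate(words) if 'Hillary' in word]

subquotes = [' '.join(window_around(words, i, eps=2)) for i in hits]
print(subquotes)  # ['really hate Hillary. My brother']
```

Note that the window is counted in words and ignores punctuation, so it can cross a sentence boundary and pull in unrelated words such as "My brother".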