# Exploring the twitterverse for feelings on NPIs in response to COVID-19 in Canada

# Introduction

**Question:** How is Twitter responding to NPIs?

**Motivation:** To understand the public’s response to non-pharmaceutical interventions (NPIs) across Canada, we may be able to leverage sentiment analysis of social media platforms. Which interventions were positively or negatively received?

**Solution:** A very casual, exploratory sentiment analysis of tweets by intervention categories.

**Take-aways:**
1. A dataset of tweets related to the interventions in the CAN-NPI dataset
2. An investigation of how sentiment analyses can go wrong
3. A first go at a workflow to roughly understand how people feel about interventions.

I am no expert, so I would love to get some feedback and any expertise, especially regarding improvement of the sentiment analyses, pulling tweets, and data visualization :)


# Method

## Data

I use the CAN-NPI dataset (`covid19-challenges/npi_canada.csv`) together with tweets that contain any of the `source_urls` listed in that dataset. The tweets were pulled using [twint](https://github.com/twintproject/twint).


## Overview
0. **Set up:** Load packages, import modules, download data.
1. **Data Preprocessing:** Clean the tweets
2. **Data Analysis:** Comparison of sentiment analysis on text-only and a custom sentiment analysis that incorporates the sentiment of emojis.
3. **Visualization:** Plot the proportion of positive, negative, and neutral tweets of intervention categories with "sufficient" tweet coverage.

## Set Up

In [None]:
# download necessary packages
!pip install langdetect
!pip install emoji

In [None]:
# load modules (only the ones this notebook actually uses)
import os
import re

import pandas as pd

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from langdetect import detect
from textblob import TextBlob

# work around duplicate OpenMP runtime errors on some setups
os.environ['KMP_DUPLICATE_LIB_OK']='True'

In [None]:
# load CAN-NPI dataset
npis_csv = "/kaggle/input/covid19-challenges/npi_canada.csv"
raw_data = pd.read_csv(npis_csv,encoding = "ISO-8859-1")
# remove any rows that don't have a start_date, region, or intervention_category
# .copy() so the column assignments below modify df itself, not a view of raw_data
df = raw_data.dropna(how='any', subset=['start_date', 'region', 'intervention_category']).copy()
df['region'] = df['region'].replace('Newfoundland', 'Newfoundland and Labrador')
num_rows_removed = len(raw_data)-len(df)
print("Number of rows removed: {}".format(num_rows_removed))

# get all regions
regions = list(set(df.region.values))
print("Number of unique regions: {}".format(len(regions)))

# get all intervention categories
num_cats = list(set(df.intervention_category.values))
num_interventions = len(num_cats)
print("Number of unique intervention categories: {}".format(len(num_cats)))

# get earliest start date and latest start date
df['start_date'] = pd.to_datetime(df['start_date'], format='%Y-%m-%d')
earliest_start_date = df['start_date'].min()
latest_start_date = df['start_date'].max()
num_days = (latest_start_date - earliest_start_date).days  # integer number of days, not a Timedelta
print("Analyzing from {} to {} ({} days)".format(earliest_start_date.date(), latest_start_date.date(), num_days))
print("DONE READING DATA")

In [None]:
# load tweets
merged_tweets_csv = '/kaggle/input/npi-twitterverse-april-30/tweets_to_intervention_category.source_urls.tsv'
colnames = ["npi_record_id", "intervention_category", "oxford_government_response_category", "source_url", "id", "conversation_id", "created_at", "date", "time", "timezone", "user_id", "username", "name", "place", "tweet", "mentions", "urls", "photos", "replies_count", "retweets_count", "likes_count", "hashtags", "cashtags", "link", "retweet", "quote_url", "video", "near", "geo", "source", "user_rt_id", "user_rt", "retweet_id", "reply_to", "retweet_date", "translate", "trans_src", "trans_dest"]
# skip malformed rows; on_bad_lines needs pandas >= 1.3 (older versions use error_bad_lines=False)
tweets_df = pd.read_csv(merged_tweets_csv, encoding = "utf-8", on_bad_lines='skip', engine='python', names=colnames)
# drop any rows without tweets - aka any interventions supported by non-tweeted media urls
tweets_df = tweets_df.dropna(how='any', subset=['npi_record_id', 'intervention_category', 'tweet'])

# only get english tweets
data = []
for index, row in tweets_df.iterrows():
    # detect only english tweets
    tweet = row['tweet'].strip()
    if tweet != "":
        try:
            language = detect(tweet)
        except Exception:
            # langdetect raises when a tweet has no usable features (e.g. only a URL)
            language = "error"
        if language == "en":
            data.append([row['intervention_category'], tweet])
tweets_df_en = pd.DataFrame(data, columns=["intervention_category", "tweet"])
print("Number of non-english tweets = {}".format(len(tweets_df) - len(tweets_df_en)))
print("Number of tweets collected = {}".format(len(tweets_df_en)))

## Data Preprocessing

I performed some standard text preprocessing on the tweets: I masked URLs and usernames, stripped the `#` from hashtags, and removed non-alphabetical characters. Words with repeated characters were shortened (e.g. "hellooooooo" --> "hello").

I also took one non-standard step: removing words that strongly influenced the sentiment scores in ways that did not make sense in this context. For example, words such as "first", "positive", or "confirmed" pushed the polarity scores upward. That is intuitive in other contexts, but in the CAN-NPI dataset it unfairly skewed the sentiment of tweets related to "First death announcements" or "General case announcements".
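
As a rough, self-contained illustration of the steps described above, here is a minimal sketch that uses only the standard library. The function name `mask_and_normalize` and the short `DOMAIN_WORDS` set are hypothetical and exist only for this example; it skips stop-word removal and NLTK tokenization, so it is not the notebook's actual preprocessing function.

```python
import re

# Hypothetical, shortened list for illustration only
DOMAIN_WORDS = {"first", "positive", "confirmed"}

def mask_and_normalize(text):
    """Mask URLs/usernames, unwrap hashtags, squeeze repeated letters, drop domain words."""
    text = text.lower()
    text = re.sub(r"(www\.\S+)|(https?://\S+)", "URL", text)  # mask URLs
    text = re.sub(r"@\w+", "USER", text)                      # mask usernames
    text = re.sub(r"#(\S+)", r"\1", text)                     # drop '#', keep the hashtag text
    text = re.sub(r"([a-z])\1{2,}", r"\1", text)              # 3+ repeats of a letter -> 1
    text = re.sub(r"[^a-zA-Z ]", " ", text)                   # drop non-alphabetical characters
    return [w for w in text.split() if w not in DOMAIN_WORDS]

print(mask_and_normalize("Hellooooooo @user1 first #COVID19 death https://example.com/a"))
```

Masking happens before the character clean-up so that the punctuation inside URLs and handles does not leave stray fragments behind.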

### Examples of the pitfalls when using off-the-shelf sentiment analyses in NPI sentiment analyses

In [None]:
# Here's a few examples of First death announcements
ex1 = "Here's a wrap of the latest coronavirus news in Canada: 77 cases, one death, an outbreak in a B.C. nursing home and Ottawa asks provinces about their critical supply gaps.  https://www.theglobeandmail.com/canada/article-bc-records-canadas-first-coronavirus-death/"
ex2 = "B.C. records Canada’s first coronavirus death  http://dlvr.it/RRZPGL  pic.twitter.com/pn8T4yumQJ"
print("Example 1 = {}".format(ex1))
print("Example 2 = {}".format(ex2))

These are just announcements and read as fairly neutral, but their scores are the following:

In [None]:
ex1_tb = TextBlob(ex1)
ex1_ss = ex1_tb.sentiment[0]
print("Example 1 has score={}".format(ex1_ss))
ex2_tb = TextBlob(ex2)
ex2_ss = ex2_tb.sentiment[0]
print("Example 2 has score={}".format(ex2_ss))

Words like "first" are pretty positive in other contexts, but not in this case. What is the effect of removing this word on the sentiment score?

In [None]:
ex = "first coronavirus death"
ex_tb = TextBlob(ex)
ex_ss = ex_tb.sentiment[0]
print("{} with score={}".format(ex, ex_ss))

ex = "coronavirus death"
ex_tb = TextBlob(ex)
ex_ss = ex_tb.sentiment[0]
print("{} with score={}".format(ex, ex_ss))

The score is more positive with the word "first". Moving forward, I remove other words that show the same pattern, such as "confirm" and "positive".
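
One way to look for such words more systematically is a leave-one-word-out check: score the text, then score it again with each word removed, and compare. The sketch below is only an illustration of that idea. The helper `polarity_shift_by_word`, the `toy_score` scorer, and its two-word lexicon are all hypothetical stand-ins so that the example runs without extra packages; the lexicon values are made up and are not TextBlob's.

```python
def polarity_shift_by_word(text, score_fn):
    """Return {word: score(text) - score(text without that word)} for each distinct word."""
    words = text.split()
    base = score_fn(" ".join(words))
    shifts = {}
    for target in set(words):
        reduced = " ".join(w for w in words if w != target)
        shifts[target] = base - score_fn(reduced)
    return shifts

# Made-up lexicon for demonstration purposes only
TOY_LEXICON = {"first": 0.25, "death": -0.5}

def toy_score(text):
    """Average the lexicon values of the known words (0.0 if none are known)."""
    hits = [TOY_LEXICON[w] for w in text.split() if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

shifts = polarity_shift_by_word("first coronavirus death", toy_score)
# A positive shift means the word pulls the overall score upward
for word, shift in sorted(shifts.items(), key=lambda kv: -kv[1]):
    print(word, shift)
```

To run the same check against TextBlob, `score_fn` could be `lambda t: TextBlob(t).sentiment.polarity`.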

In [None]:
import re
import nltk
nltk.download('punkt')  # tokenizer data; newer NLTK releases may need 'punkt_tab' instead

def tweet_preprocess(text):
  '''Return a list of cleaned tokens: URLs and usernames are masked, and
  hashtag symbols, unusual characters, repeated characters, stop words,
  and numbers are removed.
  '''
  text = text.lower()
  text = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', text) # mask URLs
  text = re.sub(r'@[A-Za-z0-9_]+','USER',text) # mask usernames (handles may contain underscores)
  text = re.sub(r'#([^\s]+)', r'\1', text) # remove the # in #hashtag
  text = re.sub(r'([a-z])\1{2,}', r'\1', text) # shorten repeated characters (helloooooooo into hello)
  text = re.sub(r'[^a-zA-Z0-9\-*. ]', ' ', text) # replace any remaining unusual characters with spaces
  words = word_tokenize(text)
  ignore = set(stopwords.words('english'))
  # web boilerplate plus words that skew polarity in this context
  more_ignore = {'at', 'and', 'also', 'or', "http", "ca", "www", "https", "com", "twitter", "html", "news", "link", \
                 "positive", "first", "confirmed", "confirm", "confirms"}
  ignore.update(more_ignore)
  #porter = PorterStemmer()
  #cleaned_words_tokens = [porter.stem(w) for w in words if w not in ignore]
  # keep alphabetic tokens that are not in the ignore list
  cleaned_words_tokens = [w for w in words if w not in ignore and w.isalpha()]

  return cleaned_words_tokens

## Data analysis

### Sentiment analysis (text-based only)

In [None]:
def run_sentiment_analysis(tweets_df):
  '''Print TextBlob polarity before and after cleaning for each tweet.'''
  for index, row in tweets_df.iterrows():
    tokens = tweet_preprocess(row['tweet'])
    clean_text = ' '.join(tokens)
    raw_score = TextBlob(row['tweet']).sentiment[0]
    clean_score = TextBlob(clean_text).sentiment[0]

    print("{}: {} \n before cleaning score={}, after cleaning score={}".format(row['intervention_category'], row['tweet'], raw_score, clean_score))

    # the printed label is based on the score before cleaning
    if raw_score>0:
      print('Positive')
    elif raw_score<0:
      print('Negative')
    else:
      print('Neutral')
    print("======================================")

In [None]:
run_sentiment_analysis(tweets_df_en[:5])

### Sentiment analysis (text+emoji)

I found that some tweets that were clearly positive were not being scored as such.

For example, this Public Announcement: 
> "THANK YOU Government of #Canada ! ❤❤❤❤❤❤ Government of #Canada evacuating Canadians on board #DiamondPrincess cruise ship   https://bit.ly/2UVjHgx  #outbreak #COVID19 #SARSCoV2 #Coronavirus #nCoV2019 #COVIDー1"

This tweet had a polarity score of 0.0, yet it is hard to argue that it is neutral.

I wondered if there was a way to better account for the use of emojis. Previous work seemed to show that incorporating emojis significantly improves polarity scores and often "dominate[s] the sentiment conveyed by textual cues and forms a good proxy for the polarity of text" [(Hogenboom et al., 2015)](https://personal.eur.nl/frasincar/papers/JWE2015/jwe2015.pdf). I introduce a *very rough* modified sentiment score using the emoji sentiment mapping and scoring scheme from [Novak et al., 2015](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0144296#pone.0144296.ref006), which gives a sentiment score for each emoji.

Because labeled tweets are scarce, emojis are sometimes used to "distantly" label the sentiment of tweets [(Felbo et al., 2017)](https://arxiv.org/pdf/1708.00524.pdf). Given that previous work uses emojis as labels and that emojis often dominate the sentiment, I use a rule-based method in which emojis with a "high" score, either negative or positive, determine the overall sentiment of the tweet. When a tweet contains multiple emojis, I average their sentiment scores. The emoji sentiment mapping is fairly comprehensive, but some emojis are missing from it; I leave those emojis out of the average.

If a tweet has no emojis, or their average score is not strong enough, I fall back on the sentiment analysis of the preprocessed tweet. After manual inspection, I found that these text-based scores still did not always make sense, so I chose a very strict threshold based on a subset of tweets: a tweet is classified as positive only if its sentiment score is greater than 0.25, and as negative only if it is less than -0.25.
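
The decision rule described above can be condensed into a small pure function. This is a minimal sketch, not the notebook's implementation: the names `emoji_score` and `classify_tweet` are hypothetical, the emoji cut-off of 0.10 is the value used in the notebook code further down, and emojis missing from the mapping are represented here as `None`.

```python
EMOJI_THRESHOLD = 0.10  # emoji average beyond +/- this value decides the label
TEXT_THRESHOLD = 0.25   # strict cut-off for the text-only fallback

def emoji_score(positive, negative, occurrences):
    """Per-emoji score in [-1, 1]: +1 per positive use, -1 per negative use, 0 per neutral use."""
    return (positive - negative) / occurrences

def classify_tweet(emoji_scores, text_score):
    """Return (score, label) from per-emoji scores (None = unmapped emoji) and the text polarity."""
    known = [s for s in emoji_scores if s is not None]
    avg = sum(known) / len(known) if known else 0.0
    if avg > EMOJI_THRESHOLD:
        return avg, "POSITIVE"
    if avg < -EMOJI_THRESHOLD:
        return avg, "NEGATIVE"
    # no decisive emoji signal: fall back on the text polarity
    if text_score > TEXT_THRESHOLD:
        return text_score, "POSITIVE"
    if text_score < -TEXT_THRESHOLD:
        return text_score, "NEGATIVE"
    return text_score, "NEUTRAL"

print(classify_tweet([0.7, None], 0.0))  # emoji signal decides
print(classify_tweet([], 0.1))           # weak text score stays neutral
```

Keeping the rule separate from the data loading makes it easy to try other thresholds on a hand-labeled subset of tweets.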

In [None]:
# download sentiment map
!wget https://www.clarin.si/repository/xmlui/bitstream/handle/11356/1048/Emoji_Sentiment_Data_v1.0.csv

In [None]:
import emoji

# get emoji sentiment map
emoji_sent_csv = "Emoji_Sentiment_Data_v1.0.csv"
emoji_data = pd.read_csv(emoji_sent_csv,encoding = "ISO-8859-1")

def extract_emojis(text):
  # emoji.is_emoji is available in recent releases of the emoji package;
  # older releases exposed emoji.UNICODE_EMOJI instead
  return ''.join(c for c in text if emoji.is_emoji(c))

def calc_emoji_sent(e):
    '''Return the sentiment score of emoji e, or -100 if it is not in the sentiment map.'''
    e_uc = hex(ord(e))  # e.g. '0x2764', the format of the "Unicode codepoint" column
    sr = emoji_data.loc[emoji_data["Unicode codepoint"] == e_uc]
    score = -100  # sentinel: emoji not found in the sentiment map
    if not sr.empty:
        row = sr.iloc[0]
        oc = int(row["Occurrences"])
        num_pos = int(row["Positive"])
        num_neg = int(row["Negative"])
        # positive uses count +1, negative uses -1, neutral uses 0
        score = (num_pos - num_neg) / oc
    return score

def run_sentiment_analysis_mod(tweets_df):
  tweets_df["sentiment_score"] = 0.0
  tweets_df["sentiment_class"] = ""

  for index, row in tweets_df.iterrows():
    tokens = tweet_preprocess(row['tweet'])
    clean_text = ' '.join(tokens)
    analysis = TextBlob(row['tweet'])
    analysis_after_clean = TextBlob(clean_text)
    c_score = analysis_after_clean.sentiment[0]
    
    # add emojis in sentiment analysis
    emojis_detected = extract_emojis(row['tweet'])
    avg_emoji_sent_score = 0
    emoji_counts = 0
    if emojis_detected:
        for e in emojis_detected:
            em_sent_score = calc_emoji_sent(e)
            if em_sent_score == -100:
              continue
            avg_emoji_sent_score += em_sent_score
            emoji_counts += 1
        if emoji_counts > 0:
            avg_emoji_sent_score = avg_emoji_sent_score/emoji_counts
        #print(avg_emoji_sent_score)


    # final score calculations
    score = 0.0
    label = "NEUTRAL"
    if avg_emoji_sent_score > 0.10:
        score = avg_emoji_sent_score
        label = "POSITIVE"
    elif avg_emoji_sent_score < -0.10:
        score = avg_emoji_sent_score
        label = "NEGATIVE"
    else:
        score = analysis_after_clean.sentiment[0]
        if score > 0.25:
          label = "POSITIVE"
        elif score < -0.25:
          label = "NEGATIVE"
    tweets_df.at[index, "sentiment_score"] = score
    tweets_df.at[index, "sentiment_class"] = label 
    '''print("=============================")
    print(row["intervention_category"] + "\n")
    print(row['tweet'])
    print(clean_text)
    print("Score (no clean) = {}".format(analysis.sentiment[0]))
    print("Score (clean) = {}".format(c_score))
    print("Final Score = {}".format(score))
    print(label)'''
  return tweets_df

mod_tweets_df = run_sentiment_analysis_mod(tweets_df)

## Results (Preliminary)

Let's see the proportion of sentiment classes by intervention category, for intervention categories with more than 50 tweets.

In [None]:
import plotly.graph_objects as go
import plotly

def split_data_by_class(tweets_df):
    total_tweets_by_cat = tweets_df.groupby('intervention_category')["id"].count().reset_index(name="count").sort_values("intervention_category", ascending=False)
    counts = tweets_df.groupby(['intervention_category',"sentiment_class"])["id"].count().reset_index(name="count").sort_values("intervention_category", ascending=False)
    # proportion = class count / total tweets in the same intervention category
    totals = total_tweets_by_cat.set_index("intervention_category")["count"]
    counts["proportion"] = counts["count"] / counts["intervention_category"].map(totals)

    y = counts["intervention_category"].unique().tolist()

    # fill gaps - some sentiment_class + intervention_category combinations are empty
    # and it messes up my graphs :(
    fill_data = []
    for ic in y:
      for sc in ["POSITIVE", "NEUTRAL", "NEGATIVE"]:
        subset = counts[(counts.sentiment_class == sc) & (counts.intervention_category == ic)]
        if subset.empty:
          fill_data.append([ic, sc, 0, 0.0])
    fill_data_df = pd.DataFrame(fill_data, columns=["intervention_category", "sentiment_class", "count", "proportion"])
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    full_counts = pd.concat([counts, fill_data_df]).sort_values("intervention_category", ascending=False)

    return full_counts, y

def plot(full_counts, y, measure):
    # only plot intervention_category if it had "sufficient" number of tweets
    THRESH = 50
    total_tweets_by_cat = tweets_df.groupby('intervention_category')["id"].count().reset_index(name="count").sort_values("intervention_category", ascending=False)
    if measure == "proportion":
      # find all intervention_category with enough tweets
      y = total_tweets_by_cat[total_tweets_by_cat["count"] > THRESH]["intervention_category"].unique().tolist()
      full_counts = full_counts[full_counts.intervention_category.isin(y)]

    # split up by sentiment_class
    pos_counts = full_counts.loc[full_counts["sentiment_class"] == "POSITIVE"]
    neg_counts = full_counts.loc[full_counts["sentiment_class"] == "NEGATIVE"]
    neut_counts = full_counts.loc[full_counts["sentiment_class"] == "NEUTRAL"]
    print("Mean {} for positive class: {}".format(measure, round(pos_counts[measure].mean(),2)))
    print("Mean {} for negative class: {}".format(measure, round(neg_counts[measure].mean(),2)))
    print("Range {} for positive class: {}-{}".format(measure, round(pos_counts[measure].min(),2), round(pos_counts[measure].max(),2)))
    print("Range {}  for negative class: {}-{}".format(measure, round(neg_counts[measure].min(),2), round(neg_counts[measure].max(),2)))
    
    fig = go.Figure()
    # one horizontal bar trace per sentiment class; each trace takes y from its
    # own rows so that categories and values stay aligned
    traces = [
        ('Positive', pos_counts, 'rgba(90, 191,165, 1.0)'),
        ('Negative', neg_counts, 'rgba(230, 130, 130, 1.0)'),
        ('Neutral', neut_counts, 'rgba(190, 203, 200, 1.0)'),
    ]
    for name, subset, color in traces:
        fig.add_trace(go.Bar(
            y=subset["intervention_category"],
            x=subset[measure],
            name=name,
            orientation='h',
            marker=dict(
                color=color,
                line=dict(color='rgba(255, 255, 255, 1.0)', width=1)
            )
        ))


    fig.update_layout(width=800, height=1200,barmode='stack', 
                      template='plotly_white',
                      bargap=0.5, # gap between bars of adjacent location coordinates.
                      #bargroupgap=0.5 # gap between bars of the same location coordinate.
                     )
    fig.show()
    #plotly.offline.iplot(fig, filename='fig.png')

full_counts, y = split_data_by_class(mod_tweets_df)
plot(full_counts,y, "proportion")
plot(full_counts,y, "count")
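
As a quick sanity check on the proportions, the same per-category calculation can be sketched with the standard library alone. The function `sentiment_proportions` and the toy records below are hypothetical and only illustrate the logic (count per class, divide by the category total, keep categories above a minimum tweet count); they are not taken from the dataset.

```python
from collections import Counter, defaultdict

LABELS = ("POSITIVE", "NEUTRAL", "NEGATIVE")

def sentiment_proportions(records, min_tweets=50):
    """records: iterable of (category, label) pairs.

    Returns {category: {label: proportion}} for categories with more than
    min_tweets tweets; classes with no tweets get a proportion of 0.0.
    """
    by_category = defaultdict(Counter)
    for category, label in records:
        by_category[category][label] += 1
    result = {}
    for category, counter in by_category.items():
        total = sum(counter.values())
        if total > min_tweets:
            result[category] = {label: counter[label] / total for label in LABELS}
    return result

# Toy records, made up for illustration
toy_records = [("A", "POSITIVE")] * 3 + [("A", "NEUTRAL")] + [("B", "NEGATIVE")]
print(sentiment_proportions(toy_records, min_tweets=2))
```

Because a `Counter` returns 0 for missing keys, empty class and category combinations are filled automatically.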

### Main points

* The majority of tweets are neutral.
* When tweets are not neutral, Twitter is for the most part responding pretty positively to the NPIs, with a mean positive proportion of 0.17.
* There was an order-of-magnitude difference in tweet volume: school closures had >20K tweets, while the next closest intervention category, Emergency economic funding, had 2650 tweets.

# Discussion

This work is still in development, but it's interesting to see the pitfalls of sentiment analysis, especially in the context of NPIs in response to COVID-19.

## Future work
* Pull tweets from the Oxford Government Response Tracker, and see how the feelings differ in different regions.
* Sentiment analysis over time
* Sentiment analysis by region
* Use Twitter API to pull more tweets and replies. Tweepy is limited in the replies it pulls. I expect that most of the tweets here will be neutral as they are often coming from government officials or websites distributing news. I could possibly gain more of the public's perspective if I catch more replies.
* Evaluating how the different preprocessing steps influence sentiment
* Possibly develop a sentiment classifier. Off-the-shelf classifiers such as TextBlob do not seem to perform well in cases like those described above, such as general case announcements or first death announcements, where words like "positive", "confirmed", and "first" are likely positive in other contexts but not in the context of NPIs and COVID.