# Lexicon-based sentiment analysis of tweets: using VADER and TextBlob

Calculate the sentiment score of the tweets using lexicon-based models VADER and TextBlob. Both are constructed from a generalizable, valence-based, human-curated gold standard sentiment lexicon.

## Set up

In [1]:
import os
import re
import string
import pandas as pd
import numpy as np

from emot.emo_unicode import UNICODE_EMO, EMOTICONS

In [2]:
import sys
print(sys.executable)
print(sys.version)
print(sys.version_info)

/Users/alessiatosi/DS_projects/behavioural-sci-perception/venv/bin/python
3.8.1 (default, Apr  8 2020, 10:42:19) 
[Clang 11.0.0 (clang-1100.0.33.17)]
sys.version_info(major=3, minor=8, micro=1, releaselevel='final', serial=0)


In [3]:
%load_ext autoreload
from src.preproc_text import *
from src.utils import chain_functions
from src.analyse_text import get_sentiment_score_VDR, get_sentiment_score_TB

In [4]:
%reload_ext autoreload

In [5]:
os.getcwd()

'/Users/alessiatosi/DS_projects/behavioural-sci-perception/notebooks'

In [6]:
pd.options.display.max_seq_items = 10000
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)

Environment variables and constants

In [7]:
DATA_DIR = os.environ.get("DIR_DATA_INTERIM")

In [8]:
FILENAME = "tweets_original_en"

### Define domain-specific stopwords

Sentiment requires context. 

A simple lexicon-based approach to sentiment analysis ignores context, in the hope that the individual scores will average out to the right overall trend.

However, we can take some context into account by excluding terms that are sentiment-loaded but so common in the COVID-19 context as to be effectively "neutral" (e.g., crisis, virus, pandemic). 
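
To illustrate the idea, the sketch below filters a tokenised tweet against a small custom stopword set. `drop_stopwords`, `DOMAIN_STOPWORDS` and the sample tokens are hypothetical; the project's own `remove_stopwords` (in `src.preproc_text`) is not shown in this notebook and may behave differently.

```python
# Minimal sketch: drop domain-specific stopwords from a list of tokens.
# `drop_stopwords` is a hypothetical helper, not the project's `remove_stopwords`.
from typing import Iterable, List

# A few sentiment-loaded terms that are near-neutral in a COVID-19 corpus
DOMAIN_STOPWORDS = {"coronavirus", "covid", "crisis", "pandemic", "virus"}


def drop_stopwords(tokens: Iterable[str], stopwords=DOMAIN_STOPWORDS) -> List[str]:
    """Return the tokens whose lower-cased form is not in `stopwords`."""
    return [tok for tok in tokens if tok.lower() not in stopwords]


tokens = ["The", "pandemic", "response", "was", "a", "terrible", "crisis"]
print(drop_stopwords(tokens))
# ['The', 'response', 'was', 'a', 'terrible']
```

Without the filter, "crisis" would pull the score down even though it only names the topic; "terrible" is kept because it still carries the author's opinion.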


In [9]:
EXTRA_STOPWORDS = ["coronavirus", "covid", "covid19", "covid-", "covid-19", "’s", "link", "dominic", "cummings", "boris", "johnson", 
                   "dr", "david", "halpern", "susan", "michie", "richard", "amlot", "thaler", "cass", "sunstein", 
                   "daniel", "kahneman", 
                   "d-", "th", "january", "february", "march", "april", "may", "june", "july", "august", "september", 
                   "october", "november", "december", "corona", "virus", "wd", "&amp;", "article", "here", "%", "'s",
                  "'ve'", '&', 'amp', "'re", "via", "hoe", "'ve",
                  "crisis", "pandemic", "epidemic"]

## Get Data

In [10]:
tweets_df = pd.read_csv(os.path.join(DATA_DIR, FILENAME + '.csv'))

In [11]:
tweets_df.shape

(3107, 18)

# Text preprocessing

In [12]:
tweets_df.text.head(10)

0                   "The introduction of the rules of behavior taken from the corporate sector into politics means that politicians no longer see people whom they rule as co-citizens but as employees." – @BrankoMilan …sounds familiar UK? #NudgeUnit #coronavirus #HerdImmunity https://t.co/vABII5Y2jB
1    @chprotary 🖐️🚰 #rotary #handwashing #behaviourchange project #handhygieneforhealth WATCH VIDEO https://t.co/uKEEDPh8e0 https://t.co/HtUOzsVYKT with #SPATAP Portable Tap making communities hygienic instantly https://t.co/7X3H0SlUB7  info@handhygieneforhealth.org #COVID19 https://t.co/0PkpOJsC5Q
2                                     "Of course, should behaviour change as governments would wish, actual outcomes will be less terrifying than the models had originally forecast. This is not evidence that the policy was unnecessary: Rather, it is evidence that it worked." https://t.co/b5jKA0D6r7
3                                                                                                   

## Are there still duplicates?

Looks like there are still duplicates in the dataset that we need to get rid of. Consider retweet counts when doing so.

In [13]:
# Find duplicate texts
duplicate_tweets = tweets_df[tweets_df.duplicated(['text'])]
print(duplicate_tweets[['favorite_count', 'retweet_count', 'text']])

      favorite_count  retweet_count  \
1212               0              1   
1520               0              2   
1851               1              0   
2057               0              0   
2122               0              2   
2323               2              1   
2709               0              0   
2810               1              1   
2812               1              0   
2997               0              0   

                                                                                                                                                                                                                                                                               text  
1212                                                     The deployment of behavioural science is a feature of the public health response to the Covid-19 pandemic both internationally and in Ireland, https://t.co/o16Wjg5BRd #COVID19ireland #COVID19 #nudge #behaviouralscience  
1520  "It is c

Apparently they are all duplicates of one single tweet.

We will keep the one with the largest count of "favourites". 

In [14]:
duplicate_tweets[duplicate_tweets.favorite_count == max(duplicate_tweets.favorite_count)].index

Int64Index([2323], dtype='int64')

In [15]:
# get index
duplicate_tweets_index = duplicate_tweets[duplicate_tweets.favorite_count != 
                                          max(duplicate_tweets.favorite_count)].index

In [16]:
duplicate_tweets_index

Int64Index([1212, 1520, 1851, 2057, 2122, 2709, 2810, 2812, 2997], dtype='int64')

In [17]:
tweets_df = tweets_df.drop(duplicate_tweets_index, axis=0).copy()

In [18]:
tweets_df.shape

(3098, 18)
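
Note that `duplicated(['text'])` uses `keep='first'` by default, so the first occurrence of each repeated text is never among the candidates above. A possible alternative, sketched here on made-up data, considers every copy of a text and keeps the most-favourited one. `toy_df` and its values are assumptions for illustration only.

```python
import pandas as pd

# Toy data standing in for `tweets_df`; the values are made up.
toy_df = pd.DataFrame({
    "text": ["same tweet", "same tweet", "other tweet", "same tweet"],
    "favorite_count": [0, 2, 5, 1],
    "retweet_count": [1, 1, 0, 0],
})

# Sort so the most-favourited copy of each text comes first,
# keep that first copy, then restore the original row order.
deduped = (
    toy_df.sort_values("favorite_count", ascending=False)
          .drop_duplicates(subset="text", keep="first")
          .sort_index()
)
print(deduped)
```

Here rows 1 and 2 survive: row 1 is the copy of "same tweet" with the most favourites, and row 2 has no duplicate. Sorting on `["favorite_count", "retweet_count"]` instead would use retweets as a tie-breaker.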

### Quick and dirty sentiment analysis without preprocessing the text

In [19]:
tweets_df['quick_VDR_sentiment'] = [get_sentiment_score_VDR(tweet) for tweet in tweets_df.text]

In [20]:
tweets_df['quick_TB_sentiment'] = [round(get_sentiment_score_TB(tweet),3) for tweet in tweets_df.text]

In [21]:
tweets_df[['text', 'quick_VDR_sentiment', 'quick_TB_sentiment']][:10]

Unnamed: 0,text,quick_VDR_sentiment,quick_TB_sentiment
0,"""The introduction of the rules of behavior taken from the corporate sector into politics means that politicians no longer see people whom they rule as co-citizens but as employees."" – @BrankoMilan …sounds familiar UK? #NudgeUnit #coronavirus #HerdImmunity https://t.co/vABII5Y2jB",{'compound': -0.1531},0.188
1,@chprotary 🖐️🚰 #rotary #handwashing #behaviourchange project #handhygieneforhealth WATCH VIDEO https://t.co/uKEEDPh8e0 https://t.co/HtUOzsVYKT with #SPATAP Portable Tap making communities hygienic instantly https://t.co/7X3H0SlUB7 info@handhygieneforhealth.org #COVID19 https://t.co/0PkpOJsC5Q,{'compound': 0.0},0.0
2,"""Of course, should behaviour change as governments would wish, actual outcomes will be less terrifying than the models had originally forecast. This is not evidence that the policy was unnecessary: Rather, it is evidence that it worked."" https://t.co/b5jKA0D6r7",{'compound': -0.1796},-0.238
3,"My cartoon - he doesn’t trust doctors, he wants to see a behavioural scientist \n#coronavirus https://t.co/BLCsrb88cb",{'compound': 0.5106},0.2
4,So it seems clear that UK strategy is to let virus spread to achieve herd immunity and to try and protect vulnerable by ... cocooning elderly people? This doctor is a psychologist https://t.co/wbBIjLeOt2,{'compound': 0.552},-0.2
5,Love it.👏👏👏👏 https://t.co/RjOEc3DYPw,{'compound': 0.6369},0.5
6,RT @shayonislynn: Using #behaviouralscience to improve behaviours #nudgesinthewild #COVID19 https://t.co/OyIDElZwR7,{'compound': 0.4404},0.0
7,"What role will behavioural science play in lifting the lockdown in the UK? Oxera Senior Consultants Leon Fields and Tim Hogg and Senior Adviser Peter Andrews explore underlying questions of nudging, empiricism and compliance here: https://t.co/2ZMVZTSCiV #covid19 #lockdown",{'compound': 0.34},0.0
8,"Behaviour Change in the context of #COVID19 , will the tried and tested still hold?. Join @AfricaSBC on 13/05/2020 for an indepth discussion on behaviour change in the context of COVID-19. Click the link to join- https://t.co/dqLFxF6ZHS #KomeshaCorona #coronaviruskenya https://t.co/dv6BcOXL99",{'compound': 0.5267},0.0
9,"Hey #Winnipeg!! Looking forward to joining @CBCIsmaila on today's #UpToSpeed to talk isolation, stress, and how to safeguard your mental health in these unprecedented times. @CBCManitoba #Manitoba #coronavirus #COVID19 #psychology #mentalhealth #behaviouralscience @senecacollege",{'compound': -0.5399},0.25


Looks like many negative tweets have not been captured as such.

## Text-preprocessing steps

#### First part

1. Replace emojis and emoticons with corresponding text description

#### Second part 

2. Replace URLs with the word "url" or remove them
3. Remove the names of mentioned users or replace them with "user_mentioned"
4. Replace all hashtags with the same words without the hash symbol (e.g., "#hello" -> "hello")

URLs and user mentions (steps 2 and 3) are not part of the lexicons, so they do not contribute to the sentiment score.

#### Third part

5. Split compounded-by-upper-case strings
6. Split compounded-by-underscore "_" strings (this is to get the words that make up the emoji descriptions)
7. Remove digits
8. Remove single-character words
9. Split domain-specific compounded all-lower-case strings (e.g., behaviouraleconomics)

#### Fourth part

10. Remove stop words
11. Remove punctuation (but keep !?...)


First part

In [21]:
# as a check: a sample of tweets that contain emojis
idx_sample_tweets_emojs = [37, 57, 135, 136]

In [22]:
tweets = [convert_emojis(t) for t in tweets_df.text]

In [23]:
tweets = [convert_emoticons(t) for t in tweets]
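
As a rough idea of what this first step does, here is a minimal sketch with two tiny hand-written mappings. `describe_emojis` and both dictionaries are hypothetical (the `emot` package ships much larger tables), and this is not the implementation of `convert_emojis` or `convert_emoticons`. The sketch skips URL tokens, because an emoticon such as `:/` also occurs inside `https://`, and it pads each description with spaces so that it does not fuse with the neighbouring word.

```python
import re

# Tiny illustrative mappings (assumptions); real tables are far larger.
EMOJI_TO_TEXT = {"\U0001F389": "party_popper", "\U0001F44F": "clapping_hands"}
EMOTICON_TO_TEXT = {":/": "skeptical", ":)": "happy_face"}
SYMBOLS = {**EMOJI_TO_TEXT, **EMOTICON_TO_TEXT}

URL_PATTERN = re.compile(r"https?://\S+")


def describe_emojis(text: str) -> str:
    """Replace emojis and emoticons with words, leaving URLs untouched."""
    out = []
    for token in text.split():
        if URL_PATTERN.fullmatch(token):
            out.append(token)  # do not rewrite ':/' inside 'https://'
            continue
        for symbol, name in SYMBOLS.items():
            token = token.replace(symbol, f" {name} ")
        out.append(token)
    # Collapse the extra spaces introduced by the padding
    return " ".join(" ".join(out).split())


print(describe_emojis("Love it \U0001F389 :/ https://t.co/abc"))
# Love it party_popper skeptical https://t.co/abc
```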

In [24]:
# check
[tweets[i] for i in idx_sample_tweets_emojs]

['@CCHQPress @SkyNews Amazing how desperate #Tories have become now they realise tactics from Dominic Cummings\' seemingly infallible "Get Brexit Done" behavioural science playbook, do not work during a pandemic...expressionless_face #COVID19 #NHS #PPE #BorisJohnson',
 'On Ep. 24 of the #HumanRisk podcast @JezGroom\n&amp; @April_Vellacott of @CowryConsulting talk about their new book #Ripplebackhand_index_pointing_righthttpsSkeptical_annoyed_undecided_uneasy_or_hesitant/t.co/3S1YSyH4X2 &amp; explore some of the #BehaviouralScience dynamics of the #Coronavirus.\n\nFind it wherever you get your headphone &amp; backhand_index_pointing_righthttpsSkeptical_annoyed_undecided_uneasy_or_hesitant/t.co/jWy9IRcDGm httpsSkeptical_annoyed_undecided_uneasy_or_hesitant/t.co/gxg7F2NVUF',
 'Part 2 of our blog series on COVID-19 is available to read now!! party_popper @KingsIoPPN @inspirethemind_.               @emilyjanehayes and I review messages from government in the UK &amp; elsewhere through the l

Second part

In [25]:
tweets = [clean_tweet_quibbles(tweet) for tweet in tweets]
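
The second-part steps (URLs, user mentions, hashtags) could be sketched with three regular expressions. `clean_tweet_sketch` is a hypothetical stand-in: `clean_tweet_quibbles` lives in `src.preproc_text`, its code is not shown here, and it may handle more cases.

```python
import re


def clean_tweet_sketch(text: str) -> str:
    """Drop URLs and @mentions, and strip the '#' from hashtags."""
    text = re.sub(r"https?://\S+", "", text)    # remove URLs
    text = re.sub(r"(?<!\w)@\w+", "", text)     # remove @mentions, but not e-mail addresses
    text = re.sub(r"#(\w+)", r"\1", text)       # '#hello' -> 'hello'
    return " ".join(text.split())               # collapse leftover whitespace


print(clean_tweet_sketch("@user Stay safe #COVID19 https://t.co/abc"))
# Stay safe COVID19
```

The `(?<!\w)` lookbehind is a design choice of this sketch: it keeps addresses such as `info@example.org` intact instead of cutting them in half.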

In [26]:
tweets[:10]

['As covid-19 sweeps the world, shoppers forced to change purchase behaviour. coronavirus consumer behaviorchange restaurants theatre automobile health finance COVID19 CoronaVirusUpdate Link:',
 'They live on a different planet.... COVID19 Coronavirus ToriesOut PoliceState PoliceStateUK MassSurveillance BehavioralScience behavioraleconomics NWO RevolutionNow Censorship Stasi endthelockdown NoVaccineForMe',
 'Ask leaders to make BehavioralScience core education RT Behavioural Design to keep safe distance coronavirus COVID19 socialdistancing behavioralscience nudge',
 'Ask leaders to make BehavioralScience core education RT RT COVID19 Coronavirus ToriesOut PoliceState PoliceStateUK MassSurveillance BehavioralScience behavioraleconomics',
 "Wake up world. You're being conned. Switch off MSM &amp; start thinking for yourselves. COVID19 Coronavirus ToriesOut PoliceState PoliceStateUK MassSurveillance BehavioralScience behavioraleconomics NWO",
 "I have been critical re the behavioural scien

In [27]:
[tweets[i] for i in idx_sample_tweets_emojs]

['Amazing how desperate Tories have become now they realise tactics from Dominic Cummings\' seemingly infallible "Get Brexit Done" behavioural science playbook, do not work during a pandemic...expressionless_face COVID19 NHS PPE BorisJohnson',
 'On Ep. 24 of the HumanRisk podcast &amp; of talk about their new book Ripplebackhand_index_pointing_right &amp; explore some of the BehaviouralScience dynamics of the Coronavirus. Find it wherever you get your headphone &amp; backhand_index_pointing_right',
 'Part 2 of our blog series on COVID-19 is available to read now!! party_popper and I review messages from government in the UK &amp; elsewhere through the lens of behavioural sciencedetectivefemale_sign chart_increasingchart_decreasing',
 'Ask leaders to make BehavioralScience core education RT COVID19 Coronavirus ToriesOut PoliceState PoliceStateUK MassSurveillance BehavioralScience']

Third part

In [28]:
preproc_pipe1 = chain_functions(split_lowercase_compounds,
                                split_string_at_uppercase,
                                break_compound_words,
                                remove_digits, 
                                remove_single_characters)
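
`chain_functions` comes from `src.utils` and is not shown in this notebook. Judging from how it is used (tokenise sentences first, then words), it appears to compose functions left to right. A minimal sketch under that assumption, with the hypothetical name `chain_functions_sketch`:

```python
from functools import reduce
from typing import Callable


def chain_functions_sketch(*funcs: Callable) -> Callable:
    """Compose functions left to right: chain(f, g)(x) == g(f(x))."""
    def chained(value):
        # Feed the running result into each function in turn
        return reduce(lambda acc, func: func(acc), funcs, value)
    return chained


pipe = chain_functions_sketch(str.strip, str.lower, str.split)
print(pipe("  Hello World  "))
# ['hello', 'world']
```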

In [29]:
tweets = [preproc_pipe1(tweet) for tweet in tweets]
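
For reference, steps 5 to 8 of the third part could look roughly like the sketch below. All function names here are hypothetical and the behaviour will not match `src.preproc_text` exactly (for instance, this version leaves all-caps strings such as "NHS" untouched). Step 9, splitting domain-specific lower-case compounds, needs a vocabulary and is left out.

```python
import re


def split_at_uppercase(text: str) -> str:
    # 'PoliceState' -> 'Police State'; all-caps runs are left alone
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)


def split_at_underscore(text: str) -> str:
    # 'party_popper' -> 'party popper'
    return text.replace("_", " ")


def drop_digits(text: str) -> str:
    return re.sub(r"\d+", "", text)


def drop_single_chars(text: str) -> str:
    return " ".join(word for word in text.split() if len(word) > 1)


def third_part_sketch(text: str) -> str:
    """Apply the four steps in order."""
    for step in (split_at_uppercase, split_at_underscore, drop_digits, drop_single_chars):
        text = step(text)
    return text


print(third_part_sketch("PoliceState party_popper Part 2 of a COVID19 blog"))
# Police State party popper Part of COVID blog
```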

In [30]:
tweets[:10]

['As covid- sweeps the world, shoppers forced to change purchase behaviour. coronavirus consumer behavior change restaurants theatre automobile health finance Corona Virus Update Link:',
 'They live on different planet.... Coronavirus Tories Out Police State Police State Mass Surveillance Behavioral Science behavioral economics Revolution Now Censorship Stasi endthelockdown No Vaccine For Me',
 'Ask leaders to make Behavioral Science core education Behavioural Design to keep safe distance coronavirus social distancing behavioral science nudge',
 'Ask leaders to make Behavioral Science core education Coronavirus Tories Out Police State Police State Mass Surveillance Behavioral Science behavioral economics',
 "Wake up world. You're being conned. Switch off &amp; start thinking for yourselves. Coronavirus Tories Out Police State Police State Mass Surveillance Behavioral Science behavioral economics",
 "have been critical re the behavioural science &amp; public communication strategies but

In [31]:
[tweets[i] for i in idx_sample_tweets_emojs]

['Amazing how desperate Tories have become now they realise tactics from Dominic Cummings\' seemingly infallible Get Brexit Done" behavioural science playbook, do not work during pandemic...expressionless face Boris Johnson',
 'On Ep. of the Human Risk podcast &amp; of talk about their new book Ripplebackhand index pointing right &amp; explore some of the Behavioural Science dynamics of the Coronavirus. Find it wherever you get your headphone &amp; backhand index pointing right',
 'Part of our blog series on D- is available to read now!! party popper and review messages from government in the &amp; elsewhere through the lens of behavioural sciencedetectivefemale sign chart increasingchart decreasing',
 'Ask leaders to make Behavioral Science core education Coronavirus Tories Out Police State Police State Mass Surveillance Behavioral Science']

Fourth part

In [32]:
# lower text
tweets = [tweet.lower() for tweet in tweets]

In [33]:
tokenise_pipe = chain_functions(tokenise_sent, tokenise_word)

In [34]:
tweets_tok = [tokenise_pipe(tweet) for tweet in tweets]

In [35]:
tweets_tok = [remove_stopwords(tweet, extra_stopwords= EXTRA_STOPWORDS) for tweet in tweets_tok]

In [36]:
# check
[tweets_tok[idx] for idx in idx_sample_tweets_emojs][:1]

[[['amazing',
   'desperate',
   'tories',
   'become',
   'realise',
   'tactics',
   "'",
   'seemingly',
   'infallible',
   'get',
   'brexit',
   'done',
   "''",
   'behavioural',
   'science',
   'playbook',
   ',',
   'not',
   'work',
   '...',
   'expressionless',
   'face']]]

Let's not remove punctuation for the time being.
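
If we later decide to remove punctuation while keeping `!`, `?` and `...`, one option is a translation table. This is only a sketch: `strip_punctuation` is a hypothetical helper, and it assumes that keeping "..." means keeping every full stop.

```python
import string

# Characters to keep because they can carry emphasis or hesitation (assumption)
KEEP = set("!?.")
DROP = "".join(ch for ch in string.punctuation if ch not in KEEP)
_TABLE = str.maketrans("", "", DROP)


def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation except '!', '?' and '.'."""
    return " ".join(text.translate(_TABLE).split())


print(strip_punctuation("wait... really?! 'yes', (ok); fine!"))
# wait... really?! yes ok fine!
```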

In [37]:
tokens2string_pipe = chain_functions(flatten_irregular_listoflists, list, detokenise_list)

In [38]:
tweets_cleaned = [tokens2string_pipe(tweet) for tweet in tweets_tok]

In [39]:
# remove extra white space before punctuation
tweets_cleaned = [re.sub(r'\s([?.!,;:"](?:\s|$))', r'\1', tweet) for tweet in tweets_cleaned]
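
To see what this substitution does and does not catch, here is the same pattern applied to three short made-up strings. `tidy_punct` is just a hypothetical wrapper name.

```python
import re

# Same pattern as in the cell above: whitespace, one punctuation mark,
# then whitespace or end of string. Group 1 keeps the mark and what follows it.
PATTERN = r'\s([?.!,;:"](?:\s|$))'


def tidy_punct(text: str) -> str:
    return re.sub(PATTERN, r"\1", text)


for sample in ["playbook , not work", "read ! ! party", "work ... face"]:
    print(repr(sample), "->", repr(tidy_punct(sample)))
# 'playbook , not work' -> 'playbook, not work'
# 'read ! ! party' -> 'read! ! party'
# 'work ... face' -> 'work ... face'
```

The second case shows why repeated marks stay separated: the first match consumes the space before the second `!`, so that one is not matched. The third case is unchanged because the first `.` is followed by another `.` rather than by whitespace.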

In [40]:
# let's take a look
[tweets_cleaned[i] for i in idx_sample_tweets_emojs]

["amazing desperate tories become realise tactics ' seemingly infallible get brexit done '' behavioural science playbook, not work ... expressionless face",
 'ep. human risk podcast; talk new book ripplebackhand index pointing right; explore behavioural science dynamics. find wherever get headphone; backhand index pointing right',
 'part blog series available read! ! party popper review messages government; elsewhere lens behavioural sciencedetectivefemale sign chart increasingchart decreasing',
 'ask leaders make behavioral science core education tories police state police state mass surveillance behavioral science']

## Merge to original dataset of tweets

In [41]:
len(tweets_cleaned)

4166

In [42]:
tweets_df['tweet_cleaned'] = tweets_cleaned

In [43]:
tweets_df[['text', 'tweet_cleaned']][:10]

Unnamed: 0,text,tweet_cleaned
0,"As covid-19 sweeps the world, shoppers forced to change purchase behaviour.\n#coronavirus #consumer #behaviorchange #restaurants #theatre #automobile #health #finance #COVID19 #CoronaVirusUpdate \nLink: https://t.co/402gGrrCAA https://t.co/WUsB26ingV","sweeps world, shoppers forced change purchase behaviour. consumer behavior change restaurants theatre automobile health finance update:"
1,They live on a different planet....\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship #Stasi #endthelockdown #NoVaccineForMe https://t.co/wyPcvRkL1C,live different planet .... tories police state police state mass surveillance behavioral science behavioral economics revolution censorship stasi endthelockdown no vaccine
2,Ask leaders to make #BehavioralScience core #education RT @BriefcaseTweets: Behavioural Design to keep safe distance \n\n#coronavirus #COVIDー19 #socialdistancing #behavioralscience #nudge… https://t.co/OpraxDeqxb,ask leaders make behavioral science core education behavioural design keep safe distance social distancing behavioral science nudge
3,Ask leaders to make #BehavioralScience core #education RT @pikachanyan: RT @DavidIHodgson: #COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #…,ask leaders make behavioral science core education tories police state police state mass surveillance behavioral science behavioral economics
4,Wake up world. You're being conned. Switch off #MSM &amp; start thinking for yourselves.\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO https://t.co/1yOWt77vIE,wake world. conned. switch; start thinking. tories police state police state mass surveillance behavioral science behavioral economics
5,"I have been critical re the behavioural science &amp; public communication strategies but also willing to help and contribute, signed up below to help the government crowdsource expertise. It's on all aspects of the #COVID19 pandemic so scientists of all stripes do consider joining! https://t.co/bal8v0DFXv","critical behavioural science; public communication strategies also willing help contribute, signed help government crowdsource expertise. aspects scientists stripes consider joining!"
6,@BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship #Stasi #endthelockdown https://t.co/Nz0MyGl1gS,chance mention sake balance? tories police state police state mass surveillance behavioral science behavioral economics revolution censorship stasi endthelockdown
7,Ask leaders to make #BehavioralScience core #education RT @DavidIHodgson: @BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus… https://t.co/kVtcI3OyeQ,ask leaders make behavioral science core education chance mention sake balance?
8,@BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship https://t.co/HDqdJQES7T,chance mention sake balance? tories police state police state mass surveillance behavioral science behavioral economics revolution censorship
9,The importance of Integrating hygiene behaviour change into routine immunisation programme is more important then ever in this COVID19 pandemic! #WASH #Vaccine #COVID19 #Hygiene https://t.co/uPrwSsBJWv,importance integrating hygiene behaviour change routine immunisation programme important ever! vaccine hygiene


# VADER sentiment analysis on cleaned-text tweets

VADER stands for Valence Aware Dictionary and sEntiment Reasoner. It is a lexicon- and rule-based model for text sentiment analysis that is sensitive to both the polarity (positive/negative) and the intensity (strength) of emotion. 

Intro to VADER: https://towardsdatascience.com/sentimental-analysis-using-vader-a3415fef7664
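
The `compound` score reported below is a normalised sum of word valences. To the best of our knowledge, the reference `vaderSentiment` implementation maps a summed valence x to x / sqrt(x² + α) with α = 15, and its documentation suggests ±0.05 as cut-offs for positive and negative labels. The sketch below reproduces only that last step. `normalize_valence` and `label_compound` are hypothetical names, and the rule-based adjustments VADER applies before summing (negation, punctuation, capitalisation) are not modelled.

```python
import math


def normalize_valence(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into [-1, 1]."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def label_compound(compound: float, threshold: float = 0.05) -> str:
    """Turn a compound score into a coarse label (thresholds are a convention)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for raw in (-4.0, 0.0, 3.2):  # made-up valence sums
    compound = normalize_valence(raw)
    print(raw, round(compound, 3), label_compound(compound))
```

A summed valence of 3.2 gives roughly 0.637, in line with the 0.6369 shown earlier for the tweet "Love it."; whether that tweet's raw valence is exactly 3.2 is an assumption.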

In [44]:
tweets_df['VDR_sentiment'] = [get_sentiment_score_VDR(tweet) for tweet in tweets_df.tweet_cleaned]

In [45]:
tweets_df[['text', 'tweet_cleaned','VDR_sentiment']][:10]

Unnamed: 0,text,tweet_cleaned,VDR_sentiment
0,"As covid-19 sweeps the world, shoppers forced to change purchase behaviour.\n#coronavirus #consumer #behaviorchange #restaurants #theatre #automobile #health #finance #COVID19 #CoronaVirusUpdate \nLink: https://t.co/402gGrrCAA https://t.co/WUsB26ingV","sweeps world, shoppers forced change purchase behaviour. consumer behavior change restaurants theatre automobile health finance update:",{'compound': -0.4588}
1,They live on a different planet....\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship #Stasi #endthelockdown #NoVaccineForMe https://t.co/wyPcvRkL1C,live different planet .... tories police state police state mass surveillance behavioral science behavioral economics revolution censorship stasi endthelockdown no vaccine,{'compound': -0.296}
2,Ask leaders to make #BehavioralScience core #education RT @BriefcaseTweets: Behavioural Design to keep safe distance \n\n#coronavirus #COVIDー19 #socialdistancing #behavioralscience #nudge… https://t.co/OpraxDeqxb,ask leaders make behavioral science core education behavioural design keep safe distance social distancing behavioral science nudge,{'compound': 0.4404}
3,Ask leaders to make #BehavioralScience core #education RT @pikachanyan: RT @DavidIHodgson: #COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #…,ask leaders make behavioral science core education tories police state police state mass surveillance behavioral science behavioral economics,{'compound': 0.0}
4,Wake up world. You're being conned. Switch off #MSM &amp; start thinking for yourselves.\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO https://t.co/1yOWt77vIE,wake world. conned. switch; start thinking. tories police state police state mass surveillance behavioral science behavioral economics,{'compound': 0.0}
5,"I have been critical re the behavioural science &amp; public communication strategies but also willing to help and contribute, signed up below to help the government crowdsource expertise. It's on all aspects of the #COVID19 pandemic so scientists of all stripes do consider joining! https://t.co/bal8v0DFXv","critical behavioural science; public communication strategies also willing help contribute, signed help government crowdsource expertise. aspects scientists stripes consider joining!",{'compound': 0.5255}
6,@BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship #Stasi #endthelockdown https://t.co/Nz0MyGl1gS,chance mention sake balance? tories police state police state mass surveillance behavioral science behavioral economics revolution censorship stasi endthelockdown,{'compound': 0.25}
7,Ask leaders to make #BehavioralScience core #education RT @DavidIHodgson: @BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus… https://t.co/kVtcI3OyeQ,ask leaders make behavioral science core education chance mention sake balance?,{'compound': 0.25}
8,@BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship https://t.co/HDqdJQES7T,chance mention sake balance? tories police state police state mass surveillance behavioral science behavioral economics revolution censorship,{'compound': 0.25}
9,The importance of Integrating hygiene behaviour change into routine immunisation programme is more important then ever in this COVID19 pandemic! #WASH #Vaccine #COVID19 #Hygiene https://t.co/uPrwSsBJWv,importance integrating hygiene behaviour change routine immunisation programme important ever! vaccine hygiene,{'compound': 0.5562}


# TextBlob sentiment analysis on cleaned-text tweets

Intro to TextBlob: https://planspace.org/20150607-textblob_sentiment/  

In [46]:
tweets_df['TB_sentiment'] = [round(get_sentiment_score_TB(tweet),3) for tweet in tweets_df.tweet_cleaned]

In [47]:
tweets_df[['text', 'tweet_cleaned','VDR_sentiment', 'TB_sentiment']][:10]

Unnamed: 0,text,tweet_cleaned,VDR_sentiment,TB_sentiment
0,"As covid-19 sweeps the world, shoppers forced to change purchase behaviour.\n#coronavirus #consumer #behaviorchange #restaurants #theatre #automobile #health #finance #COVID19 #CoronaVirusUpdate \nLink: https://t.co/402gGrrCAA https://t.co/WUsB26ingV","sweeps world, shoppers forced change purchase behaviour. consumer behavior change restaurants theatre automobile health finance update:",{'compound': -0.4588},-0.3
1,They live on a different planet....\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship #Stasi #endthelockdown #NoVaccineForMe https://t.co/wyPcvRkL1C,live different planet .... tories police state police state mass surveillance behavioral science behavioral economics revolution censorship stasi endthelockdown no vaccine,{'compound': -0.296},0.068
2,Ask leaders to make #BehavioralScience core #education RT @BriefcaseTweets: Behavioural Design to keep safe distance \n\n#coronavirus #COVIDー19 #socialdistancing #behavioralscience #nudge… https://t.co/OpraxDeqxb,ask leaders make behavioral science core education behavioural design keep safe distance social distancing behavioral science nudge,{'compound': 0.4404},0.267
3,Ask leaders to make #BehavioralScience core #education RT @pikachanyan: RT @DavidIHodgson: #COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #…,ask leaders make behavioral science core education tories police state police state mass surveillance behavioral science behavioral economics,{'compound': 0.0},0.0
4,Wake up world. You're being conned. Switch off #MSM &amp; start thinking for yourselves.\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO https://t.co/1yOWt77vIE,wake world. conned. switch; start thinking. tories police state police state mass surveillance behavioral science behavioral economics,{'compound': 0.0},0.0
5,"I have been critical re the behavioural science &amp; public communication strategies but also willing to help and contribute, signed up below to help the government crowdsource expertise. It's on all aspects of the #COVID19 pandemic so scientists of all stripes do consider joining! https://t.co/bal8v0DFXv","critical behavioural science; public communication strategies also willing help contribute, signed help government crowdsource expertise. aspects scientists stripes consider joining!",{'compound': 0.5255},0.104
6,@BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship #Stasi #endthelockdown https://t.co/Nz0MyGl1gS,chance mention sake balance? tories police state police state mass surveillance behavioral science behavioral economics revolution censorship stasi endthelockdown,{'compound': 0.25},0.0
7,Ask leaders to make #BehavioralScience core #education RT @DavidIHodgson: @BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus… https://t.co/kVtcI3OyeQ,ask leaders make behavioral science core education chance mention sake balance?,{'compound': 0.25},0.0
8,@BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship https://t.co/HDqdJQES7T,chance mention sake balance? tories police state police state mass surveillance behavioral science behavioral economics revolution censorship,{'compound': 0.25},0.0
9,The importance of Integrating hygiene behaviour change into routine immunisation programme is more important then ever in this COVID19 pandemic! #WASH #Vaccine #COVID19 #Hygiene https://t.co/uPrwSsBJWv,importance integrating hygiene behaviour change routine immunisation programme important ever! vaccine hygiene,{'compound': 0.5562},0.5


Let's compare these scores to the ones obtained for the non-preprocessed tweet texts:
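
One way to make this comparison is sketched below on made-up values. `compare_scores` is a hypothetical helper; it accepts either bare floats or `{'compound': x}` dictionaries, as stored in the VADER columns, and the ±0.05 threshold for calling a score positive or negative is an assumption.

```python
def compound(score) -> float:
    """Accept either a bare float or a {'compound': x} dict."""
    return score["compound"] if isinstance(score, dict) else float(score)


def polarity(value: float, threshold: float = 0.05) -> int:
    """Return 1 (positive), -1 (negative) or 0 (neutral)."""
    if value >= threshold:
        return 1
    if value <= -threshold:
        return -1
    return 0


def compare_scores(before, after):
    """Pair up scores and flag tweets whose polarity flipped."""
    rows = []
    for b, a in zip(before, after):
        b, a = compound(b), compound(a)
        rows.append({
            "before": b,
            "after": a,
            "delta": round(a - b, 4),
            "sign_flip": polarity(b) * polarity(a) == -1,
        })
    return rows


# Made-up scores standing in for the quick and cleaned-text columns
quick = [{"compound": 0.5}, {"compound": -0.2}, {"compound": 0.0}]
cleaned = [{"compound": -0.3}, {"compound": -0.25}, {"compound": 0.4}]
for row in compare_scores(quick, cleaned):
    print(row)
```

Because the function only iterates over its inputs, it should also work on two data-frame columns, for example `tweets_df['quick_VDR_sentiment']` and `tweets_df['VDR_sentiment']`, provided the rows are aligned.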

## VADER with individual score for pos/neu/neg

In [48]:
tweets_df['VDR_detailed_sentiment'] = [get_sentiment_score_VDR(tweet, score_type='all') for tweet in tweets_df.tweet_cleaned]

In [49]:
tweets_df[['text', 'tweet_cleaned','VDR_sentiment', 'VDR_detailed_sentiment', 'TB_sentiment']][:10]

Unnamed: 0,text,tweet_cleaned,VDR_sentiment,VDR_detailed_sentiment,TB_sentiment
0,"As covid-19 sweeps the world, shoppers forced to change purchase behaviour.\n#coronavirus #consumer #behaviorchange #restaurants #theatre #automobile #health #finance #COVID19 #CoronaVirusUpdate \nLink: https://t.co/402gGrrCAA https://t.co/WUsB26ingV","sweeps world, shoppers forced change purchase behaviour. consumer behavior change restaurants theatre automobile health finance update:",{'compound': -0.4588},"{'neg': 0.167, 'neu': 0.833, 'pos': 0.0}",-0.3
1,They live on a different planet....\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship #Stasi #endthelockdown #NoVaccineForMe https://t.co/wyPcvRkL1C,live different planet .... tories police state police state mass surveillance behavioral science behavioral economics revolution censorship stasi endthelockdown no vaccine,{'compound': -0.296},"{'neg': 0.099, 'neu': 0.901, 'pos': 0.0}",0.068
2,Ask leaders to make #BehavioralScience core #education RT @BriefcaseTweets: Behavioural Design to keep safe distance \n\n#coronavirus #COVIDー19 #socialdistancing #behavioralscience #nudge… https://t.co/OpraxDeqxb,ask leaders make behavioral science core education behavioural design keep safe distance social distancing behavioral science nudge,{'compound': 0.4404},"{'neg': 0.0, 'neu': 0.847, 'pos': 0.153}",0.267
3,Ask leaders to make #BehavioralScience core #education RT @pikachanyan: RT @DavidIHodgson: #COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #…,ask leaders make behavioral science core education tories police state police state mass surveillance behavioral science behavioral economics,{'compound': 0.0},"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0}",0.0
4,Wake up world. You're being conned. Switch off #MSM &amp; start thinking for yourselves.\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO https://t.co/1yOWt77vIE,wake world. conned. switch; start thinking. tories police state police state mass surveillance behavioral science behavioral economics,{'compound': 0.0},"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0}",0.0
5,"I have been critical re the behavioural science &amp; public communication strategies but also willing to help and contribute, signed up below to help the government crowdsource expertise. It's on all aspects of the #COVID19 pandemic so scientists of all stripes do consider joining! https://t.co/bal8v0DFXv","critical behavioural science; public communication strategies also willing help contribute, signed help government crowdsource expertise. aspects scientists stripes consider joining!",{'compound': 0.5255},"{'neg': 0.092, 'neu': 0.68, 'pos': 0.228}",0.104
6,@BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship #Stasi #endthelockdown https://t.co/Nz0MyGl1gS,chance mention sake balance? tories police state police state mass surveillance behavioral science behavioral economics revolution censorship stasi endthelockdown,{'compound': 0.25},"{'neg': 0.0, 'neu': 0.9, 'pos': 0.1}",0.0
7,Ask leaders to make #BehavioralScience core #education RT @DavidIHodgson: @BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus… https://t.co/kVtcI3OyeQ,ask leaders make behavioral science core education chance mention sake balance?,{'compound': 0.25},"{'neg': 0.0, 'neu': 0.833, 'pos': 0.167}",0.0
8,@BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship https://t.co/HDqdJQES7T,chance mention sake balance? tories police state police state mass surveillance behavioral science behavioral economics revolution censorship,{'compound': 0.25},"{'neg': 0.0, 'neu': 0.889, 'pos': 0.111}",0.0
9,The importance of Integrating hygiene behaviour change into routine immunisation programme is more important then ever in this COVID19 pandemic! #WASH #Vaccine #COVID19 #Hygiene https://t.co/uPrwSsBJWv,importance integrating hygiene behaviour change routine immunisation programme important ever! vaccine hygiene,{'compound': 0.5562},"{'neg': 0.0, 'neu': 0.685, 'pos': 0.315}",0.5


## Let's clean the text less

VADER should be sensitive to emoticons, capital letters used for emphasis, etc., so let's try cleaning the text less aggressively. See: https://medium.com/analytics-vidhya/simplifying-social-media-sentiment-analysis-using-vader-in-python-f9e6ec6fc52f

So, we will:

- keep emojis and emoticons in as they are
- not lemmatise
- not lower-case
- not remove stop-words 

#### New cleaning steps

1. Replace URLs with the word "url" or remove them
2. Remove the names of mentioned users or replace them with "user_mentioned"
3. Replace all the hashtags with the same words without the hash symbol (e.g., "#hello" -> "hello")


4. Split compounded-by-upper-case strings
5. Split compounded-by-underscore "_" strings (this is to get the words that make up the emoji descriptions)
6. Remove digits
7. Remove single-character words
8. Split domain-specific compounded all-lower-case strings (e.g., behaviouraleconomics)
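
For illustration only, steps 1 to 7 can be approximated with plain regular expressions. This is a minimal sketch and not the code behind `clean_tweet_quibbles` or `preproc_pipe1` (which live in `src.preproc_text`); the function name `clean_less` is hypothetical, and step 8 is left out because it needs a domain-specific word list.

```python
import re

def clean_less(tweet: str) -> str:
    """Lightly clean a tweet while leaving case, stop-words and punctuation alone.

    Hypothetical helper, written only to illustrate steps 1-7 above.
    """
    text = re.sub(r"https?://\S+", " ", tweet)        # 1. remove URLs
    text = re.sub(r"@\w+:?", " ", text)               # 2. remove user mentions
    text = re.sub(r"#(\w+)", r"\1", text)             # 3. "#hello" -> "hello"
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)  # 4. "PoliceState" -> "Police State"
    text = text.replace("_", " ")                     # 5. split on underscores
    text = re.sub(r"\d+", "", text)                   # 6. remove digits
    text = re.sub(r"(?<!\S)\w(?!\S)", " ", text)      # 7. remove single-character words
    return re.sub(r"\s+", " ", text).strip()          # tidy up leftover whitespace

example = ("They live on a different planet.... "
           "#PoliceState #NoVaccineForMe @BBCNews https://t.co/wyPcvRkL1C")
print(clean_less(example))
# They live on different planet.... Police State No Vaccine For Me
```

Step 7 only drops single characters that are surrounded by whitespace, so contractions such as "It's" and emoticons such as ":D" are left intact.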

In [50]:
tweets_2 = [clean_tweet_quibbles(tweet) for tweet in tweets_df.text]

In [51]:
tweets_2[:5]

['As covid-19 sweeps the world, shoppers forced to change purchase behaviour. coronavirus consumer behaviorchange restaurants theatre automobile health finance COVID19 CoronaVirusUpdate Link:',
 'They live on a different planet.... COVID19 Coronavirus ToriesOut PoliceState PoliceStateUK MassSurveillance BehavioralScience behavioraleconomics NWO RevolutionNow Censorship Stasi endthelockdown NoVaccineForMe',
 'Ask leaders to make BehavioralScience core education RT Behavioural Design to keep safe distance coronavirus COVID19 socialdistancing behavioralscience nudge',
 'Ask leaders to make BehavioralScience core education RT RT COVID19 Coronavirus ToriesOut PoliceState PoliceStateUK MassSurveillance BehavioralScience behavioraleconomics',
 "Wake up world. You're being conned. Switch off MSM &amp; start thinking for yourselves. COVID19 Coronavirus ToriesOut PoliceState PoliceStateUK MassSurveillance BehavioralScience behavioraleconomics NWO"]

In [52]:
tweets_2 = [preproc_pipe1(tweet) for tweet in tweets_2]

In [53]:
tweets_2[:5]

['As covid- sweeps the world, shoppers forced to change purchase behaviour. coronavirus consumer behavior change restaurants theatre automobile health finance Corona Virus Update Link:',
 'They live on different planet.... Coronavirus Tories Out Police State Police State Mass Surveillance Behavioral Science behavioral economics Revolution Now Censorship Stasi endthelockdown No Vaccine For Me',
 'Ask leaders to make Behavioral Science core education Behavioural Design to keep safe distance coronavirus social distancing behavioral science nudge',
 'Ask leaders to make Behavioral Science core education Coronavirus Tories Out Police State Police State Mass Surveillance Behavioral Science behavioral economics',
 "Wake up world. You're being conned. Switch off &amp; start thinking for yourselves. Coronavirus Tories Out Police State Police State Mass Surveillance Behavioral Science behavioral economics"]

In [54]:
tweets_df['tweet_cleaned_less'] = tweets_2

In [55]:
tweets_df['VDR_sentiment_2'] = [get_sentiment_score_VDR(tweet) for tweet in tweets_df.tweet_cleaned_less]

In [56]:
tweets_df['VDR_detailed_sentiment_2'] = [get_sentiment_score_VDR(tweet, score_type='all') for tweet in tweets_df.tweet_cleaned_less]

In [57]:
tweets_df[['text', 'tweet_cleaned_less','VDR_sentiment', 'VDR_sentiment_2', 'VDR_detailed_sentiment', 'VDR_detailed_sentiment_2']][:10]

Unnamed: 0,text,tweet_cleaned_less,VDR_sentiment,VDR_sentiment_2,VDR_detailed_sentiment,VDR_detailed_sentiment_2
0,"As covid-19 sweeps the world, shoppers forced to change purchase behaviour.\n#coronavirus #consumer #behaviorchange #restaurants #theatre #automobile #health #finance #COVID19 #CoronaVirusUpdate \nLink: https://t.co/402gGrrCAA https://t.co/WUsB26ingV","As covid- sweeps the world, shoppers forced to change purchase behaviour. coronavirus consumer behavior change restaurants theatre automobile health finance Corona Virus Update Link:",{'compound': -0.4588},{'compound': -0.4588},"{'neg': 0.167, 'neu': 0.833, 'pos': 0.0}","{'neg': 0.115, 'neu': 0.885, 'pos': 0.0}"
1,They live on a different planet....\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship #Stasi #endthelockdown #NoVaccineForMe https://t.co/wyPcvRkL1C,They live on different planet.... Coronavirus Tories Out Police State Police State Mass Surveillance Behavioral Science behavioral economics Revolution Now Censorship Stasi endthelockdown No Vaccine For Me,{'compound': -0.296},{'compound': -0.296},"{'neg': 0.099, 'neu': 0.901, 'pos': 0.0}","{'neg': 0.078, 'neu': 0.922, 'pos': 0.0}"
2,Ask leaders to make #BehavioralScience core #education RT @BriefcaseTweets: Behavioural Design to keep safe distance \n\n#coronavirus #COVIDー19 #socialdistancing #behavioralscience #nudge… https://t.co/OpraxDeqxb,Ask leaders to make Behavioral Science core education Behavioural Design to keep safe distance coronavirus social distancing behavioral science nudge,{'compound': 0.4404},{'compound': 0.4404},"{'neg': 0.0, 'neu': 0.847, 'pos': 0.153}","{'neg': 0.0, 'neu': 0.868, 'pos': 0.132}"
3,Ask leaders to make #BehavioralScience core #education RT @pikachanyan: RT @DavidIHodgson: #COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #…,Ask leaders to make Behavioral Science core education Coronavirus Tories Out Police State Police State Mass Surveillance Behavioral Science behavioral economics,{'compound': 0.0},{'compound': 0.0},"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0}","{'neg': 0.0, 'neu': 1.0, 'pos': 0.0}"
4,Wake up world. You're being conned. Switch off #MSM &amp; start thinking for yourselves.\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO https://t.co/1yOWt77vIE,Wake up world. You're being conned. Switch off &amp; start thinking for yourselves. Coronavirus Tories Out Police State Police State Mass Surveillance Behavioral Science behavioral economics,{'compound': 0.0},{'compound': 0.0},"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0}","{'neg': 0.0, 'neu': 1.0, 'pos': 0.0}"
5,"I have been critical re the behavioural science &amp; public communication strategies but also willing to help and contribute, signed up below to help the government crowdsource expertise. It's on all aspects of the #COVID19 pandemic so scientists of all stripes do consider joining! https://t.co/bal8v0DFXv","have been critical re the behavioural science &amp; public communication strategies but also willing to help and contribute, signed up below to help the government crowdsource expertise. It's on all aspects of the pandemic so scientists of all stripes do consider joining!",{'compound': 0.5255},{'compound': 0.7745},"{'neg': 0.092, 'neu': 0.68, 'pos': 0.228}","{'neg': 0.034, 'neu': 0.812, 'pos': 0.154}"
6,@BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship #Stasi #endthelockdown https://t.co/Nz0MyGl1gS,Any chance of mention for the sake of balance? Coronavirus Tories Out Police State Police State Mass Surveillance Behavioral Science behavioral economics Revolution Now Censorship Stasi endthelockdown,{'compound': 0.25},{'compound': 0.25},"{'neg': 0.0, 'neu': 0.9, 'pos': 0.1}","{'neg': 0.0, 'neu': 0.929, 'pos': 0.071}"
7,Ask leaders to make #BehavioralScience core #education RT @DavidIHodgson: @BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus… https://t.co/kVtcI3OyeQ,Ask leaders to make Behavioral Science core education Any chance of mention for the sake of balance? Coronavirus,{'compound': 0.25},{'compound': 0.25},"{'neg': 0.0, 'neu': 0.833, 'pos': 0.167}","{'neg': 0.0, 'neu': 0.895, 'pos': 0.105}"
8,@BBCNews @itvnews @piersmorgan @GMB \n\nAny chance of a mention for the sake of balance?\n\n#COVID19 #Coronavirus #ToriesOut #PoliceState #PoliceStateUK #MassSurveillance\n#BehavioralScience #behavioraleconomics #NWO #RevolutionNow #Censorship https://t.co/HDqdJQES7T,Any chance of mention for the sake of balance? Coronavirus Tories Out Police State Police State Mass Surveillance Behavioral Science behavioral economics Revolution Now Censorship,{'compound': 0.25},{'compound': 0.25},"{'neg': 0.0, 'neu': 0.889, 'pos': 0.111}","{'neg': 0.0, 'neu': 0.923, 'pos': 0.077}"
9,The importance of Integrating hygiene behaviour change into routine immunisation programme is more important then ever in this COVID19 pandemic! #WASH #Vaccine #COVID19 #Hygiene https://t.co/uPrwSsBJWv,The importance of Integrating hygiene behaviour change into routine immunisation programme is more important then ever in this pandemic! Vaccine Hygiene,{'compound': 0.5562},{'compound': 0.5974},"{'neg': 0.0, 'neu': 0.685, 'pos': 0.315}","{'neg': 0.0, 'neu': 0.795, 'pos': 0.205}"


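Not much difference, really: the compound score is unchanged for 8 of these 10 tweets and differs only for tweets 5 (0.5255 vs 0.7745) and 9 (0.5562 vs 0.5974). The neg/neu/pos proportions shift towards neutral, presumably because the stop-words are now kept in.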
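
To put a number on this, here is a small sketch that compares the two sets of compound scores for the ten tweets shown above. The values are copied by hand from the table, and the variable names are only for illustration; on the full dataset the same comparison would be run on the `VDR_sentiment` and `VDR_sentiment_2` columns.

```python
# Compound scores of the first ten tweets, copied from the table above.
compound_cleaned = [-0.4588, -0.296, 0.4404, 0.0, 0.0,
                    0.5255, 0.25, 0.25, 0.25, 0.5562]
compound_cleaned_less = [-0.4588, -0.296, 0.4404, 0.0, 0.0,
                         0.7745, 0.25, 0.25, 0.25, 0.5974]

# Difference per tweet (less-cleaned minus fully cleaned).
diffs = [b - a for a, b in zip(compound_cleaned, compound_cleaned_less)]

# Row positions where the score actually changed.
changed = [i for i, d in enumerate(diffs) if abs(d) > 1e-9]

# Average size of the change across all ten tweets.
mean_abs_diff = sum(abs(d) for d in diffs) / len(diffs)

print(changed)  # [5, 9]
print(f"mean absolute difference: {mean_abs_diff:.4f}")
```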

## Save dataset with sentiment scores

In [58]:
tweets_df_to_save = tweets_df[[
    'id', 'created_at', 'favorite_count', 'retweet_count',
    'text', 'tweet_cleaned', 'tweet_cleaned_less',
    'VDR_sentiment', 'VDR_sentiment_2', 'TB_sentiment',
    'VDR_detailed_sentiment', 'VDR_detailed_sentiment_2',
]]

In [59]:
tweets_df_to_save.to_csv(os.path.join(DATA_DIR, "tweets_en_lexicon_sentiments.csv"))

In [60]:
tweets_df_to_save.to_pickle(os.path.join(DATA_DIR, "tweets_en_lexicon_sentiments.pickle"))
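
One thing to keep in mind when reading the CSV back: the sentiment columns hold Python dictionaries, and a CSV file can only store them as text, so they are likely to come back as strings (the pickle should keep them as dictionaries). The sketch below shows the round trip with the standard library only, using one value from the table above; it assumes the dictionaries are written out in their usual `str()` form and does not use the real file.

```python
import ast
import csv
import io

# One row with a dictionary-valued column, as in VDR_detailed_sentiment.
rows = [{"id": 0,
         "VDR_detailed_sentiment": {"neg": 0.167, "neu": 0.833, "pos": 0.0}}]

# Write to an in-memory CSV (stand-in for the file on disk).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "VDR_detailed_sentiment"])
writer.writeheader()
writer.writerows(rows)

# Read it back: the dictionary is now plain text.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw = loaded[0]["VDR_detailed_sentiment"]
print(type(raw).__name__)  # str

# ast.literal_eval safely turns the text back into a dictionary.
parsed = ast.literal_eval(raw)
print(parsed["neu"])
```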