## Sentiment Intensity Analysis

VADER (Valence Aware Dictionary and Sentiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.

We use VADER's sentiment output for each tweet to compute a score that represents the tweet's importance.

### Libraries

In [9]:
import numpy as np
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from tqdm import tqdm

Project directory and input data:

In [10]:
directory = '~/PycharmProjects/tfm_hugopobil'
tweets = pd.read_csv(f'{directory}/data/sampled_data/tweets_clean_v2.csv', low_memory=False)

In [12]:
print(tweets.shape)
tweets.head()

(23200, 15)


Unnamed: 0,index,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,is_retweet,sample_date
0,21523,Iconic Holding,"Frankfurt am Main, Germany",Professional Crypto Asset Ventures \nhttps://t...,2021-01-05 13:22:24,301.0,1075.0,361.0,False,2021-02-05 10:52:04,debunking bitcoin myths by crypto...,"['Bitcoin', 'cryptocurrency', 'bitcoin', 'cryp...",Twitter Web App,False,2021-02-05
1,21521,Iconic Holding,"Frankfurt am Main, Germany",Professional Crypto Asset Ventures \nhttps://t...,2021-01-05 13:22:24,301.0,1075.0,361.0,False,2021-02-05 10:52:07,blockchain by cryptocurrency ...,"['Blockchain', 'cryptocurrency', 'bitcoin', 'c...",Twitter Web App,False,2021-02-05
2,21514,TOP AIM STOCKS,United Kingdom,2021 stocks NEW CHANNEL https://t.co/I323dIOkP...,2015-05-31 20:20:57,16546.0,224.0,71404.0,False,2021-02-05 10:58:47,bitcoin braces for as inverse head and shou...,"['Bitcoin', 'BTC']",Twitter Web App,False,2021-02-05
3,21510,Kur Ne Oldu,Turkey,Günlük Döviz Kurları /\n\nDaily Currency Excha...,2019-02-11 08:43:21,4154.0,76.0,46.0,False,2021-02-05 11:00:05,bitcoin bitcoin btc btcusd,"['bitcoin', 'btc', 'BTCUSD']",KurNeOldu,False,2021-02-05
4,21509,Iconic Funds,"Frankfurt, Germany",Professional Crypto Asset Management\nhttps://...,2017-08-03 10:44:25,16813.0,818.0,1201.0,False,2021-02-05 11:00:21,weekend read keen to learn about crypto ...,['crypto'],Twitter Web App,False,2021-02-05


### Analyzer model definition

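Instantiate the analyzer and apply it to the text of each tweet. `polarity_scores` returns a dictionary with `neg`, `neu`, `pos` and `compound` entries. We keep only `compound`, a polarity score between -1 (most negative) and +1 (most positive), and store it in the `compound` column of the dataframe.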

In [16]:
analyzer = SentimentIntensityAnalyzer()

# 'compound' is VADER's overall sentiment intensity score for a tweet
compound = []

for text in tqdm(tweets['text'], position=0, leave=True):
    # Swap this call to test a different sentiment analyzer;
    # str() guards against non-string values such as NaN
    vs = analyzer.polarity_scores(str(text))
    compound.append(vs['compound'])

tweets['compound'] = compound
tweets.head(100)

100%|██████████| 23200/23200 [00:02<00:00, 11435.78it/s]


Unnamed: 0,index,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,is_retweet,sample_date,compound,score
0,21523,Iconic Holding,"Frankfurt am Main, Germany",Professional Crypto Asset Ventures \nhttps://t...,2021-01-05 13:22:24,301.0,1075.0,361.0,False,2021-02-05 10:52:04,debunking bitcoin myths by crypto...,"['Bitcoin', 'cryptocurrency', 'bitcoin', 'cryp...",Twitter Web App,False,2021-02-05,0.0000,0.000000
1,21521,Iconic Holding,"Frankfurt am Main, Germany",Professional Crypto Asset Ventures \nhttps://t...,2021-01-05 13:22:24,301.0,1075.0,361.0,False,2021-02-05 10:52:07,blockchain by cryptocurrency ...,"['Blockchain', 'cryptocurrency', 'bitcoin', 'c...",Twitter Web App,False,2021-02-05,0.0000,0.000000
2,21514,TOP AIM STOCKS,United Kingdom,2021 stocks NEW CHANNEL https://t.co/I323dIOkP...,2015-05-31 20:20:57,16546.0,224.0,71404.0,False,2021-02-05 10:58:47,bitcoin braces for as inverse head and shou...,"['Bitcoin', 'BTC']",Twitter Web App,False,2021-02-05,0.4019,6.650239
3,21510,Kur Ne Oldu,Turkey,Günlük Döviz Kurları /\n\nDaily Currency Excha...,2019-02-11 08:43:21,4154.0,76.0,46.0,False,2021-02-05 11:00:05,bitcoin bitcoin btc btcusd,"['bitcoin', 'btc', 'BTCUSD']",KurNeOldu,False,2021-02-05,0.0000,0.000000
4,21509,Iconic Funds,"Frankfurt, Germany",Professional Crypto Asset Management\nhttps://...,2017-08-03 10:44:25,16813.0,818.0,1201.0,False,2021-02-05 11:00:21,weekend read keen to learn about crypto ...,['crypto'],Twitter Web App,False,2021-02-05,0.4939,8.304435
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,20809,Jesse Holten,"Venlo, Nederland",,2020-11-04 14:54:17,18.0,61.0,867.0,False,2021-02-05 16:12:12,people always talk about a bitcoin mine i don...,"['Bitcoin', 'BTC', 'dogecoin', 'DOGE']",Twitter for Android,False,2021-02-05,0.5423,0.010304
96,20807,Tetris Trading,,Offering insight into the potential of the sto...,2020-09-07 07:38:43,8.0,6.0,12.0,False,2021-02-05 16:12:23,target hit on ether timing slightly out ...,"['ETHER', 'bitcoin', 'ether', 'doge', 'btc', '...",Hootsuite Inc.,False,2021-02-05,0.0000,0.000000
97,20800,KIP🤓,"Abuja, Nigeria","Wurld Stan🎶\n\nBoy, not Girl|♓blood|I stand by...",2017-01-28 11:13:49,651.0,599.0,17612.0,False,2021-02-05 16:14:50,business continues bitcoin buhari crypto btc...,"['Bitcoin', 'Buhari', 'BTC', 'bringbackourcryp...",Twitter for Android,False,2021-02-05,0.0000,0.000000
98,20784,Investor Insider,,Stocks only go up🚀 I am not a financial adviso...,2014-02-24 02:59:25,46.0,8.0,121.0,False,2021-02-05 16:21:58,who s your favorite broker to invest in bitcoi...,"['crypto', 'Bitcoin', 'cryptocurrency', 'BTC',...",Twitter for iPhone,False,2021-02-05,0.4588,0.021564
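
For intuition about the `compound` values in the table above: VADER sums the valence of the words it recognizes in a text and then squashes that unbounded sum into the range [-1, 1]. The sketch below reproduces only that normalization step in plain Python. `normalize_valence_sum` is an illustrative name, not a library function, and `alpha = 15` is assumed to match the library's default.

```python
import math


def normalize_valence_sum(valence_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into [-1, 1].

    Illustrative helper: mirrors the normalization VADER is understood
    to apply to the summed word valences (alpha=15 assumed as default).
    """
    norm = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    # Clip defensively so the result never leaves [-1, 1]
    return max(-1.0, min(1.0, norm))


for total in (0.0, 1.0, 4.0, -4.0):
    print(total, round(normalize_valence_sum(total), 4))
```

A text with no recognized sentiment words has a valence sum of 0 and therefore a `compound` of 0.0, which is why many tweets in the sample score exactly zero.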


### Calculate score for each tweet

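We weight the compound score of each tweet by the importance and relevance of the account that posted it, measured by its number of followers:

Score = compound * (user_followers + 1) / 1000

The + 1 gives accounts with zero followers a non-zero weight, and dividing by 1000 only rescales the values. user_favourites is not used in the calculation below.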

In [14]:
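scores = []
for _, row in tqdm(tweets.iterrows(),
                   total=tweets.shape[0],
                   position=0,
                   leave=True):

    try:
        # +1 so accounts with zero followers still get a non-zero weight
        scores.append(row["compound"] * (int(row["user_followers"]) + 1) / 1000)

    except (ValueError, TypeError, OverflowError):
        # user_followers is missing (NaN) or not numeric
        scores.append(np.nan)

tweets["score"] = scores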

100%|██████████| 23200/23200 [00:00<00:00, 39713.52it/s]


In [17]:
tweets.columns

Index(['index', 'user_name', 'user_location', 'user_description',
       'user_created', 'user_followers', 'user_friends', 'user_favourites',
       'user_verified', 'date', 'text', 'hashtags', 'source', 'is_retweet',
       'sample_date', 'compound', 'score'],
      dtype='object')

In [23]:
tweets[['text', 'sample_date', 'user_followers', 'compound', 'score']].head()

Unnamed: 0,text,sample_date,user_followers,compound,score
0,debunking bitcoin myths by crypto...,2021-02-05,301.0,0.0,0.0
1,blockchain by cryptocurrency ...,2021-02-05,301.0,0.0,0.0
2,bitcoin braces for as inverse head and shou...,2021-02-05,16546.0,0.4019,6.650239
3,bitcoin bitcoin btc btcusd,2021-02-05,4154.0,0.0,0.0
4,weekend read keen to learn about crypto ...,2021-02-05,16813.0,0.4939,8.304435
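
The scores in the table above can be checked by hand against the loop's formula, `compound * (user_followers + 1) / 1000`. The sketch below recomputes three of the displayed rows in plain Python. `weighted_score` is a hypothetical helper written for this check, not part of the notebook.

```python
import math


def weighted_score(compound, user_followers):
    """Weight a compound score by follower count (illustrative helper)."""
    try:
        return compound * (int(user_followers) + 1) / 1000
    except (ValueError, TypeError, OverflowError):
        # Missing or non-numeric follower count
        return math.nan


# (compound, user_followers) pairs copied from the table above
rows = [(0.0, 301.0), (0.4019, 16546.0), (0.4939, 16813.0)]
for compound, followers in rows:
    print(followers, compound, round(weighted_score(compound, followers), 6))
```

Because the weight grows linearly with follower count, a mildly positive tweet from a large account outweighs a strongly positive tweet from a small one, and a tweet with a compound of 0.0 scores 0 regardless of audience.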


### Save locally

In [7]:
tweets.to_csv(f'{directory}/data/sampled_data/tweets_scores_v2.csv')