## Sentiment Intensity Analysis

VADER (Valence Aware Dictionary and Sentiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.

We use this sentiment analysis to compute, for each tweet, a score that combines the sentiment of its text with the importance of the account that posted it.
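
To make "lexicon and rule-based" concrete, here is a toy scorer in the same spirit. It is not VADER. The four-word lexicon, its valences, the three-token negation window and the `-0.74` damping factor are all illustrative assumptions. The sketch only shows how a lexicon lookup and a single rule (negation) combine into a raw sentiment sum.

```python
# Hypothetical mini-lexicon: word -> valence (illustrative values, not VADER's)
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "crash": -1.7}
NEGATIONS = {"not", "never", "no"}
NEGATION_FACTOR = -0.74  # assumed damping factor for negated words
WINDOW = 3  # how many preceding tokens are checked for a negation


def toy_sentiment(text: str) -> float:
    """Sum lexicon valences, flipping and damping words preceded by a negation."""
    tokens = text.lower().split()
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token, 0.0)
        # Rule: a negation among the previous WINDOW tokens flips the sign
        # and shrinks the magnitude of this word's valence
        if valence and any(t in NEGATIONS for t in tokens[max(0, i - WINDOW):i]):
            valence *= NEGATION_FACTOR
        total += valence
    return total


print(toy_sentiment("bitcoin is good"))      # positive
print(toy_sentiment("bitcoin is not good"))  # negated, so negative
print(toy_sentiment("btc bitcoin btcusd"))   # no lexicon words, so 0.0
```

Note how a tweet made only of tickers and hashtags scores 0, which is also what VADER does for many of the tweets in this dataset.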

### Libraries

In [1]:
import numpy as np
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from tqdm import tqdm

Set the project directory and load the cleaned tweets:

In [2]:
directory = '~/PycharmProjects/tfm_hugopobil'
tweets = pd.read_csv(f'{directory}/data/sampled_data/tweets_clean_v2.csv', low_memory=False)

In [3]:
print(tweets.shape)
tweets.head()

(23200, 15)


|   | index | user_name | user_location | user_description | user_created | user_followers | user_friends | user_favourites | user_verified | date | text | hashtags | source | is_retweet | sample_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21512 | Kur Kaç Oldu? |  | Günlük Döviz Kurları /\n\nDaily Currency Excha... | 2016-08-14 08:28:42 | 2158.0 | 1.0 | 3.0 | False | 2021-02-05 11:00:03 | bitcoin bitcoin btc btcusd | ['bitcoin', 'btc', 'BTCUSD'] | KurKacOldu | False | 2021-02-05 |
| 1 | 21499 | EM_CryPT0 | Nederland | ▪️@CryptoBrothers5 Team ▪️💯% #Crypto▪️#BTC ▪️N... | 2010-07-12 17:04:23 | 16100.0 | 602.0 | 1014.0 | False | 2021-02-05 11:08:30 | to do or not to do crypto btc bitcoin ethereum | ['crypto', 'btc', 'Bitcoin', 'Ethereum'] | Twitter for iPhone | False | 2021-02-05 |
| 2 | 21498 | aWebAnalysis \| Crypto | Blockchain | Cryptocurrencies price monitor & analysis tool... | 2017-08-30 19:26:58 | 1878.0 | 1454.0 | 33.0 | False | 2021-02-05 11:10:02 | bitcoin btc current price hour hours days btc ... | ['btc', 'bitcoin'] | AwebAnalysis | False | 2021-02-05 |
| 3 | 21496 | Mr Fulcanelli | Argentina | be decentralized, be a smart contract | 2010-08-23 20:41:38 | 157.0 | 96.0 | 8570.0 | False | 2021-02-05 11:10:39 | node for bitcoin blockchain btc | ['Bitcoin', 'blockchain', 'BTC'] | Twitter for iPhone | False | 2021-02-05 |
| 4 | 21490 | PCEX Member: India's Trusted BTC & Crypto Exch... | India | PCEX Member is #India's fastest, reliable and ... | 2020-04-15 08:18:20 | 319.0 | 135.0 | 455.0 | False | 2021-02-05 11:14:54 | there may be other currencies like it that may... |  | Twitter Web App | False | 2021-02-05 |


### Analyzer model definition

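Instantiate the analyzer and apply it to every tweet. Of the polarity scores VADER returns (`neg`, `neu`, `pos` and `compound`), we keep the compound score, a normalized value between -1 (most negative) and +1 (most positive), and store it in the `compound` column of the dataframe.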
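
VADER turns the raw sum of word valences (after its rules have been applied) into the compound score by squashing it into [-1, 1]. A minimal sketch of that normalization step, assuming the `x / sqrt(x**2 + alpha)` form with `alpha = 15` that we understand the vaderSentiment source to use; the function name `normalize` here is our own:

```python
import math


def normalize(raw_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into [-1, 1] (assumed VADER-style form)."""
    norm = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    # Clamp as a safeguard against floating-point overshoot
    return max(-1.0, min(1.0, norm))


for raw in (0.0, 1.0, -1.0, 4.0, 50.0):
    print(raw, round(normalize(raw), 4))
```

A raw sum of 1 maps to 0.25, and large sums approach but never exceed ±1, so compound values stay comparable across tweets of different lengths.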

In [4]:
analyzer = SentimentIntensityAnalyzer()

# Compound is the normalized intensity of the sentiment detected in each tweet
compound = []
for s in tqdm(tweets['text'], position=0, leave=True):
    # str() guards against missing texts (NaN becomes the string 'nan');
    # replace this call to test a different sentiment analyzer
    vs = analyzer.polarity_scores(str(s))
    compound.append(vs['compound'])

tweets['compound'] = compound
tweets.head(100)

100%|██████████| 23200/23200 [00:02<00:00, 11249.55it/s]


|   | index | user_name | user_location | user_description | user_created | user_followers | user_friends | user_favourites | user_verified | date | text | hashtags | source | is_retweet | sample_date | compound |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21512 | Kur Kaç Oldu? |  | Günlük Döviz Kurları /\n\nDaily Currency Excha... | 2016-08-14 08:28:42 | 2158.0 | 1.0 | 3.0 | False | 2021-02-05 11:00:03 | bitcoin bitcoin btc btcusd | ['bitcoin', 'btc', 'BTCUSD'] | KurKacOldu | False | 2021-02-05 | 0.0000 |
| 1 | 21499 | EM_CryPT0 | Nederland | ▪️@CryptoBrothers5 Team ▪️💯% #Crypto▪️#BTC ▪️N... | 2010-07-12 17:04:23 | 16100.0 | 602.0 | 1014.0 | False | 2021-02-05 11:08:30 | to do or not to do crypto btc bitcoin ethereum | ['crypto', 'btc', 'Bitcoin', 'Ethereum'] | Twitter for iPhone | False | 2021-02-05 | 0.0000 |
| 2 | 21498 | aWebAnalysis \| Crypto | Blockchain | Cryptocurrencies price monitor & analysis tool... | 2017-08-30 19:26:58 | 1878.0 | 1454.0 | 33.0 | False | 2021-02-05 11:10:02 | bitcoin btc current price hour hours days btc ... | ['btc', 'bitcoin'] | AwebAnalysis | False | 2021-02-05 | 0.0000 |
| 3 | 21496 | Mr Fulcanelli | Argentina | be decentralized, be a smart contract | 2010-08-23 20:41:38 | 157.0 | 96.0 | 8570.0 | False | 2021-02-05 11:10:39 | node for bitcoin blockchain btc | ['Bitcoin', 'blockchain', 'BTC'] | Twitter for iPhone | False | 2021-02-05 | 0.0000 |
| 4 | 21490 | PCEX Member: India's Trusted BTC & Crypto Exch... | India | PCEX Member is #India's fastest, reliable and ... | 2020-04-15 08:18:20 | 319.0 | 135.0 | 455.0 | False | 2021-02-05 11:14:54 | there may be other currencies like it that may... |  | Twitter Web App | False | 2021-02-05 | 0.4019 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 20782 | HChaya 🇨🇦 | Montréal, Canada | Visual Artist - MBA | 2009-02-03 10:47:08 | 69.0 | 255.0 | 771.0 | False | 2021-02-05 16:23:07 | hours availability degrees of services and saf... | ['Bitcoin', 'btc'] | Twitter for iPhone | False | 2021-02-05 | 0.7269 |
| 96 | 20775 | CryptoVision |  |  | 2019-12-10 17:49:36 | 382.0 | 549.0 | 2519.0 | False | 2021-02-05 16:27:14 | cmefutures listing eth eth ethereum on monday ... | ['cmefutures', 'eth', 'ethereum', 'btc'] | Twitter Web App | False | 2021-02-05 | 0.0000 |
| 97 | 20757 | Kelvynn AB |  | 💜 RnB/Soul 💜 | 2015-06-16 09:19:10 | 601.0 | 385.0 | 743.0 | False | 2021-02-05 16:33:11 | when other nations are out there making life e... |  | Twitter for iPhone | False | 2021-02-05 | 0.6908 |
| 98 | 20738 | Aloha Maui |  | Self-made multi-millionaire - stocks, bonds, b... | 2020-08-03 19:49:47 | 224.0 | 129.0 | 985.0 | False | 2021-02-05 16:44:19 | twitter ceo jack dorsey has set up his own bit... | ['BTC', 'Bitcoin', 'Twitter'] | Twitter for iPad | False | 2021-02-05 | 0.0000 |
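
The loop above can also be written with `Series.map`. The sketch below uses a hypothetical `StubAnalyzer` that only mimics a `polarity_scores(text)` call returning a dict with a `compound` key, so it runs without vaderSentiment installed; in the notebook the real `analyzer` would be passed instead. The helper name `add_compound` is also our own.

```python
import pandas as pd


class StubAnalyzer:
    """Hypothetical stand-in exposing a VADER-like polarity_scores(text) call."""

    def polarity_scores(self, text):
        # Toy rule: any text containing "good" is mildly positive
        return {"compound": 0.5 if "good" in text else 0.0}


def add_compound(df, analyzer, text_col="text"):
    """Return a copy of df with a 'compound' column computed per row."""
    out = df.copy()
    # astype(str) mirrors str(s) in the loop, so missing texts become strings
    out["compound"] = out[text_col].astype(str).map(
        lambda s: analyzer.polarity_scores(s)["compound"]
    )
    return out


sample = pd.DataFrame({"text": ["good coin", None, "btc btcusd"]})
print(add_compound(sample, StubAnalyzer()))
```

Returning a copy keeps the input frame untouched, which makes it easy to compare several analyzers side by side.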


### Calculate score for each tweet

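We weight each compound score by the importance and relevance of the account that posted the tweet. The draft formula also multiplied by `user_favourites` and normalized by a sum (`Score = Compound * user_followers * user_favourites / sum(`, left unfinished), but the cell below uses only the follower count $F_i$ of the account behind tweet $i$:

$$\text{score}_i = \text{compound}_i \times \frac{F_i + 1}{1000}$$

The $+1$ keeps tweets from accounts with no followers from being zeroed out, and dividing by 1000 only rescales the values. Tweets whose follower count cannot be converted to an integer get a missing score (`NaN`).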

In [5]:
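scores = []
for _, row in tqdm(tweets.iterrows(),
                   total=tweets.shape[0],
                   position=0,
                   leave=True):
    try:
        # Weight the sentiment by follower count; +1 avoids zeroing out
        # accounts with no followers, /1000 only rescales the result
        scores.append(row["compound"] * (int(row["user_followers"]) + 1) / 1000)
    except (TypeError, ValueError):
        # Missing or non-numeric follower counts (e.g. NaN) get no score
        scores.append(np.nan)

tweets["score"] = scores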

100%|██████████| 23200/23200 [00:00<00:00, 39169.22it/s]
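
The same score can be computed without `iterrows`, which is usually much faster on large frames. A sketch with a hypothetical helper `follower_weighted_score`: `pd.to_numeric(..., errors="coerce")` turns unparseable follower counts into `NaN`, similar to the `except` branch, and `np.trunc` mimics the truncation done by `int()`. One difference to keep in mind: a numeric string such as `"2158.0"` is converted here, whereas `int("2158.0")` raises and the loop stores `NaN`.

```python
import numpy as np
import pandas as pd


def follower_weighted_score(df):
    """Vectorized version of compound * (int(followers) + 1) / 1000."""
    # Non-numeric or missing follower counts become NaN, as in the except branch
    followers = pd.to_numeric(df["user_followers"], errors="coerce")
    # np.trunc mimics int(), which truncates toward zero
    return df["compound"] * (np.trunc(followers) + 1) / 1000


sample = pd.DataFrame({
    "compound": [0.4019, 0.0, 0.5, 0.3],
    "user_followers": [319.0, 2158.0, np.nan, "abc"],
})
print(follower_weighted_score(sample))
```

In the notebook this would be used as `tweets["score"] = follower_weighted_score(tweets)`.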


In [6]:
tweets.columns

Index(['index', 'user_name', 'user_location', 'user_description',
       'user_created', 'user_followers', 'user_friends', 'user_favourites',
       'user_verified', 'date', 'text', 'hashtags', 'source', 'is_retweet',
       'sample_date', 'compound', 'score'],
      dtype='object')

In [7]:
tweets[['text', 'sample_date', 'user_followers', 'compound', 'score']].head()

|   | text | sample_date | user_followers | compound | score |
|---|---|---|---|---|---|
| 0 | bitcoin bitcoin btc btcusd | 2021-02-05 | 2158.0 | 0.0 | 0.0 |
| 1 | to do or not to do crypto btc bitcoin ethereum | 2021-02-05 | 16100.0 | 0.0 | 0.0 |
| 2 | bitcoin btc current price hour hours days btc ... | 2021-02-05 | 1878.0 | 0.0 | 0.0 |
| 3 | node for bitcoin blockchain btc | 2021-02-05 | 157.0 | 0.0 | 0.0 |
| 4 | there may be other currencies like it that may... | 2021-02-05 | 319.0 | 0.4019 | 0.128608 |
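
As a sanity check of the scoring cell, which computes compound × (followers + 1) / 1000, row 4 (compound 0.4019, 319 followers) gives:

$$0.4019 \times \frac{319 + 1}{1000} = 0.4019 \times 0.32 = 0.128608$$

Rows 0 to 3 have a compound of 0, so their score is 0 regardless of follower count.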


In [9]:
tweets[tweets['score'] == 0].count()

index               7773
user_name           7773
user_location       4242
user_description    7234
user_created        7773
user_followers      7773
user_friends        7773
user_favourites     7773
user_verified       7773
date                7773
text                7771
hashtags            7408
source              7641
is_retweet          7772
sample_date         7773
compound            7773
score               7773
dtype: int64
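
Because `count()` reports non-null values per column, the output above says that 7773 of the 23200 tweets (about 33.5%) have a score of exactly zero; the lower `text` count shows that two of them have no text at all, which `str()` turns into `'nan'` before scoring. Since the follower weight is positive for non-negative follower counts, a zero score means VADER returned a neutral compound of 0. A small sketch (hypothetical helper `zero_score_summary`) that reports the count and share directly:

```python
import pandas as pd


def zero_score_summary(scores):
    """Return (count, share) of rows whose score is exactly zero."""
    is_zero = scores == 0  # NaN compares as False, so missing scores are not counted
    return int(is_zero.sum()), float(is_zero.mean())


print(zero_score_summary(pd.Series([0.0, 0.128608, 0.0, float("nan")])))
```

In the notebook, `zero_score_summary(tweets["score"])` would return the count together with its share of all tweets.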

### Save locally

In [7]:
tweets.to_csv(f'{directory}/data/sampled_data/tweets_scores_v2.csv')
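
Note that `to_csv` writes the DataFrame index by default, and pandas reads that unnamed column back as `Unnamed: 0`. If the next notebook does not need it, pass `index=False` when saving or `index_col=0` when loading. A small round-trip check with a hypothetical helper `roundtrip_columns`:

```python
import io

import pandas as pd


def roundtrip_columns(df, index):
    """Write df to an in-memory CSV and return the columns read back."""
    buf = io.StringIO()
    df.to_csv(buf, index=index)
    buf.seek(0)
    return list(pd.read_csv(buf).columns)


sample = pd.DataFrame({"text": ["btc"], "score": [0.0]})
print(roundtrip_columns(sample, index=True))   # includes 'Unnamed: 0'
print(roundtrip_columns(sample, index=False))  # only the original columns
```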