## Sentiment Intensity Analysis

VADER (Valence Aware Dictionary and Sentiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.

We will use VADER's sentiment analysis of the tweets to calculate a score that represents the importance of each tweet.

### Libraries

In [1]:
import numpy as np
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from tqdm import tqdm

Fixed variables and data loading:

In [3]:
directory = '~/PycharmProjects/tfm_hugopobil'
tweets = pd.read_csv(f'{directory}/data/sampled_data/tweets_clean_v2.csv', low_memory=False)

In [4]:
print(tweets.shape)
tweets.head()

(23200, 15)


Unnamed: 0,index,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,is_retweet,sample_date
0,21519,Rahul Chahal,,#Bitcoin #BTC,2009-03-26 10:41:20,100.0,388.0,3401.0,False,2021-02-05 10:53:49,Bitcoin and ETH both have bullish setups for a...,"['Bitcoin', 'ETH', 'BTC']",Twitter for iPad,False,2021-02-05
1,21507,Iconic Funds,"Frankfurt, Germany",Professional Crypto Asset Management\nhttps://...,2017-08-03 10:44:25,16813.0,818.0,1201.0,False,2021-02-05 11:00:24,4⃣ 🎙️ Bloomberg LP CryptoOutlook 2021 with ⬇️...,"['CryptoOutlook', 'cryptocurrency', 'bitcoin',...",Twitter Web App,False,2021-02-05
2,21491,CryptoSquawk,Australia,24x7 Crypto market real-time audio squawk for ...,2017-10-05 10:13:09,1282.0,25.0,72.0,False,2021-02-05 11:14:46,⬇️⬇️ $BTC SELLING PRESSURE ALERT 📉 Price tradi...,"['Bitcoin', 'BTC', 'crypto']",CryptoSquawkBot,False,2021-02-05
3,21483,Drew MacMartin,,"Co-Founder Wealth Playbook, Macro Economics & ...",2016-03-05 15:34:35,73.0,122.0,274.0,False,2021-02-05 11:20:02,"If hyperinflation does hit again, think of the...","['hyperinflation', 'inflation', 'flood', 'GOLD...",Twitter for iPhone,False,2021-02-05
4,21443,DeriBot.info,"Chicago, IL",https://t.co/HsKyGWNVqh - The Fastest Cloud Se...,2013-07-27 07:40:54,1973.0,127.0,631.0,False,2021-02-05 11:45:50,DeriBot Daily Trading Report 5.02.2021 11:42 U...,['Bitcoin'],Twitter Web App,False,2021-02-05


### Analyzer model definition

Define the analyzer and apply it to the tweets. For each tweet we obtain a polarity score, which is stored in the compound column of the dataframe.
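For context on what the compound value represents: to our understanding of the reference vaderSentiment implementation, the summed valence of a text is squashed into the range [-1, 1] with x / sqrt(x^2 + alpha), where alpha defaults to 15. The sketch below reproduces only that normalization step in plain Python. The function name normalize_compound is hypothetical and is not part of the library.

```python
import math


def normalize_compound(raw_sum, alpha=15.0):
    """Squash an unbounded summed valence into the range [-1, 1].

    Assumption: mirrors the normalization used for VADER's compound
    score, x / sqrt(x^2 + alpha) with alpha = 15.
    """
    norm = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    # Clamp as a safeguard against values outside the interval
    return max(-1.0, min(1.0, norm))


# Larger absolute sums move towards -1 or 1 but never exceed them
for raw in (-4.0, 0.0, 1.5, 4.0):
    print(raw, round(normalize_compound(raw), 4))
```

Because of this squashing, a compound of 0.0 means the analyzer found no net valence in the text, not that the text was skipped.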

In [5]:
analyzer = SentimentIntensityAnalyzer()

# Compound is the overall score VADER gives to the sentiment detected in a tweet
compound = []

for text in tqdm(tweets['text'], position=0, leave=True):
    # The scoring call below can be swapped to test a different sentiment analysis
    vs = analyzer.polarity_scores(str(text))
    compound.append(vs['compound'])

tweets['compound'] = compound
tweets.head()

100%|██████████| 23200/23200 [00:02<00:00, 9932.85it/s] 


Unnamed: 0,index,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,is_retweet,sample_date,compound
0,21519,Rahul Chahal,,#Bitcoin #BTC,2009-03-26 10:41:20,100.0,388.0,3401.0,False,2021-02-05 10:53:49,Bitcoin and ETH both have bullish setups for a...,"['Bitcoin', 'ETH', 'BTC']",Twitter for iPad,False,2021-02-05,0.6249
1,21507,Iconic Funds,"Frankfurt, Germany",Professional Crypto Asset Management\nhttps://...,2017-08-03 10:44:25,16813.0,818.0,1201.0,False,2021-02-05 11:00:24,4⃣ 🎙️ Bloomberg LP CryptoOutlook 2021 with ⬇️...,"['CryptoOutlook', 'cryptocurrency', 'bitcoin',...",Twitter Web App,False,2021-02-05,0.0
2,21491,CryptoSquawk,Australia,24x7 Crypto market real-time audio squawk for ...,2017-10-05 10:13:09,1282.0,25.0,72.0,False,2021-02-05 11:14:46,⬇️⬇️ $BTC SELLING PRESSURE ALERT 📉 Price tradi...,"['Bitcoin', 'BTC', 'crypto']",CryptoSquawkBot,False,2021-02-05,0.0
3,21483,Drew MacMartin,,"Co-Founder Wealth Playbook, Macro Economics & ...",2016-03-05 15:34:35,73.0,122.0,274.0,False,2021-02-05 11:20:02,"If hyperinflation does hit again, think of the...","['hyperinflation', 'inflation', 'flood', 'GOLD...",Twitter for iPhone,False,2021-02-05,0.4215
4,21443,DeriBot.info,"Chicago, IL",https://t.co/HsKyGWNVqh - The Fastest Cloud Se...,2013-07-27 07:40:54,1973.0,127.0,631.0,False,2021-02-05 11:45:50,DeriBot Daily Trading Report 5.02.2021 11:42 U...,['Bitcoin'],Twitter Web App,False,2021-02-05,0.0
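To read the compound values in the output above, the VADER documentation suggests cut-offs of 0.05 and -0.05 for positive and negative text, with anything in between treated as neutral. The helper below is a minimal sketch of that convention. The name sentiment_label and the threshold parameter are illustrative assumptions, and the notebook itself does not use these labels.

```python
def sentiment_label(compound, threshold=0.05):
    """Map a compound score to a coarse label using the commonly cited cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound values taken from the first rows shown above
for value in (0.6249, 0.0, 0.4215):
    print(value, sentiment_label(value))
```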


### Calculate score for each tweet

We weight each tweet's compound score by the importance and relevance of the account that posted it. The cell below uses the number of followers as the weight (user_favourites is not part of the computed score):

Score = compound * (user_followers + 1) / 1000

In [6]:
scores = []
for i, s in tqdm(tweets.iterrows(),
                 total=tweets.shape[0],
                 position=0,
                 leave=True):

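    try:
        # Scores are weighted by the relevance (follower count) of the tweet's poster
        # Add 1 so accounts with zero followers still carry some weight
        # Divide by 1000 to scale the scores
        scores.append(s["compound"] * (int(s["user_followers"]) + 1) / 1000)

    except (ValueError, TypeError):
        # Missing or non-numeric follower counts give a NaN score
        scores.append(np.nan)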

tweets["score"] = scores

100%|██████████| 23200/23200 [00:00<00:00, 40252.20it/s]
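As a quick check of the weighting in the loop above, the same arithmetic can be written as a small standalone function and applied to individual values. This is a sketch only: the name tweet_score and the scale parameter are hypothetical, and math.nan stands in for np.nan so that the example needs no third-party packages.

```python
import math


def tweet_score(compound, followers, scale=1000):
    """Weight a compound score by follower count, as in the loop above."""
    try:
        return compound * (int(followers) + 1) / scale
    except (ValueError, TypeError):
        # Missing or non-numeric follower counts give a NaN score
        return math.nan


# First row shown earlier: compound 0.6249, 100 followers
print(tweet_score(0.6249, 100.0))
# A neutral tweet scores 0 regardless of the account's reach
print(tweet_score(0.0, 16813.0))
# A missing follower count propagates as NaN
print(tweet_score(0.5, float("nan")))
```

For the first row this gives 0.6249 * 101 / 1000, roughly 0.063. The second case shows a consequence of the multiplicative form: a tweet with a compound of 0.0 always receives a score of 0, however many followers the account has.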


### Save to local:

In [7]:
tweets.to_csv(f'{directory}/data/sampled_data/tweets_scores_v2.csv')