## Sentiment Analysis (clean_tweets) - VADER

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiment expressed in social media.

VADER takes into account:
- negations and contractions ("not good", "wasn't good")
- punctuation ("good!!!"), capitalization, emoticons such as :), and emojis
- intensifiers ("very", "kind of") and acronyms such as "lol"

It returns scores between -1.0 (negative) and 1.0 (positive).
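As a minimal sketch of how these scores could be obtained and bucketed: `SentimentIntensityAnalyzer.polarity_scores` returns a dict with `neg`, `neu`, `pos`, and a normalized `compound` value in the range -1.0 to 1.0. The helper names `compound_to_label` and `score_texts` are hypothetical, and the 0.05 cutoff is the convention suggested in the VADER README, not something this notebook defines.

```python
def compound_to_label(compound, threshold=0.05):
    """Bucket a VADER compound score into POS / NEG / NEU.

    The 0.05 cutoff follows the convention in the VADER README; it is an
    assumption here, not a value taken from this notebook.
    """
    if compound >= threshold:
        return "POS"
    if compound <= -threshold:
        return "NEG"
    return "NEU"


def score_texts(texts, analyzer):
    """Return a (compound, label) pair for each text.

    `analyzer` is any object exposing polarity_scores(text) -> dict with a
    "compound" key, such as vaderSentiment's SentimentIntensityAnalyzer().
    """
    results = []
    for text in texts:
        compound = analyzer.polarity_scores(text)["compound"]
        results.append((compound, compound_to_label(compound)))
    return results
```

With the imports used in this notebook, `score_texts(tweets['text'].astype(str), SentimentIntensityAnalyzer())` would yield one pair per tweet.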

We will use this sentiment analysis of the tweets to calculate a score that will represent the importance of each tweet.

In [2]:
from time import sleep
import json
import io
import re
import numpy as np
import matplotlib.pyplot as plot
import seaborn as sb
import pandas as pd
from tqdm import tnrange, tqdm_notebook, tqdm
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import CountVectorizer
from pysentimiento import create_analyzer

from sklearn import preprocessing
import matplotlib.pyplot as plt

In [3]:
tweets = pd.read_csv('tweets_clean.csv', low_memory=False)

In [4]:
print(tweets.shape)
tweets.head()

(23475, 14)


Unnamed: 0,index,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,is_retweet
0,1545110,Muhammad Sa'ad,,Dota 2,2021-08-12 09:52:26,8.0,1089.0,1403.0,False,2021-10-28 12:30:44,"Airdrop is live , dont be late sir :)\n\n\n\n\...","['Airdrop', 'Airdrops', 'Airdropinspector', 'B...",Twitter for Android,False
1,1431064,Kikan4444,,share the positive vibes of the universe 🧲. 💜 ...,2021-07-09 15:46:05,346.0,214.0,1534.0,False,2021-10-22 05:47:45,something Big is coming \n$KLV Klever 💜🌟\n\nBT...,"['Klever', 'BTC', 'ETH', 'BNB', 'TRX']",Twitter for Android,False
2,1070632,Kripto Tiger 🇹🇷,,#bsc #bscgem #gem #shrew #gmrfinance #nftart,2020-11-07 07:37:33,164.0,298.0,5577.0,False,2021-08-16 09:58:44,Easy money with \n\n bsc bnb BinanceSmartChain...,"['bsc', 'bnb', 'BinanceSmartChain', 'binance',...",Twitter Web App,False
3,1276832,Sheesh | @NanoDogecoin | INDC,,#Staking #INDC!\n#Stake #NanoDogecoin & #earn:...,2011-02-10 04:39:28,151.0,453.0,6228.0,False,2021-10-19 02:09:26,NanoDogecoin is the King👑of the BSC Blockchain...,"['NanoDogecoin', 'King', 'BSC', 'Blockchain', ...",Twitter for Android,False
4,1980309,CoinMarketDaddy (CMD),United Kingdom,CMD stands for Coin Market Daddy & Was Built B...,2018-04-19 02:19:06,2833.0,1.0,118.0,False,2021-12-30 16:03:28,"Bitcoin News Roundup for June 17, 2020 cryptoc...","['cryptocurrencies', 'bitcoin', 'crypto', 'cry...",SocialBee.io v2,False


### Create a DataFrame column with the sentiment analysis:

In [7]:
analyzer = create_analyzer(task='sentiment', lang='en')

loading configuration file https://huggingface.co/finiteautomata/bertweet-base-sentiment-analysis/resolve/main/config.json from cache at /Users/hpp/.cache/huggingface/transformers/cb09766f7ba60b5f7a1bb640617b24f1499c4a6f3ab160c4a0ac171e3a377c68.008dca06003188334001a96363da79ced4944abc68d94a2f1e0db786dc5aa08b
Model config RobertaConfig {
  "_name_or_path": "finiteautomata/bertweet-base-sentiment-analysis",
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "NEG",
    "1": "NEU",
    "2": "POS"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "NEG": 0,
    "NEU": 1,
    "POS": 2
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 130,
  "model_type": "roberta",
  "num_attention_heads": 12,

In [8]:
tweets

Unnamed: 0,index,user_name,user_location,user_description,user_created,user_followers,user_friends,user_favourites,user_verified,date,text,hashtags,source,is_retweet
0,1545110,Muhammad Sa'ad,,Dota 2,2021-08-12 09:52:26,8.0,1089.0,1403.0,False,2021-10-28 12:30:44,"Airdrop is live , dont be late sir :)\n\n\n\n\...","['Airdrop', 'Airdrops', 'Airdropinspector', 'B...",Twitter for Android,False
1,1431064,Kikan4444,,share the positive vibes of the universe 🧲. 💜 ...,2021-07-09 15:46:05,346.0,214.0,1534.0,False,2021-10-22 05:47:45,something Big is coming \n$KLV Klever 💜🌟\n\nBT...,"['Klever', 'BTC', 'ETH', 'BNB', 'TRX']",Twitter for Android,False
2,1070632,Kripto Tiger 🇹🇷,,#bsc #bscgem #gem #shrew #gmrfinance #nftart,2020-11-07 07:37:33,164.0,298.0,5577.0,False,2021-08-16 09:58:44,Easy money with \n\n bsc bnb BinanceSmartChain...,"['bsc', 'bnb', 'BinanceSmartChain', 'binance',...",Twitter Web App,False
3,1276832,Sheesh | @NanoDogecoin | INDC,,#Staking #INDC!\n#Stake #NanoDogecoin & #earn:...,2011-02-10 04:39:28,151.0,453.0,6228.0,False,2021-10-19 02:09:26,NanoDogecoin is the King👑of the BSC Blockchain...,"['NanoDogecoin', 'King', 'BSC', 'Blockchain', ...",Twitter for Android,False
4,1980309,CoinMarketDaddy (CMD),United Kingdom,CMD stands for Coin Market Daddy & Was Built B...,2018-04-19 02:19:06,2833.0,1.0,118.0,False,2021-12-30 16:03:28,"Bitcoin News Roundup for June 17, 2020 cryptoc...","['cryptocurrencies', 'bitcoin', 'crypto', 'cry...",SocialBee.io v2,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
23470,1804718,brettmurphynet,"Bay Area, CA",#affiliate #bitcoin #business #blogger #crypto...,2020-08-12 05:34:02,335.0,328.0,33.0,False,2021-11-18 16:24:07,#linkedin #twitter #facebook #instagram #bitco...,"['linkedin', 'twitter', 'facebook', 'instagram...",ContentStudio.io,False
23471,1973638,Moneymakerr777,,,2020-06-22 09:46:33,98.0,237.0,7100.0,False,2021-12-30 19:30:23,I'm new to Bitcoin donate if you can this is m...,['Bitcoin'],Twitter for Android,False
23472,1728273,Jimbo4901,Mars,#JoeMustGo\n#letsGoBrandon #FJB \n17° 1.6.21Si...,2021-05-20 20:13:36,452.0,925.0,6949.0,False,2021-11-12 01:31:10,Diamond Hands? Only 12.9% of #Bitcoin Supply R...,['Bitcoin'],Twitter for Android,False
23473,1184203,Muhammad Shoaib,"Lahore, Pakistan","Freelancer 📚,\nblogger 🖊️,\nDigital & Social M...",2021-05-16 11:41:46,23.0,30.0,638.0,False,2021-08-26 14:43:36,Future Project:\n• Unity Launch Pad\n• UnitySw...,"['UnityProtocol', 'unitycol', 'BTC', 'BNB', 'a...",Twitter for Android,False


In [9]:
%%time
tweets['sentiment'] = analyzer.predict(tweets['text'])

  0%|          | 0/734 [00:00<?, ?ba/s]

The following columns in the test set  don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Prediction *****
  Num examples = 23475
  Batch size = 32



KeyboardInterrupt
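The run above was interrupted before all 23,475 tweets were processed. One way to keep partial results is to predict in chunks. The sketch below uses a hypothetical helper, `predict_in_chunks`, and takes the prediction function as a parameter (in this notebook that would be `analyzer.predict`). The default chunk size of 1000 is arbitrary.

```python
def predict_in_chunks(texts, predict_fn, chunk_size=1000):
    """Run predict_fn over texts in chunks, keeping results if interrupted.

    predict_fn must accept a list of strings and return one prediction per
    string. The predictions gathered so far are returned, so a
    KeyboardInterrupt only loses the chunk that was in flight.
    """
    texts = list(texts)
    predictions = []
    try:
        for start in range(0, len(texts), chunk_size):
            chunk = texts[start:start + chunk_size]
            predictions.extend(predict_fn(chunk))
    except KeyboardInterrupt:
        print(f"Interrupted after {len(predictions)} of {len(texts)} texts")
    return predictions
```

A possible call is `preds = predict_in_chunks(tweets['text'].astype(str), analyzer.predict)`. The result can be assigned to `tweets['sentiment']` only if `len(preds) == len(tweets)`.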



In [None]:
tweets['sentiment'].iloc[0].output

In [None]:
tweets['simplified_sentiment'] = tweets["sentiment"].apply(lambda x: x.output)
tweets.simplified_sentiment.value_counts()
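Each pysentimiento prediction carries a label in `.output` and per-class probabilities in `.probas`. Because the raw prediction objects would be written to CSV only as their string representation, one option is to flatten them into plain columns first. The helper `flatten_prediction` and the `proba_` column prefix below are hypothetical names chosen for illustration.

```python
def flatten_prediction(pred):
    """Turn one prediction into a flat dict of plain Python values.

    Assumes pred has .output (a label such as "POS") and .probas
    (a dict mapping each label to its probability).
    """
    row = {"label": pred.output}
    for label, proba in pred.probas.items():
        row[f"proba_{label}"] = float(proba)
    return row
```

For example, `pd.DataFrame([flatten_prediction(p) for p in tweets['sentiment']], index=tweets.index)` would give a frame that can be joined back onto `tweets` before saving.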

### Save locally

In [None]:
tweets.to_csv('tweets_scores_PNN.csv')