<a href="https://colab.research.google.com/github/nakhimchea/sentiment_analysis_ipynb/blob/main/SmartSA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# *Import Libraries*

**SmartSA: Twitter Sentiment Analysis Applied to Strategy**

In [None]:
!pip install transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from scipy.special import softmax

import numpy
import pandas

**Twitter Scraping**

In [None]:
!pip install snscrape
import snscrape.modules.twitter as twitter

# *Loading RoBERTa model*

**Get Model**

In [None]:
RoBERTa = 'cardiffnlp/twitter-roberta-base-sentiment'
model = AutoModelForSequenceClassification.from_pretrained(RoBERTa)

**Initialize Tokenizer**

In [None]:
tokenizer = AutoTokenizer.from_pretrained(RoBERTa)

**Class Labels (in model output order)**

In [5]:
labels = ['Negative', 'Neutral', 'Positive']

# *Getting Data from Social Network (Twitter)*

**Search Query**

In [69]:
query = 'BTC ETH (BTC OR ETH OR SOL OR NEAR) -DOGE -SHIB lang:en min_replies:2 min_faves:10 min_retweets:5 since_time:1667115400'
limit = 1000
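
The `since_time` operator in the query takes a Unix timestamp in seconds. As a quick sanity check, the sketch below decodes the value used above into a readable UTC date with the standard library only. The helper name `unix_to_utc` is an illustrative assumption and not part of the notebook.

```python
from datetime import datetime, timezone

# Value copied from the since_time operator in the query string above
SINCE_TIME = 1667115400


def unix_to_utc(seconds: int) -> datetime:
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


since = unix_to_utc(SINCE_TIME)
print(since.isoformat())
```

The decoded date falls shortly before the earliest tweet in the printed DataFrame, which suggests the filter behaves as intended.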

**Query Tweets and Preprocessing**

In [70]:
tweets = []
for tweet in twitter.TwitterSearchScraper(query).get_items():
  # Stop once the requested number of tweets has been collected
  if len(tweets) == limit:
    break

  # Preprocess: mask user mentions and links with placeholder tokens
  tweetWords = []
  for word in tweet.content.split(' '):
    if word.startswith('@') and len(word) > 1:
      word = '@user'
    elif word.startswith('http'):
      word = 'http'
    tweetWords.append(word)

  tweetContent = ' '.join(tweetWords)
  tweets.append([tweet.date, tweet.username, tweetContent])

tweetsDF = pandas.DataFrame(tweets, columns=['Date', 'User', 'Tweet'])
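
The masking step in the loop above can be pulled into a small function so it can be checked without scraping anything. This is a minimal sketch of the same rule (mentions become `@user`, links become `http`). The function name `normalize_tweet` and the sample text are assumptions made for illustration.

```python
def normalize_tweet(text: str) -> str:
    """Mask mentions and links using the same rule as the scraping loop."""
    words = []
    for word in text.split(' '):
        if word.startswith('@') and len(word) > 1:
            word = '@user'   # hide the account name
        elif word.startswith('http'):
            word = 'http'    # hide the full URL
        words.append(word)
    return ' '.join(words)


sample = 'gm @alice check https://example.com now'  # made-up example text
print(normalize_tweet(sample))
```

Because the text is split on single spaces only, a mention that directly follows a newline (for example `'hi\n@bob'`) is left unmasked. Splitting on any whitespace would catch it, at the cost of losing the original line breaks.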

**Print DataFrame**

In [71]:
print(tweetsDF)

                         Date             User  \
0   2022-10-31 07:12:51+00:00     Sangita_gems   
1   2022-10-31 07:09:04+00:00  nftanothersigma   
2   2022-10-31 06:54:24+00:00        EthEiNino   
3   2022-10-31 06:38:22+00:00       Johnnyjhn7   
4   2022-10-31 05:57:30+00:00     BraverCrypto   
..                        ...              ...   
106 2022-10-30 08:44:12+00:00         PadiSwap   
107 2022-10-30 08:28:19+00:00       Deluxe_Ape   
108 2022-10-30 08:25:02+00:00    AltCryptoGems   
109 2022-10-30 08:24:11+00:00   CryptoBull3000   
110 2022-10-30 08:01:40+00:00       Maxi__1981   

                                                 Tweet  
0    I Found new hidden gem 💎👸 Expected 3x ⌛I will ...  
1                   Gm☕️🍪🍪🍪\n\nHappy Halloween 🎃🫡 http  
2    🎉Crocodile Gang x The Apesons🎉\n\n5 X #FreeMin...  
3    🎁 #NFTGiveaway 🎁\n\n📢You have the chance to wi...  
4    HIGH RISK TRADE\n\nAped into $BABYVINE - only ...  
..                                                 ...  


**Save Tweets**

In [None]:
tweetsDF.to_csv('tweets.csv')

# *Sentiment Analysis*

**Tweet Classifications**

In [72]:
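target = []
for tweetText in tweetsDF.Tweet:
  # Tokenize the tweet into PyTorch tensors
  encodedTweet = tokenizer(tweetText, return_tensors='pt')
  # Forward pass; the first output element holds the logits
  modelOutput = model(**encodedTweet)
  # Convert the logits of the single input into class probabilities
  probabilities = softmax(modelOutput[0][0].detach().numpy())

  # Keep the label with the highest probability
  target.append([tweetText, labels[numpy.argmax(probabilities)]])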
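
To make the last two steps of the loop concrete, the sketch below turns three logits into probabilities and picks the winning label. It uses a hand-written softmax as a stand-in for `scipy.special.softmax`, and the logit values are made up for illustration. They are not model output.

```python
import math

labels = ['Negative', 'Neutral', 'Positive']


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)                            # subtract the max to avoid overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [-1.0, 0.5, 2.0]                 # hypothetical values
probs = softmax(example_logits)
predicted = labels[probs.index(max(probs))]       # same role as numpy.argmax
print([round(p, 3) for p in probs], predicted)
```

Softmax preserves the ordering of the logits, so the predicted label is the same whether the argmax is taken before or after it. The probabilities only matter if a per-tweet score is needed later.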

**Classification Table**

In [73]:
targetTable = pandas.DataFrame(target, columns=['Tweet', 'Annotation'])
print(targetTable)

                                                 Tweet Annotation
0    I Found new hidden gem 💎👸 Expected 3x ⌛I will ...   Positive
1                   Gm☕️🍪🍪🍪\n\nHappy Halloween 🎃🫡 http   Positive
2    🎉Crocodile Gang x The Apesons🎉\n\n5 X #FreeMin...    Neutral
3    🎁 #NFTGiveaway 🎁\n\n📢You have the chance to wi...   Positive
4    HIGH RISK TRADE\n\nAped into $BABYVINE - only ...    Neutral
..                                                 ...        ...
106  New #Crypto GIVEAWAY !  100 $PADI Tokens  [@Ve...    Neutral
107  🎁 Weekly #Giveaway\n\n🏆 1x WL\n⏰ 48 Hrs\n\nTo ...    Neutral
108         Would you rather have $1m in #BTC or #ETH?    Neutral
109  ⚠️⚠️⚠️ $Meishu ⚠️⚠️⚠️\n\nXBOX ?! 🤯\nCRYTEK ?! ...    Neutral
110  🚀🌓 TTM FIRST QUARTER MINT 🌓🚀\n\n➡️ 🔥 1st Novem...   Positive

[111 rows x 2 columns]


**Normalize Result**

In [74]:
countLabels = [0, 0, 0]
for index in range(0, len(targetTable.Annotation)):
  if targetTable.Annotation[index] == labels[0]:
    countLabels[0] += 1
  elif targetTable.Annotation[index] == labels[1]:
    countLabels[1] += 1
  elif targetTable.Annotation[index] == labels[2]:
    countLabels[2] += 1

print('Probability      : {:.2f} {:.2f} {:.2f}'.format(countLabels[0]/sum(countLabels), countLabels[1]/sum(countLabels), countLabels[2]/sum(countLabels)))
print('Final Analysis   : {}'.format(labels[numpy.argmax(countLabels)]))
print('Confidence       : {:.2f}%'.format(countLabels[numpy.argmax(countLabels)]/sum(countLabels)*100))

Probability      : 0.05 0.51 0.43
Final Analysis   : Neutral
Confidence       : 51.35%
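
The tally above can also be written with `collections.Counter`. The sketch below computes the same three quantities: the share of each label, the majority label, and the majority share as a percentage. The function name `summarize` and the sample annotations are hypothetical and used only for illustration.

```python
from collections import Counter

labels = ['Negative', 'Neutral', 'Positive']


def summarize(annotations):
    """Return (share per label, majority label, majority share in percent)."""
    counts = Counter(annotations)
    total = len(annotations)                      # assumes a non-empty list
    shares = {label: counts[label] / total for label in labels}
    # On ties, max keeps the first label in `labels`, like numpy.argmax
    majority = max(labels, key=lambda label: counts[label])
    return shares, majority, shares[majority] * 100


annotations = ['Positive', 'Neutral', 'Neutral', 'Negative']  # made-up sample
shares, majority, confidence = summarize(annotations)
print(shares, majority, '{:.2f}%'.format(confidence))
```

Note that the percentage is the fraction of tweets carrying the majority label, not a model probability, so it is best read as a rough agreement measure across the sample.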
