
# *Import Libraries*

**SmartSA: Analysing Twitter Sentiment and Applying It to Strategy**

In [None]:
!pip install transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from scipy.special import softmax

import numpy
import pandas

**Twitter Scraping**

In [None]:
!pip install snscrape
import snscrape.modules.twitter as twitter

# *Loading RoBERTa model*

**Get Model**

In [None]:
# RoBERTa-base checkpoint fine-tuned by Cardiff NLP for tweet sentiment
RoBERTa = 'cardiffnlp/twitter-roberta-base-sentiment'
model = AutoModelForSequenceClassification.from_pretrained(RoBERTa)

**Initialize Tokenizer**

In [None]:
tokenizer = AutoTokenizer.from_pretrained(RoBERTa)

**Class Labels**

In [5]:
# Class names in the order of the model's output indices (0, 1, 2)
labels = ['Negative', 'Neutral', 'Positive']

# *Getting Data from Social Network (Twitter)*

**Search Query**

In [31]:
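# Space-separated terms are ANDed, -TERM excludes a term, min_* set engagement floors,
# and since_time is a Unix timestamp (1667192400 = 2022-10-31 05:00:00 UTC)
query = 'BTC (BTC OR ETH OR SOL OR NEAR) -DOGE -SHIB lang:en min_replies:2 min_faves:10 min_retweets:5 since_time:1667192400'
limit = 1000  # maximum number of tweets to collect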
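The query combines Twitter's advanced-search operators, and `since_time` takes seconds since the Unix epoch. The sketch below uses a hypothetical `build_query` helper, which is not part of snscrape, to assemble the same string from its parts. It also derives the timestamp from a UTC `datetime`, so the search window is easier to change.

```python
from datetime import datetime, timezone

def build_query(required, any_of, exclude, lang, min_replies, min_faves, min_retweets, since):
    """Hypothetical helper: assemble a Twitter advanced-search query string."""
    parts = [required, '(' + ' OR '.join(any_of) + ')']
    parts += ['-' + term for term in exclude]  # excluded terms
    parts += [
        f'lang:{lang}',
        f'min_replies:{min_replies}',
        f'min_faves:{min_faves}',
        f'min_retweets:{min_retweets}',
        # since_time expects seconds since the Unix epoch
        f'since_time:{int(since.timestamp())}',
    ]
    return ' '.join(parts)

query = build_query(
    required='BTC',
    any_of=['BTC', 'ETH', 'SOL', 'NEAR'],
    exclude=['DOGE', 'SHIB'],
    lang='en',
    min_replies=2,
    min_faves=10,
    min_retweets=5,
    since=datetime(2022, 10, 31, 5, 0, tzinfo=timezone.utc),
)
print(query)
```

Space-separated terms are combined with AND, so the leading `BTC` already restricts every result to tweets containing BTC. The `(BTC OR ETH OR SOL OR NEAR)` group therefore adds no extra matches. If the intent was to cover all four tickers, you would drop the leading `BTC`.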

**Query Tweets and Preprocessing**

In [32]:
tweets = []
for tweet in twitter.TwitterSearchScraper(query).get_items():
  # Stop once enough tweets have been collected
  if len(tweets) == limit:
    break

  # Preprocess: mask user mentions and links, as expected by the Twitter RoBERTa model
  tweetWords = []
  for word in tweet.content.split(' '):
    if word.startswith('@') and len(word) > 1:
      word = '@user'
    elif word.startswith('http'):
      word = 'http'
    tweetWords.append(word)
  tweetContent = ' '.join(tweetWords)

  tweets.append([tweet.date, tweet.username, tweetContent])

tweetsDF = pandas.DataFrame(tweets, columns=['Date', 'User', 'Tweet'])
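The preprocessing inside the loop can be moved into a standalone function. That makes it easy to test and lets the same rules be reused on new text at inference time. This is a sketch applying the same two rules as the loop: mentions become `@user` and links become `http`. These rules appear to follow the preprocessing suggested on the cardiffnlp model card.

```python
def preprocess(text: str) -> str:
    """Mask user mentions and URLs the same way as the scraping loop."""
    words = []
    for word in text.split(' '):
        if word.startswith('@') and len(word) > 1:
            word = '@user'  # anonymise mentions
        elif word.startswith('http'):
            word = 'http'   # collapse every link to a single token
        words.append(word)
    return ' '.join(words)

print(preprocess('@satoshi $BTC looks strong https://t.co/abc123'))
```

The function splits on single spaces, just like the loop. A mention or link that directly follows a newline (for example `gm\n@someone`) therefore stays part of a larger token and is not masked.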

**Print DataFrame**

In [33]:
print(tweetsDF)

                         Date             User  \
0   2022-10-31 15:23:11+00:00     CryptoNoob_1   
1   2022-10-31 15:18:58+00:00       JohalMiles   
2   2022-10-31 15:17:30+00:00      Justin_Bons   
3   2022-10-31 15:08:58+00:00         H_AAsfar   
4   2022-10-31 15:05:09+00:00   SatoshiFlipper   
..                        ...              ...   
186 2022-10-31 05:12:16+00:00       Bearomon10   
187 2022-10-31 05:12:02+00:00     cryptohunnys   
188 2022-10-31 05:10:10+00:00   Bybit_Official   
189 2022-10-31 05:01:21+00:00  LEXCOINOFFICIAL   
190 2022-10-31 05:00:07+00:00      mark_cullen   

                                                 Tweet  
0    Is #Bitcoin once again at its apex where it is...  
1    Going to keep posting this until the melt up s...  
2    Happy Bitcoin Whitepaper Day!\n\nPlease use al...  
3    A team working day and night to put the Gulf p...  
4    Getting a kick out of all the imminent $BTC co...  
..                                                 ...  


**Save Tweets**

In [21]:
tweetsDF.to_csv('tweets.csv')
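By default `to_csv` writes the DataFrame index as an unnamed first column. The sketch below reads such a file back, assuming it was written with the default settings. `index_col=0` restores that index and `parse_dates` turns the `Date` column back into timestamps. The two rows are made-up sample data, and an in-memory buffer stands in for `tweets.csv`.

```python
import io
import pandas

# Made-up sample rows with the same columns as tweetsDF
sample = pandas.DataFrame({
    'Date': pandas.to_datetime(['2022-10-31 15:23:11+00:00', '2022-10-31 05:00:07+00:00']),
    'User': ['user_a', 'user_b'],
    'Tweet': ['@user $BTC looks strong http', 'Happy Bitcoin Whitepaper Day!'],
})

buffer = io.StringIO()
sample.to_csv(buffer)  # default settings: the index is written too
buffer.seek(0)

# index_col=0 reuses the written index; parse_dates restores datetime values
restored = pandas.read_csv(buffer, index_col=0, parse_dates=['Date'])
print(restored.dtypes)
```

If the row numbers are not needed, passing `index=False` to `to_csv` omits the index column altogether.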

# *Sentiment Analysis*

**Tweet Classifications**

In [34]:
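import torch

target = []
for tweetText in tweetsDF.Tweet:
  # Tokenize one tweet; truncation guards against inputs longer than the model's limit
  encodedTweet = tokenizer(tweetText, return_tensors='pt', truncation=True)
  # Inference only: disable gradient tracking to save memory and time
  with torch.no_grad():
    output = model(**encodedTweet)
  # logits[0] holds the raw class scores of the single tweet in this batch
  probabilities = softmax(output.logits[0].numpy())

  target.append([tweetText, labels[numpy.argmax(probabilities)]])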
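The loop above runs the model once per tweet. Hugging Face tokenizers also accept a list of strings. With `padding=True, truncation=True` they return one padded batch, and the model then yields a 2-D logits array with one row per tweet. The sketch below covers only the post-processing of such an array. `softmax(..., axis=1)` normalises each row independently and `argmax(axis=1)` picks one label per tweet. The logits values are made up for illustration. Real ones would come from the model's `logits` output for a batch.

```python
import numpy
from scipy.special import softmax

labels = ['Negative', 'Neutral', 'Positive']

# Hypothetical logits: three tweets (rows) x three classes (columns)
logits = numpy.array([
    [ 2.1, 0.3, -1.9],
    [-0.5, 1.2,  0.1],
    [-1.8, 0.2,  2.4],
])

probabilities = softmax(logits, axis=1)    # each row now sums to 1
predicted = probabilities.argmax(axis=1)   # most likely class index per tweet
annotations = [labels[i] for i in predicted]
confidence = probabilities.max(axis=1)     # probability of the chosen class

print(annotations)
```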

**Classification Table**

In [35]:
targetTable = pandas.DataFrame(target, columns=['Tweet', 'Annotation'])
print(targetTable)

                                                 Tweet Annotation
0    Is #Bitcoin once again at its apex where it is...    Neutral
1    Going to keep posting this until the melt up s...    Neutral
2    Happy Bitcoin Whitepaper Day!\n\nPlease use al...   Positive
3    A team working day and night to put the Gulf p...   Positive
4    Getting a kick out of all the imminent $BTC co...    Neutral
..                                                 ...        ...
186  #Altcoins are popping here &amp; there some ev...    Neutral
187  NEWS: The U.S. Department of Justice has accus...   Negative
188  Pfft, hallow-what? 🎃 Celebrate the big day wit...   Positive
189  Get ready for #LEXCOIN\nKnow more: http #NFT #...    Neutral
190  #GM #Crypto \n\n#Bitcoin pulled back to the 20...    Neutral

[191 rows x 2 columns]


**Normalize Result**

In [36]:
# Count how many tweets received each label, in the order of `labels`
countLabels = [int((targetTable.Annotation == label).sum()) for label in labels]
total = sum(countLabels)
# Share of tweets per label (proportions of tweets, not model probabilities)
proportions = [count / total for count in countLabels]
majority = int(numpy.argmax(countLabels))

print('Proportion (Neg/Neu/Pos) : {:.2f} {:.2f} {:.2f}'.format(*proportions))
print('Final Analysis           : {}'.format(labels[majority]))
print('Confidence               : {:.2f}%'.format(proportions[majority] * 100))

Proportion (Neg/Neu/Pos) : 0.05 0.46 0.49
Final Analysis           : Positive
Confidence               : 49.21%
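The same summary can be computed with pandas directly. The sketch below assumes an `Annotation` column like the one in `targetTable`. `value_counts(normalize=True)` returns the share of each label. `reindex(..., fill_value=0)` keeps labels that never occur and fixes the Negative, Neutral, Positive order. The annotations below are made-up sample values.

```python
import pandas

labels = ['Negative', 'Neutral', 'Positive']

# Made-up sample annotations standing in for targetTable.Annotation
annotations = pandas.Series(['Neutral', 'Positive', 'Positive', 'Neutral', 'Positive'])

# Share of each label, in a fixed order, with 0 for labels that never occur
shares = annotations.value_counts(normalize=True).reindex(labels, fill_value=0)
final_label = shares.idxmax()  # label with the largest share

print(shares.round(2).tolist(), final_label, f'{shares.max() * 100:.2f}%')
```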
