<a href="https://colab.research.google.com/github/nakhimchea/sentiment_analysis_ipynb/blob/main/SmartSA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# *Import Libraries*

**SmartSA: Analysis of Twitter and Application to Strategy**

In [None]:
!pip install transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from scipy.special import softmax

import time
import numpy
import pandas

**Twitter Scraping**

In [None]:
!pip install snscrape
import snscrape.modules.twitter as twitter

# *Loading RoBERTa model*

**Get Model**

In [None]:
RoBERTa = 'cardiffnlp/twitter-roberta-base-sentiment'
model = AutoModelForSequenceClassification.from_pretrained(RoBERTa)

**Initialize Tokenizer**

In [None]:
tokenizer = AutoTokenizer.from_pretrained(RoBERTa)

**Class Labels**

In [5]:
labels = ['Negative', 'Neutral', 'Positive']

# *Getting Data from Social Network (Twitter)*

**Search Query**

In [80]:
query = 'BTC (BTC OR ETH OR SOL OR NEAR) -DOGE -SHIB lang:en min_replies:2 min_faves:1 min_retweets:0 since_time:1654621200 until_time:1655053200'
limit = 1000
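The `since_time` and `until_time` operators in the query are given as Unix epoch seconds, which are hard to read at a glance. As a quick sanity check, the sketch below converts the two values from the query to UTC using only the standard library. The helper name `epoch_to_utc` is illustrative and not part of snscrape.

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds: int) -> datetime:
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# The two bounds copied from the search query above
since = epoch_to_utc(1654621200)
until = epoch_to_utc(1655053200)

print(since.isoformat())  # 2022-06-07T17:00:00+00:00
print(until.isoformat())  # 2022-06-12T17:00:00+00:00
print(until - since)      # 5 days, 0:00:00
```

The window therefore covers five days, which agrees with the newest tweet timestamps printed further down.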

**Query Tweets and Preprocessing**

In [81]:
tweets = []
for tweet in twitter.TwitterSearchScraper(query).get_items():
  # Stop once the requested number of tweets has been collected
  if len(tweets) == limit:
    break

  # Preprocess: mask user mentions and links with placeholder tokens
  tweetWords = []
  for word in tweet.content.split(' '):
    if word.startswith('@') and len(word) > 1:
      word = '@user'
    elif word.startswith('http'):
      word = 'http'
    tweetWords.append(word)

  tweetContent = ' '.join(tweetWords)
  tweets.append([tweet.date, tweet.username, tweetContent])

tweetsDF = pandas.DataFrame(tweets, columns=['Date', 'User', 'Tweet'])
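The masking step can be tried in isolation, without scraping anything. The sketch below wraps the same word-by-word rule in a function; the name `mask_tweet` and the sample text are made up for illustration.

```python
def mask_tweet(text: str) -> str:
    """Replace @mentions with '@user' and links with 'http', word by word."""
    words = []
    for word in text.split(' '):
        if word.startswith('@') and len(word) > 1:
            word = '@user'   # a lone '@' is left untouched
        elif word.startswith('http'):
            word = 'http'
        words.append(word)
    return ' '.join(words)

sample = '@alice check https://example.com now @ 5pm'
print(mask_tweet(sample))  # @user check http now @ 5pm
```

Because the rule replaces the whole word, punctuation attached to a mention (for example `@alice,`) is dropped along with the handle.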

**Print DataFrame**

In [82]:
print(tweetsDF)

                         Date             User  \
0   2022-06-12 16:56:37+00:00         BigCheds   
1   2022-06-12 16:55:50+00:00  thedeliriumclub   
2   2022-06-12 16:55:07+00:00  CrazyLadyTrader   
3   2022-06-12 16:53:46+00:00           tihols   
4   2022-06-12 16:53:09+00:00          nomee83   
..                        ...              ...   
995 2022-06-11 18:29:52+00:00       JohalMiles   
996 2022-06-11 18:29:25+00:00        Andy47640   
997 2022-06-11 18:27:43+00:00    capitalist_sd   
998 2022-06-11 18:27:26+00:00  ItsAirplaneJane   
999 2022-06-11 18:24:37+00:00    davevickery12   

                                                 Tweet  
0    $BTC 4H so far rejected at confluence of under...  
1    The Real #NFTcollectors come, justify and coll...  
2    Sure I believe it got too overhyped. It was in...  
3    @user Are you considering the BTC pair as well...  
4      @user What you think today btc pump bull trap ?  
..                                                 ...  


**Save Tweets**

In [70]:
tweetsDF.to_csv('tweets.csv')

# *Sentiment Analysis*

**Tweet Classifications**

In [86]:
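target = []
for index in range(len(tweetsDF.Tweet)):
  # Tokenize one tweet and run it through the model
  encodedTweet = tokenizer(tweetsDF.Tweet[index], return_tensors='pt')
  output = model(**encodedTweet)
  # output[0][0] holds the raw scores (logits) for this tweet; softmax turns them into probabilities
  probabilities = softmax(output[0][0].detach().numpy())

  # Disabled experiment: collapse the three classes into two
  #probabilities = [probabilities[0]+probabilities[1], probabilities[1]+probabilities[2]]
  #newLabels = ['Negative', 'Positive']

  # Keep the label with the highest probability
  target.append([tweetsDF.Tweet[index], labels[numpy.argmax(probabilities)]])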
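To make the softmax and argmax step concrete, here is a minimal pure-Python sketch of what happens to one tweet's raw scores. The logits are made-up values, not real model output, and the hand-written `softmax` stands in for `scipy.special.softmax` so the snippet runs without extra packages.

```python
import math

labels = ['Negative', 'Neutral', 'Positive']

def softmax(scores):
    """Numerically stable softmax over a list of raw scores (logits)."""
    peak = max(scores)                        # subtract the max to avoid overflow
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-1.0, 2.0, 0.5]                     # hypothetical scores for one tweet
probs = softmax(logits)

# Index of the largest probability, equivalent to numpy.argmax
best = max(range(len(probs)), key=probs.__getitem__)
print(labels[best])  # Neutral
```

Softmax preserves the ordering of the scores, so the predicted label would be the same if argmax were applied to the logits directly; the probabilities matter only if they are used later.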

**Classification Table**

In [87]:
targetTable = pandas.DataFrame(target, columns=['Tweet', 'Annotation'])
print(targetTable)

                                                 Tweet Annotation
0    $BTC 4H so far rejected at confluence of under...   Negative
1    The Real #NFTcollectors come, justify and coll...    Neutral
2    Sure I believe it got too overhyped. It was in...   Negative
3    @user Are you considering the BTC pair as well...    Neutral
4      @user What you think today btc pump bull trap ?    Neutral
..                                                 ...        ...
995  DXY going back for another run at the highs.\n...    Neutral
996  @user @user @user @user Which was achieved ear...    Neutral
997  Still you people selling $ETH..?\n\n$ETH, $BTC...    Neutral
998  #BTC next #AVWAP looks to be around 26,560 on ...    Neutral
999  @user Depends on the exchange.  A lot are prob...    Neutral

[1000 rows x 2 columns]


**Normalize Result**

In [88]:
countLabels = [0, 0, 0]
for annotation in targetTable.Annotation:
  # Tally each tweet under the index of its label
  countLabels[labels.index(annotation)] += 1

print('Probability      : {:.2f} {:.2f} {:.2f}'.format(countLabels[0]/sum(countLabels), countLabels[1]/sum(countLabels), countLabels[2]/sum(countLabels)))
print('Final Analysis   : {}'.format(labels[numpy.argmax(countLabels)]))
print('Confidence       : {:.2f}%'.format(countLabels[numpy.argmax(countLabels)]/sum(countLabels)*100))

Probability      : 0.22 0.58 0.20
Final Analysis   : Neutral
Confidence       : 58.20%
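The same tally can be expressed with `collections.Counter`. The sketch below uses a five-item toy list rather than the real annotations, so the numbers are illustrative only.

```python
from collections import Counter

labels = ['Negative', 'Neutral', 'Positive']
# Toy annotations, not taken from the scraped tweets
annotations = ['Neutral', 'Negative', 'Neutral', 'Positive', 'Neutral']

counts = Counter(annotations)
total = sum(counts.values())

# Share of tweets per label, in the fixed label order
shares = {label: counts[label] / total for label in labels}

# Majority label and its share, reported as a percentage
majority, majority_count = counts.most_common(1)[0]

print(shares)  # {'Negative': 0.2, 'Neutral': 0.6, 'Positive': 0.2}
print(majority, f'{majority_count / total * 100:.2f}%')  # Neutral 60.00%
```

Note that the reported percentage is the share of tweets that received the majority label, not an average of the model's per-tweet probabilities.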
