<a href="https://colab.research.google.com/github/shreyasd1/twitter_sentiment_analysis/blob/main/twitter_sentiment_analysis_nlp.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**importing libraries**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tweepy as tw
from wordcloud import WordCloud
import re
from textblob import TextBlob
plt.style.use('fivethirtyeight') 

**Twitter API auth**

In [None]:
# Fill in your own Twitter developer credentials; never commit real keys.
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
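Hard-coding credentials in a shared notebook makes them easy to leak. One possible alternative, sketched below, is to read them from environment variables. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_credentials` are hypothetical choices for illustration, not part of tweepy.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=None):
    """Return the four credentials as a dict, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

With the variables exported, `creds = load_credentials()` gives a dict whose values can be assigned to `consumer_key`, `consumer_secret`, `access_token` and `access_token_secret`.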

In [None]:
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)

**extracting tweets**

In [None]:
tweets = api.user_timeline(screen_name = 'BillGates', count = 200, lang = 'en', tweet_mode = 'extended')
# print the five most recent tweets, numbered from 1
for i, tweet in enumerate(tweets[0:5], start=1):
  print(str(i) + ')' + tweet.full_text + '\n')

1)Africa's success against wild polio last year is one of the most inspiring stories in global health. I’m optimistic that, with the commitment and collaboration that achieved that goal, countries in the region can beat all forms of the virus and #EndPolio https://t.co/lpJsrhAQcf

2)None of us would be where we are today without the incredible teachers who helped shape our perspectives. Brooke Brown is an extraordinary teacher who has helped her students adapt to extraordinary times. https://t.co/btiQikTyVX

3).@GAP_Foundation’s Bio-Hermes Study is the first to compare results of blood and digital biomarker tests with imaging and traditional cognitive tests. This could help identify Alzheimer’s earlier and detect progression of the disease, to help us get closer to a cure. https://t.co/9xUQ0ZldSI

4)I recently sat down with the extraordinary Brooke Brown, Washington State’s 2021 Teacher of the Year. It was a pleasure to meet with her and take part in one of her favorite lessons: https:

In [None]:
df = pd.DataFrame([tweet.full_text for tweet in tweets], columns= ['tweet'])
df.insert(1, "retweets", [tweet.retweet_count for tweet in tweets], True)
df.insert(2, "likes", [tweet.favorite_count for tweet in tweets], True)
df.shape

(200, 3)

**applying stemming and removing stopwords**

In [None]:
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer           # stemming reduces inflected forms (liked, liking) to a common stem (like)
ps = PorterStemmer()                                  # one stemmer is enough for the whole loop
all_stopwords = set(stopwords.words('english'))
all_stopwords.discard('not')                          # keep 'not' because it flips the sentiment
corpus = []
for text in df['tweet']:
  # each substitution works on the result of the previous one
  tweet = re.sub(r'@[A-Za-z0-9_]+', '', text)         # removing mentions (handles may contain underscores)
  tweet = re.sub(r'#', '', tweet)                     # removing the hash sign, keeping the tag text
  tweet = re.sub(r'RT[\s]+', '', tweet)               # removing RT
  tweet = re.sub(r'https?:\/\/\S+', '', tweet)        # removing links
  tweet = tweet.lower()                               # lowercase all the letters
  tweet = tweet.split()                               # split tweet into words
  tweet = [ps.stem(word) for word in tweet if word not in all_stopwords] # stem and drop stopwords
  tweet = ' '.join(tweet)                             # join words with a space between them
  corpus.append(tweet)
print(corpus)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
["africa' success wild polio last year one inspir stori global health. i’m optimist that, commit collabor achiev goal, countri region beat form viru #endpolio", 'none us would today without incred teacher help shape perspectives. brook brown extraordinari teacher help student adapt extraordinari times.', '.@gap_foundation’ bio-herm studi first compar result blood digit biomark test imag tradit cognit tests. could help identifi alzheimer’ earlier detect progress disease, help us get closer cure.', 'recent sat extraordinari brook brown, washington state’ 2021 teacher year. pleasur meet take part one favorit lessons:', 'happi birthday, warren! you’v true friend invalu sourc advic years, i’m lucki know you. it’ honor work foundat best turn generos live save improved. here’ mani birthdays!', 'invest malaria program help build stronger health system not bring end malaria als
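The cleaning steps are meant to be cumulative: each substitution should see the output of the previous one, otherwise only the last substitution has any effect. A minimal standalone sketch of that chaining, using only the standard library. The names `PATTERNS` and `clean_tweet` and the sample sentence are made up for illustration, and stemming and stopword removal are left out.

```python
import re

# Each pattern is applied to the result of the previous one.
PATTERNS = [
    r"https?://\S+",  # links
    r"\bRT\s+",       # retweet marker
    r"@\w+",          # mentions, including underscores
    r"#",             # hash sign only; the tag text is kept
]


def clean_tweet(text):
    """Strip links, retweet markers, mentions and hash signs, then lowercase."""
    for pattern in PATTERNS:
        text = re.sub(pattern, "", text)
    # split() followed by join() also collapses the gaps left by removed tokens
    return " ".join(text.lower().split())


# Hypothetical input, not taken from the timeline above.
print(clean_tweet("Thanks @GAP_Foundation for helping #EndPolio https://example.com/x"))
```

Keeping the patterns in a list makes the order explicit and makes it hard to accidentally restart from the raw text.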

**Labelling tweets with TextBlob and building the bag of words model**

In [None]:
def getSubjectivity(text):
  return TextBlob(text).sentiment.subjectivity

def getPolarity(text):
  return TextBlob(text).sentiment.polarity

df['subjectivity'] = df['tweet'].apply(getSubjectivity)
df['polarity'] = df['tweet'].apply(getPolarity)
def getAnalysis(score):
  # 1 = positive polarity, 0 = neutral or negative polarity
  return 1 if score > 0 else 0

df['liked'] = df['polarity'].apply(getAnalysis)
df

| | tweet | retweets | likes | subjectivity | polarity | liked |
|---|---|---|---|---|---|---|
| 0 | Africa's success against wild polio last year ... | 211 | 2126 | 0.327778 | 0.233333 | 1 |
| 1 | None of us would be where we are today without... | 211 | 2228 | 0.966667 | 0.522222 | 1 |
| 2 | .@GAP_Foundation’s Bio-Hermes Study is the fir... | 155 | 1189 | 0.395833 | 0.062500 | 1 |
| 3 | I recently sat down with the extraordinary Bro... | 191 | 1853 | 0.634722 | 0.169444 | 1 |
| 4 | Happy Birthday, Warren! You’ve been a true fri... | 722 | 8185 | 0.630556 | 0.634722 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 195 | RT @melindagates: My parents taught me to leav... | 299 | 0 | 0.500000 | 0.500000 | 1 |
| 196 | I’m a big fan of author @harari_yuval and was ... | 482 | 4184 | 0.533333 | 0.208333 | 1 |
| 197 | RT @GlobalFund: “I will continue fighting. I w... | 125 | 0 | 0.312500 | 0.062500 | 1 |
| 198 | Today is the 25th anniversary of my first book... | 543 | 6805 | 0.393810 | 0.067143 | 1 |
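`getAnalysis` folds neutral tweets (polarity 0) and negative tweets into class 0, so `liked` is a binary "positive vs. not positive" label. Before training it can help to check how balanced the two classes are. A small sketch of that check; the helper names `label_from_polarity` and `label_counts` are illustrative, and the sample list mixes the five polarity values visible in the table with two hypothetical ones.

```python
from collections import Counter


def label_from_polarity(score):
    """1 for strictly positive polarity, 0 for neutral or negative."""
    return 1 if score > 0 else 0


def label_counts(scores):
    """Count how many scores fall into each class."""
    return Counter(label_from_polarity(s) for s in scores)


# The first five values come from the table; the last two are hypothetical.
scores = [0.233333, 0.522222, 0.062500, 0.169444, 0.634722, 0.0, -0.4]
print(label_counts(scores))
```

On the real data the same check is `df['liked'].value_counts()`. A heavily skewed result matters later, because a classifier can reach high accuracy simply by predicting the majority class.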


In [None]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features= 900)
X = cv.fit_transform(corpus).toarray()
y = df.iloc[:, -1].values
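To make the bag-of-words step concrete, here is a rough standard-library sketch of the idea: build a vocabulary, then count how often each vocabulary term occurs in each document. The function `bag_of_words` is hypothetical and splits on whitespace only, whereas `CountVectorizer` uses its own tokenizer (which by default ignores punctuation and single-character tokens), so the columns will not match it exactly.

```python
from collections import Counter


def bag_of_words(docs, max_features=None):
    """Return (vocabulary, count matrix) for whitespace-tokenised documents."""
    tokenised = [doc.split() for doc in docs]
    totals = Counter(token for tokens in tokenised for token in tokens)
    # Keep the most frequent terms, then sort them alphabetically for stable columns.
    kept = [term for term, _ in totals.most_common(max_features)]
    vocabulary = sorted(kept)
    matrix = []
    for tokens in tokenised:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


vocab, X_demo = bag_of_words(["help teacher help", "teacher year"])
print(vocab)   # ['help', 'teacher', 'year']
print(X_demo)  # [[2, 1, 0], [0, 1, 1]]
```

Each row is one document and each column one term, which is the shape `X` has after `cv.fit_transform(corpus).toarray()`: one row per tweet and at most 900 columns.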

**training model**

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0) 

In [None]:
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

GaussianNB(priors=None, var_smoothing=1e-09)
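For reference, `GaussianNB` treats every feature as normally distributed within each class and predicts the class with the highest posterior. In the usual textbook notation (the symbols below are not taken from the notebook), with $d$ features:

```latex
\hat{y} = \arg\max_{c} \; P(c) \prod_{j=1}^{d}
  \frac{1}{\sqrt{2\pi\sigma_{jc}^{2}}}
  \exp\!\left( -\frac{(x_j - \mu_{jc})^{2}}{2\sigma_{jc}^{2}} \right)
```

Here $\mu_{jc}$ and $\sigma_{jc}^{2}$ are the mean and variance of feature $j$ among the training samples of class $c$, and `var_smoothing` adds a small fraction of the largest feature variance to every $\sigma_{jc}^{2}$ for numerical stability. Word counts are non-negative integers rather than Gaussian values, so the model is an approximation here; `MultinomialNB` is a common alternative for count features.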

In [None]:
y_pred = classifier.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1))

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
result = accuracy_score(y_test, y_pred)
print("accuracy: ", result * 100, "%")

[[ 2  8]
 [ 1 29]]
accuracy:  77.5 %
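Accuracy alone hides how the two classes fare. The sketch below derives a few per-class figures from the confusion matrix printed above, assuming scikit-learn's convention that rows are true classes and columns are predicted classes. The variable names are illustrative.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class.
cm = [[2, 8],
      [1, 29]]

tn, fp = cm[0]  # true class 0: predicted 0, predicted 1
fn, tp = cm[1]  # true class 1: predicted 0, predicted 1
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision_pos = tp / (tp + fp)  # how many predicted 1s are really 1
recall_pos = tp / (tp + fn)     # how many real 1s are found
recall_neg = tn / (tn + fp)     # how many real 0s are found
majority_baseline = max(tn + fp, fn + tp) / total  # always predicting the larger class

print(f"accuracy          {accuracy:.3f}")
print(f"precision (1)     {precision_pos:.3f}")
print(f"recall (1)        {recall_pos:.3f}")
print(f"recall (0)        {recall_neg:.3f}")
print(f"majority baseline {majority_baseline:.3f}")
```

Only 2 of the 10 class-0 tweets in the test set are recognised, and always predicting class 1 would already score 75%, so the 77.5% accuracy is only slightly above that baseline.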
