# Sentiment analysis using existing toolkits like VADER and TextBlob

In [6]:
pip install vaderSentiment

Note: you may need to restart the kernel to use updated packages.
Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
     ------------------------------------ 126.0/126.0 kB 820.6 kB/s eta 0:00:00
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2






In [7]:
import pandas as pd
import csv
import re
import numpy as np
import plotly.express as px
from plotly.offline import init_notebook_mode
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from pre import clean_text, remove_stopwords

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\hp\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [9]:

df = pd.read_csv("../data/threads_data.csv")
df

Unnamed: 0,tweet_id_str,date_time,location,tweet_text,media_urls
0,1907092398431510754,Tue Apr 01 15:27:14 +0000 2025,Fascist Regime,"must watch kunal kamra , another stand comedia...",
1,1903892256891052215,Sun Mar 23 19:31:01 +0000 2025,,modi bhakts cell destroyed top class humour st...,
2,1907089741432180745,Tue Apr 01 15:16:41 +0000 2025,K'tka,"devesh dixit , stand comedian came another nig...",
3,1907337838401646625,Wed Apr 02 07:42:31 +0000 2025,K'tka,"devesh dixit , stand comedian taking bjp clean...",
4,1858545925326856373,Mon Nov 18 16:20:53 +0000 2024,India,"samay raina , best stand comedian india right ...",
...,...,...,...,...,...
90,939886467089760257,Sun Dec 10 15:56:10 +0000 2017,coolandfunnytshirts@gmail.com,well deserved. practically one congress worth ...,
91,1173619244124258304,Mon Sep 16 15:26:47 +0000 2019,"New Delhi, India","start speech ravish kumar , moving nidhi razda...",
92,1908802423881080920,Sun Apr 06 08:42:16 +0000 2025,,gadmad ho gya sab .. one leader modiji one jok...,
93,1909218199293338054,Mon Apr 07 12:14:24 +0000 2025,wherever you are,stand comedian video likely nishant tanwar. ro...,


In [10]:
df['tweet_text']

0     must watch kunal kamra , another stand comedia...
1     modi bhakts cell destroyed top class humour st...
2     devesh dixit , stand comedian came another nig...
3     devesh dixit , stand comedian taking bjp clean...
4     samay raina , best stand comedian india right ...
                            ...                        
90    well deserved. practically one congress worth ...
91    start speech ravish kumar , moving nidhi razda...
92    gadmad ho gya sab .. one leader modiji one jok...
93    stand comedian video likely nishant tanwar. ro...
94    politics imtiazsiddique netan enjoy ... comedy...
Name: tweet_text, Length: 95, dtype: object

<hr>

## 1. Sentiment Analysis with TextBlob

In [11]:
def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity

def getPolarity(text):
    return TextBlob(text).sentiment.polarity

In [12]:

your_text = 'I like this comedy'
getPolarity(your_text), getSubjectivity(your_text)

(0.0, 0.0)

Next, we compute the subjectivity and polarity scores for each of our tweets and add them to our dataframe as new columns.

In [13]:
# Replace missing tweets with empty strings so TextBlob always receives a string
df['tweet_text'] = df['tweet_text'].fillna('')

# Reuse getSubjectivity and getPolarity, already defined in the cell above
df['subjectivity'] = df['tweet_text'].apply(getSubjectivity)
df['polarity'] = df['tweet_text'].apply(getPolarity)

df

Unnamed: 0,tweet_id_str,date_time,location,tweet_text,media_urls,subjectivity,polarity
0,1907092398431510754,Tue Apr 01 15:27:14 +0000 2025,Fascist Regime,"must watch kunal kamra , another stand comedia...",,0.750000,-0.587500
1,1903892256891052215,Sun Mar 23 19:31:01 +0000 2025,,modi bhakts cell destroyed top class humour st...,,0.415179,0.196429
2,1907089741432180745,Tue Apr 01 15:16:41 +0000 2025,K'tka,"devesh dixit , stand comedian came another nig...",,0.500000,-0.300000
3,1907337838401646625,Wed Apr 02 07:42:31 +0000 2025,K'tka,"devesh dixit , stand comedian taking bjp clean...",,0.500000,-0.300000
4,1858545925326856373,Mon Nov 18 16:20:53 +0000 2024,India,"samay raina , best stand comedian india right ...",,0.417857,0.642857
...,...,...,...,...,...,...,...
90,939886467089760257,Sun Dec 10 15:56:10 +0000 2017,coolandfunnytshirts@gmail.com,well deserved. practically one congress worth ...,,0.550000,0.316667
91,1173619244124258304,Mon Sep 16 15:26:47 +0000 2019,"New Delhi, India","start speech ravish kumar , moving nidhi razda...",,0.000000,0.000000
92,1908802423881080920,Sun Apr 06 08:42:16 +0000 2025,,gadmad ho gya sab .. one leader modiji one jok...,,1.000000,-0.900000
93,1909218199293338054,Mon Apr 07 12:14:24 +0000 2025,wherever you are,stand comedian video likely nishant tanwar. ro...,,0.705000,0.175000


Next, we create a function that adds a sentiment label to each tweet, based on its polarity score.

In [14]:
def get_sentiment_label(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'    

In [15]:
# Apply the get_sentiment_label function to the polarity column
# and add the sentiment results as a new column in our dataframe

df['TBsentiment'] = df['polarity'].apply(get_sentiment_label)
df

Unnamed: 0,tweet_id_str,date_time,location,tweet_text,media_urls,subjectivity,polarity,TBsentiment
0,1907092398431510754,Tue Apr 01 15:27:14 +0000 2025,Fascist Regime,"must watch kunal kamra , another stand comedia...",,0.750000,-0.587500,Negative
1,1903892256891052215,Sun Mar 23 19:31:01 +0000 2025,,modi bhakts cell destroyed top class humour st...,,0.415179,0.196429,Positive
2,1907089741432180745,Tue Apr 01 15:16:41 +0000 2025,K'tka,"devesh dixit , stand comedian came another nig...",,0.500000,-0.300000,Negative
3,1907337838401646625,Wed Apr 02 07:42:31 +0000 2025,K'tka,"devesh dixit , stand comedian taking bjp clean...",,0.500000,-0.300000,Negative
4,1858545925326856373,Mon Nov 18 16:20:53 +0000 2024,India,"samay raina , best stand comedian india right ...",,0.417857,0.642857,Positive
...,...,...,...,...,...,...,...,...
90,939886467089760257,Sun Dec 10 15:56:10 +0000 2017,coolandfunnytshirts@gmail.com,well deserved. practically one congress worth ...,,0.550000,0.316667,Positive
91,1173619244124258304,Mon Sep 16 15:26:47 +0000 2019,"New Delhi, India","start speech ravish kumar , moving nidhi razda...",,0.000000,0.000000,Neutral
92,1908802423881080920,Sun Apr 06 08:42:16 +0000 2025,,gadmad ho gya sab .. one leader modiji one jok...,,1.000000,-0.900000,Negative
93,1909218199293338054,Mon Apr 07 12:14:24 +0000 2025,wherever you are,stand comedian video likely nishant tanwar. ro...,,0.705000,0.175000,Positive


We can have a quick look at the sentiment distribution of the tweets as follows:

In [16]:
df['TBsentiment'].value_counts()

TBsentiment
Positive    65
Neutral     18
Negative    12
Name: count, dtype: int64
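
Raw counts are easier to compare as shares of the whole. The following is a minimal sketch that converts the counts printed above into percentages using plain Python; the names `counts` and `shares` are introduced here for illustration only. On the dataframe itself, `df['TBsentiment'].value_counts(normalize=True)` should give the same proportions.

```python
# Label counts copied from the value_counts() output above
counts = {'Positive': 65, 'Neutral': 18, 'Negative': 12}
total = sum(counts.values())

# Share of each label as a percentage of all tweets, rounded to one decimal
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label:<8} {pct:>5}%")
```

Roughly two thirds of the tweets are labelled positive by TextBlob, which is worth keeping in mind when comparing against VADER later.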

In [17]:
# Filter and print the negative sentiment tweets
negative_tweets = df[df['TBsentiment'] == 'Negative']
print("\nNegative Sentiment Tweets:")
print(negative_tweets[['tweet_text', 'polarity']])


Negative Sentiment Tweets:
                                           tweet_text  polarity
0   must watch kunal kamra , another stand comedia... -0.587500
2   devesh dixit , stand comedian came another nig... -0.300000
3   devesh dixit , stand comedian taking bjp clean... -0.300000
26  new stand bit talk little modiji , jaitley sah... -0.025568
28  comedians cancel stand shows due public outrag... -0.062500
57  look shakeel siddiqi , umar sharief , moin akh... -0.076923
68  remember laughing insanely stand ups raju sriv... -0.417143
72  everytime leftist comedian gets news outrageou... -0.350000
79  wonder rohan joshi ' aib failed ! raju shrivas... -0.398810
85  since dis qualified mp , must take standup com... -0.010606
86  joking becomes crime world ' largest democracy... -0.050000
92  gadmad ho gya sab .. one leader modiji one jok... -0.900000


In [18]:
sorted_df = df.sort_values(by=['polarity'], ascending=False)

The 15 most _positive_ tweets are now the first 15 rows of the sorted dataframe:

In [19]:
for i, tweet in enumerate(sorted_df.head(15)['tweet_text']):
    print(i+1, tweet, '\n')

1 arnav goswami exposes us democrats rahul gandhi. arnav monologues best. 

2 rahul gandhi currently best stand comedian 

3 kunal kamra munawar faruqi best stand comics came region. pakistani standup comics dont even balls tweet jokes topics let alone perform infront audience . 

4 abhishek upmanyu best indian stand comedian. content , delivery , gestures , structure show. everything perfect . 

5 manik mahna upmanyu best indian stand comedians cap fr fr 

6 best joke stand comedian stands up. everytime congressi walks . 

7 kejriwal replaced rahul gandhi best comedian india 

8 raju srivastava , best ever stand comedian ever known. versatile personality always remembered humour amp gentlemanship. 

9 guys brilliant #politicalcomedy 

10 must read article stand comedian anti india propagandist ? brilliant article 

11 india produced great humourists satirists. stand , would vote jaspal bhatti. choices ? 

12 #kapilsharma overrated comedian propelled salman ' khan pr agencies. best sta

The 15 most _negative_ tweets are now the last 15 rows of the sorted dataframe:

In [20]:
for i, tweet in enumerate(sorted_df.tail(15)['tweet_text']):
    print(i+1, tweet, '\n')

1 tallest shortest height jokers indian politics exposes level time amp leaves stone unturned entertain public 

2 days indian stand comedy reduced taking political potshot , ridiculing prime minister , maligning hindu religion amp festivals , glorifying separatists. comedy left . 

3 unfair ! many stand comedians news channels. remember #aajtak sweta singh nanochip r2000 note ? navika kumar raga movies ? gaurav sawant calf runs meet greet yogi ? bhupendra choube sunny leone ? arnab anytime ! 

4 since dis qualified mp , must take standup comedy new profession serious note. speeches far proved massive hidden talent man .. comedy .. comedy must real passion .. 

5 new stand bit talk little modiji , jaitley sahab , arnab goswami others. watch , rt like it. link 

6 joking becomes crime world ' largest democracy , wire ' discusses comedian sanjay rajoura , parvezhassan , political comedy becoming dangerous india. 

7 comedians cancel stand shows due public outrage. 1. munawar faroqqi 2. k
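
Because `sorted_df` is in descending order, `tail(15)` prints from least to most negative, so the most negative tweet appears last. A minimal sketch of the alternative, sorting ascending so that the most negative tweets come first, is shown here with the (row index, polarity) pairs copied from the negative-tweet listing earlier; the variable names are illustrative. On the real dataframe, `df.sort_values(by='polarity').head(15)` would have the same effect.

```python
# (row index, polarity) pairs copied from the negative-tweet listing above
negative_scores = [
    (0, -0.587500), (2, -0.300000), (3, -0.300000), (26, -0.025568),
    (28, -0.062500), (57, -0.076923), (68, -0.417143), (72, -0.350000),
    (79, -0.398810), (85, -0.010606), (86, -0.050000), (92, -0.900000),
]

# Sort ascending by polarity so the most negative tweet comes first
most_negative_first = sorted(negative_scores, key=lambda pair: pair[1])

# Show the five most negative rows with their rank
for rank, (row, polarity) in enumerate(most_negative_first[:5], start=1):
    print(rank, row, polarity)
```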

Now let's visualise the distribution of the polarity and subjectivity scores assigned by TextBlob. To do this, we will make use of interactive plots from [Plotly](https://plotly.com/python/).

Interactive Plotly plots make use of JavaScript behind the scenes. To connect our Jupyter notebook with JavaScript, we need to execute the following line of code:

In [21]:
init_notebook_mode(connected=True)

Below, we use [Plotly Express](https://plotly.com/python/plotly-express/) to create a simple scatter plot of the polarity and subjectivity data. As this is an interactive plot, you will be able to hover your mouse over a point to view its properties.

Note how Plotly Express automatically labels our axes according to our dataframe column names.

In [22]:
from plotly.offline import plot

# Use plotly offline mode to render the plot
fig = px.scatter(df, x="polarity", y="subjectivity", hover_data=['tweet_text'],
                 title="TextBlob Sentiment Analysis")

# Plot offline
plot(fig)


'temp-plot.html'

<hr>

## 2. Sentiment Analysis with VADER

In [23]:
analyser = SentimentIntensityAnalyzer()

In [24]:
your_text = 'i like this movie'
analyser.polarity_scores(your_text)

{'neg': 0.0, 'neu': 0.545, 'pos': 0.455, 'compound': 0.3612}
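
The `compound` value is a normalised score between -1 and 1. As a rough sketch of where 0.3612 comes from, the snippet below reproduces the normalisation as I understand it from the VADER source, `x / sqrt(x**2 + alpha)` with `alpha = 15`, and assumes that "like" is the only scored word in the sentence with a lexicon valence of 1.5. Both the formula and the valence are assumptions to check against the installed version of the library, and `normalize` here is a stand-in written for illustration, not an import from `vaderSentiment`.

```python
import math

def normalize(score, alpha=15):
    """Squash a summed valence score into the range (-1, 1).

    Assumed to mirror VADER's normalisation; alpha=15 is the assumed default.
    """
    return score / math.sqrt(score * score + alpha)

# Assumed lexicon valence for "like", the only scored word in the sentence
like_valence = 1.5

print(round(normalize(like_valence), 4))
```

Under these assumptions the result rounds to the compound score reported above, which suggests the sentence's score is driven entirely by the word "like".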

Let us now use VADER to retrieve the compound sentiment score for all tweets and add this information to our original (unsorted) dataframe.

In [25]:
# Return VADER's compound score for a piece of text (a float between -1 and 1)

def get_vaderCompoundPolarity(text):
    return analyser.polarity_scores(text)['compound']

df['vader_compound'] = df['tweet_text'].apply(get_vaderCompoundPolarity)
df

Unnamed: 0,tweet_id_str,date_time,location,tweet_text,media_urls,subjectivity,polarity,TBsentiment,vader_compound
0,1907092398431510754,Tue Apr 01 15:27:14 +0000 2025,Fascist Regime,"must watch kunal kamra , another stand comedia...",,0.750000,-0.587500,Negative,0.1531
1,1903892256891052215,Sun Mar 23 19:31:01 +0000 2025,,modi bhakts cell destroyed top class humour st...,,0.415179,0.196429,Positive,0.1027
2,1907089741432180745,Tue Apr 01 15:16:41 +0000 2025,K'tka,"devesh dixit , stand comedian came another nig...",,0.500000,-0.300000,Negative,0.3818
3,1907337838401646625,Wed Apr 02 07:42:31 +0000 2025,K'tka,"devesh dixit , stand comedian taking bjp clean...",,0.500000,-0.300000,Negative,0.3818
4,1858545925326856373,Mon Nov 18 16:20:53 +0000 2024,India,"samay raina , best stand comedian india right ...",,0.417857,0.642857,Positive,0.7783
...,...,...,...,...,...,...,...,...,...
90,939886467089760257,Sun Dec 10 15:56:10 +0000 2017,coolandfunnytshirts@gmail.com,well deserved. practically one congress worth ...,,0.550000,0.316667,Positive,0.7650
91,1173619244124258304,Mon Sep 16 15:26:47 +0000 2019,"New Delhi, India","start speech ravish kumar , moving nidhi razda...",,0.000000,0.000000,Neutral,0.0000
92,1908802423881080920,Sun Apr 06 08:42:16 +0000 2025,,gadmad ho gya sab .. one leader modiji one jok...,,1.000000,-0.900000,Negative,-0.0772
93,1909218199293338054,Mon Apr 07 12:14:24 +0000 2025,wherever you are,stand comedian video likely nishant tanwar. ro...,,0.705000,0.175000,Positive,0.9442


Let us once again apply the 'get_sentiment_label' function to assign the VADER sentiment of each tweet given the compound score.

In [26]:

# Apply the get_sentiment_label function to the VADER compound score
# and add the VADER sentiment results as a new column in our dataframe

df['VADERsentiment'] = df['vader_compound'].apply(get_sentiment_label)
df

Unnamed: 0,tweet_id_str,date_time,location,tweet_text,media_urls,subjectivity,polarity,TBsentiment,vader_compound,VADERsentiment
0,1907092398431510754,Tue Apr 01 15:27:14 +0000 2025,Fascist Regime,"must watch kunal kamra , another stand comedia...",,0.750000,-0.587500,Negative,0.1531,Positive
1,1903892256891052215,Sun Mar 23 19:31:01 +0000 2025,,modi bhakts cell destroyed top class humour st...,,0.415179,0.196429,Positive,0.1027,Positive
2,1907089741432180745,Tue Apr 01 15:16:41 +0000 2025,K'tka,"devesh dixit , stand comedian came another nig...",,0.500000,-0.300000,Negative,0.3818,Positive
3,1907337838401646625,Wed Apr 02 07:42:31 +0000 2025,K'tka,"devesh dixit , stand comedian taking bjp clean...",,0.500000,-0.300000,Negative,0.3818,Positive
4,1858545925326856373,Mon Nov 18 16:20:53 +0000 2024,India,"samay raina , best stand comedian india right ...",,0.417857,0.642857,Positive,0.7783,Positive
...,...,...,...,...,...,...,...,...,...,...
90,939886467089760257,Sun Dec 10 15:56:10 +0000 2017,coolandfunnytshirts@gmail.com,well deserved. practically one congress worth ...,,0.550000,0.316667,Positive,0.7650,Positive
91,1173619244124258304,Mon Sep 16 15:26:47 +0000 2019,"New Delhi, India","start speech ravish kumar , moving nidhi razda...",,0.000000,0.000000,Neutral,0.0000,Neutral
92,1908802423881080920,Sun Apr 06 08:42:16 +0000 2025,,gadmad ho gya sab .. one leader modiji one jok...,,1.000000,-0.900000,Negative,-0.0772,Negative
93,1909218199293338054,Mon Apr 07 12:14:24 +0000 2025,wherever you are,stand comedian video likely nishant tanwar. ro...,,0.705000,0.175000,Positive,0.9442,Positive
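
Reusing `get_sentiment_label` treats any non-zero compound score as positive or negative, however small. The VADER documentation instead suggests a neutral band, with scores of at least 0.05 counted as positive and scores of at most -0.05 as negative. The following is a minimal sketch of such a labelling function; `label_with_threshold` is a hypothetical helper, not part of the notebook or of `vaderSentiment`.

```python
def label_with_threshold(score, threshold=0.05):
    """Label a score, treating values strictly inside +/- threshold as neutral."""
    if score >= threshold:
        return 'Positive'
    if score <= -threshold:
        return 'Negative'
    return 'Neutral'

# Compound scores copied from rows 0, 92, 91 and 93 of the table above
for score in (0.1531, -0.0772, 0.0, 0.9442):
    print(score, label_with_threshold(score))
```

It could be applied in the same way as before, for example `df['vader_compound'].apply(label_with_threshold)`. For the four scores shown the labels are unchanged, but tweets with a compound score very close to zero would move to the neutral class.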


Let's have a look at what VADER has classified as the 15 most positive and negative tweets, using the same method shown in the TextBlob example.

In [27]:
sorted_df2 = df.sort_values(by=['vader_compound'], ascending=False)

In [28]:

for i, tweet in enumerate(sorted_df2.head(15)['tweet_text']):
    print(i+1, tweet, '\n')

1 gurpatwant singh pannu , upcoming standup comedian republic khalistan ! disclaimer purely entertainment purposes. get ready laughs ! seems gurpatwant singh pannu overtaken neetu shatran wala world standup comedy ! competing kapil sharma though sadly , visit india. perhaps sony entertainment television netflix brace threats near future ! secret hideout , pannu recently released two hilarious video clips. one , offers 25k information whereabouts public appearances indian ambassador shri vinay kwatra russian high commissioner canada. real joke diplomats schedules events already publicly announced ! well respected figures , educated , responsible. seems pannu looking way stay spotlight. second statement , pannu hilariously claimed russia supplying chemical grenades india used farmers agitation , ones used ukraine war. threatened india chemical minister , according , amit shah. unfortunately , actual chemical minister jp nadda , shah. looks like pannu scriptwriter needs fired mix ! think 

In [29]:
# Print out the text from the last 15 tweets in the sorted dataframe

for i, tweet in enumerate(sorted_df2.tail(15)['tweet_text']):
    print(i+1, tweet, '\n')

1 tallest shortest height jokers indian politics exposes level time amp leaves stone unturned entertain public 

2 must watch kunal kamra , another stand comedian brutally trolled bjp demonetization done common people , buy mlas another burn l moment bjp cell amp rw ecosystem 

3 modi bhakts cell destroyed top class humour stand kunal kamra shaken entire bjp , filing firs left right this. watch deleted 

4 look shakeel siddiqi , umar sharief , moin akhtar epitome comedy , loose talk , barkat uzmi , khalid butt 

5 start speech ravish kumar , moving nidhi razdan , rajdeep sardesai , sagarika ghose , course barkha dutt , srinivasan jain , faye souza , karan thapar , abhisar sharma , arfa khanum , shekhar gupta , prannoy roy amp tell students anything idolize 

6 india #standupcomedy ecosystem used entirely dominated wokes qaumreds. slowly , h right breaking doors bastion too. 

7 #standup #kunalkamara #kunalkamracontroversy #politicaljoke #politicalsatire 

8 gadmad ho gya sab .. one lea

Compare the sentiment assignments of TextBlob and VADER by plotting another Plotly interactive scatter plot.

In [30]:
import plotly.express as px
from plotly.offline import plot

fig = px.scatter(df, x="polarity", y="vader_compound", hover_data=['tweet_text'],
                 title="TextBlob vs VADER")

plot(fig, filename='textblob_vs_vader.html')

'textblob_vs_vader.html'
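
Alongside the scatter plot, it can help to count how often the two tools assign different labels. The sketch below does this in plain Python for the nine rows visible in the table above only, so the numbers are illustrative and do not describe all 95 tweets; the variable names are introduced here. On the full dataframe, `pd.crosstab(df['TBsentiment'], df['VADERsentiment'])` should give the complete picture.

```python
from collections import Counter

# (row index, TextBlob label, VADER label) copied from the visible table rows
rows = [
    (0, 'Negative', 'Positive'),
    (1, 'Positive', 'Positive'),
    (2, 'Negative', 'Positive'),
    (3, 'Negative', 'Positive'),
    (4, 'Positive', 'Positive'),
    (90, 'Positive', 'Positive'),
    (91, 'Neutral', 'Neutral'),
    (92, 'Negative', 'Negative'),
    (93, 'Positive', 'Positive'),
]

# Count each (TextBlob, VADER) label combination
pair_counts = Counter((tb, vader) for _, tb, vader in rows)

# Row indices where the two tools disagree
disagreements = [idx for idx, tb, vader in rows if tb != vader]

# Fraction of the sampled rows on which both tools agree
agreement_rate = 1 - len(disagreements) / len(rows)

print(pair_counts)
print(disagreements)
print(round(agreement_rate, 2))
```

In this small sample, every disagreement is a tweet that TextBlob labels negative and VADER labels positive.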