# Sentiment Analysis on Demonitization in India using Twitter Data 

The demonetization of ₹500 and ₹1000 banknotes was a step taken by the Government of India on 8 November 2016, ceasing the usage of all ₹500 and ₹1000 banknotes of the Mahatma Gandhi Series as a form of legal tender in India from 9 November 2016.

The announcement was made by the Prime Minister of India Narendra Modi in an unscheduled live televised address to the nation at 20:15 Indian Standard Time (IST) the same day. In the announcement, Modi declared circulation of all ₹500 and ₹1000 banknotes of the Mahatma Gandhi Series as invalid and announced the issuance of new ₹500 and ₹2000 banknotes of the Mahatma Gandhi New Series in exchange for the old banknotes.

Content
The data contains 6000 most recent tweets on #demonetization. There are 6000 rows(one for each tweet) and 14 columns.

Metadata:
1. Text (Tweets)
2. favorited
3. favoriteCount
4. replyToSN
5. created
6. truncated
7. replyToSID
8. id
9. replyToUID
10. statusSource
11. screenName
12. retweetCount
13. isRetweet
14. retweeted

In [18]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [19]:
data = pd.read_csv("demonetization-tweets.csv", encoding="ISO-8859-1")
data.iloc[i]['text']

'RT @_Kirtibhairav: @Dkomal_KD @jaf_jamesbond @American__Singh @Noomiiali @iAkshayTeotia7 @iamnotthatrahul @sonunigam @priyankachopra @rajsr\x85'

In [20]:
from nltk.tokenize import word_tokenize

tweets = data.text
tokenized_tweets = []
rt_id = []
hash_id = []
for i in range(tweets.shape[0]):
    t = word_tokenize(tweets[i])
    if t[0] == 'RT':
        rt_id.append(t[2])
    else:
        rt_id.append(np.nan)
    
    if "#" in t:
        try:
            hash_id.append(t[t.index("#")+1].lower())
        except IndexError:
            hash_id.append(np.nan)
    else:
        hash_id.append(np.nan)
        
    tokenized_tweets.append(t)
    
data['rt_id'] = rt_id
data['hash_id'] = hash_id

data.head()

Unnamed: 0.1,Unnamed: 0,X,text,favorited,favoriteCount,replyToSN,created,truncated,replyToSID,id,replyToUID,statusSource,screenName,retweetCount,isRetweet,retweeted,rt_id,hash_id
0,1,1,RT @rssurjewala: Critical question: Was PayTM ...,False,0,,2016-11-23 18:40:30,False,,8.014957e+17,,"<a href=""http://twitter.com/download/android"" ...",HASHTAGFARZIWAL,331,True,False,rssurjewala,demonetization
1,2,2,RT @Hemant_80: Did you vote on #Demonetization...,False,0,,2016-11-23 18:40:29,False,,8.014957e+17,,"<a href=""http://twitter.com/download/android"" ...",PRAMODKAUSHIK9,66,True,False,Hemant_80,demonetization
2,3,3,"RT @roshankar: Former FinSec, RBI Dy Governor,...",False,0,,2016-11-23 18:40:03,False,,8.014955e+17,,"<a href=""http://twitter.com/download/android"" ...",rahulja13034944,12,True,False,roshankar,demonetization
3,4,4,RT @ANI_news: Gurugram (Haryana): Post office ...,False,0,,2016-11-23 18:39:59,False,,8.014955e+17,,"<a href=""http://twitter.com/download/android"" ...",deeptiyvd,338,True,False,ANI_news,demonetization
4,5,5,RT @satishacharya: Reddy Wedding! @mail_today ...,False,0,,2016-11-23 18:39:39,False,,8.014954e+17,,"<a href=""http://cpimharyana.com"" rel=""nofollow...",CPIMBadli,120,True,False,satishacharya,demonetization


In [21]:
import nltk
tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+')
tokenized_tweets = []
for tweet in tweets:
    tokenized_tweets.append(tokenizer.tokenize(tweet))
    
tokenized_tweets

[['RT',
  'rssurjewala',
  'Critical',
  'question',
  'Was',
  'PayTM',
  'informed',
  'about',
  'Demonetization',
  'edict',
  'by',
  'PM',
  'It',
  's',
  'clearly',
  'fishy',
  'and',
  'requires',
  'full',
  'disclosure',
  'amp'],
 ['RT',
  'Hemant_80',
  'Did',
  'you',
  'vote',
  'on',
  'Demonetization',
  'on',
  'Modi',
  'survey',
  'app'],
 ['RT',
  'roshankar',
  'Former',
  'FinSec',
  'RBI',
  'Dy',
  'Governor',
  'CBDT',
  'Chair',
  'Harvard',
  'Professor',
  'lambaste',
  'Demonetization',
  'If',
  'not',
  'for',
  'Aam',
  'Aadmi',
  'listen',
  'to',
  'th'],
 ['RT',
  'ANI_news',
  'Gurugram',
  'Haryana',
  'Post',
  'office',
  'employees',
  'provide',
  'cash',
  'exchange',
  'to',
  'patients',
  'in',
  'hospitals',
  'demonetization',
  'https',
  't',
  'co',
  'uGMxUP9'],
 ['RT',
  'satishacharya',
  'Reddy',
  'Wedding',
  'mail_today',
  'cartoon',
  'demonetization',
  'ReddyWedding',
  'https',
  't',
  'co',
  'u7gLNrq31F'],
 ['DerekSciss

As you can see the word tokenizer does not recognize the emoticons, hashtags and @mentions as separate tokens, so we need to build a custom word tokenizer for tweets. Tweets may also contain hyperlinks, numbers (in this case this we might see tweets regarding Rs.500 and Rs.1000 notes) etc.

In [22]:
# Removing Stop Words

import nltk
stop_words = nltk.corpus.stopwords.words("english")

for i,token in enumerate(tokenized_tweets):
    for sw in stop_words:
        if sw in token:
            tokenized_tweets[i].remove(sw)
            


In [26]:
tokens_combined = [token for tokens_user in tokenized_tweets for token in tokens_user]
unique_tokens = list(set(tokens_combined))
len(tokens_combined)
nltk.FreqDist(tokens_combined)
# lexical diversity
lex_div = len(unique_tokens)/len(tokens_combined)
lex_div

token_freq = nltk.FreqDist(tokens_combined)
sorted(token_freq,key = token_freq.get,reverse=True)

['U',
 'RT',
 'demonetization',
 'https',
 'co',
 'ed',
 'Demonetization',
 'India',
 'Modi',
 '00A0',
 'PM',
 'to',
 '00BD',
 'amp',
 '00B8',
 'Narendra',
 'is',
 'rich',
 'find',
 'Dear',
 'implement',
 'evanspiegel',
 'actually',
 'URautelaForever',
 'narendramodi',
 'people',
 'in',
 'DeMonetization',
 'the',
 'bank',
 'The',
 'Rs',
 'I',
 'It',
 't',
 'And',
 'support',
 'J',
 'K',
 'lakh',
 'cash',
 'since',
 'That',
 'terrorists',
 '40',
 'impact',
 'Third',
 'looted',
 'incident',
 'Nation',
 'Kishtwar',
 'ModiBharosa',
 'gauravcsawant',
 'back',
 '093E',
 'money',
 'like',
 'ATMs',
 'YouTube',
 '0915',
 '2',
 'Mr',
 'says',
 'supports',
 'question',
 'due',
 'goes',
 '0935',
 '0941',
 'of',
 '092D',
 'DrKumarVishwas',
 '00A2',
 '00AD',
 'Oscar',
 '00A9',
 '00A5',
 'ObQrhlNSL6',
 'BJP',
 'After',
 'whether',
 'still',
 'clearly',
 'PayTM',
 'notes',
 'good',
 'e',
 'full',
 'Was',
 'shortage',
 '008D',
 'poor',
 'We',
 'C',
 'rssurjewala',
 '80',
 'informed',
 'Critical',
 'edi

In [24]:
unique_token_count = [tokens_combined.count(token) for token in unique_tokens]

unique_tokens_df = pd.DataFrame([unique_tokens,unique_token_count]).transpose()
unique_tokens_df.columns = ['Tokens','Count']
unique_tokens_df = unique_tokens_df.sort_values('Count', ascending = False)
unique_tokens_df.head()

Unnamed: 0,Tokens,Count
6536,U,14496
8752,RT,11057
1099,demonetization,8001
12978,https,6526
5937,co,5669


In [None]:
unique_tokens_df.head(15)

In [None]:
#Analysing hash tags in the tweets
unique_hash = list(set(hash_id))
hash_count = [hash_id.count(x) for x in unique_hash]
hash_dict = dict(zip(unique_hash, hash_count))

import operator
sorted_hash_count = sorted(hash_dict.items(), key=operator.itemgetter(1), reverse = True)

As expected the most common hashtag used in demonetization/demonetisation, others include

In [None]:
[x[0] for x in sorted_hash_count if "demon" in str(x[0])]

In [None]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.sentiment.util import *

from nltk import tokenize

sia = SentimentIntensityAnalyzer()

#tweets.apply(lambda x:sia.polarity_scores(x))[0]

data['sentiment'] = data['text'].apply(sia.polarity_scores)

In [None]:
data['pos_sent'] = data['sentiment'].apply(lambda x: x['pos'])
data['neg_sent'] = data['sentiment'].apply(lambda x: x['neg'])
data['neut_sent'] = data['sentiment'].apply(lambda x: x['neu'])
data['compound_sent'] = data['sentiment'].apply(lambda x: x['compound'])

In [None]:
sent_data = data[['pos_sent','neg_sent','neut_sent','compound_sent']]
sent_data.head()

In [None]:
sent_data['sentiment'] = sent_data.apply(lambda x: 'NEUTRAL' if x['compound_sent'] == 0 else ("POSITIVE" if x['pos_sent'] > x['neg_sent'] else "NEGATIVE"), axis=1)
sent_data.head()

In [None]:
sent_data['sentiment'].value_counts().plot.barh(alpha = 0.65)
plt.title("Sentiment Analysis on \n Demonetization in India")
plt.xlabel("Tweet Count")

The sentiment has been mostly positive regarding the demonitization in india. 

In [None]:
pd.options.mode.chained_assignment = None


In [None]:
os.rpython -m pip3 install wordcloud