
Twitter Sentiment Analysis using R

A detailed sentiment analysis of tweets about the 2018 Karnataka State Elections in India, aimed at gauging the impact of sentiment on the final results. A full summary of the project can be found in the presentation. The code was reviewed by Rachael Tatman, Data Scientist at Kaggle, and the review can be watched on YouTube.


General info

The purpose of the project is to understand how to extract tweets from Twitter and perform sentiment analysis on them. I find Indian politics fascinating, and since elections in one of the major states were coming up, I thought it would be interesting to see how sentiment was playing out for the major parties and their leaders.



Technologies and Tools

  • R - version 3.4
  • Microsoft Excel


The steps to extract tweets from Twitter with specific filters can be found in data extraction. Two files of positive and negative words have been used to classify the sentiment of the tweets and can be found here. The code can be used to replicate the results.
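
As a rough illustration of how the two word lists might be loaded, here is a minimal base-R sketch. The file names and the `;` comment convention are assumptions, not taken from the project files, and the tiny stand-in lexicons are written to a temporary directory only so that the sketch runs on its own.

```r
# Hypothetical file names; substitute the paths of the real word lists.
pos.file <- file.path(tempdir(), "positive-words.txt")
neg.file <- file.path(tempdir(), "negative-words.txt")

# Write two tiny stand-in lexicons (illustrative words only).
writeLines(c("; header comment", "good", "great", "win"), pos.file)
writeLines(c("; header comment", "bad", "corrupt", "lose"), neg.file)

# scan() reads whitespace-separated words; comment.char = ";" makes it
# ignore everything from a ';' to the end of that line.
pos.words <- scan(pos.file, what = "character", comment.char = ";", quiet = TRUE)
neg.words <- scan(neg.file, what = "character", comment.char = ";", quiet = TRUE)
```

The resulting character vectors have the shape that the scoring function expects for its `pos.words` and `neg.words` arguments.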

Code Examples

Some examples of usage:

ktk <- searchTwitter('#karnatka', n=no.of.tweets, lang="en")
ktk2 <- searchTwitter('#karnatkaelections2018', n=no.of.tweets, lang="en")
ktk3 <- searchTwitter('#karnatkaelection', n=no.of.tweets, lang="en")
ktk4 <- searchTwitter('#battleforkarnatka', n=no.of.tweets, lang="en")
ktk5 <- searchTwitter('#karnatkakurukshetra', n=no.of.tweets, lang="en")
ktk6 <- searchTwitter('#KarnatkaAssembly', n=no.of.tweets, lang="en")
ktk7 <- searchTwitter('#karnatkavoting', n=no.of.tweets, lang="en")
ktk8 <- searchTwitter('#karnatkapolling', n=no.of.tweets, lang="en")
bjp <- searchTwitter('bjp', n=10000, lang="en")
congress <- searchTwitter('congress', n=2000, lang="en")
namo <- searchTwitter('narendra modi', n=2000, lang="en")
raga <- searchTwitter('rahul gandhi', n=2000, lang="en")
library(plyr)     # laply(), ldply()
library(stringr)  # str_split()
score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
  # we got a vector of sentences. plyr will handle a list
  # or a vector as an "l" for us
  # we want a simple array ("a") of scores back, so we use 
  # "l" + "a" + "ply" = "laply":
  scores = laply(sentences, function(sentence, pos.words, neg.words) {
    # clean up sentences with R's regex-driven global substitute, gsub():
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    # and convert to lower case:
    sentence = tolower(sentence)
    # split into words. str_split is in the stringr package
    word.list = str_split(sentence, '\\s+')
    # sometimes a list() is one level of hierarchy too much
    words = unlist(word.list)
    # compare our words to the dictionaries of positive & negative terms
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)
    # match() returns the position of the matched term or NA
    # we just want a TRUE/FALSE:
    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)
    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score = sum(pos.matches) - sum(neg.matches)
    return(score)
  }, pos.words, neg.words, .progress=.progress )
  # pair each score with the text it was computed from
  scores.df = data.frame(score=scores, text=sentences)
  return(scores.df)
}
## Narendra Modi
namog <- ldply(namo,function(t) t$toDataFrame() )
result1 <- score.sentiment(namog$text,pos.words,neg.words)
hist(result1$score,col = 'dark orange', main = 'Sentiment Analysis for Narendra Modi ', ylab = 'Count of tweets')
write.xlsx(result1, "myResults.xlsx")
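
To make the scoring rule concrete, here is a minimal base-R sketch of the same idea applied to three made-up tweets. The helper `score.one`, the toy word lists and the sample sentences are hypothetical and not part of the project code. It uses `strsplit()` in place of `str_split()` so that it runs without extra packages.

```r
# Score one sentence: (number of positive words) - (number of negative words).
score.one <- function(sentence, pos.words, neg.words) {
  # Same clean-up steps as the project code: drop punctuation,
  # control characters and digits, then lower-case.
  sentence <- gsub("[[:punct:]]", "", sentence)
  sentence <- gsub("[[:cntrl:]]", "", sentence)
  sentence <- gsub("\\d+", "", sentence)
  words <- unlist(strsplit(tolower(sentence), "\\s+"))
  # match() gives a position or NA; !is.na() turns that into TRUE/FALSE.
  sum(!is.na(match(words, pos.words))) - sum(!is.na(match(words, neg.words)))
}

# Toy lexicons and tweets (illustrative only).
pos.words <- c("good", "great", "win")
neg.words <- c("bad", "corrupt", "lose")
tweets <- c("Great rally, good crowd!",
            "Corrupt and bad governance",
            "Polling starts at 7 am")

scores <- vapply(tweets, score.one, numeric(1),
                 pos.words = pos.words, neg.words = neg.words,
                 USE.NAMES = FALSE)
print(scores)  # 2 -2 0
```

A score above zero means more positive than negative words were matched, a score below zero means the opposite, and zero means either no matches or an even split.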


The sentiment analysis of the tweets was completed successfully. The sentiment, however, does not show much deviation toward any particular party or leader.
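
One way to check for such a deviation is to compare the share of negative, neutral and positive tweets per subject. The sketch below is a hedged illustration in base R: the subjects `A` and `B` and their scores are placeholders, not results from this project.

```r
# Placeholder scores, NOT results from this project.
scores <- data.frame(
  subject = c("A", "A", "A", "B", "B", "B"),
  score   = c(1, 0, -1, 2, 0, -2)
)

# Label each tweet by the sign of its score.
scores$label <- factor(sign(scores$score),
                       levels = c(-1, 0, 1),
                       labels = c("negative", "neutral", "positive"))

# Row-wise shares: the fraction of each subject's tweets in each class.
shares <- prop.table(table(scores$subject, scores$label), margin = 1)
print(round(shares, 2))
```

If the rows of `shares` look alike, the tweets lean no more toward one subject than the other.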

To-do list:

  • Extract tweets for the entire week of election to gauge a trend in the sentiments.
  • Expand the horizon of the tweets extracted by including more political parties, leaders, and more hashtags.
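
For the first to-do item, a daily trend could be computed roughly as follows. This is only a sketch: the `created` timestamp column is an assumption about the shape of the tweet data frame, and the dates and scores are placeholders rather than collected data.

```r
# Placeholder tweets with a timestamp and a sentiment score each.
tweets <- data.frame(
  created = as.POSIXct(c("2018-05-10 09:00:00", "2018-05-10 18:30:00",
                         "2018-05-11 08:15:00", "2018-05-12 12:00:00"),
                       tz = "UTC"),
  score = c(1, -1, 2, 0)
)

# Collapse each timestamp to its calendar day.
tweets$day <- as.Date(tweets$created, tz = "UTC")

# Mean sentiment score per day, one row per day in date order.
daily <- aggregate(score ~ day, data = tweets, FUN = mean)
print(daily)
```

Plotting `daily$score` against `daily$day` would then show whether sentiment shifts over the week.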


Project is: finished. However, I would like to implement the sentiment analysis in Python using Natural Language Processing.


Indian news media is filled with countless analyses and statistics during elections. I was always fascinated by everything displayed on the screen and wanted to get hands-on with some of the analysis done behind the scenes.


Created by me.

If you loved what you read here and feel we could collaborate on something exciting, or if you just want to ask a question, please feel free to connect with me by email, on LinkedIn, or on Twitter. My other projects can be found here.
