Purpose
The purpose of this project is to apply what was learned about sentiment analysis in the IT518 Foundations of Data Science course. It focuses on analyzing and understanding sentiment toward the popular programming language Python. In this study, recent Twitter data related to the keyword "python" is collected and analyzed to provide insights into how and where Python is used in the data science field.
Sentiment Analysis
Sentiment analysis is contextual mining of text that identifies and extracts subjective information from source material, helping a business understand the social sentiment of its brand, product, or service while monitoring online conversations.
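To make the idea concrete, the sketch below scores text with a simple lexicon-based approach: count positive words, count negative words, and take the difference. It assumes the plyr and stringr packages listed in Step 2, and the short word lists are hypothetical placeholders rather than the lexicon used in this project.
library(plyr)
library(stringr)
# Hypothetical word lists for illustration; a real analysis would load a full opinion lexicon
pos_words <- c("good", "great", "love", "awesome", "useful")
neg_words <- c("bad", "hate", "slow", "bug", "broken")
# Score each sentence as (number of positive words) - (number of negative words)
score_sentiment <- function(sentences, pos_words, neg_words) {
  laply(sentences, function(sentence) {
    words <- str_split(tolower(sentence), "\\s+")[[1]]
    sum(words %in% pos_words) - sum(words %in% neg_words)
  })
}
score_sentiment(c("python is great and useful", "hate this slow broken build"), pos_words, neg_words)  # returns 2 and -3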
Step 1: To get access to the Twitter API, log in to the Twitter Developer website and create an application. After registering, create an access token and copy your application's Consumer Key, Consumer Secret, Access Token, and Access Token Secret from the Keys and Access Tokens tab.
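A minimal sketch of this authentication step in R, assuming the twitteR package; the placeholder strings stand in for your own keys:
library(twitteR)
# Placeholder credentials copied from the Keys and Access Tokens tab
consumer_key    <- "YOUR_CONSUMER_KEY"
consumer_secret <- "YOUR_CONSUMER_SECRET"
access_token    <- "YOUR_ACCESS_TOKEN"
access_secret   <- "YOUR_ACCESS_TOKEN_SECRET"
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)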
Step 2: To build the corpus, I will first use several packages in R: twitteR, plyr, stringr, and ggplot2. You can install and load these packages with the following commands:
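A minimal sketch of the install and load commands; tm and wordcloud are assumed here as well, since they are needed for the corpus cleaning and word cloud steps later:
install.packages(c("twitteR", "plyr", "stringr", "ggplot2", "tm", "wordcloud"))
library(twitteR)
library(plyr)
library(stringr)
library(ggplot2)
library(tm)
library(wordcloud)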
Step 3: After authenticating with Twitter, we can fetch the most recent tweets for any keyword. I have used the keyword 'python', since Python is widely used for data analysis.
p_tweets <- searchTwitter('python', n = 1500, lang = "en")
p_tweets <- twListToDF(p_tweets)
The searchTwitter function downloads the 1,500 most recent English-language tweets matching the keyword, and twListToDF converts this list of tweets into a data frame.
Step 4: Build a corpus and clean it by removing retweet markers, user mentions, links, punctuation, numbers, filler words, and extra white space.
p_text <- p_tweets$text
p_tweets <- tolower(p_text)                      # work on the tweet text as a character vector
p_tweets <- gsub("\\brt\\b", "", p_tweets)       # remove retweet markers
p_tweets <- gsub("@\\w+", "", p_tweets)          # remove user mentions
p_tweets <- gsub("[[:punct:]]", "", p_tweets)    # remove punctuation
p_tweets <- gsub("http\\w+", "", p_tweets)       # remove links
p_tweets <- gsub("[ |\t]{2,}", " ", p_tweets)    # collapse repeated spaces and tabs
p_tweets <- gsub("^ ", "", p_tweets)             # trim leading spaces
p_tweets <- gsub(" $", "", p_tweets)             # trim trailing spaces
p_tweets <- gsub("\\band\\b", "", p_tweets)      # remove the filler word "and"
p_tweets <- gsub("\\bthe\\b", "", p_tweets)      # remove the filler word "the"
p_tweets <- gsub("[^\x01-\x7F]", "", p_tweets)   # drop non-ASCII characters such as emoji
p_bd_corpus <- Corpus(VectorSource(p_tweets))
Inspect the corpus
inspect(p_bd_corpus[1:10])
# Each step is applied to the already-cleaned corpus so earlier transformations are kept
p_bd_corpus_bd_clean <- tm_map(p_bd_corpus, content_transformer(tolower))
p_bd_corpus_bd_clean <- tm_map(p_bd_corpus_bd_clean, removePunctuation)
p_bd_corpus_bd_clean <- tm_map(p_bd_corpus_bd_clean, removeNumbers)
p_bd_corpus_bd_clean <- tm_map(p_bd_corpus_bd_clean, removeWords, stopwords("en"))
p_bd_corpus_bd_clean <- tm_map(p_bd_corpus_bd_clean, stripWhitespace)
p_bd_corpus_bd_clean <- tm_map(p_bd_corpus_bd_clean, content_transformer(function(x) iconv(x, to = "UTF-8", sub = "")))
p_bd_corpus_bd_clean <- tm_map(p_bd_corpus_bd_clean, removeWords, c("python", "Python"))
Step 5: Visualizing the tweets using Word Cloud
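A minimal sketch of this step, assuming the tm, wordcloud, and ggplot2 packages are loaded and that p_bd_corpus_bd_clean is the cleaned corpus from Step 4; the min.freq, max.words, and top-15 settings are illustrative choices rather than values from the original analysis:
tdm <- TermDocumentMatrix(p_bd_corpus_bd_clean)               # term-document matrix from the cleaned corpus
term_freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE) # overall frequency of each term
set.seed(123)                                                 # make the word cloud layout reproducible
wordcloud(words = names(term_freq), freq = term_freq, min.freq = 5, max.words = 100, random.order = FALSE)
# Bar plot of the 15 most frequent terms
top_terms <- data.frame(term = names(term_freq)[1:15], freq = term_freq[1:15])
ggplot(top_terms, aes(x = reorder(term, freq), y = freq)) + geom_col() + coord_flip() + labs(x = "Term", y = "Frequency")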
According to the word cloud and bar plot, Python is the most frequent term in the tweets, followed by bigdata, datascience, IoT, tensorflow, and analytics. This shows that when tweeting about Python, people often connect it with terms such as bigdata, datascience, IoT, and tensorflow.
References
- Shashank Gupta, "Sentiment Analysis: Concept, Analysis and Applications," Towards Data Science, 2018. https://towardsdatascience.com/sentiment-analysis-concept-analysis-and-applications-6c94d6f58c17