Text Analysis performed on a custom dataset using RStudio.
The idea for this project sprang from a popular YouTube video by Glamour, in which people between the ages of 5 and 75 were asked about their biggest regret.
In order to convert the video into a workable dataset, I transcribed the responses provided by the individuals into an Excel sheet, which was then exported as a CSV file.
The resulting CSV file contained 3 columns (age, the response, and gender) and 75 rows.
Age | Regret | Gender |
---|---|---|
5 | "I went to the play ground and I want to go again today" | F |
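
As a rough sketch of how a file with this layout could be loaded, the snippet below reads a tiny inline stand-in for the CSV. The second row is an invented placeholder, and the file name mentioned in the comment is hypothetical. Setting `colClasses` explicitly is a precaution: a Gender column containing only `F` would otherwise be read by `read.csv` as logical `FALSE`.

```r
# Inline stand-in for the exported CSV. The first row is the sample shown
# above; the second row is an invented placeholder for illustration only.
csv_text <- 'Age,Regret,Gender
5,"I went to the play ground and I want to go again today",F
30,"Placeholder response for illustration",M'

# With the real file this would be something like read.csv("regrets.csv", ...),
# where "regrets.csv" is a hypothetical file name.
regrets <- read.csv(
  text = csv_text,
  colClasses = c("integer", "character", "character")  # keep "F" as text
)

str(regrets)    # column names and types
nrow(regrets)   # number of responses loaded
```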
All of the code is written in R using RStudio. More details about the necessary libraries can be found in the code; they remain the same for most text analysis and sentiment analysis projects.
- SnowballC: An R interface to the C 'libstemmer' library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary.
- wordcloud: Functionality to create pretty word clouds, visualize differences and similarity between documents, and avoid over-plotting in scatter plots with text.
- syuzhet: Extracts sentiment and sentiment-derived plot arcs from text using a variety of sentiment dictionaries conveniently packaged for consumption by R users.
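
The libraries above are typically combined along the following lines. This is a minimal sketch rather than the project's actual code: it assumes the `tm` package (not listed above) for building the corpus and the term-document matrix, and the second response is an invented placeholder.

```r
library(tm)         # assumption: used here for the corpus and term-document matrix
library(SnowballC)  # stemming backend used by tm's stemDocument
library(wordcloud)

# First string is the sample row from the dataset; the second is a placeholder.
responses <- c(
  "I went to the play ground and I want to go again today",
  "Placeholder response about wanting to go again"
)

# Build a corpus and apply common cleaning steps.
corpus <- VCorpus(VectorSource(responses))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)      # collapse words to a common root
corpus <- tm_map(corpus, stripWhitespace)

# Term frequencies across all documents, most frequent first.
tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# Draw the word cloud from the frequency table.
wordcloud(words = names(freq), freq = freq, min.freq = 1, random.order = FALSE)
```

With only two short responses the cloud is tiny; on the full dataset, raising `min.freq` filters out rare terms.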
For the purpose of this project, and due to the limited amount of data available, I performed text analysis on the entire dataset. Depending on the size, type, and genre of the dataset at hand, text analysis can also be performed by splitting the data into custom categories (e.g., age groups, gender, genre).
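
For a larger dataset, a split by category might look like the sketch below. The ages, the age-group boundaries, and the group labels are all hypothetical choices made for illustration; only base R is used.

```r
# Hypothetical rows standing in for the real dataset.
regrets <- data.frame(
  Age    = c(5, 17, 34, 52, 75),
  Gender = c("F", "M", "F", "M", "F"),
  stringsAsFactors = FALSE
)

# Bin ages into groups. Intervals are right-closed, e.g. (12, 19].
regrets$AgeGroup <- cut(
  regrets$Age,
  breaks = c(0, 12, 19, 39, 59, Inf),
  labels = c("child", "teen", "young adult", "middle-aged", "senior")
)

# One data frame per group, each ready for its own text analysis.
by_group <- split(regrets, regrets$AgeGroup)

table(regrets$AgeGroup)   # number of responses per age group
```

The same `split()` call works with `regrets$Gender` in place of the age group.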
The final results include a word cloud of the most frequently appearing terms in the term-document matrix, as well as a sentiment analysis graph showing the percentage of occurrence of the 8 most common emotions.
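
One way such a sentiment graph could be produced is sketched below. It assumes the eight emotions come from the NRC lexicon via `get_nrc_sentiment()` in syuzhet, which the text above does not state explicitly; the second response is an invented placeholder.

```r
library(syuzhet)

# First string is the sample row from the dataset; the second is a placeholder.
responses <- c(
  "I went to the play ground and I want to go again today",
  "Placeholder response about a happy memory and a sad goodbye"
)

# One row per response, one column per NRC category.
nrc <- get_nrc_sentiment(responses)

# Keep the eight emotion columns and drop the positive/negative polarity columns.
emotion_cols <- c("anger", "anticipation", "disgust", "fear",
                  "joy", "sadness", "surprise", "trust")
totals <- colSums(nrc[, emotion_cols])

# Share of each emotion among all emotion-word matches, in percent.
pct <- if (sum(totals) > 0) 100 * totals / sum(totals) else totals

barplot(sort(pct, decreasing = TRUE), las = 2,
        ylab = "Share of emotion words (%)")
```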
To read more about NLP/Text Analysis in R, please refer to the article here
© Akshaya Parthasarathy, 2022
Feedback is always welcome, drop a message on