iaks23/R_Text_Analysis

R Text Analysis 📑

Text Analysis performed on a custom dataset using RStudio.

📂 The Dataset

What is your biggest regret?

The idea for this project sprang from a popular YouTube video by Glamour, in which people between the ages of 5 and 75 were asked about their biggest regret.

To convert the video into a workable dataset, I transcribed each person's response into an Excel sheet, which was then exported as a CSV file.

The resulting CSV file contains 75 rows and 3 columns: age, regret (the response), and gender.

| Age | Regret | Gender |
| --- | --- | --- |
| 5 | "I went to the play ground and I want to go again today" | F |
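As a rough sketch of how such a file could be loaded, the snippet below reads CSV text with the three columns described above. The file name `regrets.csv` is a hypothetical placeholder (the actual file name is not stated here), so the runnable part uses an inline copy of the sample row instead. Setting `colClasses` is a deliberate choice: without it, R would read a column containing only `F` as the logical value `FALSE`.

```r
# Hypothetical file name; replace with the real CSV path.
# regrets <- read.csv("regrets.csv",
#                     colClasses = c("integer", "character", "character"))

# Self-contained stand-in built from the sample row shown above.
csv_text <- 'Age,Regret,Gender
5,"I went to the play ground and I want to go again today",F'

regrets <- read.csv(
  text = csv_text,
  # Force types so that a Gender value of "F" is not parsed as logical FALSE.
  colClasses = c("integer", "character", "character")
)

# Inspect the structure: one row, three columns.
str(regrets)
```

With the full dataset, `nrow(regrets)` should report 75 if the file matches the description above.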

🚨 Pre-Requisites

All of the code is written in R using RStudio. The necessary libraries are listed in the code and remain the same for most text analysis and sentiment analysis projects.

  • SnowballC: An R interface to the C 'libstemmer' library that implements Porter's word stemming algorithm for collapsing words to a common root to aid comparison of vocabulary.
  • wordcloud: Functionality to create pretty word clouds, visualize differences and similarity between documents, and avoid over-plotting in scatter plots with text.
  • syuzhet: Extracts sentiment and sentiment-derived plot arcs from text using a variety of sentiment dictionaries conveniently packaged for consumption by R users.

📊 Text Analysis

For the purpose of this project, and due to the limited amount of data available, I have performed text analysis on the entire dataset. Depending on the size, type, and genre of the dataset at hand, text analysis can also be performed by splitting the data into custom categories (e.g. age groups, gender, genre).
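If a larger dataset called for per-category analysis, one possible way to split by age group is base R's `cut()` and `split()`. This is only an illustrative sketch: the ages, the placeholder responses, the bracket boundaries, and the group labels are all assumptions, not part of this project.

```r
# Made-up illustrative rows; the Regret values are placeholders, not real responses.
regrets <- data.frame(
  Age = c(5, 17, 34, 52, 75),
  Regret = paste("placeholder response", 1:5),
  stringsAsFactors = FALSE
)

# Assumed age brackets; intervals are right-closed, e.g. (0, 12] is "child".
regrets$AgeGroup <- cut(
  regrets$Age,
  breaks = c(0, 12, 19, 39, 59, Inf),
  labels = c("child", "teen", "young adult", "middle-aged", "senior")
)

# One character vector of responses per age group, ready for separate analysis.
by_group <- split(regrets$Regret, regrets$AgeGroup)
lengths(by_group)
```

Each element of `by_group` can then be passed through the same analysis steps as the full dataset.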

The final results include a word cloud of the most frequently appearing terms in the term-document matrix, as well as a sentiment analysis graph showing the percentage of occurrence of the 8 most common emotions.
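The two outputs described above could be produced along the following lines. This is a minimal sketch, not the project's actual script: it assumes the `tm` package for the corpus and term-document matrix (not listed among the pre-requisites), and it assumes the eight emotions are the NRC emotion categories returned by `syuzhet::get_nrc_sentiment()`. Only the first response comes from the dataset; the other two are made-up placeholders.

```r
library(tm)         # assumption: used here for the corpus and term-document matrix
library(SnowballC)  # provides the stemmer behind tm's stemDocument()
library(wordcloud)
library(syuzhet)

# First response is the sample row from the dataset; the rest are made-up placeholders.
responses <- c(
  "I went to the play ground and I want to go again today",
  "I regret not spending more time with my family",
  "I wish I had been brave enough to travel"
)

# Build the corpus and apply common cleaning steps.
corpus <- VCorpus(VectorSource(responses))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)
corpus <- tm_map(corpus, stripWhitespace)

# Term-document matrix and overall term frequencies, most frequent first.
tdm <- TermDocumentMatrix(corpus)
term_freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# Word cloud of the most frequent terms.
set.seed(1)  # fixes the random layout
wordcloud(words = names(term_freq), freq = term_freq,
          min.freq = 1, random.order = FALSE)

# NRC emotion counts per response, converted to each emotion's share in percent.
nrc <- get_nrc_sentiment(responses)
emotions <- c("anger", "anticipation", "disgust", "fear",
              "joy", "sadness", "surprise", "trust")
emotion_counts <- colSums(nrc[, emotions])
emotion_pct <- 100 * emotion_counts / max(sum(emotion_counts), 1)  # avoid dividing by zero

barplot(sort(emotion_pct, decreasing = TRUE), las = 2,
        ylab = "Share of emotion words (%)")
```

Note that `TermDocumentMatrix()` drops terms shorter than three characters by default, so very short words do not appear in the cloud.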

To read more about NLP and text analysis in R, please refer to the article here.


© Akshaya Parthasarathy, 2022

Feedback is always welcome; drop a message on LinkedIn, Instagram, or Reddit.
