guptasaumya/personality-clustering-labeling

Personality Clustering and Labeling

Clustering personalities and labelling based on sampled survey responses (Hierarchical and K-Means).

Description

An interactive online personality test was conducted from 2016 to 2018, based on the “Big-Five Factor Markers” from the International Personality Item Pool (IPIP). The markers represent the five main personality traits proposed as an overall grouping of many narrower traits: extraversion, neuroticism, agreeableness, conscientiousness, and openness to experience. The answers of the test participants were recorded, with approval, for research use. Statistical analysis of such data can yield insights for applied psychology. For problem statement 2 in the report, I perform a clustering analysis on this dataset to group the participants based on their responses to the test. Because the dataset is very large, I sample an appropriate subset, build clusters on it, and validate them by assigning cluster labels and visualising the results. Lastly, I use the built clusters and the assigned labels to make predictions for unseen observations.
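The workflow described above (sample, cluster, predict) can be sketched in base R as follows. This is only an illustrative sketch, not the code in problem2.R: the data is a synthetic stand-in because the real dataset is not part of this repository, and the number of items, the sample size, the number of clusters `k`, the Ward linkage, the Euclidean distance, and the helper `predict_cluster` are all assumptions made for the example.

```r
# Synthetic stand-in for the survey data: answers on a 1-5 scale.
# ASSUMPTION: 50 items and 5000 participants, chosen only for illustration.
set.seed(42)
n_total <- 5000
n_items <- 50
responses <- matrix(sample(1:5, n_total * n_items, replace = TRUE),
                    nrow = n_total, ncol = n_items)

# 1. Draw a random subset so the distance matrix stays manageable.
sample_idx <- sample(n_total, 500)
train <- responses[sample_idx, ]
unseen <- responses[-sample_idx, ][1:10, ]  # a few held-out observations

# 2. Hierarchical clustering (assumed: Ward linkage, Euclidean distance).
k <- 5  # assumed number of clusters
hc <- hclust(dist(train), method = "ward.D2")
hc_labels <- cutree(hc, k = k)

# 3. K-means on the same sample, with several random starts.
km <- kmeans(train, centers = k, nstart = 10)

# 4. Assign each unseen observation to its nearest K-means centroid.
# predict_cluster is a hypothetical helper, not a function from any package.
predict_cluster <- function(new_data, centers) {
  unname(apply(new_data, 1, function(row) {
    # Squared Euclidean distance from this row to every centroid.
    which.min(colSums((t(centers) - row)^2))
  }))
}
unseen_labels <- predict_cluster(unseen, km$centers)

# Cross-tabulate the two clusterings to see how far they agree.
print(table(hierarchical = hc_labels, kmeans = km$cluster))
print(unseen_labels)
```

Nearest-centroid assignment is one simple way to label new observations from a fitted K-means solution; the report may use a different prediction rule.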

This analysis was done as part of the Statistical Learning course at Dalarna University, within the master's programme in data science.

Getting Started

Usage

The analysis is presented in the Problem 2 section of AnalysisReport.pdf, and the corresponding code is in problem2.R. The code is written in R, so R is required to run it; tools such as RStudio are optional. The images used in the report are also provided separately for easy access.

Note: Problem 1 in the report covers the collinearity problem in linear regression and the derivation of the QDA decision rule from Bayes' rule; its code is in problem1.R. Both can be ignored for the clustering analysis.

This repository does not provide the data used for the analysis.

License

This project is licensed under the MIT License - see the LICENSE file for details.
