Clustering and labelling personalities from sampled survey responses (hierarchical and K-means)
An interactive online personality test was conducted from 2016 to 2018, based on the “Big-Five Factor Markers” from the International Personality Item Pool (IPIP). The Big-Five Factor Markers measure the five broad traits commonly used to group more specific personality traits: extraversion, neuroticism, agreeableness, conscientiousness, and openness to experience. The participants' answers were recorded, with their approval, for research use. Statistical analysis of such data can yield many insights for applied psychology. For problem statement 2 in the report, I perform a clustering analysis on this dataset to group the participants by their responses to the test. Because the dataset is very large, I draw a suitably sized sample, build clusters on it, and validate them by assigning cluster labels and visualising the results. Finally, I use the clusters and the assigned labels to predict cluster membership for unseen observations.
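The sampling, clustering and prediction steps described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code in problem2.R: because the repository does not include the data, the example simulates hypothetical responses on a 1 to 5 scale, and the sample size, the choice of k = 5, Ward linkage for the hierarchical step, and nearest-centroid assignment for unseen observations are all assumptions made for illustration. All object and function names are hypothetical.

```r
# Minimal sketch under assumptions; simulated data stands in for the
# IPIP responses, which are not included in this repository.
set.seed(42)

# Hypothetical toy data: 1000 respondents x 10 items, each answered 1 to 5.
n_items <- 10
responses <- matrix(sample(1:5, 1000 * n_items, replace = TRUE),
                    ncol = n_items,
                    dimnames = list(NULL, paste0("item", seq_len(n_items))))

# Draw a random subset to keep clustering tractable; the rest plays
# the role of unseen observations.
train_idx <- sample(nrow(responses), 200)
train  <- responses[train_idx, , drop = FALSE]
unseen <- responses[-train_idx, , drop = FALSE]

k <- 5  # assumed number of clusters, chosen only for illustration

# K-means on the sample, with several random starts.
km <- kmeans(train, centers = k, nstart = 25)

# Hierarchical clustering (Ward linkage on Euclidean distances), cut into k groups.
hc <- hclust(dist(train), method = "ward.D2")
hc_labels <- cutree(hc, k = k)

# Hypothetical helper: assign each new row to its nearest centroid
# (smallest squared Euclidean distance).
predict_nearest_centroid <- function(centers, newdata) {
  # One column of distances per centroid; matrix() keeps the shape
  # even when newdata has a single row.
  d <- matrix(apply(centers, 1, function(ctr) rowSums(sweep(newdata, 2, ctr)^2)),
              nrow = nrow(newdata))
  # Column index of the largest negative distance, i.e. the nearest centroid.
  max.col(-d, ties.method = "first")
}

unseen_labels <- predict_nearest_centroid(km$centers, unseen)
print(table(unseen_labels))
```

Nearest-centroid assignment is a natural way to reuse k-means clusters on new data because it applies the same squared Euclidean criterion the clusters were built with; for the hierarchical clusters, one could analogously compute a centroid per `hc_labels` group and pass those centroids to the same helper.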
This analysis was done as part of the Statistical Learning course at Dalarna University, within the master's program in data science.
The analysis is presented in the Problem 2 section of AnalysisReport.pdf, and the code for the analysis is in problem2.R. The code is written in R, so R is required to run it (an IDE such as RStudio is optional). The images used in the report are also provided separately for easy access.
Note: Problem 1 in the report addresses the collinearity problem in linear regression and derives the QDA decision rule from Bayes' rule; problem1.R contains the code for this problem. Both can be ignored for the clustering analysis.
Note that this repository does not include the data used for the analysis.
This project is licensed under the MIT License - see the LICENSE file for details.