# Analysing a Psychometric Assessment in R

## Prepared by

**Author:** Shinin Varongchayakul

**Date:** 31 May 2025

**Language:** R

## Dataset

**Name:** SD3

**Source:** http://openpsychometrics.org/_rawdata/SD3.zip

**Retrieved Date:** 31 May 2025

## Additional Info

**SD-3 on Open-Source Psychometrics:** https://openpsychometrics.org/tests/SD3/

**SD-3 Score Summary:** https://openpsychometrics.org/tests/SD3/results.php

## 1. Install & Load Packages

In [None]:
# Install packages
install.packages("readr") # import data
install.packages("dplyr") # data manipulation
install.packages("ggplot2") # data visualisation
install.packages("psych") # EFA
install.packages("lavaan") # CFA

Installing package into 'C:/Users/sam_h/AppData/Local/R/win-library/4.5'
(as 'lib' is unspecified)



also installing the dependencies 'bit', 'prettyunits', 'bit64', 'progress', 'clipr', 'hms', 'vroom', 'tzdb'




package 'bit' successfully unpacked and MD5 sums checked
package 'prettyunits' successfully unpacked and MD5 sums checked
package 'bit64' successfully unpacked and MD5 sums checked
package 'progress' successfully unpacked and MD5 sums checked
package 'clipr' successfully unpacked and MD5 sums checked
package 'hms' successfully unpacked and MD5 sums checked
package 'vroom' successfully unpacked and MD5 sums checked
package 'tzdb' successfully unpacked and MD5 sums checked
package 'readr' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\sam_h\AppData\Local\Temp\Rtmp2xwg59\downloaded_packages


"package 'dplyr' is in use and will not be installed"
"package 'ggplot2' is in use and will not be installed"
"package 'psych' is in use and will not be installed"
"package 'lavaan' is in use and will not be installed"


In [None]:
# Load packages
library(readr)
library(dplyr)
library(ggplot2)
library(psych)
library(lavaan)
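
**Comments:**

The install log above warns that `dplyr`, `ggplot2`, `psych`, and `lavaan` were already in use and were not installed. One way to avoid those warnings is to install only the packages that are not yet available. The following is a minimal sketch, not part of the original workflow; `find_missing()` is a hypothetical helper name, and the install call is left commented out so the cell has no side effects.

```r
# Hypothetical helper: return the packages that cannot be loaded
find_missing <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Packages used in this notebook
pkgs <- c("readr", "dplyr", "ggplot2", "psych", "lavaan")

# Packages that still need to be installed (empty if none are missing)
to_install <- find_missing(pkgs)
print(to_install)

# Uncomment to install only what is missing
# if (length(to_install) > 0) install.packages(to_install)
```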

## 2. Load & Prepare the Dataset

### 2.1 Load the Dataset

In [13]:
# Load the dataset
sd3 <- read_tsv("sd3_dataset.csv")

Rows: 18192 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr  (1): country
dbl (28): M1, M2, M3, M4, M5, M6, M7, M8, M9, N1, N2, N3, N4, N5, N6, N7, N8...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


In [22]:
# Preview the dataset
head(sd3,
     n = 10)

M1,M2,M3,M4,M5,M6,M7,M8,M9,N1,⋯,P2,P3,P4,P5,P6,P7,P8,P9,country,source
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<fct>,<fct>
4,4,4,4,4,4,4,3,4,2,⋯,2,3,2,4,4,2,4,4,GB,front page of website
2,1,5,2,2,1,2,2,3,1,⋯,5,1,5,4,1,1,3,2,US,front page of website
3,3,3,5,1,1,5,5,3,2,⋯,1,3,1,3,1,4,3,1,US,front page of website
5,5,4,5,5,5,5,5,5,5,⋯,5,5,2,5,5,1,1,5,GB,Other
4,4,2,5,5,5,4,1,4,3,⋯,1,3,1,4,3,1,4,1,GB,Other
4,2,2,4,2,3,5,2,2,2,⋯,1,4,4,2,2,5,1,5,IT,front page of website
4,4,4,2,4,4,4,3,5,4,⋯,4,4,2,4,3,2,4,3,GB,front page of website
5,5,5,5,5,5,5,4,5,4,⋯,1,2,5,5,5,4,5,3,GB,front page of website
5,3,4,4,4,4,4,2,4,3,⋯,4,4,3,4,4,5,2,1,US,front page of website
5,5,5,3,5,5,5,5,5,5,⋯,4,5,5,5,5,1,3,5,US,front page of website


In [23]:
# View the structure
glimpse(sd3)

Rows: 18,192
Columns: 29
$ M1      <dbl> 4, 2, 3, 5, 4, 4, 4, 5, 5, 5, 5, 5, 5, 2, 5, 4, 4, 5, 5, 4, 5,…
$ M2      <dbl> 4, 1, 3, 5, 4, 2, 4, 5, 3, 5, 5, 4, 1, 2, 3, 5, 3, 2, 4, 4, 3,…
$ M3      <dbl> 4, 5, 3, 4, 2, 2, 4, 5, 4, 5, 5, 4, 2, 3, 4, 4, 3, 4, 0, 3, 4,…
$ M4      <dbl> 4,

### 2.2 Correct the Data Types

In [16]:
# Convert `country` and `source` to factor

## Convert `country` to factor
sd3$country <- as.factor(sd3$country)

## Convert `source` to factor
sd3$source <- factor(sd3$source,
                     levels = c(1, 2, 3),
                     labels = c("front page of website", "Google search", "Other"))

## Check the results
glimpse(sd3)

Rows: 18,192
Columns: 29
$ M1      <dbl> 4, 2, 3, 5, 4, 4, 4, 5, 5, 5, 5, 5, 5, 2, 5, 4, 4, 5, 5, 4, 5,…
$ M2      <dbl> 4, 1, 3, 5, 4, 2, 4, 5, 3, 5, 5, 4, 1, 2, 3, 5, 3, 2, 4, 4, 3,…
$ M3      <dbl> 4, 5, 3, 4, 2, 2, 4, 5, 4, 5, 5, 4, 2, 3, 4, 4, 3, 4, 0, 3, 4,…
$ M4      <dbl> 4,
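
**Comments:**

When `factor()` is given explicit `levels`, any value that is not among them becomes `NA`. It may therefore be worth counting how many missing values the conversion itself introduces. The sketch below uses a small made-up vector rather than the real `source` column; the code `0` is an assumed example of an undeclared value, not something confirmed for this dataset.

```r
# Toy codes (hypothetical): 0 is not among the declared levels
source_codes <- c(1, 2, 3, 0)

# Same conversion as above, applied to the toy vector
source_fct <- factor(source_codes,
                     levels = c(1, 2, 3),
                     labels = c("front page of website", "Google search", "Other"))

# Count values that became NA only through the conversion
new_na <- sum(is.na(source_fct) & !is.na(source_codes))

print(source_fct)
print(new_na)
```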

### 2.3 Handle Missing Values

In [20]:
# Check for missing values
anyNA(sd3)

In [21]:
# Check for missing values in each column
colSums(is.na(sd3))

**Comments:**

I chose to keep the records as they are because:
1. Only one column, `source`, contains missing values.
2. There are only two missing values.
3. `source` is not one of the main variables of interest.
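
The per-column counts can be narrowed to only the columns that contain missing values, which makes a decision like the one above easier to check. A minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not taken from the SD3 data):

```r
# Toy data (hypothetical): one missing value in `source`
toy <- data.frame(M1 = c(4, 2, 3),
                  source = factor(c("Other", NA, "Other")))

# Count missing values per column
na_counts <- colSums(is.na(toy))

# Keep only the columns that have at least one missing value
na_counts[na_counts > 0]
```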

### 2.4 Handle Out-of-Bound Scores

In [29]:
# Check for out-of-bound scores

## Define the columns to check (all columns except the last two, `country` and `source`)
items <- sd3 |>
  select(-last_col(),
         -last_col(offset = 1))

## Check the items
colnames(items)
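
**Comments:**

The cell above only lists the item columns. Counting the scores that fall outside the scale could look like the sketch below. It assumes a valid range of 1 to 5 and treats anything else as missing. Both are assumptions that the dataset codebook should confirm. `toy_items` and its values are made up for illustration.

```r
# Toy item scores (hypothetical): 0 and 6 fall outside the assumed 1-5 range
toy_items <- data.frame(M1 = c(4, 0, 5),
                        N2 = c(3, 2, 6))

# Count out-of-range scores per item
out_of_range <- vapply(toy_items,
                       function(x) sum(x < 1 | x > 5, na.rm = TRUE),
                       integer(1))
print(out_of_range)

# Recode out-of-range scores to NA (assumption: they are invalid responses)
toy_items[toy_items < 1 | toy_items > 5] <- NA
print(toy_items)
```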

### 2.5 Reverse-Score the Items

In [18]:
# Reverse-score items

## Define the items to reverse-score
rs_items <- c("N2", "N6", "N8", "P2", "P7")

## Check the distribution of the items before reverse-scoring
summary(sd3[, rs_items])

       N2              N6              N8              P2       
 Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:2.000  
 Median :3.000   Median :3.000   Median :2.000   Median :3.000  
 Mean   :3.116   Mean   :3.095   Mean   :2.597   Mean   :3.205  
 3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
       P7       
 Min.   :0.000  
 1st Qu.:2.000  
 Median :4.000  
 Mean   :3.537  
 3rd Qu.:5.000  
 Max.   :5.000  

In [19]:
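## Loop through the items (on a 1-5 scale, reversed score = 6 - raw score)
## Note: a raw score of 0 becomes 6 under this formula
for (x in rs_items) {
    sd3[[x]] <- 6 - sd3[[x]]
}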

## Check the results
summary(sd3[, rs_items])

       N2              N6              N8              P2       
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :3.000   Median :3.000   Median :4.000   Median :3.000  
 Mean   :2.884   Mean   :2.905   Mean   :3.403   Mean   :2.795  
 3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:4.000  
 Max.   :6.000   Max.   :6.000   Max.   :6.000   Max.   :6.000  
       P7       
 Min.   :1.000  
 1st Qu.:1.000  
 Median :2.000  
 Mean   :2.463  
 3rd Qu.:4.000  
 Max.   :6.000  
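
**Comments:**

The summaries show a minimum of 0 before reverse-scoring and a maximum of 6 afterwards, because `6 - x` maps a raw 0 to 6, which is outside the 1 to 5 range. A guarded version could set out-of-range scores to `NA` before reversing. This is a sketch under the assumption that 0 is not a valid response; `reverse_score()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: reverse-score a vector on a min_score..max_score scale
reverse_score <- function(x, min_score = 1, max_score = 5) {
  # Treat out-of-range scores as missing (assumption: they are invalid)
  x[!is.na(x) & (x < min_score | x > max_score)] <- NA
  # Reverse: min_score <-> max_score, and so on
  (min_score + max_score) - x
}

# Toy scores (hypothetical): the trailing 0 is out of range
reverse_score(c(1, 2, 3, 4, 5, 0))
```

Applied to the raw (not yet reversed) columns, this could be used as `sd3[rs_items] <- lapply(sd3[rs_items], reverse_score)`.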