# Dissertation Part 1: Dataset Model Building
The first thing we need to do is load some packages that we'll need.

In [1]:
library(rdydisstools)
loadpackages(c('dplyr', 'mirt', 'foreach', 'doParallel'))
setwd('~/dissertation')

In [2]:
d <- read.csv("sourcedata/hexaco.csv", header=TRUE, stringsAsFactors=FALSE, row.names=NULL)

Some of the country codes are missing in the data. Unfortunately, these values are stored as " " (a single space) rather than NA, which complicates the dataset. Let's find them and set them properly to NA.

In [3]:
x <- which(d[,243]== " ")
d[x,243] <- NA
rm(x)
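
The same replacement can be written against a column name instead of a position. This is only a sketch on a made-up data frame: `toy` and its `country` column are hypothetical stand-ins, not the real layout of hexaco.csv.

```r
# Hypothetical toy data: two of the four country codes are a single space.
toy <- data.frame(id = 1:4,
                  country = c("US", " ", "GB", " "),
                  stringsAsFactors = FALSE)

# %in% never returns NA, so this is safe even if some values are already NA.
toy$country[toy$country %in% " "] <- NA

sum(is.na(toy$country))
```

Addressing the column by name avoids depending on it sitting in position 243.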

Next let's do some basic cleaning. First we'll remove observations where respondents reported inadequate understanding of the instructions or inadequate attention. Then there's completion time: the minimum was 0 seconds and the maximum was 786816 seconds (about 9.1 days). Neither seems like a valid time frame for completing this survey, so we're going to impose some time bounds, removing response sets that took less than about 17.3 minutes (1037.456 seconds) or more than 100 minutes (6000 seconds). I note that these time bounds are somewhat arbitrary, but they seemed like reasonable values after looking at the data.

In [4]:

test1 <- d[,"V1"] >= 5 # Participant claims to understand the instructions
test2 <- d[,"V2"] >= 5 # Participant claims to have answered accurately.
test3 <- d[,"elapse"] < 6000 # Took less than 6000 s (100 minutes)
test4 <- d[,"elapse"] > 1037.456 # Took more than 1037.456 s (about 17.3 minutes)
d <- d[which(test1 & test2 & test3 & test4),]
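
To make the time bounds concrete, here is a small sketch that converts the cutoffs from seconds and applies the same four tests to a made-up data frame. The rows in `toy` are invented for illustration; only the cutoff values come from the cell above.

```r
# Convert the raw second counts into friendlier units.
max_days      <- 786816 / 86400   # longest observed completion time, in days
lower_minutes <- 1037.456 / 60    # lower bound, in minutes
upper_minutes <- 6000 / 60        # upper bound, in minutes

# Hypothetical responses: only the first row passes all four tests.
toy <- data.frame(V1     = c(7, 5, 3, 6, 6),
                  V2     = c(6, 5, 7, 2, 6),
                  elapse = c(1200, 900, 3000, 2000, 7000))

keep <- toy$V1 >= 5 & toy$V2 >= 5 &
  toy$elapse > 1037.456 & toy$elapse < 6000

# which() drops any NA results instead of producing all-NA rows.
toy_clean <- toy[which(keep), ]
```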

The HSinc1 colname has an extra character or two in it, so let's get rid of that.

In [5]:
colnames(d)[1] <- "HSinc1"

Some of the scales have reverse-coded items. Let's pull out those constructs so that we can code them properly.

In [6]:
HSincerity <- d[,1:10]
HFairness <- d[,11:20]
EAnxiety <- d[,51:60]
EDependence <- d[,61:70]
XLiveliness <- d[,111:120]
AForgiveness <- d[,121:130]
APatience <- d[,151:160]
CPerfectionism <- d[,181:190]
OInquisitiveness <- d[,211:220]
OUnconventionality <- d[,231:240]

Now we just need a vector of names for the overall constructs.

In [7]:
n <- c("HSincerity", "HFairness", "EAnxiety", "EDependence", "XLiveliness",
       "AForgiveness", "APatience", "CPerfectionism", "OInquisitiveness", 
       "OUnconventionality")

We need to define keys for the negatively coded variables. I should note that in this case, negative is defined in reference to the construct name, and not the social desirability of the construct. Dependence is absent because that scale consists wholly of positive items.

In [8]:
key <- list()
key[["HSinc"]] <- c(2:10)
key[["HFair"]] <- c(6:10)
key[["EAnxi"]] <- c(6:10)
key[["XLive"]] <- c(9,10)
key[["AForg"]] <- c(5:10)
key[["APati"]] <- c(6:10)
key[["CPerf"]] <- c(9,10)
key[["OInqu"]] <- c(7:10)
key[["OUnco"]] <- c(6:10)

Alright, now we're ready to reverse code all the items and compute a score for each person on each construct (the mean of that person's item responses). There's almost certainly a better way to do this, but this was already coded, so I used it.

In [9]:
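for (i in seq_along(n)) {
  name <- substr(n[i], 1, 5)             # e.g. "HSincerity" -> "HSinc", matching the key names
  tmp <- eval(as.name(n[i]))             # look up the construct's data frame by name
  if (!is.null(key[[name]])) {           # constructs without a key have no reversed items
    tmp <- reverseCode(tmp, key[[name]], 7)
  }
  assign(name, rowMeans(tmp))            # store each person's mean score as e.g. HSinc
}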
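
For readers without rdydisstools, here is a minimal sketch of what the reverse-coding step is assumed to do. `reverse_items` is a hypothetical stand-in, not the package's `reverseCode`; it assumes the third argument is the scale maximum, so a response x on a 1 to 7 scale becomes 8 - x.

```r
# Hypothetical helper: flip the keyed columns on a 1..max_score scale.
reverse_items <- function(items, key, max_score) {
  items[, key] <- (max_score + 1) - items[, key]
  items
}

# Made-up responses from three people to three items.
toy <- data.frame(q1 = c(1, 4, 7),
                  q2 = c(2, 4, 6),
                  q3 = c(7, 7, 1))

# Items 2 and 3 are reverse-coded; item 1 is left alone.
recoded <- reverse_items(toy, key = c(2, 3), max_score = 7)

# Each person's score is the mean of the recoded items.
scores <- rowMeans(recoded)
```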

Now let's compile those scores into a matrix and compute the correlations between constructs.

In [10]:
factors <- cbind(HSinc,HFair,EAnxi,EDepe,XLive,AForg,APati,CPerf,OInqu,OUnco)
fcorr <- cor(factors)
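
One possible "better way" is to keep the constructs in a named list, which avoids creating variables by name. This is only a sketch on invented data: `constructs`, `Alpha`, and `Beta` are hypothetical and stand in for the item data frames above.

```r
# Hypothetical constructs, each a data frame of item responses.
constructs <- list(
  Alpha = data.frame(a1 = c(1, 2, 3, 4), a2 = c(2, 3, 4, 6)),
  Beta  = data.frame(b1 = c(7, 5, 3, 1), b2 = c(6, 6, 2, 2))
)

# sapply() returns a persons-by-constructs matrix of mean scores,
# with one named column per list element.
toy_factors <- sapply(constructs, rowMeans)

# Correlations between construct scores.
toy_corr <- cor(toy_factors)
```

With a list, the score matrix gets its column names for free, so no separate cbind() of ten variables is needed.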

It's time to estimate the item parameters for each construct using the graded response model. eval(as.name(...)) looks up the data frame named by the current element of n. Then we need to extract the item parameters for each construct and bind them into one dataset.

In [11]:
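registerDoParallel(24) # 24 parallel workers
# The loop variable is i rather than c, so it doesn't shadow base::c().
ipar <- foreach(i=seq_along(n), .combine=rbind) %dopar% {
  y <- mirt(eval(as.name(n[i])), 1) # one-factor model for this construct
  as.data.frame(coef(y, simplify=TRUE)$items) # item parameters only
}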
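
For reference, here is a base-R sketch of how the graded response model turns one item's parameters into category probabilities. It assumes the slope-intercept form that, as I understand it, mirt reports for graded items (a slope a1 plus intercepts d1, d2, ...), where P(X >= k | theta) = plogis(a * theta + d_k). The function `grm_category_probs` and the parameter values are made up for illustration and are not estimates from this dataset.

```r
# Hypothetical helper: category probabilities for one graded item.
# theta: latent trait value; a: slope; d: decreasing intercepts (one per boundary).
grm_category_probs <- function(theta, a, d) {
  # Cumulative probabilities of responding in category k or higher,
  # padded with 1 (lowest category or higher) and 0 (above the highest).
  cum <- c(1, plogis(a * theta + d), 0)
  # Each category's probability is the drop between adjacent cumulative values.
  -diff(cum)
}

# A 7-category item needs 6 intercepts.
probs <- grm_category_probs(theta = 0, a = 1.5, d = c(2, 1, 0, -1, -2, -3))
```

The probabilities sum to 1 by construction, since the cumulative curve runs from 1 down to 0.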

Next we make the row names for our item parameters, because rbind.fill doesn't do that for us.

In [12]:
rows <- makeRownames(n)
rownames(ipar) <- rows
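
I don't reproduce makeRownames here, but a helper along these lines would give names in the same style as the item columns (e.g. "HSinc1"). This is a guess at the behaviour, not the package's code: `make_item_names` and its `items_per_scale` argument are hypothetical.

```r
# Hypothetical helper: build item names from 5-character construct prefixes.
make_item_names <- function(constructs, items_per_scale = 10) {
  prefixes <- substr(constructs, 1, 5)
  # Repeat each prefix once per item; the item numbers are recycled to match.
  paste0(rep(prefixes, each = items_per_scale), seq_len(items_per_scale))
}

item_names <- make_item_names(c("HSincerity", "EAnxiety"), items_per_scale = 3)
```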

Now let's store these parameters in a dataModel object and save it to our dataModel library.

In [13]:
hexaco <- dataModel(fcorr, ipar, 5)
save(hexaco, file='datamodels/hexaco.RData')