# Factor analysis of nonverbal behavior



In [None]:
req <- readLines("requirements.txt")
proj_version <- "R version 3.6.1 (2019-07-05)"
p<-.libPaths()
`%notin%` <- Negate(`%in%`)
if (proj_version != version$version.string){
  print(paste0("You have ", version$version.string, ". This project was created with ", proj_version, ". Some packages may not install or work properly."))
} else {
  print("You have the same version of R as was used in this project.")
}

In [None]:
for (pkg in req){
  #install the package if it is missing
  if (pkg %notin% rownames(installed.packages())){
    install.packages(pkg, lib = p[1])
  }
  if (pkg %in% rownames(installed.packages())){
    #attach the package unless it is already on the search path
    #(a namespace can be loaded without being attached, so check search(), not loadedNamespaces())
    if (paste0("package:", pkg) %notin% search()){
      library(pkg, character.only = TRUE)
      print(paste("Attaching package:", pkg))
    }
  } else {
    print(paste0("Error installing ", pkg, ". Check warnings."))
  }
}

In [None]:
messy_data<-read.delim("NIS_data/data.csv", na.strings = c("", "NA"))

In [None]:
NIS_clean<-messy_data[1:5000, ]%>% #take just the first 5,000 observations of this massive dataset
  select(num_range("Q", 1:26), num_range("VCL", 1:16))%>% #keep the 26 NIS items (Q1-Q26) and the 16 vocabulary check-list items (VCL1-VCL16)
  filter_at(vars(c("VCL6", "VCL9","VCL12")), all_vars(.==0))%>% #VCL6, VCL9 and VCL12 are not real words; keep only respondents who marked all three as unknown (0)
  filter_at(vars(starts_with("Q")), all_vars(.!=0))%>% #0 is not a valid option on the 1-5 Likert scale of the NIS items; drop respondents who answered any item with 0
  select(starts_with("Q"))%>% #keep only the NIS items
  mutate_all(as.numeric)

In [None]:
mvn(NIS_clean, mvnTest = "hz", multivariatePlot = "qq")$multivariateNormality #Henze-Zirkler test on every row of NIS_clean (at most 5,000 because of the subset above)

### Running the factor analysis
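A polychoric correlation estimates the correlation between two ordinal variables, assuming that each one is a coarsened version of a latent, normally distributed continuous variable. The matrix of polychoric correlations is used in place of the ordinary (Pearson) correlation matrix in the factor analysis.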

In [None]:
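#hetcor() returns polychoric correlations only for pairs of ordered factors; numeric columns get Pearson correlations,
#so convert each item to an ordered factor first
NIS_ord<-as.data.frame(lapply(NIS_clean, ordered))
pcor<-polycor::hetcor(NIS_ord, ML = T)$correlations #extract the polychoric correlation matrix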
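
A quick simulation makes the motivation concrete. It is a sketch under assumed values: a latent correlation of 0.6 and evenly spaced cut points. Pearson's r computed on the 1-5 codes is pulled below the latent correlation, whereas a polychoric estimate targets the latent value. The polycor call is wrapped in requireNamespace() so the rest of the sketch runs without the package.

```r
# Simulate two continuous latent variables with an assumed correlation of 0.6.
set.seed(123)
n <- 5000
rho <- 0.6
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)

# Discretize each into a 1-5 "Likert" response using fixed, assumed thresholds.
cuts <- c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)
x1 <- as.integer(cut(z1, breaks = cuts))
x2 <- as.integer(cut(z2, breaks = cuts))

# Pearson correlation on the 1-5 codes underestimates the latent correlation.
pearson_r <- cor(x1, x2)
pearson_r

# If polycor is installed, the polychoric estimate should land close to rho.
if (requireNamespace("polycor", quietly = TRUE)) {
  print(polycor::polychor(x1, x2))
}
```
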

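To determine how many factors to extract, parallel analysis compares the successive eigenvalues of the observed data (the values shown in a scree plot) with the eigenvalues of random data of the same size, that is, with the same number of observations and variables. Factors are retained as long as the observed eigenvalue exceeds its random counterpart.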

In [None]:
psych::fa.parallel(pcor, n.obs = nrow(NIS_clean), fm = "pa", fa = "fa") #n.obs is required because pcor is a correlation matrix, not raw data
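
The comparison can be sketched in base R. This is Horn's original component-based version, which averages eigenvalues of ordinary correlation matrices over random normal datasets. psych::fa.parallel(fa = "fa") uses a factor-analytic variant, so the two can suggest different numbers of factors. The function name parallel_analysis and the simulated two-factor data are hypothetical and serve only as an illustration.

```r
# Component-based parallel analysis (Horn's method) using only base R.
# Hypothetical helper; psych::fa.parallel(fa = "fa") uses a different variant.
parallel_analysis <- function(x, n_iter = 100, seed = 1) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  obs_eig <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values
  set.seed(seed)
  # Each column holds the sorted eigenvalues of one random n x p normal dataset.
  rand_eig <- replicate(n_iter, {
    z <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(z), symmetric = TRUE, only.values = TRUE)$values
  })
  mean_rand <- rowMeans(rand_eig)
  # Keep leading eigenvalues up to the first one that does not beat random data.
  above <- obs_eig > mean_rand
  n_keep <- if (all(above)) p else which(!above)[1] - 1
  list(observed = obs_eig, random = mean_rand, n_factors = n_keep)
}

# Simulated data: two uncorrelated factors, four items each, loadings of 0.7.
set.seed(42)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
item <- function(f) 0.7 * f + rnorm(n, sd = sqrt(1 - 0.7^2))
sim <- cbind(sapply(1:4, function(i) item(f1)),
             sapply(1:4, function(i) item(f2)))
pa <- parallel_analysis(sim)
pa$n_factors
```

Applied to the raw NIS items, this function would count components from Pearson correlations rather than from the polychoric matrix used in the notebook.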

The following code chunk performs and prints the factor analysis.

- The factoring method is principal axis ("pa"). Unlike maximum likelihood, principal axis factoring does not assume multivariate normality, which makes it a reasonable choice for data that are not normally distributed.
- The rotation is "oblimin", a standard oblique rotation, meaning that it allows the extracted factors to be correlated with each other. I chose "oblimin" because the factors in this dataset all relate to a common construct and are likely to be correlated. The purpose of rotation is to "obtain simple structure in order to enhance interpretability of the solution" (Norris).
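
psych's oblimin rotation relies on the GPArotation package. As a dependency-free stand-in, the sketch below uses base R's promax, which is a different oblique rotation. It runs on simulated data with two factors correlated at an assumed 0.5. For a rotation matrix T, the implied factor correlation matrix is (T'T)^-1. For an orthogonal rotation such as varimax, that matrix is the identity. For an oblique rotation it has non-zero off-diagonal entries.

```r
# Two correlated latent factors (assumed true correlation 0.5), four items each.
set.seed(7)
n <- 1000
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(1 - 0.5^2) * rnorm(n)
item <- function(f) 0.7 * f + rnorm(n, sd = sqrt(1 - 0.7^2))
x <- cbind(sapply(1:4, function(i) item(f1)),
           sapply(1:4, function(i) item(f2)))

# Unrotated two-component loadings from the eigen-decomposition of R.
e <- eigen(cor(x), symmetric = TRUE)
L <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))

# Orthogonal rotation: rotmat is orthonormal, so the factors stay uncorrelated.
vm <- varimax(L)
phi_varimax <- solve(crossprod(vm$rotmat))

# Oblique rotation (promax standing in for oblimin): factors may correlate.
pm <- promax(L)
phi_promax <- solve(crossprod(pm$rotmat))
round(phi_promax, 2)
```

With psych, the factor correlations of an oblique solution are printed with the model, so there is no need to compute them by hand.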

In [None]:
fac <- psych::fa(pcor, nfactors=8, fm="pa", n.obs=nrow(NIS_clean), rotate="oblimin") #oblimin requires the GPArotation package
print(fac)

By commonly used cutoffs, the model above fits well: the root mean square of residuals (RMSR) is 0.01, the Tucker-Lewis index (TLI) is 0.95, and the RMSEA is below 0.05.
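
Both chi-square-based indices have standard point-estimate formulas, sketched below with made-up inputs. psych::fa stores its own estimates on the fitted object (for example fac$TLI and fac$RMSEA). Its chi-square includes a small-sample correction, so values computed this way may not match the printed output exactly. The helper names rmsea and tli are hypothetical.

```r
# Hypothetical helpers implementing the standard formulas.
# chisq, df: chi-square and degrees of freedom of the fitted model
# chisq0, df0: chi-square and degrees of freedom of the null (independence) model
# n: sample size
rmsea <- function(chisq, df, n) {
  sqrt(max((chisq - df) / (df * (n - 1)), 0))
}
tli <- function(chisq, df, chisq0, df0) {
  ((chisq0 / df0) - (chisq / df)) / ((chisq0 / df0) - 1)
}

# Made-up chi-square values, for illustration only.
rmsea(chisq = 120, df = 100, n = 501)
tli(chisq = 120, df = 100, chisq0 = 1000, df0 = 120)
```
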

### Interpreting the factor analysis

Next I explored which questions ended up in which factor.

The following code converts the loading matrix to a long data frame, keeps only loadings whose absolute value is at least 0.3, and renames the columns.

In [None]:
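fa_loadings <- fac$loadings%>%
  as.table()%>%
  as.data.frame()%>% #one row per question-factor pair: Var1, Var2, Freq
  filter(abs(Freq)>=0.3) #keep only salient loadings
colnames(fa_loadings)<-c("Question", "Factor", "Loading")
#order the factor levels PA1-PA8 without relabeling them (assigning to levels() renames levels by position)
fa_loadings$Factor<-factor(fa_loadings$Factor, levels = paste0("PA", 1:8))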
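
The same reshaping can be done in base R. The sketch below uses a toy matrix whose values are made up and stand in for fac$loadings. Q3 passes the 0.3 threshold on both factors, which is how cross-loading items show up in the long table.

```r
# Made-up loadings standing in for fac$loadings (3 items x 2 factors).
L <- matrix(c(0.72, 0.05, 0.41,    # PA1 column
              0.10, 0.65, -0.38),  # PA2 column
            nrow = 3,
            dimnames = list(c("Q1", "Q2", "Q3"), c("PA1", "PA2")))

# Long format: one row per (question, factor) pair.
long <- as.data.frame(as.table(L))
names(long) <- c("Question", "Factor", "Loading")

# Keep salient loadings only (absolute value of at least 0.3).
salient <- long[abs(long$Loading) >= 0.3, ]
salient
```
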

This sets up a named list so that the text of each question can be looked up with questions$Q# (for example, questions$Q1).

In [None]:
#set up an empty list for the question text and an empty vector for the question names:
questions<-vector(mode = "list", length = 26)
qnames<-vector(mode = "character", length = 26)

#read in the part of the codebook that lists the questions (lines 9-34):
temp<-readLines("NIS_data/codebook.txt")[9:34]

#each line is "Q#<tab>question text": the 1st field is the question # and the 2nd is the question text
for (i in 1:26){
  fields<-strsplit(temp[i], "\t")[[1]]
  qnames[i]<-fields[1]
  questions[[i]]<-fields[2]
}
names(questions)<-qnames
questions
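
Building the lookup can also be written without a loop. The sketch below parses two illustrative lines in the assumed Q#<tab>text format; they contain placeholder text, not the real codebook. It then shows single and multiple lookups. The name questions_demo is hypothetical.

```r
# Illustrative codebook lines in the assumed "Q#<tab>question text" format.
codebook_lines <- c("Q1\tFirst question text.",
                    "Q2\tSecond question text.")

# Split every line at the tab: field 1 is the item name, field 2 is the text.
parts <- strsplit(codebook_lines, "\t", fixed = TRUE)
questions_demo <- setNames(lapply(parts, `[`, 2),
                           vapply(parts, `[`, character(1), 1))

questions_demo$Q2                      # text of a single item
unlist(questions_demo[c("Q1", "Q2")])  # named character vector of several items
```
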

This adds a column to the fa_loadings data frame containing the text of each question, then sorts the rows by factor.

In [None]:
#look up each question's text in the named list built above; the factor levels follow the codebook order (Q1-Q26)
fa_loadings$Question_txt<-factor(unlist(questions[as.character(fa_loadings$Question)], use.names = FALSE),
                                 levels = unlist(questions, use.names = FALSE))
fa_loadings<-arrange(fa_loadings, Factor)
fa_loadings

![Factor loading plot](img/loading_plot.png)