# Does your dorm matter for your well-being?

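We build three models of increasing complexity to predict spring well-being:
1. Spring well-being from fall well-being
1. Spring well-being from fall well-being and demographic items (age, parental education, financial aid, family income, gender, race)
1. Spring well-being from fall well-being and demographic items, plus a random intercept by dorm (`NID`)

# Results:
- Adding the demographic items does not significantly improve on the fall-only model (F-test, p = 0.38).
- Adding a random intercept by dorm does not improve on the model with demographic items (likelihood-ratio test, p = 0.89).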

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Configuration" data-toc-modified-id="Configuration-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Configuration</a></span></li><li><span><a href="#Import-and-load" data-toc-modified-id="Import-and-load-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Import and load</a></span></li><li><span><a href="#Quick-summary-of-whole-dorm-well-beings" data-toc-modified-id="Quick-summary-of-whole-dorm-well-beings-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Quick summary of whole-dorm well-beings</a></span></li><li><span><a href="#Base-model,-minimal-predictors" data-toc-modified-id="Base-model,-minimal-predictors-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Base model, minimal predictors</a></span></li><li><span><a href="#Model-with-demographic-covariates" data-toc-modified-id="Model-with-demographic-covariates-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Model with demographic covariates</a></span></li></ul></div>

## Configuration

In [1]:
DATA_FILE = '../data/2018-2019/postprocessed/final_for_analysis_R.csv'

IMPUTE_MISSING = TRUE
INCLUDE_FALL_WB_AS_PREDICTOR = TRUE
INCLUDE_DEMOS_AS_PREDICTOR = TRUE
# DV = 'Wellbeing_fall'
DV = 'Wellbeing_spring'

if (INCLUDE_FALL_WB_AS_PREDICTOR) {
    stopifnot(DV == 'Wellbeing_spring')
}

## Import and load

In [2]:
library(car)
library(tidyverse)
library(hexbin)
library(mice)
library(nlme)
library(lme4)
library(lmerTest)

options(width=200)

Loading required package: carData

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.2.1     ✔ purrr   0.3.3
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   1.0.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::recode() masks car::recode()
✖ purrr::some()   masks car::some()

Loading required package: lattice

Registered S3 methods overwritten by 'lme4':
  method                          from
  cooks.distance.influence.merMod car 
  influen

In [8]:
df = read.csv(DATA_FILE, na.strings=c("", " ", "NA"))
df = df[,c('NID', 'Age', 'ParentEducationMax',
           'FinclAid', 'FmlyIncome', 'Gender', 'Race',
           'Wellbeing_fall', 'Wellbeing_spring')]
dim(df)
head(df)

| | NID | Age | ParentEducationMax | FinclAid | FmlyIncome | Gender | Race | Wellbeing_fall | Wellbeing_spring |
|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` | `<fct>` | `<dbl>` | `<dbl>` |
| 1 | 7 | 18 | 4.0 | 0 | 87500.0 | M | white | -2.06354788 | -0.76535414 |
| 2 | 11 | 18 | 3.5 | 1 | NA | F | south_asian | -0.01143413 | -0.04997158 |
| 3 | 9 | 18 | 4.0 | 1 | 125000.0 | M | white | 0.919656 | 0.66541099 |
| 4 | 4 | 18 | 4.0 | 0 | 200000.0 | F | east_asian | 0.65342017 | 0.48656535 |
| 5 | 5 | 18 | 2.5 | 1 | 125000.0 | M | south_asian | 0.6983916 | -0.04997158 |
| 6 | 13 | 18 | 4.0 | 1 | 45000.0 | F | east_asian | 0.04290417 | -0.1393944 |


In [9]:
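if (IMPUTE_MISSING) {
    print("Imputing missing values")
    # mice() draws imputations at random; pass seed= to make the run reproducible
    imp = mice(df)
    # complete() with default arguments keeps only the first imputed dataset
    df = complete(imp)
    head(df)
} else {
    # Listwise deletion: drop every row with at least one missing value
    df = na.omit(df)
}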

[1] "Imputing missing values"

 iter imp variable
  1   1  ParentEducationMax  FinclAid  FmlyIncome  Race
  1   2  ParentEducationMax  FinclAid  FmlyIncome  Race
  1   3  ParentEducationMax  FinclAid  FmlyIncome  Race
  1   4  ParentEducationMax  FinclAid  FmlyIncome  Race
  1   5  ParentEducationMax  FinclAid  FmlyIncome  Race
  2   1  ParentEducationMax  FinclAid  FmlyIncome  Race
  2   2  ParentEducationMax  FinclAid  FmlyIncome  Race
  2   3  ParentEducationMax  FinclAid  FmlyIncome  Race
  2   4  ParentEducationMax  FinclAid  FmlyIncome  Race
  2   5  ParentEducationMax  FinclAid  FmlyIncome  Race
  3   1  ParentEducationMax  FinclAid  FmlyIncome  Race
  3   2  ParentEducationMax  FinclAid  FmlyIncome  Race
  3   3  ParentEducationMax  FinclAid  FmlyIncome  Race
  3   4  ParentEducationMax  FinclAid  FmlyIncome  Race
  3   5  ParentEducationMax  FinclAid  FmlyIncome  Race
  4   1  ParentEducationMax  FinclAid  FmlyIncome  Race
  4   2  ParentEducationMax  FinclAid  FmlyIncome  Rac

| | NID | Age | ParentEducationMax | FinclAid | FmlyIncome | Gender | Race | Wellbeing_fall | Wellbeing_spring |
|---|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` | `<fct>` | `<dbl>` | `<dbl>` |
| 1 | 7 | 18 | 4.0 | 0 | 87500 | M | white | -2.06354788 | -0.76535414 |
| 2 | 11 | 18 | 3.5 | 1 | 62500 | F | south_asian | -0.01143413 | -0.04997158 |
| 3 | 9 | 18 | 4.0 | 1 | 125000 | M | white | 0.919656 | 0.66541099 |
| 4 | 4 | 18 | 4.0 | 0 | 200000 | F | east_asian | 0.65342017 | 0.48656535 |
| 5 | 5 | 18 | 2.5 | 1 | 125000 | M | south_asian | 0.6983916 | -0.04997158 |
| 6 | 13 | 18 | 4.0 | 1 | 45000 | F | east_asian | 0.04290417 | -0.1393944 |
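
The imputation cell above carries a single completed dataset forward. As a hedged alternative, `mice` can also fit the model on every imputed dataset and pool the estimates with Rubin's rules. The sketch below is not the analysis in this notebook: it runs on simulated toy data, and the column names `wb_fall`, `wb_spring`, and `income` are made up for illustration.

```r
library(mice)

# Toy data standing in for the real file (all values are simulated)
set.seed(1)
n <- 100
toy <- data.frame(
  wb_fall = rnorm(n),
  income  = rnorm(n, mean = 100, sd = 20)
)
toy$wb_spring <- 0.6 * toy$wb_fall + rnorm(n, sd = 0.8)
toy$income[sample(n, 15)] <- NA   # knock out some income values

# Create m = 5 imputed datasets; seed makes the imputation reproducible
imp <- mice(toy, m = 5, seed = 1, printFlag = FALSE)

# Fit the same regression on each imputed dataset, then pool the results
fits   <- with(imp, lm(wb_spring ~ wb_fall + income))
pooled <- pool(fits)
pooled_summary <- summary(pooled)
print(pooled_summary)
```

The pooled standard errors include the between-imputation variance, which a single completed dataset cannot reflect.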


## Quick summary of whole-dorm well-beings

In [11]:
df %>% group_by(NID) %>%
    summarize(wb_fall = mean(Wellbeing_fall),
              wb_spring = mean(Wellbeing_spring))

| NID | wb_fall | wb_spring |
|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 0.17958135 | 0.43067609 |
| 2 | 0.10039586 | 0.23049091 |
| 4 | -0.0421375 | 0.00871215 |
| 5 | 0.05322856 | 0.01841058 |
| 7 | 0.06902635 | 0.1716415 |
| 8 | -0.15094904 | -0.45876161 |
| 9 | -0.09749154 | -0.34804765 |
| 10 | -0.35117851 | -0.30147326 |
| 11 | 0.14734051 | 0.15272015 |
| 13 | -0.30415259 | -0.49261454 |


## Base model, minimal predictors

In [12]:
equation = paste(DV, ' ~  1')
if (INCLUDE_FALL_WB_AS_PREDICTOR) {
    equation = paste(equation, '+ Wellbeing_fall')
}
model1 = lm(as.formula(equation), df)
summary(model1)


Call:
lm(formula = as.formula(equation), data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.69297 -0.48662  0.06646  0.49158  2.76843 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -3.286e-16  5.373e-02    0.00        1    
Wellbeing_fall  6.433e-01  5.387e-02   11.94   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7675 on 202 degrees of freedom
Multiple R-squared:  0.4139,	Adjusted R-squared:  0.411 
F-statistic: 142.6 on 1 and 202 DF,  p-value: < 2.2e-16
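
A quick sanity check on the output above: with a single predictor, R² equals the squared correlation between predictor and outcome. If both well-being scores are standardized (an assumption, suggested by the near-zero intercept), the slope is approximately that correlation, so squaring it should roughly reproduce R². A minimal sketch using the printed values:

```r
# Values copied from summary(model1) above
slope <- 0.6433   # estimate for Wellbeing_fall
r2    <- 0.4139   # Multiple R-squared

# Under the standardization assumption, slope^2 should be close to R-squared
implied_r2 <- slope^2
print(implied_r2)
print(abs(implied_r2 - r2))
```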


## Model with demographic covariates

In [13]:
names(df)

In [14]:
if (INCLUDE_DEMOS_AS_PREDICTOR) {
    equation = paste(equation, '+ Age + ParentEducationMax + FinclAid + FmlyIncome + Gender + Race')
    model2 = lm(as.formula(equation), df)
    summary(model2)
} else {
    model2 = model1
}


Call:
lm(formula = as.formula(equation), data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5015 -0.4384  0.0441  0.4755  2.7326 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)        -1.064e-01  1.056e+00  -0.101   0.9198    
Wellbeing_fall      6.174e-01  5.649e-02  10.930   <2e-16 ***
Age                -1.370e-02  5.295e-02  -0.259   0.7961    
ParentEducationMax  4.165e-02  9.796e-02   0.425   0.6712    
FinclAid           -2.560e-02  1.455e-01  -0.176   0.8605    
FmlyIncome          1.514e-06  1.078e-06   1.404   0.1618    
GenderM             1.907e-01  1.148e-01   1.662   0.0982 .  
Genderother         2.555e-02  3.668e-01   0.070   0.9445    
Raceeast_asian     -1.227e-02  2.214e-01  -0.055   0.9559    
Racehispanic        1.670e-01  2.777e-01   0.601   0.5483    
Raceother_or_mixed -1.330e-01  2.251e-01  -0.591   0.5553    
Racesouth_asian    -2.575e-01  3.004e-01  -0.857   0.3923    
Racewhite           1.359e-01  2

In [15]:
anova(model1, model2)

| | Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 202 | 118.9815 | NA | NA | NA | NA |
| 2 | 191 | 112.0058 | 11.0 | 6.975638 | 1.081394 | 0.3781997 |
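
The F statistic in this table can be reproduced by hand from the two residual sums of squares. The sketch below copies the rounded values printed above, so the last digits may differ slightly from the `anova()` output.

```r
# Values copied from the anova(model1, model2) table
rss1 <- 118.9815    # RSS of the fall-only model
rss2 <- 112.0058    # RSS of the model with demographic covariates
df_diff   <- 11     # number of added coefficients
df_resid2 <- 191    # residual degrees of freedom of the larger model

# F = (drop in RSS per added coefficient) / (residual variance of larger model)
f_stat <- ((rss1 - rss2) / df_diff) / (rss2 / df_resid2)
p_val  <- pf(f_stat, df_diff, df_resid2, lower.tail = FALSE)
print(c(F = f_stat, p = p_val))
```

An F close to 1 means the eleven added coefficients reduce the residual sum of squares by about as much as eleven noise predictors would.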


In [16]:
model3 = lmer(as.formula(paste(equation, '+ (1|NID)')), data=df, REML=TRUE)
summary(model3)

“Some predictor variables are on very different scales: consider rescaling”
“Some predictor variables are on very different scales: consider rescaling”

Correlation matrix not shown by default, as p = 13 > 12.
Use print(obj, correlation=TRUE)  or
    vcov(obj)        if you need it




Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
Formula: as.formula(paste(equation, "+ (1|NID)"))
   Data: df

REML criterion at convergence: 510.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2072 -0.5403  0.0714  0.5989  3.5186 

Random effects:
 Groups   Name        Variance Std.Dev.
 NID      (Intercept) 0.01112  0.1054  
 Residual             0.57730  0.7598  
Number of obs: 204, groups:  NID, 11

Fixed effects:
                     Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)        -4.503e-01  1.090e+00  9.226e+01  -0.413    0.681    
Wellbeing_fall      6.123e-01  5.625e-02  1.871e+02  10.884   <2e-16 ***
Age                 4.818e-03  5.483e-02  8.334e+01   0.088    0.930    
ParentEducationMax  4.248e-02  9.778e-02  1.901e+02   0.434    0.665    
FinclAid           -2.599e-02  1.453e-01  1.902e+02  -0.179    0.858    
FmlyIncome          1.455e-06  1.073e-06  1.851e+02   1.357    0.176    
GenderM   
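
One way to read the random-effects block above is through the intraclass correlation (ICC): the share of the remaining variance that sits between dorms rather than between students. A minimal sketch using the two printed variance components:

```r
# Variance components copied from the "Random effects" block of summary(model3)
var_dorm  <- 0.01112   # NID (Intercept)
var_resid <- 0.57730   # Residual

# ICC: proportion of unexplained variance attributable to dorm
icc <- var_dorm / (var_dorm + var_resid)
print(icc)

# On a fitted model, the same components are available via
# as.data.frame(VarCorr(model3))
```

By this calculation, dorm membership accounts for roughly 2% of the variance left after the fixed effects.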

In [17]:
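# Refit with maximum likelihood (REML=FALSE) so the log-likelihood is comparable with the lm fit
model4 = lmer(as.formula(paste(equation, '+ (1|NID)')), data=df, REML=FALSE)
# Likelihood-ratio test: does the dorm random intercept improve on model2?
anova(model4, model2)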

“Some predictor variables are on very different scales: consider rescaling”
“Some predictor variables are on very different scales: consider rescaling”


| | Df | AIC | BIC | logLik | deviance | Chisq | Chi Df | Pr(>Chisq) |
|---|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| model2 | 14 | 484.6148 | 531.0685 | -228.3074 | 456.6148 | NA | NA | NA |
| model4 | 15 | 486.5946 | 536.3664 | -228.2973 | 456.5946 | 0.0201803 | 1.0 | 0.8870347 |
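
The likelihood-ratio statistic in this table is twice the difference in log-likelihoods. The sketch below reproduces it from the rounded printed values. Because a variance cannot be negative, the null hypothesis sits on the boundary of the parameter space, and a commonly suggested correction for a single variance component is to halve the naive p-value; that correction is included here as an assumption, not as something the notebook itself applies.

```r
# Log-likelihoods copied from the anova(model4, model2) table
loglik_lm   <- -228.3074   # model2: fixed effects only
loglik_lmer <- -228.2973   # model4: adds a random intercept by dorm

# Likelihood-ratio statistic and naive chi-square p-value (1 extra parameter)
chisq   <- 2 * (loglik_lmer - loglik_lm)
p_naive <- pchisq(chisq, df = 1, lower.tail = FALSE)

# Boundary-corrected p-value (assumed 50:50 mixture of chi-square 0 and 1)
p_boundary <- p_naive / 2
print(c(chisq = chisq, p_naive = p_naive, p_boundary = p_boundary))
```

Either way the p-value is far above conventional thresholds, and model4 also has the higher AIC and BIC.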
