# Do freshmen fare better when placed in all-frosh dorms?
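Answer: No. After controlling for each measure's t1 value, living in an all-frosh dorm has no statistically significant effect on t2 life satisfaction, loneliness, stress, or the well-being composite (all p > 0.1). But higher-well-being freshmen are more likely to be initially placed in all-frosh dorms.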

**Author: Everett Wetchler (`everett.wetchler@gmail.com`)**

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Import-and-load" data-toc-modified-id="Import-and-load-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Import and load</a></span></li><li><span><a href="#De-dup-and-merge" data-toc-modified-id="De-dup-and-merge-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>De-dup and merge</a></span></li><li><span><a href="#Test-for-linear-effect-of-being-in-an-all-frosh-dorm" data-toc-modified-id="Test-for-linear-effect-of-being-in-an-all-frosh-dorm-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Test for linear effect of being in an all-frosh dorm</a></span></li><li><span><a href="#Test-for-t1-difference-in-frosh-vs-non-frosh-students" data-toc-modified-id="Test-for-t1-difference-in-frosh-vs-non-frosh-students-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Test for t1 difference in frosh vs non-frosh students</a></span></li></ul></div>

## Import and load

In [15]:
library(car)
library(tidyverse)
library(hexbin)
library(mice)
library(nlme)
library(lme4)
library(lmerTest)

# Display more data in the Jupyter notebook
options(repr.matrix.max.cols=500, repr.matrix.max.rows=100)

# Set default plot size
options(repr.plot.width=6, repr.plot.height=6)

In [22]:
dorms = read.csv('../data/2019–2020/PID_dorm_info.csv', na.strings=c("", " ", "NA"))
dim(dorms)
dorms = dorms %>% arrange(PID)
head(dorms)

| | PID | frosh | DID | HID | dormClass | dormType | participatingDorm |
|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<fct>` | `<fct>` | `<fct>` | `<fct>` | `<int>` |
| 1 | 1000 | 0 | 53 | 6 | upper | vegetarian | 0 |
| 2 | 1001 | 1 | 30 | 5 | frosh | no type | 1 |
| 3 | 1001 | 1 | 30 | 5 | frosh | no type | 1 |
| 4 | 1002 | 0 | 46 | 7 | upper | no type | 0 |
| 5 | 1002 | 0 | 46 | 7 | upper | no type | 0 |
| 6 | 1003 | 0 | 42 | 8 | upper | no type | 0 |


In [7]:
df = read.csv('../data/2019–2020/postprocessed/df_Rcleaned_full.csv', na.strings=c("", " ", "NA"))
df = df[,2:ncol(df)]  # Remove extraneous first column
dim(df)
head(df)

| | PID | gender | race | dorm | life_satisfaction_t1 | empathy | loneliness_t1 | stress_t1 | BFI_E | BFI_A | BFI_C | BFI_N | BFI_O | intl_student | family_income | life_satisfaction_t2 | loneliness_t2 | stress_t2 | parent_education_highest | wellbeing_composite_t1 | wellbeing_composite_t2 | degree_in_UNION | degree_out_UNION | empathy_UNION | degree_in_INTIMATE | degree_out_INTIMATE | empathy_INTIMATE | degree_in_ACQUAINTANCE | degree_out_ACQUAINTANCE | empathy_ACQUAINTANCE | degree_in_CloseFrds | degree_out_CloseFrds | empathy_CloseFrds | degree_in_NegEmoSupp | degree_out_NegEmoSupp | empathy_NegEmoSupp | degree_in_PosEmoSupp | degree_out_PosEmoSupp | empathy_PosEmoSupp | degree_in_Responsive | degree_out_Responsive | empathy_Responsive | degree_in_EmpSupp | degree_out_EmpSupp | empathy_EmpSupp | degree_in_PosAff | degree_out_PosAff | empathy_PosAff | degree_in_NegAff | degree_out_NegAff | empathy_NegAff | degree_in_Gossip | degree_out_Gossip | empathy_Gossip | degree_in_Liked | degree_out_Liked | empathy_Liked | degree_in_StudyWith | degree_out_StudyWith | empathy_StudyWith |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<fct>` | `<fct>` | `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` |
| 1 | 1001 | male | south_asian | Rinconada | 4.833333 | 6.375 | 1.333333 | 2.5 | 6.5 | 6.0 | 5.5 | 3.5 | 6.5 | 0 | 70000 | 6.166667 | 2.0 | 1.5 | 5 | -0.05862651 | 0.83993983 | 3 | 0 |  | 1 | 0 |  | 1 | 0 |  | 1 | 0 |  | 0 | 0 |  | 2 | 0 |  | 0 | 0 |  | 0 | 0 |  | 1 | 0 |  | 0 | 0 |  | 1 | 0 |  | 1 | 0 |  | 1 | 0 |  |
| 2 | 1004 | female | south_asian | Donner | 2.833333 | 6.0 | 2.0 | 2.0 | 4.0 | 5.5 | 4.5 | 3.0 | 5.5 | 1 | 10000 | 3.0 | 2.666667 | 3.0 | 1 | -0.98138408 | -1.95722956 | 7 | 6 | 5.479167 | 5 | 4 | 5.625 | 0 | 3 | 5.916667 | 4 | 4 | 5.625 | 5 | 1 | 4.75 | 4 | 4 | 5.4375 | 2 | 2 | 5.5 | 3 | 1 | 4.75 | 2 | 1 | 4.75 | 1 | 0 |  | 1 | 1 | 4.75 | 1 | 1 | 4.75 | 3 | 2 | 5.5625 |
| 3 | 1047 | female | other_or_mixed | Twain | 5.333333 | 6.5 | 2.0 | 2.5 | 5.5 | 6.5 | 7.0 | 3.0 | 6.5 | 0 | 210000 | 5.166667 | 2.0 | 3.0 | 5 | -0.4469044 | -0.66253028 | 6 | 4 | 6.4375 | 4 | 4 | 6.4375 | 0 | 1 | 6.125 | 4 | 4 | 6.4375 | 4 | 3 | 6.541667 | 3 | 4 | 6.4375 | 2 | 2 | 6.5625 | 1 | 2 | 6.5625 | 3 | 4 | 6.4375 | 1 | 1 | 6.5 | 4 | 3 | 6.541667 | 1 | 2 | 6.0 | 2 | 1 | 6.625 |
| 4 | 1078 | female | east_asian | Loro | 5.5 | 6.25 | 1.333333 | 2.0 | 4.5 | 6.5 | 6.5 | 1.5 | 6.5 | 0 | 130000 | 5.5 | 2.333333 | 3.0 | 6 | 0.61446613 | -0.79801599 | 1 | 0 |  | 1 | 0 |  | 1 | 0 |  | 1 | 0 |  | 0 | 0 |  | 1 | 0 |  | 1 | 0 |  | 0 | 0 |  | 0 | 0 |  | 0 | 1 | 5.0 | 0 | 0 |  | 0 | 0 |  | 0 | 0 |  |
| 5 | 1097 | male | east_asian | Otero | 6.0 | 5.0 | 2.0 | 2.0 | 3.0 | 4.0 | 5.5 | 1.5 | 4.5 | 0 | 90000 | 5.833333 | 2.0 | 2.5 | 6 | 0.22618824 | -0.04126834 | 6 | 4 | 5.8125 | 3 | 1 | 5.625 | 2 | 0 |  | 2 | 1 | 5.625 | 1 | 1 | 5.625 | 1 | 1 | 6.625 | 0 | 1 | 5.625 | 0 | 1 | 4.875 | 1 | 1 | 6.125 | 0 | 0 |  | 0 | 1 | 5.625 | 0 | 1 | 6.0 | 3 | 1 | 6.625 |
| 6 | 1105 | female | white | Larkin | 6.666667 | 5.75 | 2.0 | 2.0 | 4.5 | 5.0 | 5.0 | 2.0 | 4.5 | 0 | 170000 | 6.666667 | 2.0 | 2.0 | 4 | 0.48041399 | 0.64021289 | 6 | 9 | 5.847222 | 4 | 5 | 6.025 | 2 | 2 | 5.875 | 3 | 4 | 5.8125 | 2 | 3 | 6.125 | 2 | 2 | 5.375 | 2 | 2 | 6.3125 | 1 | 4 | 6.1875 | 2 | 1 | 6.875 | 1 | 0 |  | 2 | 3 | 5.75 | 1 | 1 | 5.125 | 4 | 2 | 6.1875 |
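The `df[,2:ncol(df)]` step assumes the CSV begins with a row-index column. R's `write.csv()` writes such a column by default (`row.names = TRUE`), although how `df_Rcleaned_full.csv` was actually written is an assumption here. The round-trip sketch below uses made-up data to show where that column comes from and two equivalent ways to drop it:

```r
# Made-up stand-in for the cleaned survey data (not the study data)
toy <- data.frame(PID = c(1001L, 1004L), stress_t1 = c(2.5, 2.0))

path <- tempfile(fileext = ".csv")
write.csv(toy, path)  # row.names = TRUE by default writes an unnamed index column

raw <- read.csv(path)
print(names(raw))     # the unnamed index column is read back as "X"

# Option 1: drop the first column after reading, as the notebook does
cleaned <- raw[, -1]

# Option 2: treat the first column as row names while reading
cleaned2 <- read.csv(path, row.names = 1)
```

Option 2 avoids positional indexing, so it still behaves correctly if the file is ever re-exported with `row.names = FALSE`.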


## De-dup and merge

In [23]:
dorms = dorms[!duplicated(dorms$PID),]  # Keep only the first row for each PID
dim(dorms)
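`duplicated(dorms$PID)` keeps the first row it sees for each `PID` and drops the rest. That is harmless when the repeats are exact copies, as they are for PIDs 1001 and 1002 in the preview above. It would, however, silently choose one dorm for a student whose records conflict. Below is a minimal check, sketched on made-up rows in which PID 3 is a hypothetical conflict:

```r
# Made-up rows: PID 1 is an exact duplicate, PID 3 has two different dorm IDs
dorms_toy <- data.frame(
  PID = c(1L, 1L, 2L, 3L, 3L),
  DID = c(30L, 30L, 46L, 42L, 53L)
)

# Drop rows that are identical in every column
distinct_rows <- unique(dorms_toy)

# Any PID still repeated has records that disagree somewhere
conflicts <- unique(distinct_rows$PID[duplicated(distinct_rows$PID)])
print(conflicts)

# Same rule as the notebook: keep the first row per PID
deduped <- dorms_toy[!duplicated(dorms_toy$PID), ]
```

On the real `dorms` table, an empty `conflicts` vector would mean the de-dup discards only exact copies.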

In [25]:
df = merge(df, dorms, by="PID")
dim(df)
head(df)

| | PID | gender | race | dorm | life_satisfaction_t1 | empathy | loneliness_t1 | stress_t1 | BFI_E | BFI_A | BFI_C | BFI_N | BFI_O | intl_student | family_income | life_satisfaction_t2 | loneliness_t2 | stress_t2 | parent_education_highest | wellbeing_composite_t1 | wellbeing_composite_t2 | degree_in_UNION | degree_out_UNION | empathy_UNION | degree_in_INTIMATE | degree_out_INTIMATE | empathy_INTIMATE | degree_in_ACQUAINTANCE | degree_out_ACQUAINTANCE | empathy_ACQUAINTANCE | degree_in_CloseFrds | degree_out_CloseFrds | empathy_CloseFrds | degree_in_NegEmoSupp | degree_out_NegEmoSupp | empathy_NegEmoSupp | degree_in_PosEmoSupp | degree_out_PosEmoSupp | empathy_PosEmoSupp | degree_in_Responsive | degree_out_Responsive | empathy_Responsive | degree_in_EmpSupp | degree_out_EmpSupp | empathy_EmpSupp | degree_in_PosAff | degree_out_PosAff | empathy_PosAff | degree_in_NegAff | degree_out_NegAff | empathy_NegAff | degree_in_Gossip | degree_out_Gossip | empathy_Gossip | degree_in_Liked | degree_out_Liked | empathy_Liked | degree_in_StudyWith | degree_out_StudyWith | empathy_StudyWith | frosh | DID | HID | dormClass | dormType | participatingDorm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<fct>` | `<fct>` | `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<int>` | `<dbl>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<int>` | `<dbl>` | `<int>` | `<fct>` | `<fct>` | `<fct>` | `<fct>` | `<int>` |
| 1 | 1001 | male | south_asian | Rinconada | 4.833333 | 6.375 | 1.333333 | 2.5 | 6.5 | 6.0 | 5.5 | 3.5 | 6.5 | 0 | 70000 | 6.166667 | 2.0 | 1.5 | 5 | -0.05862651 | 0.83993983 | 3 | 0 |  | 1 | 0 |  | 1 | 0 |  | 1 | 0 |  | 0 | 0 |  | 2 | 0 |  | 0 | 0 |  | 0 | 0 |  | 1 | 0 |  | 0 | 0 |  | 1 | 0 |  | 1 | 0 |  | 1 | 0 |  | 1 | 30 | 5 | frosh | no type | 1 |
| 2 | 1004 | female | south_asian | Donner | 2.833333 | 6.0 | 2.0 | 2.0 | 4.0 | 5.5 | 4.5 | 3.0 | 5.5 | 1 | 10000 | 3.0 | 2.666667 | 3.0 | 1 | -0.98138408 | -1.95722956 | 7 | 6 | 5.479167 | 5 | 4 | 5.625 | 0 | 3 | 5.916667 | 4 | 4 | 5.625 | 5 | 1 | 4.75 | 4 | 4 | 5.4375 | 2 | 2 | 5.5 | 3 | 1 | 4.75 | 2 | 1 | 4.75 | 1 | 0 |  | 1 | 1 | 4.75 | 1 | 1 | 4.75 | 3 | 2 | 5.5625 | 1 | 35 | 4 | frosh | no type | 1 |
| 3 | 1047 | female | other_or_mixed | Twain | 5.333333 | 6.5 | 2.0 | 2.5 | 5.5 | 6.5 | 7.0 | 3.0 | 6.5 | 0 | 210000 | 5.166667 | 2.0 | 3.0 | 5 | -0.4469044 | -0.66253028 | 6 | 4 | 6.4375 | 4 | 4 | 6.4375 | 0 | 1 | 6.125 | 4 | 4 | 6.4375 | 4 | 3 | 6.541667 | 3 | 4 | 6.4375 | 2 | 2 | 6.5625 | 1 | 2 | 6.5625 | 3 | 4 | 6.4375 | 1 | 1 | 6.5 | 4 | 3 | 6.541667 | 1 | 2 | 6.0 | 2 | 1 | 6.625 | 1 | 76 | 4 | frosh | no type | 1 |
| 4 | 1078 | female | east_asian | Loro | 5.5 | 6.25 | 1.333333 | 2.0 | 4.5 | 6.5 | 6.5 | 1.5 | 6.5 | 0 | 130000 | 5.5 | 2.333333 | 3.0 | 6 | 0.61446613 | -0.79801599 | 1 | 0 |  | 1 | 0 |  | 1 | 0 |  | 1 | 0 |  | 0 | 0 |  | 1 | 0 |  | 1 | 0 |  | 0 | 0 |  | 0 | 0 |  | 0 | 1 | 5.0 | 0 | 0 |  | 0 | 0 |  | 0 | 0 |  | 1 | 32 | 1 | four-class | no type | 1 |
| 5 | 1097 | male | east_asian | Otero | 6.0 | 5.0 | 2.0 | 2.0 | 3.0 | 4.0 | 5.5 | 1.5 | 4.5 | 0 | 90000 | 5.833333 | 2.0 | 2.5 | 6 | 0.22618824 | -0.04126834 | 6 | 4 | 5.8125 | 3 | 1 | 5.625 | 2 | 0 |  | 2 | 1 | 5.625 | 1 | 1 | 5.625 | 1 | 1 | 6.625 | 0 | 1 | 5.625 | 0 | 1 | 4.875 | 1 | 1 | 6.125 | 0 | 0 |  | 0 | 1 | 5.625 | 0 | 1 | 6.0 | 3 | 1 | 6.625 | 1 | 48 | 5 | frosh | no type | 1 |
| 6 | 1105 | female | white | Larkin | 6.666667 | 5.75 | 2.0 | 2.0 | 4.5 | 5.0 | 5.0 | 2.0 | 4.5 | 0 | 170000 | 6.666667 | 2.0 | 2.0 | 4 | 0.48041399 | 0.64021289 | 6 | 9 | 5.847222 | 4 | 5 | 6.025 | 2 | 2 | 5.875 | 3 | 4 | 5.8125 | 2 | 3 | 6.125 | 2 | 2 | 5.375 | 2 | 2 | 6.3125 | 1 | 4 | 6.1875 | 2 | 1 | 6.875 | 1 | 0 |  | 2 | 3 | 5.75 | 1 | 1 | 5.125 | 4 | 2 | 6.1875 | 1 | 77 | 4 | frosh | no type | 1 |
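`merge(df, dorms, by="PID")` performs an inner join, because `all = FALSE` is the default. Any survey respondent without a row in `PID_dorm_info.csv` is therefore dropped with no warning. Comparing `dim(df)` before and after the merge catches this. The sketch below uses made-up IDs and also lists the respondents that were lost:

```r
# Made-up tables: survey PID 3 has no dorm record, dorm PID 4 has no survey row
survey <- data.frame(PID = c(1L, 2L, 3L), stress_t1 = c(2.5, 2.0, 2.0))
dorm_info <- data.frame(PID = c(1L, 2L, 4L), dormClass = c("frosh", "frosh", "upper"))

# Inner join: only PIDs present in both tables survive
joined <- merge(survey, dorm_info, by = "PID")

# Survey respondents dropped by the join
lost <- survey$PID[!survey$PID %in% dorm_info$PID]
cat("rows before:", nrow(survey), "after:", nrow(joined), "\n")
print(lost)
```

Passing `all.x = TRUE` to `merge()` would instead keep every survey row and fill the dorm columns with `NA`.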


In [35]:
df$dormIsFrosh = df$dormClass == "frosh"
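`dormClass == "frosh"` returns `NA` for any student whose `dormClass` is missing, and `group_by()` would then place those students in their own `NA` group. The summary table below shows only two groups, so no such rows appear after the merge. If a missing class should count as "not an all-frosh dorm", `%in%` provides that behavior. A sketch on made-up values:

```r
# Made-up dorm classes, including one missing value
dorm_class <- factor(c("frosh", NA, "upper", "four-class"))

# `==` propagates the missing value
eq <- dorm_class == "frosh"

# `%in%` returns FALSE for the missing value instead
in_set <- dorm_class %in% "frosh"

print(eq)
print(in_set)
```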

In [42]:
df %>% group_by(dormIsFrosh) %>% summarize(
    count=n(),
    life_satisfaction_t1=mean(life_satisfaction_t1, na.rm=TRUE),
    life_satisfaction_t2=mean(life_satisfaction_t2, na.rm=TRUE),
    loneliness_t1=mean(loneliness_t1, na.rm=TRUE),
    loneliness_t2=mean(loneliness_t2, na.rm=TRUE),
    stress_t1=mean(stress_t1, na.rm=TRUE),
    stress_t2=mean(stress_t2, na.rm=TRUE),
    wellbeing_composite_t1=mean(wellbeing_composite_t1, na.rm=TRUE),
    wellbeing_composite_t2=mean(wellbeing_composite_t2, na.rm=TRUE))

| dormIsFrosh | count | life_satisfaction_t1 | life_satisfaction_t2 | loneliness_t1 | loneliness_t2 | stress_t1 | stress_t2 | wellbeing_composite_t1 | wellbeing_composite_t2 |
|---|---|---|---|---|---|---|---|---|---|
| `<lgl>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| FALSE | 275 | 4.909091 | 4.75365 | 1.854545 | 1.985455 | 2.089091 | 2.143636 | -0.13813624 | -0.1457443 |
| TRUE | 427 | 5.160617 | 5.039813 | 1.773614 | 1.88 | 2.016393 | 2.064706 | 0.08896362 | 0.09396221 |


## Test for linear effect of being in an all-frosh dorm

In [38]:
summary(lm(life_satisfaction_t2 ~ life_satisfaction_t1 + dormIsFrosh, data=df))


Call:
lm(formula = life_satisfaction_t2 ~ life_satisfaction_t1 + dormIsFrosh, 
    data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.9581 -0.4674  0.0500  0.5419  3.4249 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)           0.94823    0.13886   6.829 1.86e-11 ***
life_satisfaction_t1  0.77610    0.02650  29.289  < 2e-16 ***
dormIsFroshTRUE       0.08645    0.06315   1.369    0.171    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.811 on 698 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.5573,	Adjusted R-squared:  0.556 
F-statistic: 439.3 on 2 and 698 DF,  p-value: < 2.2e-16
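Each model regresses the t2 outcome on its own t1 value plus `dormIsFrosh`. The `dormIsFroshTRUE` coefficient therefore estimates the t2 gap between students who started at the same t1 level. That adjustment is what separates a dorm effect from the fact that all-frosh-dorm residents already reported higher t1 well-being in the group means above. Below is a minimal simulation sketch with made-up parameters, in which placement depends on t1 but the dorm itself has no effect:

```r
set.seed(1)
n <- 5000

# Hypothetical t1 scores; higher-t1 students are more likely to land in a frosh dorm
t1 <- rnorm(n)
frosh_dorm <- runif(n) < plogis(t1)

# t2 depends on t1 only: the true dorm effect is zero
t2 <- 0.7 * t1 + rnorm(n, sd = 0.7)

# Unadjusted t2 gap between dorm groups, inflated by selection on t1
raw_gap <- mean(t2[frosh_dorm]) - mean(t2[!frosh_dorm])

# Same model form as the cells above: t2 ~ t1 + dorm indicator
fit <- lm(t2 ~ t1 + frosh_dorm)
adjusted_gap <- coef(fit)[["frosh_dormTRUE"]]

cat("raw gap:", round(raw_gap, 3), " adjusted gap:", round(adjusted_gap, 3), "\n")
```

The raw gap comes out clearly positive even though the simulated dorm does nothing. The adjusted coefficient stays near zero.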


In [39]:
summary(lm(loneliness_t2 ~ loneliness_t1 + dormIsFrosh, data=df))


Call:
lm(formula = loneliness_t2 ~ loneliness_t1 + dormIsFrosh, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.18621 -0.36712 -0.00394  0.32939  1.54286 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.97139    0.07480  12.987   <2e-16 ***
loneliness_t1    0.54680    0.03694  14.804   <2e-16 ***
dormIsFroshTRUE -0.06105    0.03868  -1.578    0.115    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4983 on 697 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.2454,	Adjusted R-squared:  0.2432 
F-statistic: 113.3 on 2 and 697 DF,  p-value: < 2.2e-16


In [40]:
summary(lm(stress_t2 ~ stress_t1 + dormIsFrosh, data=df))


Call:
lm(formula = stress_t2 ~ stress_t1 + dormIsFrosh, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.31030 -0.34443 -0.05634  0.40161  1.45158 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      1.08254    0.08225  13.162   <2e-16 ***
stress_t1        0.50792    0.03650  13.914   <2e-16 ***
dormIsFroshTRUE -0.04205    0.03963  -1.061    0.289    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.511 on 697 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.2209,	Adjusted R-squared:  0.2186 
F-statistic: 98.79 on 2 and 697 DF,  p-value: < 2.2e-16


In [41]:
summary(lm(wellbeing_composite_t2 ~ wellbeing_composite_t1 + dormIsFrosh, data=df))


Call:
lm(formula = wellbeing_composite_t2 ~ wellbeing_composite_t1 + 
    dormIsFrosh, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.32098 -0.44382  0.07806  0.49213  2.23838 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)            -0.05101    0.04506  -1.132    0.258    
wellbeing_composite_t1  0.66368    0.02827  23.478   <2e-16 ***
dormIsFroshTRUE         0.08621    0.05793   1.488    0.137    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7429 on 696 degrees of freedom
  (3 observations deleted due to missingness)
Multiple R-squared:  0.4496,	Adjusted R-squared:  0.448 
F-statistic: 284.3 on 2 and 696 DF,  p-value: < 2.2e-16
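The four models above differ only in the outcome name. One helper can therefore fit them all and collect the `dormIsFroshTRUE` row from each coefficient table. The function name `frosh_dorm_effects` is hypothetical, and it assumes the `<outcome>_t1` / `<outcome>_t2` column naming used in `df`. Here it runs on simulated stand-in data rather than the study data:

```r
# Fit <outcome>_t2 ~ <outcome>_t1 + dormIsFrosh for each outcome and return
# the dorm coefficient, its standard error, and its p-value
frosh_dorm_effects <- function(data, outcomes) {
  rows <- lapply(outcomes, function(o) {
    f <- as.formula(paste0(o, "_t2 ~ ", o, "_t1 + dormIsFrosh"))
    coefs <- summary(lm(f, data = data))$coefficients
    data.frame(
      outcome = o,
      estimate = coefs["dormIsFroshTRUE", "Estimate"],
      std_error = coefs["dormIsFroshTRUE", "Std. Error"],
      p_value = coefs["dormIsFroshTRUE", "Pr(>|t|)"]
    )
  })
  do.call(rbind, rows)
}

# Simulated stand-in data (hypothetical), two outcomes only
set.seed(2)
n <- 200
sim <- data.frame(dormIsFrosh = rep(c(TRUE, FALSE), length.out = n),
                  stress_t1 = rnorm(n), loneliness_t1 = rnorm(n))
sim$stress_t2 <- 0.5 * sim$stress_t1 + rnorm(n)
sim$loneliness_t2 <- 0.5 * sim$loneliness_t1 + rnorm(n)

res <- frosh_dorm_effects(sim, c("stress", "loneliness"))
print(res)
```

For the notebook's data the call would be `frosh_dorm_effects(df, c("life_satisfaction", "loneliness", "stress", "wellbeing_composite"))`. It fits the same formulas as the cells above, so it should reproduce their `dormIsFroshTRUE` rows.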


## Test for t1 difference in frosh vs non-frosh students

In [50]:
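# Two-vector form: compares mean(life_satisfaction_t1) with mean(dormIsFrosh),
# i.e. the share of students in all-frosh dorms, not the two dorm groups
t.test(df$life_satisfaction_t1, df$dormIsFrosh)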


	Welch Two Sample t-test

data:  df$life_satisfaction_t1 and df$dormIsFrosh
t = 93.474, df = 940.45, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 4.360315 4.547330
sample estimates:
mean of x mean of y 
5.0620845 0.6082621 


In [51]:
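# Two-vector form: compares mean(loneliness_t1) with mean(dormIsFrosh),
# i.e. the share of students in all-frosh dorms, not the two dorm groups
t.test(df$loneliness_t1, df$dormIsFrosh)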


	Welch Two Sample t-test

data:  df$loneliness_t1 and df$dormIsFrosh
t = 44.857, df = 1399.1, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 1.144707 1.249405
sample estimates:
mean of x mean of y 
1.8053181 0.6082621 


In [52]:
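# Two-vector form: compares mean(stress_t1) with mean(dormIsFrosh),
# i.e. the share of students in all-frosh dorms, not the two dorm groups
t.test(df$stress_t1, df$dormIsFrosh)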


	Welch Two Sample t-test

data:  df$stress_t1 and df$dormIsFrosh
t = 52.814, df = 1392.8, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 1.383250 1.489969
sample estimates:
mean of x mean of y 
2.0448718 0.6082621 


In [53]:
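# Two-vector form: compares mean(wellbeing_composite_t1) with mean(dormIsFrosh),
# i.e. the share of students in all-frosh dorms, not the two dorm groups
t.test(df$wellbeing_composite_t1, df$dormIsFrosh)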


	Welch Two Sample t-test

data:  df$wellbeing_composite_t1 and df$dormIsFrosh
t = -14.481, df = 1017.5, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6906883 -0.5258359
sample estimates:
   mean of x    mean of y 
5.499500e-17 6.082621e-01 
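Called with two vectors, `t.test(x, y)` runs a Welch test of `mean(x)` against `mean(y)`. `df$dormIsFrosh` is logical, so its mean (the 0.6082621 in every output above) is simply the share of students living in all-frosh dorms: 427 of 702 in the summary table. These tests therefore compare each t1 score with a proportion rather than comparing the two groups of students. The group comparison named in the heading uses the formula interface `t.test(outcome ~ group, data = ...)`. A sketch on made-up scores:

```r
# Made-up t1 scores for students in all-frosh (TRUE) and other (FALSE) dorms
toy <- data.frame(
  stress_t1   = c(2.0, 2.5, 1.5, 3.0, 2.5, 3.5),
  dormIsFrosh = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
)

# Two-vector form, as in the cells above: the second "mean" is the share of TRUEs
vector_form <- t.test(toy$stress_t1, toy$dormIsFrosh)

# Formula form: Welch two-sample test of stress_t1 between the two dorm groups
formula_form <- t.test(stress_t1 ~ dormIsFrosh, data = toy)
print(formula_form$estimate)  # mean in group FALSE = 3, mean in group TRUE = 2
```

For the notebook's data this would be `t.test(life_satisfaction_t1 ~ dormIsFrosh, data=df)`, and likewise for the other three t1 measures. Its sample estimates should match the t1 group means in the summary table above.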
