## Social desirability experiment: replicating Daoust findings - data analysis

Experiment specifications: <br>
* Daoust experiment (variables: visit, over, outdoors):
  + Have you done? Yes/No/Unsure <br>
  + Some people XX have you done? Yes/Only when necessary/No/Unsure <br><br>
  
* SD benchmark (variables: blame):
  + Individuals are more to blame? Agree/Disagree <br>
  + Social conditions are more to blame? Agree/Disagree <br>

In [1]:
import numpy as np
import pandas as pd

import utility as util

#### 1) Daoust experiment

In [2]:
df_dst = pd.read_csv('../output/df_dst.csv')

In [3]:
df_dst.shape

(4633, 8)

In [4]:
df_dst.head()

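|   | id | condition | visit | over | outdoors | sex | marital | age_group |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | B | 9 | 9 | 1 | 1.0 | 3.0 | 7.0 |
| 1 | 2 | A | 9 | 1 | 9 | 1.0 | 1.0 | 7.0 |
| 2 | 3 | A | 9 | 9 | 9 | 1.0 | 1.0 | 7.0 |
| 3 | 4 | A | 9 | 9 | 1 | 1.0 | 3.0 | 7.0 |
| 4 | 5 | B | 9 | 9 | 9 | 1.0 | 1.0 | 7.0 |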


In [5]:
# Map numeric response codes to labelled strings for display
SD_cols = ['visit', 'over', 'outdoors']
response_labels = {1: "1. Yes", 2: "2. Only when necessary/occasionally", 9: "9. No"}

for col in SD_cols:
    df_dst[col] = df_dst[col].replace(response_labels)

In [6]:
for col in SD_cols:
    util.crosstab_chisq(col, 'condition', df_dst, chisqtest=False)

| visit | condition A | condition B |
|---|---|---|
| 1. Yes | 61.6 | 62.5 |
| 2. Only when necessary/occasionally | 0.0 | 9.3 |
| 9. No | 38.4 | 28.2 |
| Total n | 2310.0 | 2323.0 |


-----

| over | condition A | condition B |
|---|---|---|
| 1. Yes | 64.4 | 67.7 |
| 2. Only when necessary/occasionally | 0.0 | 10.5 |
| 9. No | 35.6 | 21.8 |
| Total n | 2310.0 | 2323.0 |


-----

| outdoors | condition A | condition B |
|---|---|---|
| 1. Yes | 68.2 | 68.7 |
| 2. Only when necessary/occasionally | 0.0 | 9.2 |
| 9. No | 31.8 | 22.1 |
| Total n | 2310.0 | 2323.0 |


-----
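
`util.crosstab_chisq` comes from a local module whose source is not shown here. A minimal sketch of how tables like the ones above might be produced, assuming the helper reports column percentages rounded to one decimal plus a `Total n` row. The function name `column_percent_table` and the toy data are hypothetical:

```python
import pandas as pd

def column_percent_table(row_var, col_var, df):
    """Column percentages of row_var within each level of col_var, plus a 'Total n' row."""
    counts = pd.crosstab(df[row_var], df[col_var])
    # Divide each column by its own total so every column sums to 100
    percents = (counts / counts.sum(axis=0) * 100).round(1)
    # Append the unweighted number of cases per column
    percents.loc["Total n"] = counts.sum(axis=0)
    return percents

# Toy data for illustration only, not the survey data
toy = pd.DataFrame({
    "condition": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "visit": ["1. Yes", "1. Yes", "9. No", "9. No",
              "1. Yes", "1. Yes", "1. Yes", "9. No"],
})
table = column_percent_table("visit", "condition", toy)
print(table)
```

Because percentages are computed within each condition, the columns can be compared directly even though the two conditions have slightly different sample sizes.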

In [7]:
# combine top two response options into "yes"
combine_top_two = {"1. Yes": "1. Yes", 
                   "2. Only when necessary/occasionally": "1. Yes", 
                   "9. No": "9. No"}

In [8]:
for col in SD_cols:
    df_dst[f'{col}_r'] = df_dst[col].map(combine_top_two)
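
One thing worth checking after a dictionary-based `Series.map`: any value missing from the dictionary becomes `NaN` instead of raising an error. A small sketch on toy data of how one might confirm that the recode covers every observed response. The `toy` series and its `"8. Unsure"` label are hypothetical:

```python
import pandas as pd

combine_top_two = {"1. Yes": "1. Yes",
                   "2. Only when necessary/occasionally": "1. Yes",
                   "9. No": "9. No"}

# Toy responses; the last value is deliberately absent from the mapping
toy = pd.Series(["1. Yes", "2. Only when necessary/occasionally",
                 "9. No", "8. Unsure"])
recoded = toy.map(combine_top_two)

# Values present before the recode but NaN afterwards were not covered
unmapped = toy[recoded.isna() & toy.notna()].unique()
print(unmapped)
```

An empty `unmapped` array on the real columns would indicate that no responses were silently dropped by the recode.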

In [9]:
recoded_cols = ['visit_r', 'over_r', 'outdoors_r']

In [10]:
for col in recoded_cols:
    util.crosstab_chisq(col, 'condition', df_dst)

| visit_r | condition A | condition B |
|---|---|---|
| 1. Yes | 61.6 | 71.8 |
| 9. No | 38.4 | 28.2 |
| Total n | 2310.0 | 2323.0 |


*Chi-squared statistic = 54.3, degrees of freedom = 1, p < 0.001*

-----

| over_r | condition A | condition B |
|---|---|---|
| 1. Yes | 64.4 | 78.2 |
| 9. No | 35.6 | 21.8 |
| Total n | 2310.0 | 2323.0 |


*Chi-squared statistic = 107.1, degrees of freedom = 1, p < 0.001*

-----

| outdoors_r | condition A | condition B |
|---|---|---|
| 1. Yes | 68.2 | 77.9 |
| 9. No | 31.8 | 22.1 |
| Total n | 2310.0 | 2323.0 |


*Chi-squared statistic = 54.8, degrees of freedom = 1, p < 0.001*

-----
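
The test statistics above are reported by `util.crosstab_chisq`, whose source is not shown. One plausible implementation, sketched here on toy counts, passes the contingency table to `scipy.stats.chi2_contingency`. This is an assumption about the helper, including whether it applies Yates' continuity correction (SciPy's default for 2×2 tables). The function name `chisq_summary` is hypothetical:

```python
import pandas as pd
from scipy.stats import chi2_contingency

def chisq_summary(row_var, col_var, df, correction=True):
    """Pearson chi-squared test of independence between two categorical columns."""
    counts = pd.crosstab(df[row_var], df[col_var])
    # correction=True applies Yates' continuity correction when the table is 2x2
    stat, p, dof, _expected = chi2_contingency(counts, correction=correction)
    return stat, dof, p

# Toy data for illustration only: 50 cases per condition
toy = pd.DataFrame({
    "condition": ["A"] * 50 + ["B"] * 50,
    "visit_r": ["1. Yes"] * 30 + ["9. No"] * 20
             + ["1. Yes"] * 40 + ["9. No"] * 10,
})
stat, dof, p = chisq_summary("visit_r", "condition", toy)
print(f"Chi-squared statistic = {stat:.1f}, degrees of freedom = {dof}, p = {p:.3g}")
```

The test is run on raw counts, not on the percentages shown in the tables, since the statistic depends on sample size.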

#### 2) Benchmark experiment

In [11]:
df_benchmark = pd.read_csv('../output/df_benchmark.csv')

In [12]:
df_benchmark.shape

(4633, 7)

In [13]:
df_benchmark.head()

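|   | id | condition | blame | sex | marital | age_group | education |
|---|---|---|---|---|---|---|---|
| 0 | 1 | B | 2 | 1.0 | 3.0 | 7.0 | 5.0 |
| 1 | 2 | A | 2 | 1.0 | 1.0 | 7.0 | 5.0 |
| 2 | 3 | B | 2 | 1.0 | 1.0 | 7.0 | 3.0 |
| 3 | 4 | A | 1 | 1.0 | 3.0 | 7.0 | 3.0 |
| 4 | 5 | B | 2 | 1.0 | 1.0 | 7.0 | 5.0 |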


In [14]:
util.crosstab_chisq('blame', 'condition', df_benchmark)

| blame | condition A | condition B |
|---|---|---|
| 1 | 79.9 | 22.2 |
| 2 | 20.1 | 77.8 |
| Total n | 2338.0 | 2295.0 |


*Chi-squared statistic = 1542.1, degrees of freedom = 1, p < 0.001*

-----
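
A p-value that prints as zero for statistics this large is a rounding artifact: the value is tiny but not zero. With one degree of freedom, the chi-squared statistic is the square of a standard normal variable, so the upper-tail probability is `erfc(sqrt(x / 2))` and can be checked with the standard library alone. The helper names below are hypothetical, and the `p < 0.001` reporting threshold is a common convention, not something this notebook specifies:

```python
import math

def p_from_chisq_1df(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def format_p(p, floor=0.001):
    """Report very small p-values as an inequality instead of a rounded zero."""
    return f"p < {floor}" if p < floor else f"p = {p:.3f}"

# Statistics reported for the three recoded Daoust items
for stat in (54.3, 107.1, 54.8):
    print(stat, format_p(p_from_chisq_1df(stat)))
```

For a statistic as large as the benchmark's 1542.1, the computed tail probability can underflow to exactly 0.0 in double-precision arithmetic, which is one more reason to report an inequality.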