## Social desirability experiment data analysis

This notebook documents the analysis of the social desirability experiment.

The design is a 2 × 2 fully crossed between-subjects experiment: <br>
* Question wording:
  + no excuse statement
  + with excuse statement <br>
  
* Response options:
  + Yes / No <br>
  + Yes with any reason / Yes only when necessary / No <br>

* Conditions:
  + Control: no excuse statement, Yes / No
  + Condition A: with excuse statement, Yes / No
  + Condition B: no excuse statement, Yes with any reason / Yes only when necessary / No
  + Condition C: with excuse statement, Yes with any reason / Yes only when necessary / No <br><br>
  
* To test the overall effect of:
  + excuse statement: Control + B vs. A + C
  + response options: Control + A vs. B + C
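The two pooled contrasts above amount to recoding `condition` into one indicator per factor. A minimal sketch, assuming a toy frame in place of the real data and two hypothetical column names (`excuse`, `three_options`) that do not exist in the original data set:

```python
import pandas as pd

# Toy stand-in for the real data; only the `condition` column is needed here.
toy = pd.DataFrame({"condition": ["Control", "A", "B", "C"]})

# Hypothetical indicator columns, one per experimental factor.
# Excuse statement present: conditions A and C (vs. Control and B).
toy["excuse"] = toy["condition"].isin(["A", "C"])
# Three response options: conditions B and C (vs. Control and A).
toy["three_options"] = toy["condition"].isin(["B", "C"])

print(toy)
```

With these indicators, each overall effect can be examined by cross-tabulating a response column against `excuse` or `three_options` instead of the four-level `condition`.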

In [1]:
import numpy as np
import pandas as pd

import utility as util

In [2]:
df = pd.read_csv('../output/SD_experiment_df.csv')

In [3]:
df.shape

(639, 12)

In [4]:
df.columns

Index(['ID', 'vaccine', 'mandate', 'gender', 'marital', 'age_group',
       'education', 'gone_to_friend', 'had_visitors', 'had_close_contact',
       'gone_outside', 'condition'],
      dtype='object')

In [5]:
df['condition'].value_counts()

Control    163
B          162
C          157
A          157
Name: condition, dtype: int64

In [6]:
conditions = ['Control', 'A', 'B', 'C']

In [7]:
# Map numeric response codes to display labels
display_change_cols = ['gone_to_friend', 'had_visitors', 'had_close_contact', 'gone_outside']
response_labels = {
    1: "1. Yes/Yes any time",
    2: "2. Yes only when neccessary",
    3: "3. No",
    4: "4. Unsure",
}

for col in display_change_cols:
    df[col] = df[col].replace(response_labels)

In [8]:
pd.crosstab(df['gone_to_friend'], df['condition']).reindex(conditions, axis="columns")

| gone_to_friend | Control | A | B | C |
|:--|--:|--:|--:|--:|
| 1. Yes/Yes any time | 99 | 92 | 78 | 64 |
| 2. Yes only when neccessary | 0 | 0 | 27 | 34 |
| 3. No | 64 | 63 | 57 | 58 |
| 4. Unsure | 0 | 2 | 0 | 1 |
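Because the conditions differ slightly in size, column percentages are easier to compare than raw counts. A minimal sketch of a helper that returns both, assuming a hypothetical function name (`crosstab_with_pct`) and a toy frame rather than the real `df`:

```python
import pandas as pd

def crosstab_with_pct(data, row, col, col_order):
    """Return (counts, column percentages) for row x col.

    Hypothetical helper; not part of the original notebook or `utility` module.
    """
    counts = pd.crosstab(data[row], data[col]).reindex(col_order, axis="columns")
    # normalize="columns" divides each cell by its column total.
    pct = pd.crosstab(data[row], data[col], normalize="columns")
    pct = pct.reindex(col_order, axis="columns") * 100
    return counts, pct.round(1)

# Toy data for illustration only.
toy = pd.DataFrame({
    "gone_to_friend": ["1. Yes", "3. No", "1. Yes", "1. Yes"],
    "condition": ["Control", "Control", "A", "A"],
})

counts, pct = crosstab_with_pct(toy, "gone_to_friend", "condition", ["Control", "A"])
print(counts)
print(pct)
```

On the real data the call would presumably be `crosstab_with_pct(df, 'gone_to_friend', 'condition', conditions)`, repeated for each response column.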


In [39]:
pd.crosstab(df['had_visitors'], df['condition']).reindex(conditions, axis="columns")

| had_visitors | Control | A | B | C |
|:--|--:|--:|--:|--:|
| 1. Yes/Yes any time | 83 | 90 | 72 | 70 |
| 2. Yes only when neccessary | 0 | 0 | 27 | 31 |
| 3. No | 80 | 66 | 62 | 54 |
| 4. Unsure | 0 | 1 | 1 | 2 |


In [40]:
pd.crosstab(df['had_close_contact'], df['condition']).reindex(conditions, axis="columns")

| had_close_contact | Control | A | B | C |
|:--|--:|--:|--:|--:|
| 1. Yes/Yes any time | 116 | 126 | 79 | 70 |
| 2. Yes only when neccessary | 0 | 0 | 50 | 52 |
| 3. No | 41 | 30 | 32 | 31 |
| 4. Unsure | 6 | 1 | 1 | 4 |


In [41]:
pd.crosstab(df['gone_outside'], df['condition']).reindex(conditions, axis="columns")

| gone_outside | Control | A | B | C |
|:--|--:|--:|--:|--:|
| 1. Yes/Yes any time | 132 | 135 | 125 | 116 |
| 2. Yes only when neccessary | 0 | 0 | 14 | 14 |
| 3. No | 27 | 21 | 23 | 25 |
| 4. Unsure | 4 | 1 | 0 | 2 |


#### TO-DO

1. Run Fisher's exact test on the overall contingency tables.
2. Show percentages in these tables (write a helper function for this purpose).
3. Test the subgroup contrasts: the effect of question wording and the effect of response options.
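As a starting point for the Fisher's exact test, here is a rough sketch on a single 2 × 2 slice: Control vs. A for `gone_to_friend`, using the counts from the crosstab above with the "Unsure" row dropped. Dropping "Unsure" and restricting to the two Yes / No conditions are assumptions made for illustration, not decisions taken in this notebook. The sketch uses `scipy.stats.fisher_exact`, which handles the 2 × 2 case; the full 4 × 4 tables would need a different approach.

```python
import pandas as pd
from scipy.stats import fisher_exact

# Counts copied from the gone_to_friend crosstab (Unsure row dropped; an assumption).
table = pd.DataFrame(
    {"Control": [99, 64], "A": [92, 63]},
    index=["Yes", "No"],
)

# Two-sided test of association between condition and response.
odds_ratio, p_value = fisher_exact(table.to_numpy(), alternative="two-sided")

print(f"odds ratio = {odds_ratio:.3f}, p = {p_value:.3f}")
```

The same pattern applies to the pooled contrasts (Control + B vs. A + C, and Control + A vs. B + C) once the three-option responses have been collapsed to a binary outcome, which is itself an analytic choice still to be made.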