## Social desirability experiment data analysis

This notebook documents the analysis of the social desirability experiment.

The design is a 2 × 2 fully crossed between-subjects experiment: <br>
* Question wording:
  + no excuse statement
  + with excuse statement <br>
  
* Response options:
  + Yes/ No <br>
  + Yes with any reason/ Yes only when necessary/ No <br>

* Conditions:
  + Control: no excuse statement, Yes/No
  + Condition A: with excuse statement, Yes/No
  + Condition B: no excuse statement, Yes with any reason/ Yes only when necessary/ No
  + Condition C: with excuse statement, Yes with any reason/ Yes only when necessary/ No <br><br>
  
* To test the overall effect of:
  + excuse statement: Control + B vs. A + C
  + response options: Control + A vs. B + C
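
The two overall contrasts above amount to collapsing the four conditions onto one factor at a time. A minimal sketch of that mapping follows; the column names `excuse` and `options`, the helper `add_factor_columns`, and the toy rows are assumptions for illustration and are not part of the experiment data or of `utility`.

```python
import pandas as pd

# Condition -> factor levels, taken from the design described above.
FACTORS = {
    "Control": {"excuse": "no excuse", "options": "Yes/No"},
    "A": {"excuse": "with excuse", "options": "Yes/No"},
    "B": {"excuse": "no excuse", "options": "three options"},
    "C": {"excuse": "with excuse", "options": "three options"},
}


def add_factor_columns(frame, condition_col="condition"):
    """Return a copy of `frame` with hypothetical `excuse` and `options` columns."""
    out = frame.copy()
    out["excuse"] = out[condition_col].map(lambda c: FACTORS[c]["excuse"])
    out["options"] = out[condition_col].map(lambda c: FACTORS[c]["options"])
    return out


# Toy rows for illustration only, not the experiment data.
toy = add_factor_columns(pd.DataFrame({"condition": ["Control", "A", "B", "C", "B", "A"]}))
excuse_counts = toy["excuse"].value_counts().to_dict()
options_counts = toy["options"].value_counts().to_dict()
```

With columns like these, `pd.crosstab(df['gone_to_friend'], df['excuse'])` would give the table for Control + B vs. A + C, and the same call with `options` would give Control + A vs. B + C.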

In [1]:
import numpy as np
import pandas as pd

import utility as util

import rpy2.robjects.numpy2ri
from rpy2.robjects.packages import importr
rpy2.robjects.numpy2ri.activate()
stats = importr('stats')

In [2]:
df = pd.read_csv('../output/SD_experiment_df.csv')

In [3]:
df.shape

(639, 12)

In [4]:
df.columns

Index(['ID', 'vaccine', 'mandate', 'gender', 'marital', 'age_group',
       'education', 'gone_to_friend', 'had_visitors', 'had_close_contact',
       'gone_outside', 'condition'],
      dtype='object')

In [5]:
df['condition'].value_counts().sort_index()

A          157
B          162
C          157
Control    163
Name: condition, dtype: int64

In [6]:
conditions = ['Control', 'A', 'B', 'C']

In [7]:
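# Recode numeric response codes as labelled strings for display
display_change_cols = ['gone_to_friend', 'had_visitors', 'had_close_contact', 'gone_outside']
response_labels = {1: "1. Yes/Yes any time", 2: "2. Yes only when neccessary", 3: "3. No", 4: "4. Unsure"}

for col in display_change_cols:
    # Replace on the Series (df[col]) rather than on a one-column DataFrame (df[[col]])
    df[col] = df[col].replace(response_labels)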

In [8]:
util.crosstab_percent_table('gone_to_friend', 'condition', df, conditions)

| gone_to_friend | Control | A | B | C |
|:--|--:|--:|--:|--:|
| 1. Yes/Yes any time | 60.7 | 58.6 | 48.1 | 40.8 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 16.7 | 21.7 |
| 3. No | 39.3 | 40.1 | 35.2 | 36.9 |
| 4. Unsure | 0.0 | 1.3 | 0.0 | 0.6 |
| Total n | 163.0 | 157.0 | 162.0 | 157.0 |


In [None]:
%%time
# [0][0] extracts the p-value, the first element of the result list returned by R's fisher.test
stats.fisher_test(pd.crosstab(df['gone_to_friend'], df['condition']).values, workspace=2e9)[0][0]
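
If the exact computation on the full table turns out to be too slow or memory-hungry, a Monte Carlo permutation test of independence is one common approximation. The sketch below is not Fisher's exact test: it swaps in the Pearson chi-square statistic and estimates the p-value by shuffling the condition labels. The function names and toy data are hypothetical, and it assumes there are no missing values in either column.

```python
import numpy as np


def chi2_statistic(table):
    """Pearson chi-square statistic for a table with no empty rows or columns."""
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


def permutation_p_value(answers, groups, n_perm=2000, seed=0):
    """Estimate the p-value for independence by permuting the group labels."""
    rng = np.random.default_rng(seed)
    answer_levels, answer_idx = np.unique(np.asarray(answers), return_inverse=True)
    group_levels, group_idx = np.unique(np.asarray(groups), return_inverse=True)

    def table_for(shuffled_group_idx):
        # Count observations in each (answer, group) cell.
        table = np.zeros((len(answer_levels), len(group_levels)))
        np.add.at(table, (answer_idx, shuffled_group_idx), 1)
        return table

    observed = chi2_statistic(table_for(group_idx))
    hits = sum(
        chi2_statistic(table_for(rng.permutation(group_idx))) >= observed - 1e-9
        for _ in range(n_perm)
    )
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data for illustration only: answers perfectly separated by group.
toy_answers = ["Yes"] * 20 + ["No"] * 20
toy_groups = ["Control"] * 20 + ["B"] * 20
p_toy = permutation_p_value(toy_answers, toy_groups)
```

On the notebook's data the call would look like `permutation_p_value(df['gone_to_friend'], df['condition'])`. R's own `fisher.test` also offers a simulated p-value through its `simulate.p.value` argument, which stays closer to the test used above.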

In [13]:
util.crosstab_percent_table('had_visitors', 'condition', df, conditions)

| had_visitors | Control | A | B | C |
|:--|--:|--:|--:|--:|
| 1. Yes/Yes any time | 50.9 | 57.3 | 44.4 | 44.6 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 16.7 | 19.7 |
| 3. No | 49.1 | 42.0 | 38.3 | 34.4 |
| 4. Unsure | 0.0 | 0.6 | 0.6 | 1.3 |
| Total n | 163.0 | 157.0 | 162.0 | 157.0 |


In [14]:
util.crosstab_percent_table('had_close_contact', 'condition', df, conditions)

| had_close_contact | Control | A | B | C |
|:--|--:|--:|--:|--:|
| 1. Yes/Yes any time | 71.2 | 80.3 | 48.8 | 44.6 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 30.9 | 33.1 |
| 3. No | 25.2 | 19.1 | 19.8 | 19.7 |
| 4. Unsure | 3.7 | 0.6 | 0.6 | 2.5 |
| Total n | 163.0 | 157.0 | 162.0 | 157.0 |


In [15]:
util.crosstab_percent_table('gone_outside', 'condition', df, conditions)

| gone_outside | Control | A | B | C |
|:--|--:|--:|--:|--:|
| 1. Yes/Yes any time | 81.0 | 86.0 | 77.2 | 73.9 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 8.6 | 8.9 |
| 3. No | 16.6 | 13.4 | 14.2 | 15.9 |
| 4. Unsure | 2.5 | 0.6 | 0.0 | 1.3 |
| Total n | 163.0 | 157.0 | 162.0 | 157.0 |


#### TO-DO: 1) Fisher's exact test for the overall contingency tables; 2) subgroup contrasts: the effect of question wording and the effect of response options