## Social desirability experiment data analysis

This notebook documents the analysis of the social desirability experiment.

2 × 2 fully crossed between-subjects experiment: <br>
* Question wording:
  + no excuse statement
  + with excuse statement <br>
  
* Response options:
  + Yes/No <br>
  + Yes with any reason / Yes only when necessary / No <br>

* Conditions:
  + Control: no excuse statement, Yes/No
  + Condition A: with excuse statement, Yes/No
  + Condition B: no excuse statement, Yes with any reason / Yes only when necessary / No
  + Condition C: with excuse statement, Yes with any reason / Yes only when necessary / No <br><br>
  
* To test the overall effect of:
  + excuse statement: Control + B vs. A + C
  + response options: Control + A vs. B + C
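
The mapping from the four conditions to the two factors can be written out in code, which makes the pooled comparisons above easy to reproduce. This is a minimal sketch, assuming the `condition` column holds the labels `Control`, `A`, `B`, and `C`. The column names `excuse` and `three_options` and the helper `add_factor_columns` are hypothetical and are not part of the original analysis.

```python
import pandas as pd

# Hypothetical lookup: condition label -> (excuse statement shown?, three response options shown?)
CONDITION_FACTORS = {
    "Control": (False, False),
    "A": (True, False),
    "B": (False, True),
    "C": (True, True),
}


def add_factor_columns(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `data` with one boolean column per experimental factor."""
    out = data.copy()
    out["excuse"] = out["condition"].map(lambda c: CONDITION_FACTORS[c][0])
    out["three_options"] = out["condition"].map(lambda c: CONDITION_FACTORS[c][1])
    return out


# Toy frame with one row per condition (illustrative only, not the experiment data)
toy = pd.DataFrame({"condition": ["Control", "A", "B", "C"]})
toy = add_factor_columns(toy)
```

Grouping by `excuse` then pools Control + B against A + C, and grouping by `three_options` pools Control + A against B + C.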

In [1]:
import numpy as np
import pandas as pd

import utility as util

In [2]:
df = pd.read_csv('../output/SD_experiment_df.csv')

In [3]:
df.shape

(639, 12)

In [4]:
df.columns

Index(['ID', 'vaccine', 'mandate', 'gender', 'marital', 'age_group',
       'education', 'gone_to_friend', 'had_visitors', 'had_close_contact',
       'gone_outside', 'condition'],
      dtype='object')

In [5]:
df['condition'].value_counts().sort_index()

A          157
B          162
C          157
Control    163
Name: condition, dtype: int64

In [6]:
# numeric -> character display
display_change_cols = ['gone_to_friend', 'had_visitors', 'had_close_contact', 'gone_outside']

for col in display_change_cols:
    # select the column as a Series (single brackets) before replacing the numeric codes
    df[col] = df[col].replace([1, 2, 3, 4],
                              ["1. Yes/Yes any time", "2. Yes only when neccessary", "3. No", "4. Unsure"])

In [7]:
# combine top two response options into "yes"
combine_top_two = {"1. Yes/Yes any time": "1. Yes", 
                   "2. Yes only when neccessary": "1. Yes", 
                   "3. No": "2. No",
                   "4. Unsure": "3. Unsure"}

In [8]:
# create recodes that reflect the combinations
SD_cols = ['gone_to_friend', 'had_visitors', 'had_close_contact', 'gone_outside']

for col in SD_cols:
    df[f'{col}_r'] = df[col].map(combine_top_two)
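
`Series.map` returns `NaN` for any value that has no key in the dictionary, so it can be worth confirming that the recode covered every response category. The following is a minimal sketch on toy data; the helper name `unmapped_values` and the label `"5. Refused"` are hypothetical and do not appear in the experiment data.

```python
import pandas as pd


def unmapped_values(series: pd.Series, mapping: dict) -> list:
    """List the distinct non-missing values in `series` that `mapping` has no key for."""
    present = series.dropna().unique()
    return sorted(v for v in present if v not in mapping)


# Toy example (illustrative only): "5. Refused" is a made-up label absent from the mapping
toy_mapping = {"1. Yes/Yes any time": "1. Yes", "3. No": "2. No"}
toy_series = pd.Series(["1. Yes/Yes any time", "3. No", "5. Refused", None])
missing = unmapped_values(toy_series, toy_mapping)
```

In this notebook the check would be `unmapped_values(df[col], combine_top_two)` for each column in `SD_cols`; an empty list means every response was recoded.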

In [9]:
conditions = ['Control', 'A', 'B', 'C']

In [10]:
# create a version of the data that drops any respondent who answered "unsure" on at least one of the four items
df_no_miss = df[df[SD_cols].ne('4. Unsure').all(axis=1)]

In [11]:
df_no_miss.shape

(620, 16)

#### 1) original crosstabs
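
The crosstabs in this notebook are produced by `util.crosstab_percent_table`, whose source is not shown here. Judging only from its output (percentages that sum to 100 within each condition, rounded to one decimal, plus a `Total n` row), a helper of this kind might look roughly like the sketch below. The function name `crosstab_percent_sketch` is hypothetical, and the real helper may differ, for example in how it displays the table.

```python
import pandas as pd


def crosstab_percent_sketch(row_var, col_var, data, col_order):
    """Column-percentage crosstab with a 'Total n' row (an assumed stand-in for the util helper)."""
    counts = pd.crosstab(data[row_var], data[col_var])
    # Keep the requested column order; conditions absent from the data get zero counts
    counts = counts.reindex(columns=col_order, fill_value=0)
    totals = counts.sum(axis=0)
    # Divide each column by its total to get within-condition percentages
    percents = (counts / totals * 100).round(1)
    percents.loc["Total n"] = totals
    return percents


# Toy data (illustrative only, not the experiment data)
toy = pd.DataFrame({
    "answer": ["Yes", "Yes", "No", "Yes", "No", "No"],
    "condition": ["Control", "Control", "Control", "A", "A", "A"],
})
table = crosstab_percent_sketch("answer", "condition", toy, ["Control", "A"])
```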

In [12]:
for col in SD_cols:
    util.crosstab_percent_table(col, 'condition', df, conditions)
    print('\n')

| gone_to_friend | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes/Yes any time | 60.7 | 58.6 | 48.1 | 40.8 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 16.7 | 21.7 |
| 3. No | 39.3 | 40.1 | 35.2 | 36.9 |
| 4. Unsure | 0.0 | 1.3 | 0.0 | 0.6 |
| Total n | 163.0 | 157.0 | 162.0 | 157.0 |






| had_visitors | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes/Yes any time | 50.9 | 57.3 | 44.4 | 44.6 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 16.7 | 19.7 |
| 3. No | 49.1 | 42.0 | 38.3 | 34.4 |
| 4. Unsure | 0.0 | 0.6 | 0.6 | 1.3 |
| Total n | 163.0 | 157.0 | 162.0 | 157.0 |






| had_close_contact | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes/Yes any time | 71.2 | 80.3 | 48.8 | 44.6 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 30.9 | 33.1 |
| 3. No | 25.2 | 19.1 | 19.8 | 19.7 |
| 4. Unsure | 3.7 | 0.6 | 0.6 | 2.5 |
| Total n | 163.0 | 157.0 | 162.0 | 157.0 |






| gone_outside | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes/Yes any time | 81.0 | 86.0 | 77.2 | 73.9 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 8.6 | 8.9 |
| 3. No | 16.6 | 13.4 | 14.2 | 15.9 |
| 4. Unsure | 2.5 | 0.6 | 0.0 | 1.3 |
| Total n | 163.0 | 157.0 | 162.0 | 157.0 |






#### 2) crosstabs excluding unsure 

In [13]:
for col in SD_cols:
    util.crosstab_percent_table(col, 'condition', df_no_miss, conditions)
    print('\n')

| gone_to_friend | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes/Yes any time | 60.8 | 59.4 | 47.8 | 41.7 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 16.8 | 22.5 |
| 3. No | 39.2 | 40.6 | 35.4 | 35.8 |
| Total n | 153.0 | 155.0 | 161.0 | 151.0 |






| had_visitors | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes/Yes any time | 52.9 | 58.1 | 44.7 | 46.4 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 16.8 | 20.5 |
| 3. No | 47.1 | 41.9 | 38.5 | 33.1 |
| Total n | 153.0 | 155.0 | 161.0 | 151.0 |






| had_close_contact | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes/Yes any time | 73.2 | 81.3 | 49.1 | 45.0 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 31.1 | 34.4 |
| 3. No | 26.8 | 18.7 | 19.9 | 20.5 |
| Total n | 153.0 | 155.0 | 161.0 | 151.0 |






| gone_outside | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes/Yes any time | 82.4 | 86.5 | 77.0 | 74.2 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 8.7 | 9.3 |
| 3. No | 17.6 | 13.5 | 14.3 | 16.6 |
| Total n | 153.0 | 155.0 | 161.0 | 151.0 |






#### 3) crosstabs with top two answers combined

In [14]:
recoded_cols = ['gone_to_friend_r', 'had_visitors_r', 'had_close_contact_r', 'gone_outside_r']

In [15]:
for col in recoded_cols:
    util.crosstab_percent_table(col, 'condition', df, conditions)
    print('\n')

| gone_to_friend_r | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes | 60.7 | 58.6 | 64.8 | 62.4 |
| 2. No | 39.3 | 40.1 | 35.2 | 36.9 |
| 3. Unsure | 0.0 | 1.3 | 0.0 | 0.6 |
| Total n | 163.0 | 157.0 | 162.0 | 157.0 |






| had_visitors_r | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes | 50.9 | 57.3 | 61.1 | 64.3 |
| 2. No | 49.1 | 42.0 | 38.3 | 34.4 |
| 3. Unsure | 0.0 | 0.6 | 0.6 | 1.3 |
| Total n | 163.0 | 157.0 | 162.0 | 157.0 |






| had_close_contact_r | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes | 71.2 | 80.3 | 79.6 | 77.7 |
| 2. No | 25.2 | 19.1 | 19.8 | 19.7 |
| 3. Unsure | 3.7 | 0.6 | 0.6 | 2.5 |
| Total n | 163.0 | 157.0 | 162.0 | 157.0 |






| gone_outside_r | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes | 81.0 | 86.0 | 85.8 | 82.8 |
| 2. No | 16.6 | 13.4 | 14.2 | 15.9 |
| 3. Unsure | 2.5 | 0.6 | 0.0 | 1.3 |
| Total n | 163.0 | 157.0 | 162.0 | 157.0 |






#### 4) crosstabs with top two answers combined and unsure excluded

In [16]:
for col in recoded_cols:
    util.crosstab_percent_table(col, 'condition', df_no_miss, conditions)
    print('\n')

| gone_to_friend_r | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes | 60.8 | 59.4 | 64.6 | 64.2 |
| 2. No | 39.2 | 40.6 | 35.4 | 35.8 |
| Total n | 153.0 | 155.0 | 161.0 | 151.0 |






| had_visitors_r | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes | 52.9 | 58.1 | 61.5 | 66.9 |
| 2. No | 47.1 | 41.9 | 38.5 | 33.1 |
| Total n | 153.0 | 155.0 | 161.0 | 151.0 |






| had_close_contact_r | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes | 73.2 | 81.3 | 80.1 | 79.5 |
| 2. No | 26.8 | 18.7 | 19.9 | 20.5 |
| Total n | 153.0 | 155.0 | 161.0 | 151.0 |






| gone_outside_r | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes | 82.4 | 86.5 | 85.7 | 83.4 |
| 2. No | 17.6 | 13.5 | 14.3 | 16.6 |
| Total n | 153.0 | 155.0 | 161.0 | 151.0 |
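
The crosstabs are descriptive only; the notebook does not run the pooled comparisons named in the design (excuse statement: Control + B vs. A + C; response options: Control + A vs. B + C). One possible way to test them is a Pearson chi-square test on the recoded Yes/No items. The sketch below is an assumption about a suitable test, not the method used by the author. The helper `pooled_chi_square` is hypothetical, it applies no continuity correction, and it treats every answer other than `yes_label` as "not yes".

```python
import math

import pandas as pd


def pooled_chi_square(data, outcome, group_a, group_b, yes_label="1. Yes"):
    """Pearson chi-square (1 df, no continuity correction) comparing the share of
    `yes_label` answers between two pooled sets of conditions."""
    in_a = data["condition"].isin(group_a)
    in_b = data["condition"].isin(group_b)
    is_yes = data[outcome] == yes_label
    # 2 x 2 cell counts: rows = pooled group, columns = yes / not yes
    a, b = int((in_a & is_yes).sum()), int((in_a & ~is_yes).sum())
    c, d = int((in_b & is_yes).sum()), int((in_b & ~is_yes).sum())
    n = a + b + c + d
    # Shortcut formula for a 2 x 2 table; requires all four margins to be non-zero
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (illustrative only, not the experiment data)
rows = (
    [("Control", "1. Yes")] * 8 + [("B", "1. Yes")] * 7
    + [("Control", "2. No")] * 3 + [("B", "2. No")] * 2
    + [("A", "1. Yes")] * 2 + [("C", "1. Yes")] * 3
    + [("A", "2. No")] * 8 + [("C", "2. No")] * 7
)
toy = pd.DataFrame(rows, columns=["condition", "answer_r"])
chi2, p_value = pooled_chi_square(toy, "answer_r", ["Control", "B"], ["A", "C"])
```

On the notebook's data, the excuse-statement comparison for one item would be `pooled_chi_square(df_no_miss, 'gone_to_friend_r', ['Control', 'B'], ['A', 'C'])`, and the response-options comparison would swap the groups to `['Control', 'A']` and `['B', 'C']`.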




