## Social desirability experiment data analysis

This notebook documents the analysis of the social desirability experiment.

A 2 × 2 fully crossed, between-subjects experiment:

* Question wording:
  + no excuse statement
  + with excuse statement

* Response options:
  + Yes / No
  + Yes with any reason / Yes only when necessary / No

* Conditions:
  + Control: no excuse statement, Yes / No
  + Condition A: with excuse statement, Yes / No
  + Condition B: no excuse statement, Yes with any reason / Yes only when necessary / No
  + Condition C: with excuse statement, Yes with any reason / Yes only when necessary / No

* Each main effect is tested by pooling two conditions against the other two:
  + excuse statement: Control + B vs. A + C
  + response options: Control + A vs. B + C

In [1]:
import numpy as np
import pandas as pd

import utility as util

In [2]:
df = pd.read_csv('../output/SD_experiment_df.csv', skipinitialspace=True)

In [3]:
df.shape

(637, 12)

In [4]:
df.columns

Index(['ID', 'vaccine', 'mandate', 'gender', 'marital', 'age_group',
       'education', 'gone_to_friend', 'had_visitors', 'had_close_contact',
       'gone_outside', 'condition'],
      dtype='object')

In [5]:
df['condition'].value_counts().sort_index()

A          157
B          160
C          157
Control    163
Name: condition, dtype: int64

In [6]:
# map numeric response codes to text labels for display
display_change_cols = ['gone_to_friend', 'had_visitors', 'had_close_contact', 'gone_outside']

for col in display_change_cols:
    df[col] = df[col].replace([1, 2, 3, 4], ["1. Yes/Yes any time", "2. Yes only when neccessary", "3. No", "4. Unsure"])

In [7]:
# combine top two response options into "yes"
combine_top_two = {"1. Yes/Yes any time": "1. Yes", 
                   "2. Yes only when neccessary": "1. Yes", 
                   "3. No": "2. No",
                   "4. Unsure": "3. Unsure"}

In [8]:
# add recoded columns (suffix _r) in which the top two options are combined
SD_cols = ['gone_to_friend', 'had_visitors', 'had_close_contact', 'gone_outside']

for col in SD_cols:
    df[f'{col}_r'] = df[col].map(combine_top_two)
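A quick check of the collapse on a few hypothetical responses: both "yes" labels map to `"1. Yes"`, and any label that does not match a dictionary key exactly (for example, a corrected spelling) comes back as `NaN`, because `Series.map` returns `NaN` for unmatched values. The responses below are made up for illustration.

```python
import pandas as pd

# Mapping copied from the cell above; keys must match the display labels exactly
# (including the "neccessary" spelling used in the data labels).
combine_top_two = {"1. Yes/Yes any time": "1. Yes",
                   "2. Yes only when neccessary": "1. Yes",
                   "3. No": "2. No",
                   "4. Unsure": "3. Unsure"}

# Hypothetical responses, one per display label
responses = pd.Series(["1. Yes/Yes any time",
                       "2. Yes only when neccessary",
                       "3. No",
                       "4. Unsure"])
collapsed = responses.map(combine_top_two)
print(collapsed.tolist())
# ['1. Yes', '1. Yes', '2. No', '3. Unsure']

# A label that differs from the dictionary key (here, the corrected spelling)
# is not matched, so Series.map returns NaN for it
unmatched = pd.Series(["2. Yes only when necessary"]).map(combine_top_two)
print(unmatched.isna().all())  # True
```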

In [9]:
conditions = ['Control', 'A', 'B', 'C']

In [10]:
# map each condition to its level on each of the two experimental factors
excuse_map = {'Control': 'no_excuse',
              'B': 'no_excuse',
              'A': 'with_excuse', 
              'C': 'with_excuse'}

In [11]:
response_map = {'Control': 'two_response',
                'A': 'two_response',
                'B': 'three_response', 
                'C': 'three_response'}

In [12]:
df['excuse_statement_condition'] = df['condition'].map(excuse_map)

In [13]:
df['response_set_condition'] = df['condition'].map(response_map)
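To confirm that the two factor variables reproduce the fully crossed 2 × 2 design, the condition labels can be laid out as a grid and each factor combination checked to contain exactly one condition. This sketch uses one hypothetical row per condition rather than the survey data; the names `design`, `grid`, and `cell_counts` are illustrative.

```python
import pandas as pd

excuse_map = {'Control': 'no_excuse', 'B': 'no_excuse',
              'A': 'with_excuse', 'C': 'with_excuse'}
response_map = {'Control': 'two_response', 'A': 'two_response',
                'B': 'three_response', 'C': 'three_response'}

# One hypothetical row per condition
design = pd.DataFrame({'condition': ['Control', 'A', 'B', 'C']})
design['excuse_statement_condition'] = design['condition'].map(excuse_map)
design['response_set_condition'] = design['condition'].map(response_map)

# Lay the four conditions out as a 2 x 2 grid
grid = design.pivot(index='excuse_statement_condition',
                    columns='response_set_condition',
                    values='condition')
print(grid)

# Fully crossed: each factor combination holds exactly one condition
cell_counts = pd.crosstab(design['excuse_statement_condition'],
                          design['response_set_condition'])
assert (cell_counts.to_numpy() == 1).all()
```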

In [14]:
# map numeric codes to value labels for table display
gender_map = {1.0: '1. Woman',
              2.0: '2. Man',
              np.nan: np.nan}

marital_map = {1.0: '1. Married',
               7.0: '2. Not married',
               8.0: np.nan,
               np.nan: np.nan}

age_group_map = {2.0: '19-25',
                 3.0: '26-35',
                 4.0: '36-45', 
                 5.0: '46-55',
                 6.0: '56-65',
                 7.0: '66+',
                 np.nan: np.nan}

education_map = {4.0: '1. Less than high school diploma',
                 5.0: '2. High school diploma',
                 6.0: '3. Some college',
                 7.0: '4. Bachelor degree',
                 8.0: '5. Graduate degree',
                 np.nan: np.nan}

In [15]:
df['gender'] = df['gender'].map(gender_map)
df['marital'] = df['marital'].map(marital_map)
df['age_group'] = df['age_group'].map(age_group_map)
df['education'] = df['education'].map(education_map)
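The notebook uses `replace` for the response items and `map` for the demographics, and the two behave differently for codes that are not listed: `map` turns them into `NaN`, while `replace` leaves them unchanged. This suggests the explicit `np.nan: np.nan` entries in the dictionaries above are not needed for `map`. A minimal sketch with hypothetical codes (9.0 is an invented unlisted code):

```python
import numpy as np
import pandas as pd

# Hypothetical marital codes, including one (9.0) that is not in the mapping
codes = pd.Series([1.0, 7.0, 8.0, 9.0, np.nan])
marital_map = {1.0: '1. Married', 7.0: '2. Not married', 8.0: np.nan}

mapped = codes.map(marital_map)        # unlisted values (9.0, NaN) become NaN
replaced = codes.replace(marital_map)  # unlisted values are left unchanged

print(mapped.tolist())
print(replaced.tolist())
```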

In [16]:
# keep only respondents who did not answer "Unsure" to any of the four items
# (missing values are not treated as "Unsure" and are kept)
df_no_miss = df[~df[SD_cols].eq('4. Unsure').any(axis=1)]
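A toy check of the exclusion rule: a respondent is dropped if any one of the four items is `"4. Unsure"`, and a missing answer does not count as "Unsure". The three rows below are hypothetical.

```python
import numpy as np
import pandas as pd

SD_cols = ['gone_to_friend', 'had_visitors', 'had_close_contact', 'gone_outside']

# Hypothetical respondents: row 1 has one "Unsure", row 2 has a missing answer
toy = pd.DataFrame({
    'gone_to_friend':    ['3. No', '4. Unsure', '1. Yes/Yes any time'],
    'had_visitors':      ['3. No', '3. No', np.nan],
    'had_close_contact': ['3. No', '3. No', '3. No'],
    'gone_outside':      ['3. No', '3. No', '3. No'],
})

# True for rows where at least one item equals "4. Unsure"
has_unsure = toy[SD_cols].eq('4. Unsure').any(axis=1)
kept = toy[~has_unsure]
print(kept.index.tolist())  # [0, 2]
```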

In [17]:
df_no_miss.shape

(618, 18)

#### 1) Original crosstabs (all responses, including "unsure")

In [18]:
#for col in SD_cols:
    #util.crosstab_percent_table(col, 'condition', df, conditions)
    #print('\n')

#### 2) Crosstabs excluding "unsure" responses

In [19]:
for col in SD_cols:
    util.crosstab_percent_table(col, 'condition', df_no_miss, conditions)
    print('\n')

| gone_to_friend (column %) | Control | A | B | C |
|---|---:|---:|---:|---:|
| 1. Yes/Yes any time | 60.8 | 59.4 | 48.4 | 41.7 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 15.7 | 22.5 |
| 3. No | 39.2 | 40.6 | 35.8 | 35.8 |
| Total n | 153 | 155 | 159 | 151 |

| had_visitors (column %) | Control | A | B | C |
|---|---:|---:|---:|---:|
| 1. Yes/Yes any time | 52.9 | 58.1 | 45.3 | 46.4 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 16.4 | 20.5 |
| 3. No | 47.1 | 41.9 | 38.4 | 33.1 |
| Total n | 153 | 155 | 159 | 151 |

| had_close_contact (column %) | Control | A | B | C |
|---|---:|---:|---:|---:|
| 1. Yes/Yes any time | 73.2 | 81.3 | 49.1 | 45.0 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 30.8 | 34.4 |
| 3. No | 26.8 | 18.7 | 20.1 | 20.5 |
| Total n | 153 | 155 | 159 | 151 |

| gone_outside (column %) | Control | A | B | C |
|---|---:|---:|---:|---:|
| 1. Yes/Yes any time | 82.4 | 86.5 | 77.4 | 74.2 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 8.8 | 9.3 |
| 3. No | 17.6 | 13.5 | 13.8 | 16.6 |
| Total n | 153 | 155 | 159 | 151 |

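The source of `util.crosstab_percent_table` is not included in this notebook. Judging only from its outputs (column percentages, a "Total n" row, and an optional chi-squared line), a helper like it might look roughly like the sketch below. The function name `crosstab_percent_table_sketch` and its return values are hypothetical, and the use of `scipy.stats.chi2_contingency` is an assumption, not the author's confirmed method.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def crosstab_percent_table_sketch(row_var, col_var, data, col_order, chisq_test=False):
    """Hypothetical stand-in for util.crosstab_percent_table.

    Returns column percentages with a 'Total n' row and, optionally,
    (chi-squared statistic, degrees of freedom, p-value).
    """
    # Cell counts, with columns forced into the requested order
    counts = (pd.crosstab(data[row_var], data[col_var])
                .reindex(col_order, axis="columns", fill_value=0))
    # Column percentages: each column sums to 100
    table = (counts / counts.sum(axis=0) * 100).round(1)
    table.loc["Total n"] = counts.sum(axis=0)

    stats = None
    if chisq_test:
        # scipy applies Yates' continuity correction by default when dof == 1
        chi2, p, dof, _ = chi2_contingency(counts)
        stats = (chi2, dof, p)
    return table, stats


# Usage on hypothetical data
toy = pd.DataFrame({
    "answer": ["1. Yes", "1. Yes", "2. No", "1. Yes", "2. No", "2. No"],
    "group":  ["x", "x", "x", "y", "y", "y"],
})
table, stats = crosstab_percent_table_sketch("answer", "group", toy, ["x", "y"], chisq_test=True)
print(table)
print(stats)
```

Because the percentage cells are floats, the "Total n" row is stored as float too, which would explain the `153.0`-style counts in the raw output.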
#### 3) Crosstabs with the top two answers combined

In [20]:
recoded_cols = ['gone_to_friend_r', 'had_visitors_r', 'had_close_contact_r', 'gone_outside_r']

In [21]:
#for col in recoded_cols:
    #util.crosstab_percent_table(col, 'condition', df, conditions)
    #print('\n')

#### 4) Crosstabs with the top two answers combined and "unsure" excluded

In [22]:
#for col in recoded_cols:
    #util.crosstab_percent_table(col, 'condition', df_no_miss, conditions)
    #print('\n')

### Testing the effect of the excuse statement

In [23]:
excuse_statement_order = ['no_excuse', 'with_excuse']

In [24]:
for col in recoded_cols:
    util.crosstab_percent_table(col, 'excuse_statement_condition', df_no_miss, excuse_statement_order, chisq_test=True)
    print('\n')

| gone_to_friend_r (column %) | no_excuse | with_excuse |
|---|---:|---:|
| 1. Yes | 62.5 | 61.8 |
| 2. No | 37.5 | 38.2 |
| Total n | 312 | 306 |

*Chi-squared statistic = 0.0, degrees of freedom = 1, p = 0.916*

| had_visitors_r (column %) | no_excuse | with_excuse |
|---|---:|---:|
| 1. Yes | 57.4 | 62.4 |
| 2. No | 42.6 | 37.6 |
| Total n | 312 | 306 |

*Chi-squared statistic = 1.4, degrees of freedom = 1, p = 0.231*

| had_close_contact_r (column %) | no_excuse | with_excuse |
|---|---:|---:|
| 1. Yes | 76.6 | 80.4 |
| 2. No | 23.4 | 19.6 |
| Total n | 312 | 306 |

*Chi-squared statistic = 1.1, degrees of freedom = 1, p = 0.295*

| gone_outside_r (column %) | no_excuse | with_excuse |
|---|---:|---:|
| 1. Yes | 84.3 | 85.0 |
| 2. No | 15.7 | 15.0 |
| Total n | 312 | 306 |

*Chi-squared statistic = 0.0, degrees of freedom = 1, p = 0.904*

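The chi-squared results above appear consistent with `scipy.stats.chi2_contingency` and its default Yates continuity correction for 2 × 2 tables, although `util` is not shown, so this is an assumption. The sketch below re-runs the `gone_to_friend_r` test on counts back-calculated from the rounded percentages and Total n in its table (195/117 yes/no without the excuse, 189/117 with it), so the counts are approximate. With these counts the corrected test gives p ≈ 0.916, matching the reported value; without the correction the p-value is somewhat lower.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: 1. Yes, 2. No; columns: no_excuse, with_excuse.
# Counts back-calculated from the rounded percentages and Total n (approximate).
observed = np.array([[195, 189],
                     [117, 117]])

# Default: Yates' continuity correction is applied because dof == 1
chi2_c, p_c, dof, expected = chi2_contingency(observed)
# Same test without the continuity correction, for comparison
chi2_u, p_u, _, _ = chi2_contingency(observed, correction=False)

print(f"corrected:   chi2 = {chi2_c:.1f}, df = {dof}, p = {p_c:.3f}")
print(f"uncorrected: chi2 = {chi2_u:.1f}, df = {dof}, p = {p_u:.3f}")
```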
### Testing the effect of the response set

In [25]:
response_set_order = ['two_response', 'three_response']

In [26]:
for col in SD_cols:
    util.crosstab_percent_table(col, 'response_set_condition', df_no_miss, response_set_order, chisq_test=False)
    print('\n')

| gone_to_friend (column %) | two_response | three_response |
|---|---:|---:|
| 1. Yes/Yes any time | 60.1 | 45.2 |
| 2. Yes only when neccessary | 0.0 | 19.0 |
| 3. No | 39.9 | 35.8 |
| Total n | 308 | 310 |

| had_visitors (column %) | two_response | three_response |
|---|---:|---:|
| 1. Yes/Yes any time | 55.5 | 45.8 |
| 2. Yes only when neccessary | 0.0 | 18.4 |
| 3. No | 44.5 | 35.8 |
| Total n | 308 | 310 |

| had_close_contact (column %) | two_response | three_response |
|---|---:|---:|
| 1. Yes/Yes any time | 77.3 | 47.1 |
| 2. Yes only when neccessary | 0.0 | 32.6 |
| 3. No | 22.7 | 20.3 |
| Total n | 308 | 310 |

| gone_outside (column %) | two_response | three_response |
|---|---:|---:|
| 1. Yes/Yes any time | 84.4 | 75.8 |
| 2. Yes only when neccessary | 0.0 | 9.0 |
| 3. No | 15.6 | 15.2 |
| Total n | 308 | 310 |

In [27]:
for col in recoded_cols:
    util.crosstab_percent_table(col, 'response_set_condition', df_no_miss, response_set_order, chisq_test=True)
    print('\n')

| gone_to_friend_r (column %) | two_response | three_response |
|---|---:|---:|
| 1. Yes | 60.1 | 64.2 |
| 2. No | 39.9 | 35.8 |
| Total n | 308 | 310 |

*Chi-squared statistic = 1.0, degrees of freedom = 1, p = 0.33*

| had_visitors_r (column %) | two_response | three_response |
|---|---:|---:|
| 1. Yes | 55.5 | 64.2 |
| 2. No | 44.5 | 35.8 |
| Total n | 308 | 310 |

*Chi-squared statistic = 4.5, degrees of freedom = 1, p = 0.034*

| had_close_contact_r (column %) | two_response | three_response |
|---|---:|---:|
| 1. Yes | 77.3 | 79.7 |
| 2. No | 22.7 | 20.3 |
| Total n | 308 | 310 |

*Chi-squared statistic = 0.4, degrees of freedom = 1, p = 0.529*

| gone_outside_r (column %) | two_response | three_response |
|---|---:|---:|
| 1. Yes | 84.4 | 84.8 |
| 2. No | 15.6 | 15.2 |
| Total n | 308 | 310 |

*Chi-squared statistic = 0.0, degrees of freedom = 1, p = 0.973*

### Demographic distribution by condition

In [28]:
demographic_cols = ['gender', 'marital', 'age_group', 'education']

In [29]:
pd.crosstab(df_no_miss['gender'], df_no_miss['condition']).reindex(conditions, axis="columns")

| gender (count) | Control | A | B | C |
|---|---:|---:|---:|---:|
| 1. Woman | 99 | 86 | 89 | 87 |
| 2. Man | 53 | 69 | 68 | 63 |

In [30]:
pd.crosstab(df_no_miss['marital'], df_no_miss['condition']).reindex(conditions, axis="columns")

| marital (count) | Control | A | B | C |
|---|---:|---:|---:|---:|
| 1. Married | 105 | 111 | 123 | 118 |
| 2. Not married | 45 | 42 | 34 | 32 |

In [31]:
pd.crosstab(df_no_miss['age_group'], df_no_miss['condition']).reindex(conditions, axis="columns")

| age_group (count) | Control | A | B | C |
|---|---:|---:|---:|---:|
| 19-25 | 7 | 4 | 5 | 0 |
| 26-35 | 9 | 9 | 10 | 12 |
| 36-45 | 14 | 14 | 16 | 16 |
| 46-55 | 18 | 25 | 34 | 27 |
| 56-65 | 59 | 50 | 49 | 48 |
| 66+ | 45 | 51 | 43 | 47 |

In [32]:
pd.crosstab(df_no_miss['education'], df_no_miss['condition']).reindex(conditions, axis="columns")

| education (count) | Control | A | B | C |
|---|---:|---:|---:|---:|
| 1. Less than high school diploma | 2 | 0 | 1 | 2 |
| 2. High school diploma | 27 | 18 | 18 | 23 |
| 3. Some college | 56 | 64 | 57 | 63 |
| 4. Bachelor degree | 34 | 43 | 49 | 32 |
| 5. Graduate degree | 33 | 27 | 31 | 30 |

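Because the conditions differ slightly in size, the demographic counts are easier to compare as column percentages. On the raw data, `pd.crosstab(df_no_miss['gender'], df_no_miss['condition'], normalize='columns')` returns column proportions directly. The sketch below instead starts from the gender counts copied from the table above; note that the denominators are respondents with non-missing gender, so they can be smaller than the Total n used in the response tables.

```python
import pandas as pd

# Gender counts by condition, copied from the crosstab above
gender_counts = pd.DataFrame(
    {"Control": [99, 53], "A": [86, 69], "B": [89, 68], "C": [87, 63]},
    index=["1. Woman", "2. Man"],
)

# Column percentages: share of each gender within a condition
gender_pct = (gender_counts / gender_counts.sum(axis=0) * 100).round(1)
print(gender_pct)
```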