## Social desirability experiment data analysis

This notebook documents the analysis of the social desirability experiment.

2 × 2 fully crossed between-subjects experiment: <br>
* Question wording:
  + no excuse statement
  + with excuse statement <br> <br>
  
* Response options:
  + Yes/No <br>
  + Yes with any reason / Yes only when necessary / No <br>

* Conditions:
  + Control: no excuse statement, Yes/No
  + Condition A: with excuse statement, Yes/No
  + Condition B: no excuse statement, Yes with any reason / Yes only when necessary / No
  + Condition C: with excuse statement, Yes with any reason / Yes only when necessary / No <br><br>
  
* To test the overall effect of:
  + excuse statement: Control + B vs. A + C
  + response options: Control + A vs. B + C
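
The main-effect contrasts above each pool two of the four conditions. As a minimal sketch, assuming a hypothetical `DESIGN` dictionary that records each condition's level on both factors (it is not part of the original notebook), the pooled groups can be derived from the design rather than listed by hand:

```python
# Hypothetical encoding of the 2 x 2 design: condition -> (question wording, response set)
DESIGN = {
    "Control": ("no_excuse", "two_response"),
    "A": ("with_excuse", "two_response"),
    "B": ("no_excuse", "three_response"),
    "C": ("with_excuse", "three_response"),
}


def pooled_groups(design, factor_index):
    """Group condition labels by their level on one factor (0 = wording, 1 = response set)."""
    groups = {}
    for condition, levels in design.items():
        groups.setdefault(levels[factor_index], []).append(condition)
    return groups


excuse_groups = pooled_groups(DESIGN, 0)
response_groups = pooled_groups(DESIGN, 1)
print(excuse_groups)    # {'no_excuse': ['Control', 'B'], 'with_excuse': ['A', 'C']}
print(response_groups)  # {'two_response': ['Control', 'A'], 'three_response': ['B', 'C']}
```

The notebook's `excuse_map` and `response_map` hold the same information in condition → level form; deriving both directions from one table is one way to keep them consistent.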

In [2]:
import numpy as np
import pandas as pd

import utility as util

In [3]:
df = pd.read_csv('../output/SD_experiment_df.csv', skipinitialspace=True)

In [4]:
df.shape

(637, 12)

In [5]:
df.columns

Index(['ID', 'vaccine', 'mandate', 'gender', 'marital', 'age_group',
       'education', 'gone_to_friend', 'had_visitors', 'had_close_contact',
       'gone_outside', 'condition'],
      dtype='object')

In [6]:
df['condition'].value_counts().sort_index()

A          157
B          160
C          157
Control    163
Name: condition, dtype: int64

In [7]:
# map numeric response codes to text labels for display
display_change_cols = ['gone_to_friend', 'had_visitors', 'had_close_contact', 'gone_outside']

for col in display_change_cols:
    df[col] = df[col].replace([1, 2, 3, 4], ["1. Yes/Yes any time", "2. Yes only when neccessary", "3. No", "4. Unsure"])

In [8]:
# combine top two response options into "yes"
combine_top_two = {"1. Yes/Yes any time": "1. Yes", 
                   "2. Yes only when neccessary": "1. Yes", 
                   "3. No": "2. No",
                   "4. Unsure": "3. Unsure"}

In [9]:
# create collapsed recodes (suffix _r) using combine_top_two
SD_cols = ['gone_to_friend', 'had_visitors', 'had_close_contact', 'gone_outside']

for col in SD_cols:
    df[f'{col}_r'] = df[col].map(combine_top_two)
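
`Series.map` with a dict returns NaN for any value that is not a key, so a label that does not exactly match `combine_top_two` would silently become missing in the `_r` recodes. A small sketch on made-up toy responses (not study data; the labels are copied verbatim from the notebook, including their spelling) shows the collapse plus a guard against unmapped labels:

```python
import pandas as pd

combine_top_two = {"1. Yes/Yes any time": "1. Yes",
                   "2. Yes only when neccessary": "1. Yes",
                   "3. No": "2. No",
                   "4. Unsure": "3. Unsure"}

# Toy responses for illustration only
toy = pd.Series(["1. Yes/Yes any time", "2. Yes only when neccessary", "3. No", "4. Unsure"])
recoded = toy.map(combine_top_two)

# A non-missing input that maps to NaN means its label is absent from the dict
unmapped = toy[toy.notna() & recoded.isna()]
assert unmapped.empty, f"labels missing from combine_top_two: {unmapped.unique()}"
print(recoded.tolist())  # ['1. Yes', '1. Yes', '2. No', '3. Unsure']
```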

In [10]:
recoded_cols = ['gone_to_friend_r', 'had_visitors_r', 'had_close_contact_r', 'gone_outside_r']

In [11]:
# create two separate factor variables
excuse_map = {'Control': 'no_excuse',
              'B': 'no_excuse',
              'A': 'with_excuse', 
              'C': 'with_excuse'}

response_map = {'Control': 'two_response',
                'A': 'two_response',
                'B': 'three_response', 
                'C': 'three_response'}

In [12]:
df['excuse_statement_condition'] = df['condition'].map(excuse_map)
df['response_set_condition'] = df['condition'].map(response_map)

In [13]:
# map numeric codes to value labels for table display
gender_map = {1.0: '1. Woman',
              2.0: '2. Man',
              np.nan: np.nan}

marital_map = {1.0: '1. Married',
               7.0: '2. Not married',
               8.0: np.nan,
               np.nan: np.nan}

age_group_map = {2.0: '19-25',
                 3.0: '26-35',
                 4.0: '36-45', 
                 5.0: '46-55',
                 6.0: '56-65',
                 7.0: '66+',
                 np.nan: np.nan}

education_map = {4.0: '1. Less than high school diploma',
                 5.0: '2. High school diploma',
                 6.0: '3. Some college',
                 7.0: '4. Bachelor degree',
                 8.0: '5. Graduate degree',
                 np.nan: np.nan}

In [14]:
df['gender'] = df['gender'].map(gender_map)
df['marital'] = df['marital'].map(marital_map)
df['age_group'] = df['age_group'].map(age_group_map)
df['education'] = df['education'].map(education_map)
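
`Series.map` also turns any code that is not a dict key into NaN, so besides the deliberate `8.0 → NaN` for marital status, an unexpected code would drop out of the demographic tables without warning. A minimal check, using a hypothetical helper `unmapped_codes` and toy values (9.0 is a made-up code), could be run before mapping:

```python
import numpy as np
import pandas as pd


def unmapped_codes(series, mapping):
    """Return sorted non-missing codes in `series` that are not keys of `mapping` (hypothetical helper)."""
    observed = set(series.dropna().unique().tolist())
    return sorted(observed - set(mapping))


marital_map = {1.0: '1. Married', 7.0: '2. Not married', 8.0: np.nan, np.nan: np.nan}
toy_marital = pd.Series([1.0, 7.0, 8.0, np.nan, 9.0])  # 9.0 is a made-up unlisted code
print(unmapped_codes(toy_marital, marital_map))  # [9.0]
```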

In [15]:
# keep only respondents who did not answer "Unsure" to any of the four items
df_no_miss = df[(df['gone_to_friend'] != '4. Unsure') & (df['had_visitors'] != '4. Unsure') & 
                (df['had_close_contact'] != '4. Unsure') & (df['gone_outside'] != '4. Unsure')]

In [16]:
df_no_miss.shape

(618, 18)
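
The four chained `!=` conditions can be expressed as one vectorized test across the item columns. A sketch on a made-up toy frame (column names match the notebook; the values are invented):

```python
import pandas as pd

SD_cols = ['gone_to_friend', 'had_visitors', 'had_close_contact', 'gone_outside']

# Toy data for illustration only
toy = pd.DataFrame({
    'gone_to_friend':    ['3. No', '4. Unsure', '1. Yes/Yes any time'],
    'had_visitors':      ['3. No', '3. No',     '1. Yes/Yes any time'],
    'had_close_contact': ['3. No', '3. No',     '4. Unsure'],
    'gone_outside':      ['3. No', '3. No',     '3. No'],
})

# Keep rows where none of the four items is "Unsure"
toy_no_miss = toy[toy[SD_cols].ne('4. Unsure').all(axis=1)]
print(len(toy_no_miss))  # 1
```

As in the original filter, missing values (NaN) are kept, because NaN compares as not equal to `'4. Unsure'`.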

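### Full crosstab tables

Column percentages of the original (uncollapsed) responses by condition, after excluding "Unsure"; `Total n` is the number of respondents in each condition.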

In [17]:
conditions = ['Control', 'A', 'B', 'C']

In [18]:
for col in SD_cols:
    util.crosstab_chisq(col, 'condition', df_no_miss, conditions, chisqtest=False)

| gone_to_friend (column %) | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes/Yes any time | 60.8 | 59.4 | 48.4 | 41.7 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 15.7 | 22.5 |
| 3. No | 39.2 | 40.6 | 35.8 | 35.8 |
| Total n | 153 | 155 | 159 | 151 |


-----

| had_visitors (column %) | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes/Yes any time | 52.9 | 58.1 | 45.3 | 46.4 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 16.4 | 20.5 |
| 3. No | 47.1 | 41.9 | 38.4 | 33.1 |
| Total n | 153 | 155 | 159 | 151 |


-----

| had_close_contact (column %) | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes/Yes any time | 73.2 | 81.3 | 49.1 | 45.0 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 30.8 | 34.4 |
| 3. No | 26.8 | 18.7 | 20.1 | 20.5 |
| Total n | 153 | 155 | 159 | 151 |


-----

| gone_outside (column %) | Control | A | B | C |
|---|---|---|---|---|
| 1. Yes/Yes any time | 82.4 | 86.5 | 77.4 | 74.2 |
| 2. Yes only when neccessary | 0.0 | 0.0 | 8.8 | 9.3 |
| 3. No | 17.6 | 13.5 | 13.8 | 16.6 |
| Total n | 153 | 155 | 159 | 151 |


-----
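
`util.crosstab_chisq` comes from a local `utility` module whose source is not shown, so its internals are unknown. The tables above are consistent with column percentages plus a `Total n` row; a minimal sketch of such a table with `pd.crosstab` (hypothetical helper `col_pct_table`, toy data) is:

```python
import pandas as pd


def col_pct_table(df, row, col, col_order):
    """Column percentages of `row` within each level of `col`, plus a 'Total n' row (hypothetical helper)."""
    counts = pd.crosstab(df[row], df[col]).reindex(columns=col_order, fill_value=0)
    pct = (counts / counts.sum() * 100).round(1)
    pct.loc['Total n'] = counts.sum()
    return pct


# Toy data for illustration only
toy = pd.DataFrame({'answer': ['Yes', 'No', 'Yes', 'Yes'],
                    'condition': ['Control', 'Control', 'A', 'A']})
table = col_pct_table(toy, 'answer', 'condition', ['Control', 'A'])
print(table)
```

`reindex(..., fill_value=0)` keeps the requested column order and shows a condition with no respondents as a column of zeros instead of dropping it.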

### Testing the effect of question wording

Question wording (tested on the `_r` recodes, which collapse both "yes" options into a single "Yes"):
  + no excuse statement
  + with excuse statement
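
Each main-effect column pools two experimental cells, so its percentage is the n-weighted average of those cells, not the simple average of the two percentages. As a worked check, counts back-calculated from the rounded percentages in the full crosstabs (for example, Control: 60.8% of 153 ≈ 93 "Yes"; B: 48.4% + 15.7% of 159 ≈ 77 + 25 = 102 "Yes") reproduce the pooled `gone_to_friend_r` percentages reported in this section and in the response-set section:

```python
# gone_to_friend "Yes" counts back-calculated from the rounded percentages in the full crosstabs;
# for B and C, "Yes" = "Yes any time" + "Yes only when necessary"
yes_counts = {'Control': 93, 'A': 92, 'B': 77 + 25, 'C': 63 + 34}
cell_n = {'Control': 153, 'A': 155, 'B': 159, 'C': 151}


def pooled_pct(conditions):
    """Percentage 'Yes' after pooling the given conditions (weighted by cell size)."""
    yes = sum(yes_counts[c] for c in conditions)
    n = sum(cell_n[c] for c in conditions)
    return round(100 * yes / n, 1)


print(pooled_pct(['Control', 'B']))  # 62.5 (no_excuse)
print(pooled_pct(['A', 'C']))        # 61.8 (with_excuse)
print(pooled_pct(['Control', 'A']))  # 60.1 (two_response)
print(pooled_pct(['B', 'C']))        # 64.2 (three_response)
```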

In [19]:
excuse_statement_order = ['no_excuse', 'with_excuse']

In [20]:
for col in recoded_cols:
    util.crosstab_chisq(col, 'excuse_statement_condition', df_no_miss, excuse_statement_order)

| gone_to_friend_r (column %) | no_excuse | with_excuse |
|---|---|---|
| 1. Yes | 62.5 | 61.8 |
| 2. No | 37.5 | 38.2 |
| Total n | 312 | 306 |


*Chi-squared statistic = 0.0, degree of freedom = 1, p = 0.916*

-----

| had_visitors_r (column %) | no_excuse | with_excuse |
|---|---|---|
| 1. Yes | 57.4 | 62.4 |
| 2. No | 42.6 | 37.6 |
| Total n | 312 | 306 |


*Chi-squared statistic = 1.4, degree of freedom = 1, p = 0.231*

-----

| had_close_contact_r (column %) | no_excuse | with_excuse |
|---|---|---|
| 1. Yes | 76.6 | 80.4 |
| 2. No | 23.4 | 19.6 |
| Total n | 312 | 306 |


*Chi-squared statistic = 1.1, degree of freedom = 1, p = 0.295*

-----

| gone_outside_r (column %) | no_excuse | with_excuse |
|---|---|---|
| 1. Yes | 84.3 | 85.0 |
| 2. No | 15.7 | 15.0 |
| Total n | 312 | 306 |


*Chi-squared statistic = 0.0, degree of freedom = 1, p = 0.904*

-----
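
Because the source of `util.crosstab_chisq` is not shown, the exact test it runs is an assumption here. A Pearson chi-squared test of a 2 × 2 table with Yates' continuity correction, applied to counts back-calculated from the `gone_to_friend_r` table above (no_excuse: 195 Yes / 117 No; with_excuse: 189 Yes / 117 No), gives a statistic of about 0.01 and p ≈ 0.916, matching the reported output; this suggests, without confirming, that the helper applies the correction. A dependency-free sketch, using the fact that with 1 degree of freedom the chi-squared survival function equals erfc(√(x/2)):

```python
import math


def chisq_2x2_yates(a, b, c, d):
    """Pearson chi-squared test with Yates' correction for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p_value); df = 1, so p = erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity-corrected |ad - bc|, floored at zero
    diff = max(abs(a * d - b * c) - n / 2, 0)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    return stat, math.erfc(math.sqrt(stat / 2))


# Rows: Yes, No; columns: no_excuse, with_excuse (counts back-calculated from rounded percentages)
stat, p = chisq_2x2_yates(195, 189, 117, 117)
print(f"Chi-squared statistic = {stat:.1f}, p = {p:.3f}")  # Chi-squared statistic = 0.0, p = 0.916
```

The same function with the back-calculated response-set counts for `had_visitors_r` (`chisq_2x2_yates(171, 199, 137, 111)`: 171/137 in two_response, 199/111 in three_response) gives a statistic of about 4.5 and p ≈ 0.034, again in line with the reported values.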

### Testing the effect of response set

Response set:
  + two-response set: Yes/No <br>
  + three-response set: Yes with any reason / Yes only when necessary / No <br>

In [21]:
response_set_order = ['two_response', 'three_response']

In [22]:
for col in SD_cols:
    util.crosstab_chisq(col, 'response_set_condition', df_no_miss, response_set_order, chisqtest=False)

| gone_to_friend (column %) | two_response | three_response |
|---|---|---|
| 1. Yes/Yes any time | 60.1 | 45.2 |
| 2. Yes only when neccessary | 0.0 | 19.0 |
| 3. No | 39.9 | 35.8 |
| Total n | 308 | 310 |


-----

| had_visitors (column %) | two_response | three_response |
|---|---|---|
| 1. Yes/Yes any time | 55.5 | 45.8 |
| 2. Yes only when neccessary | 0.0 | 18.4 |
| 3. No | 44.5 | 35.8 |
| Total n | 308 | 310 |


-----

| had_close_contact (column %) | two_response | three_response |
|---|---|---|
| 1. Yes/Yes any time | 77.3 | 47.1 |
| 2. Yes only when neccessary | 0.0 | 32.6 |
| 3. No | 22.7 | 20.3 |
| Total n | 308 | 310 |


-----

| gone_outside (column %) | two_response | three_response |
|---|---|---|
| 1. Yes/Yes any time | 84.4 | 75.8 |
| 2. Yes only when neccessary | 0.0 | 9.0 |
| 3. No | 15.6 | 15.2 |
| Total n | 308 | 310 |


-----

In [23]:
for col in recoded_cols:
    util.crosstab_chisq(col, 'response_set_condition', df_no_miss, response_set_order)

| gone_to_friend_r (column %) | two_response | three_response |
|---|---|---|
| 1. Yes | 60.1 | 64.2 |
| 2. No | 39.9 | 35.8 |
| Total n | 308 | 310 |


*Chi-squared statistic = 1.0, degree of freedom = 1, p = 0.33*

-----

| had_visitors_r (column %) | two_response | three_response |
|---|---|---|
| 1. Yes | 55.5 | 64.2 |
| 2. No | 44.5 | 35.8 |
| Total n | 308 | 310 |


*Chi-squared statistic = 4.5, degree of freedom = 1, p = 0.034*

-----

| had_close_contact_r (column %) | two_response | three_response |
|---|---|---|
| 1. Yes | 77.3 | 79.7 |
| 2. No | 22.7 | 20.3 |
| Total n | 308 | 310 |


*Chi-squared statistic = 0.4, degree of freedom = 1, p = 0.529*

-----

| gone_outside_r (column %) | two_response | three_response |
|---|---|---|
| 1. Yes | 84.4 | 84.8 |
| 2. No | 15.6 | 15.2 |
| Total n | 308 | 310 |


*Chi-squared statistic = 0.0, degree of freedom = 1, p = 0.973*

-----

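### Comparing Control vs. C

Control and C differ on both factors (no excuse statement with two responses vs. excuse statement with three responses), so this comparison reflects their combined effect.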

In [24]:
df_no_miss_controlC = df_no_miss[df_no_miss["condition"].isin(["Control", "C"])]

In [30]:
conditions_control_C = ['Control', 'C']

In [31]:
for col in recoded_cols:
    util.crosstab_chisq(col, 'condition', df_no_miss_controlC, conditions_control_C)

| gone_to_friend_r (column %) | Control | C |
|---|---|---|
| 1. Yes | 60.8 | 64.2 |
| 2. No | 39.2 | 35.8 |
| Total n | 153 | 151 |


*Chi-squared statistic = 0.3, degree of freedom = 1, p = 0.615*

-----

| had_visitors_r (column %) | Control | C |
|---|---|---|
| 1. Yes | 52.9 | 66.9 |
| 2. No | 47.1 | 33.1 |
| Total n | 153 | 151 |


*Chi-squared statistic = 5.6, degree of freedom = 1, p = 0.018*

-----

| had_close_contact_r (column %) | Control | C |
|---|---|---|
| 1. Yes | 73.2 | 79.5 |
| 2. No | 26.8 | 20.5 |
| Total n | 153 | 151 |


*Chi-squared statistic = 1.3, degree of freedom = 1, p = 0.25*

-----

| gone_outside_r (column %) | Control | C |
|---|---|---|
| 1. Yes | 82.4 | 83.4 |
| 2. No | 17.6 | 16.6 |
| Total n | 153 | 151 |


*Chi-squared statistic = 0.0, degree of freedom = 1, p = 0.92*

-----

### Demographic distributions

In [23]:
demographic_cols = ['gender', 'marital', 'age_group', 'education']

In [24]:
for col in demographic_cols:
    util.crosstab_chisq(col, 'condition', df_no_miss, conditions)

| gender (column %) | Control | A | B | C |
|---|---|---|---|---|
| 1. Woman | 65.1 | 55.5 | 56.7 | 58.0 |
| 2. Man | 34.9 | 44.5 | 43.3 | 42.0 |
| Total n | 152 | 155 | 157 | 150 |


*Chi-squared statistic = 3.5, degree of freedom = 3, p = 0.315*

-----

| marital (column %) | Control | A | B | C |
|---|---|---|---|---|
| 1. Married | 70.0 | 72.5 | 78.3 | 78.7 |
| 2. Not married | 30.0 | 27.5 | 21.7 | 21.3 |
| Total n | 150 | 153 | 157 | 150 |


*Chi-squared statistic = 4.5, degree of freedom = 3, p = 0.213*

-----

| age_group (column %) | Control | A | B | C |
|---|---|---|---|---|
| 19-25 | 4.6 | 2.6 | 3.2 | 0.0 |
| 26-35 | 5.9 | 5.9 | 6.4 | 8.0 |
| 36-45 | 9.2 | 9.2 | 10.2 | 10.7 |
| 46-55 | 11.8 | 16.3 | 21.7 | 18.0 |
| 56-65 | 38.8 | 32.7 | 31.2 | 32.0 |
| 66+ | 29.6 | 33.3 | 27.4 | 31.3 |
| Total n | 152 | 153 | 157 | 150 |


*Chi-squared statistic = 14.5, degree of freedom = 15, p = 0.488*

-----

| education (column %) | Control | A | B | C |
|---|---|---|---|---|
| 1. Less than high school diploma | 1.3 | 0.0 | 0.6 | 1.3 |
| 2. High school diploma | 17.8 | 11.8 | 11.5 | 15.3 |
| 3. Some college | 36.8 | 42.1 | 36.5 | 42.0 |
| 4. Bachelor degree | 22.4 | 28.3 | 31.4 | 21.3 |
| 5. Graduate degree | 21.7 | 17.8 | 19.9 | 20.0 |
| Total n | 152 | 152 | 156 | 150 |


*Chi-squared statistic = 11.0, degree of freedom = 12, p = 0.533*

-----
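
The demographic tests are r × c tables with sparse rows such as `19-25` (0% in C), so some expected counts fall below 5, a common rule of thumb for when the chi-squared approximation becomes less reliable. A minimal numpy sketch of the uncorrected r × c statistic that also reports small expected counts, applied to `age_group` counts back-calculated from the rounded percentages above (rows 19-25 to 66+, columns Control, A, B, C), gives df = (6 − 1)(4 − 1) = 15 and a statistic of about 14.5, consistent with the reported output:

```python
import numpy as np


def chisq_independence(table):
    """Pearson chi-squared statistic, degrees of freedom and expected counts for an r x c table (no continuity correction)."""
    observed = np.asarray(table, dtype=float)
    n = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    stat = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return stat, dof, expected


# age_group by condition (Control, A, B, C); counts back-calculated from the rounded percentages
age_counts = [
    [7, 4, 5, 0],       # 19-25
    [9, 9, 10, 12],     # 26-35
    [14, 14, 16, 16],   # 36-45
    [18, 25, 34, 27],   # 46-55
    [59, 50, 49, 48],   # 56-65
    [45, 51, 43, 47],   # 66+
]
stat, dof, expected = chisq_independence(age_counts)
small = int((expected < 5).sum())
print(f"statistic = {stat:.1f}, df = {dof}, cells with expected count < 5: {small}")
# statistic = 14.5, df = 15, cells with expected count < 5: 4
```

All four small expected counts fall in the `19-25` row; the `1. Less than high school diploma` row of the education table is similarly sparse.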