## Descriptive analysis of FB and UAS items

This notebook starts from the full FB survey data (n = 1,777). Respondents under 18 are excluded below, leaving n = 1,773, and some analyses run on subsets of this data. <br>
Author: Rosalynn Yang <br>
Date: 11/29/2020

In [1]:
import numpy as np
import pandas as pd
from IPython.display import display, Markdown

import utility as util

In [2]:
fb = pd.read_csv("../output/fb_numeric.csv")
uas = pd.read_csv("../output/uas_numeric.csv")

In [3]:
fb.shape

(1777, 43)

In [4]:
uas.shape

(6407, 17)

In [5]:
# Manually exclude respondents under 18 years old (Q11 == 1)
fb = fb.drop(fb[fb['Q11'] == 1].index)

In [6]:
fb.shape

(1773, 43)

### Crosstabs of FB items and Image

*Mental health: Q1, Q2 <br>
Data privacy: Q3, Q3_1, Q4 <br>
COVID19: Q5, Q6 <br>
Finance: Q8*

In [7]:
fb_items_cols = ['Q1', 'Q2', 'Q3', 'Q3_1', 'Q4', 'Q5', 'Q6', 'Q8']
col_names = ['Neutral(%)', 'COVID(%)', 'Data Privacy(%)', 'Finance(%)', 'Mental Health(%)']

In [8]:
for col in fb_items_cols:
    util.crosstab_chisq(col, 'Image', fb, col_names)

#### Crosstab of Q1 and Image

| Q1 | Neutral(%) | COVID(%) | Data Privacy(%) | Finance(%) | Mental Health(%) |
| --- | --- | --- | --- | --- | --- |
| 1.0 | 56.6 | 40.0 | 43.5 | 44.4 | 26.7 |
| 2.0 | 24.9 | 29.5 | 33.9 | 35.1 | 34.7 |
| 3.0 | 8.7 | 10.3 | 11.3 | 7.9 | 14.4 |
| 4.0 | 9.8 | 20.2 | 11.3 | 12.6 | 24.2 |
| Total n | 366.0 | 613.0 | 62.0 | 151.0 | 236.0 |


*Chi-squared statistic = 68.9, degrees of freedom = 12, p < 0.001*

-----

#### Crosstab of Q2 and Image

| Q2 | Neutral(%) | COVID(%) | Data Privacy(%) | Finance(%) | Mental Health(%) |
| --- | --- | --- | --- | --- | --- |
| 1.0 | 57.1 | 47.4 | 50.0 | 53.6 | 32.6 |
| 2.0 | 27.6 | 30.4 | 29.0 | 29.1 | 36.0 |
| 3.0 | 7.1 | 9.5 | 11.3 | 7.3 | 13.6 |
| 4.0 | 8.2 | 12.7 | 9.7 | 9.9 | 17.8 |
| Total n | 366.0 | 612.0 | 62.0 | 151.0 | 236.0 |


*Chi-squared statistic = 42.0, degrees of freedom = 12, p < 0.001*

-----

#### Crosstab of Q3 and Image

| Q3 | Neutral(%) | COVID(%) | Data Privacy(%) | Finance(%) | Mental Health(%) |
| --- | --- | --- | --- | --- | --- |
| 1.0 | 42.9 | 57.3 | 41.4 | 45.2 | 39.8 |
| 2.0 | 15.8 | 7.1 | 17.2 | 8.2 | 13.3 |
| 3.0 | 16.9 | 14.6 | 17.2 | 20.5 | 21.2 |
| 4.0 | 11.9 | 11.5 | 13.8 | 15.1 | 14.2 |
| 5.0 | 12.4 | 9.5 | 10.3 | 11.0 | 11.5 |
| Total n | 177.0 | 295.0 | 29.0 | 73.0 | 113.0 |


*Chi-squared statistic = 22.6, degrees of freedom = 16, p = 0.125*

-----

#### Crosstab of Q3_1 and Image

| Q3_1 | Neutral(%) | COVID(%) | Data Privacy(%) | Finance(%) | Mental Health(%) |
| --- | --- | --- | --- | --- | --- |
| 1.0 | 55.4 | 59.3 | 72.7 | 47.9 | 31.5 |
| 2.0 | 11.4 | 9.4 | 6.1 | 13.7 | 17.6 |
| 3.0 | 14.9 | 15.2 | 9.1 | 16.4 | 19.4 |
| 4.0 | 12.6 | 7.4 | 3.0 | 12.3 | 20.4 |
| 5.0 | 5.7 | 8.8 | 9.1 | 9.6 | 11.1 |
| Total n | 175.0 | 297.0 | 33.0 | 73.0 | 108.0 |


*Chi-squared statistic = 39.3, degrees of freedom = 16, p = 0.001*

-----

#### Crosstab of Q4 and Image

| Q4 | Neutral(%) | COVID(%) | Data Privacy(%) | Finance(%) | Mental Health(%) |
| --- | --- | --- | --- | --- | --- |
| 1.0 | 54.5 | 59.7 | 67.7 | 47.3 | 37.8 |
| 2.0 | 13.1 | 6.7 | 3.2 | 8.2 | 16.7 |
| 3.0 | 11.6 | 14.7 | 9.7 | 19.9 | 21.2 |
| 4.0 | 12.8 | 10.3 | 9.7 | 11.0 | 13.5 |
| 5.0 | 8.0 | 8.6 | 9.7 | 13.7 | 10.8 |
| Total n | 352.0 | 593.0 | 62.0 | 146.0 | 222.0 |


*Chi-squared statistic = 58.9, degrees of freedom = 16, p < 0.001*

-----

#### Crosstab of Q5 and Image

| Q5 | Neutral(%) | COVID(%) | Data Privacy(%) | Finance(%) | Mental Health(%) |
| --- | --- | --- | --- | --- | --- |
| 1.0 | 38.2 | 41.1 | 53.2 | 29.7 | 20.5 |
| 2.0 | 11.4 | 8.4 | 9.7 | 3.9 | 9.8 |
| 3.0 | 14.1 | 11.8 | 8.1 | 17.4 | 10.3 |
| 4.0 | 29.9 | 29.1 | 22.6 | 40.6 | 49.1 |
| 5.0 | 6.4 | 9.5 | 6.5 | 8.4 | 10.3 |
| Total n | 361.0 | 608.0 | 62.0 | 155.0 | 234.0 |


*Chi-squared statistic = 70.5, degrees of freedom = 16, p < 0.001*

-----

#### Crosstab of Q6 and Image

| Q6 | Neutral(%) | COVID(%) | Data Privacy(%) | Finance(%) | Mental Health(%) |
| --- | --- | --- | --- | --- | --- |
| 1.0 | 56.8 | 55.8 | 50.0 | 56.1 | 55.6 |
| 2.0 | 26.9 | 30.6 | 37.1 | 31.0 | 25.6 |
| 3.0 | 16.3 | 13.5 | 12.9 | 12.9 | 18.8 |
| Total n | 361.0 | 607.0 | 62.0 | 155.0 | 234.0 |


*Chi-squared statistic = 8.3, degrees of freedom = 8, p = 0.407*

-----

#### Crosstab of Q8 and Image

| Q8 | Neutral(%) | COVID(%) | Data Privacy(%) | Finance(%) | Mental Health(%) |
| --- | --- | --- | --- | --- | --- |
| 1.0 | 8.5 | 11.8 | 14.3 | 6.6 | 9.7 |
| 2.0 | 88.1 | 84.4 | 85.7 | 90.1 | 86.7 |
| 3.0 | 3.4 | 3.8 | 0.0 | 3.3 | 3.5 |
| Total n | 354.0 | 610.0 | 63.0 | 151.0 | 226.0 |


*Chi-squared statistic = 8.6, degrees of freedom = 8, p = 0.379*

-----
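
The source of `util.crosstab_chisq` is not shown in this notebook, so the following is only a minimal sketch of how a column-percentage crosstab and a Pearson chi-squared statistic could be computed with pandas and NumPy. The function name `crosstab_with_chisq` and the toy data are hypothetical. The degrees of freedom follow (rows - 1) x (columns - 1), which agrees with the values reported above: 4 response categories by 5 images gives 12, 5 by 5 gives 16, and 3 by 5 gives 8.

```python
import pandas as pd


def crosstab_with_chisq(df, row, col):
    """Column-percentage crosstab plus Pearson chi-squared statistic and df.

    Hypothetical stand-in for util.crosstab_chisq (an assumption, not the
    author's code). pd.crosstab drops rows missing either variable.
    """
    counts = pd.crosstab(df[row], df[col])
    # Percentages within each column, so every column sums to about 100.
    percents = (counts / counts.sum(axis=0) * 100).round(1)

    observed = counts.to_numpy(dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Expected counts under independence: row total * column total / n.
    expected = row_totals @ col_totals / observed.sum()

    chi2 = float(((observed - expected) ** 2 / expected).sum())
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return percents, chi2, dof


# Toy data, made up for illustration (not the survey data).
toy = pd.DataFrame({
    "answer": [1, 1, 1, 2, 1, 2, 2, 2],
    "image": [1, 1, 1, 1, 2, 2, 2, 2],
})
percents, chi2, dof = crosstab_with_chisq(toy, "answer", "image")
```

A p-value would then come from the upper tail of the chi-squared distribution with `dof` degrees of freedom, for example via `scipy.stats.chi2.sf(chi2, dof)` if SciPy is available.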

### UAS weighted vs. FB unweighted (all)

In [9]:
# FB items and the matching UAS variables, paired by position
fb_uas_cols = ['Q1', 'Q2', 'Q5', 'Q6', 'Q8']
uas_cols = ['cr027a', 'cr027c', 'cr030', 'cr018a', 'ei002']

In [10]:
fb_freq_tables = {}
for col in fb_uas_cols:
    fb_freq_tables[col] = util.freq_percent_table(col, fb)

In [11]:
uas_freq_tables = {}
for col in uas_cols:
    uas_freq_tables[col] = util.weighted_freq_percent_table(col, 'final_weight', uas)
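
As an illustration of what a weighted frequency table involves, here is a minimal sketch, assuming the helper sums the survey weights within each response category and divides by the total weight of respondents who answered the item. The actual `util.weighted_freq_percent_table` is not shown, so the function name `weighted_percent_table` and the toy data below are hypothetical.

```python
import pandas as pd


def weighted_percent_table(df, col, weight):
    """Weighted percentage of each category of `col`, plus the weighted n.

    Hypothetical stand-in for util.weighted_freq_percent_table (an
    assumption). Rows with a missing value in `col` are dropped by
    groupby, so their weights do not count toward the total.
    """
    weighted_counts = df.groupby(col)[weight].sum()
    total = weighted_counts.sum()
    percents = (weighted_counts / total * 100).round(1)
    total_row = pd.Series({"Total n": round(total, 2)})
    return pd.concat([percents, total_row]).to_frame(name=weight)


# Toy data, made up for illustration (not the UAS data).
toy = pd.DataFrame({
    "answer": [1, 1, 2, None],
    "w": [1.0, 2.0, 1.0, 5.0],
})
out = weighted_percent_table(toy, "answer", "w")
```

Under this assumption the "Total n" row is a sum of weights rather than a count of respondents, which would explain why it need not be a whole number.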

In [12]:
comparison_tables = {}
for fb_uas_col, uas_col in zip(fb_uas_cols, uas_cols):
    temp_df = pd.concat([uas_freq_tables[uas_col], fb_freq_tables[fb_uas_col]], axis=1, ignore_index=True)
    temp_df.columns = ['UAS Sample (%)', 'Facebook Ad Image Sample (%)']
    comparison_tables[fb_uas_col] = temp_df
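
One detail worth keeping in mind for the cell above: `pd.concat(..., axis=1)` aligns rows on index labels, so the comparison is only meaningful when both surveys use the same response codes. The sketch below uses made-up values to show that a code present in only one table appears as `NaN` in the other column.

```python
import pandas as pd

# Made-up percentages for illustration (not survey results).
uas_part = pd.Series({1.0: 60.0, 2.0: 40.0}, name="uas")
fb_part = pd.Series({1.0: 55.0, 2.0: 35.0, 3.0: 10.0}, name="fb")

# Rows are matched by index label; ignore_index=True only resets the
# column labels (to 0 and 1), which are then renamed.
side_by_side = pd.concat([uas_part, fb_part], axis=1, ignore_index=True)
side_by_side.columns = ["UAS Sample (%)", "Facebook Ad Image Sample (%)"]
```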

In [13]:
for fb_uas_col in fb_uas_cols:
    display(Markdown(f"#### Comparing responses from {fb_uas_col}"))
    display(comparison_tables[fb_uas_col])

#### Comparing responses from Q1

| cr027a | UAS Sample (%) | Facebook Ad Image Sample (%) |
| --- | --- | --- |
| 1.0 | 59.6 | 42.6 |
| 2.0 | 29.2 | 30.0 |
| 3.0 | 6.2 | 10.4 |
| 4.0 | 4.9 | 17.0 |
| Total n | 6287.57 | 1428.0 |


#### Comparing responses from Q2

| cr027c | UAS Sample (%) | Facebook Ad Image Sample (%) |
| --- | --- | --- |
| 1.0 | 68.5 | 48.2 |
| 2.0 | 22.8 | 30.4 |
| 3.0 | 4.9 | 9.4 |
| 4.0 | 3.9 | 12.0 |
| Total n | 6288.34 | 1427.0 |


#### Comparing responses from Q5

| cr030 | UAS Sample (%) | Facebook Ad Image Sample (%) |
| --- | --- | --- |
| 1.0 | 16.8 | 36.3 |
| 2.0 | 9.0 | 8.9 |
| 3.0 | 19.3 | 12.6 |
| 4.0 | 43.1 | 33.6 |
| 5.0 | 11.8 | 8.6 |
| Total n | 6309.19 | 1420.0 |


#### Comparing responses from Q6

| cr018a | UAS Sample (%) | Facebook Ad Image Sample (%) |
| --- | --- | --- |
| 1.0 | 45.6 | 55.8 |
| 2.0 | 39.8 | 29.2 |
| 3.0 | 14.5 | 15.0 |
| Total n | 6340.35 | 1419.0 |


#### Comparing responses from Q8

| ei002 | UAS Sample (%) | Facebook Ad Image Sample (%) |
| --- | --- | --- |
| 1.0 | 7.2 | 10.2 |
| 2.0 | 90.0 | 86.4 |
| 3.0 | 2.8 | 3.4 |
| Total n | 6277.19 | 1404.0 |
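
To read the gaps between the two samples more directly, one option is a percentage-point difference column. The sketch below is an illustration only: it re-enters the Q1 percentages shown above by hand, and the column name `FB minus UAS (pp)` is hypothetical.

```python
import pandas as pd

# Q1 percentages copied from the comparison table above (Total n omitted).
q1 = pd.DataFrame(
    {
        "UAS Sample (%)": [59.6, 29.2, 6.2, 4.9],
        "Facebook Ad Image Sample (%)": [42.6, 30.0, 10.4, 17.0],
    },
    index=[1.0, 2.0, 3.0, 4.0],
)

# Positive values mean the category is more common in the FB sample.
q1["FB minus UAS (pp)"] = (
    q1["Facebook Ad Image Sample (%)"] - q1["UAS Sample (%)"]
).round(1)
```

In the notebook itself the same subtraction could be applied to each entry of `comparison_tables` after dropping the "Total n" row, since that row holds sample sizes rather than percentages.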


### FB pooled (all other images) vs. FB control (neutral image)

In [14]:
fb['Image'].value_counts().sort_index()

1    466
2    747
3     76
4    183
5    301
Name: Image, dtype: int64

In [15]:
# Image == 1 is the neutral (control) image; all other images are pooled
fb_control = fb.loc[fb['Image'] == 1, :]
fb_others = fb.loc[fb['Image'] > 1, :]
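
A quick way to confirm that a split like this covers every row is to check that the two groups plus any rows with a missing `Image` add up to the full frame. The sketch below uses made-up data; a row with a missing `Image` satisfies neither `== 1` nor `> 1`, so it lands in neither group.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration (not the survey data).
toy = pd.DataFrame({"Image": [1, 2, 5, np.nan, 1]})

control = toy.loc[toy["Image"] == 1]
others = toy.loc[toy["Image"] > 1]
n_missing = int(toy["Image"].isna().sum())

# Every row is in exactly one of: control, others, or missing.
assert len(control) + len(others) + n_missing == len(toy)
```

In the data above the `Image` counts (466 + 747 + 76 + 183 + 301) sum to 1,773, the number of rows in `fb`, so no respondent is lost in the split.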

In [16]:
fb_control_freq_tables = {}
for col in fb_uas_cols:
    fb_control_freq_tables[col] = util.freq_percent_table(col, fb_control)

In [17]:
fb_others_freq_tables = {}
for col in fb_uas_cols:
    fb_others_freq_tables[col] = util.freq_percent_table(col, fb_others)

In [18]:
fb_comparison_tables = {}
for fb_uas_col in fb_uas_cols:
    temp_df = pd.concat([fb_others_freq_tables[fb_uas_col], fb_control_freq_tables[fb_uas_col]], axis=1, ignore_index=True)
    temp_df.columns = ['All Other Images (%)', 'Neutral Image (%)']
    fb_comparison_tables[fb_uas_col] = temp_df

In [19]:
for fb_uas_col in fb_uas_cols:
    display(Markdown(f"#### Comparing responses from {fb_uas_col}"))
    display(fb_comparison_tables[fb_uas_col])

#### Comparing responses from Q1

|  | All Other Images (%) | Neutral Image (%) |
| --- | --- | --- |
| 1.0 | 37.9 | 56.6 |
| 2.0 | 31.7 | 24.9 |
| 3.0 | 10.9 | 8.7 |
| 4.0 | 19.5 | 9.8 |
| Total n | 1062.0 | 366.0 |


#### Comparing responses from Q2

|  | All Other Images (%) | Neutral Image (%) |
| --- | --- | --- |
| 1.0 | 45.1 | 57.1 |
| 2.0 | 31.4 | 27.6 |
| 3.0 | 10.2 | 7.1 |
| 4.0 | 13.3 | 8.2 |
| Total n | 1061.0 | 366.0 |


#### Comparing responses from Q5

|  | All Other Images (%) | Neutral Image (%) |
| --- | --- | --- |
| 1.0 | 35.6 | 38.2 |
| 2.0 | 8.1 | 11.4 |
| 3.0 | 12.1 | 14.1 |
| 4.0 | 34.8 | 29.9 |
| 5.0 | 9.3 | 6.4 |
| Total n | 1059.0 | 361.0 |


#### Comparing responses from Q6

|  | All Other Images (%) | Neutral Image (%) |
| --- | --- | --- |
| 1.0 | 55.5 | 56.8 |
| 2.0 | 30.0 | 26.9 |
| 3.0 | 14.6 | 16.3 |
| Total n | 1058.0 | 361.0 |


#### Comparing responses from Q8

|  | All Other Images (%) | Neutral Image (%) |
| --- | --- | --- |
| 1.0 | 10.8 | 8.5 |
| 2.0 | 85.8 | 88.1 |
| 3.0 | 3.4 | 3.4 |
| Total n | 1050.0 | 354.0 |


### UAS weighted demographics

In [20]:
uas_demographics = ['gender', 'maritalstatus', 'age_cat', 'education']

In [21]:
uas_demo_tables = {}
for col in uas_demographics:
    uas_demo_tables[col] = util.weighted_freq_percent_table(col, 'final_weight', uas)

In [22]:
for col in uas_demographics:
    display(Markdown(f"#### Weighted {col} from UAS"))
    display(uas_demo_tables[col])

#### Weighted gender from UAS

| gender | final_weight |
| --- | --- |
| 0 Female | 51.7 |
| 1 Male | 48.3 |
| Total n | 6407.0 |


#### Weighted maritalstatus from UAS

| maritalstatus | final_weight |
| --- | --- |
| 1 Married (spouse lives with me) | 54.7 |
| 2 Married (spouse lives elsewhere) | 1.1 |
| 3 Separated | 2.0 |
| 4 Divorced | 12.8 |
| 5 Widowed | 4.4 |
| 6 Never married | 24.9 |
| Total n | 6401.92 |


#### Weighted age_cat from UAS

| age_cat | final_weight |
| --- | --- |
| 1.0 | 6.7 |
| 2.0 | 19.6 |
| 3.0 | 21.4 |
| 4.0 | 16.0 |
| 5.0 | 18.7 |
| 6.0 | 17.7 |
| Total n | 6404.06 |


#### Weighted education from UAS

education,final_weight
1 Less than 1st grade,0.1
10 Some college-no degree,16.7
11 Assoc. college degree-occ/voc prog,5.7
12 Assoc. college degree-academic prog,5.3
13 Bachelor's degree,19.2
14 Master's degree,11.0
15 Professional school degree,1.9
16 Doctorate degree,2.2
2 Up to 4th grade,0.1
3 5th or 6th grade,0.3
