In [1]:
library(tidyr)
library(car)
library(emmeans)

# increase the number of characters that can be printed
options("width"=200)

Loading required package: carData



In [2]:
df <- read.csv("data/data.csv")

# only consider the cases with one face
df <- subset(df, has_faces == 1)

# Number of faces in the person queries

**RQ:** Are there more male than female faces in the results for the "person" and "intelligent person" queries?

**Operationalization:** If you search for "person" (or "intelligent person") and the picture has a face, is it more likely to be male or female? The expectation is a 50/50 split.
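As a minimal sketch of what a 50/50 expectation means in practice, an exact binomial test compares an observed share against 0.5. This is a simpler check than the model fitted in this notebook, and the counts below are made up for illustration only.

```r
# Hypothetical counts, NOT taken from data/data.csv
n_male   <- 60
n_female <- 40

# exact binomial test of the male share against the 50/50 expectation
bt <- binom.test(n_male, n_male + n_female, p = 0.5)

print(bt$estimate)  # observed share of male faces
print(bt$p.value)   # two-sided p-value against p = 0.5
```

Unlike this one-sample check, the model used in the notebook also accounts for the query, the search engine, and the control variables.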


In [3]:
# keep only the "person" query type, i.e. the "person" and "intelligent person" queries
dfp <- subset(df, query_type == "person")

## 1. Anova table

In [4]:
# better labels for the Anova output
dfp$Female <- dfp$has_faces_female
dfp$Male <- dfp$has_faces_male

# convert to long format for Anova test
dfp_l <- gather(dfp, faces, measurement, Female:Male)

# fit the model
fit <- glm(measurement ~ faces*query*engine+region+browser+wave, data=dfp_l, family = binomial())

# display general anova table (factors and interactions only)
print(Anova(fit, type="III"))

# full summary of the fit (including contrasts against the intercept)
# summary(fit)

Analysis of Deviance Table (Type III tests)

Response: measurement
                   LR Chisq Df Pr(>Chisq)    
faces                 2.615  1     0.1058    
query                23.493  1  1.254e-06 ***
engine               37.166  3  4.243e-08 ***
region                0.000  2     1.0000    
browser               0.000  1     1.0000    
wave                  0.000  1     1.0000    
faces:query          46.987  1  7.145e-12 ***
faces:engine         74.360  3  4.969e-16 ***
query:engine         80.105  3  < 2.2e-16 ***
faces:query:engine  160.230  3  < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
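
The long-format conversion done before fitting the model may be easier to follow on a toy example. The sketch below uses `pivot_longer()`, which tidyr documents as the successor to `gather()`; the data frame `toy` and its values are made up and are not the study data.

```r
library(tidyr)

# toy data frame with made-up values (not the study data)
toy <- data.frame(
  id     = 1:3,
  Female = c(1, 0, 1),
  Male   = c(0, 1, 1)
)

# one row per (picture, face type); this yields the same rows as
# gather(toy, faces, measurement, Female:Male), possibly in a different order
toy_l <- pivot_longer(toy, cols = Female:Male,
                      names_to = "faces", values_to = "measurement")
print(toy_l)
```

Each picture now contributes two rows, one per face type, which is what lets `faces` enter the model as a factor.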


## 2. Contrast tests

### 2.1 Across search engines

In [5]:
# full table of contrasts by search engine. The table includes contrasts across
# queries, but we only report the ones within a query term (i.e. "person Female / person Male"
# and "intelligent person Female / intelligent person Male")
pairs_engine <- pairs(emmeans(fit, ~ query * faces | engine),type = "response")

# print the pairs
print(pairs_engine)

engine = Baidu:
 contrast                                            odds.ratio      SE  df null z.ratio p.value
 intelligent person Female / person Female               0.3980 0.07642 Inf    1  -4.798  <.0001
 intelligent person Female / intelligent person Male     0.7209 0.14613 Inf    1  -1.614  0.3704
 intelligent person Female / person Male                 1.8113 0.34778 Inf    1   3.094  0.0106
 person Female / intelligent person Male                 1.8113 0.34778 Inf    1   3.094  0.0106
 person Female / person Male                             4.5511 0.82221 Inf    1   8.388  <.0001
 intelligent person Male / person Male                   2.5126 0.48242 Inf    1   4.798  <.0001

engine = Bing:
 contrast                                            odds.ratio      SE  df null z.ratio p.value
 intelligent person Female / person Female               0.8009 0.07342 Inf    1  -2.422  0.0730
 intelligent person Female / intelligent person Male     0.1463 0.01441 Inf    1 -19.511  <.000

### 2.2 Across interactions

In [6]:
# full table of interaction contrasts between query and face type, by search engine
# (each contrast is the "intelligent person" Female / Male odds ratio divided by
# the "person" Female / Male odds ratio)
pairs_interaction <- pairs(emmeans(fit, ~ query * faces  | engine,type = "response"), interaction = "pairwise")

# print the pairs
print(pairs_interaction)

engine = Baidu:
 query_pairwise              faces_pairwise odds.ratio     SE  df null z.ratio p.value
 intelligent person / person Female / Male       0.158 0.0430 Inf    1  -6.786  <.0001

engine = Bing:
 query_pairwise              faces_pairwise odds.ratio     SE  df null z.ratio p.value
 intelligent person / person Female / Male       0.641 0.0831 Inf    1  -3.427  0.0006

engine = Google:
 query_pairwise              faces_pairwise odds.ratio     SE  df null z.ratio p.value
 intelligent person / person Female / Male       0.152 0.0250 Inf    1 -11.473  <.0001

engine = Yandex:
 query_pairwise              faces_pairwise odds.ratio     SE  df null z.ratio p.value
 intelligent person / person Female / Male       2.253 0.3675 Inf    1   4.981  <.0001

Results are averaged over the levels of: region, browser, wave 
Tests are performed on the log odds ratio scale 
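
The interaction contrast can be read as a ratio of two odds ratios from section 2.1. As a small check, the sketch below recomputes the Baidu value from the rounded numbers printed there; the variable names are illustrative.

```r
# odds ratios for Baidu, copied from the table in section 2.1
or_intelligent <- 0.7209  # intelligent person Female / intelligent person Male
or_person      <- 4.5511  # person Female / person Male

# interaction contrast as a ratio of odds ratios
ratio <- or_intelligent / or_person
print(round(ratio, 3))  # 0.158, matching the Baidu row above
```

A ratio below 1 means the odds of a female face relative to a male face are lower for "intelligent person" than for "person" on that engine.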


## 3. Confidence intervals

### 3.1 Across search engines

In [7]:
# the confidence intervals here correspond to the contrasts in section 2.1
confint(pairs_engine)[,c(1,2,6,7)]

   contrast                                            engine asymp.LCL  asymp.UCL
   <fct>                                               <fct>  <dbl>      <dbl>
1  intelligent person Female / person Female           Baidu  0.24302855 0.65177628
2  intelligent person Female / intelligent person Male Baidu  0.42826218 1.21349181
3  intelligent person Female / person Male             Baidu  1.10604994 2.96630627
4  person Female / intelligent person Male             Baidu  1.10604994 2.96630627
5  person Female / person Male                         Baidu  2.8612088  7.2391125
6  intelligent person Male / person Male               Baidu  1.53426878 4.11474287
7  intelligent person Female / person Female           Bing   0.63282371 1.01356314
8  intelligent person Female / intelligent person Male Bing   0.1136266  0.18848394
9  intelligent person Female / person Male             Bing   0.14438643 0.23125676
10 person Female / intelligent person Male             Bing   0.14438643 0.23125676
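
These intervals appear consistent with a Tukey adjustment for a family of four estimates, which is the default adjustment for `pairs()` in emmeans. The sketch below reproduces the Baidu "person Female / person Male" interval from the rounded odds ratio and standard error in section 2.1, assuming the reported SE is a delta-method SE on the odds-ratio scale.

```r
# values for Baidu, "person Female / person Male", copied from section 2.1
or <- 4.5511
se <- 0.82221  # standard error on the odds-ratio scale

# delta method: the SE on the log scale is SE / OR
se_log <- se / or

# Tukey critical value for a family of 4 estimates (2 queries x 2 face types)
crit <- qtukey(0.95, nmeans = 4, df = Inf) / sqrt(2)

# build the interval on the log scale, then back-transform
ci <- exp(log(or) + c(-1, 1) * crit * se_log)
print(round(ci, 2))
```

The result agrees with row 5 of the table above up to rounding of the inputs.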


### 3.2 Across interactions

In [8]:
# the confidence intervals here correspond to the contrasts in section 2.2
confint(pairs_interaction)[,c(1,2,3,7,8)]

  query_pairwise              faces_pairwise engine asymp.LCL  asymp.UCL
  <fct>                       <fct>          <fct>  <dbl>      <dbl>
1 intelligent person / person Female / Male  Baidu  0.09303114 0.2697015
2 intelligent person / person Female / Male  Bing   0.49755452 0.8268494
3 intelligent person / person Female / Male  Google 0.11059626 0.2102788
4 intelligent person / person Female / Male  Yandex 1.63680014 3.1022071
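
Because each engine has a single interaction contrast, these intervals need no multiplicity adjustment, and they line up with the tests in section 2.2: an interval that excludes 1 corresponds to p < 0.05. The sketch below recovers the z ratio, p-value, and interval for Bing from the rounded odds ratio and standard error printed in section 2.2, again assuming a delta-method SE; small differences from the printed values come from rounding.

```r
# values for Bing, copied from the table in section 2.2
or <- 0.641
se <- 0.0831  # standard error on the odds-ratio scale

se_log <- se / or           # SE on the log odds ratio scale
z <- log(or) / se_log       # Wald z ratio
p <- 2 * pnorm(-abs(z))     # two-sided p-value

# unadjusted 95% interval, back-transformed from the log scale
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log)

print(round(c(z = z, p = p), 4))
print(round(ci, 3))
```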
