# Replication: Dion, Sumner, Mitchell (2018): Gendered Citation Patterns across Political Science and Social Science Methodology Fields

This notebook guides the reader through a replication of the quantitative analysis in Dion, Sumner and Mitchell's article **Gendered Citation Patterns across Political Science and Social Science Methodology Fields**. It allows an interactive exploration of the authors' analysis of citation patterns in social science journals.

In the first section, I describe the authors' main argument. I then walk through the analysis of the dataset used in the article and show the steps needed to reproduce Tables 1 to 3. I conclude with short comments on the argument and the analysis.

## Summary

In their article in **Political Analysis**, Dion, Sumner and Mitchell aim to explain gender-biased citation behaviour in the social sciences, building on prior research addressing the gender citation gap in different scientific disciplines. While other researchers examine differences in the citation behaviour of male and female researchers over entire academic careers, Dion et al. take a network approach: they analyse "who cites whom?" rather than merely asking "who is cited the most?"

Their aim is to model the probability that the authors of an article referenced in a social science journal are all female, as opposed to all male or of both sexes. The main explanatory factor is the sex of the author(s) citing the respective article (male vs. female vs. mixed sex). Other researchers have already shown that such an effect exists.

The distinctive contribution of Dion et al. is to analyse whether this effect differs in strength depending on how many female researchers are active within a field of social science. They hypothesise that the effect of the citing authors' sex on the cited authors' sex will be more pronounced in fields, operationalised as journals, that have a smaller share of active female researchers. They select five journals (APSR, Politics & Gender, Econometrica, Political Analysis, Sociological Methods & Research) to represent fields with a higher share of female researchers (Politics & Gender) and a higher share of male researchers (Political Analysis). By including methodological journals from economics (Econometrica), sociology (Sociological Methods & Research, SMR) and political science (Political Analysis), they also aim to show these patterns across social science disciplines, with economics being more male-dominated than political science and sociology.

They argue that differences between journals in how much more likely female researchers are to be cited by authors of the same sex indicate that having more active female researchers in a field also reduces the gender citation bias (operationalised as the probability of citing a female researcher as a function of the citing authors' sex).

In the following, I walk through the replication of the analysis by Dion et al.

## Replication

The replication code requires a range of R packages.


In [11]:
# load necessary packages
library(tidyverse)
library(MASS)
library(foreign)
library(IRdisplay)
library(optimx)
library(rms)
library(kableExtra)

# to hide a message when summarising data:
options(dplyr.summarise.inform = FALSE)

### Descriptive Analysis

Dion et al. collected data from the [Web of Science](https://www.webofscience.com/wos/woscc/basic-search). Each row of the data is one reference cited by an article published in one of the five journals between 2007 and 2016:

- **newartid**: Name of the published (citing) article
- **newjnlid**: Name of the journal in which **newartid** is published
- **authorteam**: Whether the authors of **newartid** are all-male, all-female or mixed ("Male", "Female", "Mixed")
- **refteam**: Whether the authors of the reference cited by **newartid** are all-male, all-female or mixed ("Male", "Female", "Mixed")
- **reffemonly**: Whether the authors of the reference cited by **newartid** are all-female (0, 1)
- **refauthcomplete**: Whether the sex of all authors of a reference could be determined (0, 1)


In [10]:
# Data can be found under
# https://www.cambridge.org/core/journals/political-analysis/article/gendered-citation-patterns-across-political-science-and-social-science-methodology-fields/5E8E92DB7454BCAE41A912F9E792CBA7#supplementary-materials-tab

df <- read.dta("Data/DSM2018PAreplication.dta")

df %>%
  group_by(newjnlid) %>%
  slice(1)

| newartid | newjnlid | authorteam | refteam | reffemonly | refauthcomplete |
|---|---|---|---|---|---|
| `<fct>` | `<fct>` | `<fct>` | `<fct>` | `<dbl>` | `<dbl>` |
| APSR Deliberation, Democracy, and the Rule of Reason in Aristotle's Politics | APSR | Male | Male | 0.0 | 1 |
| Politics & Gender The Roll Call Behavior of Men and Women in the U.S. House of Representatives, 1937-2008 | Politics & Gender | Mixed | | | 0 |
| Political Analysis Proportionally Difficult: Testing for Nonproportional Hazards in Cox Models | Political Analysis | Male | | | 0 |
| Econometrica Instrumental Variable Models for Discrete Outcomes | Econometrica | Male | Male | 0.0 | 1 |
| Soc. Methods & Res. A smoothing cohort model in age-period-cohort analysis with applications to homicide arrest rates and lung cancer mortality rates | Soc. Methods & Res. | Male | | | 1 |


#### Table 1

Table 1 in Dion et al. summarises the number of original articles and the sex of their authors by journal. The replication files include an additional article-level dataset that we can use for this.

In [14]:
# Data can be found under
# https://www.cambridge.org/core/journals/political-analysis/article/gendered-citation-patterns-across-political-science-and-social-science-methodology-fields/5E8E92DB7454BCAE41A912F9E792CBA7#supplementary-materials-tab

df_articles <- read.dta("Data/DSM2018PAreplication_articlesonly.dta")

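# Table 1

df_articles %>%
  group_by(newjnlid, authorteam) %>% # group data by journal and sex of the article's authors
  na.omit() %>% # drop articles with missing values
  summarise(N = n()) %>% # count articles per journal and author team
  mutate(Percent = round(100 * N / sum(N), 2)) %>% # calculate percentages within each journal
  kable("html", caption = "Table 1: Distribution of author genders by article, 2007–2016.") %>% # create nice table
  as.character() %>% # make table readable by markdown
  display_html() # show it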


| newjnlid | authorteam | N | Percent |
|---|---|---|---|
| APSR | Male only | 324 | 69.83 |
| APSR | Female only | 67 | 14.44 |
| APSR | Mixed | 73 | 15.73 |
| Politics & Gender | Male only | 27 | 7.94 |
| Politics & Gender | Female only | 266 | 78.24 |
| Politics & Gender | Mixed | 47 | 13.82 |
| Political Analysis | Male only | 220 | 74.58 |
| Political Analysis | Female only | 8 | 2.71 |
| Political Analysis | Mixed | 67 | 22.71 |
| Econometrica | Male only | 465 | 76.99 |


Comparing the output above with Table 1 in Dion et al., the numbers match, so the table replicates. Again, we see that the gender composition of author teams differs across journals (APSR vs. Politics & Gender vs. Political Analysis) and fields (Econometrica vs. Political Analysis vs. Sociological Methods & Research).

#### Table 2

Table 2 in Dion et al. (2018) summarises the distribution of sexes of referenced authors across the various journals. See below the code to replicate Table 2.

In [16]:
# Table 2

df %>%
  # keep only references whose authors' gender is completely known and not missing
  filter(refauthcomplete == 1 & !is.na(refteam)) %>%
  group_by(newjnlid, refteam) %>% # group data by journal and sex of referenced authors
  summarise(N = n()) %>% # count references per journal and sex of referenced authors
  mutate(Percent = round(100 * N / sum(N), 2)) %>% # calculate percentages within each journal
  kable("html", caption = "Table 2: Distribution of reference author genders, 2007–2016.") %>% # create nice table
  as.character() %>% # make table readable by markdown
  display_html() # show it

| newjnlid | refteam | N | Percent |
|---|---|---|---|
| APSR | Male | 11617 | 74.24 |
| APSR | Female | 2203 | 14.08 |
| APSR | Mixed | 1828 | 11.68 |
| Politics & Gender | Male | 1649 | 27.98 |
| Politics & Gender | Female | 3405 | 57.77 |
| Politics & Gender | Mixed | 840 | 14.25 |
| Political Analysis | Male | 4650 | 78.93 |
| Political Analysis | Female | 322 | 5.47 |
| Political Analysis | Mixed | 919 | 15.6 |
| Econometrica | Male | 9226 | 84.88 |


Table 2 shows how many references in each journal's articles cite work written by all-male, mixed or all-female author teams. Again, the numbers in Table 2 of Dion et al. replicate exactly. The distribution of cited authors' sex varies across journals: in Political Analysis, only 5.47% of references cite work by all-female teams, whereas in Politics & Gender the share is 57.77%.

### Replicating bivariate analysis

Dion et al. use logistic regression with clustered, robust standard errors to analyse the association between the sex of an article's authors and the sex of the authors they reference. They estimate six models to test their hypotheses. To examine the presence and strength of the "Matilda effect" described in their article, they expect effects of different sizes for journals with different distributions of authors' gender: the more balanced the authorship of a journal, the weaker the association between citing and cited authors' gender should be.
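
Written out, the per-journal specification implied by the rows of Table 3 (Intercept, Female, Mixed) can be sketched as follows. The notation is my own and not taken from the article: $i$ indexes references, $j$ indexes the citing article, and all-male citing teams serve as the reference category.

$$
\Pr(\text{reffemonly}_{ij} = 1) = \frac{1}{1 + \exp\left[-\left(\beta_0 + \beta_1\,\text{Female}_j + \beta_2\,\text{Mixed}_j\right)\right]}
$$

Under this reading, $\beta_1 > 0$ means that all-female author teams are more likely than all-male teams to cite work written only by women, and the hypothesis predicts a larger $\beta_1$ in journals with fewer female authors. The pooled model adds journal indicators (P&G, PA, Econ, SMR) to the linear predictor, with APSR as the omitted journal.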




I will now proceed to replicate Table 3 in the article, which consists of six separate logistic regressions with robust standard errors. Each model explains whether a cited work was authored by women only as a function of the gender composition of the citing article's author team (male vs. mixed vs. female).
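
The helper scripts sourced for this step are not reproduced in this notebook, so the following is only a minimal sketch of how a logistic regression with cluster-robust standard errors can be computed in base R. It is not necessarily what `logistic_per_journal` does. The data are simulated, clustering on the citing article is an assumption, and so is the G / (G - 1) small-sample correction.

```r
# Simulated stand-in for the replication data (NOT the authors' data):
# 200 hypothetical citing articles, each contributing 20 references.
set.seed(42)
n_articles <- 200
refs_per_article <- 20
articles <- data.frame(
  newartid   = seq_len(n_articles),
  authorteam = factor(
    sample(c("Male", "Female", "Mixed"), n_articles, replace = TRUE,
           prob = c(0.70, 0.15, 0.15)),
    levels = c("Male", "Female", "Mixed")  # "Male" is the reference category
  )
)
sim <- articles[rep(seq_len(n_articles), each = refs_per_article), ]

# Assumed "true" effects, chosen only for illustration
eta <- -2 + 1 * (sim$authorteam == "Female") + 0.2 * (sim$authorteam == "Mixed")
sim$reffemonly <- rbinom(nrow(sim), size = 1, prob = plogis(eta))

# Ordinary logistic regression
fit <- glm(reffemonly ~ authorteam, family = binomial(link = "logit"), data = sim)

# Cluster-robust (sandwich) covariance, clustering on the citing article
X      <- model.matrix(fit)
scores <- X * residuals(fit, type = "response")  # per-row score contributions x * (y - p)
S      <- rowsum(scores, group = sim$newartid)   # scores summed within each cluster
G      <- nrow(S)                                # number of clusters
bread  <- vcov(fit)                              # inverse information matrix
V_cl   <- (G / (G - 1)) * bread %*% crossprod(S) %*% bread

result <- cbind(estimate = coef(fit), cluster_se = sqrt(diag(V_cl)))
round(result, 2)
```

Because all references from one article share the same value of `authorteam`, ignoring the clustering would treat them as independent observations, which typically makes the standard errors too small.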

In [14]:
# load source code necessary for analysis
source("sources/logistic_function.R")
source("sources/execute_logistic_per_journal.R")


models <- do.call("cbind", lapply(unique(df$newjnlid), logistic_per_journal) )
pooled <- logistic_per_journal("Pooled")

In [15]:
cbind.data.frame(models, pooled) %>% rownames_to_column(" ") %>%
  kable("html", caption = "Table 3: Logistic Regression Estimates: Effect of gender of citing author on gender of cited authors (1=female)")  %>%
  as.character() %>%
  display_html()

| | APSR | Politics & Gender | Political Analysis | Econometrica | Soc. Methods & Res. | Pooled |
|---|---|---|---|---|---|---|
| Intercept | -2.07 (0.05) | -0.01 (0.11) | -2.84 (0.09) | -3.18 (0.06) | -2.46 (0.1) | -2.02 (0.05) |
| Female | 0.99 (0.16) | 0.53 (0.12) | 0.42 (0.38) | 1.14 (0.22) | 0.76 (0.28) | 0.86 (0.1) |
| Mixed | 0.21 (0.13) | -0.15 (0.16) | -0.08 (0.16) | 0.07 (0.14) | 0.06 (0.18) | 0.11 (0.08) |
| P&G | | | | | | 1.73 (0.1) |
| PA | | | | | | -0.89 (0.09) |
| Econ | | | | | | -1.14 (0.07) |
| SMR | | | | | | -0.47 (0.1) |
| Pseudo R2 | -0.026 | -0.0165 | -7e-04 | -0.0106 | -0.0078 | -0.2796 |
| NullLL | -6359 | -4007 | -1249 | -1951 | -1185 | -18566 |
| LL | -6198 | -3942 | -1248 | -1931 | -1175 | -14509 |
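
Two quick checks help with reading this output. First, the Pseudo R2 row is negative, which McFadden's measure (1 - LL / NullLL) cannot be when the fitted log-likelihood is at least as large as the null log-likelihood. The reported values appear consistent with the inverted ratio 1 - NullLL / LL. For the APSR and Pooled columns this reproduces -0.026 and -0.2796, and for the other columns it matches up to rounding of the log-likelihoods. Second, the logit coefficients can be converted into probabilities. The sketch below assumes that McFadden's measure is the intended statistic and uses only the rounded numbers printed in the table.

```r
# Log-likelihoods copied from the NullLL and LL rows of the table (rounded to integers)
journals <- c("APSR", "Politics & Gender", "Political Analysis",
              "Econometrica", "Soc. Methods & Res.", "Pooled")
null_ll <- setNames(c(-6359, -4007, -1249, -1951, -1185, -18566), journals)
ll      <- setNames(c(-6198, -3942, -1248, -1931, -1175, -14509), journals)

# McFadden's pseudo R-squared: 1 - LL / NullLL
mcfadden <- 1 - ll / null_ll

# The inverted ratio, computed only to compare against the Pseudo R2 row
inverted <- 1 - null_ll / ll

round(cbind(mcfadden, inverted), 4)

# Implied probabilities of citing all-female work in APSR,
# from the rounded Intercept and Female coefficients
p_male_team   <- plogis(-2.07)         # all-male citing team
p_female_team <- plogis(-2.07 + 0.99)  # all-female citing team
round(c(male = p_male_team, female = p_female_team), 3)
```

For APSR, the implied probability that a reference is to all-female work is roughly 11% for all-male author teams and roughly 25% for all-female author teams.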


## Comments

- Differentiation between journal and field