---
title: "Linked fate lit. review"
author: "Jae Yeon Kim"
output:
  html_document:
    number_sections: true
    toc: yes
  pdf_document:
    toc: yes
---
# Setup
```{r include=FALSE}
# Import packages
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
  tidyverse, # the tidyverse framework
  patchwork, # arranging ggplots
  desc, # descriptive stat analysis
  ggpubr, # arranging ggplots
  ggthemes, # fancy ggplot themes
  broom, # modeling
  ggfortify, # extended version of ggplot
  ggsci, # color palette
  Hmisc, # capitalization
  stargazer, # model reports
  conflicted, # resolving conflicts
  here # reproducibility
)
# Resolve conflicts
conflicted::conflict_prefer("select", "dplyr")
conflicted::conflict_prefer("filter", "dplyr")
conflicted::conflict_prefer("count", "dplyr")
conflicted::conflict_prefer("mutate", "dplyr")
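# Install the helper package only if it is not already available
if (!requireNamespace("makereproducible", quietly = TRUE)) {
  devtools::install_github("jaeyk/makereproducible",
                           dependencies = TRUE)
}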
library(makereproducible)
```
```{r}
# load data
lit_review <- read_csv(make_here("/home/jae/linked_fate_review/raw_data/linked_fate_review.csv"))
```
# Data Collection
I used two search queries: (1) “linked fate” + “dawson” and (2) “linked fate” + “group consciousness” + “solidarity.” For each query, I took the first 103 results sorted by relevance, removed duplicates, and excluded non-empirical research from the data.
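
The screening steps above can be sketched roughly as follows. This is only an illustration: the data frames, titles, and the `empirical` flag are hypothetical and are not part of the actual data or workflow.

```r
library(dplyr)

# Hypothetical search results; titles and column names are illustrative only.
query_1 <- data.frame(
  title = c("Article A", "Article B", "Article C"),
  empirical = c(TRUE, TRUE, FALSE)
)
query_2 <- data.frame(
  title = c("Article B", "Article D"),
  empirical = c(TRUE, TRUE)
)

screened <- bind_rows(query_1, query_2) %>%
  distinct(title, .keep_all = TRUE) %>% # drop articles returned by both queries
  filter(empirical)                     # keep empirical research only

nrow(screened)
```

In this toy case, "Article B" appears in both queries and "Article C" is non-empirical, so three articles remain.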
## The total N of articles
First, let's look at the total N of articles.
```{r echo=FALSE}
cat("The total number of articles is", nrow(lit_review))
```
Some articles are literature reviews, law review articles, or theoretical pieces that don't discuss data. Let's exclude them.
```{r echo=FALSE}
cat("The total number of empirical articles is",
    sum(!is.na(lit_review$Survey)))
```
```{r echo=FALSE}
# Multiply by 100 so the printed value is a percentage rather than a proportion
cat("The percentage of empirical articles in the data is",
    round(100 * mean(!is.na(lit_review$Survey)), 1))
```
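
As a quick check on the logic, here is a minimal sketch with a made-up `Survey` column, assuming (as in the chunks above) that `NA` marks a non-empirical article. The values are hypothetical.

```r
# Hypothetical Survey indicator: NA marks a non-empirical article.
survey <- c(1, NA, 0, 1, NA)

n_empirical <- sum(!is.na(survey)) # count of non-missing entries
share <- mean(!is.na(survey))      # proportion of non-missing entries

cat(sprintf("%d of %d articles (%.0f%%) are empirical\n",
            n_empirical, length(survey), 100 * share))
```

`mean()` on a logical vector returns a proportion between 0 and 1, so it needs to be multiplied by 100 before being reported as a percentage.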
```{r echo=FALSE}
cat("The total number of journals in the data is",
    length(unique(lit_review$Journal)))
```
## Overall patterns
The resulting sample (N = 89) is not an exhaustive list of published articles on linked fate. For instance, the search engine tends to miss recent publications because the results are sorted by relevance. This selection bias partly explains why the number of articles on linked fate appears to dip slightly in the last few years.
### Publication trend on linked fate
```{r echo=FALSE}
lit_review %>%
  group_by(Pub.year) %>%
  count() %>%
  ggplot(aes(x = Pub.year, y = cumsum(n))) +
  geom_point() +
  geom_line() +
  theme_pubr() +
  labs(title = "Publication trend on linked fate", x = "Publication year", y = "Cumulative count") +
  scale_x_continuous(breaks = c(2000, 2005, 2010, 2015)) +
  scale_y_continuous(breaks = scales::pretty_breaks())

ggsave(here("output/pub_trend.png"), width = 7)
ggsave(here("output/figure1.png"), width = 7)
```
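
The cumulative count on the y-axis is a running total of the yearly article counts. A minimal sketch with hypothetical publication years (not the actual data) shows the computation as an explicit column. Note that `count(Pub.year)` returns an ungrouped result, so `cumsum()` runs across all years rather than restarting within each one.

```r
library(dplyr)

# Hypothetical publication years, not the actual data.
toy <- data.frame(Pub.year = c(2001, 2001, 2003, 2004, 2004, 2004))

trend <- toy %>%
  count(Pub.year) %>%       # articles per year, ungrouped result
  mutate(cum_n = cumsum(n)) # running total across years

trend
```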
### Publication on linked fate by subject group
```{r echo=FALSE}
lit_review %>%
  gather(group, value, Black, Black_immigrants, Asian, Latinx, Whites, Others) %>%
  mutate(group = recode(group,
                        "Latinx" = "Latino")) %>%
  filter(value == 1) %>%
  group_by(Pub.year, group) %>%
  dplyr::summarize(n = n()) %>%
  mutate(prop = n / sum(n),
         prop = round(prop, 2)) %>%
  mutate(group = recode(group,
                        "Black_immigrants" = "Black immigrants",
                        "Asian" = "Asian Americans")) %>%
  filter(Pub.year >= 2002) %>%
  ggplot(aes(x = Pub.year, y = prop, fill = group)) +
  geom_col(position = "fill") +
  labs(title = "Publication trend on linked fate sorted by research subjects",
       x = "Publication year", y = "Percentage",
       fill = "Research subjects") +
  scale_x_continuous(breaks = c(2000, 2005, 2010, 2015)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  theme_pubr() +
  scale_fill_npg()

ggsave(here("output/pub_trend_sub_groups.png"), width = 8)
```
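
The reshaping behind this plot can be sketched on a small made-up table. The sketch uses `pivot_longer()`, the newer tidyr equivalent of `gather()` (this assumes tidyr 1.0.0 or later), and the toy values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical indicator columns (1 = the article studies that group).
toy <- data.frame(
  Pub.year = c(2010, 2010, 2011),
  Black    = c(1, 1, 1),
  Latinx   = c(1, 0, 0)
)

shares <- toy %>%
  pivot_longer(c(Black, Latinx), names_to = "group", values_to = "value") %>%
  filter(value == 1) %>%          # keep only groups an article actually covers
  count(Pub.year, group) %>%      # article-group pairs per year
  group_by(Pub.year) %>%
  mutate(prop = n / sum(n)) %>%   # share within each publication year
  ungroup()

shares
```

Because an article can cover several groups, the shares are computed over article-group pairs within a year, not over articles.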
```{r}
# Number of articles covering each subject group
# (`total` counts articles covering at least one group)
lit_review %>%
  mutate(total = Black + Black_immigrants + Asian + Latinx + Whites + Others) %>%
  gather(group, value, Black, Black_immigrants, Asian, Latinx, Whites, Others, total) %>%
  filter(value >= 1) %>%
  group_by(group) %>%
  count()

# Number of articles covering two or more subject groups
# (assuming 0/1 indicators, only `total` can reach 2)
lit_review %>%
  mutate(total = Black + Black_immigrants + Asian + Latinx + Whites + Others) %>%
  gather(group, value, Black, Black_immigrants, Asian, Latinx, Whites, Others, total) %>%
  filter(value >= 2) %>%
  group_by(group) %>%
  count()
```
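
The idea behind the `total` column is that summing 0/1 indicators across a row gives the number of subject groups an article covers. A minimal sketch with hypothetical articles and a reduced set of columns:

```r
library(dplyr)

# Hypothetical articles; 1 = the article studies that group.
toy <- data.frame(
  Author = c("A", "B", "C"),
  Black  = c(1, 1, 0),
  Asian  = c(0, 1, 0),
  Latinx = c(0, 1, 1)
)

toy <- toy %>%
  mutate(total = Black + Asian + Latinx) # groups covered per article

n_multi <- sum(toy$total >= 2) # articles covering two or more groups
n_multi
```

Only article "B" covers more than one group in this toy table, so `n_multi` is 1.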
## Journal patterns
The general social science category includes journals such as APSR, AJPS, JOP, Political Research Quarterly (PRQ), and American Politics Research (APR).
```{r echo=FALSE}
lit_review %>%
  group_by(Journal_type) %>%
  count() %>%
  ggplot(aes(x = reorder(Journal_type, n), y = n)) +
  geom_col() +
  coord_flip() +
  theme_pubr() +
  labs(title = "Articles by journal type",
       x = "Journal type", y = "Count")

# Use a project-relative path, as in the other chunks, instead of an absolute one
ggsave(here("output/pub_trend_journals.png"))
```
## Data patterns
Note that these counts are not mutually exclusive. For instance, some studies combine surveys with experiments or qualitative evidence (interviews). Because methods follow trends, I plot the share of each data type by publication year.
1. Survey: relying on survey data
2. Experiment: relying on experimental data
3. Qualitative: relying on qualitative evidence (e.g., interviews)
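
To see why the counts are not mutually exclusive, here is a minimal sketch with made-up indicator columns (the values are hypothetical, not taken from the data). An article that uses two data types is counted once under each.

```r
# Hypothetical articles; 1 = the article relies on that data type.
toy <- data.frame(
  Survey      = c(1, 1, 0),
  Experiment  = c(0, 1, 0),
  Qualitative = c(0, 0, 1)
)

type_counts <- colSums(toy) # articles per data type

sum(type_counts) # total across data types
nrow(toy)        # number of articles
```

The second article uses both a survey and an experiment, so the counts across data types add up to more than the number of articles.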
```{r echo=FALSE}
lit_review %>%
  gather(data_types, values, Survey, Experiment, Qualitative) %>%
  filter(values == 1) %>%
  group_by(Pub.year, data_types) %>%
  dplyr::summarize(n = n()) %>%
  mutate(prop = n / sum(n),
         prop = round(prop, 2)) %>%
  filter(Pub.year >= 2002) %>%
  ggplot(aes(x = Pub.year, y = prop, fill = data_types)) +
  geom_col(position = "fill") +
  labs(title = "Publication trend by data type",
       x = "Publication year", y = "Percentage",
       fill = "Data types") +
  scale_x_continuous(breaks = c(2000, 2005, 2010, 2015)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  theme_pubr() +
  scale_fill_npg()

ggsave(here("output/pub_trend_data_types.png"))
```