forked from OpenAPC/openapc-de
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
163 lines (106 loc) · 6.54 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
## About
The aim of this repository is:
- to share a copy of [Directory of Open Access Journals (DOAJ)](http://doaj.org/) journal master list (downloaded January 2014) -> [data/doajJournalList.csv](data/doajJournalList.csv)
- to release a [Bielefeld University](http://www.uni-bielefeld.de/) Paid APCs 2012/13 Dataset under an Open Database License -> [data/unibi12-13.csv](data/unibi12-13.csv)
- to demonstrate how reporting on fee-based Open Access publishing can be made more transparent and reproducible
This documentation is written in Markdown and embeds source code written in [R](http://www.r-project.org/).
You can run ```knit()``` to reproduce this file
```
require(knitr)
knit(README.Rmd)
```
## DOAJ
The current [DOAJ journal master list](http://doaj.org/faq#metadata) that can be downloaded as `csv` lacks information on the applicability of author processing charges (APC). Answering [Ulrich Herb's](https://twitter.com/scinoptica) concerns in [The prevalence of Open Access publication fees](http://www.scinoptica.com/pages/topics/the-prevalence-of-open-access-publication-fees.php), we decided to share our DOAJ copy that we used at [Bielefeld University Library](http://www.ub.uni-bielefeld.de/) for reporting Open Access Gold publications at Bielefeld University as part of the [DFG Open-Access Publishing Programme](http://www.dfg.de/en/research_funding/programmes/infrastructure/lis/funding_opportunities/open_access_publishing/index.html) in January 2014.
### Load DOAJ journal master list
```{r}
doaj <- read.csv("data/doajJournalList.csv", header = TRUE, sep =",")
```
This master list contains
```{r}
length(unique(doaj$ISSN))
```
registered Open Access journals. It provides information on
```{r}
colnames(doaj)
```
### A closer look on APCs
This table summarises whether a journal requires APCs or not:
```{r, results='asis', tab.cap ='DOAJ APCs'}
apc.table <- data.frame(table(unlist(doaj$Publication.fee)))
apc.table$Perc <- apc.table$Freq / sum(apc.table$Freq) * 100
colnames(apc.table) <-c("Fee", "Journals", "Share")
knitr::kable(apc.table[order(apc.table$Journals, decreasing = T),], row.names = FALSE, digits = 2)
```
You may also wish to explore the growth of DOAJ by APC applicability.
```{r simpleplot, echo=TRUE, fig.height=8/2.54, fig.width=18/2.54, warning = FALSE, message = FALSE}
#exclude journals with blank APC info
doaj <- doaj[!doaj$Publication.fee == "",]
doaj <- droplevels(doaj)
# transform to date object
doaj$Added.on.date <- as.Date(doaj$Added.on.date, format="%Y-%m-%d")
# relevel
doaj$Publication.fee <- factor (doaj$Publication.fee,
levels = c(rownames(data.frame(rev(sort(table(doaj$Publication.fee)))))))
#prepare df for ggplotting
tt <- data.frame(table(doaj$Publication.fee, doaj$Added.on.date))
tt$Cumsum<-ave(tt$Freq,tt$Var1,FUN=cumsum)
#plot
ggplot2::ggplot(tt, aes(as.Date(Var2),Cumsum, group = Var1)) +
geom_line(aes(colour = Var1, show_guide=FALSE)) +
xlab("Date added to DOAJ") + ylab("Cumulative Sum") +
scale_colour_brewer("Publication Fee",palette=2, type="qual") +
theme_bw(base_size= 16) +
opts(legend.key=theme_rect(fill="white",colour="white"))
```
This little exploration demonstrates that the DOAJ journal title master list is an excellent source (not only) to study the longitudinal growth of Open Access journal publishing. As we have shown, by reusing such a file, you can also write your study in a reproducible manner.
We hope that the DOAJ will continue its bulk download service in order to provide an convenient way to monitor Open Access journal publishing. At Bielefeld University Library, for instance, we use the `csv` download to answer our DFG reporting requirements.
## Bielefeld University Paid APCs 2012/13 Dataset
With the [Bielefeld University Paid APCs 2012/13 Dataset](data/unibi12-13.csv), we publish information on paid Article Processing Charges (APC) by the [Open Access Publication Fund of Bielefeld University](http://oa.uni-bielefeld.de/publikationsfonds.html). The fund is supported by the German Research Foundation (DFG) under its [Open-Access Publishing Programme](http://www.dfg.de/en/research_funding/programmes/infrastructure/lis/funding_opportunities/open_access_publishing/index.html).
### Load Data
```{r}
my.df <- read.csv("data/unibi12-13.csv", header = TRUE, sep =",")
head(my.df)
```
The dataset covers the accounting periods 2012 and 2013 (`Period`). For every single article, the `doi` is provided. Journal titles (`Journal`), publisher names (`Publisher`) and the International Standard Serial Numbers (`issn.1` and `issn.2`) are normalized according to CrossRef metadata store. In addition, `PUBID` references the copy stored in the institutional repository [PUB - Publications at Bielefeld University](http://pub.uni-bielefeld.de/en).
With this step, we follow Wellcome Trust example to share data on paid APCs. See:
Neylon, Cameron (2014): Wellcome Trust Article Processing Charges by Article 2012/13.
[https://github.com/cameronneylon/apcs](https://github.com/cameronneylon/apcs)
### Paid APCs
Column `EURO` contains the author processing charge for each paper including VAT.
In total, we paid
```{r}
sum(my.df$EURO)
```
for
```{r}
nrow(my.df)
```
articles.
DFG funding covered 75%.
```{r}
sum(my.df$EURO) * 0.75
```
#### APC per accounting period
Column `EURO` contains the author processing charge paid for each paper in Euro including VAT.
To obtain the yearly costs for 2012 and 2013:
```{r}
tapply(my.df$EURO, my.df$Period, sum)
```
#### APC per Publisher
To calculate APC paid per publisher and accounting period and print it as pretty markdown table:
```{r, results='asis', tab.cap ='Bielefeld University publication fund: APCs paid in EURO (incl VAT)'}
# relevel publishers according to sorting
my.df$Publisher <- factor (my.df$Publisher,
levels = c(rownames(data.frame(rev(sort(table(my.df$Publisher)))))))
# calculate and print
knitr::kable(tapply(my.df$EURO, list(my.df$Publisher, my.df$Period), sum))
```
Mean APCs paid:
```{r, results='asis', tab.cap ='Bielefeld University publication fund: Mean APCs paid in EURO (incl VAT)'}
knitr::kable(tapply(my.df$EURO, list(my.df$Publisher, my.df$Period), mean))
```
### Licence
The Bielefeld University Paid APCs 2012/13 Dataset is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/
### Contact
Najko Jahn < najko.jahn [ at ] uni-bielefeld.de >
Dirk Pieper < dirk.pieper [ at ] uni-bielefeld.de >