Commit
Commit paper, code, data, and package.
Carl Vogel committed May 7, 2015
1 parent 02caf24, commit 5eea7e8
Showing 49 changed files with 1,930 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
.Rproj.user | ||
.Rhistory | ||
.RData |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
# Academia Citation Advantage Analysis | ||
|
||
The `acadcites` package contains the data and functions used in Niyazov et al., "Open Access Meets Discoverability: Citations to Articles Posted to Academia.edu." | ||
|
||
## Installing the R Package | ||
The easiest way to install the package and its dependencies is with `install_local` from the [`devtools`](http://cran.r-project.org/web/packages/devtools/index.html) package. | ||
|
||
- Clone the repo: | ||
|
||
```sh | ||
git clone https://github.com/polynumeral/academia-citations | ||
cd academia-citations | ||
``` | ||
|
||
- From R: | ||
|
||
```{R} | ||
install.packages('devtools') | ||
devtools::install_local('acadcites_0.1.tar.gz') | ||
``` | ||
|
||
## Importing data | ||
The cleaned/combined dataset used for the analyses can be obtained by calling: | ||
|
||
```{R} | ||
library('acadcites') | ||
cites <- importData() | ||
``` | ||
|
||
or call `cites <- acadcites::importData()` directly, without attaching the package via `library`. | ||
|
||
## Reproducing tables from the article | ||
|
||
Tables from the article can be reproduced with the `makeTable` function. | ||
|
||
```{R} | ||
# Make Table 2 from the article (the first argument is the table's caption number). | ||
makeTable(2, cites) | ||
# |Journal | # Articles| % Total| | ||
# |:------------------------------------------------------|----------:|-------:| | ||
# |Analytical Chemistry | 1,537| 3.44%| | ||
# |PLoS One | 492| 1.10%| | ||
# |Anesthesia and Analgesia | 430| 0.96%| | ||
# |Biological and Pharmaceutical Bulletin | 362| 0.81%| | ||
# |Analytical Methods: advancing methods and applications | 339| 0.76%| | ||
# |Analytical Biochemistry | 317| 0.71%| | ||
# |Applied Mechanics and Materials | 303| 0.68%| | ||
# |Bioconjugate Chemistry | 299| 0.67%| | ||
# |Applied Physics Letters | 190| 0.43%| | ||
# |BioEssays | 183| 0.41%| | ||
``` | ||
|
||
|
||
## Reproducing figures from the article | ||
The `makeFigure` function reproduces figures from the article. Like `makeTable`, | ||
it takes a figure number and a citations data frame. | ||
|
||
```{r} | ||
makeFigure(1, cites) | ||
``` | ||
|
||
|
||
## Package help | ||
|
||
See `help(package='acadcites')` for more help files on individual functions, or | ||
`vignette('acadcites')` for information similar to what's provided here. |
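Before running `makeTable` or `makeFigure`, it can help to sanity-check the imported data frame. The following is a minimal sketch on a made-up data frame, not output from `importData()`; the column names (`source`, `year`, `impact_factor`, `citations`) are assumptions inferred from how the package code refers to them, and `toy_cites` is a hypothetical stand-in.

```r
# Toy stand-in for the data frame returned by importData(). The values are
# made up, and the column names are assumptions taken from the package code.
toy_cites <- data.frame(
  source        = c('on', 'on', 'off', 'off', 'off'),
  year          = c(2010, 2011, 2010, 2011, 2011),
  impact_factor = c(2.1, 3.4, 1.2, 2.8, 0.9),
  citations     = c(10, 4, 3, 1, 2)
)

# Check that the columns the analysis functions rely on are present.
required     <- c('source', 'year', 'impact_factor', 'citations')
missing_cols <- setdiff(required, names(toy_cites))

# Median citations by on-/off-Academia status.
med_cites <- tapply(toy_cites$citations, toy_cites$source, median)
```

With the real data, the same two checks could be run on `cites` in place of `toy_cites`.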
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
^.*\.Rproj$ | ||
^\.Rproj\.user$ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
.Rproj.user | ||
.Rhistory | ||
.RData | ||
inst/doc |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
Package: acadcites | ||
Title: Manage data and models to study effect of Academia.edu on citations. | ||
Version: 0.1 | ||
Authors@R: "Carl Vogel <carl@polynumeral.com> [aut, cre]" | ||
Description: Manage data and run models to study the effect of posting to | ||
Academia.edu on article citations. | ||
Depends: | ||
R (>= 3.1.1) | ||
License: MIT | ||
LazyData: true | ||
Imports: | ||
dplyr, | ||
magrittr, | ||
stringr, | ||
reshape2, | ||
ggplot2, | ||
MASS, | ||
pscl, | ||
knitr, | ||
scales, | ||
stargazer, | ||
memisc, | ||
pander | ||
VignetteBuilder: knitr |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
exportPattern("^[^\\.]") | ||
import(dplyr) | ||
importFrom(stringr, str_detect) | ||
importFrom(stringr, str_trim) | ||
importFrom(stringr, str_replace_all) | ||
importFrom(stringr, str_replace) | ||
importFrom(magrittr, use_series) | ||
importFrom(magrittr, set_colnames) | ||
importFrom(magrittr, set_names) | ||
importFrom(magrittr, extract) | ||
importFrom(ggplot2, ggplot) | ||
importFrom(ggplot2, aes) | ||
importFrom(ggplot2, geom_boxplot) | ||
importFrom(ggplot2, geom_histogram) | ||
importFrom(ggplot2, geom_point) | ||
importFrom(ggplot2, stat_quantile) | ||
importFrom(ggplot2, facet_wrap) | ||
importFrom(ggplot2, labs) | ||
importFrom(ggplot2, xlim) | ||
importFrom(ggplot2, scale_x_continuous) | ||
importFrom(ggplot2, scale_y_continuous) | ||
importFrom(ggplot2, scale_colour_manual) | ||
importFrom(ggplot2, position_jitter) | ||
importFrom(ggplot2, theme_bw) | ||
importFrom(memisc, mtable) | ||
importFrom(memisc, relabel) | ||
S3method(getModelSummary, lm) | ||
S3method(getModelSummary, glm) | ||
S3method(getModelSummary, zeroinfl) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
# Functions for comparing citations within Impact Factor buckets | ||
|
||
#' Group a variable into buckets based on its quantiles or those of another | ||
#' variable. | ||
#' | ||
#' @param x_quantile The variable to calculate quantile buckets from. | ||
#' @param x_bucket The variable to collect into the quantile buckets. | ||
#' @param nbuckets The number of quantile buckets to use. Specify this *or* | ||
#' `probs`, but not both. | ||
#' @param probs The vector of probabilities for the quantile bucket cut points. | ||
#' Specify this *or* `nbuckets`, but not both. | ||
#' @return A factor vector corresponding to `x_bucket` with bucket ranges. | ||
#' If an element of `x_bucket` is outside of the range of `x_quantile`, its | ||
#' bucket will be NA. | ||
#' | ||
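quantileBuckets <- function(x_quantile, x_bucket=x_quantile, nbuckets=10, probs=NULL) { | ||
    # `nbuckets` has a default value, so only count it as specified when the | ||
    # caller passed it explicitly. Otherwise supplying `probs` alone would | ||
    # always hit the error branch. | ||
    nbuckets_given <- !missing(nbuckets) && !is.null(nbuckets) | ||
    if (!is.null(probs) && nbuckets_given) { | ||
        stop('Only specify nbuckets or probs, not both.') | ||
    } | ||
    if (is.null(probs)) { | ||
        if (is.null(nbuckets)) stop('Specify one of nbuckets or probs.') | ||
        probs <- 0:nbuckets / nbuckets | ||
    } | ||
    # Cut points come from `x_quantile`; values of `x_bucket` outside them get NA. | ||
    cut(x_bucket, quantile(x_quantile, probs=probs), include.lowest=TRUE) | ||
} | ||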
|
||
|
||
#' Compare on- and off-Academia citations within years and quantile groups | ||
#' of journal impact factors. | ||
#' | ||
#' @param cites_df A dataframe with citations and impact factors. | ||
#' @param summarizer (default mean) A function to summarize citations within groups. | ||
#' @param comparator (default `/`) A function with arguments (on, off), that compares | ||
#' on and off-Academia citation summaries. The default computes the on/off ratio. | ||
#' @param ... Extra parameters to `quantileBuckets` | ||
#' @return A dataframe with statistic by year, impact factor group, and on/off-source. | ||
compareByImpactFactorBuckets <- function(cites_df, summarizer=mean, | ||
comparator=`/`, ...) { | ||
|
||
# Find buckets based on distribution of on-Academia citations. | ||
bucketFactors <- function(x) { | ||
on_factors <- cites_df %>% filter(source=='on') %>% use_series(impact_factor) | ||
quantileBuckets(on_factors, x, ...) | ||
} | ||
|
||
cites_df %>% | ||
mutate(if_bucket = bucketFactors(impact_factor)) %>% | ||
filter(!is.na(if_bucket)) %>% | ||
group_by(if_bucket, year, source) %>% | ||
summarize(cites=summarizer(citations)) %>% | ||
reshape2::dcast(year + if_bucket ~ source, value.var='cites') %>% | ||
mutate(comparison = comparator(on, off)) | ||
} | ||
|
||
#' Average results over buckets, weighting by the number of on-Academia | ||
#' articles in the bucket. | ||
#' | ||
#' @param cites_df A dataframe of citations with impact factors. | ||
#' @return A dataframe of weighted average results by year. | ||
#' | ||
summarizeOverBuckets <- function(cites_df) { | ||
cite_ratios <- compareByImpactFactorBuckets(cites_df, | ||
summarizer=mean, | ||
comparator=`/`) | ||
counts <- compareByImpactFactorBuckets(cites_df, | ||
summarizer=length, | ||
comparator=`+`) | ||
|
||
weights <- counts %>% | ||
group_by(year) %>% | ||
mutate(weight = on / sum(on)) %>% | ||
ungroup %>% | ||
select(year, if_bucket, weight) | ||
|
||
cite_ratios %>% left_join(., weights, by=c('year', 'if_bucket')) %>% | ||
group_by(year) %>% summarize(wtd_avg = sum(weight * comparison)) | ||
|
||
} | ||
|
||
|
||
#' Boxplots of on- and off-Academia citations within years and quantile groups | ||
#' of journal impact factors. | ||
#' | ||
#' @param cites_df A dataframe with citations and impact factors. | ||
#' @param ... Extra parameters to `quantileBuckets` | ||
#' | ||
#' @return A ggplot2 plot. | ||
plotByImpactFactorBuckets <- function(cites_df, ...) { | ||
|
||
# Find buckets based on distribution of on-Academia citations. | ||
bucketFactors <- function(x) { | ||
on_factors <- cites_df %>% filter(source=='on') %>% use_series(impact_factor) | ||
quantileBuckets(on_factors, x, ...) | ||
} | ||
|
||
# Add bucket variable to data | ||
df <- cites_df %>% | ||
mutate(if_bucket = bucketFactors(impact_factor)) %>% | ||
filter(!is.na(if_bucket)) | ||
|
||
p <- ggplot(df, aes(x=factor(year), y=citations, color=source)) + | ||
geom_boxplot() + | ||
facet_wrap(~if_bucket, ncol=2) + | ||
labs(x='Year', y='Citations (log scale)', | ||
title='Citations of On- and Off-Academia Articles By Year and Journal Impact Factor') + | ||
theme_bw() | ||
plotLogScale(p, xy='y') | ||
} |
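To make the bucketing and weighting steps concrete, here is a minimal base-R sketch on made-up numbers. It mirrors the `quantile` plus `cut` step in `quantileBuckets` and the weight arithmetic in `summarizeOverBuckets`; the toy vectors and variable names are illustrative assumptions, not data or results from the article.

```r
# Toy impact factors (made-up values, for illustration only).
on_if  <- c(1, 2, 3, 4, 5, 6, 7, 8)    # reference distribution: on-Academia articles
all_if <- c(0.5, 1, 2.5, 4.5, 8, 9)    # values to assign to buckets

# Quartile cut points from the reference distribution, which is what
# quantileBuckets computes when called with nbuckets = 4.
breaks  <- quantile(on_if, probs = 0:4 / 4)
buckets <- cut(all_if, breaks, include.lowest = TRUE)

# Values outside the range of on_if (here 0.5 and 9) get NA; the package
# drops such rows with filter(!is.na(if_bucket)).
bucket_index <- as.integer(buckets)

# Weighted average of per-bucket on/off ratios, weighting each bucket by
# its share of on-Academia articles (hypothetical ratios and counts).
ratios    <- c(1.5, 2.0)
on_counts <- c(30, 10)
weights   <- on_counts / sum(on_counts)
wtd_avg   <- sum(weights * ratios)
```

Because the cut points come only from the on-Academia distribution, off-Academia articles in journals with impact factors below or above that range fall out of the comparison entirely, which keeps the two groups on a common support.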
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
# Reproduce figures | ||
# | ||
# Figures: | ||
# -------- | ||
# 1. Histograms of citation counts for off- and on-Academia articles | ||
# 2. Citations boxplots by impact factor bucket and year of publication | ||
# 3. Scatterplot of cites against impact factor | ||
# 4. Scatterplot of cites against impact factor by off-/on-Academia and year. | ||
|
||
## Function names that produce figures, listed in order | ||
## of their appearance in the paper. | ||
.figures_functions = list( | ||
'plotCiteDistributions', | ||
'plotByImpactFactorBuckets', | ||
'plotCitesImpactFactorScatter', | ||
'plotImpactFactorMedReg') | ||
|
||
|
||
#' Function to generate figures from the paper. | ||
#' | ||
#' Recreate a figure with a given citations dataset by specifying the figure's | ||
#' caption number in the paper. | ||
#' | ||
#' @param n Figure caption number. | ||
#' @param cites_df A data frame with article citations and journal data, as produced by `importData`. | ||
#' @param ... Optional arguments passed to figure functions. | ||
#' | ||
#' @return Nothing. Renders a plot. | ||
#' | ||
makeFigure <- function(n, cites_df, ...) { | ||
get(.figures_functions[[n]], mode='function')(cites_df, ...) | ||
} | ||
|
||
|
||
plotCiteDistributions <- function(cites_df) { | ||
ggplot(cites_df, aes(x=citations)) + | ||
geom_histogram(binwidth=1, fill='steelblue', color='white') + | ||
xlim(0, 100) + | ||
facet_wrap(~source, scales='free_y') + | ||
theme_bw() | ||
} | ||
|
||
plotCitesImpactFactorScatter <- function(cites_df) { | ||
p <- ggplot(cites_df, aes(x=impact_factor, y=citations)) + | ||
geom_point(position=position_jitter(height=.1, width=.01), alpha=.3, size=.75) + | ||
ggplot2::geom_smooth(method='lm') +  # not imported in NAMESPACE, so call it with `::` | ||
theme_bw() + | ||
labs(x='Impact Factor (log scale)', y='Citations (log scale)') | ||
plotLogScale(p, c('x', 'y')) | ||
} | ||
|
||
plotImpactFactorMedReg <- function(cites_df) { | ||
p <- ggplot(cites_df, aes(x=impact_factor, y=citations, color=source)) + | ||
geom_point(position=position_jitter(height=.1, width=.01), alpha=.3, size=.75) + | ||
facet_wrap(~year, ncol=2) + | ||
stat_quantile(quantiles=0.5) + | ||
scale_colour_manual(values=c('orange', 'purple')) + | ||
labs(x='Impact Factor (log scale)', y='Citations (log scale)') + | ||
theme_bw() | ||
plotLogScale(p, c('x', 'y')) | ||
} |
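The dispatch in `makeFigure` maps a caption number to a function name and then calls that function. A minimal sketch of the same pattern, using `get()` for the name lookup and adding a range check, is shown here; `plotA`, `plotB`, `figure_functions`, and `makeFigureSketch` are hypothetical stand-ins, not functions from the package.

```r
# Hypothetical stand-ins for the package's plotting functions. They return
# a string instead of a plot so the dispatch is easy to inspect.
plotA <- function(df, ...) paste('A:', nrow(df))
plotB <- function(df, ...) paste('B:', nrow(df))

# Function names listed in the order the figures appear.
figure_functions <- list('plotA', 'plotB')

makeFigureSketch <- function(n, df, ...) {
  # Fail with a clear message instead of a subscript error.
  if (!n %in% seq_along(figure_functions)) {
    stop('No figure with caption number ', n)
  }
  # Look the function up by name, then call it with the data frame.
  get(figure_functions[[n]], mode = 'function')(df, ...)
}

result <- makeFigureSketch(2, data.frame(x = 1:3))

# An out-of-range caption number raises an error.
bad <- tryCatch(makeFigureSketch(5, data.frame(x = 1:3)),
                error = function(e) 'error')
```

Storing the functions themselves in the list, rather than their names, would avoid the lookup altogether; the name-based form is kept here because it matches the structure of `.figures_functions`.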