Stephen Curry Journal Impact Factor Publication

library(ggplot2)
theme_set(cowplot::theme_cowplot())
library(dplyr)

Stephen Curry Journal Impact Factor Publication

Looking at the data presented in the preprint on the distribution of Journal Impact Factors (JIF) (Larivière et al., 2015), I wondered what happens if we do the stupid thing and log-transform them.

Data

Following the guide provided in the Appendix of the preprint, I went into Web Of Science and produced citation reports for 7 journals, and saved them. Unfortunately, the preprint shows data for publications published in 2013-2014, but the guide showed 2012-2013, so I ended up downloading the citation data for publications published in 2012-2013, which would have influenced the 2014 JIF. However, I think the conclusions are the same.

I exported the data as text, and those files are in the data folder.

all_files = dir(here::here("data"), full.names = TRUE)
all_data = purrr::map(all_files, function(in_file){
  message(basename(in_file))
  tmp_lines = readLines(in_file, n = 10)
  start_line = which(grepl("Title", tmp_lines))[1]
  
  tmp = read.table(in_file, header = TRUE, sep = ",", skip = start_line - 1)
  tmp$Issue = as.character(tmp$Issue)
  tmp
})

all_df = purrr::imap_dfr(all_data, function(.x, .y){
  .x$Ending.Page = as.integer(.x$Ending.Page)
  .x$Article.Number = as.integer(.x$Article.Number)
  .x
})

Original Plot

ggplot(all_df, aes(x = X2014)) + 
  geom_histogram() +
  facet_wrap(~ Source.Title, ncol = 2, scales = "free") +
  labs(caption = "Citations to papers published in 2012-2013 by papers in 2014",
       x = "2014 Citations")

These plots look a lot like those presented in the preprint. To my eye, in addition to being heavily tailed, they also look like they could be log-normal. An easy way to check would be to log the counts and replot them (in this case log + 1 to account for 0).

Log Plot

summary_df = dplyr::group_by(all_df, Source.Title) %>%
  dplyr::summarise(mean_log = mean(log1p(X2014)),
                   median_log = median(log1p(X2014))) %>%
  tidyr::pivot_longer(!Source.Title, names_to = "summary")
ggplot(all_df, aes(x = log1p(X2014))) + 
  geom_histogram() +
  geom_vline(data = summary_df,
             aes(xintercept = value, color = summary)) +
  facet_wrap(~ Source.Title, ncol = 2, scales = "free") +
  labs(caption = "Citations to papers published in 2012-2013 by papers in 2014",
       x = "2014 Citations (logged)") +
  theme(legend.position = c(0.8, 0.15))

Here we plotted the histogram of the log + 1 of the counts, and the mean and median of those logs. The fact that the mean and median are almost the same implies these distributions are fine.

Looking at the summary values, we can still see which journals on average get more citations.

sorted_summary = dplyr::filter(summary_df, summary %in% "mean_log") %>%
  dplyr::arrange(dplyr::desc(value))
knitr::kable(sorted_summary, digits = 2)

Source.Title	summary	value
NATURE	mean_log	3.32
SCIENCE	mean_log	3.11
EMBO JOURNAL	mean_log	2.22
NATURE COMMUNICATIONS	mean_log	2.14
PLOS BIOLOGY	mean_log	2.06
ELIFE	mean_log	1.98
PLOS GENETICS	mean_log	1.89

Conclusion

The simplest solution to JIF if it continues to be used is to log-transform the citation counts in a given year (using log + 1).

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

The data files under data are licensed under the CC0 license.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README_files/figure-gfm		README_files/figure-gfm
data		data
.gitignore		.gitignore
README.Rmd		README.Rmd
README.html		README.html
README.md		README.md
curry_jif_paper.Rproj		curry_jif_paper.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README_files/figure-gfm

README_files/figure-gfm

data

data

.gitignore

.gitignore

README.Rmd

README.Rmd

README.html

README.html

README.md

README.md

curry_jif_paper.Rproj

curry_jif_paper.Rproj

Repository files navigation

Stephen Curry Journal Impact Factor Publication

Data

Original Plot

Log Plot

Conclusion

License

About

Releases

Packages

rmflight/curry_jif_paper

Folders and files

Latest commit

History

Repository files navigation

Stephen Curry Journal Impact Factor Publication

Data

Original Plot

Log Plot

Conclusion

License

About

Resources

Stars

Watchers

Forks