An R package with several million published p-values in tidy data sets.

All the p-values with tidypvals

The p-value is the most widely known statistic. P-values are reported in a large majority of scientific publications that measure and report data. R.A. Fisher is widely credited with inventing the p-value. If he were cited every time a p-value was reported, his paper would have at least 3 million citations*, making it the most highly cited paper of all time.

The tidypvals package organizes a large subset of these published p-values. They have been collected and synthesized from thousands of studies across multiple fields. The resulting data sets can be easily merged, combined, and analyzed.

install

This package will (hopefully) end up on Bioconductor soon, but for now you can install it from GitHub with the devtools package:

install.packages('devtools')
library(devtools)
devtools::install_github('jtleek/tidypvals')

description

The currently available p-value data sets in this package are:

Each data set is a "tidy" data frame and has the following columns:

  • pvalue - The reported p-value.
  • year - The year of the publication where the p-value appeared.
  • journal - The journal where the publication appeared.
  • field - The field of the paper, using the categorization in Head et al. 2015.
  • abstract - Whether the p-value appeared in the abstract of the paper.
  • operator - Whether the p-value was reported as "lessthan", "greaterthan", or "equals".
  • doi - The digital object identifier (DOI) of the paper, when available.
  • pmid - The PubMed ID of the paper, when available.
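
Because every data set shares these columns, the same base R code should work on any of them. The sketch below uses a small invented data frame (`toy`) in place of a real data set. Its rows are made up for illustration, and the column types (for example, `abstract` as a logical) are assumptions, not something the package guarantees.

```r
# Toy stand-in for a tidypvals data set. These rows are invented for
# illustration and are NOT real published p-values.
toy <- data.frame(
  pvalue   = c(0.03, 0.05, 0.20, 0.001),
  year     = c(2010, 2011, 2011, 2012),
  journal  = c("Journal A", "Journal A", "Journal B", "Journal B"),
  field    = c("Field X", "Field X", "Field Y", "Field Y"),
  abstract = c(TRUE, FALSE, TRUE, TRUE),   # assumed to be logical
  operator = c("equals", "lessthan", "equals", "lessthan"),
  doi      = rep(NA_character_, 4),        # not available in this toy example
  pmid     = c("1", "2", "3", "4"),
  stringsAsFactors = FALSE
)

# Keep only the p-values that were reported exactly (operator "equals").
exact <- toy[toy$operator == "equals", ]

# Count the reported p-values per publication year.
per_year <- table(toy$year)

# Fraction of exactly reported p-values that fall below 0.05.
frac_below <- mean(exact$pvalue < 0.05)
```

Filtering on `operator` first matters for summaries like the last one, since a value reported as "lessthan" 0.05 is a bound and not an exact p-value.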

use

Load the library and then access each data set by name.

library(tidypvals)
jager2014

Data sets can be easily merged, but be careful to avoid duplicated p-values across different data sets. You can see how each data set was obtained and tidied by viewing the corresponding vignette.

vignette("jager-2014",package="tidypvals")
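
As a rough illustration of the duplicate problem mentioned above, the sketch below stacks two invented data frames (`set_a` and `set_b`, both hypothetical and reduced to three columns) and drops rows that agree on `pmid`, `pvalue`, and `operator`. The choice of key is an assumption made for this example; it is not a rule taken from the package.

```r
# Two toy data sets that share one row (pmid "2", reported as < 0.05).
# All values are invented for illustration.
set_a <- data.frame(
  pvalue   = c(0.03, 0.05),
  operator = c("equals", "lessthan"),
  pmid     = c("1", "2"),
  stringsAsFactors = FALSE
)
set_b <- data.frame(
  pvalue   = c(0.05, 0.20),
  operator = c("lessthan", "equals"),
  pmid     = c("2", "3"),
  stringsAsFactors = FALSE
)

# Record where each row came from so provenance survives the merge.
set_a$source <- "set_a"
set_b$source <- "set_b"

# Stack the two data sets; this works because the columns match.
combined <- rbind(set_a, set_b)

# Treat rows as duplicates when paper, p-value, and operator all match
# (an assumed key), keeping the first occurrence.
key     <- combined[, c("pmid", "pvalue", "operator")]
deduped <- combined[!duplicated(key), ]
```

This key has limits. A paper that legitimately reports the same p-value twice would be collapsed to one row, and rows with a missing `pmid` would be matched against each other, so those are probably best handled separately (for example by falling back to `doi`).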