# Why Most Published Research Findings Are False

Based on the essay by John Ioannidis: [_Why Most Published Research Findings Are False_](https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124), PLoS Medicine, August 2005.


_R_ is the ratio of *true relationships* to *null relationships* (those with no real effect) among the relationships being tested. For example, if we are doing a genome association study with 100,000 markers and expect 10 of them to be associated with the condition, we would have:

In [None]:
R = 10 / (100000 - 10)

Thus, the a priori probability that a randomly selected tested relationship is true is _R_ / (_R_ + 1).

In [None]:
p = R / (R + 1)
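As a quick sanity check on the odds-to-probability conversion, _R_ / (_R_ + 1) should reduce to the plain fraction of markers that are truly associated. The sketch below assumes the same genome example; the names `n_markers` and `n_true` are introduced here for illustration only.

```python
# Hypothetical names for the genome example: 10 true markers out of 100,000.
n_markers = 100_000
n_true = 10

# Pre-study odds: true relationships per null relationship.
R = n_true / (n_markers - n_true)

# Converting odds to a probability should recover n_true / n_markers.
p = R / (R + 1)
print(p, n_true / n_markers)
```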

The probability of a Type I (false positive) error is &alpha;:

In [None]:
alpha = 0.05

The statistical power of the experiment, 1 - &beta;, is the probability of finding an effect if there really is one. Typical experiments aim for a statistical power of 0.8. The probability of a Type II (false negative) error is &beta;:

In [None]:
beta = 0.2

So, the probability that there is a true relationship and the experiment finds it is _p_(1 - &beta;):

In [None]:
pyy = (1 - beta) * p

The probability that there is no true relationship but the experiment nonetheless reports a positive result is (1 - _p_)&alpha;:

In [None]:
pny = (1 - p) * alpha

In [None]:
pny, pyy
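The two values above are two cells of a 2x2 table of truth versus experimental outcome. As a rough sketch, the remaining two cells (false negatives and true negatives) can be filled in the same way, and all four should sum to 1. The dictionary `cells` and its labels are illustrative names, not part of the essay.

```python
# Same inputs as the cells above, repeated so this block runs on its own.
R = 10 / (100000 - 10)
p = R / (R + 1)
alpha, beta = 0.05, 0.2

# Joint probability of each (truth, outcome) combination.
cells = {
    ("true relationship", "positive result"): p * (1 - beta),        # true positive
    ("true relationship", "negative result"): p * beta,              # false negative
    ("no relationship", "positive result"): (1 - p) * alpha,         # false positive
    ("no relationship", "negative result"): (1 - p) * (1 - alpha),   # true negative
}

for (truth, outcome), prob in cells.items():
    print(f"{truth:18s} {outcome:16s} {prob:.6f}")

# The four cells cover every possibility, so they should add up to 1.
total = sum(cells.values())
print("total:", total)
```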

The positive predictive value (PPV) of the experiment is the probability of a true positive divided by the total probability of a positive outcome:

In [None]:
PPV = pyy / (pny + pyy)

In [None]:
PPV

Yikes! The probability that a positive finding is false, 1 - PPV, is over 99%:

In [None]:
print(100 * (1 - PPV))
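To make that percentage more concrete, the same calculation can be sketched in expected counts rather than probabilities. Assuming the 100,000-marker example with 10 true associations, the study should detect about 8 of them while flagging roughly 5,000 null markers. The variable names below are illustrative.

```python
# Hypothetical names; the values match the genome example and the alpha/beta above.
n_markers = 100_000
n_true = 10
alpha = 0.05
beta = 0.2

# Expected number of real associations the study detects.
expected_true_positives = n_true * (1 - beta)
# Expected number of null markers that are flagged anyway.
expected_false_positives = (n_markers - n_true) * alpha

# PPV from counts: the share of flagged markers that are real.
ppv = expected_true_positives / (expected_true_positives + expected_false_positives)
print(expected_true_positives, expected_false_positives, ppv)
```

The count-based PPV should agree with the probability-based one, since dividing every count by `n_markers` gives back the probabilities.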

Let's try varying _R_ and hope things get better...

In [None]:
def compute_ppv(R, alpha, beta):
    p = R / (R + 1)
    pyy = (1 - beta) * p
    pny = (1 - p) * alpha
    return pyy / (pny + pyy)

In [None]:
compute_ppv(10 / (100000 - 10), 0.05, 0.20)
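For reference, substituting _p_ = _R_ / (_R_ + 1) into the function above, the common factor 1 / (_R_ + 1) cancels from the numerator and denominator. This should give the closed form that appears in the essay:

$$
\mathrm{PPV} = \frac{(1-\beta)\,p}{(1-\beta)\,p + \alpha\,(1-p)} = \frac{(1-\beta)R}{(1-\beta)R + \alpha} = \frac{(1-\beta)R}{R - \beta R + \alpha}
$$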

In [None]:
for R in [1/n for n in (10000, 1000, 100, 10, 2, 1)]:
    print("R = {0:1.8f}, PPV = {1:1.5f}".format(R, compute_ppv(R, alpha, beta)))
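The sweep above suggests PPV crosses 50% somewhere between _R_ = 1/100 and _R_ = 1/10. Setting the true-positive and false-positive probabilities equal gives a break-even point at (1 - &beta;)_R_ = &alpha;. This is a minimal sketch of that check; `R_break_even` is a name introduced here, and `compute_ppv` is repeated only so the block runs on its own.

```python
def compute_ppv(R, alpha, beta):
    # Same function as defined earlier in the notebook.
    p = R / (R + 1)
    pyy = (1 - beta) * p
    pny = (1 - p) * alpha
    return pyy / (pny + pyy)

alpha, beta = 0.05, 0.2

# A finding is more likely true than false when (1 - beta) * R > alpha,
# so the break-even pre-study odds are alpha / (1 - beta).
R_break_even = alpha / (1 - beta)
print("break-even R:", R_break_even)
print("PPV at break-even:", compute_ppv(R_break_even, alpha, beta))
```

With these values of &alpha; and &beta;, the break-even odds work out to about 1 true relationship for every 16 null ones.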