# $E$-values and Betting Scores

Core references: 

+ Shafer, G., 2021. Testing by betting: A strategy for statistical and scientific communication,
_Journal of the Royal Statistical Society Series A: Statistics in Society_, _184_, 407–431, https://doi.org/10.1111/rssa.12647

+ Vovk, V., and R. Wang, 2021. E-values: Calibration, combination and applications, _Annals of Statistics_, _49_, 1736–1754, https://doi.org/10.1214/20-AOS2020

+ Lecture notes by V. Vovk. https://www.isibang.ac.in/~statmath/pcm2020/talk1.pdf, https://www.isibang.ac.in/~statmath/pcm2020/talk2.pdf

+ Wang, R., and A. Ramdas, 2021. False discovery rate control with e-values, https://arxiv.org/pdf/2009.02824.pdf

$E$-values are a way of quantifying evidence about a statistical hypothesis. 
They are closely related to $P$-values, but more general in many ways, and possibly easier to understand.
In particular (paraphrasing Shafer, 2021):

+ An $E$-value is the observed value of a nonnegative random variable whose expected value under the null is 1: $\mathbb{E}_0 E = 1$. In contrast, a $P$-value is the observed value of a nonnegative random variable whose probability distribution under the null is dominated by the uniform distribution: $\mathbb{P}_0 \{P \le x\} \le x$, $\forall x \in [0, 1]$. It is generally much more straightforward to construct $E$-values than $P$-values.

+ $E$-values are like the returns on a bet. Most people know it's possible to win a bet with long odds by having good luck, or by identifying good bets where the payoff odds don't reflect the chance odds. Fewer people understand $P$-values. It's common to think that a small $P$-value means the alternative is true or that the probability that the null is true is small--two common misconceptions.

+ Any particular bet implies an alternative hypothesis. The betting score is the likelihood of the alternative divided by the likelihood of the null. Likelihood ratios have intuitive appeal.

+ Power calculations involve a fixed significance level, not a $P$-value, so there's no direct analog of power for $P$-values. In contrast, a bet also implies a target: a value for the betting score that might be expected if the alternative hypothesis is true. 

+ The validity of $P$-values generally requires pre-specifying the entire analysis, but betting scores can include deliberate attempts to improve one's "fortune" by using all currently available data to inform the next bet.  Betting scores thus may correspond better to how Science is conducted: a single hypothesis might be tested many times, and each experiment (including its design, what is measured, and the test used) might be informed by previous experiments.

+ Betting scores can often be combined by multiplication, which corresponds to "reinvesting" the winnings in future bets, and can always be combined using averages (with or without weights). In contrast, combining $P$-values is much more subtle.
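
The likelihood-ratio and combination points in the list above can be checked exactly in a toy case. The following is a minimal sketch, assuming a Bernoulli null with success probability `p0` and a Bernoulli alternative with success probability `p1`; both values and all function names are illustrative choices, not taken from the references. It enumerates every outcome of `n` independent draws and computes the null expectation of the product and of the average of the per-observation betting scores.

```python
from itertools import product

p0, p1 = 0.5, 0.7  # illustrative null and alternative success probabilities (assumptions)

def likelihood_ratio(x):
    """Betting score for one observation x in {0, 1}: alternative likelihood over null likelihood."""
    return p1 / p0 if x == 1 else (1 - p1) / (1 - p0)

def null_prob(xs):
    """Probability of the sequence xs under the null (independent Bernoulli(p0) draws)."""
    prob = 1.0
    for x in xs:
        prob *= p0 if x == 1 else 1 - p0
    return prob

def product_score(xs):
    """Combine per-observation scores by multiplication (reinvesting the winnings)."""
    score = 1.0
    for x in xs:
        score *= likelihood_ratio(x)
    return score

def average_score(xs):
    """Combine per-observation scores by an unweighted average."""
    return sum(likelihood_ratio(x) for x in xs) / len(xs)

n = 3  # number of independent observations (illustrative)
outcomes = list(product([0, 1], repeat=n))

# Exact null expectations, obtained by summing over all 2**n outcomes.
expected_product = sum(null_prob(xs) * product_score(xs) for xs in outcomes)
expected_average = sum(null_prob(xs) * average_score(xs) for xs in outcomes)
print(expected_product, expected_average)
```

Both expectations should equal 1 up to floating-point rounding. The product relies on the observations being independent under the null, while the average does not.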



## Warm-up: hypothesis tests as bets

See [hypothesis testing](./tests.ipynb).

Core idea: if you can make money betting against the null hypothesis (by making bets
that are expected to break even if the null hypothesis is true), that's evidence that the
null hypothesis is false.

In the typical setup for hypothesis testing, we observe data $X \sim \mathbb{P}$.
To test the null hypothesis $\mathbb{P} = \mathbb{P}_0$, we choose a function $\phi(\cdot)$ taking values in $[0, 1]$ with the property that $\mathbb{E}_{\mathbb{P}_0}\phi(X) = \alpha$, and we draw an auxiliary uniform random variable $U$ on $[0, 1]$,
independent of $X$, which is needed only for randomized tests.

We reject the hypothesis $\mathbb{P} = \mathbb{P}_0$ if $U \le \phi(X)$.

We can think of $\phi(X)$ as an "all-or-nothing" bet that pays $1/\alpha$ times the stake (which we will 
take to be \\$1) if $U \le \phi(X)$
and 0 otherwise. 
If the null is true, the expected value of the bet is \\$1.
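
To make the break-even property concrete, here is a minimal sketch, assuming a one-sided randomized test for a Binomial($n$, $p_0$) null. The choice of test, the values `n = 10`, `p0 = 0.5`, `alpha = 0.05`, and the helper names are all illustrative assumptions. Since $U$ is uniform and independent of $X$, the chance that the bet pays off given $X = x$ is $\phi(x)$, so the expected payoff under the null is $\mathbb{E}_{\mathbb{P}_0}\phi(X)/\alpha$.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial(n, p) probability of exactly k successes."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def randomized_test(n, p0, alpha):
    """Find (c, gamma) such that phi(x) = 1 if x > c, gamma if x == c, 0 otherwise
    has expectation exactly alpha under Binomial(n, p0)."""
    tail = 0.0  # null probability that X exceeds the current candidate c
    for c in range(n, -1, -1):
        pc = binom_pmf(c, n, p0)
        if tail + pc > alpha:
            return c, (alpha - tail) / pc
        tail += pc
    return -1, 0.0  # alpha >= 1: phi is 1 everywhere

def phi(x, c, gamma):
    """Rejection probability given X = x; the test rejects when U <= phi(x)."""
    if x > c:
        return 1.0
    if x == c:
        return gamma
    return 0.0

n, p0, alpha = 10, 0.5, 0.05  # illustrative values (assumptions)
c, gamma = randomized_test(n, p0, alpha)

# The bet pays 1/alpha per unit stake with probability phi(x), given X = x.
expected_payoff = sum(binom_pmf(x, n, p0) * phi(x, c, gamma) / alpha for x in range(n + 1))
print(c, gamma, expected_payoff)
```

The expected payoff should come out as 1 up to rounding, matching the break-even claim. The randomization weight `gamma` is what makes the level exactly $\alpha$ for a discrete statistic.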

In short, a significance-level $\alpha$ test is a bet that pays $1/\alpha$ times the stake if the test rejects and nothing otherwise.

Two scenarios:
+ bet once in a while, don't reinvest your winnings
+ bet whenever you want, reinvest your winnings

The first is like $P$-values and standard tests of significance: all-or-nothing bets, no "combining evidence" across experiments.
The second leads to betting scores and $E$-values.

Multiple testing: suppose a hypothesis is tested 20 times at significance level 5%, producing one
"significant" result. From a testing perspective, we have to adjust for multiplicity to understand
how strong the evidence is that the null is false, and that adjustment requires knowing the dependence among the experiments. From an $E$-value perspective, the betting score is 1: \\$20 was wagered, and \\$20 was won.
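
The arithmetic behind the 20-test example can be written out as a short sketch. It assumes, as in the text, a \\$1 stake on each test and a payoff of $1/\alpha$ times the stake on each rejection; the variable names are illustrative.

```python
alpha = 0.05   # significance level of each test
n_tests = 20   # number of times the hypothesis was tested
stake = 1.0    # dollars wagered on each test (assumption matching the text)

# One "significant" result out of 20; which test rejected does not matter here.
rejected = [True] + [False] * (n_tests - 1)

# Each all-or-nothing bet pays stake / alpha on rejection and nothing otherwise.
payoffs = [stake / alpha if r else 0.0 for r in rejected]

total_wagered = stake * n_tests
total_won = sum(payoffs)

# Overall betting score: money won per dollar wagered, i.e. the average of the
# per-test scores, which is a valid combination whatever the dependence.
betting_score = total_won / total_wagered
print(total_wagered, total_won, betting_score)
```

Because the score is an average of the individual bets, no information about the dependence among the experiments is needed to interpret it.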

**Definition.**
Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space,
and let $E$ be a random variable $E: \Omega \rightarrow [0, \infty]$ such that
$\mathbb{E}_\mathbb{P}(E) := \int_{\Omega} E \, d\mathbb{P} \le 1$. (Note that $E$ may take the value $\infty$, which
corresponds to the strongest possible evidence that the data do not come from $\mathbb{P}$.)
Then **$E$ is an e-variable for $\mathbb{P}$.**

Let $\mathcal{P}$ be a collection of probability distributions on the measurable space $(\Omega, \mathcal{A})$,
and let $E$ be a random variable $E: \Omega \rightarrow [0, \infty]$ such that for all $\mathbb{P} \in \mathcal{P}$,
$\mathbb{E}_\mathbb{P}(E) := \int_{\Omega} E \, d\mathbb{P} \le 1$.
Then **$E$ is an e-variable for $\mathcal{P}$.**

The set of all $E$-variables for a collection $\mathcal{P}$ of probability distributions is $\mathcal{E}(\mathcal{P})$.

The observed value of an $E$-variable is an $E$-value (or $e$-value).
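
On a finite sample space the definition can be checked directly. The following is a minimal sketch, assuming $\Omega = \{0, 1, 2\}$, a collection of two illustrative distributions, and finite values for the candidate variable (the definition also allows $\infty$, which this sketch does not handle). The function names are hypothetical.

```python
from fractions import Fraction as F

def expectation(values, probs):
    """Expected value of a random variable given as a list of values, one per outcome."""
    return sum(v * p for v, p in zip(values, probs))

def is_e_variable(values, collection):
    """True if the values are nonnegative and the expectation is at most 1
    under every distribution in the collection."""
    if any(v < 0 for v in values):
        return False
    return all(expectation(values, probs) <= 1 for probs in collection)

# Two illustrative distributions on Omega = {0, 1, 2} (assumptions).
collection = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
]

candidate = [F(0), F(1), F(2)]  # expectations 3/4 and 1
too_large = [F(0), F(2), F(2)]  # expectations 1 and 4/3

print(is_e_variable(candidate, collection))
print(is_e_variable(too_large, collection))
```

Exact rational arithmetic is used so that the boundary case, expectation exactly 1, is not affected by rounding. The second candidate fails because the constraint must hold for every member of the collection, not just one.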

**Definition.**
Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space,
and let $P$ be a random variable $P: \Omega \rightarrow [0, 1]$ such that
$\forall p \in [0, 1]$,
$\mathbb{P}(P \le p) \le p$.
Then **$P$ is a P-variable for $\mathbb{P}$.**

Let $\mathcal{P}$ be a collection of probability distributions on the measurable space $(\Omega, \mathcal{A})$,
and let $P$ be a random variable $P: \Omega \rightarrow [0, 1]$ such that for all $\mathbb{P} \in \mathcal{P}$,
$\forall p \in [0, 1]$,
$\mathbb{P}(P \le p) \le p$.
Then **$P$ is a P-variable for $\mathcal{P}$.**

The set of all $P$-variables for a collection $\mathcal{P}$ of probability distributions is $\mathcal{P}(\mathcal{P})$.


The observed value of a $P$-variable is a $P$-value.
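
The $P$-variable condition can also be checked directly on a finite sample space. The sketch below assumes a single distribution given as a list of outcome probabilities; the function name and the die example are illustrative. It checks $\mathbb{P}(P \le p) \le p$ only at the values $P$ attains, which suffices because for any other $p$ the left-hand side equals its value at the largest attained value not exceeding $p$.

```python
from fractions import Fraction as F

def is_p_variable(values, probs):
    """True if the values lie in [0, 1] and P(P <= p) <= p at every attained value p."""
    if any(v < 0 or v > 1 for v in values):
        return False
    for p in set(values):
        # Probability that the candidate variable is at most p.
        cdf = sum(q for v, q in zip(values, probs) if v <= p)
        if cdf > p:
            return False
    return True

# A fair six-sided die as the sample space (illustrative assumption).
probs = [F(1, 6)] * 6

valid = [F(k, 6) for k in range(1, 7)]     # P(P <= k/6) = k/6
invalid = [F(k, 12) for k in range(1, 7)]  # P(P <= 1/12) = 1/6 > 1/12

print(is_p_variable(valid, probs))
print(is_p_variable(invalid, probs))
```

To check a candidate against a collection $\mathcal{P}$ of distributions, the same function can be applied to each member of the collection in turn.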



**$P$ to $E$ calibration function**

A function $f : [0, 1] \rightarrow [0, \infty]$ is a ($p$-to-$e$) *calibrator* if, for any probability space
$(\Omega, \mathcal{A}, \mathbb{P})$ and any $P$-variable $P \in \mathcal{P}(\{\mathbb{P}\})$,
$f(P) \in \mathcal{E}(\{\mathbb{P}\})$; that is, $f(P)$ is an $E$-variable for $\mathbb{P}$.

A calibrator $f$ *dominates* a calibrator $g$ if $f \ge g$; $f$ *strictly dominates* $g$ if $f \ge g$ and $f \ne g$.
A calibrator is *admissible* if it is not strictly dominated by any other calibrator.

The following proposition (Vovk & Wang, 2021 Proposition 2.2)
characterizes the decreasing calibrators: they are exactly the decreasing
functions from $[0, 1]$ to $[0, \infty]$ whose integral is at most 1.

**Proposition.**
A decreasing function $f : [0, 1] \rightarrow [0, \infty]$ is a calibrator if
and only if $\int_0^1 f(p) \, dp \le 1$.
It is admissible if and only if it is upper semicontinuous,
$f(0) = \infty$, and $\int_0^1 f(p) \, dp = 1$.
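
As an illustration of the proposition, consider the family $f_\kappa(p) = \kappa p^{\kappa - 1}$ with $\kappa \in (0, 1)$, discussed in Vovk & Wang (2021). Each member is decreasing, has $f_\kappa(0) = \infty$, and satisfies $\int_0^1 \kappa p^{\kappa - 1} \, dp = 1$. The sketch below is a minimal numerical check under stated assumptions: the function name, the default `kappa = 0.5`, and the discrete $P$-variable uniform on $\{1/n, \ldots, 1\}$ are illustrative choices.

```python
import math

def power_calibrator(p, kappa=0.5):
    """f(p) = kappa * p**(kappa - 1) for 0 < kappa < 1, with f(0) = infinity."""
    if not 0 < kappa < 1:
        raise ValueError("kappa must lie strictly between 0 and 1")
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    if p == 0:
        return math.inf
    return kappa * p ** (kappa - 1)

# A discrete P-variable: uniform on {1/n, 2/n, ..., 1}, so P(P <= k/n) = k/n.
n = 100
p_values = [k / n for k in range(1, n + 1)]

# Expected value of the calibrated variable f(P). It is a right Riemann sum of a
# decreasing function, so it cannot exceed the integral of f, which is 1.
mean_e = sum(power_calibrator(p) for p in p_values) / n
print(mean_e)
```

The mean is strictly below 1 here because the discrete $P$-variable is conservative relative to a continuous uniform one. Small values of `kappa` reward very small $P$-values heavily at the cost of smaller $E$-values for moderate ones.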

**$E$ to $P$ calibration function**
