# $E$-values and Betting Scores

Core references: 

+ Grünwald, P., R. de Heide, and W. Koolen, 2023. Safe Testing. https://arxiv.org/abs/1906.07801

+ Shafer, G., 2021. Testing by betting: A strategy for statistical and scientific communication,
_Journal of the Royal Statistical Society Series A: Statistics in Society_, _184_, 407–431, https://doi.org/10.1111/rssa.12647

+ Vovk, V., and R. Wang, 2021. E-values: Calibration, combination and applications, _Ann. Statist. 49_ (3) 1736-1754. https://doi.org/10.1214/20-AOS2020

+ Lecture notes by V. Vovk. https://www.isibang.ac.in/~statmath/pcm2020/talk1.pdf, https://www.isibang.ac.in/~statmath/pcm2020/talk2.pdf

+ Wang, R., and A. Ramdas, 2021. False discovery rate control with e-values, https://arxiv.org/pdf/2009.02824.pdf

$E$-values are a way of quantifying evidence about a statistical hypothesis. 
They are closely related to $P$-values, but more general in many ways, and possibly easier to understand.
In particular (paraphrasing Shafer, 2021):

+ An $E$-value is the observed value of a nonnegative random variable whose expected value under the null is 1: $\mathbb{E}_0 E = 1$. In contrast, a $P$-value is the observed value of a nonnegative random variable whose probability distribution under the null is dominated by the uniform distribution: $\mathbb{P}_0 \{P \le x\} \le x$, $\forall x \in [0, 1]$. It is generally much more straightforward to construct $E$-values than $P$-values.

+ $E$-values are like the returns on a bet. Most people know it's possible to win a lot of money by "getting lucky"
and winning a bet with long odds, or by identifying bets where the payoff odds don't reflect the chance odds. Fewer people understand $P$-values. It's common to think that a small $P$-value means the alternative is true or that the probability that the null is true is small--two common misconceptions.

+ Any particular bet implies an alternative hypothesis. The betting score is the likelihood of the alternative divided by the likelihood of the null. Likelihood ratios have intuitive appeal.

+ Power calculations involve a fixed significance level, not a $P$-value, so there's no direct analog of power for $P$-values. In contrast, a bet also implies a target: a value for the betting score that might be expected if the alternative hypothesis is true. 

+ The validity of $P$-values generally requires pre-specifying the entire analysis, but betting scores can include arbitrarily complex strategies to "win" that use all currently available data to inform the next bet.  Betting scores thus may correspond better to how Science is conducted: a single hypothesis might be tested many times, and each experiment (including its design, what is measured, and the test used) might be informed by previous experiments--and the alternative may evolve from new information in other fields.

+ Betting scores can often be combined by multiplication, which corresponds to "reinvesting" the winnings in future bets, and can always be combined using averages (with or without weights). In contrast, combining $P$-values is much more subtle.



## Notation
In this chapter, we will use the notation $\mathbb{P}(\cdot) := \mathbb{E}_\mathbb{P} (\cdot)$ to denote expectation with respect to the distribution $\mathbb{P}$.
The expected value of the indicator function of a measurable set $A$ is the 
probability of $A$, so this notation generalizes $\mathbb{E}_\mathbb{P} (1_A) = \mathbb{P}(A)$ to the expectation of other functions.

Any distributions that appear in the same expression will be assumed to have a single dominating measure
$\mu$, so we can talk about densities with respect to $\mu$.
The density of $\mathbb{P}$ with respect to $\mu$ will be denoted $f_\mathbb{P}$, so that 
$d\mathbb{P}(\omega) = f_\mathbb{P}(\omega) d\mu(\omega)$.

Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space.
Let $\mathcal{I}$ be a totally ordered set with order relation $\le$.
Suppose that for all $i \in \mathcal{I}$, $\mathcal{F}_i$ is a sub-sigma-algebra of $\mathcal{A}$,
and that if $i < j$, $\mathcal{F}_i \subset \mathcal{F}_j$.
Then $\mathbb{F} := \{\mathcal{F}_i\}_{i \in \mathcal{I}}$ is a _filtration_ and $(\Omega, \mathcal{A}, \mathbb{F}, \mathbb{P})$ is
a _filtered probability space_.

Filtrations arise naturally in studying stochastic processes.
Let $\sigma(X)$ be the sigma-algebra generated by the random variable $X$ (the smallest sigma-algebra for which $X$ is measurable, i.e., the smallest sigma-algebra that contains the pre-image $X^{-1}(B)$
of every Borel set $B \in \mathcal{B}$), and let  $\sigma(X_j : j \le i) := \sigma(\cup_{j \le i} \sigma(X_j))$.
As the process evolves, a richer and richer set of events becomes measurable.
Let $(X_i)_{i \in \mathbb{N}}$ be a stochastic process on the probability space $(\Omega, \mathcal{A}, \mathbb{P})$,
and define $\mathcal{F}_i := \sigma(X_j : j \le i)$.
Then $\mathbb{F} := \{\mathcal{F}_i\}_{i \in \mathbb{N}}$ is a filtration.

## Warm-up 1: betting as evidence

A _predictor_ or _forecaster_ claims that $Y$ is a random variable with probability distribution $\mathbb{P}_0$.
The value $y$ of $Y$ will be revealed (to the predictor and you) later.
The predictor backs up the claim by offering to sell you any payoff $S(y)$ for the price $\mathbb{E}_0 S(Y) =: \mathbb{P}_0 S(Y)$,
the expected value of $S(Y)$ (before it is observed), computed
on the assumption that the predictor is right--that $Y$ is indeed a random variable and has distribution $\mathbb{P}_0$.
The payoff is required to be nonnegative, so that all you risk is what you bet: the expected value of $S$ on the assumption that $Y \sim \mathbb{P}_0$.

If you buy the bet $S$, your payoff is $S(y)$ if $Y=y$, and your _betting score_ is $S(y)/\mathbb{P}_0 S(Y)$, the amount by which you multiplied your initial stake.
Without loss of generality, we may assume that $\mathbb{P}_0 S(Y) = 1$ and allow you to buy any multiple of the bet $S$ you can afford; then your (eventual) betting score is $S(y)$. 
You don't have to bet your whole fortune, but if you withhold a fraction $\beta$ of your current fortune and 
bet the remaining fraction $1-\beta$ on $S$, that is equivalent to betting your whole fortune on $S' = \beta + (1-\beta)S$, which also has expected value 1 under the null since $\mathbb{P}_0 S = 1$.
That is, betting only a fraction of your current fortune is just another bet that is expected to break even
under the predictor's hypothesis, so without loss of generality, we can assume that you bet your entire fortune
on some $S$.
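
The equivalence between withholding a fraction $\beta$ and betting everything on $S' = \beta + (1-\beta)S$ can be checked on a toy example. This is only a sketch: the three-outcome null `p0`, the payoff `S`, and the value of `beta` are all hypothetical choices made for illustration.

```python
from fractions import Fraction

# Hypothetical null distribution for Y on three outcomes (an assumption).
p0 = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
# Hypothetical nonnegative payoff whose price under the null is 1.
S = {"a": Fraction(1, 2), "b": Fraction(1), "c": Fraction(5, 2)}

def null_expectation(payoff, null):
    """Expected payoff if Y really has the null distribution."""
    return sum(null[y] * payoff[y] for y in null)

beta = Fraction(1, 4)  # fraction of the fortune that is withheld
# Withholding beta and betting the rest on S is the single bet S'.
S_prime = {y: beta + (1 - beta) * S[y] for y in S}

price_S = null_expectation(S, p0)
price_S_prime = null_expectation(S_prime, p0)
print(price_S, price_S_prime)
```

Exact rational arithmetic is used so that the two prices can be compared without rounding error.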

It isn't necessary that *you* believe $Y$ is really a random variable: you can still
bet if you think the predictor's claim is wrong, that is, if you think you can make money betting on some
$S$ with $\mathbb{P}_0 S(Y) = 1$.

Now suppose there is a series of trials, $(Y_j)$, which might or might not be random; and if they are random,
they might or might not be independent.
The predictor is allowed to make a series of predictions, say $\mathbb{P}_{0j}$ for
$j = 1, \ldots$.
The predictor need not make a prediction for every trial, 
and the prediction for the $j$th trial, $\mathbb{P}_{0j}$, might depend on the outcome of previous trials, $\{Y_i \}_{i<j}$.
(This is much closer to how science is conducted than the assumption that trials are independent
and involve the same parameters.)
Shafer (2021) writes:

> The probabilistic predictions that can be associated with a scientific hypothesis usually go beyond a single comprehensive probability distribution. In some cases, a scientist may begin with a joint probability distribution P for a sequence of variables $Y_1 , \ldots, Y_N$  and formulate a plan for successive experiments that will allow her to observe them. But the scientific enterprise is usually more opportunistic. A scientist might perform an experiment that produces $Y_1$’s value $y_1$ and then decide whether it is worthwhile to perform the further experiment that would produce $Y_2$’s value $y_2$. Perhaps no one even thought about $Y_2$ at the outset. One scientist or team tests the hypothesis using $Y_1$, and then, perhaps because the result is promising but not conclusive, some other scientist or team comes up with the idea of further testing the hypothesis with a second variable $Y_2$ from a hitherto uncontemplated new experiment or database.


Suppose you start with \\$1, and you are allowed to bet on any or all of the predictions: before the $j$th 
trial the predictor offers the prediction $Y_j \sim \mathbb{P}_{0j}$, which can depend on previous trials.
You are allowed to buy any nonnegative $S_j(Y_j)$ for the price $\mathbb{P}_{0j} S_j(Y_j)$, which is assumed to
be \\$1. 
Your fortune after the first bet is settled is $S_1(y_1)$. 
The predictor now offers to sell you any $S_2(Y_2)$ for its expected value under the null $Y_2 \sim \mathbb{P}_{02}$,
again assumed to be \\$1.
If you bet your current fortune to buy a multiple of $S_2(Y_2)$, then
your fortune when the second bet settles is $S_1(y_1)S_2(y_2)$, etc.: betting scores on successive bets multiply to give your current fortune.
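
A minimal sketch of how successive betting scores multiply, under assumptions that are not in the text: the predictor says each $Y_j$ is a fair coin toss, you bet as if the chance of heads were 0.6, and the simulated "reality" produces heads 70% of the time. The function names are hypothetical.

```python
import random

def payoff(y, q, p0=0.5):
    """Payoff per unit staked: likelihood ratio of Bernoulli(q) to Bernoulli(p0)."""
    return q / p0 if y == 1 else (1 - q) / (1 - p0)

def run_bets(outcomes, q=0.6):
    """Reinvest the whole fortune on every trial; return the fortune after each bet."""
    fortune = 1.0
    path = []
    for y in outcomes:
        fortune *= payoff(y, q)   # betting scores multiply
        path.append(fortune)
    return path

# Each bet costs 1 under the predictor's claim that the coin is fair.
price = 0.5 * payoff(1, 0.6) + 0.5 * payoff(0, 0.6)

rng = random.Random(12345)
outcomes = [1 if rng.random() < 0.7 else 0 for _ in range(200)]  # simulated reality
path = run_bets(outcomes)
print(price, path[-1])
```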

If you end up making a lot of money, that is evidence that the predictor was wrong--or that you were very lucky. 
If you don't end up making a lot of money, maybe the predictor was right--or maybe you chose bad bets (you didn't bet
on the right alternative).
Regardless, it is not evidence that the predictor was right.
This is the same asymmetry involved in hypothesis tests: a large $P$-value is not evidence that the null is true.

## Warm-up 2: hypothesis tests as bets

See [hypothesis testing](./tests.ipynb).

Core idea: if you can make money betting against the null hypothesis (by making bets
that are expected to break even if the null hypothesis is true), that's evidence that the
null hypothesis is false.

In the typical setup for hypothesis testing, we observe data $X \sim \mathbb{P}$.
To test the null hypothesis $\mathbb{P} = \mathbb{P}_0$, we choose a function $\phi(\cdot)$ with values in $[0, 1]$ and the property that $\mathbb{P}_0 \phi(X) = \alpha$.
Let $U$ be an auxiliary uniform random variable independent of $X$; it is needed only for randomized tests.

We reject the hypothesis $\mathbb{P} = \mathbb{P}_0$ if $U \le \phi(X)$, which has probability $\alpha$ when the null is true.

We can think of $\phi(X)$ as an "all-or-nothing" bet that pays $1/\alpha$ times the stake (which we will 
take to be \\$1) if $U \le \phi(X)$, and pays 0 otherwise. 
If the null is true, the expected value of such a bet is \\$1.
That is, $X$ plays the role of $Y$, above, and $S(Y)$ is $1/\alpha$ if $U \le \phi(Y)$ and zero otherwise.
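
The all-or-nothing bet can be sketched numerically. The setup here is an assumption made for illustration: under the null $X$ is Binomial(10, 1/2), the test rejects for large $X$, and $\alpha = 0.05$. In the code, `phi[x]` is the chance of rejecting given $X = x$, with randomization on the boundary so that the level is exactly $\alpha$.

```python
from math import comb

n, alpha = 10, 0.05
pmf = [comb(n, x) / 2**n for x in range(n + 1)]  # hypothetical null: Binomial(n, 1/2)

def make_phi(pmf, alpha):
    """Randomized test rejecting for large x: phi[x] = chance of rejecting given X = x."""
    phi = [0.0] * len(pmf)
    remaining = alpha                      # rejection probability still to be allocated
    for x in reversed(range(len(pmf))):
        phi[x] = min(1.0, remaining / pmf[x])
        remaining -= phi[x] * pmf[x]
        if remaining <= 0:
            break
    return phi

phi = make_phi(pmf, alpha)
level = sum(p * f for p, f in zip(pmf, phi))  # null chance that U <= phi(X)
expected_payoff = level / alpha               # the bet pays 1/alpha on rejection, else 0
print(level, expected_payoff)
```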

Two scenarios:
+ bet once in a while, don't reinvest your winnings
+ bet whenever you want, reinvest your winnings

The first is like $P$-values and standard tests of significance: all-or-nothing bets, 
with no "combining evidence" across experiments.
The second leads to betting scores and $E$-values.

Multiple testing: suppose a hypothesis is tested 20 times at significance level 5%, producing one
"significant" result. From a testing perspective, we have to adjust for multiplicity to understand
how strong the evidence is that the null is false, and that adjustment requires knowing the dependence among the experiments. From an $E$-value perspective, the betting score is 1: \\$20 was wagered, and \\$20 was won.
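
The arithmetic behind this betting score is short enough to write out. The sketch assumes each of the 20 tests is treated as a separate one-unit all-or-nothing bet at level 5%, with no reinvestment; the variable names are illustrative.

```python
alpha = 0.05
n_tests = 20
rejections = [False] * 19 + [True]   # one "significant" result out of 20

# Each bet costs 1 unit and pays 1/alpha units if the test rejects, 0 otherwise.
payoffs = [1 / alpha if r else 0.0 for r in rejections]
wagered = float(n_tests)
won = sum(payoffs)
betting_score = won / wagered        # also the average of the 20 individual scores
print(wagered, won, betting_score)
```

No assumption about the dependence among the experiments is needed to compute this average.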

## Betting scores for simple nulls are likelihood ratios, and vice versa

Suppose we have a nonnegative random variable $S(Y)$ with expected value $1$ under the null $Y \sim \mathbb{P}_0$,
i.e., $\int S(y) d\mathbb{P}_0(y) = 1$. 
Thus the measure $\mathbb{Q}$ defined by $d\mathbb{Q}(y) := S(y) d\mathbb{P}_0(y)$ is also a probability measure: $\mathbb{Q}(A) \ge 0$ for every measurable set $A$ and $\int d\mathbb{Q}(y) = 1$.
Hence, $S(y) = f_\mathbb{Q}(y)/f_{\mathbb{P}_0}(y)$ is the likelihood ratio of $\mathbb{Q}$ to $\mathbb{P}_0$.
The distribution $\mathbb{Q}$ is called _the alternative implied by $S$_.

Conversely, suppose $\mathbb{Q}$ is a probability distribution for $Y$.
Then $S(Y) := f_\mathbb{Q}(Y)/f_{\mathbb{P}_0}(Y)$ is a betting score, since it is nonnegative and
\begin{eqnarray}
\mathbb{P}_0 (f_\mathbb{Q}(Y)/f_{\mathbb{P}_0}(Y)) &=& \int (f_\mathbb{Q}(y)/f_{\mathbb{P}_0}(y)) d\mathbb{P}_0(y) \\
&=& \int (f_\mathbb{Q}(y)/f_{\mathbb{P}_0}(y)) f_{\mathbb{P}_0}(y) d\mu(y) \\
&=& \int f_\mathbb{Q}(y) d\mu(y) \\
&=& \int d\mathbb{Q}(y) \\
&=& 1.
\end{eqnarray}
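
A discrete check of both directions, as a sketch only: the null Binomial(12, 0.5) and the alternative Binomial(12, 0.7) are hypothetical choices, and the dominating measure is counting measure on $\{0, \ldots, 12\}$.

```python
from math import comb

def binom_pmf(n, p):
    """Probability mass function of Binomial(n, p) as a list indexed by y."""
    return [comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]

n = 12
f_p0 = binom_pmf(n, 0.5)   # null density (hypothetical)
f_q = binom_pmf(n, 0.7)    # alternative density (hypothetical)

S = [q / p for q, p in zip(f_q, f_p0)]            # likelihood ratio S(y)
null_mean = sum(s * p for s, p in zip(S, f_p0))   # P_0 S(Y), which should be 1
implied_q = [s * p for s, p in zip(S, f_p0)]      # dQ = S dP_0 recovers f_q
print(null_mean)
```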

As mentioned above, the betting formulation makes sense even if $Y$ isn't a random variable.
But suppose I think $Y \sim \mathbb{Q} \ne \mathbb{P}_0$.
What payoff function $S$ should I bet on?

If the goal is to grow my capital at the fastest rate (the Kelly criterion), I want to maximize
\begin{equation}
\mathbb{Q} \ln S(Y) = \mathbb{Q} \ln \left ( f_\mathbb{R}(Y)/f_{\mathbb{P}_0}(Y) \right )
\end{equation}
over the distribution $\mathbb{R}$ implied by $S$.
Gibbs' inequality says that
\begin{equation}
\mathbb{Q} \ln \left ( f_\mathbb{Q}(Y)/f_{\mathbb{P}_0}(Y) \right ) \ge \mathbb{Q} \ln \left ( f_\mathbb{R}(Y)/f_{\mathbb{P}_0}(Y) \right )
\end{equation}
for any distribution $\mathbb{R}$ for $Y$ (dominated by $\mu$).

Thus the optimal payoff function to bet on is $S(Y) = f_\mathbb{Q}(Y)/f_{\mathbb{P}_0}(Y)$.
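
The optimality of the likelihood ratio can be spot-checked numerically. This sketch assumes a hypothetical three-outcome null `p0` and believed distribution `q`, and compares the expected log payoff of betting on $f_\mathbb{Q}/f_{\mathbb{P}_0}$ against bets implied by randomly generated competing distributions. It illustrates the inequality; it does not prove it.

```python
import math
import random

def growth_rate(q, r, p0):
    """Expected log payoff under q when betting on S = r / p0."""
    return sum(qi * math.log(ri / p0i) for qi, ri, p0i in zip(q, r, p0) if qi > 0)

p0 = [0.5, 0.3, 0.2]   # hypothetical null
q = [0.2, 0.3, 0.5]    # hypothetical distribution I believe

best = growth_rate(q, q, p0)   # bet on the likelihood ratio f_Q / f_P0

rng = random.Random(0)
others = []
for _ in range(1000):
    w = [rng.random() + 1e-9 for _ in p0]   # strictly positive weights
    r = [wi / sum(w) for wi in w]           # a random competing distribution R
    others.append(growth_rate(q, r, p0))

print(best, max(others))
```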

## Why maximize the expected log payoff?

This is connected to the idea of repeated betting, rather than one-shot bets.
As noted previously, if you maximize the expected return on a single bet, you risk going broke.
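Maximizing the expected return puts all your money on a single outcome (the one for which the ratio of your probability to the predictor's probability is largest); if any other outcome occurs, you lose your entire stake and cannot bet again.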
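
A small simulation contrasts the two strategies under repeated betting. It is a sketch under assumed numbers: `p0` is a hypothetical null that sets the prices and `q` is the hypothetical truth. The all-in bet stakes everything on one outcome, which in this example is both the most probable outcome under `q` and the one with the largest ratio `q[y] / p0[y]`.

```python
import random

p0 = [0.5, 0.3, 0.2]   # hypothetical null: determines the prices
q = [0.2, 0.3, 0.5]    # hypothetical truth

# All-in bet: the whole stake on the outcome with the largest ratio q/p0.
k = max(range(len(p0)), key=lambda y: q[y] / p0[y])
all_in = [1 / p0[y] if y == k else 0.0 for y in range(len(p0))]
# Kelly bet: the likelihood ratio.
kelly = [q[y] / p0[y] for y in range(len(p0))]

def expected(payoff, dist):
    """Expected payoff of a bet when Y has distribution dist."""
    return sum(d * s for d, s in zip(dist, payoff))

rng = random.Random(1)
ys = rng.choices(range(len(p0)), weights=q, k=50)   # 50 trials drawn from q
fortune_all_in = fortune_kelly = 1.0
for y in ys:
    fortune_all_in *= all_in[y]   # drops to 0 the first time y != k
    fortune_kelly *= kelly[y]

print(expected(all_in, q), expected(kelly, q), fortune_all_in, fortune_kelly)
```

The all-in bet has the larger expected return on a single trial, but its fortune is wiped out by the first trial on which the chosen outcome does not occur, whereas the Kelly fortune always stays positive.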

## Implied targets

If you bet on $S$, implicitly you are suggesting that $Y \sim \mathbb{Q}$, 
where $f_\mathbb{Q}(y) := S(y) f_{\mathbb{P}_0}(y)$.
Implicitly, you expect the logarithm of your betting score to be
\begin{eqnarray}
\mathbb{Q} \ln S(Y) &=& \int \ln S(y) d\mathbb{Q}(y) \\
&=&  \int \ln S(y) S(y) d\mathbb{P}_0(y) \\
&=& \mathbb{P}_0 S(Y) \ln S(Y).
\end{eqnarray}
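
For a finite outcome space the last expression is a short sum. The sketch below assumes a hypothetical fair-coin null and the bet `S = [0.8, 1.2]` (the likelihood ratio of Bernoulli(0.6) to Bernoulli(0.5)); the function name is illustrative. Exponentiating the expected log score puts it back on the scale of the betting score.

```python
import math

def expected_log_score(S, p0):
    """P_0 [S ln S], i.e. Q ln S with dQ = S dP_0, for a finite outcome space."""
    return sum(p * s * math.log(s) for p, s in zip(p0, S) if s > 0)

p0 = [0.5, 0.5]   # hypothetical null: fair coin (tails, heads)
S = [0.8, 1.2]    # hypothetical bet with null expectation 1

log_target = expected_log_score(S, p0)
per_bet = math.exp(log_target)   # score scale, for a single bet
after_100 = per_bet ** 100       # score scale, after 100 reinvested bets
print(log_target, per_bet, after_100)
```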

## Composite nulls

Suppose that the forecaster claims that $Y \sim \mathbb{P}$ for some (otherwise unspecified) $\mathbb{P} \in \mathcal{P}_0$, i.e., the null hypothesis is composite, rather than simple.
We can test such a hypothesis using betting by using nonnegative payoffs $S(Y)$ that have expected value no greater than 1
for any $\mathbb{P} \in \mathcal{P}_0$.
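
As an illustration with assumed ingredients: take the composite null that $Y$ is Binomial(10, $p$) for some $p \le 1/2$, and bet on the likelihood ratio of Binomial(10, 0.7) to Binomial(10, 1/2). The sketch below computes the expected payoff on a grid of null values of $p$; the grid and the choice 0.7 are hypothetical.

```python
from math import comb

def null_expectation(p, q, n):
    """Expected value under Binomial(n, p) of the ratio Binomial(n, q) : Binomial(n, 1/2)."""
    total = 0.0
    for y in range(n + 1):
        score = (q / 0.5) ** y * ((1 - q) / 0.5) ** (n - y)   # payoff S(y)
        total += comb(n, y) * p**y * (1 - p) ** (n - y) * score
    return total

n, q = 10, 0.7
grid = [i / 20 for i in range(11)]   # p = 0, 0.05, ..., 0.5
expectations = [null_expectation(p, q, n) for p in grid]
print(max(expectations))
```

On this grid the expected payoff never exceeds 1 and equals 1 at the boundary value $p = 1/2$.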

# Betting-based testing protocols for statistical models

Suppose a statistical model gives a probability distribution $\mathbb{P}_0$ for data $Y$.

The statistician is the _Skeptic_. _Reality_ reveals the value of random variables.

**Protocol 1:**

+ Skeptic selects a random variable $S \ge 0$ such that $\mathbb{P}_0 S(Y) = 1$.  
+ Reality announces $y$
+ $K := S(y)$.

$\mathbb{P}_0 (K \ge 1/\alpha) \le \alpha$ for all $\alpha \in (0, 1]$, by Markov's inequality.
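
The Markov bound can be checked exactly in a small discrete case. This sketch assumes a hypothetical null Binomial(20, 1/2) and a Skeptic who bets on the likelihood ratio against Binomial(20, 0.75); the levels in `checks` are arbitrary.

```python
from math import comb

n = 20
f_p0 = [comb(n, y) * 0.5**n for y in range(n + 1)]                      # hypothetical null
f_q = [comb(n, y) * 0.75**y * 0.25 ** (n - y) for y in range(n + 1)]    # Skeptic's alternative
S = [q / p for q, p in zip(f_q, f_p0)]                                  # P_0 S(Y) = 1

def null_tail(threshold):
    """Exact null probability that K = S(Y) is at least `threshold`."""
    return sum(p for p, s in zip(f_p0, S) if s >= threshold)

# Map each level alpha to the exact null chance that K >= 1/alpha.
checks = {alpha: null_tail(1 / alpha) for alpha in (0.5, 0.1, 0.05, 0.01)}
for alpha, tail in checks.items():
    print(alpha, tail)
```

In examples like this one the exact tail probability is typically well below $\alpha$, since Markov's inequality is usually conservative.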

## E-values formalized

**Definition.**
Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space,
and let $E$ be a random variable $E: \Omega \rightarrow [0, \infty]$ such that
$\mathbb{P}(E) := \int_\Omega E \, d\mathbb{P} \le 1$. (Note that $E$ may take the value $\infty$, which
corresponds to the strongest possible evidence that the data do not come from $\mathbb{P}$.)
Then **$E$ is an e-variable for $\mathbb{P}$.**

Let $\mathcal{P}$ be a collection of probability distributions on the measurable space $(\Omega, \mathcal{A})$,
and let $E$ be a random variable $E: \Omega \rightarrow [0, \infty]$ such that for all $\mathbb{P} \in \mathcal{P}$,
$\mathbb{P}(E) := \int_\Omega E \, d\mathbb{P} \le 1$.
Then **$E$ is an e-variable for $\mathcal{P}$.**

The set of all $E$-variables for a collection $\mathcal{P}$ of probability distributions is $\mathcal{E}(\mathcal{P})$.

The observed value of an $E$-variable is an $E$-value (or $e$-value).

**Definition.**
Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space,
and let $P$ be a random variable $P: \Omega \rightarrow [0, 1]$ such that
$\forall p \in [0, 1]$,
$\mathbb{P}(P \le p) \le p$.
Then **$P$ is a P-variable for $\mathbb{P}$.**

Let $\mathcal{P}$ be a collection of probability distributions on the measurable space $(\Omega, \mathcal{A})$,
and let $P$ be a random variable $P: \Omega \rightarrow [0, 1]$ such that for all $\mathbb{P} \in \mathcal{P}$,
$\forall p \in [0, 1]$,
$\mathbb{P}(P \le p) \le p$.
Then **$P$ is a P-variable for $\mathcal{P}$.**

The set of all $P$-variables for a collection $\mathcal{P}$ of probability distributions is $\mathcal{P}(\mathcal{P})$.


The observed value of a $P$-variable is a $P$-value.



**$P$ to $E$ calibration function**

A function $f : [0, 1] \rightarrow [0, \infty]$  is a ($p$-to-$e$) _calibrator_ if, for any probability space 
$(\Omega, \mathcal{A}, \mathbb{P})$ and any $P$-variable $P \in \mathcal{P}(\{\mathbb{P}\})$,
$f(P) \in \mathcal{E}(\{\mathbb{P}\})$.

A calibrator $f$ *dominates* a calibrator $g$ if $f \ge g$; $f$ *strictly dominates* $g$ if $f \ge g$ and $f \ne g$.
A calibrator is *admissible* if it is not strictly dominated by any other calibrator.

The following proposition (Vovk & Wang, 2021 Proposition 2.2)
says that a calibrator is a nonnegative decreasing
function on $[0, 1]$ whose integral is at most 1.

**Proposition.**
A decreasing function $f : [0, 1] \rightarrow [0, \infty]$ is a calibrator if
and only if $\int_0^1 f(p) \, dp \le 1$. 
It is admissible if and only if it is upper semicontinuous,
$f(0) = \infty$, and $\int_0^1 f(p) \, dp = 1$.

Examples.

\begin{equation}
f^\kappa(p) := \kappa p^{\kappa-1}, \;\; \kappa \in (0, 1).
\end{equation}
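
A minimal sketch of the calibrator $f^\kappa$ applied to a few $P$-values. The function names, the default $\kappa = 0.5$, and the example $P$-values are illustrative assumptions. The helper `mass` uses the antiderivative $p^\kappa$ of $f^\kappa$, which shows that the integral over $[0, 1]$ equals 1.

```python
def calibrate(p, kappa=0.5):
    """f^kappa(p) = kappa * p**(kappa - 1) for kappa in (0, 1); infinite at p = 0."""
    if not 0 < kappa < 1:
        raise ValueError("kappa must be in (0, 1)")
    if not 0 <= p <= 1:
        raise ValueError("p must be in [0, 1]")
    if p == 0:
        return float("inf")
    return kappa * p ** (kappa - 1)

def mass(a, b, kappa=0.5):
    """Integral of f^kappa over [a, b], from the antiderivative p**kappa."""
    return b**kappa - a**kappa

p_values = [0.5, 0.05, 0.01, 0.001]            # hypothetical P-values
e_values = [calibrate(p) for p in p_values]    # smaller P-values give larger E-values
print(e_values, mass(0.0, 1.0))
```

A calibrated $E$-value is far smaller than $1/p$: the $P$-value 0.01 maps to about 5 when $\kappa = 0.5$.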


**$E$ to $P$ calibration function**


## Combining $E$-values

Suppose $(E_{jt})_{t \in \mathbb{N}}$ is an $E$-process for the filtration $(\mathcal{F}_{jt})$, $j = 1, \ldots, n$, 
and let $(\gamma_{jt})_{t \in \mathbb{N}}$ be predictable with respect to $(\mathcal{F}_{jt})$
and satisfy $\gamma_{jt} \ge 0$ and $\sum_{j=1}^n \gamma_{jt} \le 1$.
Define
\begin{equation}
\gamma_t \cdot E_t := \sum_{j=1}^n  \gamma_{jt} E_{jt}.
\end{equation}
Then $(\gamma_t \cdot E_t)_{t \in \mathbb{N}}$ is an $E$-process.

This makes it possible to adaptively bet more on $E$-processes that are growing large.
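
The weighted combination can be sketched as follows. Everything here is hypothetical: the two example paths are made-up numbers rather than real $E$-processes, and `proportional_to_past` is just one possible predictable rule, which weights each process in proportion to its most recently observed value.

```python
def combine(e_paths, weight_rule):
    """Combine E-process paths with predictable weights.

    e_paths[j][t] is the value of the j-th process at time t. weight_rule(history)
    returns nonnegative weights summing to at most 1 and sees only values observed
    strictly before time t, so the weights are predictable.
    """
    horizon = len(e_paths[0])
    combined = []
    for t in range(horizon):
        history = [path[:t] for path in e_paths]
        gamma = weight_rule(history)
        if any(g < 0 for g in gamma) or sum(gamma) > 1 + 1e-12:
            raise ValueError("weights must be nonnegative and sum to at most 1")
        combined.append(sum(g * path[t] for g, path in zip(gamma, e_paths)))
    return combined

def proportional_to_past(history):
    """Hypothetical rule: weight each process by its most recent observed value."""
    last = [h[-1] if h else 1.0 for h in history]
    total = sum(last)
    if total <= 0:
        return [1 / len(last)] * len(last)
    return [v / total for v in last]

e_paths = [[1.2, 1.5, 3.0], [0.9, 0.7, 0.4]]   # made-up paths for illustration
combined = combine(e_paths, proportional_to_past)
print(combined)
```

With this rule, more weight shifts toward the first path as it grows, which is the adaptive behavior described above.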