# Betting supermartingale tests: selecting the bets

See [Waudby-Smith and Ramdas (2021)](https://arxiv.org/pdf/2006.04347.pdf) and [Stark (2023)](https://projecteuclid.org/journals/annals-of-applied-statistics/volume-17/issue-1/ALPHA-Audit-that-learns-from-previously-hand-audited-ballots/10.1214/22-AOAS1646.short).


We have a stochastic process $(X_j)_{j \in \mathbb{N}}$ such that with probability 1, $0 \le X_j \le u_j$.
The distribution of the process is indexed by a parameter $\theta$.
We want to make inferences about $\theta$.

The chapter on [martingales](./martingales.ipynb) introduced betting martingales of the form:
$T_0 :=1$ and
\begin{equation}
T_j := T_{j-1} (1 + \lambda_j (X_j - \mu_j)),
\end{equation}
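where $\lambda_j \in (-1/(u_j - \mu_j), 1/\mu_j)$ and $(\lambda_j)_{j \in \mathbb{N}}$ is predictable (that is, $\lambda_j$ is determined by $X_1, \ldots, X_{j-1}$).
These bounds ensure that $1 + \lambda_j (X_j - \mu_j) > 0$ for every $X_j \in [0, u_j]$, so $T_j > 0$.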
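A minimal sketch of how $T_j$ is computed, assuming the bets $\lambda_j$ have already been chosen. The helper `betting_martingale` is hypothetical (written for illustration, not from any library); it checks the bounds on $\lambda_j$, but it cannot check predictability, which is the caller's responsibility.

```python
def betting_martingale(x, lam, mu, u):
    """Return [T_0, T_1, ..., T_n], where T_0 = 1 and
    T_j = T_{j-1} * (1 + lam_j * (x_j - mu_j)).

    x, lam, mu, u are equal-length sequences. Predictability is up to the
    caller: lam[j-1] must be computed from x[:j-1] only.
    """
    if not len(x) == len(lam) == len(mu) == len(u):
        raise ValueError("x, lam, mu, u must have the same length")
    t = [1.0]  # T_0 := 1
    for xj, lj, mj, uj in zip(x, lam, mu, u):
        if not 0 < mj < uj:
            raise ValueError("need 0 < mu_j < u_j")
        if not 0 <= xj <= uj:
            raise ValueError("need 0 <= X_j <= u_j")
        lo, hi = -1 / (uj - mj), 1 / mj  # open interval that keeps each factor positive
        if not lo < lj < hi:
            raise ValueError(f"lambda_j = {lj} is outside ({lo}, {hi})")
        t.append(t[-1] * (1 + lj * (xj - mj)))
    return t


# Example: three draws, fixed bet lambda = 0.5 against the null mean 0.5, u = 1.
print(betting_martingale([1, 0, 1], [0.5] * 3, [0.5] * 3, [1] * 3))
# prints [1.0, 1.25, 0.9375, 1.171875]
```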

It also introduced the ALPHA martingales, which are products of terms of the form
\begin{eqnarray}
\frac{X_j}{\mu_j} \cdot \frac{\eta_j-\mu_j}{u_j-\mu_j} + \frac{u_j-\eta_j}{u_j-\mu_j} &=&
1 + \frac{\eta_j/\mu_j - 1}{u_j-\mu_j} (X_j - \mu_j),
\end{eqnarray}
which is of the form $(1 + \lambda_j (X_j - \mu_j))$ with $\lambda_j = \frac{\eta_j/\mu_j - 1}{u_j-\mu_j}$.
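The identity above can be spot-checked numerically. This is a sketch; `alpha_lambda` and `alpha_factor` are hypothetical helper names. It draws $\eta_j$ from $[\mu_j, u_j]$, which corresponds to $\lambda_j \in [0, 1/\mu_j]$ (at $\eta_j = \mu_j$ the bet is $0$; at $\eta_j = u_j$ it is $1/\mu_j$).

```python
import math
import random


def alpha_lambda(eta, mu, u):
    """lambda_j implied by the ALPHA alternative eta_j."""
    return (eta / mu - 1) / (u - mu)


def alpha_factor(x, eta, mu, u):
    """Left-hand side of the identity: the ALPHA term written in terms of eta_j."""
    return (x / mu) * (eta - mu) / (u - mu) + (u - eta) / (u - mu)


# Spot-check the identity at random values with 0 < mu < u, mu <= eta <= u, 0 <= x <= u.
rng = random.Random(0)
for _ in range(1000):
    u = rng.uniform(0.5, 5)
    mu = rng.uniform(0.01, 0.99) * u
    eta = rng.uniform(mu, u)
    x = rng.uniform(0, u)
    lhs = alpha_factor(x, eta, mu, u)
    rhs = 1 + alpha_lambda(eta, mu, u) * (x - mu)
    assert math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9)
```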

This chapter addresses how to choose $\lambda_j$ or, equivalently, $\eta_j$.

The motivation for the ALPHA martingales was the SPRT for the Bernoulli parameter $p$.
Among all tests with a given significance level, the SPRT has the smallest expected sample size
to reject the (point) null when the (point) alternative is true.
In the Bernoulli SPRT, $u=1$, $\mu_j = \mu$ is the fixed null value of $p$,
and $\eta_j = \eta$ is the fixed alternative value of the Bernoulli $p$.

In general, how should we pick $\eta_j$ or $\lambda_j$?
The ALPHA approach suggests thinking of $\eta_j$ as an estimate of the true value of $p$ (analogous to the true
population mean, if we relax the binary restriction).
In this context, a "good" estimator is not necessarily one with low bias and low variance. Indeed, we shall see
that biased estimates can produce more powerful tests.

Recall the view of $(T_j)$ as a gambler's fortune.
The value of $\lambda_j$ is related to the fraction of the current "fortune" the gambler decides to risk on the next outcome.
We can reject the null if and when $T_j \ge 1/\alpha$, so to reject the null, we want the fortune to grow.
That suggests we think of choosing $\lambda_j$ to maximize our fortune, in some sense.
Let's investigate that point of view.

## The Kelly Criterion

Consider bets on a binary outcome, like tosses of a coin. Suppose that the tosses are independent Bernoulli($p$),
and for the moment, assume that $p > 1/2$ is known.
You have \$1 to bet, at even odds: if you win, you keep your wager and win an equal amount; if you lose, you lose your wager.

Suppose you are only allowed to bet once. How much should you bet to maximize your expected wealth? If you bet a fraction $\phi$, you keep $(1-\phi)$ "safe" and your expected wealth is
\begin{equation}
(1+\phi) p + (1-\phi)(1-p) = p + \phi p + 1 - p - \phi + \phi p = 1 + (2p-1) \phi.
\end{equation}
Since $2p>1$, this is an increasing function of $\phi$: you should bet your entire stake ($\phi = 1$).

Suppose you are allowed to bet arbitrarily many times (unless you go broke).
How should you bet to maximize your fortune?
(See [this study](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2856963) for how some financial experts
bet.)

If you bet your entire stake on any game, you could go broke. If you don't bet at all, your fortune won't grow. Somewhere in between, there's an optimal amount to bet.
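A small simulation sketch makes this concrete. The helper `simulate_fortune`, the choice $p = 0.6$, the 100 bets per gambler, and the 2000 replications are illustrative assumptions. It compares never betting ($\phi = 0$), betting the fixed fraction $\phi = 0.2$ of the current fortune, and betting everything ($\phi = 1$).

```python
import random


def simulate_fortune(p, phi, n_bets, rng):
    """Fortune after n_bets even-odds bets, each risking fraction phi of the current fortune."""
    fortune = 1.0
    for _ in range(n_bets):
        if rng.random() < p:  # win with probability p
            fortune *= 1 + phi
        else:
            fortune *= 1 - phi
    return fortune


rng = random.Random(12345)
p, n_bets, n_sims = 0.6, 100, 2000
for phi in (0.0, 0.2, 1.0):
    results = [simulate_fortune(p, phi, n_bets, rng) for _ in range(n_sims)]
    broke = sum(f == 0 for f in results) / n_sims
    median = sorted(results)[n_sims // 2]
    print(f"phi={phi:.1f}: fraction broke={broke:.3f}, median fortune={median:.3g}")
```

With $\phi = 1$, a single loss wipes the gambler out, so unless every one of the 100 bets wins (probability $0.6^{100}$), the fortune ends at zero; with $\phi = 0$ it never moves. The intermediate value $0.2 = 2p - 1$ is the Kelly fraction for $p = 0.6$.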

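Kelly (1956) proposed betting to maximize the long-run growth rate of your fortune. If you always bet a fraction $\phi$ of your current stake, then after $n$ games with $W$ wins your fortune is $(1+\phi)^W(1-\phi)^{n-W}$. In a very large number of games, by the law of large numbers, the fraction of wins $W/n$ will approach $p$, so your stake is multiplied by roughly $r = (1+\phi)^p(1-\phi)^{1-p}$ "per bet." Kelly suggests picking the
bet to maximize that rate of growth.
(This differs from maximizing expected wealth: by independence, the expected fortune after $n$ bets is $(1 + (2p-1)\phi)^n$, which is largest for $\phi = 1$, even though betting everything every time makes going broke almost certain.)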
Maximizing $r$ is equivalent to maximizing $\log r$. Since $\log r$ is a strictly concave function of $\phi$ on $(-1, 1)$, its maximum is at its stationary point:
\begin{eqnarray}
\frac{d}{d\phi} \log r &=& \frac{d}{d\phi} [p \log(1+\phi) + (1-p) \log(1-\phi)] \\
&=& \frac{p}{1+\phi} - \frac{1-p}{1-\phi}.
\end{eqnarray}
Setting this equal to zero and solving for $\phi$ yields
\begin{eqnarray}
p(1-\phi) &=& (1-p)(1+\phi) \\
p - p\phi &=& 1 + \phi - p - p\phi \\
\phi &=& 2p-1.
\end{eqnarray}
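As a sanity check on the algebra, one can maximize $\log r$ by brute force over a fine grid of $\phi \in [0, 1)$ and compare the result with $2p - 1$. This is a sketch; `log_growth` and `kelly_fraction_grid` are hypothetical helpers, and the grid size is an arbitrary choice.

```python
import math


def log_growth(phi, p):
    """log r = p log(1 + phi) + (1 - p) log(1 - phi), for -1 < phi < 1."""
    return p * math.log1p(phi) + (1 - p) * math.log1p(-phi)


def kelly_fraction_grid(p, n_grid=20_000):
    """Maximize log r over the grid 0, 1/n_grid, ..., 1 - 1/n_grid (brute force)."""
    grid = (k / n_grid for k in range(n_grid))
    return max(grid, key=lambda phi: log_growth(phi, p))


for p in (0.55, 0.6, 0.75, 0.9):
    print(f"p={p}: grid maximizer={kelly_fraction_grid(p):.4f}, 2p-1={2 * p - 1:.4f}")
```

At the optimum, $\log r = \log 2 + p \log p + (1-p) \log(1-p)$, which is positive whenever $p \ne 1/2$. Because the grid is restricted to $[0, 1)$, it returns $0$ (do not bet) when $p \le 1/2$.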

There are more general versions of the Kelly criterion to deal with bets with payoff odds other than 1:1
and bets where, if you lose, you lose only a fraction of what you bet.

Recall that, among all sequential tests with a given significance level, the Bernoulli SPRT minimizes the expected sample size, and that it can be written as
a betting martingale.
One can show that the "bet" implicit in the Bernoulli SPRT is in fact the Kelly criterion bet based on the alternative.
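That claim can be checked numerically in a special case. The sketch below (hypothetical helper names; $\mu = 0.5$ and $\eta = 0.6$ are arbitrary choices) confirms that the SPRT likelihood-ratio factor for one Bernoulli observation equals $1 + \lambda(X - \mu)$ with $\lambda = (\eta/\mu - 1)/(1 - \mu)$, the ALPHA bet with $u = 1$, and that a brute-force maximizer of $\mathbb{E}_\eta \log(1 + \lambda(X - \mu))$ lands at the same $\lambda$.

```python
import math


def sprt_factor(x, mu, eta):
    """Bernoulli likelihood ratio for one observation x in {0, 1}: alternative eta over null mu."""
    return (eta / mu) ** x * ((1 - eta) / (1 - mu)) ** (1 - x)


def sprt_lambda(mu, eta):
    """Bet implied by the SPRT: lambda = (eta/mu - 1)/(u - mu) with u = 1."""
    return (eta / mu - 1) / (1 - mu)


def expected_log_growth(lam, mu, eta):
    """E log(1 + lam * (X - mu)) when X ~ Bernoulli(eta)."""
    return eta * math.log(1 + lam * (1 - mu)) + (1 - eta) * math.log(1 - lam * mu)


mu, eta = 0.5, 0.6
lam = sprt_lambda(mu, eta)

# 1. The SPRT likelihood-ratio factor equals the betting factor 1 + lam * (X - mu).
for x in (0, 1):
    assert math.isclose(sprt_factor(x, mu, eta), 1 + lam * (x - mu))

# 2. Brute-force Kelly bet: maximize expected log growth under the alternative
#    over a grid strictly inside (-1/(1 - mu), 1/mu).
lo, hi = -1 / (1 - mu), 1 / mu
n_grid = 100_000
grid = [lo + (hi - lo) * k / n_grid for k in range(1, n_grid)]
kelly = max(grid, key=lambda b: expected_log_growth(b, mu, eta))
print(f"SPRT bet: {lam:.4f}, numerical Kelly bet: {kelly:.4f}")
```

With $\mu = 1/2$ the factor is $1 \pm \lambda/2$, an even-odds bet of the fraction $\phi = \lambda/2$; here $\lambda/2 = 0.2 = 2\eta - 1$, the a priori Kelly fraction for the coin.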

## What if you don't know the true mean?

The version of the Kelly criterion described so far is called _a priori Kelly_ by Waudby-Smith and Ramdas, because it requires knowing the true
mean before starting.

Waudby-Smith and Ramdas (2022) examine several approaches that do not require knowing the true mean:

+ Growth rate adaptive to the particular alternative (GRAPA) 
+ Approximate GRAPA (aGRAPA) 
+ Lower-bound on the wealth (LBOW) 
+ Online Newton Step (ONS-m) 
+ Diversified Kelly betting (dKelly) 
+ Confidence Boundary (ConBo) 
+ Sequentially Rebalanced Portfolio (SRP) 

In their numerical experiments, they find that gridded Kelly and hedged gridded Kelly perform particularly well.

## gKelly and hgKelly

Waudby-Smith and Ramdas focus on constructing confidence sequences by inverting (super)martingale tests.
For that purpose, it's helpful if the tests are easy to invert, and if their inversion produces a connected interval
at each time step.

One method that performs particularly well is gridded Kelly (gKelly). We shall look at the special case of
sampling with replacement, so that the population mean does not change with each draw.
Suppose the hypothesized mean is $\mu$.
Take $u_j = 1$, so that $0 \le X_i \le 1$, and pick $G$ equally spaced points $\lambda^1, \ldots, \lambda^G$ on the interval $[-1/(1-\mu), 1/\mu]$.
Define the _gridded Kelly_ process 
\begin{equation}
T_j(\mu) := \frac{1}{G} \sum_{g=1}^G \prod_{i=1}^j (1 + \lambda^g(X_i - \mu)).
\end{equation}
If the null is true, so that the $X_i$ are independent with $\mathbb{E}X_i = \mu$, each product is a martingale with respect to $(X_i)$; the average is thus
a martingale, by the linearity of conditional expectation.
The value of $G$ can be allowed to grow with time, provided everything is kept predictable.
This method can be thought of as a discrete approximation to the _Kaplan Kolmogorov_ method
described in Stark (2020), also stated in 
[Harold Kaplan's defunct website](http://web.archive.org/web/20131209044835/http://printmacroj.com/martMean.htm).
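A minimal sketch of the gridded Kelly process for data in $[0, 1]$. The function name, the default $G = 10$, and the grid itself are assumptions for illustration: it uses the midpoints of $G$ equal cells, which are equally spaced and lie strictly inside $(-1/(1-\mu), 1/\mu)$, so no component product can hit zero. This is not necessarily the grid Waudby-Smith and Ramdas use.

```python
import random


def gridded_kelly(x, mu, G=10):
    """Gridded Kelly process [T_0, ..., T_n] for data x in [0, 1] and null mean mu."""
    lo, hi = -1 / (1 - mu), 1 / mu
    lams = [lo + (hi - lo) * (g + 0.5) / G for g in range(G)]  # G equally spaced bets
    prods = [1.0] * G  # running product for each fixed bet lambda^g
    t = [1.0]          # T_0 = 1
    for xi in x:
        prods = [pr * (1 + lam * (xi - mu)) for pr, lam in zip(prods, lams)]
        t.append(sum(prods) / G)  # equal-weight average of the G martingales
    return t


# Usage: Bernoulli(0.7) draws, testing the null mu = 0.5 at level alpha = 0.05.
rng = random.Random(2024)
x = [float(rng.random() < 0.7) for _ in range(500)]
t = gridded_kelly(x, mu=0.5, G=20)
alpha = 0.05
rejected_at = next((j for j, tj in enumerate(t) if tj >= 1 / alpha), None)
print("first j with T_j >= 1/alpha:", rejected_at)
```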

The hedged gridded Kelly method places $G$ equally spaced points in each of the intervals $[-1/(1-\mu), 0]$ and $[0, 1/\mu]$,
$\lambda^{1-}, \ldots, \lambda^{G-}$ and $\lambda^{1+}, \ldots, \lambda^{G+}$, respectively, then forms
\begin{equation}
T_j(\mu) := \frac{\gamma}{G} \sum_{g=1}^G \prod_{i=1}^j (1 + \lambda^{g-}(X_i - \mu)) + \frac{1-\gamma}{G} \sum_{g=1}^G \prod_{i=1}^j (1 + \lambda^{g+}(X_i - \mu))
\end{equation}
for $\gamma \in (0, 1)$.
Inverting tests based on this martingale yields connected intervals.
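A minimal sketch of the hedged process for data in $[0, 1]$, under the same illustrative assumptions as a plain gridded Kelly sketch: the function name and defaults are hypothetical, and each side's grid uses cell midpoints so that every bet lies strictly inside its interval. Weight $\gamma$ goes to the bets that win when $\mathbb{E}X_i < \mu$, and $1 - \gamma$ to those that win when $\mathbb{E}X_i > \mu$.

```python
def hedged_gridded_kelly(x, mu, G=10, gamma=0.5):
    """Hedged gridded Kelly process [T_0, ..., T_n] for data x in [0, 1] and null mean mu."""
    lo, hi = -1 / (1 - mu), 1 / mu
    neg = [lo * (g + 0.5) / G for g in range(G)]  # G equally spaced bets in (lo, 0)
    pos = [hi * (g + 0.5) / G for g in range(G)]  # G equally spaced bets in (0, hi)
    prod_neg, prod_pos = [1.0] * G, [1.0] * G
    t = [1.0]  # T_0 = 1
    for xi in x:
        prod_neg = [pr * (1 + lam * (xi - mu)) for pr, lam in zip(prod_neg, neg)]
        prod_pos = [pr * (1 + lam * (xi - mu)) for pr, lam in zip(prod_pos, pos)]
        t.append(gamma * sum(prod_neg) / G + (1 - gamma) * sum(prod_pos) / G)
    return t


print(hedged_gridded_kelly([0.2, 0.9, 0.7, 0.8], mu=0.5, gamma=0.5))
```

To invert, one could evaluate $T_j(\mu)$ over a grid of candidate values of $\mu$ and keep those for which $T_j(\mu) < 1/\alpha$.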

## ALPHA using an estimate of the population mean

See [Stark (2023)](https://projecteuclid.org/journals/annals-of-applied-statistics/volume-17/issue-1/ALPHA-Audit-that-learns-from-previously-hand-audited-ballots/10.1214/22-AOAS1646.short).

In election audits, one can use the reported results as the alternative, or as the starting value for the alternative.

One can also use the data, for instance through a truncated shrinkage estimator.
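A truncated shrinkage estimator of the mean can be sketched as follows, for a constant null mean $\mu$. This is one simple version assumed here for illustration; the estimator in Stark (2023) may differ in its details. The estimate treats a prior guess $\eta_0$ (for instance, derived from reported results) as $d$ pseudo-observations, shrinks the running sample mean toward it, and truncates the result to $[\mu + \epsilon, u - \epsilon]$. All names and defaults (`shrink_trunc_eta`, `alpha_shrink_trunc`, `d = 10`, `eps = 0.01`) are hypothetical.

```python
def shrink_trunc_eta(x_past, mu, u, eta0, d, eps):
    """Hypothetical truncated shrinkage estimate of the mean, for use as eta_j.

    x_past: X_1, ..., X_{j-1}, so the estimate is predictable
    eta0:   prior guess for the mean (e.g., derived from reported results)
    d:      weight on eta0, in units of pseudo-observations
    eps:    truncation margin; requires mu + eps <= u - eps
    """
    shrunk = (d * eta0 + sum(x_past)) / (d + len(x_past))
    # Lower truncation keeps the bet on the alternative side (lambda_j > 0);
    # upper truncation keeps lambda_j strictly below 1/mu.
    return min(max(shrunk, mu + eps), u - eps)


def alpha_shrink_trunc(x, mu, u, eta0, d=10, eps=0.01):
    """ALPHA-style martingale [T_0, ..., T_n] with eta_j from shrink_trunc_eta."""
    t = [1.0]
    for j, xj in enumerate(x):
        eta = shrink_trunc_eta(x[:j], mu, u, eta0, d, eps)  # uses X_1, ..., X_{j-1} only
        lam = (eta / mu - 1) / (u - mu)                     # lambda_j implied by eta_j
        t.append(t[-1] * (1 + lam * (xj - mu)))
    return t


print(alpha_shrink_trunc([1, 1, 0, 1, 1, 0, 1, 1], mu=0.5, u=1.0, eta0=0.6))
```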

Another option is a Bayes estimate.

A fixed alternative can be quite powerful in some circumstances; see Spertus (2023).