Skip to content

CS7545_Sp24_Lecture_24: Optimal betting strategies, gambling and online portfolio optimization

Richard Zhao edited this page May 3, 2024 · 3 revisions

CS 7545: Machine Learning Theory -- Spring 2024

Instructor: Jacob Abernethy

Notes for Lecture 24: Optimal betting strategies, gambling and online portfolio optimization

April 18, 2024

Scribes: Matthew So

Part 1: Betting

Suppose that you are horse betting across $n$ horses, with a reward of $r_i$ for the $i$-th horse. This means that if you bet $1 on that horse, and the horse wins, you will receive 1 + $r_i$ dollars, but you will lose all of the money on all of your other bets.

As it turns out, the optimal betting strategy, assuming that you have perfect information about the probability that each horse will win, is to place bets proportional to the probability that each horse will win, regardless of how large the reward is. The whole proof was not covered in class.

This does not necessarily mean that this is the best strategy. There is always the possibility of setting aside some money to not bet. As it turns out, it is possible to still lose money over time, even using this betting strategy, depending on the rewards. Alas, there is no guarantee that you will always make money, even with perfect information.

Part 2: The Universal Portfolio

Suppose that there are $N$ stocks that you can invest in. You want to maximize your returns over time. Let $\vec{p}_t$ be how much each stock goes up proportionally on each stock (50% down for $p_t(i)$ means -0.5 for stock $i$ on day $t$, 100% up means 1). Let $\vec{w}_t$ be how much value you have in stocks on day $t$, and let $w_t(i)$ be how much value you have in stock $i$ on day $t$.

This means that on the $T$-th day, you have a total stock value of $\Pi_{t=1}^T (\vec{w}_t \cdot \vec{p}_t)$.

The Constant Rebalanced Portfolio (CRP) says that based on a distribution $\vec{b}$, you should redistribute your balance of stocks every day so that the distribution of value equals $\vec{b}$. Even if a stock's price doubles and halves every term, because a stock can go up infinitely but can only go down to a total loss, CRPs are generally good.

Cover's Universal Portfolio (CUP) proposes to combine multiple CRPs. This method achieves polynomially worse performance than the best CRP, which is exponential. This means that CUP is only polynomially worse than the best CRP. The difference between CUP and the best CRP worsens as the number of CRPs used increases.

Proof:

Define $r_t$ as the reward for each stock at time T and $w_t$ as the percent of each stock held at time T (note that $w_t$ sums to 1). Then CUP is designed to optimize

$$\frac{\Pi_{t=1}^T (1 + r_t^T w_t)}{\max_{w \in \Delta_n} \Pi_{t=1}^T (1 + r_t^T w)}$$

or in other words, the portfolio's wealth ratio.

We can treat this like an online convex optimization problem by noting that

$$- \log \text{ratio} := Reg_T = \sum_{t=1}^T - \log (1 + r_t^T w_t) - \min_{w \in \Delta_n} \sum_{t=1}^T - \log (1 + r_t^T w)$$

Theorem: There exists an algorithm s.t. for any $r_1, ..., r_T$, $Reg_T \leq n \log T$.

Algorithm: $\forall w \in \Delta_n$, invest in the average of all constant-weighted portfolios weighted by their performances. This is equivalent to buying an infinitesimally small part of every possible CRP, weighted by performance.

After $T$ steps, for any $w \in \Delta_n, \epsilon \in (0, 1)$, consider $B_\epsilon (w) = { \epsilon v + (1 - \epsilon) w: v \in \Delta_n }$.

Lemma 1: Let $w^*$ be the optimal portfolio. Then for all $w \in B_\epsilon (w^*)$, $(1 + w^T r_t) \geq (1 - \epsilon) (1 + w^{*T} r_t)$

Lemma 2: $$\frac{\text{val}(B_\epsilon(w^*))}{\text{val}(\Delta_n)} \geq \epsilon^{n-1}$$

Thus, the wealth of this strategy is at least $$\text{val} (B_\epsilon (w^*)) \Pi_{t=1}^T ((1 - \epsilon) (1 + r_t^T w^*) \geq \text{val} (\Delta_n) \epsilon^{n-1} (1 - \epsilon)^T \text{wealth}_{1:T} (w^*) $$

Letting $\epsilon = \frac{1}{T}$, and taking the negative log, we get that $$Reg_T = -\log \text{wealth} \leq (n-1) \log \frac{1}{\epsilon} - T \log (1 - \epsilon) \leq (n-1) \log T - T \log (1 - \frac{1}{T}) = O(n \log T)$$

Clone this wiki locally