In [4]:
import numpy as np

# Optimal stopping of a 100-sided die

Roll a 100-sided die, paying 1 for each roll. After each roll, either stop and win the value on the face, or pay again and roll again. What is the optimal strategy?

Start by noting that one stops after at most 100 rolls: at that point the rolling costs alone equal the largest face value, so no positive return is possible anymore.

One could either stop after a fixed number $r$ of rolls, which pins the cost at $r$ and the expected win at $50.5-r$, or fix a threshold face value $n$ and stop at the first roll showing at least $n$, regardless of the number of rolls. For the threshold strategy, the expected win $w_n$ is:

$$w_n = \mathbb E[X_{i}-i\,|\,X_{i}\ge n]=\mathbb E[X_{i}\,|\,X_{i}\ge n]-\mathbb E[i\,|\,X_{i}\ge n]=\frac 1{100-(n-1)}\sum_{j=n}^{100} j-\frac {100}{100-(n-1)},$$
where $i$ is the index of the roll at which the player stops.
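Both terms have closed forms, which makes the optimum easy to locate by hand. Given $X_i\ge n$, the accepted face is uniform on $\{n,\dots,100\}$, while the index $i$ of the stopping roll is geometric with success probability $p=(101-n)/100$:

$$\mathbb E[X_i\,|\,X_i\ge n]=\frac{n+100}{2},\qquad \mathbb E[i\,|\,X_i\ge n]=\sum_{t\ge1}t\,p\,(1-p)^{t-1}=\frac1p=\frac{100}{101-n},$$
$$w_n=\frac{n+100}{2}-\frac{100}{101-n}.$$

Treating $n$ as continuous, $\frac{dw_n}{dn}=\frac12-\frac{100}{(101-n)^2}$ vanishes at $(101-n)^2=200$, i.e. $n=101-10\sqrt2\approx86.86$. Since $w_n$ is concave for $n<101$, the best integer threshold is 86 or 87, and $w_{87}=93.5-\frac{100}{14}\approx86.36$ beats $w_{86}=93-\frac{100}{15}\approx86.33$.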
One can then brute-force the maximum expected win over a range of thresholds $n$, together with the corresponding strategy.

In [2]:
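dim = 100                     # number of faces

# Closed-form expected win w_n for thresholds n = 80..89
w=[]
for n in range(dim-20,dim-10):
    # mean accepted face minus mean number of rolls (geometric, p = (dim-n+1)/dim)
    w.append(sum(range(n,dim+1))/(dim-n+1)-dim/(dim-n+1))
    print(n,w[-1])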

80 85.23809523809524
81 85.5
82 85.73684210526315
83 85.94444444444444
84 86.11764705882354
85 86.25
86 86.33333333333333
87 86.35714285714286
88 86.3076923076923
89 86.16666666666667


The maximum is reached at $n=87$: stop at the first roll showing 87 or more.
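The cell above scans only $n=80,\dots,89$. As a sketch, the same closed form can be evaluated exactly for every threshold $n=1,\dots,100$ with Python's standard `fractions` module, which rules out any floating-point tie; like the formula, it assumes every roll, the first included, costs 1. The helper name `w` is only chosen to mirror $w_n$.

```python
from fractions import Fraction

DIM = 100  # faces 1..DIM of a fair die


def w(n: int, dim: int = DIM) -> Fraction:
    """Exact expected net win of 'stop at the first roll >= n',
    charging 1 for every roll (the first one included)."""
    m = dim - n + 1                                  # number of accepted faces n..dim
    mean_face = Fraction(sum(range(n, dim + 1)), m)  # E[X_i | X_i >= n]
    mean_rolls = Fraction(dim, m)                    # geometric mean 1/p with p = m/dim
    return mean_face - mean_rolls


best = max(range(1, DIM + 1), key=w)
print(best, w(best))  # 87 1209/14
```

Since $1209/14\approx86.357$, the full scan agrees with the windowed one.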

In _Introduction to Stochastic Processes_ (Lawler), optimal stopping problems are described in a very general framework. Stopping in state $k$, i.e. after a roll showing $k$, yields a payoff $f(k)$, where $f$ is a generic payoff function. The value function $v(k)$ is the expected payoff obtained from state $k$ by following the optimal strategy, and $u(k)$ is the expected value of $v$ at the next roll: $u(k)=\mathbb E[v(X_{n+1})\,|\,X_n=k]=\sum_j \mathbb P(X_{n+1}=j\,|\,X_n=k)\,v(j)$.

The strategy reads as follows: if $f(k)$ is larger than $u(k)$, the player should stop, because the immediate gain exceeds the expected value of continuing optimally; otherwise the player should roll again. Hence $v(k)=\max[f(k), u(k)]$.

If the system is a Markov chain, there is a transition matrix $P=\{P_{ij}=\mathbb P(X_{n+1}=j|X_n=i)\}$, and $u(k)=Pv(k)=\sum_{j} P_{kj}v(j)$, where the sum runs over all states. So $v(k)=\max[f(k), Pv(k)]$.

If a cost $g(k)$ is charged for rolling again from state $k$, one gets $v(k)=\max[f(k), Pv(k) - g(k)]$.
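For the die, every row of $P$ is the same, so $Pv(k)$ does not depend on $k$. Calling this constant $c$, the equation $v(k)=\max[f(k), Pv(k)-g(k)]$ with $f(k)=k$ and $g(k)=1$ collapses to the scalar fixed point $c=\frac1d\sum_{j=1}^d\max(j, c-1)$, and the player stops at $k$ when $k\ge c-1$. This shortcut is specific to a uniform $P$ and is not the general method of the book; the sketch below solves it by bisection, and `continuation_value` is a hypothetical helper name.

```python
import numpy as np


def continuation_value(dim: int = 100, cost: float = 1.0, iters: int = 100) -> float:
    """Solve c = mean_j max(j, c - cost) for a fair die with faces 1..dim (cost > 0).

    h(c) = mean_j max(j, c - cost) - c is non-increasing, positive at c = 1 and
    negative at c = dim + 1, so bisection on [1, dim + 1] brackets its root.
    """
    faces = np.arange(1, dim + 1)

    def h(c: float) -> float:
        return np.maximum(faces, c - cost).mean() - c

    lo, hi = 1.0, dim + 1.0          # h(lo) > 0 > h(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c = continuation_value()
threshold = int(np.ceil(c - 1.0))    # smallest face k with k >= c - 1
print(round(c, 4), threshold)        # 87.3571 87
```

The root is $c=1223/14$, obtained from $14c=1223$ once the re-roll faces are known to be $1,\dots,86$.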

In the book, Lawler characterises $v(k)$ and shows that the following iteration converges to the true value function:
$$ v_{n+1}(k) = \max[f(k), Pv_n(k) - g(k)].$$
Implementation:
1. $P$ is a full matrix with every entry equal to $1/d$, where $d$ is the number of faces of the die, because from any state the chain reaches every state with the same probability
2. $f(k)=k$
3. $g(k)=1$, because rolling always costs 1
4. start from $v(k)=\max f$ for every non-absorbing state $k$ and $v(k)=f(k)$ for every absorbing state; since $g\ge0$, $\max f$ is an upper bound for $v$, which makes it a natural choice. Here no state is absorbing, so $v(k)=\max f=d$ for every $k$
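The recipe above can be wrapped in a small general-purpose function. This is a sketch under assumptions: `value_iteration` is a hypothetical helper, absorbing states are detected as $P_{kk}=1$, and the loop stops once the largest change between iterates drops below a tolerance instead of after a fixed number of steps.

```python
import numpy as np


def value_iteration(P, f, g, tol=1e-12, max_iter=10_000):
    """Iterate v_{n+1} = max[f, P v_n - g] until the largest change
    between iterates is below tol.

    Returns the final v and the number of iterations performed.
    """
    P = np.asarray(P, dtype=float)
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    absorbing = np.isclose(np.diag(P), 1.0)   # assumption: P_kk = 1 marks absorption
    v = np.where(absorbing, f, f.max())       # step 4: max f off the absorbing states
    for n in range(1, max_iter + 1):
        v_new = np.maximum(f, P @ v - g)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, n
        v = v_new
    return v, max_iter


d = 100
P = np.full((d, d), 1 / d)                    # step 1: uniform transitions
f = np.arange(1, d + 1, dtype=float)          # step 2: payoff = face value
g = np.ones(d)                                # step 3: each roll costs 1
v, n_iter = value_iteration(P, f, g)
stop_faces = f[f >= v]                        # faces where stopping is optimal
print(int(stop_faces.min()), round(float(P[0] @ v), 2))   # 87 87.36
```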

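The method confirms the previous result: the optimal strategy is to stop at the first roll showing 87 or more. The expected win it reports, $Pv(k)$ (the same for every $k$), does not charge the first roll, so it is exactly 1 higher than $w_{87}\approx86.36$.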

In [6]:
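f = np.linspace(1,dim,dim)          # payoff f(k) = k for k = 1..dim
v = np.full(dim,max(f))             # initial guess v_0(k) = max f (no absorbing states)
g = np.full(dim,1)                  # cost g(k) = 1 for every roll
P = np.full((dim,dim),1/dim)        # uniform transition matrix

# Iterate v_{n+1} = max[f, P v_n - g]
for i in range(1,200):
    v = np.maximum(f,P.dot(v)-g)

# Stop on the first face k where f(k) >= v(k), i.e. f(k) = v(k)
print('Strategy: exit if roll >=',np.argmax(f >= v)+1)
# (Pv)(k) is the same for every k: value before the first, uncharged, roll
print('Expected win: %.2f' % P[0].dot(v))
# Rolls until a stopping face appears are geometric with p = (#stopping faces)/dim
print('Expected number of rolls (geometric distribution): ',dim/sum(f >= v))

Strategy: exit if roll >= 87
Expected win: 87.36
Expected number of rolls (geometric distribution):  7.142857142857143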
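Why are 199 iterations more than enough? Writing $c_n=Pv_n(k)$ (the same for every $k$), once $86<c_n-1<87$ the player re-rolls exactly on faces $1,\dots,86$, so $c_{n+1}=\frac{1}{100}\big(86(c_n-1)+\sum_{j=87}^{100}j\big)$ and the error $c_n-c^*$ shrinks by a factor $86/100$ per iteration; solving $c^*=\frac{1}{100}(86(c^*-1)+1309)$ gives $c^*=1223/14\approx87.357$. The sketch below checks the rate numerically. It replaces `P.dot(v)` by `v.mean()`, which is equivalent only because $P$ is uniform, and the indices 80 and 81 are an arbitrary choice well past the initial transient.

```python
import numpy as np

d = 100
f = np.arange(1, d + 1, dtype=float)
v = np.full(d, f.max())                 # same starting point as the cell above
c_star = 1223 / 14                      # exact fixed point of c = mean(max(f, c - 1))

errors = []
for _ in range(120):
    c = v.mean()                        # (P v)(k) for the uniform P, any k
    errors.append(c - c_star)
    v = np.maximum(f, c - 1.0)          # same update as np.maximum(f, P.dot(v) - g)

ratio = errors[81] / errors[80]
print(round(ratio, 6))                  # 0.86
```

At this rate, each further 16 iterations cut the error by roughly a factor of 10, so the loop above has converged to machine precision long before its 199th step.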


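Alternatively, one can set up a Monte Carlo simulation: play many games with each threshold strategy and record the average win and number of rolls. As in the value iteration, the first roll is free. Note that `np.random.randint(100+1, ...)` draws integers from 0 to 100 (the upper bound is exclusive and the lower bound defaults to 0), so the simulated die has 101 faces; this is why the average number of rolls at threshold 87 is close to $101/14\approx7.21$ rather than $100/14\approx7.14$, and the average win is slightly below 87.36. A true 100-sided die would use `np.random.randint(1, 100+1, ...)`.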

In [17]:
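partite = 800000                                  # number of games per strategy

# Pre-draw all rolls. randint's upper bound is exclusive and its lower bound
# defaults to 0, so these are integers 0..100 (101 equally likely faces).
tiri = np.random.randint(100+1, size=partite*100)

i = 0                                             # index of the next unused roll
for Nmin in range(85,90):                         # threshold: stop at the first roll >= Nmin
    vittoria = 0                                  # total net win
    costo = 0                                     # total number of rolls
    for k in range(0,partite):
        g = 0                                     # rolls in the current game
        while True:
            tiro = tiri[i]
            i += 1
            g += 1
            if tiro >= Nmin:
                vittoria += tiro-(g-1)            # first roll is free: only re-rolls are charged
                costo += g
                break
    costo /= partite                              # average number of rolls
    vittoria /= partite                           # average net win
    print('Strategy: exit when rolled ',Nmin,' - average win: ',vittoria,' in ',costo,' rolls')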

Strategy: exit when rolled  85  - average win:  87.18416  in  6.3165025  rolls
Strategy: exit when rolled  86  - average win:  87.252165  in  6.74376875  rolls
Strategy: exit when rolled  87  - average win:  87.2799375  in  7.21187625  rolls
Strategy: exit when rolled  88  - average win:  87.2288825  in  7.7748475  rolls
Strategy: exit when rolled  89  - average win:  87.09873625  in  8.40769  rolls
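The loop above simulates every roll in pure Python. As a hedged alternative, using a different sampling scheme from the one above, one can exploit two facts about independent rolls: the number of rolls until the first face $\ge n$ is geometric with $p=(101-n)/100$, and the accepted face is uniform on $\{n,\dots,100\}$ and independent of that count. The sketch below uses NumPy's `Generator.geometric` and `Generator.integers` (upper bound exclusive, so faces run from 1 to 100) and, like the loop above, does not charge the first roll; `simulate_threshold` is a hypothetical name, and the seed and sample size are arbitrary choices.

```python
import numpy as np


def simulate_threshold(n_min: int, games: int, rng: np.random.Generator, dim: int = 100):
    """Average net win and number of rolls for 'stop at the first roll >= n_min'.

    Net win = accepted face - (rolls - 1): the first roll is free, as in the loop above.
    """
    p = (dim - n_min + 1) / dim                        # chance that one roll is accepted
    rolls = rng.geometric(p, size=games)               # rolls until acceptance, >= 1
    faces = rng.integers(n_min, dim + 1, size=games)   # accepted face, uniform on n_min..dim
    wins = faces - (rolls - 1)
    return wins.mean(), rolls.mean()


rng = np.random.default_rng(0)
for n_min in range(85, 90):
    win, rolls = simulate_threshold(n_min, 1_000_000, rng)
    print(f"threshold {n_min}: average win {win:.3f} in {rolls:.3f} rolls")
```

For threshold 87 the averages should land near $1223/14\approx87.357$ and $100/14\approx7.143$, the values predicted by the value iteration.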
