# RPS Quest - Markdown Rendering Test

This notebook tests that the mathematical notation from the RPS Quest webpage renders correctly.

## State Representation

The bot policy operates on a Markovian state representation. Let $u_t$ and $b_t$ denote the human and bot cumulative scores at the start of round $t$. The state at time $t$ is

$$
  s_t = \big(x_{t-1}, x_{t-2}, x_{t-3}, \mathcal{H}_{t-1}, z_t\big),
$$

where:
- $x_{t-1}, x_{t-2}, x_{t-3}$ are the three most recent human moves,
- $\mathcal{H}_{t-1}$ aggregates historical statistics computed from rounds $1, \ldots, t-1$ (cumulative move frequencies, favored-move tendencies, lagged point values),
- $z_t = (u_t, b_t, w_t, t)$ collects the quantities that are known deterministically at time $t$: the current scores, the round multipliers, and the step index.
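The state tuple above can be sketched as a pair of small Python dataclasses. This is only an illustration: the names `Known`, `State`, `move_frequencies`, and `build_state` are hypothetical, $\mathcal{H}_{t-1}$ is reduced to cumulative move frequencies alone, and the `None` padding for the first rounds and the uniform fallback for an empty history are assumptions, not something the source specifies.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

MOVES = ("R", "P", "S")


@dataclass(frozen=True)
class Known:
    """z_t: quantities known before round t is played."""
    u: float                       # human cumulative score u_t
    b: float                       # bot cumulative score b_t
    w: Tuple[float, float, float]  # round multipliers, ordered as MOVES
    t: int                         # step index


@dataclass(frozen=True)
class State:
    """s_t = (x_{t-1}, x_{t-2}, x_{t-3}, H_{t-1}, z_t)."""
    recent: Tuple[Optional[str], ...]  # (x_{t-1}, x_{t-2}, x_{t-3})
    history: Tuple[float, ...]         # simplified H_{t-1}
    known: Known


def move_frequencies(moves: Sequence[str]) -> Tuple[float, ...]:
    """Cumulative frequencies of R, P, S over rounds 1..t-1."""
    n = len(moves)
    if n == 0:
        # Assumption: fall back to a uniform vector before any move is seen.
        return (1 / 3, 1 / 3, 1 / 3)
    return tuple(list(moves).count(m) / n for m in MOVES)


def build_state(moves, u, b, w, t) -> State:
    """Assemble s_t from the human move log and the known quantities."""
    last = list(reversed(list(moves)[-3:]))  # most recent move first
    last += [None] * (3 - len(last))         # pad early rounds (assumption)
    return State(tuple(last), move_frequencies(moves), Known(u, b, tuple(w), t))
```

For example, `build_state(["R", "P", "S", "R"], 2.0, 1.0, (1.0, 1.2, 1.7), 5)` yields a state whose `recent` field lists the last three human moves, newest first.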

## Round Multipliers

Round multipliers are generated deterministically from a hash of the game and round:

$$
  w_t^{(\ell)} = 1.0, \qquad w_t^{(m)} = \beta_t, \qquad w_t^{(h)} = \beta_t + 0.5,
$$

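where $\beta_t \in \{1.1, 1.2, 1.3, 1.4, 1.5\}$ and $(\ell, m, h)$ is a permutation of the three moves $\{R, P, S\}$, so in each round one move carries the base multiplier, one carries $\beta_t$, and one carries $\beta_t + 0.5$.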
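As a rough sketch of how such hash-derived multipliers could be produced, the snippet below hashes a game identifier together with the round index. The hash function (SHA-256), the `game_id:t` key format, and the way the digest bytes are mapped to $\beta_t$ and to the permutation are all assumptions made for illustration; the source does not say which scheme is used.

```python
import hashlib
from itertools import permutations

MOVES = ("R", "P", "S")
BETAS = (1.1, 1.2, 1.3, 1.4, 1.5)
PERMS = tuple(permutations(MOVES))  # the 6 possible (low, mid, high) orderings


def round_multipliers(game_id: str, t: int) -> dict:
    """Return {move: multiplier} for round t of the given game."""
    # Hypothetical key format; any stable encoding of (game, round) would do.
    digest = hashlib.sha256(f"{game_id}:{t}".encode()).digest()
    # Use separate digest slices so beta and the permutation vary independently.
    beta = BETAS[int.from_bytes(digest[:8], "big") % len(BETAS)]
    low, mid, high = PERMS[int.from_bytes(digest[8:16], "big") % len(PERMS)]
    return {low: 1.0, mid: beta, high: round(beta + 0.5, 1)}
```

Because the output depends only on `(game_id, t)`, both sides can recompute the multipliers for any round, which matches their role as known quantities in $z_t$.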

## Model Output

Given the feature vector $\phi(s_t)$ computed from the state, each model produces a categorical distribution over the next user move:

$$
  \mathbf{p}_t = \big(p_t^{(R)}, p_t^{(P)}, p_t^{(S)}\big) = \mathbb{P}(x_t \mid s_t),
$$

which drives both the greedy policy and telemetry.
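One common way to obtain such a categorical distribution is to pass three unnormalized model scores through a softmax. The sketch below assumes that form of output; the function name `to_distribution` is hypothetical, and the source does not state how its models normalize their predictions.

```python
import math

MOVES = ("R", "P", "S")


def to_distribution(logits):
    """Map three unnormalized scores to probabilities over (R, P, S)."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {move: e / total for move, e in zip(MOVES, exps)}
```

Equal scores give the uniform distribution, and a larger score for one move shifts probability mass toward it.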

## HJB Equation

The underlying control problem is a finite-horizon Markov decision process with a terminal condition at $\tau = 10$ points. An optimal controller would solve the discrete-time Hamilton-Jacobi-Bellman (HJB) equation backward in time:

$$
  V_t(s) = \max_{a \in \mathcal{A}} \Big[ r_t(s,a) + \gamma \, \mathbb{E}\big[ V_{t+1}(S_{t+1}) \mid s, a \big] \Big],
$$

with discount factor $\gamma = 1$ and terminal condition $V_T(s) = 0$, where $T$ is the first round at which either player reaches the target score $\tau$.

## Greedy Q-Value Approximation

For any candidate bot move $a$, let $b(a)$ denote the human action beaten by $a$ and $\ell(a)$ the action that defeats $a$. The approximation computes:

$$
  \widehat{Q}_t(a) = p_t^{(b(a))} \Big(w_t^{(a)} + B_t(a)\Big) - p_t^{(\ell(a))} \Big(w_t^{(\ell(a))} + L_t(a)\Big) + p_t^{(a)}\, C_t,
$$

where

$$
  B_t(a) = \mathbf{1}\{u_t + w_t^{(a)} \ge \tau\}\cdot \tau, \quad
  L_t(a) = \mathbf{1}\{b_t + w_t^{(\ell(a))} \ge \tau\}\cdot \tau, \quad
  C_t = \tfrac{1}{2}\,\frac{u_t - b_t}{\tau}.
$$
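The formulas above translate almost line for line into code. The sketch below transcribes $\widehat{Q}_t(a)$, $B_t(a)$, $L_t(a)$, and $C_t$ exactly as written, with `u` and `b` being the scores $u_t$ and $b_t$ from the state representation. The function names `q_hat` and `greedy_move` and the dictionary-based inputs are hypothetical choices for illustration.

```python
MOVES = ("R", "P", "S")
BEATS = {"R": "S", "P": "R", "S": "P"}     # b(a): the move beaten by a
LOSES_TO = {"R": "P", "P": "S", "S": "R"}  # l(a): the move that defeats a


def q_hat(a, p, w, u, b, tau=10.0):
    """Greedy Q-value approximation for bot move a.

    p maps each move to its predicted probability, w maps each move to
    its round multiplier, and u, b are the cumulative scores u_t, b_t.
    """
    beaten, beater = BEATS[a], LOSES_TO[a]
    B = tau if u + w[a] >= tau else 0.0       # B_t(a), as written in the text
    L = tau if b + w[beater] >= tau else 0.0  # L_t(a), as written in the text
    C = 0.5 * (u - b) / tau                   # C_t, the tie term
    return p[beaten] * (w[a] + B) - p[beater] * (w[beater] + L) + p[a] * C


def greedy_move(p, w, u, b, tau=10.0):
    """Pick the bot move with the largest approximate Q-value."""
    return max(MOVES, key=lambda a: q_hat(a, p, w, u, b, tau))
```

With `p = {"R": 0.6, "P": 0.3, "S": 0.1}`, `w = {"R": 1.0, "P": 1.5, "S": 2.0}`, and both scores at zero, the approximate values are $-0.35$ for rock, $0.7$ for paper, and $0.0$ for scissors, so the greedy choice is paper.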

## Verification

✅ All equations above should render correctly, including:
- Greek letters ($\beta_t$, $\tau$, $\gamma$)
- Calligraphic fonts ($\mathcal{H}_{t-1}$, $\mathcal{A}$)
- Blackboard bold ($\mathbb{P}$, $\mathbb{E}$)
- Indicator functions ($\mathbf{1}$)
- Proper alignment in multi-line equations