# Win Probability Evolution During a Tennis Match

This notebook simulates a tennis match point-by-point and tracks how Player 1's 
probability of winning the match evolves over time. We compare two approaches:

1. **Static estimation**: Uses fixed initial guesses for serve-win probabilities
2. **Dynamic (Bayesian) estimation**: Updates serve-win probability estimates 
   after each point using observed outcomes

The key insight is that we don't know the *true* serve-win probabilities during 
a real match. Instead, we start with prior beliefs and refine them as we observe 
more data.

## Setup and Imports

In [None]:
from matplotlib import pyplot as plt
import numpy as np
import numpy.typing as npt
import os
import sys

# Add path to the src directory
PROJECT_ROOT = os.path.abspath('..')
SRC_DIR = os.path.join(PROJECT_ROOT, 'src')
if SRC_DIR not in sys.path:
    sys.path.append(SRC_DIR)

from tennis_lab.core.match_format import MatchFormat
from tennis_lab.montecarlo.match_simulation import simulateMatchWinProbabilityEvolution

## Simulating a Single Match

We simulate a tennis match with the following setup:

**True (hidden) parameters** - used to generate actual point outcomes:
- `P1actual`: Player 1's true probability of winning a point on serve
- `P2actual`: Player 2's true probability of winning a point on serve

**Prior beliefs** - our initial guesses, modeled as Beta distributions:
- `P1prior`, `alpha1`: Mode and concentration of our belief about P1's serve ability
- `P2prior`, `alpha2`: Mode and concentration of our belief about P2's serve ability

The `alpha` parameter controls how confident we are in our prior:
- Low alpha (~10): Wide distribution, uncertain → updates quickly with new data
- High alpha (~100): Narrow distribution, confident → updates slowly
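
To make the effect of `alpha` concrete, here is a minimal sketch of a Beta prior specified by its mode and concentration, then updated with observed serve points. It assumes `alpha` plays the role of the concentration `a + b` in the standard mode/concentration parameterization (valid only when the concentration exceeds 2). `tennis_lab` may parameterize its priors differently, and the helper names `beta_from_mode` and `posterior_mode` are hypothetical, as is the sample data.

```python
def beta_from_mode(mode: float, concentration: float) -> tuple[float, float]:
    """Convert (mode, concentration) to Beta shape parameters (a, b).

    Assumes concentration = a + b > 2, so that the mode is well defined.
    """
    if concentration <= 2:
        raise ValueError("concentration must exceed 2")
    a = mode * (concentration - 2) + 1
    b = (1 - mode) * (concentration - 2) + 1
    return a, b


def posterior_mode(mode: float, concentration: float, wins: int, points: int) -> float:
    """Mode of the Beta posterior after `wins` serve points won out of `points`."""
    a, b = beta_from_mode(mode, concentration)
    a_post = a + wins              # each serve point won adds 1 to a
    b_post = b + (points - wins)   # each serve point lost adds 1 to b
    return (a_post - 1) / (a_post + b_post - 2)


# Hypothetical data: 55 of 100 serve points won, prior mode 0.50
for concentration in (10, 100):
    estimate = posterior_mode(0.50, concentration, wins=55, points=100)
    print(f"alpha = {concentration:>3}: posterior mode = {estimate:.3f}")
```

With these numbers the weaker prior (alpha = 10) moves further toward the observed rate of 0.55 than the stronger prior (alpha = 100), which is the behavior described in the list above.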

In [None]:
# ===================
# Match Configuration
# ===================

matchFormat = MatchFormat(bestOfSets=3)

# True serve-win probabilities (unknown in practice, used to simulate outcomes)
P1actual = 0.55
P2actual = 0.65

# Prior beliefs about serve-win probabilities (Beta distribution parameters)
P1prior = 0.50   # Initial guess for P1's serve-win probability
alpha1  = 100    # Confidence in prior (higher = more confident)

P2prior = 0.50   # Initial guess for P2's serve-win probability
alpha2  = 100    # Confidence in prior

# ==============
# Run Simulation
# ==============

probsStatic, probsDynamic, P1estimates, P2estimates = \
    simulateMatchWinProbabilityEvolution(matchFormat, P1actual, P2actual, P1prior, alpha1, P2prior, alpha2)

# =================
# Visualize Results
# =================

nPoints = len(probsStatic)

fig, ax = plt.subplots(figsize=(10, 5))

ax.set_xlim(-5, 10 * (nPoints // 10 + 1))
ax.set_xticks(np.arange(0, nPoints + 1, 20))
ax.set_xlabel("Point Number", fontsize=10)

ax.set_ylim(-0.05, 1.05)
ax.set_yticks(np.arange(0, 1.01, 0.1))
ax.set_ylabel("P(Player 1 Wins Match)", fontsize=10)

ax.grid(linewidth=0.2, alpha=0.7)
ax.axhline(0.5, color="black", linewidth=0.5, linestyle="--", alpha=0.5)

ax.set_title("Evolution of Win Probability During a Tennis Match", fontsize=12)
ax.plot(probsStatic,  linewidth=1.0, color="gray",   alpha=0.8, label="Static (fixed priors)")
ax.plot(probsDynamic, linewidth=1.0, color="purple", alpha=0.9, label="Dynamic (Bayesian updates)")
ax.legend(fontsize=9, loc="upper left")

plt.tight_layout()
plt.show()

print(f"Match result: Player 1 {'won' if probsStatic[-1] > 0.5 else 'lost'}")
print(f"Total points played: {nPoints - 1}")

## Comparing Static vs Dynamic Forecasts

Which approach produces better probability estimates? 

We call an estimate "better" at a given point if it is closer to the eventual outcome:
- If Player 1 wins, higher probability estimates are better
- If Player 1 loses, lower probability estimates are better

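The function below calculates the fraction of points in the match at which the 
dynamic estimate was "better" than the static one by more than a small threshold. 
Points where the two estimates differ by less than the threshold are not counted 
as better.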
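
As a worked example of this criterion, the sketch below applies it to a handful of made-up forecast values. The arrays `static` and `dynamic` are hypothetical and do not come from the simulator. The `threshold` of 0.01 is an assumption chosen for illustration.

```python
import numpy as np

# Hypothetical forecasts for a short match that Player 1 wins (final value 1.0)
static = np.array([0.50, 0.48, 0.52, 0.55, 1.00])
dynamic = np.array([0.50, 0.50, 0.58, 0.555, 1.00])
threshold = 0.01

p1_won = static[-1] > 0.5

# When P1 wins, the higher estimate is closer to the outcome; otherwise the lower one
margin = dynamic - static if p1_won else static - dynamic
dynamic_better = margin > threshold

print(dynamic_better)         # element-wise result at each point
print(dynamic_better.mean())  # fraction of points where dynamic was better
```

Only the second and third points count for the dynamic estimate here. The tie at the start, the sub-threshold gap at the fourth point, and the final point (where both forecasts equal the outcome) do not, so the fraction can never reach 1.0 under this definition.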

In [None]:
def calcPercentDynamicBetter(probsStatic:  npt.NDArray[np.floating], 
                             probsDynamic: npt.NDArray[np.floating],
                             threshold:    float = 0.01) -> float:
    """
    Calculate the fraction of points where the dynamic estimate was 
    closer to the actual match outcome than the static estimate by
    more than `threshold`.
    
    Parameters
    ----------
    probsStatic  : Static probability estimates (fixed priors)
    probsDynamic : Dynamic probability estimates (Bayesian updates)
    threshold    : Minimum difference to count as "better" (default 0.01)
    
    Returns
    -------
    Fraction of points where dynamic was better (0.0 to 1.0)
    """
    P1won = probsStatic[-1] > 0.5
    
    if P1won:
        # Higher estimates are better when P1 wins
        better = (probsDynamic - probsStatic) > threshold
    else:
        # Lower estimates are better when P1 loses
        better = (probsStatic - probsDynamic) > threshold
    
    return np.sum(better) / len(probsStatic)

## Monte Carlo Analysis

Let's simulate many matches to see how often the dynamic (Bayesian) approach
outperforms the static approach across a larger sample.
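
A win rate estimated from a finite number of simulated matches is itself noisy. Because each simulated match is an independent trial, the binomial standard error gives a rough sense of that noise. The sketch below uses a made-up win rate of 60% over 100 matches, and the helper name `proportion_standard_error` is hypothetical.

```python
import math


def proportion_standard_error(p_hat: float, n: int) -> float:
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical example: dynamic "wins" in 60 of 100 simulated matches
n_matches = 100
p_hat = 0.60
se = proportion_standard_error(p_hat, n_matches)

# 1.96 standard errors gives an approximate 95% interval (normal approximation)
print(f"win rate = {p_hat:.1%} +/- {1.96 * se:.1%}")
```

At this sample size the interval spans several percentage points on either side. Since the standard error shrinks with the square root of the number of matches, raising the match count tightens it only slowly.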

In [None]:
# Simulate N matches and track when dynamic beats static
N_MATCHES = 100

pctBetter = []
for _ in range(N_MATCHES):
    probsStatic, probsDynamic, _, _ = simulateMatchWinProbabilityEvolution(
        matchFormat, P1actual, P2actual, P1prior, alpha1, P2prior, alpha2
    )
    pctBetter.append(calcPercentDynamicBetter(probsStatic, probsDynamic))

# What fraction of matches had dynamic "winning" more than half the time?
dynamicWins = np.array(pctBetter) > 0.5
winRate = np.sum(dynamicWins) / len(dynamicWins)

print(f"Simulated {N_MATCHES} matches")
print(f"Dynamic was better on more than half of the points in {winRate:.1%} of matches")
print(f"Average % of points where dynamic was better: {np.mean(pctBetter):.1%}")