# Match Win Probability

This notebook explores the probability of winning a tennis match as a function of the probability of winning points on serve for both players.

**Key Questions:**
- How do serve probabilities translate to match-winning probability?
- How does best-of-3 compare to best-of-5 format?
- How do probabilities compound from game to set to match level?

#### Reference
This notebook accompanies the blog post: https://medium.com/p/e6e48f3c04ae/edit

## Setup and Imports

In [None]:
from matplotlib        import pyplot as plt
from scipy.interpolate import RectBivariateSpline
import numpy as np
import os, sys

# add path to the 'tennis_lab' package if not in PYTHONPATH already
# (notebooks do not define __file__, so resolve relative to the working directory)
PROJECT_ROOT = os.path.abspath(os.path.join(os.getcwd(), '..'))
SRC_DIR = os.path.join(PROJECT_ROOT, 'src')
if SRC_DIR not in sys.path:
    sys.path.append(SRC_DIR)

from tennis_lab.core.game_score        import GameScore
from tennis_lab.core.match_score       import MatchScore
from tennis_lab.core.set_score         import SetScore
from tennis_lab.core.match_format      import MatchFormat
from tennis_lab.paths.match_path       import MatchPath
from tennis_lab.paths.game_probability import probabilityServerWinsGame
from tennis_lab.paths.set_probability  import probabilityP1WinsSet
from tennis_lab.paths.match_probability import probabilityP1WinsMatch, _loadCachedFunction

## All Possible Score Paths in a Match

A match consists of sets, and the `MatchPath` class generates all possible set-by-set score progressions from a given starting score.

Each path entry is a tuple of `(player1_sets, player2_sets)`.

The number of paths is small compared to the game or set level because a match contains only a few sets. A best-of-5 match starting at 0-0 has only 20 possible paths.

**Match endings:**
- Best-of-3: First to 2 sets
- Best-of-5: First to 3 sets
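
As a cross-check on the path count, the sketch below enumerates set-by-set progressions without using `tennis_lab`. The helper `enumerate_set_paths` is hypothetical and written only for illustration; whether `MatchPath` includes the starting score in each path, or orders paths the same way, is an assumption.

```python
def enumerate_set_paths(best_of_sets, start=(0, 0)):
    """Return every set-by-set score progression from `start` to the end of the match.

    Hypothetical helper for illustration; not part of tennis_lab.
    """
    sets_to_win = best_of_sets // 2 + 1
    finished = []

    def extend(path):
        p1, p2 = path[-1]
        if p1 == sets_to_win or p2 == sets_to_win:
            finished.append(path)          # match is over, path is complete
            return
        extend(path + [(p1 + 1, p2)])      # Player 1 wins the next set
        extend(path + [(p1, p2 + 1)])      # Player 2 wins the next set

    extend([start])
    return finished


paths_bo3 = enumerate_set_paths(3)
paths_bo5 = enumerate_set_paths(5)
print(len(paths_bo3), len(paths_bo5))
```

Each path here is a list of `(player1_sets, player2_sets)` tuples, starting with the initial score and ending with the final one.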

In [None]:
# Generate all possible score paths for a match, in increments of one set.

MF         = MatchFormat(bestOfSets=5)
INIT_SCORE = MatchScore(0, 0, MF)

paths = MatchPath.generateAllPaths(INIT_SCORE)
for p in paths:
    print(p)

## Best-of-3 vs Best-of-5 Match Formats

The probability of winning a match depends on **both players' serve probabilities** and the **match format**.

Longer formats (best-of-5) reduce variance and favor the stronger player. The `probabilityP1WinsMatch` function calculates this by:
1. Computing set-winning probabilities from game/tiebreak probabilities
2. Combining these across all possible match paths

The plot compares the best-of-3 and best-of-5 formats. The best-of-5 curve is steeper: small serve advantages are amplified more over a longer match.
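
To see why the longer format favors the stronger player, here is a minimal sketch under a simplifying assumption: every set is won independently with the same probability. This is not how `probabilityP1WinsMatch` works (it combines set probabilities that depend on serve order and set type); the function `prob_win_best_of` is hypothetical and only illustrates the compounding effect.

```python
from math import comb


def prob_win_best_of(set_win_prob, best_of_sets):
    """Probability of winning a best-of-N match when each set is won
    independently with probability `set_win_prob` (simplifying assumption)."""
    need = best_of_sets // 2 + 1
    s, q = set_win_prob, 1.0 - set_win_prob
    # Sum over the number of sets lost before winning the deciding set:
    # the winner takes the last set, the earlier sets can be ordered freely.
    return sum(comb(need - 1 + lost, lost) * s**need * q**lost
               for lost in range(need))


for s in (0.50, 0.55, 0.60):
    bo3 = prob_win_best_of(s, 3)
    bo5 = prob_win_best_of(s, 5)
    print(f"set prob {s:.2f}: best-of-3 {bo3:.4f}, best-of-5 {bo5:.4f}")
```

Under this model, any set-winning probability above 0.5 yields a higher match-winning probability in best-of-5 than in best-of-3, and any value below 0.5 yields a lower one.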

In [None]:
# Calculate the probability that Player1 wins the match, as a
# function of the probability that he wins a point when serving.
# There is one such curve for each value of 'p2', the
# probability that Player2 wins a point when serving.
# Compare these probabilities for a best-of-3 vs best-of-5 match format.

INIT_SETS_P1 = 0
INIT_SETS_P2 = 0
INIT_SCORE3 = MatchScore(INIT_SETS_P1, INIT_SETS_P2, matchFormat=MatchFormat(bestOfSets=3))
INIT_SCORE5 = MatchScore(INIT_SETS_P1, INIT_SETS_P2, matchFormat=MatchFormat(bestOfSets=5))

# P1s is the probability that Player1 wins the point when serving, 
# it is the horizontal axis of the graph.
# P2s is the probability that Player2 wins the point when serving,
# it describes a family of curves (one for each entry in P2s)
P1s = np.linspace(0, 1, 100)
P2s = [0.50]

# plot curves
for p2 in P2s:
    Y3s = probabilityP1WinsMatch(INIT_SCORE3, 1, P1s, p2)
    Y5s = probabilityP1WinsMatch(INIT_SCORE5, 1, P1s, p2)
    plt.plot(P1s, Y3s, linewidth=0.8, label=f"p2={p2}, best-of-3")
    plt.plot(P1s, Y5s, linewidth=0.8, label=f"p2={p2}, best-of-5")
    plt.vlines(p2, 0, 1, linewidth=0.25, color="black")

plt.xlabel("probability of winning the point when serving for player 1")
plt.ylabel("probability of winning the match")
plt.title ("Probability of Player 1 Winning the Match")
plt.xticks(np.arange(0.0, 1.1, 0.1))
plt.yticks(np.arange(0.0, 1.1, 0.1))
plt.grid(linewidth=0.2)
plt.legend(fontsize=8)
plt.show()

## Probability Amplification: Game vs Set vs Match

This visualization shows how point-winning probabilities get **amplified** at each level of tennis scoring.

The three curves show the probability of Player 1 winning:
- **Game** (dotted): Probability of winning a single service game
- **Set** (dashed): Probability of winning a set
- **Match** (solid): Probability of winning a best-of-5 match

**Key insight:** The S-curve becomes progressively steeper at each level. A small serve advantage (e.g., 55% vs 50%) translates to a larger set advantage, which compounds further at the match level. This is why small differences in serve quality can lead to dominant match outcomes.
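
The first amplification step, from point to game, has a well-known closed form for a standard game with advantage scoring. The sketch below assumes that scoring rule and independent points; `probabilityServerWinsGame` may handle other formats configured through `MatchFormat`, so treat the hypothetical `game_win_probability` as an illustration rather than the package's implementation.

```python
def game_win_probability(p):
    """Probability that the server wins a standard advantage game, given
    probability `p` of winning each point (points assumed independent)."""
    q = 1.0 - p
    # Win to love, 15 or 30: the server takes the last point, the earlier ones in any order.
    win_before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce (3 points each), then win two points in a row before the opponent does.
    reach_deuce = 20 * p**3 * q**3
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return win_before_deuce + reach_deuce * win_from_deuce


for p in (0.50, 0.55, 0.60):
    print(f"point {p:.2f} -> game {game_win_probability(p):.4f}")
```

At `p = 0.5` the game is a coin flip, while a modest edge on each point already produces a noticeably larger edge at the game level.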

In [None]:
# Compare Player1's probability of winning a Game, Set and Match.
# This shows how probabilities compound at each level of tennis scoring.

MF = MatchFormat(bestOfSets=5)

# P1s is the probability that Player1 wins the point when serving, 
# it is the horizontal axis of the graph.
# P2 is the probability that Player2 wins the point when serving.
P1s = np.linspace(0, 1, 100)
P2  = 0.6
Gs  = [probabilityServerWinsGame(GameScore(0, 0, MF), 1, p1) for p1 in P1s]
Ss  = probabilityP1WinsSet(SetScore(0, 0, isFinalSet=False, matchFormat=MF), 1, P1s, P2)
Ms  = probabilityP1WinsMatch(MatchScore(0, 0, MF), 1, P1s, P2)

plt.plot(P1s, Gs, linewidth=0.8, color='black', linestyle=':' , label="game data")
plt.plot(P1s, Ss, linewidth=0.8, color='black', linestyle='--', label=f"set data      (p2={P2})")
plt.plot(P1s, Ms, linewidth=0.8, color='black', label=f"match data (p2={P2})")

plt.xlabel("probability of winning the point when serving for player 1")
plt.ylabel("probability of winning the game, set, match")
plt.title ("Probability of Player 1 Winning the Game, Set, Match")
plt.xticks(np.arange(0.0, 1.1, 0.1))
plt.yticks(np.arange(0.0, 1.1, 0.1))
plt.grid(linewidth=0.2)
plt.legend(fontsize=9)
plt.show()

## Validating Cached Probabilities

For performance, match-winning probabilities can be pre-computed and cached using `scripts/cache-prob-win-match.py`. The `_loadCachedFunction` helper loads these cached values as interpolated functions.

This cell validates that the cached probabilities match the directly computed values.  
- **Black lines** show the true, directly computed probabilities.  
- **Gray circles** show the cached (interpolated) values.

The two should **overlap**: any visible gap between a circle and its line would indicate an error introduced by caching or interpolation.
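
A visual overlap check can be complemented by a numeric one. The sketch below is self-contained and makes two assumptions that the notebook does not confirm: that the cache is a regular grid interpolated with `RectBivariateSpline`, and that a smooth stand-in surface (`stand_in_probability`, hypothetical) behaves roughly like the real match-win probability. It reports the largest absolute difference between interpolated and exact values.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline


def stand_in_probability(p1, p2):
    """Hypothetical smooth stand-in for a match-win probability surface."""
    return 1.0 / (1.0 + np.exp(-12.0 * (p1 - p2)))


# Build the "cache": exact values on a regular grid, wrapped in a spline interpolant.
grid = np.linspace(0.0, 1.0, 51)
table = stand_in_probability(grid[:, None], grid[None, :])
cached = RectBivariateSpline(grid, grid, table)  # bicubic, no smoothing, by default

# Compare interpolated and exact values away from the grid nodes.
p1_check = np.linspace(0.0, 1.0, 30)
p2_fixed = 0.5
interpolated = cached.ev(p1_check, np.full_like(p1_check, p2_fixed))
exact = stand_in_probability(p1_check, p2_fixed)

max_abs_error = float(np.max(np.abs(interpolated - exact)))
print(f"max |cached - exact| = {max_abs_error:.2e}")
```

The same comparison could be applied to the `Ys` and `Zs` arrays computed in the cell below, with a tolerance chosen to match the grid resolution of the cache.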

In [None]:
# Compare cached probabilities vs true values.
# This is a check that interpolated probabilities cached using 'cache-prob-win-match.py' are correct.

MF          = MatchFormat(bestOfSets=5)
INIT_SCORES = [MatchScore(2, 1, MF), 
               MatchScore(0, 0, MF), 
               MatchScore(1, 2, MF)]

# P1s is the probability that Player1 wins the point when serving, 
# it is the horizontal axis of the graph.
# P2 is the probability that Player2 wins the point when serving
P1s = np.linspace(0, 1, 30)
P2  = 0.5

# load cached data
Zs = {}
for initScore in INIT_SCORES:
    probWinMatchCachedFction = _loadCachedFunction(initScore)
    Zs[initScore] = [probWinMatchCachedFction(p1, P2) for p1 in P1s]

# calculate probabilities using the main function
Ys = {}
for initScore in INIT_SCORES:
    Ys[initScore] = probabilityP1WinsMatch(initScore, 1, P1s, P2)

# plot both sets of values
for i, initScore in enumerate(INIT_SCORES):
    label_true  = 'true values' if i == 0 else None
    label_cache = 'cached data' if i == 0 else None
    plt.plot   (P1s, Ys[initScore], color="black", linewidth=0.4, label=label_true)                            # true   values
    plt.scatter(P1s, Zs[initScore], marker='o', s=40, color="lightgrey", edgecolor="grey", label=label_cache)  # cached values
plt.title ("Probability of Winning the Match - Cached vs True")
plt.xlabel("probability of winning the point when serving")
plt.ylabel("probability of winning the match")
plt.xticks(np.arange(0, 1.01, 0.1))
plt.yticks(np.arange(0, 1.01, 0.1))
plt.grid(linewidth=0.1, color='grey')
plt.legend()
plt.show()