# Set Win Probability

This notebook explores the probability of winning a tennis set as a function of the probability of winning points on serve for both players.

**Key Questions:**
- How do serve probabilities translate to set-winning probability?
- Does it matter who serves first in a set?
- How does the probability change during a set (from different game scores)?

#### Reference
This notebook accompanies the blog post: https://medium.com/p/13ae3ce1c078/edit

## Setup and Imports

In [None]:
from matplotlib        import pyplot as plt
from scipy.interpolate import RectBivariateSpline
import numpy as np
import os, sys

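# add path to the 'tennis_lab' package if not in PYTHONPATH already
# ('__file__' is passed as a string: dirname() then returns '', so
#  PROJECT_ROOT resolves to the parent of the current working directory)
PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname('__file__'), '..'))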
SRC_DIR = os.path.join(PROJECT_ROOT, 'src')
if SRC_DIR not in sys.path:
    sys.path.append(SRC_DIR)

from tennis_lab.core.game_score       import GameScore
from tennis_lab.core.set_score        import SetScore
from tennis_lab.core.tiebreak_score   import TiebreakScore
from tennis_lab.core.match_format     import MatchFormat
from tennis_lab.paths.set_path        import SetPath
from tennis_lab.paths.set_probability import probabilityP1WinsSet, _loadCachedFunction

## All Possible Score Paths in a Set

A set consists of games. The `SetPath` class generates all possible game-by-game score progressions from a given starting score, until either one player wins the set or the set is tied at 6-6.

Each path entry is a tuple of `(player1_games, player2_games, next_server)`.
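
To make the tuple format concrete, here is a minimal standalone sketch that enumerates the same kind of paths without `tennis_lab`. The helper `enumerate_set_paths` is hypothetical and written only for illustration; it assumes a set is won at 6 or more games with a 2-game margin, that paths stop at 6-6, and that the serve alternates every game. The ordering and exact contents of what `SetPath.generateAllPaths` returns may differ.

```python
def enumerate_set_paths(set_length=6, first_server=1):
    """Enumerate every game-by-game path from 0-0 (hypothetical helper).

    Each entry is (player1_games, player2_games, next_server). A path stops
    when a player wins the set or the score reaches set_length-all.
    """
    def is_over(g1, g2):
        tied = g1 == set_length and g2 == set_length
        won = max(g1, g2) >= set_length and abs(g1 - g2) >= 2
        return tied or won

    finished = []
    stack = [[(0, 0, first_server)]]
    while stack:
        path = stack.pop()
        g1, g2, server = path[-1]
        if is_over(g1, g2):
            finished.append(path)
            continue
        next_server = 2 if server == 1 else 1    # serve alternates every game
        stack.append(path + [(g1 + 1, g2, next_server)])   # Player 1 wins the game
        stack.append(path + [(g1, g2 + 1, next_server)])   # Player 2 wins the game
    return finished


sketch_paths = enumerate_set_paths()
print(len(sketch_paths))   # 1428
```

Under these assumptions the count splits into 420 paths that end at 6-0 through 6-4 (either player), 504 that end at 7-5, and 504 that reach 6-6.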

In [None]:
# Generate all possible score paths for a set, in increments of one game.
# Paths stop at 6-6 (tied) because the tiebreak is handled separately - 
# a tiebreak has its own set of paths calculated by TiebreakPath.

SET_LENGTH     = 6   # number of games in set
INIT_SCORE     = SetScore(0, 0, isFinalSet=False, matchFormat=MatchFormat(setLength=SET_LENGTH))
PLAYER_SERVING = 1

paths   = SetPath.generateAllPaths(INIT_SCORE, PLAYER_SERVING)
n_paths = len(paths)

# display score scenarios
if n_paths <= 10:
    for p in paths:
        print(p)
else:
    for p in paths[:5]:
        print(p)
    print(f"..... {n_paths - 10} more paths")
    for p in paths[-5:]:
        print(p)

## Set Win Probability vs Serve Probabilities

The probability of winning a set depends on **both players' serve probabilities**, similar to tiebreaks but with even greater amplification due to the longer format.

The `probabilityP1WinsSet` function calculates this probability by:
1. Computing game-winning probabilities from point probabilities
2. Computing tiebreak-winning probability (if needed)
3. Combining these across all possible set paths

The S-curve shape is steeper than for games or tiebreaks - small serve advantages get amplified significantly over a full set.
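
The three steps listed above can be sketched in plain Python. This is an independent illustration, not the `tennis_lab` implementation: the names `prob_hold`, `prob_tiebreak` and `prob_set` are hypothetical, points are assumed to be independent, and the tiebreak is assumed to be first to 7 with a 2-point margin and the usual serve rotation (one point, then two points each).

```python
from functools import lru_cache


def prob_hold(p):
    """Step 1: probability that the server wins a game, given point-win probability p."""
    q = 1.0 - p
    win_before_deuce = p**4 * (1 + 4 * q + 10 * q**2)   # win to love, 15 or 30
    reach_deuce = 20 * p**3 * q**3                       # three points each
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return win_before_deuce + reach_deuce * win_from_deuce


def prob_tiebreak(p1, p2, first_server=1):
    """Step 2: probability that Player 1 wins the tiebreak."""
    @lru_cache(maxsize=None)
    def win(a, b):
        if a >= 7 and a - b >= 2:
            return 1.0
        if b >= 7 and b - a >= 2:
            return 0.0
        if a == 6 and b == 6:
            # From 6-6 the points come in pairs with one serve each.
            up = p1 * (1 - p2)        # Player 1 wins both points
            down = (1 - p1) * p2      # Player 1 loses both points
            return 0.5 if up + down == 0 else up / (up + down)
        n = a + b                     # points played so far
        first_serves = ((n + 1) // 2) % 2 == 0
        server = first_server if first_serves else 3 - first_server
        w = p1 if server == 1 else 1 - p2
        return w * win(a + 1, b) + (1 - w) * win(a, b + 1)
    return win(0, 0)


def prob_set(p1, p2, first_server=1, set_length=6):
    """Step 3: combine game and tiebreak probabilities over all set paths."""
    h1, h2 = prob_hold(p1), prob_hold(p2)

    @lru_cache(maxsize=None)
    def win(g1, g2, server):
        if g1 == set_length and g2 == set_length:
            return prob_tiebreak(p1, p2, first_server=server)
        if g1 >= set_length and g1 - g2 >= 2:
            return 1.0
        if g2 >= set_length and g2 - g1 >= 2:
            return 0.0
        w = h1 if server == 1 else 1 - h2   # chance Player 1 wins this game
        return w * win(g1 + 1, g2, 3 - server) + (1 - w) * win(g1, g2 + 1, 3 - server)
    return win(0, 0, first_server)


for p1 in (0.55, 0.60, 0.65):
    print(p1, round(prob_hold(p1), 3), round(prob_set(p1, 0.60), 3))
```

The recursion visits each `(games, games, server)` state once thanks to memoization, so it stays cheap even though the number of distinct paths is large.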

In [None]:
# Calculate the probability that Player1 wins the set, as a 
# function of the probability that he wins a point when serving.
# There are multiple such curves, for various values of 'p2', the 
# probability that Player2 wins a point when serving.

SET_LENGTH     = 6
INIT_SCORE     = SetScore(0, 0, isFinalSet=False, matchFormat=MatchFormat(setLength=SET_LENGTH))
PLAYER_SERVING = 1

# P1s is the probability that Player1 wins the point when serving, 
# it is the horizontal axis of the graph.
# P2s is the probability that Player2 wins the point when serving,
# it describes a family of curves (one for each entry in P2s)
P1s = np.linspace(0, 1, 30)
P2s = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# plot a family of curves
for p2 in P2s:   
    Ys = probabilityP1WinsSet(INIT_SCORE, PLAYER_SERVING, P1s, p2)
    plt.plot(P1s, Ys, linewidth=0.8, label=f"p2={p2}")

plt.xlabel("probability of winning the point when serving for player 1")
plt.ylabel("probability of winning the set")
plt.title ("Probability of Player 1 Winning the Set")
plt.grid(linewidth=0.2)
plt.legend(fontsize=8)
plt.show()

## Does It Matter Who Serves First?

Similar to tiebreaks, we can analyze whether serving first in a set provides an advantage.

This analysis compares the probability of Player 1 winning when:
- Player 1 serves next (marked with 'x')
- Player 2 serves next (marked with 'o')

**Key insight:** At an **even total game count** (e.g., 0-0, 2-4), the probabilities are identical regardless of who serves next. At an **odd total game count**, they differ slightly.

**Conclusion:** At the start of a set (0-0), it does not matter who serves first!
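
The even/odd pattern can be checked with a small game-level recursion. This is a sketch under stated assumptions, not the library's code: `prob_set_from` is a hypothetical helper, `h1` and `h2` are the players' probabilities of holding serve, and the tiebreak is reduced to a fixed probability `tiebreak` that does not depend on who serves first in it.

```python
from functools import lru_cache


def prob_set_from(g1, g2, server, h1, h2, tiebreak=0.5, set_length=6):
    """P(Player 1 wins the set) from g1-g2 with `server` serving the next game."""
    @lru_cache(maxsize=None)
    def win(a, b, s):
        if a == set_length and b == set_length:
            return tiebreak
        if a >= set_length and a - b >= 2:
            return 1.0
        if b >= set_length and b - a >= 2:
            return 0.0
        w = h1 if s == 1 else 1.0 - h2   # chance Player 1 wins the next game
        return w * win(a + 1, b, 3 - s) + (1.0 - w) * win(a, b + 1, 3 - s)
    return win(g1, g2, server)


for score in [(0, 0), (1, 0), (2, 4), (3, 2)]:
    p1_next = prob_set_from(*score, 1, h1=0.8, h2=0.7)
    p2_next = prob_set_from(*score, 2, h1=0.8, h2=0.7)
    print(score, round(p1_next, 4), round(p2_next, 4))
```

One way to see why the even case is symmetric: deciding a set before 5-5 is equivalent to playing out 10 games and checking who won at least 6. With an even number of games already played, each player serves the same number of the remaining games whichever of them serves next; with an odd number, the next server gets one extra service game.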

In [None]:
# Does it matter who serves first?
# For multiple starting score and probability parameters, we calculate the probability 
# that Player1 wins the set in two scenarios: when P1 and when P2 serves next.
# We note that these two probabilities are the same when the number of games played 
# so far is even, but differ when it is odd.
# CONCLUSION: it does not matter who serves first at the beginning of the set.

# Change the init score below to see how probabilities
# change, depending on where we are in the set.
SET_LENGTH = 6
INIT_SCORE = SetScore(0, 0, isFinalSet=False, matchFormat=MatchFormat(setLength=SET_LENGTH))

# P1s is the probability that Player1 wins the point when serving, 
# it is the horizontal axis of the graph.
# P2s is the probability that Player2 wins the point when serving,
# it describes a family of curves (one for each entry in P2s)
P1s = np.linspace(0, 1, 25)
P2s = [0.25, 0.5, 0.75]

# plot a family of curves
for p2 in P2s:   
    Y1s = probabilityP1WinsSet(INIT_SCORE, 1, P1s, p2)
    Y2s = probabilityP1WinsSet(INIT_SCORE, 2, P1s, p2)
    plt.scatter(P1s, Y2s, marker='o', s=90, alpha=0.3, label=f"p2={p2} P2 serves next")
    plt.scatter(P1s, Y1s, marker='x', s=15, alpha=0.5, color="black", label=f"p2={p2} P1 serves next")

plt.title (f"Probability of Player1 Winning the Set from {INIT_SCORE.games(1)}")
plt.xlabel("probability of winning the point when serving for Player1")
plt.ylabel("probability of winning the set")
plt.xticks(np.arange(0, 1.01, 0.1))
plt.yticks(np.arange(0, 1.01, 0.1))
plt.grid(linewidth=0.3)
plt.legend(fontsize=6)
plt.show()

## Validating Cached Probabilities

For performance, set-winning probabilities can be pre-computed and cached using `scripts/cache-prob-win-set.py`. The `_loadCachedFunction` helper loads these cached values as interpolated functions.

This cell validates that the cached probabilities match the directly computed values.  
- **Black lines** show the true, directly computed probabilities.  
- **Gray circles** show the cached (interpolated) values.

The two should **overlap** at the resolution of the plot, indicating that the caching and interpolation introduce no visible error.
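
A plot only shows agreement by eye, so it can help to put a number on the interpolation error. The sketch below illustrates the general idea with a toy example; everything in it is an assumption made for illustration. `toy_probability` is a smooth stand-in rather than the real set model, and the bilinear lookup in `interpolate` is not necessarily the scheme that `cache-prob-win-set.py` or `_loadCachedFunction` uses.

```python
import math


def toy_probability(p1, p2):
    """Smooth stand-in for an expensive probability calculation (not the real model)."""
    return 1.0 / (1.0 + math.exp(-10.0 * (p1 - p2)))


def build_cache(func, n=21):
    """Tabulate func on an n x n grid over [0, 1] x [0, 1]."""
    step = 1.0 / (n - 1)
    return [[func(i * step, j * step) for j in range(n)] for i in range(n)]


def interpolate(table, p1, p2):
    """Bilinear interpolation of the cached table at (p1, p2)."""
    n = len(table)
    x, y = p1 * (n - 1), p2 * (n - 1)
    i, j = min(int(x), n - 2), min(int(y), n - 2)   # lower-left grid cell
    tx, ty = x - i, y - j                           # position inside the cell
    return ((1 - tx) * (1 - ty) * table[i][j] + tx * (1 - ty) * table[i + 1][j]
            + (1 - tx) * ty * table[i][j + 1] + tx * ty * table[i + 1][j + 1])


table = build_cache(toy_probability)
samples = [k / 200 for k in range(201)]
max_error = max(abs(interpolate(table, p, 0.5) - toy_probability(p, 0.5))
                for p in samples)
print(f"max abs error along p2=0.5: {max_error:.5f}")
```

The same pattern, applied to the real cached function and `probabilityP1WinsSet`, would give a maximum absolute difference to compare against a chosen tolerance.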

In [None]:
# Compare cached probabilities vs true values.
# This is a check that interpolated probabilities cached using 'cache-prob-win-set.py' are correct.

INIT_SCORES = [SetScore(6, 5, isFinalSet=False), 
               SetScore(5, 6, isFinalSet=False)]
PLAYER_SERVING = 1

# P1s is the probability that Player1 wins the point when serving, 
# it is the horizontal axis of the graph.
# P2 is the probability that Player2 wins the point when serving
P1s = np.linspace(0, 1, 30)
P2  = 0.5

# load cached data
Zs = {}
for initScore in INIT_SCORES:
    probWinSetCachedFction = _loadCachedFunction(initScore, PLAYER_SERVING)
    Zs[initScore] = [probWinSetCachedFction(p1, P2) for p1 in P1s]

# calculate probabilities 'from scratch'
Ys = {}
for initScore in INIT_SCORES:
    Ys[initScore] = probabilityP1WinsSet(initScore, PLAYER_SERVING, P1s, P2)

# plot both sets of values
for i, initScore in enumerate(INIT_SCORES):
    label_true  = 'true values' if i == 0 else None
    label_cache = 'cached data' if i == 0 else None
    plt.plot   (P1s, Ys[initScore], color="black", linewidth=0.4, label=label_true)                            # true   values
    plt.scatter(P1s, Zs[initScore], marker='o', s=40, color="lightgrey", edgecolor="grey", label=label_cache)  # cached values
plt.title ("Probability of Winning the Set - Cached vs True")
plt.xlabel("probability of winning the point when serving")
plt.ylabel("probability of winning the set")
plt.xticks(np.arange(0, 1.01, 0.1))
plt.yticks(np.arange(0, 1.01, 0.1))
plt.grid(linewidth=0.1, color='grey')
plt.legend()
plt.show()

## Probability from Arbitrary In-Game Scores

The `probabilityP1WinsSet` function can also calculate probabilities starting from **within a game** - not just at game boundaries.

This is useful for live match analysis. For example, we can compare the set-winning probability when:
- Starting at 0-0, 0-0 (beginning of set)
- Starting at 0-0, 15-0 (one point ahead in first game)
- Starting at 0-0, 30-0 (two points ahead in first game)

The curves show how even small leads within the first game affect overall set-winning probability.
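
The effect of an in-game lead can be sketched in two pieces: the chance of winning the current game from a given point score, and a weighted average over the two outcomes of that game. The helpers `prob_game_from` and `prob_set_in_game` are hypothetical, points are assumed independent, point scores are encoded as 0, 1, 2, 3 for love, 15, 30, 40 (advantage scores are not handled), and the two set probabilities passed in are placeholder values rather than outputs of `probabilityP1WinsSet`.

```python
from functools import lru_cache


def prob_game_from(a, b, p):
    """P(server wins the game) from a-b points, with point-win probability p."""
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def win(x, y):
        if x == 3 and y == 3:
            return p * p / (p * p + q * q)   # deuce, closed form
        if x == 4:
            return 1.0
        if y == 4:
            return 0.0
        return p * win(x + 1, y) + q * win(x, y + 1)
    return win(a, b)


def prob_set_in_game(a, b, p, set_if_game_won, set_if_game_lost):
    """Law of total probability over the outcome of the current game."""
    g = prob_game_from(a, b, p)
    return g * set_if_game_won + (1.0 - g) * set_if_game_lost


for points in [(0, 0), (1, 0), (2, 0)]:
    game = prob_game_from(*points, 0.5)
    # 0.6 and 0.4 are placeholder set probabilities after winning/losing the game
    print(points, game, prob_set_in_game(*points, 0.5, 0.6, 0.4))
# game probabilities at p = 0.5: 0.5, 0.65625, 0.8125
```

With evenly matched points, a 15-0 lead already lifts the chance of winning the game from 0.5 to about 0.66, and that shift carries through to the set in proportion to how much the single game matters.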

In [None]:
# Calculate the probability of winning the set starting from an arbitrary score.

# compare probabilities for multiple scores
INIT_SCORES    = [SetScore(0, 0, isFinalSet=False, gameScore=GameScore(0, 0)), 
                  SetScore(0, 0, isFinalSet=False, gameScore=GameScore(1, 0)),
                  SetScore(0, 0, isFinalSet=False, gameScore=GameScore(2, 0))]
PLAYER_SERVING = 1

# P1s is the probability that Player1 wins the point when serving, 
# it is the horizontal axis of the graph.
# P2 is the probability that Player2 wins the point when serving.
P1s = np.linspace(0, 1, 100)
P2  = 0.5

for score in INIT_SCORES:
    Ys = probabilityP1WinsSet(score, PLAYER_SERVING, P1s, P2)
    plt.plot(P1s, Ys, linewidth=0.8, label=f"initScore: {score.games(1)} {score.currGameScore.asPoints(1)}")

plt.xlabel("probability of winning the point when serving for player 1")
plt.ylabel("probability of winning the set")
plt.title ("Probability of Player 1 Winning the Set")
plt.grid(linewidth=0.2)
plt.legend(fontsize=8)
plt.show()