# Model Evaluation

While we have seen that our model has fairly strong performance metrics, what we really care about is how profitable it is. Therefore we can look at the predicted probabilities for our test set and see how they compare to Bet365 odds.

### Import Libraries

In [None]:
import pandas as pd
import pickle
import warnings

# Ignore PerformanceWarning and UserWarning
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=pd.errors.PerformanceWarning)

### Load Model

In [None]:
# Load model
with open('rf_model.pkl', 'rb') as file:
    rf = pickle.load(file)

### Load Datasets

In [None]:
# Load datasets
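# Forward slashes avoid invalid escape sequences such as '\X' and work on every OS
X_test = pd.read_csv('train_test_datasets/X_test.csv')
# read_csv has no `index` argument (that belongs to to_csv), so it is dropped here
odds_win_test = pd.read_csv('train_test_datasets/odds_win_test.csv')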

### Function to add model predictions and probabilities to odds dataset

First we create a function to add predicted values and probabilities to our odds_win_test dataframe:

In [305]:
def add_preds_probs(odds_win_df, X_test, model):
    """
    Adds model predictions and converted probabilities to the odds DataFrame.

    This function:
    - Uses the provided model to predict win probabilities and outcomes on the test set.
    - Converts the predicted probabilities into an odds-style format (1/probability of outcome).
    - Appends both the predicted class and model-derived odds to the given `odds_win_df`.

    Parameters:
        odds_win_df (pandas.DataFrame): A DataFrame containing odds and outcome data for test matches.
        X_test (pandas.DataFrame): The feature set used for testing.
        model (fitted model): A trained classification model with `predict` and `predict_proba` methods.

    Returns:
        pandas.DataFrame: The input DataFrame updated with two new columns:
                          - 'model_odds': Model-derived odds based on predicted win probabilities.
                          - 'pred_win': Predicted binary outcome (win/loss).
    """
    # Make predictions on the test data
    y_pred = model.predict(X_test)

    # Get class probabilities; column 1 is taken to be the probability of a win
    y_prob = model.predict_proba(X_test)

    # Convert the win probability to decimal odds (1 / probability)
    model_odds = 1 / y_prob[:, 1]

    # Add to odds_win_df
    odds_win_df['model_odds'] = model_odds
    odds_win_df['pred_win'] = y_pred

    return odds_win_df


In [306]:
# Run the function
odds_win_df = add_preds_probs(odds_win_test, X_test, rf)
odds_win_df.head(5)

Unnamed: 0,odds,won,model_odds,pred_win
25,3.5,1,3.335092,0
27,2.62,1,2.473532,0
33,5.5,1,3.220135,0
147,2.5,0,2.950401,0
259,1.11,1,1.631039,1
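
To make the `model_odds` column easier to read, here is a minimal sketch of the probability-to-odds conversion on a made-up `predict_proba` output. The array values are hypothetical, and the column order `[P(loss), P(win)]` is an assumption that matches how the function above picks index 1.

```python
import numpy as np

# Hypothetical predict_proba output: one row per match,
# columns assumed to be [P(loss), P(win)]
proba = np.array([
    [0.70, 0.30],
    [0.60, 0.40],
    [0.20, 0.80],
])

# Decimal odds implied by the win probability
win_prob = proba[:, 1]
model_odds = 1 / win_prob

for p, o in zip(win_prob, model_odds):
    print(f"P(win) = {p:.2f} -> decimal odds = {o:.2f}")
```

Note that a predicted win probability of exactly 0 would give infinite odds, so such rows may need separate handling.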


We can now define functions for calculating profit/loss in different scenarios.

### Function to calculate profit from betting on bookmaker favourite

As a benchmark, we can see how much profit or loss we would have made by simply betting on the favourite. Here, we define the favourite as a player with odds below 1.90 rather than 2.00 to account for Bet365's profit margin: on a market with two equally likely outcomes they offer 1.90 on both sides, which implies a probability of 52.6% for each.
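
The arithmetic behind the 1.90 threshold can be sketched as follows. The helper names `implied_probability` and `overround` are hypothetical and only used for illustration.

```python
def implied_probability(decimal_odds):
    """Probability implied by decimal odds."""
    return 1 / decimal_odds


def overround(odds_list):
    """Bookmaker margin: sum of implied probabilities minus 1."""
    return sum(implied_probability(o) for o in odds_list) - 1


# A market with two equally likely outcomes, priced at 1.90 on both sides
even_market = [1.90, 1.90]

prob_each = implied_probability(1.90)
margin = overround(even_market)

print(f"Implied probability per side: {prob_each:.1%}")
print(f"Bookmaker margin on the market: {margin:.1%}")
```

The implied probabilities sum to more than 100%, and the excess is the margin built into the odds.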

This function will therefore show the hypothetical profit or loss, and the total money staked, should we have bet a £1 stake on all players with bookmaker odds of lower than 1.90:

In [307]:
def returns_from_fav(odds_win_df):
    """
    Calculates and prints the total return from betting £1 on each match where the 
    bookmaker's odds are below 1.90 (i.e. the favourite).

    This function:
    - Filters the matches where the odds suggest a strong favourite (odds < 1.90).
    - Assumes a flat £1 stake on each qualifying match.
    - Calculates profit or loss per match based on the actual outcome.
    - Outputs the total return and total amount staked.

    Parameters:
        odds_win_df (pandas.DataFrame): A DataFrame containing at least the columns 'odds' and 'won'.

    Returns:
        None
    """
    # Filter the DataFrame for rows where the odds are less than 1.90
    filtered_df = odds_win_df[odds_win_df['odds'] < 1.90].copy()

    # Set the stake as £1
    filtered_df.loc[:, 'stake'] = 1

    # Calculate the returns for each bet
    returns = filtered_df.apply(
        lambda row: (row['odds'] * row['stake']) - row['stake'] if row['won'] else -row['stake'],
        axis=1
    )

    # Calculate the total return and total staked
    total_return = returns.sum()
    total_staked = filtered_df['stake'].sum()

    # Print the total return and total staked
    print(f"Total return from betting on favourite: £{total_return:.2f}")
    print(f"Total staked: £{total_staked:.2f}")


In [308]:
returns_from_fav(odds_win_df)

Total return from betting on favourite: £-345.29
Total staked: £6356.00
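
Since each strategy stakes a different total amount, it can help to express the result as a return on the money staked. A minimal sketch, using the two figures printed above and a hypothetical `roi` helper:

```python
def roi(total_return, total_staked):
    """Return on investment as a fraction of the money staked."""
    if total_staked == 0:
        raise ValueError("total_staked must be non-zero")
    return total_return / total_staked


# Figures printed by returns_from_fav on the test set
favourite_roi = roi(-345.29, 6356.00)
print(f"ROI from betting on the favourite: {favourite_roi:.2%}")
```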


### Function to calculate profit/loss from betting on model win predictions

We can now build a function to calculate the hypothetical profit or loss, and the money staked, had we bet a £1 stake on every match in which the model predicted the player to win:

In [311]:
def returns_from_model_pred_win(odds_win_df):
    """
    Calculates the return from betting £1 on each match where the model predicted a win.

    This function:
    - Filters the dataset to include only rows where the model predicted a win ('pred_win' == 1).
    - Assumes a flat £1 stake on each bet.
    - Computes net profit for each prediction based on actual match outcomes and bookmaker odds.
    - Prints the total net profit and total amount staked.

    Parameters:
        odds_win_df (pandas.DataFrame): A DataFrame containing at least the columns:
                                        'pred_win', 'won', and 'odds'.

    Returns:
        None
    """
    # Filter the DataFrame for rows where the model predicted a win
    filtered_df = odds_win_df[odds_win_df['pred_win'] == 1].copy()

    # Set the stake as £1
    filtered_df.loc[:, 'stake'] = 1

    # Calculate the net profit (or loss) for each bet
    filtered_df['net_profit'] = filtered_df.apply(
        lambda row: (row['odds'] * row['stake']) - row['stake'] if row['won'] else -row['stake'],
        axis=1
    )

    # Calculate the total net profit and total staked
    total_net_profit = filtered_df['net_profit'].sum()
    total_staked = filtered_df['stake'].sum()

    # Print the total net profit and total staked
    print(f"Total net profit from betting on player model predicted to win: £{total_net_profit:.2f}")
    print(f"Total staked: £{total_staked:.2f}")


In [312]:
returns_from_model_pred_win(odds_win_df)

Total net profit from betting on player model predicted to win: £133.59
Total staked: £6722.00


### Function to calculate profit/loss from betting on model odds lower than bookmakers

We can now build a function to calculate the hypothetical profit or loss, and the money staked, had we bet a £1 stake on every match for which the model's odds were lower than the bookmaker's, i.e. where the model gives the player a higher chance of winning than the bookmaker does:

In [313]:
def returns_from_lower_odds(odds_win_df):
    """
    Calculates the return from betting £1 on matches where the model's odds imply a 
    higher chance of winning than the bookmaker's odds.

    This strategy assumes a bet is placed when the model's implied probability 
    (converted to odds) suggests better value than the bookmaker.

    The function:
    - Filters for cases where model odds are lower than bookmaker odds.
    - Applies a flat £1 stake per bet.
    - Computes the return or loss for each bet based on actual outcomes.
    - Prints the total return and amount staked.

    Parameters:
        odds_win_df (pandas.DataFrame): DataFrame containing at least the following columns:
                                        'model_odds', 'odds', and 'won'.

    Returns:
        None
    """
    # Filter for value bets based on lower model odds
    filtered_df = odds_win_df[odds_win_df['model_odds'] < odds_win_df['odds']].copy()

    # Set a flat stake of £1 per bet
    filtered_df.loc[:, 'stake'] = 1

    # Calculate returns for each bet
    returns = filtered_df.apply(
        lambda row: (row['odds'] * row['stake']) - row['stake'] if row['won'] else -row['stake'],
        axis=1
    )

    # Aggregate total return and total amount staked
    total_return = returns.sum()
    total_staked = filtered_df['stake'].sum()

    # Print summary
    print(f"Total returns from betting on players where model odds are lower than bookmakers: £{total_return:.2f}")
    print(f"Total staked: £{total_staked:.2f}")


In [314]:
returns_from_lower_odds(odds_win_df)

Total returns from betting on players where model odds are lower than bookmakers: £26.58
Total staked: £5317.00
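
The rule "model odds lower than bookmaker odds" is the same as saying the bet has positive expected profit if the model's probability is taken at face value. The sketch below checks this on the five rows shown in the earlier table; `expected_profit_per_unit` and `is_value_bet` are hypothetical helper names.

```python
def expected_profit_per_unit(model_prob, bookmaker_odds):
    """Expected profit of a 1-unit bet, assuming the model probability is correct."""
    return model_prob * (bookmaker_odds - 1) - (1 - model_prob)


def is_value_bet(model_odds, bookmaker_odds):
    """Selection rule used above: model odds below bookmaker odds."""
    return model_odds < bookmaker_odds


# (bookmaker odds, model odds) taken from the first five rows of odds_win_df
rows = [
    (3.5, 3.335092),
    (2.62, 2.473532),
    (5.5, 3.220135),
    (2.5, 2.950401),
    (1.11, 1.631039),
]

results = []
for bookmaker_odds, model_odds in rows:
    model_prob = 1 / model_odds
    ev = expected_profit_per_unit(model_prob, bookmaker_odds)
    results.append((bookmaker_odds, model_odds, ev))
    print(f"odds={bookmaker_odds:.2f} model_odds={model_odds:.2f} "
          f"expected profit={ev:+.3f} value_bet={is_value_bet(model_odds, bookmaker_odds)}")
```

This only shows that the two conditions agree. Whether the model's probabilities are accurate enough for the expected profit to materialise is what the profit/loss figures test.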


### Function to calculate profit/loss from betting on model win predictions with odds lower than bookmakers

We can now build a function to calculate the hypothetical profit or loss, and the money staked, had we bet a £1 stake on every match in which the model predicted the player to win and the model's odds were lower than the bookmaker's, i.e. where the model predicts a win and rates the player's chances more highly than the bookmaker does:

In [315]:
def returns_lower_odds_and_pred_win(odds_win_df):
    """
    Calculates the return from betting £1 only on matches where:
    - The model predicted a win (`pred_win` == 1), and
    - The model's odds implied a higher probability (i.e. were lower) than the bookmaker's odds.

    This strategy combines confidence (prediction of a win) with value (model odds offering 
    better value than the bookmaker), simulating a more selective betting approach.

    The function:
    - Filters the test data accordingly.
    - Applies a flat £1 stake per qualifying match.
    - Calculates the return for each bet based on the match outcome.
    - Prints the total return and total amount staked.

    Parameters:
        odds_win_df (pandas.DataFrame): DataFrame including 'pred_win', 'model_odds',
                                        'odds', and 'won' columns.

    Returns:
        None
    """
    # Filter for predicted wins where model odds < bookmaker odds
    filtered_df = odds_win_df[
        (odds_win_df['pred_win'] == 1) &
        (odds_win_df['model_odds'] < odds_win_df['odds'])
    ].copy()

    # Flat £1 stake per qualifying match
    filtered_df.loc[:, 'stake'] = 1

    # Calculate returns
    returns = filtered_df.apply(
        lambda row: (row['odds'] * row['stake']) - row['stake'] if row['won'] else -row['stake'],
        axis=1
    )

    # Total return and total staked
    total_return = returns.sum()
    total_staked = filtered_df['stake'].sum()

    # Print summary
    print(f"Total returns from betting on players where model odds are lower than B365 and model predicted to win: £{total_return:.2f}")
    print(f"Total staked: £{total_staked:.2f}")


In [316]:
returns_lower_odds_and_pred_win(odds_win_df)

Total returns from betting on players where model odds are lower than B365 and model predicted to win: £316.53
Total staked: £2951.00


### Function to calculate profit/loss from betting on model lose predictions with odds lower than bookmakers

We can now build a function to calculate the hypothetical profit or loss, and the money staked, had we bet a £1 stake on every match in which the model predicted the player to lose but the model's odds were still lower than the bookmaker's, i.e. where the model predicts a loss but still gives the player more chance of winning than the bookmaker does:

In [317]:
def returns_lower_odds_and_pred_lose(odds_win_df):
    """
    Calculates the return from betting £1 on matches where:
    - The model predicted the player would lose (`pred_win` == 0), and
    - The model's odds were lower than the bookmaker's odds, implying better value.

    This strategy explores cases where the model disagrees with the implied probability of the bookmaker,
    suggesting possible underestimation of the player's chances.

    The function:
    - Filters the dataset accordingly.
    - Applies a flat £1 stake to each selected match.
    - Calculates the return or loss for each based on actual outcomes.
    - Prints the total return and total staked.

    Parameters:
        odds_win_df (pandas.DataFrame): A DataFrame containing 'pred_win', 'model_odds',
                                        'odds', and 'won' columns.

    Returns:
        None
    """
    # Filter for predicted losses where model odds < bookmaker odds
    filtered_df = odds_win_df[
        (odds_win_df['pred_win'] == 0) &
        (odds_win_df['model_odds'] < odds_win_df['odds'])
    ].copy()

    # Flat £1 stake per selected bet
    filtered_df.loc[:, 'stake'] = 1

    # Calculate returns for each bet
    returns = filtered_df.apply(
        lambda row: (row['odds'] * row['stake']) - row['stake'] if row['won'] else -row['stake'],
        axis=1
    )

    # Total return and total amount staked
    total_return = returns.sum()
    total_staked = filtered_df['stake'].sum()

    # Print summary
    print(f"Total returns from betting on players where model odds are lower than B365 and model predicted to lose: £{total_return:.2f}")
    print(f"Total staked: £{total_staked:.2f}")


In [318]:
returns_lower_odds_and_pred_lose(odds_win_df)

Total returns from betting on players where model odds are lower than B365 and model predicted to lose: £-289.95
Total staked: £2366.00
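
The five functions above share the same profit rule and differ only in which rows they select. As an alternative sketch (not the implementation used in this notebook), the selection can be passed in as a boolean mask and the profit computed without `apply`. The helper `flat_stake_returns` and the toy data are made up for illustration.

```python
import pandas as pd


def flat_stake_returns(df, mask, stake=1.0):
    """Total profit and total staked for a flat-stake bet on the rows selected by mask."""
    bets = df[mask]
    won = bets['won'].astype(bool)
    # Winning bets earn (odds - 1) * stake; losing bets lose the stake
    profit = ((bets['odds'] - 1) * stake).where(won, -stake)
    return profit.sum(), stake * len(bets)


# Toy data, not taken from the real test set
toy = pd.DataFrame({
    'odds':       [1.50, 2.40, 3.00, 1.80],
    'won':        [1, 0, 1, 0],
    'model_odds': [1.40, 2.60, 2.50, 1.70],
    'pred_win':   [1, 0, 0, 1],
})

lower = toy['model_odds'] < toy['odds']
strategies = {
    'favourite': toy['odds'] < 1.90,
    'pred_win': toy['pred_win'] == 1,
    'lower_odds': lower,
    'lower_odds_and_pred_win': (toy['pred_win'] == 1) & lower,
    'lower_odds_and_pred_lose': (toy['pred_win'] == 0) & lower,
}

results = {name: flat_stake_returns(toy, mask) for name, mask in strategies.items()}
for name, (profit, staked) in results.items():
    print(f"{name}: profit £{profit:.2f} on £{staked:.2f} staked")
```

Returning the two totals instead of only printing them also makes it easier to compare strategies afterwards.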


## Evaluating on 2023-2025 Results

We can now see how profitable the model would have been if we had built it in 2022 and run it on all games from 2023 to March 2025. Since this data was completely removed from the model-building process, and occurred after all the data in the training and testing sets, it should give a more realistic indication of how the model might perform on future data.

First we can load the previously saved evaluation dataset as well as the training dataset:

In [None]:
# Load datasets
# Forward slashes avoid invalid escape sequences such as '\p' and work on every OS
player_match_df_23_25 = pd.read_csv('datasets/player_match_df_23_25.csv')
X_train = pd.read_csv('train_test_datasets/X_train.csv')

### Function to prepare evaluation dataset for analysis

We will now build a function to filter this evaluation dataset to only include the columns that were used in the model training, and then generate predictions and probabilities on the evaluation dataset using the final selected model:

In [325]:
def prepare_odds_predictions(df, model, X_train):
    """
    Prepares a test set for prediction and adds model-based predictions and odds to the original DataFrame.

    This function:
    - Extracts the feature columns used during training from the provided X_train.
    - Selects these columns from the input DataFrame, along with 'odds', 'won', and 'year'.
    - Uses the trained model to predict outcomes and model-implied odds.
    - Appends predictions to the original DataFrame using `add_preds_probs`.

    Parameters:
        df (pandas.DataFrame): The full DataFrame to generate predictions on (e.g. test matches).
        model (fitted model): A trained classification model with `predict` and `predict_proba` methods.
        X_train (pandas.DataFrame): The training DataFrame, used to extract the list of feature columns.

    Returns:
        pandas.DataFrame: A copy of the feature and reference columns ('odds', 'won', 'year')
                          with added 'model_odds' and 'pred_win' columns.
    """
    # Get feature column names from X_train
    feature_columns = X_train.columns.tolist()

    # Subset input DataFrame to include only features and reference columns, then make a copy
    df_subset = df[feature_columns + ['odds', 'won', 'year']].copy()

    # Create feature matrix for prediction
    X_test = df_subset.drop(columns=['odds', 'won', 'year'])

    # Add model predictions and model-derived odds
    df_with_preds = add_preds_probs(df_subset, X_test, model)

    return df_with_preds


In [None]:
# Run the function
odds_win_df_23_25 = prepare_odds_predictions(player_match_df_23_25, rf, X_train)

### Run profit/loss functions on evaluation dataset

We can now run the previously defined profit/loss functions to show how profitable the model would have been on our evaluation dataset for the different scenarios:

In [None]:
returns_from_fav(odds_win_df_23_25)

Total return from betting on favourite: £-272.50
Total staked: £5960.00


In [None]:
returns_from_model_pred_win(odds_win_df_23_25)

Total net profit from betting on player model predicted to win: £-78.80
Total staked: £6183.00


In [None]:
returns_from_lower_odds(odds_win_df_23_25)

Total returns from betting on players where model odds are lower than bookmakers: £-115.68
Total staked: £4764.00


In [None]:
returns_lower_odds_and_pred_win(odds_win_df_23_25)

Total returns from betting on players where model odds are lower than B365 and model predicted to win: £80.12
Total staked: £2845.00


In [None]:
returns_lower_odds_and_pred_lose(odds_win_df_23_25)

Total returns from betting on players where model odds are lower than B365 and model predicted to lose: £-195.80
Total staked: £1919.00
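
Because the evaluation DataFrame keeps the `year` column, one further check is whether a strategy's result is spread evenly across years or driven by a single one. The sketch below does this for the "predicted win and lower model odds" selection on made-up data; `yearly_breakdown` is a hypothetical helper and the toy rows are not real results.

```python
import numpy as np
import pandas as pd


def yearly_breakdown(df, stake=1.0):
    """Profit, stake and ROI per year for predicted wins priced below the bookmaker."""
    bets = df[(df['pred_win'] == 1) & (df['model_odds'] < df['odds'])].copy()
    bets['stake'] = stake
    # Winning bets earn (odds - 1) * stake; losing bets lose the stake
    bets['profit'] = np.where(bets['won'] == 1, (bets['odds'] - 1) * stake, -stake)
    summary = bets.groupby('year')[['profit', 'stake']].sum()
    summary['roi'] = summary['profit'] / summary['stake']
    return summary


# Toy data, not taken from the real evaluation set
toy = pd.DataFrame({
    'year':       [2023, 2023, 2024, 2024, 2025],
    'odds':       [2.00, 1.60, 1.90, 2.20, 1.50],
    'won':        [1, 0, 1, 1, 0],
    'model_odds': [1.80, 1.50, 1.70, 2.50, 1.40],
    'pred_win':   [1, 1, 1, 1, 1],
})

summary = yearly_breakdown(toy)
print(summary)
```

On the real data, the same call could be made with `odds_win_df_23_25` in place of `toy`.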


## Summary

We can see that, on the 2023-2025 evaluation data, our model would have been marginally profitable (a return of about 2.8% on the money staked) had we bet on every game where the model predicted the player to win and the model’s odds were lower than the bookmaker's. The other four strategies all lost money over the same period. While the profitable result is encouraging and suggests that the model may be identifying genuine value, it's important to interpret these results with caution.

Sports betting outcomes are inherently noisy and prone to short-term fluctuations. An edge of roughly 2.8%, while promising, can easily be wiped out by variance, particularly over smaller sample sizes or streaky outcomes. Additionally, betting markets evolve: the model's performance may degrade over time as bookmakers adjust, market efficiency improves, or player dynamics shift.
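
One way to get a feel for this variance is to bootstrap the per-bet profits and look at how widely the resampled ROI spreads. The sketch below uses synthetic bets, not the notebook's data: 2,845 £1 bets at assumed odds of 1.80 with an assumed 57% win rate. The helper `bootstrap_roi_interval` is hypothetical.

```python
import numpy as np


def bootstrap_roi_interval(profits, stake=1.0, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for ROI from per-bet profits."""
    rng = np.random.default_rng(seed)
    profits = np.asarray(profits, dtype=float)
    n = len(profits)
    # Resample bets with replacement and recompute the ROI each time
    idx = rng.integers(0, n, size=(n_resamples, n))
    rois = profits[idx].sum(axis=1) / (stake * n)
    lower, upper = np.quantile(rois, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Synthetic per-bet profits (assumed odds and win rate, for illustration only)
rng = np.random.default_rng(42)
won = rng.random(2845) < 0.57
profits = np.where(won, 0.80, -1.0)

observed = profits.mean()
lower, upper = bootstrap_roi_interval(profits)
print(f"Observed ROI: {observed:.2%}")
print(f"95% bootstrap interval: {lower:.2%} to {upper:.2%}")
```

If the interval on the real per-bet profits comfortably includes zero, the observed edge cannot be distinguished from noise at that sample size.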

It's also worth noting that a model showing marginal profitability under simulated conditions doesn’t guarantee consistent success in live betting environments. Factors like odds movement (line changes), delays in placing bets, limited market liquidity, and bet restrictions can all affect real-world returns.

Nonetheless, this initial performance provides a strong foundation. With further refinement — such as improved feature engineering, dynamic model calibration, or integration of live odds tracking — the model could potentially deliver more robust and sustainable profits over time.