# 800m Calculator

This notebook contains the functions used to produce an 800m racing and training calculator.

## Table of Contents

#### [Helper Functions](#Helper-Functions)
* [Data Conversion Functions](#Data-Conversion-Functions)
* [Modeling Functions](#Modeling-Functions)
* [Prediction Functions](#Prediction-Functions)

#### [Training Predictors](#Training-Predictors)
* [600m x 3 Training](#600m-x-3-Training)
* [600m, 400m x 3 Training](#600m,-400m-x-3-Training)
* [600m, 300m x 4 Training](#600m,-300m-x-4-Training)
* [500m x 3 Training](#500m-x-3-Training)
* [300-400-500-400-300-200m Ladder Training](#300-400-500-400-300-200m-Ladder-Training)
* [300m x 3 Training (2 sets)](#300m-x-3-Training-(2-sets))
* [200m x 8 Training](#200m-x-8-Training)

## Imports

In [5]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import os
import pickle
import math

## Helper Functions

### Data Conversion Functions

These functions convert time strings (such as `"1:54.34"`) to seconds, and convert seconds back into a formatted `M:SS.ss` string once calculations are done.

In [8]:
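def convert_to_seconds(time_str: str) -> float:
    """
    Converts a time string to seconds.

    Accepted formats:
        "58"       whole seconds
        "58.77"    seconds (two or more digits before the period)
        "1:54.34"  minutes:seconds
        "1.54"     one-digit minutes, a period, then whole seconds (114 s)
        "1.54.34"  minutes.seconds.fraction

    A one-digit seconds field is padded with a trailing "0", so "1:5" is read as 1:50.

    Raises:
        ValueError: if the string cannot be parsed.
    """
    try:
        if not any(c.isdigit() for c in time_str):
            raise ValueError("Invalid input: Only numbers, colons, and periods are allowed. Please do not use words or letters.")
        # Whole seconds with no separator, e.g. "58"
        if ":" not in time_str and "." not in time_str:
            return float(time_str)
        # Minutes and seconds separated by a colon, e.g. "1:54.34"
        if ":" in time_str:
            parts = time_str.split(":")
            if len(parts) == 2:
                minutes = int(parts[0])
                if len(parts[1]) < 2:
                    parts[1] += "0"
                seconds = float(parts[1])
                return minutes * 60 + seconds
            elif len(parts) > 2:
                raise ValueError("Too many colons.")
        # Period-separated input, e.g. "58.77", "1.54", or "1.54.34"
        if "." in time_str:
            parts = time_str.split(".")
            if len(parts) == 2:
                if len(parts[0]) == 1:
                    # One digit before the period is treated as minutes
                    minutes = int(parts[0])
                else:
                    # Otherwise the string is a plain number of seconds
                    return float(time_str)
                if len(parts[1]) < 2:
                    parts[1] += "0"
                seconds = int(parts[1])
                return minutes * 60 + seconds
            elif len(parts) == 3:
                minutes = int(parts[0])
                seconds = int(parts[1])
                fractional_seconds = float("0." + parts[2])
                return minutes * 60 + seconds + fractional_seconds
            else:
                raise ValueError("Too many dots.")
        raise ValueError("Could not parse input.")
    except ValueError as err:
        # Re-raise with a user-facing message; the specific reason stays in the traceback
        raise ValueError("Invalid input: Only numbers, colons, and periods are allowed. Please do not use words or letters.") from err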

In [9]:
def seconds_to_minutes(seconds: float) -> str:
    total_seconds = round(seconds, 2)
    minutes = int(total_seconds // 60)
    remaining_seconds = total_seconds - minutes * 60
    if remaining_seconds >= 59.995:  # If rounding pushes it to 60
        minutes += 1
        remaining_seconds = 0.0
    if minutes == 0:
        return f"{remaining_seconds:05.2f}"
    else:
        return f"{minutes}:{remaining_seconds:05.2f}"
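
The carry check in `seconds_to_minutes` guards against a seconds field that displays as `60.00`. The sketch below shows a hypothetical alternative, `format_time` (not part of this notebook), that produces the same output format by rounding to whole hundredths first and splitting them with `divmod`, so the seconds field can never exceed 59.99. It assumes non-negative inputs, and in rare half-way cases its rounding may differ from `round(seconds, 2)` because it rounds a different float.

```python
def format_time(seconds: float) -> str:
    """Format a non-negative time in seconds as 'M:SS.ss', or 'SS.ss' under one minute.

    Hypothetical alternative to seconds_to_minutes (not part of this notebook).
    """
    hundredths = round(seconds * 100)        # whole hundredths of a second (int)
    minutes, rem = divmod(hundredths, 6000)  # 6000 hundredths in a minute
    secs, frac = divmod(rem, 100)            # whole seconds and leftover hundredths
    if minutes == 0:
        return f"{secs:02d}.{frac:02d}"
    return f"{minutes}:{secs:02d}.{frac:02d}"


print(format_time(114.66666666666661))  # 1:54.67
print(format_time(119.999))             # 2:00.00 (carry handled by divmod)
print(format_time(9.5))                 # 09.50
```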

In [10]:
def is_valid_time_format(time_str):
    try:
        convert_to_seconds(time_str)
        return True
    except Exception:
        return False

### Modeling Functions

#### Generating Training Tables

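Generates a synthetic training pace table. Each row pairs an 800m target time with the matching interval paces, and every pace slows by a fixed increment for each additional second of 800m time.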

In [13]:
def generate_training_table(
    start_800,
    start_intervals,
    increments,
    num_rows,
    interval_names=None
):
    """
    Generalized training pace table generator.

    Args:
        start_800 (float): Starting 800m time (in seconds).
        start_intervals (list of floats): Starting paces for each interval type (in seconds).
        increments (list of floats): Amount each interval pace increases per 1s increment of 800m.
        num_rows (int): Number of data points to generate.
        interval_names (list of str, optional): Names for each interval column.

    Returns:
        pd.DataFrame: Training table with columns ['TARGET', interval_names...]
    """
    data = {
        "TARGET": [start_800 + i for i in range(num_rows)]
    }
    for idx, (start, inc) in enumerate(zip(start_intervals, increments)):
        col_name = interval_names[idx] if interval_names else f"interval_{idx+1}"
        data[col_name] = [start + i * inc for i in range(num_rows)]
    return pd.DataFrame(data)
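
Row `i` of the table pairs a target of `start_800 + i` with interval paces of `start + i * inc`, so the pace for any whole-second target lies on a straight line. The sketch below uses a hypothetical helper, `pace_for_target`, to compute one entry of the 600m x 3 table directly, and shows the target range that `num_rows=144` produces with `start_800=96` (the settings used throughout this notebook).

```python
def pace_for_target(target_800: float, start_800: float, start_pace: float, increment: float) -> float:
    """Interval pace for an 800m target, using the same linear rule as generate_training_table."""
    return start_pace + increment * (target_800 - start_800)


# 600m x 3 table: the first 600m starts at 74 s and slows 0.75 s per 1 s of 800m time.
print(pace_for_target(114, 96, 74, 0.75))  # 87.5

# num_rows=144 with start_800=96 gives targets of 96..239 s (1:36 to 3:59).
targets = [96 + i for i in range(144)]
print(targets[0], targets[-1])             # 96 239
```

That range lines up with the 1:36 and 4:00 sanity bounds applied in `predict_800m`.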

#### Loading Model from Pickle File

Loads a model from an exported pickle file.

In [15]:
def load_model(filepath):
    with open(filepath, 'rb') as file:
        model = pickle.load(file)
    return model

#### General Linear Regression Model Function
Fits a LinearRegression model and saves it to disk.

In [17]:
def fit_and_export_model(
    df,
    feature_cols,
    target_col,
    export_path
):
    """
    Fits a LinearRegression model using specified feature columns,
    saves the model to disk, and returns the fitted model.

    Args:
        df (pd.DataFrame): Training data.
        feature_cols (list of str): Names of the feature columns.
        target_col (str): Name of the target column.
        export_path (str): Where to save the trained model (.pkl).

    Returns:
        model: The trained LinearRegression model.
    """
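    # LinearRegression, os, and pickle are imported at the top of the notebook.
    # Create the output directory (e.g. "models/") if it does not exist yet.
    export_dir = os.path.dirname(export_path)
    if export_dir:
        os.makedirs(export_dir, exist_ok=True)

    X = df[feature_cols]
    y = df[target_col]
    model = LinearRegression()
    model.fit(X, y)
    with open(export_path, 'wb') as f:
        pickle.dump(model, f)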
    return model
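
In every synthetic table here, each feature column has the form `start_j + c_j * (TARGET - start_800)`, so the feature columns are perfectly collinear and many coefficient vectors fit the data exactly. scikit-learn's `LinearRegression` solves dense problems with a least-squares routine (`scipy.linalg.lstsq`) that returns the minimum-norm solution when the system is rank-deficient. Assuming that holds here, the fitted weights should be `c_j / sum(c_k**2)`. The hypothetical helper `min_norm_coefficients` computes those expected weights without scikit-learn, which gives a quick check against `model.coef_`.

```python
def min_norm_coefficients(increments):
    """Expected weights when each feature equals start_j + c_j * t.

    After centering, any w with sum(c_j * w_j) == 1 reproduces the target
    exactly; the shortest such w is c / ||c||^2.
    """
    norm_sq = sum(c * c for c in increments)
    return [c / norm_sq for c in increments]


print(min_norm_coefficients([0.75, 0.75, 0.75]))  # 600m x 3: each 4/9, about 0.444
print(min_norm_coefficients([0.75, 0.5]))         # 600m, 400m x 3: about 0.923 and 0.615
```

Under this assumption, the interval with the largest increment gets the largest weight. In the ladder table that would be the 500m rep (increment 0.625).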

### Prediction Functions

In [19]:
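def predict_800m(model, feature_cols, input_values):
    """
    Predicts an 800m time from interval times.

    Args:
        model: Fitted model returned by fit_and_export_model or load_model.
        feature_cols (list of str): Feature column names, in the order used for fitting.
        input_values (list): One entry per feature, either a time string or a list
            of time strings that is averaged into a single value.

    Returns:
        dict: {"predicted_seconds": float, "predicted_formatted": str}

    Raises:
        ValueError: if the prediction is faster than 1:36 or slower than 4:00.
    """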
    processed = []
    for val in input_values:
        if isinstance(val, list):
            avg = sum(convert_to_seconds(x) for x in val) / len(val)
            processed.append(avg)
        else:
            processed.append(convert_to_seconds(val))
    X = pd.DataFrame([processed], columns=feature_cols)
    prediction = model.predict(X)[0]
    # Out-of-range handling
    if prediction < 96:
        raise ValueError("Predicted time is too fast to be realistic (less than 1:36). Please check your inputs.")
    if prediction > 240:
        raise ValueError("Predicted time is too slow (over 4:00). Please check your inputs.")
    return {
        "predicted_seconds": float(prediction),
        "predicted_formatted": seconds_to_minutes(prediction)
    }
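
Under the same minimum-norm assumption, the model's prediction reduces to a closed form: `start_800 + sum(c_j * (x_j - start_j)) / sum(c_j**2)`, where `x_j` are the input times in seconds (already averaged where a list of splits is passed). The hypothetical `closed_form_predict` below evaluates this with plain Python. The inputs are the 600m x 3, 600m/400m, and 200m x 8 examples later in this notebook, and the results agree with their printed predictions (about 114.67 s, 110.50 s, and 113.875 s).

```python
def closed_form_predict(start_800, start_intervals, increments, inputs):
    """800m prediction implied by minimum-norm weights on a synthetic linear table.

    inputs are interval times in seconds, one per feature column.
    """
    norm_sq = sum(c * c for c in increments)
    shift = sum(c * (x - s) for c, x, s in zip(increments, inputs, start_intervals))
    return start_800 + shift / norm_sq


# 600m x 3 example: 1:24, 1:26, 1:28
print(closed_form_predict(96, [74, 72, 70], [0.75] * 3, [84, 86, 88]))

# 600m, 400m x 3 example: 1:22.43 plus the average of three 400m splits
avg_400 = (58.77 + 56.02 + 56.70) / 3
print(closed_form_predict(96, [71.5, 50], [0.75, 0.5], [82.43, avg_400]))

# 200m x 8 example
splits_200 = [26.43, 26.78, 27.10, 27.30, 26.78, 26.79, 26.44, 26.13]
print(closed_form_predict(96, [22.25] * 8, [0.25] * 8, splits_200))
```

With equal increments only the sum of the inputs matters, so reordering the reps leaves the prediction unchanged. This is why the 600m x 3 example, which runs its fastest rep first (the reverse of the table's pattern), still gets a sensible answer.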

In [20]:
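def reverse_predict(df, target_col, goal_time, interval_cols, rounding=None):
    """
    Returns the interval paces that correspond to an 800m goal time.

    The goal is linearly interpolated between the two table rows whose whole-second
    targets bracket it. Each pace is then rounded to the nearest multiple of `rounding`
    seconds (default 0.5; pass one number or a per-column list).

    Returns:
        list of dict: [{"interval": column name, "seconds": pace}, ...]

    Raises:
        ValueError: if the goal time falls outside the table.
    """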
    val = convert_to_seconds(goal_time)
    upper = math.ceil(val)
    frac, lower = math.modf(val)
    upper_row = df[df[target_col] == upper]
    lower_row = df[df[target_col] == lower]
    if upper_row.empty or lower_row.empty:
        raise ValueError("Goal time is out of range.")
    if rounding is None:
        rounding = [0.5] * len(interval_cols)
    elif isinstance(rounding, (float, int)):
        rounding = [rounding] * len(interval_cols)
    splits = []
    for idx, col in enumerate(interval_cols):
        interp = (
            upper_row[col].values[0] * frac +
            lower_row[col].values[0] * (1 - frac)
        )
        rounded = round(interp / rounding[idx]) * rounding[idx]
        splits.append({
            "interval": col,
            "seconds": float(rounded)
        })
    return splits
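
Because each column is a straight line in `TARGET`, interpolating between the two bracketing rows gives the same value as evaluating the line at the goal itself: `start_j + c_j * (goal - start_800)`. The hypothetical `closed_form_splits` sketch below skips the table lookup entirely. Unlike `reverse_predict`, it does not reject goals outside the table; it extrapolates instead of raising `ValueError`. Python's `round()` sends exact `.5` ties to the even integer, which affects both versions in the same way.

```python
def closed_form_splits(goal_seconds, start_800, start_intervals, increments, step=0.5):
    """Interval paces for a goal time, rounded to the nearest multiple of `step` seconds.

    Evaluates each column's line directly rather than interpolating table rows.
    """
    splits = []
    for start, inc in zip(start_intervals, increments):
        exact = start + inc * (goal_seconds - start_800)
        splits.append(round(exact / step) * step)  # round() breaks .5 ties to even
    return splits


# 600m x 3 table, goal 1:54.34
print(closed_form_splits(114.34, 96, [74, 72, 70], [0.75] * 3))                 # [88.0, 86.0, 84.0]
# 300m x 3 (2 sets) table, goal 1:54.34, rounded to 0.25 s
print(closed_form_splits(114.34, 96, [33.75, 33], [0.375, 0.375], step=0.25))   # [40.75, 40.0]
```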

## Training Predictors

### 600m x 3 Training

#### Data Generation

Generates a synthetic dataset of 800m targets and matching interval times, used in the absence of real training data.

In [24]:
table_600 = generate_training_table(
    start_800=96,
    start_intervals=[74, 72, 70],
    increments=[0.75, 0.75, 0.75],
    num_rows=144,
    interval_names=["First 600m", "Second 600m", "Third 600m"]
)

In [25]:
os.makedirs("tables", exist_ok=True)  # make sure the output folder exists
table_600.to_csv(os.path.join(os.getcwd(), "tables", "600.csv"), index=False)

#### Model Fit and Export

Trains a sklearn LinearRegression model. Inputs are the three 600m times. The output variable is the 800m race time.

The model is exported and saved as `model_600.pkl` in the `models/` folder of the working directory.

In [27]:
model_600 = fit_and_export_model(
    df=table_600,
    feature_cols=["First 600m", "Second 600m", "Third 600m"],
    target_col="TARGET",
    export_path="models/model_600.pkl"
)

#### Prediction Function

Takes input data of three 600m times, converts these to seconds, feeds these into our Linear Regression model as input, and outputs an 800m time prediction.

In [29]:
prediction = predict_800m(
    model=model_600,
    feature_cols=["First 600m", "Second 600m", "Third 600m"],
    input_values=["1:24", "1:26", "1:28"]
)
print(prediction)

{'predicted_seconds': 114.66666666666661, 'predicted_formatted': '1:54.67'}


#### Reverse Prediction Function

Takes input data in the form of an 800m goal time. Returns the training splits needed to reach the goal time.

In [31]:
reverse_predict(
    df=table_600,
    target_col="TARGET",
    goal_time="1:54.34",
    interval_cols=["First 600m", "Second 600m", "Third 600m"],
    rounding=0.5
)

[{'interval': 'First 600m', 'seconds': 88.0},
 {'interval': 'Second 600m', 'seconds': 86.0},
 {'interval': 'Third 600m', 'seconds': 84.0}]

### 600m, 400m x 3 Training

#### Data Generation

Generates a synthetic dataset of 800m targets and matching interval times, used in the absence of real training data.

In [34]:
table_600_400 = generate_training_table(
    start_800=96,
    start_intervals=[71.5, 50],                 # 600m, 400m
    increments=[0.75, 0.5],
    num_rows=144,
    interval_names=["600m", "3x400m average"]
)

In [35]:
table_600_400.to_csv(os.getcwd() + "/tables/600_400.csv", index=False)

#### Model Fit and Export

Trains a sklearn LinearRegression model. Inputs are the 600m time and the average of the three 400m splits. The output variable is the 800m race time.

The model is exported and saved as `model_600_400.pkl` in the `models/` folder of the working directory.

In [37]:
model_600_400 = fit_and_export_model(
    df=table_600_400,
    feature_cols=["600m", "3x400m average"],
    target_col="TARGET",
    export_path="models/model_600_400.pkl"
)

#### Prediction Function

Takes input data of four string times (one 600m time, and three 400m times), converts these to seconds, feeds these into our Linear Regression model as input, and outputs an 800m time prediction.

In [39]:
prediction = predict_800m(
    model=model_600_400,
    feature_cols=["600m", "3x400m average"],
    input_values=[
        "1:22.43",                   # 600m time
        ["58.77", "56.02", "56.70"]  # List of 400m splits (will be averaged)
    ]
)
print(prediction)

{'predicted_seconds': 110.4974358974359, 'predicted_formatted': '1:50.50'}


#### Reverse Prediction Function

Takes input data in the form of an 800m goal time. Returns the training splits needed to reach the goal time.

In [41]:
reverse_predict(
    df=table_600_400,
    target_col="TARGET",
    goal_time="1:51.45",
    interval_cols=["600m", "3x400m average"],
    rounding=[0.5, 0.5]
)

[{'interval': '600m', 'seconds': 83.0},
 {'interval': '3x400m average', 'seconds': 57.5}]

### 600m, 300m x 4 Training

#### Data Generation

Generates a synthetic dataset of 800m targets and matching interval times, used in the absence of real training data.

In [44]:
table_600_300 = generate_training_table(
    start_800=96,
    start_intervals=[71.25, 35.25],             # 600m, 300m
    increments=[0.75, 0.375],
    num_rows=144,
    interval_names=["600m", "4x300m average"]
)

In [45]:
table_600_300.to_csv(os.getcwd() + "/tables/600_300.csv", index=False)

#### Model Fit and Export

Trains a sklearn LinearRegression model. Inputs are the 600m time and the average of the four 300m splits. The output variable is the 800m race time.

The model is exported and saved as `model_600_300.pkl` in the `models/` folder of the working directory.

In [47]:
model_600_300 = fit_and_export_model(
    df=table_600_300,
    feature_cols=["600m", "4x300m average"],
    target_col="TARGET",
    export_path="models/model_600_300.pkl"
)

#### Prediction Function

Takes five string times (one 600m time and four 300m times, which are averaged), converts them to seconds, feeds them into the Linear Regression model, and outputs an 800m time prediction.

In [49]:
prediction = predict_800m(
    model=model_600_300,
    feature_cols=["600m", "4x300m average"],
    input_values=[
        "1:24.43",                          # 600m time
        ["45.32", "41.23", "42.45", "43.56"] # List of 300m splits (will be averaged)
    ]
)
print(prediction)

{'predicted_seconds': 114.2666666666667, 'predicted_formatted': '1:54.27'}


#### Reverse Prediction Function

Takes input data in the form of an 800m goal time. Returns the training splits needed to reach the goal time.

In [51]:
reverse_predict(
    df=table_600_300,
    target_col="TARGET",
    goal_time="1:54.78",
    interval_cols=["600m", "4x300m average"],
    rounding=[0.5, 0.5]
)

[{'interval': '600m', 'seconds': 85.5},
 {'interval': '4x300m average', 'seconds': 42.5}]

### 500m x 3 Training

#### Data Generation

Generates a synthetic dataset of 800m targets and matching interval times, used in the absence of real training data.

In [54]:
table_500 = generate_training_table(
    start_800=96,
    start_intervals=[60.2, 59.2, 58.7],
    increments=[0.6, 0.6, 0.6],
    num_rows=144,
    interval_names=["First 500m", "Second 500m", "Third 500m"]
)

In [55]:
table_500.to_csv(os.getcwd() + "/tables/500.csv", index=False)

#### Model Fit and Export

Trains a sklearn LinearRegression model. Inputs are the three 500m times. The output variable is the 800m race time.

The model is exported and saved as `model_500.pkl` in the `models/` folder of the working directory.

In [57]:
model_500 = fit_and_export_model(
    df=table_500,
    feature_cols=["First 500m", "Second 500m", "Third 500m"],
    target_col="TARGET",
    export_path="models/model_500.pkl"
)

#### Prediction Function

Takes input data of three 500m times, converts these to seconds, feeds these into our Linear Regression model as input, and outputs an 800m time prediction.

In [59]:
prediction = predict_800m(
    model=model_500,
    feature_cols=["First 500m", "Second 500m", "Third 500m"],
    input_values=["1:09", "1:07", "1:04"]
)
print(prediction)

{'predicted_seconds': 108.16666666666669, 'predicted_formatted': '1:48.17'}


#### Reverse Prediction Function

Takes input data in the form of an 800m goal time. Returns the training splits needed to reach the goal time.

In [61]:
reverse_predict(
    df=table_500,
    target_col="TARGET",
    goal_time="1:51.94",
    interval_cols=["First 500m", "Second 500m", "Third 500m"],
    rounding=0.5
)

[{'interval': 'First 500m', 'seconds': 70.0},
 {'interval': 'Second 500m', 'seconds': 69.0},
 {'interval': 'Third 500m', 'seconds': 68.5}]

### 300-400-500-400-300-200m Ladder Training

#### Data Generation

Generates a synthetic dataset of 800m targets and matching interval times, used in the absence of real training data.

In [64]:
table_ladder = generate_training_table(
    start_800=96,
    start_intervals=[34.5, 48, 60, 48, 34.5, 22],                # 300m, 400m, 500m, 400m, 300m, 200m
    increments=[0.375, 0.5, 0.625, 0.5, 0.375, 0.25],
    num_rows=144,
    interval_names=["First 300m", "First 400m", "500m", "Second 400m", "Second 300m", "200m"]
)

In [65]:
table_ladder.to_csv(os.getcwd() + "/tables/ladder.csv", index=False)

#### Model Fit and Export

Trains a sklearn LinearRegression model. Inputs are the six ladder splits (300m, 400m, 500m, 400m, 300m, and 200m). The output variable is the 800m race time.

The model is exported and saved as `model_ladder.pkl` in the `models/` folder of the working directory.

In [67]:
model_ladder = fit_and_export_model(
    df=table_ladder,
    feature_cols=["First 300m", "First 400m", "500m", "Second 400m", "Second 300m", "200m"],
    target_col="TARGET",
    export_path="models/model_ladder.pkl"
)

#### Prediction Function

Takes input data of two 300m times, two 400m times, one 500m time, and one 200m time, converts these to seconds, feeds these into our Linear Regression model as input, and outputs an 800m time prediction.

In [69]:
prediction = predict_800m(
    model=model_ladder,
    feature_cols=["First 300m", "First 400m", "500m", "Second 400m", "Second 300m", "200m"],
    input_values=["39.45", "56.34", "1:09.34", "57.34", "41.34", "25.42"]
)
print(prediction)

{'predicted_seconds': 112.16506329113926, 'predicted_formatted': '1:52.17'}


#### Reverse Prediction Function

Takes input data in the form of an 800m goal time. Returns the training splits needed to reach the goal time.

In [71]:
reverse_predict(
    df=table_ladder,
    target_col="TARGET",
    goal_time="1:57.03",
    interval_cols=["First 300m", "First 400m", "500m", "Second 400m", "Second 300m", "200m"],
    rounding=0.5
)

[{'interval': 'First 300m', 'seconds': 42.5},
 {'interval': 'First 400m', 'seconds': 58.5},
 {'interval': '500m', 'seconds': 73.0},
 {'interval': 'Second 400m', 'seconds': 58.5},
 {'interval': 'Second 300m', 'seconds': 42.5},
 {'interval': '200m', 'seconds': 27.5}]

### 300m x 3 Training (2 sets)

#### Data Generation

Generates a synthetic dataset of 800m targets and matching interval times, used in the absence of real training data.

In [74]:
table_300 = generate_training_table(
    start_800=96,
    start_intervals=[33.75, 33],                # Set 1 average, Set 2 average (Set 2 is 0.75 s faster)
    increments=[0.375, 0.375],
    num_rows=144,
    interval_names=["Set 1 3x300m average", "Set 2 3x300m average"]
)

In [75]:
table_300.to_csv(os.getcwd() + "/tables/300.csv", index=False)

#### Model Fit and Export

Trains a sklearn LinearRegression model. Inputs are the average 300m times for set 1 and set 2. Output variable is the 800m race time.

The model is exported and saved as `model_300.pkl` in the `models/` folder of the working directory.

In [77]:
model_300 = fit_and_export_model(
    df=table_300,
    feature_cols=["Set 1 3x300m average", "Set 2 3x300m average"],
    target_col="TARGET",
    export_path="models/model_300.pkl"
)

#### Prediction Function

Takes input data of six 300m times, converts these to seconds, feeds these into our Linear Regression model as input, and outputs an 800m time prediction.

In [79]:
prediction = predict_800m(
    model=model_300,
    feature_cols=["Set 1 3x300m average", "Set 2 3x300m average"],
    input_values=[
        ["38", "38", "38"],   # First set of 3x300m
        ["37", "37", "37"]    # Second set of 3x300m
    ]
)
print(prediction)

{'predicted_seconds': 106.99999999999997, 'predicted_formatted': '1:47.00'}


#### Reverse Prediction Function

Takes input data in the form of an 800m goal time. Returns the training splits needed to reach the goal time.

In [81]:
reverse_predict(
    df=table_300,
    target_col="TARGET",
    goal_time="1:54.34",
    interval_cols=["Set 1 3x300m average", "Set 2 3x300m average"],
    rounding=[0.25, 0.25]
)

[{'interval': 'Set 1 3x300m average', 'seconds': 40.75},
 {'interval': 'Set 2 3x300m average', 'seconds': 40.0}]

### 200m x 8 Training

#### Data Generation

Generates a synthetic dataset of 800m targets and matching interval times, used in the absence of real training data.

In [84]:
table_200 = generate_training_table(
    start_800=96,
    start_intervals=[22.25] * 8,
    increments=[0.25] * 8,
    num_rows=144,
    interval_names=["First 200m", "Second 200m", "Third 200m", "Fourth 200m", "Fifth 200m", "Sixth 200m", "Seventh 200m", "Eighth 200m"]
)

In [85]:
table_200.to_csv(os.getcwd() + "/tables/200.csv", index=False)

#### Model Fit and Export

Trains a sklearn LinearRegression model. Inputs are the eight individual 200m times. The output variable is the 800m race time.

The model is exported and saved as `model_200.pkl` in the `models/` folder of the working directory.

In [87]:
model_200 = fit_and_export_model(
    df=table_200,
    feature_cols=["First 200m", "Second 200m", "Third 200m", "Fourth 200m", "Fifth 200m", "Sixth 200m", "Seventh 200m", "Eighth 200m"],
    target_col="TARGET",
    export_path="models/model_200.pkl"
)

#### Prediction Function

Takes input data of eight 200m times, converts these to seconds, feeds these into our Linear Regression model as input, and outputs an 800m time prediction.

In [89]:
prediction = predict_800m(
    model=model_200,
    feature_cols=["First 200m", "Second 200m", "Third 200m", "Fourth 200m", "Fifth 200m", "Sixth 200m", "Seventh 200m", "Eighth 200m"],
    input_values=[
        ["26.43"], ["26.78"], ["27.10"], ["27.30"], ["26.78"], ["26.79"], ["26.44"], ["26.13"]
    ]
)
print(prediction)

{'predicted_seconds': 113.87500000000003, 'predicted_formatted': '1:53.88'}
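
The 200m x 8 section has no reverse-prediction cell. All eight columns share a 22.25 s start and a 0.25 s increment, so under the minimum-norm assumption described for the fitted models, the prediction depends only on the mean split: `96 + (mean - 22.25) / 0.25`. The hypothetical helpers below apply that formula in both directions. The 0.25 s rounding step in `pace_200_for_goal` is an assumption, since the notebook never chooses one for this session.

```python
def predict_from_200s(splits):
    """800m time implied by the 200m x 8 table (depends only on the mean split)."""
    mean_split = sum(splits) / len(splits)
    return 96 + (mean_split - 22.25) / 0.25


def pace_200_for_goal(goal_seconds, step=0.25):
    """Per-rep 200m pace for an 800m goal, rounded to the nearest multiple of `step` seconds."""
    exact = 22.25 + 0.25 * (goal_seconds - 96)
    return round(exact / step) * step


splits = [26.43, 26.78, 27.10, 27.30, 26.78, 26.79, 26.44, 26.13]
print(predict_from_200s(splits))   # about 113.875, matching the model output above
print(pace_200_for_goal(113.875))  # 26.75
```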
