In [None]:
import numpy as np
import math

### Introduction

This report sets out to understand and simulate the factors that could predict the time it takes a runner to complete a marathon. A marathon is about 42.2 km (about 26.2 miles), and completing one has long been a challenge for humans. The name comes from the legend of Philippides, the Greek messenger. As the story goes, he was sent to Athens to announce the defeat of the Persians after the Battle of Marathon (a battle he had actually fought in). He ran the distance from Marathon to Athens without stopping, burst into the assembly, exclaimed "we have won!", then keeled over and died (a little morbid history for you). [https://en.wikipedia.org/wiki/Marathon#Origin]

The legend is ancient, but the modern marathon dates from the 1896 Olympics in Athens, and the distance was not fixed until the 20th century. The legend inspired the birth of this ludicrous endeavour, and people have been training for marathons ever since. [https://www.history.com/news/why-is-a-marathon-26-2-miles#:~:text=The%20idea%20for%20the%20modern,After%20making%20his%20announcement%2C%20the]

It is still very hard to predict a runner's finishing time accurately because of the sheer number of variables involved. However, from my research, a few attempts have been made to study this and to predict finishing times.

### Method

#### The different contributing factors

There are many factors to take into consideration when predicting race-day performance. Two main categories come up repeatedly in the literature on the subject.

##### Training measurements

The list of training measurements is endless, from the number of runs per week to VO2 max (the maximum amount of oxygen your body can utilize during exercise). Most studies tend to focus on measurements that the average person could take themselves: the number of runs, the average mileage, the average pace and so on. It is also worth pointing out that VO2 max could equally be placed in our next category.


##### Anthropometry measurements

Again, the list is long. Anthropometry refers to the measurement of the human individual, so limb length, muscle diameter and, as above, VO2 max are some of the many measurements to be considered.

As mentioned above, only a few studies have been done on the subject. I am interested in the correlation between training and performance because a goal of mine is to run a marathon, and I have no direct control over my current anthropometric measurements (I can only change them through training). I have therefore decided to focus this analysis on the former group.

### Variables

#### Predictors

The training measurements that appear most often are mean weekly distance run and mean training pace. They appear in *Prediction of marathon performance time on the basis of training indices* [https://www.researchgate.net/publication/262686102_Prediction_of_marathon_performance_time_on_the_basis_of_training_indices], *Predictor Variables for Marathon Race Time in Recreational Female Runners* [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426727/], and *An empirical study of race times in recreational endurance runners* [https://bmcsportsscimedrehabil.biomedcentral.com/articles/10.1186/s13102-016-0052-y].

Giovanni Tanda's study on the *Prediction of marathon performance time on the basis of training indices* also shows that race-day pace is correlated with both of these measurements.

#### Outcomes

The outcome of most interest is the time taken to complete a marathon. This depends on the pace a runner holds throughout the race.

All variables are non-negative real numbers.

- Mean weekly distance is given to one decimal place and is measured in kilometers per week (km/week).
- Mean training pace and race pace are given to one decimal place and are measured in seconds per kilometer (sec/km).
- Finishing times are whole numbers measured in minutes (min).
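
As a minimal sketch of how simulated values could be brought to the precision listed above, the snippet below rounds arrays with `numpy.round`. The sample values and the variable names (`raw_km`, `raw_pace`, `raw_times`) are made up for illustration and are not taken from any study.

```python
import numpy as np

# Hypothetical raw values, for illustration only
raw_km = np.array([65.8731, 48.2249])      # km/week
raw_pace = np.array([284.649, 301.151])    # sec/km
raw_times = np.array([192.31, 218.24])     # min

# One decimal place for weekly distance and pace
km_per_week = np.round(raw_km, 1)
pace = np.round(raw_pace, 1)

# Whole minutes for finishing times
finishing_times = np.round(raw_times).astype(int)

print(km_per_week, pace, finishing_times)
```

Rounding is best applied only at the end, when reporting, so that intermediate calculations keep full precision.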

The assumption of normality was met, as assessed by a Q-Q plot in [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7449326/#b19-ijes-13-6-1132%5D] for these (and other measurements).

Therefore I can use the means and standard deviations from the tables in Giovanni Tanda's study with the `numpy.random.normal()` function to simulate the mean training pace and kilometers run per week.

![RUA Runner Data](imgs/rua_runner_data_2.png)

Table containing the mean and standard deviation of training pace and kilometers run per week. [https://rua.ua.es/dspace/bitstream/10045/18930/1/jhse_Vol_VI_N_III_511-520.pdf]

In [None]:
#Set number of runners
no_of_runners = 100

# Set training pace data 
train_pace_mean = 284.6
train_pace_std = 18.1
P = np.random.normal(train_pace_mean, train_pace_std, no_of_runners)

# Set kilometers per week data 
km_per_week_mean = 65.9
km_per_week_std = 15.9
K = np.random.normal(km_per_week_mean, km_per_week_std, no_of_runners)

Tanda's study led to a formula that relates race-day pace to the two parameters above. Using it, we can turn the two random samples above into a race pace for each runner.
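
Written out, the relationship used in the code cells of this notebook is shown below. The symbols are my own labels rather than the study's notation: $P_m$ is the predicted race pace (sec/km), $K$ is the mean weekly distance (km/week) and $P$ is the mean training pace (sec/km).

$$P_m = 17.1 + 140.0\, e^{-0.0053 K} + 0.55\, P$$

As a rough check, plugging in the sample means $K = 65.9$ and $P = 284.6$ gives $P_m \approx 17.1 + 98.7 + 156.5 \approx 272.4$ sec/km.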

In [None]:
# Apply the formula element-wise to get each runner's race pace (sec/km)
paces = 17.1 + 140.0 * np.exp(-0.0053 * K) + 0.55 * P

Then use the distance of a marathon to calculate each runner's finishing time in minutes.

In [None]:
# marathon in km
dist = 42.195

# pace is in sec/km, so dist * pace is in seconds; divide by 60 for minutes
# (dividing by 3600 instead would give hours)
times = (dist * np.array(paces)) / 60
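
The unit conversion is easy to get wrong, so here is a small sketch of the same arithmetic (distance times pace, divided by 60) together with a formatter for reading the result as a clock time. The function names `finishing_time_minutes` and `format_hms` are hypothetical helpers, not part of the original analysis.

```python
def finishing_time_minutes(pace_sec_per_km, dist_km=42.195):
    """Convert a race pace in sec/km to a finishing time in minutes."""
    return dist_km * pace_sec_per_km / 60


def format_hms(minutes):
    """Format a duration in minutes as h:mm:ss."""
    total_seconds = int(round(minutes * 60))
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    return f"{hours}:{mins:02d}:{secs:02d}"


minutes = finishing_time_minutes(240.0)  # 240 sec/km is 4 min/km
print(minutes, format_hms(minutes))
```

Keeping the pace in sec/km and converting only at the end avoids mixing minutes and hours part-way through the calculation.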

In [None]:
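# Race pace mean and standard deviation (sec/km)
racepace_mean = 271.8
racepace_std = 17.7

# Spread and centre of the simulated finishing times (min)
np.std(times), np.mean(times)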

In [107]:
pace

236.57061469603917

In [108]:
(42.195*pace)/3600

2.7728047464164924

In [194]:
no_of_runners = 1000

paces = []
train_pace_mean = 284.6
train_pace_std = 18.1
P = np.random.normal(train_pace_mean, train_pace_std, no_of_runners)

km_per_week_mean = 87
km_per_week_std = 15.2
K = np.random.normal(km_per_week_mean, km_per_week_std, no_of_runners)

std = 0
mean = 0

racepace_mean = 271.8
racepace_std = 17.7

# Apply the formula element-wise to get each runner's race pace (sec/km)
paces = 17.1 + 140.0 * np.exp(-0.0053 * K) + 0.55 * P

np.std(paces), np.mean(paces)

(12.617093170816126, 262.07441740830785)

In [308]:
# https://rua.ua.es/dspace/bitstream/10045/18930/1/jhse_Vol_VI_N_III_511-520.pdf

no_of_runners = 1000

train_pace_mean = 284.6
train_pace_std = 18.1
P = np.random.normal(train_pace_mean, train_pace_std, no_of_runners)

km_per_week_mean = 65.9
km_per_week_std = 15.9
K = np.random.normal(km_per_week_mean, km_per_week_std, no_of_runners)

std = 0
mean = 0

racepace_mean = 271.8
racepace_std = 17.7

paces = []

# Apply the formula element-wise to get each runner's race pace (sec/km)
paces = 17.1 + 140.0 * np.exp(-0.0053 * K) + 0.55 * P

# marathon in km
dist = 42.195

# sec/km * km = seconds; divide by 60 for minutes (3600 would give hours)
times = (dist * np.array(paces)) / 60

np.std(times), np.mean(times)

(9.09297075036257, 192.31237908742426)
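
The same steps are repeated in several cells with different inputs. One possible way to tidy this up, offered only as a sketch, is to wrap them in a function. The name `simulate_finishing_times`, its parameters and the use of a seeded `numpy.random.default_rng` generator are my own assumptions; the formula and the constants are the ones already used in this notebook.

```python
import numpy as np


def simulate_finishing_times(pace_mean, pace_std, km_mean, km_std,
                             n_runners, seed=None):
    """Simulate marathon finishing times in minutes (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    P = rng.normal(pace_mean, pace_std, n_runners)   # training pace, sec/km
    K = rng.normal(km_mean, km_std, n_runners)       # distance, km/week
    # Race pace formula used throughout this notebook (sec/km)
    race_pace = 17.1 + 140.0 * np.exp(-0.0053 * K) + 0.55 * P
    # 42.195 km * sec/km = seconds; divide by 60 for minutes
    return 42.195 * race_pace / 60


times = simulate_finishing_times(284.6, 18.1, 65.9, 15.9, 1000, seed=0)
print(np.std(times), np.mean(times))
```

Passing a seed makes a run repeatable, which helps when comparing the effect of changing one input at a time.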

In [299]:
P = np.random.normal(train_pace_mean, train_pace_std, no_of_runners)
K = np.random.normal(km_per_week_mean, km_per_week_std, no_of_runners)

print( K.min(), K.max())
print(P.min(), P.max())

10.489155090001034 120.04610579438587
226.5030567824415 336.81767457514474


In [224]:
K.max(), K.min()

(113.18422485834193, 23.447358767523752)

In [313]:
# file:///C:/Users/seanp/Downloads/2015OAJSM_Reprint.pdf
no_of_runners = 126

train_pace_mean = 330
train_pace_std = 41
P = np.random.normal(train_pace_mean, train_pace_std, no_of_runners)

km_per_week_mean = 44.7
km_per_week_std = 24.7
K = np.random.normal(km_per_week_mean, km_per_week_std, no_of_runners)

racepace_mean = 271.8
racepace_std = 17.7

paces = []

# Apply the formula element-wise to get each runner's race pace (sec/km)
paces = 17.1 + 140.0 * np.exp(-0.0053 * K) + 0.55 * P

# marathon in km
dist = 42.195

# sec/km * km = seconds; divide by 60 for minutes (3600 would give hours)
times = (dist * np.array(paces)) / 60

# 32, 232
np.std(times), np.mean(times)

(18.947311426185795, 218.2373411978223)
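
With a mean of 44.7 km/week and a standard deviation of 24.7, a normal distribution places a few percent of runners below zero, which conflicts with the requirement that all variables be non-negative. One possible way to handle this, offered only as a sketch and not as the method used in any of the cited studies, is to redraw the negative values. The helper name `sample_non_negative` and the seed are assumptions.

```python
import numpy as np


def sample_non_negative(mean, std, size, seed=None):
    """Draw normal samples, redrawing any negative values (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = rng.normal(mean, std, size)
    negative = values < 0
    # Keep redrawing only the entries that came out negative
    while negative.any():
        values[negative] = rng.normal(mean, std, negative.sum())
        negative = values < 0
    return values


K = sample_non_negative(44.7, 24.7, 126, seed=1)
print(K.min(), K.max())
```

Redrawing turns the sample into a truncated normal, so its mean ends up slightly above the value passed in. That trade-off should be kept in mind when comparing against the published means.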

In [317]:
m = 9.7 * 3600
m

9.7/3600

3600/9.7

371.13402061855675
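
Of the three expressions in the cell above, only `3600/9.7` converts a speed in km/h into a pace in sec/km: one kilometer takes 1/9.7 of an hour, and an hour has 3600 seconds. A short sketch of the conversion in both directions follows; the function names are hypothetical.

```python
def kmh_to_sec_per_km(speed_kmh):
    """Convert a speed in km/h to a pace in sec/km."""
    return 3600 / speed_kmh


def sec_per_km_to_kmh(pace_sec_per_km):
    """Convert a pace in sec/km back to a speed in km/h."""
    return 3600 / pace_sec_per_km


pace = kmh_to_sec_per_km(9.7)
print(pace)  # about 371.13 sec/km
```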

In [145]:
no_of_runners = 1000

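# km/hr to sec/km: 3600 seconds per hour divided by the speed in km/hr
m = 3600/9.7
# Caution: a standard deviation does not convert through 3600/x (the
# transform is non-linear), so this spread is only a rough stand-in
std = 3600/41

train_pace_mean = m
train_pace_std = std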
P = np.random.normal(train_pace_mean, train_pace_std, no_of_runners)

km_per_week_mean = 34.6
km_per_week_std = 12.0
K = np.random.normal(km_per_week_mean, km_per_week_std, no_of_runners)

std = 0
mean = 0

racepace_mean = 271.8
racepace_std = 17.7

paces = []

# Apply the formula element-wise to get each runner's race pace (sec/km)
paces = 17.1 + 140.0 * np.exp(-0.0053 * K) + 0.55 * P

# marathon in km
dist = 42.195

# sec/km * km = seconds; divide by 60 for minutes (3600 would give hours)
times = (dist * np.array(paces)) / 60

# 32, 232
np.std(times), np.mean(times)

261.77863606643535

In [6]:
fts = np.random.lognormal(ft_mean, ft_std, 100)

In [13]:
racepace_mean = 271.8
racepace_std = 17.7

In [69]:
paces = np.random.normal(racepace_mean, racepace_std, 100)

In [70]:
(42.195*paces)/3600

array([3.28305444, 3.24542217, 3.44390015, 2.91448153, 3.16569567,
       2.86055087, 3.15966298, 3.02599841, 3.50599335, 3.10997415,
       2.92904666, 3.28212323, 3.06249459, 3.6432785 , 3.28314128,
       3.2171819 , 3.35654216, 2.98185151, 2.80288573, 3.09749473,
       3.13745304, 3.43337169, 3.3369216 , 3.46636932, 3.21268824,
       3.57320435, 3.32763882, 3.23934542, 3.48748005, 3.1823092 ,
       3.13600632, 3.18235551, 3.54423152, 3.19527646, 3.46422884,
       3.22939468, 3.254654  , 3.38216789, 2.99062351, 3.10935863,
       3.42988473, 3.48046137, 3.37086604, 3.13966477, 3.40915095,
       3.04799169, 3.2288212 , 3.44099476, 3.4901211 , 3.32036324,
       3.27948718, 3.05894806, 3.16486332, 3.23606999, 3.21180422,
       3.04097767, 2.94951919, 3.05285309, 3.16822116, 3.4604193 ,
       3.53972889, 3.16270406, 2.78228168, 3.04403624, 3.05070379,
       3.38110425, 3.24158262, 3.15266275, 3.39273902, 3.37087438,
       3.15144949, 3.21589985, 3.48132401, 3.17867078, 3.40998

In [65]:
pace = np.random.normal(racepace_mean, racepace_std)

In [66]:
secs = 42.195*pace

In [67]:
mins = secs/60

In [68]:
mins/60

3.066311602418703

In [71]:
191/60

3.183333333333333

In [None]:
# 200 runners



# target time (tar)
ncbi_tar_mean = 248
ncbi_tar_std = 46

# predicted_finishing_time (pft) mins
ncbi_pft_mean = 252
ncbi_pft_std = 40

# pft - tar difference
ncbi_tdiff_mean = 4
ncbi_tdiff_std = 29

# N
ncbi_N = [4.1, 7.2, 0, 46]
ncbi_N_mean = 4.1
ncbi_N_std = 7.2
ncbi_N_min = 0
ncbi_N_max = 46

# kg/m2
ncbi_BMI = [24.3, 3.3, 17.5, 36.9]
ncbi_BMI_mean = 24.3
ncbi_BMI_std = 3.3
ncbi_BMI_min = 17.5
ncbi_BMI_max = 36.9

# in years
ncbi_age = [40, 10, 19, 75]
ncbi_age_mean = 40
ncbi_age_std = 10
ncbi_age_min = 19
ncbi_age_max = 75

# in m
ncbi_height = [1.7, 0.1, 1.4, 1.9]
ncbi_height_mean = 1.7
ncbi_height_std = 0.1
ncbi_height_min = 1.4
ncbi_height_max = 1.9

# in kg 
ncbi_weight = [72, 11, 45, 108]
ncbi_weight_mean = 72
ncbi_weight_std = 11
ncbi_weight_min = 45
ncbi_weight_max = 108

# in min
ncbi_finishing_time = [259, 51, 169, 427]
ncbi_finishing_time_mean = 259
ncbi_finishing_time_std = 51
ncbi_finishing_time_min = 169
ncbi_finishing_time_max = 427

# pace variance
ncbi_pvar_mean = 4.7
ncbi_pvar_std = 4.6

Finishing times as a log-normal distribution: https://www.researchgate.net/figure/Distribution-of-Marathon-Finishing-Times-n-9-789-093_fig2_301571201
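
`np.random.lognormal` takes the mean and standard deviation of the underlying normal distribution, not of the finishing times themselves. As a sketch, assuming the finishing times really are log-normal and that we want a sample mean and standard deviation near the 259 and 51 minutes recorded in `ncbi_finishing_time`, the parameters can be converted as follows. The helper name `lognormal_params` and the seed are my own.

```python
import numpy as np


def lognormal_params(mean, std):
    """Return (mu, sigma) of the underlying normal for a log-normal
    distribution with the given mean and standard deviation."""
    sigma_sq = np.log(1 + (std / mean) ** 2)
    mu = np.log(mean) - sigma_sq / 2
    return mu, np.sqrt(sigma_sq)


mu, sigma = lognormal_params(259, 51)
rng = np.random.default_rng(0)
fts = rng.lognormal(mu, sigma, 100_000)
print(fts.mean(), fts.std())
```

A log-normal sample is always positive, so it also sidesteps the negative values a normal distribution can produce.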

[https://bmcsportsscimedrehabil.biomedcentral.com/articles/10.1186/s13102-016-0052-y]
[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7449326/]
[https://rua.ua.es/dspace/bitstream/10045/18930/1/jhse_Vol_VI_N_III_511-520.pdf]

Marathon day:
Time they took, age, gender, BMI and race training