```python
import numpy as np
```

# Simulating marathon training data

### Introduction

This report aims to understand and simulate the factors that could predict how long it takes a runner to complete a marathon. A marathon is 42.195 km (about 26.2 miles), and completing one has long been a challenge for humans. The name comes from the legend of Philippides, the Greek messenger. As the story goes, he was sent to Athens to announce the defeat of the Persians after the Battle of Marathon (a battle he had actually fought in). He ran the whole distance from Marathon to Athens without stopping, burst into the assembly, exclaimed "we have won!", then keeled over and died (a little morbid history for you). [1]

This legend inspired the birth of this ludicrous endeavour. The marathon race was first run at the 1896 Olympics in Athens, but the distance was not standardized at 42.195 km until 1921, based on the course of the 1908 London Olympics. People have been training for marathons ever since. [2]

It is still very hard to predict a runner's finishing time accurately because of the sheer number of variables involved. However, my research turned up a few studies that have attempted to do this.

### Method

#### The different contributing factors

Many factors must be considered when predicting race-day performance. Two main categories come up repeatedly in the literature on the subject.

##### Training measurements

The list of training measurements is endless, from the number of runs per week to VO2 max (the maximum amount of oxygen your body can utilize during exercise). Most studies focus on measurements that the average person can take themselves, such as the number of runs, the average mileage and the average pace. VO2 max could also be considered part of the next category.


##### Anthropometry measurements

Again, the list is comprehensive. Anthropometry refers to the measurement of the human body. Limb length, muscle diameter and, as noted above, VO2 max are some of the many measurements to consider.

As I mentioned above, few studies have been done on the subject. I am interested in the correlation between training and performance because one of my goals is to run a marathon. I have no direct control over my current anthropometric measurements (I can only change them through training), so I have decided to focus this analysis on the former group.

### Variables

#### Predictors

The training measurements that appear most often are mean weekly distance run and mean training pace. They appear in the studies *Prediction of marathon performance time on the basis of training indices* [3], *Predictor Variables for Marathon Race Time in Recreational Female Runners* [4] and *An empirical study of race times in recreational endurance runners* [5].

Giovanni Tanda's study on the *Prediction of marathon performance time on the basis of training indices* also shows that 

#### Outcomes

The most interesting variable is the time taken to complete a marathon. It depends directly on the average pace a runner holds throughout the race.

All variables are non-negative real numbers.

- Mean weekly distance is given to one decimal place and is measured in kilometres per week (km/week).
- Mean training pace and race pace are given to one decimal place and are measured in seconds per kilometre (sec/km).
- Finishing times are whole numbers measured in minutes (min).

The assumption of normality was met for these (and other) measurements, as assessed by a Q-Q plot [6].

Therefore, I can use the data from the tables in Giovanni Tanda's study [7] and the `numpy.random.normal()` function to generate a mean training pace and a weekly distance in kilometres for each simulated runner. The table below highlights the mean and standard deviation of both in green.

![RUA Runner Data](imgs/rua_runner_data_2.png)
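The normality check mentioned above can be reproduced for any simulated sample. The sketch below illustrates the idea under stated assumptions and is not the procedure used in [6]. It pairs the sorted sample with standard normal quantiles at the plotting positions (i - 0.5)/n, which is one common convention, using Python's built-in `statistics.NormalDist`. It then summarises the plot with the correlation of those pairs. The helper names `normal_qq_points` and `qq_correlation` and the seed are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def normal_qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot.

    Assumes the (i - 0.5) / n plotting positions; other conventions exist.
    """
    observed = np.sort(np.asarray(sample, dtype=float))
    n = observed.size
    probs = (np.arange(1, n + 1) - 0.5) / n
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    theoretical = np.array([std_normal.inv_cdf(p) for p in probs])
    return theoretical, observed


def qq_correlation(sample):
    """Correlation of the Q-Q points; values close to 1 suggest approximate normality."""
    theoretical, observed = normal_qq_points(sample)
    return float(np.corrcoef(theoretical, observed)[0, 1])


# Hypothetical sample: 100 weekly distances drawn with the table's mean and sd
rng = np.random.default_rng(42)
weekly_km = rng.normal(65.9, 15.9, 100)
print(f"Q-Q correlation: {qq_correlation(weekly_km):.4f}")
```

To draw the plot itself, pass the two arrays returned by `normal_qq_points` to a scatter plot (for example `matplotlib.pyplot.scatter`). Points lying close to a straight line indicate approximate normality.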

```python
# Set the number of runners (the number of instances to simulate)
no_of_runners = 100

# Training pace data (sec/km)
train_pace_mean = 284.6
train_pace_std = 18.1
P = np.random.normal(train_pace_mean, train_pace_std, no_of_runners)

# Kilometres per week data (km/week)
km_per_week_mean = 65.9
km_per_week_std = 15.9
K = np.random.normal(km_per_week_mean, km_per_week_std, no_of_runners)
```
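The cell above draws a different set of runners each time it runs. One way to make the simulation reproducible is NumPy's `np.random.default_rng` Generator with a fixed seed. This is an added suggestion, not part of the original analysis. The function name `simulate_training` and the seed value are hypothetical. The rounding to one decimal place mirrors the precision stated in the Outcomes section.

```python
import numpy as np


def simulate_training(n_runners, seed=None):
    """Draw mean training pace P (sec/km) and weekly distance K (km/week).

    A fixed seed reproduces the same runners on every run.
    Values are rounded to one decimal place to match the stated precision.
    """
    rng = np.random.default_rng(seed)
    P = np.round(rng.normal(284.6, 18.1, n_runners), 1)
    K = np.round(rng.normal(65.9, 15.9, n_runners), 1)
    return P, K


P, K = simulate_training(100, seed=2024)  # the seed value is arbitrary
print(f"P: mean {P.mean():.1f}, sd {P.std(ddof=1):.1f}")
print(f"K: mean {K.mean():.1f}, sd {K.std(ddof=1):.1f}")
# The normal distribution has unbounded tails, so confirm that no runner
# was given a non-positive pace or distance.
print("Non-positive values:", int((P <= 0).sum() + (K <= 0).sum()))
```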

Tanda's study led to a formula that relates race-day pace to the parameters above. With it, we can use the two random samples above to produce each runner's marathon pace:

$$ P_m\,(\text{sec/km}) = 17.1 + 140.0\,\exp\left[-0.0053 \cdot K\,(\text{km/week})\right] + 0.55\,P\,(\text{sec/km}) $$

Where:
- P<sub>m</sub> = Pace for the marathon.
- K = Average kilometres per week.
- P = Average pace during training.
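As a quick worked check of the formula, take a runner sitting exactly at the two means (K = 65.9 km/week, P = 284.6 sec/km). The function name `tanda_marathon_pace` is hypothetical, and the coefficients are the ones in the equation above.

```python
import math

MARATHON_KM = 42.195


def tanda_marathon_pace(k_km_per_week, p_sec_per_km):
    """Marathon pace P_m (sec/km) from the formula above:
    P_m = 17.1 + 140.0 * exp(-0.0053 * K) + 0.55 * P
    """
    return 17.1 + 140.0 * math.exp(-0.0053 * k_km_per_week) + 0.55 * p_sec_per_km


# A runner at the table means: K = 65.9 km/week, P = 284.6 sec/km
pm = tanda_marathon_pace(65.9, 284.6)
print(f"Predicted marathon pace: {pm:.1f} sec/km")                   # 272.4 sec/km
print(f"Predicted finishing time: {pm * MARATHON_KM / 60:.1f} min")  # 191.5 min
```

The exponential term shrinks as K grows, so more weekly kilometres lower the predicted pace. Each extra second per kilometre of training pace adds 0.55 sec/km. Because the formula is an empirical fit, it is only meaningful for inputs similar to those of the runners it was fitted on.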

```python
# Apply Tanda's formula to each simulated runner (pace in sec/km)
paces = []
for k, p in zip(K, P):
    pace = 17.1 + 140.0 * np.exp(-0.0053 * k) + 0.55 * p
    paces.append(pace)
```
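The loop can also be written as a single vectorised NumPy expression, because `np.exp` and the arithmetic operators work element-wise on arrays. This sketch uses hypothetical inputs for three runners, and `tanda_paces` is an illustrative name. Comparing the result against the loop form confirms that both give the same paces.

```python
import numpy as np


def tanda_paces(K, P):
    """Vectorised form of the formula: one marathon pace (sec/km) per runner."""
    K = np.asarray(K, dtype=float)
    P = np.asarray(P, dtype=float)
    return 17.1 + 140.0 * np.exp(-0.0053 * K) + 0.55 * P


# Hypothetical inputs for three runners
K_demo = np.array([40.0, 65.9, 90.0])     # km/week
P_demo = np.array([300.0, 284.6, 260.0])  # sec/km

vectorised = tanda_paces(K_demo, P_demo)
looped = [17.1 + 140.0 * np.exp(-0.0053 * k) + 0.55 * p for k, p in zip(K_demo, P_demo)]
print(np.allclose(vectorised, looped))  # True: both apply the same formula
```

With the notebook's arrays, `tanda_paces(K, P)` returns a NumPy array rather than a list. This also makes the `np.array(paces)` conversion in the next step unnecessary.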

Then use the marathon distance to convert each pace into a finishing time in minutes.

```python
# Marathon distance in km
dist = 42.195

# Finishing time in minutes: pace (sec/km) * distance (km) / 60
times = dist * np.array(paces) / 60
```
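Minutes are convenient for statistics, but race results are usually reported as h:mm:ss. Like the cell above, this sketch assumes the runner holds one even pace for the whole 42.195 km. The helpers `finish_minutes` and `format_finish_time` are hypothetical names.

```python
MARATHON_KM = 42.195


def finish_minutes(pace_sec_per_km, distance_km=MARATHON_KM):
    """Finishing time in minutes for an even pace held over the whole distance."""
    return pace_sec_per_km * distance_km / 60


def format_finish_time(pace_sec_per_km, distance_km=MARATHON_KM):
    """Finishing time as an h:mm:ss string, rounded to the nearest second."""
    total_s = int(round(pace_sec_per_km * distance_km))
    hours, rem = divmod(total_s, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


print(f"{finish_minutes(240.0):.2f} min")  # 168.78 min
print(format_finish_time(240.0))           # 2:48:47
```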

Lastly, display all the data and compare the mean and standard deviation of the finishing times.

```python
print("Training pace: ", P)
```

```
Training pace:  [276.73096996 296.49017434 299.82580707 289.9653323  287.89326155
 267.20817996 309.78219931 305.2132722  277.07211851 312.67527518
 306.0406637  314.62406223 271.13797168 288.41575292 298.85233649
 267.94757115 285.82655419 304.78126848 279.11618488 298.73774027
 278.63504465 261.58542649 284.11886141 333.93229813 246.98790731
 292.99791574 276.01951954 260.47469207 256.68912881 275.10000132
 321.84993076 293.69826031 274.43630652 332.24153705 275.16439154
 270.4701763  276.29212274 298.97559073 261.66651444 301.68861478
 295.16585588 248.75283759 297.88482745 283.32998765 281.81195236
 286.6849828  293.14414272 325.80662454 267.17820658 283.61006077
 293.16411993 304.3289055  275.92429694 277.69135559 303.37783246
 285.22099221 284.96836719 266.16592014 302.3888691  285.53136631
 293.3707696  291.31207629 282.32185465 295.25016239 265.71061276
 273.54557323 276.52831502 277.25198314 291.14311131 316.30400469
 278.8069189  277.88457198 308.97573701 283.40983535 273.108
```

```python
print("Kilometers per week: ", K)
```

```
Kilometers per week:  [83.47401728 67.73177918 58.91555554 58.5447355  51.57094848 28.95849342
 65.7857443  75.06503688 36.72264746 76.12548189 74.46156385 68.20728953
 54.24624354 74.72954369 93.17293219 60.64504929 40.280078   69.12598056
 92.59184729 86.99689797 53.04964947 69.51759977 41.83100808 69.82943393
 85.94197926 75.41576383 81.88096319 79.10253572 54.14745104 44.84681337
 43.73188154 74.85927831 48.10323054 55.89299395 61.66703681 62.32581892
 50.16301953 81.45460276 73.93493025 70.10799454 81.86766261 93.03814545
 55.87316243 32.39925861 77.31826736 77.53509223 66.12157681 59.44274142
 25.41212491 88.83916325 49.8855618  94.6504365  68.00035372 88.14648093
 80.04793828 48.37273426 87.46919654 89.4471776  56.11285639 62.11179983
 81.5771487  83.12320929 80.67277764 58.77076985 95.0876438  69.74782399
 70.25361505 76.5479932  66.27969347 55.55511932 48.43565792 83.60794637
 64.72836556 42.61338345 79.64647198 52.82356536 16.00552737 68.95210987
 62.17510922 82.15351598 72.9
```

```python
print("Race day average pace: ", paces)
```

```
Race day average pace:  [259.2498900524025, 277.94411408494904, 284.4557412586908, 279.23403088910334, 281.9595506347464, 284.1450057350851, 286.2683911878078, 279.0145884177199, 284.72914439479916, 282.5915931962969, 279.77093681186926, 287.67165085167005, 271.24446914687746, 269.94332836743996, 266.9097682172867, 265.9878990951398, 287.3916698522428, 281.7843965556333, 256.3184273084195, 269.6897547308656, 276.0359981563629, 257.82544740549645, 285.52668117740643, 297.4562872547369, 241.7223309962124, 272.12148467848914, 259.62125397816027, 252.41725515545718, 263.35260772133574, 278.7877993022407, 305.1544587338683, 272.78394831054754, 276.53401510913136, 303.93883952708404, 269.40876798389877, 266.47502877053284, 276.37673897718986, 272.45230439612885, 255.62886435436866, 279.57961120330253, 270.15813365682163, 239.41610208192944, 285.053592097051, 290.8420509681152, 265.027417985312, 267.6008525574397, 276.9417821550236, 298.4593320642964, 286.40686849463066, 260.51172017697945, 2
```

```python
print("Race day times: ", times)
```

```
Race day times:  [182.31748518 195.46419823 200.04350004 196.37133222 198.28805398
 199.82497528 201.3182461  196.2170093  200.2357708  198.73253792
 196.74891131 202.30508846 190.75267293 189.83764567 187.7042945
 187.05599004 202.10819182 198.16487688 180.255934   189.65932001
 194.1223157  181.31574589 200.79663854 209.18613401 169.99122927
 191.3694341  182.57864686 177.51243469 185.20272138 196.05751986
 214.5998731  191.83531165 194.47254613 213.7449889  189.46171608
 187.39856398 194.36194169 191.60208307 179.77099886 196.61436158
 189.98870749 168.36937379 200.46393864 204.53467234 186.3805317
 188.19029956 194.7593083  209.89152527 201.41563027 183.20486721
 200.99888157 189.35387769 187.41176757 181.1417163  193.7832635
 198.53502316 184.17828109 176.25988478 202.11308733 193.304351
 189.39269547 188.07499469 185.42610768 198.32900695 174.27881869
 185.85872212 186.83028836 184.88414102 193.92724277 207.71177236
 196.02873516 182.71879944 201.3967425  200.19600094 182.2125484
```

```python
print("Simulated finishing times mean and standard deviation:")
print("Mean: ", np.mean(times))
print("Standard deviation: ", np.std(times))
```

```
Simulated finishing times mean and standard deviation:
Mean:  192.6462159160105
Standard deviation:  9.30531865878367
```


The expected results are a mean race pace of 271.8 sec/km with a standard deviation of 17.7 sec/km.
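These expected values are paces, while the simulated statistics above are finishing times, so it helps to put them on the same scale. Time is pace multiplied by distance, which is a linear scaling. Both the mean and the standard deviation therefore convert with the same factor of 42.195/60. A minimal sketch:

```python
MARATHON_KM = 42.195

# Expected race-pace statistics quoted in the text (sec/km)
expected_mean_pace = 271.8
expected_sd_pace = 17.7

# Time = pace * distance is linear, so the mean and the standard deviation
# both scale by the same factor (km per race / seconds per minute).
factor = MARATHON_KM / 60
expected_mean_min = expected_mean_pace * factor
expected_sd_min = expected_sd_pace * factor

print(f"Expected mean time: {expected_mean_min:.1f} min")  # 191.1 min
print(f"Expected sd:        {expected_sd_min:.1f} min")    # 12.4 min
```

Note that `np.std` computes the population standard deviation by default (`ddof=0`). `np.std(times, ddof=1)` gives the sample standard deviation, which is slightly larger for 100 runners.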

### References

[1] Wikipedia, "Marathon", https://en.wikipedia.org/wiki/Marathon#Origin

[2] Elizabeth Nix, "Why is a marathon 26.2 miles?", https://www.history.com/news/why-is-a-marathon-26-2-miles

[3] Giovanni Tanda, "Prediction of marathon performance time on the basis of training indices", https://www.researchgate.net/publication/262686102_Prediction_of_marathon_performance_time_on_the_basis_of_training_indices

[4] Wiebke Schmid et al., "Predictor Variables for Marathon Race Time in Recreational Female Runners", https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426727/

[5] Andrew J. Vickers & Emily A. Vertosick, "An empirical study of race times in recreational endurance runners", https://bmcsportsscimedrehabil.biomedcentral.com/articles/10.1186/s13102-016-0052-y

[6] Alison Keogh et al., "The Determinants of Marathon Performance: An Observational Analysis of Anthropometric, Pre-race and In-race Variables", https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7449326/

[7] Giovanni Tanda, "Prediction of marathon performance time on the basis of training indices", https://rua.ua.es/dspace/bitstream/10045/18930/1/jhse_Vol_VI_N_III_511-520.pdf
