#  Combining Garmin and Clue data - Generate training program
This script uses Garmin and Clue data to visualise running performance at the same time as menstrual data. Although Garmin has it's own menstrual calendar it is not possible to download this data or to see this data alongside performance information. If you want to track your own menstrual data rather than using an app create a csv file with a column of the dates you've had your period instead.

You need three files for this script to run.
* `Activities.csv` The CSV downloaded from Garmin connect when all activities are selected.
* `Activities_running_only.csv` The CSV downloaded from Garmin connect when only running is slected. This will export more of the running statistics.
* `clue_measurements.json` This is the `measurements.json` file created when you download data from the menstrual tracking app Clue. If you aren't using clue then you will need `period_dates.csv`, enter the dates in the format `YYYY-MM-DD`.

To download from Garmin:
1. Login to Garmin Connect.
2. Sidebar select `Activities>All activities`.  
3. Select either running or all activites. You will need both.
4. Scroll down as far as the dates you want to include.
5. Click `Export csv`.
6. Move your csv file to the same place you've chosen your data (probably a directory where this script is saved.)

To download from Clue:
1. Open the Clue app
2. Go to the More Menu (the = in the top-right corner of your Cycle View)
3. Tap Settings
4. Tap Download my data
5. Tap Request data
6. A screen will appear with a unique password to download the data file - copy this. You will probably want to send this to yourself as it's likely you'll run this script on a computer rather than phone. So paste->send->copy again.
7. Open the email from Clue that was sent to your Clue email address
8. The email will include a link to download the data file, which expires after 72 hours
9. Tap Download data
10. Extract the zip to the desired location.
11. When you open the file enter the password. Save the `measurements` file as `clue_measurements`. The others can now be deleted.

## What's the program?
This program will use the training and menstrual data to generate a program based on the following:
* Your prior training load and mileage.
* Any previous dips around the cycle.
* Restricting weekly increase to no more than 10% (on average over a block, accounting for down weeks).
* Restricting long runs to no more than 50% of total weekly mileage. 

In [155]:
import pandas as pd
import altair as alt
import numpy as np
import datetime as dt
from IPython.display import display, Markdown
from scipy import stats
from datetime import timedelta
import json
import os

# Your inputs
Input your HR zones and the name of the directory containing your data.

## Heart rate zones
To access your heart rate (HR) zones in the Garmin Connect app, you can do the following:
1. Open the app
2. Select More in the bottom right corner
3. Select Garmin Devices
4. Select your device
5. Select User Settings, User Profile, or My Stats
6. Select Heart Rate Zones or Heart Rate
7. Customize your HR zones
8. Select Done

## Data directory
* If your data directory is saved in the same place as this script then you can enter the name inside \" \". Include "/" at the end.
* Prefix the directory with "../" if it's in the directory above. Use this as many times as you need to go up. 
* If your data isn't in a separate folder and is in the same place as the script then set to \"\".
* If your data is in a completely different part of your system you can use an absolute path for example \"C:/User/name/data/", this is NOT advised if you are going to be sharing this script. Include "/" at the end.

In [156]:
hr_zones = [120,148, 165, 176, 185, 213]
dir_name = "my_data/" # The name and path of the directory you have the data saved in.

# Setup the Functions

## Function for numbering Heartrate zones

In [157]:
def set_hr_zone(row, hr_zones):
    hr = row["Avg HR"]
    if hr <hr_zones[0]: return 1
    elif hr_zones[0] <= hr < hr_zones[1]: return 2
    elif hr_zones[1] <= hr < hr_zones[2]: return 3
    elif hr_zones[2]<= hr < hr_zones[3]: return 4
    elif hr_zones[3] <= hr < hr_zones[4]: return 5
    elif hr_zones[4]<= hr < hr_zones[5]: return 6
    elif hr_zones[5]<= hr : return 6
    else: return 1

## Function to convert `Elapsed Time` and `Duration` to integer number of seconds.

In [158]:
def make_del(row):
    if "Elapsed Time" in row.index:
        entry = row["Elapsed Time"]
    elif "Duration" in row.index:
        entry = row["Duration"]
    else: raise("Duration column is missing, check csv files.")
    splits = entry.split(':')
    if len(splits)>2:
        h, m, s = splits
        if "." in s:
          s, _ = s.split('.')
    elif len(splits)==2:
        h = 0
        m = splits[0]
        if "." in s:
          s, _ = s.split('.')
    else: return dt.timedelta(hours=int(0), minutes=int(0), seconds=int(0)).total_seconds()
    return dt.timedelta(hours=int(h), minutes=int(m), seconds=int(s)).total_seconds()

## Set workout types based on day of the week

In [159]:
def get_rtype_by_weekday(run_date):
    if (run_date.weekday() == 0):
        r_day = "Mon"
        r_type = "Gym"
    elif (run_date.weekday() == 1):
        r_day = "Tue"
        r_type = "Intervals"
    elif (run_date.weekday() == 2):
        r_day = "Wed"
        r_type = "rest"
    elif (run_date.weekday() == 3):
        r_day = "Thur"
        r_type = "Intervals"
    elif (run_date.weekday() == 4):
        r_day = "Fri"
        r_type = "Gym"
    elif (run_date.weekday() == 5):
        r_day = "Sat"
        r_type = "rest"
    elif (run_date.weekday() == 6):
        r_day = "Sun"
        r_type = "Long_run"
    return ({"day" : r_day, "r_type": r_type})


## Set the distance based on type of run

In [160]:
def get_dist_by_type(r_type, week_dist, long_dist):
    if r_type == "Long_run": 
        r_dist = long_dist
    elif r_type == "Intervals":
        r_dist = (week_dist - long_dist)/2
    else: r_dist = 0
    return(r_dist)

## Find the date of the next Monday

In [205]:
def reset_date_next_mon(date):
    day_week = date.weekday()
    if day_week == 0: return(date)
    elif day_week == 1: return(date+timedelta(days=6))
    elif day_week == 2: return(date+timedelta(days=5))
    elif day_week == 3: return(date+timedelta(days=4))
    elif day_week == 4: return(date+timedelta(days=3))
    elif day_week == 5: return(date+timedelta(days=2))
    elif day_week == 6: return(date+timedelta(days=1))
    return

# Load Data
## Load the activities data

In [185]:
try: os.path.isfile(dir_name+ "Activities.csv")
except: print("Activities.csv is not in the data directory. Check location and filename")
activities_df = pd.read_csv(dir_name+"Activities.csv", header = 0, parse_dates=["Date"])
activities_df.head(-5)

Unnamed: 0,Activity Type,Date,Favorite,Title,Distance,Calories,Time,Avg HR,Max HR,Aerobic TE,...,Max Resp,Stress Change,Stress Start,Stress End,Avg Stress,Max Stress,Moving Time,Elapsed Time,Min Elevation,Max Elevation
0,Running,2024-07-21 08:00:25,False,Bath and North East Somerset Running,15.83,1030,01:42:18,176,188,4.4,...,41,--,--,--,--,--,01:40:13,02:02:12,15,186
1,Indoor Cycling,2024-07-17 20:59:15,False,Indoor Cycling,0.00,58,00:20:06,88,124,0.2,...,17,--,--,--,--,--,00:00:00,00:20:06,--,--
2,Indoor Cycling,2024-07-17 17:59:00,False,Indoor Cycling,0.00,427,00:45:58,145,182,3.3,...,43,--,--,--,--,--,00:00:00,00:45:58,--,--
3,Cycling,2024-07-17 17:32:09,False,Bath and North East Somerset Cycling,4.75,131,00:20:13,117,145,1.1,...,--,--,--,--,--,--,00:18:22,00:20:13,25,51
4,Cycling,2024-07-17 11:34:19,False,Bath and North East Somerset Cycling,4.12,103,00:18:21,108,141,0.5,...,--,--,--,--,--,--,00:17:31,03:00:05,26,53
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
590,Walking,2023-08-22 17:03:03,False,Maldon Walking,2.61,155,00:40:42,81,108,0.4,...,28,--,--,--,--,--,00:37:58,00:40:42,22,35
591,Walking,2023-08-20 19:16:42,False,Maldon Walking,1.01,72,00:17:25,100,131,0.3,...,29,--,--,--,--,--,00:15:02,00:18:34,18,36
592,Running,2023-08-20 17:41:19,False,Maldon Running,9.52,662,01:27:29,141,166,3.0,...,37,--,--,--,--,--,01:19:33,01:27:29,15,63
593,Stand Up Paddleboarding,2023-08-19 13:56:30,False,Maldon Stand Up Paddleboarding,2.67,279,00:58:16,111,135,1.0,...,--,--,--,--,--,--,00:42:36,00:58:16,--,--


## Load the running only activities data

In [186]:
try: os.path.isfile(dir_name+ "Activities_running_only.csv")
except: raise("Activities_running_only.csv is not in the data directory. Check location and filename")
running_df = pd.read_csv(dir_name+"Activities_running_only.csv", header = 0, parse_dates=["Date"])
running_df.head(-5)

Unnamed: 0,Activity Type,Date,Favorite,Title,Distance,Calories,Time,Avg HR,Max HR,Aerobic TE,...,Best Lap Time,Number of Laps,Max Temp,Avg Resp,Min Resp,Max Resp,Moving Time,Elapsed Time,Min Elevation,Max Elevation
0,Running,2024-07-21 08:00:25,False,Bath and North East Somerset Running,15.83,1030,01:42:18,176,188,4.4,...,00:04:53.2,16,29.0,34,17,41,01:40:13,02:02:12,15,186
1,Running,2024-07-16 18:37:07,False,Bath and North East Somerset Running,4.75,287,00:37:30,138,169,2.3,...,00:05:50.6,5,31.0,26,16,34,00:30:21,00:39:24,23,50
2,Running,2024-07-14 08:14:53,False,Bath and North East Somerset Running,8.79,527,00:54:05,155,170,3.3,...,00:04:31.6,9,28.0,26,16,35,00:52:48,01:15:26,12,37
3,Running,2024-07-12 19:16:48,False,Wiltshire Running,9.66,620,00:57:05,169,188,3.6,...,00:03:54.3,10,23.0,32,21,41,00:56:59,00:57:05,52,158
4,Running,2024-07-11 19:30:12,False,Bath and North East Somerset Running,4.38,297,00:34:38,148,171,2.3,...,00:02:05.8,5,28.0,26,11,35,00:34:22,00:35:27,34,189
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
210,Running,2023-08-18 09:03:35,False,Maldon Running,1.48,103,00:11:27,147,163,2.0,...,00:03:39.9,2,30.0,32,14,37,00:11:23,00:11:27,27,37
211,Running,2023-08-15 18:49:21,False,Maldon - HR efforts,5.87,395,00:43:24,155,179,3.1,...,00:00:51.0,12,29.0,31,21,38,00:43:22,00:43:24,15,27
212,Running,2023-08-13 11:51:14,False,Bath and North East Somerset - 12 miles + WU +...,23.71,1607,03:00:43,164,183,4.6,...,00:00:15.1,37,31.0,33,21,43,03:00:16,03:00:51,19,44
213,Running,2023-08-11 09:01:31,False,Bath and North East Somerset - Base,5.02,342,00:38:49,147,165,2.6,...,00:01:49.4,6,30.0,29,20,37,00:38:18,00:39:01,30,55


## Load period data from Clue - JSON

In [187]:
try: period_df = pd.read_json(dir_name + "clue_measurements.json")
except: 
    try: period_df = pd.read_csv(dir_name + "period_dates.csv")
    except: print("No period data found. is not in the data directory. Check location and filename")
period_df = period_df[period_df.type=='period'].sort_values(by="date")
period_dates=period_df.date.reset_index(drop=True)

## Calculate the length of each period.
It is assumed that cycles are at least $18$ days and tracking between this is probably spotting.

In [188]:
cycle_lengths = {}
ndates = len(period_dates)
prior_p_date = period_dates[0]
prior_date = period_dates[0]
for i in range(ndates):
    date = period_dates[i]
    day_diff = (date-prior_p_date).days
    if day_diff>18:
        cycle_lengths[i] = {
            'start_date':prior_p_date,
            'end_date': date,
            'cycle_length': day_diff}
        prior_p_date = date
cycle_df = pd.DataFrame(cycle_lengths).T
cycle_df.sort_values(by="start_date")
cycle_df.start_date = pd.to_datetime(cycle_df.start_date)
cycle_df.end_date = pd.to_datetime(cycle_df.end_date)
cycle_df.cycle_length = pd.to_numeric(cycle_df.cycle_length)
avg_cyc_leng = cycle_df.cycle_length.mean()
cyc_var = cycle_df.cycle_length.var()
print("Average cycle length is %d with variance %d"%(avg_cyc_leng, cyc_var))


Average cycle length is 31 with variance 27


## Set the data-type for the Date columns

In [189]:
activities_df["Date"] = pd.to_datetime(activities_df["Date"], dayfirst= True)
running_df["Date"] = pd.to_datetime(running_df["Date"], dayfirst= True)
period_dates = pd.to_datetime(period_dates, dayfirst= True)

# Combine and restucture the data
## Add period data to activities and running dataframes.

In [190]:
running_df['Period'] = 0
activities_df['Period'] = 0
running_df.loc[running_df.Date.dt.date.isin(period_dates.dt.date),'Period'] = 1
activities_df.loc[activities_df.Date.dt.date.isin(period_dates.dt.date),'Period'] = 1


## Get the running data
### Extract data from the strings.

In [191]:
running_df.rename(columns = {"Time": "Duration"}, inplace= True)
running_df["Distance"] = pd.to_numeric(running_df["Distance"])
running_df[['L_GCT Balance', 'R_GCT Balance']] = running_df["Avg GCT Balance"].str.split(" / ", expand= True)
running_df.loc[:,"L_GCT Balance"] = running_df.loc[:,"L_GCT Balance"].str.replace(r'\D', '', regex=True)
running_df.loc[:, "R_GCT Balance"] = running_df.loc[:,"L_GCT Balance"].str.replace(r'\D', '', regex=True)
col_list = ['Avg Stride Length', 'Avg Vertical Ratio', 'Avg Vertical Oscillation', 'Avg Run Cadence', 
            'Avg Ground Contact Time', "L_GCT Balance", "R_GCT Balance"]
running_df[col_list] = running_df[col_list].apply(pd.to_numeric, errors='coerce')
running_df["L_GCT Balance"] = running_df["L_GCT Balance"] / 1000
running_df["R_GCT Balance"] = running_df["R_GCT Balance"] / 1000
running_df["Date"] = pd.to_datetime(running_df["Date"], dayfirst=True, errors = "coerce")
running_df.head()

Unnamed: 0,Activity Type,Date,Favorite,Title,Distance,Calories,Duration,Avg HR,Max HR,Aerobic TE,...,Avg Resp,Min Resp,Max Resp,Moving Time,Elapsed Time,Min Elevation,Max Elevation,Period,L_GCT Balance,R_GCT Balance
0,Running,2024-07-21 08:00:25,False,Bath and North East Somerset Running,15.83,1030,01:42:18,176,188,4.4,...,34,17,41,01:40:13,02:02:12,15,186,0,0.507,0.507
1,Running,2024-07-16 18:37:07,False,Bath and North East Somerset Running,4.75,287,00:37:30,138,169,2.3,...,26,16,34,00:30:21,00:39:24,23,50,0,0.5,0.5
2,Running,2024-07-14 08:14:53,False,Bath and North East Somerset Running,8.79,527,00:54:05,155,170,3.3,...,26,16,35,00:52:48,01:15:26,12,37,0,0.506,0.506
3,Running,2024-07-12 19:16:48,False,Wiltshire Running,9.66,620,00:57:05,169,188,3.6,...,32,21,41,00:56:59,00:57:05,52,158,0,0.519,0.519
4,Running,2024-07-11 19:30:12,False,Bath and North East Somerset Running,4.38,297,00:34:38,148,171,2.3,...,26,11,35,00:34:22,00:35:27,34,189,1,0.498,0.498


## Collect activities by date
### Edit to match your own heart rate zones

In [192]:
start_date = min(running_df.Date.min(),activities_df.Date.min())

### Calculate the total duration in seconds for further calculations.

In [193]:
running_df["Duration_seconds"]=running_df.apply(make_del, axis=1)
running_df["Duration_minutes"]=running_df.Duration_seconds%60
activities_df["Duration_seconds"]=activities_df.apply(make_del, axis=1)
activities_df["Duration_minutes"]=activities_df.Duration_seconds%60

### Set the heartrate zones

In [194]:
activities_df["Avg HR"] = pd.to_numeric(activities_df["Avg HR"], errors='coerce')
activities_df["hr_zone"] = activities_df.apply(set_hr_zone, hr_zones = hr_zones, axis = 1)
activities_df['load'] = activities_df["Duration_minutes"] * activities_df['Avg HR']

In [195]:
running_df.columns

Index(['Activity Type', 'Date', 'Favorite', 'Title', 'Distance', 'Calories',
       'Duration', 'Avg HR', 'Max HR', 'Aerobic TE', 'Avg Run Cadence',
       'Max Run Cadence', 'Avg Pace', 'Best Pace', 'Total Ascent',
       'Total Descent', 'Avg Stride Length', 'Avg Vertical Ratio',
       'Avg Vertical Oscillation', 'Avg Ground Contact Time',
       'Avg GCT Balance', 'Training Stress Score®', 'Grit', 'Flow',
       'Avg. Swolf', 'Avg Stroke Rate', 'Total Reps', 'Min Temp',
       'Decompression', 'Best Lap Time', 'Number of Laps', 'Max Temp',
       'Avg Resp', 'Min Resp', 'Max Resp', 'Moving Time', 'Elapsed Time',
       'Min Elevation', 'Max Elevation', 'Period', 'L_GCT Balance',
       'R_GCT Balance', 'Duration_seconds', 'Duration_minutes'],
      dtype='object')

In [196]:
running_df["Avg HR"] = pd.to_numeric(running_df["Avg HR"], errors='coerce')
running_df["hr_zone"] = running_df.apply(set_hr_zone, hr_zones = hr_zones, axis = 1)
running_df['load'] = running_df["Duration_minutes"] * running_df['Avg HR']

### Day totals
Calculate the total activity per day and the training load corresponding.
Trianing load calculated using: $$\text{load} = \text{minutes of activity} * \text{average heart rate}.$$

In [197]:

#running_df.Date.dt.normalize(),start_date.date()
date_format = "%Y/%m/%d"
ndays = (dt.datetime.today()-start_date).days
by_date = {}
for i in range(ndays):
    new_date = (start_date+timedelta(days=i))
    next_date = (start_date+timedelta(days=i+7))
    runs = running_df.loc[running_df.Date.dt.date==pd.Timestamp(new_date.date()).date()]
    activities = activities_df[activities_df.Date.dt.date==pd.Timestamp(new_date.date()).date()]
    next_week = pd.date_range(new_date, next_date, periods = 7)
    if sum(period_dates==pd.Timestamp(new_date.date())):
        period = 1
    elif period_dates.isin(next_week.date).sum()>0:
        period = 2
    else:
        period = 0
    if len(activities)==0:
        tot_dist=0
        duration=0
        load=0
    else:
        tot_dist = runs["Distance"].sum()
        duration = dt.timedelta(seconds=activities["Duration_seconds"].sum())
        load = activities.load.sum()
    running_df.loc[running_df.Date.dt.date==pd.Timestamp(new_date.date()).date(),"Period"] = period
    activities_df.loc[activities_df.Date.dt.date==pd.Timestamp(new_date.date()).date(),"Period"] = period
    by_date[i] = {
        'Date': new_date,
        'run_dist': tot_dist,
        'duration': duration,
        'duration_seconds': activities["Duration_seconds"].sum(),
        'duration_minutes': activities["Duration_minutes"].sum(),
        'load': load,
        'Period': period}
overall_by_date_df = pd.DataFrame(by_date).T
overall_by_date_df.Date = pd.to_datetime(overall_by_date_df.Date)
overall_by_date_df.run_dist = pd.to_numeric(overall_by_date_df.run_dist)
overall_by_date_df.duration_seconds = pd.to_numeric(overall_by_date_df.duration_seconds)
overall_by_date_df.load = pd.to_numeric(overall_by_date_df.load)
overall_by_date_df.head()

  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:
  elif period_dates.isin(next_week.date).sum()>0:


Unnamed: 0,Date,run_dist,duration,duration_seconds,duration_minutes,load,Period
0,2023-07-22 09:01:52,0.0,0,0.0,0.0,0,1
1,2023-07-23 09:01:52,0.0,0,0.0,0.0,0,1
2,2023-07-24 09:01:52,0.0,0,0.0,0.0,0,0
3,2023-07-25 09:01:52,0.0,0,0.0,0.0,0,0
4,2023-07-26 09:01:52,0.0,0,0.0,0.0,0,0


## Find the relationship between training load and run distance
By finding the most common relationship between load and distance we can then create a proxy distance for none-running activities. We can then use this to create `proxy-distance` to aid coming back from injury. This also prevents over-training from additional activities.

If there are more than 2 weeks with zero running then mileage will be adjusted based on previous running mileage. The `proxy-distance` will then be used to generate an alternative activity that will slowly reduce and readjust to running again. 

In [198]:
running_df["coef"]=running_df["Distance"]/running_df["load"]
predict_coef = running_df["coef"].median()
overall_by_date_df["proxy_kms"] = overall_by_date_df["load"]*predict_coef
activities_df["proxy_kms"] = activities_df.load * predict_coef

## Calculate any change in load due to Luteal phase

In [199]:
luteal_decrease = {}
luteal_dates = overall_by_date_df.loc[overall_by_date_df.Period == 2, "Date"]
luteal_dates.reset_index(inplace=True, drop=True)
luteal_dates

0    2023-08-14 09:01:52
1    2023-08-15 09:01:52
2    2023-08-16 09:01:52
3    2023-08-17 09:01:52
4    2023-08-18 09:01:52
             ...        
71   2024-07-01 09:01:52
72   2024-07-02 09:01:52
73   2024-07-03 09:01:52
74   2024-07-04 09:01:52
75   2024-07-05 09:01:52
Name: Date, Length: 76, dtype: datetime64[ns]

In [200]:
ndates = len(luteal_dates)
prior_p_date = luteal_dates.loc[0]
for i in range(ndates):
    date = luteal_dates[i]
    day_diff = (date-prior_p_date).days
    if day_diff>7:
        previous_week = pd.date_range(date - timedelta(days=7), date, periods =7).date
        this_week = pd.date_range(date, timedelta(days=7) + date, periods =7).date
        pre_acts = overall_by_date_df[overall_by_date_df.Date.dt.date.isin(previous_week)]
        this_acts = overall_by_date_df[overall_by_date_df.Date.dt.date.isin(this_week)]
        if pre_acts.proxy_kms.sum()==0:
            dist_diff = 0
        else:
            dist_diff = (pre_acts.proxy_kms.sum()- this_acts.proxy_kms.sum())/(pre_acts.proxy_kms.sum())
        if pre_acts.load.sum()==0:
            load_diff = 0
        else:
            load_diff = (pre_acts.load.sum()- this_acts.load.sum())/ (pre_acts.load.sum())
        luteal_decrease[i]={
            'luteal_start_date':date,
            'luteal_end_date':date+timedelta(days=7),
            'proxy_dist_diff':dist_diff,
            'load_diff':load_diff
        }
        prior_p_date = date
luteal_df = pd.DataFrame(luteal_decrease).T
luteal_df.sort_values(by="luteal_start_date")
luteal_df.start_date = pd.to_datetime(luteal_df.luteal_start_date)
luteal_df.end_date = pd.to_datetime(luteal_df.luteal_end_date)
luteal_df.proxy_dist_diff = pd.to_numeric(luteal_df.proxy_dist_diff)
luteal_df.pload_diff = pd.to_numeric(luteal_df.load_diff)
avg_dist_diff = luteal_df.proxy_dist_diff.mean()
dist_diff_var = luteal_df.proxy_dist_diff.var()
avg_load_diff = luteal_df.load_diff.mean()
load_diff_var = luteal_df.load_diff.var()
print("Average luteal mileage decrease is %.2f with variance %.2f"%(avg_dist_diff,dist_diff_var))
print("Average luteal load decrease is %.2f with variance %.2f"%(avg_load_diff,load_diff_var))


Average luteal mileage decrease is -0.38 with variance 0.63
Average luteal load decrease is -0.38 with variance 0.63


  luteal_df.start_date = pd.to_datetime(luteal_df.luteal_start_date)
  luteal_df.end_date = pd.to_datetime(luteal_df.luteal_end_date)
  luteal_df.pload_diff = pd.to_numeric(luteal_df.load_diff)


## Find the AVG Heartrate for not running activites
Use the AVG HR to predict training load for non-running activities

In [201]:
act_mean_hrs = activities_df[['Activity Type','Avg HR','Duration_minutes']].groupby(by='Activity Type').mean()
act_mean_hrs['hr_zone'] = act_mean_hrs.apply(set_hr_zone,hr_zones=hr_zones, axis=1)
act_mean_hrs

Unnamed: 0_level_0,Avg HR,Duration_minutes,hr_zone
Activity Type,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Bouldering,104.333333,30.888889,1
Breathwork,48.0,55.0,1
Cardio,129.574468,30.170213,2
Cycling,117.925234,29.850467,1
Elliptical,138.5,30.5,2
HIIT,114.705882,27.176471,1
Indoor Climbing,88.5,30.0,1
Indoor Cycling,130.125,23.125,2
Open Water Swimming,107.0,28.833333,1
Other,84.090909,22.272727,1


## Proxy distance for a strength session

In [202]:
sc_load = act_mean_hrs.loc['Strength Training', 'Duration_minutes']*act_mean_hrs.loc['Strength Training', 'Avg HR']
sc_proxy_kms = sc_load * predict_coef

# Generate The future plan using the training data.

TO ADD: check the previous cycle for missing runs. If no runs have been had for more then 3 weeks then reduce the running distance and add in none running load.

In [277]:
weeks_per_cycle = avg_cyc_leng%7
last_date = overall_by_date_df.Date.dt.date.max()
start_prev_block = pd.to_datetime(last_date-timedelta(days=7*weeks_per_cycle)).date()
init_prox_week_dist = overall_by_date_df.loc[overall_by_date_df.Date.dt.date > start_prev_block ,"proxy_kms"].sum()/weeks_per_cycle
init_load_tot = overall_by_date_df.loc[overall_by_date_df.Date.dt.date > start_prev_block,"load"].sum()/weeks_per_cycle
init_long_dist = overall_by_date_df.loc[overall_by_date_df.Date.dt.date > start_prev_block,"run_dist"].max()

In [289]:
tot_w = 20
init_week_num = 0
week_day_bound = 12 # Maximum distance on a week day. When max week_dist is met this will increase by 10%
long_run_bound = 35 # Maximum distance for a long run. When max week_dist is met this will increase by 10%
week_bound = 200 # Maximum distance per week. When max week_dist is met this will increase by 10%
init_date = last_date + timedelta(days = 1)
init_date = reset_date_next_mon(init_date)
new_week_dist = init_prox_week_dist
long_dist = init_long_dist
new_date = init_date
week_no = init_week_num
DW = "N"
list_dict = []
for nd in range(tot_w):
    # Check if down_week
    print(init_prox_week_dist)
    if nd%weeks_per_cycle==0:
        mul = 1+avg_load_diff
        DW = "Y"
    else:
        mul = 1.1
        DW = "N"
    if new_week_dist * mul >= week_bound:
        new_week_dist = week_bound
        week_bound *= 1.1
        long_run_bound *= 1.1
        week_day_bound *= 1.1
        mul = 1.0
    new_week_dist *= mul
    long_dist *= mul
    # Set default distances for the overflow days
    mon_dist = 0
    wed_dist = 0
    fri_dist = 0
    sat_dist = 0
    # Add Sun
    sun_dist = min(0.5*new_week_dist, long_run_bound)
    wd_dist = (new_week_dist-sun_dist-2*sc_proxy_kms)*0.5
    tue_dist = min(wd_dist, week_day_bound)
    thur_dist = min(wd_dist,week_day_bound)
    extra_dist = max(new_week_dist - 2*sc_proxy_kms - tue_dist - thur_dist - sun_dist,0)
    if extra_dist < week_day_bound:
        sat_dist = extra_dist
    elif week_day_bound <= extra_dist < 2* week_day_bound:
        fri_dist = 0.5*extra_dist
        sat_dist = 0.5*extra_dist
    elif 2*week_day_bound <= extra_dist < 3* week_day_bound:
        mon_dist = extra_dist/3
        fri_dist = extra_dist/3
        sat_dist = extra_dist/3
    elif 3*week_day_bound <= extra_dist:
        mon_dist = min(extra_dist*0.25,week_day_bound)
        wed_dist = min(extra_dist*0.25,week_day_bound)
        fri_dist = min(extra_dist*0.25,week_day_bound)
        sat_dist = min(extra_dist*0.25,week_day_bound)
    new_week_dist = sum([2*sc_proxy_kms,mon_dist, tue_dist,wed_dist,thur_dist,fri_dist,sat_dist, sun_dist])
    mon_dict =  {
        "Week_No": week_no,
        "Day": "Mon",
        "Date": new_date,
        "Week_tot": new_week_dist,
        "Run_type": "Top-up/ rest + Strength",
        "Distance": mon_dist+sc_proxy_kms,
        "Down_week": DW,
        "Percent_week_tot": 100*mon_dist/new_week_dist,
        "Description": "max running dist: %d, Strength session, duration %d mins"%(mon_dist,act_mean_hrs.loc['Strength Training', 'Duration_minutes'])
        }
    tue_dict =  {
        "Week_No": week_no,
        "Day": "Tue",
        "Date": new_date + timedelta(days=1),
        "Week_tot": new_week_dist,
        "Run_type": "Workout",
        "Distance": tue_dist,
        "Down_week": DW,
        "Percent_week_tot": 100*tue_dist/new_week_dist,
        "Description": "Workout session, max dist %d"%tue_dist
        }
    wed_dict =  {
        "Week_No": week_no,
        "Day": "Wed",
        "Date": new_date + timedelta(days=2),
        "Week_tot": new_week_dist,
        "Run_type": "Top-up/ rest + Strength",
        "Distance": wed_dist,
        "Down_week": DW,
        "Percent_week_tot": 100*wed_dist/new_week_dist,
        "Description": "max running dist: %d, Strength session, duration %d mins"%(wed_dist,act_mean_hrs.loc['Strength Training', 'Duration_minutes'])
        }
    thur_dict =  {
        "Week_No": week_no,
        "Day": "Thur",
        "Date": new_date + timedelta(days=3),
        "Week_tot": new_week_dist,
        "Run_type": "Workout",
        "Distance": thur_dist,
        "Down_week": DW,
        "Percent_week_tot": 100*thur_dist/new_week_dist,
        "Description": "Workout session, max dist%d"%thur_dist
        }
    fri_dict =  {
        "Week_No": week_no,
        "Day": "Fri",
        "Date": new_date + timedelta(days=4),
        "Week_tot": new_week_dist,
        "Run_type": "Top-up/ rest",
        "Distance": fri_dist,
        "Down_week": DW,
        "Percent_week_tot": 100*fri_dist/new_week_dist,
        "Description": "Easy day, max dist: %d"%fri_dist
        }
    sat_dict =  {
        "Week_No": week_no,
        "Day": "Sat",
        "Date": new_date + timedelta(days=5),
        "Week_tot": new_week_dist,
        "Run_type": "Top-up/ rest",
        "Distance": sat_dist,
        "Down_week": DW,
        "Percent_week_tot": 100*sat_dist/new_week_dist,
        "Description": "Easy day, max distance: %d"%sat_dist
        }
    sun_dict =  {
        "Week_No": week_no,
        "Day": "Sun",
        "Date": new_date + timedelta(days=6),
        "Week_tot": new_week_dist,
        "Run_type": "Long_run",
        "Distance": sun_dist,
        "Down_week": DW,
        "Percent_week_tot": 100*sun_dist/new_week_dist,
        "Description": "Long run %d"%sun_dist
        }
    list_dict += [ mon_dict, tue_dict, wed_dict, thur_dict, fri_dict, sat_dict, sun_dict]
    week_no+=1
    new_date +=timedelta(days = 7)
plan_df = pd.DataFrame(list_dict)
plan_df
    
    

59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174
59.549093572975174


Unnamed: 0,Week_No,Day,Date,Week_tot,Run_type,Distance,Down_week,Percent_week_tot,Description
0,0,Mon,2024-07-22,36.710260,Top-up/ rest + Strength,3.766384,Y,0.000000,"max running dist: 0, Strength session, duratio..."
1,0,Tue,2024-07-23,36.710260,Workout,5.411180,Y,14.740240,"Workout session, max dist 5"
2,0,Wed,2024-07-24,36.710260,Top-up/ rest + Strength,0.000000,Y,0.000000,"max running dist: 0, Strength session, duratio..."
3,0,Thur,2024-07-25,36.710260,Workout,5.411180,Y,14.740240,"Workout session, max dist5"
4,0,Fri,2024-07-26,36.710260,Top-up/ rest,0.000000,Y,0.000000,"Easy day, max dist: 0"
...,...,...,...,...,...,...,...,...,...
135,19,Wed,2024-12-04,114.532769,Top-up/ rest + Strength,12.000000,N,10.477351,"max running dist: 12, Strength session, durati..."
136,19,Thur,2024-12-05,114.532769,Workout,12.000000,N,10.477351,"Workout session, max dist12"
137,19,Fri,2024-12-06,114.532769,Top-up/ rest,12.000000,N,10.477351,"Easy day, max dist: 12"
138,19,Sat,2024-12-07,114.532769,Top-up/ rest,12.000000,N,10.477351,"Easy day, max distance: 12"


In [293]:
weeks_df = pd.pivot(plan_df, values = "Description", index = 'Week_No', columns = 'Day')
weeks_df = weeks_df[['Mon', 'Tue', 'Wed', 'Thur', 'Fri','Sat','Sun']]
weeks_df

Day,Mon,Tue,Wed,Thur,Fri,Sat,Sun
Week_No,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,"max running dist: 0, Strength session, duratio...","Workout session, max dist 5","max running dist: 0, Strength session, duratio...","Workout session, max dist5","Easy day, max dist: 0","Easy day, max distance: 0",Long run 18
1,"max running dist: 0, Strength session, duratio...","Workout session, max dist 6","max running dist: 0, Strength session, duratio...","Workout session, max dist6","Easy day, max dist: 0","Easy day, max distance: 0",Long run 20
2,"max running dist: 0, Strength session, duratio...","Workout session, max dist 7","max running dist: 0, Strength session, duratio...","Workout session, max dist7","Easy day, max dist: 0","Easy day, max distance: 0",Long run 22
3,"max running dist: 0, Strength session, duratio...","Workout session, max dist 8","max running dist: 0, Strength session, duratio...","Workout session, max dist8","Easy day, max dist: 0","Easy day, max distance: 0",Long run 24
4,"max running dist: 0, Strength session, duratio...","Workout session, max dist 9","max running dist: 0, Strength session, duratio...","Workout session, max dist9","Easy day, max dist: 0","Easy day, max distance: 0",Long run 26
5,"max running dist: 0, Strength session, duratio...","Workout session, max dist 11","max running dist: 0, Strength session, duratio...","Workout session, max dist11","Easy day, max dist: 0","Easy day, max distance: 0",Long run 29
6,"max running dist: 0, Strength session, duratio...","Workout session, max dist 12","max running dist: 0, Strength session, duratio...","Workout session, max dist12","Easy day, max dist: 0","Easy day, max distance: 0",Long run 32
7,"max running dist: 0, Strength session, duratio...","Workout session, max dist 12","max running dist: 0, Strength session, duratio...","Workout session, max dist12","Easy day, max dist: 0","Easy day, max distance: 5",Long run 35
8,"max running dist: 0, Strength session, duratio...","Workout session, max dist 12","max running dist: 0, Strength session, duratio...","Workout session, max dist12","Easy day, max dist: 6","Easy day, max distance: 6",Long run 35
9,"max running dist: 0, Strength session, duratio...","Workout session, max dist 12","max running dist: 0, Strength session, duratio...","Workout session, max dist12","Easy day, max dist: 10","Easy day, max distance: 10",Long run 35


In [None]:
thresh_dates = plan_df.loc[plan_df.Week_tot > 100, "Date"]
indices = thresh_dates.index
mara_date = thresh_dates[indices[0]]
print("Marathon ready by "+ str(mara_date))
print("Earliest race "+ str(mara_date + timedelta(weeks = 3)))

Marathon ready by 2024-11-04 17:09:53
Earliest race 2024-11-25 17:09:53


In [None]:
dt.date(2023, 5, 21) + timedelta(weeks=45 + 3)

datetime.date(2024, 4, 21)

# Load the activities data

In [None]:
activities_df = pd.read_csv("Activities.csv", header = 0, parse_dates=["Date"])
activities_df.head()

  activities_df = pd.read_csv("Activities.csv", header = 0, parse_dates=["Date"])


Unnamed: 0,Activity Type,Date,Favorite,Title,Distance,Calories,Time,Avg HR,Max HR,Aerobic TE,...,Max Resp,Stress Change,Stress Start,Stress End,Avg Stress,Max Stress,Moving Time,Elapsed Time,Min Elevation,Max Elevation
0,Indoor Cycling,2024-07-17 20:59:00,False,Indoor Cycling,0.0,58,00:20:06,88,124,0.2,...,17,--,--,--,--,--,00:00:00,00:20:06,--,--
1,Indoor Cycling,2024-07-17 17:59:00,False,Indoor Cycling,0.0,427,00:45:58,145,182,3.3,...,43,--,--,--,--,--,00:00:00,00:45:58,--,--
2,Cycling,2024-07-17 17:32:00,False,Bath and North East Somerset Cycling,4.75,131,00:20:13,117,145,1.1,...,--,--,--,--,--,--,00:18:22,00:20:13,25,51
3,Cycling,2024-07-17 11:34:00,False,Bath and North East Somerset Cycling,4.12,103,00:18:21,108,141,0.5,...,--,--,--,--,--,--,00:17:31,03:00:05,26,53
4,Running,2024-07-16 18:37:00,False,Bath and North East Somerset Running,4.75,287,00:37:30,138,169,2.3,...,34,--,--,--,--,--,00:30:21,00:39:24,23,50
