In [1]:
#Goal of this notebook is to develop the framework for a reinforcement learning algorithm to drive
#the optimization of performance characteristics of the athlete.

In [2]:
#The general idea is to drive the athlete to faster and faster times while keeping the heart rate of the training as 
#low as possible.

### Reinforcement Learning Basics

The idea behind a reinforcement learning model is that an algorithm is incentivized to choose the action that returns the most reward. The actions the algorithm takes, and the history of the states those actions lead to, are recorded and used to inform future decisions. 
The algorithm should also use some randomization so that it can "test out" suboptimal actions in the short term for long-term gain.

In this way, I plan to build an algorithm that will "learn" the most effective way to train an individual towards a known goal, by varying the "actions" it takes (the workouts it prescribes) and monitoring the effectiveness of those actions (performance-based metrics of the completed workouts).

Therefore I will need to define the reward function as some function of the workout variables, and define success and failure based on the balance of one or more variables. For example, a run's distance might not be the only success variable; we might also be interested in the pace of that run and the heart rate required to accomplish it.

Some rules will have to be in place to prevent "wild" experimentation. For example, a runner who has never run more than 5 miles on the program should not be assigned a single 20+ mile run in a given week simply as a way to optimize for a variable.
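One way to express such a guard rail is a cap on how far a proposed workout may exceed what the athlete has already completed. The sketch below is only illustrative: the function name `cap_distance` and the 10% default growth limit are assumptions, not rules defined anywhere in this notebook.

```python
def cap_distance(proposed_miles, longest_completed_miles, max_growth=0.10):
    """Limit a proposed run to a modest increase over the athlete's longest run.

    max_growth is an assumed tuning parameter (10% here), not a value
    prescribed by this notebook.
    """
    ceiling = longest_completed_miles * (1.0 + max_growth)
    return min(proposed_miles, ceiling)


# A runner whose longest run is 5 miles is offered 20 miles by the optimizer:
capped = cap_distance(20.0, 5.0)
print(capped)
```

A proposal that is already below the ceiling passes through unchanged, so the cap only bites when the optimizer tries something extreme.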

Rewards should most likely also be given for consistency of engagement, since that is crucial to the athlete continuing to use the program that the algorithm runs behind. If athletes decide not to use the app, then it is most likely not useful.

This notebook will attempt to:
* Define the basic classes of the objects we will need, pythonically
* Define the functions to pull up sufficient data to predict a training plan for a week for an athlete
* Use the greedy reinforcement learning model to optimize future workouts for better returns

---

In [134]:
import pandas as pd
import numpy as np
from sqlalchemy import create_engine, insert, update, MetaData
import os
from dotenv import load_dotenv
import json
import datetime

First let's do some local development of classes and functions, and test them out:

In [4]:
class Athlete():
    pass

In [5]:
class TrainingPlan():
    pass

In [6]:
class Workout():
    pass

The two important metrics used for goal seeking in the reinforcement algorithm both have to do with athlete performance. The first is how fast an athlete runs per heartbeat, expressed as a ratio relative to threshold heart rate. This is the metric by which running performance will be measured, and maximizing it is the goal we seek.

Similarly, the wattage produced on the bicycle per heartbeat, as a ratio relative to biking threshold heart rate, is the metric by which cycling performance will be measured, and maximizing that variable is the goal we seek.

Essentially, we are trying to find the workout schedule that leads to the highest run pace and/or highest wattage for the % of threshold heart rate that the athlete averages during a workout, specifically a weekly benchmark, usually the long run or long ride.
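One possible reading of this metric is output (speed or watts) divided by the fraction of threshold heart rate at which the workout was performed. The sketch below assumes that interpretation. The function name `efficiency`, the units, and the example numbers are all illustrative rather than taken from the database.

```python
def efficiency(output, avg_hr, thresh_hr):
    """Return output (speed or watts) per unit of relative heart rate.

    output    -- average speed (e.g. m/s) for a run, or average watts for a ride
    avg_hr    -- average heart rate over the workout, in beats per minute
    thresh_hr -- the athlete's threshold heart rate for that sport
    """
    if avg_hr <= 0 or thresh_hr <= 0:
        raise ValueError("heart rates must be positive")
    pct_of_threshold = avg_hr / thresh_hr
    return output / pct_of_threshold


# The same speed at a lower average heart rate scores higher
# (155 is just an example threshold heart rate):
earlier = efficiency(3.0, avg_hr=150, thresh_hr=155)
later = efficiency(3.0, avg_hr=140, thresh_hr=155)
print(later > earlier)
```

Under this definition, a workout done exactly at threshold heart rate scores its raw speed or wattage, and anything done below threshold scores proportionally higher.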

Functionally, the creation of the workouts can be automated and done at a slow rate, in off peak hours (possibly at midnight). To give the athlete some idea of the upcoming workouts and structure, a placeholder workout will be placed in the calendar and then fleshed out the night before based on the performance of past workouts.

So that the workout is timed and adjusted exactly to the performance of the athlete in recent days, the athlete entry in the database most likely needs to store data about what type of workouts work well for them so that recommendations are optimized. 

Different types of workout affinities can be recorded:
* Propensity to complete a workout
* Propensity for interval work
* Propensity for duration
* Preferred interval duration
* Max 30s power or pace or hr
* Max 2 min power or pace or hr
* Max 5 min power or pace or hr
* Max 10 min power or pace or hr
* Max 20 min power or pace or hr
* Max 40 min power or pace or hr
* Avg workout power or pace or hr
* Propensity for workout variation

So, we will need to define a rewards matrix, and an actions matrix for each athlete. This should be a database entry that is captured and recorded and adjusted each time a workout is completed.

We will also need a decider, that picks the best action based on the responses of the reward matrix, thus making it 'greedy'. Occasionally we will probably want the decider to choose a random or novel approach, to ensure that the first choice taken doesn't overrule a different, more fruitful strategy. 
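A decider of this kind is commonly called epsilon-greedy. A minimal sketch follows, assuming the rewards are held in a plain dict. The action names, the reward values, and the `epsilon` default are made up for illustration.

```python
import random


def choose_action(rewards, epsilon=0.1, rng=random):
    """Pick an action name from a dict of {action: estimated reward}.

    With probability epsilon a random action is explored; otherwise the
    action with the highest estimated reward is exploited.
    """
    if rng.random() < epsilon:
        return rng.choice(list(rewards))
    return max(rewards, key=rewards.get)


# Hypothetical reward estimates for three kinds of training week:
rewards = {'more_intervals': 0.4, 'more_duration': 0.7, 'more_variation': 0.1}
print(choose_action(rewards, epsilon=0.0))   # purely greedy when epsilon is 0
```

Raising `epsilon` makes the decider try novel approaches more often, at the cost of sometimes ignoring the best known action.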

The variables controlling the workout parameters should be defined as a distribution, so that as the function finds a set of variables that work together it increases the likelihood of that being selected, in effect defining the distribution of the reward function.

The training generator should look back over a certain window, compare the distributions of the workout propensities against the change in performance from earlier in the window to the current day's workouts, and determine whether the assigned propensities were successful in increasing the target variable. 

For example, the algorithm assigns a week with a higher % of workouts that are intervals, and notices that the heart rate for the same pace as % of threshold goes down. The algorithm should record this as a "win" or a reward, and should bias slightly towards this course of action in the future. Alternatively, if the heart rate goes up, indicating that the athlete might be getting poor adaptations from it, the algorithm would react by disincentivizing that variable or reducing its weight. 
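The win-or-lose adjustment described above could be sketched as a small bounded nudge to a propensity weight. This is an assumption about how the update might look, not the implemented rule. The name `update_weight`, the step size, and the bounds are all hypothetical.

```python
def update_weight(weight, hr_change, step=0.05, low=0.0, high=1.0):
    """Nudge a propensity weight based on the change in relative heart rate.

    hr_change -- change in heart rate (as a fraction of threshold) at the
                 same pace; negative means the athlete got more efficient.
    step      -- assumed size of each adjustment.
    """
    if hr_change < 0:
        weight += step      # reward: bias towards this course of action
    elif hr_change > 0:
        weight -= step      # penalty: reduce the weight of this variable
    return min(high, max(low, weight))


# An interval share of 0.5, after heart rate dropped by 2% of threshold:
print(update_weight(0.5, hr_change=-0.02))
```

Keeping the step small means a single good or bad week only biases the next choice slightly, which matches the intent of reacting gradually.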

In [7]:
#Let's define a dummy athlete, and return predictions based on their change in performance:

In [8]:
#Variables saved from the last week:
athlete_bob = {'id':1, 'thresh_hr':155, 'prop_workout':0.95, 'pct_int':0.5, 'prop_dur':0, 'itv_dur':0.1}

In [9]:
athlete_bob['id']

1

In [10]:
this_weeks_workouts = [
    {
        'workout_id':'0001',
        'int_workout':True,
        'workout_dur':60,
        'workout_itv_dur':.083,
        'workout_prop_dur':.625,
        'steps':{
            0:{
                'type':'ramp',
                'duration':10,
                'start_intensity':.65,
                'end_intensity':.85,
                'quantity':1,
                'first_duration':10,
                'second_duration':0
            },
            1:{
                'type':'interval',
                'duration':40,
                'start_intensity':.95,
                'end_intensity':.65,
                'quantity':5,
                'first_duration':5,
                'second_duration':3 
            },
            2:{
                'type':'ramp',
                'duration':10,
                'start_intensity':.85,
                'end_intensity':.65,
                'quantity':1,
                'first_duration':10,
                'second_duration':0 
            }
        }
        
    },
    {
        'workout_id':'0002',
        'int_workout':True,
        'workout_dur':60,
        'workout_itv_dur':.083,
        'workout_prop_dur':.625,
        'steps':{
            0:{
                'type':'ramp',
                'duration':10,
                'start_intensity':.65,
                'end_intensity':.85,
                'quantity':1,
                'first_duration':10,
                'second_duration':0
            },
            1:{
                'type':'interval',
                'duration':40,
                'start_intensity':.95,
                'end_intensity':.65,
                'quantity':5,
                'first_duration':5,
                'second_duration':3 
            },
            2:{
                'type':'ramp',
                'duration':10,
                'start_intensity':.85,
                'end_intensity':.65,
                'quantity':1,
                'first_duration':10,
                'second_duration':0 
            }
        }
        
    }
        
    
]

In [11]:
my_string = """{0:{
                'type':'ramp',
                'duration':10,
                'start_intensity':.65,
                'end_intensity':.85,
                'quantity':1,
                'first_duration':10,
                'second_duration':0
            },
            1:{
                'type':'interval',
                'duration':40,
                'start_intensity':.95,
                'end_intensity':.65,
                'quantity':5,
                'first_duration':5,
                'second_duration':3 
            },
            2:{
                'type':'ramp',
                'duration':10,
                'start_intensity':.85,
                'end_intensity':.65,
                'quantity':1,
                'first_duration':10,
                'second_duration':0 
            }
            }"""

In [11]:
for workout in this_weeks_workouts:
    print(workout['workout_id']+': ')
    for step in workout['steps']:
        print(workout['steps'][step]['type'])

0001: 
ramp
interval
ramp
0002: 
ramp
interval
ramp
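The step structure can also be checked for internal consistency. The sketch below assumes that `first_duration` and `second_duration` are the two halves of one repetition and that `quantity` is the number of repetitions. That reading fits the numbers in the example workouts but is not stated explicitly in the notebook, and the helper names are made up.

```python
def steps_total_duration(workout):
    """Sum the 'duration' of every step in a workout dict."""
    return sum(step['duration'] for step in workout['steps'].values())


def step_is_consistent(step):
    """Check that quantity * (first_duration + second_duration) equals duration."""
    expected = step['quantity'] * (step['first_duration'] + step['second_duration'])
    return expected == step['duration']


# A trimmed copy of workout '0001' from above:
example = {
    'workout_dur': 60,
    'steps': {
        0: {'type': 'ramp', 'duration': 10, 'quantity': 1,
            'first_duration': 10, 'second_duration': 0},
        1: {'type': 'interval', 'duration': 40, 'quantity': 5,
            'first_duration': 5, 'second_duration': 3},
        2: {'type': 'ramp', 'duration': 10, 'quantity': 1,
            'first_duration': 10, 'second_duration': 0},
    },
}
print(steps_total_duration(example) == example['workout_dur'])
print(all(step_is_consistent(s) for s in example['steps'].values()))
```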


In [12]:
def create_week(current_params):
    pass

In [13]:
#The training generator iterates weekly to capture the change in long run and/or long bike performance as measured
#by the heart rate 
class training_generator():
    '''
    prop_workout is the propensity to complete a workout, as a float (0.0-1.0)
    pct_int is the percent for interval workouts, as a float (0.0-.75)
    prop_dur is the propensity for duration over intensity, total time spent above .85 of threshold vs below .85, float (0.0 to 1.0)
    itv_dur is preferred interval duration, as a % of total workout time, as a float (0.0 - 1.0)
    iters is the number of iterations to run training generation 
    
    '''
    
    def __init__(self, params):
        #Parameters input from data gathered from database:
        #Determine if the athlete is on a training plan, used to continue operational loop when called.
        self.on_training_plan = params['training_plan']

In [18]:
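def get_next_workout_id():
    '''
    Returns the next workout id from the database
    '''
    
    #db is the module-level engine created in the Testing section below:
    if db is None:
        raise RuntimeError("No db connection.")
    
    query = '''
    SELECT MAX(workout_id) AS max_id
    FROM athlete.workouts
    '''
    
    max_id = pd.read_sql(query, db)['max_id'].iloc[0]
    #MAX returns NULL on an empty table, so fall back to 1 (an assumed starting id):
    return 1 if pd.isna(max_id) else int(max_id) + 1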

In [None]:
def get_historic_data(user_id):
    '''
    Pulls the historic trend of the database in terms of workouts and rewards.
    This function will be used to determine the path that the training is going in.
    Goal is to return a 2 day trend to analyze, 7 day trend, 14 day trend, 30 day trend,
    and 90 day trend.
    '''
    data_and_rewards = {}
    
    return data_and_rewards
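As a rough idea of what the trend windows could return, the sketch below computes the change in a metric across each window from an in-memory list of `(date, value)` pairs. The data shape, the definition of "trend" as last value minus first value, and the name `window_trends` are all assumptions. The real function would read from the database instead.

```python
import datetime

TREND_WINDOWS = (2, 7, 14, 30, 90)   # days, as listed in the docstring above


def window_trends(history, today):
    """Return {window_days: last value - first value} for each window.

    history -- list of (date, value) tuples, in any order
    today   -- the date the windows are measured back from
    Windows with fewer than two data points map to None.
    """
    trends = {}
    for days in TREND_WINDOWS:
        start = today - datetime.timedelta(days=days)
        in_window = sorted((d, v) for d, v in history if start <= d <= today)
        if len(in_window) < 2:
            trends[days] = None
        else:
            trends[days] = in_window[-1][1] - in_window[0][1]
    return trends


# Made-up metric values, for illustration only:
today = datetime.date(2021, 5, 6)
history = [
    (datetime.date(2021, 5, 6), 3.2),
    (datetime.date(2021, 5, 1), 3.0),
    (datetime.date(2021, 4, 10), 2.5),
]
print(window_trends(history, today))
```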
    
    

Testing:

In [2]:
#First, load the .env file:
load_dotenv()

True

In [3]:
#Import environment variables:
user = os.getenv('TEST_DB_USER')
password = os.getenv('TEST_DB_PW')

In [4]:
#Create a connection to the PostgreSQL db:
db = create_engine(f"postgresql://{user}:{password}@localhost:5432/postgres")

In [5]:
#Attaching to meta data
athlete_meta = MetaData(db)

In [6]:
#access the athlete schema, where our table is stored:
athlete_meta.reflect(bind=db, schema='athlete')

In [7]:
workouts_table = athlete_meta.tables['athlete.workouts']

In [39]:
#Finding table columns of workouts_table:
table_query = '''
SELECT * 
FROM athlete.workouts;
'''

df = pd.read_sql(table_query, db)
workout_cols = list(df.columns)
workout_cols


['workout_id',
 'int_workout',
 'workout_dur',
 'workout_itv_dur',
 'workout_prop_dur',
 'steps',
 'FK_athlete_id',
 'workout_date']

In [112]:
#Let's make a function for getting table columns:
#I recognize this is inefficient, but the INFORMATION_SCHEMA approach does not seem to filter down to the table.
def get_table_columns(schema, table_name):
    '''
    Returns the columns for a given schema and table name from the attached
    postgresql server.
    '''
    
    table_query = f"""
    SELECT *
    FROM {schema}.{table_name}
    WHERE 1 = 0
    ;
    """
    return list(pd.read_sql(table_query, db).columns)
    

In [69]:
#Testing the get_table_columns function:
get_table_columns('athlete', 'workouts')

['workout_id',
 'int_workout',
 'workout_dur',
 'workout_itv_dur',
 'workout_prop_dur',
 'steps',
 'FK_athlete_id',
 'workout_date']

In [70]:
#Make sure it's somewhat time efficient:
%timeit get_table_columns('athlete', 'workouts')

2.34 ms ± 62.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [71]:
#Define the insert function in SQL:
ins = workouts_table.insert().values(get_table_columns('athlete', 'workouts'))
str(ins)

'INSERT INTO athlete.workouts (workout_id, int_workout, workout_dur, workout_itv_dur, workout_prop_dur, steps, "FK_athlete_id", workout_date) VALUES (%(workout_id)s, %(int_workout)s, %(workout_dur)s, %(workout_itv_dur)s, %(workout_prop_dur)s, CAST(%(steps)s AS JSONB[]), %(FK_athlete_id)s, %(workout_date)s)'

---

In [72]:
#Can I get the # of the next workout in the database?
get_next_workout_id()

5

In [41]:
#Test the insertion of a workout:
db.execute(ins, {'workout_id': 2, 'int_workout':False, 'workout_dur':60.0, 'workout_itv_dur':0, 'workout_prop_dur':40.0, 'steps':{0}, 'FK_athlete_id':1, 'workout_date':'5/6/2021'})

<sqlalchemy.engine.result.ResultProxy at 0x7fc483409550>

In [42]:
#Can I get the # of the next workout in the database?
get_next_workout_id()

3

In [46]:
#Test the insertion of a workout:
db.execute(ins, {'workout_id': 3, 'int_workout':False, 'workout_dur':60.0, 'workout_itv_dur':0, 'workout_prop_dur':40.0, 'steps':{0}, 'FK_athlete_id':1, 'workout_date':'5/6/2021'})

<sqlalchemy.engine.result.ResultProxy at 0x7fc483e15ee0>

In [43]:
#Let's make the code a little more usable:
db.execute(ins, {'workout_id': (get_next_workout_id()+1), 'int_workout':False, 'workout_dur':70.0, 'workout_itv_dur':0, 'workout_prop_dur':40.0, 'steps':{0}, 'FK_athlete_id':1, 'workout_date':'5/6/2021'})

<sqlalchemy.engine.result.ResultProxy at 0x7fc483e2e070>

In [47]:
#Can I get the # of the next workout in the database?
get_next_workout_id()

5

In [76]:
test_params = {'int_workout':False, 'workout_dur':85.0, 'workout_itv_dur':5.0, 'workout_prop_dur':60.0, 'steps':{0}}

In [79]:
test_workout = {'workout_id': -1,
                 'int_workout': False,
                 'workout_dur': 0.0,
                 'workout_itv_dur': 0.0,
                 'workout_prop_dur': 0.0,
                 'steps': {0},
                 'FK_athlete_id': -1,
                 'workout_date': '01/01/1901'}

In [81]:
test_workout = {key: test_params.get(key, test_workout[key]) for key in test_workout}
test_workout

{'workout_id': -1,
 'int_workout': False,
 'workout_dur': 85.0,
 'workout_itv_dur': 5.0,
 'workout_prop_dur': 60.0,
 'steps': {0},
 'FK_athlete_id': -1,
 'workout_date': '01/01/1901'}

---
Back to developing functions:

In [82]:
def create_workout(athlete_id, date, params):
    '''
    This creates a single workout for a single date for an athlete, with the 
    characteristics defined in the params. 
    Params can be a dict type for ease of reference 
    '''
    
    #First define the insert string:
    ins = workouts_table.insert().values(get_table_columns('athlete', 'workouts'))
    
    #Starting point for a workout:
    this_workout = {'workout_id': -1,
                 'int_workout': False,
                 'workout_dur': 0.0,
                 'workout_itv_dur': 0.0,
                 'workout_prop_dur': 0.0,
                 'steps': {0},
                 'FK_athlete_id': -1,
                 'workout_date': '01/01/1901'}
    
    #Map the params values to the columns inserting into, that match:
    this_workout = {key: params.get(key, this_workout[key]) for key in this_workout}
    this_workout['FK_athlete_id'] = athlete_id
    this_workout['workout_date'] = date
    this_workout['workout_id'] = get_next_workout_id()
    
    db.execute(ins, this_workout)
    

In [83]:
create_workout(1, '5/6/2021', test_params)

In [84]:
#Is the workout database iterated?
get_next_workout_id()

6

In [85]:
#Yes!

---
Next we need a function to update workouts:

In [74]:
workout_query = '''
SELECT *
FROM athlete.workouts
WHERE workout_id = 2
'''

In [75]:
json_str = pd.read_sql(workout_query, db).to_json(orient='index')

In [76]:
json_data = json.loads(json_str)

In [77]:
json_data

{'0': {'workout_id': 2,
  'int_workout': False,
  'workout_dur': 60.0,
  'workout_itv_dur': 0.0,
  'workout_prop_dur': 40.0,
  'steps': [0],
  'FK_athlete_id': 1,
  'workout_date': 1620259200000}}

In [78]:
test_params = {'workout_dur':75}

In [79]:
update_params = pd.DataFrame.from_dict(test_params, orient='index').to_json()

In [87]:
updates = json.loads(update_params)
updates

{'0': {'workout_dur': 75}}

In [81]:
json_data['0']

{'workout_id': 2,
 'int_workout': False,
 'workout_dur': 60.0,
 'workout_itv_dur': 0.0,
 'workout_prop_dur': 40.0,
 'steps': [0],
 'FK_athlete_id': 1,
 'workout_date': 1620259200000}

In [88]:
[updates['0'].get(key, json_data['0'][key]) for key in json_data['0']]

[2, False, 75, 0.0, 40.0, [0], 1, 1620259200000]

In [89]:
json_data = {'0':{key: updates['0'].get(key, json_data['0'][key]) for key in json_data['0']}}

In [90]:
json_data

{'0': {'workout_id': 2,
  'int_workout': False,
  'workout_dur': 75,
  'workout_itv_dur': 0.0,
  'workout_prop_dur': 40.0,
  'steps': [0],
  'FK_athlete_id': 1,
  'workout_date': 1620259200000}}

In [110]:
list(json_data.items())[0][1]

{'workout_id': 2,
 'int_workout': False,
 'workout_dur': 75,
 'workout_itv_dur': 0.0,
 'workout_prop_dur': 40.0,
 'steps': [0],
 'FK_athlete_id': 1,
 'workout_date': 1620259200000}

In [113]:
upd = workouts_table.update().values(get_table_columns('athlete', 'workouts'))
str(upd)

'UPDATE athlete.workouts SET workout_id=%(workout_id)s, int_workout=%(int_workout)s, workout_dur=%(workout_dur)s, workout_itv_dur=%(workout_itv_dur)s, workout_prop_dur=%(workout_prop_dur)s, steps=CAST(%(steps)s AS JSONB[]), "FK_athlete_id"=%(FK_athlete_id)s, workout_date=%(workout_date)s'

In [128]:
upd

<sqlalchemy.sql.dml.Update object at 0x7fb02a61d5e0>

In [118]:
ms = 1620259200000

In [121]:
datetime.datetime.fromtimestamp(ms/1000.0).strftime('%Y-%m-%d')

'2021-05-05'
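The value 1620259200000 ms is midnight UTC on 2021-05-06, which matches the '5/6/2021' date that was inserted. Without a `tz` argument, `datetime.datetime.fromtimestamp` converts to the machine's local time zone, which is the likely reason the output above is one day early. A sketch that pins the conversion to UTC, assuming the column was serialized as UTC epoch milliseconds (the helper name `ms_to_date_str` is made up for illustration):

```python
import datetime


def ms_to_date_str(ms):
    """Convert epoch milliseconds to a YYYY-MM-DD string in UTC."""
    moment = datetime.datetime.fromtimestamp(ms / 1000.0, tz=datetime.timezone.utc)
    return moment.strftime('%Y-%m-%d')


print(ms_to_date_str(1620259200000))   # 2021-05-06
```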

In [143]:
stmt = (
    update(workouts_table).
    where(workouts_table.c.workout_id == 1).
    values(test_updates)
)

In [144]:
db.execute(stmt)

<sqlalchemy.engine.result.ResultProxy at 0x7fb02ca40a90>

In [145]:
def update_workout(workout_id, new_params):
    
    stmt = (
    update(workouts_table).
    where(workouts_table.c.workout_id == workout_id).
    values(new_params)
    )
    
    #And execute:
    db.execute(stmt)
    

In [148]:
test_updates = {'workout_dur':6}

In [149]:
update_workout(3, test_updates)
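Because `update_workout` passes `new_params` straight to the UPDATE statement, it may be worth filtering the dict first. The sketch below is a hypothetical pre-check, not part of the notebook's code: it rejects names that are not workout columns and drops `workout_id`, on the assumption that it is the primary key and should not be rewritten.

```python
# Column names as returned by get_table_columns('athlete', 'workouts') above:
WORKOUT_COLS = ['workout_id', 'int_workout', 'workout_dur', 'workout_itv_dur',
                'workout_prop_dur', 'steps', 'FK_athlete_id', 'workout_date']


def clean_update_params(new_params, columns=WORKOUT_COLS, protected=('workout_id',)):
    """Return only the entries of new_params that are safe to send to UPDATE.

    Raises KeyError for names that are not columns of the table and
    silently drops protected columns (assumed here to be the primary key).
    """
    unknown = [key for key in new_params if key not in columns]
    if unknown:
        raise KeyError(f"unknown workout columns: {unknown}")
    return {k: v for k, v in new_params.items() if k not in protected}


print(clean_update_params({'workout_dur': 6, 'workout_id': 99}))
```

The cleaned dict could then be handed to `update_workout(workout_id, ...)` in place of the raw parameters.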