#Exploring Sleep Debt & Health Markers

The literature suggests that the kind of good health I seek, that of a physically fit athlete, is best examined with the following rubric.

1. According to Wikipedia: "Sleep debt or sleep deficit is the cumulative effect of not getting enough sleep. A large sleep debt may lead to mental or physical fatigue. There are two kinds of sleep debt: the results of partial sleep deprivation and total sleep deprivation." My biggest concern is that sleep deprivation is an indicator of brain degeneration in the elderly, so my goal is to trend my sleep debt down through sleep hygiene techniques.
2. A low resting heart rate (RHR) can be the result of a high level of fitness, meditative breathing, and good sleep. Other factors, such as disease, diet, family history, and medication, can also lower it. For my health purposes, I expect that when I am healthy and practicing self-care (such as meditation, high-cardio workouts, and sleep hygiene), my resting heart rate will be relatively steady, and that any trend that does appear will be downward.
3. Maximum heart rate (MHR) is the highest number of beats per minute the heart reaches during strenuous exercise. While a high heart rate can be a sign of a medical issue, it can also indicate good health: a heart that beats faster in response to exercise can be a sign of proper conditioning. There is a countervailing pressure, because a better-conditioned heart tends to beat more slowly. My goal is to raise my MHR or at least keep it on an even trendline.
4. Heart rate variability (HRV) is the variation in time between successive heartbeats, and it reflects the interplay between the sympathetic and parasympathetic branches of the autonomic nervous system. During relaxation, HRV goes up; during stress and exercise, it goes down. HRV is a good indicator of recovery from illness, anxiety, and training. My goal is to drive my average HRV score up over time.
5. Calories burned beyond basic body function indicate physical activity, which strengthens the heart, lowers the RHR, pushes the MHR upward, and promotes healthy sleep patterns.
6. REM sleep is a phase of sleep that makes up approximately 25% of any sleep period. It is commonly associated with the processing of learning, memory, and emotional state. Excessive strain, alcohol, and medical issues can suppress REM sleep. While there are exceptions due to medication, alcohol, and illness, a healthy sleep pattern should involve rough parity between time spent in REM and deep sleep.
7. Deep sleep, also called slow-wave or delta-wave sleep, is the phase of sleep most commonly associated with the body recovering from physical exertion during the preceding waking period. Excessive strain, alcohol, and medical issues can suppress deep sleep. While there are exceptions due to medication, alcohol, and illness, a healthy sleep pattern should involve rough parity between time spent in REM and deep sleep.
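
Several of the goals in this rubric are stated as trends: sleep debt down, RHR steady or down, MHR flat or up, HRV up. One simple way to check a trend, shown here only as a minimal sketch, is to fit a straight line to the daily values and read the sign of its slope. The helper name `trend_slope` and the sample numbers are made up for illustration; they are not taken from the dataset.

```python
import numpy as np

def trend_slope(values):
    """Return the least-squares slope of values against day number (units per day)."""
    values = np.asarray(values, dtype=float)
    days = np.arange(len(values))
    mask = ~np.isnan(values)  # skip days with no reading
    slope, _intercept = np.polyfit(days[mask], values[mask], 1)
    return slope

# Hypothetical resting heart rates over eight days (not real measurements)
rhr_example = [58, 57, 57, 56, np.nan, 55, 55, 54]
slope = trend_slope(rhr_example)
print("trending down" if slope < 0 else "flat or trending up")
```

A straight-line fit is a blunt tool: it summarizes the whole period in one number and will hide a rise followed by a fall. It is meant as a quick check, not a replacement for looking at the plots.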

##Personal Fitness Markers

This study uses personal fitness markers I gathered between February 7, 2018, and February 7, 2020. These two years span a big transition in my life: from being able to devote a reasonable amount of time and effort to my physical fitness, through some bad injuries, to a change of direction toward less physical exercise because of those injuries. The markers were gathered daily with a Whoop fitness tracker and other methods.

In [1086]:
#import libraries and methods
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
import matplotlib.pyplot as plt

In [1087]:
!pip install category_encoders
import category_encoders as ce



In [1088]:
#set some options for the notebook

pd.options.display.max_rows = 50
pd.options.mode.chained_assignment = None

In [1089]:
#Import the data from a live Google spreadsheet in CSV. This file is updated regularly.

df = pd.read_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vTMIUP6_JIoxWSAFCe1h6Hz12r-41t6qHv5cCXIBmYJUK2KS188pKkZnkr4jJRpIcC3mRZV36z21oNv/pub?gid=0&single=true&output=csv')

In [1090]:
#Reindexing the columns to the full header list. Any listed column that is missing from the sheet is created empty (NaN) so it can be engineered below.

header_list = ['Date','Weight','Fat','Sleep Debt','REM','Deep Sleep','Snore',
     'Meditate','Spanish','Push-ups','Pull-ups','Sit-ups','Coffee','Handstands',
     'Acro','Swing','Strain','Calories','AHR','MHR','HRV','RHR','Recovery',
     'Carbs','Journal','Spinal Mobility','Flexibility','Notes','DOW',
     'Weight_AVR','Fat_AVR','Sleep Debt_AVR','REM_AVR','Deep Sleep_AVR',
     'Strain_AVR','Calories_AVR','AHR_AVR','MHR_AVR','HRV_AVR','RHR_AVR',
     'Recovery_AVR','Weight_PASS','Fat_PASS','Sleep Debt_PASS','REM_PASS',
     'Deep Sleep_PASS','Strain_PASS','Calories_PASS','AHR_PASS','MHR_PASS',
     'HRV_PASS','RHR_PASS','Recovery_PASS']

df = df.reindex(columns = header_list)

In [1091]:
#Setting date column to pandas datetime object and then engineering a day of the week feature.

df['Date'] = pd.to_datetime(df['Date'])

# Vectorized day-of-week lookup through the .dt accessor
df['DOW'] = df['Date'].dt.day_name()

In [1092]:
#Lists of features to be used in analysis. These are subsets of header_list.

features = ['Weight','Fat','Sleep Debt','REM','Deep Sleep','Strain','Calories','AHR','MHR','HRV','RHR','Recovery']

special_features = ['Sleep Debt','AHR','RHR']

#Note: Dropping DOW which was engineered directly above.
drops = ['Acro','Meditate','Snore','Coffee','Handstands','Spanish','Push-ups',
         'Pull-ups','Sit-ups','Swing','Carbs','Journal','Spinal Mobility',
         'Flexibility','Notes','DOW']

week_day = ['Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday']

In [1093]:
#Dropping columns which will not be used in this iteration of this analysis.

df = df.drop(labels=drops,axis=1)

In [1094]:
#Interpolating columns for missing values. 

for each in features:
    df[each] = df[each].interpolate(method='linear')
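
As a small illustration of what linear interpolation does to a gap, the sketch below fills two missing days between known readings. The values are hypothetical. Note that with the default settings a missing value at the very start of a column is left as NaN, because there is no earlier reading to interpolate from.

```python
import numpy as np
import pandas as pd

# Hypothetical weights with two missing days (not real measurements)
weights = pd.Series([180.0, np.nan, np.nan, 177.0])
filled = weights.interpolate(method='linear')
print(filled.tolist())  # [180.0, 179.0, 178.0, 177.0]

# A leading gap has nothing before it, so it stays missing
leading_gap = pd.Series([np.nan, 60.0, np.nan, 62.0]).interpolate(method='linear')
print(leading_gap.isna().tolist())  # [True, False, False, False]
```

Linear interpolation treats the rows as equally spaced, which matches one row per day only if no dates are missing from the sheet.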

In [1095]:
#Engineering an average column for each feature.

# Running (expanding) mean of each feature, selected by name. The result is NaN
# until seven observations are available (min_periods=7).
for feat in features:
    df[feat + '_AVR'] = df[feat].expanding(min_periods=7).mean()
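
To make the behavior of `expanding(min_periods=...)` concrete, here is a toy sketch with a shorter warm-up of three observations. The HRV values are hypothetical.

```python
import pandas as pd

hrv = pd.Series([40.0, 50.0, 60.0, 70.0])  # hypothetical readings
running = hrv.expanding(min_periods=3).mean()
print(running.tolist())  # [nan, nan, 50.0, 55.0]
```

Each entry is the mean of every reading up to and including that day, so the average moves more and more slowly as the history grows. A fixed-length `rolling` window would respond faster to recent changes; which is preferable depends on whether the comparison should be against the whole history or only recent weeks.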

In [1096]:
#Using these averages to create a new feature which promotes the marker responses I seek.
#For instance I want my sleep debt to be rewarded for going below the average while I want 
#my MHR to be rewarded for rising.

for feat in features:
    score = feat + '_PASS'
    avr = feat + '_AVR'
    if feat in special_features:
        # Lower is better: pass when the value is at or below its running average
        passed = df[feat] <= df[avr]
    else:
        # Higher is better: pass when the value is at or above its running average
        passed = df[feat] >= df[avr]
    # Comparisons against a missing average are False, so those days score "N"
    df[score] = np.where(passed, "Y", "N")
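
The pass rule compares each day's value with its running average. The toy sketch below, with made-up numbers, shows the "lower is better" case for sleep debt and highlights one consequence worth knowing: on days where the running average does not exist yet, the comparison is false, so the day is marked "N" rather than left blank.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the first day has no running average yet
toy = pd.DataFrame({
    'Sleep Debt': [2.0, 1.0, 3.0],
    'Sleep Debt_AVR': [np.nan, 1.5, 2.0],
})
toy['Sleep Debt_PASS'] = np.where(toy['Sleep Debt'] <= toy['Sleep Debt_AVR'], 'Y', 'N')
print(toy['Sleep Debt_PASS'].tolist())  # ['N', 'Y', 'N']
```

If the warm-up days should not count as failures, one option is to drop rows where the `_AVR` column is missing before modeling.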

##Visualizations of basic trends.

In [None]:
##Predictive Modeling

In [None]:
test = df[df['Date'] >= '11/29/2019']
train = df[df['Date'] <= '07/24/2019']
val = df[(df['Date'] <= '11/28/2019') & (df['Date'] >= '07/25/2019')]
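
Because the split is by date, it is worth confirming that every row lands in exactly one set and that the sets are in time order. The sketch below does this on a six-day toy frame; the cutoff dates and the `toy_*` names are hypothetical and chosen only to fit the toy data.

```python
import pandas as pd

toy = pd.DataFrame({'Date': pd.date_range('2019-07-22', periods=6, freq='D')})
train_end = pd.Timestamp('2019-07-24')  # hypothetical cutoffs for the toy data
val_end = pd.Timestamp('2019-07-26')

toy_train = toy[toy['Date'] <= train_end]
toy_val = toy[(toy['Date'] > train_end) & (toy['Date'] <= val_end)]
toy_test = toy[toy['Date'] > val_end]

# Every row is used once, and train precedes val, which precedes test
assert len(toy_train) + len(toy_val) + len(toy_test) == len(toy)
assert toy_train['Date'].max() < toy_val['Date'].min() < toy_test['Date'].min()
print(len(toy_train), len(toy_val), len(toy_test))  # 3 2 1
```

Writing the boundaries as "greater than the previous cutoff" means a single pair of cutoff dates defines all three sets, so there is no risk of a gap or an overlap between them.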

In [None]:
test.shape, train.shape, val.shape

In [None]:
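target = 'Sleep Debt_PASS'
# Get a dataframe with all train columns except the target
# Note: 'Sleep Debt' and 'Sleep Debt_AVR' stay in the features, and together they
# determine the target, so the scores below will be optimistic.
train_features = train.drop(columns=[target])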

# Get a list of the numeric features
numeric_features = train_features.select_dtypes(include='number').columns.tolist()

# Get a series with the cardinality of the nonnumeric features
cardinality = train_features.select_dtypes(exclude='number').nunique()

# Get a list of all categorical features with cardinality <= 25
categorical_features = cardinality[cardinality <= 25].index.tolist()

# Combine the lists 
features = numeric_features + categorical_features

In [None]:
X_train = train[features]
y_train = train[target]
X_val = val[features]
y_val = val[target]
X_test = test[features]
y_test = test[target]

In [None]:
y_train.value_counts(normalize=True)

# Manual version of the pipeline built in the next cell. Each step is fit on the
# training set only and then applied to the validation and test sets.
encoder = ce.OneHotEncoder(use_cat_names=True)
imputer = SimpleImputer(strategy='mean')
rf = RandomForestClassifier(max_features='sqrt',n_estimators=100,n_jobs=-1, random_state=42)

X_train_encoded = encoder.fit_transform(X_train)
X_val_encoded = encoder.transform(X_val)
X_test_encoded = encoder.transform(X_test)

X_train_imputed = imputer.fit_transform(X_train_encoded)
X_val_imputed = imputer.transform(X_val_encoded)
X_test_imputed = imputer.transform(X_test_encoded)

# Fit the model once, on the training data. X_train, X_val, and X_test are left
# untouched as DataFrames for the pipeline cells that follow.
rf.fit(X_train_imputed, y_train)

In [None]:
pipeline = make_pipeline( 
    ce.OneHotEncoder(use_cat_names=True),
    SimpleImputer(strategy='mean'), 
    RandomForestClassifier(max_features='sqrt',n_estimators=100,n_jobs=-1, random_state=42)
)

In [None]:
pipeline.fit(X_train, y_train)

print('Validation Accuracy', pipeline.score(X_val, y_val))
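
An accuracy figure is easier to judge next to a baseline. A common one is the accuracy obtained by always predicting the majority class seen in training. The sketch below shows the calculation on hypothetical labels; the `*_toy` names and values are made up and are not results from this notebook.

```python
import pandas as pd

# Hypothetical pass/fail labels (not real results)
y_train_toy = pd.Series(['Y', 'N', 'Y', 'Y', 'N', 'Y'])
y_val_toy = pd.Series(['Y', 'N', 'N', 'Y'])

majority_class = y_train_toy.mode()[0]  # most frequent label in training
baseline_accuracy = (y_val_toy == majority_class).mean()
print(majority_class, baseline_accuracy)  # Y 0.5
```

A model is only adding value if its validation accuracy is clearly above this number.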

In [None]:
y_pred = pipeline.predict(X_test)

print("X_train:",X_train.shape)
print("train:",train.shape)
print("val:",val.shape)
print("X_val:",X_val.shape)
print("test:",test.shape)

X_train.head()

In [None]:
y_pred

In [None]:
pipeline.score(X_test,y_test)

list(X_train.columns.values)

In [None]:

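plt.plot(df['Date'], df['REM_AVR'], label='REM (running average)')
plt.plot(df['Date'], df['Deep Sleep_AVR'], label='Deep sleep (running average)')

plt.title('Sleep')
plt.xlabel('Date')
plt.ylabel('Minutes')
plt.legend();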

In [None]:
plt.plot(df['Date'], df['REM'], label='REM')
plt.plot(df['Date'], df['REM_AVR'], label='REM (running average)')

plt.title('REM Sleep')
plt.xlabel('Date')
plt.ylabel('Minutes')
plt.legend();

In [None]:
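plt.plot(df['Date'], df['Sleep Debt'], label='Sleep debt')
plt.plot(df['Date'], df['Sleep Debt_AVR'], label='Sleep debt (running average)')

plt.title('Sleep Debt')
plt.xlabel('Date')
plt.ylabel('Minutes')
plt.legend();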

In [None]:
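plt.plot(df['Date'], df['Deep Sleep'], label='Deep sleep')
plt.plot(df['Date'], df['Deep Sleep_AVR'], label='Deep sleep (running average)')

plt.title('Deep Sleep')
plt.xlabel('Date')
plt.ylabel('Minutes')
plt.legend();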

In [None]:
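plt.plot(df['Date'], df['Sleep Debt_AVR'], color="blue", label='Sleep debt (running average)')
plt.plot(df['Date'], df['HRV_AVR'], label='HRV (running average)')
plt.plot(df['Date'], df['RHR_AVR'], label='RHR (running average)')
plt.title('Heart')
plt.xlabel('Date')
plt.ylabel('Value (mixed units)')
plt.legend();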


In [None]:
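plt.plot(df['Date'], df['Deep Sleep_AVR'], color="red", label='Deep sleep (running average)')
plt.plot(df['Date'], df['Sleep Debt_AVR'], color="green", label='Sleep debt (running average)')
plt.plot(df['Date'], df['REM_AVR'], color="purple", label='REM (running average)')
plt.title('Sleep')
plt.xlabel('Date')
plt.ylabel('Minutes')
plt.legend();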

In [None]:
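plt.plot(train['Date'], train['Sleep Debt'], label='Sleep debt')
plt.plot(train['Date'], train['HRV'], label='HRV')
plt.plot(train['Date'], train['RHR'], label='RHR')
plt.title('Heart (training set)')
plt.xlabel('Date')
plt.ylabel('Value (mixed units)')
plt.legend();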

In [None]:
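plt.plot(val['Date'], val['Sleep Debt'], label='Sleep debt')
plt.plot(val['Date'], val['HRV'], label='HRV')
plt.plot(val['Date'], val['RHR'], label='RHR')
plt.title('Heart (validation set)')
plt.xlabel('Date')
plt.ylabel('Value (mixed units)')
plt.legend();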

In [None]:
plt.plot(df['Date'], df['HRV'], label='HRV')
plt.plot(df['Date'], df['Recovery'], label='Recovery')
plt.title('Recovery')
plt.xlabel('Date')
plt.ylabel('Value (mixed units)')
plt.legend();

In [None]:
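plt.plot(df['Date'], df['Strain_AVR'], label='Strain (running average)')
plt.plot(df['Date'], df['Calories_AVR'], label='Calories (running average)')
plt.plot(df['Date'], df['AHR_AVR'], label='AHR (running average)')
plt.plot(df['Date'], df['MHR_AVR'], label='MHR (running average)')
plt.title('Work')
plt.xlabel('Date')
plt.ylabel('Value (mixed units)')
plt.legend();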

In [None]:
df['Strain'] = np.log(df['Strain'])

In [None]:
len(rf.feature_importances_)

In [None]:
len(X_train.columns)


In [None]:
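# Get feature importances
rf = pipeline.named_steps['randomforestclassifier']

# The encoder expands each categorical column into several one-hot columns, so
# the importances line up with the encoded column names, not X_train.columns.
encoder = pipeline.named_steps['onehotencoder']
encoded_columns = encoder.transform(X_train).columns
importances = pd.Series(rf.feature_importances_, encoded_columns)

# Plot feature importances
%matplotlib inline

n = 20
plt.figure(figsize=(10,n/2))
plt.title(f'Top {n} features')
importances.sort_values()[-n:].plot.barh(color='grey');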

from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
# Probability for the last class, which is 'Y' when both labels appear in training
y_pred_proba = pipeline.predict_proba(X_val)[:, -1]
roc_auc_score(y_val, y_pred_proba)
# The labels are the strings 'Y'/'N', so the positive class is named explicitly
fpr, tpr, thresholds = roc_curve(y_val, y_pred_proba, pos_label='Y')
pd.DataFrame({
    'False Positive Rate': fpr, 
    'True Positive Rate': tpr, 
    'Threshold': thresholds
})


import matplotlib.pyplot as plt
plt.scatter(fpr, tpr)
plt.plot(fpr, tpr)
plt.title('ROC curve')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate');