# Deconstructing the Fitbit Sleep Score

In this project I use different machine learning models to get a better understanding of the Fitbit Sleep Score. If you own a Fitbit, you have probably wondered how exactly Fitbit comes up with your sleep score. Sometimes you sleep for a shorter time with similar amounts of REM and deep sleep and still get a better score. Other times you get rather little REM and deep sleep, yet score higher than on a night with more of both. What's the secret behind this?
That's precisely what I try to answer throughout this project.

In [None]:
# Import all relevant libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from pprint import pprint
from xgboost import XGBRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

In [None]:
# Read the data
url = 'https://raw.githubusercontent.com/srijp/Fitbit-Sleep-Score/master/Fitbit_Sleep_JB_041219_010720.csv'
sleep_data = pd.read_csv(url)

In [None]:
sleep_data.head()

Unnamed: 0,Start Time,End Time,Minutes Asleep,Minutes Awake,Number of Awakenings,Time in Bed,Minutes REM Sleep,Minutes Light Sleep,Minutes Deep Sleep,overall_score
0,30/6/20 21:57,1/7/20 5:59,402,79,40,481,32,282,88,71.0
1,29/6/20 21:35,30/6/20 6:02,444,63,36,507,51,332,61,78.0
2,28/6/20 22:01,29/6/20 6:01,420,60,36,480,37,335,48,78.0
3,27/6/20 22:05,28/6/20 9:27,567,115,51,682,83,390,94,75.0
4,26/6/20 21:40,27/6/20 7:35,495,100,35,595,75,335,85,78.0


In [None]:
# Drop rows without a sleep score (in this file, the last row)
sleep_data.dropna(subset=['overall_score'], inplace=True)

For now I will use the columns from Minutes Asleep to Minutes Deep Sleep as candidate features and overall_score as the label, since that most closely resembles the data the Fitbit app provides to its users. The Number of Awakenings column seems interesting but isn't provided in the app, so I'll drop it for now. I also drop Time in Bed, Minutes Light Sleep and Minutes Deep Sleep, which leaves Minutes Asleep, Minutes Awake and Minutes REM Sleep as the features.

In [None]:
# Candidate feature columns: Minutes Asleep through Minutes Deep Sleep
feats = sleep_data.columns[2:9]

X = sleep_data[feats].astype(float)
# Drop the columns that are not used as features
drop = ['Number of Awakenings', 'Time in Bed', 'Minutes Light Sleep', 'Minutes Deep Sleep']
X = X.drop(drop, axis=1)
y = sleep_data['overall_score']
# Pairwise correlations between the remaining features
X.corr()

Unnamed: 0,Minutes Asleep,Minutes Awake,Minutes REM Sleep
Minutes Asleep,1.0,0.425129,0.540433
Minutes Awake,0.425129,1.0,-0.096565
Minutes REM Sleep,0.540433,-0.096565,1.0


In [None]:
# Split data into training and validation set
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42)
# Note: XGBoost and Random Forest are both tree-based models, so feature scaling is not needed

In [None]:
# Create a baseline XGBoost model with default hyperparameters (no early stopping or custom learning rate yet)
xgb_regressor = XGBRegressor(random_state=42)

# Fit model to training data
xgb = xgb_regressor.fit(X_train, y_train)



In [None]:
# Define a function for scoring the model; "accuracy" here means 100 minus the mean absolute percentage error
def evaluate(model, test_features, test_labels):
    predictions = model.predict(test_features)
    errors = abs(predictions - test_labels)
    mape = 100 * np.mean(errors / test_labels)
    accuracy = 100 - mape
    score = model.score(test_features, test_labels)
    print('Model Performance')
    print('Average Error: {:0.4f}.'.format(np.mean(errors)))
    print('Accuracy = {:0.2f}%.'.format(accuracy))
    print('Score = {:0.4f}.'.format(score))
    return accuracy
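
To make the three numbers printed by `evaluate` concrete, here is a small worked example in plain Python. The `y_true` and `y_pred` values are made up for illustration and are not taken from the Fitbit data. It computes the mean absolute error ("Average Error"), 100 minus the mean absolute percentage error ("Accuracy"), and the coefficient of determination R², which is what `score` returns for scikit-learn style regressors.

```python
# Hypothetical true scores and predictions, chosen only for illustration
y_true = [80.0, 75.0, 90.0, 70.0]
y_pred = [78.0, 78.0, 87.0, 70.0]

n = len(y_true)
abs_errors = [abs(p - t) for p, t in zip(y_pred, y_true)]

# Mean absolute error: the "Average Error" line
mae = sum(abs_errors) / n

# Mean absolute percentage error and the derived "Accuracy"
mape = 100 * sum(e / t for e, t in zip(abs_errors, y_true)) / n
accuracy = 100 - mape

# Coefficient of determination (R^2): the "Score" line
mean_true = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print('MAE = {:.4f}, Accuracy = {:.2f}%, R^2 = {:.4f}'.format(mae, accuracy, r2))
```

Because sleep scores sit well away from zero, an error of a few points is a small percentage, so the accuracy figure tends to look high. R² compares the model against always predicting the mean, so it is usually the more telling of the two.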

In [None]:
evaluate(xgb,X_valid,y_valid)

Model Performance
Average Error: 2.2486.
Accuracy = 96.71%.
Score = 0.8084.


96.70841280192947

In [None]:
# Number of trees to be used
xgb_n_estimators = [int(x) for x in np.linspace(200, 2000, 10)]

# Maximum number of levels in tree
xgb_max_depth = [int(x) for x in np.linspace(2, 20, 10)]

# Minimum sum of instance weight (hessian) needed in a child node
xgb_min_child_weight = [int(x) for x in np.linspace(1, 10, 10)]

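# Tree construction algorithm used in XGBoost
# Note: 'gpu_hist' needs a GPU-enabled XGBoost build; without one, candidates using it
# fail to fit, which may explain the failed fits reported by the search
xgb_tree_method = ['auto', 'exact', 'approx', 'hist', 'gpu_hist']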

# Learning rate
xgb_eta = [x for x in np.linspace(0.1, 0.6, 6)]

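# Minimum loss reduction required to make a further partition
# Note: int() truncates every value between 0 and 0.5 to 0, so gamma is effectively
# fixed at 0 in this search; float(x) would be needed to actually vary it
xgb_gamma = [int(x) for x in np.linspace(0, 0.5, 6)]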

# Learning objective used
xgb_objective = ['reg:squarederror', 'reg:squaredlogerror']

# Create the grid
xgb_grid = {'n_estimators': xgb_n_estimators,
            'max_depth': xgb_max_depth,
            'min_child_weight': xgb_min_child_weight,
            'tree_method': xgb_tree_method,
            'eta': xgb_eta,
            'gamma': xgb_gamma}
            #'objective': xgb_objective}

xgb_grid

{'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000],
 'max_depth': [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
 'min_child_weight': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 'tree_method': ['auto', 'exact', 'approx', 'hist', 'gpu_hist'],
 'eta': [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6],
 'gamma': [0, 0, 0, 0, 0, 0]}
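
The `gamma` entry in the grid above contains only zeros. A minimal sketch of why, and of one possible alternative: `int()` truncates toward zero, so every value from `np.linspace(0, 0.5, 6)` becomes 0. The `gamma_float` list is my assumption about what was intended, not something the notebook uses.

```python
import numpy as np

# Same expression as in the grid above: int() truncates toward zero
gamma_int = [int(x) for x in np.linspace(0, 0.5, 6)]

# Possible alternative (an assumption about the intent): keep the floats,
# rounded to two decimals to avoid values like 0.30000000000000004
gamma_float = [round(float(x), 2) for x in np.linspace(0, 0.5, 6)]

print(gamma_int)
print(gamma_float)
```

Searching over `gamma_float` would change which candidates are sampled, so the best parameters and scores reported in this notebook would not necessarily be reproduced.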

In [None]:
# Create the model to be tuned
xgb_base = XGBRegressor()

# Create the randomized search over the XGBoost hyperparameter grid
xgb_random = RandomizedSearchCV(estimator = xgb_base, param_distributions = xgb_grid, 
                                n_iter = 200, cv = 3, verbose = 2, 
                                random_state = 420, n_jobs = -1)

# Fit the random search model
xgb_random.fit(X_train, y_train)

Fitting 3 folds for each of 200 candidates, totalling 600 fits


114 fits failed out of a total of 600.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
3 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python3.7/dist-packages/xgboost/sklearn.py", line 396, in fit
    callbacks=callbacks)
  File "/usr/local/lib/python3.7/dist-packages/xgboost/training.py", line 216, in train
    xgb_model=xgb_model, callbacks=callbacks)
  File "/usr/local/lib/python3.7/dist-packages/xgboost/training.py", line 74, in _train_internal
    bst.update(dtrain, i, obj)
  File "/usr/local/lib/python3.7/dist-packages/

RandomizedSearchCV(cv=3, estimator=XGBRegressor(), n_iter=200, n_jobs=-1,
                   param_distributions={'eta': [0.1, 0.2, 0.30000000000000004,
                                                0.4, 0.5, 0.6],
                                        'gamma': [0, 0, 0, 0, 0, 0],
                                        'max_depth': [2, 4, 6, 8, 10, 12, 14,
                                                      16, 18, 20],
                                        'min_child_weight': [1, 2, 3, 4, 5, 6,
                                                             7, 8, 9, 10],
                                        'n_estimators': [200, 400, 600, 800,
                                                         1000, 1200, 1400, 1600,
                                                         1800, 2000],
                                        'tree_method': ['auto', 'exact',
                                                        'approx', 'hist',
                                        

In [None]:
# Get the optimal parameters
xgb_random.best_params_

{'tree_method': 'exact',
 'n_estimators': 400,
 'min_child_weight': 1,
 'max_depth': 2,
 'gamma': 0,
 'eta': 0.4}
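
Instead of copying the best parameters into a new model by hand, a fitted `RandomizedSearchCV` also exposes the refit winner as `best_estimator_` (with the default `refit=True`). The sketch below shows the pattern on synthetic stand-in data with a `DecisionTreeRegressor`; the data, the small grid, and the names `X_demo`, `y_demo`, `demo_grid` and `demo_search` are all hypothetical and only there to keep the example quick and self-contained.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (not the Fitbit data): 60 rows, 3 features
rng = np.random.RandomState(0)
X_demo = rng.rand(60, 3)
y_demo = 50 + 30 * X_demo[:, 0] + rng.normal(scale=1.0, size=60)

# Small hypothetical grid for a stand-in estimator
demo_grid = {'max_depth': [2, 4, 6], 'min_samples_leaf': [1, 3, 5]}
demo_search = RandomizedSearchCV(estimator=DecisionTreeRegressor(random_state=0),
                                 param_distributions=demo_grid,
                                 n_iter=5, cv=3, random_state=0)
demo_search.fit(X_demo, y_demo)

# With the default refit=True, the best candidate is refit on all training data
best_model = demo_search.best_estimator_
print(demo_search.best_params_)
print(best_model.predict(X_demo[:2]))
```

In this notebook the equivalent would be `xgb_random.best_estimator_`, which avoids retyping the parameters and the transcription mistakes that can come with it.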

In [None]:
tuned_xgb = XGBRegressor(tree_method = 'exact', n_estimators = 400, min_child_weight = 1, max_depth = 2, gamma = 0, eta = 0.4)
tuned_xgb.fit(X_train, y_train)



XGBRegressor(eta=0.4, max_depth=2, n_estimators=400, tree_method='exact')

In [None]:
evaluate(tuned_xgb, X_valid, y_valid)

Model Performance
Average Error: 2.0922.
Accuracy = 96.94%.
Score = 0.8338.


96.94359479850657

In [None]:
# Repeat the randomized search for XGBoost (note: cv is still 3 here, not 5)
xgb_random_3 = RandomizedSearchCV(estimator = xgb_base, param_distributions = xgb_grid, 
                                n_iter = 200, cv = 3, verbose = 2, 
                                random_state = 420, n_jobs = -1)

# Fit the random search model
xgb_random_3.fit(X_train, y_train)

Fitting 3 folds for each of 200 candidates, totalling 600 fits


114 fits failed out of a total of 600.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
2 fits failed with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/usr/local/lib/python3.7/dist-packages/xgboost/sklearn.py", line 396, in fit
    callbacks=callbacks)
  File "/usr/local/lib/python3.7/dist-packages/xgboost/training.py", line 216, in train
    xgb_model=xgb_model, callbacks=callbacks)
  File "/usr/local/lib/python3.7/dist-packages/xgboost/training.py", line 74, in _train_internal
    bst.update(dtrain, i, obj)
  File "/usr/local/lib/python3.7/dist-packages/

RandomizedSearchCV(cv=3, estimator=XGBRegressor(), n_iter=200, n_jobs=-1,
                   param_distributions={'eta': [0.1, 0.2, 0.30000000000000004,
                                                0.4, 0.5, 0.6],
                                        'gamma': [0, 0, 0, 0, 0, 0],
                                        'max_depth': [2, 4, 6, 8, 10, 12, 14,
                                                      16, 18, 20],
                                        'min_child_weight': [1, 2, 3, 4, 5, 6,
                                                             7, 8, 9, 10],
                                        'n_estimators': [200, 400, 600, 800,
                                                         1000, 1200, 1400, 1600,
                                                         1800, 2000],
                                        'tree_method': ['auto', 'exact',
                                                        'approx', 'hist',
                                        

In [None]:
# Get the optimal parameters
xgb_random_3.best_params_

{'tree_method': 'exact',
 'n_estimators': 400,
 'min_child_weight': 1,
 'max_depth': 2,
 'gamma': 0,
 'eta': 0.4}

In [None]:
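# Note: these hyperparameters are set by hand and differ from best_params_ above
# (n_estimators = 850, max_depth = 3 and eta = 0.2 instead of 400, 2 and 0.4)
tuned_xgb_3 = XGBRegressor(tree_method = 'exact', n_estimators = 850, min_child_weight = 1, max_depth = 3, gamma = 0, eta = 0.2, objective = 'reg:squarederror')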
tuned_xgb_3.fit(X_train, y_train)

XGBRegressor(eta=0.2, n_estimators=850, objective='reg:squarederror',
             tree_method='exact')

In [None]:
evaluate(tuned_xgb_3, X_valid, y_valid)

Model Performance
Average Error: 2.0349.
Accuracy = 96.99%.
Score = 0.8305.


96.98944228108317

In [None]:
# Fit a baseline Random Forest to compare against XGBoost and to inspect feature importances
rf_base = RandomForestRegressor(random_state=42)
rf_base.fit(X_train, y_train)

RandomForestRegressor(random_state=42)

In [None]:
# Look at feature importances
feature_list = list(X.columns)
importances = list(rf_base.feature_importances_)

# List of (variable, importance) tuples, sorted by importance in descending order
feature_importances = [(feature, round(importance, 2)) for feature, importance in zip(feature_list, importances)]
feature_importances = sorted(feature_importances, key = lambda x: x[1], reverse=True)

# Print out features and corresponding importances
for pair in feature_importances:
    print('Variable: {:20} Importance: {}'.format(*pair))

Variable: Minutes Asleep       Importance: 0.65
Variable: Minutes REM Sleep    Importance: 0.2
Variable: Minutes Awake        Importance: 0.15

In [None]:
# Define function for converting an (hours, minutes) tuple into total minutes
def hours_to_mins(time):
    hours, mins = time
    return hours * 60 + mins

In [None]:
X_train.columns

Index(['Minutes Asleep', 'Minutes Awake', 'Minutes REM Sleep'], dtype='object')

In [None]:
# Last night's sleep values as (hours, minutes) tuples
yesterday = [(7,12), (1,20), (8,32), (1,3), (4,45), (1,24)]

In [None]:
# Define function to transform a list of (hours, minutes) tuples into a 1 x n array of minutes
def get_input(times):
    transformed = np.array([hours_to_mins(time) for time in times])
    return transformed.reshape(1, -1)

In [None]:
# rf_base was already fitted above, so it can be evaluated directly
evaluate(rf_base, X_valid, y_valid)

Model Performance
Average Error: 2.5226.
Accuracy = 96.26%.
Score = 0.7373.


96.25910007634091

In [None]:
# Convert last night's sleep times into minutes
last_night = get_input(yesterday)
last_night

array([[432,  80, 512,  63, 285,  84]])
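
The array above has six values, while the models in this notebook were trained on three features. The sketch below picks out those three by name before any prediction. The order in `assumed_order` is my assumption: the notebook does not state it, but it matches the dataset's column order and is consistent with the numbers (432 + 80 = 512 and 63 + 285 + 84 = 432).

```python
# Assumed order of the six values in `yesterday`; not stated in the notebook
assumed_order = ['Minutes Asleep', 'Minutes Awake', 'Time in Bed',
                 'Minutes REM Sleep', 'Minutes Light Sleep', 'Minutes Deep Sleep']
last_night_minutes = [432, 80, 512, 63, 285, 84]  # values from the array above

# Features the models were trained on, in training order (from X_train.columns)
model_features = ['Minutes Asleep', 'Minutes Awake', 'Minutes REM Sleep']

# Look up each training feature by name to build a single input row
by_name = dict(zip(assumed_order, last_night_minutes))
model_input = [[by_name[name] for name in model_features]]  # one row, three columns

print(model_input)
```

A row like `model_input`, converted to a NumPy array or a DataFrame with the same column names, is the shape that `tuned_xgb.predict` or `rf_base.predict` would expect.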