# Sleep Regularity Measurement
Sleep regularity describes how consistent a person’s sleep patterns are, measured through the day-to-day variability in their sleep–wake times. At least five metrics can be used to quantify it, each capturing a different aspect of regularity. The five measures that we’ll look at in this blog post are listed below:

Traditional/Overall Metrics:
- Individual Standard Deviation (StDev)
- Interdaily Stability (IS)
- Social Jet Lag (SJL)

Newer Metrics:
- Composite Phase Deviation (CPD)
- Sleep Regularity Index (SRI)

All of the above metrics can be transferred to behaviors beyond sleep, e.g., exercise and steps. In this work, we look into regularity in all three behaviors.

## IMPORTING LIBRARIES

In [39]:
import datetime as dt
import json
import warnings

import numpy as np
import pandas as pd
from scipy import stats
from tqdm import tqdm

from notebooks_feature_engineering.feature_engineering_functions import interdaily_stability
from notebooks_preprocessing_and_dataframe_creation.preprocessing_functions import fitbit_intraday_sleep
pd.set_option('display.max_rows', 100)

## READING DATA

In [40]:
# Read the daily unified dataframe
df = pd.read_pickle('.\\..\\data\\unified_dataframe\\data_preprocessed.pkl')
df_sleep = df[["id", "date", "nightly_temperature", "full_sleep_breathing_rate", "sleep_duration", "minutesToFallAsleep",
                           "minutesAsleep", "minutesAwake", "minutesAfterWakeup", "sleep_efficiency",
                           "sleep_deep_ratio", "sleep_wake_ratio", "sleep_light_ratio", "sleep_rem_ratio", "startTime", "endTime"]]
df_sleep.head()

Unnamed: 0,id,date,nightly_temperature,full_sleep_breathing_rate,sleep_duration,minutesToFallAsleep,minutesAsleep,minutesAwake,minutesAfterWakeup,sleep_efficiency,sleep_deep_ratio,sleep_wake_ratio,sleep_light_ratio,sleep_rem_ratio,startTime,endTime
0,621e2e8e67b776a24055b564,2021-05-24,34.137687,14.8,31260000.0,0.0,445.0,76.0,0.0,93.0,1.243243,0.987013,0.921642,1.341772,2021-05-24T00:40:00.000,2021-05-24T09:21:00.000
1,621e328667b776a240281372,2021-05-24,33.97312,14.6,27240000.0,0.0,399.0,54.0,0.0,95.0,0.986206,0.963636,0.984866,0.9875,,
2,621e326767b776a24012e179,2021-05-24,33.97312,14.4,27240000.0,0.0,399.0,54.0,0.0,95.0,0.986206,0.963636,0.984866,0.9875,,
3,621e332267b776a24092a584,2021-05-24,33.97312,14.6,27240000.0,0.0,399.0,54.0,0.0,95.0,0.986206,0.963636,0.984866,0.9875,,
4,621e333567b776a240a0c217,2021-05-24,33.97312,14.6,27240000.0,0.0,399.0,54.0,0.0,95.0,0.986206,0.963636,0.984866,0.9875,,


In [41]:
# Calculate sleep time ranges per day
# Work on an explicit copy to avoid pandas' SettingWithCopyWarning
df_intraday = df_sleep[['id', 'date', 'startTime', 'endTime']].copy()
df_intraday['date'] = pd.to_datetime(df_intraday['date'])
df_intraday['startTime'] = pd.to_datetime(df_intraday['startTime'])
df_intraday['endTime'] = pd.to_datetime(df_intraday['endTime'])
df_intraday = df_intraday.drop_duplicates(subset=['date', 'id'])
# Round sleep onset and wake-up times to the nearest hour
df_intraday['startTime'] = df_intraday['startTime'].dt.round('H')
df_intraday['endTime'] = df_intraday['endTime'].dt.round('H')
df_intraday.head()

Unnamed: 0,id,date,startTime,endTime
0,621e2e8e67b776a24055b564,2021-05-24,2021-05-24 01:00:00,2021-05-24 09:00:00
1,621e328667b776a240281372,2021-05-24,NaT,NaT
2,621e326767b776a24012e179,2021-05-24,NaT,NaT
3,621e332267b776a24092a584,2021-05-24,NaT,NaT
4,621e333567b776a240a0c217,2021-05-24,NaT,NaT


In [57]:
# create the sleep/awake dataframe per user
user_ids = set(df_intraday.id)
user_frames = []  # per-user frames, concatenated after the loop
for user in tqdm(user_ids):
    # select user's data (explicit copy avoids SettingWithCopyWarning)
    df_user = df_intraday.loc[df_intraday.id == user].copy()
    # fill days without any sleep time
    df_user['startHour'] = df_user.startTime.dt.hour
    df_user['endHour'] = df_user.endTime.dt.hour
    mode_sleep_time = stats.mode(df_user.startHour, keepdims=True).mode[0]
    mode_awake_time = stats.mode(df_user.endHour, keepdims=True).mode[0]
    # convert type
    # todo
    # df_user.startHour = df_user.startHour.astype('Int64')
    # df_user.endHour = df_user.endHour.astype('Int64')
    # now fill na (assign the result; inplace fillna on a column accessor is unreliable)
    df_user['startHour'] = df_user.startHour.fillna(mode_sleep_time)
    df_user['endHour'] = df_user.endHour.fillna(mode_awake_time)
    if 0 <= mode_sleep_time <= 4:
        warnings.warn("WARNING: User with most common sleep time after midnight -> Date conversion will be incorrect")
    # convert to datetime
    df_user["startTime"] = df_user.apply(lambda row: dt.datetime.combine(row.date, dt.time(int(row.startHour))) if pd.isna(row.startTime) else row.startTime, axis=1)
    df_user["endTime"] = df_user.apply(lambda row: dt.datetime.combine(row.date + dt.timedelta(days=1), dt.time(int(row.endHour))) if pd.isna(row.endTime) else row.endTime, axis=1)
    # extract time range column
    df_user["time_range"] = df_user.apply(lambda row: np.nan if pd.isna(row.startTime) else pd.date_range(row.startTime, row.endTime, freq='H'), axis=1)
    # extract the asleep hours per day in a list format
    hours_asleep = df_user.time_range.dropna()
    # flatten the per-night hourly ranges into one series of asleep timestamps
    timestamps = pd.Series(pd.to_datetime([ts for time_range in hours_asleep for ts in time_range]))
    # resample the user's dataframe
    df_user = df_user.set_index('date').resample('1H').asfreq().reset_index(drop=False)
    # add new column for sleep/awake state
    df_user['sleep'] = df_user.date.isin(timestamps)
    # add user id
    df_user['id'] = user
    # drop unnecessary columns
    df_user = df_user.drop(['startTime', 'endTime', 'time_range', 'startHour', 'endHour'], axis=1)
    # collect the frame; DataFrame.append returns a new object, so calling it without assignment discards the data
    user_frames.append(df_user)
df_users = pd.concat(user_frames, ignore_index=True)
df_users.head(1000)

In [124]:
# group by user and resample each user's rows to hourly frequency
# (asfreq() only inserts empty rows; no interpolation is applied)
df_intraday = df_intraday.drop(columns='time_range', errors='ignore')
df_users = df_intraday.groupby('id').apply(lambda x: x.set_index('date')
                                           .resample('1H')
                                           .asfreq()
                                           ).reset_index(level=0, drop=True).reset_index()
df_users.head(1000)


Unnamed: 0,date,id,startTime,endTime
0,2021-05-24 00:00:00,621e2e8e67b776a24055b564,2021-05-24 01:00:00,2021-05-24 09:00:00
1,2021-05-24 01:00:00,,NaT,NaT
2,2021-05-24 02:00:00,,NaT,NaT
3,2021-05-24 03:00:00,,NaT,NaT
4,2021-05-24 04:00:00,,NaT,NaT
...,...,...,...,...
995,2021-07-04 11:00:00,,NaT,NaT
996,2021-07-04 12:00:00,,NaT,NaT
997,2021-07-04 13:00:00,,NaT,NaT
998,2021-07-04 14:00:00,,NaT,NaT


In [127]:
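# NOTE: `timestamps` is left over from the last iteration of the per-user loop above,
# so every row is checked against that single user's asleep hours
df_users.loc[:, 'sleep'] = df_users.date.apply(lambda d: d in timestamps.values)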
df_users.head(1000)


Unnamed: 0,date,id,startTime,endTime,sleep
0,2021-05-24 00:00:00,621e2e8e67b776a24055b564,2021-05-24 01:00:00,2021-05-24 09:00:00,True
1,2021-05-24 01:00:00,,NaT,NaT,True
2,2021-05-24 02:00:00,,NaT,NaT,True
3,2021-05-24 03:00:00,,NaT,NaT,True
4,2021-05-24 04:00:00,,NaT,NaT,True
...,...,...,...,...,...
995,2021-07-04 11:00:00,,NaT,NaT,True
996,2021-07-04 12:00:00,,NaT,NaT,True
997,2021-07-04 13:00:00,,NaT,NaT,True
998,2021-07-04 14:00:00,,NaT,NaT,True


# SLEEP REGULARITY MEASUREMENT

## Individual Standard Deviation (StDev)
This metric is not implemented here: it is a purely statistical feature that is also available in tsfresh.
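
As a rough illustration of what this metric measures, the sketch below computes the per-user standard deviation of sleep onset time with plain pandas. The toy data, the column name `onset_hour`, and the noon-based unwrapping of clock hours are assumptions made for the example; they are not part of this notebook's pipeline.

```python
import pandas as pd

# Toy data (assumed): one row per user and night, with the sleep onset timestamp.
toy = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b", "b"],
    "startTime": pd.to_datetime([
        "2021-05-24 23:00", "2021-05-25 23:00", "2021-05-26 23:00",
        "2021-05-24 22:00", "2021-05-26 00:00", "2021-05-27 02:00",
    ]),
})

# Clock hour of sleep onset as a float, e.g. 23.5 for 23:30.
hour = toy["startTime"].dt.hour + toy["startTime"].dt.minute / 60

# Shift hours before noon by +24 so that 23:00 and 01:00 are 2 h apart, not 22 h.
toy["onset_hour"] = hour.where(hour >= 12, hour + 24)

# Individual standard deviation: sample std (ddof=1) of onset time per user.
onset_std = toy.groupby("id")["onset_hour"].std()
print(onset_std)
```

User `a` always falls asleep at 23:00 and gets a deviation of 0 hours, while user `b` (22:00, 00:00, 02:00) gets 2 hours. The same idea applies to wake-up time or sleep duration.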

## Interdaily Stability (IS)


In [1]:
import pyActigraphy
import os
fpath = os.path.join(os.path.dirname(pyActigraphy.__file__),'tests/data/')
raw = pyActigraphy.io.read_raw_awd(fpath+'example_01.AWD')
print(raw)

<pyActigraphy.io.awd.awd.RawAWD object at 0x000001AA21165D60>


In [3]:
raw.IS(binarize=False)

0.470359109153365

In [4]:
acti = raw.data

In [5]:
acti1 = acti.resample('1H').sum()

In [7]:
acti2 = acti1.copy()

In [20]:
# Overwrite the signal with a perfectly periodic 24-hour pattern (0, 1, ..., 23, 0, ...)
for i in range(len(acti2)):
    acti2.iloc[i] = (i % 24)

In [11]:
interdaily_stability(acti1)

0.470359109153365

In [21]:
interdaily_stability(acti2)

1.0557225350917545
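
For reference, IS is usually defined as the variance of the average 24-hour profile divided by the total variance of the hourly signal. The sketch below is a minimal, self-contained version of that textbook ratio; `interdaily_stability_sketch` is a hypothetical helper written for this example and is not the project's `interdaily_stability` function, whose implementation is not shown here. It assumes an hourly series with a `DatetimeIndex`.

```python
import numpy as np
import pandas as pd


def interdaily_stability_sketch(series: pd.Series) -> float:
    """Textbook IS for an hourly series with a DatetimeIndex (hypothetical helper)."""
    x = series.dropna().astype(float)
    n = len(x)
    grand_mean = x.mean()
    # Mean value for each hour of the day (the average 24 h profile).
    hourly_means = x.groupby(x.index.hour).mean()
    p = len(hourly_means)
    # Variance of the 24 h profile around the grand mean.
    between = ((hourly_means - grand_mean) ** 2).sum() / p
    # Total variance of the signal around the grand mean.
    total = ((x - grand_mean) ** 2).sum() / n
    return float(between / total)


# Seven complete days of hourly samples.
idx = pd.date_range("2021-05-24", periods=24 * 7, freq=pd.Timedelta(hours=1))

# A signal that repeats identically every day.
periodic = pd.Series(np.arange(len(idx)) % 24, index=idx)

# White noise with no daily structure.
rng = np.random.default_rng(0)
noise = pd.Series(rng.normal(size=len(idx)), index=idx)

is_periodic = interdaily_stability_sketch(periodic)
is_noise = interdaily_stability_sketch(noise)
print(is_periodic, is_noise)
```

Under this definition a signal that repeats exactly over whole days scores 1, and unstructured noise scores close to 0, which may be a useful sanity check when comparing implementations.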

## Social Jet Lag (SJL)
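
Social jet lag is commonly defined as the absolute difference between the midpoint of sleep on free days and on workdays. The sketch below illustrates that definition on toy data. Treating Saturday and Sunday wake-ups as free days, and averaging midsleep as a plain clock hour, are assumptions made for the example; the latter only holds when midsleep does not straddle midnight.

```python
import pandas as pd

# Toy data (assumed): two workday nights and two weekend nights.
toy = pd.DataFrame({
    "startTime": pd.to_datetime([
        "2021-05-24 23:00", "2021-05-25 23:00",  # Monday and Tuesday evenings
        "2021-05-29 01:00", "2021-05-30 01:00",  # early Saturday and Sunday
    ]),
    "endTime": pd.to_datetime([
        "2021-05-25 07:00", "2021-05-26 07:00",
        "2021-05-29 09:00", "2021-05-30 09:00",
    ]),
})

# Midpoint of each sleep episode, expressed as a clock hour.
midsleep = toy["startTime"] + (toy["endTime"] - toy["startTime"]) / 2
toy["midsleep_hour"] = midsleep.dt.hour + midsleep.dt.minute / 60

# Assumption: nights ending on Saturday (5) or Sunday (6) count as free days.
toy["free_day"] = toy["endTime"].dt.dayofweek >= 5

msf = toy.loc[toy["free_day"], "midsleep_hour"].mean()   # midsleep on free days
msw = toy.loc[~toy["free_day"], "midsleep_hour"].mean()  # midsleep on workdays
sjl_hours = abs(msf - msw)
print(sjl_hours)
```

Here midsleep is 03:00 on workdays and 05:00 on free days, so the social jet lag is 2 hours.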

## Composite Phase Deviation (CPD)

## Sleep Regularity Index (SRI)
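
The SRI is commonly defined as the probability of being in the same state (asleep or awake) at any two time points 24 hours apart, rescaled so that 100 means perfectly regular. The sketch below applies that definition to an hourly boolean series like the `sleep` column built earlier. `sleep_regularity_index_sketch` is a hypothetical helper, and it assumes a gap-free, evenly sampled series for a single user.

```python
import numpy as np
import pandas as pd


def sleep_regularity_index_sketch(sleep: pd.Series, epochs_per_day: int = 24) -> float:
    """SRI of a gap-free boolean sleep/wake series (hypothetical helper)."""
    s = sleep.to_numpy(dtype=bool)
    if len(s) <= epochs_per_day:
        raise ValueError("need more than one day of data")
    # Compare every epoch with the epoch exactly 24 h later.
    same_state = s[:-epochs_per_day] == s[epochs_per_day:]
    # Rescale the match probability from [0, 1] to [-100, 100].
    return float(-100 + 200 * same_state.mean())


# Eight complete days of hourly epochs.
idx = pd.date_range("2021-05-24", periods=24 * 8, freq=pd.Timedelta(hours=1))
hour = idx.hour.to_numpy()
day = np.arange(len(idx)) // 24

# Same schedule every night: asleep from 23:00 to 07:00.
regular = pd.Series((hour >= 23) | (hour < 7), index=idx)

# Alternating schedule: asleep 00:00-08:00 on even days, 12:00-20:00 on odd days.
alternating = pd.Series(
    np.where(day % 2 == 0, hour < 8, (hour >= 12) & (hour < 20)), index=idx
)

sri_regular = sleep_regularity_index_sketch(regular)
sri_alternating = sleep_regularity_index_sketch(alternating)
print(sri_regular, sri_alternating)
```

The identical nightly schedule scores 100. In the alternating schedule only 8 of every 24 hourly comparisons match, giving a score of about -33.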