# Sleep Dataset Generator

By Kenneth Burchfiel
Released under the MIT License

This script creates a fictional set of sleep data that can serve as the data source for our baby_sleep_analysis script.

**Note: this dataset is *NOT* intended to be an accurate representation of typical infant sleep schedules!**


In [1]:
import pandas as pd
import numpy as np
import time

Setting the date and time of birth (make sure to express the time in 24-hour format):

In [2]:
datob = pd.to_datetime('2024-01-15 13:37') # datob = 'date and time of birth'
# Creating our first sleep record, which we can then use to initialize
# our sleep DataFrame:
sleep_start = datob + pd.to_timedelta(174, unit = 'minutes')
sleep_end = sleep_start + pd.to_timedelta(72, unit = 'minutes')

# Creating a days_old value: (These values will be useful in determining how to 
# simulate our sleep start and end times, since the sleep onset and duration
# times will change as the baby gets older):

days_old = (sleep_start - datob).days

# Determining when each night should begin and end. (This will be relevant
# to our calculations, as night sleep should appear different than day sleep
# after the baby begins to differentiate day from night, which can take longer
# than parents might wish!)
night_start_hour = 19
night_end_hour = 7

sleep_start, sleep_end, days_old

(Timestamp('2024-01-15 16:31:00'), Timestamp('2024-01-15 17:43:00'), 0)
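
As a quick illustration of how `days_old` behaves, the following sketch reuses the birth time above (the names `almost_one_day` and `exactly_one_day` are made up for this example). It shows that `.days` counts only whole 24-hour periods since birth, so the value ticks over at 13:37 each day rather than at midnight.

```python
import pandas as pd

datob = pd.to_datetime('2024-01-15 13:37')

# 23 hours and 59 minutes after birth: still 0 whole days old.
almost_one_day = datob + pd.to_timedelta(23 * 60 + 59, unit='minutes')
# Exactly 24 hours after birth: 1 whole day old.
exactly_one_day = datob + pd.to_timedelta(24 * 60, unit='minutes')

days_before = (almost_one_day - datob).days
days_after = (exactly_one_day - datob).days
print(days_before, days_after)
```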

Setting up our random number generator:

In [3]:
rng = np.random.default_rng(738)
# https://numpy.org/doc/stable/reference/random/index.html?highlight=random#module-numpy.random
rng

Generator(PCG64) at 0x23805B69D20
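
The sketch below is a small side experiment rather than part of the generator itself. It uses two separate generators (`rng_a` and `rng_b`, names chosen for this example) so that the `rng` object above is left untouched. It shows that the same seed yields the same stream of values, and how a draw from [0, 1) can be rescaled into the range [0.8, 1.2) that appears later in this notebook.

```python
import numpy as np

# Two generators built from the same seed produce the same stream.
rng_a = np.random.default_rng(738)
rng_b = np.random.default_rng(738)
draws_a = rng_a.random(5)
draws_b = rng_b.random(5)
same_stream = bool(np.array_equal(draws_a, draws_b))

# Rescaling draws from [0, 1) into [0.8, 1.2):
multipliers = 0.8 + draws_a * 0.4
print(same_stream)
print(multipliers)
```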

Initializing df_sleep with the sleep data created above:

In [4]:
df_sleep = pd.DataFrame(
    index = [0], data = {
        'Sleep Start':sleep_start, 'Sleep End':sleep_end, 
        'days_old':days_old})
df_sleep

Unnamed: 0,Sleep Start,Sleep End,days_old
0,2024-01-15 16:31:00,2024-01-15 17:43:00,0


## Creating a function that adds a new row of sleep data to the dataset:

This function applies the following assumptions when calculating new sleep entries:

1. The time in between sleep periods will increase with age.
2. The length of the initial nighttime sleep period will also increase with age.
3. Subsequent nighttime periods will be shorter than the initial periods.
4. The onset of nighttime sleep should move earlier with age.

The function also adds random values to the sleep data so that it will appear more believable.
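
To make assumptions 1, 2, and 4 more concrete, here is a rough sketch that copies the arithmetic from the function below but replaces each random draw with its extreme value (0, or just under 1). The helper names (`awake_time_bounds`, `bedtime_push_minutes`, and `first_night_sleep_bounds`) are invented for this illustration and are not used elsewhere in the notebook. The upper limits are never quite reached, since `rng.random()` always returns a value below 1.

```python
def awake_time_bounds(days_old):
    """Return (lowest, upper limit) awake time in minutes at a given age."""
    lowest = min(60 * 0.8 + (days_old / 2) * 0, 960)
    upper_limit = min(60 * 1.2 + (days_old / 2) * 1, 960)
    return lowest, upper_limit


def bedtime_push_minutes(days_old):
    """Minutes added to a nighttime sleep start; reaches 0 at 42 days."""
    return max(168 - days_old * 4, 0)


def first_night_sleep_bounds(days_old):
    """Return (lowest, upper limit) minutes for the first nighttime sleep."""
    lowest = 60 * 0.8 + min(days_old * 5.9 * 0.3, 660)
    upper_limit = 60 * 1.0 + min(days_old * 5.9 * 1.0, 660)
    return lowest, upper_limit


for age in (0, 21, 42, 112):
    print(age, awake_time_bounds(age), bedtime_push_minutes(age),
          first_night_sleep_bounds(age))
```

At 112 days, the upper limit for the first nighttime sleep period works out to 60 + 660 = 720 minutes, or 12 hours.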

In [5]:
def add_sleep_entry_to_dataset(df_sleep):
    # Identifying the most recent sleep_start, sleep_end, and days_old
    # values within the dataset:
    latest_row = df_sleep.iloc[-1].copy()
    previous_sleep_start = latest_row['Sleep Start']
    if (previous_sleep_start.hour >= night_start_hour) or (
        previous_sleep_start.hour <= night_end_hour):
        previous_sleep_type = 'nighttime'    
    else:
        previous_sleep_type = 'daytime'
    previous_sleep_end = latest_row['Sleep End']
    previous_days_old = latest_row['days_old']
    # Determining how long the baby will be awake between sleep periods:
    # The following code calculates this period as a base time of 
    # around 48 to 72 minutes plus a random
    # value that will increase as the child gets older.
    # It also caps the awake time at 16 hours (960 minutes), though I imagine
    # that it will take most children quite a while to stay up this long.
    awake_time = int(
        min(60 * (0.8 + rng.random() * 0.4) 
            + (previous_days_old /2) * rng.random(), 960))
    # Note: rng.random() returns a value greater than or equal to 0 
    # and less than 1 (i.e. within the range [0, 1)). Therefore, 
    # in the above code, 60 will get multiplied by a value greater than or 
    # equal to 0.8 and less than 1.2 (resulting in a base time 
    # of ~48-72 minutes).
    sleep_start = previous_sleep_end + pd.to_timedelta(
        awake_time, unit = 'minutes')
    if (sleep_start.hour >= night_start_hour) or (
        sleep_start.hour <= night_end_hour):
        sleep_type = 'nighttime'
    else:
        sleep_type = 'daytime'
    if sleep_type == 'nighttime':
        
        # The following line delays the onset of sleep by a value equal to 
        # 168 minus 4 times the baby's age in days (with a floor of 0). This 
        # is meant to simulate an increasingly early bedtime until the child 
        # reaches 6 weeks (42 days) of age, at which point the delay hits 0.
        # (Marc Weissbluth, in Healthy Sleep Habits, Happy Child, notes
        # that bedtimes can get earlier after babies turn 6 weeks old,
        # although this simulation creates a linear relationship between
        # age and bedtime onset rather than a shift at 6 weeks).
        
        sleep_start += pd.to_timedelta(
            max((168 - previous_days_old * 4), 0), unit = 'minutes')
        if previous_sleep_type != 'nighttime': 
            # In this case, this period will be the baby's
            # first sleep period to start at night, so we'll make it 
            # longer than subsequent nighttime sleep periods.
            # We'll assume that, by 16 weeks (112 days), our fictional 
            # infant will be able to sleep
            # up to 12 hours each night. (This won't be the case for many 
            # babies, of course!!)
            # Thus, we'll start with a baseline sleep period of around
            # 48-60 minutes, then extend this period by up to 11 hours 
            # depending on the child's age.
            # Since 660 (11 hours in minutes) / 112 days equals 
            # around 5.9, we'll add
            # 5.9 * the child's age (in days) * a random value between 
            # 0.3 and 1 to our sleep total.
            # We'll also limit this value to 660 so that the baby 
            # won't continue to sleep longer
            # and longer as he/she gets older.
            sleep_duration = int(
                (60 * (0.8 + rng.random() * 0.2)) 
                + min(previous_days_old * 5.9 * 
                      (0.3 + rng.random() * 0.7), 660))
        else: # In this case, our baby already had at least one nighttime 
            # sleep period, so this one will be restricted to a duration 
            # of around 144 to 180 minutes.
            sleep_duration = int(180 * (0.8 + rng.random() * 0.2))
            
        
    else: # In this case, the child's sleep started during the daytime, 
        # so we'll simulate a nap 
        # by multiplying 120 minutes by a value in the range
        # [0.8, 1.2).
        sleep_duration = int((120 * (0.8 + rng.random() * 0.4)))
        
    sleep_end = sleep_start + pd.to_timedelta(sleep_duration, unit = 'minutes')       

    # Creating the days_old value that corresponds to the time that sleep began:

    days_old = (sleep_start - datob).days
    
    # print(sleep_start, sleep_end, sleep_duration, days_old)   
    new_index = df_sleep.index[-1] + 1
    df_sleep.loc[new_index] = (
        {'Sleep Start':sleep_start, 
        'Sleep End':sleep_end, 
        'days_old':days_old})
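
The day/night test used twice in the function above can be tried out on its own. The sketch below wraps it in a helper called `classify_sleep` (a name made up for this example) and applies it to a few clock times. Because the test compares whole hours with `<=`, a sleep period that starts anywhere from 07:00 through 07:59 still counts as nighttime.

```python
import pandas as pd

night_start_hour = 19
night_end_hour = 7


def classify_sleep(timestamp):
    """Label a timestamp using the same hour test as the function above."""
    if (timestamp.hour >= night_start_hour) or (
            timestamp.hour <= night_end_hour):
        return 'nighttime'
    return 'daytime'


labels = {
    clock: classify_sleep(pd.Timestamp(f'2024-01-16 {clock}'))
    for clock in ('06:59', '07:30', '08:00', '18:59', '19:00', '23:45')}
print(labels)
```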
    

Extending our sleep table by running add_sleep_entry_to_dataset until the most recent entry has a days_old value above 180:

In [6]:
while df_sleep.iloc[-1]['days_old'] <= 180:
    # print(df_sleep.iloc[-1]['days_old'])
    add_sleep_entry_to_dataset(df_sleep)

Now that we've created the dataset, we can remove the days_old column, since our sleep analysis script will create a similar field.

In [7]:
df_sleep.drop('days_old', axis = 1, inplace = True)
df_sleep

Unnamed: 0,Sleep Start,Sleep End
0,2024-01-15 16:31:00,2024-01-15 17:43:00
1,2024-01-15 18:46:00,2024-01-15 21:01:00
2,2024-01-16 01:00:00,2024-01-16 01:49:00
3,2024-01-16 05:48:00,2024-01-16 08:44:00
4,2024-01-16 09:53:00,2024-01-16 12:06:00
...,...,...
956,2024-07-13 19:09:00,2024-07-14 01:52:00
957,2024-07-14 04:18:00,2024-07-14 06:42:00
958,2024-07-14 08:38:00,2024-07-14 10:55:00
959,2024-07-14 13:13:00,2024-07-14 15:27:00
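
One way to sanity-check a table like this is to confirm that every sleep period ends after it starts and begins after the previous one ended. The sketch below runs that check on the first five rows shown above, retyped by hand into a DataFrame called `sample` (a name used only here). The same two comparisons could be run on the full `df_sleep` table.

```python
import pandas as pd

# The first five rows shown in the output above, retyped by hand.
sample = pd.DataFrame({
    'Sleep Start': pd.to_datetime([
        '2024-01-15 16:31', '2024-01-15 18:46', '2024-01-16 01:00',
        '2024-01-16 05:48', '2024-01-16 09:53']),
    'Sleep End': pd.to_datetime([
        '2024-01-15 17:43', '2024-01-15 21:01', '2024-01-16 01:49',
        '2024-01-16 08:44', '2024-01-16 12:06'])})

# Every sleep period should end after it starts...
positive_durations = bool(
    (sample['Sleep End'] > sample['Sleep Start']).all())
# ...and begin after the previous one ended. (The first row has no
# previous row, so its gap is missing and gets dropped.)
awake_gaps = sample['Sleep Start'] - sample['Sleep End'].shift(1)
no_overlaps = bool((awake_gaps.dropna() > pd.Timedelta(0)).all())
print(positive_durations, no_overlaps)
```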


## Saving this dataset to a .csv file so that it can be processed by our Baby Sleep Analysis script:

In [8]:
df_sleep.to_csv('sleep_dataset.csv', index = False)
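
The analysis script's own loading code isn't shown in this notebook, so the following is only a sketch of how a file in this format might be read back in. It writes a two-row stand-in table (`sample`) to an in-memory buffer instead of to disk, then reloads it with `parse_dates` so that both columns return as datetimes rather than plain strings.

```python
import io
import pandas as pd

sample = pd.DataFrame({
    'Sleep Start': pd.to_datetime(['2024-01-15 16:31', '2024-01-15 18:46']),
    'Sleep End': pd.to_datetime(['2024-01-15 17:43', '2024-01-15 21:01'])})

# Writing to an in-memory buffer instead of a file on disk:
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)

# Without parse_dates, the two columns would be read in as strings.
reloaded = pd.read_csv(buffer, parse_dates=['Sleep Start', 'Sleep End'])
durations = reloaded['Sleep End'] - reloaded['Sleep Start']
print(reloaded.dtypes)
print(durations)
```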

