# Baby Sleep Analysis

By Kenneth Burchfiel

Released under the MIT License

This script produces multiple analyses and visualizations of changes in infant sleep data over time. It focuses on the following metrics:

1. The length of babies' longest sleep periods
2. Daytime and nighttime sleep distributions
3. Total sleep amounts
4. How early into the evening nighttime sleep begins

Sample copies of some of the interactive charts created by this script can be found on [this Google Sites page](https://sites.google.com/view/sample-baby-sleep-charts/home).

## Instructions for use:

This script was designed to accommodate two different data sources:

1. A .csv export of infant data from the Huckleberry app. (I have no affiliation with Huckleberry; it just happens to be the app that I use for sleep tracking purposes.)
2. A custom .csv file that contains start and end times for each sleep period. (A sample version of this file can be found within 'sleep_dataset.csv'.)

Regardless of which dataset you use, overwrite the 'sleep_dataset.csv' file with your copy of sleep data so that this script can process it correctly.

If you have a different type of app or tracker that also supports .csv exports, you should be able to modify the code so that it supports whatever sleep data format it happens to use.

**Note: by default, this notebook imports and analyzes a set of fictional sleep data stored as 'sleep_dataset.csv'. This dataset is NOT meant to indicate regular infant sleep patterns!** (To see how this table was created, open the 'Sleep Dataset Generator.ipynb' file.)

In [1]:
import time
script_start_time = time.time()
import pandas as pd
import numpy as np
import plotly.express as px
import statsmodels.api as sm

# Part 1: Creating and Updating a Sleep Data Table

In [2]:
data_for_author = False
if data_for_author: # This block of code allows me to read in my own 
    # baby's sleep data from a separate folder, then save analyses of that data 
    # to that folder. In order to read your own sleep data,
    # keep data_for_author False, then overwrite 
    # 'sleep_dataset.csv' with your own data file.
    df_personal_data = pd.read_csv('personal_variables.csv', index_col = 'Variable')
    dob = pd.to_datetime(df_personal_data.loc['dob', 'Value'])
    path_to_data = df_personal_data.loc['path_to_data', 'Value']
    max_night_start_date_limit = True
    max_night_start_date = pd.to_datetime(df_personal_data.loc['max_night_start_date', 'Value']).date()
    data_output_folder = df_personal_data.loc['data_and_chart_save_path', 'Value']
    visualizations_folder = df_personal_data.loc['data_and_chart_save_path', 'Value']
else:
    dob = pd.to_datetime('2024-01-15') # Update this field with your baby's 
    # date of birth.
    path_to_data = 'sleep_dataset.csv' # Either replace this name with the 
    # path to your own file
    # or overwrite this .csv file with your data.
    data_output_folder = 'Data_Output/'
    visualizations_folder = 'Visualizations/'
    # The following lines allow you to set a maximum date for your data. 
    # This can be useful if you only have partial data for the most recent 
    # date and don't want it to skew the output of your charts.
    max_night_start_date_limit = False
    max_night_start_date = pd.to_datetime('2024-07-13').date()
    

Defining nighttime start and end hours: (I use a 7 PM start time and a 7 AM end time, but feel free to change these to the hours that best fit your child. Note, however, that very unusual settings, such as a night start hour of 9 AM and a night end hour of 4 PM, may produce incorrect analyses and visualizations.)

In [3]:
# These values should be entered as 24-hour-formatted integers 
# (e.g. 19 rather than 7 PM or 19:00)
night_start_hour = 19
night_end_hour = 7 
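
Because the night window wraps past midnight, an hour counts as nighttime when it falls at or after the start hour *or* before the end hour. The following sketch illustrates that rule with a hypothetical helper, `is_night_hour`, which is not part of this notebook. It assumes that the night start hour is later in the day than the night end hour, which is the kind of setting this script was designed around.

```python
def is_night_hour(hour, night_start_hour=19, night_end_hour=7):
    """Return True if an hour (0-23) falls inside a night window that
    wraps past midnight. Hypothetical helper for illustration only."""
    if night_start_hour <= night_end_hour:
        # Settings like a 9 AM start and a 4 PM end don't wrap past 
        # midnight, so this sketch rejects them.
        raise ValueError(
            'night_start_hour must be later than night_end_hour')
    return hour >= night_start_hour or hour < night_end_hour

# Listing every hour that the default settings treat as nighttime:
night_hours = [hour for hour in range(24) if is_night_hour(hour)]
print(night_hours)
```

With the default settings, hours 19 through 23 and 0 through 6 are treated as nighttime, while hour 7 is the first daytime hour.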

## Importing sleep data:

Note: This script was originally designed to work with a Huckleberry .csv export, but it can accommodate other data sources as well, provided that they list start and end times in YYYY-MM-DD HH:MM:SS format. (See 'sleep_dataset.csv' for an example of the formats and field names to use.) If your data uses a different format, you can update the following cell to make your dataset compatible with this script.

In [4]:
df_baby_data = pd.read_csv(path_to_data)
df_baby_data.rename(
    columns = {'Start':'Sleep Start', 'End':'Sleep End'}, 
    inplace = True)
for column in ['Sleep Start', 'Sleep End']:
    df_baby_data[column] = pd.to_datetime(df_baby_data[column])  
df_baby_data.sort_values('Sleep Start', inplace = True)
if 'Type' in df_baby_data.columns:
    df_sleep_data = df_baby_data.query(
        "Type == 'Sleep'").copy().reset_index(
        drop=True)[['Sleep Start', 'Sleep End']]
else:
    df_sleep_data = df_baby_data.copy()
df_sleep_data

Unnamed: 0,Sleep Start,Sleep End
0,2024-01-15 16:31:00,2024-01-15 17:43:00
1,2024-01-15 18:46:00,2024-01-15 21:01:00
2,2024-01-16 01:00:00,2024-01-16 01:49:00
3,2024-01-16 05:48:00,2024-01-16 08:44:00
4,2024-01-16 09:53:00,2024-01-16 12:06:00
...,...,...
956,2024-07-13 19:09:00,2024-07-14 01:52:00
957,2024-07-14 04:18:00,2024-07-14 06:42:00
958,2024-07-14 08:38:00,2024-07-14 10:55:00
959,2024-07-14 13:13:00,2024-07-14 15:27:00
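
If your tracker exports a different layout, the import cell could be adapted along the following lines. This is only a sketch: the column names `date`, `start_time`, and `duration_minutes` are assumptions about a hypothetical export, not a format used by any particular app. The goal is simply to end up with the 'Sleep Start' and 'Sleep End' columns that the rest of this script expects.

```python
import io
import pandas as pd

# A hypothetical export from another tracker. These column names are
# assumptions made for illustration only.
other_export = io.StringIO(
    "date,start_time,duration_minutes\n"
    "2024-01-15,16:31,72\n"
    "2024-01-15,18:46,135\n")
df_other = pd.read_csv(other_export)

# Building the two columns that the rest of this script relies on:
df_converted = pd.DataFrame()
df_converted['Sleep Start'] = pd.to_datetime(
    df_other['date'] + ' ' + df_other['start_time'])
df_converted['Sleep End'] = (
    df_converted['Sleep Start'] 
    + pd.to_timedelta(df_other['duration_minutes'], unit='min'))
print(df_converted)
```

The two converted rows match the first two entries of the sample dataset shown above (sleep periods ending at 17:43 and 21:01).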


Note: the following cell adds two additional entries with unusually long sleep periods. The first entry contains a sleep period that begins during the daytime on one date, goes through the whole night, and then ends on the next date. The second entry is even stranger: it begins during one nighttime period, lasts through the entire subsequent day, and then ends during the next nighttime period.

These entries helped me write some code to handle these sorts of edge cases. However, now that that code has been added in, I commented it out so that the entries wouldn't distort the graphs created by the program or data imported by other users.

In [5]:
# if data_for_author == False: 
#     day_night_day_sleep_period = pd.DataFrame(
#         index = [0], data = {'Sleep Start':pd.to_datetime(
#             '2024-07-14 18:26'),'Sleep End':pd.to_datetime(
#             '2024-07-15 08:43')})
    
#     night_day_night_sleep_period = pd.DataFrame(
#         index = [0], data = {'Sleep Start':pd.to_datetime(
#             '2024-07-15 19:35'),'Sleep End':pd.to_datetime(
#             '2024-07-16 20:24')})
    
    
#     df_sleep_data = pd.concat(
#         [df_sleep_data, 
#          day_night_day_sleep_period,
#          night_day_night_sleep_period]).reset_index(
#         drop=True)
#     df_sleep_data

### Calculating sleep durations (in hours) and data associated with each start and end time:

In [6]:
df_sleep_data['Hours'] = (df_sleep_data['Sleep End'] - df_sleep_data['Sleep Start']
                         ).dt.total_seconds() / 3600

# Note: .dt.total_seconds() is used in the above code rather than the 
# .dt.seconds attribute in order to ensure that we're retrieving the full 
# value of the duration between two timestamps rather than just the 
# seconds component of each Timedelta (which excludes whole days).
# See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timedelta.html 
# for more information.

df_sleep_data['Start Hour'] = df_sleep_data['Sleep Start'].dt.hour
df_sleep_data['End Hour'] = df_sleep_data['Sleep End'].dt.hour


df_sleep_data['Start Date'] = df_sleep_data['Sleep Start'].dt.date
df_sleep_data['End Date'] = df_sleep_data['Sleep End'].dt.date
df_sleep_data


Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date
0,2024-01-15 16:31:00,2024-01-15 17:43:00,1.200000,16,17,2024-01-15,2024-01-15
1,2024-01-15 18:46:00,2024-01-15 21:01:00,2.250000,18,21,2024-01-15,2024-01-15
2,2024-01-16 01:00:00,2024-01-16 01:49:00,0.816667,1,1,2024-01-16,2024-01-16
3,2024-01-16 05:48:00,2024-01-16 08:44:00,2.933333,5,8,2024-01-16,2024-01-16
4,2024-01-16 09:53:00,2024-01-16 12:06:00,2.216667,9,12,2024-01-16,2024-01-16
...,...,...,...,...,...,...,...
956,2024-07-13 19:09:00,2024-07-14 01:52:00,6.716667,19,1,2024-07-13,2024-07-14
957,2024-07-14 04:18:00,2024-07-14 06:42:00,2.400000,4,6,2024-07-14,2024-07-14
958,2024-07-14 08:38:00,2024-07-14 10:55:00,2.283333,8,10,2024-07-14,2024-07-14
959,2024-07-14 13:13:00,2024-07-14 15:27:00,2.233333,13,15,2024-07-14,2024-07-14
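
To make the difference between the seconds component and the total duration concrete, here is a small sketch that uses two made-up durations. The second one is longer than a day, which is where the two values diverge.

```python
import pandas as pd

# Two illustrative durations: 2 hours 15 minutes, and 1 day plus 49 minutes.
durations = pd.Series(
    pd.to_timedelta(['0 days 02:15:00', '1 days 00:49:00']))

# .dt.seconds only returns the seconds component (whole days are excluded):
seconds_component = durations.dt.seconds
# .dt.total_seconds() returns the full duration in seconds:
full_seconds = durations.dt.total_seconds()

hours = full_seconds / 3600
print(pd.DataFrame({
    'seconds': seconds_component, 
    'total_seconds': full_seconds,
    'Hours': hours}))
```

For the second duration, the seconds component is 2,940 (49 minutes) whereas the total is 89,340 seconds, or roughly 24.82 hours.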


## Adding in nighttime start and end dates (along with columns that store the exact times that these periods began and ended):

These fields will make it easier to determine the daytime and nighttime components of each sleep period.

In [7]:
# Determining the date that nighttime began for the given sleep period:
# if the sleep period began *or* ended before night_end_hour, this date will 
# be the day before End Date. (For instance, if the baby woke up at 
# 6 AM on 2024-07-10, and night_end_hour is set to 7 AM, Night Start Date 
# will be set as 2024-07-09.) 
# In addition, if the sleep period's start and end dates
# are different, this date should also be the day before End Date. (This 
# condition accounts for cases in which the baby started sleeping before 
# night_start_hour on day 1 and ended sleeping after night_end_hour on day 2.) 
# In all other cases, Night Start Date 
# will be made equal to End Date.

condlist = [
    (df_sleep_data['Start Hour'] < night_end_hour) | 
    (df_sleep_data['End Hour'] < night_end_hour),
    (df_sleep_data['Start Date'] != df_sleep_data['End Date'])
]

choicelist = [
    df_sleep_data['End Date'] - pd.Timedelta(days = 1),
    df_sleep_data['End Date'] - pd.Timedelta(days = 1)
]

df_sleep_data['Night Start Date'] = np.select(condlist, choicelist, 
    df_sleep_data['End Date'])

# Each Night End Date is simply the day after 
# its corresponding Night Start Date:
df_sleep_data['Night End Date'] = (
    pd.to_datetime(df_sleep_data['Night Start Date'])
    + pd.Timedelta(days = 1)).dt.date
    
# Adding night_start_hour to this date in order to store the date
# and hour at which the night began: (This will prove useful
# for daytime and nighttime sleep duration calculations.)
df_sleep_data['Night Start Time'] = (
    pd.to_datetime(df_sleep_data['Night Start Date']) 
    + pd.Timedelta(hours = night_start_hour))

df_sleep_data['Night End Time'] = (
    pd.to_datetime(df_sleep_data['Night End Date']) 
    + pd.Timedelta(hours = night_end_hour))

df_sleep_data

Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,Night End Time
0,2024-01-15 16:31:00,2024-01-15 17:43:00,1.200000,16,17,2024-01-15,2024-01-15,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00
1,2024-01-15 18:46:00,2024-01-15 21:01:00,2.250000,18,21,2024-01-15,2024-01-15,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00
2,2024-01-16 01:00:00,2024-01-16 01:49:00,0.816667,1,1,2024-01-16,2024-01-16,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00
3,2024-01-16 05:48:00,2024-01-16 08:44:00,2.933333,5,8,2024-01-16,2024-01-16,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00
4,2024-01-16 09:53:00,2024-01-16 12:06:00,2.216667,9,12,2024-01-16,2024-01-16,2024-01-16,2024-01-17,2024-01-16 19:00:00,2024-01-17 07:00:00
...,...,...,...,...,...,...,...,...,...,...,...
956,2024-07-13 19:09:00,2024-07-14 01:52:00,6.716667,19,1,2024-07-13,2024-07-14,2024-07-13,2024-07-14,2024-07-13 19:00:00,2024-07-14 07:00:00
957,2024-07-14 04:18:00,2024-07-14 06:42:00,2.400000,4,6,2024-07-14,2024-07-14,2024-07-13,2024-07-14,2024-07-13 19:00:00,2024-07-14 07:00:00
958,2024-07-14 08:38:00,2024-07-14 10:55:00,2.283333,8,10,2024-07-14,2024-07-14,2024-07-14,2024-07-15,2024-07-14 19:00:00,2024-07-15 07:00:00
959,2024-07-14 13:13:00,2024-07-14 15:27:00,2.233333,13,15,2024-07-14,2024-07-14,2024-07-14,2024-07-15,2024-07-14 19:00:00,2024-07-15 07:00:00
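
The Night Start Date rules can also be traced by hand for individual rows. The following sketch restates them as a plain Python function; `night_start_date` is a hypothetical helper written for illustration, not something the notebook itself defines.

```python
import datetime

def night_start_date(sleep_start, sleep_end, night_end_hour=7):
    """Apply the Night Start Date rules to a single sleep period.
    Hypothetical helper for illustration only."""
    began_or_ended_before_night_end = (
        sleep_start.hour < night_end_hour 
        or sleep_end.hour < night_end_hour)
    crosses_midnight = sleep_start.date() != sleep_end.date()
    if began_or_ended_before_night_end or crosses_midnight:
        return sleep_end.date() - datetime.timedelta(days=1)
    return sleep_end.date()

# A sleep period that begins before 7 AM (row 3 in the table above):
early_morning = night_start_date(
    datetime.datetime(2024, 1, 16, 5, 48), 
    datetime.datetime(2024, 1, 16, 8, 44))
# A regular daytime nap (row 4 in the table above):
daytime_nap = night_start_date(
    datetime.datetime(2024, 1, 16, 9, 53), 
    datetime.datetime(2024, 1, 16, 12, 6))
# A day-night-day period that starts at 6:26 PM and ends at 8:43 AM:
day_night_day = night_start_date(
    datetime.datetime(2024, 7, 14, 18, 26), 
    datetime.datetime(2024, 7, 15, 8, 43))
print(early_morning, daytime_nap, day_night_day)
```

The first two results agree with the Night Start Date values for rows 3 and 4 in the table above; the third shows why the differing-dates condition is needed, since neither of that period's hours falls before night_end_hour.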


In [8]:
df_sleep_data['Start Period'] = np.where(
    (df_sleep_data['Start Hour'] >= night_start_hour) |
    (df_sleep_data['Start Hour'] < night_end_hour), 'Nighttime', 'Daytime')
# An example of this logic in use: if night_start_hour is 19 (7 PM) and
# night_end_hour is 7 (7 AM), all start hours from 19 to 23 and from 0 to 6
# will receive a 'Nighttime' start period categorization.

# Performing similar analyses for the end of each sleep entry:
df_sleep_data['End Period'] = np.where(
    (df_sleep_data['End Hour'] >= night_start_hour) |
    (df_sleep_data['End Hour'] < night_end_hour), 'Nighttime', 'Daytime')

# An upcoming section of code will divide the 'Hours' totals for certain
# rows with particularly long sleep durations into separate daytime and 
# nighttime components. Therefore, the following line creates a copy of
# these rows' original durations for use in later analyses.
df_sleep_data['Original Hours'] = df_sleep_data['Hours'].copy()

# We'll also keep a record of the baby's age (in days, weeks, and months)
# at the time of each of these sleep durations. (We'll create a different
# set of age values after making some later updates to the table,
# but these values will be useful as index values for a pivot table
# that stores the longest sleep durations logged each day.)

df_sleep_data['Original Days Old'] = (pd.to_datetime(
    df_sleep_data['Night Start Date']) - dob).dt.days
df_sleep_data['Original Weeks Old'] = (
    df_sleep_data['Original Days Old'] / 7).astype('int')
df_sleep_data['Original Months Old'] = (
    df_sleep_data['Original Days Old'] / 30).astype('int')
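
As a quick check on the age calculations, the following sketch works through a single Night Start Date using the sample date of birth. Note that dividing by 30 only approximates the number of calendar months.

```python
import pandas as pd

dob = pd.to_datetime('2024-01-15')
night_start_date = pd.to_datetime('2024-07-13')

days_old = (night_start_date - dob).days
# Truncating (rather than rounding) so that each value reflects
# the number of *completed* weeks and 30-day months:
weeks_old = int(days_old / 7)
months_old = int(days_old / 30)
print(days_old, weeks_old, months_old)
```

This prints `180 25 6`, the same age values that appear for Night Start Date 2024-07-13 in later copies of the table.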



## Accounting for special circumstances:

This code will show the nighttime and daytime sleep durations associated with each entry. However, in order to provide accurate calculations, the code will need to account for a few special situations:

1. Cases in which sleep periods began during the nighttime and ended during the daytime (these are probably quite common)
2. Cases in which a baby fell asleep during the daytime on one day; continued sleeping through the night; and then woke up during the daytime on the following day (if this is your baby, **congratulations** on having such a great sleeper!)
3. Cases in which a baby fell asleep during the nighttime on one day, slept through the entire following day, and then woke up during the following nighttime period (these are likely very rare, but I wanted the code to account for them also)

The following blocks of code update the table so that these special cases can be processed correctly. These updates generally consist of splitting a single entry into two different entries. 

Why are these updates required? There are a few reasons:

1. A later section of the code will use each row's start and end period values in order to determine how to allocate its hours into daytime and nighttime components. For rows with 'Daytime' start *and* end period values, all hours will be assigned to the daytime category; similarly, rows with 'Nighttime' start and end period values will have all of their hours added to the nighttime category. This would result in incorrect values for rows with a day-night-day or night-day-night pattern.

2. If an entry begins at night but then ends during the day, the daytime periods would originally be categorized as belonging to the previous date's daytime period. Therefore, in order to show accurate daytime sleep values for each Night Start Date (the variable used by this script to calculate babies' ages), it's necessary to split these rows into their respective nighttime and daytime components, then assign a new Night Start Date to the row with the daytime component.

### Handling entries that begin during one daytime period, then end during the *next* day's daytime period (i.e. day-night-day entries)

The code will process these cases by creating two copies of the data. The first copy will include the initial daytime and nighttime components whereas the second will include only the second daytime component. The code will then update df_sleep_data by replacing the original version of these rows with the new copies.

In [9]:
# Creating a row that stores both the first daytime component and the full
# nighttime component of these periods.
# The benefit of keeping these two components together is that it will allow
# for more accurate nighttime sleep start time calculations: in these cases,
# the nighttime sleep will actually have begun during the daytime period.
# If the data had instead been split into 1st daytime and nighttime + 2nd 
# daytime sections, it would have been more difficult to explain when 
# each night sleep period actually began.

df_day_night_day_pt_1 = df_sleep_data.query(
    "`Start Period` == 'Daytime' & `End Period` == 'Daytime' \
& `Start Date` != `End Date`").copy()

# Filtering out the second daytime component by setting the sleep end variables
# equal to the end of the nighttime period:
df_day_night_day_pt_1['Sleep End'] = df_day_night_day_pt_1['Night End Time']
df_day_night_day_pt_1['End Hour'] = df_day_night_day_pt_1['Sleep End'].dt.hour
df_day_night_day_pt_1['End Period'] = 'Nighttime' # Note: the time reflected
# in Night End Time (which we're using as the end time in this DataFrame)
# would normally be considered 'Daytime', but I'm setting it as 
# 'Nighttime' so that these rows won't be misinterpreted as 
# regular daytime-to-daytime periods, which would interfere with our
# upcoming daytime- and nighttime-specific sleep duration calculations.

# Updating the 'Hours' value to use our new Sleep End setting:
df_day_night_day_pt_1['Hours'] = (df_day_night_day_pt_1['Sleep End'] 
- df_day_night_day_pt_1['Sleep Start']).dt.total_seconds() / 3600
df_day_night_day_pt_1

Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,Night End Time,Start Period,End Period,Original Hours,Original Days Old,Original Weeks Old,Original Months Old


In [10]:
# Creating a row that stores the second daytime component
# of these periods:
df_day_night_day_pt_2 = df_sleep_data.query(
    "`Start Period` == 'Daytime' & `End Period` == 'Daytime' \
& `Start Date` != `End Date`").copy()

# Filtering out the periods already covered within df_day_night_day_pt_1:
df_day_night_day_pt_2['Sleep Start'] = df_day_night_day_pt_2['Night End Time']
df_day_night_day_pt_2['Start Hour'] = df_day_night_day_pt_2[
'Sleep Start'].dt.hour
df_day_night_day_pt_2['Start Date'] = df_day_night_day_pt_2[
'Night End Date']
# Advancing various date-related columns by one day to adjust for the fact
# that this modified sleep period corresponds to a later day than
# the values stored within df_day_night_day_pt_1:
for column in ['Night Start Date', 'Night End Date', 
               'Night Start Time', 'Night End Time']:
    df_day_night_day_pt_2[column] += pd.to_timedelta(1, unit = 'days')
df_day_night_day_pt_2['Hours'] = (
    df_day_night_day_pt_2['Sleep End'] 
    - df_day_night_day_pt_2['Sleep Start']).dt.total_seconds() / 3600

df_day_night_day_pt_2

Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,Night End Time,Start Period,End Period,Original Hours,Original Days Old,Original Weeks Old,Original Months Old


Replacing the original copies of these rows with the new copies:

(This can be done by adding df_day_night_day_pt_1 and df_day_night_day_pt_2 to a copy of df_sleep_data that doesn't include the rows on which those DataFrames are based.)


In [11]:
# The ~ in the following query statement negates the condition that follows
# it, so only those rows that *do not* match the criteria contained within 
# the parentheses will be selected.
df_sleep_data = pd.concat(
    [df_sleep_data.query(
    "~(`Start Period` == 'Daytime' & `End Period` == 'Daytime' \
& `Start Date` != `End Date`)").copy(), 
     df_day_night_day_pt_1, df_day_night_day_pt_2]
).reset_index(drop=True).sort_values('Sleep Start')
df_sleep_data

Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,Night End Time,Start Period,End Period,Original Hours,Original Days Old,Original Weeks Old,Original Months Old
0,2024-01-15 16:31:00,2024-01-15 17:43:00,1.200000,16,17,2024-01-15,2024-01-15,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00,Daytime,Daytime,1.200000,0,0,0
1,2024-01-15 18:46:00,2024-01-15 21:01:00,2.250000,18,21,2024-01-15,2024-01-15,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00,Daytime,Nighttime,2.250000,0,0,0
2,2024-01-16 01:00:00,2024-01-16 01:49:00,0.816667,1,1,2024-01-16,2024-01-16,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00,Nighttime,Nighttime,0.816667,0,0,0
3,2024-01-16 05:48:00,2024-01-16 08:44:00,2.933333,5,8,2024-01-16,2024-01-16,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00,Nighttime,Daytime,2.933333,0,0,0
4,2024-01-16 09:53:00,2024-01-16 12:06:00,2.216667,9,12,2024-01-16,2024-01-16,2024-01-16,2024-01-17,2024-01-16 19:00:00,2024-01-17 07:00:00,Daytime,Daytime,2.216667,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
956,2024-07-13 19:09:00,2024-07-14 01:52:00,6.716667,19,1,2024-07-13,2024-07-14,2024-07-13,2024-07-14,2024-07-13 19:00:00,2024-07-14 07:00:00,Nighttime,Nighttime,6.716667,180,25,6
957,2024-07-14 04:18:00,2024-07-14 06:42:00,2.400000,4,6,2024-07-14,2024-07-14,2024-07-13,2024-07-14,2024-07-13 19:00:00,2024-07-14 07:00:00,Nighttime,Nighttime,2.400000,180,25,6
958,2024-07-14 08:38:00,2024-07-14 10:55:00,2.283333,8,10,2024-07-14,2024-07-14,2024-07-14,2024-07-15,2024-07-14 19:00:00,2024-07-15 07:00:00,Daytime,Daytime,2.283333,181,25,6
959,2024-07-14 13:13:00,2024-07-14 15:27:00,2.233333,13,15,2024-07-14,2024-07-14,2024-07-14,2024-07-15,2024-07-14 19:00:00,2024-07-15 07:00:00,Daytime,Daytime,2.233333,181,25,6
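
Because the sample dataset contains no day-night-day entries, the tables above are empty. The following sketch works through the split for the day-night-day test entry described earlier (6:26 PM to 8:43 AM) using standalone timestamps rather than df_sleep_data, and it assumes the default night_end_hour of 7.

```python
import pandas as pd

night_end_hour = 7
sleep_start = pd.Timestamp('2024-07-14 18:26')
sleep_end = pd.Timestamp('2024-07-15 08:43')

# For this entry, Night End Time falls on the same date as Sleep End:
night_end_time = (
    sleep_end.normalize() + pd.Timedelta(hours=night_end_hour))

# Part 1: first daytime component plus the full nighttime component
part_1_hours = (night_end_time - sleep_start).total_seconds() / 3600
# Part 2: second daytime component only
part_2_hours = (sleep_end - night_end_time).total_seconds() / 3600

print(round(part_1_hours, 4), round(part_2_hours, 4))
```

The first part lasts 12 hours and 34 minutes (about 12.57 hours) and the second lasts 1 hour and 43 minutes (about 1.72 hours); together they add up to the original duration of 14 hours and 17 minutes.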


### Handling night-day-night sleep entries:

We'll perform a similar process for rows whose sleep periods begin during the nighttime, continue through the entire following daytime period, and then finish during the next nighttime period. (I figure this would be pretty unusual, but plenty about babies' sleep patterns can strike adults as unusual!)

In [12]:
# Creating rows that store the first nighttime component of these periods:

df_night_day_night_pt_1 = df_sleep_data.query(
    "`Start Period` == 'Nighttime' & `End Period` == 'Nighttime' \
& `Sleep End` > `Night End Time`").copy()
df_night_day_night_pt_1['Sleep End'] = df_night_day_night_pt_1['Night End Time']
# Note: this end time is considered part of the daytime period; however,
# these rows' End Period values will be kept as 'Nighttime' so that they 
# won't have any of their date values changed by a later cell that processes 
# nighttime-to-daytime data.
df_night_day_night_pt_1['End Hour'] = df_night_day_night_pt_1[
'Sleep End'].dt.hour
df_night_day_night_pt_1['Hours'] = (df_night_day_night_pt_1['Sleep End'] 
- df_night_day_night_pt_1['Sleep Start']).dt.total_seconds() / 3600
df_night_day_night_pt_1

Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,Night End Time,Start Period,End Period,Original Hours,Original Days Old,Original Weeks Old,Original Months Old


In [13]:
# Creating rows that store both the full daytime component and the 
# second nighttime component:
df_night_day_night_pt_2 = df_sleep_data.query(
    "`Start Period` == 'Nighttime' & `End Period` == 'Nighttime' \
& `Sleep End` > `Night End Time`").copy()
df_night_day_night_pt_2['Sleep Start'] = df_night_day_night_pt_2[
'Night End Time']
df_night_day_night_pt_2['Start Date'] = df_night_day_night_pt_2[
'Sleep Start'].dt.date
df_night_day_night_pt_2['Start Hour'] = df_night_day_night_pt_2[
'Sleep Start'].dt.hour
# Now that we've used Night End Time to calculate our start time, we can advance
# this and other date-related columns by a single day:
for column in ['Night Start Date', 'Night End Date', 
               'Night Start Time', 'Night End Time']:
    df_night_day_night_pt_2[column] += pd.to_timedelta(1, unit = 'days')
df_night_day_night_pt_2['Start Period'] = 'Daytime' 
df_night_day_night_pt_2['Hours'] = (df_night_day_night_pt_2['Sleep End'] 
- df_night_day_night_pt_2['Sleep Start']).dt.total_seconds() / 3600
df_night_day_night_pt_2

Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,Night End Time,Start Period,End Period,Original Hours,Original Days Old,Original Weeks Old,Original Months Old


Adding these new DataFrames into a copy of df_sleep_data that doesn't include their original underlying data:

In [14]:
df_sleep_data = pd.concat(
    [df_sleep_data.query(
    "~(`Start Period` == 'Nighttime' & `End Period` == 'Nighttime' \
& `Sleep End` > `Night End Time`)").copy(),
     df_night_day_night_pt_1, df_night_day_night_pt_2]
).reset_index(drop=True).sort_values(
    'Sleep Start')
df_sleep_data

Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,Night End Time,Start Period,End Period,Original Hours,Original Days Old,Original Weeks Old,Original Months Old
0,2024-01-15 16:31:00,2024-01-15 17:43:00,1.200000,16,17,2024-01-15,2024-01-15,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00,Daytime,Daytime,1.200000,0,0,0
1,2024-01-15 18:46:00,2024-01-15 21:01:00,2.250000,18,21,2024-01-15,2024-01-15,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00,Daytime,Nighttime,2.250000,0,0,0
2,2024-01-16 01:00:00,2024-01-16 01:49:00,0.816667,1,1,2024-01-16,2024-01-16,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00,Nighttime,Nighttime,0.816667,0,0,0
3,2024-01-16 05:48:00,2024-01-16 08:44:00,2.933333,5,8,2024-01-16,2024-01-16,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00,Nighttime,Daytime,2.933333,0,0,0
4,2024-01-16 09:53:00,2024-01-16 12:06:00,2.216667,9,12,2024-01-16,2024-01-16,2024-01-16,2024-01-17,2024-01-16 19:00:00,2024-01-17 07:00:00,Daytime,Daytime,2.216667,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
956,2024-07-13 19:09:00,2024-07-14 01:52:00,6.716667,19,1,2024-07-13,2024-07-14,2024-07-13,2024-07-14,2024-07-13 19:00:00,2024-07-14 07:00:00,Nighttime,Nighttime,6.716667,180,25,6
957,2024-07-14 04:18:00,2024-07-14 06:42:00,2.400000,4,6,2024-07-14,2024-07-14,2024-07-13,2024-07-14,2024-07-13 19:00:00,2024-07-14 07:00:00,Nighttime,Nighttime,2.400000,180,25,6
958,2024-07-14 08:38:00,2024-07-14 10:55:00,2.283333,8,10,2024-07-14,2024-07-14,2024-07-14,2024-07-15,2024-07-14 19:00:00,2024-07-15 07:00:00,Daytime,Daytime,2.283333,181,25,6
959,2024-07-14 13:13:00,2024-07-14 15:27:00,2.233333,13,15,2024-07-14,2024-07-14,2024-07-14,2024-07-15,2024-07-14 19:00:00,2024-07-15 07:00:00,Daytime,Daytime,2.233333,181,25,6
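
The sample dataset has no night-day-night entries either, so here is a small sketch of how the `Sleep End` > `Night End Time` condition separates such an entry from a regular nighttime entry. The DataFrame below is a simplified stand-in for df_sleep_data that contains only the columns needed for the comparison, and the 'First Night Hours' and 'Day And Second Night Hours' column names are made up for this example.

```python
import pandas as pd

# Row 0 is a regular nighttime entry; row 1 is the night-day-night
# test entry described earlier (7:35 PM to 8:24 PM the following day).
df_example = pd.DataFrame({
    'Sleep Start': pd.to_datetime(['2024-07-13 19:09', '2024-07-15 19:35']),
    'Sleep End': pd.to_datetime(['2024-07-14 01:52', '2024-07-16 20:24']),
    'Night End Time': pd.to_datetime(
        ['2024-07-14 07:00', '2024-07-16 07:00'])})

# Only the night-day-night entry extends past its Night End Time:
df_long = df_example.query("`Sleep End` > `Night End Time`").copy()

df_long['First Night Hours'] = (
    df_long['Night End Time'] 
    - df_long['Sleep Start']).dt.total_seconds() / 3600
df_long['Day And Second Night Hours'] = (
    df_long['Sleep End'] 
    - df_long['Night End Time']).dt.total_seconds() / 3600
print(df_long[['First Night Hours', 'Day And Second Night Hours']])
```

The first component covers 11 hours and 25 minutes, and the second covers 13 hours and 24 minutes (13.4 hours).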


### Handling entries that begin during the night and end during the day:

Splitting these rows into nighttime- and daytime-only components will allow the daytime values to be associated with the following day's Night Start Date, thus preventing our daytime sleep totals for a given date from potentially including values from two different daytime periods.

In [15]:
# Calculating the nighttime component of these dates (which will extend from
# the original start time to Night End Time):
df_night_to_day_nighttime = df_sleep_data.query(
    "`Start Period` == 'Nighttime' & `End Period` == 'Daytime'").copy()
df_night_to_day_nighttime['Sleep End'] = df_night_to_day_nighttime[
'Night End Time']
df_night_to_day_nighttime['End Hour'] = df_night_to_day_nighttime[
'Sleep End'].dt.hour
df_night_to_day_nighttime['Hours'] = (
    df_night_to_day_nighttime['Sleep End'] - df_night_to_day_nighttime[
    'Sleep Start']).dt.total_seconds() / 3600

df_night_to_day_nighttime


Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,Night End Time,Start Period,End Period,Original Hours,Original Days Old,Original Weeks Old,Original Months Old
3,2024-01-16 05:48:00,2024-01-16 07:00:00,1.200000,5,7,2024-01-16,2024-01-16,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00,Nighttime,Daytime,2.933333,0,0,0
19,2024-01-19 05:00:00,2024-01-19 07:00:00,2.000000,5,7,2024-01-19,2024-01-19,2024-01-18,2024-01-19,2024-01-18 19:00:00,2024-01-19 07:00:00,Nighttime,Daytime,2.500000,3,0,0
52,2024-01-25 05:14:00,2024-01-25 07:00:00,1.766667,5,7,2024-01-25,2024-01-25,2024-01-24,2024-01-25,2024-01-24 19:00:00,2024-01-25 07:00:00,Nighttime,Daytime,2.900000,9,1,0
64,2024-01-27 04:30:00,2024-01-27 07:00:00,2.500000,4,7,2024-01-27,2024-01-27,2024-01-26,2024-01-27,2024-01-26 19:00:00,2024-01-27 07:00:00,Nighttime,Daytime,2.716667,11,1,0
99,2024-02-02 04:39:00,2024-02-02 07:00:00,2.350000,4,7,2024-02-02,2024-02-02,2024-02-01,2024-02-02,2024-02-01 19:00:00,2024-02-02 07:00:00,Nighttime,Daytime,2.600000,17,2,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
937,2024-07-08 20:00:00,2024-07-09 07:00:00,11.000000,20,7,2024-07-08,2024-07-09,2024-07-08,2024-07-09,2024-07-08 19:00:00,2024-07-09 07:00:00,Nighttime,Daytime,11.816667,175,25,5
941,2024-07-09 22:29:00,2024-07-10 07:00:00,8.516667,22,7,2024-07-09,2024-07-10,2024-07-09,2024-07-10,2024-07-09 19:00:00,2024-07-10 07:00:00,Nighttime,Daytime,11.966667,176,25,5
945,2024-07-11 05:23:00,2024-07-11 07:00:00,1.616667,5,7,2024-07-11,2024-07-11,2024-07-10,2024-07-11,2024-07-10 19:00:00,2024-07-11 07:00:00,Nighttime,Daytime,2.966667,177,25,5
949,2024-07-11 20:52:00,2024-07-12 07:00:00,10.133333,20,7,2024-07-11,2024-07-12,2024-07-11,2024-07-12,2024-07-11 19:00:00,2024-07-12 07:00:00,Nighttime,Daytime,11.983333,178,25,5


In [16]:
# Calculating the daytime component (which will begin at Night End Time
# and then continue until the original end time):
df_night_to_day_daytime = df_sleep_data.query(
    "`Start Period` == 'Nighttime' & `End Period` == 'Daytime'").copy()
df_night_to_day_daytime['Start Period'] = 'Daytime'
df_night_to_day_daytime['Sleep Start'] = \
df_night_to_day_daytime['Night End Time']
df_night_to_day_daytime['Start Hour'] = \
df_night_to_day_daytime['Sleep Start'].dt.hour
df_night_to_day_daytime['Start Date'] = \
df_night_to_day_daytime['Night End Date']
# Advancing various date-related values by one day:
for column in [
    'Night Start Date', 'Night End Date', 
    'Night Start Time', 'Night End Time']:
    df_night_to_day_daytime[column] += pd.to_timedelta(1, unit = 'days')
df_night_to_day_daytime['Hours'] = (
    df_night_to_day_daytime['Sleep End'] - 
    df_night_to_day_daytime['Sleep Start']).dt.total_seconds() / 3600


df_night_to_day_daytime


Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,Night End Time,Start Period,End Period,Original Hours,Original Days Old,Original Weeks Old,Original Months Old
3,2024-01-16 07:00:00,2024-01-16 08:44:00,1.733333,7,8,2024-01-16,2024-01-16,2024-01-16,2024-01-17,2024-01-16 19:00:00,2024-01-17 07:00:00,Daytime,Daytime,2.933333,0,0,0
19,2024-01-19 07:00:00,2024-01-19 07:30:00,0.500000,7,7,2024-01-19,2024-01-19,2024-01-19,2024-01-20,2024-01-19 19:00:00,2024-01-20 07:00:00,Daytime,Daytime,2.500000,3,0,0
52,2024-01-25 07:00:00,2024-01-25 08:08:00,1.133333,7,8,2024-01-25,2024-01-25,2024-01-25,2024-01-26,2024-01-25 19:00:00,2024-01-26 07:00:00,Daytime,Daytime,2.900000,9,1,0
64,2024-01-27 07:00:00,2024-01-27 07:13:00,0.216667,7,7,2024-01-27,2024-01-27,2024-01-27,2024-01-28,2024-01-27 19:00:00,2024-01-28 07:00:00,Daytime,Daytime,2.716667,11,1,0
99,2024-02-02 07:00:00,2024-02-02 07:15:00,0.250000,7,7,2024-02-02,2024-02-02,2024-02-02,2024-02-03,2024-02-02 19:00:00,2024-02-03 07:00:00,Daytime,Daytime,2.600000,17,2,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
937,2024-07-09 07:00:00,2024-07-09 07:49:00,0.816667,7,7,2024-07-09,2024-07-09,2024-07-09,2024-07-10,2024-07-09 19:00:00,2024-07-10 07:00:00,Daytime,Daytime,11.816667,175,25,5
941,2024-07-10 07:00:00,2024-07-10 10:27:00,3.450000,7,10,2024-07-10,2024-07-10,2024-07-10,2024-07-11,2024-07-10 19:00:00,2024-07-11 07:00:00,Daytime,Daytime,11.966667,176,25,5
945,2024-07-11 07:00:00,2024-07-11 08:21:00,1.350000,7,8,2024-07-11,2024-07-11,2024-07-11,2024-07-12,2024-07-11 19:00:00,2024-07-12 07:00:00,Daytime,Daytime,2.966667,177,25,5
949,2024-07-12 07:00:00,2024-07-12 08:51:00,1.850000,7,8,2024-07-12,2024-07-12,2024-07-12,2024-07-13,2024-07-12 19:00:00,2024-07-13 07:00:00,Daytime,Daytime,11.983333,178,25,5


Adding these new rows back to a copy of the original dataset that excludes all night-to-day values:

In [17]:
df_sleep_data = pd.concat(
    [df_sleep_data.query("~(`Start Period` == 'Nighttime' \
& `End Period` == 'Daytime')").copy(), 
     df_night_to_day_nighttime, 
     df_night_to_day_daytime]).sort_values('Sleep Start')
# If a sleep period ended right at Night End Time, the above script would have 
# split its row into a nighttime component and a 0-minute daytime 
# component for the next day. The following line addresses this issue by 
# removing any rows with an Hours value of 0.
df_sleep_data = df_sleep_data.query("Hours != 0").copy(
).reset_index(drop=True)

df_sleep_data

Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,Night End Time,Start Period,End Period,Original Hours,Original Days Old,Original Weeks Old,Original Months Old
0,2024-01-15 16:31:00,2024-01-15 17:43:00,1.200000,16,17,2024-01-15,2024-01-15,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00,Daytime,Daytime,1.200000,0,0,0
1,2024-01-15 18:46:00,2024-01-15 21:01:00,2.250000,18,21,2024-01-15,2024-01-15,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00,Daytime,Nighttime,2.250000,0,0,0
2,2024-01-16 01:00:00,2024-01-16 01:49:00,0.816667,1,1,2024-01-16,2024-01-16,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00,Nighttime,Nighttime,0.816667,0,0,0
3,2024-01-16 05:48:00,2024-01-16 07:00:00,1.200000,5,7,2024-01-16,2024-01-16,2024-01-15,2024-01-16,2024-01-15 19:00:00,2024-01-16 07:00:00,Nighttime,Daytime,2.933333,0,0,0
4,2024-01-16 07:00:00,2024-01-16 08:44:00,1.733333,7,8,2024-01-16,2024-01-16,2024-01-16,2024-01-17,2024-01-16 19:00:00,2024-01-17 07:00:00,Daytime,Daytime,2.933333,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1074,2024-07-13 19:09:00,2024-07-14 01:52:00,6.716667,19,1,2024-07-13,2024-07-14,2024-07-13,2024-07-14,2024-07-13 19:00:00,2024-07-14 07:00:00,Nighttime,Nighttime,6.716667,180,25,6
1075,2024-07-14 04:18:00,2024-07-14 06:42:00,2.400000,4,6,2024-07-14,2024-07-14,2024-07-13,2024-07-14,2024-07-13 19:00:00,2024-07-14 07:00:00,Nighttime,Nighttime,2.400000,180,25,6
1076,2024-07-14 08:38:00,2024-07-14 10:55:00,2.283333,8,10,2024-07-14,2024-07-14,2024-07-14,2024-07-15,2024-07-14 19:00:00,2024-07-15 07:00:00,Daytime,Daytime,2.283333,181,25,6
1077,2024-07-14 13:13:00,2024-07-14 15:27:00,2.233333,13,15,2024-07-14,2024-07-14,2024-07-14,2024-07-15,2024-07-14 19:00:00,2024-07-15 07:00:00,Daytime,Daytime,2.233333,181,25,6


Now that we've finished creating extra rows of data, we can filter the dataset to exclude any rows that fall beyond the specified maximum Night Start Date.

In [18]:
if max_night_start_date_limit == True:
    df_sleep_data.query(
        "`Night Start Date` <= @max_night_start_date", 
        inplace = True)

df_sleep_data.tail(10)

Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,Night End Time,Start Period,End Period,Original Hours,Original Days Old,Original Weeks Old,Original Months Old
1066,2024-07-12 07:00:00,2024-07-12 08:51:00,1.85,7,8,2024-07-12,2024-07-12,2024-07-12,2024-07-13,2024-07-12 19:00:00,2024-07-13 07:00:00,Daytime,Daytime,11.983333,178,25,5
1067,2024-07-12 10:21:00,2024-07-12 12:31:00,2.166667,10,12,2024-07-12,2024-07-12,2024-07-12,2024-07-13,2024-07-12 19:00:00,2024-07-13 07:00:00,Daytime,Daytime,2.166667,179,25,5
1068,2024-07-12 14:22:00,2024-07-12 16:13:00,1.85,14,16,2024-07-12,2024-07-12,2024-07-12,2024-07-13,2024-07-12 19:00:00,2024-07-13 07:00:00,Daytime,Daytime,1.85,179,25,5
1069,2024-07-12 17:52:00,2024-07-12 19:31:00,1.65,17,19,2024-07-12,2024-07-12,2024-07-12,2024-07-13,2024-07-12 19:00:00,2024-07-13 07:00:00,Daytime,Nighttime,1.65,179,25,5
1070,2024-07-12 21:26:00,2024-07-13 07:00:00,9.566667,21,7,2024-07-12,2024-07-13,2024-07-12,2024-07-13,2024-07-12 19:00:00,2024-07-13 07:00:00,Nighttime,Daytime,11.916667,179,25,5
1071,2024-07-13 07:00:00,2024-07-13 09:21:00,2.35,7,9,2024-07-13,2024-07-13,2024-07-13,2024-07-14,2024-07-13 19:00:00,2024-07-14 07:00:00,Daytime,Daytime,11.916667,179,25,5
1072,2024-07-13 11:23:00,2024-07-13 13:44:00,2.35,11,13,2024-07-13,2024-07-13,2024-07-13,2024-07-14,2024-07-13 19:00:00,2024-07-14 07:00:00,Daytime,Daytime,2.35,180,25,6
1073,2024-07-13 16:10:00,2024-07-13 18:12:00,2.033333,16,18,2024-07-13,2024-07-13,2024-07-13,2024-07-14,2024-07-13 19:00:00,2024-07-14 07:00:00,Daytime,Daytime,2.033333,180,25,6
1074,2024-07-13 19:09:00,2024-07-14 01:52:00,6.716667,19,1,2024-07-13,2024-07-14,2024-07-13,2024-07-14,2024-07-13 19:00:00,2024-07-14 07:00:00,Nighttime,Nighttime,6.716667,180,25,6
1075,2024-07-14 04:18:00,2024-07-14 06:42:00,2.4,4,6,2024-07-14,2024-07-14,2024-07-13,2024-07-14,2024-07-13 19:00:00,2024-07-14 07:00:00,Nighttime,Nighttime,2.4,180,25,6


### Adding age information to the dataset:

In [19]:
# Note that the Days Old, Weeks Old, and Months Old columns
# are based on Night Start Date rather than the original Start Date column
# so that these values will always be the same for a given 
# Night Start Date value.
df_sleep_data['Days Old'] = (pd.to_datetime(
    df_sleep_data['Night Start Date']) - dob).dt.days
df_sleep_data['Weeks Old'] = (
    df_sleep_data['Days Old'] / 7).astype('int')
df_sleep_data['Months Old'] = (
    df_sleep_data['Days Old'] / 30).astype('int')
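As a rough illustration of the arithmetic above, the following standalone sketch computes the three age values for a single date using only the standard library. The date of birth is an assumption inferred from the sample output (where a Night Start Date of 2024-01-15 corresponds to 0 days old), and the function name is hypothetical. Note that 'months' here are 30-day blocks rather than calendar months.

```python
from datetime import date

def age_columns(night_start_date, dob):
    """Return (days_old, weeks_old, months_old) for one date.

    Mirrors the notebook's approach: weeks are 7-day blocks and months
    are 30-day blocks, both truncated to whole numbers.
    """
    days_old = (night_start_date - dob).days
    weeks_old = int(days_old / 7)    # same truncation as .astype('int')
    months_old = int(days_old / 30)
    return days_old, weeks_old, months_old

# Assumed date of birth (inferred from the sample data, not a real value)
example_dob = date(2024, 1, 15)
print(age_columns(date(2024, 7, 13), example_dob))  # (180, 25, 6)
```

These values line up with the Days Old, Weeks Old, and Months Old entries shown for 2024-07-13 in the sample tables.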


### Determining the number of daytime and nighttime hours represented by each entry

Depending on when a sleep period began and ended, it might contain only daytime hours, only nighttime hours, or a mix of both. We can calculate the number of hours belonging to each period using NumPy's select() function. We'll calculate daytime and nighttime sleep durations differently for each of the four possible start/end period combinations (i.e. day-to-day, night-to-night, day-to-night, and night-to-day).

**Simple calculations**:
day-to-day: 0 nighttime hours; daytime hours will equal the Hours column
night-to-night: 0 daytime hours; nighttime hours will equal the Hours column

(The earlier section of this script that split day-night-day and night-day-night entries into separate rows ensured that these calculations would still be valid for those sleep entries. If we hadn't split those rows, day-night-day and night-day-night entries would be perceived by the code as containing *only* daytime or nighttime sleep, respectively.)

**Somewhat more complex calculations**:
day-to-night: daytime hours will equal the duration between Sleep Start and Night Start Time; nighttime hours will equal the duration between Night Start Time and Sleep End
night-to-day: nighttime hours will equal the duration between Sleep Start and Night End Time; daytime hours will equal the duration between Night End Time and Sleep End
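The four cases above can be sketched for a single sleep period in plain Python. This is only an illustration of the arithmetic, not the notebook's vectorized approach; the function name is hypothetical, and it assumes that the night runs from Night Start Time (inclusive) to Night End Time (exclusive) and that the period crosses at most one day/night boundary.

```python
from datetime import datetime

def split_sleep_hours(sleep_start, sleep_end, night_start, night_end):
    """Return (day_hours, night_hours) for one sleep period.

    night_start and night_end bound the night that this row is assigned
    to (the Night Start Time and Night End Time columns).
    """
    def hours(delta):
        return delta.total_seconds() / 3600

    starts_at_night = night_start <= sleep_start < night_end
    ends_at_night = night_start <= sleep_end < night_end

    if not starts_at_night and not ends_at_night:   # day-to-day
        return hours(sleep_end - sleep_start), 0.0
    if starts_at_night and ends_at_night:           # night-to-night
        return 0.0, hours(sleep_end - sleep_start)
    if not starts_at_night and ends_at_night:       # day-to-night
        return (hours(night_start - sleep_start),
                hours(sleep_end - night_start))
    # night-to-day
    return hours(sleep_end - night_end), hours(night_end - sleep_start)

# A nap that starts at 17:52 and ends at 19:31, with night starting at 19:00
day, night = split_sleep_hours(
    datetime(2024, 7, 12, 17, 52), datetime(2024, 7, 12, 19, 31),
    datetime(2024, 7, 12, 19, 0), datetime(2024, 7, 13, 7, 0))
print(round(day, 6), round(night, 6))  # 1.133333 0.516667
```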

In [20]:
# Calculating daytime sleep durations:

condlist = [
    (df_sleep_data['Start Period'] == 'Daytime') 
    & (df_sleep_data['End Period'] == 'Daytime'),
    (df_sleep_data['Start Period'] == 'Nighttime') 
    & (df_sleep_data['End Period'] == 'Nighttime'),
    (df_sleep_data['Start Period'] == 'Daytime') 
    & (df_sleep_data['End Period'] == 'Nighttime'),
    (df_sleep_data['Start Period'] == 'Nighttime') 
    & (df_sleep_data['End Period'] == 'Daytime')
]

choicelist = [
    df_sleep_data['Hours'],
    0,
    (df_sleep_data['Night Start Time'] 
     - df_sleep_data['Sleep Start']).dt.total_seconds() / 3600,
    (df_sleep_data['Sleep End'] 
     - df_sleep_data['Night End Time']).dt.total_seconds() / 3600    
]

df_sleep_data['Day Sleep Hours'] = np.select(condlist, choicelist, np.nan)


# Calculating nighttime sleep durations:

condlist = [
    (df_sleep_data['Start Period'] == 'Daytime') 
    & (df_sleep_data['End Period'] == 'Daytime'),
    (df_sleep_data['Start Period'] == 'Nighttime') 
    & (df_sleep_data['End Period'] == 'Nighttime'),
    (df_sleep_data['Start Period'] == 'Daytime') 
    & (df_sleep_data['End Period'] == 'Nighttime'),
    (df_sleep_data['Start Period'] == 'Nighttime') 
    & (df_sleep_data['End Period'] == 'Daytime')
]

choicelist = [
    0,
    df_sleep_data['Hours'],
    (df_sleep_data['Sleep End'] 
     - df_sleep_data['Night Start Time']).dt.total_seconds() / 3600,
    (df_sleep_data['Night End Time'] 
     - df_sleep_data['Sleep Start']).dt.total_seconds() / 3600
   
]


df_sleep_data['Night Sleep Hours'] = np.select(condlist, choicelist, np.nan)

df_sleep_data.tail(10)

Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,...,End Period,Original Hours,Original Days Old,Original Weeks Old,Original Months Old,Days Old,Weeks Old,Months Old,Day Sleep Hours,Night Sleep Hours
1066,2024-07-12 07:00:00,2024-07-12 08:51:00,1.85,7,8,2024-07-12,2024-07-12,2024-07-12,2024-07-13,2024-07-12 19:00:00,...,Daytime,11.983333,178,25,5,179,25,5,1.85,0.0
1067,2024-07-12 10:21:00,2024-07-12 12:31:00,2.166667,10,12,2024-07-12,2024-07-12,2024-07-12,2024-07-13,2024-07-12 19:00:00,...,Daytime,2.166667,179,25,5,179,25,5,2.166667,0.0
1068,2024-07-12 14:22:00,2024-07-12 16:13:00,1.85,14,16,2024-07-12,2024-07-12,2024-07-12,2024-07-13,2024-07-12 19:00:00,...,Daytime,1.85,179,25,5,179,25,5,1.85,0.0
1069,2024-07-12 17:52:00,2024-07-12 19:31:00,1.65,17,19,2024-07-12,2024-07-12,2024-07-12,2024-07-13,2024-07-12 19:00:00,...,Nighttime,1.65,179,25,5,179,25,5,1.133333,0.516667
1070,2024-07-12 21:26:00,2024-07-13 07:00:00,9.566667,21,7,2024-07-12,2024-07-13,2024-07-12,2024-07-13,2024-07-12 19:00:00,...,Daytime,11.916667,179,25,5,179,25,5,0.0,9.566667
1071,2024-07-13 07:00:00,2024-07-13 09:21:00,2.35,7,9,2024-07-13,2024-07-13,2024-07-13,2024-07-14,2024-07-13 19:00:00,...,Daytime,11.916667,179,25,5,180,25,6,2.35,0.0
1072,2024-07-13 11:23:00,2024-07-13 13:44:00,2.35,11,13,2024-07-13,2024-07-13,2024-07-13,2024-07-14,2024-07-13 19:00:00,...,Daytime,2.35,180,25,6,180,25,6,2.35,0.0
1073,2024-07-13 16:10:00,2024-07-13 18:12:00,2.033333,16,18,2024-07-13,2024-07-13,2024-07-13,2024-07-14,2024-07-13 19:00:00,...,Daytime,2.033333,180,25,6,180,25,6,2.033333,0.0
1074,2024-07-13 19:09:00,2024-07-14 01:52:00,6.716667,19,1,2024-07-13,2024-07-14,2024-07-13,2024-07-14,2024-07-13 19:00:00,...,Nighttime,6.716667,180,25,6,180,25,6,0.0,6.716667
1075,2024-07-14 04:18:00,2024-07-14 06:42:00,2.4,4,6,2024-07-14,2024-07-14,2024-07-13,2024-07-14,2024-07-13 19:00:00,...,Nighttime,2.4,180,25,6,180,25,6,0.0,2.4


Confirming that the code correctly divided each sleep period into its respective daytime and nighttime components:

In [21]:
df_sleep_data.query(
    "abs(`Day Sleep Hours` + `Night Sleep Hours` - Hours) > 0.001") # This
    # query() call checks whether the sum of any row's day sleep and night 
    # sleep durations differs from its total sleep duration. An empty result 
    # means that every row was divided correctly.
    # (Note that > 0.001 is used rather than != 0 because, due to the 
    # limitations of floating-point arithmetic, certain Day + Night sums 
    # may differ *very* slightly from the corresponding Hours totals.)

Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,...,End Period,Original Hours,Original Days Old,Original Weeks Old,Original Months Old,Days Old,Weeks Old,Months Old,Day Sleep Hours,Night Sleep Hours
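The reason for comparing against a small tolerance rather than testing for exact equality can be seen in a short standalone sketch. The helper name below is hypothetical; the 0.001 tolerance matches the value used in the query above.

```python
import math

# A classic case: neither 0.1 nor 0.2 has an exact binary representation.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True

def components_match(day_hours, night_hours, total_hours, tolerance=0.001):
    """Return True if the day and night parts add up to the total,
    allowing for tiny floating-point differences."""
    return abs(day_hours + night_hours - total_hours) <= tolerance

# 68 minutes of day sleep plus 31 minutes of night sleep, 99 minutes total
print(components_match(68 / 60, 31 / 60, 99 / 60))  # True
```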


## Calculating how early into the night a given sleep period began:

(If a sleep period began during the daytime, this value will be negative.)

In [22]:
df_sleep_data['Hours After Night Start Time'] = (
    df_sleep_data['Sleep Start'] 
    - df_sleep_data['Night Start Time']).dt.total_seconds() / 3600

# Identifying the earliest and latest values:
df_sleep_data.sort_values(['Hours After Night Start Time'])

Unnamed: 0,Sleep Start,Sleep End,Hours,Start Hour,End Hour,Start Date,End Date,Night Start Date,Night End Date,Night Start Time,...,Original Hours,Original Days Old,Original Weeks Old,Original Months Old,Days Old,Weeks Old,Months Old,Day Sleep Hours,Night Sleep Hours,Hours After Night Start Time
894,2024-06-08 07:00:00,2024-06-08 07:21:00,0.350000,7,7,2024-06-08,2024-06-08,2024-06-08,2024-06-09,2024-06-08 19:00:00,...,11.816667,144,20,4,145,20,4,0.350000,0.000000,-12.000000
729,2024-05-08 07:00:00,2024-05-08 09:04:00,2.066667,7,9,2024-05-08,2024-05-08,2024-05-08,2024-05-09,2024-05-08 19:00:00,...,2.666667,113,16,3,114,16,3,2.066667,0.000000,-12.000000
1002,2024-06-29 07:00:00,2024-06-29 07:11:00,0.183333,7,7,2024-06-29,2024-06-29,2024-06-29,2024-06-30,2024-06-29 19:00:00,...,9.683333,165,23,5,166,23,5,0.183333,0.000000,-12.000000
952,2024-06-19 07:00:00,2024-06-19 08:00:00,1.000000,7,8,2024-06-19,2024-06-19,2024-06-19,2024-06-20,2024-06-19 19:00:00,...,11.900000,155,22,5,156,22,5,1.000000,0.000000,-12.000000
808,2024-05-23 07:00:00,2024-05-23 08:17:00,1.283333,7,8,2024-05-23,2024-05-23,2024-05-23,2024-05-24,2024-05-23 19:00:00,...,9.466667,128,18,4,129,18,4,1.283333,0.000000,-12.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
691,2024-05-01 06:53:00,2024-05-01 07:00:00,0.116667,6,7,2024-05-01,2024-05-01,2024-04-30,2024-05-01,2024-04-30 19:00:00,...,2.466667,106,15,3,106,15,3,0.000000,0.116667,11.883333
303,2024-03-02 06:56:00,2024-03-02 07:00:00,0.066667,6,7,2024-03-02,2024-03-02,2024-03-01,2024-03-02,2024-03-01 19:00:00,...,2.983333,46,6,1,46,6,1,0.000000,0.066667,11.933333
661,2024-04-26 06:57:00,2024-04-26 07:00:00,0.050000,6,7,2024-04-26,2024-04-26,2024-04-25,2024-04-26,2024-04-25 19:00:00,...,2.783333,101,14,3,101,14,3,0.000000,0.050000,11.950000
206,2024-02-18 06:59:00,2024-02-18 07:00:00,0.016667,6,7,2024-02-18,2024-02-18,2024-02-17,2024-02-18,2024-02-17 19:00:00,...,2.766667,33,4,1,33,4,1,0.000000,0.016667,11.983333


# Part 2: Analyzing and Visualizing Data

### Determining each day's longest sleep period:

We'll use the 'Original Hours' values for these calculations rather than the 'Hours' values, because the latter were divided across multiple rows for all day-night-day and night-day-night sleep periods. In addition, the 'Original' days, weeks, and months columns will be used as the index, since the rows created by a split share the same 'Original Hours' total but can have different 'Days Old', 'Weeks Old', and 'Months Old' values.
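A small standalone sketch may help show why this choice matters. The rows below are made up for illustration: they represent one 12-hour overnight sleep period that was split into a nighttime half and a daytime half, plus one shorter nap.

```python
# Each tuple is (original_days_old, hours, original_hours). The last two
# rows are the two halves of one overnight sleep period that was split at
# Night End Time. (Values are illustrative, not taken from real data.)
rows = [
    (10, 2.0, 2.0),
    (10, 9.5, 12.0),   # nighttime half of a 12-hour period
    (10, 2.5, 12.0),   # daytime half of the same period
]

longest_by_split_hours = {}
longest_by_original_hours = {}
for days_old, hours, original_hours in rows:
    longest_by_split_hours[days_old] = max(
        longest_by_split_hours.get(days_old, 0), hours)
    longest_by_original_hours[days_old] = max(
        longest_by_original_hours.get(days_old, 0), original_hours)

print(longest_by_split_hours)     # {10: 9.5}
print(longest_by_original_hours)  # {10: 12.0}
```

Taking the maximum of the split values would understate the longest sleep period as 9.5 hours, whereas the original values recover the full 12 hours.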

In [23]:
df_longest_sleep_periods_by_day = df_sleep_data.pivot_table(
    index = ['Original Months Old', 'Original Weeks Old', 'Original Days Old'], 
    values = 'Original Hours', aggfunc = 'max').reset_index()
# Simplifying column names by removing their 'Original ' component:
df_longest_sleep_periods_by_day.columns = [
    column.replace('Original ', '') 
    for column in df_longest_sleep_periods_by_day.columns]

df_longest_sleep_periods_by_day['Count'] = 1 # This value will be helpful
# for calculating weekly averages
df_longest_sleep_periods_by_day.to_csv(
    f'{data_output_folder}longest_sleep_periods_by_day.csv', index = False)
df_longest_sleep_periods_by_day.columns

Index(['Months Old', 'Weeks Old', 'Days Old', 'Hours', 'Count'], dtype='object')

In [24]:
df_longest_sleep_periods_by_day.sort_values('Hours', ascending = False)

Unnamed: 0,Months Old,Weeks Old,Days Old,Hours,Count
178,5,25,178,11.983333,1
153,5,21,153,11.983333,1
176,5,25,176,11.966667,1
152,5,21,152,11.950000,1
160,5,22,160,11.950000,1
...,...,...,...,...,...
3,0,0,3,2.666667,1
16,0,2,16,2.650000,1
18,0,2,18,2.650000,1
15,0,2,15,2.516667,1


### Determining each week's average longest sleep period:

In [25]:
def add_rounded_values(
    df, source_column = 'Hours', 
    rounded_column_name = 'Rounded_Hours',
    tenths_cutoff = 20, integer_cutoff = 30):
    
    '''This function creates a rounded set of values within a DataFrame
    that can be incorporated into charts' data labels. 
    
    df: the DataFrame to which to add the rounded values.
    source_column: the original values within the DataFrame to round.
    rounded_column_name: the name to assign this rounded column.
    
    In order to prevent these labels from making the resulting chart too 
    crowded, the function rounds these values to the nearest tenth if the 
    table has tenths_cutoff rows or fewer, and to the nearest integer if it 
    has more than tenths_cutoff rows but no more than integer_cutoff rows. 
    If the row count exceeds integer_cutoff, the rounded values will all 
    be NaN so that no labels will appear within the chart. 
    
    (An alternative approach would be not to create
    the column with rounded values to begin with if the row count exceeds
    integer_cutoff, but this would cause the chart creation code to crash 
    unless it was instructed not to try to use this column.)'''
    
    if len(df) <= tenths_cutoff:
        round_value = 1
    else:
        round_value = 0
    
    if len(df) <= integer_cutoff:
        df[rounded_column_name] = df[source_column].round(round_value)
    
    else:
        df[rounded_column_name] = np.nan
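
The cutoff logic in this function can be restated as a small standalone helper, which may make the three outcomes easier to see. The helper name is hypothetical and the sketch leaves out the DataFrame handling entirely; it only returns the number of decimal places that would be used.

```python
def choose_label_precision(row_count, tenths_cutoff=20, integer_cutoff=30):
    """Return the number of decimal places to show in data labels,
    or None if labels should be hidden entirely."""
    if row_count > integer_cutoff:
        return None          # too many points: no labels
    if row_count <= tenths_cutoff:
        return 1             # few points: nearest tenth
    return 0                 # in between: nearest integer

for n in (7, 26, 181):
    print(n, choose_label_precision(n))
# 7 1
# 26 0
# 181 None
```

With the default cutoffs, a 7-row monthly table would get labels rounded to tenths, a 26-row weekly table would get integer labels, and a 181-row daily table would get none.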

In [26]:
df_average_longest_sleep_period_by_week = \
df_longest_sleep_periods_by_day.pivot_table(
    index = 'Weeks Old', values = ['Hours', 'Count'], 
    aggfunc = 'sum').reset_index()

# Dividing 'Hours' sums by 'Count' sums in order to determine the *average*
# longest sleep period logged each week:
df_average_longest_sleep_period_by_week['Hours'] /= \
df_average_longest_sleep_period_by_week['Count']

# Creating a rounded set of values that can be incorporated into data labels:

add_rounded_values(
    df_average_longest_sleep_period_by_week)
    
df_average_longest_sleep_period_by_week.to_csv(
    f'{data_output_folder}average_longest_sleep_period_by_week.csv', 
    index = False)
df_average_longest_sleep_period_by_week

Unnamed: 0,Weeks Old,Count,Hours,Rounded_Hours
0,0,7,2.871429,3.0
1,1,7,2.814286,3.0
2,2,7,2.711905,3.0
3,3,7,2.87381,3.0
4,4,7,3.209524,3.0
5,5,7,3.166667,3.0
6,6,7,3.602381,4.0
7,7,7,3.852381,4.0
8,8,7,4.9,5.0
9,9,7,4.780952,5.0


### Calculating the average longest sleep period achieved each month:


In [27]:
df_average_longest_sleep_period_by_month = \
df_longest_sleep_periods_by_day.pivot_table(
    index = 'Months Old', values = ['Hours', 'Count'], 
    aggfunc = 'sum').reset_index()
df_average_longest_sleep_period_by_month['Hours'] /= \
df_average_longest_sleep_period_by_month['Count']

add_rounded_values(df_average_longest_sleep_period_by_month)

df_average_longest_sleep_period_by_month.to_csv(
    f'{data_output_folder}average_longest_sleep_period_by_month.csv', 
    index = False)
df_average_longest_sleep_period_by_month

Unnamed: 0,Months Old,Count,Hours,Rounded_Hours
0,0,30,2.818889,2.8
1,1,30,3.556667,3.6
2,2,30,5.39,5.4
3,3,30,8.523333,8.5
4,4,30,9.696111,9.7
5,5,30,10.536667,10.5
6,6,1,6.716667,6.7


## Charting these longest sleep periods via Plotly, then saving the output to .png and .html files:

In [28]:
px_longest_sleep_periods_by_day = px.line(
    df_longest_sleep_periods_by_day, x = 'Days Old', y = 'Hours',
    title = 'Longest Sleep Periods by Age (in Days)')

px_longest_sleep_periods_by_day.write_html(
    f'{visualizations_folder}longest_sleep_periods_by_day.html')
px_longest_sleep_periods_by_day.write_image(
    f'{visualizations_folder}longest_sleep_periods_by_day.png', scale = 4)
px_longest_sleep_periods_by_day


In [29]:
# This scatter plot includes a best fit line; hovering over the line
# allows you to view its regression coefficients and R^2 values.
px_longest_sleep_periods_by_day_scatter = px.scatter(
    df_longest_sleep_periods_by_day, x = 'Days Old', y = 'Hours', 
    trendline = 'ols',
    title = 'Longest Sleep Periods by Age (in Days)')

px_longest_sleep_periods_by_day_scatter.write_html(
    f'{visualizations_folder}longest_sleep_periods_by_day_scatter.html')
px_longest_sleep_periods_by_day_scatter.write_image(
    f'{visualizations_folder}longest_sleep_periods_by_day_scatter.png', 
    scale = 4)
px_longest_sleep_periods_by_day_scatter


In [30]:
px_longest_sleep_periods_by_week = px.line(
    df_average_longest_sleep_period_by_week, 
    x = 'Weeks Old', y = 'Hours', text = 'Rounded_Hours',
    title = 'Longest Sleep Periods by Age (in Weeks)')
# The following line keeps each 'Weeks Old' value distinct.
px_longest_sleep_periods_by_week.update_xaxes(type = 'category')
px_longest_sleep_periods_by_week.update_traces(textposition = 'top center')
px_longest_sleep_periods_by_week.write_html(
    f'{visualizations_folder}longest_sleep_periods_by_week.html')
px_longest_sleep_periods_by_week.write_image(
    f'{visualizations_folder}longest_sleep_periods_by_week.png', scale = 4)
px_longest_sleep_periods_by_week


In [31]:
px_longest_sleep_periods_by_month = px.line(
    df_average_longest_sleep_period_by_month, x = 'Months Old', 
    y = 'Hours', text = 'Rounded_Hours',
    title = 'Longest Sleep Periods by Age (in Months)')
# The following line keeps each 'Months Old' value distinct.
px_longest_sleep_periods_by_month.update_xaxes(type = 'category')
px_longest_sleep_periods_by_month.update_traces(
    textposition = 'top center')
px_longest_sleep_periods_by_month.write_html(
    f'{visualizations_folder}longest_sleep_periods_by_month.html')
px_longest_sleep_periods_by_month.write_image(
    f'{visualizations_folder}longest_sleep_periods_by_month.png', scale = 5)
px_longest_sleep_periods_by_month


## Creating a regression equation in order to assess the relationship between age and longest sleep periods:

(This code assumes that the relationship between age and longest sleep length is linear, but that relationship can't continue indefinitely. A regression_age_limit variable therefore restricts the regression to rows whose Days Old value falls below that limit. The default limit is 120 days, but feel free to raise or lower it to better match your own baby's data.)

In [33]:
regression_age_limit = 120

# Filtering the table once, then drawing both variables from the result:
df_regression = df_longest_sleep_periods_by_day.query(
    "`Days Old` < @regression_age_limit").copy()
y = df_regression['Hours']
x = sm.add_constant(df_regression['Days Old'])
model = sm.OLS(y, x)
results = model.fit()
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                  Hours   R-squared:                       0.652
Model:                            OLS   Adj. R-squared:                  0.649
Method:                 Least Squares   F-statistic:                     221.2
Date:                Tue, 14 May 2024   Prob (F-statistic):           7.89e-29
Time:                        21:46:49   Log-Likelihood:                -221.17
No. Observations:                 120   AIC:                             446.3
Df Residuals:                     118   BIC:                             451.9
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.4776      0.280      5.284      0.0

The regression model estimates the duration of the baby's longest sleep periods (in hours) as the sum of a constant value ('const') and the product of the Days Old coefficient and the child's age (in days). For instance, if const is 2 and the Days Old coefficient is 0.05, the predicted longest sleep length for the baby on day 100 would be (2 + 100 * 0.05) --> (2 + 5) --> 7 hours.
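The worked example above can be written as a one-line function. This is a minimal sketch with a hypothetical function name; the const and coefficient values are the illustrative ones from the paragraph above, not fitted results.

```python
def predict_longest_sleep(days_old, const, days_old_coef):
    """Predicted longest sleep period (in hours) under a linear model."""
    return const + days_old_coef * days_old

# Illustrative values from the worked example, not fitted results
prediction = predict_longest_sleep(100, const=2, days_old_coef=0.05)
print(round(prediction, 2))
```

To use the fitted model instead, the two inputs should be available from the statsmodels results object as results.params['const'] and results.params['Days Old'], since the parameters are labeled with the column names passed to the model.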

## Calculating the amount of daytime vs. nighttime sleep that the baby received each day:


In [34]:
df_daily_sleep_by_period = df_sleep_data.pivot_table(
    index = ['Months Old', 'Weeks Old', 'Days Old'], 
    values = ['Day Sleep Hours', 
        'Night Sleep Hours'], aggfunc = 'sum').reset_index()
df_daily_sleep_by_period['Count'] = 1
df_daily_sleep_by_period.to_csv(
    f'{data_output_folder}daily_sleep_by_period.csv', 
    index = False)

df_daily_sleep_by_period


Unnamed: 0,Months Old,Weeks Old,Days Old,Day Sleep Hours,Night Sleep Hours,Count
0,0,0,0,1.433333,4.033333,1
1,0,0,1,8.250000,3.783333,1
2,0,0,2,7.300000,3.566667,1
3,0,0,3,6.850000,4.650000,1
4,0,0,4,7.883333,4.216667,1
...,...,...,...,...,...,...
176,5,25,176,5.683333,10.283333,1
177,5,25,177,7.566667,8.816667,1
178,5,25,178,7.466667,10.250000,1
179,5,25,179,7.000000,10.083333,1


### Plotting these daily sleep distributions:

In [35]:
fig_sleep_distribution = px.line(
    df_daily_sleep_by_period, x = 'Days Old', 
    y = ['Night Sleep Hours', 'Day Sleep Hours'], 
    title = 'Daily Day and Night Sleep Totals by Age (in Days)')
# The default y axis name for line charts with multiple y variables
# is 'value,' but this can be updated to 'Hours' using the following line.
fig_sleep_distribution.update_layout(yaxis_title = 'Hours', 
                                     legend_title = 'Metric')
fig_sleep_distribution.write_html(
    f'{visualizations_folder}daily_total_sleep_by_period.html')
fig_sleep_distribution.write_image(
    f'{visualizations_folder}daily_total_sleep_by_period.png', scale = 4)
fig_sleep_distribution

## Calculating how much sleep the baby received each day, then plotting the results:

In [36]:
df_daily_sleep = df_sleep_data.pivot_table(
    index = ['Weeks Old', 'Days Old'], 
    values = 'Hours', 
    aggfunc = 'sum').reset_index()
df_daily_sleep['Count'] = 1 # Will be useful for calculating weekly averages
df_daily_sleep.to_csv(
    f'{data_output_folder}daily_sleep.csv', index = False)

df_daily_sleep

Unnamed: 0,Weeks Old,Days Old,Hours,Count
0,0,0,5.466667,1
1,0,1,12.033333,1
2,0,2,10.866667,1
3,0,3,11.500000,1
4,0,4,12.100000,1
...,...,...,...,...
176,25,176,15.966667,1
177,25,177,16.383333,1
178,25,178,17.716667,1
179,25,179,17.083333,1


In [37]:
fig_daily_sleep = px.line(
    df_daily_sleep, x = 'Days Old', y = 'Hours',
    title = 'Daily Sleep Totals by Age (in Days)')
fig_daily_sleep.write_html(
    f'{visualizations_folder}daily_total_sleep.html')
fig_daily_sleep.write_image(
    f'{visualizations_folder}daily_total_sleep.png', scale = 4)
fig_daily_sleep

### Calculating and plotting the average amount of day and night sleep received each week:

In [38]:
df_weekly_sleep_by_period = df_daily_sleep_by_period.pivot_table(
    index = 'Weeks Old', 
    values = ['Day Sleep Hours', 
              'Night Sleep Hours', 'Count'], 
    aggfunc = 'sum').reset_index()
for column in ['Day Sleep Hours', 'Night Sleep Hours']:
    df_weekly_sleep_by_period[f"Avg {column}"] = \
        df_weekly_sleep_by_period[column] / df_weekly_sleep_by_period['Count']
df_weekly_sleep_by_period.to_csv(
    f'{data_output_folder}weekly_sleep_by_period.csv', 
    index = False)
df_weekly_sleep_by_period

Unnamed: 0,Weeks Old,Count,Day Sleep Hours,Night Sleep Hours,Avg Day Sleep Hours,Avg Night Sleep Hours
0,0,7,45.483333,28.183333,6.497619,4.02619
1,1,7,50.516667,35.2,7.216667,5.028571
2,2,7,54.983333,35.983333,7.854762,5.140476
3,3,7,52.216667,43.916667,7.459524,6.27381
4,4,7,57.1,48.133333,8.157143,6.87619
5,5,7,53.25,57.116667,7.607143,8.159524
6,6,7,55.933333,57.75,7.990476,8.25
7,7,7,52.5,60.85,7.5,8.692857
8,8,7,53.35,60.7,7.621429,8.671429
9,9,7,53.816667,61.5,7.688095,8.785714


In [39]:
fig_weekly_sleep_distribution = px.line(
    df_weekly_sleep_by_period, x = 'Weeks Old', 
    y = ['Avg Night Sleep Hours', 'Avg Day Sleep Hours'],
    title = 'Average Daily Day and Night Sleep Totals by Age (in Weeks)')
fig_weekly_sleep_distribution.update_layout(yaxis_title = 'Hours',
                                           legend_title = 'Metric')
fig_weekly_sleep_distribution.write_html(
    f'{visualizations_folder}weekly_total_sleep_by_period.html')
fig_weekly_sleep_distribution.write_image(
    f'{visualizations_folder}weekly_total_sleep_by_period.png', scale = 4)
fig_weekly_sleep_distribution

### Calculating and plotting the average daily total amount of sleep received each week:

In [40]:
df_weekly_sleep = df_daily_sleep.pivot_table(
    index = 'Weeks Old', 
    values = ['Hours', 'Count'], 
    aggfunc = 'sum').reset_index()
df_weekly_sleep['Avg Per Day'] = \
    df_weekly_sleep['Hours'] / df_weekly_sleep['Count']


add_rounded_values(df_weekly_sleep, source_column = 'Avg Per Day', 
    rounded_column_name = 'Rounded Avg Per Day')


df_weekly_sleep.to_csv(
    f'{data_output_folder}weekly_total_sleep.csv', 
    index = False)
df_weekly_sleep

Unnamed: 0,Weeks Old,Count,Hours,Avg Per Day,Rounded Avg Per Day
0,0,7,73.666667,10.52381,11.0
1,1,7,85.716667,12.245238,12.0
2,2,7,90.966667,12.995238,13.0
3,3,7,96.133333,13.733333,14.0
4,4,7,105.233333,15.033333,15.0
5,5,7,110.366667,15.766667,16.0
6,6,7,113.683333,16.240476,16.0
7,7,7,113.35,16.192857,16.0
8,8,7,114.05,16.292857,16.0
9,9,7,115.316667,16.47381,16.0


In [41]:
fig_weekly_sleep = px.line(
    df_weekly_sleep, x = 'Weeks Old', y = 'Avg Per Day',
    title = 'Average Daily Sleep Totals by Age (in Weeks)',
    text = 'Rounded Avg Per Day')
fig_weekly_sleep.update_traces(textposition = 'top center')
fig_weekly_sleep.write_html(
    f'{visualizations_folder}weekly_total_sleep.html')
fig_weekly_sleep.write_image(
    f'{visualizations_folder}weekly_total_sleep.png', scale = 4)
fig_weekly_sleep

## Analyzing trends in sleep onset:

The DataFrame created in the following cell determines, for each day, the *earliest* sleep period that included at least one hour of nighttime sleep. This data can help determine whether a baby is getting to sleep earlier in the evening over time.
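The selection rule described above can be sketched in plain Python as a filter followed by a per-day minimum. The function name and the sample values are hypothetical; the one-hour threshold matches the description above.

```python
# Each tuple is (days_old, hours_after_night_start, night_sleep_hours).
# Values are illustrative only.
sleep_periods = [
    (5, -0.5, 0.25),   # evening nap that barely crosses into the night
    (5, 1.5, 4.0),     # first substantial stretch of nighttime sleep
    (5, 7.0, 3.0),
    (6, 0.2, 5.5),
]

def earliest_night_onset(periods, min_night_hours=1):
    """For each day, return the smallest hours-after-night-start value among
    periods containing at least min_night_hours of nighttime sleep."""
    onset_by_day = {}
    for days_old, hours_after_start, night_hours in periods:
        if night_hours < min_night_hours:
            continue  # skip periods with too little nighttime sleep
        if (days_old not in onset_by_day
                or hours_after_start < onset_by_day[days_old]):
            onset_by_day[days_old] = hours_after_start
    return onset_by_day

print(earliest_night_onset(sleep_periods))  # {5: 1.5, 6: 0.2}
```

The short evening nap on day 5 is ignored because it contains less than one hour of nighttime sleep, so that day's onset value is 1.5 rather than -0.5.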

In [42]:
df_hours_after_onset = df_sleep_data.query(
    "`Night Sleep Hours` >= 1").pivot_table(
    index = ['Weeks Old', 'Days Old'], 
    values = 'Hours After Night Start Time', 
    aggfunc = 'min').reset_index()

df_hours_after_onset.to_csv(
    f'{data_output_folder}hours_after_onset.csv', 
    index = False)

df_hours_after_onset

Unnamed: 0,Weeks Old,Days Old,Hours After Night Start Time
0,0,0,-0.233333
1,0,1,7.700000
2,0,2,7.683333
3,0,3,-0.033333
4,0,4,4.150000
...,...,...,...
176,25,176,-0.316667
177,25,177,0.916667
178,25,178,1.866667
179,25,179,2.433333


In [43]:
fig_hours_after_onset = px.line(
    df_hours_after_onset, x = 'Days Old', 
    y = 'Hours After Night Start Time',
    title = 'Hours Into the Night That Sleep Began by Age (in Days)')

fig_hours_after_onset.write_html(
    f'{visualizations_folder}hours_after_onset.html')
fig_hours_after_onset.write_image(
    f'{visualizations_folder}hours_after_onset.png', scale = 4)
fig_hours_after_onset

## Calculating and visualizing weekly averages of these nighttime sleep onset data:

In [44]:
df_weekly_avg_hours_after_onset = df_hours_after_onset.pivot_table(
    index = 'Weeks Old', 
    values = 'Hours After Night Start Time', 
    aggfunc = 'mean').reset_index()

add_rounded_values(df_weekly_avg_hours_after_onset, 
    source_column = 'Hours After Night Start Time', 
    rounded_column_name = 'Rounded Hours After Night Start Time')

df_weekly_avg_hours_after_onset.to_csv(
    f'{data_output_folder}weekly_avg_hours_after_onset.csv', 
    index = False)

df_weekly_avg_hours_after_onset

Unnamed: 0,Weeks Old,Hours After Night Start Time,Rounded Hours After Night Start Time
0,0,3.583333,4.0
1,1,1.611905,2.0
2,2,2.519048,3.0
3,3,1.830952,2.0
4,4,0.878571,1.0
5,5,0.247619,0.0
6,6,0.540476,1.0
7,7,0.211905,0.0
8,8,-0.102381,-0.0
9,9,0.652381,1.0


In [45]:
fig_weekly_hours_after_onset = px.line(
    df_weekly_avg_hours_after_onset, 
    x = 'Weeks Old', y = 'Hours After Night Start Time',
    title = 'Average Hours Into the Night That Sleep Began by Age (in Weeks)',
    text = 'Rounded Hours After Night Start Time')
fig_weekly_hours_after_onset.update_traces(
    textposition = 'top center')
fig_weekly_hours_after_onset.write_html(
    f'{visualizations_folder}weekly_avg_hours_after_onset.html')
fig_weekly_hours_after_onset.write_image(
    f'{visualizations_folder}weekly_avg_hours_after_onset.png', scale = 4)
fig_weekly_hours_after_onset

### Saving updated sleep data table to a .csv file:

(Waiting until now to save it ensures that the .csv file will include all updates made to it throughout the script.)

In [46]:
df_sleep_data.to_csv(
    f'{data_output_folder}updated_sleep_data.csv', index = False)

In [47]:
script_end_time = time.time()
run_time = script_end_time - script_start_time
print(f"Finished running the script in {round(run_time, 3)} seconds.")

Finished running the script in 5.901 seconds.


And that's how you can analyze and visualize baby sleep data within Python! Feel free to modify this script for use within your own data analysis projects.