# NHANES 2021–2023: Sleep Disorder (SLQ_L) data

This notebook:
- Loads NHANES **SLQ_L** (sleep disorders) and **DEMO_L** (demographics)
- Calculates average daily sleep time, weighted across weekdays and weekends
- Calculates social jetlag
    - "the discrepancy between biological time, determined by our internal body clock, and social times, mainly dictated by social obligations such as school or work" (Caliandro et al., 2021, via NCBI)

## 1) Load Data

In [1]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

# read in raw dataset
sleep_original = pd.read_sas("../data/SLQ_L.xpt")

# make a copy to leave the original untouched
df_sleep = sleep_original.copy()

# SEQN   - Respondent sequence number
# SLQ300 - Usual sleep time on weekdays or workdays
# SLQ310 - Usual wake time on weekdays or workdays
# SLD012 - Sleep hours - weekdays or workdays
# SLQ320 - Usual sleep time on weekends
# SLQ330 - Usual wake time on weekends
# SLD013 - Sleep hours - weekends

## 2) Clean Data
- Replace "Refused" and "Don't know" responses with NaN and convert object-typed columns to strings

In [2]:
def clean_nhanes_sleep(df):
    sleep_cols = ['SLD012', 'SLD013', 'SLQ300', 'SLQ310', 'SLQ320', 'SLQ330']
    
    # change object types to strings (applies to all time columns)
    obj_cols = df.select_dtypes(include=['object']).columns
    for col in obj_cols:
        # force to string
        df[col] = df[col].astype(str)

        # strip literal b'' or b"" from values
        df[col] = df[col].str.replace(r"^b['\"]", '', regex=True).str.replace(r"['\"]$", '', regex=True)

    for col in sleep_cols:
        # 'Refused' (77777) and 'Don't Know' (99999) -> NaN
        df[col] = df[col].replace([77777, 99999, '77777', '99999', "b'77777'", "b'99999'"], np.nan)
    
    return df

# Apply the cleaning
df_sleep = clean_nhanes_sleep(df_sleep)

df_sleep.head()

Unnamed: 0,SEQN,SLQ300,SLQ310,SLD012,SLQ320,SLQ330,SLD013
0,130378.0,21:30,07:00,9.5,00:00,09:00,9.0
1,130379.0,21:00,06:00,9.0,21:00,06:00,9.0
2,130380.0,00:00,08:00,8.0,00:00,09:00,9.0
3,130384.0,21:30,05:00,7.5,23:00,07:00,8.0
4,130385.0,22:05,06:15,8.0,22:05,06:15,8.0
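
The cleaning step above casts byte values to strings and then strips the literal `b''` wrapper. A possible alternative, sketched here on a made-up toy frame (the values are illustrative, not NHANES data), is to decode the bytes directly with `Series.str.decode` and then mask the "Refused" / "Don't know" codes:

```python
import numpy as np
import pandas as pd

# Toy frame mimicking character columns that arrive as raw bytes
# (hypothetical values for illustration only).
toy = pd.DataFrame({
    "SLQ300": [b"21:30", b"77777", b"00:00"],
    "SLD012": [9.5, 7.0, np.nan],
})

# Decode bytes to str directly instead of stripping the b'' wrapper by regex.
toy["SLQ300"] = toy["SLQ300"].str.decode("utf-8")

# Mask the "Refused" (77777) and "Don't know" (99999) codes as missing.
missing_codes = ["77777", "99999"]
toy["SLQ300"] = toy["SLQ300"].mask(toy["SLQ300"].isin(missing_codes))

print(toy)
```

`pd.read_sas` also accepts an `encoding` argument, which may make the decode step unnecessary; whether it behaves as expected on these particular files is untested here.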


## 3) Calculate Weekly Sleep Average in hours
The average is computed only for rows that report total sleep hours for both weekdays and weekends; rows missing either value yield NaN.

In [3]:
def find_weekly_avg(df):
    # calculate the weighted weekly average (5 weekdays, 2 weekend days)
    # if either value is missing, NaN propagates --> no unreliable average is reported
    df['avg daily sleep'] = ((df['SLD012'] * 5) + (df['SLD013'] * 2)) / 7

    return df

df_sleep = find_weekly_avg(df_sleep)
df_sleep.head()

Unnamed: 0,SEQN,SLQ300,SLQ310,SLD012,SLQ320,SLQ330,SLD013,avg daily sleep
0,130378.0,21:30,07:00,9.5,00:00,09:00,9.0,9.357143
1,130379.0,21:00,06:00,9.0,21:00,06:00,9.0,9.0
2,130380.0,00:00,08:00,8.0,00:00,09:00,9.0,8.285714
3,130384.0,21:30,05:00,7.5,23:00,07:00,8.0,7.642857
4,130385.0,22:05,06:15,8.0,22:05,06:15,8.0,8.0
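
As a quick check of the weighting, the sketch below recomputes the average on a small toy frame (made-up values) and shows that a missing weekday or weekend total leaves the result as NaN rather than producing a partial average:

```python
import numpy as np
import pandas as pd

# Toy data: the last row is missing its weekday total.
toy = pd.DataFrame({
    "SLD012": [9.5, 7.5, np.nan],  # weekday sleep hours
    "SLD013": [9.0, 8.0, 8.0],     # weekend sleep hours
})

# Same weighting as find_weekly_avg: 5 weekdays and 2 weekend days.
toy["avg daily sleep"] = (toy["SLD012"] * 5 + toy["SLD013"] * 2) / 7

print(toy)
```

The first row works out to (9.5 * 5 + 9.0 * 2) / 7 = 65.5 / 7, matching the 9.357143 shown in the output above.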


## 4) Calculate Social Jetlag in hours
- Converts sleep and wake clock times to minutes after midnight to find the midpoint of sleep on weekdays vs. weekends
- Finds the "shift" in behavior on days when respondents aren't constrained by work/social pressure

In [4]:
def calculate_social_jetlag(df):    
    def to_minutes(time_str):
        try:
            if pd.isna(time_str) or time_str in ['nan', 'None']: 
                return np.nan
            # HH:MM format
            h, m = map(int, time_str.split(':'))
            return h * 60 + m
        except (ValueError, AttributeError, TypeError):
            return np.nan

    # convert all time columns to minutes from midnight
    cols = ['SLQ300', 'SLQ310', 'SLQ320', 'SLQ330']
    for col in cols:
        df[f'{col}_min'] = df[col].apply(to_minutes)

    # calculate midpoints
    # formula for a midpoint that may cross midnight: (bedtime + duration / 2) % 1440

    # compute sleep durations in minutes
    df['work_dur'] = (df['SLQ310_min'] - df['SLQ300_min']) % 1440
    df['free_dur'] = (df['SLQ330_min'] - df['SLQ320_min']) % 1440
    
    # find midpoints
    df['midpoint_work'] = (df['SLQ300_min'] + (df['work_dur'] / 2)) % 1440
    df['midpoint_free'] = (df['SLQ320_min'] + (df['free_dur'] / 2)) % 1440
    
    # Social Jetlag (absolute difference in hours)
    df['social_jetlag_hrs'] = abs(df['midpoint_free'] - df['midpoint_work']) / 60
    
    # if the shift is > 12 hours, flip by subtracting from 24
    df.loc[df['social_jetlag_hrs'] > 12, 'social_jetlag_hrs'] = 24 - df['social_jetlag_hrs']

    return df

df_sleep = calculate_social_jetlag(df_sleep)
df_sleep.head()

Unnamed: 0,SEQN,SLQ300,SLQ310,SLD012,SLQ320,SLQ330,SLD013,avg daily sleep,SLQ300_min,SLQ310_min,SLQ320_min,SLQ330_min,work_dur,free_dur,midpoint_work,midpoint_free,social_jetlag_hrs
0,130378.0,21:30,07:00,9.5,00:00,09:00,9.0,9.357143,1290.0,420.0,0.0,540.0,570.0,540.0,135.0,270.0,2.25
1,130379.0,21:00,06:00,9.0,21:00,06:00,9.0,9.0,1260.0,360.0,1260.0,360.0,540.0,540.0,90.0,90.0,0.0
2,130380.0,00:00,08:00,8.0,00:00,09:00,9.0,8.285714,0.0,480.0,0.0,540.0,480.0,540.0,240.0,270.0,0.5
3,130384.0,21:30,05:00,7.5,23:00,07:00,8.0,7.642857,1290.0,300.0,1380.0,420.0,450.0,480.0,75.0,180.0,1.75
4,130385.0,22:05,06:15,8.0,22:05,06:15,8.0,8.0,1325.0,375.0,1325.0,375.0,490.0,490.0,130.0,130.0,0.0
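
To make the midnight handling concrete, here is a scalar sketch of the same arithmetic for the first respondent (weekday 21:30 to 07:00, weekend 00:00 to 09:00). The helper names `sleep_midpoint` and `jetlag_hours` are introduced only for this illustration:

```python
MINUTES_PER_DAY = 1440

def sleep_midpoint(bed_min, wake_min):
    """Midpoint of sleep in minutes after midnight, allowing a midnight crossing."""
    duration = (wake_min - bed_min) % MINUTES_PER_DAY
    return (bed_min + duration / 2) % MINUTES_PER_DAY

def jetlag_hours(mid_work, mid_free):
    """Absolute midpoint difference in hours, taking the shorter way around the clock."""
    diff = abs(mid_free - mid_work) / 60
    return 24 - diff if diff > 12 else diff

mid_work = sleep_midpoint(21 * 60 + 30, 7 * 60)  # 21:30 -> 07:00
mid_free = sleep_midpoint(0, 9 * 60)             # 00:00 -> 09:00

print(mid_work)                          # 135.0 (02:15)
print(mid_free)                          # 270.0 (04:30)
print(jetlag_hours(mid_work, mid_free))  # 2.25

# Midpoints at 23:30 and 00:30 are one hour apart, not 23.
print(jetlag_hours(1410, 30))            # 1.0
```

The last call shows why the "flip" step matters: without it, two midpoints on either side of midnight would look almost a full day apart.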


## 5) Merge table with Demographics Data (Seqn, Age, Sex) from **DEMO_L**
- Perform an inner join of the two datasets on SEQN (unique identifier for each participant)
- Remove minors (under 18) from the data; the original participants are 16+
- Reorder and rename columns for readability

In [5]:
demographics = pd.read_sas("../data/DEMO_L.xpt")

# create new table with only respondent sequence number, age, and sex
demographics_age = demographics[['SEQN','RIDAGEYR', 'RIAGENDR']] # gender: Male = 1, Female = 2

# inner join demographics with sleep to label each response with age and sex
merged_df = pd.merge(df_sleep, demographics_age, on='SEQN', how='inner')

# remove any respondents under the age of 18
merged_df = merged_df[merged_df['RIDAGEYR'] >= 18]

# rename columns for readability
sleep_cleaned_df = merged_df.rename(columns={'SLQ300': 'weekday sleep time', 
                                             'SLQ310': 'weekday wake time', 
                                             'SLQ320': 'weekend sleep time', 
                                             'SLQ330': 'weekend wake time', 
                                             'RIDAGEYR':'age', 
                                             'RIAGENDR' : 'sex', 
                                             'SEQN' : 'seqn', 
                                             'SLD012' : 'weekday total',
                                             'SLD013' : 'weekend total',
                                             'social_jetlag_hrs' : 'social jetlag (hrs)'
                                            })

# reorder for clarity and exclude intermediate step columns (e.g. SLQ330_min used to calculate midpoints)
new_order = ["seqn", "age", "sex", 
             "weekday sleep time", "weekday wake time", "weekday total", 
             "weekend sleep time", "weekend wake time", "weekend total", 
             "avg daily sleep", "social jetlag (hrs)"]
sleep_cleaned_df = sleep_cleaned_df[new_order]

sleep_cleaned_df.head()

Unnamed: 0,seqn,age,sex,weekday sleep time,weekday wake time,weekday total,weekend sleep time,weekend wake time,weekend total,avg daily sleep,social jetlag (hrs)
0,130378.0,43.0,1.0,21:30,07:00,9.5,00:00,09:00,9.0,9.357143,2.25
1,130379.0,66.0,1.0,21:00,06:00,9.0,21:00,06:00,9.0,9.0,0.0
2,130380.0,44.0,2.0,00:00,08:00,8.0,00:00,09:00,9.0,8.285714,0.5
3,130384.0,43.0,1.0,21:30,05:00,7.5,23:00,07:00,8.0,7.642857,1.75
4,130385.0,65.0,2.0,22:05,06:15,8.0,22:05,06:15,8.0,8.0,0.0
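
An inner join silently drops respondents that appear in only one file. One way to see how many rows are affected, sketched on toy frames with made-up values, is to run an outer merge with `indicator=True` first and to assert one row per SEQN with `validate`:

```python
import pandas as pd

# Toy stand-ins for the sleep and demographics tables (hypothetical values).
sleep_toy = pd.DataFrame({"SEQN": [1.0, 2.0, 3.0], "SLD012": [8.0, 7.5, 9.0]})
demo_toy = pd.DataFrame({
    "SEQN": [1.0, 2.0, 4.0],
    "RIDAGEYR": [43.0, 16.0, 70.0],
    "RIAGENDR": [1.0, 2.0, 2.0],
})

# Outer merge with an indicator column; validate raises if SEQN repeats on either side.
check = pd.merge(sleep_toy, demo_toy, on="SEQN", how="outer",
                 validate="one_to_one", indicator=True)
counts = check["_merge"].value_counts()
print(counts)

# Keeping only matched rows reproduces the inner join; then drop minors.
merged = check[check["_merge"] == "both"].drop(columns="_merge")
adults = merged[merged["RIDAGEYR"] >= 18]
print(adults)
```

In this toy case one sleep record has no demographics match and one matched respondent is under 18, so a single row remains.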


## 6) Export final data table as CSV

In [6]:
export = "../data/slq_sleep_clean.csv"

sleep_cleaned_df.to_csv(export, index=False)
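
A small round-trip check can confirm that the export keeps the column names, leaves clock times as `HH:MM` strings, and writes missing values as empty cells. The sketch below uses an in-memory buffer and a toy frame (made-up values) rather than the real output file:

```python
import io

import numpy as np
import pandas as pd

# Toy frame with the same kinds of columns as the cleaned table.
toy = pd.DataFrame({
    "seqn": [130378.0, 130379.0],
    "weekday sleep time": ["21:30", np.nan],
    "weekday total": [9.5, 9.0],
})

# Write to an in-memory buffer instead of disk, then read it back.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(round_trip.dtypes)
print(round_trip)
```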

### References

[1] R. Caliandro, A. A. Streng, L. W. M. van Kerkhof, G. T. J. van der Horst, and I. Chaves, “Social Jetlag and Related Risks for Human Health: A Timely Review,” Nutrients, vol. 13, no. 12, p. 4543, Dec. 2021, doi: https://doi.org/10.3390/nu13124543.