# Sleep Health and Lifestyle Data Analysis

The goal of this project is to analyze survey data on sleep disorders, sleep health, and lifestyle, with a special focus on:

1. Average sleep quality and sleep duration by gender and sleep disorder
2. Average stress level by occupation
3. Lifestyle factors across BMI categories
4. Deploying the visualizations to a website

# Importing, Cleaning and Preparing Data

The first part of my notebook will:
1. make all of the imports needed for the notebook
2. load the CSV data into a pandas DataFrame
3. clean and prepare the data by replacing null values and dropping columns I will not use

In [87]:
# make imports
import pandas as pd
from bokeh.io import push_notebook
from bokeh.layouts import column, gridplot, row
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, output_file, output_notebook, show
from bokeh.transform import factor_cmap
from ipywidgets import interact, SelectMultiple
output_notebook() # call output_notebook so plots will show throughout the notebook

In [88]:
# load data into pandas dataframe
shl_data = pd.read_csv("Sleep_health_and_lifestyle_dataset.csv")
shl_data

,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
...,...,...,...,...,...,...,...,...,...,...,...,...,...
369,370,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
370,371,Female,59,Nurse,8.0,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
371,372,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
372,373,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea


In [89]:
# use info to check for null values and data types
shl_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374 entries, 0 to 373
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Person ID                374 non-null    int64  
 1   Gender                   374 non-null    object 
 2   Age                      374 non-null    int64  
 3   Occupation               374 non-null    object 
 4   Sleep Duration           374 non-null    float64
 5   Quality of Sleep         374 non-null    int64  
 6   Physical Activity Level  374 non-null    int64  
 7   Stress Level             374 non-null    int64  
 8   BMI Category             374 non-null    object 
 9   Blood Pressure           374 non-null    object 
 10  Heart Rate               374 non-null    int64  
 11  Daily Steps              374 non-null    int64  
 12  Sleep Disorder           155 non-null    object 
dtypes: float64(1), int64(7), object(5)
memory usage: 38.1+ KB


In [90]:
# only 'Sleep Disorder' has null values; keep those rows as their own group instead of dropping them
shl_data['Sleep Disorder'] = shl_data['Sleep Disorder'].fillna('No Sleep Disorder')
# age is a factor that could be considered, but I will not be using it for my purposes
shl_data = shl_data.drop(columns=['Person ID', 'Age'])
shl_data

,Gender,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,Male,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,No Sleep Disorder
1,Male,Doctor,6.2,6,60,8,Normal,125/80,75,10000,No Sleep Disorder
2,Male,Doctor,6.2,6,60,8,Normal,125/80,75,10000,No Sleep Disorder
3,Male,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,Male,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
...,...,...,...,...,...,...,...,...,...,...,...
369,Female,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
370,Female,Nurse,8.0,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
371,Female,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
372,Female,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
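
The null-handling step can be checked in isolation. The following is a minimal sketch, assuming a two-row stand-in built from rows 0 and 3 of the table above; the name `toy` is hypothetical and not part of the notebook.

```python
import pandas as pd

# Two rows copied from the displayed table (rows 0 and 3); `toy` is a hypothetical name.
toy = pd.DataFrame({
    "Gender": ["Male", "Male"],
    "Sleep Disorder": [None, "Sleep Apnea"],  # None stands in for the empty CSV cell
})

# Fill only the column that contains nulls, so other columns are never touched.
toy["Sleep Disorder"] = toy["Sleep Disorder"].fillna("No Sleep Disorder")

# Count how many rows fall into each group after filling.
counts = toy["Sleep Disorder"].value_counts()
print(counts)
```

Restricting `fillna` to one column means a null that later appears in a numeric column would not be silently replaced by a string.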


# Stacked Bar Chart

Now that the necessary imports have been made and the data has been cleaned and prepared, I will turn to my first goal. I will use two stacked bar charts, one for average sleep duration and one for average sleep quality, each broken down by gender and sleep disorder.

In [91]:
sleep_disorders = shl_data['Sleep Disorder'].unique() # get the unique groups under sleep disorders
print(sleep_disorders)

genders = shl_data['Gender'].unique() # get the unique genders under gender
print(genders)

['No Sleep Disorder' 'Sleep Apnea' 'Insomnia']
['Male' 'Female']


In [92]:
# create table format for sleep disorder and gender average values
average_sleep_duration = shl_data.groupby(['Sleep Disorder', 'Gender'])['Sleep Duration'].mean().unstack()

# wrap the table in a ColumnDataSource; reset_index() turns 'Sleep Disorder' into a regular column
source = ColumnDataSource(average_sleep_duration.reset_index())

# create figure with x range being the index of average sleep duration in list format
sleep_duration = figure(x_range=average_sleep_duration.index.tolist(), y_range=(0,17), title="Average Sleep Duration by Gender and Sleep Disorder", x_axis_label = "Sleep Disorder", y_axis_label = "Average Sleep Duration" ,height=350, width=800, tooltips ="$name average sleep duration: @$name")

# let figure be a stacked bar chart
sleep_duration.vbar_stack(['Female', 'Male'], x='Sleep Disorder', width=0.5, color=["blue", "green"], source=source, legend_label=['Female', 'Male'])

# specify legend location
sleep_duration.legend.location = "top_left"
sleep_duration.legend.orientation = "horizontal"
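
The `groupby(...).mean().unstack()` step above produces the wide table that `vbar_stack` reads. As a minimal sketch, assuming a six-row sample copied from the rows displayed earlier (rows 0, 1, 3, 4, 369 and 370, with nulls already filled); the names `toy` and `wide` are hypothetical.

```python
import pandas as pd

# Six rows copied from the displayed table; `toy` and `wide` are hypothetical names.
toy = pd.DataFrame({
    "Sleep Disorder": ["No Sleep Disorder", "No Sleep Disorder",
                       "Sleep Apnea", "Sleep Apnea", "Sleep Apnea", "Sleep Apnea"],
    "Gender": ["Male", "Male", "Male", "Male", "Female", "Female"],
    "Sleep Duration": [6.1, 6.2, 5.9, 5.9, 8.1, 8.0],
})

# One row per sleep disorder, one column per gender, cells hold the group mean.
wide = toy.groupby(["Sleep Disorder", "Gender"])["Sleep Duration"].mean().unstack()
print(wide)

# reset_index() exposes 'Sleep Disorder' as a column that a data source can reference.
print(wide.reset_index().columns.tolist())
```

A combination with no rows in the sample (here, female with no sleep disorder) shows up as `NaN` in the wide table.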


In [93]:
# create the same plot as the above code cell but with sleep quality instead of sleep duration

average_sleep_quality = shl_data.groupby(['Sleep Disorder', 'Gender'])['Quality of Sleep'].mean().unstack()

source = ColumnDataSource(average_sleep_quality.reset_index())

sleep_quality = figure(x_range=average_sleep_quality.index.tolist(), y_range=(0,17), title="Average Sleep Quality by Gender and Sleep Disorder", x_axis_label = "Sleep Disorder", y_axis_label = "Average Sleep Quality" ,height=350, width=800, tooltips ="$name average sleep quality: @$name")

sleep_quality.vbar_stack(['Female', 'Male'], x='Sleep Disorder', width=0.5, color=["red", "purple"], source=source, legend_label=['Female', 'Male'])

sleep_quality.legend.location = "top_left"
sleep_quality.legend.orientation = "horizontal"

In [94]:
# show both charts stacked in a single column layout
show(column(sleep_duration, sleep_quality))

# Bar Chart of Average Stress Level By Occupation and Add Widget

I will now move on to creating a bar chart of average stress level by occupation. After this is done, I will add a `SelectMultiple` widget to my plot.

In [95]:
unique_occupations = shl_data['Occupation'].unique() # get the unique occupations in dataframe
print(unique_occupations)

unique_stress_level = shl_data['Stress Level'].unique() # get the unique values of stress level to help determine y range
print(unique_stress_level)

['Software Engineer' 'Doctor' 'Sales Representative' 'Teacher' 'Nurse'
 'Engineer' 'Accountant' 'Scientist' 'Lawyer' 'Salesperson' 'Manager']
[6 8 7 4 3 5]


In [96]:
# hexadecimal colour palette
colors = ['#eb5cd9', '#ff5e32', '#2b61fb', '#e9e9e9',
          '#ffda28', '#fb8779', '#f16f5e', '#f35541',
          '#76cfff', '#76a9ff', '#991c64']

In [97]:
# compute the average stress level per occupation
average_stress_level = shl_data.groupby('Occupation')['Stress Level'].mean().reset_index()

# convert the occupations in average_stress_level into a list of occupations
occupations = average_stress_level['Occupation'].tolist()

occupation_bars = {} # dictionary to store the bars in bar chart

# create figure
stress_level = figure(x_range=occupations, title="Average Stress Level by Occupation",
                      x_axis_label="Occupation", y_axis_label="Average Stress Level",
                      y_range=(0, 10), width=1050, height=300)

# give each occupation its own data source and bar so bars can be toggled individually
for i, occupation in enumerate(occupations):
    source = ColumnDataSource(data={
        'Occupation': [occupation],
        'AverageStressLevel': [average_stress_level.loc[i, 'Stress Level']]
    })

    # create bar chart for visualization
    bar_chart = stress_level.vbar(x='Occupation', top='AverageStressLevel', source=source, width=0.8,
                            fill_color = factor_cmap('Occupation', palette=colors, factors=occupations))
    occupation_bars[occupation] = bar_chart # store the bar under its occupation name in the occupation_bars dictionary

# add a hover tool
hover = HoverTool(tooltips=[("Average Stress Level", "@AverageStressLevel")])
stress_level.add_tools(hover)

# show the figure
show(stress_level)

In [98]:
# set the bars in the plot to be invisible by default
for occupation_bar in occupation_bars.values():
  occupation_bar.visible = False

# keep a notebook handle so later changes can be pushed to the displayed plot
handle = show(stress_level, notebook_handle=True)

# define our SelectMultiple widget
occupations_choose = SelectMultiple(
  options=list(occupation_bars.keys()),
  rows = 11,
  description = "Ctrl or Shift to Select Multiple"
)

@interact(selected=occupations_choose) # selection from list in our widget
def updated_plot(selected):
  for occupation_bar_name, occupation_bar in occupation_bars.items():
    # a bar is visible only while its occupation is selected
    occupation_bar.visible = occupation_bar_name in selected
  push_notebook(handle=handle) # push the updated plot once, after all bars have been set


# Show Lifestyle Factors By BMI Category and Use Gridplot

My next goal is to create four plots, each showing the average of one lifestyle factor by BMI category. At the end I will show these figures in a grid using `gridplot`.

In [99]:
# colour palette for bars
color_palette = ['#990000', '#b45f06', '#bf9000', '#38761d']

In [100]:
# compute the average daily steps per BMI category
average_daily_steps = shl_data.groupby('BMI Category')['Daily Steps'].mean().reset_index()

bmi = average_daily_steps['BMI Category'].tolist() # convert bmi categories into a list

# define our figure
daily_steps = figure(x_range=bmi, title="Average Daily Steps by BMI Category",
                      x_axis_label="BMI Category", y_axis_label="Average Daily Steps",
                      y_range=(0, 10000), width=500, height=300)

# define our data source for our plot
source = ColumnDataSource(data={
        'BMI': average_daily_steps['BMI Category'],
        'AverageDailySteps': average_daily_steps['Daily Steps']
    })

# create a bar chart with our figure
daily_steps.vbar(x='BMI', top='AverageDailySteps', source=source, width=0.8,
                 fill_color = factor_cmap('BMI', palette=color_palette, factors=bmi))

# add a hover tool
hover = HoverTool(tooltips=[("Average Daily Steps", "@AverageDailySteps{0.00}")]) # {0.00} shows two decimal places
daily_steps.add_tools(hover)

In [101]:
# do the same thing as for daily steps with heart rate

average_heart_rate = shl_data.groupby('BMI Category')['Heart Rate'].mean().reset_index()

bmi = average_heart_rate['BMI Category'].tolist()

heart_rate = figure(x_range=bmi, title="Average Heart Rate by BMI Category",
                      x_axis_label="BMI Category", y_axis_label="Average Heart Rate",
                      y_range=(0, 100), width=500, height=300)

source = ColumnDataSource(data={
        'BMI': average_heart_rate['BMI Category'],
        'AverageHeartRate': average_heart_rate['Heart Rate']
    })

heart_rate.vbar(x='BMI', top='AverageHeartRate', source=source, width=0.8,
                 fill_color = factor_cmap('BMI', palette=color_palette, factors=bmi))

hover = HoverTool(tooltips=[("Average Heart Rate", "@AverageHeartRate{0.00}")]) # {0.00} shows two decimal places
heart_rate.add_tools(hover)

In [102]:
# do the same thing as with heart rate for activity level

average_activity_level = shl_data.groupby('BMI Category')['Physical Activity Level'].mean().reset_index()

bmi = average_activity_level['BMI Category'].tolist()

activity_level = figure(x_range=bmi, title="Average Physical Activity Level by BMI Category",
                      x_axis_label="BMI Category", y_axis_label="Average Physical Activity Level",
                      y_range=(0, 100), width=500, height=300)

source = ColumnDataSource(data={
        'BMI': average_activity_level['BMI Category'],
        'AverageActivityLevel': average_activity_level['Physical Activity Level']
    })

activity_level.vbar(x='BMI', top='AverageActivityLevel', source=source, width=0.8,
                 fill_color = factor_cmap('BMI', palette=color_palette, factors=bmi))

hover = HoverTool(tooltips=[("Average Physical Activity Level", "@AverageActivityLevel{0.00}")]) # {0.00} shows two decimal places
activity_level.add_tools(hover)

In [103]:
# first, expand blood pressure column into two separate columns with systolic and diastolic blood pressure
# second, calculate pulse pressure as the difference between systolic and diastolic blood pressure

shl_data[['Systolic Blood Pressure','Diastolic Blood Pressure']] = shl_data['Blood Pressure'].str.split('/', expand=True).astype(int)
shl_data['Pulse Pressure'] = shl_data['Systolic Blood Pressure'] - shl_data['Diastolic Blood Pressure']
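
The split-and-subtract step can be verified by hand on two readings. This is a minimal sketch, assuming two blood pressure strings copied from the table displayed earlier; the frame name `toy` and the short column names are hypothetical.

```python
import pandas as pd

# Two readings copied from the displayed table; `toy` is a hypothetical name.
toy = pd.DataFrame({"Blood Pressure": ["126/83", "140/90"]})

# Split "systolic/diastolic" into two integer columns.
toy[["Systolic", "Diastolic"]] = (
    toy["Blood Pressure"].str.split("/", expand=True).astype(int)
)

# Pulse pressure is systolic minus diastolic: 126 - 83 and 140 - 90.
toy["Pulse Pressure"] = toy["Systolic"] - toy["Diastolic"]
print(toy)
```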

In [104]:
# do the same as with activity level for pulse pressure

average_pulse_pressure = shl_data.groupby('BMI Category')['Pulse Pressure'].mean().reset_index()

bmi = average_pulse_pressure['BMI Category'].tolist()

pulse_pressure = figure(x_range=bmi, title="Average Pulse Pressure by BMI Category",
                      x_axis_label="BMI Category", y_axis_label="Average Pulse Pressure",
                      y_range=(0, 70), width=500, height=300)

source = ColumnDataSource(data={
        'BMI': average_pulse_pressure['BMI Category'],
        'AveragePulsePressure': average_pulse_pressure['Pulse Pressure']
    })

pulse_pressure.vbar(x='BMI', top='AveragePulsePressure', source=source, width=0.8,
                 fill_color = factor_cmap('BMI', palette=color_palette, factors=bmi))

hover = HoverTool(tooltips=[("Average Pulse Pressure", "@AveragePulsePressure{0.00}")]) # {0.00} shows two decimal places
pulse_pressure.add_tools(hover)

In [105]:
grid = gridplot([[daily_steps, heart_rate], [activity_level, pulse_pressure]], sizing_mode='scale_width') # create our grid plot
show(grid) # show our plot
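
The four cells above differ only in the column being averaged, the y-axis limit, and the labels. One way to remove the repetition is a small helper; the sketch below is an assumption about how that could look, not the code used in this notebook. The function name `bmi_bar_chart`, its parameters, and the four-row `toy` frame (values copied from rows displayed earlier) are all hypothetical.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from bokeh.transform import factor_cmap


def bmi_bar_chart(df, value_column, y_max, palette):
    """Hypothetical helper: bar chart of the mean of value_column per BMI category."""
    means = df.groupby("BMI Category")[value_column].mean().reset_index()
    categories = means["BMI Category"].tolist()
    label = f"Average {value_column}"

    source = ColumnDataSource(data={
        "BMI": categories,
        "Average": means[value_column].tolist(),
    })
    fig = figure(x_range=categories, y_range=(0, y_max),
                 title=f"{label} by BMI Category",
                 x_axis_label="BMI Category", y_axis_label=label,
                 width=500, height=300)
    fig.vbar(x="BMI", top="Average", source=source, width=0.8,
             fill_color=factor_cmap("BMI", palette=palette, factors=categories))
    # {0.00} formats the hovered value with two decimal places
    fig.add_tools(HoverTool(tooltips=[(label, "@Average{0.00}")]))
    return fig


# Four rows copied from the displayed table, used only to exercise the helper.
toy = pd.DataFrame({
    "BMI Category": ["Overweight", "Normal", "Obese", "Overweight"],
    "Daily Steps": [4200, 10000, 3000, 7000],
})
steps_fig = bmi_bar_chart(toy, "Daily Steps", 10000,
                          ["#990000", "#b45f06", "#bf9000"])
```

With such a helper, each of the four figures would be a single call, and the resulting figures could be passed to `gridplot` exactly as above.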

# Deploy Visualizations to Website

Next I will output an HTML file that contains the visualizations I have made in this notebook. The same file is hosted as a website, linked at the bottom of this notebook.

Note that for the stress level by occupation visualization, I had to deploy the visualization without the widget. The widget comes from the `ipywidgets` library rather than from `bokeh`, so it is not a Bokeh model: `column()` cannot place it in a layout, and it relies on a running Python kernel, which a static HTML file does not have.
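
For reference, Bokeh ships its own widgets that run entirely in the browser, so a similar show/hide control could in principle be kept in the exported file. The sketch below is an assumption about one way to do this with `CheckboxGroup` and a `CustomJS` callback; it is not what the deployed site uses, and the three occupations and their values are placeholders rather than the computed averages.

```python
from bokeh.layouts import column
from bokeh.models import CheckboxGroup, ColumnDataSource, CustomJS
from bokeh.plotting import figure

# Placeholder data for illustration only; not the averages computed in this notebook.
occupations = ["Doctor", "Nurse", "Software Engineer"]
placeholder_levels = [8, 3, 6]

plot = figure(x_range=occupations, y_range=(0, 10), width=600, height=300,
              title="Stress Level by Occupation (placeholder data)")

# One renderer per occupation, so each bar can be shown or hidden on its own.
renderers = []
for name, level in zip(occupations, placeholder_levels):
    source = ColumnDataSource(data={"Occupation": [name], "Level": [level]})
    renderers.append(plot.vbar(x="Occupation", top="Level", source=source, width=0.8))

# All boxes start checked, matching the initially visible bars.
checkboxes = CheckboxGroup(labels=occupations, active=list(range(len(occupations))))

# The callback runs in the browser: a bar is visible only while its box is checked.
checkboxes.js_on_change("active", CustomJS(args={"renderers": renderers}, code="""
    for (let i = 0; i < renderers.length; i++) {
        renderers[i].visible = cb_obj.active.includes(i);
    }
"""))

layout = column(checkboxes, plot)
```

Because `layout` contains only Bokeh models, it can be nested inside another `column(...)` and written out by `output_file` and `show` without a kernel.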

In [106]:
# redefine the average stress level plot without the widget included

average_stress_level = shl_data.groupby('Occupation')['Stress Level'].mean().reset_index()

occupations = average_stress_level['Occupation'].tolist()

stress_level = figure(x_range=occupations, title="Average Stress Level by Occupation",
                      x_axis_label="Occupation", y_axis_label="Average Stress Level",
                      y_range=(0, 10), width=1050, height=300)

source = ColumnDataSource(data={
        'Occupation': average_stress_level['Occupation'],
        'AverageStressLevel': average_stress_level['Stress Level']
    })

stress_level.vbar(x='Occupation', top='AverageStressLevel', source=source, width=0.8,
                            fill_color = factor_cmap('Occupation', palette=colors, factors=occupations))

hover = HoverTool(tooltips=[("Average Stress Level", "@AverageStressLevel")])
stress_level.add_tools(hover)


# output an html file titled index.html
output_file('index.html')

# stack all of the plots into a single column layout
plots = column(column(sleep_duration, sleep_quality), stress_level, grid)

show(plots) # show the plots and output the associated html file

# End Notes

**Visualization website:**

https://sleephealthvisualization.netlify.app

**Sources:**

Data Set Source: https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset

Blood Pressure Info: https://chatgpt.com/

Colour Palette Source: https://www.color-hex.com/color-palettes/

