## About the Dataset:

Dataset Overview:
The Sleep Health and Lifestyle Dataset comprises 400 rows and 13 columns, covering a wide range of variables related to sleep and daily habits. It includes details such as gender, age, occupation, sleep duration, quality of sleep, physical activity level, stress levels, BMI category, blood pressure, heart rate, daily steps, and the presence or absence of sleep disorders.

Key Features of the Dataset:
- Comprehensive Sleep Metrics: Explore sleep duration, quality, and factors influencing sleep patterns.
- Lifestyle Factors: Analyze physical activity levels, stress levels, and BMI categories.
- Cardiovascular Health: Examine blood pressure and heart rate measurements.
- Sleep Disorder Analysis: Identify the occurrence of sleep disorders such as Insomnia and Sleep Apnea.

Dataset Columns:
- `Person ID`: An identifier for each individual.
- `Gender`: The gender of the person (Male/Female).
- `Age`: The age of the person in years.
- `Occupation`: The occupation or profession of the person.
- `Sleep Duration (hours)`: The number of hours the person sleeps per day.
- `Quality of Sleep (scale: 1-10)`: A subjective rating of the quality of sleep, ranging from 1 to 10.
- `Physical Activity Level (minutes/day)`: The number of minutes the person engages in physical activity daily.
- `Stress Level (scale: 1-10)`: A subjective rating of the stress level experienced by the person, ranging from 1 to 10.
- `BMI Category`: The BMI category of the person (e.g., Normal, Overweight, Obese).
- `Blood Pressure (systolic/diastolic)`: The blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure.
- `Heart Rate (bpm)`: The resting heart rate of the person in beats per minute.
- `Daily Steps`: The number of steps the person takes per day.
- `Sleep Disorder`: The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea).
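
Because `Blood Pressure` is stored as a single "systolic/diastolic" string, it usually has to be split before it can be treated numerically. The following is a minimal sketch on made-up rows, not part of the original analysis; the column names `Systolic` and `Diastolic` are hypothetical.

```python
import pandas as pd

# Toy rows in the same "systolic/diastolic" string format as the dataset.
sample = pd.DataFrame({"Blood Pressure": ["126/83", "125/80", "140/90"]})

# Split on "/" into two columns and convert both to integers.
# "Systolic" and "Diastolic" are hypothetical names chosen for this sketch.
bp = sample["Blood Pressure"].str.split("/", expand=True).astype(int)
bp.columns = ["Systolic", "Diastolic"]
sample = sample.join(bp)

print(sample)
```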

Details about Sleep Disorder Column:

- `None`: The individual does not exhibit any specific sleep disorder.
- `Insomnia`: The individual experiences difficulty falling asleep or staying asleep, leading to inadequate or poor-quality sleep.
- `Sleep Apnea`: The individual suffers from pauses in breathing during sleep, resulting in disrupted sleep patterns and potential health risks.

In [1]:
# importing the required packages

import pandas as pd
from IPython.display import display, HTML
import altair as alt



# Load the Sleep Health dataset
health_lifestyle = pd.read_csv("Sleep_health_and_lifestyle_dataset.csv")

### Objective

The goal of this exploratory data analysis is to investigate the relationships between several dataset attributes, including stress level, BMI, sleep duration, and sleep disorders. The analysis may reveal trends and patterns among these variables and show which groups of subjects are affected by sleep disorders. By examining occupation alongside the sleep metrics, we hope to uncover possible work-related influences on sleep duration and quality. We also seek to identify correlations between BMI, stress level, and sleep quality. Understanding these dynamics is essential for addressing the challenges people face daily in trying to maintain a work-life balance.

### Dataset Cleanup

- The `Sleep Disorder` attribute contains NaN values, which indicate that the person has no sleep disorder. Since this attribute is important for our exploratory data analysis, we replace NaN with the string `None` so that it can be visualized as its own category.

- The `BMI Category` attribute contains both `Normal` and `Normal Weight`, which are essentially the same. We replace `Normal Weight` with `Normal`, so the dataset has three BMI categories: Normal, Overweight, and Obese.

In [2]:
## Dataset Cleanup
health_lifestyle['Sleep Disorder'] = health_lifestyle['Sleep Disorder'].fillna('None')

health_lifestyle['BMI Category'] = health_lifestyle['BMI Category'].replace('Normal Weight', 'Normal')

health_lifestyle.head(15)

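| | Person ID | Gender | Age | Occupation | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | BMI Category | Blood Pressure | Heart Rate | Daily Steps | Sleep Disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 27 | Software Engineer | 6.1 | 6 | 42 | 6 | Overweight | 126/83 | 77 | 4200 | None |
| 1 | 2 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | None |
| 2 | 3 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | None |
| 3 | 4 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| 4 | 5 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| 5 | 6 | Male | 28 | Software Engineer | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Insomnia |
| 6 | 7 | Male | 29 | Teacher | 6.3 | 6 | 40 | 7 | Obese | 140/90 | 82 | 3500 | Insomnia |
| 7 | 8 | Male | 29 | Doctor | 7.8 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | None |
| 8 | 9 | Male | 29 | Doctor | 7.8 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | None |
| 9 | 10 | Male | 29 | Doctor | 7.8 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | None |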
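
The two cleanup steps can be sanity-checked before moving on. The sketch below runs them on a small made-up frame (not the real CSV) and confirms that no missing values remain and that only three BMI labels survive; the names `toy`, `missing_after`, and `bmi_labels` are illustrative.

```python
import pandas as pd

# Toy frame reproducing the two issues described in the cleanup notes.
# The values are made up; only the column names match the dataset.
toy = pd.DataFrame({
    "Sleep Disorder": ["Insomnia", None, "Sleep Apnea", None],
    "BMI Category": ["Normal", "Normal Weight", "Overweight", "Obese"],
})

# Same two operations as in the cleanup cell.
toy["Sleep Disorder"] = toy["Sleep Disorder"].fillna("None")
toy["BMI Category"] = toy["BMI Category"].replace("Normal Weight", "Normal")

# Count remaining missing values and list the distinct BMI labels.
missing_after = int(toy["Sleep Disorder"].isna().sum())
bmi_labels = sorted(toy["BMI Category"].unique())
print(missing_after, bmi_labels)
```

On the real data, the same two checks could be applied to `health_lifestyle` directly.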


## Task Analysis
### Task 1: Heatmap
#### Retrieve Value: What is the average Sleep Duration for different Occupations?

In [3]:
# Create bins so that Age can be represented as an ordinal Age Group.
bins = [20, 30, 40, 50, 60]
labels = ['20-30', '30-40', '40-50', '50-60']


# Create a new column for Age Group
health_lifestyle['Age Group'] = pd.cut(health_lifestyle['Age'], bins=bins, labels=labels, right=False)


heatmap = alt.Chart(health_lifestyle).mark_rect().encode(
    alt.X('Occupation:O'),
    alt.Y('Age Group:O', sort=alt.EncodingSortField(field='Age Group', order='descending')),
    color='average(Sleep Duration):Q',
    tooltip=['average(Sleep Duration)']
).properties(
    width=300,
    height=300,
    title='Heatmap of Average Sleep Duration by Occupation and Age Group'
)
heatmap
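
To make the binning and the aggregation behind the heatmap concrete, here is a minimal sketch on made-up rows (only the column names match the dataset). It shows how `right=False` assigns boundary ages and computes the per-cell mean that the heatmap color encodes. The variable names `toy` and `cell_means` are illustrative.

```python
import pandas as pd

# Made-up rows; only the column names match the dataset.
toy = pd.DataFrame({
    "Occupation": ["Doctor", "Doctor", "Teacher", "Teacher", "Doctor"],
    "Age": [28, 29, 30, 39, 40],
    "Sleep Duration": [6.0, 8.0, 6.5, 7.5, 7.0],
})

bins = [20, 30, 40, 50, 60]
labels = ["20-30", "30-40", "40-50", "50-60"]

# right=False makes each bin left-closed: [20, 30), [30, 40), ...
# so age 30 lands in "30-40", and an age of exactly 60 would get no bin.
toy["Age Group"] = pd.cut(toy["Age"], bins=bins, labels=labels, right=False)

# Mean sleep duration per (Occupation, Age Group) cell.
# observed=True drops combinations that have no rows.
cell_means = (
    toy.groupby(["Occupation", "Age Group"], observed=True)["Sleep Duration"]
    .mean()
)
print(cell_means)
```

If the data ever contained ages of 60 or above, the last bin edge would need to be extended, since those rows would otherwise fall outside every bin.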

### Task 2: Ridge Plot
#### Compute Derived Value: What is the Heart Rate distribution for individuals who report "Insomnia" as their Sleep Disorder?

In [4]:
# Create a radio button for Gender (Male or Female)
gender_radio = alt.binding_radio(options=['Male', 'Female'])
gender_selection = alt.selection_point(fields=['Gender'], bind=gender_radio, name='Ridge_Plot', value='Male')

ridge_plots = alt.Chart(health_lifestyle).transform_density(
    'Heart Rate',
    as_=['Heart Rate', 'density'],
    groupby=['BMI Category', 'Sleep Disorder', 'Gender']
).mark_area(
    fillOpacity=0.4,
    interpolate='cardinal',
    stroke='black',
    strokeWidth=0.5
).encode(
    alt.X('Heart Rate:Q', title='Heart Rate', axis=alt.Axis(grid=False), scale = alt.Scale(zero = False)),
    alt.Y('density:Q', title=None,axis=alt.Axis(grid=False) ),
    alt.Color('Sleep Disorder:N', scale=alt.Scale(scheme='category10'),
              title='Sleep Disorder'),
    alt.Column('BMI Category:N', title='BMI Category'),
).add_params(
    gender_selection
).transform_filter(
    gender_selection
).properties(
    width=300, 
    height=150
)

ridge_plots
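
The density curves can be complemented with plain summary statistics for the same question. The following is a rough sketch on made-up values, assuming the cleaned `Sleep Disorder` labels; `insomnia_hr` and `summary` are illustrative names.

```python
import pandas as pd

# Made-up rows; only the column names and labels match the dataset.
toy = pd.DataFrame({
    "Sleep Disorder": ["Insomnia", "Insomnia", "None", "Sleep Apnea", "Insomnia"],
    "Heart Rate": [85, 82, 70, 85, 79],
})

# Keep only the heart rates of individuals who report Insomnia.
insomnia_hr = toy.loc[toy["Sleep Disorder"] == "Insomnia", "Heart Rate"]

# describe() returns count, mean, std, min, quartiles, and max.
summary = insomnia_hr.describe()
print(summary[["count", "mean", "min", "max"]])
```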

### Task 3: Stacked Bar Chart
#### Characterize Distribution: What is the distribution of Sleep Disorders across Genders and BMI Categories?

In [5]:
# Create a dropdown for BMI categories
bmi_dropdown = alt.binding_select(options=['Normal', 'Overweight', 'Obese'])
bmi_selection = alt.selection_point(fields=['BMI Category'], bind=bmi_dropdown, name='Stacked_Bar_Chart')


stacked_bar_chart_dropdown = alt.Chart(health_lifestyle).mark_bar(stroke='black').encode(
    alt.X('count()', stack='normalize', axis=alt.Axis(title='Percentage of disorder')),
    alt.Y('Gender:N', axis=alt.Axis(title='Gender')),
    color=alt.Color('Sleep Disorder:N', scale=alt.Scale(scheme='category10'), title='Sleep Disorder'),
    tooltip = ['count()', 'Sleep Disorder:N']
).add_params(
    bmi_selection
).transform_filter(
    bmi_selection
).properties(
    width=450,
    height=60,
    title='Distribution of Sleep Disorders across BMI Categories'
)

stacked_bar_chart_dropdown
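
The normalized stacked bars correspond to per-gender shares within the selected BMI category. As a hedged illustration on made-up rows, the same numbers can be obtained with `pd.crosstab`; filtering on a fixed category stands in for the dropdown, and `selected` and `shares` are illustrative names.

```python
import pandas as pd

# Made-up rows; only the column names and labels match the dataset.
toy = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Male"],
    "BMI Category": ["Obese", "Obese", "Obese", "Obese", "Obese", "Normal"],
    "Sleep Disorder": ["Sleep Apnea", "Sleep Apnea", "Insomnia",
                       "Insomnia", "None", "None"],
})

# Mimic the dropdown by keeping a single BMI category.
selected = toy[toy["BMI Category"] == "Obese"]

# normalize="index" converts counts into shares, so each gender row sums to 1.
shares = pd.crosstab(
    selected["Gender"], selected["Sleep Disorder"], normalize="index"
)
print(shares)
```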

### Task 4: Scatter Plot
#### Cluster: Are there clusters of Stress Levels for individuals based on Age and Sleep Duration?

In [6]:
scatter_plot = alt.Chart(health_lifestyle).mark_point(opacity = 0.5, filled = True).encode(
    alt.X('Sleep Duration:Q', axis=alt.Axis(grid=False), scale=alt.Scale(domain=(5.5, 9.0))),
    alt.Y('Age:Q', axis=alt.Axis(grid=False), scale=alt.Scale(domain=(25, 65))),
    size=alt.value(100),
    shape = 'Sleep Disorder:N',
    color=alt.Color('Stress Level:N', scale = alt.Scale(domain=[3,4,5,6,7,8], range=['#85c1e9', '#5dade2', '#3498db', '#f1948a', '#e74c3c', '#c0392b'])),
    tooltip=['Sleep Duration', 'Heart Rate', 'Stress Level']
).properties(
    width=450, 
    height=300,  
    title='Scatter Plot of Sleep Duration vs. Age Colored by Stress Level'
)
scatter_plot
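
A numeric summary can back up whatever clusters the scatter plot suggests. The sketch below uses made-up values, so its numbers say nothing about the real dataset; it only shows one way to compare average Age and Sleep Duration per Stress Level and to check the sign of the correlation. `by_stress` and `corr` are illustrative names.

```python
import pandas as pd

# Made-up rows; only the column names match the dataset.
toy = pd.DataFrame({
    "Age": [28, 29, 45, 50, 53, 31],
    "Sleep Duration": [5.9, 6.1, 7.8, 8.2, 8.4, 6.3],
    "Stress Level": [8, 8, 4, 3, 3, 7],
})

# Average Age and Sleep Duration within each Stress Level.
by_stress = toy.groupby("Stress Level")[["Age", "Sleep Duration"]].mean()

# Pearson correlation between Sleep Duration and Stress Level.
corr = toy["Sleep Duration"].corr(toy["Stress Level"])

print(by_stress)
print(round(corr, 3))
```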

## Creating Bidirectional Interaction for the Heatmap and Scatterplot

In [7]:
# Create the selection interval for the Scatter Plot
interval = alt.selection_interval(encodings=['x', 'y'], name='selector')

# Create the selection point for the Heatmap
point = alt.selection_point(fields=['Occupation', 'Age Group'], name = 'heatmap_selector', empty = True)

# Add the interaction between Scatterplot, heatmap and the bonus bar chart
scatter_plot = scatter_plot.add_params(interval).transform_filter(point)


heatmap = heatmap.transform_filter(interval).add_params(point)


bar_chart = alt.Chart(health_lifestyle).mark_bar().encode(
    alt.X('Occupation:O', axis = None),
    y='average(Physical Activity Level):Q',
    color=alt.condition(interval, alt.value('steelblue'), alt.value('#FFAE42'))
).properties(
    width=300,
    height=150  
).transform_filter(
    (interval & point)
)

## Creating the Final Dashboard

In [8]:
heat_scatter = alt.hconcat(heatmap, scatter_plot.encode(shape = 'Sleep Disorder')).resolve_scale(shape = 'independent')  
bar_stacked_bar = alt.hconcat(bar_chart, stacked_bar_chart_dropdown.encode(color = 'Sleep Disorder:N'), spacing = 150).resolve_scale(color = 'independent')
dashboard_1 = alt.vconcat(heat_scatter, bar_stacked_bar, spacing = 5)

# Usage notes printed above the dashboard
print('Note: The Gender radio button controls the Ridge Plot, while the BMI dropdown controls the Stacked Bar Chart.')
print('Note: Multiple cells on the heatmap can be selected by holding down the Shift key.')

# Create the dashboard
dashboard = alt.vconcat(dashboard_1, ridge_plots.encode(color = 'Sleep Disorder:N')).resolve_scale(color = 'independent')
dashboard

Note: The Gender radio button controls the Ridge Plot, while the BMI dropdown controls the Stacked Bar Chart.
Note: Multiple cells on the heatmap can be selected by holding down the Shift key.

