## Sleep Health and Lifestyle Dataset - Exploratory Data Analysis Project

###### Author: Leda Gale

### Dataset General Information

Title: Sleep Health and Lifestyle

Source: Figshare

Version: 2

Date: 13/12/2023

License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

### Importing the dataset

In [13]:
#Importing libraries
import pandas as pd
import numpy as np

In [14]:
#Importing the dataset
df = pd.read_csv("Sleep_health.csv")

#General review of the features included on the dataset
df.head()

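| | Person ID | Gender | Age | Occupation | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | BMI Category | Blood Pressure | Heart Rate | Daily Steps | Sleep Disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 374 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 1 | 375 | Male | 29 | Doctor | 7.8 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | NaN |
| 2 | 376 | Male | 29 | Doctor | 7.8 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | NaN |
| 3 | 377 | Male | 29 | Doctor | 6.1 | 6 | 30 | 8 | Normal | 120/80 | 70 | 8000 | NaN |
| 4 | 378 | Male | 29 | Doctor | 7.8 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | NaN |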


### Identifying dataset shape and attributes

In [15]:
#Identifying the dataset shape
df.shape

(186, 13)

The .head() method and the .shape attribute were useful for exploring which attributes are included in the dataset. A total of 13 attributes were identified, with 186 entries (subjects of study).

According to the source, the attributes are the following:
1. Person ID: Identifier for each individual
2. Gender: The gender of each person (Male/Female)
3. Age: The age of the person in years
4. Occupation: The occupation or profession of the person
5. Sleep Duration (hours): The number of hours the person sleeps per day
6. Quality of Sleep (scale 1-10): A subjective rating of the quality of sleep
7. Physical Activity Level (minutes/day): The number of minutes the person engages in physical activity daily
8. Stress Level (scale 1-10): A subjective rating of the stress level experienced by the person
9. BMI Category: The BMI category of the person (Underweight, Normal, Overweight)
10. Blood Pressure (systolic/diastolic): The blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure
11. Daily steps: The number of steps the person takes per day
12. Heart rate (bpm): The resting heart rate of the person in beats per minute
13. Sleep Disorder: The presence of a sleep disorder in the person (None, Insomnia, Sleep Apnea)
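
Because Blood Pressure is stored as a single "systolic/diastolic" string, it cannot be used directly in numeric summaries. The following is a minimal sketch of one way to split it into two integer columns. The `sample` frame is a small illustration built from the rows shown by df.head(), and the column names `Systolic` and `Diastolic` are assumptions, not part of the original dataset.

```python
import pandas as pd

# Illustrative rows copied from the df.head() output (not the full dataset)
sample = pd.DataFrame({
    "Person ID": [374, 375, 377],
    "Blood Pressure": ["140/95", "120/80", "120/80"],
})

# Split each "systolic/diastolic" string on the slash and cast both parts to int
bp = sample["Blood Pressure"].str.split("/", expand=True).astype(int)

# Hypothetical column names chosen for this sketch
sample["Systolic"] = bp[0]
sample["Diastolic"] = bp[1]

print(sample)
```

The same two lines could be applied to `df` itself if numeric blood pressure values are needed later in the analysis.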

### Checking on the datatypes

In [16]:
df.dtypes

Person ID                    int64
Gender                      object
Age                          int64
Occupation                  object
Sleep Duration             float64
Quality of Sleep             int64
Physical Activity Level      int64
Stress Level                 int64
BMI Category                object
Blood Pressure              object
Heart Rate                   int64
Daily Steps                  int64
Sleep Disorder              object
dtype: object

In [17]:
#Searching for more information about the dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 186 entries, 0 to 185
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Person ID                186 non-null    int64  
 1   Gender                   186 non-null    object 
 2   Age                      186 non-null    int64  
 3   Occupation               186 non-null    object 
 4   Sleep Duration           186 non-null    float64
 5   Quality of Sleep         186 non-null    int64  
 6   Physical Activity Level  186 non-null    int64  
 7   Stress Level             186 non-null    int64  
 8   BMI Category             186 non-null    object 
 9   Blood Pressure           186 non-null    object 
 10  Heart Rate               186 non-null    int64  
 11  Daily Steps              186 non-null    int64  
 12  Sleep Disorder           30 non-null     object 
dtypes: float64(1), int64(7), object(5)
memory usage: 19.0+ KB


In [18]:
#Checking for null values
df.isna().sum()

Person ID                    0
Gender                       0
Age                          0
Occupation                   0
Sleep Duration               0
Quality of Sleep             0
Physical Activity Level      0
Stress Level                 0
BMI Category                 0
Blood Pressure               0
Heart Rate                   0
Daily Steps                  0
Sleep Disorder             156
dtype: int64
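
The raw counts can be easier to judge as a share of the rows: in this dataset, 156 of 186 entries is about 83.9% of the Sleep Disorder column. The sketch below shows one way to build such a summary. It runs on a small illustrative `sample` frame rather than the real file, and the names `missing_count`, `missing_pct` and `summary` are introduced only for this example.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the real analysis would use df instead
sample = pd.DataFrame({
    "Age": [59, 29, 29, 29, 29],
    "Sleep Disorder": ["Sleep Apnea", np.nan, np.nan, np.nan, np.nan],
})

# Count of missing values per column
missing_count = sample.isna().sum()

# Share of missing values per column, as a percentage
missing_pct = sample.isna().mean().mul(100).round(1)

summary = pd.DataFrame({"missing": missing_count, "percent": missing_pct})
print(summary)
```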

#### Observations:

The .info() and .isna() results report 156 null values in the Sleep Disorder attribute. However, the earlier look at the dataset and the source's attribute description suggest these are not truly missing data: they appear as NaN and, in this case, mean that the person does not suffer from any sleep disorder.

In [25]:
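#Previewing the filled column; fillna returns a new Series, so df is unchanged unless the result is assigned back
df["Sleep Disorder"].fillna('Nothing')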

0      Sleep Apnea
1          Nothing
2          Nothing
3          Nothing
4          Nothing
          ...     
181       Insomnia
182       Insomnia
183       Insomnia
184       Insomnia
185       Insomnia
Name: Sleep Disorder, Length: 186, dtype: object
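
Since fillna returns a new Series and leaves the DataFrame untouched, the replacement only persists if the result is assigned back to the column. A minimal sketch of that step, assuming the same 'Nothing' label and using a small illustrative `sample` frame in place of the real data:

```python
import numpy as np
import pandas as pd

# Illustrative frame standing in for df
sample = pd.DataFrame({
    "Sleep Disorder": ["Sleep Apnea", np.nan, np.nan, "Insomnia"],
})

# Assign the filled Series back so the change is kept in the frame
sample["Sleep Disorder"] = sample["Sleep Disorder"].fillna("Nothing")

# Confirm that no nulls remain in the column
remaining_nulls = int(sample["Sleep Disorder"].isna().sum())
print(sample["Sleep Disorder"].tolist(), remaining_nulls)
```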

In [26]:
df.duplicated().sum()

0

In [8]:
duplicates = df.duplicated("Person ID")
print(df[duplicates])

Empty DataFrame
Columns: [Person ID, Gender, Age, Occupation, Sleep Duration, Quality of Sleep, Physical Activity Level, Stress Level, BMI Category, Blood Pressure, Heart Rate, Daily Steps, Sleep Disorder]
Index: []
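
Both checks include Person ID, which is unique per row, so they cannot reveal entries that share every other attribute. In the df.head() output, for example, persons 375, 376 and 378 have identical values apart from the ID. The sketch below shows one way to count such repeated profiles by excluding the identifier. The `sample` frame contains only a few of the columns for illustration, and `feature_columns` and `repeated_profiles` are names made up for this example. Whether repeated profiles should be treated as duplicates is a judgment call that depends on how the data was collected.

```python
import pandas as pd

# Illustrative subset of rows and columns from the df.head() output
sample = pd.DataFrame({
    "Person ID": [375, 376, 377, 378],
    "Age": [29, 29, 29, 29],
    "Occupation": ["Doctor", "Doctor", "Doctor", "Doctor"],
    "Sleep Duration": [7.8, 7.8, 6.1, 7.8],
    "Stress Level": [6, 6, 8, 6],
})

# Full-row check: the unique ID makes every row distinct
full_row_duplicates = int(sample.duplicated().sum())

# Compare rows on every column except the identifier
feature_columns = [col for col in sample.columns if col != "Person ID"]
repeated_profiles = int(sample.duplicated(subset=feature_columns).sum())

print(full_row_duplicates, repeated_profiles)
```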


### Renaming variables for better understanding

In [10]:
df = df.rename(columns = {
    "Sleep Duration": "Sleep_duration",
    "Quality of Sleep": "Sleeping_quality",
    "Physical Activity Level": "Activity_level",
    "Stress Level": "Stress_level",
    "BMI Category": "BMI",
    "Blood Pressure": "Blood_pressure",
    "Heart Rate": "Heart_rate",
    "Daily Steps": "Daily_steps",
    "Sleep Disorder": "Sleep_disorder",
})

### Dropping unnecessary columns

In [None]:
df.drop(columns = ["Person ID"], inplace = True)
df

In [None]:
print(f'Gender: {df["Gender"].unique()}')
print(f'Gender: {df["Gender"].nunique()}')

In [None]:
print(f'Age: {df["Age"].unique()}')
print(f'Age: {df["Age"].nunique()}')

In [None]:
print(f'Occupation: {df["Occupation"].unique()}')
print(f'Occupation: {df["Occupation"].nunique()}')

In [None]:
print(f'Sleep_duration: {df["Sleep_duration"].unique()}')
print(f'Sleep_duration: {df["Sleep_duration"].nunique()}')

In [None]:
print(f'Sleeping_quality: {df["Sleeping_quality"].unique()}')
print(f'Sleeping_quality: {df["Sleeping_quality"].nunique()}')

In [None]:
print(f'Activity_level: {df["Activity_level"].unique()}')
print(f'Activity_level: {df["Activity_level"].nunique()}')

In [None]:
print(f'Stress_level: {df["Stress_level"].unique()}')
print(f'Stress_level: {df["Stress_level"].nunique()}')

In [None]:
print(f'BMI: {df["BMI"].unique()}')
print(f'BMI: {df["BMI"].nunique()}')

In [None]:
print(f'Daily_steps: {df["Daily_steps"].unique()}')
print(f'Daily_steps: {df["Daily_steps"].nunique()}')

In [11]:
print(f'Sleep_disorder: {df["Sleep_disorder"].unique()}')
print(f'Sleep_disorder: {df["Sleep_disorder"].nunique()}')

Sleep_disorder: ['Sleep Apnea' nan 'Insomnia']
Sleep_disorder: 2
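
The output above lists three entries from unique() but a count of 2 from nunique(), because nunique() excludes NaN by default. The per-column cells can also be written as a single loop. This is a minimal sketch on an illustrative `sample` frame, with `unique_counts` being a name introduced only for the example.

```python
import numpy as np
import pandas as pd

# Illustrative frame using the renamed columns; the real analysis would use df
sample = pd.DataFrame({
    "Gender": ["Female", "Male", "Male"],
    "BMI": ["Overweight", "Normal", "Normal"],
    "Sleep_disorder": ["Sleep Apnea", np.nan, "Insomnia"],
})

unique_counts = {}
for column in sample.columns:
    values = sample[column].unique()                   # includes NaN if present
    unique_counts[column] = sample[column].nunique()   # excludes NaN by default
    print(f"{column}: {values} ({unique_counts[column]} distinct, NaN excluded)")
```

Passing `dropna=False` to nunique() would count NaN as its own category if that is preferred.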
