## Sleep Health and Lifestyle Dataset - Exploratory Data Analysis Project

###### Author: Leda Gale

### Dataset General Information

Title: Sleep Health and Lifestyle

Source: Figshare

Version: 2

Date: 13/12/2023

License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

### Importing the dataset

In [1]:
```python
# Importing libraries
import pandas as pd
import numpy as np
```

In [2]:
```python
# Importing the dataset
df = pd.read_csv("Sleep_health.csv")

# General review of the features included in the dataset
df.head()
```

|   | Person ID | Gender | Age | Occupation | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | BMI Category | Blood Pressure | Heart Rate | Daily Steps | Sleep Disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 374 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 1 | 375 | Male | 29 | Doctor | 7.8 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | NaN |
| 2 | 376 | Male | 29 | Doctor | 7.8 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | NaN |
| 3 | 377 | Male | 29 | Doctor | 6.1 | 6 | 30 | 8 | Normal | 120/80 | 70 | 8000 | NaN |
| 4 | 378 | Male | 29 | Doctor | 7.8 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | NaN |



### Identifying dataset shape and attributes

In [4]:
```python
# Identifying the dataset shape (rows, columns)
df.shape
```

```
(186, 13)
```

The `.head()` method and the `.shape` attribute show which attributes are included in the dataset. The dataset has 13 attributes (columns) and 186 entries (rows), with each entry corresponding to one subject of study.
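`.shape` is a plain `(rows, columns)` tuple, so it can be unpacked directly when the counts are needed later. For instance, the counts can express the 156 missing Sleep Disorder values as a share of all 186 entries (156 / 186 ≈ 0.84). Below is a minimal sketch on a small, made-up stand-in frame; the frame and variable names are illustrative only.

```python
import pandas as pd

# Made-up stand-in frame: 4 rows, 2 columns, three missing disorder labels
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Male"],
    "Sleep Disorder": ["Sleep Apnea", None, None, None],
})

# .shape is an attribute (no parentheses) holding (rows, columns)
n_rows, n_cols = toy.shape
print(n_rows, n_cols)  # 4 2

# Share of rows with a missing Sleep Disorder value
share_missing = toy["Sleep Disorder"].isna().sum() / n_rows
print(share_missing)  # 0.75
```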

The attributes, as described in the source, are the following:
1. Person ID: Identifier for each individual
2. Gender: The gender of the person (Male/Female)
3. Age: The age of the person in years
4. Occupation: The occupation or profession of the person
5. Sleep Duration (hours): The number of hours the person sleeps per day
6. Quality of Sleep (scale 1-10): A subjective rating of the quality of sleep
7. Physical Activity Level (minutes/day): The number of minutes the person engages in physical activity daily
8. Stress Level (scale 1-10): A subjective rating of the stress level experienced by the person
9. BMI Category: The BMI category of the person (Underweight, Normal, Overweight)
10. Blood Pressure (systolic/diastolic): The blood pressure of the person, recorded as systolic pressure over diastolic pressure
11. Heart Rate (bpm): The resting heart rate of the person in beats per minute
12. Daily Steps: The number of steps the person takes per day
13. Sleep Disorder: The presence of a sleep disorder in the person (None, Insomnia, Sleep Apnea)
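Blood Pressure is stored as a single "systolic/diastolic" string (an `object` column in `df.info()`). Any numeric analysis will therefore first need the two readings in separate columns. Below is a minimal sketch. It assumes the column still has its original name `Blood Pressure` and that every value follows the `"140/95"` pattern seen in the preview. The helper `split_blood_pressure` and the new column names `Systolic` and `Diastolic` are hypothetical.

```python
import pandas as pd

def split_blood_pressure(df: pd.DataFrame, column: str = "Blood Pressure") -> pd.DataFrame:
    """Return a copy of df with hypothetical Systolic/Diastolic integer
    columns parsed from a 'systolic/diastolic' string column."""
    out = df.copy()
    # expand=True returns one column per part of the split: 0 and 1
    parts = out[column].str.split("/", expand=True)
    out["Systolic"] = parts[0].astype(int)
    out["Diastolic"] = parts[1].astype(int)
    return out

# Toy values copied from rows shown in the dataset previews
sample = pd.DataFrame({"Blood Pressure": ["140/95", "120/80", "135/90"]})
result = split_blood_pressure(sample)
print(result)
```

The original string column is kept, so the split can be checked against it before the raw column is dropped.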

### Checking data types and null values in the dataframe

In [6]:
```python
# Searching for more information about the dataset
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 186 entries, 0 to 185
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Person ID                186 non-null    int64  
 1   Gender                   186 non-null    object 
 2   Age                      186 non-null    int64  
 3   Occupation               186 non-null    object 
 4   Sleep Duration           186 non-null    float64
 5   Quality of Sleep         186 non-null    int64  
 6   Physical Activity Level  186 non-null    int64  
 7   Stress Level             186 non-null    int64  
 8   BMI Category             186 non-null    object 
 9   Blood Pressure           186 non-null    object 
 10  Heart Rate               186 non-null    int64  
 11  Daily Steps              186 non-null    int64  
 12  Sleep Disorder           30 non-null     object 
dtypes: float64(1), int64(7), object(5)
memory usage: 19.0+ KB
```


In [7]:
```python
# Null values revision
df.isna().sum()
```

```
Person ID                    0
Gender                       0
Age                          0
Occupation                   0
Sleep Duration               0
Quality of Sleep             0
Physical Activity Level      0
Stress Level                 0
BMI Category                 0
Blood Pressure               0
Heart Rate                   0
Daily Steps                  0
Sleep Disorder             156
dtype: int64
```

#### Observations:

After using the `.info()` and `.isna()` methods, the Sleep Disorder attribute appears to have 156 null values. However, the earlier look at the dataset shows that these entries are not really missing: a `NaN` in this column means that the person does not suffer from any sleep disorder. The source describes this case with the label "None", which is most likely why pandas reads it as a missing value.
To deal with these values, the `NaN` entries will be replaced with the label "Nothing". This label is not classified as a missing value and can be used for further analysis.
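An alternative to filling the gaps afterwards is to stop pandas from converting the label in the first place. This sketch assumes the raw file stores the absence of a disorder as the string "None", as the source description suggests. An inline sample stands in for `Sleep_health.csv`. With `keep_default_na=False` and no `na_values`, `read_csv` parses no strings as missing, so the original label survives.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for Sleep_health.csv (columns trimmed)
raw_csv = io.StringIO(
    "Person ID,Gender,Sleep Disorder\n"
    "374,Female,Sleep Apnea\n"
    "375,Male,None\n"
)

# keep_default_na=False: no string (including "None") is turned into NaN
df_raw = pd.read_csv(raw_csv, keep_default_na=False)

print(df_raw["Sleep Disorder"].tolist())      # ['Sleep Apnea', 'None']
print(df_raw["Sleep Disorder"].isna().sum())  # 0
```

The trade-off is that genuinely empty cells in any column would then be read as empty strings rather than `NaN`. This is only safe when, as `df.info()` shows here, the other columns have no gaps.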

In [10]:
```python
df.fillna({"Sleep Disorder": "Nothing"}, inplace=True)
```

In [11]:
```python
# Null values revision
df.isna().sum()
```

```
Person ID                  0
Gender                     0
Age                        0
Occupation                 0
Sleep Duration             0
Quality of Sleep           0
Physical Activity Level    0
Stress Level               0
BMI Category               0
Blood Pressure             0
Heart Rate                 0
Daily Steps                0
Sleep Disorder             0
dtype: int64
```

### Checking for duplicates

In [15]:
```python
df.duplicated().sum()
```

```
0
```

### Renaming variables for better understanding

In [16]:
```python
df = df.rename(columns={
    "Sleep Duration": "Sleep_duration",
    "Quality of Sleep": "Sleeping_quality",
    "Physical Activity Level": "Activity_level",
    "Stress Level": "Stress_level",
    "BMI Category": "BMI",
    "Blood Pressure": "Blood_pressure",
    "Heart Rate": "Heart_rate",
    "Daily Steps": "Daily_steps",
    "Sleep Disorder": "Sleep_disorder",
})
```

### Dropping unnecessary columns

In [17]:
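```python
# Person ID only identifies each row, so it is not needed for the analysis
df.drop(columns=["Person ID"], inplace=True)
df
```

|   | Gender | Age | Occupation | Sleep_duration | Sleeping_quality | Activity_level | Stress_level | BMI | Blood_pressure | Heart_rate | Daily_steps | Sleep_disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Female | 59 | Nurse | 8.1 | 9 | 75 | 3 | Overweight | 140/95 | 68 | 7000 | Sleep Apnea |
| 1 | Male | 29 | Doctor | 7.8 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | Nothing |
| 2 | Male | 29 | Doctor | 7.8 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | Nothing |
| 3 | Male | 29 | Doctor | 6.1 | 6 | 30 | 8 | Normal | 120/80 | 70 | 8000 | Nothing |
| 4 | Male | 29 | Doctor | 7.8 | 7 | 75 | 6 | Normal | 120/80 | 70 | 8000 | Nothing |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 181 | Female | 43 | Teacher | 6.7 | 7 | 45 | 4 | Overweight | 135/90 | 65 | 6000 | Insomnia |
| 182 | Male | 43 | Salesperson | 6.5 | 6 | 45 | 7 | Overweight | 130/85 | 72 | 6000 | Insomnia |
| 183 | Female | 43 | Teacher | 6.7 | 7 | 45 | 4 | Overweight | 135/90 | 65 | 6000 | Insomnia |
| 184 | Male | 43 | Salesperson | 6.4 | 6 | 45 | 7 | Overweight | 130/85 | 72 | 6000 | Insomnia |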
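The duplicate check ran while Person ID was still present, so rows that differ only in their ID were not counted. In the preview, rows 1, 2 and 4 are identical in every remaining column. The source does not say whether such rows are genuine repeat measurements or repeated records, so the sketch below only shows how to count them. The toy frame reuses preview rows 1 to 4 with a subset of the original column names; `feature_cols` is an illustrative name.

```python
import pandas as pd

# Rows 1-4 from the df.head() preview, with a subset of columns
toy = pd.DataFrame({
    "Person ID": [375, 376, 377, 378],
    "Gender": ["Male"] * 4,
    "Age": [29] * 4,
    "Sleep Duration": [7.8, 7.8, 6.1, 7.8],
    "Stress Level": [6, 6, 8, 6],
})

# With the unique ID included, no row is an exact duplicate
print(toy.duplicated().sum())  # 0

# Ignoring the ID column, rows 376 and 378 repeat row 375 (keep="first")
feature_cols = [c for c in toy.columns if c != "Person ID"]
print(toy.duplicated(subset=feature_cols).sum())  # 2
```

On the full dataframe, re-running `df.duplicated().sum()` after the drop gives the equivalent count.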


### Getting further information on relevant attributes

In [18]:
```python
print(f'Gender: {df["Gender"].unique()}')
print(f'Gender: {df["Gender"].nunique()}')
```

```
Gender: ['Female' 'Male']
Gender: 2
```


In [19]:
```python
print(f'Age: {df["Age"].unique()}')
print(f'Age: {df["Age"].nunique()}')
```

```
Age: [59 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43]
Age: 16
```


In [20]:
```python
print(f'Occupation: {df["Occupation"].unique()}')
print(f'Occupation: {df["Occupation"].nunique()}')
```

```
Occupation: ['Nurse' 'Doctor' 'Engineer' 'Accountant' 'Scientist' 'Teacher'
 'Software Engineer' 'Lawyer' 'Salesperson']
Occupation: 9
```


In [21]:
```python
print(f'BMI: {df["BMI"].unique()}')
print(f'BMI: {df["BMI"].nunique()}')
```

```
BMI: ['Overweight' 'Normal' 'Normal Weight' 'Obese']
BMI: 4
```


In [22]:
```python
print(f'Sleep_disorder: {df["Sleep_disorder"].unique()}')
print(f'Sleep_disorder: {df["Sleep_disorder"].nunique()}')
```

```
Sleep_disorder: ['Sleep Apnea' 'Nothing' 'Insomnia']
Sleep_disorder: 3
```


#### Observations:

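The ages included in the study range from 29 to 59, although apart from 59 every recorded age falls between 29 and 43. The occupations are varied, with nine professions across several areas:
- medicine (Nurse, Doctor)
- engineering (Engineer, Software Engineer)
- finance (Accountant)
- science (Scientist)
- education (Teacher)
- law (Lawyer)
- sales (Salesperson)

The BMI attribute has four labels: Normal, Normal Weight, Overweight, and Obese. "Normal" and "Normal Weight" appear to describe the same category, and no user is labeled Underweight even though the source lists that category. As for sleep disorders, only 30 of the 186 users (about 16%) suffer from either insomnia or sleep apnea, while the remaining 156 have none.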
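If "Normal" and "Normal Weight" are indeed the same category, merging them before any grouping avoids splitting that group in two. Below is a minimal sketch on toy values, assuming "Normal" is the label to keep. The choice of target label is an assumption, not something the source specifies.

```python
import pandas as pd

# Toy BMI values using the four labels found in the dataset
bmi = pd.Series(["Overweight", "Normal", "Normal Weight", "Obese", "Normal"], name="BMI")

# Map the duplicate label onto the one we keep; other labels are unchanged
bmi_clean = bmi.replace({"Normal Weight": "Normal"})

print(bmi.nunique())        # 4
print(bmi_clean.nunique())  # 3
print(bmi_clean.value_counts())
```

On the full dataframe the equivalent step would be `df["BMI"] = df["BMI"].replace({"Normal Weight": "Normal"})`.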

### Descriptive statistics

In [23]:
```python
df.describe()
```

|   | Age | Sleep_duration | Sleeping_quality | Activity_level | Stress_level | Heart_rate | Daily_steps |
|---|---|---|---|---|---|---|---|
| count | 186.0 | 186.0 | 186.0 | 186.0 | 186.0 | 186.0 | 186.0 |
| mean | 35.672043 | 7.083333 | 7.198925 | 58.016129 | 5.607527 | 70.204301 | 6829.569892 |
| std | 4.436333 | 0.623323 | 0.996317 | 18.09658 | 1.452583 | 3.293102 | 1331.179858 |
| min | 29.0 | 5.8 | 4.0 | 30.0 | 3.0 | 65.0 | 3300.0 |
| 25% | 32.0 | 6.6 | 6.0 | 45.0 | 4.0 | 68.0 | 5500.0 |
| 50% | 36.0 | 7.2 | 8.0 | 60.0 | 5.0 | 70.0 | 7000.0 |
| 75% | 39.0 | 7.6 | 8.0 | 75.0 | 6.75 | 72.0 | 8000.0 |
| max | 59.0 | 8.1 | 9.0 | 90.0 | 8.0 | 84.0 | 8000.0 |
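In the Age column, the maximum of 59 sits far above the 75th percentile of 39. Tukey's 1.5 × IQR rule, which the original analysis does not use, is one common way to make that gap concrete. With Q1 = 32 and Q3 = 39 from the table above, IQR = 7 and the upper fence is 39 + 1.5 × 7 = 49.5, so 59 would be flagged as an outlier. Below is a minimal sketch of the rule on a pandas Series; the helper name `iqr_fences` is hypothetical.

```python
import pandas as pd

def iqr_fences(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# The distinct ages printed earlier; quartiles of these distinct values
# differ from the quartiles of all 186 rows shown by df.describe()
ages = pd.Series([59, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43])
lower, upper = iqr_fences(ages)
outliers = ages[(ages < lower) | (ages > upper)].tolist()
print(lower, upper)
print(outliers)
```

On the full dataset, `iqr_fences(df["Age"])` would use the 186-row quartiles (32 and 39), which give the 21.5 and 49.5 fences. Both `describe()` and `quantile()` default to linear interpolation.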


#### Observations