# Sleep Health and Lifestyle

This synthetic dataset contains sleep and cardiovascular metrics, as well as lifestyle factors, for 374 fictitious people.

The workspace is set up with one CSV file, `data.csv`, with the following columns:

- `Person ID`
- `Gender`
- `Age`
- `Occupation`
- `Sleep Duration`: Average number of hours of sleep per day
- `Quality of Sleep`: A subjective rating on a 1-10 scale
- `Physical Activity Level`: Average number of minutes the person engages in physical activity daily
- `Stress Level`: A subjective rating on a 1-10 scale
- `BMI Category`
- `Blood Pressure`: Indicated as systolic pressure over diastolic pressure
- `Heart Rate`: In beats per minute
- `Daily Steps`
- `Sleep Disorder`: One of `None`, `Insomnia` or `Sleep Apnea`
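
Because `Blood Pressure` is stored as text in the form systolic/diastolic, it usually has to be split into two numeric columns before modeling. The snippet below is a minimal sketch of one way to do that on three sample readings taken from the rows shown later in this notebook. The column names `Systolic` and `Diastolic` are assumptions and are not part of the dataset.

```python
import pandas as pd

# Three sample readings in the "systolic/diastolic" format used by `Blood Pressure`.
sample = pd.DataFrame({"Blood Pressure": ["126/83", "125/80", "140/90"]})

# Split on "/" into two columns and convert both parts to integers.
# `Systolic` and `Diastolic` are hypothetical column names chosen for this sketch.
bp = sample["Blood Pressure"].str.split("/", expand=True).astype(int)
sample["Systolic"] = bp[0]
sample["Diastolic"] = bp[1]

print(sample)
```

The same two assignments should work on the full DataFrame, provided every row follows the systolic/diastolic pattern.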

Check out the guiding questions or the scenario described below to get started with this dataset!
Feel free to make this workspace yours by adding and removing cells, or editing any of the existing cells.

Source: [Kaggle](https://www.kaggle.com/datasets/uom190346a/sleep-health-and-lifestyle-dataset/)

### 🔍 **Scenario: Automatically identify potential sleep disorders**

This scenario helps you develop an end-to-end project for your portfolio.

**Background**: You work for a health insurance company and are tasked with identifying whether a potential client is likely to have a sleep disorder. The company wants to use this information to determine the premium the client will pay.

**Objective**: Construct a classifier to predict the presence of a sleep disorder based on the other columns in the dataset.

Check out our [Linear Classifiers course](https://app.datacamp.com/learn/courses/linear-classifiers-in-python) (Python) or [Supervised Learning course](https://app.datacamp.com/learn/courses/supervised-learning-in-r-classification) (R) for a quick introduction to building classifiers.
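
As a rough illustration of the objective, the sketch below fits a linear classifier with scikit-learn. It assumes the features are already numeric and that the target is a 0/1 column such as the `is_sleep_disorder` column created later in this notebook. The helper name `fit_sleep_disorder_classifier`, the choice of logistic regression, and the toy values are all assumptions made for illustration, not the notebook's method or real data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_sleep_disorder_classifier(df, feature_cols, target_col="is_sleep_disorder", seed=0):
    """Fit a scaled logistic regression and return (model, held-out accuracy)."""
    X = df[feature_cols]
    y = df[target_col]
    # Hold out 25% of the rows, keeping the class balance in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Toy, made-up rows that only demonstrate the call; they are not dataset values.
toy = pd.DataFrame({
    "Sleep Duration": [7.5, 7.8, 8.0, 7.2, 5.9, 6.0, 6.1, 5.8],
    "Stress Level": [3, 4, 3, 5, 8, 8, 7, 8],
    "is_sleep_disorder": [0, 0, 0, 0, 1, 1, 1, 1],
})
cols = ["Sleep Duration", "Stress Level"]
model, accuracy = fit_sleep_disorder_classifier(toy, cols)
print(f"held-out accuracy: {accuracy:.2f}")
```

Scaling is wrapped in the pipeline so that it is fitted on the training rows only, which avoids leaking information from the held-out rows.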



# Project

## Importing Required Libraries

In [65]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Loading Data

In [66]:
file_path = r"C:\Users\LENOVO\Desktop\sleep_health_and_life_style\data\data.csv"
data = pd.read_csv(file_path)

In [67]:
data.head()

| | Person ID | Gender | Age | Occupation | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | BMI Category | Blood Pressure | Heart Rate | Daily Steps | Sleep Disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 27 | Software Engineer | 6.1 | 6 | 42 | 6 | Overweight | 126/83 | 77 | 4200 | NaN |
| 1 | 2 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN |
| 2 | 3 | Male | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | NaN |
| 3 | 4 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |
| 4 | 5 | Male | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea |


## First Look at the Dataset

In [68]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374 entries, 0 to 373
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Person ID                374 non-null    int64  
 1   Gender                   374 non-null    object 
 2   Age                      374 non-null    int64  
 3   Occupation               374 non-null    object 
 4   Sleep Duration           374 non-null    float64
 5   Quality of Sleep         374 non-null    int64  
 6   Physical Activity Level  374 non-null    int64  
 7   Stress Level             374 non-null    int64  
 8   BMI Category             374 non-null    object 
 9   Blood Pressure           374 non-null    object 
 10  Heart Rate               374 non-null    int64  
 11  Daily Steps              374 non-null    int64  
 12  Sleep Disorder           155 non-null    object 
dtypes: float64(1), int64(7), object(5)
memory usage: 38.1+ KB


In [69]:
data['Gender'].value_counts()

Gender
Male      189
Female    185
Name: count, dtype: int64

In [70]:
data['Occupation'].value_counts()

Occupation
Nurse                   73
Doctor                  71
Engineer                63
Lawyer                  47
Teacher                 40
Accountant              37
Salesperson             32
Scientist                4
Software Engineer        4
Sales Representative     2
Manager                  1
Name: count, dtype: int64

In [71]:
data["Sleep Disorder"].value_counts()

Sleep Disorder
Sleep Apnea    78
Insomnia       77
Name: count, dtype: int64

In [72]:
# Assign the result back instead of calling fillna(..., inplace=True) on a column
# selection, which triggers a chained-assignment warning and stops working in pandas 3.0.
data["Sleep Disorder"] = data["Sleep Disorder"].fillna("Healthy")
data["Sleep Disorder"].value_counts()


Sleep Disorder
Healthy        219
Sleep Apnea     78
Insomnia        77
Name: count, dtype: int64
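
The 219 missing values match the people that the column description labels `None`. A likely explanation is that recent pandas versions treat the literal text `None` as a missing-value marker when reading a CSV, so those rows arrive as `NaN`. The sketch below shows one way to keep the original label at load time instead of filling it in afterwards. The two-row CSV is made up to mimic the raw file and is not taken from it.

```python
import io

import pandas as pd

# Two made-up rows that mimic the raw file, where people without a disorder are labelled "None".
raw = io.StringIO("Person ID,Sleep Disorder\n1,None\n2,Sleep Apnea\n")

# keep_default_na=False switches off pandas' built-in list of NA markers,
# so the literal text "None" is kept as an ordinary string.
df = pd.read_csv(raw, keep_default_na=False)
print(df["Sleep Disorder"].tolist())
```

Note that `keep_default_na=False` applies to every column, so genuinely empty cells would be read as empty strings rather than `NaN`.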

## Feature Engineering

In [73]:
# Drop the identifier column; errors="ignore" makes the cell safe to re-run.
data = data.drop(columns="Person ID", errors="ignore")

### is_sleep_disorder

In [74]:
# Binary target: 1 for Insomnia or Sleep Apnea, 0 for Healthy.
data['is_sleep_disorder'] = (data['Sleep Disorder'] != 'Healthy').astype(int)

### Gender

In [75]:
# Binary encoding: Female -> 0, any other value (here: Male) -> 1.
data['Gender'] = (data['Gender'] != 'Female').astype(int)
data.head()

| | Gender | Age | Occupation | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | BMI Category | Blood Pressure | Heart Rate | Daily Steps | Sleep Disorder | is_sleep_disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 27 | Software Engineer | 6.1 | 6 | 42 | 6 | Overweight | 126/83 | 77 | 4200 | Healthy | 0 |
| 1 | 1 | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | Healthy | 0 |
| 2 | 1 | 28 | Doctor | 6.2 | 6 | 60 | 8 | Normal | 125/80 | 75 | 10000 | Healthy | 0 |
| 3 | 1 | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea | 1 |
| 4 | 1 | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | Obese | 140/90 | 85 | 3000 | Sleep Apnea | 1 |


### BMI Category

In [76]:
# BMI Category
data['BMI Category'].value_counts()

BMI Category
Normal           195
Overweight       148
Normal Weight     21
Obese             10
Name: count, dtype: int64

In [77]:
# "Normal Weight" and "Normal" describe the same category, so merge them.
data.loc[data["BMI Category"] == "Normal Weight", "BMI Category"] = "Normal"
data['BMI Category'].value_counts()

BMI Category
Normal        216
Overweight    148
Obese          10
Name: count, dtype: int64

In [78]:
# Ordinal encoding: Normal -> 0, Overweight -> 1, any other value (here: Obese) -> 3.
data['BMI Category'] = data['BMI Category'].apply(lambda x: 0 if x == 'Normal' else 1 if x == 'Overweight' else 3)
data["BMI Category"] = data["BMI Category"].astype(int)
data.head()

| | Gender | Age | Occupation | Sleep Duration | Quality of Sleep | Physical Activity Level | Stress Level | BMI Category | Blood Pressure | Heart Rate | Daily Steps | Sleep Disorder | is_sleep_disorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 27 | Software Engineer | 6.1 | 6 | 42 | 6 | 1 | 126/83 | 77 | 4200 | Healthy | 0 |
| 1 | 1 | 28 | Doctor | 6.2 | 6 | 60 | 8 | 0 | 125/80 | 75 | 10000 | Healthy | 0 |
| 2 | 1 | 28 | Doctor | 6.2 | 6 | 60 | 8 | 0 | 125/80 | 75 | 10000 | Healthy | 0 |
| 3 | 1 | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | 3 | 140/90 | 85 | 3000 | Sleep Apnea | 1 |
| 4 | 1 | 28 | Sales Representative | 5.9 | 4 | 30 | 8 | 3 | 140/90 | 85 | 3000 | Sleep Apnea | 1 |
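
At this point `Occupation` is still a text column, which most classifiers cannot consume directly. One common option, shown here only as a sketch and not as a step this notebook performs, is one-hot encoding with `pd.get_dummies`. The four sample values are occupation names that appear in the dataset; the resulting `Occupation_*` column names follow pandas' default naming.

```python
import pandas as pd

# A few occupation values that occur in the dataset.
sample = pd.DataFrame({"Occupation": ["Doctor", "Nurse", "Doctor", "Manager"]})

# One indicator column per occupation; dtype=int yields 0/1 instead of booleans.
encoded = pd.get_dummies(sample, columns=["Occupation"], dtype=int)
print(encoded)
```

Occupations with very few rows, such as `Manager` with a single person, produce indicator columns that are almost always zero, so grouping rare occupations before encoding may be worth considering.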


## Save Data

In [79]:
data.to_csv(r"C:\Users\LENOVO\Desktop\sleep_health_and_life_style\data\cleaned_data.csv", index=False)