In [1]:
#importing the necessary libraries
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns
import pandas as pd

# Sleep Disorder Prediction

This project analyzes lifestyle and medical variables, such as age, BMI, physical activity, sleep duration, and blood pressure, to predict whether a person has a sleep disorder and, if so, which type.

### Data Dictionary

| Column Name | Description |
| --- | --- |
| Person_ID | Unique ID assigned to each person |
| Gender | The gender of the person (Male/Female) |
| Age | Age of the person in years |
| Occupation | The occupation of the person |
| Sleep_duration | The duration of sleep of the person in hours |
| Quality_of_sleep | A subjective rating of the quality of sleep, ranging from 1 to 10 |
| Physical_activity | The level of physical activity of the person (Low/Medium/High) |
| Stress Level | A subjective rating of the stress level, ranging from 1 to 10 |
| BMI_category | The BMI category of the person (Underweight/Normal/Overweight/Obesity) |
| Blood Pressure | The blood pressure of the person in mmHg, recorded as systolic/diastolic |
| Heart_rate | The heart rate of the person in beats per minute |
| Daily Steps | The number of steps taken by the person per day |
| Sleep Disorder | The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea) |


In [None]:
# loading the dataset
df = pd.read_csv('Sleep_health_and_lifestyle_dataset.csv')
df.head()

In [None]:
# checking for missing values
df.isnull().sum()


In [None]:
# replacing the null values with 'None' in the column 'Sleep Disorder'
# (assigning the result back avoids the deprecated chained inplace fillna)
df['Sleep Disorder'] = df['Sleep Disorder'].fillna('None')

## Data Preprocessing Part 1


In [None]:
# splitting the blood pressure into two integer columns (systolic/diastolic)
df['systolic_bp'] = df['Blood Pressure'].apply(lambda x: int(x.split('/')[0]))
df['diastolic_bp'] = df['Blood Pressure'].apply(lambda x: int(x.split('/')[1]))
# dropping the blood pressure column
df.drop('Blood Pressure', axis=1, inplace=True)
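
The classifiers used later in this notebook need numeric inputs, while columns such as Gender and Occupation hold text. The sketch below shows one possible way to handle this, one-hot encoding with `pd.get_dummies`. The toy DataFrame and its values are invented for illustration, and the encoding step behind the results reported in this notebook is not shown here, so treat this as an assumption rather than the original method.

```python
import pandas as pd

# Hypothetical toy frame standing in for df; the values are invented.
toy = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female'],
    'Occupation': ['Doctor', 'Nurse', 'Doctor'],
    'Age': [27, 35, 41],
})

# One-hot encode the text columns; numeric columns such as Age are left as-is.
encoded = pd.get_dummies(toy, columns=['Gender', 'Occupation'])

print(encoded.columns.tolist())
```

Applying the same call to the feature columns of `df` before splitting would give the models an all-numeric feature matrix.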


In [None]:
# formatting graphs
sns.set(style="whitegrid")
%matplotlib inline


In [None]:
# splitting dataset into training and testing sets
from sklearn.model_selection import train_test_split
X = df.drop('Sleep Disorder', axis=1)   # features
y = df['Sleep Disorder']                # target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
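
When the target classes are imbalanced, a purely random split can leave the test set with different class proportions than the full data. A minimal sketch of one remedy, the `stratify` argument of `train_test_split`, is given below on invented labels; the class counts are assumptions chosen for illustration and do not describe the actual dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented, imbalanced target: 60% 'None', 20% 'Insomnia', 20% 'Sleep Apnea'.
y_toy = pd.Series(['None'] * 60 + ['Insomnia'] * 20 + ['Sleep Apnea'] * 20)
X_toy = pd.DataFrame({'feature': range(len(y_toy))})

# stratify=y_toy keeps the class proportions the same in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

print(y_te.value_counts(normalize=True))
```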


## Model Building

To predict sleep disorders using classification algorithms, we will use:
1. Decision Tree Classifier
2. Random Forest Classifier


### Decision Tree Classifier

In [None]:
from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
dtree


In [None]:
# Training the model with training dataset
dtree.fit(X_train, y_train)

# Training accuracy
print("Training Accuracy:", dtree.score(X_train, y_train))
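
An unconstrained decision tree can memorise its training set, so training accuracy alone tends to overstate performance. The sketch below illustrates a fairer check, k-fold cross-validation with `cross_val_score`. It runs on synthetic data from `make_classification`; the synthetic data and the choice of five folds are assumptions for illustration, not part of the original analysis.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for X_train / y_train (an assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=42,
)

# Accuracy on the same data the tree was fitted on.
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_demo, y_demo)
train_acc = tree.score(X_demo, y_demo)

# Each of the 5 folds is scored by a tree trained on the other 4 folds.
cv_scores = cross_val_score(
    DecisionTreeClassifier(random_state=42), X_demo, y_demo, cv=5
)

print("Training accuracy:", train_acc)
print("Cross-validated accuracy: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))
```

A large gap between the two numbers is a sign of overfitting, which limits such as `max_depth` can reduce.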



### Decision Tree Model Evaluation


In [None]:
# make predictions
d_pred = dtree.predict(X_test)
d_pred


In [None]:
# display confusion matrix
from sklearn.metrics import confusion_matrix
sns.heatmap(confusion_matrix(y_test, d_pred), annot=True, cmap='Blues', fmt='g')
plt.title('Confusion Matrix')
# confusion_matrix puts actual classes on rows (y-axis) and predicted classes on columns (x-axis)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()


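The confusion matrix shows correct predictions along the diagonal, where the predicted class matches the actual class. The off-diagonal cells count misclassifications: each one is a case of the row's actual class that the model assigned to the column's predicted class.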
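
To make the reading of the matrix concrete, the sketch below builds a small confusion matrix from invented labels (not the notebook's results) and derives accuracy and per-class recall from it. In scikit-learn's convention, rows correspond to actual classes and columns to predicted classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ['Insomnia', 'None', 'Sleep Apnea']

# Invented labels for illustration only.
y_true = ['None', 'None', 'None', 'Insomnia', 'Insomnia',
          'Sleep Apnea', 'Sleep Apnea', 'None']
y_hat = ['None', 'None', 'Insomnia', 'Insomnia', 'Sleep Apnea',
         'Sleep Apnea', 'Sleep Apnea', 'None']

cm = confusion_matrix(y_true, y_hat, labels=labels)

# Accuracy: correct predictions (the diagonal) over all predictions.
accuracy = np.trace(cm) / cm.sum()

# Recall per class: diagonal entry over the row total (all actual cases of that class).
recall_per_class = np.diag(cm) / cm.sum(axis=1)

print(cm)
print("Accuracy:", accuracy)
print(dict(zip(labels, recall_per_class)))
```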


### Distribution Plot for Predicted and Actual Values

In [None]:
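# visualize how well the model's predicted class proportions match the actual ones
# (sns.distplot is deprecated and expects numeric data, so a grouped bar chart is used)
dist = pd.DataFrame({
    'Actual Value': pd.Series(y_test).value_counts(normalize=True),
    'Fitted Values': pd.Series(d_pred).value_counts(normalize=True),
}).fillna(0)
dist.plot(kind='bar', color=['r', 'b'], rot=0)
plt.title('Actual vs Fitted Values for Sleep Disorder Prediction')
plt.xlabel('Sleep Disorder')
plt.ylabel('Proportion of People')
plt.show()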


The actual values are shown in red and the predicted values in blue. The predicted distribution follows the actual one, but the noticeable differences show that the model does not classify every case correctly. Note that this plot compares the two distributions only in aggregate, so it cannot show which individual cases were misclassified.


### Classification Report

In [None]:
# generate report
from sklearn.metrics import classification_report
print(classification_report(y_test, d_pred))


The model achieves an accuracy of 87% with an average F1 score of 0.83, indicating it performs fairly well in predicting sleep disorders.
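
The classification report lists both a macro average, where every class counts equally, and a weighted average, where classes are weighted by their support. The two can differ noticeably when classes are imbalanced. The sketch below computes both with `f1_score` on invented labels; the numbers are illustrative and unrelated to the results above.

```python
from sklearn.metrics import f1_score

# Invented, imbalanced labels for illustration only.
y_true = ['None'] * 6 + ['Insomnia'] * 2 + ['Sleep Apnea'] * 2
y_hat = ['None'] * 6 + ['Insomnia', 'None'] + ['Sleep Apnea', 'Insomnia']

# Macro: unweighted mean of the per-class F1 scores.
macro_f1 = f1_score(y_true, y_hat, average='macro')

# Weighted: mean of the per-class F1 scores, weighted by class support.
weighted_f1 = f1_score(y_true, y_hat, average='weighted')

print("Macro F1:    %.3f" % macro_f1)
print("Weighted F1: %.3f" % weighted_f1)
```

Here the majority class is predicted well, so the weighted average comes out higher than the macro average.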

### Random Forest Classifier


In [None]:
# initialize RandomForestClassifier
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators=100, random_state=42)


In [None]:
# training the model with train dataset
rfc.fit(X_train, y_train)
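
A fitted forest can be evaluated in the same way as the decision tree. The following sketch does so on synthetic data generated with `make_classification`, an assumption made to keep the example self-contained; in the notebook, `X_train`, `X_test`, `y_train` and `y_test` from the earlier split would take its place. No results for the actual dataset are implied.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for the sleep dataset (an assumption).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=10, n_informative=5,
    n_classes=3, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Same settings as the rfc defined above.
rfc_demo = RandomForestClassifier(n_estimators=100, random_state=42)
rfc_demo.fit(X_tr, y_tr)

# Score on held-out data only, never on the training rows.
rf_pred = rfc_demo.predict(X_te)
rf_acc = accuracy_score(y_te, rf_pred)

print("Test accuracy:", rf_acc)
print(classification_report(y_te, rf_pred))
```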


## Conclusion

This project explored the prediction of sleep disorders using classification algorithms. The Decision Tree model achieved an accuracy of 87% with an average F1 score of 0.83. A Random Forest classifier was also trained but not evaluated here; evaluating it on the test set and tuning the hyperparameters of both models could potentially improve prediction accuracy.
