# Problem Statement: Impact of Lifestyle on Sleep Health

# Introduction

Sleep plays a vital role in maintaining overall health and well-being, yet lifestyle factors can significantly affect both sleep quality and sleep duration. Understanding the relationship between lifestyle choices and sleep health is essential for individuals seeking to improve their sleep patterns. For a data scientist, analyzing these factors can reveal which habits are associated with sleep disturbances and help individuals make informed decisions to improve their sleep health.

# Dataset Overview

The Sleep Health and Lifestyle Dataset comprises 374 rows and 13 columns, covering sleep-related variables and daily habits. The columns include sleep duration, sleep quality, physical activity level, stress level, BMI category, blood pressure, heart rate, daily steps, and the presence or absence of a sleep disorder.

# Key Features of the Dataset

Comprehensive Sleep Metrics: The dataset includes variables related to sleep duration, quality, and factors influencing sleep patterns. These metrics allow for a detailed analysis of sleep-related aspects.

Lifestyle Factors: The dataset provides information on various lifestyle factors, such as physical activity levels and stress levels. These variables allow for the exploration of how lifestyle choices impact sleep health.

Cardiovascular Health: Blood pressure and heart rate measurements are included in the dataset. These variables enable the examination of the relationship between cardiovascular health and sleep-related factors.

Sleep Disorder Analysis: The presence or absence of sleep disorders, such as Insomnia and Sleep Apnea, is indicated in the dataset. This information allows for the identification and analysis of sleep disorders within the context of other variables.

# Dataset Columns

The dataset consists of the following columns:

Person ID: An identifier for each individual in the dataset.

Gender: The gender of the person (Male/Female).

Age: The age of the person in years.

Occupation: The occupation or profession of the person.

Sleep Duration (hours): The number of hours the person sleeps per day.

Quality of Sleep (scale: 1-10): A subjective rating of the quality of sleep, ranging from 1 to 10.

Physical Activity Level (minutes/day): The number of minutes the person engages in physical activity daily.

Stress Level (scale: 1-10): A subjective rating of the stress level experienced by the person, ranging from 1 to 10.

BMI Category: The BMI category of the person (e.g., Underweight, Normal, Overweight).

Blood Pressure (systolic/diastolic): The blood pressure measurement of the person, indicated as systolic pressure over diastolic pressure.

Heart Rate (bpm): The resting heart rate of the person in beats per minute.

Daily Steps: The number of steps the person takes per day.

Sleep Disorder: The presence or absence of a sleep disorder in the person (None, Insomnia, Sleep Apnea).

This dataset provides a rich source of information for exploring the impact of various lifestyle factors on sleep health. Analyzing this data can yield valuable insights and assist in developing strategies to improve sleep quality and overall well-being.

# Importing

In [1]:
# Import the libraries used in this notebook
import numpy as np
import pandas as pd
import os
import plotly.graph_objs as go
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')

In [2]:
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/sleep-health-and-lifestyle-dataset/Sleep_health_and_lifestyle_dataset.csv


# Load the dataset

In [3]:

df = pd.read_csv('/kaggle/input/sleep-health-and-lifestyle-dataset/Sleep_health_and_lifestyle_dataset.csv')
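One thing to watch when loading this file: the `Sleep Disorder` column uses the literal string `None` for people without a disorder, and depending on the pandas version that string may be parsed as a missing value by default. The sketch below uses a small inline CSV (a stand-in for the Kaggle file, not real rows) to show one way to keep `None` as an ordinary label. Whether this is needed depends on the environment.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV; the values are illustrative only.
csv_text = "Person ID,Sleep Disorder\n1,None\n2,Sleep Apnea\n3,Insomnia\n"

# Turn off the default NA strings and treat only empty fields as missing,
# so the text "None" survives as a category label.
demo = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

disorder_labels = demo["Sleep Disorder"].tolist()
print(disorder_labels)
```

The same two keyword arguments can be passed to the `pd.read_csv` call above if the column loads with missing values.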


# Check the head and tail of the dataset

In [4]:
df.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,126/83,77,4200,
1,2,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
2,3,Male,28,Doctor,6.2,6,60,8,Normal,125/80,75,10000,
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,140/90,85,3000,Sleep Apnea


In [5]:
df.tail()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Blood Pressure,Heart Rate,Daily Steps,Sleep Disorder
369,370,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
370,371,Female,59,Nurse,8.0,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
371,372,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
372,373,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea
373,374,Female,59,Nurse,8.1,9,75,3,Overweight,140/95,68,7000,Sleep Apnea


# Data Outline and Preprocessing


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374 entries, 0 to 373
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Person ID                374 non-null    int64  
 1   Gender                   374 non-null    object 
 2   Age                      374 non-null    int64  
 3   Occupation               374 non-null    object 
 4   Sleep Duration           374 non-null    float64
 5   Quality of Sleep         374 non-null    int64  
 6   Physical Activity Level  374 non-null    int64  
 7   Stress Level             374 non-null    int64  
 8   BMI Category             374 non-null    object 
 9   Blood Pressure           374 non-null    object 
 10  Heart Rate               374 non-null    int64  
 11  Daily Steps              374 non-null    int64  
 12  Sleep Disorder           374 non-null    object 
dtypes: float64(1), int64(7), object(5)
memory usage: 38.1+ KB


In [7]:
df.describe()

Unnamed: 0,Person ID,Age,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,Heart Rate,Daily Steps
count,374.0,374.0,374.0,374.0,374.0,374.0,374.0,374.0
mean,187.5,42.184492,7.132086,7.312834,59.171123,5.385027,70.165775,6816.84492
std,108.108742,8.673133,0.795657,1.196956,20.830804,1.774526,4.135676,1617.915679
min,1.0,27.0,5.8,4.0,30.0,3.0,65.0,3000.0
25%,94.25,35.25,6.4,6.0,45.0,4.0,68.0,5600.0
50%,187.5,43.0,7.2,7.0,60.0,5.0,70.0,7000.0
75%,280.75,50.0,7.8,8.0,75.0,7.0,72.0,8000.0
max,374.0,59.0,8.5,9.0,90.0,8.0,86.0,10000.0


In [8]:
print('Unique Values of Occupation are', df['Occupation'].unique())

print('\nUnique Values of BMI Category are', df['BMI Category'].unique())

print('\nUnique Values of Sleep Disorder are', df['Sleep Disorder'].unique())


Unique Values of Occupation are ['Software Engineer' 'Doctor' 'Sales Representative' 'Teacher' 'Nurse'
 'Engineer' 'Accountant' 'Scientist' 'Lawyer' 'Salesperson' 'Manager']

Unique Values of BMI Category are ['Overweight' 'Normal' 'Obese' 'Normal Weight']

Unique Values of Sleep Disorder are ['None' 'Sleep Apnea' 'Insomnia']
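The BMI output lists both `Normal` and `Normal Weight`. These look like two spellings of the same category, although that is an assumption, since the dataset description does not say so. If they are meant to be the same, a minimal sketch of merging them before any encoding could look like this (shown on a toy Series rather than on `df`):

```python
import pandas as pd

# The four labels printed above, as a toy Series.
bmi = pd.Series(["Overweight", "Normal", "Obese", "Normal Weight"], name="BMI Category")

# Map the alternate spelling onto the main one; other labels pass through unchanged.
bmi_clean = bmi.replace({"Normal Weight": "Normal"})

bmi_levels = sorted(bmi_clean.unique())
print(bmi_levels)
```

Applying this to the notebook would change the integer codes produced by the later label-encoding step, so the encoded outputs shown further down would no longer match.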


# Preprocessing - Split 'Blood Pressure' into upper (systolic) and lower (diastolic) values

In [9]:
df['Blood Pressure'].unique()

array(['126/83', '125/80', '140/90', '120/80', '132/87', '130/86',
       '117/76', '118/76', '128/85', '131/86', '128/84', '115/75',
       '135/88', '129/84', '130/85', '115/78', '119/77', '121/79',
       '125/82', '135/90', '122/80', '142/92', '140/95', '139/91',
       '118/75'], dtype=object)

In [10]:
df1 = pd.concat([df, df['Blood Pressure'].str.split('/', expand=True)], axis=1).drop('Blood Pressure', axis=1)

In [11]:
df1

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Heart Rate,Daily Steps,Sleep Disorder,0,1
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,77,4200,,126,83
1,2,Male,28,Doctor,6.2,6,60,8,Normal,75,10000,,125,80
2,3,Male,28,Doctor,6.2,6,60,8,Normal,75,10000,,125,80
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,85,3000,Sleep Apnea,140,90
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,85,3000,Sleep Apnea,140,90
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
369,370,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,Sleep Apnea,140,95
370,371,Female,59,Nurse,8.0,9,75,3,Overweight,68,7000,Sleep Apnea,140,95
371,372,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,Sleep Apnea,140,95
372,373,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,Sleep Apnea,140,95


In [12]:
df1 = df1.rename(columns={0: 'BloodPressure_Upper_Value', 1: 'BloodPressure_Lower_Value'})

In [13]:
df1

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Heart Rate,Daily Steps,Sleep Disorder,BloodPressure_Upper_Value,BloodPressure_Lower_Value
0,1,Male,27,Software Engineer,6.1,6,42,6,Overweight,77,4200,,126,83
1,2,Male,28,Doctor,6.2,6,60,8,Normal,75,10000,,125,80
2,3,Male,28,Doctor,6.2,6,60,8,Normal,75,10000,,125,80
3,4,Male,28,Sales Representative,5.9,4,30,8,Obese,85,3000,Sleep Apnea,140,90
4,5,Male,28,Sales Representative,5.9,4,30,8,Obese,85,3000,Sleep Apnea,140,90
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
369,370,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,Sleep Apnea,140,95
370,371,Female,59,Nurse,8.0,9,75,3,Overweight,68,7000,Sleep Apnea,140,95
371,372,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,Sleep Apnea,140,95
372,373,Female,59,Nurse,8.1,9,75,3,Overweight,68,7000,Sleep Apnea,140,95


In [14]:
df1['BloodPressure_Upper_Value'] = df1['BloodPressure_Upper_Value'].astype(float)
df1['BloodPressure_Lower_Value'] = df1['BloodPressure_Lower_Value'].astype(float)
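The split, rename, and type conversion above can also be done in one pass. The following is a sketch on a three-row toy frame, not a replacement for the cells above; it reuses the same column names.

```python
import pandas as pd

# Toy frame with the same "systolic/diastolic" string format as the dataset.
bp = pd.DataFrame({"Blood Pressure": ["126/83", "125/80", "140/90"]})

# Split on "/", convert both parts to float, and name the columns in one step.
parts = bp["Blood Pressure"].str.split("/", expand=True).astype(float)
parts.columns = ["BloodPressure_Upper_Value", "BloodPressure_Lower_Value"]

# Attach the numeric columns and drop the original string column.
bp_split = pd.concat([bp.drop(columns="Blood Pressure"), parts], axis=1)
print(bp_split)
```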


In [15]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 374 entries, 0 to 373
Data columns (total 14 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Person ID                  374 non-null    int64  
 1   Gender                     374 non-null    object 
 2   Age                        374 non-null    int64  
 3   Occupation                 374 non-null    object 
 4   Sleep Duration             374 non-null    float64
 5   Quality of Sleep           374 non-null    int64  
 6   Physical Activity Level    374 non-null    int64  
 7   Stress Level               374 non-null    int64  
 8   BMI Category               374 non-null    object 
 9   Heart Rate                 374 non-null    int64  
 10  Daily Steps                374 non-null    int64  
 11  Sleep Disorder             374 non-null    object 
 12  BloodPressure_Upper_Value  374 non-null    float64
 13  BloodPressure_Lower_Value  374 non-null    float64

# Handling Categorical Variables

In [16]:
# Encode the categorical columns as integer codes
from sklearn import preprocessing

# A single LabelEncoder is refit on each column in turn,
# so afterwards it only remembers the classes of the last column
label_encoder = preprocessing.LabelEncoder()
df1['Gender'] = label_encoder.fit_transform(df1['Gender'])
df1['Occupation'] = label_encoder.fit_transform(df1['Occupation'])
df1['BMI Category'] = label_encoder.fit_transform(df1['BMI Category'])
df1['Sleep Disorder'] = label_encoder.fit_transform(df1['Sleep Disorder'])
df1.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Heart Rate,Daily Steps,Sleep Disorder,BloodPressure_Upper_Value,BloodPressure_Lower_Value
0,1,1,27,9,6.1,6,42,6,3,77,4200,1,126.0,83.0
1,2,1,28,1,6.2,6,60,8,0,75,10000,1,125.0,80.0
2,3,1,28,1,6.2,6,60,8,0,75,10000,1,125.0,80.0
3,4,1,28,6,5.9,4,30,8,2,85,3000,2,140.0,90.0
4,5,1,28,6,5.9,4,30,8,2,85,3000,2,140.0,90.0
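After encoding, the plots and tables show integers rather than names, so it helps to know which code means what. `LabelEncoder` numbers the sorted unique labels starting from 0, so the mapping can be reconstructed without the encoder. The sketch below does this in plain Python, using the labels from the column descriptions and the unique-value output earlier in the notebook.

```python
# Labels as listed earlier in the notebook.
unique_labels = {
    "Gender": ["Male", "Female"],
    "BMI Category": ["Overweight", "Normal", "Obese", "Normal Weight"],
    "Sleep Disorder": ["None", "Sleep Apnea", "Insomnia"],
}

# Sorted labels are numbered 0, 1, 2, ... which mirrors LabelEncoder's ordering.
code_maps = {
    column: {label: code for code, label in enumerate(sorted(labels))}
    for column, labels in unique_labels.items()
}

for column, mapping in code_maps.items():
    print(column, mapping)
```

This agrees with the encoded rows above: `Male` becomes 1, `Overweight` becomes 3, and a `Sleep Disorder` of `None` becomes 1.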


In [17]:
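# Outlier removal: drop any row with a numeric value outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
num_col = ['Age', 'Sleep Duration', 'Quality of Sleep', 'Physical Activity Level', 'Stress Level',
           'Heart Rate', 'Daily Steps', 'BloodPressure_Upper_Value', 'BloodPressure_Lower_Value']

Q1 = df1[num_col].quantile(0.25)
Q3 = df1[num_col].quantile(0.75)
IQR = Q3 - Q1

# True for rows where at least one numeric column is out of range
outlier_mask = ((df1[num_col] < (Q1 - 1.5 * IQR)) | (df1[num_col] > (Q3 + 1.5 * IQR))).any(axis=1)
# .copy() gives an independent frame, so later column assignments do not act on a slice of the original
df1 = df1[~outlier_mask].copy()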


In [18]:
df1.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Heart Rate,Daily Steps,Sleep Disorder,BloodPressure_Upper_Value,BloodPressure_Lower_Value
0,1,1,27,9,6.1,6,42,6,3,77,4200,1,126.0,83.0
1,2,1,28,1,6.2,6,60,8,0,75,10000,1,125.0,80.0
2,3,1,28,1,6.2,6,60,8,0,75,10000,1,125.0,80.0
7,8,1,29,1,7.8,7,75,6,0,70,8000,1,120.0,80.0
8,9,1,29,1,7.8,7,75,6,0,70,8000,1,120.0,80.0
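To see why rows 3 and 4 disappeared while row 0 stayed, the IQR rule can be worked by hand for a single column. The sketch below uses the `Heart Rate` quartiles from the `df.describe()` output (25% = 68, 75% = 72); the helper name `is_outlier` is made up for this illustration.

```python
# Heart Rate quartiles taken from the describe() table above.
q1, q3 = 68.0, 72.0
iqr = q3 - q1

# Values outside these fences are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(value):
    """Return True if value lies outside the 1.5 * IQR fences."""
    return value < lower_fence or value > upper_fence


for heart_rate in (77, 85):
    print(heart_rate, is_outlier(heart_rate))
```

A heart rate of 85 (rows 3 and 4) lies above the upper fence, so those rows are dropped. A heart rate of 77 (row 0) is inside the fences, so that row is kept as long as its other columns are also in range.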


# Visualization

In [19]:
# Correlation Heatmap
fig = px.imshow(df1.drop('Person ID', axis=1).corr())
fig.show()

In [20]:
# Pairplot
fig = px.scatter_matrix(df1.drop(['Person ID'], axis=1), color='Sleep Disorder')
fig.show()

In [21]:
# Histogram by Sleep Disorder
fig = px.histogram(df1, x='Sleep Duration', color='Sleep Disorder', marginal='rug', nbins=30)
fig.update_layout(title='Histogram by Sleep Disorder',
                  xaxis=dict(title='Sleep Duration'),
                  yaxis=dict(title='Count'),
                  legend=dict(title='Sleep Disorder'),
                  showlegend=True)
fig.show()

In [22]:
# Histogram by BMI Category
fig = px.histogram(df1, x='Sleep Duration', color='BMI Category', marginal='rug', nbins=30)
fig.update_layout(title='Histogram by BMI Category',
                  xaxis=dict(title='Sleep Duration'),
                  yaxis=dict(title='Count'),
                  legend=dict(title='BMI Category'),
                  showlegend=True)
fig.show()

In [23]:
# Boxplot by Gender
fig = px.box(df1, x='Gender', y='Sleep Duration', color='Gender')
fig.update_layout(title='Boxplot by Gender',
                  xaxis=dict(title='Gender'),
                  yaxis=dict(title='Sleep Duration'))
fig.show()

In [24]:
# Boxplot by Occupation
fig = px.box(df1, x='Occupation', y='Sleep Duration', color='Occupation')
fig.update_layout(title='Boxplot by Occupation',
                  xaxis=dict(title='Occupation'),
                  yaxis=dict(title='Sleep Duration'))
fig.show()

In [25]:
# Boxplot by BMI Category
fig = px.box(df1, x='BMI Category', y='Sleep Duration', color='BMI Category')
fig.update_layout(title='Boxplot by BMI Category',
                  xaxis=dict(title='BMI Category'),
                  yaxis=dict(title='Sleep Duration'))
fig.show()

In [26]:
# Boxplot by Sleep Disorder
fig = px.box(df1, x='Sleep Disorder', y='Sleep Duration', color='Sleep Disorder')
fig.update_layout(title='Boxplot by Sleep Disorder',
                  xaxis=dict(title='Sleep Disorder'),
                  yaxis=dict(title='Sleep Duration'))
fig.show()

In [27]:
# Analysis - "Relationship between sleep duration and body mass index depends on age"

# Scatterplot with Age, Sleep Duration and BMI Category
fig = px.scatter(df1, x='Age', y='Sleep Duration', color='BMI Category', hover_data=['Age', 'Sleep Duration'])
fig.update_layout(title='Scatterplot: Age vs Sleep Duration (Color: BMI Category)',
                  xaxis=dict(title='Age'),
                  yaxis=dict(title='Sleep Duration'))
fig.show()

In [28]:
df1['Age'].unique()

array([27, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
       45, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59])

# Create age group 20s, 30s, 40s, and 50s

In [29]:
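# Create age groups 20s, 30s, 40s, and 50s
# right=False makes each bin include its lower edge, so age 30 falls in '30s' rather than '20s'
df1['Age_bin'] = pd.cut(df1['Age'], [20, 30, 40, 50, 60], right=False, labels=['20s', '30s', '40s', '50s'])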
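How ages that sit exactly on a bin edge (30, 40, 50) are labelled depends on the `right` argument of `pd.cut`. By default the bins are closed on the right, so 30 lands in the `(20, 30]` bin. With `right=False` the bins are closed on the left, so 30 lands in `[30, 40)`. A small sketch on a few sample ages:

```python
import pandas as pd

ages = pd.Series([29, 30, 39, 40, 59])
edges = [20, 30, 40, 50, 60]
labels = ["20s", "30s", "40s", "50s"]

# Default: intervals are (20, 30], (30, 40], ... so an edge value joins the lower group.
right_closed = pd.cut(ages, edges, labels=labels).astype(str).tolist()

# right=False: intervals are [20, 30), [30, 40), ... so an edge value joins the upper group.
left_closed = pd.cut(ages, edges, labels=labels, right=False).astype(str).tolist()

print(right_closed)
print(left_closed)
```

For decade labels such as `30s`, the left-closed version matches the usual meaning of the label.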

In [30]:
# Boxplot: BMI Category by Age_bin
fig = px.box(df1, x='Age_bin', y='BMI Category', color='Age_bin')
fig.update_layout(title='Boxplot: BMI Category by Age_bin',
                  xaxis=dict(title='Age_bin'),
                  yaxis=dict(title='BMI Category'))
fig.show()

In [31]:
# Boxplot: Sleep Duration by Age_bin
fig = px.box(df1, x='Age_bin', y='Sleep Duration', color='Age_bin')
fig.update_layout(title='Boxplot: Sleep Duration by Age_bin',
                  xaxis=dict(title='Age_bin'),
                  yaxis=dict(title='Sleep Duration'))
fig.show()

In [32]:
# Age_bin, BMI Category, and Sleep Duration Boxplot by Occupation
df_long = pd.melt(df1, id_vars=['Occupation'], value_vars=['Age_bin', 'BMI Category', 'Sleep Duration'],
                  var_name='Variable', value_name='Value')

fig = px.box(df_long, x='Occupation', y='Value', color='Variable')
fig.update_layout(title='Boxplot: Age_bin, BMI Category, and Sleep Duration by Occupation',
                  xaxis=dict(title='Occupation'),
                  yaxis=dict(title='Value'))
fig.show()


In [33]:
df1.head()

Unnamed: 0,Person ID,Gender,Age,Occupation,Sleep Duration,Quality of Sleep,Physical Activity Level,Stress Level,BMI Category,Heart Rate,Daily Steps,Sleep Disorder,BloodPressure_Upper_Value,BloodPressure_Lower_Value,Age_bin
0,1,1,27,9,6.1,6,42,6,3,77,4200,1,126.0,83.0,20s
1,2,1,28,1,6.2,6,60,8,0,75,10000,1,125.0,80.0,20s
2,3,1,28,1,6.2,6,60,8,0,75,10000,1,125.0,80.0,20s
7,8,1,29,1,7.8,7,75,6,0,70,8000,1,120.0,80.0,20s
8,9,1,29,1,7.8,7,75,6,0,70,8000,1,120.0,80.0,20s


In [34]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 359 entries, 0 to 373
Data columns (total 15 columns):
 #   Column                     Non-Null Count  Dtype   
---  ------                     --------------  -----   
 0   Person ID                  359 non-null    int64   
 1   Gender                     359 non-null    int64   
 2   Age                        359 non-null    int64   
 3   Occupation                 359 non-null    int64   
 4   Sleep Duration             359 non-null    float64 
 5   Quality of Sleep           359 non-null    int64   
 6   Physical Activity Level    359 non-null    int64   
 7   Stress Level               359 non-null    int64   
 8   BMI Category               359 non-null    int64   
 9   Heart Rate                 359 non-null    int64   
 10  Daily Steps                359 non-null    int64   
 11  Sleep Disorder             359 non-null    int64   
 12  BloodPressure_Upper_Value  359 non-null    float64 
 13  BloodPressure_Lower_Value  359 non-

# Machine Learning - Multi-Class Classification

In [35]:
# Machine Learning - Multi-Class Classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Prepare the data

In [36]:
# Prepare the data
X = df1.drop(['Person ID', 'Sleep Disorder'], axis=1)
y = df1['Sleep Disorder']

In [37]:
X.drop(['Age_bin'], axis=1, inplace=True)

# Split the data into train and test sets

In [38]:
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)
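It is worth knowing how small the resulting test set is. The arithmetic below assumes 359 rows remain after outlier removal (as reported by `df1.info()`) and that the test share is rounded up to a whole number of rows, which is my understanding of how `train_test_split` handles a fractional `test_size`.

```python
import math

n_samples = 359   # rows left after outlier removal
test_size = 0.2   # fraction passed to train_test_split

# Assumption: the test count is the fraction rounded up; the rest go to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# How much one test row moves the accuracy, in percentage points.
points_per_sample = 100 / n_test

print(n_train, n_test, round(points_per_sample, 2))
```

With a test set this small, a single misclassified row shifts accuracy by more than a percentage point, so small differences between models should be read with caution.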

# Create a pipeline

In [39]:
# Create a pipeline with data preprocessing and classification model
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier())
])

In [40]:
# Define parameter grids for hyperparameter tuning
param_grid = [
    {
        'clf': [RandomForestClassifier()],
        'clf__n_estimators': [100, 200, 300,400],
        'clf__max_depth': [None, 5, 10,15],
    },
    {
        'clf': [SVC()],
        'clf__kernel': ['linear', 'rbf'],
        'clf__C': [0.01,0.1, 1, 10],
    },
    {
        'clf': [LogisticRegression()],
        'clf__solver': ['liblinear', 'lbfgs'],
        'clf__C': [0.01,0.1, 1, 10],
    },
    {
        'clf': [KNeighborsClassifier()],
        'clf__n_neighbors': [3, 5, 7,9],
    },
    {
        'clf': [GradientBoostingClassifier()],
        'clf__n_estimators': [100, 200, 300,400],
        'clf__learning_rate': [0.01, 0.1, 1],
    },
    {
        'clf': [DecisionTreeClassifier()],
        'clf__max_depth': [None, 5, 10,15],
    }
]
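The size of this search can be worked out before running it. Each dictionary in `param_grid` contributes the product of its list lengths, and with the five-fold cross-validation used in the grid search every candidate is fitted five times. The sketch below mirrors the grid with plain lists (the `model` key is only a label for this illustration).

```python
from math import prod

# Same hyperparameter lists as param_grid, without the estimator objects.
grid_values = [
    {"model": "RandomForestClassifier", "n_estimators": [100, 200, 300, 400], "max_depth": [None, 5, 10, 15]},
    {"model": "SVC", "kernel": ["linear", "rbf"], "C": [0.01, 0.1, 1, 10]},
    {"model": "LogisticRegression", "solver": ["liblinear", "lbfgs"], "C": [0.01, 0.1, 1, 10]},
    {"model": "KNeighborsClassifier", "n_neighbors": [3, 5, 7, 9]},
    {"model": "GradientBoostingClassifier", "n_estimators": [100, 200, 300, 400], "learning_rate": [0.01, 0.1, 1]},
    {"model": "DecisionTreeClassifier", "max_depth": [None, 5, 10, 15]},
]

# Candidates per model = product of the lengths of its hyperparameter lists.
candidates_per_model = {
    grid["model"]: prod(len(values) for name, values in grid.items() if name != "model")
    for grid in grid_values
}

total_candidates = sum(candidates_per_model.values())
cv_folds = 5
total_fits = total_candidates * cv_folds

print(candidates_per_model)
print(total_candidates, total_fits)
```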

# Perform grid search for hyperparameter tuning

In [41]:
# Perform grid search for hyperparameter tuning
grid_search = GridSearchCV(pipeline, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Get the best model
best_model = grid_search.best_estimator_

# Separately from the grid search, fit each model with its default hyperparameters
# and record its accuracy on the held-out test set
models = [
    ('Random Forest', RandomForestClassifier()),
    ('SVM', SVC()),
    ('Logistic Regression', LogisticRegression()),
    ('KNN', KNeighborsClassifier()),
    ('Gradient Boosting', GradientBoostingClassifier()),
    ('Decision Tree', DecisionTreeClassifier())
]

accuracy_scores = []
for name, model in models:
    # Use a separate name so the grid-search pipeline defined earlier is not overwritten
    model_pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('clf', model)
    ])
    model_pipeline.fit(X_train, y_train)
    y_pred = model_pipeline.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracy_scores.append(accuracy)

# Comparison Chart

In [42]:
# Comparison Chart
fig = go.Figure(data=go.Bar(x=[name for name, _ in models], y=accuracy_scores))
fig.update_layout(title='Comparison of Models',
                  xaxis=dict(title='Models'),
                  yaxis=dict(title='Accuracy Score'))
fig.show()

# Feature Importance

In [43]:
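# Feature importance of the best model found by the grid search
# (feature_importances_ exists only on tree-based classifiers such as random forest or gradient boosting)
importance = best_model.named_steps['clf'].feature_importances_
feature_names = X.columns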

sorted_indices = np.argsort(importance)[::-1]
sorted_importance = importance[sorted_indices]
sorted_features = feature_names[sorted_indices]

fig = go.Figure(data=go.Bar(x=sorted_features, y=sorted_importance))
fig.update_layout(title='Feature Importance',
                  xaxis=dict(title='Features'),
                  yaxis=dict(title='Importance'))
fig.show()
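The cell above works only when the best estimator exposes `feature_importances_`. If the grid search had selected SVC, KNN, or logistic regression instead, the attribute lookup would fail. A defensive sketch follows; the helper `get_importances` and the stand-in objects are hypothetical, and the importance numbers are made up purely to exercise the function.

```python
from types import SimpleNamespace


def get_importances(estimator, feature_names):
    """Return (feature, importance) pairs sorted descending, or None if unavailable."""
    importances = getattr(estimator, "feature_importances_", None)
    if importances is None:
        return None
    return sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)


# Stand-ins for fitted estimators: one with importances, one without.
tree_like = SimpleNamespace(feature_importances_=[0.2, 0.7, 0.1])
no_importances = SimpleNamespace()

features = ["Age", "BMI Category", "Heart Rate"]
ranked = get_importances(tree_like, features)
missing = get_importances(no_importances, features)

print(ranked)
print(missing)
```

In the notebook, the first argument would be `best_model.named_steps['clf']` and the second `X.columns`.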

# Conclusion: GradientBoostingClassifier gave the highest accuracy on this dataset, at about 94%.
