![](https://t4.ftcdn.net/jpg/00/60/96/23/240_F_60962332_pkCnhd9oKWA7aPwDD24yfpoNIw5nMcx1.jpg)

According to the World Health Organization (WHO), stroke is the 2nd leading cause of death globally, responsible for approximately 11% of total deaths. This dataset is used to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status. Each row in the data provides relevant information about one patient.


Attribute Information

1) id: unique identifier

2) gender: "Male", "Female" or "Other"

3) age: age of the patient

4) hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension

5) heart_disease: 0 if the patient doesn't have any heart diseases, 1 if the patient has a heart disease

6) ever_married: "No" or "Yes"

7) work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"

8) Residence_type: "Rural" or "Urban"

9) avg_glucose_level: average glucose level in blood

10) bmi: body mass index

11) smoking_status: "formerly smoked", "never smoked", "smokes" or "Unknown"

12) stroke: 1 if the patient had a stroke or 0 if not


**Load the Required Libraries**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
import random
cmap = plt.get_cmap('Spectral')
colors = [cmap(i) for i in np.linspace(0, 1, 15)]


**Load the dataset**

In [None]:
st=pd.read_csv('../input/stroke-prediction-dataset/healthcare-dataset-stroke-data.csv')

In [None]:
st.head()

In [None]:
st.tail()

In [None]:
st.info()

In [None]:
st.isnull().sum()

The bmi column contains null values. They are filled with the column mean in the next cell.

In [None]:
st['bmi'] = st['bmi'].fillna(st['bmi'].mean())

In [None]:
st.isnull().sum()

The null values in the bmi column were filled successfully.
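
As a minimal sketch of what the mean imputation above does, the following uses a made-up toy frame (the values and the `bmi_filled` column name are hypothetical, not taken from the stroke dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the stroke dataset.
toy = pd.DataFrame({"bmi": [20.0, np.nan, 30.0, 40.0]})

# mean() skips NaN by default, so this averages 20, 30 and 40.
bmi_mean = toy["bmi"].mean()

# Replace every missing entry with that mean.
toy["bmi_filled"] = toy["bmi"].fillna(bmi_mean)

print(toy)
print("remaining nulls:", toy["bmi_filled"].isnull().sum())
```

Note that filling with the mean keeps the column average unchanged but reduces its spread, which may matter for later modelling.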

In [None]:
st.describe()

In [None]:
st.dtypes

In [None]:
st['stroke'].value_counts()

In [None]:
# Count fully duplicated rows instead of printing one boolean per row
st.duplicated().sum()

In [None]:
sns.countplot(x=st['gender'],palette=('#003f7f','#ff007f'))
st['gender'].value_counts()

Here we count the patients by gender.

In [None]:
sns.countplot(x=st['smoking_status'],palette='RdGy')
st['smoking_status'].value_counts()

People who never smoked make up the largest group.

In [None]:
plt.figure(figsize=(7,7))
sns.set(style="darkgrid")
labels = ["Never Smoked Before", "Unknown", "Ex-Smoker", "Currently Smokes"]
values = st['smoking_status'].value_counts().tolist()
plt.pie(x=values, labels=labels, autopct="%1.2f%%", shadow=True)
plt.title("Smoking Status Pie Chart", fontdict={'fontsize': 14})
plt.show()

In [None]:
sns.countplot(x=st['hypertension'],palette='Reds')
st['hypertension'].value_counts()

In [None]:
plt.figure(figsize=(7,7))
sns.set(style="whitegrid")

labels = ["Not Present", "Present"]
values = st['hypertension'].value_counts().tolist()

plt.pie(x=values, labels=labels, autopct="%1.2f%%", colors=colors[::-1], shadow=True, explode=[0, 0.2])
plt.title("Hypertension Distribution Pie Chart", fontdict={'fontsize': 14})
plt.show()

Only 9.75% of patients have hypertension.
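
The pie charts above pair hand-written labels with the output of `value_counts()`, which is sorted by frequency, so the labels are only correct as long as the frequency order matches the order in which they were typed. A minimal sketch of deriving the labels from the index instead, on made-up toy data (the `toy` series and the `label_map` dictionary are hypothetical names used for illustration):

```python
import pandas as pd

# Hypothetical toy data, not taken from the stroke dataset.
toy = pd.Series([0, 0, 0, 1, 0, 1, 0, 0], name="hypertension")

# value_counts() sorts by frequency, most common value first.
counts = toy.value_counts()

# Map each raw value to a display label, following the index order,
# so labels and values always stay aligned.
label_map = {0: "Not Present", 1: "Present"}
labels = [label_map[value] for value in counts.index]
values = counts.tolist()

print(list(zip(labels, values)))
```

The resulting `labels` and `values` can be passed to `plt.pie` in the same way as in the cells above.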

In [None]:
sns.countplot(x=st['ever_married'],palette='BuPu')
st['ever_married'].value_counts()

In [None]:
plt.figure(figsize=(7,7))
sns.set(style="darkgrid")
random.shuffle(colors)
labels = ["Urban Residence", "Rural Residence"]
values = st['Residence_type'].value_counts().tolist()
plt.pie(x=values, labels=labels, autopct="%1.2f%%", colors=colors, shadow=True)
plt.title("Residence Type Pie Chart", fontdict={'fontsize': 14})
plt.show()

# Presence of Heart Disease

In [None]:
plt.figure(figsize=(7,7))
sns.set(style="whitegrid")
random.shuffle(colors)
labels = ["Heart Disease Absent", "Heart Disease Present"]
values = st['heart_disease'].value_counts().tolist()
plt.pie(x=values, labels=labels, autopct="%1.2f%%", colors=colors, shadow=True, explode=[0, 0.2])
plt.title("Heart Disease Distribution Pie Chart", fontdict={'fontsize': 14})
plt.show()

Only 5.40% of patients have heart disease.

In [None]:
sns.countplot(x=st['work_type'], palette='magma');
st['work_type'].value_counts()

Most of the patients work in the private sector.

In [None]:
plt.figure(figsize=(7,7))
sns.set(style="darkgrid")
st['stroke'].value_counts().plot.pie(autopct='%1.1f%%', colors = ['Green', 'r'])
plt.title("Stroke status", fontdict={'fontsize': 14})
st["stroke"].value_counts()

Only 4.9% of patients had a stroke.
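
Percentages like the one above can also be read directly from pandas rather than from the pie chart. A minimal sketch using `value_counts(normalize=True)` on made-up toy data (the `toy` series is hypothetical and its numbers are not the dataset's):

```python
import pandas as pd

# Hypothetical toy data: 19 patients without a stroke, 1 with.
toy = pd.Series([0] * 19 + [1], name="stroke")

# normalize=True returns fractions instead of raw counts.
shares = toy.value_counts(normalize=True) * 100

print(shares.round(1))
```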

In [None]:
sns.lineplot(x='age', y='stroke', data=st)

People between the ages of 60 and 80 have a high chance of having a stroke.
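
An observation about age ranges can be checked numerically by binning age and taking the mean of the 0/1 `stroke` column per bin, which gives the share of patients with a stroke in each band. A minimal sketch on made-up toy data (the `toy` frame, the bin edges, and the `age_band` column are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy data, not taken from the stroke dataset.
toy = pd.DataFrame({
    "age":    [25, 35, 45, 62, 67, 71, 78, 79],
    "stroke": [0,  0,  0,  0,  1,  0,  1,  1],
})

# Assumed 20-year bands; pd.cut builds right-closed intervals such as (60, 80].
bins = [0, 20, 40, 60, 80, 100]
toy["age_band"] = pd.cut(toy["age"], bins=bins)

# The mean of a 0/1 column is the fraction of rows equal to 1.
# observed=True drops bands that contain no patients.
rate_by_band = toy.groupby("age_band", observed=True)["stroke"].mean()

print(rate_by_band)
```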

In [None]:
plt.figure(figsize=(7,7))
sns.set(style="darkgrid")
plt.subplot(1,2,2)
sns.countplot(x='gender', hue='stroke', data=st, palette='flare')

We can see here that more stroke cases occur among females. Because this is a count plot, the comparison is in absolute numbers, not in rates.
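
A count plot with `hue` compares absolute numbers of cases, so a larger group can show a taller stroke bar without having a higher stroke rate. A minimal sketch of computing both the count and the rate per group, on made-up toy data (the numbers are hypothetical and chosen only to show that the two can disagree; they say nothing about the real dataset):

```python
import pandas as pd

# Hypothetical toy data: 10 females with 2 strokes, 4 males with 1 stroke.
toy = pd.DataFrame({
    "gender": ["Female"] * 10 + ["Male"] * 4,
    "stroke": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0] + [1, 0, 0, 0],
})

grouped = toy.groupby("gender")["stroke"]

# Absolute number of stroke cases per group (what a count plot shows).
counts = grouped.sum()

# Share of each group that had a stroke.
rates = grouped.mean()

print(counts)
print(rates)
```

In this toy example the female group has more cases, while the male group has the higher rate.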

In [None]:
plt.figure(figsize=(7,7))
sns.set(style="whitegrid")
plt.subplot(1,2,2)
sns.countplot(x='hypertension', hue='stroke', data=st, palette='rocket')

In [None]:
plt.subplot(1,2,2)
sns.countplot(x='heart_disease', hue='stroke', data=st, palette='Reds')

People with no heart disease have a very high chance of not having a stroke.

In [None]:
plt.subplot(1,2,2)
sns.countplot(x='ever_married', hue='stroke', data=st, palette='BuPu')

st['ever_married'].value_counts()

Unmarried people have a lower chance of having a stroke.

In [None]:
sns.countplot(x="Residence_type", hue= st['stroke'], data=st,palette='summer')

There is not much difference in stroke occurrence between the Rural and Urban residence types.



In [None]:
sns.countplot(x="smoking_status", hue= st['stroke'], data=st,palette='RdGy')

Being a smoker or a former smoker increases your risk of having a stroke.

In [None]:
st.plot(kind='box', figsize=(10,8))

In [None]:
p=sns.pairplot(st, hue = 'stroke',corner=True)

In [None]:
plt.figure(figsize=(12,10))
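# Correlate numeric columns only; recent pandas versions raise on string columns
p=sns.heatmap(st.select_dtypes(include='number').corr(), annot=True, cmap='RdYlGn')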

# Conclusions:

1. More stroke cases were recorded among females than among males.

2. More than 25% of stroke patients have hypertension.

3. Few of the patients who have heart disease have had a stroke.

4. Most of the patients who had a stroke were married.

5. Private-sector work is associated with more stroke cases. Those who have never worked have barely experienced a stroke.

6. The type of residence did not affect the chances of having a stroke.

7. Being a smoker or a former smoker increases your risk of having a stroke.
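
Statements such as conclusion 2 are conditional shares: among patients who had a stroke, what fraction has hypertension. A minimal sketch of computing such a share with `pd.crosstab`, on made-up toy data (the `toy` frame and its numbers are hypothetical, not the dataset's):

```python
import pandas as pd

# Hypothetical toy data: 4 stroke patients, 6 without a stroke.
toy = pd.DataFrame({
    "stroke":       [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "hypertension": [1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
})

# normalize='index' makes each row (each stroke value) sum to 1,
# so every cell is the share of that stroke group with that hypertension value.
table = pd.crosstab(toy["stroke"], toy["hypertension"], normalize="index")

# Share of stroke patients (row 1) who have hypertension (column 1).
share = table.loc[1, 1]

print(table)
print(share)
```

Swapping the two arguments of `pd.crosstab` answers the reverse question, namely what fraction of patients with hypertension had a stroke.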