## **Stroke Prediction**

### Context

According to the World Health Organization (WHO), stroke is the second leading cause of death globally, responsible for approximately 11% of total deaths. This dataset is used to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status. Each row in the data provides relevant information about one patient.

### **Downloading the Dataset**

In [None]:
dataset_url = 'https://www.kaggle.com/fedesoriano/stroke-prediction-dataset'

In [None]:
!pip install opendatasets
import opendatasets as od
od.download(dataset_url)


### **Reading the Dataset with Pandas**

In [None]:
import pandas as pd
import numpy as np
data = pd.read_csv('./stroke-prediction-dataset/healthcare-dataset-stroke-data.csv')
data.head(10)

### **Attribute Information**

- id: unique identifier
- gender: "Male", "Female" or "Other"
- age: age of the patient
- hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension
- heart_disease: 0 if the patient doesn't have any heart diseases, 1 if the patient has a heart disease
- ever_married: "No" or "Yes"
- work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"
- Residence_type: "Rural" or "Urban"
- avg_glucose_level: average glucose level in blood
- bmi: body mass index
- smoking_status: "formerly smoked", "never smoked", "smokes" or "Unknown"*
- stroke: 1 if the patient had a stroke or 0 if not

*Note: "Unknown" in smoking_status means that the information is unavailable for this patient*

## **Data Preparation and Cleaning**

In [None]:
data.shape

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
data.isnull().sum() # count missing values per column

In [None]:
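# fill missing BMI values with the column mean (assign back rather than using inplace on a column selection)
data['bmi'] = data['bmi'].fillna(data['bmi'].mean())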

In [None]:
data.isnull().sum() # confirm that no missing values remain
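
The effect of the mean fill above can be seen on a toy column. This is a minimal, library-free sketch; the values in `bmi_values` are made up for illustration and do not come from the dataset.

```python
from statistics import mean

# Hypothetical BMI column with missing entries marked as None
bmi_values = [22.0, None, 30.0, 26.0, None]

# Mean of the observed values only (pandas' mean() also skips NaN by default)
observed = [v for v in bmi_values if v is not None]
fill_value = mean(observed)

# Replace each missing entry with the observed mean
filled = [fill_value if v is None else v for v in bmi_values]
print(filled)
```

Filling with the mean leaves the column mean unchanged but shrinks its variance, which matters later because the per-class variances feed directly into the likelihoods.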

Next, we compute the pairwise correlations between the numeric columns using `.corr()`.

In [None]:
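# restrict to numeric columns; recent pandas versions raise an error on string columns otherwise
data.corr(numeric_only=True)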
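
By default, `.corr()` reports the Pearson coefficient for each pair of columns. As a rough illustration of what one entry of that table means, here is a minimal sketch that computes it by hand; the helper `pearson` and the toy `age` and `stroke` lists are hypothetical and not part of the notebook.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data, not taken from the dataset
age = [20, 40, 60, 80]
stroke = [0, 0, 1, 1]
r = pearson(age, stroke)
print(round(r, 3))
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship.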

For visualization, we use a heatmap.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# correlation heatmap of the numeric columns
corr = data.corr(numeric_only=True)
plt.figure(figsize=(15, 10))
sns.heatmap(corr, annot=True, annot_kws={"size": 15})
plt.title('Heatmap of correlations')

## **Analysis and Visualization**

Here we plot several graphs and look for meaningful patterns in the data.

For visualization, we import matplotlib, seaborn, and plotly.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

In [None]:
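plt.figure(dpi=100)
# pass the column by name; recent seaborn versions no longer accept the data as a bare positional argument
sns.countplot(x='stroke', data=data)
plt.xlabel('had a stroke or not')
plt.ylabel('number of people')
plt.title('Stroke vs. number of people')
plt.show()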

In [None]:
sns.catplot(y="work_type", hue="stroke", kind="count",
            palette="rainbow", edgecolor=".6",
            data=data)

In [None]:
sns.catplot(y="Residence_type", hue="stroke", kind="count",
            palette="rainbow", edgecolor=".6",
            data=data)

In [None]:
sns.catplot(y="ever_married", hue="stroke", kind="count",
            palette="rainbow", edgecolor=".6",
            data=data)

In [None]:
data.plot.hexbin(x='bmi', y='avg_glucose_level', gridsize=15,colormap='rainbow')

In [None]:
data.plot.hexbin(x='age', y='avg_glucose_level', gridsize=15,colormap='YlOrRd')

In [None]:
# plotly manages its own figure, so no matplotlib figure is needed here
fig = px.scatter_3d(data, x='avg_glucose_level', y='age', z='stroke')
fig.show()

In [None]:
# plotly manages its own figure, so no matplotlib figure is needed here
fig = px.parallel_categories(data[['gender', 'age', 'hypertension', 'heart_disease', 'ever_married',
       'work_type', 'Residence_type',
       'smoking_status', 'stroke']], color='stroke', color_continuous_scale=px.colors.sequential.Inferno)
fig.show()

## **Prediction Part**
### **Naive Bayes**
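
As a sketch of the rule that the cells in this section implement, assume the features are conditionally independent given the class and normally distributed within each class. The symbols below are notation introduced here for illustration:

$$
\hat{y} = \arg\max_{y \in \{0,1\}} \; P(y) \prod_{i} p(x_i \mid y),
\qquad
p(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{i,y}^{2}}} \exp\!\left(-\frac{(x_i - \mu_{i,y})^{2}}{2\sigma_{i,y}^{2}}\right)
$$

Here $\mu_{i,y}$ and $\sigma_{i,y}^{2}$ are the mean and variance of feature $i$ among patients of class $y$, and $P(y)$ is the fraction of patients in class $y$. The product is only the numerator of the posterior; the denominator is the same for both classes, so it can be ignored when comparing them.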

In [None]:
# number of people who had a stroke
Y_stroke = data["stroke"][data["stroke"] == 1].count()
# number of people who did not have a stroke
N_stroke = data["stroke"][data["stroke"] == 0].count()
# total number of people
total_people = data["stroke"].count()

print("Number of people who had a stroke : {}   Number of people who did not have a stroke : {}   Total number of people : {}".format(Y_stroke,N_stroke,total_people))

In [None]:
p_yes=Y_stroke/total_people
p_no=N_stroke/total_people
print("Prior probability of stroke =", p_yes, " and of no stroke =", p_no)

In [None]:
# per-class mean of each numeric column (non-numeric columns are skipped)
data_means = data.groupby("stroke").mean(numeric_only=True)
# view the values
data_means

In [None]:
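# per-class sample variance of each numeric column (non-numeric columns are skipped)
data_variance = data.groupby("stroke").var(numeric_only=True)
# view the values
data_variance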
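
To make the grouped statistics concrete, the following minimal sketch computes a per-class mean and sample variance by hand. The `rows` list is hypothetical toy data; `statistics.variance` uses the n - 1 denominator, which matches the default of pandas' `var()`.

```python
from statistics import mean, variance

# Hypothetical rows of (age, stroke label), not taken from the dataset
rows = [(30.0, 0), (50.0, 0), (40.0, 0), (70.0, 1), (80.0, 1)]

# Collect the ages for each class label
ages_by_class = {}
for age, label in rows:
    ages_by_class.setdefault(label, []).append(age)

# Per-class mean and sample variance (n - 1 denominator)
stats = {
    label: (mean(ages), variance(ages))
    for label, ages in ages_by_class.items()
}
print(stats)
```

Each class ends up with its own (mean, variance) pair per feature, which is all a Gaussian naive Bayes model needs to store.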

In [None]:
data_variance[["age","hypertension","heart_disease","avg_glucose_level","bmi"]]

In [None]:
data_variance[["age","hypertension","heart_disease","avg_glucose_level","bmi"]][data_variance.index == 0]

In [None]:
data_variance[["age","hypertension","heart_disease","avg_glucose_level","bmi"]][data_variance.index == 0].values[0]

In [None]:
people_had_not_stroke_age_mean = data_means["age"][data_means.index == 0].values[0]
people_had_not_stroke_hypertension_mean = data_means["hypertension"][data_means.index == 0].values[0]
people_had_not_stroke_heart_disease_mean = data_means["heart_disease"][data_means.index == 0].values[0]
people_had_not_stroke_avg_glucose_level_mean = data_means["avg_glucose_level"][data_means.index == 0].values[0]
people_had_not_stroke_bmi_mean = data_means["bmi"][data_means.index == 0].values[0]
print(people_had_not_stroke_age_mean,people_had_not_stroke_hypertension_mean,people_had_not_stroke_heart_disease_mean,
      people_had_not_stroke_avg_glucose_level_mean,people_had_not_stroke_bmi_mean)

In [None]:
people_had_stroke_age_mean = data_means["age"][data_means.index == 1].values[0]
people_had_stroke_hypertension_mean = data_means["hypertension"][data_means.index == 1].values[0]
people_had_stroke_heart_disease_mean = data_means["heart_disease"][data_means.index == 1].values[0]
people_had_stroke_avg_glucose_level_mean = data_means["avg_glucose_level"][data_means.index == 1].values[0]
people_had_stroke_bmi_mean = data_means["bmi"][data_means.index == 1].values[0]
print(people_had_stroke_age_mean,people_had_stroke_hypertension_mean,people_had_stroke_heart_disease_mean,
      people_had_stroke_avg_glucose_level_mean,people_had_stroke_bmi_mean)

In [None]:
people_had_not_stroke_age_var = data_variance["age"][data_variance.index == 0].values[0]
people_had_not_stroke_hypertension_var = data_variance["hypertension"][data_variance.index == 0].values[0]
people_had_not_stroke_heart_disease_var = data_variance["heart_disease"][data_variance.index == 0].values[0]
people_had_not_stroke_avg_glucose_level_var = data_variance["avg_glucose_level"][data_variance.index == 0].values[0]
people_had_not_stroke_bmi_var = data_variance["bmi"][data_variance.index == 0].values[0]
print(people_had_not_stroke_age_var,people_had_not_stroke_hypertension_var,people_had_not_stroke_heart_disease_var,
      people_had_not_stroke_avg_glucose_level_var,people_had_not_stroke_bmi_var)

In [None]:
people_had_stroke_age_var = data_variance["age"][data_variance.index == 1].values[0]
people_had_stroke_hypertension_var = data_variance["hypertension"][data_variance.index == 1].values[0]
people_had_stroke_heart_disease_var = data_variance["heart_disease"][data_variance.index == 1].values[0]
people_had_stroke_avg_glucose_level_var = data_variance["avg_glucose_level"][data_variance.index == 1].values[0]
people_had_stroke_bmi_var = data_variance["bmi"][data_variance.index == 1].values[0]
print(people_had_stroke_age_var,people_had_stroke_hypertension_var,people_had_stroke_heart_disease_var,
      people_had_stroke_avg_glucose_level_var,people_had_stroke_bmi_var)

In [None]:
# create a function that calculates p(x | y), assuming x is normally distributed within class y:
def p_x_given_y(x, mean_y, variance_y):
    # plug the arguments into the normal probability density function
    p = 1 / (np.sqrt(2 * np.pi * variance_y)) * \
                    np.exp((- (x - mean_y) ** 2) / (2 * variance_y))
    return p
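
A quick way to check a density function like the one above is to confirm that it peaks at the mean and integrates to roughly 1. The sketch below does this with the standard library only; `gaussian_pdf`, `mean_age`, and `var_age` are hypothetical names and values chosen for illustration.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Normal density with the given mean and variance, evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Hypothetical class statistics, not taken from the dataset
mean_age, var_age = 60.0, 150.0

# The density peaks at the mean with height 1 / sqrt(2 * pi * variance)
peak = gaussian_pdf(mean_age, mean_age, var_age)

# A Riemann sum over a wide interval (x from -200 to 320) should be close to 1
step = 0.01
area = sum(gaussian_pdf(i * step, mean_age, var_age) * step for i in range(-20000, 32000))
print(peak, area)
```

The function returns a density, not a probability, so individual values can exceed 1 when the variance is small, as can happen for 0/1 features such as `hypertension` and `heart_disease`.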

In [None]:
pr_stroke=pd.DataFrame()
pr_stroke["age"]=[int(input('Enter age : '))]
pr_stroke["hypertension"]=[int(input('Enter hypertension (0 or 1): '))]
pr_stroke["heart_disease"]=[int(input('Enter heart_disease (0 or 1) : '))]
pr_stroke["avg_glucose_level"]=[float(input('Enter avg_glucose_level : '))]
pr_stroke["bmi"]=[float(input('Enter bmi : '))]
pr_stroke

In [None]:
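# the class counts N_stroke and Y_stroke stand in for the priors; dividing both by
# total_people would scale both numerators equally and leave the comparison unchanged
posterior_numerator_had_not_stroke = N_stroke * \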
p_x_given_y(pr_stroke["age"],people_had_not_stroke_age_mean,people_had_not_stroke_age_var) * \
p_x_given_y(pr_stroke["hypertension"],people_had_not_stroke_hypertension_mean,people_had_not_stroke_hypertension_var) * \
p_x_given_y(pr_stroke["heart_disease"],people_had_not_stroke_heart_disease_mean,people_had_not_stroke_heart_disease_var) * \
p_x_given_y(pr_stroke["avg_glucose_level"],people_had_not_stroke_avg_glucose_level_mean,people_had_not_stroke_avg_glucose_level_var) * \
p_x_given_y(pr_stroke["bmi"],people_had_not_stroke_bmi_mean,people_had_not_stroke_bmi_var)
posterior_numerator_had_stroke = Y_stroke * \
p_x_given_y(pr_stroke["age"],people_had_stroke_age_mean,people_had_stroke_age_var) * \
p_x_given_y(pr_stroke["hypertension"],people_had_stroke_hypertension_mean,people_had_stroke_hypertension_var) * \
p_x_given_y(pr_stroke["heart_disease"],people_had_stroke_heart_disease_mean,people_had_stroke_heart_disease_var) * \
p_x_given_y(pr_stroke["avg_glucose_level"],people_had_stroke_avg_glucose_level_mean,people_had_stroke_avg_glucose_level_var) * \
p_x_given_y(pr_stroke["bmi"],people_had_stroke_bmi_mean,people_had_stroke_bmi_var)

In [None]:
print("Numerator of posterior for 'had stroke' is\n", posterior_numerator_had_stroke.values[0])
print("Numerator of posterior for 'had not stroke' is\n", posterior_numerator_had_not_stroke.values[0])
# predict the class with the larger posterior numerator
if posterior_numerator_had_stroke.values[0] >= posterior_numerator_had_not_stroke.values[0]:
    print("Prediction of stroke is +ve...")
else:
    print("Prediction of stroke is -ve...")
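
The comparison above multiplies several small densities, which can underflow as more features are added. One common alternative, sketched here under stated assumptions, is to add logarithms instead and then normalize to get posterior probabilities. All names and numbers below (`priors`, `params`, `patient`) are hypothetical and are not estimated from the dataset.

```python
import math

def log_gaussian_pdf(x, mean, variance):
    """Logarithm of the normal density at x."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)

# Hypothetical priors and per-class (mean, variance) for two features
priors = {0: 0.95, 1: 0.05}
params = {
    0: {"age": (42.0, 500.0), "bmi": (29.0, 60.0)},
    1: {"age": (68.0, 160.0), "bmi": (30.0, 33.0)},
}
patient = {"age": 75.0, "bmi": 31.0}

# Sum the log prior and log likelihoods instead of multiplying small numbers
log_scores = {
    label: math.log(priors[label])
    + sum(log_gaussian_pdf(patient[f], *params[label][f]) for f in patient)
    for label in priors
}

# Normalize with the log-sum-exp trick to obtain posterior probabilities
top = max(log_scores.values())
total = sum(math.exp(s - top) for s in log_scores.values())
posteriors = {label: math.exp(s - top) / total for label, s in log_scores.items()}

# The predicted class is the one with the largest posterior
prediction = max(posteriors, key=posteriors.get)
print(posteriors, prediction)
```

Because the logarithm is monotonic, the class with the largest log score is the same class that wins the product comparison, and the normalized values also show how confident the prediction is.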