<a href="https://colab.research.google.com/github/nawrintabassumorpita/project/blob/main/best_of_1908043.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Stroke Prediction Dataset
Context: According to the World Health Organization (WHO), stroke is the 2nd leading cause of death globally, responsible for approximately 11% of total deaths. This dataset is used to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status. Each row in the data provides relevant information about one patient.

Attribute Information:

id: unique identifier

gender: "Male", "Female" or "Other"

age: age of the patient

hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension

heart_disease: 0 if the patient doesn't have any heart diseases, 1 if the patient has a heart disease

ever_married: "No" or "Yes"

work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"

Residence_type: "Rural" or "Urban"

avg_glucose_level: average glucose level in blood

bmi: body mass index

smoking_status: "formerly smoked", "never smoked", "smokes" or "Unknown"*

stroke: 1 if the patient had a stroke or 0 if not

*Note: "Unknown" in smoking_status means that the information is unavailable for this patient


In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Dataset

In [None]:
# Download the data
!wget -O stroke-data.csv "https://www.dropbox.com/s/zgburk3yces5tee/healthcare-dataset-stroke-data.csv?dl=0"


In [None]:
"""importing the dataset """

dataset = pd.read_csv('stroke-data.csv')
dataset

Unnamed: 0,id,gender,age,hypertension,heart_disease,ever_married,work_type,Residence_type,avg_glucose_level,bmi,smoking_status,stroke
0,9046,Male,67.0,0,1,Yes,Private,Urban,228.69,36.6,formerly smoked,1
1,51676,Female,61.0,0,0,Yes,Self-employed,Rural,202.21,,never smoked,1
2,31112,Male,80.0,0,1,Yes,Private,Rural,105.92,32.5,never smoked,1
3,60182,Female,49.0,0,0,Yes,Private,Urban,171.23,34.4,smokes,1
4,1665,Female,79.0,1,0,Yes,Self-employed,Rural,174.12,24.0,never smoked,1
...,...,...,...,...,...,...,...,...,...,...,...,...
5105,18234,Female,80.0,1,0,Yes,Private,Urban,83.75,,never smoked,0
5106,44873,Female,81.0,0,0,Yes,Self-employed,Urban,125.20,40.0,never smoked,0
5107,19723,Female,35.0,0,0,Yes,Self-employed,Rural,82.99,30.6,never smoked,0
5108,37544,Male,51.0,0,0,Yes,Private,Rural,166.29,25.6,formerly smoked,0


In [None]:
features = dataset[['age', 'hypertension', 'heart_disease', 'bmi', 'Residence_type', 'avg_glucose_level', 'smoking_status']].copy()  # independent variables
target = dataset[['stroke']].copy()   # dependent variable

In [None]:
features

Unnamed: 0,age,hypertension,heart_disease,bmi,Residence_type,avg_glucose_level,smoking_status
0,67.0,0,1,36.6,Urban,228.69,formerly smoked
1,61.0,0,0,,Rural,202.21,never smoked
2,80.0,0,1,32.5,Rural,105.92,never smoked
3,49.0,0,0,34.4,Urban,171.23,smokes
4,79.0,1,0,24.0,Rural,174.12,never smoked
...,...,...,...,...,...,...,...
5105,80.0,1,0,,Urban,83.75,never smoked
5106,81.0,0,0,40.0,Urban,125.20,never smoked
5107,35.0,0,0,30.6,Rural,82.99,never smoked
5108,51.0,0,0,25.6,Rural,166.29,formerly smoked


In [None]:
target

Unnamed: 0,stroke
0,1
1,1
2,1
3,1
4,1
...,...
5105,0
5106,0
5107,0
5108,0


## Taking care of missing values

In [None]:
from sklearn.impute import SimpleImputer

In [None]:
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")  # imputer is a SimpleImputer instance
imputer = imputer.fit(features[['age', 'hypertension', 'heart_disease', 'bmi', 'avg_glucose_level']])

In [None]:
features[['age', 'hypertension', 'heart_disease', 'bmi', 'avg_glucose_level']] = imputer.transform(features[['age', 'hypertension', 'heart_disease', 'bmi', 'avg_glucose_level']])

In [None]:
features

Unnamed: 0,age,hypertension,heart_disease,bmi,Residence_type,avg_glucose_level,smoking_status
0,67.0,0.0,1.0,36.600000,Urban,228.69,formerly smoked
1,61.0,0.0,0.0,28.893237,Rural,202.21,never smoked
2,80.0,0.0,1.0,32.500000,Rural,105.92,never smoked
3,49.0,0.0,0.0,34.400000,Urban,171.23,smokes
4,79.0,1.0,0.0,24.000000,Rural,174.12,never smoked
...,...,...,...,...,...,...,...
5105,80.0,1.0,0.0,28.893237,Urban,83.75,never smoked
5106,81.0,0.0,0.0,40.000000,Urban,125.20,never smoked
5107,35.0,0.0,0.0,30.600000,Rural,82.99,never smoked
5108,51.0,0.0,0.0,25.600000,Rural,166.29,formerly smoked
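The repeated value 28.893237 in the `bmi` column above is the column mean that replaced the missing entries. As a minimal sketch of what `SimpleImputer(strategy="mean")` does, the snippet below runs it on a small made-up BMI column; the toy values and variable names are assumptions for illustration, not part of the notebook.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy BMI column with one missing entry (illustrative values only).
bmi = np.array([[36.6], [np.nan], [32.5], [34.4], [24.0]])

imputer_demo = SimpleImputer(missing_values=np.nan, strategy="mean")
bmi_filled = imputer_demo.fit_transform(bmi)

# The fill value learned by the imputer is the mean of the observed entries.
observed_mean = np.nanmean(bmi)
print(imputer_demo.statistics_)
print(bmi_filled.ravel())
```

Every missing entry receives the same value, which is why rows 1 and 5105 above share an identical `bmi`.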


In [None]:
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")  # imputer is a SimpleImputer instance
imputer = imputer.fit(target[['stroke']])

In [None]:
target[['stroke']]= imputer.transform(target[['stroke']])

In [None]:
target

Unnamed: 0,stroke
0,1.0
1,1.0
2,1.0
3,1.0
4,1.0
...,...
5105,0.0
5106,0.0
5107,0.0
5108,0.0


## Encoding

In [None]:
from sklearn.preprocessing import LabelEncoder,OneHotEncoder

In [None]:
labelencoder_f = LabelEncoder()
# the Residence_type column is represented by numeric values
features['Residence_type'] = labelencoder_f.fit_transform(features['Residence_type'])
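`LabelEncoder` assigns integer codes to the labels in sorted order, which is why "Rural" becomes 0 and "Urban" becomes 1 in the table further down. The following pure-Python sketch mimics that mapping on a few made-up rows; the names `residence`, `code_of`, and `encoded` are hypothetical and only illustrate the idea.

```python
# Illustrative values only; the real column has 5110 rows.
residence = ["Urban", "Rural", "Rural", "Urban", "Rural"]

# Codes follow the sorted order of the unique labels, as LabelEncoder does.
classes = sorted(set(residence))
code_of = {label: code for code, label in enumerate(classes)}
encoded = [code_of[label] for label in residence]

print(code_of)
print(encoded)
```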

In [None]:
encoder=OneHotEncoder(sparse_output=False)
encoded_labels = pd.DataFrame (encoder.fit_transform(features[['smoking_status']]))

In [None]:
encoded_labels.columns = encoder.get_feature_names_out(['smoking_status'])
dataset= pd.concat([features, encoded_labels ], axis=1)

In [None]:
dataset

Unnamed: 0,age,hypertension,heart_disease,bmi,Residence_type,avg_glucose_level,smoking_status,smoking_status_Unknown,smoking_status_formerly smoked,smoking_status_never smoked,smoking_status_smokes
0,67.0,0.0,1.0,36.600000,1,228.69,formerly smoked,0.0,1.0,0.0,0.0
1,61.0,0.0,0.0,28.893237,0,202.21,never smoked,0.0,0.0,1.0,0.0
2,80.0,0.0,1.0,32.500000,0,105.92,never smoked,0.0,0.0,1.0,0.0
3,49.0,0.0,0.0,34.400000,1,171.23,smokes,0.0,0.0,0.0,1.0
4,79.0,1.0,0.0,24.000000,0,174.12,never smoked,0.0,0.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...
5105,80.0,1.0,0.0,28.893237,1,83.75,never smoked,0.0,0.0,1.0,0.0
5106,81.0,0.0,0.0,40.000000,1,125.20,never smoked,0.0,0.0,1.0,0.0
5107,35.0,0.0,0.0,30.600000,0,82.99,never smoked,0.0,0.0,1.0,0.0
5108,51.0,0.0,0.0,25.600000,0,166.29,formerly smoked,0.0,1.0,0.0,0.0


In [None]:
dataset.columns

Index(['age', 'hypertension', 'heart_disease', 'bmi', 'Residence_type',
       'avg_glucose_level', 'smoking_status', 'smoking_status_Unknown',
       'smoking_status_formerly smoked', 'smoking_status_never smoked',
       'smoking_status_smokes'],
      dtype='object')

In [None]:
new_features = dataset[['age', 'hypertension', 'heart_disease', 'bmi',
       'avg_glucose_level','Residence_type',
       'smoking_status_Unknown',
       'smoking_status_formerly smoked', 'smoking_status_never smoked',
       'smoking_status_smokes']]
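The cells above one-hot encode `smoking_status`, concatenate the new columns, and then select the columns by hand to drop the original string column. As a hedged alternative sketch (not the notebook's method), `pandas.get_dummies` can do the encoding and the drop in one step; the toy frame below is made up for illustration.

```python
import pandas as pd

# Toy frame with one row per smoking_status category (illustrative values).
toy = pd.DataFrame({
    "age": [67.0, 61.0, 49.0, 4.0],
    "smoking_status": ["formerly smoked", "never smoked", "smokes", "Unknown"],
})

# One indicator column per category; the original string column is removed.
toy_encoded = pd.get_dummies(toy, columns=["smoking_status"], dtype=float)
print(toy_encoded.columns.tolist())
```

Each row has exactly one indicator set to 1.0, matching the layout of the `smoking_status_*` columns shown earlier.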

## Splitting Dataset

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
"""Splitting the Dataset into Training Set and Test Set """

X_train,X_test,y_train,y_test=train_test_split(new_features,target,test_size=0.3,random_state=0)

In [None]:
print(X_train.shape)
print(X_test.shape)

(3577, 10)
(1533, 10)
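Because stroke cases are rare, a plain random split can leave the two sets with slightly different class ratios. One common option, sketched here on synthetic stand-in data rather than the stroke data, is the `stratify` argument of `train_test_split`, which keeps the class proportions the same in both sets. The arrays `X_demo` and `y_demo` are assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows with 5% positive labels.
X_demo = np.arange(1000).reshape(-1, 1)
y_demo = np.array([1] * 50 + [0] * 950)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())  # positive rate in each split
```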


In [None]:
X_test

Unnamed: 0,age,hypertension,heart_disease,bmi,avg_glucose_level,Residence_type,smoking_status_Unknown,smoking_status_formerly smoked,smoking_status_never smoked,smoking_status_smokes
42,82.0,0.0,1.0,26.4,144.90,1,0.0,0.0,0.0,1.0
380,4.0,0.0,0.0,16.7,106.22,0,1.0,0.0,0.0,0.0
3524,58.0,0.0,0.0,25.9,79.95,1,0.0,0.0,1.0,0.0
352,20.0,0.0,0.0,34.1,96.57,0,0.0,0.0,1.0,0.0
4927,10.0,0.0,0.0,13.7,69.84,0,1.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...
929,75.0,0.0,1.0,30.5,207.64,1,0.0,1.0,0.0,0.0
2629,45.0,0.0,0.0,45.3,172.33,1,0.0,1.0,0.0,0.0
663,38.0,0.0,0.0,24.3,98.73,0,0.0,0.0,1.0,0.0
2303,75.0,0.0,0.0,33.4,219.39,0,0.0,0.0,0.0,1.0

## Neural Network


In [None]:
# Importing necessary libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Initialize the Neural Network
model = Sequential()

# Adding the input layer and first hidden layer
# Assume input shape is equal to the number of features in the dataset
model.add(Dense(units=32, activation='relu', input_shape=(X_train.shape[1],)))

# Adding the second hidden layer
model.add(Dense(units=16, activation='relu'))

# Adding the output layer (binary classification, so units=1 with sigmoid activation)
model.add(Dense(units=1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)

# Predict the test set results
y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype("int32")  # Convert probabilities to binary output (0 or 1)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Print the results
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)


Epoch 1/50
...
Epoch 50/50
Accuracy: 94.59%
Confusion Matrix:
[[1446   11]
 [  72    4]]
Classification Report:
              precision    recall  f1-score   support

         0.0       0.95      0.99      0.97      1457
         1.0       0.27      0.05      0.09        76

    accuracy                           0.95      1533
   macro avg       0.61      0.52      0.53      1533
weighted avg       0.92      0.95      0.93      1533
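The per-class figures in the report follow directly from the confusion matrix. As a minimal sketch, the snippet below recomputes the stroke-class metrics from the four counts printed above; the variable names are illustrative and not part of the notebook.

```python
# Counts copied from the confusion matrix above: rows are actual, columns are predicted.
tn, fp = 1446, 11   # actual "no stroke"
fn, tp = 72, 4      # actual "stroke"

accuracy = (tp + tn) / (tn + fp + fn + tp)
precision = tp / (tp + fp)   # share of predicted strokes that were real
recall = tp / (tp + fn)      # share of real strokes that were found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Only 4 of the 76 stroke cases in the test set are detected, so the high accuracy mostly reflects the majority class.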

## Different Types of Classification Algorithms

1. Logistic Regression
2. K Nearest Neighbor (KNN)
3. Decision Tree Classifier
4. Random Forest Classifier
5. Naive Bayes
6. Support Vector Machine (SVM)




## Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression

# Fitting Logistic Regression to the training dataset
lr = LogisticRegression()

# ravel() passes the labels as a 1-D array, which scikit-learn estimators expect
lr.fit(X_train, y_train.values.ravel())

In [None]:
# prediction
y_pred = lr.predict(X_test)

In [None]:
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score,f1_score,precision_score,recall_score

In [None]:
# Confusion matrix (actual, predicted)
confusion_matrix(y_test,y_pred)

array([[1457,    0],
       [  76,    0]])

In [None]:
# Classification report
# it is used to check the precision, recall, F1-score, and accuracy of the classification
print(classification_report(y_test,y_pred,target_names = ['No','Yes']))

              precision    recall  f1-score   support

          No       0.95      1.00      0.97      1457
         Yes       0.00      0.00      0.00        76

    accuracy                           0.95      1533
   macro avg       0.48      0.50      0.49      1533
weighted avg       0.90      0.95      0.93      1533
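The confusion matrix above shows that the model predicts "No" for every test patient. A short sketch of the arithmetic, using only the support counts from the report, shows that such an always-"No" baseline already reaches the reported accuracy; the variable names are hypothetical.

```python
# Test-set class counts taken from the support column above.
n_no, n_yes = 1457, 76

# A baseline that predicts "No" for every patient gets all "No" cases right.
baseline_accuracy = n_no / (n_no + n_yes)
baseline_recall_yes = 0 / n_yes   # no stroke case is ever flagged

print(f"baseline accuracy: {baseline_accuracy:.2%}, stroke recall: {baseline_recall_yes:.2f}")
```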



## KNN Classifiers

In [None]:
from sklearn.neighbors import KNeighborsClassifier

# Classifier Model
classifier = KNeighborsClassifier(n_neighbors=3, metric = 'minkowski')
classifier.fit(X_train,y_train)
# Prediction
y_pred = classifier.predict(X_test)

In [None]:
# Confusion matrix
print(confusion_matrix(y_test,y_pred))


[[1431   26]
 [  71    5]]


In [None]:
# Classification report
# it is used to check the precision, recall, F1-score, and accuracy of the classification
print(classification_report(y_test,y_pred,target_names = ['No','Yes']))

              precision    recall  f1-score   support

          No       0.95      0.98      0.97      1457
         Yes       0.16      0.07      0.09        76

    accuracy                           0.94      1533
   macro avg       0.56      0.52      0.53      1533
weighted avg       0.91      0.94      0.92      1533
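KNN relies on distances, so features with large numeric ranges (such as `avg_glucose_level`) can dominate features with small ranges (such as the 0/1 indicators). One way to address this, sketched below on synthetic data and not applied in the cells above, is to standardize the features inside a pipeline before the classifier. The data and the names `X_toy`, `y_toy`, and `knn_scaled` are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical two-feature data on very different scales.
X_toy = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(50, 250, 200)])
y_toy = (X_toy[:, 0] > 0.5).astype(int)  # label depends only on the small-scale feature

# The scaler is fitted on the training rows only, then applied before KNN.
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
knn_scaled.fit(X_toy[:150], y_toy[:150])
pred = knn_scaled.predict(X_toy[150:])

print(pred.shape)
```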



## Support Vector Machine

In [None]:
from sklearn.svm import SVC

# Classifier Model
classifier = SVC(kernel = 'linear', random_state = 42)
classifier.fit(X_train,y_train)
# Prediction
y_pred = classifier.predict(X_test)

In [None]:
# Confusion matrix
print(confusion_matrix(y_test,y_pred))


[[1457    0]
 [  76    0]]


In [None]:
# Classification report
# it is used to check the precision, recall, F1-score, and accuracy of the classification
print(classification_report(y_test,y_pred,target_names = ['No','Yes']))

              precision    recall  f1-score   support

          No       0.95      1.00      0.97      1457
         Yes       0.00      0.00      0.00        76

    accuracy                           0.95      1533
   macro avg       0.48      0.50      0.49      1533
weighted avg       0.90      0.95      0.93      1533



## Decision Tree

In [None]:
from sklearn.tree import DecisionTreeClassifier
# Classifier Model
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train,y_train)


In [None]:
# Prediction
y_pred = classifier.predict(X_test)

In [None]:
# Confusion matrix
print(confusion_matrix(y_test,y_pred))


[[1395   62]
 [  64   12]]


In [None]:
# Classification report
# it is used to check the precision, recall, F1-score, and accuracy of the classification
print(classification_report(y_test,y_pred,target_names = ['No','Yes']))

              precision    recall  f1-score   support

          No       0.96      0.96      0.96      1457
         Yes       0.16      0.16      0.16        76

    accuracy                           0.92      1533
   macro avg       0.56      0.56      0.56      1533
weighted avg       0.92      0.92      0.92      1533

## AdaBoost Classifier



In [None]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier



# Define the base classifier (a more complex decision tree)
base_classifier = DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)

# Initialize the AdaBoost classifier with the base classifier
ada_classifier = AdaBoostClassifier(
    estimator=base_classifier,  # Use the base classifier defined above (named "base_estimator" before scikit-learn 1.2)
    n_estimators=50,  # Number of weak learners (decision trees)
    random_state=42
)

# Fit the model on the training data
ada_classifier.fit(X_train, y_train)

# Make predictions
y_pred = ada_classifier.predict(X_test)

# Evaluate the model
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['No', 'Yes']))

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")


Confusion Matrix:
[[171   3]
 [  6  20]]

Classification Report:
              precision    recall  f1-score   support

          No       0.97      0.98      0.97       174
         Yes       0.87      0.77      0.82        26

    accuracy                           0.95       200
   macro avg       0.92      0.88      0.90       200
weighted avg       0.95      0.95      0.95       200

Accuracy: 95.50%
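The keyword for the base learner in `AdaBoostClassifier` differs between scikit-learn versions: older releases use `base_estimator`, newer ones use `estimator`. The sketch below is one possible way to build the classifier so that it works with either name, by checking the constructor signature at run time. The helper variable `keyword` is hypothetical.

```python
import inspect

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Pick whichever keyword the installed scikit-learn version accepts.
params = inspect.signature(AdaBoostClassifier).parameters
keyword = "estimator" if "estimator" in params else "base_estimator"

base_tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
ada_demo = AdaBoostClassifier(**{keyword: base_tree}, n_estimators=50, random_state=42)

print(keyword)
```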


In [None]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report


# Define the base classifier
base_classifier = SVC(kernel='linear', random_state=42)

# Initialize the AdaBoost classifier with the base classifier using SAMME
ada_classifier = AdaBoostClassifier(
    estimator=base_classifier,  # named "base_estimator" before scikit-learn 1.2
    n_estimators=50,
    algorithm='SAMME',  # Use the SAMME algorithm (SVC has no predict_proba by default)
    random_state=42
)

# Fit the model
ada_classifier.fit(X_train, y_train)

# Make predictions
y_pred = ada_classifier.predict(X_test)

# Evaluate the model
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['No', 'Yes']))

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")


Confusion Matrix:
[[161  13]
 [  6  20]]

Classification Report:
              precision    recall  f1-score   support

          No       0.96      0.93      0.94       174
         Yes       0.61      0.77      0.68        26

    accuracy                           0.91       200
   macro avg       0.79      0.85      0.81       200
weighted avg       0.92      0.91      0.91       200

Accuracy: 90.50%


In [None]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report


# Define the base classifier
base_classifier = RandomForestClassifier(n_estimators=32, criterion ='entropy', random_state = 40)

# Initialize the AdaBoost classifier with the base classifier using SAMME.R
ada_classifier = AdaBoostClassifier(
    estimator=base_classifier,  # named "base_estimator" before scikit-learn 1.2
    n_estimators=50,
    algorithm='SAMME.R',  # Use the SAMME.R algorithm (deprecated since scikit-learn 1.4)
    random_state=42
)

# Fit the model
ada_classifier.fit(X_train, y_train)

# Make predictions
y_pred = ada_classifier.predict(X_test)

# Evaluate the model
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['No', 'Yes']))

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")


Confusion Matrix:
[[172   2]
 [  6  20]]

Classification Report:
              precision    recall  f1-score   support

          No       0.97      0.99      0.98       174
         Yes       0.91      0.77      0.83        26

    accuracy                           0.96       200
   macro avg       0.94      0.88      0.91       200
weighted avg       0.96      0.96      0.96       200

Accuracy: 96.00%


## Random Forest Classifier

In [None]:
from sklearn.ensemble import RandomForestClassifier
# Classifier Model
classifier = RandomForestClassifier(n_estimators=32, criterion ='entropy', random_state = 40)
classifier.fit(X_train,y_train)


In [None]:
# Prediction
y_pred = classifier.predict(X_test)

In [None]:
# Confusion matrix
print(confusion_matrix(y_test,y_pred))


[[1456    1]
 [  74    2]]


In [None]:
# Classification report
# it is used to check the precision, recall, F1-score, and accuracy of the classification
print(classification_report(y_test,y_pred,target_names = ['No','Yes']))

              precision    recall  f1-score   support

          No       0.95      1.00      0.97      1457
         Yes       0.67      0.03      0.05        76

    accuracy                           0.95      1533
   macro avg       0.81      0.51      0.51      1533
weighted avg       0.94      0.95      0.93      1533



## Result Analysis

In this project, different classification algorithms were used to predict stroke from the dataset. The evaluation metrics used to evaluate the classification algorithms are Precision, Recall, F1 Score, and Accuracy. For comparison purposes, the weighted averages of these metrics are considered together with accuracy.
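A weighted average multiplies each class's score by its support (the number of test samples in that class) and divides by the total. As a worked sketch, the snippet below reproduces the weighted precision and recall of the Logistic Regression report from its confusion matrix `[[1457, 0], [76, 0]]`; the helper `weighted_average` is a hypothetical name used only here.

```python
# Logistic Regression test results: confusion matrix [[1457, 0], [76, 0]].
support = {"No": 1457, "Yes": 76}
precision = {"No": 1457 / (1457 + 76), "Yes": 0.0}  # "Yes" is never predicted, reported as 0
recall = {"No": 1457 / 1457, "Yes": 0 / 76}

total = sum(support.values())


def weighted_average(per_class):
    """Average per-class scores, weighting each class by its support."""
    return sum(per_class[c] * support[c] for c in support) / total


print(f"weighted precision: {weighted_average(precision):.2f}")
print(f"weighted recall:    {weighted_average(recall):.2f}")
```

Because the "No" class holds 1457 of the 1533 test samples, the weighted averages stay high even when the "Yes" class scores are zero.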

### Logistic Regression
1. Weighted Average Precision  : 90%
2. Weighted Average Recall     : 95%
3. Weighted Average F1-Score   : 93%
4. Accuracy                    : 95%

### KNN Classifier
1. Weighted Average Precision  : 91%
2. Weighted Average Recall     : 94%
3. Weighted Average F1-Score   : 92%
4. Accuracy                    : 94%

### Support Vector Machine
1. Weighted Average Precision  : 90%
2. Weighted Average Recall     : 95%
3. Weighted Average F1-Score   : 93%
4. Accuracy                    : 95%

### Decision Tree Classification
1. Weighted Average Precision  : 92%
2. Weighted Average Recall     : 92%
3. Weighted Average F1-Score   : 92%
4. Accuracy                    : 92%

### Random Forest Classification
1. Weighted Average Precision  : 94%
2. Weighted Average Recall     : 95%
3. Weighted Average F1-Score   : 93%
4. Accuracy                    : 95%

### Neural Network
1. Weighted Average Precision  : 92%
2. Weighted Average Recall     : 95%
3. Weighted Average F1-Score   : 93%
4. Accuracy                    : 94.59%

### AdaBoost Classification with base decision tree
1. Weighted Average Precision  : 95%
2. Weighted Average Recall     : 95%
3. Weighted Average F1-Score   : 95%
4. Accuracy                    : 95.50%

### AdaBoost Classification with SVC
1. Weighted Average Precision  : 92%
2. Weighted Average Recall     : 91%
3. Weighted Average F1-Score   : 91%
4. Accuracy                    : 90.50%

### AdaBoost Classification with RandomForest
1. Weighted Average Precision  : 96%
2. Weighted Average Recall     : 96%
3. Weighted Average F1-Score   : 96%
4. Accuracy                    : 96%

## Discussion
In this project, several classifiers are used to predict stroke from the **"Stroke Dataset"**. The classifiers used in this project are **Logistic Regression, K-Nearest Neighbor, Support Vector Machine, Decision Tree, Random Forest, a neural network, and AdaBoost with decision tree, SVC, and random forest base classifiers**. First, the required libraries were imported and the dataset was downloaded from Dropbox. Then the data was pre-processed, which included **handling missing values and encoding (label encoding, one-hot encoding).** The dataset was then split into training and test sets, and the classifiers were trained on the training set. Finally, evaluation metrics were used to evaluate the classifiers, and the comparison based on this evaluation is shown in Result Analysis. From the comparison, the AdaBoost classifier with a Random Forest base classifier has the highest evaluation values. However, the three AdaBoost reports show a test support of 200 samples, whereas the other models were evaluated on 1533 test samples, so these figures are not directly comparable.
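The comparison can be laid out side by side. The sketch below simply tabulates the accuracy and test-set size printed in each report above and sorts by accuracy; it adds no new results, and the list name `reported` is illustrative.

```python
# Accuracy (%) and test-set size as printed in the reports above.
reported = [
    ("Logistic Regression", 95.0, 1533),
    ("KNN", 94.0, 1533),
    ("Support Vector Machine", 95.0, 1533),
    ("Decision Tree", 92.0, 1533),
    ("Random Forest", 95.0, 1533),
    ("Neural Network", 94.59, 1533),
    ("AdaBoost (Decision Tree)", 95.5, 200),
    ("AdaBoost (SVC)", 90.5, 200),
    ("AdaBoost (Random Forest)", 96.0, 200),
]

# Print the models from highest to lowest reported accuracy.
for name, acc, n_test in sorted(reported, key=lambda row: row[1], reverse=True):
    print(f"{name:<26} {acc:6.2f}%  (test samples: {n_test})")

best = max(reported, key=lambda row: row[1])
```

Listing the test-set size next to each score makes it visible which results come from the same evaluation set.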