# Heart Failure Prediction using PyCaret

This notebook uses PyCaret, a low-code machine learning library, for rapid development of machine learning models. The goal was to develop an accurate model without modifying the original dataset in any way.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

### Read CSV File

In [None]:
#Read the CSV file into a dataframe

data_file_path = '/kaggle/input/heart-failure-clinical-data/heart_failure_clinical_records_dataset.csv'
hf_Data = pd.read_csv(data_file_path)

hf_Data.info()

### Install PyCaret

In [None]:
!pip install pycaret

### Import PyCaret Classification Library

In [None]:
from pycaret.classification import *

### Generate Pandas Profile report

The report was used to check for null values and to gain a better understanding of the dataset, which in turn informed the setup of the PyCaret environment.

In [None]:
from pandas_profiling import ProfileReport

profile = ProfileReport(hf_Data, title="Heart Failure Data")
profile.to_notebook_iframe()
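
As a lightweight complement to the profile report, the same two checks (missing values and class balance) can be sketched with plain pandas. The `toy` frame below is made-up illustration data, not the heart failure dataset; in the notebook the same calls would be applied to `hf_Data`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for hf_Data; the values are invented for illustration.
toy = pd.DataFrame({
    "age": [60.0, 55.0, np.nan, 70.0],
    "serum_sodium": [137, 140, 134, 130],
    "DEATH_EVENT": [0, 1, 0, 1],
})

# Count missing values per column.
null_counts = toy.isnull().sum()
print(null_counts)

# Share of each target class, to spot any imbalance.
class_share = toy["DEATH_EVENT"].value_counts(normalize=True)
print(class_share)
```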

### Split Dataset into Modelling and Unseen Data for Prediction

The dataset was split into two parts: 80% for modelling and validation, and 20% held back as unseen prediction data.

In [None]:
data = hf_Data.sample(frac=0.8, random_state=42)
evaluationData = hf_Data.drop(data.index)

data.reset_index(drop=True, inplace=True)
evaluationData.reset_index(drop=True, inplace=True)

print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions: ' + str(evaluationData.shape))
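
The split above relies on `sample` keeping the original index, so that `drop` removes exactly the sampled rows. A minimal sketch on a made-up ten-row frame (the `toy`, `modelling` and `holdout` names are illustrative only) shows that the two parts do not overlap and together cover every row:

```python
import pandas as pd

# Ten made-up rows standing in for the real dataset.
toy = pd.DataFrame({"patient": range(10), "DEATH_EVENT": [0, 1] * 5})

# Sample 80% of the rows; the sampled rows keep their original index labels.
modelling = toy.sample(frac=0.8, random_state=42)

# Dropping those index labels leaves the remaining 20%.
holdout = toy.drop(modelling.index)

# The two parts share no rows and together account for all ten.
overlap = modelling.index.intersection(holdout.index)
print(len(modelling), len(holdout), len(overlap))
```

Note that the index must only be reset after the `drop` call, as the notebook does; resetting it earlier would break the link between the two frames.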

### Create PyCaret Environment

Based on the Pandas profiling report, the following options were chosen:
* Normalisation of the dataset.
* Fix imbalance of the target attribute: about two thirds of the records were for survival. Setting this option to true enables the use of SMOTE to correct the imbalance.
* The attributes 'age', 'time' and 'ejection_fraction' were binned, that is, divided into equal-sized 'bins', and then one-hot encoded into new binary attributes.
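
To give an intuition for the imbalance correction, the core idea of SMOTE can be sketched in NumPy: a synthetic minority sample is placed at a random point on the line between a minority row and one of its nearest minority neighbours. This is a simplified illustration with a hypothetical helper `smote_like_sample` and made-up points, not the implementation PyCaret calls.

```python
import numpy as np

def smote_like_sample(minority, k, rng):
    """Create one synthetic point between a minority row and a near neighbour."""
    i = rng.integers(len(minority))
    # Distances from the chosen row to every minority row (itself included).
    dists = np.linalg.norm(minority - minority[i], axis=1)
    # Nearest k neighbours; position 0 is the row itself (assuming no duplicates).
    neighbours = np.argsort(dists)[1:k + 1]
    j = rng.choice(neighbours)
    lam = rng.random()  # interpolation weight in [0, 1)
    return minority[i] + lam * (minority[j] - minority[i])

rng = np.random.default_rng(42)
# Four made-up minority-class rows with two features each.
minority = np.array([[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [3.0, 4.0]])
synthetic = np.array([smote_like_sample(minority, k=2, rng=rng) for _ in range(5)])
print(synthetic)
```

Because each synthetic row is an interpolation between two real rows, it always stays inside the range spanned by the existing minority data.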

In [None]:
envSetup = setup(data = data, target = 'DEATH_EVENT', session_id=42, normalize = True, fix_imbalance = True, silent=True,
                bin_numeric_features = ['age','time', 'ejection_fraction'], 
                numeric_features=['creatinine_phosphokinase', 'ejection_fraction', 'platelets', 'serum_creatinine',
                                  'serum_sodium', 'anaemia', 'diabetes', 'high_blood_pressure','sex', 'smoking'])
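
What binning followed by one-hot encoding does to a column can be sketched with pandas alone. The example assumes equal-width bins via `pd.cut` on made-up ages; the binning strategy PyCaret applies internally for `bin_numeric_features` may differ, so treat this as an illustration of the idea rather than a reproduction of the setup step.

```python
import pandas as pd

# Made-up ages for illustration.
toy = pd.DataFrame({"age": [42, 50, 58, 65, 73, 90]})

# Split the observed range into three equal-width intervals.
toy["age_bin"] = pd.cut(toy["age"], bins=3, labels=["low", "mid", "high"])

# One-hot encode the bin labels into binary indicator columns.
encoded = pd.get_dummies(toy["age_bin"], prefix="age")
print(encoded)
```

Each row ends up with exactly one indicator set, so the original continuous value is replaced by its bin membership.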

### Find the top three performing ML algorithm models

In [None]:
#Select Top 3 best performing models
top3 = compare_models(n_select = 3)

#### Blend the top 3 ML algorithm models

In [None]:
# blend top3 models from compare_models

model = blend_models(top3, method='soft')
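
Soft blending combines models by averaging their predicted class probabilities and picking the class with the highest average. A minimal NumPy sketch of that idea follows; the three probability arrays are hypothetical outputs, not results from the models selected above.

```python
import numpy as np

# Hypothetical [P(survival), P(death)] outputs from three models for two patients.
proba_a = np.array([[0.9, 0.1], [0.4, 0.6]])
proba_b = np.array([[0.6, 0.4], [0.3, 0.7]])
proba_c = np.array([[0.7, 0.3], [0.6, 0.4]])

# Soft voting: average the probabilities across models ...
avg_proba = np.mean([proba_a, proba_b, proba_c], axis=0)

# ... then predict the class with the highest averaged probability.
blended_label = avg_proba.argmax(axis=1)
print(avg_proba)
print(blended_label)
```

For the second patient, one model favours survival while the other two favour death; the averaged probability still comes out in favour of death because the two agreeing models are more confident.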

### Plot modelling evaluation data

In [None]:
evaluate_model(model)

### Prediction of validation data

In [None]:
predict_model(model)

### Prediction of the Unseen Prediction data extracted from the dataset previously

In [None]:
unseen_predictions = predict_model(model, data=evaluationData)
unseen_predictions.head()

### Get accuracy of the model against the Unseen Prediction data

In [None]:
unseen_predictions["Label"] = pd.to_numeric(unseen_predictions["Label"])

In [None]:
from pycaret.utils import check_metric
check_metric(unseen_predictions['DEATH_EVENT'], unseen_predictions['Label'], metric = 'Accuracy')
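
Accuracy here is simply the fraction of rows where the predicted label matches the true outcome. As a cross-check that does not depend on PyCaret, the same figure can be computed with pandas; the frame below uses made-up values and reuses the `DEATH_EVENT` and `Label` column names from the cells above.

```python
import pandas as pd

# Made-up true outcomes and predicted labels for five patients.
toy = pd.DataFrame({
    "DEATH_EVENT": [0, 1, 0, 1, 0],
    "Label": [0, 1, 1, 1, 0],
})

# Fraction of rows where the prediction equals the true outcome.
accuracy = (toy["DEATH_EVENT"] == toy["Label"]).mean()
print(accuracy)
```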

### Unseen Prediction results to CSV

In [None]:
filename = 'Heart_Failure_Unseen_Data_Prediction.csv'

unseen_predictions.to_csv(filename, index=False)