# **Heart Failure Prediction Using SVM**

## Objective

To **predict the occurrence of a heart failure death event** (`DEATH_EVENT`) from clinical factors such as **age**, **anaemia**, **creatinine phosphokinase level**, **diabetes**, **ejection fraction**, **high blood pressure**, **platelets**, **serum creatinine**, **serum sodium**, **sex**, **smoking**, and **follow-up time**, using a Support Vector Machine (SVM).

## Dataset Overview

| Attribute | Description |
| :--- | :--- |
| **Age** | The age of the patient in years. |
| **Anaemia** | Whether the patient has anaemia: 1 = Yes, 0 = No. |
| **Creatinine_Phosphokinase** | Level of creatine phosphokinase (CPK) in the blood. |
| **Diabetes** | Whether the patient has diabetes: 1 = Yes, 0 = No. |
| **Ejection_Fraction** | The percentage of blood pumped out of the heart during each beat. |
| **High_Blood_Pressure** | Whether the patient has high blood pressure: 1 = Yes, 0 = No. |
| **Platelets** | The number of platelets in the blood. |
| **Serum_Creatinine** | Serum creatinine level in the blood. |
| **Serum_Sodium** | Serum sodium level in the blood. |
| **Sex** | The sex of the patient: 1 = Male, 0 = Female. |
| **Smoking** | Whether the patient is a smoker: 1 = Yes, 0 = No. |
| **Time** | The follow-up period in days. |
| **DEATH_EVENT** | Whether the patient died due to heart failure: 1 = Yes, 0 = No. (Target variable) |

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')   

In [2]:
df = pd.read_csv("https://raw.githubusercontent.com/iamnaveen1401/Datasets/refs/heads/main/heart_failure_clinical_records.csv")
df.head()

,age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
0,75.0,0,582,0,20,1,265000.0,1.9,130,1,0,4,1
1,55.0,0,7861,0,38,0,263358.03,1.1,136,1,0,6,1
2,65.0,0,146,0,20,0,162000.0,1.3,129,1,1,7,1
3,50.0,1,111,0,20,0,210000.0,1.9,137,1,0,7,1
4,65.0,1,160,1,20,0,327000.0,2.7,116,0,0,8,1


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 299 entries, 0 to 298
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   age                       299 non-null    float64
 1   anaemia                   299 non-null    int64  
 2   creatinine_phosphokinase  299 non-null    int64  
 3   diabetes                  299 non-null    int64  
 4   ejection_fraction         299 non-null    int64  
 5   high_blood_pressure       299 non-null    int64  
 6   platelets                 299 non-null    float64
 7   serum_creatinine          299 non-null    float64
 8   serum_sodium              299 non-null    int64  
 9   sex                       299 non-null    int64  
 10  smoking                   299 non-null    int64  
 11  time                      299 non-null    int64  
 12  DEATH_EVENT               299 non-null    int64  
dtypes: float64(3), int64(10)
memory usage: 30.5 KB


In [4]:
df['DEATH_EVENT'].unique()

array([1, 0])
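
The target is binary, so it is worth checking how balanced the two classes are: plain accuracy only means something relative to the majority-class baseline. A minimal sketch, using a small made-up series (`toy` is hypothetical and its values are illustrative, not taken from the dataset); on the real data the same calls would be applied to `df['DEATH_EVENT']`:

```python
import pandas as pd

# Hypothetical stand-in for df['DEATH_EVENT']; values are illustrative only.
toy = pd.Series([1, 0, 0, 1, 0, 0, 0, 1, 0, 0], name="DEATH_EVENT")

# Fraction of samples in each class.
class_share = toy.value_counts(normalize=True)
print(class_share)

# Accuracy obtained by always predicting the most frequent class.
baseline_acc = class_share.max()
print("majority-class baseline accuracy:", baseline_acc)
```

Any model accuracy reported later is easier to interpret next to this baseline.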

In [5]:
X = df.drop(['DEATH_EVENT'],axis=1)
y = df['DEATH_EVENT']

In [6]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=12)
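
With only 299 rows, a purely random split can leave the train and test sets with somewhat different class proportions. One option is the `stratify` argument of `train_test_split`. The sketch below uses synthetic stand-in arrays (`X_demo` and `y_demo` are hypothetical, not the notebook's `X` and `y`):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 200 samples, 30% positive class.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = np.array([1] * 60 + [0] * 140)

# stratify=y_demo keeps the class ratio (nearly) identical in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=12, stratify=y_demo
)

print("positive share, train:", y_tr.mean())
print("positive share, test :", y_te.mean())
```

Applied to the notebook, this would amount to adding `stratify=y` to the existing call; note that doing so changes the split and therefore the accuracies reported below.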

In [7]:
from sklearn.svm import SVC
# SVM with a linear kernel, trained on the unscaled features
svm_clf = SVC(kernel='linear')  # kernel options: 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'
svm_clf.fit(X_train, y_train)
acc_linear_kernal = svm_clf.score(X_test,y_test)
print("acc_linear_kernal :",acc_linear_kernal)

acc_linear_kernal : 0.8666666666666667


In [8]:
# SVM model with RBF kernel, without feature scaling
svm_clf = SVC()  # kernel='rbf' is the default
svm_clf.fit(X_train, y_train)
acc_rbf_kernal_without_scale = svm_clf.score(X_test,y_test)
print("acc_rbf_kernal_without_scale :",acc_rbf_kernal_without_scale)

acc_rbf_kernal_without_scale : 0.6533333333333333


In [9]:
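# SVMs are often sensitive to feature scaling,
# so standardize the features and try again.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on the training split only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics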
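
The cell above fits `StandardScaler` on the training split only and then reuses it on the test split, which keeps test statistics out of the preprocessing. The same pattern can be bundled with the classifier using scikit-learn's `make_pipeline`. The sketch below runs on synthetic data from `make_classification` as a stand-in for the notebook's features, so the printed score says nothing about the heart failure data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: not the heart failure dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=12
)

# fit() learns the scaling parameters from X_tr only;
# score() applies that same scaling to X_te before predicting.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```

A pipeline also makes it harder to accidentally score a scaled model on unscaled inputs, since the scaler travels with the classifier.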

In [10]:
# SVM model with RBF kernel, with feature scaling
svm_clf = SVC()  # kernel='rbf' is the default
svm_clf.fit(X_train_scaled, y_train)
acc_rbf_kernal_with_scale = svm_clf.score(X_test_scaled,y_test)
print("acc_rbf_kernal_with_scale :",acc_rbf_kernal_with_scale)

acc_rbf_kernal_with_scale : 0.7733333333333333
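
The improvement from 0.653 to 0.773 is consistent with how the RBF kernel works: it depends on the squared Euclidean distance between samples, `exp(-gamma * ||x - x'||^2)`, so a feature with a large numeric range (here `platelets`, in the hundreds of thousands) can dominate the distance. A small NumPy sketch with made-up values (`X_demo` is hypothetical, loosely "age" and "platelets") illustrates the effect:

```python
import numpy as np

# Two illustrative feature columns on very different scales
# (hypothetical values, loosely "age in years" and "platelet count").
X_demo = np.array([
    [40.0, 250_000.0],
    [90.0, 251_000.0],   # very different age, similar platelets
    [41.0, 300_000.0],   # similar age, different platelets
])

def sq_dist(a, b):
    """Squared Euclidean distance, the quantity inside the RBF kernel."""
    return float(np.sum((a - b) ** 2))

# Unscaled: the platelet column dominates both distances.
print(sq_dist(X_demo[0], X_demo[1]), sq_dist(X_demo[0], X_demo[2]))

# Standardized per column: both features contribute comparably.
Z = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
print(sq_dist(Z[0], Z[1]), sq_dist(Z[0], Z[2]))
```

Before scaling, the 50-year age gap contributes almost nothing to the distance; after standardization, the age gap and the platelet gap weigh the same in this toy example.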


In [11]:
# SVM model with sigmoid kernel, with feature scaling
svm_clf = SVC(kernel='sigmoid')  # kernel options: 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'
svm_clf.fit(X_train_scaled, y_train)
acc_sigmoid_kernal_with_scle = svm_clf.score(X_test_scaled,y_test)
print("acc_sigmoid_kernal_with_scle :",acc_sigmoid_kernal_with_scle)

acc_sigmoid_kernal_with_scle : 0.8133333333333334


In [12]:
print("acc_linear_kernal :",acc_linear_kernal)
print("acc_rbf_kernal_without_scale :",acc_rbf_kernal_without_scale)
print("acc_rbf_kernal_with_scale :",acc_rbf_kernal_with_scale)
print("acc_sigmoid_kernal_with_scle :",acc_sigmoid_kernal_with_scle)

acc_linear_kernal : 0.8666666666666667
acc_rbf_kernal_without_scale : 0.6533333333333333
acc_rbf_kernal_with_scale : 0.7733333333333333
acc_sigmoid_kernal_with_scle : 0.8133333333333334


In [13]:
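# Highest test accuracy: SVM with a linear kernel on unscaled features (0.867).
# This suggests a linear decision boundary fits this data well.
# Scaling improved the RBF kernel (0.653 -> 0.773), but it still trails the linear kernel.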
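
The comparison above rests on a single 75-sample test split, so differences of a few points may not be stable. A hedged sketch of how the kernels could be compared with 5-fold cross-validation, again on synthetic `make_classification` data rather than the notebook's `df` (the printed numbers are therefore not results for this dataset, and `cv_means` is a hypothetical helper name):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: not the heart failure dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=12, random_state=0)

cv_means = {}
for kernel in ["linear", "rbf", "sigmoid"]:
    # Scaling lives inside the pipeline, so it is refit on each training fold.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(model, X_demo, y_demo, cv=5)
    cv_means[kernel] = scores.mean()
    print(f"{kernel:8s} mean={scores.mean():.3f} std={scores.std():.3f}")
```

Reporting the mean and spread across folds gives a rough sense of whether one kernel is really ahead or whether the gap is within split-to-split noise.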