## **Heart Failure Prediction using Logistic Regression**

### __Project Overview__
This project builds a machine learning model to predict mortality in patients with heart failure from their clinical records. It uses the *Heart Failure Clinical Records* dataset, which has 13 columns: 12 clinical features (such as age, high blood pressure status, and ejection fraction) and the target `DEATH_EVENT`.

The primary goal is to develop a reliable classification model that could serve as an early warning tool for medical professionals.

### 1. Import Libraries

In [14]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

### 2. Load the Dataset

In [15]:
df = pd.read_csv('Data/heart_failure_clinical_records_dataset.csv')

print(df.head())

    age  anaemia  creatinine_phosphokinase  diabetes  ejection_fraction  \
0  75.0        0                       582         0                 20   
1  55.0        0                      7861         0                 38   
2  65.0        0                       146         0                 20   
3  50.0        1                       111         0                 20   
4  65.0        1                       160         1                 20   

   high_blood_pressure  platelets  serum_creatinine  serum_sodium  sex  \
0                    1  265000.00               1.9           130    1   
1                    0  263358.03               1.1           136    1   
2                    0  162000.00               1.3           129    1   
3                    0  210000.00               1.9           137    1   
4                    0  327000.00               2.7           116    0   

   smoking  time  DEATH_EVENT  
0        0     4            1  
1        0     6            1  
2       

### 3. Data Exploration

In [16]:
df.shape

(299, 13)

In [17]:
df.value_counts('DEATH_EVENT')

DEATH_EVENT
0    203
1     96
Name: count, dtype: int64
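
The counts above show a moderately imbalanced target. As a quick reference point, the sketch below computes the class shares and the accuracy that a trivial "always predict 0" baseline would reach. The counts are copied from the output above; the variable names are illustrative and not part of the original notebook.

```python
# Class counts copied from the value_counts output above
counts = {0: 203, 1: 96}
total = sum(counts.values())

# Share of each class in the full dataset
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the majority class
majority_baseline = max(counts.values()) / total

print(f"Share of death events: {shares[1]:.3f}")
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any accuracy figure for this dataset is easier to interpret when read against this baseline of roughly 0.68.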

### 4. Separate Features and Target

In [18]:
X = df.drop('DEATH_EVENT', axis=1)

y = df['DEATH_EVENT']

### 5. Split Data into Training and Testing Sets

In [19]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
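
The `stratify=y` argument asks `train_test_split` to keep the class ratio roughly equal in both splits. The sketch below checks this on stand-in labels that only mimic the class counts of `DEATH_EVENT`; `X_demo` and `y_demo` are hypothetical placeholders, not the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in labels with the same class counts as DEATH_EVENT (203 zeros, 96 ones)
y_demo = np.array([0] * 203 + [1] * 96)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature, not real data

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratify, the positive rate should be nearly identical in both splits
print(f"train size: {len(y_tr)}, positive rate: {y_tr.mean():.3f}")
print(f"test size:  {len(y_te)}, positive rate: {y_te.mean():.3f}")
```

In the notebook itself, the same check would compare `y_train.mean()` with `y_test.mean()`.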

### 6. Feature Scaling

In [20]:
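# Continuous columns to standardize; the binary flags (anaemia, diabetes, etc.) are left unscaled
n_cols = ['age', 'creatinine_phosphokinase', 'ejection_fraction', 'platelets', 'serum_creatinine', 'serum_sodium', 'time']

# Fit the scaler on the training split only, then reuse its statistics on the test split
scaler = StandardScaler()
X_train[n_cols] = scaler.fit_transform(X_train[n_cols])

X_test[n_cols] = scaler.transform(X_test[n_cols])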

### 7. Train the Logistic Regression Model

In [21]:
model = LogisticRegression(random_state=42)

model.fit(X_train, y_train)

### 8. Accuracy Check

In [22]:
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}\n")

Accuracy: 0.82
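
Accuracy alone can hide how well the positive class is detected, which matters when the target is imbalanced as in Section 3. The sketch below shows how a confusion matrix, recall, and precision could be computed with scikit-learn. The labels are hypothetical and chosen only for illustration; they are not the notebook's test set or its results.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels for illustration only (NOT the notebook's test set)
y_true_demo = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

# Recall: share of actual death events the model caught
recall = recall_score(y_true_demo, y_pred_demo)
# Precision: share of predicted death events that were correct
precision = precision_score(y_true_demo, y_pred_demo)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In the notebook, the same calls would take `y_test` and `y_pred` in place of the demo lists.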



### 9. Inference on New Data

In [23]:
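# Values follow the column order of X_train:
# age, anaemia, creatinine_phosphokinase, diabetes, ejection_fraction, high_blood_pressure,
# platelets, serum_creatinine, serum_sodium, sex, smoking, time
input_data = (65, 0, 146, 0, 20, 0, 162000, 1.3, 129, 1, 1, 7)
input_df = pd.DataFrame([input_data], columns=X_train.columns)
# Apply the scaler fitted on the training data; do not refit it on new inputs
input_df[n_cols] = scaler.transform(input_df[n_cols])
prediction = model.predict(input_df)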

if prediction[0] == 0:
    print("Prediction: The patient is likely to survive")
else:
    print("Prediction: A death event is likely")

Prediction: A death event is likely
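
For an early warning tool, a probability can be more useful than a hard 0/1 label. The sketch below shows how `predict_proba` relates to `predict` for a logistic regression. It is fitted on synthetic one-feature data, so `X_demo`, `y_demo`, and `demo_model` are stand-ins rather than the notebook's objects.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: one feature, class 1 more likely for larger values
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 1))
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

demo_model = LogisticRegression(random_state=42).fit(X_demo, y_demo)

# predict_proba returns one column per class, ordered as in demo_model.classes_
new_point = np.array([[1.5]])
proba = demo_model.predict_proba(new_point)[0]
label = demo_model.predict(new_point)[0]

print(f"classes: {demo_model.classes_}")
print(f"P(class 0)={proba[0]:.2f}, P(class 1)={proba[1]:.2f}, predicted={label}")
```

In the notebook, `model.predict_proba(input_df)` would give the corresponding probabilities for the scaled input row.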


### 10. Save the Model

In [24]:
import pickle

with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
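
Note that `model.pkl` holds only the classifier, while the inference step in Section 9 also depends on the fitted `scaler`. One possible way to keep the two together, offered here as an alternative sketch rather than the notebook's method, is to wrap the scaling and the classifier in a scikit-learn `Pipeline` and pickle that single object. The data below is synthetic, and `demo_df`, `demo_y`, `preprocess`, and `pipeline` are hypothetical names.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in frame with two continuous columns and one binary flag
rng = np.random.default_rng(42)
demo_df = pd.DataFrame({
    "age": rng.uniform(40, 95, size=100),
    "serum_sodium": rng.normal(137, 4, size=100),
    "smoking": rng.integers(0, 2, size=100),
})
demo_y = (demo_df["age"] > 65).astype(int)

# Scale only the continuous columns; pass the binary flag through unchanged
preprocess = ColumnTransformer(
    [("scale", StandardScaler(), ["age", "serum_sodium"])],
    remainder="passthrough",
)
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(random_state=42)),
])
pipeline.fit(demo_df, demo_y)

# Serialize scaler and classifier together, then restore them
restored = pickle.loads(pickle.dumps(pipeline))

# The restored pipeline accepts raw (unscaled) rows directly
raw_row = pd.DataFrame([{"age": 80.0, "serum_sodium": 130.0, "smoking": 1}])
print(restored.predict(raw_row))
```

With this layout, whoever loads the pickle does not need to rebuild or separately store the scaler. A simpler alternative that stays closer to the notebook would be to pickle `scaler` alongside `model`.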