# Heart Failure Prediction Model Training

This notebook demonstrates how to train a machine‑learning model to predict heart failure using a CSV dataset.
We will use **scikit‑learn** and avoid any GPU‑specific libraries.

**Goals**:
- Load and explore the dataset
- Preprocess features (handle missing values, encode categoricals)
- Split data into train/test sets
- Train several models and improve accuracy with simple hyper‑parameter tuning
- Evaluate the final model


In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
import matplotlib.pyplot as plt
%matplotlib inline


## Load the CSV dataset
Set `csv_path` below to the location of the CSV file in the repository; `data/heart_failure.csv` is only an example path.

In [None]:
# Path to the CSV file – adjust if needed
csv_path = 'data/heart_failure.csv'  # example path
df = pd.read_csv(csv_path)
df.head()

## Quick data inspection

In [None]:
print('Shape:', df.shape)
print('Columns:', df.columns.tolist())
print(df.isnull().sum())
df.describe()
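
Beyond raw null counts, it often helps to look at the fraction of missing values per column and at the class balance of the label, since accuracy is harder to interpret when one class dominates. The sketch below uses a small hypothetical frame; the column names `age`, `sex`, and `target` are assumptions for illustration, not the real dataset's schema.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset (illustrative values only)
toy_df = pd.DataFrame({
    "age": [63, 54, np.nan, 71, 48, 66],
    "sex": ["M", "F", "F", np.nan, "M", "M"],
    "target": [1, 0, 0, 1, 0, 0],
})

# Fraction of missing values per column, highest first
missing_frac = toy_df.isnull().mean().sort_values(ascending=False)
print(missing_frac)

# Share of each class in the label column
class_share = toy_df["target"].value_counts(normalize=True)
print(class_share)
```

Running the same two expressions on `df` (with the real label column name) gives a quick sense of how much imputation is needed and whether the classes are imbalanced.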

## Define target and features
This assumes the label column is named `target`; change `target_col` if your dataset uses a different name.

In [None]:
target_col = 'target'  # change to actual label column name
X = df.drop(columns=[target_col])
y = df[target_col]

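## Preprocess: imputation, numeric scaling, and categorical encoding
Identify numeric and categorical columns automatically, then impute missing values before scaling or encoding.

In [None]:
from sklearn.impute import SimpleImputer

# 'number' matches every numeric width (int32, int64, float32, ...), not only int64/float64.
# Columns of any other dtype are dropped by ColumnTransformer's default remainder='drop'.
numeric_features = X.select_dtypes(include='number').columns.tolist()
categorical_features = X.select_dtypes(include=['object', 'category']).columns.tolist()

# Fill missing numeric values with the column median, then standardize
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])
# Fill missing categories with the most frequent value, then one-hot encode
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocess = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ])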
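
To illustrate what `handle_unknown='ignore'` does, here is a minimal sketch with a hypothetical `chest_pain` column (the column name and its values are assumptions, not taken from the dataset). A category that appears only at prediction time is encoded as an all-zero row rather than raising an error.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column; the values are illustrative only
train = pd.DataFrame({"chest_pain": ["typical", "atypical", "typical"]})
test = pd.DataFrame({"chest_pain": ["atypical", "asymptomatic"]})  # second value is unseen

encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train)

# Categories are learned from the training data only (stored in sorted order)
print(encoder.categories_[0])

# The unseen category becomes an all-zero row instead of raising an error
encoded_test = encoder.transform(test).toarray()
print(encoded_test)
```

This is why the encoder is fitted inside the pipeline: the categories come from the training split alone, and the test split cannot break the transform.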

## Train‑test split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)
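
The `stratify=y` argument keeps the class proportions the same in both splits. A small sketch on synthetic labels (80 negatives and 20 positives, chosen purely for illustration) shows the effect:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels: 80 negatives, 20 positives (illustrative only)
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.arange(100).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

# Both splits keep the 20% positive rate of the full data
print("train positive rate:", y_tr.mean())
print("test positive rate:", y_te.mean())
```

Without stratification, a small test set can end up with noticeably more or fewer positives than the training set, which makes the reported metrics less stable.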

## Baseline model – Random Forest

In [None]:
rf_clf = Pipeline(steps=[
    ('preprocess', preprocess),
    ('classifier', RandomForestClassifier(n_estimators=200, random_state=42, n_jobs=1))
])
rf_clf.fit(X_train, y_train)
y_pred = rf_clf.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
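
A single train/test split gives one accuracy number; cross-validation gives a spread. The sketch below uses `cross_val_score` on synthetic data from `make_classification` as a stand-in (it is not the heart failure dataset, and the sample and tree counts are arbitrary choices to keep it fast).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the training data (not the heart failure dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)

model = RandomForestClassifier(n_estimators=50, random_state=42)

# 5-fold cross-validated accuracy: one score per fold
cv_scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
print("Fold accuracies:", cv_scores.round(3))
print("Mean: %.3f, std: %.3f" % (cv_scores.mean(), cv_scores.std()))
```

In this notebook the equivalent call would be `cross_val_score(rf_clf, X_train, y_train, cv=5)`, which passes the whole pipeline so preprocessing is refitted inside each fold.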

## Simple hyper‑parameter tuning with GridSearchCV (still CPU‑only)

In [None]:
param_grid = {
    'classifier__n_estimators': [100, 200, 300],
    'classifier__max_depth': [None, 10, 20],
    'classifier__min_samples_split': [2, 5, 10]
}
grid_search = GridSearchCV(rf_clf, param_grid, cv=5, scoring='accuracy', n_jobs=1)
grid_search.fit(X_train, y_train)
print('Best parameters:', grid_search.best_params_)
best_model = grid_search.best_estimator_
y_pred_best = best_model.predict(X_test)
print('Tuned Accuracy:', accuracy_score(y_test, y_pred_best))
print(classification_report(y_test, y_pred_best))
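
Grid search cost grows multiplicatively with each parameter, so it is worth counting the fits before running it. A short sketch using `ParameterGrid` on the same grid as above:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    "classifier__n_estimators": [100, 200, 300],
    "classifier__max_depth": [None, 10, 20],
    "classifier__min_samples_split": [2, 5, 10],
}

n_candidates = len(ParameterGrid(param_grid))  # 3 * 3 * 3 combinations
n_folds = 5

# Each candidate is fitted once per fold, plus one final refit on the full training set
total_fits = n_candidates * n_folds + 1
print(n_candidates, total_fits)  # 27 136
```

With `n_jobs=1` these fits run one after another, so adding a fourth parameter with three values would roughly triple the runtime.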

## Optional: Try Gradient Boosting for potentially higher accuracy

In [None]:
gb_clf = Pipeline(steps=[
    ('preprocess', preprocess),
    ('classifier', GradientBoostingClassifier(random_state=42))
])
gb_clf.fit(X_train, y_train)
y_pred_gb = gb_clf.predict(X_test)
print('GB Accuracy:', accuracy_score(y_test, y_pred_gb))
print(classification_report(y_test, y_pred_gb))

## Visualize Confusion Matrix (for the tuned Random Forest)

In [None]:
import seaborn as sns
cm = confusion_matrix(y_test, y_pred_best)
plt.figure(figsize=(6,4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix - Tuned Random Forest')
plt.show()
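
The four cells of a binary confusion matrix map directly to precision and recall. The sketch below uses made-up labels and predictions (not results from this notebook) and assumes the classes are encoded as 0 and 1.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Illustrative labels and predictions (not results from this notebook)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_hat = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

# Rows are actual classes, columns are predicted classes;
# for labels {0, 1}, ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

recall = tp / (tp + fn)      # share of actual positives that were caught
precision = tp / (tp + fp)   # share of predicted positives that were correct
print(tn, fp, fn, tp)
print("recall:", recall, "precision:", precision)

# The manual values agree with scikit-learn's own metric functions
assert np.isclose(recall, recall_score(y_true, y_hat))
assert np.isclose(precision, precision_score(y_true, y_hat))
```

For a screening-style task, recall on the positive class is often the number to watch, because a false negative here is a missed case.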