Random Forest Classifier for Heart Disease Prediction
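In this notebook, we use a Random Forest classifier to predict the presence of heart disease in patients from their medical information. We use the popular "heart" dataset (heart.csv), which is usually described as containing the medical records of 303 patients. The metrics printed at the end of this notebook are consistent with a 205-row test set (20% of about 1,025 rows), so the file loaded here may be a larger version of that dataset.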

Step 1: Import Required Libraries
The first step is to import the necessary libraries, including pandas, numpy, and scikit-learn. We will use pandas to load the dataset, numpy for numerical operations, and scikit-learn for building and evaluating the model.

In [60]:

import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score, confusion_matrix, roc_auc_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.feature_selection import SelectFromModel


Step 2: Load the Dataset
Next, we will load the "heart" dataset using pandas.

In [61]:

data = pd.read_csv("C:\\Users\\sjkom\\Desktop\\AI Project\\heart.csv")


Dropping the missing values
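The notebook does not show the cell for this step. A minimal sketch of what dropping missing values could look like with pandas follows. The small `demo` frame and its values are made up for illustration (only the column names echo heart.csv); in the notebook the same calls would presumably be applied to `data`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded data; the values are invented.
demo = pd.DataFrame({
    "age": [63, 37, np.nan, 56],
    "chol": [233, 250, 204, np.nan],
    "target": [1, 1, 0, 0],
})

# Count missing values per column before deciding what to drop.
missing_per_column = demo.isna().sum()
print(missing_per_column)

# Drop every row that has at least one missing value and renumber the index.
clean = demo.dropna().reset_index(drop=True)
print(clean.shape)
```

Checking `isna().sum()` first is useful because `dropna()` silently removes rows, which matters on a dataset this small.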

Step 3: Split the Dataset
We will split the dataset into training and testing sets. We will use 80% of the data for training and 20% for testing. We will also set a random seed for reproducibility.

In [62]:

X = data.drop('target', axis=1)
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
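To make the 80/20 split concrete, here is a small sketch on synthetic data (not the heart dataset). It also passes `stratify`, which the notebook does not use; it is shown here only as an optional variant that keeps the class ratio the same in both subsets. The names `X_demo`, `y_demo`, `feature_a`, and `feature_b` are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1000 rows, 40% positive labels.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "feature_a": rng.normal(size=1000),
    "feature_b": rng.normal(size=1000),
})
y_demo = pd.Series([1] * 400 + [0] * 600, name="target")

# test_size=0.2 holds out 20% of the rows; stratify keeps the label ratio.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)    # row counts of the two subsets
print(y_tr.mean(), y_te.mean())  # share of positive labels in each subset
```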


Step 4: Create a Random Forest Classifier Object
We will create a Random Forest classifier object using scikit-learn's RandomForestClassifier class, again setting a random seed for reproducibility.

In [63]:

rfc = RandomForestClassifier(random_state=42)


Step 5: Define Hyperparameters to Tune
We will define a dictionary of hyperparameters to tune with scikit-learn's GridSearchCV class. We will tune the number of trees in the forest (n_estimators), the maximum depth of each tree (max_depth), and the maximum number of features to consider when splitting a node (max_features).

In [64]:

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15],
    'max_features': ['sqrt', 'log2']
}
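To get a feel for how much work this grid implies, the sketch below counts the candidate combinations with scikit-learn's `ParameterGrid`. The fold count of 5 is taken from the grid search step of this notebook; the variable names `n_candidates` and `n_folds` are illustrative.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, 15],
    'max_features': ['sqrt', 'log2']
}

# Every combination of the listed values is one candidate model.
n_candidates = len(ParameterGrid(param_grid))
n_folds = 5  # cross-validation folds used later in the notebook

# Each candidate is fitted once per fold during the search.
print(n_candidates, n_candidates * n_folds)  # 18 90
```

With the default `refit=True`, GridSearchCV adds one more fit of the best candidate on the full training set after the search.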


Step 6: Tune Hyperparameters using Grid Search
We will use scikit-learn's GridSearchCV class to tune the hyperparameters. It performs an exhaustive search over every combination of values defined in param_grid, using 5-fold cross-validation on the training set to score each combination. Because no scoring argument is passed, candidates are ranked by the classifier's default score, which is accuracy.

In [65]:

grid_search = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)


print("Best Hyperparameters:", grid_search.best_params_)


Best Hyperparameters: {'max_depth': 10, 'max_features': 'sqrt', 'n_estimators': 200}
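Beyond the single best combination, a fitted GridSearchCV object also exposes the cross-validated score of every candidate. The sketch below shows one way to inspect them. It runs on synthetic data with a deliberately small grid so that it finishes quickly; `X_demo`, `y_demo`, `small_grid`, and `search` are placeholder names, and in the notebook the same attributes would be read from `grid_search`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data and a reduced grid, for illustration only.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
small_grid = {"n_estimators": [10, 20], "max_depth": [3, 5]}

search = GridSearchCV(RandomForestClassifier(random_state=42), small_grid, cv=3)
search.fit(X_demo, y_demo)

# cv_results_ is a dict of arrays; a DataFrame makes it easier to read.
results = pd.DataFrame(search.cv_results_)
ranked = results.sort_values("rank_test_score")[
    ["params", "mean_test_score", "std_test_score"]
]
print(ranked)

# Mean cross-validated accuracy of the best candidate.
print(search.best_score_)
```

Comparing `mean_test_score` against `std_test_score` helps judge whether the winning combination is meaningfully better than the runners-up or just within noise.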


Step 7: Train the Model with Best Hyperparameters
Using the best hyperparameters obtained from the grid search, we will create a new Random Forest Classifier object and train it on the training set.

In [66]:
rfc_best = RandomForestClassifier(**grid_search.best_params_, random_state=42)
rfc_best.fit(X_train, y_train)
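As a side note, GridSearchCV refits the best candidate on the whole training set by default (`refit=True`) and stores it as `best_estimator_`, so a separately trained model with the same parameters and seed should behave identically. The sketch below checks this on synthetic data; `X_demo`, `y_demo`, `search`, and `manual` are placeholder names, not objects from the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    {"n_estimators": [10, 20], "max_depth": [3, 5]},
    cv=3,  # refit=True is the default, so best_estimator_ is available
)
search.fit(X_demo, y_demo)

# Manual refit, mirroring what the notebook does with rfc_best.
manual = RandomForestClassifier(**search.best_params_, random_state=42)
manual.fit(X_demo, y_demo)

# Both models were trained on the same data with the same seed.
same = np.array_equal(
    search.best_estimator_.predict(X_demo), manual.predict(X_demo)
)
print(same)
```

Creating a fresh object, as the notebook does, is still a reasonable choice when the model is going to be refitted later on a different feature set.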

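Use feature selection to keep the most important features. SelectFromModel with threshold='median' keeps the features whose importance in the fitted forest is at or above the median importance.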

In [67]:
sfm = SelectFromModel(rfc_best, threshold='median')
sfm.fit(X_train, y_train)


Transform the datasets

In [68]:
X_train_transformed = sfm.transform(X_train)
X_test_transformed = sfm.transform(X_test)
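To see which features a median threshold keeps, the selector's boolean mask can be inspected. The following is a sketch on synthetic data with 13 features (the feature count is an assumption chosen for illustration); `X_demo`, `y_demo`, `forest`, and `selector` are placeholder names. In the notebook, `X_train.columns[sfm.get_support()]` would list the retained column names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X_demo, y_demo = make_classification(
    n_samples=300, n_features=13, n_informative=5, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
# fit() trains a clone of the forest and derives the threshold from it.
selector = SelectFromModel(forest, threshold="median").fit(X_demo, y_demo)

mask = selector.get_support()  # True for every retained feature
importances = selector.estimator_.feature_importances_
print(mask.sum(), "of", mask.size, "features kept")
print(importances.round(3))

# transform() drops the columns whose mask entry is False.
X_reduced = selector.transform(X_demo)
print(X_reduced.shape)
```

Because features at exactly the median are kept, roughly half of the columns survive, which is why the downstream model must be trained and evaluated on the transformed arrays only.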

Retrain the model on the transformed training set

In [69]:
rfc_best.fit(X_train_transformed, y_train)

Make predictions on the testing set

In [70]:
y_pred = rfc_best.predict(X_test_transformed)

Calculate evaluation metrics

In [71]:
accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
specificity = tn / (tn + fp)  # true negative rate
sensitivity = recall  # sensitivity is another name for recall (true positive rate)
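The AUC printed later in this notebook is computed from hard class predictions. `roc_auc_score` also accepts continuous scores, and passing the predicted probability of the positive class is the more common way to obtain a ROC AUC. The sketch below contrasts the two on synthetic data; it is an optional variant, not the notebook's method, and all names in it are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

hard_labels = model.predict(X_te)                # 0/1 predictions
positive_scores = model.predict_proba(X_te)[:, 1]  # P(class 1) per row

# With 0/1 labels the ROC curve has a single interior point.
auc_from_labels = roc_auc_score(y_te, hard_labels)
# With probabilities the curve is traced over all thresholds.
auc_from_scores = roc_auc_score(y_te, positive_scores)

print(auc_from_labels, auc_from_scores)
```

In the notebook, the equivalent call would be `roc_auc_score(y_test, rfc_best.predict_proba(X_test_transformed)[:, 1])`.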

Print evaluation metrics

In [72]:
print("Accuracy:", accuracy)
print("Sensitivity:", sensitivity)
print("Precision:", precision)
print("F1 Score:", f1)
print("Specificity:", specificity)
print("AUC:", roc_auc_score(y_test, y_pred))


Accuracy: 0.9853658536585366
Sensitivity: 0.970873786407767
Precision: 1.0
F1 Score: 0.9852216748768473
Specificity: 1.0
AUC: 0.9854368932038835
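The printed values can be cross-checked with plain arithmetic. The confusion-matrix counts below are not printed by the notebook; they are the smallest integer counts consistent with the reported metrics (an inference, so treat them as an assumption). The sketch also shows that an AUC computed from hard 0/1 predictions equals the mean of sensitivity and specificity.

```python
# Assumed counts, inferred from the printed metrics (not notebook output).
tp, fn, fp, tn = 100, 3, 0, 102

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # recall, true positive rate
specificity = tn / (tn + fp)   # true negative rate
precision = tp / (tp + fp)
f1 = 2 * precision * sensitivity / (precision + sensitivity)

# AUC of a single-threshold (hard label) classifier.
auc_hard_labels = (sensitivity + specificity) / 2

print("Accuracy:", accuracy)
print("Sensitivity:", sensitivity)
print("Precision:", precision)
print("F1 Score:", f1)
print("Specificity:", specificity)
print("AUC:", auc_hard_labels)
```

Under these counts the test set has 205 rows, and all six values agree with the printed output up to floating-point rounding.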
