Cardinova: Smart Heart Disease Risk Predictor

**Author:** Shilpa Roy  
**Date:** 2025-06-19

1. Introduction

Cardinova is a machine learning-powered application designed to predict the risk of heart disease based on patient data. It provides healthcare professionals and individuals with a tool for early risk detection, using an accessible web interface and interpretable machine learning models.

2. Dataset Description

- **Source:** Proprietary dataset provided by the company
- **Features:**  
    Age, Sex, Chest pain type, Resting blood pressure, Cholesterol, Fasting blood sugar, Resting ECG, Max heart rate, Exercise-induced angina, Oldpeak, Slope of peak exercise ST segment
- **Target:** Presence of heart disease (0 = No, 1 = Yes)
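A quick schema check can catch a renamed or missing column before any training happens. The sketch below is a minimal illustration, assuming the standardized column names printed in Section 3; `missing_columns` is a hypothetical helper, not part of the notebook.

```python
# Expected schema, taken from the standardized column list printed in Section 3.
EXPECTED_COLUMNS = [
    "age", "sex", "chest_pain_type", "resting_bp_s", "cholesterol",
    "fasting_blood_sugar", "resting_ecg", "max_heart_rate",
    "exercise_angina", "oldpeak", "st_slope", "target",
]


def standardize(name: str) -> str:
    """Apply the same snake_case rule the notebook uses for column names."""
    return name.strip().replace(" ", "_").lower()


def missing_columns(raw_columns):
    """Return the expected columns that are absent after standardization.

    Hypothetical helper for illustration only.
    """
    present = {standardize(c) for c in raw_columns}
    return [c for c in EXPECTED_COLUMNS if c not in present]


# Made-up raw header: only three of the twelve expected columns are present.
print(missing_columns(["Age", " Sex ", "Chest Pain Type"]))
```

An empty list means every expected column was found; anything else names the columns to investigate.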

3. Data Loading and Export

In [1]:
import pandas as pd

# Load the dataset
df = pd.read_csv('dataset.csv')

# Show total rows and columns
print(f"\nTotal Rows: {df.shape[0]}, Total Columns: {df.shape[1]}")

# Standardize column names for easier access
df.columns = [c.strip().replace(' ', '_').lower() for c in df.columns]
print("\nStandardized Columns:")
print(df.columns.tolist())

# Convert certain columns to categorical type
cat_cols = [
    'sex', 'chest_pain_type', 'fasting_blood_sugar',
    'resting_ecg', 'exercise_angina', 'st_slope'
]
for col in cat_cols:
    if col in df.columns:
        df[col] = df[col].astype('category')
        print(f"Converted {col} to 'category' dtype.")

# One-hot encode multiclass categorical columns (not binary).
# Resolve the list before encoding: get_dummies replaces these columns,
# so checking df.columns afterwards would always yield an empty list.
multi_class_cols = ['chest_pain_type', 'resting_ecg', 'st_slope']
encoded_cols = [c for c in multi_class_cols if c in df.columns]
df = pd.get_dummies(df, columns=encoded_cols, drop_first=True)
print("\nApplied one-hot encoding to:", encoded_cols)

# Save processed DataFrame
df.to_csv('processed_dataset.csv', index=False)
print("\nProcessed data saved as 'processed_dataset.csv'.")

# Show shape after processing
print(f"\nProcessed Data Shape: {df.shape}")


Total Rows: 1190, Total Columns: 12

Standardized Columns:
['age', 'sex', 'chest_pain_type', 'resting_bp_s', 'cholesterol', 'fasting_blood_sugar', 'resting_ecg', 'max_heart_rate', 'exercise_angina', 'oldpeak', 'st_slope', 'target']
Converted sex to 'category' dtype.
Converted chest_pain_type to 'category' dtype.
Converted fasting_blood_sugar to 'category' dtype.
Converted resting_ecg to 'category' dtype.
Converted exercise_angina to 'category' dtype.
Converted st_slope to 'category' dtype.

Applied one-hot encoding to: ['chest_pain_type', 'resting_ecg', 'st_slope']

Processed data saved as 'processed_dataset.csv'.

Processed Data Shape: (1190, 17)
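To make the effect of `drop_first=True` concrete, here is a minimal sketch on a toy frame. Only the column name `st_slope` mirrors the dataset; the values are made up. It also shows why the list of encoded columns should be captured before calling `pd.get_dummies`: the original column no longer exists afterwards.

```python
import pandas as pd

# Toy frame with hypothetical values; only the column name mirrors the dataset.
toy = pd.DataFrame({"age": [54, 61, 47], "st_slope": [1, 2, 3]})

# Record which columns will be encoded BEFORE encoding, because get_dummies
# replaces them and they can no longer be found in the columns afterwards.
to_encode = [c for c in ["st_slope"] if c in toy.columns]

# drop_first=True drops the indicator for the first category (here 1),
# leaving one indicator column for each remaining category.
encoded = pd.get_dummies(toy, columns=to_encode, drop_first=True)

print(to_encode)
print(encoded.columns.tolist())
```

With three categories, two indicator columns remain; a row with both indicators off belongs to the dropped first category.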


4. Data Processing

In [2]:
import pandas as pd

# Load the raw dataset
df = pd.read_csv('dataset.csv')

# Standardize column names to snake_case for easier handling
df.columns = [c.strip().replace(' ', '_').lower() for c in df.columns]

# Treat chest pain type as a categorical variable
df['chest_pain_type'] = df['chest_pain_type'].astype('category')

# Print data types to verify the conversion
print(df.dtypes)

# Save processed data. Note: this overwrites the one-hot encoded file written
# in Section 3, and CSV does not preserve the category dtype.
df.to_csv('processed_dataset.csv', index=False)
print("Processed data saved as 'processed_dataset.csv'.")

age                       int64
sex                       int64
chest_pain_type        category
resting_bp_s              int64
cholesterol               int64
fasting_blood_sugar       int64
resting_ecg               int64
max_heart_rate            int64
exercise_angina           int64
oldpeak                 float64
st_slope                  int64
target                    int64
dtype: object
Processed data saved as 'processed_dataset.csv'.
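One caveat with the cell above is that a CSV file stores no dtype information, so the `category` conversion does not survive a save and reload. The sketch below demonstrates this on made-up values using an in-memory buffer, and shows one way to restore the dtype at load time through the `dtype` argument of `pd.read_csv`.

```python
import io

import pandas as pd

# Toy data; the column names mirror the dataset, the values are made up.
df = pd.DataFrame({"chest_pain_type": [1, 2, 4], "target": [0, 1, 1]})
df["chest_pain_type"] = df["chest_pain_type"].astype("category")

# Write to an in-memory CSV instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# A plain reload infers an integer dtype: the category information is gone.
buffer.seek(0)
plain = pd.read_csv(buffer)

# Passing dtype= restores the categorical dtype at load time.
buffer.seek(0)
restored = pd.read_csv(buffer, dtype={"chest_pain_type": "category"})

print(plain["chest_pain_type"].dtype)
print(restored["chest_pain_type"].dtype)
```

For a Random Forest trained on the integer codes this loss is harmless, but it matters for any later step that relies on the categorical dtype.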


5. Model Training

We tune the Random Forest classifier with GridSearchCV, using 5-fold cross-validation and accuracy as the selection metric.
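Before running a grid search it helps to estimate its cost. The following sketch counts the candidate combinations for the grid used in the cell below and multiplies by the number of folds; it uses only the standard library, and `cv_folds` is simply the `cv=5` setting restated.

```python
from itertools import product

# Same grid as in the training cell.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
}
cv_folds = 5

# Every combination of one value per parameter is one candidate model.
combinations = list(product(*param_grid.values()))
total_fits = len(combinations) * cv_folds

print(f"{len(combinations)} candidates x {cv_folds} folds = {total_fits} fits")
```

With the default `refit=True`, GridSearchCV fits the best candidate once more on the full training set after the search.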

In [3]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report

# Load your processed data
df = pd.read_csv('processed_dataset.csv')
X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Parameter grid for tuning
param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10]
}

rf = RandomForestClassifier(random_state=42)
grid = GridSearchCV(rf, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)

# Evaluate
y_pred = grid.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Best parameters: {'max_depth': None, 'min_samples_split': 2, 'n_estimators': 100}
Accuracy: 0.9243697478991597
              precision    recall  f1-score   support

           0       0.92      0.92      0.92       112
           1       0.93      0.93      0.93       126

    accuracy                           0.92       238
   macro avg       0.92      0.92      0.92       238
weighted avg       0.92      0.92      0.92       238
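The figures in the report follow directly from confusion-matrix counts. The notebook does not print the confusion matrix, so the counts below are an inference: 103 correct and 9 incorrect predictions for class 0, and 117 correct and 9 incorrect for class 1, which are consistent with the supports, the rounded metrics, and the accuracy shown above. `precision_recall_f1` is a hypothetical helper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall and F1 for one class from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Inferred counts (an assumption, see the lead-in above).
correct_0, correct_1 = 103, 117   # correctly classified samples per class
wrong_0, wrong_1 = 9, 9           # misclassified samples per true class

# For class 1: false positives are class-0 samples predicted as 1,
# false negatives are class-1 samples predicted as 0.
p1, r1, f1_1 = precision_recall_f1(tp=correct_1, fp=wrong_0, fn=wrong_1)
p0, r0, f1_0 = precision_recall_f1(tp=correct_0, fp=wrong_1, fn=wrong_0)

total = correct_0 + correct_1 + wrong_0 + wrong_1
accuracy = (correct_0 + correct_1) / total

print(f"class 0: precision={p0:.2f} recall={r0:.2f} f1={f1_0:.2f}")
print(f"class 1: precision={p1:.2f} recall={r1:.2f} f1={f1_1:.2f}")
print(f"accuracy={accuracy:.4f} on {total} samples")
```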



6. Train and Save Final Model

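The grid search in Section 5 selected the scikit-learn defaults (max_depth=None, min_samples_split=2, n_estimators=100), so we now train a final Random Forest model with default hyperparameters and save it for use in the app.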

In [4]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import joblib

# Load data
df = pd.read_csv('processed_dataset.csv')

X = df.drop('target', axis=1)
y = df['target']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# No feature scaling here: the Streamlit app passes raw, unscaled inputs to the
# model, so the model must be trained on unscaled features as well. Random
# Forests split on thresholds and do not need standardized inputs.

# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Save model to a joblib file
joblib.dump(model, 'best_random_forest_model.joblib')

print("✅ Model training and saving complete!")

✅ Model training and saving complete!
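Whenever preprocessing such as scaling is fitted at training time, the same fitted transform has to be applied to inputs at prediction time. One way to make that hard to forget, sketched here on synthetic data rather than the Cardinova dataset, is to bundle the scaler and the classifier in a scikit-learn `Pipeline` and save the whole object with joblib. The file name `pipeline_sketch.joblib` and the data are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the Cardinova dataset): 200 rows, 5 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# The pipeline fits the scaler and the model together, and applies the
# fitted scaler automatically whenever predict() is called.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=50, random_state=42)),
])
pipeline.fit(X, y)

# Save and reload the whole pipeline as a single artifact.
path = os.path.join(tempfile.mkdtemp(), "pipeline_sketch.joblib")
joblib.dump(pipeline, path)
reloaded = joblib.load(path)

# The reloaded pipeline reproduces the original predictions on raw inputs.
same = np.array_equal(pipeline.predict(X), reloaded.predict(X))
print("Predictions identical after reload:", same)
```

An application that loads such a file can pass raw feature values directly, since the scaling step travels with the model.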


7. Train-Test Split and Scaling

We split the data into train and test sets, scale the numeric columns, and save the resulting splits to CSV.

In [5]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load processed data
df = pd.read_csv('processed_dataset.csv')

# Separate features and target
X = df.drop('target', axis=1)
y = df['target']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale numeric columns
num_cols = ['age', 'resting_bp_s', 'cholesterol', 'max_heart_rate', 'oldpeak']
scaler = StandardScaler()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])

# Save split and scaled data (optional)
X_train.to_csv('X_train.csv', index=False)
X_test.to_csv('X_test.csv', index=False)
y_train.to_csv('y_train.csv', index=False)
y_test.to_csv('y_test.csv', index=False)

print("Train-test split and scaling complete!")

Train-test split and scaling complete!
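Standardization subtracts the mean and divides by the standard deviation of each column. The sketch below reproduces that arithmetic with NumPy on hypothetical cholesterol readings: the statistics come from the training values only and are then reused for the test values, which is what `fit_transform` followed by `transform` does in the cell above.

```python
import numpy as np

# Hypothetical cholesterol readings (mg/dl), for illustration only.
train = np.array([180.0, 200.0, 220.0, 240.0])
test = np.array([210.0, 260.0])

# Statistics are estimated on the training split only, so no information
# from the test split leaks into preprocessing.
mean = train.mean()
std = train.std()  # population standard deviation (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print("train:", np.round(train_scaled, 3))
print("test: ", np.round(test_scaled, 3))
```

The scaled training column has mean 0 and standard deviation 1 by construction; the test column generally does not, which is expected.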


8. App Overview

The trained model is integrated into a Streamlit app for interactive heart disease risk prediction. Users can input clinical features and receive instant predictions, model confidence, feature importance, and recommendations.
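The "model confidence" mentioned above can be read as the predicted probability of whichever class the model chose. A minimal sketch of that mapping follows; `describe_prediction` and the 0.5 threshold are assumptions for illustration, with ties going to the low-risk class.

```python
def describe_prediction(prob_disease: float, threshold: float = 0.5):
    """Map the probability of class 1 to a (label, confidence) pair.

    Hypothetical helper: confidence is the probability of the chosen class.
    """
    if not 0.0 <= prob_disease <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if prob_disease > threshold:
        return "High risk", prob_disease
    return "Low risk", 1.0 - prob_disease


label, confidence = describe_prediction(0.8)
print(f"{label}: {confidence * 100:.1f}% confidence")

label, confidence = describe_prediction(0.25)
print(f"{label}: {confidence * 100:.1f}% confidence")
```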

9. Streamlit App Code

Below is the full code for the Streamlit web application (`heart_disease_app.py`) used for interactive heart disease risk prediction.

In [7]:
import streamlit as st
import joblib
import numpy as np
import matplotlib.pyplot as plt
import io
import pandas as pd

# --- PAGE CONFIG ---
st.set_page_config(
    page_title="Cardinova - Heart Disease Predictor",
    page_icon="❤️",
    layout="centered"
)

# --- CENTERED APP NAME AND SUBTITLE ---
st.markdown("""
<style>
.cardi-title {
    text-align: center;
    font-size: 2.8rem;
    font-weight: 900;
    letter-spacing: 1px;
    line-height: 1.07;
    margin-bottom: 0.2em;
}
.cardi-gradient {
    background: linear-gradient(90deg, #ff4b2b, #ff416c);
    -webkit-background-clip: text;
    -webkit-text-fill-color: transparent;
}
.cardi-sub {
    text-align: center;
    color: #e6005c;
    margin-top: -12px;
    font-size: 1.35rem;
    font-weight: 600;
}
</style>
<h1 class="cardi-title">
    <span class="cardi-gradient">Cardinova</span>
</h1>
<div class="cardi-sub">
    Your Smart Heart Risk Predictor
</div>
""", unsafe_allow_html=True)

st.sidebar.markdown("---")
st.sidebar.header("Enter Patient Data")

# --- PATIENT DATA INPUTS ---
patient_name = st.sidebar.text_input("Patient Name (optional)", value="")
age = st.sidebar.number_input("Age (years)", min_value=1, max_value=120, value=50)
sex = st.sidebar.radio("Sex", ["Female", "Male"])
chest_pain_type = st.sidebar.selectbox("Chest Pain Type", [
    "Typical Angina (1)",
    "Atypical Angina (2)",
    "Non-anginal Pain (3)",
    "Asymptomatic (4)"
])
resting_bp_s = st.sidebar.slider("Resting Blood Pressure (mmHg)", 50, 250, 120)
cholesterol = st.sidebar.slider("Serum Cholesterol (mg/dl)", 100, 600, 200)
fasting_blood_sugar = st.sidebar.radio("Fasting Blood Sugar > 120 mg/dl", ["No (0)", "Yes (1)"])
resting_ecg = st.sidebar.selectbox("Resting ECG", [
    "Normal (0)",
    "ST-T wave abnormality (1)",
    "Left Ventricular Hypertrophy (2)"
])
max_heart_rate = st.sidebar.slider("Max Heart Rate Achieved", 60, 220, 150)
exercise_angina = st.sidebar.radio("Exercise Induced Angina", ["No (0)", "Yes (1)"])
oldpeak = st.sidebar.slider("Oldpeak (ST depression)", 0.0, 10.0, 1.0, step=0.1)
st_slope = st.sidebar.selectbox("Slope of Peak Exercise ST Segment", [
    "Upward (1)",
    "Flat (2)",
    "Downward (3)"
])

# --- TIPS FOR HEART SAFETY ---
st.markdown("""
<div style='background-color:#fff3e6; border-radius:10px; padding: 1.5em; margin-bottom:1em; border: 2px solid #ffd1a9'>
<h3 style='color:#d7263d;'>❤️ Tips for a Healthy Heart</h3>
<ul style='color:#333; font-size:1.1em; line-height:1.5;'>
  <li>Eat a balanced diet with plenty of fruits and vegetables 🥗</li>
  <li>Exercise regularly (at least 30 min a day) 🏃‍♂️</li>
  <li>Avoid smoking and limit alcohol ❌</li>
  <li>Manage stress with relaxation and mindfulness 🧘</li>
  <li>Monitor blood pressure and cholesterol 📈</li>
  <li>Get regular checkups 👩‍⚕️</li>
</ul>
</div>
""", unsafe_allow_html=True)

# --- CONVERT CATEGORICAL INPUTS ---
sex_num = 1 if sex == "Male" else 0
chest_pain_type_num = int(chest_pain_type.split("(")[-1][0])
fasting_blood_sugar_num = 1 if fasting_blood_sugar == "Yes (1)" else 0
resting_ecg_num = int(resting_ecg.split("(")[-1][0])
exercise_angina_num = 1 if exercise_angina == "Yes (1)" else 0
st_slope_num = int(st_slope.split("(")[-1][0])

# --- LOAD MODEL ---
model = joblib.load("best_random_forest_model.joblib")
feature_names = [
    "Age", "Sex", "Chest Pain Type", "Resting BP", "Cholesterol",
    "Fasting Blood Sugar", "Resting ECG", "Max Heart Rate",
    "Exercise Angina", "Oldpeak", "ST Slope"
]

# --- MAIN PREDICTION BUTTON ---
btn_col1, btn_col2, btn_col3 = st.columns([1,2,1])
with btn_col2:
    predict_clicked = st.button("💖 Predict Heart Disease Risk", use_container_width=True)

if predict_clicked:
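    # Build a one-row DataFrame whose columns match the training data,
    # so scikit-learn sees the same feature names it was fitted with.
    input_data = pd.DataFrame(
        [[age, sex_num, chest_pain_type_num, resting_bp_s, cholesterol,
          fasting_blood_sugar_num, resting_ecg_num, max_heart_rate,
          exercise_angina_num, oldpeak, st_slope_num]],
        columns=["age", "sex", "chest_pain_type", "resting_bp_s", "cholesterol",
                 "fasting_blood_sugar", "resting_ecg", "max_heart_rate",
                 "exercise_angina", "oldpeak", "st_slope"]
    )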
    prediction = model.predict(input_data)[0]
    probability = model.predict_proba(input_data)[0][1]

    st.markdown("<hr>", unsafe_allow_html=True)
    st.markdown("### 🩺 Prediction Result")
    if patient_name.strip():
        st.markdown(f"**Patient Name:** <span style='color:#ff8c00;font-weight:700'>{patient_name.strip()}</span>", unsafe_allow_html=True)

    # st.error and st.success do not accept unsafe_allow_html; their bodies
    # render Markdown, so bold text is written with ** instead of <b>.
    if prediction == 1:
        st.error("⚠️ **High risk of Heart Disease!**")
        st.markdown(f"<b>Model confidence:</b> <span style='color:#d7263d'>{probability*100:.1f}%</span>", unsafe_allow_html=True)
    else:
        st.success("✅ **Low risk of Heart Disease.**")
        st.markdown(f"<b>Model confidence:</b> <span style='color:green'>{(1-probability)*100:.1f}%</span>", unsafe_allow_html=True)
    st.info("For best results, consult a medical professional with this report.")
    st.write("---")

    # --- FEATURE IMPORTANCE CHART ---
    st.markdown("#### 🔑 Model Feature Importance")
    importance = model.feature_importances_
    sorted_idx = np.argsort(importance)
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.barh(range(len(importance)), importance[sorted_idx], color='#e6005c')
    ax.set_yticks(range(len(importance)))
    ax.set_yticklabels([feature_names[i] for i in sorted_idx])
    ax.set_xlabel('Importance')
    ax.set_title('Feature Importance (Random Forest)')
    st.pyplot(fig)
    # Close the figure so figures do not accumulate across Streamlit reruns
    plt.close(fig)

    # --- Recommendations ---
    st.write("---")
    st.markdown("## 📝 Recommendations")
    if prediction == 1:
        st.markdown("""
        - Consult your cardiologist for further evaluation.
        - Begin/continue a heart-healthy diet and regular exercise.
        - Consider quitting smoking and reducing alcohol intake.
        - Monitor and manage blood pressure, cholesterol, and blood sugar closely.
        - Practice daily stress-reduction techniques (yoga, meditation, etc).
        """)
    else:
        st.markdown("""
        - Maintain your healthy lifestyle—keep up the good work!
        - Continue regular physical activity and balanced nutrition.
        - Schedule annual health checkups.
        - Avoid smoking and excessive alcohol.
        - Manage stress with relaxation or hobbies.
        """)

# --- RATING SECTION ---
st.markdown("---")
st.markdown("### ⭐ Rate Your Experience")
rating = st.slider("How would you rate this app?", min_value=1, max_value=5, value=5, format="%d ⭐")
if rating:
    st.write(f"Thank you for rating us {rating} star{'s' if rating > 1 else ''}! 🌟")

# --- FOOTER ---
st.markdown("""
<div style='text-align: center; color:gray; font-size:12px; margin-top:2em;'>
Made with ❤️ by <b>Shilpa Roy</b> | © 2025 Cardinova
</div>
""", unsafe_allow_html=True)
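The app reads the numeric code out of labels such as "Typical Angina (1)" by taking the first character after the last opening parenthesis, which works while every code is a single digit. A slightly more defensive variant is sketched below; `label_code` is a hypothetical helper that is not part of the app, and it accepts multi-digit codes and fails loudly on labels without a code.

```python
import re

# Matches a trailing "(<digits>)" at the end of a label.
_CODE_PATTERN = re.compile(r"\((\d+)\)\s*$")


def label_code(label: str) -> int:
    """Extract the integer code from a label like 'Flat (2)'."""
    match = _CODE_PATTERN.search(label)
    if match is None:
        raise ValueError(f"no numeric code found in label: {label!r}")
    return int(match.group(1))


print(label_code("Typical Angina (1)"))
print(label_code("Left Ventricular Hypertrophy (2)"))
```

Keeping the parsing in one function also means a change to the label format only needs to be handled in one place.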






10. Example: App Screenshot

Below is a screenshot of the project opened in VS Code, showing the app being launched:  
![Cardinova Screenshot](cardinova.png)

11. Conclusion

Cardinova demonstrates a robust and interpretable approach to heart disease risk prediction using machine learning and an interactive web interface. This workflow can be extended for other clinical risk prediction tasks.