# Heart Attack Prediction using Logistic Regression and Random Forest

This notebook follows the CRISP-DM methodology to predict heart attacks, comparing a Logistic Regression model with a Random Forest model.

## Step 1: Importing necessary libraries

We first import the required libraries for data manipulation, model training, and evaluation.

In [None]:

# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
import joblib


## Step 2: Loading the Dataset

Load the heart attack dataset from a CSV file and display the first few rows to understand the structure.

In [None]:

# Load dataset
file_path = '/content/heart.csv'
heart_data = pd.read_csv(file_path)

# Step 2: Data Understanding
# Display first few rows of the dataset
heart_data.head()
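
Beyond the first few rows, it usually helps to check column types, missing values, and the balance of the target classes before modeling. The following is a minimal sketch on a made-up miniature table: only the `output` column name comes from this notebook, while `age`, `chol`, and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature stand-in for heart.csv; only 'output' is taken from the notebook.
sample = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 45],
    "chol": [233, 250, None, 236, 354, 210],
    "output": [1, 1, 0, 1, 0, 0],
})

# Column types and non-null counts
sample.info()

# Number of missing values per column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Share of each target class (a strong imbalance would affect how metrics are read)
class_balance = sample["output"].value_counts(normalize=True)
print(class_balance)
```

The same three calls can be run on `heart_data` directly.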


## Step 3: Data Preparation

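In this step, we define the features and the target variable (the output column), split the data into training and testing sets, and standardize the features to zero mean and unit variance. The scaler is fit on the training set only and then applied to the test set, so no test-set statistics leak into training.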

In [None]:

# Step 3: Data Preparation
# Define features and target variable
X = heart_data.drop(columns='output')
y = heart_data['output']

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize all feature columns (zero mean, unit variance); fit on the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
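
The cell above scales every feature column, including any that are binary or categorical codes. If only the continuous columns should be scaled, one possible approach is scikit-learn's `ColumnTransformer`. This is a sketch under assumptions: the column names `age`, `chol`, and `sex`, and the choice of which ones count as continuous, are hypothetical and would need to be checked against the actual dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical columns: 'age' and 'chol' are continuous, 'sex' is a 0/1 indicator.
X_demo = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 45],
    "chol": [233, 250, 204, 236, 354, 210],
    "sex": [1, 1, 0, 1, 0, 1],
})
continuous_cols = ["age", "chol"]

# Scale only the continuous columns; pass the remaining columns through unchanged.
preprocessor = ColumnTransformer(
    transformers=[("num", StandardScaler(), continuous_cols)],
    remainder="passthrough",
)
X_demo_scaled = preprocessor.fit_transform(X_demo)

# Output column order: scaled continuous columns first, then the passthrough columns.
print(np.round(X_demo_scaled, 2))
```

As with the plain scaler, the transformer would be fit on the training set and then applied to the test set with `transform`.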


## Step 4: Model Training

We train two machine learning models, Logistic Regression and Random Forest, on the scaled training set, and then generate predictions for the test set with each.

In [None]:

# Step 4: Modeling - Logistic Regression and Random Forest
log_reg = LogisticRegression()
rf_clf = RandomForestClassifier(random_state=42)

# Train models on the training data
log_reg.fit(X_train_scaled, y_train)
rf_clf.fit(X_train_scaled, y_train)

# Predict on the test data
log_reg_preds = log_reg.predict(X_test_scaled)
rf_clf_preds = rf_clf.predict(X_test_scaled)
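
A single train/test split gives one estimate of performance, which can vary noticeably on a small dataset. Cross-validation is one way to get a more stable estimate. The sketch below uses synthetic data from `make_classification` as a stand-in (it is not the heart dataset), and the 5-fold setting is an arbitrary choice for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the heart dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

# The pipeline refits the scaler inside each fold, so validation folds
# never influence the scaling statistics.
model = make_pipeline(StandardScaler(), LogisticRegression())
cv_scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")

print("Fold accuracies:", cv_scores.round(3))
print("Mean accuracy:", round(cv_scores.mean(), 3))
```

Swapping `LogisticRegression()` for `RandomForestClassifier(random_state=42)` would give the corresponding estimate for the second model.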


## Step 5: Model Evaluation

We evaluate both models on the test set using accuracy, precision, recall, F1-score, and ROC-AUC, so that the two can be compared on the same data.

In [None]:

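# Step 5: Evaluation
def evaluate_model(model, X_eval, y_true):
    # Hard class labels for the threshold-based metrics
    predictions = model.predict(X_eval)
    # ROC-AUC measures ranking quality, so it needs the predicted probability
    # of the positive class (assumes 'output' is coded 0/1) rather than hard labels
    scores = model.predict_proba(X_eval)[:, 1]
    return {
        "Accuracy": accuracy_score(y_true, predictions),
        "Precision": precision_score(y_true, predictions),
        "Recall": recall_score(y_true, predictions),
        "F1-Score": f1_score(y_true, predictions),
        "ROC-AUC": roc_auc_score(y_true, scores)
    }

# Get evaluation metrics for Logistic Regression and Random Forest
log_reg_metrics = evaluate_model(log_reg, X_test_scaled, y_test)
rf_clf_metrics = evaluate_model(rf_clf, X_test_scaled, y_test)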

# Display the evaluation results
print("Logistic Regression Metrics:", log_reg_metrics)
print("Random Forest Metrics:", rf_clf_metrics)
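
Precision and recall are both derived from the confusion matrix, so looking at the raw counts can make the summary numbers easier to interpret. The example below is a small worked case with hand-made labels (the values are illustrative, not results from this notebook).

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hand-made labels for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Precision: share of predicted positives that are correct
precision = tp / (tp + fp)
# Recall: share of actual positives that were found
recall = tp / (tp + fn)

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Precision:", precision, "Recall:", recall)
```

In a medical screening setting, recall is often watched closely because a false negative corresponds to a missed case.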


## Step 6: Model Deployment

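We save the trained Logistic Regression model to a file with joblib so it can be loaded later in a real-world system for making predictions. Because the model was trained on standardized features, the same fitted scaler must be applied to any new input before it is passed to the model.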

In [None]:

# Step 6: Deployment
# Exporting the Logistic Regression model
model_file_path = '/content/logistic_regression_heart_attack_model.pkl'
joblib.dump(log_reg, model_file_path)

print("Model saved as logistic_regression_heart_attack_model.pkl")
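
One way to keep the scaler and the model together at prediction time is to wrap both in a scikit-learn `Pipeline` and save that single object. The following is a sketch of that alternative, not the approach used in the cell above: it trains on synthetic stand-in data, and the file name `heart_attack_pipeline.pkl` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the heart dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

# Scaling and classification become one estimator that accepts raw features.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X_demo, y_demo)

# Save to a temporary directory and load it back (hypothetical file name).
with tempfile.TemporaryDirectory() as tmp_dir:
    pipeline_path = os.path.join(tmp_dir, "heart_attack_pipeline.pkl")
    joblib.dump(pipeline, pipeline_path)
    loaded_pipeline = joblib.load(pipeline_path)

# The reloaded pipeline scales and predicts in a single call.
original_preds = pipeline.predict(X_demo[:5])
loaded_preds = loaded_pipeline.predict(X_demo[:5])
print("Predictions match after reload:", np.array_equal(original_preds, loaded_preds))
```

With this design, the consuming system only needs to load one file and call `predict` on unscaled inputs.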
