**<h3>Semester Project: Conclusions and Results of Heart Disease Prediction Analysis</h3>**

**Introduction:**

>In this semester-long project, I developed a logistic regression model to predict the presence of heart disease from a set of health attributes. The analysis covered data preprocessing, feature selection, model training, and evaluation using the Cleveland heart disease dataset.

>The initial exploration of the dataset aimed to understand its structure and identify any missing values or outliers. The dataset contained 303 records with 14 attributes each.

In [None]:
import pandas as pd
import seaborn as sns # We'll try using this, because it's very neat :D
import matplotlib.pyplot as plt

# Load the dataset
data = pd.read_csv('heart.csv')

# Data info and stats
print(data.info())
print(data.describe())

# Correlation heatmap: :) - Figure 1 fyi
plt.figure(figsize=(12, 8))
sns.heatmap(data.corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Heatmap')
plt.show()

# Distribution of target variable :> - Figure 2 btw
sns.countplot(x='target', data=data)  # pass the column by name; a bare Series is read as wide-form data in recent seaborn
plt.title('Distribution of Target Variable')
plt.show()
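
> The exploration above mentions missing values and outliers, but `info()` and `describe()` only hint at them. A minimal sketch of an explicit check follows, assuming a small made-up frame (`toy`) in place of `heart.csv` and a hypothetical helper `iqr_outliers` that applies the common 1.5 × IQR rule. The rule is one convention among several, not necessarily the one used in this project.

```python
import pandas as pd

# Hypothetical toy frame standing in for heart.csv (values are made up)
toy = pd.DataFrame({
    "age": [52, 61, 45, 58, 49, 63],
    "chol": [210, 245, 198, 564, 230, None],
})

# Count missing values per column
missing_counts = toy.isna().sum()


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Rows whose cholesterol value is flagged by the IQR rule
chol_outliers = toy.loc[iqr_outliers(toy["chol"]), "chol"]

print(missing_counts)
print(chol_outliers)
```

> On the real data, the same two calls would be applied to `data` and to each continuous column (for example `chol`, `trestbps`, `thalach`, `oldpeak`).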

**Feature Selection:**

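> Using Recursive Feature Elimination (RFE) with a logistic regression estimator, I kept the eight highest-ranked features from the one-hot-encoded feature set. RFE repeatedly fits the model and drops the feature with the smallest coefficient magnitude until the requested number remains. The goal of this step was to reduce dimensionality and, with it, the risk of overfitting on a dataset of only 303 records.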

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

# Start from all 13 predictors; RFE below narrows them down :D
features = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']
target = 'target'

X = data[features]
y = data[target]

# Handle categorical features using one-hot encoding
X = pd.get_dummies(X, columns=['cp', 'restecg', 'slope', 'thal'], drop_first=True) # it goes brrrr

# Scale all features (including the one-hot columns) to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# FINALLY now we can do da feature selection :D
model = LogisticRegression(max_iter=1000)
rfe = RFE(model, n_features_to_select=8)
rfe.fit(X_scaled, y)
selected_features = X.columns[rfe.support_]
print(f"Selected Features: {selected_features}")

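# Selected X: take the chosen columns from the scaled data that RFE was fit on
X_selected = pd.DataFrame(X_scaled, columns=X.columns, index=X.index)[selected_features]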
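
> One caveat with the cell above: the scaler and RFE are fit on the full dataset, so statistics from rows that later land in the test set can influence preprocessing. A minimal sketch of one way to avoid this is to wrap scaling, selection, and the classifier in a scikit-learn `Pipeline` and fit it on the training split only. The data here is synthetic (`make_classification`), and the step names `scale`, `select`, and `clf` are arbitrary labels chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded heart-disease features (not the real data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=18, n_informative=6, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

# Scaling and RFE are learned from the training rows only
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=8)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_tr, y_tr)

# Boolean mask over the 18 input columns marking the 8 that RFE kept
support_mask = pipe.named_steps["select"].support_
test_accuracy = pipe.score(X_te, y_te)

print(f"Features kept: {support_mask.sum()}")
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

> The same pipeline object can be passed to `GridSearchCV`, in which case selection is redone inside every cross-validation fold.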

**Model Training and Hyperparameter Tuning:**

In [None]:
from sklearn.model_selection import GridSearchCV

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_selected, y, test_size=0.3, random_state=42)

# Hyperparameter tuning (why do I do this to myself?)
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100]}
grid_search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_

# Train the logistic regression model with the best hyperparameters
best_model = LogisticRegression(max_iter=1000, C=best_params['C'])
best_model.fit(X_train, y_train)
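
> A side note on the retraining step: with its default `refit=True`, `GridSearchCV` already refits the best configuration on the whole training set, so the fitted model is available as `best_estimator_` and the mean cross-validated score as `best_score_`. The sketch below shows this on synthetic data; `X_demo`, `y_demo`, `search`, and `tuned_model` are illustrative names, not part of the project code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the selected training features (not the real data)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.001, 0.01, 0.1, 1, 10, 100]},
    cv=5,
)
search.fit(X_demo, y_demo)

# Already fitted on all of X_demo with the winning value of C
tuned_model = search.best_estimator_

print(f"Best C: {search.best_params_['C']}")
print(f"Mean CV accuracy at best C: {search.best_score_:.3f}")
```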

**Model Evaluation:**

In [None]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score, roc_curve
# We boutta evaluate da model :D ^^

# Make predictions
y_pred = best_model.predict(X_test)
y_pred_prob = best_model.predict_proba(X_test)[:, 1]

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
auc_roc = roc_auc_score(y_test, y_pred_prob)

# PRINT ALL THE THINGS
print(f"Best Parameters: {best_params}")
print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Classification Report:\n{class_report}")
print(f"AUC-ROC: {auc_roc}")

# ROC Curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob)
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label=f'AUC = {auc_roc:.2f}')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc='lower right')
plt.show()
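
> The `thresholds` array returned by `roc_curve` is computed above but not used. One common way to put it to work, which this project does not necessarily use, is to pick the cutoff that maximizes Youden's J statistic (`tpr - fpr`) instead of the default 0.5. A minimal sketch on hand-made labels and scores (`y_true` and `y_score` are made up for illustration):

```python
import numpy as np
from sklearn.metrics import roc_curve

# Made-up labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.75, 0.4, 0.7, 0.8, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Youden's J: vertical distance between the ROC curve and the chance diagonal
j_scores = tpr - fpr
best_idx = int(np.argmax(j_scores))
best_threshold = thresholds[best_idx]

# Re-label using the chosen cutoff instead of 0.5
y_pred_tuned = (y_score >= best_threshold).astype(int)

print(f"Threshold maximizing J: {best_threshold}")
print(f"TPR = {tpr[best_idx]:.2f}, FPR = {fpr[best_idx]:.2f}")
```

> In a screening context, a lower cutoff that favors sensitivity over specificity may be preferable. Ideally the cutoff is chosen on a validation split, not on the test set.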