# **04. Model Results and Interpretation**
The Logistic Regression model was trained on preprocessed data (numerical features standardized, categorical features one-hot encoded) using an 80:20 stratified train–test split. On the held-out test set it reached an accuracy of about 0.73, meaning it correctly classified roughly 73% of the samples as either “experiencing heart attack” or “not experiencing heart attack”.
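
The setup described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the columns `age`, `cholesterol`, and `smoking_status`, the random seed, and `max_iter` are assumptions made only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; every column except the target name is hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(25, 90, size=n),
    "cholesterol": rng.normal(200, 30, size=n),
    "smoking_status": rng.choice(["never", "past", "current"], size=n),
    "heart_attack": rng.integers(0, 2, size=n),
})

numeric_cols = ["age", "cholesterol"]
categorical_cols = ["smoking_status"]
X = df[numeric_cols + categorical_cols]
y = df["heart_attack"]

# 80:20 split, stratified so both parts keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardize numeric columns and one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {test_accuracy:.2f}")
```

Wrapping the preprocessing and the classifier in one `Pipeline` means the scaler and encoder are fitted on the training split only, which avoids leaking test-set statistics into training.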

In addition to accuracy, precision, recall, and F1-score were examined for each class. For class 0 (no heart attack), precision is about 0.75 and recall about 0.82, so the model identifies individuals who do not experience a heart attack fairly well. For class 1 (heart attack), precision is about 0.69 and recall about 0.58, which means roughly 42% of actual positive cases go undetected (false negatives). This matters because, in a medical context, failing to detect a high-risk patient is particularly costly.
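
As a quick worked check, the F1-score and the miss rate for each class can be derived from the rounded precision and recall values quoted above. The helper name `f1_score_from` is an illustrative choice, and because the inputs are rounded, the results may differ slightly from the model's actual classification report.

```python
def f1_score_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Rounded per-class values as reported in the text.
reported = {
    0: {"precision": 0.75, "recall": 0.82},
    1: {"precision": 0.69, "recall": 0.58},
}

f1_by_class = {
    label: f1_score_from(m["precision"], m["recall"])
    for label, m in reported.items()
}

# Share of true class 1 cases the model fails to flag (false negative rate).
miss_rate_class_1 = 1 - reported[1]["recall"]

for label, f1 in f1_by_class.items():
    print(f"class {label}: F1 ~ {f1:.2f}")
print(f"class 1 miss rate ~ {miss_rate_class_1:.2f}")
```

With these inputs the F1-score comes out near 0.78 for class 0 and near 0.63 for class 1, which makes the gap between the two classes easier to see than accuracy alone.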

Analysis of the Logistic Regression coefficients shows which features most strongly influence the predicted likelihood of heart attack. The features with the largest absolute coefficients are previous_heart_disease, hypertension, and diabetes, all with positive coefficients. Each of these conditions is therefore associated with a higher predicted probability of heart attack, holding the other features constant.

Several lifestyle-related categories, such as never smoking, past smoking, and never consuming alcohol, have negative coefficients with relatively small absolute values. These suggest a lower predicted likelihood of heart attack relative to the corresponding reference categories, although the effect is smaller than that of the main clinical factors (history of heart disease, hypertension, and diabetes). The model is thus consistent with the EDA finding that classical clinical factors remain the primary determinants of risk, while lifestyle factors add a subtler influence.
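
One way to read coefficients like those discussed above is to rank them by absolute value and convert each to an odds ratio with `exp(coefficient)`. The sketch below does this in plain Python. The numbers and the encoded name `smoking_status_never` are placeholders for illustration only and are NOT the fitted values from this analysis.

```python
import math

def summarize_coefficients(names, coefficients):
    """Rank features by |coefficient| and attach odds ratios."""
    rows = []
    for name, coef in zip(names, coefficients):
        rows.append({
            "feature": name,
            "coefficient": coef,
            # exp(coef) is the multiplicative change in the odds of class 1
            # for a one-unit increase in the (preprocessed) feature.
            "odds_ratio": math.exp(coef),
            "direction": "raises" if coef > 0 else "lowers",
        })
    return sorted(rows, key=lambda r: abs(r["coefficient"]), reverse=True)

# Placeholder values, not results from the report.
example_names = [
    "smoking_status_never",
    "diabetes",
    "previous_heart_disease",
    "hypertension",
]
example_coefs = [-0.1, 0.5, 0.9, 0.6]

summary = summarize_coefficients(example_names, example_coefs)
for row in summary:
    print(
        f"{row['feature']:24s} coef={row['coefficient']:+.2f} "
        f"odds_ratio={row['odds_ratio']:.2f} ({row['direction']} predicted risk)"
    )
```

With a fitted scikit-learn `LogisticRegression`, the binary-class coefficients are available as `coef_[0]`. For standardized numeric features, "one unit" means one standard deviation, so odds ratios for those features are not directly comparable to the ratios for 0/1 indicator columns.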

Overall, the Logistic Regression model provides an interpretable view of the factors associated with increased or decreased heart attack risk in this dataset. However, the model only reflects statistical relationships in the data and cannot be used as a direct medical diagnostic tool. The predictions should be regarded as analytical support for understanding risk patterns, not as a substitute for clinical decisions made by healthcare professionals. 

## **Current model limitations**  
- Recall for class 1 (heart_attack = 1) is only about 0.58, so many patients who are actually positive are predicted as negative (a high false-negative rate). 
- The classes are likely imbalanced (more class 0 samples than class 1), which would push the model to “play it safe” by predicting the majority class more often. 
- Only one model (Logistic Regression) is used, so there is no comparison to show whether other methods could improve recall or F1-score. 
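
The suspected class imbalance can be checked directly by counting labels. The snippet below is a small sketch using hypothetical labels; in practice the list would be replaced by the real heart_attack column.

```python
from collections import Counter

def class_shares(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}

# Hypothetical labels for illustration; not the distribution of the real dataset.
example_labels = [0] * 60 + [1] * 40
shares = class_shares(example_labels)
print(shares)
```

If the shares turn out to be close to even, imbalance is unlikely to explain the low class 1 recall, and the other limitations deserve more attention.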

## **Technical improvement ideas**  
- Set the class_weight parameter of Logistic Regression, for example class_weight="balanced", so that errors on the minority class (here class 1) are penalized more heavily. This can raise class 1 recall, at the cost of possibly more false positives. 
- Lower the decision threshold instead of keeping it fixed at 0.5. For example, labeling heart_attack = 1 whenever the predicted probability is ≥ 0.4 makes the model more “sensitive” to positive cases, which raises recall, typically at the cost of lower precision. 
- Train other models such as Random Forest or Gradient Boosting and compare their metrics, especially recall and F1-score for class 1. 
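
The three ideas above could be tried side by side along the following lines. This is a sketch on synthetic data generated with `make_classification`, not on the project's dataset. The sample size, class proportions, seeds, and hyperparameters are all assumptions, and the scores it prints say nothing about how the real data would behave.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced data standing in for the preprocessed features.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.6, 0.4], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

def class1_scores(y_true, y_pred):
    """Recall and F1-score for the positive class only."""
    return {
        "recall": recall_score(y_true, y_pred, pos_label=1),
        "f1": f1_score(y_true, y_pred, pos_label=1),
    }

results = {}

# Baseline Logistic Regression at the default 0.5 threshold.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
results["logreg"] = class1_scores(y_test, baseline.predict(X_test))

# Idea 1: reweight classes inversely to their frequency.
weighted = LogisticRegression(max_iter=1000, class_weight="balanced")
weighted.fit(X_train, y_train)
results["logreg_balanced"] = class1_scores(y_test, weighted.predict(X_test))

# Idea 2: keep the baseline model but lower the decision threshold to 0.4.
threshold = 0.4
proba_class1 = baseline.predict_proba(X_test)[:, 1]
pred_low_threshold = (proba_class1 >= threshold).astype(int)
results["logreg_threshold_0.4"] = class1_scores(y_test, pred_low_threshold)

# Idea 3: alternative models for comparison.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    results[name] = class1_scores(y_test, clf.predict(X_test))

for name, scores in results.items():
    print(f"{name:22s} recall={scores['recall']:.2f}  f1={scores['f1']:.2f}")
```

Lowering the threshold can only add predicted positives, so class 1 recall never drops relative to the 0.5 baseline for the same model. Whether the loss in precision is acceptable is a separate judgment. In a real workflow the threshold would ideally be chosen on a validation split rather than on the test set.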