# Title: Heart-Failure-Condition-And-Survival-Analysis

#### Group Member Names : Jemin Shrestha, Biplove Jaisi



### INTRODUCTION:
Heart failure is a serious condition in which the heart does not pump blood effectively, leading to life-threatening complications. In this project, we explored machine learning to predict survival in heart failure patients. The reference paper evaluated SVM, decision tree, random forest, XGBoost, and LightGBM models. We first tried a multilayer perceptron (MLP) to improve on their accuracy, but even after extensive tuning it did not reach the desired accuracy. We then built an ensemble that combines multiple models; it performed much better, giving more balanced and reliable predictions. We also used LIME charts to make the predictions more transparent and to identify important factors such as ejection fraction and serum creatinine.
*********************************************************************************************************************
#### AIM :
1. **Improve Survival Prediction for Heart Failure Patients:**  
   Leverage machine learning techniques to develop accurate models for predicting survival in patients diagnosed with heart failure.

2. **Identify Key Risk Factors:**  
   Use survival analysis techniques to uncover the most significant clinical features, such as ejection fraction and serum creatinine, that influence patient outcomes.

3. **Optimize Model Performance:**  
   Explore and fine-tune various machine learning models, including SVM, decision trees, random forests, XGBoost, LightGBM, and MLP, to identify the most effective approach for this task.

4. **Enhance Prediction Reliability with Ensemble Learning:**  
   Design an ensemble model to combine the strengths of individual algorithms, achieving more balanced and robust predictions.

5. **Interpret Model Predictions:**  
   Use LIME charts to provide transparency into the models’ decision-making processes, helping clinicians understand the impact of each feature on survival outcomes.

*********************************************************************************************************************
#### Github Repo: https://github.com/jemin007/Heart-Failure-Condition-And-Survival-Analysis

*********************************************************************************************************************
#### DESCRIPTION OF PAPER: 
This project investigates survival analysis and prediction for heart failure patients, focusing on 299 individuals with left ventricular systolic dysfunction. Using Kaplan-Meier and Cox regression methods, key factors influencing survival—such as ejection fraction, serum creatinine, age, and time—were identified. Machine learning models, including SVM, decision trees, random forests, XGBoost, and LightGBM, were employed to predict survival outcomes, with ensemble methods enhancing accuracy. The study highlights the potential of combining statistical analysis and machine learning for better prognosis, risk assessment, and tailored healthcare solutions in managing heart failure.
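
To make the Kaplan-Meier step concrete, here is a minimal, dependency-free sketch of the estimator. The function name `kaplan_meier` and the toy follow-up data are assumptions made for illustration; they are not taken from the paper or from the notebook.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  : follow-up time per patient
    events : 1 if the event (death) was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy follow-up data (made up for illustration, not from the study)
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]

km_curve = kaplan_meier(toy_times, toy_events)
for t, s in km_curve:
    print(f"day {t:>2}: S(t) = {s:.3f}")
```

Censored patients (event `0`) never lower the curve directly; they only shrink the risk set for later time points.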
*********************************************************************************************************************
#### PROBLEM STATEMENT :
The problem addressed in this study is the challenge of accurately predicting the survival outcomes of heart failure patients using machine learning (ML) models. Traditional clinical methods provide limited insights, often lacking the ability to handle complex, high-dimensional medical data. This study leverages advanced AI and ML techniques, including Support Vector Machines (SVM), Decision Trees, Random Forest, XGBoost, and LightGBM, to predict survival outcomes based on clinical features such as ejection fraction, serum creatinine, age, and time. Despite achieving promising results with these models, further optimization is necessary to improve their performance. An ensemble approach was explored to enhance prediction accuracy, demonstrating its superiority over individual models. The study highlights the potential of AI in transforming heart failure prognosis and supporting medical professionals with more reliable and efficient decision-making tools.
*********************************************************************************************************************
#### CONTEXT OF THE PROBLEM:
Heart failure is a chronic condition where the heart is unable to pump blood efficiently, affecting millions globally and resulting in a significant number of deaths annually. Accurate survival prediction for heart failure patients is critical for timely interventions and personalized care. Traditional clinical methods, though valuable, often fall short due to the complexity of heart failure's progression and the various factors influencing patient outcomes. Machine learning (ML) models present a promising solution, as they can handle large datasets and identify complex relationships between clinical variables to predict survival more accurately. However, despite their potential, predicting heart failure outcomes remains challenging due to the interplay of numerous risk factors such as ejection fraction, age, serum creatinine levels, and others. This research aims to explore how advanced ML techniques can improve the accuracy and reliability of heart failure survival predictions.


*********************************************************************************************************************
#### SOLUTION:
This study addresses the challenges of heart failure survival prediction by applying a suite of machine learning models, including Support Vector Machine (SVM), Decision Tree, Random Forest, XGBoost, and LightGBM, to clinical data from heart failure patients. Feature analysis identified ejection fraction, serum creatinine, age, and follow-up time as key predictors of survival. The individual models were first trained and evaluated, then combined in an ensemble that achieved higher accuracy than any individual model. In addition, LIME (Local Interpretable Model-agnostic Explanations) was used to show which features drive individual predictions, so that the results are interpretable and actionable for healthcare providers. Together, the ensemble and the explanations give a more reliable and accurate system for predicting heart failure survival, and illustrate how machine learning can support clinical decision-making.
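
One common way to combine models is soft voting, where the class probabilities of each model are averaged before thresholding. The sketch below assumes that scheme; the notebook's exact blending rule may differ, and the function name `soft_vote` and the probabilities are hypothetical.

```python
def soft_vote(prob_by_model, weights=None, threshold=0.5):
    """Average per-model probabilities of class 1, then threshold the result."""
    names = list(prob_by_model)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights by default
    total = sum(weights[name] for name in names)
    n_samples = len(prob_by_model[names[0]])
    blended = [
        sum(weights[name] * prob_by_model[name][i] for name in names) / total
        for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in blended]
    return blended, labels


# Hypothetical class-1 probabilities from three models for four patients
probs = {
    "random_forest": [0.90, 0.20, 0.60, 0.40],
    "xgboost": [0.80, 0.10, 0.40, 0.70],
    "lightgbm": [0.70, 0.30, 0.30, 0.60],
}

blended, labels = soft_vote(probs)
print([round(p, 3) for p in blended])
print(labels)
```

With fitted scikit-learn estimators, `VotingClassifier(voting="soft")` applies the same averaging idea.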


# Background
*********************************************************************************************************************


|Reference|Explanation|Dataset/Input|Weakness|
|------|------|------|------|



*********************************************************************************************************************






# Implement paper code :
*********************************************************************************************************************

* https://github.com/jemin007/Heart-Failure-Condition-And-Survival-Analysis/blob/master/Heart_Failure_Prediction.ipynb



*********************************************************************************************************************
### Contribution  Code & Results:
#### Implementing MLP Model 
* ![image.png](attachment:image.png)
* ![image-2.png](attachment:image-2.png)

#### Hyperparameter tuning with Randomized CV
* ![image-3.png](attachment:image-3.png)
* ![image-4.png](attachment:image-4.png)
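
For readers without access to the screenshots, the following is a minimal sketch of a randomized search over Random Forest hyperparameters with scikit-learn. The synthetic data, the search space, and `n_iter=10` are assumptions chosen for illustration; they are not the settings used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in with 299 rows (assumption: not the real clinical data)
X, y = make_classification(
    n_samples=299, n_features=12, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical search space; each list is sampled uniformly at random
param_distributions = {
    "n_estimators": [50, 100, 150],
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,  # number of sampled configurations
    cv=5,  # 5-fold cross-validation on the training split
    scoring="accuracy",
    random_state=42,
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))
print("held-out accuracy:", round(search.score(X_test, y_test), 3))
```

Keeping a held-out test split outside the search avoids reporting an accuracy that was tuned on the same data.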

#### LIME Feature contribution
* ![image-5.png](attachment:image-5.png)
* ![image-6.png](attachment:image-6.png)
* ![image-7.png](attachment:image-7.png)
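
The idea behind a LIME chart can be sketched without the `lime` package: perturb one patient's features, query the model, and fit a proximity-weighted linear model whose coefficients act as local feature contributions. The code below is a simplified stand-in for that idea, not the `lime` library's implementation; the helper `local_surrogate`, the kernel width, the generic feature names, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge


def local_surrogate(predict_proba, instance, scale, n_samples=1000,
                    kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model around one instance (LIME-style)."""
    rng = np.random.default_rng(seed)
    # Standardized Gaussian offsets, rescaled per feature to perturb the instance
    noise = rng.normal(size=(n_samples, instance.size))
    samples = instance + noise * scale
    # Black-box probability of class 1 for every perturbed sample
    targets = predict_proba(samples)[:, 1]
    # Exponential kernel: samples closer to the instance get larger weights
    distances = np.linalg.norm(noise, axis=1) / np.sqrt(instance.size)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # Fit on the standardized offsets so coefficients are comparable across features
    surrogate = Ridge(alpha=1.0).fit(noise, targets, sample_weight=weights)
    return surrogate.coef_


# Synthetic stand-in data with generic feature names (not the clinical dataset)
X, y = make_classification(
    n_samples=299, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
feature_names = ["feature_0", "feature_1", "feature_2", "feature_3"]
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

coefs = local_surrogate(model.predict_proba, X[0], X.std(axis=0))
for name, coef in sorted(zip(feature_names, coefs), key=lambda p: -abs(p[1])):
    print(f"{name}: {coef:+.3f}")
```

A positive coefficient means that increasing the feature near this instance pushes the predicted probability of class 1 up; a negative one pushes it down.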

#### Comparison of accuracy between Ensemble Method vs Random Forest
* ![image-8.png](attachment:image-8.png)
* ![image-9.png](attachment:image-9.png)

#### Observations :
*******************************************************************************************************************************
1. **Feature Importance**: Ejection fraction and serum creatinine were found to be the most significant risk factors for heart failure survival. Age and time were also important, while hypertension and anemia contributed to increased risk.
2. **Model Selection**: Several machine learning models, including SVM, Decision Tree, Random Forest, XGBoost, and LightGBM, were tested for their ability to predict heart failure survival. Of these, the tuned Random Forest had the highest accuracy.
3. **Hyperparameter Tuning**: All models were tuned with RandomizedSearchCV to optimize their performance. Random Forest gave the best result after tuning.
4. **Ensemble Method**: The ensemble method, which combined the predictions from multiple models, achieved better accuracy and robustness than the individual models.
5. **Performance Metrics**: The ensemble method outperformed the tuned Random Forest model in accuracy, precision, and recall, especially for class "1" (patients who did not survive).
6. **LIME Explanations**: LIME was used to visualize the decision-making process of the models, providing interpretability and transparency in model predictions.
7. **Practical Application**: The ensemble approach and model explainability can support clinicians in making data-driven decisions, improving heart failure prognosis and potentially saving lives.


#### CONCLUSION:  
This study demonstrates the potential of machine learning models in predicting heart failure survival, leveraging clinical data and advanced algorithms. The ensemble method proved to be the most effective, achieving superior accuracy and balanced performance compared to individual models. By identifying significant clinical features like ejection fraction, serum creatinine, and age, the models provide actionable insights that can guide clinicians in assessing patient risk. The use of explainability tools like LIME further enhances trust in AI by providing interpretable predictions, making it a valuable tool in healthcare decision-making.  

*******************************************************************************************************************************

#### FUTURE DIRECTION:  
1. **Larger Datasets**: Incorporating larger and more diverse datasets could enhance the models’ robustness and generalizability, ensuring better predictions across various populations.  
2. **Integration with EHR Systems**: Embedding these predictive models into electronic health record systems for real-time decision support could revolutionize clinical workflows.  
3. **Advanced Algorithms**: Exploring other cutting-edge algorithms like transformers or deep learning architectures might uncover further performance gains.  
4. **Continuous Learning Systems**: Developing systems that can learn incrementally as new data becomes available will keep the models updated and accurate.  
5. **Personalized Medicine**: Using these models to tailor treatment plans for individual patients based on their clinical profiles could significantly improve outcomes.  
6. **Clinical Trials**: Validating these models in real-world clinical settings through trials will help establish their practical utility and impact.  

*******************************************************************************************************************************

#### Learnings:  
- Machine learning algorithms can effectively identify significant clinical factors influencing heart failure survival.  
- In this project, the ensemble outperformed the individual models by balancing their weaknesses and combining their strengths.  
- Explainability tools like LIME are crucial for interpreting AI-driven decisions, especially in sensitive applications like healthcare.  
- Feature selection from survival analysis enhances the relevance and efficiency of predictive models.  

*******************************************************************************************************************************

#### Results Discussion:  
- The ensemble method reached a **blended accuracy of 92.7%**, outperforming the tuned Random Forest.  
- Key clinical features such as ejection fraction, serum creatinine, and age were identified as critical predictors of survival, aligning with domain knowledge.  
- High precision and recall for both classes indicate the reliability of the ensemble model in minimizing false negatives and false positives, crucial in medical applications.  
- Visualizations of model predictions and feature importance provide transparency and improve understanding of model behavior.  
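
Since the discussion relies on per-class precision and recall, the sketch below shows how those metrics follow from true and predicted labels. The label vectors are made up for illustration and do not reproduce the reported 92.7%; the helper name `per_class_report` is hypothetical.

```python
def per_class_report(y_true, y_pred, labels=(0, 1)):
    """Return overall accuracy and per-class precision/recall."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never predicted or present
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[label] = {"precision": precision, "recall": recall}
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return accuracy, report


# Made-up labels: class 1 is the minority class
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

accuracy, report = per_class_report(y_true, y_pred)
print(f"accuracy: {accuracy:.2f}")
for label, scores in report.items():
    print(label, {k: round(v, 3) for k, v in scores.items()})
```

In a medical setting, recall for class 1 is the share of at-risk patients the model actually flags, so it is worth reporting alongside accuracy.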

*******************************************************************************************************************************

#### Limitations:  
- The dataset used is relatively small, which may limit the generalizability of the results.  
- Computational complexity of the ensemble method could make deployment challenging in resource-constrained settings.  
- The imbalance between the "survived" and "not survived" classes may affect model fairness and performance.  
- Dependency on retrospective data may not fully capture dynamic changes in patient health conditions.  

*******************************************************************************************************************************

#### Future Extension:  
- Expanding the dataset to include more diverse populations and longitudinal data for better model training.  
- Incorporating additional clinical parameters and genetic data for a more comprehensive analysis.  
- Developing real-time prediction systems integrated into clinical workflows for proactive decision-making.  
- Exploring deep learning models, such as recurrent neural networks, for time-series patient data.  
- Conducting clinical trials to validate the practical impact of the predictive models in real-world healthcare settings.  

# References:

[1] :  https://github.com/sauravmishra1710/Heart-Failure-Condition-And-Survival-Analysis/tree/master?tab=readme-ov-file

[2] : https://jeeemi.org/index.php/jeeemi/article/view/225/94

[3] : Mishra, S. (2022) “A Comparative Study for Time-to-Event Analysis and Survival Prediction for Heart Failure Condition using Machine Learning Techniques”, Journal of Electronics, Electromedical Engineering, and Medical Informatics, 4(3), pp. 115-134. doi: 10.35882/jeeemi.v4i3.225.