Ensemble-Imbalance-based classification for amyotrophic lateral sclerosis prognostic prediction: identifying short-survival patients at diagnosis

This code uses ensemble and imbalance learning approaches to improve the identification of short-survival amyotrophic lateral sclerosis (ALS) patients at the time of diagnosis. We also used the SHAP framework to explain how the best model classified the patients.
The results of this work have been published in the research article "Ensemble-Imbalance-based classification for amyotrophic lateral sclerosis prognostic prediction: identifying short-survival patients at diagnosis" (Papaiz et al., 2023).

If you use this code in your research, please cite this paper:

Papaiz F, Dourado MET, Valentim RAdM, Pinto R, de Morais AHF, Arrais JP. Ensemble-Imbalance-based classification for amyotrophic lateral sclerosis prognostic prediction: identifying short-survival patients at diagnosis. 2023.

LICENSE

For those wanting to try it out, this is what you need:

A working installation of Python (version 3.9+) and Jupyter Notebook.

Install the following Python packages:
- numpy (1.23.5)
- pandas (1.5.3)
- matplotlib (3.7.0)
- seaborn (0.12.2)
- scikit-learn (1.2.1)
- imbalanced-learn (0.10.1)
- shap (0.41.0)
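
To confirm that the installed packages match the versions listed above, a small check such as the following may help. It is only a sketch: the `check_versions` helper and the `REQUIRED` dictionary are illustrative names that are not part of this repository, and other package versions may also work.

```python
from importlib import metadata

# Package versions listed in this README.
REQUIRED = {
    "numpy": "1.23.5",
    "pandas": "1.5.3",
    "matplotlib": "3.7.0",
    "seaborn": "0.12.2",
    "scikit-learn": "1.2.1",
    "imbalanced-learn": "0.10.1",
    "shap": "0.41.0",
}


def check_versions(required=REQUIRED):
    """Return {package: (wanted, installed_or_None)} for each package."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions().items():
        status = "ok" if installed == wanted else "differs"
        print(f"{name}: wanted {wanted}, found {installed} ({status})")
```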

Download the analyzed patient data from the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) website (https://ncri1.partners.org/ProACT):
- Register and log in to the website
- Access the Data menu and download the ALL FORMS dataset
- Extract the zipped data file into the 01_raw_data folder
- The 01_raw_data folder will then contain the extracted CSV files
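
To confirm that the archive was extracted into the right place, the folder contents can be listed with a few lines of Python. This is a minimal sketch that assumes it is run from the repository root; `list_raw_csv_files` is a hypothetical helper, and the expected file names are not reproduced here.

```python
from pathlib import Path


def list_raw_csv_files(folder="01_raw_data"):
    """Return the sorted names of the CSV files found directly in `folder`."""
    root = Path(folder)
    if not root.is_dir():
        return []  # folder missing: nothing was extracted here
    return sorted(
        p.name for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )


if __name__ == "__main__":
    names = list_raw_csv_files()
    print(f"{len(names)} CSV file(s) found in 01_raw_data")
    for name in names:
        print(" -", name)
```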

Perform the Extract-Transform-Load (ETL) step:
- Start the jupyter-notebook environment
- Open the 02.01 - Preprocess raw data.ipynb notebook, which is inside the 02_ETL folder, and run all of its cells
- After execution, the preprocessed data will be saved in the 03_preprocessed_data and 04_data_to_analyze folders
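
As an alternative to running the notebook interactively, it can probably be executed from a script through `jupyter nbconvert`. The sketch below assumes that nbconvert is installed alongside Jupyter and that the notebook runs without manual input; `build_nbconvert_command` and `run_etl` are hypothetical helpers, not part of this repository.

```python
import subprocess
from pathlib import Path

# Notebook path as named in this README, relative to the repository root.
NOTEBOOK = Path("02_ETL") / "02.01 - Preprocess raw data.ipynb"


def build_nbconvert_command(notebook=NOTEBOOK):
    """Build the command that executes a notebook and saves it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        str(notebook),
    ]


def run_etl(notebook=NOTEBOOK):
    """Run the ETL notebook; raises CalledProcessError if execution fails."""
    subprocess.run(build_nbconvert_command(notebook), check=True)
```

Calling `run_etl()` from the repository root would then execute every cell in order. If the notebook relies on a particular working directory, the interactive route described above is the safer choice.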

Perform the Machine Learning (ML) pipeline:
- Run the Python program exec_grid_search_both_scenarios.py in the 05_Train_Validate_Models folder
- This program will:
  - Split the dataset into Training and Validation subsets
  - Train and validate the ML models for both scenarios (Single-Model and Ensemble-Imbalance)
    - NOTE: This step can take a long time to complete (even days).
  - Save the performance results into CSV files in the 05_Train_Validate_Models/exec_results folder
- Pipeline Overview:
- Validation performance obtained by each scenario and algorithm:
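
For intuition about the Ensemble-Imbalance scenario trained in this step, the sketch below illustrates the resampling idea behind balanced bagging: each base estimator is fitted on a bootstrap sample in which every class is equally represented. It is a conceptual illustration in plain Python, not the code used by this repository or by imbalanced-learn, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def balanced_bootstrap_indices(labels, rng):
    """Draw one class-balanced bootstrap sample of row indices.

    Each class is sampled with replacement down to the size of the
    smallest class, so every class contributes equally to the sample.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    n_per_class = min(len(indices) for indices in by_class.values())
    sample = []
    for label in sorted(by_class):
        sample.extend(rng.choices(by_class[label], k=n_per_class))
    rng.shuffle(sample)
    return sample


def balanced_bags(labels, n_bags=10, seed=0):
    """Return `n_bags` balanced index samples, one per base estimator."""
    rng = random.Random(seed)
    return [balanced_bootstrap_indices(labels, rng) for _ in range(n_bags)]


if __name__ == "__main__":
    # Toy labels: 10 minority-class rows (1) and 90 majority-class rows (0).
    labels = [1] * 10 + [0] * 90
    for bag in balanced_bags(labels, n_bags=3):
        counts = {c: sum(labels[i] == c for i in bag) for c in (0, 1)}
        print(counts)
```

In an ensemble, one base estimator would be trained on the rows selected by each bag, and the individual predictions would then be combined, for example by averaging or voting.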

Run the SHAP explanations on the model that reached the best performance in the Ensemble-Imbalance scenario (i.e., the BalancedBagging model using Neural Networks as the base estimator):
- Create a SHAP Kernel Explainer instance using the best model and the Validation set:
  - explainer = shap.KernelExplainer(<<BEST_MODEL>>.predict, X_valid)
- Generate the SHAP values (note: this can take many hours):
  - shap_values = explainer.shap_values(X_valid)
- Analyze the SHAP results by plotting SHAP graphs. See the examples below:
  - Decision plot:
  - Summary plot (bar and dot plots):
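
For intuition about what the SHAP values represent, the following sketch computes exact Shapley values for a tiny toy function by enumerating every feature coalition. It assumes that an "absent" feature takes a fixed baseline value, and `toy_model` is a hypothetical scoring function rather than the model from this repository. The Kernel Explainer estimates comparable quantities by sampling coalitions against background data, which is why it scales to real models but can take a long time.

```python
from itertools import combinations
from math import factorial


def exact_shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    A feature that is absent from a coalition takes its baseline value.
    The cost grows as 2**n_features, so this only suits tiny examples.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


def toy_model(features):
    # Hypothetical scoring function with one interaction term (a * c).
    a, b, c = features
    return 2.0 * a + 1.0 * b + a * c


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    baseline = [0.0, 0.0, 0.0]
    print(exact_shapley_values(toy_model, x, baseline))
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the property that SHAP plots such as the decision plot rely on.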

Grid-Search hyperparameters used for each algorithm.

Best models' hyperparameters

Additional Information:

Exploratory Data Analysis

Finally, please let us know if you have any comments or suggestions, or if you have questions about the code or the procedure (correspondence e-mail: fabianopapaiz at gmail dot com).

