Early sepsis prediction using binary classification

Intro

This work is for the ADIN (Data Analysis and Business Intelligence) subject.

Solution

A LightGBM classifier is used here. All the models can be ensembled and tested with the same splits.

LightGBM is fed raw data with no feature extraction.

The LightGBM run saves 5 models, which should be ensembled by averaging their predictions. It also searches for the best decision threshold and saves the feature importances.
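The averaging and threshold search described above can be sketched in plain Python. This is an illustration only, not the repository's code: the function names are hypothetical, and the F1 score is an assumed search criterion, since the README does not say which metric the threshold search optimizes.

```python
from statistics import mean


def average_predictions(model_probs):
    """Average per-sample probabilities across models (one list per model)."""
    return [mean(sample) for sample in zip(*model_probs)]


def f1_score(labels, preds):
    """F1 for binary labels; returns 0.0 when there are no true positives."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(labels, probs, candidates):
    """Return (threshold, score) with the highest F1 (assumed metric)."""
    best = (None, -1.0)
    for t in candidates:
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(labels, preds)
        if score > best[1]:  # ties keep the earlier (lower) threshold
            best = (t, score)
    return best


# Toy example: two models, four samples (made-up numbers).
labels = [0, 1, 0, 1]
model_probs = [[0.1, 0.8, 0.4, 0.9], [0.3, 0.6, 0.2, 0.7]]
ensemble = average_predictions(model_probs)
print(best_threshold(labels, ensemble, [0.25, 0.5, 0.75]))
```

With the 5 saved models, `model_probs` would hold one list of predicted probabilities per model.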

All runs perform extensive logging, which should facilitate the process of preparing the solutions.

Results

LightGBM scores 0.40072137437968547 in stratified 5-fold local CV, without tuning. The score may be even better on the real test set because of ensembling.
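For readers unfamiliar with stratified splitting, the idea is that every fold keeps roughly the same share of positive (sepsis) labels as the full data set. Below is a minimal sketch in plain Python. The function name `stratified_folds` and the toy label counts are hypothetical, and the repository's own splitting code may differ (for example, it may rely on a library implementation).

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Split sample indices into n_splits folds, preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so folds get an even share.
        for i, idx in enumerate(indices):
            folds[i % n_splits].append(idx)
    return folds


# Toy labels: 90 negatives and 10 positives (made-up imbalance).
labels = [0] * 90 + [1] * 10
for k, fold in enumerate(stratified_folds(labels)):
    positives = sum(labels[i] for i in fold)
    print(f"fold {k}: {len(fold)} samples, {positives} positive")
```

Each fold then serves once as the validation set while the other four are used for training, which gives the 5 models mentioned in the Solution section.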

The classifier works on almost raw data!

Installation

pip install -r requirements.txt

Download the data and place the training directory into

data/raw/

Running

1. Convert data to .csv: python psv_to_csv.py

This creates the data/processed/training directory with .csv files.

2. Convert data to pickle and HDF, with different NaN filling: python data_segregation.py

3. Training: python training.py

Run as is, this trains the LightGBM models and saves logs in data/logs.

4. Classifier (a supporting file that should not be run directly): python classifier.py

The classifier is the LGBM classifier used in this solution. The training script uses it to train the model.

5. Score_all: python score_all.py

This script takes an input directory (the training files), reads all the files, and writes a prediction score and label for every file. The input and output directories must be given as arguments.

6. Compute_scores: python compute_scores.py

This script computes statistics about the model and its predictions. It takes as input the directories of output files from the previous script and reports AUROC, AUPRC, accuracy, F-measure, and utility score.
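As an illustration of the first step, converting a pipe-separated (.psv) file to .csv needs only Python's standard csv module. This is a sketch of the general technique and not the contents of psv_to_csv.py. The function name and the sample column names (HR, O2Sat, SepsisLabel) are assumptions made for the example.

```python
import csv
import io


def psv_to_csv(psv_file, csv_file):
    """Copy rows from a pipe-separated stream to a comma-separated one."""
    reader = csv.reader(psv_file, delimiter="|")
    writer = csv.writer(csv_file, lineterminator="\n")
    rows = 0
    for row in reader:
        writer.writerow(row)
        rows += 1
    return rows


# In-memory demonstration with made-up values; real use would pass
# file objects opened with newline="" as the csv module recommends.
sample = "HR|O2Sat|SepsisLabel\n80|98|0\nNaN|95|1\n"
out = io.StringIO()
psv_to_csv(io.StringIO(sample), out)
print(out.getvalue())
```

Missing values are passed through unchanged here, leaving NaN handling to the later data_segregation.py step.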

kalakondasrikanth/ADIN
