The project aims to analyze computed tomography (CT) images and classify them into one of three classes: COV, Normal, OtherPneumonia. The dataset is publicly available under this link (data version used in the project: link).
The project consists of Jupyter notebooks and additional `.py` files containing the model architectures and helper functions.
The analysis of the dataset and other visualisations can be found in:
- Analysis and Visualisations - analysis of the dataset
- Data Augmentation - images after data augmentation
Preprocessing steps are presented in:
- Patient Segmentation - finding the patient's contour and removing the background
- Lungs Segmentation - lung segmentation using skimage segmentation
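The two preprocessing steps above can be illustrated with a minimal sketch. It is not the code from the notebooks: the helper names `largest_component` and `segment_slice`, the use of an Otsu threshold, and the hole-filling step are assumptions made for illustration, and the sketch expects a 2D grayscale slice in which tissue is brighter than air.

```python
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu
from skimage.measure import label
from skimage.segmentation import clear_border


def largest_component(mask):
    """Return a boolean mask of the largest connected region in `mask`."""
    labelled = label(mask)
    if labelled.max() == 0:
        return np.zeros(mask.shape, dtype=bool)
    sizes = np.bincount(labelled.ravel())
    sizes[0] = 0  # label 0 is the background, so ignore it
    return labelled == sizes.argmax()


def segment_slice(image, threshold=None):
    """Hypothetical helper: return (background-free image, patient mask, lung mask)."""
    if threshold is None:
        # Assumption: a global Otsu threshold separates tissue from air.
        threshold = threshold_otsu(image)
    bright = image > threshold
    # Patient contour: largest bright object with its interior holes filled.
    body = largest_component(bright)
    patient = ndimage.binary_fill_holes(body)
    # Background removal: everything outside the patient gets the minimum value.
    cleaned = np.where(patient, image, image.min())
    # Lungs: dark regions that do not touch the image border and lie inside the patient.
    lungs = clear_border(~bright) & patient
    return cleaned, patient, lungs
```

Calling `segment_slice(image)` on a single slice returns the cleaned image together with both masks; passing an explicit `threshold` skips the Otsu step when a fixed intensity cut-off is preferred.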
Several model architectures have been used:
- Simple model created from scratch,
- Tiny, Small, LargeW, LargeT architectures from Transfusion: Understanding Transfer Learning for Medical Imaging,
- ResNet-50 from Google Big Transfer,
- EfficientNet B0, B3, B7.
Models were trained on images at different stages of processing; the configuration names in the tables below (original, lungs, lungs-nocrop) refer to these stages.
The influence of class weights (classw) and data augmentation (dataaug) was examined. Configurations that use neither method are referred to as baseline.
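How the class weights were derived is not stated here. One common choice, shown below as an assumption rather than the project's method, is the "balanced" heuristic, which weights each class by `n_samples / (n_classes * class_count)`. The function name and the toy label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (the 'balanced' heuristic)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy label list for illustration only; these are not the dataset's class counts.
toy_labels = ["COV"] * 2 + ["Normal"] * 6 + ["OtherPneumonia"] * 4
weights = balanced_class_weights(toy_labels)
```

Rare classes receive weights above 1 and frequent classes weights below 1, so misclassifying a rare class contributes more to the loss.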
The highest F1 scores achieved on the validation set:
Model | Configuration | F1 Score | AUC |
---|---|---|---|
Simple | original-classw-dataaug | 0.810 | 0.925 |
Tiny | original-baseline | 0.861 | 0.954 |
Small | original-baseline | 0.860 | 0.944 |
LargeW | original-classw | 0.853 | 0.944 |
LargeT | original-baseline | 0.888 | 0.958 |
ResNet-50 | original-baseline | 0.766 | 0.893 |
EfficientNet B3 - weights ImageNet | lungs-baseline | 0.768 | 0.896 |
EfficientNet B3 - weights None | lungs-nocrop-baseline | 0.761 | 0.866 |
Scores achieved on the test set by the best models listed above:
Model | Configuration | F1 Score | AUC |
---|---|---|---|
Simple | original-classw-dataaug | 0.582 | 0.810 |
Tiny | original-baseline | 0.580 | 0.811 |
Small | original-baseline | 0.600 | 0.798 |
LargeW | original-classw | 0.634 | 0.792 |
LargeT | original-baseline | 0.577 | 0.796 |
ResNet-50 | original-baseline | 0.711 | 0.886 |
EfficientNet B3 - weights ImageNet | lungs-baseline | 0.629 | 0.845 |
EfficientNet B3 - weights None | lungs-nocrop-baseline | 0.570 | 0.760 |
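
For readers unfamiliar with multi-class F1, the sketch below shows how a single F1 value can be derived from a three-class confusion matrix. The averaging scheme behind the tables is not stated, so macro averaging is an assumption here, and the function names and the toy matrix are hypothetical.

```python
def per_class_f1(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(confusion[k]) - tp                       # actually k, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(confusion):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)


# Toy matrix for illustration only (rows: true class, columns: predicted class).
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]
toy_scores = per_class_f1(toy_confusion)
toy_macro = macro_f1(toy_confusion)
```

Macro averaging treats all three classes equally regardless of their size, which is one reason it is often preferred for imbalanced data.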
Metrics and results plots are spread over several notebooks:
- F1 Scores - F1 scores achieved by the models on the validation set
- Confusion Matrices - confusion matrices for each model configuration
- AUC Scores - AUC scores for all model configurations along with ROC and Precision-Recall curves for the best models
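
As a rough illustration of how a single AUC value can be obtained for a three-class problem, the sketch below computes a one-vs-rest AUC per class and averages the results. Whether the notebooks use this scheme is an assumption; the function names and toy predictions are hypothetical.

```python
def binary_auc(scores, is_positive):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(probabilities, labels, n_classes):
    """Macro average of per-class AUCs, treating each class in turn as the positive one."""
    aucs = []
    for k in range(n_classes):
        class_scores = [row[k] for row in probabilities]
        aucs.append(binary_auc(class_scores, [y == k for y in labels]))
    return sum(aucs) / n_classes


# Toy predictions for illustration only: four samples, three classes.
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.2, 0.6],
    [0.4, 0.5, 0.1],
]
toy_labels = [0, 1, 2, 0]
toy_auc = one_vs_rest_auc(toy_probs, toy_labels, n_classes=3)
```

The pairwise comparison is quadratic in the number of samples, which is fine for a worked example but slower than rank-based implementations on large test sets.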
Model explainability is covered in:
- GradCAM - GradCAM analysis
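
The core Grad-CAM computation can be sketched independently of any deep learning framework. The sketch assumes that the activations of the last convolutional layer and the gradients of the class score with respect to them have already been extracted as `(H, W, K)` arrays; how the notebook obtains them is not shown here, and the function name is hypothetical.

```python
import numpy as np


def grad_cam_map(activations, gradients):
    """Combine (H, W, K) feature maps and their gradients into a normalized (H, W) heatmap."""
    # Channel importance: gradients averaged over the spatial dimensions.
    weights = gradients.mean(axis=(0, 1))
    # Weighted sum of feature maps, keeping only positive evidence (ReLU).
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy inputs for illustration only: two 2x2 feature maps.
toy_activations = np.zeros((2, 2, 2))
toy_activations[0, 0, 0] = 1.0
toy_activations[1, 1, 1] = 2.0
toy_gradients = np.stack([np.ones((2, 2)), -np.ones((2, 2))], axis=-1)
toy_cam = grad_cam_map(toy_activations, toy_gradients)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the CT slice to show which regions contributed to the predicted class.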
This project is intended for educational purposes only. It is not a substitute for professional medical advice, diagnosis or treatment.