The main goal of our project is to detect pneumonia and determine the stage of the disease from chest X-rays.
Pneumonia is a lung infection that can lead to severe or life-threatening illness and even death. It is usually caused by a bacterial or viral infection. The disease is a great threat to people, especially children and adults over the age of 65. Pneumonia is the leading cause of death for children under 5: in 2019 it killed 740,180 children under the age of 5, accounting for 14% of all deaths in that age group.
An infographic is provided here to show the general situation. Although the number of pneumonia cases is slightly decreasing, the disease still causes a lot of deaths.
The coronavirus disease has also had a major impact on the world. About 15% of COVID-19 cases are severe and cause pneumonia, and about 5% of patients develop critical infections and need a ventilator.
It is very important to make a diagnosis in time and start treatment immediately. Computed tomography (CT), magnetic resonance imaging (MRI), and radiography (X-rays) are frequently used for diagnosis. X-ray imaging is a non-invasive and relatively inexpensive way to examine the lungs. However, the contrast in chest X-ray images is rather low, which sometimes makes manual evaluation inefficient. In addition, different diseases, such as lung cancer, can produce similar-looking regions on an X-ray image.
And that is where computers and AI can help. Computer-aided diagnosis can enhance efficiency and lead to timely treatment.
Our dataset:
The dataset is already well split: it consists of three directories (train, test and validation), each of which contains two subdirectories, one with X-rays of lungs with pneumonia and one with X-rays of normal lungs. It is also well suited to machine learning and deep learning.
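A quick way to check a layout like this is to count the images in every split and class directory. The sketch below is only an illustration, not the project's own code: the function name `count_images`, the set of accepted file extensions, and the root path mentioned in the usage note are all assumptions.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the actual files in the dataset.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def count_images(root):
    """Return {split: {class_name: image_count}} for a root/split/class/ layout."""
    counts = {}
    for split_dir in sorted(Path(root).iterdir()):
        if not split_dir.is_dir():
            continue  # skip stray files next to the split directories
        counts[split_dir.name] = {
            class_dir.name: sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts
```

Calling `count_images("chest_xray")` (a hypothetical root path) would return a nested dictionary keyed by whatever split and class directory names exist on disk, which is enough to plot the class balance of each split.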
Examples of X-rays with normal lungs:
Examples of X-rays with pneumonia:
Distribution of data in each dataset (train, test and validation):
Pneumonia detection is a classification problem, for which various machine learning and deep learning algorithms are commonly used. To make the project more interesting, we decided to use and analyze both approaches.
We implemented two machine learning models:
- SVM (support vector machine)
- XGBoost (extreme gradient boosting)
For deep learning, we implemented three models:
- CNN
- ResNet
- MobileNet
Project stages:
- Data processing
  - Obtaining the train, validation and test datasets
- Image processing
  - Enhancing the images and removing unwanted distortions
- Investigation of machine learning models: SVM, Random Forest, XGBoost, K-nearest neighbors
  - Validation of model hyperparameters
  - Training and testing
  - Evaluation of results
- Investigation of deep learning models: CNN, ResNet50, MobileNet
  - Validation of model hyperparameters
  - Training and testing
  - Evaluation of results
- Choosing the best model
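The image processing step in the list above does not say which operations were applied. As one possible illustration of contrast enhancement for low-contrast X-rays, here is a minimal NumPy sketch of histogram equalization. Both the choice of this technique and the function name `equalize_histogram` are assumptions, not a description of the project's actual pipeline.

```python
import numpy as np


def equalize_histogram(image):
    """Histogram-equalize an 8-bit grayscale image given as a 2-D array."""
    image = np.asarray(image, dtype=np.uint8)
    # Number of pixels at each of the 256 possible intensities.
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value at the darkest intensity present
    total = image.size
    if total == cdf_min:
        return image.copy()  # constant image: nothing to stretch
    # Lookup table that maps the CDF linearly onto the full 0..255 range.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]
```

After equalization the darkest intensity present maps to 0 and the brightest to 255, so faint structures occupy a wider range of gray levels.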
In our project we used confusion matrices to evaluate the results of the ML and NN models.
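For readers unfamiliar with the metric, the sketch below shows how a binary confusion matrix and the usual summary scores can be computed from true and predicted labels. It is a plain-Python illustration, not the project's evaluation code. The function names and the label encoding (1 for pneumonia, 0 for normal) are assumptions.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # pneumonia correctly detected
            else:
                fp += 1  # normal lungs flagged as pneumonia
        else:
            if truth == positive:
                fn += 1  # pneumonia missed
            else:
                tn += 1  # normal lungs correctly recognized
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def summarize(cm):
    """Derive accuracy, precision and recall from the four counts."""
    total = sum(cm.values())
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total if total else 0.0,
        "precision": cm["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": cm["tp"] / actual_pos if actual_pos else 0.0,
    }
```

In a diagnostic setting recall deserves particular attention, because a false negative means a pneumonia case was missed.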
SVM
XGBoost
CNN
ResNet
MobileNet
These pictures show that MobileNet is the best model, so we decided to use it.
We analysed different ML and NN approaches and chose the ones that suited our problem. Most of our work went into comparing the chosen ML and NN models to find the one with the highest accuracy.
Here we can see the probabilities with which the trained MobileNet network identified images of pneumonia and of normal lungs.
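As a rough illustration of how such a probability can be turned into a reported label, the sketch below assumes the network ends in a single sigmoid unit whose output is the probability of pneumonia. That output layout, the 0.5 threshold, and the function name `interpret_prediction` are assumptions and may differ from the trained model.

```python
def interpret_prediction(p_pneumonia, threshold=0.5):
    """Map a predicted pneumonia probability to a (label, confidence) pair."""
    if not 0.0 <= p_pneumonia <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_pneumonia >= threshold:
        return "PNEUMONIA", p_pneumonia
    # Confidence in the NORMAL class is the complementary probability.
    return "NORMAL", 1.0 - p_pneumonia
```

Lowering `threshold` trades more false alarms for fewer missed pneumonia cases, which may be the preferable balance for screening.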
- Kokolius Khrystyna
- Kyba Sofia