New Segmentation and Feature Extraction Algorithm for Classification of White Blood Cells in Peripheral Smear Images
We present a new method for classifying white blood cells (WBCs) using image processing techniques and machine learning methods. The proposed method consists of three steps: detecting the nucleus and cytoplasm, extracting features, and classification. First, a new algorithm segments the nucleus. To detect the cytoplasm, only the part of it located inside the convex hull of the nucleus is used. This approach helps us overcome the difficulties of segmenting the cytoplasm. In the second phase, three shape features and four novel color features are devised and extracted. Finally, the WBCs are classified with an SVM model. The segmentation algorithm detects the nucleus with a Dice similarity coefficient of 0.9675. The proposed method categorizes WBCs in the Raabin-WBC, LISC, and BCCD datasets with accuracies of 94.65 %, 92.21 %, and 94.20 %, respectively. Note that the hyperparameters of the classifier are tuned on the Raabin-WBC dataset only and are not readjusted for the LISC and BCCD datasets. The obtained results demonstrate that the proposed method is robust, fast, and accurate. The paper is available at: https://doi.org/10.1038/s41598-021-98599-0
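For readers unfamiliar with the segmentation metric quoted above, the Dice similarity coefficient of two binary masks A and B is 2|A∩B| / (|A| + |B|). The following is a minimal NumPy sketch of that standard definition, not the evaluation code of this repo; the function name `dice_coefficient`, the toy masks, and the handling of two empty masks are assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask):
    """Dice similarity coefficient of two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    if pred.shape != true.shape:
        raise ValueError("masks must have the same shape")
    total = pred.sum() + true.sum()
    if total == 0:
        # Assumed convention: two empty masks are treated as a perfect match.
        return 1.0
    return 2.0 * np.logical_and(pred, true).sum() / total


# Toy 4x4 masks standing in for a predicted and a ground-truth nucleus.
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:3] = True  # 4 foreground pixels
true = np.zeros((4, 4), dtype=bool)
true[1:3, 1:4] = True  # 6 foreground pixels, 4 of them shared with pred
score = dice_coefficient(pred, true)  # 2 * 4 / (4 + 6)
```

A value of 1 means the two masks coincide exactly and 0 means they do not overlap at all.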
The method is implemented in Python 3 and requires the following:
- Python: 3.7
- Numpy: 1.19.1
- opencv-python: 184.108.40.206
- scikit-image: 0.16.2
- scikit-learn: 0.23.2
- scipy: 1.5.2
- pyhdust: 1.3.26
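To confirm that an environment matches the versions listed above, a small check along the following lines may help. It is a sketch and not part of this repo: the names `PINNED`, `installed_version`, and `compare_versions` are hypothetical, `importlib.metadata` is in the standard library only from Python 3.8 (on 3.7 the sketch assumes the `importlib_metadata` backport is installed), and opencv-python is checked for presence only because the sketch does not assume a version for it.

```python
try:
    from importlib import metadata  # standard library from Python 3.8
except ImportError:  # Python 3.7: assumes the importlib_metadata backport
    import importlib_metadata as metadata

# Versions copied from the list above. None means "any installed version".
PINNED = {
    "numpy": "1.19.1",
    "opencv-python": None,
    "scikit-image": "0.16.2",
    "scikit-learn": "0.23.2",
    "scipy": "1.5.2",
    "pyhdust": "1.3.26",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(pinned, lookup=installed_version):
    """Return {name: (wanted, found)} for packages that are missing or differ."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found is None or (wanted is not None and found != wanted):
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in compare_versions(PINNED).items():
        print(f"{name}: expected {wanted or 'any'}, found {found or 'not installed'}")
```

An empty result from `compare_versions(PINNED)` means every listed package is installed at the listed version.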
Steps to set up and execute the code
Step 1: Data preparation
- We cropped the white blood cells from the LISC dataset and adapted them for our work. If you use the LISC dataset, you must cite its paper. Download the cropped images of the LISC dataset from here. You can also download the original LISC dataset from here.
- Besides the LISC dataset, we also used the BCCD dataset. The original BCCD dataset is available from Kaggle. We adapted this dataset for our work. Download the adapted dataset from here.
- Finally, download the Double-labeled Raabin-WBC dataset from here. These data are the same as the original version, except that they have been prepared for this repo and the related paper.
- After downloading the datasets, extract them and place them next to main.py. Then run main.py and type 1, 2, or 3 to select the dataset.
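For batch runs, the selection step can be scripted instead of typed. The sketch below assumes that main.py reads the choice once from standard input; that is inferred from the instruction to type 1, 2, or 3 and has not been checked against the script. The helper name `run_main` is hypothetical, and which number corresponds to which dataset is not stated here, so the sketch only forwards the number.

```python
import subprocess
import sys


def run_main(choice, script="main.py", cwd=None):
    """Run `script` and answer its dataset prompt with `choice` (1, 2, or 3)."""
    if str(choice) not in {"1", "2", "3"}:
        raise ValueError("choice must be 1, 2, or 3")
    # Feed the choice through stdin, as if it had been typed at the prompt.
    return subprocess.run(
        [sys.executable, script],
        input=f"{choice}\n",
        text=True,
        capture_output=True,
        cwd=cwd,
        check=False,
    )
```

Calling `result = run_main(1)` from the folder that contains main.py and the extracted datasets returns a `CompletedProcess`; the program output is in `result.stdout` and any error messages are in `result.stderr`.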
- Tavakoli, S., Ghaffari, A., Kouzehkanan, Z.M. et al. New segmentation and feature extraction algorithm for classification of white blood cells in peripheral smear images. Sci Rep 11, 19428 (2021). https://doi.org/10.1038/s41598-021-98599-0
- Kouzehkanan, Z.M., Saghari, S., Tavakoli, S. et al. A large dataset of white blood cells containing cell locations and types, along with segmented nuclei and cytoplasm. Sci Rep 12, 1123 (2022). https://doi.org/10.1038/s41598-021-04426-x