Supervised learning of Plasmodium falciparum life cycle stages using single-cell transcriptomes identifies crucial proteins
Malaria, spread by the female Anopheles mosquito, is a highly fatal disease widespread in many parts of the world, causing approximately 0.4 million deaths globally. The expression of vital genes forms the basis for detecting malaria infection levels. Quantification of malaria-parasite-infected RBCs and classification of the parasite's life cycle stages are performed manually by experts at the microscopic level to support informed decisions. Of late, multiple computational approaches have been proposed to circumvent the curse of dimensionality and obtain accurate predictions. In this work, a dimensionality reduction technique based on a Genetic Algorithm (GA) is applied to P. falciparum single-cell transcriptomics to arrive at an optimized subset of features from the larger dataset. Features are chosen based on their class-wise variation, with the aim of increasing efficiency and accuracy, so that the data can be represented in a lower dimension using only the selected features. To classify the life cycle stages of the malaria parasite from single-cell transcriptome data, a three-pronged approach employing multiclass Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF) classifiers is used. The distribution of cells was visualised and mapped using the R-based Seurat package. Further, we constructed protein interaction networks of the genes identified by the feature selection method and elucidated the roles of these proteins in the progression of the parasite through its life cycle. Our approach presents a novel protocol for applying ML techniques to scRNA-seq datasets and subsequently harnessing the extracted information for biomarker and drug target detection.
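The GA-based feature selection described above can be illustrated with a small, dependency-free sketch. The repository itself relies on the sklearn-genetic package and real classifiers; the code below is not the authors' implementation. It assumes a boolean-mask encoding of feature subsets, tournament selection, uniform crossover, bit-flip mutation, and a leave-one-out nearest-centroid accuracy as a cheap stand-in fitness for SVM/LR/RF. The function names `loo_centroid_accuracy` and `ga_select`, all parameter values, and the toy dataset are hypothetical.

```python
import random


def loo_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier that only sees
    the columns listed in `features` (a cheap stand-in fitness function)."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j in range(len(X)):
            if j == i:  # hold sample i out when building the class centroids
                continue
            acc = sums.setdefault(y[j], [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += X[j][f]
            counts[y[j]] = counts.get(y[j], 0) + 1

        def dist(c):
            # Squared Euclidean distance from sample i to the centroid of class c.
            return sum((X[i][f] - sums[c][k] / counts[c]) ** 2
                       for k, f in enumerate(features))

        correct += min(sums, key=dist) == y[i]
    return correct / len(X)


def ga_select(X, y, n_pop=20, n_gen=15, p_mut=0.05, size_penalty=0.01, seed=0):
    """Evolve boolean feature masks; return the column indices of the best mask."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            feats = [f for f in range(n) if mask[f]]
            # Reward accuracy, lightly penalise larger subsets (dimensionality reduction).
            cache[key] = (loo_centroid_accuracy(X, y, feats)
                          - size_penalty * len(feats) / n)
        return cache[key]

    def tournament(pop):
        # Pick the fittest of three randomly sampled masks.
        return max(rng.sample(pop, 3), key=fitness)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(n_pop)]
    for _ in range(n_gen):
        pop.sort(key=fitness, reverse=True)
        children = pop[:2]  # elitism: carry the two best masks forward unchanged
        while len(children) < n_pop:
            a, b = tournament(pop), tournament(pop)
            # Uniform crossover, then independent bit-flip mutation.
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return [f for f in range(n) if best[f]]


# Toy data (hypothetical): column 0 separates three classes, columns 1-7 are noise.
data_rng = random.Random(1)
X, y = [], []
for i in range(30):
    label = i % 3
    X.append([label * 5.0 + data_rng.gauss(0, 0.3)]
             + [data_rng.gauss(0, 1) for _ in range(7)])
    y.append(label)

selected = ga_select(X, y)
print("selected columns:", selected)
```

The size penalty is what pushes the search toward a smaller subset; with real scRNA-seq data the fitness would instead be a cross-validated score from one of the classifiers, which is far more expensive per evaluation.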
Prerequisites:
- Python 3.5
- scikit-learn (sklearn)
- sklearn-genetic
- Jupyter Notebook
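Before opening the notebooks, it can help to confirm that the prerequisites listed above are importable. The following is a minimal sketch using only the standard library, assuming that Python 3.5 is a minimum rather than an exact version and that sklearn-genetic is imported as `genetic_selection`; `REQUIRED_MODULES` and `check_environment` are hypothetical names, not part of the repository.

```python
import importlib.util
import sys

# Hypothetical mapping: pip package name -> top-level module name used in imports.
REQUIRED_MODULES = {
    "scikit-learn": "sklearn",
    "sklearn-genetic": "genetic_selection",
}


def check_environment(min_python=(3, 5)):
    """Return a list of human-readable problems (empty if everything is found)."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append("Python %d.%d+ required, found %d.%d"
                        % (min_python + sys.version_info[:2]))
    for package, module in sorted(REQUIRED_MODULES.items()):
        # find_spec returns None for a missing top-level module instead of raising.
        if importlib.util.find_spec(module) is None:
            problems.append("missing package: %s (import name %r)" % (package, module))
    return problems


problems = check_environment()
print("OK" if not problems else "\n".join(problems))
```

The check deliberately avoids importing the packages, so it stays fast and does not fail on import-time errors unrelated to installation.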
These are the steps to run the code locally on your PC:
- Clone the repo
git clone https://github.com/swarnimshukla/Machine-learning-approaches-for-classification-of-Plasmodium-falciparum-life-cycle.git
- Install pip packages
pip3 install ....
- Run ga_feature_selection.ipynb in Jupyter Notebook after installing all the libraries.

The repository contains the following files:
- Data.zip -> input data
- ExploratoryDataAnalysis.ipynb -> input data analysis
- ga_feature_selection.ipynb -> main file with feature selection code
- classification_without_feature_selection.ipynb -> code for classification without feature selection
- Classification_of_selected_features.ipynb -> code for classification with feature selection
- random-378-features.ipynb -> classification using 378 randomly selected features
- MI_bar_graph.ipynb -> code to generate the MI bar plot shown in the paper
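The random-378-features notebook classifies cells using 378 randomly chosen features, which presumably serves as a baseline for the GA-selected subset. A minimal sketch of such a baseline follows, assuming several random draws with fixed seeds and a caller-supplied scoring function (in practice, for example, cross-validated accuracy of one of the classifiers). `random_subset`, `random_baseline`, and the toy scoring function with its 5000-column universe are hypothetical, not taken from the repository or the data.

```python
import random
import statistics


def random_subset(n_features, k, seed):
    """Draw k distinct column indices uniformly at random (sorted for readability)."""
    return sorted(random.Random(seed).sample(range(n_features), k))


def random_baseline(score_fn, n_features, k=378, n_draws=10):
    """Score n_draws random k-feature subsets with score_fn and return the mean
    and sample standard deviation, for comparison against a selected subset."""
    scores = [score_fn(random_subset(n_features, k, seed)) for seed in range(n_draws)]
    return statistics.mean(scores), statistics.stdev(scores)


# Toy scoring function (hypothetical): pretend the first 50 of 5000 columns carry signal.
informative = set(range(50))


def toy_score(features):
    return len(informative.intersection(features)) / len(features)


mean_score, sd_score = random_baseline(toy_score, n_features=5000)
print("random-%d baseline: %.4f +/- %.4f" % (378, mean_score, sd_score))
```

Averaging over several seeded draws, rather than a single random subset, gives a spread against which the score of the GA-selected features can be judged.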
Distributed under the MIT License. See LICENSE for more information.
Swarnim Shukla - swarnim.shukla@research.iiit.ac.in