fangzc@smu.edu.cn or hzhou@smu.edu.cn
2023.10.11 The manual was updated because of the change of the virtual machine link.
DeePVP is a tool for the identification and classification of phage virion protein (PVP) using deep learning approach. The main module of DeePVP aims to discriminate PVPs from non-PVPs within a phage genome, while the extended module of DeePVP can further classify predicted PVPs into the ten major classes of PVPs, namely head-tail joining, collar, tail sheath, tail fiber, portal, minor tail, major tail, baseplate, minor capsid, and major capsid, which are the dominant categories in PVP. DeePVP takes a phage protein sequence file in "fasta" format as input, and outputs a tabular file. The program is optimized in both a virtual machine with GUI and a docker.
Please refer to DeePVP manual.pdf for the DeePVP usage.
DeePVP takes a protein sequence in "one-hot" encoding form as input. The main module uses CNN to extract the sequence features to determine whether the given protein is a PVP. The CNN contains a 1D convolution layer, 1D global max pooling layer, batch normalization layer, full connection layer, and finally, a sigmoid layer that outputs a PVP score between 0 and 1. By default, a protein with a PVP score higher than 0.5 is regarded as a PVP. The extended module also uses CNN to classify the protein into a specific class of PVP. The CNN in the extended module contains a 1D convolution layer, 1D global max pooling layer, batch normalization layer, full connection layer, and finally, a softmax layer that outputs 10 likelihood scores representing the probability that the protein belongs to the head-tail joining, collar, tail sheath, tail fiber, portal, minor tail, major tail, baseplate, minor capsid, or major capsid class. By default, the category with the highest score will be chosen as the final prediction.
Performance comparison using PhANNs benchmark dataset
All the supporting data for results reproduction is available in the GigaScience repository, GigaDB. By referring to the "document_for_results_reproduction.txt" file and all the “readme” files, the readers can easily reproduce all the results.
DeePVP is distributed under a GPL-3.0 license.
Fang Z, Feng T, Zhou H and Chen M. DeePVP: Identification and classification of phage virion proteins using deep learning.