- 2024.6.5 Update: We have uploaded the Dataset of PLMSearch & PLMAlign to Zenodo.
- 2024.5.30 Update: We have uploaded the Dataset of PLMSearch Web Server to Zenodo.
This is the implementation of "PLMSearch: Protein language model powers accurate and fast sequence search for remote homology". By using a protein language model, PLMSearch achieves sensitivity close to that of state-of-the-art (SOTA) structure search methods while remaining versatile and fast, because it relies only on sequences.
- Webserver
- Requirements
- Data preparation
- Reproduce all our experiments with only one file
- Run PLMSearch locally
- Citation
PLMSearch web server : dmiip.sjtu.edu.cn/PLMSearch 🚀
PLMAlign web server : dmiip.sjtu.edu.cn/PLMAlign
PLMAlign source code : github.com/maovshao/PLMAlign 🚁
Follow the steps in requirements.sh
We have released our experiment data, which can be downloaded from plmsearch_data or Zenodo.
# Include experiment data, PLMSearch model, ESM-1b model, etc.
# Use the following command or download it from https://zenodo.org/records/11480660
wget https://dmiip.sjtu.edu.cn/PLMSearch/static/download/plmsearch_data.tar.gz
tar zxvf plmsearch_data.tar.gz
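The archive bundles experiment data and model weights, so an interrupted download can leave a partial file that only fails partway through extraction. The following is a minimal sketch of a guard around the extraction step, not part of the PLMSearch repository: the function name `extract_if_valid` and its arguments are hypothetical, and it assumes a `tar` with gzip support is available.

```shell
# Hypothetical helper (not shipped with PLMSearch): extract a .tar.gz archive
# only if tar can read it from start to end.
extract_if_valid() {
    archive="$1"
    dest="${2:-.}"   # destination directory, defaults to the current one

    if [ ! -f "$archive" ]; then
        echo "missing archive: $archive" >&2
        return 1
    fi

    # Listing the contents forces tar to decompress the whole file,
    # so a truncated or corrupt download is caught before anything is written.
    if ! tar -tzf "$archive" > /dev/null 2>&1; then
        echo "corrupt or incomplete archive: $archive" >&2
        return 1
    fi

    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Example call once the download has finished:
# extract_if_valid plmsearch_data.tar.gz .
```

If the check fails, re-running the download with `wget -c` resumes a partial file instead of starting over.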
- Reproduce all our experiments with good visualization by following the steps in main.ipynb
Notice: Detailed results are saved in scientist_figures/.
- Run PLMSearch locally by following the example in pipeline.ipynb
Notice: The inputs and outputs of the example are saved in example/.
Liu, W., Wang, Z., You, R. et al. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 15, 2775 (2024). https://doi.org/10.1038/s41467-024-46808-5