PLMAlign

This is the implementation of PLMAlign, the pairwise protein sequence alignment tool described in "PLMSearch: Protein language model powers accurate and fast sequence search for remote homology". PLMAlign takes per-residue embeddings as input and produces residue-level alignments together with their alignment scores.

PLMAlign supports both local and global alignment. The algorithm and its parameters are similar to the Smith-Waterman (SW) and Needleman-Wunsch (NW) implementations provided by EMBL-EBI. The difference is the scoring: instead of looking up a fixed substitution matrix, PLMAlign scores each residue pair by the dot product of the two per-residue embeddings. This lets it capture deep evolutionary information and perform better on remote homology protein pairs.
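To make the idea concrete, here is a minimal sketch of a Smith-Waterman-style local alignment in which the substitution score is the dot product of per-residue embeddings. It is an illustration only, not the code in this repository: the function name `local_align`, the linear gap penalty, and its default value are assumptions chosen to keep the example short, and the toy embeddings are one-hot vectors rather than real protein language model output.

```python
import numpy as np

def local_align(emb_a, emb_b, gap=1.0):
    """Local alignment over a dot-product similarity matrix (illustrative sketch).

    emb_a: (len_a, d) array of per-residue embeddings for sequence A.
    emb_b: (len_b, d) array of per-residue embeddings for sequence B.
    gap:   linear gap penalty (an assumption made for brevity).
    Returns (best_score, list of aligned (i, j) residue index pairs).
    """
    sim = emb_a @ emb_b.T  # sim[i, j] plays the role of a substitution-matrix entry
    n, m = sim.shape
    score = np.zeros((n + 1, m + 1))
    # Traceback moves: 0 = stop, 1 = diagonal (match), 2 = up (gap in B), 3 = left (gap in A)
    move = np.zeros((n + 1, m + 1), dtype=np.int8)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = (
                0.0,                                      # start a new local alignment
                score[i - 1, j - 1] + sim[i - 1, j - 1],  # align residue i-1 with j-1
                score[i - 1, j] - gap,                    # gap in sequence B
                score[i, j - 1] - gap,                    # gap in sequence A
            )
            move[i, j] = int(np.argmax(candidates))
            score[i, j] = candidates[move[i, j]]
    # Trace back from the highest-scoring cell until a "stop" cell is reached.
    i, j = (int(x) for x in np.unravel_index(np.argmax(score), score.shape))
    best = float(score[i, j])
    pairs = []
    while move[i, j] != 0:
        if move[i, j] == 1:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i, j] == 2:
            i -= 1
        else:
            j -= 1
    return best, pairs[::-1]

# Toy input: sequence B "matches" the last two residues of sequence A.
emb_a = np.eye(3)
emb_b = np.eye(3)[1:]
print(local_align(emb_a, emb_b))  # (2.0, [(1, 0), (2, 1)])
```

A global (NW-style) variant would drop the zero "restart" option and trace back from the bottom-right cell instead of the maximum.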

Quick links

Webserver

PLMAlign web server : dmiip.sjtu.edu.cn/PLMAlign ✈️

PLMSearch web server : dmiip.sjtu.edu.cn/PLMSearch 🚀

PLMSearch source code : github.com/maovshao/PLMSearch 🚁

Requirements

Follow the steps in requirements.sh

Data preparation

We have released our experiment data in plmalign_data.

wget https://dmiip.sjtu.edu.cn/PLMAlign/static/download/plmalign_data.tar.gz
tar zxvf plmalign_data.tar.gz

Reproduce all our experiments

Reproduce all our experiments, with visualizations of the results, by following the steps in:

Notice: Detailed results are saved in data/alignment_benchmark/result/.

Notice: Detailed results are saved in data/scope40_test/output/.

Run PLMAlign locally

Notice: the inputs and outputs of the example are saved in example/.

Citation

Liu, W., Wang, Z., You, R. et al. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 15, 2775 (2024). https://doi.org/10.1038/s41467-024-46808-5
