A protein language model for accurately predicting the fitness of SARS-CoV-2 variants
SARS-CoV-2 variants continue to emerge, and those with increased spreading potential (i.e., fitness) drive repeated epidemic surges. This repository hosts the software and code for CoVFit, a protein language model that predicts the fitness of newly emerged variants from their spike protein sequences alone.
- The CoVFit models can be used as a stand-alone command-line tool to predict the viral fitness of SARS-CoV-2 spike protein sequences in FASTA format. A download link and instructions are available here.
- Training
- Figures
- Dataset
- Code used for performing domain adaptation to produce the ESM2coronaviridae model and a download link for the model itself are available here.
- Description
- Figures
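
Because the command-line tool takes spike protein sequences in FASTA format, it can help to sanity-check an input file before running a prediction. The following is a minimal sketch and not part of CoVFit: the function names `read_fasta` and `invalid_residues` and the allowed-residue set are illustrative assumptions, and the tool's own input requirements are described in its instructions.

```python
from typing import Dict, List, Optional

# Assumed alphabet: the 20 standard amino acids plus X (unknown) and * (stop).
# This set is an illustrative assumption, not a rule taken from CoVFit.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX*")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {record_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            if current_id in records:
                raise ValueError(f"duplicate record ID: {current_id!r}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data found before the first header")
        else:
            # Sequences may be wrapped over several lines; join them.
            records[current_id] += line.upper()
    return records


def invalid_residues(sequence: str) -> List[str]:
    """Return the sorted characters in `sequence` outside the assumed alphabet."""
    return sorted(set(sequence) - VALID_RESIDUES)


# Hypothetical example input; the sequences are truncated placeholders.
example = ">variant_A\nMFVFLVLLPLVSS\nQCVNLT\n>variant_B\nMFVFL-VLL\n"
records = read_fasta(example)
for record_id, sequence in records.items():
    print(record_id, len(sequence), invalid_residues(sequence))
```

A check like this flags alignment gaps (`-`) or other unexpected characters early, before the sequences are passed to any downstream tool.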
A Protein Language Model for Exploring Viral Fitness Landscapes. Jumpei Ito, Adam Strange, Wei Liu, Gustav Joas, Spyros Lytras, The Genotype to Phenotype Japan (G2P-Japan) Consortium, Kei Sato. 2024. bioRxiv https://doi.org/10.1101/2024.03.15.584819
jampei@g.ecc.u-tokyo.ac.jp (Jumpei Ito)