A protein language model for learning the SARS-CoV-2 fitness landscape

pengsihua2023/CoVFit

 
 

CoVFit

A protein language model for accurately predicting the fitness of SARS-CoV-2 variants

Summary

Newly emerging SARS-CoV-2 variants repeatedly cause epidemic surges through their increased spreading potential (i.e., fitness). This repository hosts the software and code for CoVFit, a protein language model that predicts the fitness of newly emerged variants from their spike protein sequences alone.

Contents

CoVFit-CLI

  • The CoVFit models can be used as a stand-alone command-line tool to predict the viral fitness of SARS-CoV-2 spike protein sequences in FASTA format. The download link and instructions are available here.
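
Because the tool takes spike protein sequences in FASTA format, it can help to sanity-check an input file before running a prediction. The snippet below is a minimal sketch of such a check and is not part of CoVFit. The function names `parse_fasta` and `invalid_residues`, the accepted residue alphabet, and the example records are all assumptions made for illustration; the linked CLI instructions remain the reference for the exact input requirements.

```python
from typing import Dict, Iterable, List

# Assumed alphabet: the 20 standard amino acids plus X (unknown),
# * (stop) and - (gap). CoVFit's own accepted alphabet may differ.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYX*-")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {sequence_id: sequence}."""
    records: Dict[str, str] = {}
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""  # ID is the first token
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them per record.
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def invalid_residues(seq: str) -> List[str]:
    """Return the sorted characters in seq outside the assumed alphabet."""
    return sorted(set(seq) - VALID_RESIDUES)


# Hypothetical example input (short fragments, not full spike proteins).
example = """>variant_A demo
MFVFLVLLPLVSSQ
CVNLT
>variant_B
MFVFLVLLPLVSSQCVNLZ1
""".splitlines()

records = parse_fasta(example)
for seq_id, seq in records.items():
    bad = invalid_residues(seq)
    status = "ok" if not bad else "unexpected characters: " + "".join(bad)
    print(seq_id, len(seq), status)
```

To check a real file, pass an open file handle to `parse_fasta` instead of the example lines.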

CoVFit Training

  • Training
  • Figures
  • Dataset

Domain Adaptation

  • Code used for performing domain adaptation to produce the ESM2coronaviridae model and a download link for the model itself are available here.

ML Comparison Models

  • Description
  • Figures

Citation

A Protein Language Model for Exploring Viral Fitness Landscapes. Jumpei Ito, Adam Strange, Wei Liu, Gustav Joas, Spyros Lytras, The Genotype to Phenotype Japan (G2P-Japan) Consortium, Kei Sato. 2024. bioRxiv https://doi.org/10.1101/2024.03.15.584819

Contact

jampei@g.ecc.u-tokyo.ac.jp (Jumpei Ito)

