liseda-lab/genAAV

Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids

This repository contains the code accompanying the arXiv submission of the paper:

“Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids”

The code is organized into two sets of scripts, each with its own software environment file(s).


Repository overview

Set 1: Core analysis scripts

This set contains the main scripts used for data processing, analysis, and visualization:

  • 01_preliminaryAnalysis_and_mutationalLandscape.py – Preliminary data processing and mutation landscape analysis
  • 02_distanceFinetunedSequences.py – Calculates novelty scores for all fine-tuned generated sequences
  • 03_plottingNovelty.py – Analyzes a set of randomly chosen sequences and generates the plots presented in the paper
  • 04_biophysicalAnalysis.py – Implements polarity and charge filter analysis and the grid-based selection strategy
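
To illustrate the kind of novelty score that 02_distanceFinetunedSequences.py computes, here is a minimal sketch. It assumes novelty is the smallest edit (Levenshtein) distance between a generated sequence and a set of reference sequences; the actual metric used in the script may differ, and the function names `levenshtein` and `novelty_score` are hypothetical.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def novelty_score(candidate: str, references: Iterable[str]) -> int:
    """Assumed definition: distance to the closest reference sequence."""
    return min(levenshtein(candidate, ref) for ref in references)
```

Under this assumed definition, a score of 0 means the generated sequence already exists in the reference set, and larger values indicate sequences further from anything known.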

Environment for Set 1: environment1.yml
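
The polarity and charge analysis and the grid-based selection mentioned for 04_biophysicalAnalysis.py can be pictured with the sketch below. Everything in it is an assumption made for illustration: the simplified net charge (K and R count as +1, D and E as -1, histidine ignored), the set of residues treated as polar, the bin width, and the helper names `net_charge`, `polar_fraction`, and `grid_select`. The script itself may use different definitions and thresholds.

```python
from typing import Dict, List, Sequence, Tuple

# Illustrative residue classes (assumptions, not taken from the repository).
POSITIVE = set("KR")
NEGATIVE = set("DE")
POLAR = set("STNQYCH") | POSITIVE | NEGATIVE


def net_charge(seq: str) -> int:
    """Simplified net charge: +1 per K/R, -1 per D/E."""
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def polar_fraction(seq: str) -> float:
    """Fraction of residues that fall in the assumed polar set."""
    return sum(aa in POLAR for aa in seq) / len(seq)


def grid_select(seqs: Sequence[str], polar_bin: float = 0.1) -> List[str]:
    """Keep the first sequence seen in each (charge, polarity-bin) grid cell."""
    chosen: Dict[Tuple[int, int], str] = {}
    for seq in seqs:
        cell = (net_charge(seq), int(polar_fraction(seq) / polar_bin))
        chosen.setdefault(cell, seq)
    return list(chosen.values())
```

Picking one sequence per grid cell is one simple way to spread a selection across the charge and polarity space rather than concentrating it where most sequences cluster.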


Set 2: Model training and sequence generation scripts

This set contains scripts used for model training and sequence generation:

  • 05_supervisedFineTuning.py – Supervised fine-tuning of ProGen2-small on viable AAV sequences, with masked loss on the generated middle region
  • 06_rlTraining_viabilityDiversity.py – Reinforcement learning with viability and reference-diversity rewards (KL-regularized policy updates)
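
The "masked loss on the generated middle region" used in 05_supervisedFineTuning.py can be sketched in plain Python as follows. This is not the repository's training code: it assumes the loss is the mean negative log-likelihood over the masked positions only, and the helper names `middle_region_mask` and `masked_nll` as well as the region boundaries are hypothetical.

```python
from typing import List, Sequence


def middle_region_mask(length: int, start: int, end: int) -> List[int]:
    """1 for positions in [start, end), 0 for the fixed flanking regions."""
    return [1 if start <= i < end else 0 for i in range(length)]


def masked_nll(token_log_probs: Sequence[float], loss_mask: Sequence[int]) -> float:
    """Mean negative log-likelihood over the positions where the mask is 1."""
    if len(token_log_probs) != len(loss_mask):
        raise ValueError("token_log_probs and loss_mask must have equal length")
    n_active = sum(loss_mask)
    if n_active == 0:
        raise ValueError("loss_mask selects no positions")
    total = -sum(lp for lp, m in zip(token_log_probs, loss_mask) if m)
    return total / n_active
```

With such a mask, tokens in the flanking regions contribute nothing to the gradient, so fine-tuning only shapes the region the model is asked to generate.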

Environments for Set 2: requirements2.yml and requirements3.yml
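
As a rough picture of how viability and reference-diversity rewards can be combined with KL regularization, as described for 06_rlTraining_viabilityDiversity.py, consider the sketch below. It assumes a common formulation in which the per-sequence log-probability difference between the policy and a frozen reference model serves as the KL estimate; the weights `diversity_weight` and `beta`, their default values, and the function name are all hypothetical, and the script may combine the terms differently.

```python
def kl_regularized_reward(
    viability: float,
    diversity: float,
    logp_policy: float,
    logp_reference: float,
    diversity_weight: float = 1.0,  # illustrative default
    beta: float = 0.1,              # illustrative KL penalty strength
) -> float:
    """Task reward minus a penalty for drifting from the reference model."""
    # Single-sample KL estimate: log pi(x) - log pi_ref(x) for a sampled sequence x.
    kl_estimate = logp_policy - logp_reference
    return viability + diversity_weight * diversity - beta * kl_estimate
```

A larger `beta` keeps the policy closer to the fine-tuned starting model, while a smaller one lets the viability and diversity terms dominate.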


Note

All scripts are provided as used by the authors. Users may need to adapt them to their own data or setup.
For questions or issues, please contact Ana Filipa Rodrigues (afdrodrigues@fc.ul.pt).


Authors

  • Lucas Ferraz
  • Ana Filipa Rodrigues
  • Pedro Giesteira Cotovio
  • Mafalda Ventura
  • Gabriela M. Silva
  • Ana Sofia Coroadinha
  • Miguel Machuqueiro
  • Cátia Pesquita

Acknowledgments

This work was supported by FCT – Fundação para a Ciência e Tecnologia, I.P. under the LASIGE Research Unit, ref. UID/00408/2025, DOI: https://doi.org/10.54499/UID/00408/2025, and partially supported by project 41, HfPT: Health from Portugal, funded by the Portuguese Plano de Recuperação e Resiliência.

It was also partially supported by the CancerScan project, which received funding from the European Union’s Horizon Europe Research and Innovation Action (EIC Pathfinder Open) under grant agreement No. 101186829.

Views and opinions expressed are those of the author(s) only and do not necessarily reflect those of the European Union or the European Innovation Council and SMEs Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
