Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids

This repository contains the code accompanying the arXiv submission of the paper:

“Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids”

It includes two sets of scripts, each with one or more software environments.

Repository overview

Set 1: Core analysis scripts

This set contains the main scripts used for data processing, analysis, and visualization:

01_preliminaryAnalysis_and_mutationalLandscape.py – Preliminary data processing and mutational landscape analysis
02_distanceFinetunedSequences.py – Calculates novelty scores for all fine-tuned generated sequences
03_plottingNovelty.py – Analyzes a set of randomly chosen sequences and generates the plots presented in the paper
04_biophysicalAnalysis.py – Implements polarity and charge filter analysis and the grid-based selection strategy
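
As a rough illustration of the kind of quantities these scripts deal with, the sketch below computes a novelty score as the minimum edit distance from a generated sequence to a set of reference sequences, plus a simple net-charge and polar-fraction summary. This is only a sketch under stated assumptions: the function names, the choice of Levenshtein distance, and the residue groupings are hypothetical and are not taken from 02_distanceFinetunedSequences.py or 04_biophysicalAnalysis.py, which may define novelty, charge, and polarity differently.

```python
from typing import Iterable

# Assumed residue groupings for illustration only; the repository's
# scripts may use different definitions of charge and polarity.
POSITIVE = set("KR")
NEGATIVE = set("DE")
POLAR_UNCHARGED = set("STNQ")


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def novelty_score(candidate: str, references: Iterable[str]) -> int:
    """Hypothetical novelty: distance to the closest reference sequence."""
    return min(levenshtein(candidate, ref) for ref in references)


def net_charge(seq: str) -> int:
    """Count of K/R residues minus count of D/E residues."""
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def polar_fraction(seq: str) -> float:
    """Fraction of residues in the assumed polar (charged or uncharged) set."""
    if not seq:
        return 0.0
    polar = POSITIVE | NEGATIVE | POLAR_UNCHARGED
    return sum(aa in polar for aa in seq) / len(seq)
```

A filter in this spirit would keep only sequences whose net_charge and polar_fraction fall inside a chosen window; the window itself would have to come from the paper or the scripts, not from this sketch.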

Environment for Set 1: requirements1.yml

Set 2: Model training and sequence generation scripts

This set contains scripts used for model training and sequence generation:

05_supervisedFineTuning.py – Supervised fine-tuning of ProGen2-small on viable AAV sequences, with masked loss on the generated middle region
06_rlTraining_viabilityDiversity.py – Reinforcement learning with viability and reference-diversity rewards (KL-regularized policy updates)
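
The two ideas named above, a loss restricted to the generated middle region and a KL-regularized reward, can be sketched in plain Python as follows. This is a minimal illustration under assumptions, not the code in 05_supervisedFineTuning.py or 06_rlTraining_viabilityDiversity.py: the function names, the per-token log-probability inputs, the sampled-sequence KL estimate, and the default beta value are all hypothetical, and the actual scripts may implement these steps differently.

```python
from typing import Sequence


def masked_nll(token_logprobs: Sequence[float],
               loss_mask: Sequence[int]) -> float:
    """Mean negative log-likelihood over positions where loss_mask is 1.

    Positions with mask 0 (for example fixed flanking residues) do not
    contribute, so only the generated region drives the loss.
    """
    if len(token_logprobs) != len(loss_mask):
        raise ValueError("token_logprobs and loss_mask must have equal length")
    kept = [lp for lp, m in zip(token_logprobs, loss_mask) if m]
    if not kept:
        raise ValueError("loss_mask selects no positions")
    return -sum(kept) / len(kept)


def kl_regularized_reward(task_reward: float,
                          policy_logprobs: Sequence[float],
                          reference_logprobs: Sequence[float],
                          beta: float = 0.1) -> float:
    """Task reward minus beta times a sampled KL estimate.

    The KL term is estimated on one sampled sequence as the sum of
    per-token log-probability differences between the policy and a
    frozen reference model. beta is an assumed penalty weight.
    """
    if len(policy_logprobs) != len(reference_logprobs):
        raise ValueError("log-probability sequences must have equal length")
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, reference_logprobs))
    return task_reward - beta * kl_estimate
```

In this sketch, task_reward stands in for whatever combination of viability and reference-diversity rewards the training script uses; a larger beta keeps the policy closer to the reference model.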

Environments for Set 2: requirements2.yml and requirements3.yml

Note

All scripts are provided as used by the authors. Users may need to adapt them to their own data or setup.
For questions or issues, please contact Ana Filipa Rodrigues (afdrodrigues@fc.ul.pt).

Authors

Lucas Ferraz
Ana Filipa Rodrigues
Pedro Giesteira Cotovio
Mafalda Ventura
Gabriela M. Silva
Ana Sofia Coroadinha
Miguel Machuqueiro
Cátia Pesquita

Acknowledgments

This work was supported by FCT – Fundação para a Ciência e Tecnologia, I.P. under the LASIGE Research Unit, ref. UID/00408/2025, DOI: https://doi.org/10.54499/UID/00408/2025, and partially supported by project 41, HfPT: Health from Portugal, funded by the Portuguese Plano de Recuperação e Resiliência.

It was also partially supported by the CancerScan project, which received funding from the European Union’s Horizon Europe Research and Innovation Action (EIC Pathfinder Open) under grant agreement No. 101186829.

Views and opinions expressed are those of the author(s) only and do not necessarily reflect those of the European Union or the European Innovation Council and SMEs Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
