
Question regarding pretraining #6

Closed
paulrbuckley opened this issue Apr 12, 2022 · 1 comment

@paulrbuckley

Hi,

Thanks for your help with the python version issue.

I have a question about the workflow for training models from scratch. From what I gather, the flexible_training.py script trains a TITAN model. Does this script pretrain the model on BindingDB? I assume that one would then finetune this model on TCR sequence and epitope data of choice, e.g. using the semi_frozen_finetuning.py script?

Best,

Paul

@jannisborn (Member)

Hi @paulrbuckley,

Indeed, flexible_training.py trains a TITAN model from scratch. Whether pretraining happens depends on how you use the script. You can pass the TCR-epitope binding data to it directly, in which case you omit the pretraining. Alternatively, you can use the script to pretrain a TITAN model on BindingDB and afterwards use semi_frozen_finetuning.py to finetune the model on TCR-epitope binding data. As per the results in our paper, the latter should perform best. Just keep in mind that we did not release the preprocessed BindingDB data for this paper. If you want to redo this step, I recommend looking at our related paper in the Journal of Chemical Information & Modeling and its codebase: https://github.com/PaccMann/paccmann_kinase_binding_residues
If you follow the Box link there, you can access processed BindingDB data that should be quite straightforward to feed to the flexible training script.
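
For concreteness, here is a rough sketch of that two-stage workflow as shell commands. The positional arguments, file names, and the `bimodal_mca` model type below are assumptions on my part, not the confirmed CLI; check this repository's README and the argparse setup of each script for the exact interface.

```sh
# Stage 1: pretrain on processed BindingDB (file names are placeholders).
# Assumed argument order: train data, test data, receptor file,
# ligand/epitope file, output dir, params file, run name, model type.
python scripts/flexible_training.py \
    bindingdb_train.csv bindingdb_test.csv \
    proteins.csv ligands.smi \
    trained_models/ params/params.json pretrained_bindingdb bimodal_mca

# Stage 2: finetune the pretrained model on TCR-epitope binding data.
# Again, the arguments (e.g. the path to the pretrained model) are
# illustrative only.
python scripts/semi_frozen_finetuning.py \
    tcr_train.csv tcr_test.csv \
    tcrs.csv epitopes.smi \
    trained_models/pretrained_bindingdb \
    trained_models/ params/finetune_params.json tcr_finetuned bimodal_mca
```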

Depending on your needs, it might be easier to start from the pretrained model that we provide rather than redoing the pretraining on BindingDB. Also, keep the learning rate (LR) low during finetuning, otherwise you might induce catastrophic interference and overfit on your TCR data. Hope this helps!
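
On the low learning rate: as a minimal sketch, you could lower the LR in the finetuning parameter file before launching the script. The file path, the "lr" key, and the value 1e-5 are assumptions here; inspect the params JSON files shipped with the repo for the real schema and sensible values.

```python
import json

# Hypothetical tweak: lower the learning rate in the finetuning params
# file to reduce the risk of catastrophic interference. The "lr" key
# and the value 1e-5 are assumptions, not confirmed repo defaults.
with open("params/finetune_params.json") as f:
    params = json.load(f)

params["lr"] = 1e-5  # e.g. roughly 10x lower than a from-scratch LR

with open("params/finetune_params.json", "w") as f:
    json.dump(params, f, indent=2)
```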
