
Decoy discrimination benchmark using finetuned model #10

@junhaobearxiong


Hi Phil!

Thank you so much for all the work that went into creating and curating this great resource! I've been interested in using TCRdock to predict TCR binding specificity for some peptides of interest, and as a first sanity check I've been trying to reproduce the decoy discrimination results from the paper. The main difference from the procedure described in the paper is that I used the fine-tuned model and the updated protocol described in #7, namely (exact commands are sketched after this list):

  1. add the flag --new_docking when running setup_for_alphafold.py
  2. add the flags --model_names model_2_ptm_ft4 --model_params_files /path/to/tcrpmhc_run4_af_mhc_params_891.pkl when running run_prediction.py, where that .pkl file is the fine-tuned parameter set downloaded from the Dropbox link.
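
For concreteness, here is roughly what I ran. The --new_docking, --model_names, and --model_params_files flags are the ones from #7; the remaining arguments and file names follow the README examples and are placeholders for my local setup:

```
# 1. set up AlphaFold inputs with the updated docking protocol
python setup_for_alphafold.py \
    --targets_tsvfile my_targets.tsv \
    --output_dir benchmark_setup \
    --new_docking

# 2. run prediction with the fine-tuned parameter set
python run_prediction.py \
    --targets benchmark_setup/targets.tsv \
    --outfile_prefix ft_benchmark \
    --model_names model_2_ptm_ft4 \
    --model_params_files /path/to/tcrpmhc_run4_af_mhc_params_891.pkl
```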

I chose the fine-tuned model because it is more efficient to run and is supposed to generate higher-quality predicted structures (which, per the paper's results, correlates with better decoy discrimination). Would you generally expect decoy discrimination to improve with the fine-tuned model? I'm asking because my initial results seem quite different from those in the paper, so I want to double-check whether this is expected:

  1. For the true TCR binders, the predicted scores for each TCR-pMHC pair (after correcting for background TCRs for each pMHC, and mean-centering for each TCR) show no significant correlation with the wt_binding_score column in datasets_from_the_paper/table_S2_specificity_benchmark_tcrs.csv. I limited my analysis to the 6 human peptides from the paper and used the 50 human background TCRs in that csv file for the pMHC-intrinsic correction (a sketch of my correction procedure is below this list).
  2. For each peptide, the classification performance (as measured by AUROC) improves for some peptides but decreases, often substantially, for others. For this evaluation I'd like to clarify: in Figure 3E of the paper, does the ROC for each peptide rank the scores of the 50 positive pairs (50 TCRs x 1 true peptide) against the 450 negative pairs (the same 50 TCRs x 9 decoys), so that the positive-to-negative ratio is always 1:9? If the evaluation set is indeed imbalanced, is the precision-recall curve also reported somewhere? (My evaluation code is sketched below.)
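
To make point 1 concrete, here is a minimal sketch of my correction procedure. The column names (pmhc, tcr, score, is_background, is_true_binder) and file names are hypothetical placeholders for my own bookkeeping, not the actual run_prediction.py output schema; wt_binding_score is the column from table S2.

```python
import pandas as pd
from scipy.stats import spearmanr

# One row per TCR-pMHC pair; columns here are hypothetical placeholders
# for how I organize the run_prediction.py outputs.
scores = pd.read_table("ft_benchmark_scores.tsv")

# 1) Correct for pMHC-intrinsic effects: for each pMHC, subtract the mean
#    score of the 50 human background TCRs against that pMHC.
bg_mean = scores[scores["is_background"]].groupby("pmhc")["score"].mean()
scores["corrected"] = scores["score"] - scores["pmhc"].map(bg_mean)

# 2) Mean-center each TCR across the peptide panel.
scores["centered"] = (scores["corrected"]
                      - scores.groupby("tcr")["corrected"].transform("mean"))

# 3) Correlate the corrected scores of the true binders with wt_binding_score,
#    assuming a shared "tcr" identifier column for the merge (hypothetical).
tcr_table = pd.read_csv("datasets_from_the_paper/table_S2_specificity_benchmark_tcrs.csv")
binders = scores[scores["is_true_binder"]].merge(tcr_table, on="tcr")
rho, pval = spearmanr(binders["centered"], binders["wt_binding_score"])
print(f"Spearman rho = {rho:.3f} (p = {pval:.2g})")
```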
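And for point 2, this is how I computed the per-peptide AUROC, assuming the 50-positive / 450-negative split I described above; average_precision_score gives the PR-curve summary I would compare against the ~0.1 baseline implied by the 1:9 class ratio.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def evaluate_peptide(pos_scores, neg_scores):
    """AUROC and average precision for one peptide.

    pos_scores: corrected scores for the 50 true (TCR, peptide) pairs.
    neg_scores: corrected scores for the 450 (TCR, decoy) pairs.
    """
    y_true = np.concatenate([np.ones(len(pos_scores)), np.zeros(len(neg_scores))])
    # Lower (PAE-derived) scores indicate stronger predicted binding,
    # so negate before ranking.
    y_score = -np.concatenate([pos_scores, neg_scores])
    return roc_auc_score(y_true, y_score), average_precision_score(y_true, y_score)
```

With a 1:9 ratio a random classifier already gets average precision of about 0.1, which is why I was curious whether the PR curve is reported anywhere alongside the ROC.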

Again, thank you so much for this great resource!

Best,
Bear
