How should we preprocess .pdbqt files before using RTMScore? #8

Open
zzzzzx-1115 opened this issue Oct 3, 2022 · 6 comments

@zzzzzx-1115

We are now considering using RTMScore to rank the different results given by AutoDock Vina (the dataset is PDBBind). Unfortunately, Vina always outputs .pdbqt files, which RTMScore cannot take as input directly.

So could you please advise us on what we should do? We have already used Open Babel to convert .pdbqt to .sdf, but we ran into a lot of confusing bugs...

Thank you!

@sc8668
Owner

sc8668 commented Oct 6, 2022

In this study we also used Open Babel to convert .pdbqt to .sdf, and the molecules that failed conversion were simply skipped. Another strategy is to record only the 3D coordinates of the docking poses, and then update the coordinates of the input molecules (.sdf or .mol2) with the newly generated ones.
I hope these suggestions will help you!

@zzzzzx-1115
Author

That strategy sounds really cool! We have run several examples by replacing the coordinates of the reference molecule with the newly generated ones, but it seems that we are still unable to pick out the pose that best matches the reference, which should have received the highest score. Our setup is described as follows:

We first use rdkit.Chem.RemoveHs() to remove hydrogen atoms from both the generated molecule and the reference molecule (because the former has no hydrogen atoms described in its .sdf file but the latter does), then update the coordinates in the reference file according to those in the generated one, and finally use rdkit.Chem.AddHs(addCoords=True) to add the hydrogen atoms back. But the results are not very satisfying. Is the last step necessary? Or is our approach in line with your suggestion?
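
For readers following along, the workflow described in this comment might look roughly like the sketch below. `transfer_pose` is a hypothetical name, and the atom mapping through a substructure match is an assumption of this sketch rather than something stated in the thread. It also assumes both molecules sanitize cleanly and carry the same bond orders, which is not guaranteed for poses converted from .pdbqt.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Geometry import Point3D


def transfer_pose(reference, pose):
    """Return a copy of `reference` placed on the heavy-atom coordinates of `pose`."""
    ref_heavy = Chem.RemoveHs(reference)
    pose_heavy = Chem.RemoveHs(pose)
    # match[i] is the pose atom index that corresponds to reference atom i,
    # so identical atom ordering in the two files is not required.
    match = pose_heavy.GetSubstructMatch(ref_heavy)
    if len(match) != ref_heavy.GetNumAtoms():
        raise ValueError("pose and reference do not describe the same molecule")
    ref_conf = ref_heavy.GetConformer()
    pose_conf = pose_heavy.GetConformer()
    for ref_idx, pose_idx in enumerate(match):
        ref_conf.SetAtomPosition(ref_idx, pose_conf.GetAtomPosition(pose_idx))
    # Rebuild hydrogens with coordinates derived from the new heavy-atom positions.
    return Chem.AddHs(ref_heavy, addCoords=True)


# Demo on a toy molecule: the "pose" is the reference shifted by 5 along x.
reference = Chem.AddHs(Chem.MolFromSmiles("CCO"))
AllChem.EmbedMolecule(reference, randomSeed=42)
pose = Chem.RemoveHs(reference)
conf = pose.GetConformer()
for i in range(pose.GetNumAtoms()):
    p = conf.GetAtomPosition(i)
    conf.SetAtomPosition(i, Point3D(p.x + 5.0, p.y, p.z))

posed = transfer_pose(reference, pose)
```

Both input molecules are left unmodified because `Chem.RemoveHs` returns a copy. Whether re-adding hydrogens matters for the score is not settled in this thread, so the final `AddHs` call simply mirrors the setup described above.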

@sc8668
Owner

sc8668 commented Oct 7, 2022

To my understanding, you have successfully rescored the molecules with RTMScore, but could not obtain satisfactory results using RTMScore alone for rescoring. Note that our method exhibits excellent docking and screening powers rather than scoring and ranking powers. Additionally, the performance is evaluated in terms of overall statistics, and it is common to see poor performance from our method on some targets.

@zzzzzx-1115
Author

Sorry, I did not explain the background clearly, so you probably misunderstood our purpose...

We feed a protein-ligand pair (e.g. 1t7j_protein.pdb and 1t7j_ligand.mol2) into AutoDock Vina and get different binding poses (ligand_out_1t7j.sdf, converted from .pdbqt) for only this one pair. In this scenario RTMScore is supposed to give the highest score to the binding pose which matches the real one best among all the output ones, but we failed to do that.

I would appreciate it if you could spare some time to help us check what led to our failure. The attachment is the example mentioned above (1t7j), and the problem is that the highest-ranked pose (the 331st in ligand_out_1t7j.sdf) is obviously worse than the 15th one (we visualized them in PyMOL btw).
1t7j.zip

@sc8668
Owner

sc8668 commented Oct 7, 2022

The following results were generated from your files with the command "python rtmscore.py -p 1t7j_protein.pdb -l ligand_out_1t7j.sdf -m ../trained_models/rtmscore_model1.pth -o xxxqq -c 10.0 -rl 1t7j_ligand.mol2 -gen_pocket", and here we can successfully identify the near-native poses. Are you doing anything differently?

xxxqq.csv

@zzzzzx-1115
Author

Right, we finally found that using RDKit to read the molecule files with sanitization disabled leads to this unexpected situation... Thanks to your patient replies, everything is OK now.

I am very grateful for your help!
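
As a small illustration of why the sanitization setting matters, the sketch below reads the same mol block twice with RDKit. It is a toy example on benzene, not the 1t7j files from this thread, and it only shows one visible effect (aromaticity perception being skipped). Which molecular properties RTMScore actually depends on is not stated here.

```python
from rdkit import Chem

# A Kekulized benzene mol block, standing in for a ligand file read from disk.
block = Chem.MolToMolBlock(Chem.MolFromSmiles("c1ccccc1"))

# Same input, read without and with sanitization.
raw = Chem.MolFromMolBlock(block, sanitize=False, removeHs=False)
clean = Chem.MolFromMolBlock(block, sanitize=True, removeHs=False)


def count_aromatic(mol):
    """Number of atoms RDKit currently flags as aromatic."""
    return sum(atom.GetIsAromatic() for atom in mol.GetAtoms())


raw_aromatic = count_aromatic(raw)      # aromaticity was never perceived
clean_aromatic = count_aromatic(clean)  # perceived during sanitization

# An unsanitized molecule can be sanitized explicitly after loading.
Chem.SanitizeMol(raw)
repaired_aromatic = count_aromatic(raw)
```

The same `sanitize` keyword exists on `Chem.SDMolSupplier` and `Chem.MolFromMol2File`, so it is worth checking how the docking poses are loaded before they are scored.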
