Getting a consistent set of conformers from EmbedMultipleConfs #9187
It's not an issue. Conformer generation using distance geometry is a stochastic process and the only way to get the same conformer out of it is by specifying the random number seed.
If you generate multiple conformers and do not do RMS pruning, it's possible to generate individual conformers in later runs if you know their IDs. Here's an example of re-generating conformer 487 from 1000:
The output of this for me is:
You should get roughly the same result.
Unfortunately, at the moment there's no way to do this if you have done RMS pruning. I think it should be straightforward to add the option to allow it though.
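To illustrate only the seeding point (this is not the conformer-487 regeneration example), here is a minimal sketch assuming a recent RDKit release where `rdDistGeom.ETKDGv3()` is available. The SMILES, the conformer count, the seed value and the helper name `embed_ensemble` are arbitrary choices made for the illustration. It runs the embedding twice with the same seed and no RMS pruning, then compares the coordinates.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdDistGeom


def embed_ensemble(smiles, num_confs, seed):
    """Embed conformers with a fixed random seed and return their coordinates."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    params = rdDistGeom.ETKDGv3()
    params.randomSeed = seed  # a fixed seed makes the embedding repeatable
    cids = rdDistGeom.EmbedMultipleConfs(mol, num_confs, params)
    return [mol.GetConformer(cid).GetPositions() for cid in cids]


# Two independent runs with the same (arbitrary) seed and no RMS pruning.
run_a = embed_ensemble("CCCCO", 10, seed=0xF00D)
run_b = embed_ensemble("CCCCO", 10, seed=0xF00D)

# Conformer i of run A should have the same coordinates as conformer i of run B.
same = len(run_a) == len(run_b) and all(
    np.allclose(a, b) for a, b in zip(run_a, run_b)
)
print(len(run_a), same)
```

With the seed left at its default of -1, the two runs would generally give different coordinates.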
I don't want to find the exact conformer in the ensemble. I just want to be confident that if I generate 2 ensembles from the same molecule, I get roughly the same conformations in each, such that every conformer in ensemble 1 has a conformer in ensemble 2 with a shape Tanimoto of >0.95 or so, and vice versa. I was hoping that would be the case if I generated enough conformations, but it appears not to be.
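The criterion above can be written down as a small check. This is only a sketch: it assumes the pairwise shape Tanimoto values have already been computed by some other tool and are passed in as a plain nested list, and the function name `mutual_coverage` and the numbers in the usage example are made up for illustration.

```python
def mutual_coverage(sim, threshold=0.95):
    """Check whether two ensembles cover each other.

    sim[i][j] is the similarity between conformer i of ensemble 1 and
    conformer j of ensemble 2. A conformer counts as covered when its best
    match in the other ensemble is strictly above the threshold.
    Returns (all_covered, uncovered_in_1, uncovered_in_2).
    """
    if not sim or not sim[0]:
        return False, [], []
    n_cols = len(sim[0])
    # Rows: conformers of ensemble 1 with no good match in ensemble 2.
    uncovered_1 = [i for i, row in enumerate(sim) if max(row) <= threshold]
    # Columns: conformers of ensemble 2 with no good match in ensemble 1.
    uncovered_2 = [
        j for j in range(n_cols) if max(row[j] for row in sim) <= threshold
    ]
    return (not uncovered_1 and not uncovered_2), uncovered_1, uncovered_2


# Made-up similarities: 2 conformers in ensemble 1, 3 in ensemble 2.
example = [
    [0.97, 0.60, 0.55],
    [0.58, 0.96, 0.62],
]
result = mutual_coverage(example)
print(result)  # (False, [], [2])
```

Here both conformers of ensemble 1 find a match, but the third conformer of ensemble 2 does not, so the two ensembles fail the test in the "vice versa" direction.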
Hi,
I'm having problems getting a consistent set of conformers from EmbedMultipleConfs. I understand that as a distance geometry algorithm it is subject to the vagaries of random numbers, but I was expecting that if I generated enough conformers I would at least in general get similar ensembles from run to run. This doesn't seem to be the case.
This is a colab notebook showing what I get.
https://colab.research.google.com/drive/1bSV_kz-sVWcDfWAYkZdJg-JFH-iCm9YL?usp=sharing
I start with the PubChem conformation of osimertinib and generate 1000 conformations, pruning at 0.5 RMS. I then find the conformation with the lowest RMS to the input conformation, and also find the best shape match using the PubChem shape overlay code. I repeat this 5 times; after the first round, each round starts from the conformer with the best RMS to the previous starting point, so it is always generating conformations from one it generated in the previous round. As the notebook shows, no generated conformation is a close match to the input structure, either by RMS or by shape. This is a problem for my use case, which is searching databases of conformations of molecules by shape: the results will vary every time I generate a new database, and I run the risk of compounds not finding themselves because the conformations in the database don't resemble the ones generated in a different run of the embedding code.
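For readers without access to the notebook, one round of this procedure looks roughly like the sketch below. It is not the notebook's code, and several things are assumptions: a small alcohol embedded with a fixed seed stands in for the PubChem conformation of osimertinib, only 50 conformers are generated so it runs quickly, the shape overlay step is left out, and `closest_conformer` is a name invented for the illustration. It assumes a recent RDKit release with `rdDistGeom.ETKDGv3()`.

```python
from rdkit import Chem
from rdkit.Chem import rdDistGeom, rdMolAlign


def closest_conformer(ref_mol, num_confs=50, prune_rms=0.5, seed=-1):
    """Embed a pruned ensemble and find the conformer closest to ref_mol.

    Returns (conformer id, heavy-atom RMS to the reference, ensemble size).
    """
    probe = Chem.AddHs(Chem.Mol(ref_mol))  # work on a copy of the reference
    probe.RemoveAllConformers()
    params = rdDistGeom.ETKDGv3()
    params.pruneRmsThresh = prune_rms  # drop near-duplicate conformers
    params.randomSeed = seed  # -1 means a different result on every run
    cids = list(rdDistGeom.EmbedMultipleConfs(probe, num_confs, params))
    # Compare heavy atoms only; GetBestRMS aligns the probe conformer in place
    # and takes symmetry-equivalent atom mappings into account.
    probe_heavy = Chem.RemoveHs(probe)
    ref_heavy = Chem.RemoveHs(ref_mol)
    rms = {
        cid: rdMolAlign.GetBestRMS(probe_heavy, ref_heavy, cid) for cid in cids
    }
    best = min(rms, key=rms.get)
    return best, rms[best], len(cids)


# Stand-in reference conformation (the post uses a PubChem 3D structure).
ref = Chem.AddHs(Chem.MolFromSmiles("CCCCCO"))
ref_params = rdDistGeom.ETKDGv3()
ref_params.randomSeed = 1
rdDistGeom.EmbedMolecule(ref, ref_params)

best_id, best_rms, n_confs = closest_conformer(ref, seed=2)
print(best_id, round(best_rms, 3), n_confs)
```

Repeating the call with different seeds, or with `seed=-1`, and watching how much the best RMS moves gives a quick feel for the run-to-run variation described here.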
Is there a way of ensuring consistency of the conformation ensemble from run to run? Obviously I could set the random number seed each time, but that's just hiding the issue.
Thanks,
Dave