Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't Embed a molecule #2996

Open
MherMatevosyan opened this issue Mar 10, 2020 · 12 comments
Open

Can't Embed a molecule #2996

MherMatevosyan opened this issue Mar 10, 2020 · 12 comments

Comments

@MherMatevosyan
Copy link

  • RDKit Version: '2019.09.3'
  • Operating system: Windows, Mac
  • Python version: 3.7
  • installed rdkit with conda install -c conda-forge rdkit

Description:

It seems rdkit can't embed some molecules. Tried with enforceChirality = False, ignoreSmoothingFailures = False. Those didn't work. Could you help, please?
Example SMILES:
C=CC1=C(N)Oc2cc1c(-c1cc(C(C)O)cc(=O)cc1C1NCC(=O)N1)c(OC)c2OC
C=CC1=C(N)Oc2cc1c(C1=CC(C(O)CN)=CC(O)C=C1C1NCC(=O)N1)c(OC)c2OC

>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> smiles = 'C=CC1=C(N)Oc2cc1c(-c1cc(C(C)O)cc(=O)cc1C1NCC(=O)N1)c(OC)c2OC'
>>> mol = Chem.MolFromSmiles(smiles)
>>> mol = Chem.AddHs(mol)
>>> embed_success = AllChem.EmbedMolecule(mol)
>>> embed_success
-1
>>> AllChem.MMFFOptimizeMolecule(mol, maxIters=500)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Bad Conformer Id
@iiitmjay
Copy link

@MherMatevosyan
Hi, I think you should put the last statement in try-except to get rid of the above problem.
try : AllChem.MMFFOptimizeMolecule(mol, maxIters=500) except : pass

@iiitmjay
Copy link

@MherMatevosyan
Be sure to have correct indentation before proceeding to the above mention fix.

@MherMatevosyan
Copy link
Author

@iiitmjay, thank you for your suggestion, but I need to use the energy minimized version in other calculations, so I would like to understand if those molecules are even possible to embed and have their energy minimized. And if it is not possible to do, I would like to understand the underlying reason for that.

@iiitmjay
Copy link

@greglandrum Sir, if you have time then please have a look on this conversation.
Actually the error can;t be displayed when passed int try except block as suggested above but still have the following doubt.
Actually when I went through the documentation with respect to MMFFOptimizeMolecule , its written over there that the maximum iteration should be 200, but it's really so that the iteration should not exceed 200?
I am just confused that whether its just a theory or it's just implemented like that?

@greglandrum
Copy link
Member

@MherMatevosyan sorry for the slow reply on this one. Finishing up the most recent RDKit release has kept me overly busy.

As you're seeing, when AllChem.EmbedMolecule() returns -1 it means that the embedding failed; the code was unable to generate a conformer. Often when this happens the best solution is to try starting from random coordinates, but this doesn't help with your molecule:

n [6]: ps = AllChem.ETKDGv2()                                                                                                               

In [7]: ps.useRandomCoords = True                                                                                                            

In [8]: AllChem.EmbedMolecule(m,ps)                                                                                                          
Out[8]: -1

When this fails, I generally then take a close look at the molecule. Here's what you have:
image

That includes this substructure, which I believe is the source of the problem:
image

I think it's going to be difficult to find a 3D structure of that ring system that is even close to being physically reasonable. Because the RDKit's conformer generator uses rules that attempt to generate physically reasonable structures, they fail here.

We might be able to figure out how to get a conformer for things like this, but I want to check first: do you believe that these are physically reasonable molecules?

@den-run-ai
Copy link

@greglandrum is there a simpler version of 3d conformation that checks for viability of fragment structures based on some rules like it is done in the sanitizer to avoid these failures?

@greglandrum
Copy link
Member

@denfromufa I imagine that it's possible to put together a set of heuristics to recognize substructures which may be problematic, but the RDKit doesn't have anything like that built in.

@leeping
Copy link

leeping commented Nov 1, 2022

Just wanted to chime in - I recently ran into a similar problem and I found this thread from Google. The molecule is an oligosaccharide and the SMILES string is:

[C@]1(C[C@@H]([C@H](C(O1)[C@@H]([C@@H](CO)O)O)NC(=O)C)O)(C(=O)O)O[C@@H]([C@@H](O)C2O[C@@](C[C@@H]([C@H]2NC(=O)C)O)(O[C@@H]3[C@H]([C@@H](O[C@@H]([C@@H]3O)CO)O[C@H]4[C@@H]([C@H]([C@@H](O[C@@H]4CO)OC[C@@H]([C@@H](/C=C/C)O)NC(=O)C)O)O)O)C(=O)O)CO

I'm also attaching a screenshot of the notebook cells where I saw this problem. I went up to 100,000 attempts and still no luck.

The GLYCAM carbohydrate builder has no problem with building a structure, I've attached that as well. Maybe that's the solution I should've tried in the beginning, but my sense was that RDKit can routinely embed molecules bigger than this one.

image
GLYCAM-structures.zip

@jasondbiggs
Copy link
Contributor

@leeping - what happened after you followed the advice given in the error message and added explicit hydrogen atoms?

@greglandrum
Copy link
Member

@leeping , as @jasondbiggs implied: you should be adding Hs to molecules before attempting to embed them. It would be good to know if you still see the problem then

@leeping
Copy link

leeping commented Nov 2, 2022

Hi Jason and Greg,

Thanks for the advice - calling AddHs() before attempting to embed resolved the issue for that molecule. My apologies for not checking.

That molecule is actually a truncated version of the original that also included two lipid tails. I had previously confirmed that AddHs() did not resolve the issue for the larger molecule, but I neglected to check it for the truncated version. If the failure for the larger molecule is of interest, I'm including the screenshot and SMILES string here:

(This is with RDKit version 2022.03.5).

from rdkit import Chem
from rdkit.Chem import AllChem
m1 = Chem.MolFromSmiles('[C@]1(C[C@@H]([C@H](C(O1)[C@@H]([C@@H](CO)O)O)NC(=O)C)O)(C(=O)O)O[C@@H]([C@@H](O)C2O[C@@](C[C@@H]([C@H]2NC(=O)C)O)(O[C@@H]3[C@H]([C@@H](O[C@@H]([C@@H]3O)CO)O[C@H]4[C@@H]([C@H]([C@@H](O[C@@H]4CO)OC[C@@H]([C@@H](/C=C/CCCCCCCCCCCCC)O)NC(=O)CCCCCCCCCCCCCCC)O)O)O)C(=O)O)CO')
m1h = Chem.AddHs(m1)
AllChem.EmbedMolecule(m1h,maxAttempts=100000)

image

@stefdoerr
Copy link

One more example case where rdkit fails at embedding which is less extreme than the above molecules. Surprisingly if you remove the hydrogens it works (so if you run the EmbedMolecule on m1 instead of m1h) and you can add the hydrogens back in afterwards but I assume that's not very correct.

from rdkit import Chem
from rdkit.Chem import AllChem
m1 = Chem.MolFromSmiles('C1CC2CCC21')
m1h = Chem.AddHs(m1)
AllChem.EmbedMolecule(m1h, maxAttempts=100000)

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants