OpenEye Toolkit 2023.1.0 fails to assign am1bccelf10 charges that work in 2022.1.1 #1736
Comments
It looks like the error stems from conformer generation, not charge assignment per se. I wonder if any of these settings could be fiddled with? The toolkit could do a better job reporting what went wrong (or maybe Omega doesn't provide detail)?
In [6]: mol = Molecule.from_mapped_smiles("[H:16][c:1]1[c:3]([c:7]([c:11]([c:8]([c:4]1[H:19])[H:23])[C:14]([H:29])([H:30])[N@:15]([c:12]2[c:9]([c:5]([c:2]([c:6]([c:10]2[H:25])[H:21])[H:17])[H:20])[H:24])[C:13]([H:26])([H:27])[H:28])[H:22])[H:18]")
In [7]: from openff.toolkit.utils.toolkits import OpenEyeToolkitWrapper
...: oe = OpenEyeToolkitWrapper()
...: oe.assign_partial_charges(mol, partial_charge_method="am1bccelf10")
Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
---------------------------------------------------------------------------
ConformerGenerationError Traceback (most recent call last)
Cell In[7], line 3
1 from openff.toolkit.utils.toolkits import OpenEyeToolkitWrapper
2 oe = OpenEyeToolkitWrapper()
----> 3 oe.assign_partial_charges(mol, partial_charge_method="am1bccelf10")
File ~/mambaforge/envs/openff-interchange-env/lib/python3.11/site-packages/openff/toolkit/utils/openeye_wrapper.py:2420, in OpenEyeToolkitWrapper.assign_partial_charges(self, molecule, partial_charge_method, use_conformers, strict_n_conformers, normalize_partial_charges, _cls)
2418 mol_copy._conformers = None
2419 else:
-> 2420 self.generate_conformers(
2421 mol_copy,
2422 n_conformers=charge_method["rec_confs"],
2423 rms_cutoff=0.25 * unit.angstrom,
2424 make_carboxylic_acids_cis=True,
2425 )
2426 # TODO: What's a "best practice" RMS cutoff to use here?
2427 else:
2428 mol_copy._conformers = None
File ~/mambaforge/envs/openff-interchange-env/lib/python3.11/site-packages/openff/toolkit/utils/openeye_wrapper.py:2182, in OpenEyeToolkitWrapper.generate_conformers(self, molecule, n_conformers, rms_cutoff, clear_existing, make_carboxylic_acids_cis)
2180 new_status = omega(oemol)
2181 if new_status is False:
-> 2182 raise ConformerGenerationError(
2183 "OpenEye Omega conformer generation failed"
2184 )
2186 molecule2 = self.from_openeye(
2187 oemol, allow_undefined_stereo=True, _cls=molecule.__class__
2188 )
2190 if clear_existing:
ConformerGenerationError: OpenEye Omega conformer generation failed |
Ah yes, I should have mentioned that in the issue! The root problem is indeed the Omega conformer generation. I also noticed that despite there being 52 records, there are only 13 unique molecules. So a smaller, complete test set is here:
|
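The reduction from 52 records to 13 unique molecules can be sketched as an order-preserving de-duplication. This is only a minimal sketch: it compares SMILES as plain strings, so it assumes every record was written by the same toolkit in the same (canonical) form. The records below are hypothetical placeholders, not the molecules from this issue.

```python
def unique_in_order(smiles_records):
    """Return the distinct SMILES strings, keeping first-seen order."""
    # dict preserves insertion order, so this is an order-stable de-duplication.
    # Whitespace is stripped and empty records are dropped before comparing.
    return list(dict.fromkeys(s.strip() for s in smiles_records if s.strip()))


# Hypothetical records for illustration only
records = ["CCO", "CCN", "CCO\n", "CCN", "c1ccccc1"]
unique = unique_in_order(records)
print(f"{len(records)} records, {len(unique)} unique")
```

Two different SMILES strings for the same molecule would still count as distinct here; a toolkit-level canonicalization would be needed to catch those.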
I tried adjusting some of these settings to no avail. I tried slapping some zeros on things ( |
I don't actually know what the warning means, or how severe it is in OpenEye's jargon, except that CT is the connection table |
Glancing at this quickly I don't see anything hugely obvious. My remaining possibilities are roughly 50% this being a representation/molecule sanitization issue with us that OE has started policing (maybe we're doing something bad in If we can make a reproducing case with an un-mapped version of one of these SMILES using pure OpenEye code, then it'll both be clear that it's a regression on their side, and we'll have a repro example we can immediately send to them. I don't have time to do this today but if either of you do, this is the route I'd recommend for debugging! |
Oh, actually, I take back the CT thing. It probably does mean connection table in this context. |
Here are the un-mapped SMILES. I'll try to reproduce with just OpenEye code.
|
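For reference, one way to get an un-mapped SMILES from a mapped one is to drop the `:N` atom-map labels with a regular expression. This is a rough sketch using only the standard library, not how the SMILES in this thread were produced. The helper name `strip_atom_maps` is made up for illustration, and the example input is a fragment of the mapped SMILES from the first comment.

```python
import re

# Matches an atom-map label such as ":15" only when it sits right before the
# closing bracket of a bracket atom, e.g. "[N@:15]".
_ATOM_MAP = re.compile(r":\d+(?=\])")


def strip_atom_maps(mapped_smiles: str) -> str:
    """Remove atom-map labels, leaving element, stereo, and charge untouched."""
    return _ATOM_MAP.sub("", mapped_smiles)


# Fragment of the mapped SMILES quoted earlier in the thread
print(strip_atom_maps("[C:13]([H:26])([H:27])[H:28]"))
```

Note that this only edits the text. It does not reorder atoms, so the atom ordering implied by the map indices is lost.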
This fails with 2023.1.0 and works with 2022.1.1:
from openeye import oechem, oeomega

smiles = "[H]C1(C2=C(N=C(N2C([H])([H])[H])N3C(C(C(C(C3([H])[H])([H])[H])([H])[H])([H])[H])([H])[H])[N@](C([N@]1C([H])([H])[H])([H])[H])C([H])([H])[H])[H]"
# steps taken from
# https://docs.eyesopen.com/toolkits/python/oechemtk/molctordtor.html#construction-from-smiles
mol = oechem.OEMol()
assert oechem.OESmilesToMol(mol, smiles)
omega = oeomega.OEOmega()
assert omega(mol)
|
And checking all of them with the SMILES pasted into smiles.dat:
from openeye import oechem, oeomega

win = 0
with open('smiles.dat') as inp:
    for smiles in inp:
        mol = oechem.OEMol()
        assert oechem.OESmilesToMol(mol, smiles[:-1])
        omega = oeomega.OEOmega()
        win += omega(mol)
print(win)
prints 13 for 2022.1.1 and 0 for 2023.1.0.
|
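A slightly more defensive version of that counting loop might look like the sketch below. It is an illustration, not the script that was actually run: the builder is passed in as a callable so the loop can be exercised without an OpenEye license, and `fake_build` is an arbitrary stand-in whose rule says nothing about how Omega behaves. With OpenEye installed, the callable would wrap the OESmilesToMol and OEOmega calls from the script above.

```python
from typing import Callable, Iterable, Tuple


def tally(lines: Iterable[str], try_build: Callable[[str], bool]) -> Tuple[int, int]:
    """Count (succeeded, failed) builds over an iterable with one SMILES per line."""
    succeeded = failed = 0
    for line in lines:
        # strip() is safe even when the last line has no trailing newline,
        # whereas line[:-1] would cut off the final character in that case.
        smiles = line.strip()
        if not smiles:
            continue  # skip blank lines
        if try_build(smiles):
            succeeded += 1
        else:
            failed += 1
    return succeeded, failed


def fake_build(smiles: str) -> bool:
    """Stand-in for demonstration only; the rule here is arbitrary."""
    return "@" not in smiles


ok, bad = tally(["CCO\n", "C[N@](CC)c1ccccc1\n", "\n", "CCN"], fake_build)
print(f"{ok} succeeded, {bad} failed")
```

Counting failures explicitly, rather than asserting inside the loop, lets the run finish and report every molecule even when some of them fail.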
It looks like each molecule contains a chiral nitrogen bound to an aromatic ring. Most of these are bound to the carbon between nitrogens in pyrimidine, but this one doesn't have that
I'd expect many more than 13 structures (of a few thousand?) to have a nitrogen near a ring. At least in my experience with these datasets, aromatic rings are ubiquitous and chiral nitrogens are common |
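To check how common stereo-tagged nitrogens are in a set of SMILES, a quick text-level scan could be used. This is a minimal sketch with the standard library only. It looks for bracket atoms written as `[N@...]` or `[N@@...]` and knows nothing about rings or actual chemistry, so it will miss forms such as isotope-labelled atoms. The helper name `stereo_nitrogens` is made up for illustration; the input is the SMILES from the pure-OpenEye reproduction earlier in the thread.

```python
import re

# A bracket atom that starts with N followed by one or two '@' characters,
# e.g. "[N@]", "[N@@]", or "[N@:15]".
_STEREO_N = re.compile(r"\[N@{1,2}[^\]]*\]")


def stereo_nitrogens(smiles: str) -> list:
    """Return every bracket-atom token that marks a nitrogen stereocenter."""
    return _STEREO_N.findall(smiles)


smiles = (
    "[H]C1(C2=C(N=C(N2C([H])([H])[H])N3C(C(C(C(C3([H])[H])([H])[H])([H])[H])"
    "([H])[H])([H])[H])[N@](C([N@]1C([H])([H])[H])([H])[H])C([H])([H])[H])[H]"
)
print(len(stereo_nitrogens(smiles)))
```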
I'm still running the benchmark to see if this is the root cause of the differences I've observed between the original Sage 2.1.0 fit and my attempt to reproduce it. I expected there to be more than 13 molecules too, but those are the only records I found in the opt-set-for-fitting-2.1.0.json in the Sage-2.1.0 repo. There are 5580 records therein, but only 1701 when converting the list of This must not account for all of the difference I've seen because I have 23 fewer opt-geo batches than the Sage 2.1.0 repo, which should contain ~600 structures total. But this conformer generation error definitely prevented me from running ForceBalance on the Sage 2.1.0 inputs with my environment. |
I wonder if this is the issue. "Which nitrogens are chiral?" is an open question and different cheminformatics toolkits don't always agree. In particular, some recent Omega release notes say:
So maybe it's that Omega is getting strict here - for example The release notes mention that we can set the (ominously-named) |
I'm not sure if I'm using the
from openeye import oechem, oeomega

win = 0
with open("smiles.dat") as inp:
    for smiles in inp:
        mol = oechem.OEMol()
        assert oechem.OESmilesToMol(mol, smiles[:-1])
        builder = oeomega.OEMolBuilder()
        options = oeomega.OEMolBuilderOptions()
        options.SetIgnoreStereo(True)
        builder.SetOptions(options)
        assert builder.Build(mol) == oeomega.OEOmegaReturnCode_Success
        omega = oeomega.OEOmega()
        assert omega.Build(mol) == oeomega.OEOmegaReturnCode_FailedCTBuild
        win += omega(mol)
print(win)
Edit: This fails for every conformer in the 2023 version too. Sorry I forgot to mention that originally! I brought up the 2022 performance to show that I was using new code ( I have also now tried the latest release
mamba create -n openeye-2023.1.1 -c openeye 'openeye-toolkits=2023.1.1' openff-qcsubmit
with the same output:
|
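When comparing environments like this, it can help to print the installed versions next to the results. The sketch below uses only the standard library. The distribution names passed in are assumptions about how the packages register their metadata and may need adjusting for a given install; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumed distribution names; adjust if your environment registers them differently
for name in ("openeye-toolkits", "openff-toolkit"):
    print(name, installed_version(name) or "not installed")
```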
I wonder if this is an OpenEye support issue; if the OpenEye toolkit previously could generate conformers for these but now it cannot, is that on their end? |
I updated my last comment with some additional details and just sent an email to OpenEye support! |
Should I close this? Based on the response from OpenEye support, it sounds like a known/expected change from their perspective. And unless we want to propagate the I'm looking through some of my opened issues today and thought I would close some if they are no longer relevant. I'll leave it if it's useful to have around, though. |
I think so; if we're not likely to do anything about it and if it's too esoteric to document here, there is no reason to keep this open |
Describe the bug
The newer version of openeye-toolkits fails to assign partial charges where the previous version did so successfully. The core of the attached Python script is
where mol is an openff.toolkit.Molecule.
To Reproduce
Create two conda environments:
Run the included example.py script with both environments
Output
The first one will print 52 succeeded, 0 failed, while the second will print 0 succeeded, 52 failed.
Additional context
For an even smaller test set, the 2 SMILES below fail to charge in the new version but charge in the old version:
And these two charge successfully in both cases:
repro.zip