Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenEye Toolkit 2023.1.0 fails to assign am1bccelf10 charges that work in 2022.1.1 #1736

Closed
ntBre opened this issue Oct 5, 2023 · 17 comments

Comments

@ntBre
Copy link
Contributor

ntBre commented Oct 5, 2023

Describe the bug
The newer version of openeye-toolkits fails to assign partial charges where the previous version did so successfully. The core of the attached Python script is

from openff.toolkit.utils.toolkits import OpenEyeToolkitWrapper
oe = OpenEyeToolkitWrapper()
oe.assign_partial_charges(mol, partial_charge_method="am1bccelf10")

where mol is an openff.toolkit.Molecule.

To Reproduce
Create two conda environments:

mamba create -n openeye-2022.1.1 -c openeye 'openeye-toolkits=2022.1.1' openff-qcsubmit
mamba create -n openeye-2023.1.0 -c openeye 'openeye-toolkits=2023.1.0' openff-qcsubmit

Run the included example.py script with both environments

mamba run -n openeye-2022.1.1 --no-capture-output python example.py
mamba run -n openeye-2023.1.0 --no-capture-output python example.py

Output
The first one will print 52 succeeded, 0 failed, while the second will print 0 succeeded, 52 failed.

Additional context
For an even smaller test set, the 2 SMILES below fail to charge in the new version but charge in the old version:

[H:11][c:1]1[c:2]([c:4]([n:9][c:5]([c:3]1[H:13])[N@:10]([C:7]([H:18])([H:19])[H:20])[C:8]([H:21])([H:22])[C:6]([H:15])([H:16])[H:17])[H:14])[H:12]
[H:16][c:1]1[c:3]([c:7]([c:11]([c:8]([c:4]1[H:19])[H:23])[C:14]([H:29])([H:30])[N@:15]([c:12]2[c:9]([c:5]([c:2]([c:6]([c:10]2[H:25])[H:21])[H:17])[H:20])[H:24])[C:13]([H:26])([H:27])[H:28])[H:22])[H:18]

And these two charge successfully in both cases:

[H:20][c:1]1[c:2]([c:8]([c:14]([c:9]([c:3]1[H:22])[H:28])[c:15]2[c:10]([c:4]([c:5]([c:11]([c:16]2[c:17]3[c:12]([c:6]([c:7]([c:13]([c:18]3[C:19]([H:33])([H:34])[H:35])[H:32])[H:26])[H:25])[H:31])[H:30])[H:24])[H:23])[H:29])[H:27])[H:21]
[H:17][c:1]1[c:2]([c:5]([c:9]([c:8]([c:4]1[H:20])[C:10]2=[C:6]([C:3](=[C:7]([C:11](=[O:15])[N:14]2[C:12]([H:24])([H:25])[H:26])[H:23])[H:19])[H:22])[O:16][C:13]([H:27])([H:28])[H:29])[H:21])[H:18]

repro.zip

@mattwthompson
Copy link
Member

It looks like the error stems from conformer generation, not charge assignment per se. I wonder if any of these settings could be fiddled with? The toolkit could do a better job reporting what went wrong (or maybe Omega doesn't provide detail)?

In [6]: mol = Molecule.from_mapped_smiles("[H:16][c:1]1[c:3]([c:7]([c:11]([c:8]([c:4]1[H:19])[H:23])[C:14]([H:29])([H:30])[N@:1
   ...: 5]([c:12]2[c:9]([c:5]([c:2]([c:6]([c:10]2[H:25])[H:21])[H:17])[H:20])[H:24])[C:13]([H:26])([H:27])[H:28])[H:22])[H:18]"
   ...: )

In [7]: from openff.toolkit.utils.toolkits import OpenEyeToolkitWrapper
   ...: oe = OpenEyeToolkitWrapper()
   ...: oe.assign_partial_charges(mol, partial_charge_method="am1bccelf10")
Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
---------------------------------------------------------------------------
ConformerGenerationError                  Traceback (most recent call last)
Cell In[7], line 3
      1 from openff.toolkit.utils.toolkits import OpenEyeToolkitWrapper
      2 oe = OpenEyeToolkitWrapper()
----> 3 oe.assign_partial_charges(mol, partial_charge_method="am1bccelf10")

File ~/mambaforge/envs/openff-interchange-env/lib/python3.11/site-packages/openff/toolkit/utils/openeye_wrapper.py:2420, in OpenEyeToolkitWrapper.assign_partial_charges(self, molecule, partial_charge_method, use_conformers, strict_n_conformers, normalize_partial_charges, _cls)
   2418         mol_copy._conformers = None
   2419     else:
-> 2420         self.generate_conformers(
   2421             mol_copy,
   2422             n_conformers=charge_method["rec_confs"],
   2423             rms_cutoff=0.25 * unit.angstrom,
   2424             make_carboxylic_acids_cis=True,
   2425         )
   2426         # TODO: What's a "best practice" RMS cutoff to use here?
   2427 else:
   2428     mol_copy._conformers = None

File ~/mambaforge/envs/openff-interchange-env/lib/python3.11/site-packages/openff/toolkit/utils/openeye_wrapper.py:2182, in OpenEyeToolkitWrapper.generate_conformers(self, molecule, n_conformers, rms_cutoff, clear_existing, make_carboxylic_acids_cis)
   2180     new_status = omega(oemol)
   2181     if new_status is False:
-> 2182         raise ConformerGenerationError(
   2183             "OpenEye Omega conformer generation failed"
   2184         )
   2186 molecule2 = self.from_openeye(
   2187     oemol, allow_undefined_stereo=True, _cls=molecule.__class__
   2188 )
   2190 if clear_existing:

ConformerGenerationError: OpenEye Omega conformer generation failed

@ntBre
Copy link
Contributor Author

ntBre commented Oct 5, 2023

Ah yes, I should have mentioned that in the issue! The root problem is indeed the Omega conformer generation.

I also noticed that despite there being 52 records, there are only 13 unique molecules. So a smaller, complete test set is here:

[H:19][C:4]1([C:1]2=[C:2]([N:14]=[C:3]([N:15]2[C:11]([H:33])([H:34])[H:35])[N:17]3[C:8]([C:6]([C:5]([C:7]([C:9]3([H:29])[H:30])([H:25])[H:26])([H:21])[H:22])([H:23])[H:24])([H:27])[H:28])[N@:16]([C:10]([N@:18]1[C:13]([H:39])([H:40])[H:41])([H:31])[H:32])[C:12]([H:36])([H:37])[H:38])[H:20]
[H:19][C:4]1([C:1]2=[C:2]([N:13]=[C:3]([N:14]2[C:10]([H:31])([H:32])[H:33])[N:17]3[C:5]([C:7]([N:18]([C:8]([C:6]3([H:23])[H:24])([H:27])[H:28])[C:12]([H:37])([H:38])[H:39])([H:25])[H:26])([H:21])[H:22])[N@:16]([C:9]([N:15]1[H:40])([H:29])[H:30])[C:11]([H:34])([H:35])[H:36])[H:20]
[H:18][c:1]1[c:2]([n:14][c:4]([c:3]([n:13]1)[N@:15]2[C:5]([C@@:7]([C@:8]([C:6]2([H:22])[H:23])([C:10]([H:28])([H:29])[H:30])[O:17][H:37])([H:24])[C:9]([H:25])([H:26])[H:27])([H:20])[H:21])[N:16]([C:11]([H:31])([H:32])[H:33])[C:12]([H:34])([H:35])[H:36])[H:19]
[H:26][c:2]1[c:4]([c:3]([c:6]([n:22][c:5]1[C:19]([H:47])([H:48])[H:49])[N@@:23]2[C:15]([C@@:17]3([C:7](=[O:25])[N:24]([C:13]([C:8]([C:11]3([H:33])[H:34])([H:27])[H:28])([H:37])[H:38])[C:20]([H:50])([H:51])[C:16]4([C:9]([C:10]4([H:31])[H:32])([H:29])[H:30])[H:43])[C:12]([C:14]2([H:39])[H:40])([H:35])[H:36])([H:41])[H:42])[C:1]#[N:21])[C:18]([H:44])([H:45])[H:46]
[H:21][c:1]1[n:14][c:7]([c:2]2[c:8]([n:15]1)[O:20][N:16]=[C:4]2[C:9]([H:22])([H:23])[H:24])[N@:18]([C:12]([H:31])([H:32])[H:33])[C:13]([H:34])([H:35])[C:3]3=[C:6]([O:19][N:17]=[C:5]3[C:10]([H:25])([H:26])[H:27])[C:11]([H:28])([H:29])[H:30]
[H:23][c:1]1[c:2]([c:4]([n:16][c:5]([c:3]1[H:25])[N@:17]2[C:12]([C@:13]3([C:6](=[O:19])[N:18]([C:10]([C:7]([C:8]3([H:28])[H:29])([H:26])[H:27])([H:32])[H:33])[C:14]([H:38])([H:39])[H:40])[C:9]([C:11]2([H:34])[H:35])([H:30])[H:31])([H:36])[H:37])[C:15]([F:20])([F:21])[F:22])[H:24]
[H:26][c:2]1[c:3]([c:6]([n:22][c:5]([c:4]1[C:18]([H:44])([H:45])[H:46])[C:19]([H:47])([H:48])[H:49])[N@@:23]2[C:15]([C@@:17]3([C:7](=[O:25])[N:24]([C:13]([C:8]([C:11]3([H:33])[H:34])([H:27])[H:28])([H:37])[H:38])[C:20]([H:50])([H:51])[C:16]4([C:9]([C:10]4([H:31])[H:32])([H:29])[H:30])[H:43])[C:12]([C:14]2([H:39])[H:40])([H:35])[H:36])([H:41])[H:42])[C:1]#[N:21]
[H:25][c:1]1[c:2]([c:8]([c:12]([c:9]([c:3]1[H:27])[H:33])[N@@:22]([C:20]([H:44])([H:45])[H:46])[C:21]([H:47])([H:48])[c:11]2[c:6]([c:4]([c:10]([c:5]([c:7]2[H:31])[H:29])[C@:18]3([C@@:19]([C:17]([C:15]([C:14]([C:16]3([H:38])[H:39])([H:34])[H:35])([H:36])[H:37])([H:40])[H:41])([H:43])[C:13](=[O:23])[O:24][H:49])[H:42])[H:28])[H:30])[H:32])[H:26]
[H:25][c:1]1[c:2]([c:8]([c:12]([c:9]([c:3]1[H:27])[H:33])[N@@:22]([C:20]([H:44])([H:45])[H:46])[C:21]([H:47])([H:48])[c:11]2[c:6]([c:4]([c:10]([c:5]([c:7]2[H:31])[H:29])[C@:18]3([C@@:19]([C:17]([C:15]([C:14]([C:16]3([H:38])[H:39])([H:34])[H:35])([H:36])[H:37])([H:40])[H:41])([H:43])[C:13](=[O:24])[O-:23])[H:42])[H:28])[H:30])[H:32])[H:26]
[H:20][c:1]1[c:3]([n:16][c:7]([c:5]2[c:6]1[O:19][C:4](=[C:2]2[H:21])[H:23])[N@@:18]([C:12]([H:32])([H:33])[H:34])[C:15]([H:39])([H:40])[C:13]([H:35])([H:36])[C:14]([H:37])([H:38])[N:17]3[C:10]([C:8]([C:9]([C:11]3([H:30])[H:31])([H:26])[H:27])([H:24])[H:25])([H:28])[H:29])[H:22]
[H:20][c:1]1[c:3]([n:16][c:7]([c:5]2[c:6]1[O:19][C:4](=[C:2]2[H:21])[H:23])[N@@:17]([C:12]([H:32])([H:33])[H:34])[C:14]([H:37])([H:38])[C:13]([H:35])([H:36])[C:15]([H:39])([H:40])[N+:18]3([C:10]([C:8]([C:9]([C:11]3([H:30])[H:31])([H:26])[H:27])([H:24])[H:25])([H:28])[H:29])[H:41])[H:22]
[H:11][c:1]1[c:2]([c:4]([n:9][c:5]([c:3]1[H:13])[N@:10]([C:7]([H:18])([H:19])[H:20])[C:8]([H:21])([H:22])[C:6]([H:15])([H:16])[H:17])[H:14])[H:12]
[H:16][c:1]1[c:3]([c:7]([c:11]([c:8]([c:4]1[H:19])[H:23])[C:14]([H:29])([H:30])[N@:15]([c:12]2[c:9]([c:5]([c:2]([c:6]([c:10]2[H:25])[H:21])[H:17])[H:20])[H:24])[C:13]([H:26])([H:27])[H:28])[H:22])[H:18]

@ntBre
Copy link
Contributor Author

ntBre commented Oct 6, 2023

I tried adjusting some of these settings to no avail. I tried slapping some zeros on things (SetEnergyWindow(1500.0), SetRMSThreshold(100.0)), setting both of these options to zero, and even commenting out all of the omega.* calls hoping to get the default values. It appears Omega just returns a bool saying whether or not it succeeded, so there's not much more that can be done in the toolkit.

@mattwthompson
Copy link
Member

I don't actually know what the warning means, or how severe it is in OpenEye's jargon

Warning: : Failed to build structure from CT

except that CT is the connection table

@j-wags
Copy link
Member

j-wags commented Oct 6, 2023

CT probably means "carbon, tetrahedral" or something like that. I see it a lot in atom-typing-land. (Whoops, I spoke too quickly. Disregard this part but the rest of the comment stands)

Glancing at this quickly I don't see anything hugely obvious. My remaining possibilities are roughly 50% this being a representation/molecule sanitization issue with us that OE has started policing (maybe we're doing something bad in to_openeye), and 50% this being a regression in OE.

If we can make a reproducing case with an un-mapped version of one of these SMILES using pure OpenEye code, then it'll both be clear that it's a regression on their side, and we'll have a repro example we can immediately send to them. I don't have time to do this today but if either of you do, this is the route I'd recommend for debugging!

@j-wags
Copy link
Member

j-wags commented Oct 6, 2023

Oh, actually, I take back the CT thing. It probably does mean connection table in this context.

@ntBre
Copy link
Contributor Author

ntBre commented Oct 6, 2023

Here are the un-mapped SMILES. I'll try to reproduce with just OpenEye code.

[H]C1(C2=C(N=C(N2C([H])([H])[H])N3C(C(C(C(C3([H])[H])([H])[H])([H])[H])([H])[H])([H])[H])[N@](C([N@]1C([H])([H])[H])([H])[H])C([H])([H])[H])[H]
[H]C1(C2=C(N=C(N2C([H])([H])[H])N3C(C(N(C(C3([H])[H])([H])[H])C([H])([H])[H])([H])[H])([H])[H])[N@](C(N1[H])([H])[H])C([H])([H])[H])[H]
[H]c1c(nc(c(n1)[N@]2C([C@@]([C@](C2([H])[H])(C([H])([H])[H])O[H])([H])C([H])([H])[H])([H])[H])N(C([H])([H])[H])C([H])([H])[H])[H]
[H]c1c(c(c(nc1C([H])([H])[H])[N@@]2C([C@@]3(C(=O)N(C(C(C3([H])[H])([H])[H])([H])[H])C([H])([H])C4(C(C4([H])[H])([H])[H])[H])C(C2([H])[H])([H])[H])([H])[H])C#N)C([H])([H])[H]
[H]c1nc(c2c(n1)ON=C2C([H])([H])[H])[N@](C([H])([H])[H])C([H])([H])C3=C(ON=C3C([H])([H])[H])C([H])([H])[H]
[H]c1c(c(nc(c1[H])[N@]2C([C@]3(C(=O)N(C(C(C3([H])[H])([H])[H])([H])[H])C([H])([H])[H])C(C2([H])[H])([H])[H])([H])[H])C(F)(F)F)[H]
[H]c1c(c(nc(c1C([H])([H])[H])C([H])([H])[H])[N@@]2C([C@@]3(C(=O)N(C(C(C3([H])[H])([H])[H])([H])[H])C([H])([H])C4(C(C4([H])[H])([H])[H])[H])C(C2([H])[H])([H])[H])([H])[H])C#N
[H]c1c(c(c(c(c1[H])[H])[N@@](C([H])([H])[H])C([H])([H])c2c(c(c(c(c2[H])[H])[C@]3([C@@](C(C(C(C3([H])[H])([H])[H])([H])[H])([H])[H])([H])C(=O)O[H])[H])[H])[H])[H])[H]
[H]c1c(c(c(c(c1[H])[H])[N@@](C([H])([H])[H])C([H])([H])c2c(c(c(c(c2[H])[H])[C@]3([C@@](C(C(C(C3([H])[H])([H])[H])([H])[H])([H])[H])([H])C(=O)[O-])[H])[H])[H])[H])[H]
[H]c1c(nc(c2c1OC(=C2[H])[H])[N@@](C([H])([H])[H])C([H])([H])C([H])([H])C([H])([H])N3C(C(C(C3([H])[H])([H])[H])([H])[H])([H])[H])[H]
[H]c1c(nc(c2c1OC(=C2[H])[H])[N@@](C([H])([H])[H])C([H])([H])C([H])([H])C([H])([H])[N+]3(C(C(C(C3([H])[H])([H])[H])([H])[H])([H])[H])[H])[H]
[H]c1c(c(nc(c1[H])[N@](C([H])([H])[H])C([H])([H])C([H])([H])[H])[H])[H]
[H]c1c(c(c(c(c1[H])[H])C([H])([H])[N@](c2c(c(c(c(c2[H])[H])[H])[H])[H])C([H])([H])[H])[H])[H]

@ntBre
Copy link
Contributor Author

ntBre commented Oct 6, 2023

This fails with 2023.1.0 and works with 2022.1.1:

from openeye import oechem, oeomega

smiles = "[H]C1(C2=C(N=C(N2C([H])([H])[H])N3C(C(C(C(C3([H])[H])([H])[H])([H])[H])([H])[H])([H])[H])[N@](C([N@]1C([H])([H])[H])([H])[H])C([H])([H])[H])[H]"

# steps taken from
# https://docs.eyesopen.com/toolkits/python/oechemtk/molctordtor.html#construction-from-smiles
mol = oechem.OEMol()
assert oechem.OESmilesToMol(mol, smiles)

omega = oeomega.OEOmega()
assert omega(mol)

@ntBre
Copy link
Contributor Author

ntBre commented Oct 6, 2023

And checking all of them with the SMILES pasted into smiles.dat:

from openeye import oechem, oeomega

win = 0
with open('smiles.dat') as inp:
    for smiles in inp:
        mol = oechem.OEMol()
        assert oechem.OESmilesToMol(mol, smiles[:-1])

        omega = oeomega.OEOmega()
        win += omega(mol)

print(win)

prints 13 for 2022.1.1 and 0 for 2023.1.0.

@mattwthompson
Copy link
Member

It looks like each molecule contains a chiral nitrogen bound to an aromatic ring. Most of these are bound to the carbon between nitrogens in pyrimidine, but this one doesn't have that

C1(=C(C(=C(C(=C1[H])[H])N(C([H])([H])[H])C([H])([H])C2=C(C(=C(C(=C2[H])[H])[C@H]3[C@H](C(C(C(C3([H])[H])([H])[H])([H])[H])([H])[H])C(=O)O[H])[H])[H])[H])[H])[H]

I'd expect much more than 13 structures (of a few thousand?) to have a nitrogen near a ring, at least with my experience in these datasets I get the impression that aromatic rings are ubiquitous and chiral nitrogens are common

@ntBre
Copy link
Contributor Author

ntBre commented Oct 6, 2023

I'm still running the benchmark to see if this is the root cause of the differences I've observed between the original Sage 2.1.0 fit and my attempt to reproduce it. I expected there to be more than 13 molecules too, but those are the only records I found in the opt-set-for-fitting-2.1.0.json in the Sage-2.1.0 repo. There are 5580 records therein, but only 1701 when converting the list of record.cmiles entries to a set. So it's something like 13/1701 I guess.

This must not account for all of the difference I've seen because I have 23 fewer opt-geo batches than the Sage 2.1.0 repo, which should contain ~600 structures total. But this conformer generation error definitely prevented me from running ForceBalance on the Sage 2.1.0 inputs with my environment.

@j-wags
Copy link
Member

j-wags commented Oct 6, 2023

chiral nitrogens are common

I wonder if this is the issue. "Which nitrogens are chiral?" is an open question and different cheminformatics toolkits don't always agree. In particular, some recent omega releasenotes say:

Conformer generation with OEOmega.Build or OEMolBuilder.Build now fails correctly when specified stereo signatures on a molecule cannot be honored. The existing behavior, to carry on even if stereo signatures could not be honored, can be obtained by setting IgnoreStereo to true.

So maybe it's that Omega is getting strict here - for example N@ and N@@ must now be honored in the output, but Omega's internal FF/geometry generator for some reason always makes those planar, so it can never match either stereochemistry.

The releasenotes mention that we can set the (ominously-named) IgnoreStereo to True to recover the old behavior. Could you give that a shot?

@ntBre
Copy link
Contributor Author

ntBre commented Oct 6, 2023

I'm not sure if I'm using the Builder correctly, but this does fail to run on 2022.1.1 for AttributeError: 'OEMolBuilderOptions' object has no attribute 'SetIgnoreStereo', so that change must have occurred between the two.

from openeye import oechem, oeomega

win = 0
with open("smiles.dat") as inp:
    for smiles in inp:
        mol = oechem.OEMol()
        assert oechem.OESmilesToMol(mol, smiles[:-1])

        builder = oeomega.OEMolBuilder()
        options = oeomega.OEMolBuilderOptions()
        options.SetIgnoreStereo(True)
        builder.SetOptions(options)

        assert builder.Build(mol) == oeomega.OEOmegaReturnCode_Success

        omega = oeomega.OEOmega()
        assert omega.Build(mol) == oeomega.OEOmegaReturnCode_FailedCTBuild
        win += omega(mol)

print(win)

Edit: This fails for every conformer in the 2023 version too. Sorry I forgot to mention that originally! I brought up the 2022 performance to show that I was using new code (SetIgnoreStereo) but forgot to say that it doesn't change the 2023 result (still fails to generate conformers, as seen in the return code assertion).

I have also now tried the latest release

mamba create -n openeye-2023.1.1 -c openeye 'openeye-toolkits=2023.1.1' openff-qcsubmit

with the same output:

Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
Warning: : Failed to build structure from CT
0

@davidlmobley
Copy link
Contributor

I wonder if this is an OpenEye support issue; if the OpenEye toolkit previously could generate conformers for these but now it cannot, is that on their end?

@ntBre
Copy link
Contributor Author

ntBre commented Oct 10, 2023

I updated my last comment with some additional details and just sent an email to OpenEye support!

@ntBre
Copy link
Contributor Author

ntBre commented Feb 8, 2024

Should I close this? Based on the response from OpenEye support, it sounds like a known/expected change from their perspective. And unless we want to propagate the OEMolBuilderOptions or OEOmegaOptions in their code examples, it's not really a problem with our toolkit.

I'm looking through some of my opened issues today and thought I would close some if they are no longer relevant. I'll leave it if it's useful to have around, though.

@mattwthompson
Copy link
Member

I think so; if we're not likely to do anything about it and if it's too esoteric to document here, there is no reason to keep this open

@ntBre ntBre closed this as completed Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants