Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_simple_topology_from_openmm fails with RecursionError on large topologies #868

Closed
hjuinj opened this issue Jan 5, 2024 · 16 comments
Closed
Labels
bug Something isn't working openmm

Comments

@hjuinj
Copy link

hjuinj commented Jan 5, 2024

I am trying the experimental Interchange.from_openmm() feature and I noticed that topology creation for multi-chain proteins would fail withRecursionError: maximum recursion depth exceeded:

# using this two chain protein: https://files.rcsb.org/view/8U7W.pdb
from openmm.app import PDBFile

# this is called by the Interchange.from_openmm method
from openff.interchange.components.toolkit import _simple_topology_from_openmm

pdb = PDBFile("8u7w.pdb")
_simple_topology_from_openmm(pdb.topology)

The pdb above contains waters and ligand, but even after stripping them the same error manifests, so long as multiple protein chains are present.
Tracing down the errors, it ultimately is raised at _add_molecule_keep_cache function in openff.toolkit. Let me know if I should make the issue there instead.

Thank you.

@hjuinj hjuinj changed the title RecursionError: maximum recursion depth exceeded when creating topology from openmm.app.Topology RecursionError: maximum recursion depth exceeded when creating Interchange.topology from openmm.app.Topology Jan 5, 2024
@mattwthompson mattwthompson changed the title RecursionError: maximum recursion depth exceeded when creating Interchange.topology from openmm.app.Topology _simple_topology_from_openmm fails with RecursionError on large topologies Jan 5, 2024
@mattwthompson
Copy link
Member

This is a totally valid failure; I've run into it many times before but apparently never documented it. I haven't been able to figure out if the issue is here or in the toolkit, but until I can isolate the root cause let's keep it here.

@hjuinj
Copy link
Author

hjuinj commented Jan 5, 2024

Thanks, let me know how it goes.
Just wondering in the meanwhile do you have any recommendations on converting such openmm systems into e.g. prmtops instead of via interchange?

@mattwthompson
Copy link
Member

I have some ideas - could we redirect to https://github.com/orgs/openforcefield/discussions/35?

@mattwthompson
Copy link
Member

I've found what I believe to be the root cause of this issue - Topology.add_molecule relies on deepcopying. Molecule.__deepcopy__ has custom behavior, but _SimpleMoleucle does not, and instead falls back to the extremely careful copy.deepcopy. This will require significant overhauls to the toolkit's behavior, so I'll work on it over there but provide updates here when appropriate.

@mattwthompson
Copy link
Member

This might be fixed by openforcefield/openff-toolkit#1805

@mattwthompson mattwthompson added openmm bug Something isn't working labels Jan 31, 2024
@mattwthompson
Copy link
Member

Hey @hjuinj - think we've fixed this!

In [1]: # using this two chain protein: https://files.rcsb.org/view/8U7W.pdb
   ...: !wget https://files.rcsb.org/view/8U7W.pdb
   ...:
   ...: from openmm.app import PDBFile
   ...:
   ...: # this is called by the Interchange.from_openmm method
   ...: from openff.interchange.components.toolkit import _simple_topology_from_openmm
   ...:
   ...: pdb = PDBFile("8u7w.pdb")
--2024-04-04 13:02:42--  https://files.rcsb.org/view/8U7W.pdb
Resolving files.rcsb.org (files.rcsb.org)... 128.6.159.100
Connecting to files.rcsb.org (files.rcsb.org)|128.6.159.100|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘8U7W.pdb8U7W.pdb                            [  <=>                                                  ] 738.18K  2.62MB/s    in 0.3s

2024-04-04 13:02:42 (2.62 MB/s) -8U7W.pdbsaved [755892]


In [2]: %%timeit
   ...: _simple_topology_from_openmm(pdb.topology)
   ...:
   ...:
958 ms ± 9.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Sorry for the delay - but thanks for the incredibly straightforward MWE, this allowed me to almost instantly drop in and retry this.

@hjuinj
Copy link
Author

hjuinj commented Apr 9, 2024

thanks so much @mattwthompson, how do I pull in this fix? do I just need to upgrade to the latest openff-toolkit release? or also need to upgrade interchange too?

@mattwthompson
Copy link
Member

mattwthompson commented Apr 9, 2024

I think upgrading to toolkit 0.15.0+ would do it, but upgrading Interchange (0.3.25) might bring in bugfixes or useful new features depending on what you're doing

https://docs.openforcefield.org/projects/toolkit/en/stable/releasehistory.html#id3
https://docs.openforcefield.org/projects/interchange/en/stable/releasehistory.html#current-development

@hjuinj
Copy link
Author

hjuinj commented Apr 9, 2024

@mattwthompson

Thanks again, I really would like to use this as a way to create an Interchange object from OpenMM, but for me this effectively halts:



from openmm.app import *
from openmm import *
from openmm.unit import *
import os
os.environ["INTERCHANGE_EXPERIMENTAL"] = "1"
from sys import stdout

pdb = PDBFile('tmp.pdb') # from openmm example https://raw.githubusercontent.com/openmm/openmm/master/examples/input.pdb
forcefield = ForceField('amber14-all.xml', 'amber14/tip3pfb.xml')
system = forcefield.createSystem(pdb.topology, nonbondedMethod=PME, nonbondedCutoff=1*nanometer, constraints=HBonds)

# Create an Interchange object
out = Interchange.from_openmm(
        topology=pdb.topology,
        system=system,
        positions=pdb.positions,
        box_vectors=pdb.topology.getPeriodicBoxVectors(),
    )

Am I not using it correctly or is there still some issues?

@mattwthompson
Copy link
Member

I'm not sure what's causing your code to hang - a modified version below takes me only a few seconds to run. Maybe could you run this script and share the results?

https://github.com/openforcefield/status/blob/main/devtools/support/debug.sh

In [1]: !wget https://raw.githubusercontent.com/openmm/openmm/master/examples/input.pdb
--2024-04-10 15:22:10--  https://raw.githubusercontent.com/openmm/openmm/master/examples/input.pdb
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 594292 (580K) [text/plain]
Saving to: ‘input.pdbinput.pdb                              100%[===========================================================================>] 580.36K  --.-KB/s    in 0.09s

2024-04-10 15:22:10 (6.21 MB/s) -input.pdbsaved [594292/594292]


In [2]: import openmm.app
   ...: import openmm.unit
   ...:
   ...: import os
   ...:
   ...: from openff.interchange import Interchange
   ...:
   ...: os.environ["INTERCHANGE_EXPERIMENTAL"] = "1"
   ...:
   ...: pdb_file = openmm.app.PDBFile("input.pdb")
   ...: system = openmm.app.ForceField("amber14-all.xml", "amber14/tip3pfb.xml").createSystem(
   ...:     pdb_file.topology,
   ...:     nonbondedMethod=openmm.app.PME,
   ...:     nonbondedCutoff=1 * openmm.unit.nanometer,
   ...:     constraints=openmm.app.HBonds,
   ...: )
   ...:
   ...: out = Interchange.from_openmm(
   ...:     topology=pdb_file.topology,
   ...:     system=system,
   ...:     positions=pdb_file.positions,
   ...:     box_vectors=pdb_file.topology.getPeriodicBoxVectors(),
   ...: )


In [3]: %%timeit
   ...: Interchange.from_openmm(
   ...:     topology=pdb_file.topology,
   ...:     system=system,
   ...:     positions=pdb_file.positions,
   ...:     box_vectors=pdb_file.topology.getPeriodicBoxVectors(),
   ...:    )
   ...:
5.51 s ± 172 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@hjuinj
Copy link
Author

hjuinj commented Apr 11, 2024

ah @mattwthompson I am so sorry, I mistakenly thought the interchange from openmm conversion is the time blocker, but in fact, is the subsequent file coversion that I didn't include in my above code that grinds the program to a halt, specifically:

out.to_prmtop("./tmp.prmtop")

Converting to e.g. amber format seems to be problematic with the halting and if I try instead out.to_top("tmp.top") I get

MissingBondError: Failed to find parameters for bond with topology indices (0, 1)

@mattwthompson
Copy link
Member

In that case the problem is just that the OpenMM object is missing bond parameters that the GROMACS (and Amber) files need. Because constraints=openmm.app.HBonds was used, OpenMM chooses not to populate bond parameters of constrained atom pairs. Taking that argument out (and ensuring later GROMACS/Amber/etc. files do use whatever constraints are sought!) might work.

Something similar has been reported with a different use case (openforcefield/openff-toolkit#1824) and in general this is similar to problems we have with missing information in water models (#732 #843).

The Amber writer might just be slow for large systems, that's something I unfortunately haven't had the time to work on optimizing. For a few thousand atoms it should work, if slowly

Depending on the protein force field you want to use (and if you have other components) you may be better off using our port of ff14SB? We have an example using it with an OpenFF ligand in case you haven't come across this: https://docs.openforcefield.org/en/latest/examples/openforcefield/openff-interchange/protein_ligand/protein_ligand.html#protein-component

@hjuinj
Copy link
Author

hjuinj commented Apr 15, 2024

Thanks @mattwthompson, the above code snippet passes the protein parametersation when I do constraints=None, however, it subsequently fails because water bonds are not tracked. Is there a straightforward way for me to turn the OpenMM example input.pdb above which contains protein + water into an .gro and .top file via interchange?

@mattwthompson
Copy link
Member

Not right now, I don't think, if the input has missing bonds. I think it's probably worth making a special case out of this, adding in bonds when something really looks like water. I'll take mess around with a stripped-down test case and update here if I run into any surprises. Thanks for your patience and feedback!

@mattwthompson
Copy link
Member

With #965 (soon to be in 0.3.26), I can load that PDB file through. It makes some guesses about what the parameters should be and the bond and angle parameters are pretty arbitrary, so it might take some after-running-a-script editing of the GROMACS files or other hacking for the water model to be accurately represented.

There are still some small energy differences - could be due to water bonds being considered where they shouldn't be, not sure. Please be careful using this code path since there's a lot of sharp edges, still.

import openmm.app
import openmm.unit
import os
from openff.interchange import Interchange
from openff.interchange.drivers import get_openmm_energies, get_gromacs_energies

os.environ["INTERCHANGE_EXPERIMENTAL"] = "1"

pdb_file = openmm.app.PDBFile("input.pdb")
system = openmm.app.ForceField("amber14-all.xml", "amber14/tip3pfb.xml").createSystem(
    pdb_file.topology,
    nonbondedMethod=openmm.app.PME,
    nonbondedCutoff=1 * openmm.unit.nanometer,
    constraints=None,
)

out = Interchange.from_openmm(
    topology=pdb_file.topology,
    system=system,
    positions=pdb_file.positions,
    box_vectors=pdb_file.topology.getPeriodicBoxVectors(),
)

print(get_gromacs_energies(out))
print(get_openmm_energies(out))
Bond:          		535.8565673828125 kilojoule / mole
Angle:         		1261.9296875 kilojoule / mole
Torsion:       		1812.1295166015625 kilojoule / mole
RBTorsion:     		0.0 kilojoule / mole
Nonbonded:     		None
vdW:           		20140.49871826172 kilojoule / mole
Electrostatics:		-138738.99938964844 kilojoule / mole

Energies:

Bond:          		542.2653182463949 kilojoule / mole
Angle:         		1261.687059590436 kilojoule / mole
Torsion:       		1896.524260454296 kilojoule / mole
RBTorsion:     		None
Nonbonded:     		-118597.39807930728 kilojoule / mole
vdW:           		None
Electrostatics:		None

@mattwthompson
Copy link
Member

I think the original issue has been resolved, please feel free to bring other issues to our attention as they arise/persist

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working openmm
Projects
None yet
Development

No branches or pull requests

2 participants