Better interoperability with AMBER #25
Comments
I would strongly object to changing how we write PDB files. PDB is a well defined file format, and that includes atom and residue names. A file that uses different names is simply an invalid PDB file. The fact that AMBER can't read standard PDB files is a serious bug in their code, and they need to fix it. I think all of us in this field need to be less tolerant of programs that produce invalid PDB files, because that's the only way we'll get them to change. For example, Gromacs used to use incorrect atom and residue names, but in recent versions they've changed to always write PDB files with the correct (standard) names. Let's try to get AMBER to do the same. That's the best thing for their users, and for the whole field. |
I would tend to agree with Peter. These issues are annoying, and it would be great for us not to contribute to the problem. I think a reasonable middle ground would be to write a simple conversion utility that can map between the two, rather than having OpenMM write non-standard files. I also have several cases where I would like to round-trip things between Amber and OpenMM (in my case to get per-residue energies out of snapshots). I think this:
would be great though. |
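A conversion utility of the kind suggested here could start from something like the sketch below. It is only an illustration: the function names are hypothetical, it handles histidine alone, and it assumes the usual AMBER convention that HID carries a hydrogen on ND1 (atom HD1), HIE on NE2 (atom HE2), and HIP on both.

```python
def amber_histidine_name(atom_names):
    """Pick an AMBER-style residue name for a histidine from its ring hydrogens.

    Assumes HD1 marks a delta-protonated ring and HE2 an epsilon-protonated one.
    """
    names = set(atom_names)
    has_hd1 = "HD1" in names
    has_he2 = "HE2" in names
    if has_hd1 and has_he2:
        return "HIP"
    if has_hd1:
        return "HID"
    if has_he2:
        return "HIE"
    return "HIS"  # no ring hydrogens found; leave the name unchanged


def rename_residue(pdb_line, new_name):
    """Return an ATOM/HETATM line with the residue name (columns 18-20) replaced."""
    return pdb_line[:17] + new_name.rjust(3) + pdb_line[20:]
```

A real utility would also need atom-name tables and the reverse mapping; this only shows the shape of the residue-level step.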
There are a number of programs that (ab)use the PDB format or mol2 format to store nonstandard package-specific information, and have done so for decades. Adopting an attitude of "fuck you" toward these programs with enormous established user bases will not be helpful to OpenMM. Does anybody have a useful suggestion on how to actually interoperate with AMBER besides trying to convince the entire AMBER developer community to break all backwards-compatibility by only working with PDB version 3.30 conformant files? You're welcome to go to their next developer meeting and lobby for that, but this will not actually accomplish anything in the short term. Any useful input on the other proposal that PrmtopFile and InpcrdFile could be extended to write AMBER-format input files? Note that this is not an invitation to rant about how they should be using XML instead, but rather a request for some constructive discussion to come up with a way in which we can implement interoperability within the framework, goals, and ideals of OpenMM. |
Also, I will note that |
I don't want to take a position either way here, but from an implementation perspective, one option is to look at the Also, it is possible to write a PDB using |
No one suggested that AMBER should "break all backwards-compatibility". They certainly should continue to read the files generated by previous versions of their program. But that's no reason to not also support standard PDB files. It has been over 6 years since PDB 3.0 was released, and it's absurd that they still haven't added support for it. AMBER can't even handle files downloaded directly from RCSB. They can't expect everyone else in the field to implement workarounds for their bugs. What would be the reason for creating a PDBFile object from a topology and set of coordinates? What would you do with that object? Writing inpcrd files would be pretty easy, so that would be a reasonable feature. But wouldn't mdcrd be more useful? Writing prmtop files would be a lot more work, but it's also a possibility.
It uses that when reading, not writing. We try as much as possible to deal with whatever files we come across and convert them into something more standard. This is a widely recognized principle in software engineering whenever you need to communicate using a standardized protocol: be as tolerant as possible in the input you accept, and be as strict as possible in the output you generate. |
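The "tolerant on input, strict on output" principle described here can be sketched in a few lines. This is not OpenMM's code: the alias table, the truncated set of standard names, and both function names are assumptions made for illustration.

```python
# Hypothetical alias table: nonstandard residue name -> standard PDB name.
RESIDUE_ALIASES = {"HID": "HIS", "HIE": "HIS", "HIP": "HIS", "CYX": "CYS", "WAT": "HOH"}

# Deliberately truncated set of standard names, enough for the sketch.
STANDARD_RESIDUES = {"ALA", "CYS", "GLY", "HIS", "HOH"}


def read_residue_name(raw):
    """Lenient reader: strip whitespace, fold case, and map known aliases."""
    name = raw.strip().upper()
    return RESIDUE_ALIASES.get(name, name)


def write_residue_name(name):
    """Strict writer: refuse any name outside the standard set."""
    if name not in STANDARD_RESIDUES:
        raise ValueError("refusing to write nonstandard residue name %r" % name)
    return name.rjust(3)
```

The reader accepts " hid " and returns "HIS", while the writer raises on "HID" rather than emitting it.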
How about we define an AMBER PDB file type and have a (maybe derived) class for writing that? It seems that this really is a file type to itself. Thanks, Vijay. On Jun 17, 2013, at 6:22 PM, John Chodera wrote:
|
Nor can OpenMM, mind you. The OpenMM app can't handle anything with chain breaks or missing heavy atoms, which is most of the content of the RCSB. |
Right now, Currently, the only thing you can do with the |
This is reasonable, but we still need a way to interoperate with AMBER.
This is one possible solution: Allowing PDBFile to speak different "dialects" or write files with different "flavors". OpenEye's OEChem tools use this scheme to interoperate with different programs by writing PDB or mol2 files in different "flavors", for example. This violates the strictness-of-output principle stated above, however. |
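As a rough sketch of the "flavor" idea, a writer could take a flavor argument that selects a translation table applied just before output. Everything here is hypothetical: the table contents (a single HOH to WAT entry), the flavor names, and the function name are placeholders, not part of OpenMM or OEChem.

```python
# Hypothetical per-flavor residue-name tables; "standard" applies no translation.
RESIDUE_NAME_FLAVORS = {
    "standard": {},
    "amber": {"HOH": "WAT"},  # illustrative entry only, not a complete table
}


def residue_name_for(name, flavor="standard"):
    """Translate a standard residue name into the requested output flavor."""
    try:
        table = RESIDUE_NAME_FLAVORS[flavor]
    except KeyError:
        raise ValueError("unknown PDB flavor: %r" % flavor)
    # Names without an entry pass through unchanged.
    return table.get(name, name)
```

Keeping "standard" as the default means callers only get nonstandard output when they ask for it explicitly.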
I believe you need
This appears to be the only available route for bidirectional compatibility with AMBER, but has several drawbacks. Notably,
For extending
writeFile(positions, file=sys.stdout, boxVectors=None, title=None)
B. Modify the constructor to allow construction from an input file or construction from
def __init__(self, file=None, loadVelocities=False, loadBoxVectors=False, positions=None, boxVectors=None, title=None)
C. Create an @staticmethod
generateAmberInpcrdFile(positions, boxVectors=None, title=None)
For extending
writeFile(system, topology, file=sys.stdout)
Note that not all
def __init__(self, file=None, system=None, topology=None)
C. Create an @staticmethod
generateAmberPrmtopFile(system=None, topology=None)
I'm happy to do the coding here (if desired) since I need this interoperability ASAP, but I'd very much like some useful feedback on the most consistent API and implementation so that this would actually be useful to others. |
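To make the inpcrd half of this proposal concrete, here is a minimal sketch of a standalone writer. It is not an OpenMM API: the function name and the `box` parameter are hypothetical, positions are assumed to be plain floats already converted to angstroms, and the layout (title line, atom count, coordinates six per line in 12.7 fixed-width fields, optional box line) follows my reading of the AMBER format notes, which should be checked before relying on it.

```python
import sys


def write_inpcrd(positions, file=sys.stdout, box=None, title=""):
    """Write coordinates in an AMBER-restart-like text layout.

    positions: iterable of (x, y, z) tuples in angstroms.
    box: optional (a, b, c, alpha, beta, gamma) in angstroms and degrees.
    """
    positions = list(positions)
    file.write(title[:80] + "\n")
    # Atom count; very large systems may need a wider field than assumed here.
    file.write("%5d\n" % len(positions))
    # Flatten to x1, y1, z1, x2, ... and emit six values per line.
    flat = [coord for xyz in positions for coord in xyz]
    for start in range(0, len(flat), 6):
        chunk = flat[start:start + 6]
        file.write("".join("%12.7f" % value for value in chunk) + "\n")
    if box is not None:
        file.write("".join("%12.7f" % value for value in box) + "\n")
```

Velocities are omitted to keep the sketch short; a full writer would emit them in the same six-per-line layout after the coordinates.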
This could be a workable approach to writing AMBER-compatible PDB files, though it may lead to what might be considered unnecessary duplication of code paths. |
Apologies for the threadjack, but I recently wrote a very basic Python tool for outputting mdcrds to AMBER's netcdf file format (given a prmtop). It can be found here: https://bitbucket.org/mjw99/amberopenmmutils I think the ability to write prmtop files would be very useful, and I've even considered writing such code myself, but I think it is very difficult to get right. This is because the format (and leap) has quirks and quite a few esoteric aspects that have burnt me in the past. In addition, I think AMBER should have migrated to an XML format for this a while ago; however, there seems to be internal resistance to this. That said, the documentation on the parm file format here, http://ambermd.org/formats.html is useful, and it has recently been augmented by Jason Swails with a more useful description: http://archive.ambermd.org/201305/att-0256/prmtop.pdf |
Converting OpenMM System+Topology into a prmtop will indeed be challenging because atom, bond, and angle types must be extracted from a System object, and a number of checks must be made before it can be concluded the System can even be correctly represented in a prmtop file. This is certainly orders of magnitude more difficult than translating atom names to what AMBER expects in a PDB file. |
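One small piece of the type-extraction problem mentioned here can be sketched: grouping atoms that share identical Lennard-Jones parameters into numbered types. This is an assumption-laden illustration, not a prmtop writer. The function name is hypothetical, the input is a plain list of (sigma, epsilon) pairs rather than a System, and it says nothing about recovering bond or angle types or the named atom types AMBER expects.

```python
def assign_type_indices(lj_params, ndigits=8):
    """Group atoms by (sigma, epsilon) and return 1-based type indices.

    lj_params: list of (sigma, epsilon) pairs, one per atom.
    Returns (indices, unique_types), with unique_types in order of first use.
    """
    type_of = {}
    indices = []
    for sigma, epsilon in lj_params:
        # Round so that tiny floating-point differences do not create new types.
        key = (round(sigma, ndigits), round(epsilon, ndigits))
        if key not in type_of:
            type_of[key] = len(type_of) + 1  # 1-based, as in Fortran-style files
        indices.append(type_of[key])
    unique_types = sorted(type_of, key=type_of.get)
    return indices, unique_types
```

Two atoms with the same Lennard-Jones parameters but different bonded parameters would collapse into one type here, which is one of the checks a real converter would have to handle separately.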
In fact, it may be simpler to enable ForceField to write an Amber prmtop file, effectively replacing the AmberTools path for setting up systems. We could still load in correct PDB files generated by OpenMM (potentially including solvent), assign parameters from any forcefield OpenMM supports, and write prmtop files (for those forcefields that fit within the limitations of prmtop files). Peter, what would you think about this? It avoids the issues you were concerned about. |
I think it would help to back up for a moment and decide what use cases we're trying to support. "Interoperability" can mean a lot of things. Some examples include
What are the particular things we want to enable? |
On the API-design part of this thread, I want to cast my -1 against writing any more In my opinion, the APIs for file like objects should mirror the python |
All of the use cases @peastman suggested are useful. I think it should mainly be a matter of prioritizing. My suggested priority order is:
We have the functionality for (1) already via the current
My most pressing need is to enable (2) in some manner: The ability to take a system from OpenMM to be simulated in AMBER. The reason for this is that we have built a modeling pipeline that does a lot of things in OpenMM that would be difficult to do in other codes (adjusting number of waters, modifying protonation states, refinement in implicit and explicit solvent) and now need to prepare input files for the AMBER GPU port to run straightforward MD simulations on leadership computing resources (e.g. large chunks of Blue Waters or Titan). Note that this is only because AMBER currently has a speed advantage for straightforward MD over OpenMM. (2) is currently not possible because we cannot easily set up an equivalent system in AMBER due to the renaming of residues and atoms; you can't simply read a PDB file generated by OpenMM into LEaP. "Fixing" LEaP to read PDB v3.30 files and detect protonation states would require a great deal of effort, effectively reimplementing the system used in the OpenMM app inside LEaP. I've proposed a few ideas already in this thread about how (2) could be achieved (allow writing of AMBER-flavored PDB files; allow writing of prmtop/inpcrd files from
(3) would allow more flexibility than (2), but at the cost of the extra hassle of having to go through LEaP before running AMBER's
(4) would also be useful to some users due to the currently limited analysis tools in OpenMM. The main analysis tool would likely be AmberTools |
@rmcgibbo : Can you give an example of how
Same query here. |
It sounds to me like writing inpcrd and prmtop files is the best way to go. Some aspects of that will be challenging - what do we do if you have a CustomGBForce? - but a large fraction of systems created by ForceField should be straightforward to translate. Mark, can you describe the particular issues you ran into with writing prmtop files? We could definitely add a write() method to PDBFile. Note, though, that those static methods were not created arbitrarily! PDBReporter needs to write a file in pieces without ever having all the data in memory at once. It writes the header when you begin the simulation, writes a new model at each report interval, and writes the footer at the end of the simulation. I could have made those into instance methods, but then we would have mixed up the APIs for reading and writing PDB files in a confusing way. |
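The piecewise writing pattern described here (header once, one model per report, footer at the end) can be sketched as follows. This is not how OpenMM structures it: the comment above explains that OpenMM uses static methods precisely to avoid mixing reading and writing on one instance, whereas this sketch uses a separate, hypothetical writer class purely to show the call sequence.

```python
class StreamingPDBWriter:
    """Write a multi-model PDB file one piece at a time (hypothetical class)."""

    def __init__(self, file):
        self._file = file
        self._model = 0  # number of models written so far

    def write_header(self, title=""):
        self._file.write("REMARK   1 %s\n" % title)

    def write_model(self, atom_lines):
        """Write one frame; only this frame's lines are held in memory."""
        self._model += 1
        self._file.write("MODEL     %4d\n" % self._model)
        for line in atom_lines:
            self._file.write(line.rstrip("\n") + "\n")
        self._file.write("ENDMDL\n")

    def write_footer(self):
        self._file.write("END\n")
```

A reporter would call write_header when the simulation starts, write_model at each report interval, and write_footer when it finishes, so the full trajectory never has to sit in memory.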
Hi Peter, apologies for the verbose reply, but these were the main issues I found/was aware of, when I was considering writing a prmtop file generator:
P.S. (re my point 1; a quick google has turned up this, which I was not aware of) |
Writing AMBER prmtop files will definitely be challenging, mainly because the atom types need to be automatically determined from the The best strategy to generate a reasonable prmtop file may be to use the information in the XML forcefield specification (either directly or via The AmberTools distribution contains a Python tool for reading/writing prmtop files (from Jason Swails) that I had not previously been aware of:
I suggest we make use of this tool to read/write prmtop files for this purpose, since this will presumably continue to be supported by the AMBER developers. The tool provides a means of building the arcane formatted contents of the prmtop file from structures that reflect the underlying topology and parameter sets (or vice-versa). I think it can also support "chamber" CHARMM-in-AMBER files. |
Thanks Mark, that's great information to have. And I'll take a look at the AmberTools code. |
Currently, OpenMM can create System objects by reading AMBER prmtop/inpcrd or parameterizing proteins with AMBER forcefields from PDB files, but going the other direction (writing AMBER prmtop/inpcrd files or writing PDB files that AmberTools LEaP can read) is not possible. I am opening this issue to discuss the best way to support bidirectional exchange of systems with AMBER and AmberTools.
One simple way to allow systems set up in OpenMM to be imported into AMBER would be to ensure that OpenMM can write PDB files that AMBER can read. The current PDBFile scheme actually does support residue and atom name translation tables, but hard-codes a standard PDB output schema. This schema eliminates the protonation-state-sensitive residue naming and uses hydrogen atom names unknown to AMBER. Perhaps the user could request that the schema from the AMBER forcefield XML files be used instead? These PDB files could then be read into LEaP and reparameterized with the same forcefield. One may also have to be cautious of atom ordering: LEaP may be insensitive to this, but there might be good reasons one wants the atom ordering within residues to be the same as in AMBER.
Allowing PDBFile objects to be created from OpenMM Topology objects and position arrays (or State objects) would also be extremely useful. Currently, the only way to create a PDBFile object is to read a PDB file in the constructor.
Another means of interoperability, in addition to writing PDB files AMBER can read, would be to extend the PrmtopFile and InpcrdFile classes to allow the writing of prmtop/inpcrd files, and the creation of PrmtopFile and InpcrdFile objects from Topology and System objects (for prmtop) and State objects or position/velocity arrays (for inpcrd).
What do you think? Does this sound like a reasonable way to extend functionality?