Allow ForceField to store extra descriptors for a residue #2757
It sounds like this feature would involve two distinct parts: 1) adding the infrastructure for storing extra information, and 2) adding that extra information to all the standard force fields. We would also need to figure out how to generate those labels when applying patches. Do we want to support arbitrary, user-defined metadata? Or would it be better to just support a single
@tristanic: Would you be up for chatting with the Open Force Field Initiative folks who are working on biopolymer force fields? We're very much headed in this direction, including having OFF cc: @j-wags who could help coordinate this call.
@jchodera Absolutely! Not immediately, though; I have a crazy busy few weeks coming up. If it's of interest, I found a really nice, fast open-source C++ implementation of a maximum-common-subgraph algorithm (this one: https://www.ijcai.org/Proceedings/2017/0099.pdf), which I've wrapped for Python using PyBind11 (code at https://github.com/tristanic/isolde/tree/master/isolde/src/graph). It neatly handles the core problem of matching an incomplete/incorrect residue to potential templates. Feel free to use it if you see a need for it.
@peastman a single
What do you think of something like the following approach?
... with most handlers being exceedingly simple, like:
Can you give examples of what handlers might do? What are the advantages over simply storing all extra attributes into a dict where they'll be available to user code, but not trying to do anything with them? Of course, neither one addresses the problem of what to do about patched residues.
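The "store everything in a dict" option can be illustrated with a short standalone sketch. It uses only the standard library and is not OpenMM's actual API: it reads an ffXML-style string and keeps every non-standard attribute of each <Residue> element verbatim. The set of "standard" attributes and the attribute names fullName and smiles are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Attributes treated as "standard" on a <Residue> element in this sketch
# (an assumption); everything else is kept verbatim as free-form metadata.
STANDARD_RESIDUE_ATTRIBUTES = {"name", "override"}


def extra_residue_attributes(ffxml_text):
    """Map each residue template name to a dict of its non-standard attributes."""
    root = ET.fromstring(ffxml_text)
    extras = {}
    for residue in root.iter("Residue"):
        extras[residue.get("name")] = {
            key: value
            for key, value in residue.attrib.items()
            if key not in STANDARD_RESIDUE_ATTRIBUTES
        }
    return extras


# "fullName" and "smiles" are hypothetical attribute names.
EXAMPLE = """<ForceField>
  <Residues>
    <Residue name="LYN" fullName="neutral lysine"/>
    <Residue name="LYS" fullName="lysine" smiles="C(CCN)CC(C(=O)O)N"/>
  </Residues>
</ForceField>"""

print(extra_residue_attributes(EXAMPLE)["LYN"])  # {'fullName': 'neutral lysine'}
```

The force field code never interprets these values; user code such as ISOLDE would decide what, if anything, to do with them.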
One use case for this is if the Chemical Components Dictionary template name is used as an attribute: the handler code could then map this to a separate database containing the info from there: amongst other things, ideal coordinates and explicit definitions of chiral centres. Of course, there are many ways to skin that cat.
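A handler-based variant might look roughly like the sketch below. The registry, the "ccd" attribute name and the LOCAL_CCD lookup table are all hypothetical placeholders: a real tool would query a local copy of the Chemical Components Dictionary rather than a hard-coded dict. Attributes with no registered handler simply fall back to being stored verbatim, so this degrades to the plain-dict approach.

```python
from typing import Any, Callable, Dict

Handler = Callable[[str], Dict[str, Any]]

# Hypothetical stand-in for a local Chemical Components Dictionary copy.
LOCAL_CCD = {"LYS": {"ccd_name": "LYSINE"}}


def ccd_handler(value: str) -> Dict[str, Any]:
    """Resolve a CCD identifier to whatever the local database knows about it."""
    return {"ccd_id": value, **LOCAL_CCD.get(value, {})}


# Hypothetical registry mapping an attribute name to its handler.
HANDLERS: Dict[str, Handler] = {"ccd": ccd_handler}


def process_attributes(attributes: Dict[str, str]) -> Dict[str, Any]:
    """Run registered handlers; attributes without a handler are stored verbatim."""
    result: Dict[str, Any] = {}
    for key, value in attributes.items():
        handler = HANDLERS.get(key)
        if handler is None:
            result[key] = value
        else:
            result.update(handler(value))
    return result


print(process_attributes({"ccd": "LYS", "fullName": "lysine"}))
# {'ccd_id': 'LYS', 'ccd_name': 'LYSINE', 'fullName': 'lysine'}
```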
Yes, I agree that's much harder. Possibly the patch could provide (a) string(s) to be appended to the name(s) of the affected residue(s) to specify how they've been changed? If, for example, a cysteine's
Alternatively, the template could directly provide a list of the patches that went into creating it. That might be more useful. Of course
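Both options for patched residues (a name suffix contributed by each patch, or a list of the patches that produced the template) can be combined in a small data structure. This is a hypothetical sketch, not how OpenMM represents patched templates; Patch, PatchedTemplate, the "N6-methylation" patch and its "_ME" suffix are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Patch:
    name: str    # human-readable patch name (hypothetical)
    suffix: str  # string appended to the residue name when applied (hypothetical)


@dataclass
class PatchedTemplate:
    base_name: str
    description: str
    applied_patches: List[Patch] = field(default_factory=list)

    @property
    def name(self) -> str:
        # Option (a): append each patch's suffix to the base residue name.
        return self.base_name + "".join(p.suffix for p in self.applied_patches)

    @property
    def label(self) -> str:
        # Option (b): report the list of patches that produced this template.
        if not self.applied_patches:
            return f"{self.name} ({self.description})"
        patches = ", ".join(p.name for p in self.applied_patches)
        return f"{self.name} ({self.description}; patches: {patches})"


methylated = PatchedTemplate("LYS", "lysine", [Patch("N6-methylation", "_ME")])
print(methylated.name)   # LYS_ME
print(methylated.label)  # LYS_ME (lysine; patches: N6-methylation)
```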
At the moment I'm not using patches - for a range of reasons including choices made by collaborators and lack of resources to properly support more than one forcefield, ISOLDE's currently pretty much locked to AMBER. That being said, I do find the patch approach more elegant than the "new template for every bonding variant" one, and at some point I'd certainly like to start looking down that path.
I'm looking into implementing this now. For the code part, I'm thinking it would be best to just do the very simple implementation: add an To be useful, the XML files need to actually contain extra information. Should we try to regenerate some of our force fields adding extra attributes? The Amber input files do contain longer descriptions that we could put into a
@tristanic any thoughts on the above questions?
Sorry for my long radio silence - a lot of time spent working on figuring out my next career move (successfully, I'm happy to say - and in a way that'll substantially increase my ability to keep working in this area). Anyway, yes - I could definitely work with that approach!
If you wanted a richer source of data to mine, there's always the Chemical Components Dictionary (https://www.wwpdb.org/data/ccd). It encompasses residue name (including synonyms), type (nucleic, peptide, etc.), various flavours of SMILES, and example coordinates (experimental and ideal). Things can get a bit hairy when you get out into the more exotic stuff (erroneous/missing coordinates, only IUPAC rather than common names, etc.) - but for the common stuff it's very reliable.
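As a rough idea of how names could be pulled out of the CCD, the sketch below reads the single-line _chem_comp.* items from one mmCIF entry. It is deliberately simplified and is not a full mmCIF parser: multi-line (semicolon-delimited) values and loops are skipped, and the placeholders "?" and "." become None. The entry shown is an abbreviated illustration containing only a few fields; a real pipeline would more likely use a proper mmCIF library.

```python
def parse_chem_comp(cif_text):
    """Return single-line _chem_comp.* items from one CCD entry as a dict.

    Simplified: multi-line values and loops are skipped, and the mmCIF
    placeholders '?' and '.' become None.
    """
    items = {}
    prefix = "_chem_comp."
    for raw_line in cif_text.splitlines():
        line = raw_line.strip()
        if not line.startswith(prefix):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # value continues on following lines; not handled here
        key, value = parts[0][len(prefix):], parts[1].strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # strip matching quotes
        items[key] = None if value in ("?", ".") else value
    return items


# Abbreviated, illustrative entry (only a few of the real fields).
ENTRY = """data_LYS
_chem_comp.id            LYS
_chem_comp.name          LYSINE
_chem_comp.type          "L-PEPTIDE LINKING"
_chem_comp.pdbx_synonyms ?
"""

info = parse_chem_comp(ENTRY)
print(info["name"], "/", info["type"])  # LYSINE / L-PEPTIDE LINKING
```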
Thanks! I'll start by adding the Congratulations on your career move, whatever it is!
It's in #3604.
Something that would be really nice for interactive applications of OpenMM would be the ability to store extra information about a residue (e.g. long-format name(s), SMILES string, etc.) directly in the ffXML, so that it's stored as metadata in the ForceField when running loadFile.
This came up because I've been working on adding a tool to ISOLDE for listing possible templates for a residue based on finding the maximum common subgraph between the residue and templates with (leniently) "similar" elemental composition (to recover from cases where the residue is incomplete, has incorrect hydrogens, etc.). The basic mechanism is working, but now I'm faced with the challenge of how to represent the results to the user. For example, if I delete the sidechain amine and run it past this method, I get:
Topology matches: LYN, LYS, CLYS, NLYS, PTM_LYZ, PTM_MLY, PTM_MLZ, ZK
... which is the correct answer in a sense, but will be confusing as heck to the user. Having a name associated with each one would make things much more clear:
LYN (neutral lysine)
LYS (lysine)
CLYS (C-terminal lysine)
NLYS (N-terminal lysine)
PTM_LYZ (5-hydroxylysine)
PTM_MLY (N6,N6-dimethyllysine)
PTM_MLZ (N6-methyllysine)
ZK (free, zwitterionic lysine)
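The workflow described above can be sketched end to end: a lenient elemental-composition prefilter, followed by labelling each surviving template with a description taken from force field metadata. This is an illustration only. The tolerance rule (ignore hydrogens, allow the residue to lack up to max_missing heavy atoms but never to have extra ones) is an assumption rather than ISOLDE's actual criterion, and the maximum-common-subgraph step that would follow is omitted.

```python
from collections import Counter
from typing import Dict, Iterable, List


def heavy_atom_composition(elements: Iterable[str]) -> Counter:
    """Element counts, ignoring hydrogens (which are often missing or wrong)."""
    return Counter(e for e in elements if e != "H")


def leniently_similar(residue_elements, template_elements, max_missing=2) -> bool:
    """Assumed rule: residue may lack a few heavy atoms but must not add any."""
    res = heavy_atom_composition(residue_elements)
    tmpl = heavy_atom_composition(template_elements)
    if res - tmpl:  # residue has heavy atoms the template does not
        return False
    return sum((tmpl - res).values()) <= max_missing


def format_matches(names: List[str], descriptions: Dict[str, str]) -> List[str]:
    """Attach a description to each template name when one is available."""
    return [f"{n} ({descriptions[n]})" if n in descriptions else n for n in names]


# Lysine with the sidechain NZ deleted (hydrogens omitted for brevity).
residue = ["N", "C", "C", "O", "C", "C", "C", "C"]
templates = {
    "LYS": ["N", "C", "C", "O", "C", "C", "C", "C", "N"],
    "ALA": ["N", "C", "C", "O", "C"],
}
descriptions = {"LYS": "lysine"}  # would come from the ffXML metadata

matches = [n for n, els in templates.items() if leniently_similar(residue, els)]
print("\n".join(format_matches(matches, descriptions)))  # LYS (lysine)
```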