Allow ForceField to store extra descriptors for a residue #2757

tristanic · 2020-06-24T12:42:02Z

Something that would be really nice for interactive applications of OpenMM would be the ability to store extra information about a residue (e.g. long-format name(s), SMILES string, etc.) directly in the ffXML, so that it's stored as metadata in the ForceField when running loadFile. This came up because I've been working on adding a tool to ISOLDE for listing possible templates for a residue based on finding the maximum common subgraph between the residue and templates with (leniently) "similar" elemental composition (to recover from cases where the residue is incomplete, has incorrect hydrogens, etc.). The basic mechanism is working, but now I'm faced with the challenge of how to represent the results to the user. For example, if I delete the sidechain amine and run it past this method, I get:
Topology matches: LYN, LYS, CLYS, NLYS, PTM_LYZ, PTM_MLY, PTM_MLZ, ZK
... which is the correct answer in a sense, but will be confusing as heck to the user. Having a name associated with each one would make things much more clear:
LYN (neutral lysine)
LYS (lysine)
CLYS (C-terminal lysine)
NLYS (N-terminal lysine)
PTM_LYZ (5-hydroxylysine)
PTM_MLY (N6,N6-dimethyllysine)
PTM_MLZ (N6-methyllysine)
ZK (free, zwitterionic lysine)
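
A rough sketch of that presentation step, assuming a hypothetical `descriptions` mapping from template name to a human-readable label. The mapping is filled in by hand here; in the proposal those labels would come from the ffXML. `format_matches` is an illustrative helper name, not part of OpenMM or ISOLDE.

```python
# Hypothetical lookup table: template name -> human-readable description.
# In the proposal, these labels would be read from the ffXML instead.
descriptions = {
    "LYN": "neutral lysine",
    "LYS": "lysine",
    "CLYS": "C-terminal lysine",
    "NLYS": "N-terminal lysine",
}


def format_matches(template_names, descriptions):
    """Return one display line per template, adding the description when known."""
    lines = []
    for name in template_names:
        label = descriptions.get(name)
        # Fall back to the bare template name if no description is stored.
        lines.append(f"{name} ({label})" if label else name)
    return lines


for line in format_matches(["LYN", "LYS", "CLYS", "NLYS", "ZK"], descriptions):
    print(line)
```

Templates without a stored description simply fall back to the bare name, so the listing degrades gracefully for force fields that carry no extra metadata.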


peastman · 2020-06-24T16:15:12Z

It sounds like this feature would involve two distinct parts: 1) adding the infrastructure for storing extra information, and 2) adding that extra information to all the standard force fields. We would also need to figure out how to generate those labels when applying patches.

Do we want to support arbitrary, user-defined metadata? Or would it be better to just support a single description attribute?

jchodera · 2020-06-24T16:38:25Z

> This came up because I've been working on adding a tool to ISOLDE for listing possible templates for a residue based on finding the maximum common subgraph between the residue and templates with (leniently) "similar" elemental composition (to recover from cases where the residue is incomplete, has incorrect hydrogens, etc.).

@tristanic : Would you be up for chatting with the Open Force Field Initiative folks who are working on biopolymer force fields? We're very much headed in this direction, including having OFF Topology objects (which should provide all this information) and supporting very flexible parameterization of all manner of nonstandard residues.

cc: @j-wags who could help coordinate this call.

tristanic · 2020-06-24T17:23:49Z

@jchodera Absolutely! Not immediately, though. Crazy busy few weeks coming up.

If it's of interest, I found a really nice, fast open-source C++ implementation of a maximum-common-subgraph algorithm (this one: https://www.ijcai.org/Proceedings/2017/0099.pdf) which I've wrapped for Python using PyBind11 (code at https://github.com/tristanic/isolde/tree/master/isolde/src/graph). It neatly handles the core problem of matching an incomplete/incorrect residue to potential templates. Feel free to use it if you see a need for it.

tristanic · 2020-06-24T17:26:01Z

@peastman a single description would go a long way. Beyond that, where a corresponding template exists in the Chemical Components Dictionary I'd love to be able to definitively link the two.

tristanic · 2020-07-16T12:10:58Z

What do you think of something like the following approach?

give ForceField a dict called something like _custom_attribute_handlers
add the following method:

class ForceField:
    def registerCustomResidueAttribute(self, attr_name, handler):
        '''
        Add a handler to look for extra residue attributes when loading a ffXML
        file. The handler should take two arguments: a `ForceField._TemplateData`
        and a string containing the value of the given attribute, and should 
        not return anything.
        '''
        self._custom_attribute_handlers[attr_name] = handler

... with most handlers being exceedingly simple, like:

    def common_name_handler(template, name):
        template.common_name = name

then, in ForceField.loadFile():

        for tree in trees:
            if tree.getroot().find('Residues') is not None:
                for residue in tree.getroot().find('Residues').findall('Residue'):
                    resName = prefix+residue.attrib['name']
                    template = ForceField._TemplateData(resName)

>                   for attr_name, handler in self._custom_attribute_handlers.items():
>                       custom_attr = residue.attrib.get(attr_name, '')
>                       handler(template, custom_attr)

                    if 'override' in residue.attrib:
                        template.overrideLevel = int(residue.attrib['override'])
                    atomIndices = template.atomIndices
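
To try the idea outside OpenMM, here is a minimal self-contained sketch. `MiniForceField`, `TemplateData` and `loadString` are stand-ins invented for illustration (they are not OpenMM classes or methods), and the `common_name` XML attribute is an assumption; only the handler-registration logic mirrors the proposal above.

```python
import xml.etree.ElementTree as ET


class TemplateData:
    """Stand-in for ForceField._TemplateData; it only holds a name here."""

    def __init__(self, name):
        self.name = name


class MiniForceField:
    """Toy loader that mimics only the residue-attribute part of loadFile()."""

    def __init__(self):
        self._custom_attribute_handlers = {}
        self.templates = {}

    def registerCustomResidueAttribute(self, attr_name, handler):
        self._custom_attribute_handlers[attr_name] = handler

    def loadString(self, xml_text):
        root = ET.fromstring(xml_text)
        residues = root.find("Residues")
        if residues is None:
            return
        for residue in residues.findall("Residue"):
            template = TemplateData(residue.attrib["name"])
            for attr_name, handler in self._custom_attribute_handlers.items():
                # As in the proposal, a missing attribute is passed as ''.
                handler(template, residue.attrib.get(attr_name, ""))
            self.templates[template.name] = template


def common_name_handler(template, name):
    template.common_name = name


# Hypothetical ffXML fragment; the common_name attribute is an assumption.
FFXML = """
<ForceField>
  <Residues>
    <Residue name="LYS" common_name="lysine"/>
    <Residue name="LYN" common_name="neutral lysine"/>
    <Residue name="ALA"/>
  </Residues>
</ForceField>
"""

ff = MiniForceField()
ff.registerCustomResidueAttribute("common_name", common_name_handler)
ff.loadString(FFXML)
for name, template in ff.templates.items():
    print(name, repr(template.common_name))
```

Note that, with the `.get(attr_name, '')` default, every handler runs for every residue, so handlers need to cope with an empty string when the attribute is absent.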

peastman · 2020-07-16T16:51:19Z

Can you give examples of what handlers might do? What are the advantages over simply storing all extra attributes into a dict where they'll be available to user code, but not trying to do anything with them?

Of course, neither one addresses the problem of what to do about patched residues.

tristanic · 2020-07-16T17:30:36Z

> Can you give examples of what handlers might do? What are the advantages over simply storing all extra attributes into a dict where they'll be available to user code, but not trying to do anything with them?

One use case for this is if the Chemical Components Dictionary template name is used as an attribute: the handler code could then map this to a separate database containing the info from there: amongst other things, ideal coordinates and explicit definitions of chiral centres. Of course, there are many ways to skin that cat.

> Of course, neither one addresses the problem of what to do about patched residues.

Yes, I agree that's much harder. Possibly the patch could provide (a) string(s) to be appended to the name(s) of the affected residue(s) to specify how they've been changed? If, for example, a cysteine's common_name attribute is "L-cysteine", then the patch to deprotonated cys could provide a suffix1 attribute "deprotonated", so the new residue's common_name becomes L-cysteine, deprotonated. A disulfide patch would provide suffix1="disulfide", suffix2="disulfide", so both affected residues become L-cysteine, disulfide.
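
A small sketch of how that naming rule could work. The helper names are hypothetical, and the list of suffixes stands in for the proposed `suffix1`, `suffix2`, ... patch attributes; nothing here is existing OpenMM behaviour.

```python
def patched_common_name(common_name, suffix):
    """Append a patch-provided suffix to a residue's common name, if any."""
    return f"{common_name}, {suffix}" if suffix else common_name


def apply_patch_names(common_names, suffixes):
    """Rename each residue affected by a patch.

    suffixes[i] plays the role of the proposed suffix{i+1} attribute and is
    paired with the i-th residue the patch is applied to.
    """
    return [patched_common_name(n, s) for n, s in zip(common_names, suffixes)]


# Single-residue patch: deprotonated cysteine.
print(patched_common_name("L-cysteine", "deprotonated"))

# Two-residue patch: a disulfide bridge renames both partners.
print(apply_patch_names(["L-cysteine", "L-cysteine"], ["disulfide", "disulfide"]))
```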

peastman · 2020-07-16T18:26:30Z

Alternatively, the template could directly provide a list of the patches that went into creating it. That might be more useful.

Of course getMatchingTemplates() doesn't consider patches anyway, but the private method _matchAllResiduesToTemplates() does, so as long as you don't mind that your code might break in the future you could use that to get patched templates.

tristanic · 2020-07-17T13:20:50Z

At the moment I'm not using patches - for a range of reasons including choices made by collaborators and lack of resources to properly support more than one forcefield, ISOLDE's currently pretty much locked to AMBER. That being said, I do find the patch approach more elegant than the "new template for every bonding variant" one, and at some point I'd certainly like to start looking down that path.

peastman · 2022-02-21T19:49:17Z

I'm looking into implementing this now. For the code part, I'm thinking it would be best to just do the very simple implementation: add an attributes dict to the classes representing residues and patches. Any attributes from the XML tag will be stored in it. We won't try to process them in any way. They're just there for future reference, in case you want to do something with them.

To be useful, the XML files need to actually contain extra information. Should we try to regenerate some of our force fields adding extra attributes? The Amber input files do contain longer descriptions that we could put into a description attribute. For example,

HISTIDINE DELTAH                                                
                                                                
 HID  INT     1                                                 
 CORR OMIT DU   BEG                                             
 ...
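
For illustration, the "store everything, process nothing" approach could look roughly like the sketch below. It uses only the standard library rather than OpenMM itself; `read_residue_attributes` is a hypothetical helper, and the `description` attribute in the sample XML is an assumed example of what a regenerated force field file might carry.

```python
import xml.etree.ElementTree as ET


def read_residue_attributes(xml_text):
    """Map each residue name to a dict of all attributes on its <Residue> tag."""
    root = ET.fromstring(xml_text)
    attributes = {}
    residues = root.find("Residues")
    if residues is not None:
        for residue in residues.findall("Residue"):
            # Copy every attribute verbatim; nothing is interpreted here.
            attributes[residue.attrib["name"]] = dict(residue.attrib)
    return attributes


# Hypothetical ffXML fragment with an assumed 'description' attribute.
FFXML = """
<ForceField>
  <Residues>
    <Residue name="HID" description="HISTIDINE DELTAH"/>
    <Residue name="ALA"/>
  </Residues>
</ForceField>
"""

attrs = read_residue_attributes(FFXML)
print(attrs["HID"].get("description"))
print(attrs["ALA"].get("description", "(no description)"))
```

User code then decides what, if anything, to do with the stored values, which keeps the loader itself free of any attribute-specific logic.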

peastman · 2022-05-16T18:40:42Z

@tristanic any thoughts on the above questions?

tristanic · 2022-05-17T08:19:10Z

Sorry for my long radio silence - a lot of time spent working on figuring out my next career move (successfully, I'm happy to say - and in a way that'll substantially increase my ability to keep working in this area). Anyway, yes - I could definitely work with that approach!

tristanic · 2022-05-17T08:29:17Z

If you wanted a richer source of data to mine, there's always the Chemical Components Dictionary (https://www.wwpdb.org/data/ccd). Encompasses residue name (including synonyms), type (nucleic, peptide, etc.), various flavours of SMILES, and example coordinates (experimental and ideal). Things can get a bit hairy when you get out into the more exotic stuff (erroneous/missing coordinates, only IUPAC rather than common names, etc.) - but for the common stuff it's very reliable.

peastman · 2022-05-17T14:14:04Z

Thanks! I'll start by adding the attributes field. Then we can consider changes to https://github.com/openmm/openmmforcefields so it will store useful attributes when generating new force fields.

Congratulations on your career move, whatever it is!

peastman · 2022-05-17T21:06:18Z

It's in #3604.

peastman added the enhancement label Jun 24, 2020

tristanic mentioned this issue Jul 6, 2020

Substructure matching for incomplete/incorrect templates #2772

Open

peastman mentioned this issue Jul 28, 2021

Planning for 7.7/8.0 #3191

Closed

peastman added this to the 7.7 milestone Aug 9, 2021

peastman modified the milestones: 7.7, 7.8 Nov 5, 2021

peastman mentioned this issue May 17, 2022

ForceFields can store extra attributes for residues #3604

Merged

peastman closed this as completed in #3604 May 18, 2022
