-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposals for handling residue replacement #44
Comments
Any comments on this proposal? |
Unless I'm missing something, modes 1 and 2 are already supported. They correspond to clicking "Select None" or "Select All", respectively, on the page for replacing residues. In the case of mode 1, adding missing atoms to nonstandard residues is impossible. They're not standard residues. We don't know what atoms they're supposed to contain. So mode 3 is the main new thing. It sounds very interesting, though I don't know how well the graph matching would work. Especially if the nonstandard residue also happens to be missing some atoms. And I'm not really clear what use cases this would be valuable for. If you want to process a whole lot of files in an automated way, and you don't care about any one of them being as realistic as possible, mode 2 is probably fine. On the other hand, if you really care about making each model realistic, you probably want to select the substitutions by hand rather than trusting it to decide automatically. |
From my proposal: "The chemical component dictionary should be consulted for how to fill in missing residues based on residue names." |
"by hand" is not something that makes any sense in the context of tools meant for automated, large-scale pipelining. If you wanted to do everything "by hand", you would be re-refining atoms from electron density, not running pdbfixer. To give you an example from our workflow: We want to model ~500 human kinases from crystal structures. Some of these kinases may have post-translational modifications, such as phosphoserines, phosphothreonines, and phosphotyrosines. Suppose we have an We could require |
I don't think you'd be satisfied with the results, if you tried to do this by graph matching. What you just described is a much simpler problem: just a list of particular residues you don't want it to replace. That's very different from having some unrelated, nonstandard residue, possibly with some missing heavy atoms, and then it gets replaced by a phosphoserine because the graph matching algorithm just happened to decide that was the closest match to the atoms that happened to be present. Let me offer an alternate proposal: you can register new residue types, providing all the information it needs about them. Once you do that, it will treat them just like standard residues. It won't try to replace them with something else, and it will know how to add missing atoms to them. |
Couldn't we have it process an |
The information it needs is different from what's in a force field. It needs a PDB file (or equivalent information) giving a reference structure for each one. Take a look in the "templates" directory. |
Once again, the Chemical Components Dictionary provides this information if the residue appears in the PDB. If it's a small molecule, parameterization engines like If coordinates are not available, it is possible to generate them using the forcefield terms (bonds, angles, torsions), though this is a bit of a dicier proposition. It shouldn't be a huge problem to do something sensible from bonds, angles, and torsions using OpenMM, however. |
I'm looking into how we might be able to do this. In particular, if it's processing a file and comes across a residue it doesn't have a template for, it would automatically download one and use that. Ligand Expo (http://ligand-expo.rcsb.org/ld-download.html) has structures for any residue you're ever likely to come across. More precisely, they catalog every occurrence of each one in the PDB and let you download them independently. So we could use that to get coordinates and bond information. A potential problem is that a given occurrence won't necessarily have all the atoms. But they do generally seem to give the chemical formula, so we could just start at the top of the list of files and keep loading them until we find one with all atoms. Another problem is that, while the files will give us all the bonds within the residue, they don't give the bonds between residues. But we can probably figure those out. If something occurs in a chain with amino acids, and if it includes atoms called "N" and "C", we can just assume those should be bonded to the adjacent residues. Likewise for DNA and RNA. There's one problem I haven't figured out a solution to though: most (possibly all?) of these files include only heavy atoms. How can we identify where the hydrogens should go? |
Ligand Expo is just a browser to view the contents of the Chemical Components Dictionary I mentioned previously in this thread. I think it's a great idea to provide a mode of operation for The big problem, as you note, is that we don't how how polymeric residues are supposed to be connected to each other. I think that's a big shortcoming, and something we might push the PDB to rectify. The best short-term workaround is for us to provide a "known" list of polymers that behave the same way, as you suggest, like DNA, RNA, and amino acids (natural and post-translational modified or unnatural). For hydrogens, we can use tools like the OpenEye Toolkit to protonate the CCD database once and for all (they have a specific academic license for putting the results in the public domain), but this would protonate all sites. We'd still need to determine how to remove the protons that end up on atoms that are bound in polymers. As I suggested before, I think we really want to support three overall modes of operation:
|
Would something like Molprobity work? |
I think the LigandExpo chemical components dictionary actually does include hydrogens, so this may not be a problem. The "reduce" program uses this information. |
Ah, well then, +1 from me. |
Which functionality is most critical for you right now? The PDB
functionality, or the forcefield template functionality?
|
The forcefield templating would probably be most desirable. There's already mutation functionality, but adding PTMs would be awesome. |
Which forcefield(s) are you thinking of at the moment, and any PDB examples that would be good tests for a first pass at this feature? |
This thread breaks out the discussion from #31 into a separate thread about how we should handle residue replacement.
I would like to propose we work toward having pdbfixer use three possible modes for whether and how it replaces residues and fills in missing atoms:
All of these modes are slight variations of the current pdbfixer behavior, so I think it is best to build these into pdbfixer, rather than making separate tools for these.
The text was updated successfully, but these errors were encountered: