N-H1 and N-H3 bonds missing #19

Open
speleo3 opened this issue Feb 15, 2017 · 1 comment
Comments

@speleo3
Contributor

speleo3 commented Feb 15, 2017

N-H1 and N-H3 bonds are missing in current MMTF files.

Example (/1NMR/A/A/GLY`1):
[image: n-term-missing-bonds]

@josemduarte
Member

This is a known problem that we tracked some time ago in our internal issue tracker. I think it's best to describe it here again:

The full description of all compounds, including protonation variants, lives in a separate Protonation Variants Companion Dictionary, which explains why we've seen many hydrogen atoms missing from the normal CCD files (see http://www.wwpdb.org/data/ccd).

That means that to have a complete and accurate set of bonds we'd need to use those files too, which presents some challenges, for instance with the identifiers. Quoting the docs:

The dictionary of protonation variants provides additional nomenclature information for the protonation states of standard amino acids in N-terminal, C-terminal, and free forms, and includes common side chain protonation states. The identifiers used in this extension dictionary [are] longer identifier codes to distinguish the various protonation forms of the standard amino acids. For instance, an identifier code ARG_LFOH_DHH12 is used to identify the arginine variant with a neutral peptide unit and side chain protonated at NH1. The extended identifier codes are not compatible with the 3-character format restrictions for the residue identifier in the PDB format, so these codes do not currently appear in PDB files. In PDB entries, protonated residues are identified by the 3-character code of their parent amino acid; however, the atom nomenclature for protonated forms will be taken from the variant dictionary definitions.
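To make the identifier mismatch concrete, here is a minimal sketch. It assumes, based only on the ARG_LFOH_DHH12 example quoted above, that an extended identifier starts with the parent 3-character code followed by underscore-separated fields. The helper names `parent_code` and `fits_pdb_residue_field` are hypothetical and not part of any MMTF or wwPDB library.

```python
def parent_code(variant_id: str) -> str:
    """Return the part before the first underscore.

    Assumption: extended identifiers look like PARENT_xxxx_xxxx,
    as in the ARG_LFOH_DHH12 example from the docs.
    """
    return variant_id.split("_", 1)[0]


def fits_pdb_residue_field(identifier: str) -> bool:
    """Check the 3-character restriction on PDB residue identifiers."""
    return len(identifier) <= 3


variant = "ARG_LFOH_DHH12"
parent = parent_code(variant)
print(parent, fits_pdb_residue_field(variant), fits_pdb_residue_field(parent))
```

The mapping only works in one direction: from the 3-character code that appears in an entry there is no way to tell which variant was meant, which is the core of the problem.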

I checked one case (1a23, with H1, H2 and H3 in the first ALA of chain A) and the ALA_xxxx_xxxx identifiers are not present in the mmCIF file. So it looks like we can't really deal with this properly at the moment.

The problem in @speleo3's example is the same: H1 and H3 are not in the standard CCD entry for GLY but in one of the companion entries, so there are no bonds for them.
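A rough sketch of why this happens when bonds are looked up by atom name in the parent compound's dictionary entry. This is not the actual MMTF generation code. The bond list is an illustrative transcription of the standard GLY entry and should be checked against the CCD, and `assign_bonds` is a hypothetical helper.

```python
# Intra-residue bonds for GLY, keyed by atom name.
# Assumption: illustrative transcription of the standard CCD entry.
GLY_CCD_BONDS = [
    ("N", "CA"), ("N", "H"), ("N", "H2"),
    ("CA", "C"), ("CA", "HA2"), ("CA", "HA3"),
    ("C", "O"), ("C", "OXT"), ("OXT", "HXT"),
]


def assign_bonds(atom_names, dictionary_bonds):
    """Keep dictionary bonds whose two atoms are both present in the residue.

    Returns the kept bonds and the atoms that end up with no bond at all.
    """
    present = set(atom_names)
    bonds = [(a, b) for a, b in dictionary_bonds if a in present and b in present]
    bonded = {name for bond in bonds for name in bond}
    unbonded = [name for name in atom_names if name not in bonded]
    return bonds, unbonded


# Atom names of a protonated N-terminal GLY as they appear in an entry.
n_term_gly = ["N", "CA", "C", "O", "H1", "H2", "H3", "HA2", "HA3"]
bonds, unbonded = assign_bonds(n_term_gly, GLY_CCD_BONDS)
print(unbonded)  # ['H1', 'H3']
```

H2 still gets its bond because the standard entry happens to define an atom with that name, while H1 and H3 only exist in the companion entries.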
