GLH residue code #1855

Open
thempel opened this issue Feb 27, 2024 · 5 comments
Comments

@thempel

thempel commented Feb 27, 2024

Hi, I think the one-letter code of GLH is wrong in mdtraj.core.residue_names._AMINO_ACID_CODES:

'GGL': 'E', 'GHG': 'Q', 'GHP': 'G', 'GL3': 'G', 'GLH': 'Q', 'GLJ': 'E', 'GLK':

It should be a protonated GLU (E), not a protonated GLN (Q).

@thempel changed the title from "ASH residue code" to "GLH residue code" on Feb 27, 2024
@mattwthompson
Member

Is there a canonical reference for these codes? It's all slightly confusing to me as a non-biophysicist.

@thempel
Author

thempel commented Feb 27, 2024

I agree that these naming conventions are confusing and can differ between communities. I'd probably stay close to the definitions of amino acid residues given in the major force fields. For example, GLH is defined in the Amber force field (compare this line). Unfortunately, I don't have a good canonical reference list or dictionary.

@peastman
Contributor

The three letter codes are defined by the PDB. The names used by Amber are nonstandard and conflict with the PDB definitions.

@thempel
Author

thempel commented Feb 28, 2024

True. It seems, and maybe @peastman can confirm, that the mdtraj one-letter code definitions are actually taken from the PDB chemical component dictionary.

About the current case: the PDB's definition of GLH gives the one-letter code Q, GLN as the parent comp id, and the name "N-5-CYCLOHEXYL-N-5-[(CYCLOHEXYLAMINO)CARBONYL]GLUTAMINE". So this isn't a different protonation state of a standard amino acid but a more complex chemical modification. In my experience, if you open a random MD simulation, the chances are pretty low that a residue named GLH actually refers to this compound, and very high that it's a protonated GLU from an Amber-based simulation.
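To make the conflict concrete, here is a minimal sketch of how an Amber-aware lookup could sit on top of the PDB-derived codes. It is not mdtraj code: `PDB_CODES` is a three-entry excerpt for illustration, and `AMBER_OVERRIDES`, `one_letter`, and the `amber_names` flag are hypothetical names. The override entries use the common Amber names for protonation and bonding variants.

```python
# Illustrative excerpt only; the GLH entry mirrors the PDB-derived code quoted in this issue.
PDB_CODES = {"GLU": "E", "GLN": "Q", "GLH": "Q"}

# Hypothetical override table for Amber residue names (not part of mdtraj).
AMBER_OVERRIDES = {
    "GLH": "E",  # protonated glutamate
    "ASH": "D",  # protonated aspartate
    "HID": "H",  # histidine, delta-protonated
    "HIE": "H",  # histidine, epsilon-protonated
    "HIP": "H",  # histidine, doubly protonated
    "CYX": "C",  # cysteine in a disulfide bridge
    "CYM": "C",  # deprotonated cysteine
    "LYN": "K",  # neutral lysine
}


def one_letter(resname, amber_names=False):
    """Return the one-letter code, or None if the name is unknown.

    With amber_names=True, Amber meanings take precedence over PDB ones.
    """
    if amber_names and resname in AMBER_OVERRIDES:
        return AMBER_OVERRIDES[resname]
    return PDB_CODES.get(resname)


print(one_letter("GLH"))                    # PDB meaning of GLH
print(one_letter("GLH", amber_names=True))  # Amber meaning of GLH
```

Making the Amber interpretation opt-in keeps the PDB meaning as the default, so the ambiguity is resolved by the caller rather than guessed from the residue name.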

This means that the output of traj.topology.to_fasta() is very likely wrong if there are protonated residues with Amber names. Adding to the confusion, some of the names used by Amber are not listed at all and are mapped to '' one-letter codes, silently producing sequences that are shorter than the number of amino acids in the protein. Not being listed also causes them to be classified as non-protein in selections.
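The silent shortening can be reproduced with a small stand-alone sketch. It only models the behavior described here and does not replicate mdtraj's internals: `to_fasta_sketch`, the `codes` table, and the residue list are hypothetical, and HIE is simply assumed to be missing from the table.

```python
def to_fasta_sketch(resnames, codes, unknown=""):
    """Join one-letter codes; names missing from `codes` become `unknown`."""
    return "".join(codes.get(name, unknown) for name in resnames)


# Hypothetical lookup table and residue list, for illustration only.
codes = {"ALA": "A", "GLU": "E", "LYS": "K", "GLH": "Q"}
resnames = ["ALA", "GLH", "HIE", "LYS"]  # HIE is assumed to be unlisted

# Unknown names mapped to '' vanish from the sequence without warning.
silent = to_fasta_sketch(resnames, codes)

# An explicit placeholder keeps the sequence length equal to the residue count.
explicit = to_fasta_sketch(resnames, codes, unknown="X")

print(len(resnames), len(silent), len(explicit))
```

In this toy case the first sequence is one character shorter than the residue list, and GLH is still reported as Q in both, which are the two failure modes described above.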

@sukritsingh
Collaborator

> the mdtraj one-letter code definitions are actually taken from the PDB chemical component dictionary.

Correct - this allows us to follow a fairly robust standard that is independent of the force field used and compatible with experimental structural biology standards. Especially since, as stated above by Peter:

> The names used by Amber are nonstandard and conflict with the PDB definitions.

It looks like we do support the protonated form in residue_names.py already, so if the residue is specified as GLH in the topology during loading, it'll probably be fine? I haven't tested it with Amber engine files before.
