Problem with representation of amino acid sequence when there is incomplete data #45

digitalbio · 2021-09-24T22:25:24Z

The structure 1IGY has two regions where a portion of the structure is missing. That part is fine.

The problem is that the sequence that corresponds to the missing regions is not shown correctly. The last amino acid before one of the gaps is D229. I highlighted this in the attached structure image and the attached sequence image. In the sequence, D229 is shown as if it is adjacent to C235. Instead of showing D229 next to C235, there should be something to indicate that the amino acids in between are missing.

Cn3D used to handle this problem by inserting lower-case n's to show that part of the sequence was missing. iCn3D ignores the 6 missing amino acids and displays the sequence as if nothing were missing.

PyMol handles this problem by putting dashes in the sequence where amino acids are missing. This image is attached also.

I liked the way Cn3D did this because the letters showed that there was something located at that position. In either case, it would be good to have iCn3D show the sequence correctly by including either n's or dashes to represent the missing residues.

I think PyMol handles this part of the sequence display well. It's important to show where the amino acids are missing because that shows where the data are uncertain.
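To make the requested display concrete, here is a minimal Python sketch of gap-aware rendering. It is not how iCn3D, Cn3D, or PyMol implement it; the function name `gapped_sequence` and the `(residue_number, one_letter_code)` input format are assumptions for illustration, and the gap length is inferred from the residue numbering alone (insertion codes are not handled).

```python
def gapped_sequence(residues, gap_char="-"):
    """Render a sequence string, padding numbering gaps with gap_char.

    residues: list of (residue_number, one_letter_code) tuples,
    sorted by residue number (hypothetical input format).
    """
    out = []
    prev = None
    for num, letter in residues:
        # A jump of more than 1 in the numbering means residues are missing.
        if prev is not None and num > prev + 1:
            out.append(gap_char * (num - prev - 1))
        out.append(letter)
        prev = num
    return "".join(out)


# D229 followed directly by C235, as in the sequence described above.
print(gapped_sequence([(229, "D"), (235, "C")]))  # D-----C
```

Passing `gap_char="n"` would give the Cn3D-style lower-case placeholder instead of dashes.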

jiywang3 · 2021-09-27T18:01:21Z

Similar to Cn3D, iCn3D also uses lower-case letters to indicate missing residues, e.g., residues at position 433 in the chain A of PDB 3GVU are missing: https://structure.ncbi.nlm.nih.gov/icn3d/share.html?n6HTtzbaJyaJ8Niu6.
The problem in your example is that the PDB file doesn't indicate that residues 230-234 are missing. Missing residues are specified with "REMARK 465". No missing residues were reported for this structure:
REMARK 350 BIOMT3 1 0.000000 0.000000 1.000000 0.00000
REMARK 475
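One way to check a PDB file for this yourself is to scan it for "REMARK 465" residue rows. The following is a minimal Python sketch, not iCn3D's parser: the function name `missing_residues` is hypothetical, the regular expression assumes the usual whitespace-separated layout (optional model number, residue name, chain ID, sequence number), and the sample rows are made up for illustration rather than taken from any real entry.

```python
import re

# Matches REMARK 465 data rows such as "REMARK 465     GLY A     1".
# Header and free-text REMARK 465 lines do not match and are skipped.
_ROW = re.compile(
    r"^REMARK 465\s+(?:(\d+)\s+)?([A-Z0-9]{1,3})\s+(\S)\s+(-?\d+)([A-Za-z]?)\s*$"
)


def missing_residues(pdb_lines):
    """Return (chain, residue_number, residue_name) for each REMARK 465 row."""
    found = []
    for line in pdb_lines:
        m = _ROW.match(line)
        if m:
            _model, resname, chain, seqnum, _icode = m.groups()
            found.append((chain, int(seqnum), resname))
    return found


# Made-up rows, only to show the expected layout.
sample = [
    "REMARK 465 MISSING RESIDUES",
    "REMARK 465   M RES C SSSEQI",
    "REMARK 465     GLY A     1",
    "REMARK 465     SER A     2",
]
print(missing_residues(sample))

# A file that jumps from REMARK 350 to REMARK 475 yields an empty list.
print(missing_residues([
    "REMARK 350 BIOMT3 1 0.000000 0.000000 1.000000 0.00000",
    "REMARK 475",
]))
```

An empty result means the file declares no missing residues, so a viewer that relies on REMARK 465 has nothing to mark.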

jiywang3 · 2021-09-28T12:57:14Z

You could use NCBI residue numbers instead of PDB residue numbers in this case: https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?mmdbid=1igy&usepdbnum=0 . The parameter "usepdbnum" is 1 (true) by default. The NCBI residue numbers are D215 and C216 for the PDB residue numbers D229 and C235.
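If you want to generate such links for other structures, a small Python sketch follows. It uses only the two query parameters shown in the link above ("mmdbid" and "usepdbnum"); the helper name `icn3d_url` and its arguments are hypothetical.

```python
from urllib.parse import urlencode

ICN3D_FULL = "https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html"


def icn3d_url(mmdbid, use_pdb_numbering=True):
    """Build an iCn3D link; usepdbnum=0 switches to NCBI residue numbers."""
    query = urlencode({"mmdbid": mmdbid, "usepdbnum": 1 if use_pdb_numbering else 0})
    return f"{ICN3D_FULL}?{query}"


print(icn3d_url("1igy", use_pdb_numbering=False))
```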

jiywang3 closed this as completed Sep 29, 2021
