Problem with representation of amino acid sequence when there is incomplete data #45

Closed
digitalbio opened this issue Sep 24, 2021 · 2 comments

Comments


digitalbio commented Sep 24, 2021

The structure 1IGY has two regions where a portion of the structure is missing. That part is fine.

The problem is that the sequence that corresponds to the missing regions is not shown correctly. The last amino acid at one end is D229. I highlighted this in the attached structure image and the attached sequence image. In the sequence, D229 is shown as if it is adjacent to C235. Instead of showing D229 next to C235, the display should indicate that the amino acids between them are missing.

Cn3D used to handle this problem by inserting lower-case n's to show that part of the sequence was missing. iCn3D ignores the 6 missing amino acids and displays the sequence as if nothing were missing.

PyMol handles this problem by putting dashes in the sequence where amino acids are missing. This image is attached also.

I liked the way Cn3D did this because the letters showed that there was something located at that position. In either case, it would be good to have iCn3D show the sequence correctly by including either n's or dashes to represent the missing residues.

I think PyMol handles this part of the sequence display well. It's important to show where the amino acids are missing because that shows where the data are uncertain.
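To make the requested display concrete, here is a minimal sketch of the dash-style rendering described above. It assumes that a jump in residue numbering marks a gap, which is only a heuristic: it ignores insertion codes, and a numbering jump does not always mean residues are missing. The function name `sequence_with_gaps` and the flanking residues in the sample are hypothetical; only D229 and C235 come from this report.

```python
def sequence_with_gaps(residues, gap_char="-"):
    """Render (residue_number, one_letter_code) pairs as one string,
    inserting gap_char for each residue number skipped between neighbours."""
    out = []
    prev = None
    for num, aa in residues:
        if prev is not None and num - prev > 1:
            # One gap character per skipped residue number.
            out.append(gap_char * (num - prev - 1))
        out.append(aa)
        prev = num
    return "".join(out)


# Hypothetical fragment: residues 228 and 236 are invented for illustration.
fragment = [(228, "A"), (229, "D"), (235, "C"), (236, "G")]
print(sequence_with_gaps(fragment))       # AD-----CG
print(sequence_with_gaps(fragment, "n"))  # ADnnnnnCG
```

Passing `gap_char="n"` gives the Cn3D-style lower-case placeholder instead of PyMol-style dashes.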

[Attached images: D229_structure, sequence, PyMol_sequence_view]

Contributor

jiywang3 commented Sep 27, 2021

Similar to Cn3D, iCn3D also uses lower-case letters to indicate missing residues, e.g., residues at position 433 in chain A of PDB 3GVU are missing: https://structure.ncbi.nlm.nih.gov/icn3d/share.html?n6HTtzbaJyaJ8Niu6.
The problem in your example is that the PDB file does not indicate that residues 230-234 are missing. Missing residues are specified with "REMARK 465" records, and none were reported for this structure. The header goes straight from REMARK 350 to REMARK 475:
REMARK 350 BIOMT3 1 0.000000 0.000000 1.000000 0.00000
REMARK 475
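
For anyone who wants to check a PDB file for this themselves, the following is a rough sketch that collects the residues listed under REMARK 465. It assumes the usual whitespace-separated layout that follows the `M RES C SSSEQI` column-header line and skips rows with a blank chain ID. The function name `missing_residues` and the sample records are made up for illustration and are not taken from 1IGY.

```python
import re


def missing_residues(pdb_text):
    """Return (chain, residue_name, residue_number, insertion_code)
    tuples for every residue listed under REMARK 465."""
    found = []
    in_table = False
    for line in pdb_text.splitlines():
        if not line.startswith("REMARK 465"):
            continue
        fields = line[10:].split()
        if not in_table:
            # The residue table begins after the column-header line.
            in_table = "SSSEQI" in fields
            continue
        if len(fields) == 4:
            # A leading model number is present; drop it.
            fields = fields[1:]
        if len(fields) != 3:
            continue
        name, chain, seq = fields
        match = re.fullmatch(r"(-?\d+)([A-Za-z]?)", seq)
        if match:
            found.append((chain, name, int(match.group(1)), match.group(2)))
    return found


# Invented sample records, for illustration only.
sample = """\
REMARK 465
REMARK 465 MISSING RESIDUES
REMARK 465   M RES C SSSEQI
REMARK 465     GLY X    10
REMARK 465     SER X    11
REMARK 475
"""
print(missing_residues(sample))
# [('X', 'GLY', 10, ''), ('X', 'SER', 11, '')]
```

A file with no REMARK 465 records, like the excerpt above, yields an empty list, which is why the viewer has nothing to mark as missing.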

@jiywang3
Contributor

You could use NCBI residue numbers instead of PDB residue numbers in this case: https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?mmdbid=1igy&usepdbnum=0 . The parameter "usepdbnum" is 1 (true) by default. With NCBI numbering, PDB residues D229 and C235 become D215 and C216.
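
If you build such links in a script, a small sketch like the one below may help. It uses only the `mmdbid` and `usepdbnum` parameters from the URL above; the helper name `icn3d_url` and its `use_pdb_numbering` argument are hypothetical.

```python
from urllib.parse import urlencode

ICN3D_FULL = "https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html"


def icn3d_url(mmdbid, use_pdb_numbering=True):
    """Build an iCn3D link; usepdbnum=0 switches to NCBI residue numbers."""
    params = {"mmdbid": mmdbid, "usepdbnum": 1 if use_pdb_numbering else 0}
    return f"{ICN3D_FULL}?{urlencode(params)}"


print(icn3d_url("1igy", use_pdb_numbering=False))
# https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?mmdbid=1igy&usepdbnum=0
```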
