Improve assign peptide type 243 #263

Open
wants to merge 15 commits into base: developer

Conversation

elena-krismer
Collaborator

closes #243

@jpquast
Owner

jpquast commented Aug 3, 2024

Let me know if I should review this.

@elena-krismer
Collaborator Author

@jpquast I'm getting an error from vroom on macOS and Windows. Have you seen this error before? (It's not related to the changes I made.)

@jpquast
Owner

jpquast commented Sep 7, 2024

Hi Elena,
I had a quick look at the function. I think it is necessary to introduce the protein_id column, as you do, in order to know which peptides belong to the same protein. However, as far as I can see, you don't have a way to know about the initiator methionine. I think I may not have been very clear about that. The point was to assign a peptide as fully tryptic if it starts at position 2 of the protein and there is no other peptide that starts at position 1. In those cases the initiator methionine is likely completely absent from most copies of this protein in the cell. As far as I can tell, you check whether any of the peptides of the protein lack a preceding methionine (and a preceding methionine could also be in the middle of the protein).
What I would suggest is to also require the start and end columns and then, for each protein, check whether any peptide starts at position 1. If yes, keep the original annotation as it is. If not, any peptide starting at position 2 is considered fully-tryptic if it fulfils its C-terminal criterion.
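
To make the suggested rule concrete, here is a minimal base R sketch. It is not the protti implementation: the helper name classify_peptides, the column layout (protein_id, start, aa_before, last_aa, aa_after) and the convention that an empty aa_after marks the protein C-terminus are all assumptions made for illustration.

```r
# Hypothetical helper illustrating the suggested rule; not part of protti.
# Assumes one row per peptide, no missing values, and these columns:
# protein_id, start, aa_before, last_aa, aa_after.
classify_peptides <- function(data) {
  tryptic <- c("K", "R")

  # Proteins with at least one peptide starting at position 1,
  # i.e. the initiator methionine was observed for that protein.
  has_pos1 <- unique(data$protein_id[data$start == 1])
  met_kept <- data$protein_id %in% has_pos1

  # N-terminal side is fine if the peptide follows K/R, starts at the
  # protein N-terminus, or starts at position 2 while no peptide of the
  # same protein starts at position 1.
  n_ok <- data$aa_before %in% tryptic |
    data$start == 1 |
    (data$start == 2 & !met_kept)

  # C-terminal side is fine if the peptide ends in K/R or (assumption)
  # an empty aa_after marks the end of the protein.
  c_ok <- data$last_aa %in% tryptic | data$aa_after == ""

  data$pep_type <- ifelse(
    n_ok & c_ok, "fully-tryptic",
    ifelse(n_ok | c_ok, "semi-tryptic", "non-tryptic")
  )
  data
}

# Made-up example data, not taken from the package tests.
example <- data.frame(
  protein_id = c("P1", "P2", "P2", "P3", "P3"),
  start      = c(2, 1, 2, 50, 10),
  aa_before  = c("M", "", "M", "M", "S"),
  last_aa    = c("K", "R", "K", "K", "Y"),
  aa_after   = c("T", "A", "S", "G", "T")
)

result <- classify_peptides(example)
print(result[, c("protein_id", "start", "pep_type")])
```

In this sketch the P1 peptide at position 2 becomes fully-tryptic because no P1 peptide starts at position 1, the P2 peptide at position 2 stays semi-tryptic because a P2 peptide does start at position 1, and the P3 peptide preceded by an internal methionine at position 50 is semi-tryptic.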

As far as I can tell, the output of the function is currently wrong in general:

assign_peptide_type(data, aa_before, last_aa, aa_after, protein_id)
  aa_before last_aa aa_after protein_id      pep_type
1         K       R        T         P1 fully-tryptic
2         S       K        R         P1 fully-tryptic
3         T       Y        T         P2   non-tryptic
4         M       K        R         P2  semi-tryptic

Row 2 should be semi-tryptic since the aa_before is not K/R. Not sure why the function now gets those standard cases wrong.
Row 4 works of course as you expected based on your code, but is conceptually wrong as explained above.

Not sure why vroom fails, but if it still does we can have a look at that.


2 participants