adding overhangs for synthetic peptides and QconCATs #5
My current code:

```r
addOverhangs <- function(pep_seq, proteins, maxLength,
                         preferN = FALSE, preferC = FALSE) {
  # proteins containing the peptide (literal match, not a regular expression)
  proteinSeq <- grep(pep_seq, proteins, value = TRUE, fixed = TRUE)
  if (length(proteinSeq) == 0) {
    stop("peptide not found in 'proteins'")
  }
  if (length(proteinSeq) > 1) {
    return(list("AA_before_20" = NA, "AA_after_20" = NA,
                "spikeTide" = pep_seq, "result" = "non_proteotypic"))
  }
  pepPosition <- regexpr(pep_seq, proteinSeq, fixed = TRUE)[1]
  pepLength <- nchar(pep_seq)
  pepEnd <- pepPosition + pepLength - 1

  # keep the last / first n residues of a sequence (whole string if shorter)
  lastAA <- function(x, n) substr(x, nchar(x) - n + 1, nchar(x))
  firstAA <- function(x, n) substr(x, 1, n)

  ###############################################################################
  # up to 20 aa before and after the peptide ("" at the protein termini)
  AA_before_20 <- substr(proteinSeq, max(1, pepPosition - 20), pepPosition - 1)
  AA_after_20 <- substr(proteinSeq, pepEnd + 1, pepEnd + 20)

  # apply the following rules:
  ##############################################################
  # 1) preceding overhang: by default the last 4 aa; if the flank has several
  #    K/R, start 3 aa before the first K/R of the last cluster of K/R
  #    spaced no more than 3 positions apart
  aaBefore <- strsplit(AA_before_20, split = "")[[1]]
  aaBasic <- which(aaBefore == "K" | aaBefore == "R")
  overhang_before <- lastAA(AA_before_20, 4)
  if (length(aaBasic) > 1) {
    aaBasic2 <- c(0, aaBasic[-length(aaBasic)])  # previous K/R (0 = none)
    goodAA <- which(aaBasic - aaBasic2 > 3)
    if (length(goodAA) > 0) {
      firstGoodAA <- aaBasic[max(goodAA)]
      overhang_before <- substr(AA_before_20, firstGoodAA - 3, nchar(AA_before_20))
    }
  }
  ##############################################################
  # 2) following overhang: by default the first 3 aa; otherwise extend to
  #    3 aa past the first cleavage site (peptide end = 0, or a K/R) that is
  #    not followed by another K/R within the next 2 positions
  aaAfter <- strsplit(AA_after_20, split = "")[[1]]
  aaBasic <- c(0, which(aaAfter == "K" | aaAfter == "R"))
  overhang_after <- firstAA(AA_after_20, 3)
  if (length(aaBasic) > 1) {
    aaBasic2 <- c(aaBasic[-1], length(aaAfter))  # next K/R (or end of flank)
    goodAA <- which(aaBasic2 - aaBasic > 2)
    if (length(goodAA) > 0) {
      overhang_after <- firstAA(AA_after_20, aaBasic[min(goodAA)] + 3)
    }
  }
  ##############################################################
  # add overhangs without exceeding maxLength
  length_with_overhangs <- nchar(overhang_before) + pepLength + nchar(overhang_after)
  # fallback when none of the options below adds an overhang
  spikeTide <- pep_seq
  result <- "no_overhangs"

  if (length_with_overhangs <= maxLength) {
    # option 1: full overhangs fit
    spikeTide <- paste(overhang_before, pep_seq, overhang_after, sep = ".")
    result <- "complete_overhangs"
  } else if (pepLength + 7 <= maxLength) {
    if (nchar(overhang_before) > 4 && nchar(overhang_after) < 4) {
      # option 2: shorten the preceding overhang (succeeding one is <= 3 aa)
      aaAllowedBefore <- maxLength - pepLength - nchar(overhang_after)
      spikeTide <- paste(lastAA(overhang_before, aaAllowedBefore), pep_seq,
                         overhang_after, sep = ".")
      result <- "N_overhang_shortened"
    } else if (nchar(overhang_before) < 5 && nchar(overhang_after) > 3) {
      # option 3: shorten the succeeding overhang (preceding one is <= 4 aa)
      aaAllowedAfter <- maxLength - pepLength - nchar(overhang_before)
      spikeTide <- paste(overhang_before, pep_seq,
                         firstAA(overhang_after, aaAllowedAfter), sep = ".")
      result <- "C_overhang_shortened"
    } else {
      # option 4: shorten both overhangs to 4 and 3 aa
      spikeTide <- paste(lastAA(overhang_before, 4), pep_seq,
                         firstAA(overhang_after, 3), sep = ".")
      result <- "both_overhangs_shortened"
    }
  } else {
    # option 5: add a single overhang
    # important: do not add fewer than 4 aa on the N-terminus
    # or fewer than 3 aa on the C-terminus
    numAAToAdd <- maxLength - pepLength
    if (preferN && numAAToAdd >= 4) {
      newBefore <- lastAA(overhang_before, numAAToAdd)
      spikeTide <- paste(newBefore, pep_seq, sep = ".")
      result <- ifelse(nchar(newBefore) < nchar(overhang_before),
                       "N_overhang_only_shortened", "N_overhang_only")
    }
    if ((preferC && numAAToAdd >= 3) || numAAToAdd == 3) {
      newAfter <- firstAA(overhang_after, numAAToAdd)
      spikeTide <- paste(pep_seq, newAfter, sep = ".")
      result <- ifelse(nchar(newAfter) < nchar(overhang_after),
                       "C_overhang_only_shortened", "C_overhang_only")
    }
  }
  # return the results
  list("AA_before_20" = AA_before_20,
       "AA_after_20" = AA_after_20,
       "spikeTide" = spikeTide,
       "result" = result)
}
```
A couple more comments:
"data:\RAW\pvs22_QTOF_DATA_data3\data_for_synapter_2.0\cleaver_overhangs"
sgibb added a commit to lgatto/Pbase that referenced this issue on Jul 27, 2014.
Closed via lgatto/Pbase#6.
lgatto pushed a commit to lgatto/Pbase that referenced this issue on Feb 14, 2015: "see sgibb/cleaver#5 for details git-svn-id: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Pbase@98013 bc3139a8-67e5-0310-9ffc-ced21a209358"
jorainer pushed a commit to lgatto/Pbase that referenced this issue on Jul 4, 2018: "see sgibb/cleaver#5 for details git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Pbase@98013 bc3139a8-67e5-0310-9ffc-ced21a209358"
It is quite clear now that 100% digestion efficiency with trypsin should not be assumed in proteomics workflows. Inefficient trypsin digestion also poses a serious problem in absolute quantitation workflows that use isotopically labelled standards.
Isotopic standards are currently used as follows: the peptides to be quantified are synthesised in labelled form, and a known amount of each labelled peptide is spiked into the sample prior to LC-MS analysis. After acquisition, the amount of unlabelled peptide (and hence of its protein of origin) is computed as follows:
quantity_unlabelled = signal_unlabelled/signal_labelled * quantity_labelled
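The formula is a single ratio scaled by the spiked amount. A minimal R sketch; the function name `quantifyUnlabelled` and the example numbers (signals in arbitrary units, amounts in fmol) are hypothetical:

```r
# Amount of endogenous (unlabelled) peptide from the ratio of the unlabelled
# to labelled signals, given the known amount of labelled standard spiked in.
quantifyUnlabelled <- function(signal_unlabelled, signal_labelled,
                               quantity_labelled) {
  stopifnot(signal_labelled > 0)
  signal_unlabelled / signal_labelled * quantity_labelled
}

# unlabelled signal twice the labelled one, 50 fmol of standard spiked in
quantifyUnlabelled(2e5, 1e5, 50)  # 100 fmol
```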
Consider quantitation of the peptide VTTYFPSVNLR. Below is the stretch of protein sequence it originates from:
GNIR.VTTYFPSVNLR.KSSQK
Note that to release the peptide from the protein, digestion should occur after its C-terminal R; however, this R is followed by K, and cleavage after either residue is expected, resulting in two dead-end products:
VTTYFPSVNLR and VTTYFPSVNLRK
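To make the dead-end products concrete, here is a small R sketch that lists the peptides trypsin could release if, besides the intended sites, it also cleaves after any K/R close to them. The function `cleavageVariants` and its `window` parameter (3 residues, matching the overhang length discussed in this issue) are hypothetical and not part of cleaver; the proline rule is ignored.

```r
# Hypothetical helper: list the peptides released from the context
# before.pep.after when trypsin cleaves at the intended sites (last residue
# of `before`, last residue of `pep`) or after any K/R within `window`
# residues of them. No cleavage-before-P rule is applied.
cleavageVariants <- function(before, pep, after, window = 3) {
  isBasic <- function(x) x %in% c("K", "R")
  b <- strsplit(before, "")[[1]]
  a <- strsplit(after, "")[[1]]
  nb <- length(b)

  # N-terminal side: K/R among the `window` residues preceding b[nb]
  nIdx <- seq_len(max(nb - 1, 0))
  nIdx <- nIdx[nIdx >= nb - window]
  nIdx <- nIdx[isBasic(b[nIdx])]
  prefixes <- c("", vapply(nIdx, function(i) paste(b[(i + 1):nb], collapse = ""), ""))

  # C-terminal side: K/R among the first `window` residues of `after`
  cIdx <- seq_len(min(window, length(a)))
  cIdx <- cIdx[isBasic(a[cIdx])]
  suffixes <- c("", vapply(cIdx, function(j) paste(a[seq_len(j)], collapse = ""), ""))

  # every combination of start and end is a possible product
  as.vector(outer(prefixes, suffixes, function(p, s) paste0(p, pep, s)))
}

cleavageVariants("GNIR", "VTTYFPSVNLR", "KSSQK")
# returns c("VTTYFPSVNLR", "VTTYFPSVNLRK")
```

Applied to the second example further down, `cleavageVariants("QNGRLR", "HFTIPSHR", "ARAGR")` includes "LRHFTIPSHR" among its products.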
As a result, the amount of VTTYFPSVNLR peptide is no longer proportional to the protein amount, and if absolute quantitation is performed using this peptide only, the amount of protein will be underestimated (a specific example of this is given in ref1).
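A back-of-the-envelope illustration of the underestimation, plugging hypothetical numbers into the formula above. It assumes a fixed 40% of the protein ends up as VTTYFPSVNLRK and that the labelled standard gives the same signal per fmol as the endogenous peptide; in practice that fraction is unknown and can vary, which is why the peptide amount stops tracking the protein amount.

```r
# Hypothetical numbers for illustration only.
k <- 1e4                   # signal per fmol, assumed equal for both forms
spiked_fmol <- 50          # labelled VTTYFPSVNLR spiked into the sample
protein_fmol <- 100        # true amount of the protein
fraction_missed <- 0.4     # share of the protein ending up as VTTYFPSVNLRK

signal_labelled <- k * spiked_fmol
signal_unlabelled <- k * protein_fmol * (1 - fraction_missed)

# quantity_unlabelled = signal_unlabelled / signal_labelled * quantity_labelled
estimate_fmol <- signal_unlabelled / signal_labelled * spiked_fmol
estimate_fmol  # 60 fmol instead of the true 100
```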
The most obvious way to counteract the problem is to ignore such peptides. However, this is not usually possible, given that only a limited number of peptides suitable for quantitation is available for each protein. Thus the best solution is to mimic the cleavage site by adding 3 amino acids before and after the peptide.
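Adding fixed-length overhangs can be sketched in R by locating the peptide in its protein and taking the neighbouring residues. `naiveOverhangs` is a hypothetical name; the defaults follow the 3 + 3 residues suggested here, while the addOverhangs code earlier in the thread takes 4 residues on the N-terminal side by default, which corresponds to `nBefore = 4`.

```r
# Hypothetical helper: flank `pep` with nBefore / nAfter residues of `protein`,
# joined with "." as in the examples in this issue. substr() silently
# returns shorter (or empty) flanks at the protein termini.
naiveOverhangs <- function(pep, protein, nBefore = 3, nAfter = 3) {
  start <- regexpr(pep, protein, fixed = TRUE)[1]
  if (start < 0) stop("peptide not found in protein")
  end <- start + nchar(pep) - 1
  before <- substr(protein, max(1, start - nBefore), start - 1)
  after <- substr(protein, end + 1, end + nAfter)
  paste(before, pep, after, sep = ".")
}

naiveOverhangs("VTTYFPSVNLR", "GNIRVTTYFPSVNLRKSSQK")
# returns "NIR.VTTYFPSVNLR.KSS"
```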
However, consider the following peptide:
QNGRLR.HFTIPSHR.ARAGR
If we add RLR to the N-terminus of the peptide sequence, the cleavage site again does not mimic what happens in the protein, since cleavage after the first R in the protein yields a dead-end product:
LR.HFTIPSHR
Hence the overhang needs to be extended to include 3 aa before the RLR. However, such an extension is not always possible, since there is a limit to the length of peptide that can be synthesised (usually no longer than 20 aa); additional parameters therefore need to be passed to the model to determine the optimal compromise.
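The N-terminal extension rule can be sketched as a standalone R function. It uses the same spacing criterion as rule 1) of the addOverhangs code in this thread (K/R residues no more than 3 positions apart are treated as one cluster), but the function name `extendNOverhang`, its `context` parameter, and the example flanks are illustrative assumptions; the 20-residue limit is the one mentioned above.

```r
# Hypothetical helper: starting from the K/R at the end of `before` (the
# cleavage site), walk back over K/R residues lying within `context`
# positions of each other, then keep `context` residues in front of the
# first K/R of that run.
extendNOverhang <- function(before, context = 3) {
  aa <- strsplit(before, "")[[1]]
  basic <- which(aa %in% c("K", "R"))
  # no K/R in the flank: just keep the last `context` residues
  if (length(basic) == 0) return(paste(tail(aa, context), collapse = ""))
  first <- basic[length(basic)]
  for (p in rev(basic)) {
    if (first - p <= context) first <- p else break
  }
  paste(aa[max(1, first - context):length(aa)], collapse = "")
}

extendNOverhang("ASGNIR")  # returns "GNIR"
extendNOverhang("QNGRLR")  # returns "QNGRLR" (QNG + RLR)

# does the extended overhang, plus an assumed 3-residue C-terminal
# overhang, still fit a 20-residue synthesis limit?
maxLength <- 20
nchar(extendNOverhang("QNGRLR")) + nchar("HFTIPSHR") + 3 <= maxLength
```

When the check fails, one of the two overhangs has to be trimmed, which is the compromise the extra parameters are meant to control.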
I will write out a detailed outline of the workflow if this functionality is to be added to cleaver.
references: