Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct saturation by only quantifying on unsaturated isotopes in all samples under comparison #39

Closed
pavel-shliaha opened this issue Jan 21, 2013 · 75 comments
Assignees
Milestone

Comments

@pavel-shliaha
Copy link
Collaborator

The prototype of suggested function:

requantify <- function (synapterObj , satThreshold, minIsotopes)

synapterObj - list of synapter objects
satThreshold - intensity over which accurate ion recording is not possible, due to saturation
minIsotopes - minimum number of isotopes accepted for requantitation

For every peptide the function will find common unsaturated ions from supplied synapter objects, and requqntify EMRT using these ions only. satThreshold is used to determine which ions saturate. minIsotopes is the minimum number of isotopes that are sufficient to requantify (i.e. sometimes only few isotopes will be seen in all samples under saturation and over LOD)

Important: the information on peptide isotopes is not presented in current synapter objects! Pep3D file that is loaded in synapter is filtered so that for each EMRT only one isotope is preserved. This means that synapter object will have to be modified to include this information. Current consensus is that a list should be created each element of which represents a single EMRT and contains information on ion intensities from those EMRTs

@pavel-shliaha
Copy link
Collaborator Author

Laurent asked me to create a comment here:

I have had a look at the data and noticed that the saturation does not affect fragment quantities and lower abundance isotope. I would suggest the following easy to implement soultion:

  1. when you load the P3D files keep only a single line per EMRT, but also record isotopic distribution of each EMRT. For example: write z2i0_1000_z2i1_500_z2i2_100_z3i0_200 which corresponds to a list of ions: charge 2 isotope with C13 = 0 has intensity of 1000, charge 2 isotope C13 = 1 with intensity of 500, charge 2 isotope C13 = 1 with intensity of 100, charge 3 isotope C13 = 0 with intensity of 200. Kepp this line for every EMRT

  2. when matching fragments write the same but in form fragment_intensity, for example: b19_1000_y3_500.

Once you have applied a function of combining individual synapter analysis into MSnsets, you can apply a command of correcting saturation either on fragments or isotopes, by saying quantify on all fragments or isotopes which intensity is below certain threshold. Note that this is done AFTER several synapter analysis are combined into an MSnSet.

@sgibb
Copy link
Collaborator

sgibb commented Mar 29, 2015

As we discussed on skype we will follow the first strategy (quantify on peptides). Now the isotopic distribution is stored in the Synapter object and the resulting MSnSet:

## isotopicDistr is formated: ion_z,ion_iso,ion_counts
tail(cbind(exprs(m), fData(m)["isotopicDistr"]))
#                   Synapter1                                                        isotopicDistr
# IVDTWR                 1198                                               2,0,838;2,1,360;1,0,30
# ESASHETPTDLRPVIPR       555                                        3,0,275;3,1,280;1,0,56;1,1,43
# GYDLPDEVFDENEK          775                                       2,0,331;2,1,293;2,2,151;1,0,37
# IFGLEK                64706 1,0,42448;1,1,17358;1,2,2956;1,3,417;2,0,1037;2,1,490;1,0,118;1,1,83
# LATMNIDIDLTSGGK         414                                              3,0,218;3,1,196;1,0,122
# QNLFNNTVK              1266                                       2,0,703;2,1,399;2,2,164;1,0,33

How we realize the requantification? Do we simple sum up all counts below a threshold, e.g. for the last peptide: satThreshold = 500; newCount = 399 + 164 + 33 (596)?

@pavel-shliaha
Copy link
Collaborator Author

To do requantification synapter needs to first select all ions that are present in all the runs under saturation threshold. It then sums the intensity of these ions for every run, which is the new quantitation value. For example:

run1: 2,0,838;2,1,360;1,0,30
run2: 2,0,8000;2,1,4000;1,0,30

saturation threshold: 5000

there is 1 isotope present in all runs below saturation threshold: 2.1. so the new quantitation value is now 360 + 30: 4000 + 30

@sgibb
Copy link
Collaborator

sgibb commented Mar 30, 2015

So we just sum up common isotopes. For your example run 1 will be (360+30)/(4000+30) = 0.097 and run 2 will be 1.0? (these values are no counts anymore)

What do we do if there are

  1. no isotopic distribution information avaiable (e.g. rescued EMRTs)
  2. no counts below the saturation threshold
  3. no common isotopes?

@pavel-shliaha
Copy link
Collaborator Author

  1. this will have to be loaded unfortunately from the P3D file
  2. there will always be isotopes below saturation threshold. THe number of identified isotopes is expected to be correlated with peptide intensity. The more there is of a peptide, the more isotopes it will have over a saturation threshold. I.e. for high intensity peptides (above saturation threshold), there will be more isotopes recorded than for a low intensity peptide and these isotopes will be below saturation threshold
  3. this situation is more complicated. Consider this: the peptide is identified in two runs in the first run it is high intensity and has 5 isotopes identified:

z2i0 - 10.000 z2i1 - 8000 z2i3 - 5000 z2i4 - 2000 z2i5 - 1000

for the second run the peptide intensity is low and it has the following distribution

z2i0 - 5000 z2i1 - 2000

saturation threshold is 6000. Now this becomes a problem since there are no peptide ions that are common that are below saturation threshold. The only option here is to predict the correct intensity of ions above saturation threshold, by using ions below saturation threshold in run 1. There is a simple function that can generate isotopic distribution and then we say given that

z2i3 - 5000 z2i4 - 2000 z2i5 - 1000, we should actually see z2i0 at 15000 and z2i1 at 10000. It should not be much work since the function is very simple (it is based on sample). See
IsotopicDistributionN {OrgMassSpecR} for an example

@sgibb
Copy link
Collaborator

sgibb commented Mar 31, 2015

  1. this will have to be loaded unfortunately from the P3D file

That's a pain! I will do it later if the whole method works as expected.

Regarding 3.: In theory it sounds simple (like always) but here I used the last three peptides of my example dataset and try to calculate the original counts using theoretical proportions of the isotopes (ignoring any saturation). As you can see the calculated values are sometimes very good and sometimes (especially for the less abundant isotopes) very bad.

# I changed the isotopic distribution format for easier parsing, order is equal
GYDLPDEVFDENEK      775                                       2_0:331;2_1:293;2_2:151;1_0:37
IFGLEK            64706 1_0:42448;1_1:17358;1_2:2956;1_3:417;2_0:1037;2_1:490;1_0:118;1_1:83
QNLFNNTVK          1266                                       2_0:703;2_1:399;2_2:164;1_0:33

Using the BRAIN package (which offers a similar function as OrgMassSpecR::IsotopicDistributionN but is a lot of faster) I get the following results:

library("BRAIN")
(props <- calculateIsotopicProbabilities(getAtomsFromSeq("GYDLPDEVFDENEK"), nrPeaks=3))
# [1] 0.3958128 0.3450117 0.1722195
c("2_0"=331, "2_1"=293, "2_2"=151)/props
#      2_0      2_1      2_2 
# 836.2540 849.2465 876.7883 
# original counts are 775

(props <- calculateIsotopicProbabilities(getAtomsFromSeq("IFGLEK"), nrPeaks=4))
# [1] 0.65718493 0.26507158 0.06427314 0.01155455
c("1_0"=42448, "1_1"=17358, "1_2"=2956, "1_3"=417)/props
#      1_0      1_1      1_2      1_3 
# 64590.65 65484.20 45991.22 36089.69
# original counts are 64706

(props <- calculateIsotopicProbabilities(getAtomsFromSeq("QNLFNNTVK"), nrPeaks=3))
# [1] 0.5474634 0.3145201 0.1056640
c("2_0"=703, "2_1"=399, "2_2"=164)/props
#      2_0      2_1      2_2 
# 1284.104 1268.599 1552.090
# original counts are 1266

If I don't know the true counts (because I assume they are affected by saturation) I would tent to use the less abundant isotopes (because it is less likely that they are saturated) to calculate the "original" counts. But as shown above these less abundant isotopes results in very wired counts. Slowly I doubt that it would be possible to estimate the true counts in a reasonable way.

BTW: Would it be save to sum up the counts of equal isotopes of different charge states? (e.g. 1_1 + 2_1 + 3_1) If that would be possible (as I assume) the results will differ even more.

@pavel-shliaha
Copy link
Collaborator Author

I understand the difficulty and it is expected. Perhaps it is indeed going to be complicated to calculate the intensity of saturated ions based on non-saturated ions, due to the fact that there is a more significant noise contribution to signal for less intense isotopes.

Also perhaps the situation outlined in my previous post is extremely rare and we dont even need this calculation. So I would prefer to have 2 separate functions: 1st - requantify with lower intensity isotopes, 2nd - predict intensity of higher abundant isotopes and requantify then.

The bottom line is I can only test whether this functionality is useful once it is available to me. Put it in synapter and I will test it using the saturation dataset we have,

@sgibb
Copy link
Collaborator

sgibb commented Apr 4, 2015

Ok, I just finished the first attempt to realize this feature. It is still clumsy but I hope it is enough to start experimenting with it.

default workflow

## start a new, clean R session
## install devtools if you haven't done it before
install.packages("devtools")
library("devtools")

## get the latest synapter with requantification support
install_github("lgatto/synapter@saturation")
library("synapter")

## load synapterdata example files
path <- system.file("extdata", package="synapterdata")

identfiles <- list.files(path, pattern="^HDMSe", full.names=TRUE) 
quantfiles <- list.files(path, pattern="^MSe.*IA", full.names=TRUE) 
pep3dfiles <- list.files(path, pattern="Pep3DAMRT", full.names=TRUE) 
fasta <- unzip(file.path(path, "EcoliK12_enolase_UPSsimga_NB.fasta.zip"))

set.seed(1)
s <- vector(mode="list", length=length(identfiles))

## run standard synapter workflow
system.time(
for (i in seq(along=s)) {
  s[[i]] <- Synapter(list(identpeptide=identfiles[i],
                          quantpeptide=quantfiles[i],
                          quantpep3d=pep3dfiles[i],
                          fasta=fasta))
  filterUniqueDbPeptides(s[[i]])
  filterIdentPepScore(s[[i]], method="BH", fdr=0.05)
  filterQuantPepScore(s[[i]], method="BH", fdr=0.05)
  filterIdentProtFpr(s[[i]], fpr=0.05)
  filterQuantProtFpr(s[[i]], fpr=0.05)
  filterIdentPpmError(s[[i]], ppm=20)
  filterQuantPpmError(s[[i]], ppm=20)
  mergePeptides(s[[i]])
  modelRt(s[[i]], span=0.05)
  searchGrid(s[[i]])
  setBestGridParams(s[[i]])
  findEMRTs(s[[i]])
})

## create sample names
sets <- paste0("ups", sub("^.*([0-9]{2,})fmol_.*([0-9]{2,})_IA.*", 
                          "\\1.\\2", basename(identfiles)))

## create msnsets
ms <- lapply(s, as, Class="MSnSet")
## update sampleNames/fvarLables of the msnsets
ms <- mapply(function(m, s) {
              sampleNames(m) <- s;
              m <- updateFvarLabels(m, s)
              m
}, m=ms, s=sets)

## combine msnsets
system.time(
  msc <- Reduce(combine, ms)
)

# ignore peptides with no isotopicDistr data for now (rescued)
#i <- which(rowSums(!is.na(fData(msc)[, grep("isotopicDistr", fvarLabels(msc))])) == ncol(msc))
#
#msc <- msc[i]

Nothing new yet.

requantification

The next step would be the requantification. There is a new function called requantify with three arguments (the MSnSet; saturationThreshold: the maximal unsaturated counts, method: could be "sum": the sum of the common isotopes below the saturationThreshold; "th.mean"/"th.median"/"th.weighted.mean": use the mean/median/weighted.median of the isotopes below the saturationThreshold multiplied with their theoretical probability of occurrence; the latter need the BRAIN package installed (call ìnstall.packages("BRAIN") before)).

summed <- requantify(msc, saturationThreshold=Inf, method="sum")
thmean <- requantify(msc, saturationThreshold=Inf, method="th.mean")
thmedian <- requantify(msc, saturationThreshold=Inf, method="th.median")
thwmean <- requantify(msc, saturationThreshold=Inf, method="th.weighted.mean")

synapter-plgs agreement

Another new function is the synapterPlgsAgreement discussed in #73.

msc <- synapterPlgsAgreement(msc)
cols <- grepl("synapterPlgsAgreement|nIdentified|nAgree|nDisagree", fvarLabels(msc))
head(fData(msc)[, cols])

What is missing/what to do next:

  • @pavel-shliaha has to test the correctness and usefulness of these functions.
  • There is no support for rescued EMRTs. As @pavel-shliaha mentioned before we have to read the isotopic distribution information from the ident pep3d file. I will add this if @pavel-shliaha told me that the functions produce useful results.
  • There is not any documentation of these functions yet (no vignette; not even a manual page).
  • We have to rethink about the new function/column/argument names.
  • Maybe a cleaner rewrite of this "quick hack" is needed.
  • The functions should be methods.

@pavel-shliaha I am looking forward to your feedback.

@lgatto
Copy link
Owner

lgatto commented Apr 6, 2015

A little technical comment/question regarding

The functions should be methods

Do you mean requantify? Why so? Using methods when functions work well enough or there is no requirement for dispatching on different inputs is not worth it. (I have been using methods when functions where fit for purpose too often in the past)

@sgibb
Copy link
Collaborator

sgibb commented Apr 6, 2015

@lgatto: Yes, I have seen that you change some methods to functions in some of your packages. What is the reason for that? Is there any drawback of S4 methods (except that dispatching needs some time and debugging is sometimes harder)?

@pavel-shliaha
Copy link
Collaborator Author

cannot find requantify function. Where does it live?

sgibb added a commit that referenced this issue Aug 20, 2015
@pavel-shliaha
Copy link
Collaborator Author

requantify(msc, saturationThreshold=Inf, method="sum") function does not work very well.

  1. easy to fix: if requantification is impossible can you not introduce NAs,but just leave the same quant values. Also introduce an additionl column "requantification" where you put TRUE, if requantification was succeful and FALSE if it was not. The reason is a user might not want to loose the data if requantification was not complete.

  2. do I understand correctly that you re just removing the isotopes that are above saturation threshold.

e.g.

peptide in run 1 isotopes: 100, 10, 1
peptide in run 2 isotopes: 1e4, 1e3,1e2

if saturation threshold is 1e3, do you jut remove the ions above that and get

peptide in run 1: 100 + 10 + 1
peptide in run 2: 1e2

if so than this should not be the case. You have to remove the isotopes that are above saturation threshold in one of the runs from all of the runs. So the result should be:

peptide in run : 1
peptide in run 2: 1e2

@sgibb
Copy link
Collaborator

sgibb commented Aug 23, 2015

1.) Is that really expected? I wouldn't expect that there are any original values in the MSnSet after requantification. Do you really think we should implement it the way you suggested? If the user want to mix up original and requantified values he could easily do so:

r <- requantify(msc, saturationThreshold=1e3, method="sum")
notNA <- !is.na(exprs(r))
exprs(msc)[notNA] <- exprs(r)[notNA]

2.) This is not true. The current behaviour is already the way you want it to be.
Default workflow:

library("devtools")
load_all("synapter")
path <- system.file("extdata", package="synapterdata")

identfiles <- list.files(path, pattern="^HDMSe", full.names=TRUE)
quantfiles <- list.files(path, pattern="^MSe.*IA", full.names=TRUE)
pep3dfiles <- list.files(path, pattern="Pep3DAMRT", full.names=TRUE)
fasta <- unzip(file.path(path, "EcoliK12_enolase_UPSsimga_NB.fasta.zip"))

set.seed(1)
s <- vector(mode="list", length=length(identfiles))

## run standard synapter workflow
system.time(
for (i in seq(along=s)) {
  s[[i]] <- Synapter(list(identpeptide=identfiles[i],
                          quantpeptide=quantfiles[i],
                          quantpep3d=pep3dfiles[i],
                          fasta=fasta))
  filterUniqueDbPeptides(s[[i]])
  filterIdentPepScore(s[[i]], method="BH", fdr=0.05)
  filterQuantPepScore(s[[i]], method="BH", fdr=0.05)
  filterIdentProtFpr(s[[i]], fpr=0.05)
  filterQuantProtFpr(s[[i]], fpr=0.05)
  filterIdentPpmError(s[[i]], ppm=20)
  filterQuantPpmError(s[[i]], ppm=20)
  mergePeptides(s[[i]])
  modelRt(s[[i]], span=0.05)
  searchGrid(s[[i]])
  setBestGridParams(s[[i]])
  findEMRTs(s[[i]])
})

## create msnsets
ms <- lapply(s, as, Class="MSnSet")

## combine msnsets
system.time(
  msc <- Reduce(combine, ms)
)

To show the selected isotopes I use some internal functions here:

f <- fData(msc)[, grep("isotopicDistr", fvarLabels(msc))]
x <- .splitIsotopicDistr(unlist(f[22, ]))
# $isotopicDistr.MSe_101111_25fmol_UPS1_in_Ecoli_01
#    1_0    1_1    1_2    1_3    2_0    2_1    2_2    2_3    2_4
#  16684   6747   1885    430 137906  55855  15305   3352    549
#
# $isotopicDistr.MSe_101111_25fmol_UPS1_in_Ecoli_03
#    1_0    1_1    1_2    1_3    2_0    2_1    2_2    2_3    2_4
#  10610   4355   1409    190 136850  57326  16114   1718    325
#
# $isotopicDistr.MSe_101111_25fmol_UPS1_in_Ecoli_05
#    1_0    1_1    1_2    1_3    2_0    2_1    2_2    2_3    2_4
#   9983   4544    746    148 136386  56244  15337   3329    645
#
# $isotopicDistr.MSe_111111_50fmol_UPS1_in_Ecoli_01
#    1_0    1_1    1_2    1_3    2_0    2_1    2_2    2_3    2_4
#  12127   5041    990    177 148808  60998  16693   3656    577
#
# $isotopicDistr.MSe_111111_50fmol_UPS1_in_Ecoli_03
#    1_0    1_1    1_2    1_3    2_0    2_1    2_2    2_3    2_4    2_5
#   8180   3325    927    205 133856  54500  15275   3102    498     81
#
# $isotopicDistr.MSe_111111_50fmol_UPS1_in_Ecoli_05
#    1_0    1_1    1_2    1_3    2_0    2_1    2_2    2_3    2_4
#  10893   4425   1365    276 141491  58884  16458   3489    345
#
.commonIsotopes(x, saturationThreshold=1e2)
# character(0)
.commonIsotopes(x, saturationThreshold=1e3)
# [1] "1_3" "2_4"

# 1_2 in run 1 is the only run where 1_2 > 1800 and all 1_2 isotopes are filtered
.commonIsotopes(x, saturationThreshold=1.8e3)
# [1] "1_3" "2_4"
.commonIsotopes(x, saturationThreshold=1.9e3)
# [1] "1_2" "1_3" "2_4"

Question: The current behaviour has some disadvantages. E.g. the third EMRT was seen in 5 of 6 runs with a good isotopic distribution. Because we require that the isotopes must be present in all runs (even when the EMRT was not observed at all), we could not find any common isotopes (because the first run has not any isotope) and could not requantify this EMRT:

x <- .splitIsotopicDistr(unlist(f[3, ]))
# $isotopicDistr.MSe_101111_25fmol_UPS1_in_Ecoli_01
# <NA>
#   NA
#
# $isotopicDistr.MSe_101111_25fmol_UPS1_in_Ecoli_03
#   1_0   1_1   1_2   1_3   2_0   2_1   2_2   2_3   2_4
# 10256  6073  1915   450 80564 48499 16732  4034   852
#
# $isotopicDistr.MSe_101111_25fmol_UPS1_in_Ecoli_05
#   1_0   1_1   1_2   1_3   2_0   2_1   2_2   2_3   2_4   2_5
#  9710  5588  1699   426 86293 46817 14378  3842   847   264
#
# $isotopicDistr.MSe_111111_50fmol_UPS1_in_Ecoli_01
#   1_0   1_1   1_2   1_3   2_0   2_1   2_2   2_3   2_4   2_5
#  9443  5068  1716   423 87367 47123 16806  4317   992   130
#
# $isotopicDistr.MSe_111111_50fmol_UPS1_in_Ecoli_03
#   1_0   1_1   1_2   1_3   2_0   2_1   2_2   2_3   2_4   2_5
# 10260  5859  1842   428 79717 46234 15919  3935  1026   138
#
# $isotopicDistr.MSe_111111_50fmol_UPS1_in_Ecoli_05
#   1_0   1_1   1_2   1_3   2_0   2_1   2_2   2_3   2_4   2_5
#  8439  5036  1629   347 68177 41550 14343  3415   751   122
#
.commonIsotopes(x, saturationThreshold=Inf)
# character(0)

I would suggest to ignore NA runs (where the EMRT was not observed) and requantify all the other runs (so in this example 2 runs out of 6 would be enough for requantification).

@pavel-shliaha
Copy link
Collaborator Author

I would actually like to suggest the following method to deal with the problem:

imagine we have a peptide at 10, 20, 50, 100 and 500 fmol (this is actually what we see in the organelle proteomics datasets a lot), the problem is that 10 and 500 fmol do not have any isotopes in common that are not over sat threshold in 500, but are still present at 10, so correction is not possible. 50 fmol however has common isotopes with both 10 and 500fmol, so we could quantify 10 vs 50 and 50 vs 500 to solve this problem. Thus my suggestion is:

  1. nominate a run where all isotopes are under sat threshold (reference run). If there is several such runs than we could either:
  • use the run with the overall peptide highest intensity (e.g. if we have 20 and 50, take 50)
  • use all runs and find an average
  1. requantify only those runs that have ions over sat threshold, i.e. leave 10 as it is, but requantify only 500.

@sgibb
Copy link
Collaborator

sgibb commented Aug 30, 2015

at 1) I would suggest to let the user choose a reference run and don't try to find a run automatically. (If we try to find a run automatically it could happen that the reference run differs across peptides. Or we have to analyse all runs at once.)
at 2) that's sounds reasonable. Additionally I would suggest to ignore NA runs.

@sgibb sgibb added this to the 2.0 milestone Aug 30, 2015
@sgibb sgibb self-assigned this Aug 30, 2015
@sgibb
Copy link
Collaborator

sgibb commented Sep 2, 2015

As discussed in our last skype call I add a new requantify method: "reference".

It looks for an unsaturated run (none of the isotopes is higher than the saturation threshold) with the highest summed intensity value and use this as reference. A run with intensity values above the saturation threshold would be corrected as follows:

  1. The unsaturated intensities are divided by their counterparts of the reference run. We get a scale factor for each isotope.
  2. We calculate the mean scale factor.
  3. We estimate the (unsaturated) intensities by multiplying the scale factor by the intensities of the reference run.

Example:

reference= "1_0:2000;1_1:200;1_2:20;1_3:2;"
satrun   = "1_0:6000;1_1:1000;1_2:100;1_3:10;1_4:1"

satThreshold = 5000

# 1. step
satrun[2:4]/reference[2:4] # 1000/200, 100/20, 10/2
# 5, 5, 5

# 2. step
scaleFactor = mean(c(5, 5, 5)) # 5

# 3. step 
unsatInt = reference[1]*scaleFactor # 10000
correctedInt = sum(10000, 1000, 100, 10, 1) # 11111

A more complete example (ref is run 3 and sat is run 4)

f <- data.frame(isotopicDistr.run1="1_0:1000;1_1:100;1_2:10;1_3:1;",
                isotopicDistr.run2="1_0:800;1_1:100;1_2:10;1_3:1;",
                isotopicDistr.run3="1_0:2000;1_1:200;1_2:20;1_3:2;",
                isotopicDistr.run4="1_0:6000;1_1:1000;1_2:100;1_3:10;1_4:1",
                stringsAsFactors=FALSE)

synapter:::.requantifyReferenceRun(f, saturationThreshold=Inf)
# isotopicDistr.run1 isotopicDistr.run2 isotopicDistr.run3 isotopicDistr.run4
#               1111                911               2222               7111
synapter:::.requantifyReferenceRun(f, saturationThreshold=5000)
# isotopicDistr.run1 isotopicDistr.run2 isotopicDistr.run3 isotopicDistr.run4
#               1111                911               2222              11111

The user visible function (msc object from the example code some comments above):

refcorrected <- requantify(msc, saturationThreshold=5000, method="reference")

@sgibb
Copy link
Collaborator

sgibb commented Nov 29, 2015

We need to review all saturation correction algorithms. Even "sum" has its bugs.

"Sum" (maybe others too) should get an option to decide whether we want to remove unsaturated isotopes that are below the threshold but not present in all runs (current behaviour) or if we want to keep them:

run1 z1:10, z2:5, z3:1; run2 z2:5
threshold: 20: run1: 15 (or 16, z3 is missing in run2), run2: 5
threshold: 8: run1: 5 (or 6?), run2: 5

@sgibb
Copy link
Collaborator

sgibb commented Dec 3, 2015

A question regarding reference run requantification:

We have the following situation:

reference run: z1_1: 100, z1_2: 10, z1_3
run500mol:     z1_1: 2000, z1_2: 1000, z1_3: 500, z2_1: 100, z2_2: 10

If the saturation threshold is 400, the reference run and run500mol have not any unsaturated isotope in common. How we handle this?

Sounds strange? But happens at least 2 times in your example data set, e.g. for GPFPIIV (second row, second plot):

ions_reference6

@pavel-shliaha
Copy link
Collaborator Author

"If the saturation threshold is 400, the reference run and run500mol have not any unsaturated isotope in common. How we handle this?"

you cant hadle this by reference approach. First some definitions so we are on the same page. Assuming we are talking about a particular peptide: SATURATED runs are those runs that have isotopes over saturation threshold. UNSATURATED runs are those runs that have isotopes below certain threshold. REFERENCE run is unsaturated run with the highest intensity.

The idea of SUM method is to use common isotopes below saturation threshold for requantification. However you will run into the problem there are no common isotopes between certain runs. This can be solved partly by reference method that does not require common isotopes between all runs, but requires common isotopes only between reference run and saturated runs. However there still can be situations in which there are no isotopes in common beow saturation theshold between reference and quantitation run, in which case requantification is impossible.

@sgibb
Copy link
Collaborator

sgibb commented Jan 25, 2016

@pavel-shliaha asked me to look at some wired results of the detector saturation correction using the reference method.

DDPHACYSTVFDK

dd

99.92 % of the signal after reference requantification is from the 500 fmol samples. Why this happen? If you look at the fData(x)$isotopicDistr columns you can see that there are not any information about the isotopic distribution available for all samples except the 500 fmol ones and a single 75 fmol.

As fData(x)$synapterPlgsAgreement shows there was no transfer because the grid search could not match ident and quant run. IHMO no requantification could solve this.
What to do? Would it be a good solution not to requantificated any peptides where some sample could not matched in the grid search?

load_all("../../synapter/")
combMSnSet <- readRDS(file.path("result", "combMSnSet.RDS"))

combScheme <- list("25" = 6:7, "50" = 11:13, "75" = 14:16,
                   "100" = 1:3, "250" = 4:5, "500" = 8:10)
j <- unlist(combScheme)
i <- which(featureNames(combMSnSet) == "DDPHACYSTVFDK")
dd <- combMSnSet[i]

fvarLabels(dd) <- gsub("29022012_Ecoli_spikes_", "", fvarLabels(dd))
iso <- fData(dd)[grepl("isotopicDistr", fvarLabels(dd))][j]
agree <- fData(dd)[grepl("synapterPlgsAgreement", fvarLabels(dd))][j]
rownames(iso) <- rownames(agree) <- NULL

e <- exprs(dd)[, j]
r <- exprs(requantify(dd, method="reference", sat=1e5))[, j]
rownames(e) <- rownames(r) <- NULL

e
#  29022012_Ecoli_spikes_25fmol_04  29022012_Ecoli_spikes_25fmol_06
#                            14090                            13879
#  29022012_Ecoli_spikes_50fmol_02  29022012_Ecoli_spikes_50fmol_04
#                            41539                            28718
#  29022012_Ecoli_spikes_50fmol_07  29022012_Ecoli_spikes_75fmol_02
#                            47001                            47370
#  29022012_Ecoli_spikes_75fmol_04  29022012_Ecoli_spikes_75fmol_06
#                            53378                            41788
# 29022012_Ecoli_spikes_100fmol_02 29022012_Ecoli_spikes_100fmol_04
#                            44478                            77878
# 29022012_Ecoli_spikes_100fmol_06 29022012_Ecoli_spikes_200fmol_02
#                            70682                           159945
# 29022012_Ecoli_spikes_200fmol_04 29022012_Ecoli_spikes_500fmol_02
#                           124565                           185455
# 29022012_Ecoli_spikes_500fmol_04 29022012_Ecoli_spikes_500fmol_06
#                           235531                           239742

r
#  29022012_Ecoli_spikes_25fmol_04  29022012_Ecoli_spikes_25fmol_06
#                                0                                0
#  29022012_Ecoli_spikes_50fmol_02  29022012_Ecoli_spikes_50fmol_04
#                                0                                0
#  29022012_Ecoli_spikes_50fmol_07  29022012_Ecoli_spikes_75fmol_02
#                                0                                0
#  29022012_Ecoli_spikes_75fmol_04  29022012_Ecoli_spikes_75fmol_06
#                                0                              518
# 29022012_Ecoli_spikes_100fmol_02 29022012_Ecoli_spikes_100fmol_04
#                                0                                0
# 29022012_Ecoli_spikes_100fmol_06 29022012_Ecoli_spikes_200fmol_02
#                                0                                0
# 29022012_Ecoli_spikes_200fmol_04 29022012_Ecoli_spikes_500fmol_02
#                                0                           185455
# 29022012_Ecoli_spikes_500fmol_04 29022012_Ecoli_spikes_500fmol_06
#                           235531                           239742

iso
#   isotopicDistr.25fmol_04 isotopicDistr.25fmol_06 isotopicDistr.50fmol_02
# 1                    <NA>                    <NA>                    <NA>
#   isotopicDistr.50fmol_04 isotopicDistr.50fmol_07 isotopicDistr.75fmol_02
# 1                    <NA>                    <NA>                    <NA>
#   isotopicDistr.75fmol_04 isotopicDistr.75fmol_06 isotopicDistr.100fmol_02
# 1                    <NA>         2_0:321;2_1:197                     <NA>
#   isotopicDistr.100fmol_04 isotopicDistr.100fmol_06 isotopicDistr.200fmol_02
# 1                     <NA>                     <NA>                     <NA>
#   isotopicDistr.200fmol_04
# 1                     <NA>
#                                                                                                         isotopicDistr.500fmol_02
# 1 2_0:40003;2_1:32600;2_2:18058;2_3:6621;2_4:1735;2_5:421;2_6:127;3_0:32190;3_1:25740;3_2:18706;3_3:6798;3_4:1939;3_5:435;3_6:82
#                                                                                                         isotopicDistr.500fmol_04
# 1 2_0:53185;2_1:45888;2_2:26425;2_3:9358;2_4:2456;2_5:531;2_6:119;3_0:36456;3_1:29678;3_2:20958;3_3:7640;3_4:2300;3_5:469;3_6:68
#                                                                                                  isotopicDistr.500fmol_06
# 1 2_0:53234;2_1:46184;2_2:24468;2_3:8270;2_4:2146;2_5:463;3_0:38756;3_1:34138;3_2:21667;3_3:7529;3_4:2267;3_5:493;3_6:127

agree
#   synapterPlgsAgreement.25fmol_04 synapterPlgsAgreement.25fmol_06
# 1            no_synapter_transfer            no_synapter_transfer
#   synapterPlgsAgreement.50fmol_02 synapterPlgsAgreement.50fmol_04
# 1            no_synapter_transfer            no_synapter_transfer
#   synapterPlgsAgreement.50fmol_07 synapterPlgsAgreement.75fmol_02
# 1            no_synapter_transfer            no_synapter_transfer
#   synapterPlgsAgreement.75fmol_04 synapterPlgsAgreement.75fmol_06
# 1            no_synapter_transfer            no_synapter_transfer
#   synapterPlgsAgreement.100fmol_02 synapterPlgsAgreement.100fmol_04
# 1             no_synapter_transfer             no_synapter_transfer
#   synapterPlgsAgreement.100fmol_06 synapterPlgsAgreement.200fmol_02
# 1             no_synapter_transfer             no_synapter_transfer
#   synapterPlgsAgreement.200fmol_04 synapterPlgsAgreement.500fmol_02
# 1             no_synapter_transfer                            agree
#   synapterPlgsAgreement.500fmol_04 synapterPlgsAgreement.500fmol_06
# 1                            agree                            agree

IGEEYISDLDQLR

ig

Same problem as above but just for one 500 fmol sample.

e
#  29022012_Ecoli_spikes_25fmol_04  29022012_Ecoli_spikes_25fmol_06
#                            17933                            24056
#  29022012_Ecoli_spikes_50fmol_02  29022012_Ecoli_spikes_50fmol_04
#                            61520                            55124
#  29022012_Ecoli_spikes_50fmol_07  29022012_Ecoli_spikes_75fmol_02
#                            55565                            74681
#  29022012_Ecoli_spikes_75fmol_04  29022012_Ecoli_spikes_75fmol_06
#                            77651                            74222
# 29022012_Ecoli_spikes_100fmol_02 29022012_Ecoli_spikes_100fmol_04
#                            89107                           113455
# 29022012_Ecoli_spikes_100fmol_06 29022012_Ecoli_spikes_200fmol_02
#                           105244                           225785
# 29022012_Ecoli_spikes_200fmol_04 29022012_Ecoli_spikes_500fmol_02
#                           215329                           214903
# 29022012_Ecoli_spikes_500fmol_04 29022012_Ecoli_spikes_500fmol_06
#                           252642                           280309

r
#  29022012_Ecoli_spikes_25fmol_04  29022012_Ecoli_spikes_25fmol_06
#                          17933.0                          24056.0
#  29022012_Ecoli_spikes_50fmol_02  29022012_Ecoli_spikes_50fmol_04
#                          61520.0                          55124.0
#  29022012_Ecoli_spikes_50fmol_07  29022012_Ecoli_spikes_75fmol_02
#                          55565.0                          74681.0
#  29022012_Ecoli_spikes_75fmol_04  29022012_Ecoli_spikes_75fmol_06
#                          77651.0                          74222.0
# 29022012_Ecoli_spikes_100fmol_02 29022012_Ecoli_spikes_100fmol_04
#                          89107.0                         113455.0
# 29022012_Ecoli_spikes_100fmol_06 29022012_Ecoli_spikes_200fmol_02
#                         105244.0                         225785.0
# 29022012_Ecoli_spikes_200fmol_04 29022012_Ecoli_spikes_500fmol_02
#                         215329.0                              0.0
# 29022012_Ecoli_spikes_500fmol_04 29022012_Ecoli_spikes_500fmol_06
#                         252642.0                         318415.1

iso
#                      isotopicDistr.25fmol_04
# 1 2_0:8013;2_1:6519;2_2:2411;2_3:749;2_4:241
#                                       isotopicDistr.25fmol_06
# 1 2_0:10824;2_1:8347;2_2:3532;2_3:861;3_0:197;3_1:175;3_2:120
#                                 isotopicDistr.50fmol_02
# 1 2_0:27650;2_1:21582;2_2:9207;2_3:2400;2_4:502;2_5:179
#                                                isotopicDistr.50fmol_04
# 1 2_0:23662;2_1:19309;2_2:8549;2_3:2520;2_4:674;2_5:244;3_0:108;3_1:58
#                                 isotopicDistr.50fmol_07
# 1 2_0:24159;2_1:19193;2_2:8596;2_3:2628;2_4:802;2_5:187
#                                  isotopicDistr.75fmol_02
# 1 2_0:32999;2_1:26262;2_2:11456;2_3:3084;2_4:654;2_5:226
#                                         isotopicDistr.75fmol_04
# 1 2_0:34799;2_1:27407;2_2:11170;2_3:3322;2_4:727;3_0:144;3_1:82
#                                                  isotopicDistr.75fmol_06
# 1 2_0:32917;2_1:25482;2_2:11430;2_3:3056;2_4:830;2_5:227;3_0:170;3_1:110
#                                                isotopicDistr.100fmol_02
# 1 2_0:39872;2_1:30827;2_2:13170;2_3:3771;2_4:972;2_5:236;3_0:188;3_1:71
#                                          isotopicDistr.100fmol_04
# 1 2_0:50963;2_1:39987;2_2:16566;2_3:4480;2_4:1097;3_0:228;3_1:134
#                                  isotopicDistr.100fmol_06
# 1 2_0:46435;2_1:38035;2_2:15497;2_3:4051;2_4:1019;2_5:207
#                                         isotopicDistr.200fmol_02
# 1 2_0:97225;2_1:80863;2_2:35200;2_3:9747;2_4:2216;2_5:455;2_6:79
#                                                                                  isotopicDistr.200fmol_04
# 1 2_0:91501;2_1:76143;2_2:32803;2_3:10120;2_4:2229;2_5:504;2_6:143;3_0:653;3_1:560;3_2:430;3_3:160;3_4:83
#   isotopicDistr.500fmol_02
# 1                     <NA>
#                                                                                  isotopicDistr.500fmol_04
# 1 1_0:401;1_1:372;2_0:94256;2_1:83636;2_2:51768;2_3:17209;2_4:3779;2_5:807;2_6:104;3_0:128;3_1:100;3_2:82
#                                                                                    isotopicDistr.500fmol_06
# 1 1_0:413;1_1:379;2_0:103359;2_1:95753;2_2:56364;2_3:18257;2_4:4074;2_5:751;2_6:150;3_0:361;3_1:259;3_2:189

agree
#   synapterPlgsAgreement.25fmol_04 synapterPlgsAgreement.25fmol_06
# 1                           agree                           agree
#   synapterPlgsAgreement.50fmol_02 synapterPlgsAgreement.50fmol_04
# 1                           agree                           agree
#   synapterPlgsAgreement.50fmol_07 synapterPlgsAgreement.75fmol_02
# 1                           agree                           agree
#   synapterPlgsAgreement.75fmol_04 synapterPlgsAgreement.75fmol_06
# 1                           agree                           agree
#   synapterPlgsAgreement.100fmol_02 synapterPlgsAgreement.100fmol_04
# 1                            agree                            agree
#   synapterPlgsAgreement.100fmol_06 synapterPlgsAgreement.200fmol_02
# 1                            agree                            agree
#   synapterPlgsAgreement.200fmol_04 synapterPlgsAgreement.500fmol_02
# 1                            agree             no_synapter_transfer
#   synapterPlgsAgreement.500fmol_04 synapterPlgsAgreement.500fmol_06
# 1                            agree                            agree

BTW: the "sum" requantification isn't working for these both examples (resulting in zero intensity for all samples).

@pavel-shliaha
Copy link
Collaborator Author

Thanks for having a look into it. In both cases it is the artifact of plotting requantified results, not of requantification itself. What I mean is that the code to plot requantification is just for the paper it is not a part of basic synapter functionality (unless we want to give visualisaion tool to the user?) About the peptides:

  1. for "DDPHACYSTVFDK" there is a plot before ("none") requantifiction and after. If there is no transfer then where does the "none" plot come from?
  2. for "IGEEYISDLDQLR" when making the plots IGNORE the peptide measurements in which transfer did not happen, i.e. the 500fmol point should be mean (c (225785.0, 318415.1)), not mean (c (225785.0, 318415.1, 0)). I am still confused why you loose a measurement even if you do not transfer by synapter (I always run code with "rescue" option).

@pavel-shliaha
Copy link
Collaborator Author

@sgibb still too peptides are requantified by sum but not by reference: "FFQELR" "ESNEITIIINPYRETVCFSVEPVK". Can u have a look?

@sgibb
Copy link
Collaborator

sgibb commented Apr 12, 2016

It is the other way around: both are requantified by reference but not by sum (what makes absolutely sense to me):

## fetch peptide of interest
poi <- combMSNset[grep("^FFQELR$", featureNames(combMSNset)),]
## grep isotopic distribution
iso <- fData(poi)[, grep("isotopicDistr", fvarLabels(combMSNset))]

m <- synapter:::.isotopicDistr2matrix(iso)
m
#                            1_0   1_1  1_2 1_3 1_4  2_0  2_1 2_2 2_3
# isotopicDistr.S130423_05   632   307   NA  NA  NA 6099 2445 696  NA
# isotopicDistr.S130423_07    NA    NA   NA  NA  NA   NA   NA  NA  NA
# isotopicDistr.S130423_09 37909 16706 2711 501 174 8502 3463 677 125
# isotopicDistr.S130423_11    NA    NA   NA  NA  NA   NA   NA  NA  NA
# isotopicDistr.S130423_13    NA    NA   NA  NA  NA   NA   NA  NA  NA
# isotopicDistr.S130423_06    NA    NA   NA  NA  NA   NA   NA  NA  NA
# isotopicDistr.S130423_08    NA    NA   NA  NA  NA 5538 1675  NA  NA
# isotopicDistr.S130423_10    NA    NA   NA  NA  NA 4842 1439 484  NA
# isotopicDistr.S130423_12    NA    NA   NA  NA  NA 4128 1625 371  NA
# isotopicDistr.S130423_14    NA    NA   NA  NA  NA 3386 1290 349  NA

exprs(requantify(poi, method="sum", saturationThreshold=3e4))
#        S130423_05 S130423_07 S130423_09 S130423_11 S130423_13 S130423_06
# FFQELR         NA         NA         NA         NA         NA         NA
#        S130423_08 S130423_10 S130423_12 S130423_14
# FFQELR         NA         NA         NA         NA
exprs(requantify(poi, method="reference", saturationThreshold=3e4))
#        S130423_05 S130423_07 S130423_09 S130423_11 S130423_13 S130423_06
# FFQELR      10179         NA      70768         NA         NA         NA
#        S130423_08 S130423_10 S130423_12 S130423_14
# FFQELR         NA       6765       6124         NA

@pavel-shliaha
Copy link
Collaborator Author

pavel-shliaha commented May 10, 2016

Having problem applying the rescaleForTop3() function. Below is a figure of three datasets:

  1. not requantified
  2. requantified using SUM method
  3. requantified using SUM method with the intensities repredicted by rescaleForTop3()

image

I had a look at a peptide I thin is requantified incorrectly:

image

as you can see the sum functionality has decreased the ratio from 0.2883538 to 0.207168 as expected but top3 requantitation has put it back up to 0.3473387. This probably should not happen. The code is in:
...\synapter2paper\kuharev2015\bugs_investigation\for_bug_investigation_requant_top3

@sgibb
Copy link
Collaborator

sgibb commented May 13, 2016

It seems that it is not a bug but a feature. rescaleForTop3 has an argument onlyForSaturatedRuns (default TRUE) that controls whether unsaturated runs should be rescaled or not. I modified your plotting function and marked all unsaturated peptides with black dots and you can see that they cause the wired pattern after rescaling (because they are not requantified or rescaled at all):
rescaletop3
(the right bottom panel was generated with rescaleForTop3(..., saturationThreshold=3e4, onlyForSaturatedRuns=FALSE))

You could find the modified code in the same directory.

@sgibb
Copy link
Collaborator

sgibb commented May 13, 2016

Another figure for the same fact (please note that the colours have different meanings now: blue: no saturation; red: saturation in one sample (A or B; that is the important fraction that cause the "strange" behaviour in the bottom left panel); green: saturation in both samples:
rescaletop3

@sgibb
Copy link
Collaborator

sgibb commented May 14, 2016

basic information

library (synapter)
library (MSnbase)

combMSNset <- readRDS ("refCombMSNSet.RDS")
satThreshold <- 3e4

## samples
a <- 1:5
b <- 6:10

## fetch peptide of interest
poi <- combMSNset[grep("^AGMVAGVIVNR$", featureNames(combMSNset)),]
## grep isotopic distribution
iso <- fData(poi)[, grep("isotopicDistr", fvarLabels(combMSNset))]
isom <- synapter:::.isotopicDistr2matrix(iso)
isom
##                            1_0  1_1  1_2 1_3   2_0   2_1   2_2  2_3 2_4
## isotopicDistr.S130423_05  1378  608  273  NA 25649 15016  5707 1150  NA
## isotopicDistr.S130423_07  1246  728  255  NA 25150 11517  4869 1230  NA
## isotopicDistr.S130423_09   857  357  180  NA 19434  9894  3891  589  NA
## isotopicDistr.S130423_11   930  365   NA  NA 19231 10580  4375   NA  NA
## isotopicDistr.S130423_13   759  400  147  NA 14761  8035  3321  813  NA
## isotopicDistr.S130423_06 10708 4943 1390 341 88884 63955 27550 6491  NA
## isotopicDistr.S130423_08  7132 3480  760 184 61927 45300 21154 5227 938
## isotopicDistr.S130423_10  5054 2233  644 139 55808 40075 17012 4064 678
## isotopicDistr.S130423_12  4741 2000  746 155 50171 35098 14078 3730 670
## isotopicDistr.S130423_14  3451 1723  500 154 39062 25263 11036 3078  NA
## all samples "a" are not saturated but all of "b"
synapter:::.runsUnsaturated(t(isom), saturationThreshold=satThreshold)
## isotopicDistr.S130423_05 isotopicDistr.S130423_07 isotopicDistr.S130423_09
##                     TRUE                     TRUE                     TRUE
## isotopicDistr.S130423_11 isotopicDistr.S130423_13 isotopicDistr.S130423_06
##                     TRUE                     TRUE                    FALSE
## isotopicDistr.S130423_08 isotopicDistr.S130423_10 isotopicDistr.S130423_12
##                    FALSE                    FALSE                    FALSE
## isotopicDistr.S130423_14
##                    FALSE
## run requantification and rescaling
req <- requantify(poi, method="sum", saturationThreshold=satThreshold,
                  onlyCommonIsotopes=FALSE)
top3 <- rescaleForTop3(poi, req, satThreshold, onlyForSaturatedRuns=TRUE)
top3all <- rescaleForTop3(poi, req, satThreshold, onlyForSaturatedRuns=FALSE)

intensity table

tab <- rbind(exprs(poi), exprs(req), exprs(top3), exprs(top3all))
rownames(tab) <- c("initial", "sum", "top3", "top3all")
knitr::kable(tab)
S130423_05 S130423_07 S130423_09 S130423_11 S130423_13 S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
initial 49781.00 44995.00 35202.00 35481.00 28236.00 204262.0 146102.00 125707.00 111389.00 84267.00
sum 9116.00 8328.00 5874.00 5670.00 5440.00 51423.0 38875.00 29824.00 26120.00 19942.00
top3 49781.00 44995.00 35202.00 35481.00 28236.00 130797.1 98880.57 75858.89 66437.57 50723.51
top3all 44431.54 40590.82 28629.98 27635.68 26514.66 250636.6 189477.44 145362.70 127309.34 97197.66

plot intensities

plot(NA, xlim=c(14, 18), ylim=c(-3, 3),
     xlab="intensity sample B", ylab="A/B")
abline(h=c(1, 0, -2), col="#808080")
col <- c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3")
points(log2(tab[, b]), log2(tab[, a]/tab[, b]), col=col, pch=20)
legend("topright", legend=rownames(tab), col=col, pch=20)

unnamed-chunk-3-1

intensity ratios

ratios <- rbind(exprs(poi)[, a]/exprs(poi)[, b],
                exprs(req)[, a]/exprs(req)[, b],
                exprs(top3)[, a]/exprs(top3)[, b],
                exprs(top3all)[, a]/exprs(top3all)[, b])
rownames(ratios) <- c("initial", "sum", "top3", "top3all")
knitr::kable(ratios)
S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
initial 0.2437115 0.3079698 0.2800321 0.3185324 0.3350778
sum 0.1772748 0.2142251 0.1969555 0.2170750 0.2727911
top3 0.3805972 0.4550439 0.4640458 0.5340502 0.5566650
top3all 0.1772748 0.2142251 0.1969555 0.2170750 0.2727911

detailed walkthrough for onlyForSaturatedRuns=TRUE

## exprs before requantification
eBefore <- exprs(poi)
eBefore
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR      49781      44995      35202      35481      28236
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR     204262     146102     125707     111389      84267
## exprs after requantification
eAfter <- exprs(req)
eAfter
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR       9116       8328       5874       5670       5440
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR      51423      38875      29824      26120      19942
## grep isotopic information
isotop <- as.matrix(iso)
isotop
##             isotopicDistr.S130423_05
## AGMVAGVIVNR "1_0:1378;1_1:608;1_2:273;2_0:25649;2_1:15016;2_2:5707;2_3:1150"
##             isotopicDistr.S130423_07
## AGMVAGVIVNR "1_0:1246;1_1:728;1_2:255;2_0:25150;2_1:11517;2_2:4869;2_3:1230"
##             isotopicDistr.S130423_09
## AGMVAGVIVNR "1_0:857;1_1:357;1_2:180;2_0:19434;2_1:9894;2_2:3891;2_3:589"
##             isotopicDistr.S130423_11
## AGMVAGVIVNR "1_0:930;1_1:365;2_0:19231;2_1:10580;2_2:4375"
##             isotopicDistr.S130423_13
## AGMVAGVIVNR "1_0:759;1_1:400;1_2:147;2_0:14761;2_1:8035;2_2:3321;2_3:813"
##             isotopicDistr.S130423_06
## AGMVAGVIVNR "1_0:10708;1_1:4943;1_2:1390;1_3:341;2_0:88884;2_1:63955;2_2:27550;2_3:6491"
##             isotopicDistr.S130423_08
## AGMVAGVIVNR "1_0:7132;1_1:3480;1_2:760;1_3:184;2_0:61927;2_1:45300;2_2:21154;2_3:5227;2_4:938"
##             isotopicDistr.S130423_10
## AGMVAGVIVNR "1_0:5054;1_1:2233;1_2:644;1_3:139;2_0:55808;2_1:40075;2_2:17012;2_3:4064;2_4:678"
##             isotopicDistr.S130423_12
## AGMVAGVIVNR "1_0:4741;1_1:2000;1_2:746;1_3:155;2_0:50171;2_1:35098;2_2:14078;2_3:3730;2_4:670"
##             isotopicDistr.S130423_14
## AGMVAGVIVNR "1_0:3451;1_1:1723;1_2:500;1_3:154;2_0:39062;2_1:25263;2_2:11036;2_3:3078"
## if we want to handle only saturated runs we have to know which ones are
## unsaturated (this code block is skipped for onlyForSaturatedRuns=FALSE
unsat <- t(apply(isotop, 1, function(x)synapter:::.runsUnsaturated(t(synapter:::.isotopicDistr2matrix(x)), saturationThreshold=satThreshold)))
unsat
##             isotopicDistr.S130423_05 isotopicDistr.S130423_07
## AGMVAGVIVNR                     TRUE                     TRUE
##             isotopicDistr.S130423_09 isotopicDistr.S130423_11
## AGMVAGVIVNR                     TRUE                     TRUE
##             isotopicDistr.S130423_13 isotopicDistr.S130423_06
## AGMVAGVIVNR                     TRUE                    FALSE
##             isotopicDistr.S130423_08 isotopicDistr.S130423_10
## AGMVAGVIVNR                    FALSE                    FALSE
##             isotopicDistr.S130423_12 isotopicDistr.S130423_14
## AGMVAGVIVNR                    FALSE                    FALSE
## replace requantified values of unsaturated runs with their original
## intensities
eAfter[unsat] <- eBefore[unsat]
eAfter
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR      49781      44995      35202      35481      28236
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR      51423      38875      29824      26120      19942

⚠️ Maybe it is wrong to include the unsaturated runs for proportion calculation here?!

## calculation proportions
prop <- eAfter/rowSums(eAfter, na.rm=TRUE)
prop
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR   0.138327  0.1250281 0.09781621 0.09859147  0.0784597
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR  0.1428897  0.1080224  0.0828723 0.07257995 0.05541307
## calculate correction factor
cf <- eBefore/prop
cf
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR     359879     359879     359879     359879     359879
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR    1429508    1352516    1516876    1534708    1520706
cfm <- rowMeans(cf, na.rm=TRUE)
cfm
## AGMVAGVIVNR
##    915370.9
## calculate new intensities
eNew <- cfm * prop
eNew
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR   126620.5   114447.1   89538.11   90247.76   71819.73
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR   130797.1   98880.57   75858.89   66437.57   50723.51
## replace unsaturated runs with original values
## (this code block is skipped for onlyForSaturatedRuns=FALSE
eNew[unsat] <- eBefore[unsat]
eNew
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR      49781      44995      35202      35481      28236
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR   130797.1   98880.57   75858.89   66437.57   50723.51

detailed walkthrough for onlyForSaturatedRuns=FALSE

## exprs before requantification
eBefore <- exprs(poi)
eBefore
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR      49781      44995      35202      35481      28236
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR     204262     146102     125707     111389      84267
## exprs after requantification
eAfter <- exprs(req)
eAfter
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR       9116       8328       5874       5670       5440
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR      51423      38875      29824      26120      19942
## calculation proportions
prop <- eAfter/rowSums(eAfter, na.rm=TRUE)
prop
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR 0.04544095 0.04151297  0.0292804 0.02826351 0.02711702
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR  0.2563306   0.193782  0.1486651  0.1302016 0.09940582
## calculate correction factor
cf <- eBefore/prop
cf
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR    1095510    1083878    1202238    1255364    1041265
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR   796869.3   753950.2   845571.8   855511.9   847706.9
cfm <- rowMeans(cf, na.rm=TRUE)
cfm
## AGMVAGVIVNR
##    977786.4
## calculate new intensities
eNew <- cfm * prop
eNew
##             S130423_05 S130423_07 S130423_09 S130423_11 S130423_13
## AGMVAGVIVNR   44431.54   40590.82   28629.98   27635.68   26514.66
##             S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
## AGMVAGVIVNR   250636.6   189477.4   145362.7   127309.3   97197.66

@pavel-shliaha
Copy link
Collaborator Author

Hey Sebastian I think u pinpointed the problem yourself here. It is indeed incorrect to include saturated runs into calculation of correction factor.

I believe you got confused: onlyForSaturTedRuns refers to whether u have to requantify all peptides or only those above saturation threshold, the correction factor should always be computed on unsaturated peptides only. This is analogous to isotopic correction we perform in theoretical methods.

@pavel-shliaha
Copy link
Collaborator Author

pavel-shliaha commented May 19, 2016

Hey Sebastian,

please execute this code and tell me what you see

library(synapter)

combMSNSet <- readRDS ("Y://RAW/pvs22//_QTOF_DATA_data3//synapter2paper//kuharev2015//synapter2//output//UDMSE//refcombMSNSet.RDS")
satThresholdIon <- 3e4

satCorrected <- sapply (c ("sat", "sum", "reference"),
                        function (x) NULL)

satCorrected[[1]] <- combMSNSet

satCorrected[[2]] <- requantify (combMSNSet, method = "sum", 
                                 saturationThreshold= satThresholdIon,
                                 onlyCommonIsotopes=FALSE)

satCorrected[[3]]    <- requantify (combMSNSet, method = "reference", 
                                    saturationThreshold= satThresholdIon)

# function to find requantified row
findDifferentquant <- function (MSnSet1, MSnSet2){
  selA <- c () 
  for (i in 1:nrow (MSnSet1))  if (any (exprs(MSnSet1)[i, ] != exprs(MSnSet2)[i, ], na.rm = TRUE)) selA <- c (selA, i)
  return (selA)
}

diffSum <-  findDifferentquant(satCorrected[[1]], satCorrected[[2]])
diffRef <-  findDifferentquant(satCorrected[[1]], satCorrected[[3]])

length (diffSum)
length (diffRef)

I see

length (diffSum)
[1] 3069

and

length (diffSum)
[1] 3069

length (diffRef)
[1] 3067

@sgibb
Copy link
Collaborator

sgibb commented May 20, 2016

@pavel-shliaha I got the same output. Now I finally understand what you mean with "FFQELR" and "ESNEITIIINPYRETVCFSVEPVK" are requantified by "sum" but not by "reference". You mean that the intensities with and without reference requantification are identical.

I looked into "FFQELR". Just as reminder the isotopic distribution:

## fetch peptide of interest
poi <- combMSNset[grep("^FFQELR$", featureNames(combMSNset)),]
## grep isotopic distribution
iso <- fData(poi)[, grep("isotopicDistr", fvarLabels(combMSNset))]

m <- synapter:::.isotopicDistr2matrix(iso)
m
#                            1_0   1_1  1_2 1_3 1_4  2_0  2_1 2_2 2_3
# isotopicDistr.S130423_05   632   307   NA  NA  NA 6099 2445 696  NA
# isotopicDistr.S130423_07    NA    NA   NA  NA  NA   NA   NA  NA  NA
# isotopicDistr.S130423_09 37909 16706 2711 501 174 8502 3463 677 125
# isotopicDistr.S130423_11    NA    NA   NA  NA  NA   NA   NA  NA  NA
# isotopicDistr.S130423_13    NA    NA   NA  NA  NA   NA   NA  NA  NA
# isotopicDistr.S130423_06    NA    NA   NA  NA  NA   NA   NA  NA  NA
# isotopicDistr.S130423_08    NA    NA   NA  NA  NA 5538 1675  NA  NA
# isotopicDistr.S130423_10    NA    NA   NA  NA  NA 4842 1439 484  NA
# isotopicDistr.S130423_12    NA    NA   NA  NA  NA 4128 1625 371  NA
# isotopicDistr.S130423_14    NA    NA   NA  NA  NA 3386 1290 349  NA

It is a special case: all but one run (S130423_09) are unsaturated. Now the algorithm looks for the best reference run (which is the one with the most unsaturated isotopes: S130423_09 (8 unsaturated and 1 saturated)). Subsequently the new intensity values are calculated for run S130423_09 (all other runs are unsaturated and not touched at all). The correction factor between the unsaturated isotopes of S130423_09 and the reference (S130423_09, too) is 1. That's why the saturated value is replaced by the identical value (the requantification works: saturated value * reference correction factor; and reference correction factor = mean ( unsaturated isotopes / unsaturated reference isotopes). So in this special case it is expected that the requantificated intensties are identical to the original onces.
requantify (poi, method = "sum", saturationThreshold= 3e4, onlyCommonIsotopes=FALSE) just removes isotope S130423_09:1_0 (that's why it differs slightly).

@sgibb
Copy link
Collaborator

sgibb commented May 22, 2016

For the ESNEITIIINPYRETVCFSVEPVK the only saturated run is also the reference run (same situation as for FFQELR above).

I hope the onlyForSaturatedRuns problem is fixed now:
rescaletop3

@pavel-shliaha
Copy link
Collaborator Author

pavel-shliaha commented May 24, 2016

there is a problem with theoretical method correction for some peptides, e.g. ILFDYSK. I have run the code and the correction is in the table presented below

library (synapter)
library (MSnbase)

combMSNSet <- readRDS ("Y://RAW/pvs22//_QTOF_DATA_data3//synapter2paper//kuharev2015//synapter2//output//UDMSE//refcombMSNSet.RDS")

satThresholdIon <- 3e4
satCorrected <- sapply (c ("sat", "th.mean", "th.median", "th.weighted.mean"), function (x) NULL)

satCorrected[[1]] <- combMSNSet

satCorrected[[2]] <- requantify (combMSNSet, method = "th.mean",
saturationThreshold= satThresholdIon,
requantifyAll=FALSE)

satCorrected[[3]] <- requantify (combMSNSet, method = "th.median",
saturationThreshold= satThresholdIon,
requantifyAll=FALSE)

satCorrected[[4]] <- requantify (combMSNSet, method = "th.weighted.mean",
saturationThreshold= satThresholdIon,
requantifyAll=FALSE)

xx <- rbind (exprs (satCorrected[[1]])[featureNames (satCorrected[[2]]) == "ILFDYSK", ],
exprs (satCorrected[[2]])[featureNames (satCorrected[[2]]) == "ILFDYSK", ],
exprs (satCorrected[[3]])[featureNames (satCorrected[[2]]) == "ILFDYSK", ],
exprs (satCorrected[[4]])[featureNames (satCorrected[[2]]) == "ILFDYSK", ])

row.names(xx) <- names (satCorrected)

image

could you please produce for "ILFDYSK" a detailed procedure of what is happening here, like you did for previous peptides when we had a problem?

@pavel-shliaha
Copy link
Collaborator Author

pavel-shliaha commented May 24, 2016

sorry yet another issue

sum method

library (synapter)
library (MSnbase)

combMSNSet <- readRDS ("Y://RAW/pvs22//_QTOF_DATA_data3//synapter2paper//kuharev2015//synapter2//output//UDMSE//refcombMSNSet.RDS")
satThresholdIon <- 3e4
satCorrected <- sapply (c ("sat", "sum")

satCorrected[[1]] <- combMSNSet

satCorrected[[2]] <- requantify (combMSNSet, method = "sum",
saturationThreshold= satThresholdIon,
onlyCommonIsotopes=TRUE)

exprs (satCorrected[[2]])[5, ]

all are NA! so requantification FAILED!

poi <- combMSNSet[5,]
iso <- fData(poi)[, grep("isotopicDistr", fvarLabels(combMSNSet))]
m <- synapter:::.isotopicDistr2matrix(iso)
m <- m[apply (m, 1, function (x) any (!is.na (x))), ]
m <- m[, apply (m, 2, function (x) all (!is.na (x)))]
m <- m[, apply (m, 2, function (x) all (x < saturationThreshold))]

m

image

but this shows requantification is possible

@sgibb
Copy link
Collaborator

sgibb commented May 24, 2016

The th.* problem:

combMSNset <- readRDS("refCombMSNSet.RDS")

## fetch peptide of interest
poi <- combMSNset[grep("^ILFDYSK$", featureNames(combMSNset)),]
## grep isotopic distribution
iso <- fData(poi)[, grep("isotopicDistr", fvarLabels(combMSNset))]

x <- synapter:::.isotopicDistr2matrix(iso)
saturationThreshold <- 3e4
unsat <- .isUnsaturatedIsotope(x, saturationThreshold=saturationThreshold)
#                          1_0  1_1  1_2   2_0   2_1
#isotopicDistr.S130423_05   NA   NA   NA FALSE FALSE
#isotopicDistr.S130423_07   NA   NA   NA FALSE FALSE
#isotopicDistr.S130423_09   NA   NA   NA FALSE FALSE
#isotopicDistr.S130423_11   NA   NA   NA FALSE FALSE
#isotopicDistr.S130423_13   NA   NA   NA FALSE FALSE
#isotopicDistr.S130423_06 TRUE TRUE TRUE FALSE FALSE
#isotopicDistr.S130423_08 TRUE TRUE   NA FALSE FALSE
#isotopicDistr.S130423_10 TRUE TRUE   NA FALSE  TRUE
#isotopicDistr.S130423_12   NA   NA   NA FALSE  TRUE
#isotopicDistr.S130423_14   NA   NA   NA FALSE  TRUE

Above we see the first problem: There are not any isotopes below the saturation threshold for the first 5 runs. So we can't predict anything here (explains the NA/0.0 in the first five columns of your table).

x
#                           1_0  1_1 1_2   2_0   2_1
#isotopicDistr.S130423_05    NA   NA  NA 70214 40605
#isotopicDistr.S130423_07    NA   NA  NA 64751 41339
#isotopicDistr.S130423_09    NA   NA  NA 66143 34858
#isotopicDistr.S130423_11    NA   NA  NA 54542 30211
#isotopicDistr.S130423_13    NA   NA  NA 47453 30213
#isotopicDistr.S130423_06 12833 5303 955 63133 33950
#isotopicDistr.S130423_08  9116 4281  NA 53849 31806
#isotopicDistr.S130423_10  9077 3418  NA 50046 24914
#isotopicDistr.S130423_12    NA   NA  NA 40965 20147
#isotopicDistr.S130423_14    NA   NA  NA 31636 19936

x <- x * unsat
#                           1_0  1_1 1_2 2_0   2_1
#isotopicDistr.S130423_05    NA   NA  NA   0     0
#isotopicDistr.S130423_07    NA   NA  NA   0     0
#isotopicDistr.S130423_09    NA   NA  NA   0     0
#isotopicDistr.S130423_11    NA   NA  NA   0     0
#isotopicDistr.S130423_13    NA   NA  NA   0     0
#isotopicDistr.S130423_06 12833 5303 955   0     0
#isotopicDistr.S130423_08  9116 4281  NA   0     0
#isotopicDistr.S130423_10  9077 3418  NA   0 24914
#isotopicDistr.S130423_12    NA   NA  NA   0 20147
#isotopicDistr.S130423_14    NA   NA  NA   0 19936

And here we see the second problem. Allmost all two-charged ions are saturated (and not used for the prediction) that's why our predicted intensities for S130423_06 and S130423_08 are very low. The runs with unsaturated two-charged ions S130423_10 and S130423_14 yield higher intensities.

The "sum" problem:

The isotopic matrix of the fifth peptide (LAQANGWGVMVSHR) is:

                           2_0   2_1  2_2  2_3  2_4   3_0   3_1   3_2   3_3   3_4
isotopicDistr.S130423_05 24557 19414 8161 3387 1424 66080 59557 33660 21962  9124
isotopicDistr.S130423_07 26498 21050 9577 3939 1922 68076 58367 42972 24453 13588
isotopicDistr.S130423_09    NA    NA   NA   NA   NA    NA    NA    NA    NA    NA
isotopicDistr.S130423_11 16756 16158 5719 2899 1341 56830 53333 30881 16859  7490
isotopicDistr.S130423_13 17117 12276 5233 2072   NA 48460 41008 30369 13261  8023
isotopicDistr.S130423_06 24851 17418 9016   NA   NA 65222 56007 37009    NA    NA
isotopicDistr.S130423_08 19034 13436 6568   NA   NA 52526 44629 32459    NA    NA
isotopicDistr.S130423_10 14319 11793 5010   NA   NA 55761 47528 25765    NA    NA
isotopicDistr.S130423_12 15707 13055 5352   NA   NA 49521 45595 26457    NA    NA
isotopicDistr.S130423_14 17540 12890 6435   NA   NA 44465 43195 29950    NA    NA

As you see the third run isotopicDistr.S130423_09 is completely missing. In the current definition of onlyCommonIsotopes=TRUE that means there is not any isotope present in all runs (because S130423_09 has no isotopes at all).

@pavel-shliaha
Copy link
Collaborator Author

for sum method can you please no consider runs, for which we dont have peptide identity i.e. all the isotopes are NA. Simply ignore the line or convert to 0. This is the final fix I need to finish the paper. For the theroretical methods we need to think what to do in instances like the one above where requantification is not possible.

@lgatto
Copy link
Owner

lgatto commented May 25, 2016

Simply ignore the line or convert to 0

Converting to 0 is not advisable. If a line is ignored, this would need to be recorded somewhere, or reported to the used at the very least. The best is to keep all features, but set those that don't return any value to NA. It is then just a matter of calling filterNA to remove then afterwards.

@pavel-shliaha
Copy link
Collaborator Author

sorry guys did not mean to tell you how to code

@lgatto
Copy link
Owner

lgatto commented May 25, 2016

sorry guys did not mean to tell you how to code

No worries - I just wanted to make sure we stay away from wild 0-imputation.

@sgibb
Copy link
Collaborator

sgibb commented May 25, 2016

Ok, now missing runs (runs without any recorded intensity value) are ignored for requantify(..., method="sum", onlyCommonIsotopes=TRUE):

combMSNset <- readRDS("refCombMSNSet.RDS")

## fetch peptide of interest
poi <- combMSNset[5,]
## grep isotopic distribution
iso <- fData(poi)[, grep("isotopicDistr", fvarLabels(combMSNset))]

# NONE
exprs(poi)
# FALSE
exprs(requantify(poi, method="sum", saturationThreshold=3e4, onlyCommon=FALSE))
# TRUE
exprs(requantify(poi, method="sum", saturationThreshold=3e4, onlyCommon=TRUE))
S130423_05 S130423_07 S130423_09 S130423_11 S130423_13 S130423_06 S130423_08 S130423_10 S130423_12 S130423_14
NONE 247326 270442 NA 208266 177819 209523 168652 160176 155687 154475
FALSE 88029 101027 NA 67222 57982 51285 39038 31122 34114 36865
TRUE 52132 57125 NA 38633 34626 51285 39038 31122 34114 36865

@pavel-shliaha
Copy link
Collaborator Author

pavel-shliaha commented Jun 13, 2016

Hey Sebastian. Problems again. Please have a look.

satThresholdIon <- 3e4
combMSNSet2  <- readRDS ("Y://RAW/pvs22//_QTOF_DATA_data3//synapter2paper//kuharev2015//synapter2//output//UDMSE//combMSNSet.RDS")
combMSNSet2 <- requantify (combMSNSet2, method = "sum", 
                           saturationThreshold= satThresholdIon, 
                           onlyCommonIsotopes=FALSE)

getting the following message:

Error in seq.default(from = 1L, to = nall, by = 2L) :
wrong sign in 'by' argument

already tried restarting R session and reinstalling synapter. Can you reproduce that?

@sgibb
Copy link
Collaborator

sgibb commented Jun 13, 2016

I never see this kind of error before. It seems that the peptide YATALAK(row 3412) has just NA values (except precursor.mhp.S130423_05: 737.4174). I don't know why this happen.

Nevertheless it was a bug, that the functions could not handle entries without any non-NA value. That is fixes now.

@pavel-shliaha: Why we didn't recognize this before. Where does YATALAK come from?

@pavel-shliaha
Copy link
Collaborator Author

lets keep this issue open for now

@pavel-shliaha
Copy link
Collaborator Author

pavel-shliaha commented Sep 16, 2016

requantify (readRDS("Y://RAW/pvs22//_QTOF_DATA_data3//synapter2paper//kuharev2015//synapter2_intensity//output//UDMSE//refcombMSNSetNS.RDS"), 
            method = "th.mean", 
            saturationThreshold= 3e4)

returns

Error in Mod(z$values) : non-numeric argument to function

could you please have a look

@sgibb
Copy link
Collaborator

sgibb commented Sep 16, 2016

I have currently no access to prot-filesrv1 (networkmanager-strongswan plugin seems to not accept/send the password). Can you send me the refcombMSNSetNS.RDS via e-mail/dropbox/google drive?

@pavel-shliaha
Copy link
Collaborator Author

shared the file with you via google drive

@sgibb
Copy link
Collaborator

sgibb commented Sep 16, 2016

Sorry, but I can't reproduce the error. Just works for me. Could you try again and directly call traceback() after the error. Also the output of sessionInfo() could be helpful.

@pavel-shliaha
Copy link
Collaborator Author

@sgibb started having trouble with the theoretical method as soon as I updated synapter. I have shared the file with you through google drive.

testMSNSet <- readRDS ("MSnbaseProblemRequant.RDS")

satThresholdIon <- 3e4

requantify (testMSNSet, method = "th.mean",
saturationThreshold= satThresholdIon)

error:

Error in Mod(z$values) : non-numeric argument to function

@pavel-shliaha
Copy link
Collaborator Author

Sorry guys now it works after restarting R session (but I also restarted before posting). I am not sure but perhaps this intermittent error has smth to do with the BRAIN package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants