computation of conditions bug #52

pavel-shliaha · 2017-09-12T19:32:16Z

If I execute the following code I get

myoTD <- readTopDownFiles(path = getwd(),
                          type = c("b", "c", "y", "z"),
                          adducts = data.frame(mass=c(-H, H), name=c("cmH", "zpH"), to=c("c", "z")),
                          neutralLoss = NULL,
                          modifications = NULL,
                          sampleColumns = sampleColumns) 
myoTD <- filterInjectionTime(myoTD, maxDeviation=log2(3), keepTopN=2)
myoTD <- filterIntensity(myoTD, threshold=0.1)
myoTD <- filterCv(myoTD, threshold=30)
myoTD <- filterNonReplicatedFragments(myoTD)
ncb <- as(myoTD, "NCBSet")
ncbRed <- ncb[,ncb$AgcTarget == 5e5 & ncb$EtdReagentTarget %in% c(1e7, 0) & ncb$Mz == 707.3,]
ncbRedCond <- as.data.frame (bestConditions(ncbRed), stringsAsFactors = FALSE)
ncbRedCond["conditions"] <- row.names (ncbRed@colData)[ncbRedCond$index]
conditions

#   index bonds                              conditions
#1     95    90  C0707.30_5.0e+05_1.0e+07_07_05.00_00_1

so the best condition is index 95 with 90 bonds. However this is not the case: the best condition is 131 with 93 fragments

xx <- ncbRed@assay[, c(131, 95)]
sum(xx[, 1] == 1, xx[, 1] == 2, 2*sum(xx[, 1] == 3))
sum(xx[, 2] == 1, xx[, 2] == 2, 2*sum(xx[, 2] == 3))

C131 gives 93 and C95 gives 92 fragments. I imagine its a simple bug of not awarding 2 points for "Both" fragment (3 in your case)?

sgibb · 2017-09-12T21:00:09Z

You are right. Currently n-term, c-term, both are counted once. As far as I understood that's the way you suggested it in #22.

IMHO there was a misunderstanding. I asked:

Would you prefer a C-terminal fragment with higher intensity about a "both" fragment with lower intensity?

And you said

it depends on application. However in our case I can filter to select only reproducible fragments of high intensity before starting aggregation into N, C and Both

That's why I thought you don't care whether a bond is covered by one or two fragments and we look for the highest bond coverage (regardless its covered by one or two fragments).

But I understand that a bond is covered better by "both" than by a single "N" or "C".

I will change that.

pavel-shliaha · 2017-09-12T21:11:23Z

yes, sorry I understood your question in a very convoluted way ... I thought you were asking "why is "both" with low intensity fragments better (gets a score of 2) than N-terminal with high intensity (gets a score of one)". Facepalm.

Also now you probably understand why I suggested splitting the table into N- and C-terminal part for computing combinations... Given we have to test both the tables its much simpler to do this separately

sgibb · 2017-09-12T22:01:32Z

Ok, I changed the function. Now you get a three column matrix (with conditions as row.names); columns are: index, number of fragments, number of covered bonds:

bestConditions(n)
#                                       index fragments bonds
#C937.30_5.0e+05_0.0e+00_00.00_28_00_01  4830        72    69
#C703.20_1.0e+06_1.0e+07_02.50_00_21_1   3903        12    11
#C703.20_1.0e+06_1.0e+07_05.00_28_00_2   3945         8     8
#C703.20_5.0e+05_1.0e+06_15.00_00_00_2   2929         4     4
#C703.20_1.0e+06_5.0e+06_15.00_14_00_3   3830         3     3
#C937.30_5.0e+05_0.0e+00_00.00_35_00_07  4847         3     2
#C562.70_5.0e+05_5.0e+06_02.50_28_00_2   1052         2     2
#C562.70_1.0e+05_1.0e+06_02.50_00_14_3    129         1     1
#C562.70_5.0e+05_0.0e+00_00.00_00_14_01   702         1     1

Could you please test again? (Hopefully I don't mess everything up. It is dangerous to code at night ...)

pavel-shliaha · 2017-09-13T16:54:46Z

this is still a problem, and I think I know why. But just to reiterate, when we calculate coverage:

both - gets 2 points
N/C - 1 point
none - 0 points

when you are combining conditions for a particular bond

both + both = both - 0 points
both + N = both - 0 points
both + C = both - 0 points
none + none = none - 0 points
N + N = N - 0 points
C + C = C - 0 points
N + C = both - 1 point
both + N = both 1 point
both + C = both 1 point

the part that is wrong in your code is that

both + N gets 0 point
both + C gets 0 point

at least your code seems to perform this way numerically, i.e. when you add a new condition you do not reward having both in the position you only previously had N or C. Might it be smth to do with this

x[.row(x[, hc[1L], drop=FALSE]), ] <- 0L

here you seem to assign 0 to all rows even if you only had N/C in the bond?

sgibb · 2017-09-13T17:53:10Z

To be honest I don't completely understand your example. But should it work in the following way?

Example data:

Matrix:

bond	C1	C2	C3
1	N	B
2	N		C

First round: Select condition C1 (because 2 bonds covered) and return the following:

index	fragments	bonds
1	2	2

Now my code does the following (because all bonds are already covered by at least one fragment):

bond	C1	C2	C3
1
2

Am I right, that you suggest to degrade B in C2 to C (because N was already present in C1) and keep C in C3 (because bond 2 was not covered by B or C):

bond	C1	C2	C3
1		C
2			C

So you will get 2 more rounds:

index	fragments	bonds
1	2	2
2	1	1
3	1	1

pavel-shliaha · 2017-09-13T17:56:10Z

sorry I made a mistake in my description, since summing conditions is asymmetrical, as I just realise. From now on lets put already selected condition as the first one, so:

both + N = both - 0 points
N + both = both - 1 point

the latter is broken in your code.

pavel-shliaha · 2017-09-13T18:02:17Z

as to your comment: yes Both becomes C when N is already covered. But your code makes Both into nothing.

I understand this is confusing so:

THE BOTTOM LINE IS WE NEED TO ACCUMULATE AS MANY FRAGMENTS GOING BOTH DIRECTIONS. HAVING A FRAGMENT GOING IN THE DIRECTION THAT IS ALREADY PRESENT GIVES 0. HAVING A FRAGMENT GO IN A DIRECTION THAT IS ABSENT GIVES 1

hence my suggestion to split the table into 2 parts.

NFragTable <- as.matrix (ncbRed@assay)
NFragTable <- t(apply (NFragTable, 1, function(x) ifelse (x %in% c(1,3), TRUE, FALSE))) 

CFragTable <- as.matrix (ncbRed@assay)
CFragTable <- t(apply (CFragTable, 1, function(x) ifelse (x %in% c(2,3), TRUE, FALSE))) 


numCond <- 10
  
combTable <- as.data.frame (matrix (ncol = 2, nrow = numCond, 
                                      dimnames = list(NULL, c("index","fragments"))), stringsAsFactors = FALSE)
  
  for (i in 1:numCond){
    numBonds <- colSums (NFragTable) + colSums (CFragTable) 
    sel <- order (numBonds, decreasing = TRUE)[1]
    combTable[i, "index"] <- sel
    combTable[i, "fragments"] <- numBonds[sel]
    
    NFragTable <- NFragTable[!NFragTable[, sel], ] 
    CFragTable <- CFragTable[!CFragTable[, sel], ]
    
    sel <- NFragTable[, sel] | CFragTable[, sel]
    
    NFragTable <- NFragTable[!sel, ]
    CFragTable <- CFragTable[!sel, ]
}

sgibb · 2017-09-21T19:25:27Z

@pavel-shliaha could please (re)test bestConditions in the most recent version.

pavel-shliaha · 2017-09-21T22:13:19Z

yep, it now works!

pavel-shliaha · 2018-01-13T10:50:14Z

right now the bestConditions() returns a table with the number of fragments covered + index. I know we discussed this and I know I asked for fragments not bonds, but it seems the field requires to maximise bonds not fragments. Could you please now compute both: bonds covered and fragments number and introduce an argument, maximise = "bonds" or "fragments", with "fragments" as default. If "fragments" is selected as in now,the conditions are listed in order of increasing fragments if "bonds" are selected the conditions are returned in order of maximising bonds covered. However metrics are always returned

sgibb · 2018-01-15T09:35:10Z

I have a question regarding the final reporting table. The current approach is:

Find the highest fragment coverage (easy to change to highest bond coverage; I already did that.)
Remove the fragments from the whole dataset (easy to change for bonds)
Start at 1. again until now additional coverage could be generated.
Report results.

The problem is in the implementation of step 2. Image the following conditions:

	cond1	cond2
bond1	C	-
bond2	N	B

maximise fragments:

Select cond1 or 2 based on intensity (e.g. we select cond1)
Remove C from bond1 and N from bond2 (turn B in cond2 into a C).
Select cond2 which adds a C:
Report

index	fragments	bonds
1	2	2
2	1	?

The question is what should be reported for the ? in the table above? 1 because the C covers 1 bond (then the colSum would be larger than the real number of bonds). Or zero because the bond is already covered by condition 1 (then the colSum would be equal to the real bond number, but some fragments won't cover a bond)?

same problem for maximise bonds is IMHO easy:

Select cond1 because it covers 2 bonds.
Remove both bonds from the coverage set
Nothing left
Report

index	fragments	bonds
1	2	2

pavel-shliaha · 2018-01-15T09:51:15Z

In your example, the answer we are looking for is

index	fragments	bonds
1	2	2
2	1	0

however, what is important attempting to maximise bonds will generate a different optimal combination than attempting to maximise fragments. Hence we need an argument: maximise = "bonds" or "fragments" However in both cases (if possible, I would prefer to have both columns: fragments and bonds) calculated.

I know this is not what I initially wanted, but it seems the field requires to report coverage in terms of bonds, however, I would still like to have the fragment functionality calculated.

sgibb · 2018-01-15T18:53:30Z

As we discussed via skype we optimise for number of fragments (default) or bonds. If there are multiple conditions with the same number of fragments/bonds we choose the one with the highest overall intensity. The bonds (fragments) are ignored for optimisation if maximise="fragments" (maximise="bonds") but reported.

library("topdownr")
data(tds)
tds <- aggregate(tds)
ncb <- as(tds, "NCBSet")

bestConditions(ncb, maximise="fragments")
#                                index fragments bonds
# C1.0e+06_0.0e+00_00.00_00_14_1   219        99    98
# C1.0e+06_1.0e+06_05.00_21_00_1   160        36    31
# C1.0e+06_1.0e+06_02.50_00_21_1   154         9     6
# C1.0e+06_1.0e+06_15.00_14_00_1   181         5     2
# C1.0e+06_1.0e+06_02.50_00_07_1   152         5     4
# C5.0e+05_0.0e+00_00.00_00_21     144         4     1
# C1.0e+06_1.0e+06_02.50_00_14_1   153         3     2
# C1.0e+06_1.0e+06_10.00_00_21     176         3     1
# C1.0e+06_1.0e+06_10.00_21_00_1   171         2     0
# C5.0e+05_1.0e+06_10.00_00_00_1   104         2     0
# C1.0e+06_1.0e+06_15.00_00_14     186         2     2
# C5.0e+05_0.0e+00_00.00_35_00_1   141         1     0
# C1.0e+05_1.0e+06_02.50_00_00_1    11         1     0
# C1.0e+06_1.0e+06_10.00_28_00     172         1     0
# C1.0e+06_1.0e+06_05.00_28_00_1   161         1     1
# C5.0e+05_1.0e+06_05.00_00_07_1    88         1     1
# C1.0e+06_1.0e+06_30.00_00_00     201         1     0
# C1.0e+06_1.0e+06_10.00_00_07     174         1     0
# C5.0e+05_1.0e+06_15.00_07_00     105         1     0
# C5.0e+05_1.0e+06_02.50_00_21_1    79         1     0
# C1.0e+06_1.0e+06_10.00_00_14     175         1     0
# C5.0e+05_1.0e+06_10.00_00_14_1   100         1     0
# C5.0e+05_1.0e+06_50.00_28_00     130         1     0
# C5.0e+05_0.0e+00_00.00_07_00_1   137         1     0
# C1.0e+06_1.0e+06_10.00_00_35_1   178         1     0
# C1.0e+06_0.0e+00_00.00_21_00_1   215         1     0
bestConditions(ncb, maximise="bonds")
#                                index fragments bonds
# C1.0e+06_0.0e+00_00.00_00_14_1   219        99    98
# C1.0e+06_1.0e+06_10.00_00_07     174        35    32
# C1.0e+06_1.0e+06_05.00_00_07     163         5     5
# C5.0e+05_0.0e+00_00.00_35_00_1   141         6     3
# C1.0e+06_1.0e+06_10.00_28_00     172         2     2
# C1.0e+06_1.0e+06_05.00_28_00_1   161         4     2
# C5.0e+05_1.0e+06_05.00_00_14_1    89         4     2
# C5.0e+05_1.0e+06_02.50_00_21_1    79         3     2
# C5.0e+05_1.0e+06_05.00_00_07_1    88         2     1
# C1.0e+06_1.0e+06_02.50_00_07_1   152         2     1
# C1.0e+06_1.0e+06_15.00_00_14     186         1     1

pavel-shliaha · 2018-01-28T11:31:43Z

If possible could you please introduce 2 additional columns containing the total number of fragments and bonds covered in every condition. So now there will be 5 columns:

"index"
"fragments_added_to_combination"
"bonds_added_to_combination"
"fragments_in_codnition"
"bonds_in_condition"

this will make the table more infromative

sgibb · 2018-01-30T11:45:22Z

Now there are 5 columns:

library("topdownr")
data(tds)
tds <- aggregate(tds)
ncb <- as(tds, "NCBSet")

bestConditions(ncb, maximise="fragments", n=10)
#                                Index FragmentsAddedToCombination
# C1.0e+06_0.0e+00_00.00_00_14_1   219                          97
# C1.0e+06_1.0e+06_05.00_21_00_1   160                          37
# C1.0e+06_1.0e+06_02.50_00_07_1   152                           9
# C1.0e+06_1.0e+06_02.50_00_21_1   154                           6
# C1.0e+06_1.0e+06_15.00_14_00_1   181                           5
# C5.0e+05_0.0e+00_00.00_00_21     144                           4
# C1.0e+06_1.0e+06_02.50_00_14_1   153                           3
# C1.0e+06_1.0e+06_10.00_00_21     176                           3
# C1.0e+06_1.0e+06_10.00_21_00_1   171                           2
# C5.0e+05_1.0e+06_10.00_00_00_1   104                           2
#                                BondsAddedToCombination FragmentsInCondition
# C1.0e+06_0.0e+00_00.00_00_14_1                      96                   97
# C1.0e+06_1.0e+06_05.00_21_00_1                      32                   89
# C1.0e+06_1.0e+06_02.50_00_07_1                       8                   90
# C1.0e+06_1.0e+06_02.50_00_21_1                       3                   90
# C1.0e+06_1.0e+06_15.00_14_00_1                       2                   75
# C5.0e+05_0.0e+00_00.00_00_21                         1                   29
# C1.0e+06_1.0e+06_02.50_00_14_1                       2                   67
# C1.0e+06_1.0e+06_10.00_00_21                         1                   62
# C1.0e+06_1.0e+06_10.00_21_00_1                       0                   86
# C5.0e+05_1.0e+06_10.00_00_00_1                       0                   63
#                                BondsInCondition
# C1.0e+06_0.0e+00_00.00_00_14_1               96
# C1.0e+06_1.0e+06_05.00_21_00_1               88
# C1.0e+06_1.0e+06_02.50_00_07_1               90
# C1.0e+06_1.0e+06_02.50_00_21_1               87
# C1.0e+06_1.0e+06_15.00_14_00_1               71
# C5.0e+05_0.0e+00_00.00_00_21                 29
# C1.0e+06_1.0e+06_02.50_00_14_1               66
# C1.0e+06_1.0e+06_10.00_00_21                 62
# C1.0e+06_1.0e+06_10.00_21_00_1               83
# C5.0e+05_1.0e+06_10.00_00_00_1               62

pavel-shliaha added the bug label Sep 12, 2017

sgibb added this to the bioc release milestone Sep 12, 2017

sgibb self-assigned this Sep 12, 2017

sgibb added a commit that referenced this issue Sep 12, 2017

Count "both" fragments twice in coverage; see #52

e01c228

sgibb added a commit that referenced this issue Sep 21, 2017

Fix .bestNcbConditions; see #52

d474190

pavel-shliaha closed this as completed Sep 21, 2017

pavel-shliaha reopened this Sep 21, 2017

sgibb closed this as completed Sep 23, 2017

pavel-shliaha reopened this Jan 13, 2018

sgibb modified the milestones: bioc release, bioc 3.7 Jan 14, 2018

sgibb added a commit that referenced this issue Jan 15, 2018

.highestNcbCoverage gains a maximise argument; see #52

e7f1e18

sgibb added a commit that referenced this issue Jan 15, 2018

Add maximise argument to bestConditions; see #52

f3af26e

sgibb closed this as completed Jan 17, 2018

sgibb added a commit that referenced this issue Jan 30, 2018

bestCondition returns 5-column matrix now; see #52

302d2a7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

computation of conditions bug #52

computation of conditions bug #52

pavel-shliaha commented Sep 12, 2017 •

edited by sgibb

Loading

sgibb commented Sep 12, 2017

pavel-shliaha commented Sep 12, 2017 •

edited

Loading

sgibb commented Sep 12, 2017

pavel-shliaha commented Sep 13, 2017 •

edited

Loading

sgibb commented Sep 13, 2017

pavel-shliaha commented Sep 13, 2017

pavel-shliaha commented Sep 13, 2017

sgibb commented Sep 21, 2017

pavel-shliaha commented Sep 21, 2017

pavel-shliaha commented Jan 13, 2018

sgibb commented Jan 15, 2018

pavel-shliaha commented Jan 15, 2018 •

edited

Loading

sgibb commented Jan 15, 2018

pavel-shliaha commented Jan 28, 2018

sgibb commented Jan 30, 2018

computation of conditions bug #52

computation of conditions bug #52

Comments

pavel-shliaha commented Sep 12, 2017 • edited by sgibb Loading

sgibb commented Sep 12, 2017

pavel-shliaha commented Sep 12, 2017 • edited Loading

sgibb commented Sep 12, 2017

pavel-shliaha commented Sep 13, 2017 • edited Loading

sgibb commented Sep 13, 2017

pavel-shliaha commented Sep 13, 2017

pavel-shliaha commented Sep 13, 2017

sgibb commented Sep 21, 2017

pavel-shliaha commented Sep 21, 2017

pavel-shliaha commented Jan 13, 2018

sgibb commented Jan 15, 2018

pavel-shliaha commented Jan 15, 2018 • edited Loading

sgibb commented Jan 15, 2018

pavel-shliaha commented Jan 28, 2018

sgibb commented Jan 30, 2018

pavel-shliaha commented Sep 12, 2017 •

edited by sgibb

Loading

pavel-shliaha commented Sep 12, 2017 •

edited

Loading

pavel-shliaha commented Sep 13, 2017 •

edited

Loading

pavel-shliaha commented Jan 15, 2018 •

edited

Loading