Bug in clean method / utils.clean function? #210
It is not related to the 0s being at the start/end, but to the fact that we remove 0s only when there are 3 or more of them in a row:

```r
> test0 <- c(0, 0, 0, 1, 0, 1, 0, 0, 0)
> test0[MSnbase:::utils.clean(test0)]
[1] 0 0 1 0 1 0 0
> test1 <- c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0)
> test1[MSnbase:::utils.clean(test1)]
[1] 0 0 1 0 0 1 0 0
> test2 <- c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
> test2[MSnbase:::utils.clean(test2)]
[1] 0 0 1 0 0 1 0 0
> test3 <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)
> test3[MSnbase:::utils.clean(test3)]
[1] 0 0 1 0 0 1 0 0
```

The reason for this is that we want to keep the 0s before and after ranges of non-0s.
But actually, we could probably update the function to remove double leading/trailing zeros. Is there a good reason to do this?
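The four outputs above can all be reproduced by a simpler rule: keep every non-zero value and every zero directly next to one, and in addition always keep the first and last element. The sketch below is only a reconstruction of the observed results, not the `utils.clean` source; the helper name `mimic_clean` is hypothetical, the forced endpoints are an assumption based on the behaviour reported in this issue, and the input is assumed to contain no `NA`.

```r
## Hypothetical helper reconstructing the outputs shown above;
## it is NOT the MSnbase source code. Assumes x contains no NA.
mimic_clean <- function(x) {
    n <- length(x)
    if (n == 0) return(logical(0))
    notZero <- x != 0
    ## keep non-zeros and any zero next to a non-zero value
    keep <- notZero | c(notZero[-1], FALSE) | c(FALSE, notZero[-n])
    ## assumption: the first and last elements are always kept
    keep[c(1, n)] <- TRUE
    keep
}

test0 <- c(0, 0, 0, 1, 0, 1, 0, 0, 0)
test3 <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)
test0[mimic_clean(test0)]  # 0 0 1 0 1 0 0
test3[mimic_clean(test3)]  # 0 0 1 0 0 1 0 0
```

Dropping the `keep[c(1, n)] <- TRUE` line gives the stricter behaviour discussed here, in which only the zero next to the first or last non-zero value survives: `test0` would then be reduced to `0 1 0 1 0`.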
Reading the documentation, I understood that zeros adjacent to peaks are kept, while all those that are more than one position away from a non-zero value are removed. An alternative function to do this would be (note that this function would also handle …):

```r
#' @description Expands stretches of TRUE values in \code{x} by one on both
#' sides.
#'
#' @note The return value for a \code{NA} is always \code{FALSE}.
#'
#' @param x \code{logical} vector.
#'
#' @noRd
.grow_trues <- function(x) {
    previous <- NA
    x_new <- rep_len(FALSE, length(x))
    for (i in seq_along(x)) {
        if (is.na(x[i])) {
            previous <- NA
            next
        }
        ## If current element is TRUE
        if (x[i]) {
            x_new[i] <- TRUE
            ## if last element was FALSE, set last element to TRUE
            if (!is.na(previous) && !previous)
                x_new[i - 1] <- TRUE
        } else {
            ## if previous element was TRUE, set current to TRUE.
            if (!is.na(previous) && previous)
                x_new[i] <- TRUE
        }
        previous <- x[i]
    }
    x_new
}
```

This produces:

```r
> Test <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0)
> .grow_trues(Test > 0)
[1] FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
> Test[.grow_trues(Test > 0)]
[1] 0 1 1 0 0 1 1 1 0
```

While the function uses a …

```r
> library(microbenchmark)
> library(MSnbase)
> microbenchmark(.grow_trues(Test > 0), MSnbase:::utils.clean(Test))
Unit: microseconds
                        expr     min       lq      mean   median       uq       max neval cld
       .grow_trues(Test > 0)   6.570   7.5320  10.43929   9.6830  12.3165    30.389   100   a
 MSnbase:::utils.clean(Test) 353.821 368.2385 584.08324 423.5585 511.7995 11846.897   100   b
```

If you agree, I could replace the original with this one.
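As the `@note` in the roxygen header of `.grow_trues` says, an `NA` position is always `FALSE` in the result; an `NA` also stops the growth, because `previous` is reset to `NA`, so a `TRUE` on one side of an `NA` is not extended past it. The copy below uses `seq_along(x)` rather than `1:length(x)`, since `1:length(x)` evaluates to `c(1, 0)` for an empty vector and the loop then fails; the small input vectors are chosen only for illustration.

```r
## Copy of .grow_trues from the comment above, using seq_along(x)
## so that an empty input returns logical(0) instead of failing.
.grow_trues <- function(x) {
    previous <- NA
    x_new <- rep_len(FALSE, length(x))
    for (i in seq_along(x)) {
        if (is.na(x[i])) {
            ## NA: leave x_new[i] FALSE and stop any growth across it
            previous <- NA
            next
        }
        if (x[i]) {
            x_new[i] <- TRUE
            ## grow to the left if the previous element was FALSE
            if (!is.na(previous) && !previous)
                x_new[i - 1] <- TRUE
        } else {
            ## grow to the right if the previous element was TRUE
            if (!is.na(previous) && previous)
                x_new[i] <- TRUE
        }
        previous <- x[i]
    }
    x_new
}

.grow_trues(c(NA, TRUE, FALSE))  # FALSE TRUE TRUE: grows right, NA stays FALSE
.grow_trues(c(TRUE, NA, FALSE))  # TRUE FALSE FALSE: no growth across the NA
.grow_trues(logical(0))          # logical(0)
```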
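If we want to change the behaviour of …

```r
.vclean <- function(x) {
    ## TRUE for non-zero, non-missing values
    notZero <- x != 0 & !is.na(x)
    ## keep x[i] if it or one of its two neighbours is non-zero
    notZero | c(notZero[-1], FALSE) | c(FALSE, notZero[-length(notZero)])
}
```

It turns the numerical vector …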
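`.vclean` needs no loop: it shifts the `notZero` mask one position left and one position right and combines the three masks with `|`. On the `Test` vector used earlier in the thread it selects the same elements as `.grow_trues(Test > 0)`. Two differences are worth noting: it tests `x != 0` rather than `x > 0`, and an `NA` is marked `TRUE` when it sits next to a non-zero value, whereas `.grow_trues` always returns `FALSE` for `NA`.

```r
.vclean <- function(x) {
    ## TRUE for non-zero, non-missing values
    notZero <- x != 0 & !is.na(x)
    ## keep x[i] if it or one of its two neighbours is non-zero
    notZero | c(notZero[-1], FALSE) | c(FALSE, notZero[-length(notZero)])
}

Test <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0)
keep <- .vclean(Test)
keep        # same logical vector as .grow_trues(Test > 0)
Test[keep]  # 0 1 1 0 0 1 1 1 0
.vclean(c(NA, 5))  # TRUE TRUE: the NA is kept next to a non-zero value
```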
I am fine with changing the behaviour, and the speed gain is impressive here. @sgibb, can you send a PR?
Awesome!
I am not quite happy with how the current …

```r
> ints <- c(0, NA, 20, 0, 0, 0, 123, 124343, 3432, 0, 0, 0)
> keep <- MSnbase:::utils.clean(ints)
> ints[keep]
[1] NA 20 0 0 123 124343 3432 0
```

Here I don't like two things:
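The output above is exactly what the vectorised `.vclean` proposed earlier in the thread returns for this input, so this sketch assumes (without checking the MSnbase source) that the current `utils.clean` follows that logic. The `NA` in position 2 is kept because its right neighbour (20) is non-zero, the leading 0 is dropped because its only neighbour is the `NA`, and of each run of zeros only the values next to a non-zero intensity survive.

```r
## .vclean as proposed earlier in this thread
.vclean <- function(x) {
    notZero <- x != 0 & !is.na(x)
    notZero | c(notZero[-1], FALSE) | c(FALSE, notZero[-length(notZero)])
}

ints <- c(0, NA, 20, 0, 0, 0, 123, 124343, 3432, 0, 0, 0)
keep <- .vclean(ints)
which(keep)  # 2 3 4 6 7 8 9 10
ints[keep]   # NA 20 0 0 123 124343 3432 0
```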
Forgot to reopen the issue ;)
My 2 cents:
Here, I would suggest handling zeros and
This is, as far as I can remember, the expected behaviour, as each peak should keep its own pair of surrounding zeros. The reason is that if these two peaks are far apart, each should reach zero intensity next to its maximum when plotted:

```r
ints <- c(0, 10, 15, 2, 0, 0, 10, 30, 3, 0)
mz <- c(1, 2, 3, 4, 5, 100, 101, 102, 103, 104)
plot(mz, ints, type = "l")
```

rather than

```r
plot(mz[-6], ints[-6], type = "l")
```
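A `type = "l"` plot joins consecutive points with straight lines, so the effect of dropping the sixth element can be quantified with base R's `approx()`, which uses the same linear interpolation by default. The probe position `mz = 50`, between the two peaks, is an arbitrary choice for illustration.

```r
ints <- c(0, 10, 15, 2, 0, 0, 10, 30, 3, 0)
mz <- c(1, 2, 3, 4, 5, 100, 101, 102, 103, 104)

## Both zeros kept: the line stays at 0 between the two peaks
with_zeros <- approx(mz, ints, xout = 50)$y
## Sixth element dropped: the line now rises from 0 at mz 5 to 10 at mz 101
without_zero <- approx(mz[-6], ints[-6], xout = 50)$y

with_zeros    # 0
without_zero  # 4.6875, i.e. 10 * (50 - 5) / (101 - 5)
```

Without the second zero, the plot shows a spurious non-zero intensity across the whole gap between the two peaks.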
Agree on the two
To clarify, when I extract a
I'm fine with
Just a link to the
* master:
  * merge Sebastian's improvement, bump version
  * adapt MSnSet unit tests to new utils.clean implementation
  * rewrite utils.clean; closes #210

From: Laurent <lg390@cam.ac.uk>
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/MSnbase@129402 bc3139a8-67e5-0310-9ffc-ced21a209358

From: Sebastian Gibb <mail@sebastiangibb.de>
git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/MSnbase@130545 bc3139a8-67e5-0310-9ffc-ced21a209358
I was surprised by the result of a `utils.clean` call: it seems that the very first and the very last elements are always `TRUE`.
With a vector like
I was expecting to get:
but the result of `utils.clean` is:
The first and last elements are thus always set to `TRUE`, no matter what the next or previous elements are. @lgatto @sgibb, is this intentional?