Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in filterMz #184

Closed
lgatto opened this issue Jan 19, 2017 · 18 comments
Closed

Bug in filterMz #184

lgatto opened this issue Jan 19, 2017 · 18 comments
Labels

Comments

@lgatto
Copy link
Owner

lgatto commented Jan 19, 2017

On disk works

library("MSnbase")
mzf <- system.file("microtofq/MM14.mzML", package = "msdata")
od <- readMSData2(files = mzf, msLevel. = 1, centroided. = TRUE)
filterMz(od, mz = c(1005, 1006))

But note that this will lead to only empty spectra and that, at this stage, nothing has happened, as the filtering is recorded but not applied.

But in memory fails

im <- readMSData(files = mzf, msLevel. = 1, centroided. = TRUE)
filterMz(im, mz = c(1005, 1006))
Error in if (length(mz(object)) != peaksCount(object)) msg <- validMsg(msg,  : 
  argument is of length zero
In addition: There were 50 or more warnings (use warnings() to see the first 50)

I haven't done any additional, more extensive testing so far.

@lgatto lgatto added the bug label Jan 19, 2017
@lgatto
Copy link
Owner Author

lgatto commented Jan 19, 2017

Note that this came up while looking into a filterEmptySpectra function (see #181)

@lgatto
Copy link
Owner Author

lgatto commented Jan 20, 2017

Following up here - an empty spectra is not valid anymore!

> sp <- im[[1]]
> empty_sp <- filterMz(sp, mz = c(1005, 1006))
Warning message:
In trimMz_Spectrum(object, mzlim = mz, msLevel. = msLevel., ...) :
  No data points between 1005 and 1006 for spectrum with acquisition
number 1. Returning empty spectrum.
> empty_sp
Object of class "Spectrum1"
 Retention time: 4:30 
 MSn level: 1 
 Total ion count:  
 Polarity: 0 
> validObject(empty_sp)
Error in if (length(mz(object)) != peaksCount(object)) msg <- validMsg(msg,  : 
  argument is of length zero

@lgatto
Copy link
Owner Author

lgatto commented Jan 20, 2017

This above is now fixed in d1c7e07

Another issue is that

setMethod("filterMz", "Spectrum",
          function(object, mz, msLevel., ...) {
              return(trimMz_Spectrum(object, mzlim = mz,
                                     msLevel. = msLevel., ...))
          })

didn't check that filterMz,Spectrum (or trimMz_Spectrum) didn't check the validity of their return object. Adding this could have an impact on speed though.

@sgibb @jotsetung - should we add a call to validOject to trimMz_Spectrum?

@lgatto
Copy link
Owner Author

lgatto commented Jan 20, 2017

Note also that

> peaksCount(empty_sp)
integer(0)

but

> peaksCount(new("Spectrum1"))
[1] 0

This is not very consistent. Should we change that an empty spectrum has a peaks count of 0? @sgibb @jotsetung

@jorainer
Copy link
Collaborator

Regarding the validObject in trimMz_Spectrum: I would be against that - this would have a tremendous influence on performance for my data. But I understand that data validity is also important - now what about having a validity method that does not call all the validObject of all objects from which Spectrum inherits? one that does just validate the data within?

Regarding peaksCount: yup - having a consistent return value of 0 if there are not peaks/data would be great.

@lgatto
Copy link
Owner Author

lgatto commented Jan 20, 2017

now what about having a validity method that does not call all the validObject of all objects from which Spectrum inherits? one that does just validate the data within?

Could you clarify?

@jorainer
Copy link
Collaborator

Whenever validObject is called on an object, validObject is also called for all objects it inherits. In the case of Spectrum1 this would be Spectrum, Versioned and all objects Versioned inherits. My suggestion would be to have a function:

.validSpectrum <- function(object) {
    msg <- validMsg(NULL, NULL)
    if (any(is.na(intensity(object))))
        msg <- validMsg(msg, "'NA' intensities found.")
    if (any(is.na(mz(object))))
        msg <- validMsg(msg, "'NA' M/Z found.")
    if (any(intensity(object) < 0))
        msg <- validMsg(msg, "Negative intensities found.")
    if (any(mz(object)<0))
        msg <- validMsg(msg, "Negative M/Z found.")
    if (length(object@mz) != length(object@intensity))
        msg <- validMsg(msg, "Unequal number of MZ and intensity values.")
    if (length(mz(object)) != peaksCount(object))
        msg <- validMsg(msg, "Peaks count does not match up with number of MZ values.")
    if (any(diff(mz(object)) < 0))
        msg <- validMsg(msg, "MZ values are out of order.")
    if (is.null(msg)) TRUE
    else stop(msg)
}

which can be also added to the Spectrum definition

...
             mz = numeric(),
             intensity = numeric()),
         validity = .validSpectrum
})

and which is called directly whenever a faster validation is required (i.e. validSpectrum(object) instead of validObject(object)).

Benchmarking for this:

sp1 <- new("Spectrum1")
microbenchmark(validObject(sp1), .validSpectrum(sp1))
Unit: microseconds
                expr     min       lq      mean   median      uq     max neval
    validObject(sp1) 510.942 558.6075 594.44723 568.1550 584.061 975.295   100
 .validSpectrum(sp1)  25.774  29.3020  33.62809  31.8005  33.299  93.858   100
 cld
   b
  a 

@lgatto
Copy link
Owner Author

lgatto commented Jan 20, 2017

Regarding peaksCount: yup - having a consistent return value of 0 if there are not peaks/data would be great.

Done in 1381fcc

@lgatto
Copy link
Owner Author

lgatto commented Jan 20, 2017

@jotsetung - is your validSpectum(object) not equivalent to validObject(object, complete = FALSE)?

@jorainer
Copy link
Collaborator

actually, complete = FALSE is the default, with complete = TRUE its way slower. The .validSpectrum function also just checks the object, but not the objects from which it inherits. validObject on a Spectrum1 would call the validity method of all classes Spectrum1 inherits.

@lgatto
Copy link
Owner Author

lgatto commented Jan 20, 2017

Ah yes, I was not paying attention.

So may be there could be a validSpectrum function, but we keep validObject as the default, so that, as developers, we can choose which one to choose, but the default is still the full deep validity.

@lgatto
Copy link
Owner Author

lgatto commented Jan 20, 2017

If, in the Spectrum class definition, I define

       validity = function(object) validSpectrum(object)

we should end up with the validity defined in validSpectrum plus inherited checks, i.e. the original behaviour, right?

@jorainer
Copy link
Collaborator

Yes. I think it's even enough to just say validity = .validSpectrum.
You're implementing it now? me too :) - clash -

@lgatto
Copy link
Owner Author

lgatto commented Jan 20, 2017

I have it ready, testing right now. You can take over if you want. Just name it validSpectrum (drop the .) and don't forget to update the validity based on d1c7e07

@jorainer
Copy link
Collaborator

No prob, let's use your implementation.

@lgatto
Copy link
Owner Author

lgatto commented Jan 20, 2017

Done

@lgatto lgatto closed this as completed Jan 20, 2017
@jorainer
Copy link
Collaborator

So I'll not implement it as you already did.

jorainer added a commit that referenced this issue Jan 20, 2017
o Accessing slots directly.
o Using is.unsorted instead of any(diff(mz) < 0)
o related to issue #184.
@jorainer
Copy link
Collaborator

Small modification:

mz <- abs(rnorm(5000, sd = 4))
int <- abs(rnorm(5000, sd = 3, mean = 1000))
sp1 <- new("Spectrum1", mz = mz, intensity = int)

microbenchmark(any(diff(sp1@mz) < 0), is.unsorted(sp1@mz))

Unit: microseconds
                  expr    min      lq      mean   median       uq     max neval
 any(diff(sp1@mz) < 0) 79.970 95.7015 157.13452 173.8415 182.5305 746.460   100
   is.unsorted(sp1@mz) 17.496 18.0430  19.66587  18.7865  20.4275  51.865   100
 cld
   b
  a 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants