Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate MALDIquant's baseline estimation #119

Closed
4 tasks
sgibb opened this issue Dec 3, 2023 · 6 comments
Closed
4 tasks

Integrate MALDIquant's baseline estimation #119

sgibb opened this issue Dec 3, 2023 · 6 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@sgibb
Copy link
Member

sgibb commented Dec 3, 2023

To replace MALDIquant with the more modern RforMassSpectrometry pipeline we have to integrate some of its functionality, e.g. baseline estimation and removal. The following functions, their C-code, documentation and tests have to be copied to MsCoreUtils:

https://github.com/sgibb/MALDIquant/blob/master/R/estimateBaseline-functions.R
https://github.com/sgibb/MALDIquant/blob/master/tests/testthat/test_estimateBaseline-methods.R

They already work with just doubles or matrix as input so I assume we don't need any recoding here. But the documentation has to be taken from https://github.com/sgibb/MALDIquant/blob/master/man/estimateBaseline-methods.Rd and its example section needs to be rewritten to work without the MALDIquant classes.

CC: @lgatto, @cpauvert

@lgatto
Copy link
Member

lgatto commented Dec 25, 2023

@sgibb - the MALDIquant code is licensed under GPL (>=3), while MsCoreUtils is Artistic-2.0. How do you prefer to handle porting the code? Should we use two licenses, of switch the the latter?

@lgatto
Copy link
Member

lgatto commented Dec 25, 2023

I see that in localMaxima.c the header is

/* Sebastian Gibb <mail@sebastiangibb.de>
 *
 * This file is taken from MALDIquant for R and related languages.
 *
 */

I will be using this one for the new c files.

@lgatto
Copy link
Member

lgatto commented Dec 25, 2023

The code with unit test and documentation is available in this branch. An example is however currently missing. I could re-use the example data from MALDIquant:

> library("MALDIquant")
> data("fiedler2009subset", package="MALDIquant")
> s <- fiedler2009subset[[1]]
> s <- list(m = mass(s), intenstity = intensity(s))
> str(s)
List of 2
 $ m         : num [1:42388] 1000 1000 1000 1000 1000 ...
 $ intenstity: int [1:42388] 3149 3134 3127 3131 3170 3162 3162 3152 3174 3137 ...

Do we want to add such a variable to the data? @sgibb @jorainer

Alternatively, I could suggest MALDIquant and load the data from there (as above). That would avoid adding 0.5 Mb of data to the package.

@lgatto
Copy link
Member

lgatto commented Dec 25, 2023

I tentatively implemented the second solution, but happy to adapt.

Examples:

     ## Using the first spectrum from and example dataset from
     ## the MALDIquant package, without loading the package
     data("fiedler2009subset", package="MALDIquant")
     mass <- fiedler2009subset[[1]]@mass
     intensity <- fiedler2009subset[[1]]@intensity
     rm(fiedler2009subset)
     
     ## Example spectrum
     plot(mass, intensity, type = "l")
     
     ## ----------------------------
     ## SNIP baseline
     base_SNIP <- estimateBaseline(mass, intensity,
                                   method = "SNIP",
                                   iterations = 100)
     lines(mass, base_SNIP, col = "red")
     
     ## ----------------------------
     ## TopHat baseline
     base_TH75 <- estimateBaseline(mass, intensity,
                                   method = "TopHat", halfWindowSize = 75)
     lines(mass, base_TH75, col = "blue")
     
     base_TH150 <- estimateBaseline(mass, intensity,
                                    method = "TopHat", halfWindowSize = 150)
     lines(mass, base_TH150, col = "steelblue")
     
     ## ----------------------------
     ## Convex hull baseline
     base_CH <- estimateBaseline(mass, intensity,
                                 method = "ConvexHull")
     lines(mass, base_CH, col = "green")
     
     ## ----------------------------
     ## Median baseline
     base_med <- estimateBaseline(mass, intensity,
                                  method = "median")
     lines(mass, base_med, col = "orange")
     
     legend("topright", lwd = 1,
             legend = c("SNIP", "TopHat (hws = 75)",
                        "TopHat (hws = 150)",
                        "ConvexHull", "Media"),
             col = c("red", "blue", "steelblue",
                     "green", "orange"))

@sgibb
Copy link
Member Author

sgibb commented Dec 27, 2023

@lgatto thanks for looking into this.

I am fine with Artistic-2.0 for my code. I guess we could add a line "License: Artistic-2.0" to be clear?

I could also change MALDIquant from GPL to Artistic-2.0?

Instead of adding 0.5 Mb of data or adding MALDIquant to suggest we could use a (poor) simulation as example:

nmz <- 5000
mz <- seq(1000, length.out = nmz)

## create peaks
center <- seq(50, nmz, by = 500)
peaks <- lapply(center, function(cc)1000 * dpois(0:100, (1000 + cc) / 75))

## create baseline
intensity <- 100 * exp(-seq_len(nmz)/2000)

## add peaks to baseline
for (i in seq(along = center)) {
    intensity[center[i]:(center[i] + 100)] <-
        intensity[center[i]:(center[i] + 100)] + peaks[[i]]
}

## add noise
intensity <- intensity + rnorm(nmz, mean = 0, sd = 1)

plot(mz, intensity, type = "l")

Created on 2023-12-27 with reprex v2.0.2

@lgatto
Copy link
Member

lgatto commented Dec 27, 2023

Thanks, changed here.

lgatto added a commit that referenced this issue Dec 28, 2023
* feat: prepare matrix aggregation

* prepare common annotate manual page

* work on annotate manual page

* add aggregate_by_matrix() examples

* test: add aggregate_by_matrix tests

* docs: improve aggregate man page

* docs: import aggregate man

* fix error with man link

* fix type news

* Apply suggestions from code review

Co-authored-by: Sebastian Gibb <mail@sebastiangibb.de>

* Implement sgibb's review suggestions

* docs: fix defintion of MAT

* docs: name Samuel's contribution in aggregate man page

* fix: import as and revert back to lies

* review: no need to set factor, but set names

* fix: specify Matrix::crossproduct()

* fix: set colnames after aggregation

* refact: use Matrix::colSums() by default, also for matrix

* porting MALDIquant's baseline estimation

* export C functions

* fix unit tests

* clean up/fix tests and man page

* use bioc 3.19

* add baseline estimation example

* update example

* gha release and devel

* update description

* use simulated data, export individual baseline functions

* Update R/estimateBaseline.R

Co-authored-by: Sebastian Gibb <mail@sebastiangibb.de>

* Update R/estimateBaseline.R

Co-authored-by: Sebastian Gibb <mail@sebastiangibb.de>

* Update R/estimateBaseline.R

Co-authored-by: Sebastian Gibb <mail@sebastiangibb.de>

* Update R/estimateBaseline.R

Co-authored-by: Sebastian Gibb <mail@sebastiangibb.de>

* Update R/estimateBaseline.R

Co-authored-by: Sebastian Gibb <mail@sebastiangibb.de>

* Update R/estimateBaseline.R

Co-authored-by: Sebastian Gibb <mail@sebastiangibb.de>

---------

Co-authored-by: Sebastian Gibb <mail@sebastiangibb.de>
@lgatto lgatto closed this as completed Dec 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants