
Massive memory usage while using mergeMassPeaks and FilterPeaks #64

Open
dsammour opened this issue Mar 19, 2020 · 6 comments · May be fixed by #70
@dsammour

Hi Sebastian,

I am currently working with massive MassPeaks lists of MALDI-FTICR data. By massive I mean

> length(e$msDataPeaks)
[1] 41371        # number of spectra
> mean(lengths(e$msDataPeaks))
[1] 2027.565     # average number of peaks per spectrum

Everything works flawlessly, but I noticed (i) huge memory usage (up to 120 GB!) when calling mergeMassPeaks, and (ii) huge memory usage plus occasional error messages when calling filterPeaks. Both were called after binPeaks. The error message is as follows:

Error in which(is.na(m)) : 
  long vectors not yet supported: ../../src/include/Rinlinedfuns.h:138

I know that internally both functions construct intensity matrices, which blows up memory usage. Have you ever faced such issues? What would you recommend in this situation?
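For scale, a rough back-of-envelope estimate (the bin count below is hypothetical; the actual number of unique m/z bins after binPeaks is not stated here):

n_spectra <- 41371
n_bins    <- 3.6e5                # hypothetical unique m/z bins after binPeaks
n_peaks   <- n_spectra * 2028     # total stored peaks (~8.4e7 nonzero entries)

n_spectra * n_bins * 8 / 1024^3   # dense double matrix: ~111 GB
n_peaks * (8 + 4) / 1024^3        # sparse dgCMatrix (8-byte value + 4-byte row index): ~0.9 GB

So almost all cells of the dense matrix are zeros, which is exactly the case sparse storage is built for.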

Suggestion
For the internal construction of the intensity matrices, do you think it would be better to construct sparse matrices? For example, instead of the current implementation of .as.matrix.MassObjectList, one could use something like this:

# requires the Matrix package
.mass       <- unlist(lapply(focusRegion, MALDIquant::mass))
.intensity  <- unlist(lapply(focusRegion, MALDIquant::intensity))
.uniqueMass <- sort.int(unique(.mass))
n <- lengths(focusRegion)
r <- rep.int(seq_along(focusRegion), n)  # row index: one row per spectrum
i <- findInterval(.mass, .uniqueMass)    # column index: position of each peak mass
sparseMat <- Matrix::sparseMatrix(i = r, j = i, x = .intensity,
                                  dimnames = list(NULL, as.character(.uniqueMass)),
                                  dims = c(length(focusRegion), length(.uniqueMass)))

What do you think?

@sgibb
Owner

sgibb commented May 8, 2020

"Have you ever faced such issues? What would you recommend in this situation?"

No, but I have never had this much data.

Sparse matrices could be a solution, especially with the on-disk vector feature.

@YonghuiDong

@dsammour and @sgibb, sorry for posting an off-topic question here.

I also want to analyze my MALDI-FTICR data (MALDI profiling data, not MALDI imaging data) with MALDIquant, but I don't know how to convert it into the data types supported by MALDIquantForeign. How did you convert your data? Thanks.

Dong

@dsammour
Author

Hi @YonghuiDong, could you please open an issue in MALDIquantForeign and provide more details about the data structure, perhaps with an example? Thanks.

@YonghuiDong

Hi @dsammour,
Thanks for your suggestion. I have opened an issue in MALDIquantForeign; could you please have a look?

sgibb/MALDIquantForeign#31

Thanks a lot.

Dong

@sgibb sgibb linked a pull request Oct 20, 2021 that will close this issue
@paoloinglese
Contributor

Please have a look at PR #71. I've optimized the speed and memory usage of filterPeaks. The bottleneck in that function is the binary peak-occurrence matrix.
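For reference, the occurrence counting can in principle be done without materializing the binary matrix at all, e.g. by tabulating bin indices directly. This is only a sketch of the idea, not the actual implementation in PR #71, and it assumes binPeaks has already aligned the masses so each spectrum contributes at most one peak per bin:

# count, per unique m/z bin, how many spectra contain a peak there
.mass       <- unlist(lapply(peaks, MALDIquant::mass))
.uniqueMass <- sort.int(unique(.mass))
counts      <- tabulate(findInterval(.mass, .uniqueMass),
                        nbins = length(.uniqueMass))
# keep bins present in at least 25 % of spectra (threshold chosen for illustration)
keep <- .uniqueMass[counts / length(peaks) >= 0.25]

Memory here scales with the total number of peaks rather than with n_spectra × n_bins.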

@sgibb
Owner

sgibb commented Nov 8, 2021

Thanks to @paoloinglese, this is solved for filterPeaks in #71 and #72 (just merged into master, not yet on CRAN).
