Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raw MSI spectra as MassSpectra objects - size in memory #58

Open
dsammour opened this issue Oct 30, 2018 · 25 comments
Open

raw MSI spectra as MassSpectra objects - size in memory #58

dsammour opened this issue Oct 30, 2018 · 25 comments

Comments

@dsammour
Copy link

Hi,

I am using MALDIquant extensively to process MSI datasets. These datasets tend to be extremely large (<50GB when loaded into memory into lists of MassSpectrum objects). The issue here is that each MassSpectrum object stores intensities and corresponding m/z values, which means huge redundancy and unnecessary memory usage since the mass axis is exactly identical for every spectrum. I was wondering if it would be possible to exclude the mass axis from MassSpectrum objects and possibly create a new type of object, say MassAxis, which can be passed as an argument to the processing functions. Obviously, this cannot be applied to MassPeak objects.

Thanks,
Denis

@sgibb
Copy link
Owner

sgibb commented Oct 30, 2018

Dear Denis,

thanks for you feature request. MALDIquant was written because we had to analyse MALDI data from different MS devices with different resolution. That is the reason why we choose to store the m/z for each spectrum separately.

I see at least two solutions to the problem.

  1. Create a new MassSpectrum/MassPeaks object that store their data on the disk instead of in memory.
  2. Create a new object that keeps the m/z values once along with all intensities.

I would prefer the first method because it greatly reduce the memory usage. The second method roughly halfs the memory usage.

You could also try to remove useless metaData (unfortunately they are redundant too).

Unfortunately I am not involved in any MS/MSI project anymore. While I am still maintaining MALDIquant (fixing bugs/writing little new features if needed) I don't have enough time for such a large feature.

If you like to develop one of the above solutions I am happy to assist/review/accept pull request. If you don't want/can't develop something like this you may want to have a look at Cardinal which was written for large IMS data sets and keeps everything on disk.

Best wishes,

Sebastian

@dsammour
Copy link
Author

dsammour commented Nov 2, 2018

Hi Sebastian,

Thanks for your reply. I also find the first option more compelling, in particular, there is a package called matter which is used internally in Cardinal. Theoretically, it can interface with the imzML binary file directly, sparing the need for file conversion and saving, but I am not sure whether writing back to imzML is a good idea. I'll try and give it a shot sometime later to create on-disk objects storing only intensity data that can interface with MALDIquant using createMassSpectrum so that the rest of the functions can be directly applied. Meanwhile, it would be nice if you just store this issue as a feature request for the future.

Best,
Denis

@sgibb
Copy link
Owner

sgibb commented Nov 3, 2018

I don't think we have to rewrite the data to imzML (btw: this is how MSnbase::OnDiskMSnExp works for mzML). I am not sure whether it is a good idea but the first time I thought about an on-disk object I was playing with a filesystem-based structure:

spectrum1/ # a folder
spectrum1/mz # binary file containing the mz values
spectrum1/intensity # binary file containing the intensity values
spectrum1/metaData # text INI/JSON/... file with metadata
spectrum2/
spectrum2/mz
spectrum2/intensity
spectrum2/metaData
...

If the m/z are identical for all spectra we could just create symlinks or keep them in memory.
While I thought it is a simple way it may be better to use a more standardized format (imzML, mzDB, ...).

As I said above regardless of the underlying solution I would be more than happy to accept pull requests that implementing an on-disk feature.

@dsammour
Copy link
Author

dsammour commented Nov 6, 2018

Hi Sebastian,

Thanks again for your comments.

I have been reading the documentation of matter package and I found it it to be very simple and versatile. In section 7 of this document they describe the possibility to extend classes with matter objects. I don't have much experience in dealing with classes in R, so I could use some help in the following:

To replicate the current class structure of MassSpectrum, I will need to have one object per spectrum which will later be stored in a list, say, a list of MassSpectrumOnDisk objects with only the metaData slot is being stored in memory while mass and intensity slots will be saved on disk. Ideally, to make all methods of MassSpectrum available for this new class, it should inherit from MassSpectrum class. However, for the on-disk functionality matter requires mass and intensity slots to be of type matter_df (or any other chosen matter object). Is it generally possible to update the type of slots for inheriting classes? Do you see another alternative?

Thanks,
Denis

@sgibb
Copy link
Owner

sgibb commented Nov 6, 2018

In R it is not possible to overwrite the type of slots in inherited classes. An alternative approach (that requires the change of the parent class AbstractMassObject and will break backward compatibility but this feature is worth to break it) are class unions as members, see setClassUnion, e.g.:

## create class unions (number or character/ mz&intensity or filename)
setClassUnion("NumericOrCharacter", c("numeric", "character"))

setClass("AbstractMassObject",
         slots=list(mass="NumericOrCharacter", intensity="NumericOrCharacter",
                    metaData="list"),
         contains="VIRTUAL")

setClass("MassSpectrum",
         slots=list(mass="numeric", intensity="numeric"),
         prototype=list(mass=numeric(), intensity=numeric(),
                        metaData=list()),
         contains="AbstractMassObject")

setClass("OnDiskMassSpectrum",
         slots=list(mass="character", intensity="character"),
         prototype=list(mass=character(), intensity=character(), metaData=list()),
         contains="AbstractMassObject")

ms <- new("MassSpectrum", mass=1:10, intensity=1:10)
oms <- new("OnDiskMassSpectrum", mass="filename", intensity="filename")

A possible class hierarchy would be:

AbstractMassObject (VIRTUAL, mass/intensity = "numeric"/"matter_vec", metaData="list/matter")
├── AbstractMassSpectrum (VIRTUAL, just for method implementation)
│   ├── MassSpectrum (mass/intensity = "numeric"/"double", metaData="list")
│   ├── MassSpectrumOnDisk (mass/intensity = "matter_vec", metaData="matter")
└── MassPeaks (mass/intensity = "numeric"/"double", metaData="list")

While I find matter is a really great, impressive package the decision to depend on matter is a tough one. Currently MALDIquant is on CRAN and matter on bioconductor. If we import matter the user can't install MALDIquant via install.packages("MALDIquant") but has to use install.packages("BiocManager"); BiocManager::install("MALDIquant"). IMHO there are at least two solutions:

  1. Move MALDIquant to bioconductor.
  2. Avoid matter and use something else.

I don't know which of the solutions is harder. I don't have any experience on moving from CRAN to bioconductor (IMHO it was the wrong decision in 2011 to send the package to CRAN and not to bioconductor ...).
EDIT: Maybe we could put matter into Suggests:. Then everybody could use MALDIquant in memory and if you want to create and use an on-disk object the constructor could show a message how to install matter.

@sgibb
Copy link
Owner

sgibb commented Nov 6, 2018

Here is an short example implementation of an on-disk class using the binary filesystem structure I mentioned earlier: https://gist.github.com/sgibb/b46a543d4bd1476a2abd412d23bf3780

I don't know whether there are any performance benefits using this approach or matter.

Unfortunately methods like plot, transformIntensity etc won't work because I used to access the slots directly via @mass/@intensity instead of their accessor methods mass()/intensity() to avoid method dispatching and save some time (it was a bad decision but it should be solvable with a sed command). Also most of the methods have to be carefully analyzed and partly rewritten to not access mass()/intensity() multiple times and use internal/local variables instead (wasn't needed by accessing the slot directly).

@dsammour
Copy link
Author

dsammour commented Nov 7, 2018

Thanks Sebastian for investing your time in the clarifications above.

The class hierarchy mentioned above makes a lot of sense. Regarding the whole Bioconductor thing, you may find this thread very interesting. It seems that one can set the repositories which install.packages uses (see ?setRepositories). One answer also mentions that one need only to add a line containing biocViews: in the DESCRIPTION file. Alternatively, as you mentioned, adding matter to Suggests: should be more than enough.

The following suggestion would probably belong to MALDIquantForeign instead. IMHO the most elegant solution for importing large imzML data, is actually not to import them altogether! Ideally, and having the implementation of the class hierarchy mentioned above by you, the raw data could be kept on disk (within the ibd file) and "attached" to the session via matter. In this situation, the loaded dataset will consist of a list of MassSpectrumOnDisk objects whose mass and intensity slots are matter_vec objects pointing to different offsets within the same ibd file. The metaData slot can be still loaded into memory because of its small footprint. To be safe, the ibd will be copied beforehand with the copy being the temporary file that is attached. Do you think this is possible?

I find the solution above intriguing, however, at the moment I can't invest time to try it out (not in this year at least), but I will definitely inform you if I get something going regarding this.

Thanks,
Denis

@sgibb
Copy link
Owner

sgibb commented Nov 7, 2018

Thank you for the thread.

Your suggestion to use the ibd file instead of inventing a new file format is great! I would definitively do that.

My approach has the disadvantage that it reads all mass/intensity data of a spectrum into memory. That's fine for most methods because they work on a single spectrum (and a single spectrum should fit in memory). matter offers the possibility to work on chunks (if a spectrum doesn't fit into memory) or you want to run methods across spectra.

While I really appreciate your help and effort and I am very interested in improving MALDIquant but I am curious why you want to use MALDIquant instead of Cardinal for IMS. Looking over matter and Cardinal it seems to be more sophisticated and better suited for IMS-analysis.

@dsammour
Copy link
Author

dsammour commented Nov 7, 2018

It's a valid point, but honestly I find the processing methods implemented in MALDIquant superior and clearer to the ones in Cardinal, and being stored in lists, it is easier (and somewhat faster) to parallelize. Using lists to store peak information is also more convenient as opposed to sparse matrices in Cardinal. In general, I prefer to use MALDIquant for preprocessing and Cardinal for visualization and manipulation.

@sgibb
Copy link
Owner

sgibb commented Nov 7, 2018

Nice, to hear! Thanks.

@dsammour
Copy link
Author

Hi Sebastian,

I implemented what we were discussing above! I created a new class MassSpectrumOnDisk whose mass and intensity slots are of class matter_vec. I used a class union of numeric and matter_vec for the AbstractMassObject i.e. setClassUnion("NumericOrOnDisk", c("numeric", "matter_vec")). For the most part, I avoided editing MassSpectrum and MassPeaks methods editing only some aspects of AbstractMassObject methods and adding methods for MassSpectrumOnDisk type. I tested most of the functions routinely used for preprocessing and everything seems to work flowlessly. I had some trouble with the trimming and subsetting methods since it does not make sense to trim on-disk objects, which I solved by creating new MassSpectrum objects whenever trimming is needed (for example for the PlotMsiSlice). calling detectPeaks on MassSpectrumOnDisk objects will create in-memory MassPeaks objects. I also updated importImzML in MALDIquantForeign to be able to create instances of MassSpectrumOnDisk objects successfully attaching the imzML dataset without loading it. Once can also use createMassSpectrumOnDisk:

> createMassSpectrumOnDisk(seq(1,100), seq(1,100))
S4 class type            : MassSpectrumOnDisk                                                                  
Number of m/z values     : 100                                                                                 
Range of m/z values      : 1 - 100                                                                             
Range of intensity values: 1e+00 - 1e+02                                                                       
Memory usage             : 13.516 KiB                                                                          
Path to on-disk spectra  : C:\Users\DENIS_~1\AppData\Local\Temp\RtmpUHGrxv\spectrum3b430a83046.mass
                         : C:\Users\DENIS_~1\AppData\Local\Temp\RtmpUHGrxv\spectrum3b41621428f.intensity

As an example, I have an example imaging dataset which consists of 10924 spectra with 10000 bins each:

disk = importImzMl(path = "F:\\R projects\\gist_big_experiment\\sample_test\\patAB.imzML", verbose = F, attachOnly = T, duplicateFile = T)

disk[[1]]
S4 class type            : MassSpectrumOnDisk   
Number of m/z values     : 10000                
Range of m/z values      : 299.95 - 1998.982    
Range of intensity values: 5.463e-03 - 7.607e-01
Memory usage             : 15.625 KiB           
File                     : F:\R projects\gist_big_experiment\sample_test\patAB.imzML
Path to on-disk spectra  : C:\Users\DENIS_~1\AppData\Local\Temp\RtmpUHGrxv\file3b43fafa7_patAB.ibd

MALDIquant:::.memoryUsageStr(disk)
[1] "166.77 MiB"

diskLoadTime ## time it took to load the spectra
Time difference of 39.32421 secs

> a = Sys.time();disk = removeBaseline(disk);processingTime = Sys.time() - a
> processingTime
Time difference of 27.21858 secs

In comparison, loading the spectra into memory:

mem = importImzMl(path = "F:\\R projects\\gist_big_experiment\\sample_test\\patAB.imzML", verbose = F, attachOnly = F)

 mem[[1]]
S4 class type            : MassSpectrum         
Number of m/z values     : 10000                
Range of m/z values      : 299.95 - 1998.982    
Range of intensity values: 5.463e-03 - 7.607e-01
Memory usage             : 159.805 KiB          
File                     : F:\R projects\gist_big_experiment\sample_test\patAB.imzML

> MALDIquant:::.memoryUsageStr(mem)
[1] "1.665 GiB"

> diskLoadTime
Time difference of 24.93157 secs

> a = Sys.time();mem = removeBaseline(mem);processingTime = Sys.time() - a
> processingTime
Time difference of 11.86915 secs

I made a lot of changes and updated the documentation files for both MALDIquant and MALDIquantForeign, it would be very nice if you can review my changes and perhaps test them. How should we manage that? should I push the project directly from rstudio to your github repository?

@sgibb
Copy link
Owner

sgibb commented Feb 26, 2019

Awesome! Looks promising!

Could you please fork the MALDIquant (and the MALDIquantForeign) repository and create a pull request? Then we could discuss and latter merge the code easily.

@dsammour
Copy link
Author

dsammour commented Feb 26, 2019

Done.

For MassSpectrumOnDisk I have noted that some functions ex. calibrateIntensity and averageMassSpectra are still evaluated sequentially even with mc.cores is set to >1. When packing these two inside an mclapply function, parallel evaluation proceeds without problems. Is it possible that they have a simple lapply instead of the .lapply function inside? cannot get my head around the code.

For matter package, I specified a version constraint (>= 1.8.0). This is the one I am using on my system but the classes used where already defined available in (>= 1.4.1). Maybe it is more convenient to use the latter.

@sgibb
Copy link
Owner

sgibb commented Mar 3, 2019

I fixed calibrateIntensity, thanks for finding this! I have to look into averageMassSpectra a little bit deeper.

@sgibb
Copy link
Owner

sgibb commented Jun 16, 2019

@dsammour sorry for the delay. I partly incorporated your PR and pushed a working (at least for me) version of the MassSpectrumOnDisk support to github (documentation and unit tests are still missing). Could you please try if this satisfy your needs?:

devtools::install("sgibb/MALDIquant@OnDiskVector")
devtools::install("sgibb/MALDIquantForeign@OnDiskVector")
library("MALDIquant")
library("MALDIquantForeign")

# example file from
# https://ms-imaging.org/wp/wp-content/uploads/2019/03/S042_Continuous_imzML1.1.1.zip
f <- "S042_Continuous.imzML"

mem <- import(f, verbose=FALSE)
ods <- import(f, attachOnly=TRUE, duplicateFiles=TRUE, verbose=FALSE)
odp <- import(f, attachOnly=TRUE, duplicateFiles=FALSE, verbose=FALSE)

mem[1:2]
# [[1]]
# S4 class type            : MassSpectrum
# Number of m/z values     : 300
# Range of m/z values      : 225 - 249.917
# Range of intensity values: 0e+00 - 8.61e+00
# Memory usage             : 8.625 KiB
# File                     : /home/sebastian/tmp/downloads/S042_Continuous.ibd
#
# [[2]]
# S4 class type            : MassSpectrum
# Number of m/z values     : 300
# Range of m/z values      : 225 - 249.917
# Range of intensity values: 0e+00 - 1.827e+01
# Memory usage             : 8.625 KiB
# File                     : /home/sebastian/tmp/downloads/S042_Continuous.ibd
#

ods[1:2]
# [[1]]
# S4 class type            : MassSpectrumOnDisk
# Number of m/z values     : 300
# Range of m/z values      : 225 - 249.917
# Range of intensity values: 0e+00 - 8.61e+00
# Memory usage             : 7.234 KiB
# File                     : /tmp/Rtmpzl41n8/S042_Continuous6fb97b37cbbb.ibd
#
# [[2]]
# S4 class type            : MassSpectrumOnDisk
# Number of m/z values     : 300
# Range of m/z values      : 225 - 249.917
# Range of intensity values: 0e+00 - 1.827e+01
# Memory usage             : 7.234 KiB
# File                     : /tmp/Rtmpzl41n8/S042_Continuous6fb97b37cbbb.ibd
#

odp[1:2]
# [[1]]
# S4 class type            : MassSpectrumOnDisk
# Number of m/z values     : 300
# Range of m/z values      : 225 - 249.917
# Range of intensity values: 0e+00 - 8.61e+00
# Memory usage             : 7.281 KiB
# File                     : /home/sebastian/tmp/downloads/S042_Continuous.ibd
#
# [[2]]
# S4 class type            : MassSpectrumOnDisk
# Number of m/z values     : 300
# Range of m/z values      : 225 - 249.917
# Range of intensity values: 0e+00 - 1.827e+01
# Memory usage             : 7.281 KiB
# File                     : /home/sebastian/tmp/downloads/S042_Continuous.ibd
#

@dsammour
Copy link
Author

@sgibb sorry for the huge delay as well. Just wanted to let you know that I have been working with the version I pushed for some time now ("onDiskVector" and "onDisk_support_for_imzML" branches of MALDIquant and MALDIquantForeign, respectively), everything seems to work smoothly. Note that I had to change the functionality of some methods to accommodate the MassSpectrumOnDisk objects, for example the "trim" method, does the trimming but returns a MassSpectrum object afterwards (thought that would be the easiest solution - trimming the file on disk would not make much sense!). However, under the "OnDiskVector" branch under your MALDIquant repository, I could not see these changes, did you have a different solution for the problem?

@foellmelanie
Copy link

Hi @sgibb and @dsammour ,

thank you very much for working on this. I hope you can continue this good work and make the changes available through CRAN soon.

I am looking forward to the support for on disk imzML :)

Thank you,
Melanie

@sgibb
Copy link
Owner

sgibb commented Feb 10, 2020

Sorry for this huge delay. Unfortunately the write support in the OnDiskVector class is not working for binary files with an offset (as nearly every .ibd file use it). If we disable write support these doesn't really make sense at all because you have to import all spectra at once into memory (instead one at the time its used) as soon as you call the first function.

Please see this stackoverflow question that describes the problem:
https://stackoverflow.com/questions/60158312/how-to-replace-a-value-in-a-binary-file-in-r

@dsammour did it really work for you?

@sgibb
Copy link
Owner

sgibb commented Feb 12, 2020

Ok, I found a solution for the above problem. IMHO it is working now as expected.
Could you please try the latest versions of my branches?:

devtools::install_github("sgibb/MALDIquant@OnDiskVector")
devtools::install_github("sgibb/MALDIquantForeign@OnDiskVector")

library("MALDIquant")
library("MALDIquantForeign")

# example file from
# https://ms-imaging.org/wp/wp-content/uploads/2019/03/S042_Continuous_imzML1.1.1.zip
f <- "S042_Continuous.imzML"

mem <- import(f)
ods <- import(f, attachOnly=TRUE)

@foellmelanie
Copy link

Thank you very much for following up on the on disk support.

I have tried your code in Rstudio cloud with R3.6.0.

Simple commands such as length(ods) and summary(ods) work.

But I get the same error for other commands such as:

avgSpectra = averageMassSpectra(ods,method="mean")
plot(ods[[1]])
any(sapply(ods, isEmpty))
transformIntensity(ods, method="sqrt")
calibrateIntensity(ods, method="TIC")

@sgibb
Copy link
Owner

sgibb commented Feb 13, 2020

@foellmelanie thanks for testing. Which kind of error do you get?

For me it works like expected:

#devtools::install_github("sgibb/MALDIquant@OnDiskVector")
#devtools::install_github("sgibb/MALDIquantForeign@OnDiskVector")

library("MALDIquant")
#> 
#> This is MALDIquant version 1.99.1
#> Quantitative Analysis of Mass Spectrometry Data
#>  See '?MALDIquant' for more information about this package.
library("MALDIquantForeign")

# example file from
# https://ms-imaging.org/wp/wp-content/uploads/2019/03/S042_Continuous_imzML1.1.1.zip
download.file(
    "https://ms-imaging.org/wp/wp-content/uploads/2019/03/S042_Continuous_imzML1.1.1.zip",
    "S042_Continuous_imzML1.1.1.zip"
)
unzip("S042_Continuous_imzML1.1.1.zip")
f <- "S042_Continuous.imzML"

mem <- import(f)
ods <- import(f, attachOnly=TRUE)

avgSpectra <- averageMassSpectra(ods, method="mean")
plot(ods[[1]])

any(sapply(ods, isEmpty))
#> [1] FALSE
ods <- transformIntensity(ods, method="sqrt")
ods <- calibrateIntensity(ods, method="TIC")
plot(ods[[1]])

@foellmelanie
Copy link

@sgibb sorry forgot to paste the error. Then its probably just a problem of the path because I used Rstudio cloud:

Error in file(con, "rb") : cannot open the connection In addition: Warning message: In file(con, "rb") : cannot open file '/tmp/RtmpfjI3Vf/S042_Continuous1482df6d1d5.ibd.intensity.1216.mod': No such file or directory

@sgibb
Copy link
Owner

sgibb commented Feb 15, 2020

Mh, I can't reproduce this error in https://rstudio.cloud/project/950000

@foellmelanie
Copy link

Thanks @sgibb, all good now, I think something was wrong with my file. I downloaded it again and now it works. Works also for other files.

@dsammour
Copy link
Author

dsammour commented Mar 11, 2020

Sorry for the late reply, I had a fare share of deadlines to meet.

I did two tests; one for the branch I suggested and one for the 'onDiskVector' branch. Both are working properly.

dsammour branch of MALDIquant and MALDIquantForeign

Note: on a linux-based system.

> d1 = MALDIquantForeign::importImzMl("/mass-store/export/process/imzML data clinical samples/raw imzml/[MODIFIED] 20160526_SET4_SL2_patAb.imzML", attachOnly = TRUE, verbose = FALSE)

NOTE: imzML dataset was loaded via attacheOnly option and a duplicate file was generate. Any changes made to the spectra are directly written to the duplicate file.

> d1[[1]]
S4 class type              : MassSpectrumOnDisk                                                                     
Number of m/z values       : 120000                                                                                 
Range of m/z values        : 300 - 2000                                                                             
Range of intensity values  : 0e+00 - 1.799e+00                                                                      
Memory usage               : 6.742 KiB                                                                              
File                       : /mass-store/disk3/process/imzML data clinical samples/raw imzml/[MODIFIED] 20160526_SET4_SL2_patAb.imzML
Path to on-disk mass values: /mass-store/ceph/tmp/RtmpfeOswi/file1d96ca2006d9_[MODIFIED] 20160526_SET4_SL2_patAb.ibd
Path to on-disk intensities: /mass-store/ceph/tmp/RtmpfeOswi/file1d96ca2006d9_[MODIFIED] 20160526_SET4_SL2_patAb.ibd

> plot(d1[[1]])

raw

> d1 = calibrateIntensity(d1, "TIC")

normalized

onDiskVector branch of MALDIquant and MALDIquantForeign

Note: On Windows.

> d1 = MALDIquantForeign::importImzMl("F:\\R projects\\GIST\\raw_data\\[MODIFIED] 20160526_SET4_SL2_patAb.imzML", attachOnly = TRUE, verbose = FALSE)

> d1[[1]]
S4 class type            : MassSpectrumOnDisk
Number of m/z values     : 120000            
Range of m/z values      : 300 - 2000        
Range of intensity values: 0e+00 - 1.594e+00 
Memory usage             : 7.211 KiB         
File                     : C:\Users\DENIS_~1\AppData\Local\Temp\RtmpigKF7b\[MODIFIED] 20160526_SET4_SL2_patAb142418bd719f.ibd

> plot(d1[[1]])
> d1 = calibrateIntensity(d1, "TIC")
> plot(d1[[1]])

The same plots as above were generated.

Notes

I think it would really make sense to generate a note when loading attached files as the one I added within ìmportImzML` for example:

if(attachOnly)
  {
         if(duplicateFile)
                message("\nNOTE: imzML dataset was loaded via attacheOnly option and a duplicate file was generate. ", 
                 "Any changes made to the spectra are directly written to the duplicate file.\n ")
         else
                message("\nNOTE: imzML dataset was loaded via attacheOnly option to the ORIGINAL FILE. ", 
                        "Any changes made to the spectra are directly written to the imzML file.\n ")
  }

I would also suggest to keep the following part which guards against the possibility that a processed imzML file is imported:

 if(isProcessed && attachOnly){
         message("The imzML file is of type 'processed'. The 'attachOnly' option is only available ", 
                 "for 'continuous' type and therefore will be overridden. In-memory MassPeaks objects will be created.")
         attachOnly <- FALSE
  }

So it will override the àttachOnlyvariable and will createMassPeaks` objects in memory in this situation.

I also didn't understand how the .isModified.OnDiskVector(x) works, could you @sgibb please clarify that.

On another note, the above code seems not to work with imzML files generated by MALDIquantForeign::exportImzML:

> d = MALDIquantForeign::importImzMl("F:\\R projects\\GIST\\sample_test\\patAB.imzML", attachOnly = TRUE, verbose = FALSE)
Invalid argumentCouldn't find end of Start Tag referenceableParamGr line 21338
Premature end of data in tag spectrum line 21337
Premature end of data in tag spectrumList line 94
Premature end of data in tag run line 93
Premature end of data in tag mzML line 2
Error: 1: Invalid argument2: Couldn't find end of Start Tag referenceableParamGr line 21338
3: Premature end of data in tag spectrum line 21337
4: Premature end of data in tag spectrumList line 94
5: Premature end of data in tag run line 93
6: Premature end of data in tag mzML line 2

where patAB.imzML was generated by èxportImzML` method. But I guess this is a secondary issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants