Export Spectra as HDF5 format (.h5)? #229

Open
Don86 opened this issue Nov 26, 2021 · 6 comments

Comments

@Don86

Don86 commented Nov 26, 2021

Hi,

I'd like to ask whether there is currently a way to write out a Spectra S4 object, typically read initially from .mzML or .mzXML, as .h5? There doesn't seem to be this capability from what I've seen in the manual. HDF5 seems like a better storage option since it has a smaller file size, is well supported outside of the mass spec world, and is easily interoperable with Python as well.

Regards,
Don

@Don86 Don86 changed the title Export as HDF5 format (.h5) Export Spectra as HDF5 format (.h5)? Nov 26, 2021
@lgatto
Member

lgatto commented Nov 26, 2021

There's the MsBackendHdf5Peaks backend that stores the m/z and intensities on-disk in custom hdf5 data files. The spectra variables are still stored and manipulated in memory (in a DataFrame).

When you say HDF5 seems like a better storage option, I assume you are comparing it to mzML. Even though you aren't wrong, mzML (a specific XML-based implementation for MS data that is widely adopted) and HDF5 (a general data storage system) are hardly directly comparable.

@lgatto
Member

lgatto commented Nov 26, 2021

By the way, I'm transferring this issue from the RforMassSpectrometry.org repo to the Spectra package, which is where the backend class and interface are defined.

@lgatto lgatto transferred this issue from rformassspectrometry/RforMassSpectrometry.org Nov 26, 2021
@Adafede

Adafede commented Nov 24, 2022

Happy to find this issue still open! Would be great indeed 😊

@jorainer
Member

Note that there are different backends already available that support export in a variety of formats. You could import an mzML file and export it as an MGF file using the MsBackendMgf backend - but that might not be efficient. Alternatively, you could store the MS data from an mzML file in a SQL database (SQLite or MySQL) using the MsBackendSql backend - but again, that's not a standard format - it's a format we define. You could, however, also read that data from the SQLite or MySQL database with Python or other languages.
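
As a rough illustration of that last point, here is a minimal Python sketch of reading peak data back from a SQLite file using only the standard library. The table name `peaks`, its columns, and the float64 blob encoding are assumptions made up for this example; they are not the actual MsBackendSql schema, which should be checked in that package's documentation before adapting the query.

```python
import sqlite3
import struct


def pack_doubles(values):
    """Encode a sequence of floats as a little-endian float64 blob."""
    return struct.pack(f"<{len(values)}d", *values)


def unpack_doubles(blob):
    """Decode a little-endian float64 blob back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


def read_peaks(con, spectrum_id):
    """Return (mz, intensity) lists for one spectrum from the hypothetical table."""
    row = con.execute(
        "SELECT mz, intensity FROM peaks WHERE spectrum_id = ?", (spectrum_id,)
    ).fetchone()
    if row is None:
        raise KeyError(spectrum_id)
    return unpack_doubles(row[0]), unpack_doubles(row[1])


# Build a tiny in-memory database so the sketch runs on its own.
# Hypothetical layout: one row per spectrum, peaks stored as binary blobs.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE peaks (spectrum_id INTEGER PRIMARY KEY, mz BLOB, intensity BLOB)"
)
con.execute(
    "INSERT INTO peaks VALUES (?, ?, ?)",
    (1, pack_doubles([100.5, 200.25]), pack_doubles([10.0, 250.0])),
)
con.commit()

mz, intensity = read_peaks(con, 1)
print(mz, intensity)
```

For a real database file, `sqlite3.connect(":memory:")` would be replaced by the path to the file written from R, and the query adjusted to whatever tables that file actually contains.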

@Adafede

Adafede commented Nov 25, 2022

I saw them, and they are great for so many cases!

My (probably relatively rare) use case is matching (few) spectra against a (HUGE) spectral library, which stays fixed for a very long time. My feeling is that loading with an MGF backend takes ages, while loading with a DB backend is indeed faster, but still far from HDF5.

We faced this issue, with 99% of the time taken by loading the spectra (not the matching), in our https://github.com/mandelbrot-project/spectral_lib_matcher#using-binary-libraries, which is why we implemented binary libraries.

@jorainer
Member

@Adafede, if you have a huge reference spectral library, you might consider storing it in a CompDb database (from the CompoundDb package). That package also provides a Spectra backend that retrieves the data directly from the database. That should be faster than using an MGF backend.
