The mzR R/Bioconductor package provides a unified API to the common open and community-driven file formats and parsers available for mass spectrometry data, namely
mzData (see vignette for details). It uses C++ code from other third-party open-source projects and relies heavily on the Rcpp package to, notably, provide a direct mapping from C++ to R.
mzR provides two actual backends to read Mass Spectrometry raw data:
netCDF, which reads, as the name implies,
mzXML via the ISB RAMP parser. This backend can also read mzML through the proteowizard RAMPadapter around the proteowizard infrastructure, but this interface is limited to the lowest common denominator between the
This project is intended to add several related backends to mzR by providing a direct wrapper around, and full access to, the proteowizard msdata object. The candidate will interact closely with Laurent Gatto and Steffen Neumann, and the proteowizard and
The pwiz/mzML backend
The pwiz/mzML backend should be a drop-in replacement and should also pass the unit tests for the Bioconductor
MSnbase packages. Any
MSnbase modifications required will be done by Steffen Neumann and Laurent Gatto respectively. Secondly, the pwiz/mzML backend should provide access to the <chromatogram>s stored in an mzML file (Martens et al. 2011).
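To illustrate what "drop-in replacement" could mean in practice, here is a minimal C++ sketch. It assumes a shared backend interface that existing callers rely on, with the new backend adding chromatogram access on top. All names (RawDataBackend, RampLikeBackend, PwizLikeBackend, totalIonCurrent) are hypothetical, and the data is held in memory. None of this is the mzR or proteowizard API.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One (m/z, intensity) pair of a spectrum.
struct Peak {
    double mz;
    double intensity;
};

using Spectra = std::vector<std::vector<Peak>>;

// Hypothetical common interface that callers are written against.
class RawDataBackend {
public:
    virtual ~RawDataBackend() = default;
    virtual std::size_t spectrumCount() const = 0;
    virtual std::vector<Peak> peaks(std::size_t index) const = 0;
    // Backends without chromatogram support report an empty list.
    virtual std::vector<std::string> chromatogramIds() const { return {}; }
};

// Stand-in for an existing backend: spectra only.
class RampLikeBackend : public RawDataBackend {
public:
    explicit RampLikeBackend(Spectra spectra) : spectra_(std::move(spectra)) {}
    std::size_t spectrumCount() const override { return spectra_.size(); }
    // at() throws std::out_of_range for an invalid index.
    std::vector<Peak> peaks(std::size_t index) const override { return spectra_.at(index); }

private:
    Spectra spectra_;
};

// Stand-in for the proposed backend: same spectrum API, plus chromatograms.
class PwizLikeBackend : public RawDataBackend {
public:
    PwizLikeBackend(Spectra spectra, std::vector<std::string> chromatogramIds)
        : spectra_(std::move(spectra)), chromatogramIds_(std::move(chromatogramIds)) {}
    std::size_t spectrumCount() const override { return spectra_.size(); }
    std::vector<Peak> peaks(std::size_t index) const override { return spectra_.at(index); }
    std::vector<std::string> chromatogramIds() const override { return chromatogramIds_; }

private:
    Spectra spectra_;
    std::vector<std::string> chromatogramIds_;
};

// Caller code that depends only on the interface, so it runs unchanged
// against either backend.
double totalIonCurrent(const RawDataBackend& backend, std::size_t index) {
    double sum = 0.0;
    for (const Peak& p : backend.peaks(index)) {
        sum += p.intensity;
    }
    return sum;
}
```

Under these assumptions, a unit test written against RawDataBackend passes for both classes when they hold the same spectra, and only the new backend returns a non-empty list of chromatogram identifiers.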
The project also aims to facilitate access to identification data in the mzIdentML data format (Jones et al. 2012) through the proteowizard framework. A backend similar to those currently available for raw mass spectrometry files (
mzData) will be developed for
At the end of the project, the candidate will be familiar with the major mass-spectrometry data formats and main MS toolkits used in proteomics and metabolomics. After successful completion of the project, the candidate will be added to the list of
Project attributes and estimates:
- Difficulty: medium to difficult, depending on experience and
- Skills needed: intermediate R programming; knowledge of package development helpful; good knowledge of C++ essential. The candidate will have to familiarise herself with mass-spectrometry data, the respective data formats and the proteowizard code base.
- Deliverable: pwiz and identification backends to be added to the
- Mentors: Laurent Gatto and Steffen Neumann, with additional Rcpp support from Dirk Eddelbuettel.
- References: see project description.