Recognizing off-sample mass spectrometry images with machine and deep learning
This repository hosts a computational project on recognizing so-called off-sample images in imaging mass spectrometry data. The project is carried out by the Alexandrov team at EMBL Heidelberg. We used public data from METASPACE to create a gold standard set of ion images, and we developed and evaluated several methods for recognizing off-sample ion images.
- Katja Ovchinnikova: biclustering and molecular co-localization method development, gold standard preparation
- Vitaly Kovalev: deep learning method development
- Lachlan Stuart: development of the TagOff web app
- Theodore Alexandrov: supervision, gold standard preparation
Creating gold standard ion images
Using public METASPACE datasets
We used public datasets from METASPACE, a community-populated knowledge base of metabolite images. The Acknowledgements section lists the contributors whose data we used.
Web app for tagging ion images
TagOff was rapidly prototyped using the METASPACE codebase as a foundation, allowing its back-end, image display and annotation filtering to be reused. The TagOff-specific changes can be found in this commit range.
It can be run by starting the METASPACE webapp, then navigating to http://localhost:8999/#/imageclassifier?db=HMDB-v4&user=your_name&max=10000&ds=2016-12-07_07h59m24s.
The querystring of the URL encodes the filter criteria used to select the annotations. New criteria can be created and copied from the Annotations page of METASPACE. Two other parameters exist: max limits the number of annotations shown, and user accepts a name which is added to the image labels, allowing multiple people to independently label the same image.
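To show how the querystring is put together, here is a minimal sketch that rebuilds the example URL from its parts. The parameter names (db, user, max, ds) and the base address come from the URL above; the helper name `build_tagoff_url` and its defaults are hypothetical and only serve as illustration.

```python
from urllib.parse import urlencode

# Address of a locally running METASPACE webapp, as in the example URL above.
BASE_URL = "http://localhost:8999/#/imageclassifier"


def build_tagoff_url(ds, user, db="HMDB-v4", max_annotations=10000):
    """Build a TagOff URL (hypothetical helper, not part of the METASPACE codebase)."""
    params = {
        "db": db,                # molecular database filter
        "user": user,            # name stored alongside each image label
        "max": max_annotations,  # upper limit on the number of annotations shown
        "ds": ds,                # dataset filter
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_tagoff_url("2016-12-07_07h59m24s", "your_name")
print(url)
```

Giving each person a different `user` value yields one URL per labeler for the same set of annotations.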
After annotations have been made, the data can be exported with:
sqlite3 -header -csv ./metaspace/webapp/imageclassification.sqlite "select * from imageclassifications" > ./metaspace/webapp/dist/results.csv
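If the sqlite3 command-line tool is not available, roughly the same export can be sketched with Python's standard library. The table name is taken from the command above; the function name `export_classifications` is hypothetical, and no assumption is made about the table's columns (they are read from the query result).

```python
import csv
import sqlite3
from contextlib import closing


def export_classifications(db_path, csv_path):
    """Dump the imageclassifications table to a CSV file with a header row.

    Returns the number of data rows written.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute("select * from imageclassifications")
        header = [column[0] for column in cursor.description]  # column names
        rows = cursor.fetchall()

    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

With the paths from the command above, the call would be `export_classifications("./metaspace/webapp/imageclassification.sqlite", "./metaspace/webapp/dist/results.csv")`.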
Gold standard ion images
The images can be downloaded from AWS S3:
wget https://s3-eu-west-1.amazonaws.com/sm-off-sample/GS.tar.gz
tar -xf GS.tar.gz
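For environments without wget and tar, the download and extraction steps can be sketched in Python using only the standard library. The URL is the one given above; the helper names `download` and `extract` are hypothetical, and nothing is assumed about the layout of files inside the archive.

```python
import tarfile
import urllib.request
from pathlib import Path

GS_URL = "https://s3-eu-west-1.amazonaws.com/sm-off-sample/GS.tar.gz"


def download(url, dest_dir="."):
    """Download url into dest_dir, skipping the download if the file already exists."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


def extract(archive, dest_dir="."):
    """Unpack a (possibly compressed) tar archive and return its member names."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:  # default mode detects gzip automatically
        names = tar.getnames()
        tar.extractall(dest_dir)
    return names
```

Calling `extract(download(GS_URL))` mirrors the two shell commands; passing a different URL from the same bucket works the same way.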
METASPACE knowledge base
wget https://s3-eu-west-1.amazonaws.com/sm-off-sample/pixel-annot-export-v0.10.tar.gz
tar -xf pixel-annot-export-v0.10.tar.gz
We trained convolutional neural networks (CNNs) using the fastai and PyTorch libraries. We achieved the best performance with a ResNet50 CNN pretrained on ImageNet.
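As an illustration of this kind of transfer learning, the sketch below fine-tunes an ImageNet-pretrained ResNet50 with fastai. It is not the training code used in this project: it assumes the fastai v2 API, a hypothetical folder layout with one subfolder per class (for example `on/` and `off/`), and arbitrary choices for image size, validation split, and number of epochs.

```python
def train_off_sample_classifier(image_dir, epochs=3):
    """Fine-tune a pretrained ResNet50 on images stored in per-class subfolders.

    Hypothetical helper: image_dir is assumed to contain one subfolder per
    class, which fastai uses as the label of every image inside it.
    """
    # Imported here so this module can be loaded without fastai installed.
    from fastai.vision.all import (
        ImageDataLoaders, Resize, accuracy, resnet50, vision_learner,
    )

    # Hold out 20% of the images for validation; resize to the usual ImageNet size.
    dls = ImageDataLoaders.from_folder(
        image_dir, valid_pct=0.2, seed=42, item_tfms=Resize(224)
    )

    # vision_learner loads ImageNet-pretrained weights by default.
    learn = vision_learner(dls, resnet50, metrics=accuracy)

    # Train the new head first, then unfreeze and train the whole network.
    learn.fine_tune(epochs)
    return learn
```

The returned learner can then be evaluated on the validation split with `learn.validate()`.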
DHB matrix clusters
We are planning to integrate the best methods into https://metaspace2020.eu.
We thank the contributors of all public data to METASPACE and particularly those whose data was selected for the gold standard: Sarah Aboulmagd, Michael Becker, Dhaka Bhandari, Mark Bokhart, Berin Boughton, Shane Ellis, Mathieu Gaudin, Erin Gemperline, Cristina Gonzalez Lopez, Richard Goodwin, Anne Mette Handler, Bram Heijs, Sophie Jacobsen, Christian Janfelt, Emrys Jones, Patrik Kadesch, Pegah Khamehgir-Silz, Mario Kompauer, Lingjun Li, Manuel Liebeke, Michael Linscheid, James McKenzie, David Muddiman, Andrew Palmer, József Pánczél, Marina Reuter, Livia S. Eberlin, Veronika Saharuka, Marta Sans, Julian Schneemann, Kumar Sharma, Bernhard Spengler, Nicole Strittmatter, Zoltan Takats, Dusan Velickovic, Eric Weaver, Guanshi Zhang. The work was supported by funding from the EU Horizon2020 project METASPACE (No. 634402), the NIH NIDDK project KPMP, and the ERC Consolidator project METACELL (No. 773089).
Unless specified otherwise in file headers or LICENSE files present in subdirectories, all files in this repository are licensed under the Apache 2.0 license.