Project to develop methods to recognize off-sample mass spectrometry images

Recognizing off-sample mass spectrometry images with machine and deep learning

This repository is devoted to a computational project on recognizing so-called off-sample images in imaging mass spectrometry data. The project is carried out by the Alexandrov team at EMBL Heidelberg. We used public data from METASPACE to create a gold standard set of ion images, and we developed and evaluated several methods for recognizing off-sample ion images.

Team:

Creating gold standard ion images

Using public METASPACE datasets

We used public datasets from METASPACE, a community-populated knowledge base of metabolite images. The contributors of the data we used are listed in the Acknowledgements section.

Web app for tagging ion images

TagOff was rapidly prototyped using the METASPACE codebase as a foundation, allowing its back-end, image display and annotation filtering to be reused. The TagOff-specific changes can be found in this commit range.

It can be run by starting the METASPACE webapp, then navigating to http://localhost:8999/#/imageclassifier?db=HMDB-v4&user=your_name&max=10000&ds=2016-12-07_07h59m24s. The querystring of the URL encodes the filter criteria used to select the annotations. New criteria can be created and copied from the Annotations page of METASPACE. Two other parameters exist: max and user. max limits the number of annotations shown, and user accepts a name which is added to the image labels, allowing multiple people to independently label the same image.
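As a minimal sketch of how such a URL can be assembled, the snippet below builds the query string from the filter criteria plus the `user` and `max` parameters described above. The helper name `build_tagoff_url` is hypothetical and not part of TagOff, and the sketch assumes the webapp does not depend on the order of the query parameters.

```python
from urllib.parse import urlencode

# Base address of the locally running METASPACE webapp, taken from the example URL above.
BASE_URL = "http://localhost:8999/#/imageclassifier"


def build_tagoff_url(filters, user, max_annotations=10000, base=BASE_URL):
    """Hypothetical helper: combine annotation filters with the `user` and `max` parameters."""
    params = dict(filters)           # filter criteria, e.g. database and dataset
    params["user"] = user            # name stored alongside each image label
    params["max"] = max_annotations  # upper limit on the number of annotations shown
    return f"{base}?{urlencode(params)}"


url = build_tagoff_url(
    {"db": "HMDB-v4", "ds": "2016-12-07_07h59m24s"},
    user="your_name",
)
print(url)
```

Giving each person a different `user` value lets several people label the same set of annotations independently.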

After annotations have been made, the data can be exported with:

sqlite3 -header -csv ./metaspace/webapp/imageclassification.sqlite "select * from imageclassifications" > ./metaspace/webapp/dist/results.csv
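If the `sqlite3` command-line tool is not available, the same export can be sketched with Python's standard library. This is an illustrative alternative, not part of the repository; the function name `export_labels` is hypothetical, and only the table name `imageclassifications` comes from the command above.

```python
import csv
import sqlite3

# Same query as in the sqlite3 command-line export above.
QUERY = "SELECT * FROM imageclassifications"


def export_labels(db_path, csv_path):
    """Hypothetical helper: dump all image labels to a CSV file with a header row.

    Returns the number of exported rows.
    """
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute(QUERY)
        header = [column[0] for column in cursor.description]  # column names
        rows = cursor.fetchall()
    finally:
        conn.close()

    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


# Example call, using the paths from the command above:
# export_labels("./metaspace/webapp/imageclassification.sqlite",
#               "./metaspace/webapp/dist/results.csv")
```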

Data

Gold standard ion images

The images can be downloaded from AWS S3:

wget https://s3-eu-west-1.amazonaws.com/sm-off-sample/GS.tar.gz
tar -xf GS.tar.gz

METASPACE knowledge base

wget https://s3-eu-west-1.amazonaws.com/sm-off-sample/pixel-annot-export-v0.10.tar.gz
tar -xf pixel-annot-export-v0.10.tar.gz
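On systems without `tar`, either archive can be unpacked after download with a short Python sketch like the one below. The helper name `extract_archive` is hypothetical, and the path check is only a basic safeguard against member names that would land outside the destination directory.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Hypothetical helper: unpack a .tar.gz archive and return the member names."""
    dest = Path(dest_dir).resolve()
    with tarfile.open(archive_path, "r:*") as tar:  # "r:*" auto-detects compression
        members = tar.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            # Basic check: refuse names that resolve outside the destination.
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    return [member.name for member in members]


# Example call, after downloading the archive:
# extract_archive("GS.tar.gz")
```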

CNN methods

We trained convolutional neural networks (CNNs) using the fastai and PyTorch libraries. The best performance was achieved with a ResNet-50 CNN pretrained on ImageNet.

DHB matrix clusters

We generated DHB matrix clusters following Keller and Li (2000). This resulted in 353 molecular formulas, available here.

Future steps

We are planning to integrate the best methods into https://metaspace2020.eu.

Acknowledgements

We thank the contributors of all public data to METASPACE and particularly those whose data was selected for the gold standard: Sarah Aboulmagd, Michael Becker, Dhaka Bhandari, Mark Bokhart, Berin Boughton, Shane Ellis, Mathieu Gaudin, Erin Gemperline, Cristina Gonzalez Lopez, Richard Goodwin, Anne Mette Handler, Bram Heijs, Sophie Jacobsen, Christian Janfelt, Emrys Jones, Patrik Kadesch, Pegah Khamehgir-Silz, Mario Kompauer, Lingjun Li, Manuel Liebeke, Michael Linscheid, James McKenzie, David Muddiman, Andrew Palmer, József Pánczél, Marina Reuter, Livia S. Eberlin, Veronika Saharuka, Marta Sans, Julian Schneemann, Kumar Sharma, Bernhard Spengler, Nicole Strittmatter, Zoltan Takats, Dusan Velickovic, Eric Weaver, Guanshi Zhang. This work was supported by funding from the EU Horizon 2020 project METASPACE (No. 634402), the NIH NIDDK project KPMP, and the ERC Consolidator project METACELL (No. 773089).

License

Unless specified otherwise in file headers or LICENSE files present in subdirectories, all files in this repository are licensed under the Apache 2.0 license.