OxoScan-MS

Open-source code/scripts for analysis of glycoproteomics data generated by OxoScan-MS. This repository contains all functions required to analyse OxoScan-MS data, as well as Jupyter Notebooks and R Markdown Notebooks to reproduce all analyses from raw/processed data files and generate all figures in the OxoScan-MS paper.

Generating input data

All analysis scripts for reported OxoScan-MS experiments are compatible with oxonium ion matrices ('maps') generated by extracting oxonium ion chromatograms across the precursor mass range in DIA-NN. To extract these files from OxoScan-MS raw files, pass the option --extract 204.087, [oxonium ion #2], [oxonium ion #3] to DIA-NN. The .txt output files are directly compatible with the Python functions and scripts in this repository. Raw and extracted OxoScan-MS data will be made fully available upon publication. In the meantime, all OxoScan output files used for analysis and figure generation are available in the respective oxoscan_analyses folder.

Functionality overview

The glycoproteomics Python library provides functions for reading and manipulating glycoproteomics spectra. It was designed for use within IPython (Jupyter) notebooks, although it can also be used in other projects.

It is split into four parts:

io: reading in spectra
spectrum: binning, combining, and aligning spectra
peaks: calling peaks on the spectra and extracting intensities from those peaks
plotting: basic plotting within IPython notebooks

Spectra are read in and passed around between functions as hierarchical Python dictionaries, with the structure:

spectra_dict[rt_value][mz_value][ion_name] = intensity
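
As a minimal sketch of this layout, the snippet below builds a small dictionary by hand and walks through it. The key types (floats for retention time and m/z, strings for ion names) and the helpers list_ions_sketch and total_intensity are assumptions made for illustration; they are not taken from the library.

```python
# Toy spectrum using the nested layout: spectra_dict[rt][mz][ion] = intensity.
# Key types (float RT, float m/z, str ion name) are assumed for illustration.
spectra_dict = {
    10.0: {
        800.0: {"204.087": 1500.0, "366.139": 300.0},
        802.0: {"204.087": 200.0, "366.139": 0.0},
    },
    10.025: {
        800.0: {"204.087": 1800.0, "366.139": 350.0},
    },
}


def list_ions_sketch(spectra):
    """Collect every ion name found in the nested dictionary (hypothetical helper)."""
    ions = set()
    for mz_dict in spectra.values():
        for ion_dict in mz_dict.values():
            ions.update(ion_dict)
    return sorted(ions)


def total_intensity(spectra, ion):
    """Sum the intensity of one ion over all RT and m/z values (hypothetical helper)."""
    return sum(
        ion_dict.get(ion, 0.0)
        for mz_dict in spectra.values()
        for ion_dict in mz_dict.values()
    )


print(list_ions_sketch(spectra_dict))
print(total_intensity(spectra_dict, "204.087"))
```

Because every level is a plain dictionary, a single intensity can also be read directly, for example spectra_dict[10.0][800.0]["204.087"].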

Example

This is some boilerplate code to import the various libraries and set up matplotlib:

import os
import glycoproteomics
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats("png")
figure_size = (8, 4)
dpi = 80

Read in the spectrum and have a look to see which ions were quantified:

spectrum = glycoproteomics.io.read_spectrum_file("tests/data/spectrum.txt.gz")
ions = glycoproteomics.spectrum.list_ions(spectrum)
print(ions)
# ['138.055', '144.066', '168.066', '186.076', '204.087', '243.026', '274.092', '292.103', '308.098', '366.139', '405.079', '485.046', '512.197', '657.235']

To make the spectrum easier to work with, we bin it into retention time (RT) and m/z bins:

rt_x_bin_size = 0.025
mz_y_bin_size = 2.0

binned_spectrum = glycoproteomics.spectrum.bin(spectrum, rt_x_bin_size, mz_y_bin_size, np.mean)

The retention time bin size corresponds directly to the cycle time of the Scanning SWATH method (e.g. a 1.5 s cycle gives rt_x_bin_size = 0.025 min). The precursor m/z bin size is derived from the scanning quadrupole window width: the continuous quadrupole movement is binned into fifths of the window width, so a 10 m/z window corresponds to mz_y_bin_size = 2.0.
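
The arithmetic behind these two values can be written out as a short sketch. The function names and the sub_bins_per_window parameter are hypothetical and exist only to make the conversion explicit; they are not part of the glycoproteomics library.

```python
def rt_bin_size_minutes(cycle_time_seconds):
    """Convert a Scanning SWATH cycle time in seconds to an RT bin size in minutes."""
    return cycle_time_seconds / 60.0


def mz_bin_size(window_width_mz, sub_bins_per_window=5):
    """Divide the quadrupole window width into equal sub-bins (5 per window, as described above)."""
    return window_width_mz / sub_bins_per_window


# Values from the example: a 1.5 s cycle and a 10 m/z scanning window.
rt_x_bin_size = rt_bin_size_minutes(1.5)
mz_y_bin_size = mz_bin_size(10.0)
print(rt_x_bin_size, mz_y_bin_size)
```

For a different acquisition method, the same two conversions would give the bin sizes to pass to the binning step.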

We merge the spectra from individual ions into a matrix, which we can then plot:

ion_matrix, x_label, y_label = glycoproteomics.spectrum.to_matrix(binned_spectrum, ions)
glycoproteomics.plotting.plot_ion_matrix(ion_matrix, x_label, y_label, "spectrum.txt.gz", figure_size, dpi)
plt.show()
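
To illustrate what merging ions into a matrix can look like, here is a minimal sketch that turns the nested dictionary into a 2D NumPy array. It assumes that intensities of the selected ions are summed per (RT, m/z) cell and that rows index RT and columns index m/z. Both choices, and the name to_matrix_sketch, are assumptions for illustration; glycoproteomics.spectrum.to_matrix may behave differently.

```python
import numpy as np


def to_matrix_sketch(spectra, ions):
    """Sum the chosen ions into a (n_rt, n_mz) matrix; cells with no data stay at zero."""
    rt_values = sorted(spectra)
    mz_values = sorted({mz for mz_dict in spectra.values() for mz in mz_dict})
    mz_index = {mz: j for j, mz in enumerate(mz_values)}

    matrix = np.zeros((len(rt_values), len(mz_values)))
    for i, rt in enumerate(rt_values):
        for mz, ion_dict in spectra[rt].items():
            # Assumed merge rule: add up the intensities of the requested ions.
            matrix[i, mz_index[mz]] = sum(ion_dict.get(ion, 0.0) for ion in ions)
    return matrix, rt_values, mz_values


# Toy input with two RT values and two m/z values.
toy_spectrum = {
    10.0: {800.0: {"204.087": 1.0, "366.139": 2.0}},
    10.025: {802.0: {"204.087": 5.0}},
}
toy_matrix, toy_rt, toy_mz = to_matrix_sketch(toy_spectrum, ["204.087", "366.139"])
print(toy_matrix)
```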

Once in the matrix format, we can perform peak calling on the spectra to identify the top 10 peaks:

top_N_peaks = 10

# Peak quantification ellipse
x_radius = rt_x_bin_size * 3.0
y_radius = mz_y_bin_size * 5.0

# Peak exclusion ellipse (within which the centre of another peak will not be called)
x_radius_exclude = x_radius * 3.0
y_radius_exclude = y_radius * 2.0

peaks = glycoproteomics.peaks.find(ion_matrix, x_label, y_label, top_N_peaks, x_radius_exclude, y_radius_exclude)

glycoproteomics.plotting.plot_ion_matrix_with_peaks(
    ion_matrix, x_label, y_label, peaks, x_radius, y_radius, "spectrum.txt.gz - Top {} peaks".format(top_N_peaks), figure_size, dpi
)
plt.show()
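
The quantification and exclusion ellipses above are both defined by an x radius (RT) and a y radius (m/z) around a peak centre. The sketch below shows one way such an ellipse could be used to sum intensities around a centre. It is an illustration under stated assumptions (rows index RT, columns index m/z, and a bin counts if its coordinate lies inside or on the ellipse) and is not the library's peak integration code. The names ellipse_mask and integrate_peak_sketch are hypothetical.

```python
import numpy as np


def ellipse_mask(x_values, y_values, x_centre, y_centre, x_radius, y_radius):
    """Boolean mask of shape (len(x_values), len(y_values)), True inside or on the ellipse."""
    x = np.asarray(x_values, dtype=float)[:, np.newaxis]
    y = np.asarray(y_values, dtype=float)[np.newaxis, :]
    return ((x - x_centre) / x_radius) ** 2 + ((y - y_centre) / y_radius) ** 2 <= 1.0


def integrate_peak_sketch(matrix, x_values, y_values, centre, x_radius, y_radius):
    """Sum all matrix cells whose (x, y) coordinate falls within the ellipse around centre."""
    mask = ellipse_mask(x_values, y_values, centre[0], centre[1], x_radius, y_radius)
    return float(np.asarray(matrix)[mask].sum())


# Toy grid: 5 RT bins by 3 m/z bins, every cell has intensity 1.
toy_x = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_y = [0.0, 10.0, 20.0]
toy_grid = np.ones((len(toy_x), len(toy_y)))
toy_area = integrate_peak_sketch(toy_grid, toy_x, toy_y, (2.0, 10.0), 1.0, 10.0)
print(toy_area)
```

The same mask could serve as an exclusion test: a candidate peak centre would be skipped if it falls inside the exclusion ellipse of a peak that has already been called.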

Getting set up

Set up a Python environment with glycoproteomics installed with the following commands. This will then let you run the various workbooks which use the library.

virtualenv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install --upgrade -r requirements.txt -e .

Running tests and profiling

Run the test suite with:

pytest -k "not profiling" --cov=glycoproteomics

To profile the code, and plot a graph of which functions take the most time, run:

pytest -k profiling
python -m gprof2dot -f pstats prof/peak_integration.out | dot -Tpdf -o prof/peak_integration.pdf

Acknowledgements

The persistence code is cloned from a repository by Stefan Huber and is licensed under version 3 of the GNU Lesser General Public License.

We thank the organisers and all participants in the 2020 Crick Data Challenge for a fun and productive 2-day hackathon, where the idea and initial scripts for OxoScan-MS analysis were generated.
