Open-source code/scripts for analysis of glycoproteomics data generated by OxoScan-MS. This repository contains all functions required to analyse OxoScan-MS data, as well as Jupyter Notebooks and R Markdown Notebooks to reproduce all analyses from raw/processed data files and generate all figures in the OxoScan-MS paper.
All analysis scripts for reported OxoScan-MS experiments are compatible with oxonium ion matrices ('maps') generated by extracting oxonium ion chromatograms across the precursor mass range in DIA-NN. To extract these files from OxoScan-MS raw files, use the --extract 204.087, [oxonium ion #2], [oxonium ion #3] option in DIA-NN. The .txt output files are directly compatible with the Python functions/scripts reported here. Raw and extracted OxoScan-MS data will be made fully available upon publication; however, all OxoScan output files used for analysis and figure generation are available in the respective oxoscan_analyses folder.
The glycoproteomics Python library provides functions for working with and manipulating glycoproteomics spectra.
It was designed to be used within iPython notebooks, although it can also be used in other projects.
It is split into four parts:
io: for reading in spectra
spectrum: for binning, combining, and aligning spectra
peaks: for performing peak calling on the spectra, and extracting intensities from those peaks
plotting: for providing some basic plotting features within iPython notebooks
Spectra are read in and passed around between functions as hierarchical Python dictionaries, with the structure:
spectra_dict[rt_value][mz_value][ion_name] = intensity
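As a rough illustration of this layout, the sketch below builds a small toy dictionary by hand and walks it with plain Python. The values are made up, the helper names `iter_intensities` and `ion_names` are hypothetical and not part of the library, and the key types (floats for RT and m/z, strings for ion names) are an assumption.

```python
# Hypothetical toy data: two retention times, two precursor m/z values, two ions.
# Key types (float RT, float m/z, string ion name) are an assumption.
spectra_dict = {
    10.0: {
        500.0: {"204.087": 1200.0, "366.139": 300.0},
        502.0: {"204.087": 50.0},
    },
    10.025: {
        500.0: {"204.087": 900.0, "366.139": 450.0},
    },
}


def iter_intensities(spectra):
    """Yield (rt, mz, ion, intensity) tuples from the nested dictionary."""
    for rt, mz_dict in spectra.items():
        for mz, ion_dict in mz_dict.items():
            for ion, intensity in ion_dict.items():
                yield rt, mz, ion, intensity


def ion_names(spectra):
    """Return the sorted ion names that appear anywhere in the spectra."""
    return sorted({ion for _, _, ion, _ in iter_intensities(spectra)})


print(ion_names(spectra_dict))
print(spectra_dict[10.0][500.0]["204.087"])
```

Keeping the ion name as the innermost key means a single (RT, m/z) point can carry intensities for any subset of ions without padding the missing ones.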
This is some boilerplate code to import the various libraries and set up matplotlib:
import os
import glycoproteomics
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib_inline
matplotlib_inline.backend_inline.set_matplotlib_formats("png")
figure_size = (8, 4)
dpi = 80
Read in the spectrum and have a look to see which ions were quantified:
spectrum = glycoproteomics.io.read_spectrum_file("tests/data/spectrum.txt.gz")
ions = glycoproteomics.spectrum.list_ions(spectrum)
print(ions)
# ['138.055', '144.066', '168.066', '186.076', '204.087', '243.026', '274.092', '292.103', '308.098', '366.139', '405.079', '485.046', '512.197', '657.235']
In order to make the spectrum easier to work with, we bin the spectrum into RT and MZ bins:
rt_x_bin_size = 0.025
mz_y_bin_size = 2.0
binned_spectrum = glycoproteomics.spectrum.bin(spectrum, rt_x_bin_size, mz_y_bin_size, np.mean)
Retention time binning parameters correspond directly to the cycle time of the Scanning SWATH method (e.g. for a 1.5 s cycle, rt_x_bin_size = 0.025 min). Precursor m/z binning corresponds to the scanning quadrupole window width, where the continuous movement is binned into the window width / 5 (i.e. a 10 m/z window corresponds to mz_y_bin_size = 2.0).
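The arithmetic behind these two parameters, and the general idea of binning a nested spectrum dictionary, can be sketched as follows. This is a minimal illustration and not the library's implementation: the helper names are hypothetical, and keying each bin by its lower edge is an assumption about how binned values might be labelled.

```python
import math
from collections import defaultdict

import numpy as np


def rt_bin_size_minutes(cycle_time_seconds):
    """Convert a cycle time in seconds into an RT bin size in minutes."""
    return cycle_time_seconds / 60.0


def mz_bin_size(window_width_mz, subdivisions=5):
    """Split the quadrupole window width into equal m/z bins."""
    return window_width_mz / subdivisions


def bin_spectra(spectra, rt_bin, mz_bin, aggregate=np.mean):
    """Group intensities into RT and m/z bins and aggregate each group.

    Bins are keyed by their lower edge (an assumption for this sketch).
    """
    collected = defaultdict(list)
    for rt, mz_dict in spectra.items():
        key_rt = math.floor(rt / rt_bin) * rt_bin
        for mz, ion_dict in mz_dict.items():
            key_mz = math.floor(mz / mz_bin) * mz_bin
            for ion, intensity in ion_dict.items():
                collected[(key_rt, key_mz, ion)].append(intensity)
    binned = {}
    for (key_rt, key_mz, ion), values in collected.items():
        binned.setdefault(key_rt, {}).setdefault(key_mz, {})[ion] = float(aggregate(values))
    return binned


rt_x_bin_size = rt_bin_size_minutes(1.5)  # 1.5 s cycle
mz_y_bin_size = mz_bin_size(10.0)         # 10 m/z window

# Made-up spectrum: the first two points fall into the same RT and m/z bin.
toy_spectrum = {
    0.010: {500.4: {"204.087": 10.0}},
    0.020: {501.0: {"204.087": 30.0}},
    0.030: {500.4: {"204.087": 5.0}},
}
toy_binned = bin_spectra(toy_spectrum, rt_x_bin_size, mz_y_bin_size)
print(rt_x_bin_size, mz_y_bin_size)
print(toy_binned)
```

Passing the aggregation function as an argument mirrors the `np.mean` argument in the call above, so a sum or maximum could be swapped in without changing the binning logic.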
We merge the spectra from individual ions into a matrix, which we can then plot:
ion_matrix, x_label, y_label = glycoproteomics.spectrum.to_matrix(binned_spectrum, ions)
glycoproteomics.plotting.plot_ion_matrix(ion_matrix, x_label, y_label, "spectrum.txt.gz", figure_size, dpi)
plt.show()
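To give a feel for what merging ions into a matrix involves, here is a minimal sketch that converts a nested spectrum dictionary into a NumPy array. It assumes that the intensities of the selected ions are summed at each point and that rows are RT and columns are m/z; the library's `to_matrix` may combine or orient the data differently, and `nested_dict_to_matrix` is a hypothetical helper.

```python
import numpy as np


def nested_dict_to_matrix(spectra, ions):
    """Sum the selected ions at each (RT, m/z) point into a dense matrix.

    Returns the matrix plus the sorted RT and m/z axis values.
    Summation and the (RT, m/z) orientation are assumptions of this sketch.
    """
    rt_values = sorted(spectra)
    mz_values = sorted({mz for mz_dict in spectra.values() for mz in mz_dict})
    rt_index = {rt: i for i, rt in enumerate(rt_values)}
    mz_index = {mz: j for j, mz in enumerate(mz_values)}

    matrix = np.zeros((len(rt_values), len(mz_values)))
    for rt, mz_dict in spectra.items():
        for mz, ion_dict in mz_dict.items():
            # Ions missing at this point contribute zero.
            matrix[rt_index[rt], mz_index[mz]] = sum(
                ion_dict.get(ion, 0.0) for ion in ions
            )
    return matrix, rt_values, mz_values


# Made-up binned spectrum with two ions.
toy_binned = {
    0.0: {500.0: {"204.087": 1.0, "366.139": 2.0}},
    0.025: {500.0: {"204.087": 3.0}, 502.0: {"366.139": 4.0}},
}
toy_matrix, toy_rt, toy_mz = nested_dict_to_matrix(toy_binned, ["204.087", "366.139"])
single_ion_matrix, _, _ = nested_dict_to_matrix(toy_binned, ["204.087"])
print(toy_matrix)
```

Selecting a subset of ions, as in `single_ion_matrix`, gives a map for one oxonium ion only.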
Once in the matrix format, we can perform peak calling on the spectra to identify the top 10 peaks:
top_N_peaks = 10
# Peak quantification ellipse
x_radius = rt_x_bin_size * 3.0
y_radius = mz_y_bin_size * 5.0
# Peak exclusion ellipse (within which the centre of another peak will not be called)
x_radius_exclude = x_radius * 3.0
y_radius_exclude = y_radius * 2.0
peaks = glycoproteomics.peaks.find(ion_matrix, x_label, y_label, top_N_peaks, x_radius_exclude, y_radius_exclude)
glycoproteomics.plotting.plot_ion_matrix_with_peaks(
ion_matrix, x_label, y_label, peaks, x_radius, y_radius, "spectrum.txt.gz - Top {} peaks".format(top_N_peaks), figure_size, dpi
)
plt.show()
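The role of the exclusion ellipse can be illustrated with a simple greedy scheme: repeatedly take the highest remaining point, then mask every point whose centre lies inside the exclusion ellipse around it. This is not the implementation of `glycoproteomics.peaks.find`, which may use a different method; `greedy_peaks` and the toy matrix are hypothetical, and the matrix is assumed to be indexed as (RT, m/z).

```python
import numpy as np


def greedy_peaks(matrix, x_values, y_values, top_n, x_radius_exclude, y_radius_exclude):
    """Pick up to top_n maxima, excluding an ellipse around each chosen peak.

    Returns a list of (x, y, height) tuples in descending height order.
    """
    x = np.asarray(x_values, dtype=float)[:, None]  # column vector of RT values
    y = np.asarray(y_values, dtype=float)[None, :]  # row vector of m/z values
    available = np.ones(matrix.shape, dtype=bool)
    peaks = []
    while len(peaks) < top_n and available.any():
        # Ignore points that fall inside a previous exclusion ellipse.
        masked = np.where(available, matrix, -np.inf)
        i, j = np.unravel_index(np.argmax(masked), matrix.shape)
        peaks.append((float(x[i, 0]), float(y[0, j]), float(matrix[i, j])))
        inside = (
            ((x - x[i, 0]) / x_radius_exclude) ** 2
            + ((y - y[0, j]) / y_radius_exclude) ** 2
        ) <= 1.0
        available &= ~inside
    return peaks


# Made-up 5 x 5 matrix: a tall peak, a close neighbour, and a distant smaller peak.
toy_rt = np.arange(5) * 0.025
toy_mz = 500.0 + 2.0 * np.arange(5)
toy_matrix = np.zeros((5, 5))
toy_matrix[1, 1] = 10.0
toy_matrix[1, 2] = 8.0  # lies inside the exclusion ellipse of the tall peak
toy_matrix[4, 4] = 5.0

toy_peaks = greedy_peaks(toy_matrix, toy_rt, toy_mz, 2, 0.03, 3.0)
print(toy_peaks)
```

In the toy example the neighbour with intensity 8 is skipped because it falls within the ellipse around the tallest point, so the second peak called is the distant one.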
Set up a Python environment with glycoproteomics installed using the following commands.
This will then let you run the various notebooks which use the library.
virtualenv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install --upgrade -r requirements.txt -e .
Run the test suite with:
pytest -k "not profiling" --cov=glycoproteomics
To profile the code, and plot a graph of which functions take the most time, run:
pytest -k profiling
python -m gprof2dot -f pstats prof/peak_integration.out | dot -Tpdf -o prof/peak_integration.pdf
Persistence code is cloned from a repository by Stefan Huber. Licensed under version 3 of the GNU Lesser General Public License.
We thank the organisers and all participants in the 2020 Crick Data Challenge for a fun and productive 2-day hackathon, where the idea and initial scripts for OxoScan-MS analysis were generated.