Post-processing peptide de novo sequences to improve their accuracy
Journal article: https://pubs.acs.org/doi/10.1021/acs.jproteome.8b00278
Much more detail on Postnovo options can be found in the Postnovo Wiki pages.
-
Postnovo runs on a Unix-like OS. Use a powerful server or desktop with >25 GB available disk space.
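The disk-space requirement can be checked before downloading anything. The following is a minimal sketch using only the Python standard library; the helper names free_disk_gb and has_enough_space are illustrative and not part of Postnovo, and "GB" is assumed to mean 10**9 bytes.

```python
import shutil

REQUIRED_GB = 25  # threshold taken from the requirement above


def free_disk_gb(path="."):
    """Return free space, in GB (10**9 bytes), on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """True if the filesystem holding `path` has more than `required_gb` GB free."""
    return free_disk_gb(path) > required_gb


# Report on the current directory; pass the Postnovo directory instead if it
# lives on a different filesystem.
print(f"Free space: {free_disk_gb():.1f} GB; enough: {has_enough_space()}")
```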
-
Download and decompress the latest release.
-
Use Python 3. The Anaconda distribution contains all necessary package dependencies.
-
Download DeNovoGUI and large pre-trained models with Postnovo setup.
For low-res MS2 data (example uses nohup to avoid termination upon logout and & to run in background):
nohup python main.py setup --denovogui --postnovo_low --deepnovo_low &
For high-res MS2 data:
python main.py setup --denovogui --postnovo_high --deepnovo_high
-
Use the ProteoWizard msconvert tool to convert your RAW file to an MGF file whose spectrum titles follow the format set by the titleMaker filter below.
msconvert preformatted_spectra.raw --mgf --filter "titleMaker Run: <RunId>, Index: <Index>, Scan: <ScanNumber>"
Add additional filters as needed, here for peak picking and removal of zero intensity peaks.
msconvert preformatted_spectra.raw --mgf --filter "peakPicking vendor" --filter "zeroSamples removeExtra" --filter "titleMaker Run: <RunId>, Index: <Index>, Scan: <ScanNumber>"
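It may help to confirm that the converted MGF file carries the expected titles before moving on. The sketch below assumes that msconvert writes each title as a line of the form "TITLE=Run: <RunId>, Index: <Index>, Scan: <ScanNumber>" with integer index and scan values; the regular expression, the helper names, and the sample lines are illustrative assumptions, not part of Postnovo or ProteoWizard.

```python
import re

# Assumed shape of a title produced by the titleMaker filter above.
TITLE_PATTERN = re.compile(
    r"^TITLE=Run: (?P<run>.+), Index: (?P<index>\d+), Scan: (?P<scan>\d+)$"
)


def parse_title(line):
    """Return (run, index, scan) for a matching TITLE line, else None."""
    match = TITLE_PATTERN.match(line.strip())
    if match is None:
        return None
    return match["run"], int(match["index"]), int(match["scan"])


def bad_titles(mgf_lines):
    """Collect TITLE lines that do not follow the expected format."""
    return [
        line.rstrip("\n")
        for line in mgf_lines
        if line.startswith("TITLE=") and parse_title(line) is None
    ]


# Made-up lines for illustration only; the second title lacks the format.
sample = [
    "BEGIN IONS\n",
    "TITLE=Run: sample1, Index: 0, Scan: 1\n",
    "TITLE=sample1.1.1.2\n",
]
print(bad_titles(sample))  # → ['TITLE=sample1.1.1.2']
```

To check a real file, pass an open file handle for the MGF to bad_titles; an empty list means every title matched the assumed pattern.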
-
Reformat the input MGF file for compatibility with all de novo sequencing tools.
python main.py format_mgf --mgfs /path/to/preformatted_spectra.mgf --out /path/to/spectra.mgf
-
Set up a container with TensorFlow to run DeepNovo (optional but recommended for Postnovo). Superuser privileges may be required.
singularity build tensorflow.simg docker://tensorflow/tensorflow:latest
-
Generate DeepNovo de novo sequences (can take up to ~12 hours depending on resources). The following examples use low-res MS2 spectra. Processing high-res spectra requires more memory (see Predicting with Deepnovo). Postnovo currently supports only the standard fixed C and variable M PTMs.
Using a single machine with 32 cores:
python main.py predict_deepnovo --mgf /path/to/spectra.mgf --container /path/to/tensorflow.simg --frag_resolution low --cpus 32
Using a compute cluster via Slurm with 16 cores per node and sufficient memory allocation:
python main.py predict_deepnovo --mgf /path/to/spectra.mgf --container /path/to/tensorflow.simg --frag_resolution low --cpus 16 --slurm --partition partition_with_16GB_mem --time_limit 36
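When the same run is launched repeatedly, or from a pipeline script, the two command forms above can be assembled programmatically. This is a minimal sketch that uses only the flags shown above; the function deepnovo_command and its parameters are hypothetical helpers, not part of Postnovo.

```python
import shlex


def deepnovo_command(mgf, container, cpus, partition=None, time_limit=None):
    """Build the predict_deepnovo argument list for low-res spectra.

    Passing `partition` switches to the Slurm form shown above.
    """
    args = [
        "python", "main.py", "predict_deepnovo",
        "--mgf", mgf,
        "--container", container,
        "--frag_resolution", "low",
        "--cpus", str(cpus),
    ]
    if partition is not None:
        args += ["--slurm", "--partition", partition]
        if time_limit is not None:
            args += ["--time_limit", str(time_limit)]
    return args


# Single-machine form with 32 cores, printed as a shell-safe string.
local = deepnovo_command("/path/to/spectra.mgf", "/path/to/tensorflow.simg", cpus=32)
print(" ".join(shlex.quote(arg) for arg in local))
```

The returned list can be passed to subprocess.run(args, check=True) from the Postnovo directory; building a list rather than a single string avoids quoting problems when paths contain spaces.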
-
Download MaRaCluster to cluster spectra by peptide species. As MaRaCluster input, use the reformatted MGF file. Set "log10(p-value) threshold" (in the GUI) or "--pvalThreshold" (in the CLI) to -2.
-
Run Postnovo, which in the process generates Novor and PepNovo+ de novo sequences via DeNovoGUI (can take up to ~3 hours depending on resources). Results are written by default to the directory of the MGF input.
python main.py predict --mgf /path/to/spectra.mgf --clusters /path/to/MaRaCluster.clusters_p2.tsv --frag_method CID --frag_resolution low --denovogui --deepnovo --cpus 32
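Because this step can run for hours, a quick pre-flight check of the inputs may save a failed run. The sketch below checks that the MGF and cluster files exist and reports the default results directory, which per the text above is the directory of the MGF input; the helper names are illustrative and not part of Postnovo.

```python
from pathlib import Path


def missing_inputs(*paths):
    """Return the subset of `paths` that do not exist as regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


def default_results_dir(mgf):
    """Directory of the MGF input, where results are written by default."""
    return Path(mgf).resolve().parent


# Placeholder paths from the command above; replace with real locations.
mgf = "/path/to/spectra.mgf"
clusters = "/path/to/MaRaCluster.clusters_p2.tsv"
print("Missing inputs:", missing_inputs(mgf, clusters))
print("Results directory:", default_results_dir(mgf))
```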
Copyright 2018, Samuel E. Miller. All rights reserved.
Postnovo is publicly available for non-commercial uses.
Licensed under GNU GENERAL PUBLIC LICENSE, Version 3, 29 June 2007.
See postnovo/LICENSE.txt.