
semiller10/postnovo



Post-processing peptide de novo sequences to improve their accuracy

Journal article: https://pubs.acs.org/doi/10.1021/acs.jproteome.8b00278

Quick Start

Much more detail on Postnovo options can be found in the Postnovo Wiki pages.

  1. Postnovo runs on a Unix-like OS. Use a powerful server or desktop with more than 25 GB of available disk space.

  2. Download and decompress the latest release.

  3. Use Python 3. The Anaconda distribution contains all necessary package dependencies.

  4. Download DeNovoGUI and large pre-trained models with Postnovo setup.

    For low-res MS2 data (the example uses nohup to keep the job running after logout and & to run it in the background):

    nohup python main.py setup --denovogui --postnovo_low --deepnovo_low &

    For high-res MS2 data:

    python main.py setup --denovogui --postnovo_high --deepnovo_high

  5. Use the ProteoWizard msconvert tool to convert your RAW file to an MGF file, setting the spectrum title to the specific format that Postnovo expects.

    msconvert preformatted_spectra.raw --mgf --filter "titleMaker Run: <RunId>, Index: <Index>, Scan: <ScanNumber>"

    Add further filters as needed. The following example adds peak picking and removal of zero-intensity peaks.

    msconvert preformatted_spectra.raw --mgf --filter "peakPicking vendor" --filter "zeroSamples removeExtra" --filter "titleMaker Run: <RunId>, Index: <Index>, Scan: <ScanNumber>"

  6. Reformat the input MGF file for compatibility with all de novo sequencing tools.

    python main.py format_mgf --mgfs /path/to/preformatted_spectra.mgf --out /path/to/spectra.mgf

  7. Set up a container with TensorFlow to run DeepNovo (optional but recommended for Postnovo). Superuser privileges may be required.

    singularity build tensorflow.simg docker://tensorflow/tensorflow:latest

  8. Generate DeepNovo de novo sequences (can take up to ~12 hours depending on resources). The following examples consider low-res MS2 spectra. Processing high-res spectra requires more memory (see Predicting with Deepnovo). Postnovo only supports standard fixed C and variable M PTMs at the moment.

    Using a single machine with 32 cores:

    python main.py predict_deepnovo --mgf /path/to/spectra.mgf --container /path/to/tensorflow.simg --frag_resolution low --cpus 32

    Using a compute cluster via Slurm with 16 cores per node and sufficient memory allocation:

    python main.py predict_deepnovo --mgf /path/to/spectra.mgf --container /path/to/tensorflow.simg --frag_resolution low --cpus 16 --slurm --partition partition_with_16GB_mem --time_limit 36

  9. Download MaRaCluster and use it to cluster spectra by peptide species. As MaRaCluster input, use the reformatted MGF file. Set "log10(p-value) threshold" (in the GUI) or "--pvalThreshold" (in the CLI) to -2.

  10. Run Postnovo. This step also generates Novor and PepNovo+ de novo sequences via DeNovoGUI (can take up to ~3 hours depending on resources). By default, results are written to the directory containing the input MGF file.

    python main.py predict --mgf /path/to/spectra.mgf --clusters /path/to/MaRaCluster.clusters_p2.tsv --frag_method CID --frag_resolution low --denovogui --deepnovo --cpus 32
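Step 1 asks for more than 25 GB of available disk space. A minimal sketch of how that could be checked before running setup is shown below. The helper name `has_free_space` is hypothetical and not part of Postnovo, and treating 1 GB as 10^9 bytes is an assumption.

```python
import shutil


def has_free_space(path=".", required_gb=25):
    """Return True if the filesystem holding `path` has more than `required_gb` free.

    Hypothetical helper, not part of Postnovo. Assumes 1 GB = 10**9 bytes.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes > required_gb * 10**9


if __name__ == "__main__":
    # Check the current directory, for example the decompressed release.
    if has_free_space("."):
        print("Enough free disk space for the Postnovo setup.")
    else:
        print("Less than 25 GB free; the setup downloads may not fit.")
```

Point `path` at the directory where the release was decompressed, since that is presumably where the downloaded models end up.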
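Step 5 relies on every spectrum title following the pattern set by the titleMaker filter. The sketch below scans the TITLE lines of the msconvert output and reports any that do not match "Run: <RunId>, Index: <Index>, Scan: <ScanNumber>". The function name `check_mgf_titles` is hypothetical, and the check is meant for the preformatted MGF only, because this README does not say how format_mgf rewrites titles.

```python
import re

# Title pattern from the titleMaker filter in step 5.
# Assumption: Index and Scan are written as plain integers.
TITLE_RE = re.compile(
    r"^TITLE=Run: (?P<run>.+), Index: (?P<index>\d+), Scan: (?P<scan>\d+)\s*$"
)


def check_mgf_titles(lines):
    """Return (number of matching titles, list of non-matching TITLE lines)."""
    n_ok = 0
    bad = []
    for line in lines:
        if not line.startswith("TITLE="):
            continue  # peak lists and other header fields are ignored
        if TITLE_RE.match(line):
            n_ok += 1
        else:
            bad.append(line.rstrip("\n"))
    return n_ok, bad


# Made-up lines for illustration; not real Postnovo or msconvert output.
example = [
    "BEGIN IONS\n",
    "TITLE=Run: sample01, Index: 0, Scan: 1\n",
    "END IONS\n",
    "BEGIN IONS\n",
    "TITLE=some other title\n",
    "END IONS\n",
]
n_ok, bad = check_mgf_titles(example)
print(n_ok, bad)
```

For a real file, pass an open file handle, for example `check_mgf_titles(open("/path/to/preformatted_spectra.mgf"))`.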
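The three Postnovo commands for low-res data in steps 6, 8 and 10 can be collected in one place, as in the sketch below. It only assembles and prints the commands, using the subcommands and flags shown above. The function name `postnovo_commands` and the idea of scripting the steps are assumptions rather than anything Postnovo provides.

```python
import shlex


def postnovo_commands(preformatted_mgf, mgf, container, clusters, cpus=32):
    """Build the commands from steps 6, 8 and 10 as argument lists.

    Hypothetical helper; the flags are copied from the Quick Start examples.
    """
    cpus = str(cpus)
    return [
        # Step 6: reformat the MGF file.
        ["python", "main.py", "format_mgf",
         "--mgfs", preformatted_mgf, "--out", mgf],
        # Step 8: DeepNovo predictions on a single machine.
        ["python", "main.py", "predict_deepnovo",
         "--mgf", mgf, "--container", container,
         "--frag_resolution", "low", "--cpus", cpus],
        # Step 10: Postnovo predictions (needs the MaRaCluster output from step 9).
        ["python", "main.py", "predict",
         "--mgf", mgf, "--clusters", clusters,
         "--frag_method", "CID", "--frag_resolution", "low",
         "--denovogui", "--deepnovo", "--cpus", cpus],
    ]


cmds = postnovo_commands(
    "/path/to/preformatted_spectra.mgf",
    "/path/to/spectra.mgf",
    "/path/to/tensorflow.simg",
    "/path/to/MaRaCluster.clusters_p2.tsv",
)
for cmd in cmds:
    print(shlex.join(cmd))  # shlex.join requires Python 3.8 or later
```

To run a command, pass its list to `subprocess.run(cmd, check=True)` from the release directory. MaRaCluster (step 9) still has to be run separately before the last command.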

Copyright 2018, Samuel E. Miller. All rights reserved.

Postnovo is publicly available for non-commercial uses.

Licensed under GNU GENERAL PUBLIC LICENSE, Version 3, 29 June 2007.

See postnovo/LICENSE.txt.
