Samuel E. Miller edited this page Jul 26, 2019 · 8 revisions
  1. DeepNovo is a powerful de novo sequencing tool, and using it in Postnovo alongside the default DeNovoGUI tools (Novor and PepNovo+) is recommended. DeepNovo predicts sequences using neural networks implemented in TensorFlow. The version of DeepNovo used by Postnovo was modified to return 20 candidate sequence predictions per spectrum rather than 1.

    a. Postnovo runs DeepNovo in a Singularity container containing TensorFlow. Superuser privileges may be required to build the Singularity container image (contact your sysadmin as needed). In Singularity versions starting with v.2.4, import and build the Docker TensorFlow image with the following command.

    singularity build tensorflow.simg docker://tensorflow/tensorflow:latest

    In Singularity versions prior to v.2.4, the build command is not used, and the following two commands can be used instead. The --size option specifies the size of the image in MB.

    singularity create --size 4000 tensorflow.simg
    singularity import tensorflow.simg docker://tensorflow/tensorflow:latest
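The choice between the two command sets can be sketched as a small shell helper. This is only an illustration: the function name `tensorflow_image_cmds` is hypothetical and not part of Postnovo, and it assumes you pass it a numeric version string of the form major.minor (for example `2.3.2` or `2.4.2-dist`). How that string is obtained from your Singularity installation is left to you. The helper only prints the commands; it does not run them.

```shell
# Hypothetical helper (not part of Postnovo): print the Singularity commands
# for creating the TensorFlow image, given a version string such as "2.3.2".
tensorflow_image_cmds() {
    version="$1"
    image="${2:-tensorflow.simg}"
    major="${version%%.*}"        # text before the first dot
    rest="${version#*.}"          # text after the first dot
    minor="${rest%%.*}"           # text up to the next dot
    minor="${minor%%[!0-9]*}"     # drop any non-digit suffix, e.g. "4-dist" -> "4"
    if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 4 ]; }; then
        # v.2.4 and later: a single build command
        echo "singularity build $image docker://tensorflow/tensorflow:latest"
    else
        # before v.2.4: create the image, then import the Docker layers
        echo "singularity create --size 4000 $image"
        echo "singularity import $image docker://tensorflow/tensorflow:latest"
    fi
}

tensorflow_image_cmds 2.3.2
```

Review the printed commands before running them, since building the image may require superuser privileges.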

    b. Large files are needed to run DeepNovo. The following Postnovo command downloads and sets up default low- and high-resolution DeepNovo models.

    python main.py setup --deepnovo_low --deepnovo_high --container /path/to/tensorflow.simg

  2. Postnovo's setup subcommand is used to download and set up associated tools and large files, including DeNovoGUI (which contains the de novo sequencing tools, Novor and PepNovo+); pre-trained low- and high-resolution models for DeepNovo and a large file required to run DeepNovo; the underlying training spectra of DeepNovo models; pre-trained low- and high-resolution Postnovo models and their feature data; the underlying training spectra of Postnovo models; and MSGF+, the default database search tool used for Postnovo and DeepNovo training and testing.

    a. The following command lists the files that have been updated since the user's last download or that have never been downloaded.

    python main.py setup --check_updates

    b. Here are some setup examples.

    Download and set up DeNovoGUI; MSGF+; the DeepNovo low- and high-resolution models; and the Postnovo low- and high-resolution models.

    python main.py setup --denovogui --msgf --deepnovo_low --deepnovo_high --container /path/to/singularity.simg --postnovo_low --postnovo_high

    Download the MGF files of spectra used to train the DeepNovo low- and high-resolution models and the Postnovo low- and high-resolution models. The MGF files are placed in the directories postnovo/DeepNovo/, postnovo/train/low/, and postnovo/train/high/.

    python main.py setup --deepnovo_low_spectra --deepnovo_high_spectra --postnovo_low_spectra --postnovo_high_spectra
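After the download, a quick check of the three target directories can confirm that the spectra arrived. The sketch below is an assumption-laden convenience, not a Postnovo feature: the function name `count_training_mgf` is hypothetical, it assumes the files carry a `.mgf` extension, and it takes the path to the postnovo directory as its argument (defaulting to `postnovo` relative to the current directory).

```shell
# Hypothetical helper (not part of Postnovo): count the MGF files in each
# directory named above. Assumes the spectra files end in ".mgf".
count_training_mgf() {
    root="${1:-postnovo}"
    for sub in DeepNovo train/low train/high; do
        dir="$root/$sub"
        if [ -d "$dir" ]; then
            # Count regular files directly inside the directory.
            n="$(find "$dir" -maxdepth 1 -type f -name '*.mgf' | awk 'END { print NR }')"
            echo "$dir: $n"
        else
            echo "$dir: missing"
        fi
    done
}

count_training_mgf /path/to/postnovo
```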

  3. Here is documentation of the spectra used to train the default DeepNovo and Postnovo models.

    a. DeepNovo training spectra were randomly subsampled from the yeast datasets (low-resolution file from Hebert et al. and high-resolution project PXD003868 file BY_04_1) employed in the DeepNovo paper, in line with how models were trained in that study. Low-resolution spectra were subsampled and split into training (100,000 spectra), validation (18,000), and testing (2,000) files. High-resolution spectra were subsampled and split into training (40,000 spectra), validation (9,000), and testing (1,000) files. Fewer high- than low-resolution spectra were sampled to keep training time manageable. Peptide sequences were assigned to spectra using MSGF+ (1% FDR). Although deamidation of N and Q was allowed in the database search, as in the study, spectra assigned peptide sequences with these PTMs were excluded from the DeepNovo training data, because Postnovo currently only allows the standard C and M PTMs.
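A random subsample-and-split of the kind described above can be sketched in shell. This is not the procedure used to build the default models, only a minimal illustration under stated assumptions: the function name `split_spectra` and the output file names are hypothetical, and the input is assumed to be a plain text file with one spectrum identifier per line rather than an MGF file. Extracting the matching spectra from the MGF is a separate step not shown here.

```shell
# Hypothetical helper: randomly split a list of spectrum identifiers
# (one per line) into training, validation, and testing lists.
split_spectra() {
    ids="$1"; n_train="$2"; n_valid="$3"; n_test="$4"; prefix="$5"
    shuffled="$(mktemp)"
    # Shuffle: prefix each line with a random key, sort on the key, drop the key.
    # srand() seeds from the time of day, so two runs within the same second
    # may produce the same order in some awk implementations.
    awk 'BEGIN { srand() } { printf "%.12f\t%s\n", rand(), $0 }' "$ids" \
        | sort -k1,1 | cut -f2- > "$shuffled"
    # Take the first n_train lines, the next n_valid, then the next n_test;
    # any remaining identifiers are left out of the subsample.
    awk -v a="$n_train" -v b="$n_valid" -v c="$n_test" -v p="$prefix" '
        NR <= a         { print > (p ".train.txt"); next }
        NR <= a + b     { print > (p ".valid.txt"); next }
        NR <= a + b + c { print > (p ".test.txt") }
    ' "$shuffled"
    rm -f "$shuffled"
}
```

With the low-resolution sizes given above, a call would look like `split_spectra ids.txt 100000 18000 2000 low`, writing `low.train.txt`, `low.valid.txt`, and `low.test.txt`.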

    b. Postnovo models were constructed from multiple datasets. Peptide sequences were assigned to spectra using MSGF+ (1% FDR), with standard C and M PTMs.

    The low-resolution model was constructed from the following datasets: E. coli, R. palustris, D. vulgaris and Synechococcus sp. 7803 (project MSV000082266); H. sapiens (project PXD002179 file CHPP_SDS_3001); D. melanogaster (project PXD004120 file MM_BN_4a).

    The high-resolution model was constructed from the following datasets: H. sapiens (project PXD004424 files 151009_exo4_2, 151009_exo4_1_3h, 151009_exo4_2_3h -- merged into one MGF file); M. musculus (project PXD004948 file 20160323_CoAN_CTRL1_3372); A. mellifera (project PXD004467 file S-1, project PXD007824 files ITB2d2, ITBF1, ITBN1, ITBNE1, RJB2d1, RJBF1, RJBN1, RJBNE1); S. lycopersicum (project PXD004947 files 03022016_Clara_MP_Fraction_02 to _09); S. cerevisiae (project PXD011695 file dbrademan_Yeast_HCD); B. subtilis (project PXD004565 file 150710_QEp_PK_Bsub_DG_Br1); M. mazei (project PXD004325 file Mm2DLC_N_1_01).