
Predicting with Postnovo

Samuel E. Miller edited this page Aug 16, 2019 · 13 revisions

Postnovo de novo sequence prediction is performed by the predict subcommand. Two files are always required as input: a properly formatted MGF file and a MaRaCluster output file, MaRaCluster.clusters_p2.tsv. The --frag_method (CID or HCD) and --frag_resolution (low or high) arguments are also always required.

Postnovo requires Novor and PepNovo+ results, which can be generated through DeNovoGUI as part of the Postnovo command. DeepNovo results are optional but recommended; they must first be generated with the predict_deepnovo subcommand.

By default, Postnovo writes a results file, best_predictions.csv, to the directory of the MGF input file, and reports sequences at least 7 amino acids long (the minimum prediction length supported by Postnovo).
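For scripted runs, the always-required arguments can be checked before Postnovo is launched. The sketch below is a hypothetical wrapper (build_predict_args is not part of Postnovo); it assumes main.py is invoked from the working directory, as in the example commands, and it only enforces the choices stated on this page.

```python
from typing import List, Optional, Sequence

# Values accepted for the always-required arguments, as stated on this page.
FRAG_METHODS = ("CID", "HCD")
FRAG_RESOLUTIONS = ("low", "high")


def build_predict_args(
    mgf: str,
    frag_method: str,
    frag_resolution: str,
    clusters: Optional[str] = None,
    cpus: int = 1,
    extra: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: assemble a Postnovo command line as an argv list.

    Assumes main.py is in the working directory. If clusters is None,
    --clusters is omitted and Postnovo looks for MaRaCluster.clusters_p2.tsv
    in the directory of the MGF file.
    """
    if frag_method not in FRAG_METHODS:
        raise ValueError(f"--frag_method must be one of {FRAG_METHODS}")
    if frag_resolution not in FRAG_RESOLUTIONS:
        raise ValueError(f"--frag_resolution must be one of {FRAG_RESOLUTIONS}")
    if cpus < 1:
        raise ValueError("--cpus must be at least 1")
    args = ["python", "main.py", "--mgf", mgf]
    if clusters is not None:
        args += ["--clusters", clusters]
    args += ["--frag_method", frag_method, "--frag_resolution", frag_resolution]
    args += list(extra)  # e.g. ["--denovogui"] or ["--filename", "spectra"]
    args += ["--cpus", str(cpus)]
    return args


# Rebuilds the first example command on this page.
print(" ".join(build_predict_args(
    "/path/to/spectra.mgf", "CID", "low", cpus=32, extra=["--denovogui"]
)))
# python main.py --mgf /path/to/spectra.mgf --frag_method CID --frag_resolution low --denovogui --cpus 32
```

The returned list can be passed directly to subprocess.run(args, check=True); keeping the arguments in a list avoids shell quoting problems with paths that contain spaces.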

  1. The following command uses DeNovoGUI to generate Novor and PepNovo+ de novo sequences from low-resolution CID spectra. No --clusters argument is given, so a file named MaRaCluster.clusters_p2.tsv is assumed to be in the directory of the MGF file. The number of cores is raised from the default of 1 to the 32 available on the machine to take advantage of Postnovo's parallelization.

    python main.py --mgf /path/to/spectra.mgf --frag_method CID --frag_resolution low --denovogui --cpus 32

  2. The following command is similar to the last, but processes high-resolution HCD spectra and also uses DeepNovo files previously generated with predict_deepnovo. These files should be in the directory of the MGF file and share its prefix (e.g., spectra.mgf => spectra.0.01.tsv, spectra.0.03.tsv, etc.). The path to the MaRaCluster file is also given explicitly.

    python main.py --mgf /path/to/spectra.mgf --clusters /path/to/MaRaCluster.clusters_p2.tsv --frag_method HCD --frag_resolution high --denovogui --deepnovo --cpus 32

  3. If the Novor and PepNovo+ output files already exist, they should, like the DeepNovo files, be located in the directory of the MGF file and share its prefix (e.g., spectra.mgf => spectra.0.01.novor.csv, spectra.0.01.mgf.out, spectra.0.03.novor.csv, spectra.0.03.mgf.out, etc.). Pass this prefix as the --filename argument to indicate that the files are provided.

    python main.py --mgf /path/to/spectra.mgf --frag_method HCD --frag_resolution high --filename spectra --deepnovo --cpus 32

  4. By default, de novo sequence predictions with an estimated probability of sequence correctness above 50% are reported. The following command reports all predictions regardless of probability (--min_prob 0) and raises the minimum reported sequence length from 7 to 9 (--min_len 9).

    python main.py --mgf /path/to/spectra.mgf --frag_method CID --frag_resolution low --filename spectra --min_prob 0 --min_len 9 --cpus 32
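The file-naming convention used in examples 2 and 3 can be checked before a run. The sketch below uses hypothetical helpers (expected_inputs and missing_files are illustrative names, not Postnovo functions). Because the page lists the tolerance infixes only as 0.01, 0.03, etc., the full set is supplied by the caller rather than assumed.

```python
from pathlib import Path
from typing import Dict, List, Optional, Sequence


def expected_inputs(
    mgf: str, tolerances: Sequence[str], clusters: Optional[str] = None
) -> Dict[str, List[Path]]:
    """Hypothetical helper: list the files expected alongside the MGF file.

    tolerances holds the infixes used in the file names (e.g. "0.01", "0.03");
    they are strings so they appear exactly as written.
    """
    mgf_path = Path(mgf)
    directory, prefix = mgf_path.parent, mgf_path.stem  # spectra.mgf -> spectra
    return {
        # Default MaRaCluster location when --clusters is not given.
        "clusters": [Path(clusters) if clusters else directory / "MaRaCluster.clusters_p2.tsv"],
        "deepnovo": [directory / f"{prefix}.{t}.tsv" for t in tolerances],
        "novor": [directory / f"{prefix}.{t}.novor.csv" for t in tolerances],
        "pepnovo": [directory / f"{prefix}.{t}.mgf.out" for t in tolerances],
    }


def missing_files(expected: Dict[str, List[Path]]) -> List[Path]:
    """Return every expected path that does not exist as a regular file."""
    return [p for paths in expected.values() for p in paths if not p.is_file()]


files = expected_inputs("/path/to/spectra.mgf", ["0.01", "0.03"])
for kind, paths in files.items():
    print(kind, [p.name for p in paths])
```

Only the groups a run actually uses need to be present: the DeepNovo files when --deepnovo is given, and the Novor and PepNovo+ files when --filename is given.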

Because Postnovo can find subsequences ("tags"), choosing the final prediction from the set of Postnovo candidates by estimated probability of sequence correctness (score) alone tends to favor shorter, higher-scoring subsequences that contain only the most confidently identified amino acids. Such candidates often have similar scores but different lengths, so Postnovo allows score to be traded for length when choosing the final prediction. By default, the maximum score that can be sacrificed per percent increase in sequence length (--max_sacrifice_per_percent_extension) is 0.0035. This is equivalent to sacrificing 0.05 to add 1 amino acid to a length-7 sequence (0.0035 = 0.05 / (1/7 * 100)), or 0.025 to add 1 amino acid to a length-14 sequence. The score of the final prediction cannot fall below the minimum score of reported sequences (by default, 0.5).
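The arithmetic in the previous paragraph can be reproduced in a few lines. This is a sketch of the stated rule only (allowed_sacrifice and lowest_acceptable_score are illustrative names); it does not reproduce how Postnovo ranks its candidates internally.

```python
def allowed_sacrifice(shorter_len: int, longer_len: int, rate: float = 0.0035) -> float:
    """Maximum score drop allowed for extending a prediction.

    rate is the score that may be sacrificed per percent extension
    (the --max_sacrifice_per_percent_extension value, 0.0035 by default).
    """
    percent_extension = (longer_len - shorter_len) / shorter_len * 100
    return rate * percent_extension


def lowest_acceptable_score(
    best_score: float,
    shorter_len: int,
    longer_len: int,
    rate: float = 0.0035,
    min_prob: float = 0.5,
) -> float:
    """Lowest score a longer candidate may have; never below min_prob."""
    return max(min_prob, best_score - allowed_sacrifice(shorter_len, longer_len, rate))


print(round(allowed_sacrifice(7, 8), 4))    # 0.05
print(round(allowed_sacrifice(14, 15), 4))  # 0.025
print(lowest_acceptable_score(0.53, 7, 8))  # 0.5 (floor set by min_prob)
```

The last line shows the floor at work: a 0.05 sacrifice from 0.53 would reach 0.48, but the score cannot fall below the reporting threshold of 0.5.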

  1. The following command doubles the score that can be sacrificed per percent length extension from 0.0035 to 0.007.

    python main.py --mgf /path/to/spectra.mgf --frag_method CID --frag_resolution low --denovogui --max_sacrifice_per_percent_extension 0.007 --cpus 32

  2. The following command prevents any trade-off between score and final prediction length.

    python main.py --mgf /path/to/spectra.mgf --frag_method CID --frag_resolution low --denovogui --max_sacrifice_per_percent_extension 0 --cpus 32

  3. The following command caps the total score that can be sacrificed for a longer final prediction at 0.1 (--max_total_sacrifice 0.1) and reports all final predictions (--min_prob 0).

    python main.py --mgf /path/to/spectra.mgf --frag_method CID --frag_resolution low --denovogui --min_prob 0 --max_total_sacrifice 0.1 --cpus 32
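To see how the three options interact, the sketch below applies them to a toy set of candidates for one spectrum. It is one plausible reading of the rule, not Postnovo's implementation: each longer candidate is compared with the highest-scoring candidate, and the longest one whose score drop is within budget is chosen. The Candidate type, choose_final function, sequences, and scores are all hypothetical, and max_total_sacrifice=None stands in for "no cap" because its default is not given on this page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    seq: str
    score: float  # estimated probability of sequence correctness


def choose_final(
    candidates: Sequence[Candidate],
    min_prob: float = 0.5,
    max_sacrifice_per_percent_extension: float = 0.0035,
    max_total_sacrifice: Optional[float] = None,
) -> Optional[Candidate]:
    """Hypothetical score-for-length selection among one spectrum's candidates."""
    # Candidates below the reporting threshold are never chosen (inclusive here).
    eligible = [c for c in candidates if c.score >= min_prob]
    if not eligible:
        return None
    best = max(eligible, key=lambda c: c.score)
    chosen = best
    for c in eligible:
        if len(c.seq) <= len(best.seq):
            continue
        percent_extension = (len(c.seq) - len(best.seq)) / len(best.seq) * 100
        budget = max_sacrifice_per_percent_extension * percent_extension
        if max_total_sacrifice is not None:
            budget = min(budget, max_total_sacrifice)
        within_budget = best.score - c.score <= budget
        # Prefer the longest in-budget candidate; break length ties by score.
        if within_budget and (len(c.seq), c.score) > (len(chosen.seq), chosen.score):
            chosen = c
    return chosen


toy = [
    Candidate("PEPTIDEK", 0.80),
    Candidate("PEPTIDEKAL", 0.74),
    Candidate("APEPTIDEKAL", 0.60),
]
print(choose_final(toy).seq)  # PEPTIDEKAL (defaults)
print(choose_final(toy, max_sacrifice_per_percent_extension=0.007).seq)  # APEPTIDEKAL
print(choose_final(toy, max_sacrifice_per_percent_extension=0).seq)  # PEPTIDEK
print(choose_final(toy, max_sacrifice_per_percent_extension=0.007,
                   max_total_sacrifice=0.1).seq)  # PEPTIDEKAL (cap overrides rate)
```

Measuring every extension from the top-scoring candidate keeps the budget tied to one reference length; measuring each extension from the previous choice would be another defensible reading.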