Testing Postnovo

Samuel E. Miller edited this page Aug 16, 2019 · 8 revisions

The Postnovo test subcommand tests the algorithm's predictive models. Testing requires that de novo sequence predictions are cross-validated against FDR-controlled PSMs. Postnovo recognizes FDR-controlled results in the TSV format produced by MSGF+ database search. Postnovo test can run MSGF+, which should first be downloaded by Postnovo setup. MSGF+ should be run with the MGF file produced by Postnovo format_mgf.

The --test_plots option produces precision-recall and precision-yield plots that evaluate model accuracy. The definitions of precision and recall are not entirely straightforward, so three methods are used to calculate classification statistics, as described in the diagrams at the bottom of the page.
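As a point of reference, the textbook definitions of precision and recall at a single score threshold can be sketched as below. This is only an illustration and is not one of the three Postnovo methods, which are defined by the diagrams. The function name `precision_recall`, the score values, and the correctness labels are all hypothetical.

```python
def precision_recall(scores, is_correct, threshold):
    """Textbook precision and recall for predictions scoring >= threshold.

    scores: predicted scores, one per de novo prediction (hypothetical input).
    is_correct: booleans, True where the prediction agrees with the reference PSM.
    Returns (precision, recall); a value is None when its denominator is zero.
    """
    accepted = [ok for score, ok in zip(scores, is_correct) if score >= threshold]
    true_positives = sum(accepted)
    total_correct = sum(is_correct)
    precision = true_positives / len(accepted) if accepted else None
    recall = true_positives / total_correct if total_correct else None
    return precision, recall


# Made-up example: four predictions, three of which match the reference.
example_scores = [0.9, 0.8, 0.6, 0.4]
example_labels = [True, True, False, True]
precision, recall = precision_recall(example_scores, example_labels, threshold=0.5)
```

Sweeping the threshold over the range of scores gives the points of a precision-recall curve under this simple definition.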

  1. The following command tests the Postnovo low-resolution model with a properly formatted MGF file, running MSGF+ in the process and writing plots to the directory of the MGF input (see Training Postnovo for more examples).

    python main.py test --mgf /path/to/dataset1.mgf --clusters /path/to/MaRaCluster.clusters_p2.tsv --precursor_mass_tol 20 --frag_method CID --frag_resolution low --denovogui --deepnovo --ref_fasta reference_proteins.fasta --msgf --test_plots --cpus 32

  2. The leave_one_out option of the Postnovo train subcommand performs thorough model testing, but it produces different plots, which show the true vs. predicted accuracy of predictions.
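When the test command from item 1 is launched from a script, it can help to assemble the long argument list in code. The following is a minimal sketch that uses only the flags shown in that command. The helper name `build_test_command` is hypothetical and the paths are the same placeholders as above. The sketch only builds and prints the command; it does not run Postnovo.

```python
import shlex


def build_test_command(mgf, clusters, ref_fasta, cpus=32):
    """Return the argument list for the low-resolution CID test run shown above."""
    return [
        "python", "main.py", "test",
        "--mgf", mgf,
        "--clusters", clusters,
        "--precursor_mass_tol", "20",
        "--frag_method", "CID",
        "--frag_resolution", "low",
        "--denovogui",
        "--deepnovo",
        "--ref_fasta", ref_fasta,
        "--msgf",
        "--test_plots",
        "--cpus", str(cpus),
    ]


cmd = build_test_command(
    mgf="/path/to/dataset1.mgf",
    clusters="/path/to/MaRaCluster.clusters_p2.tsv",
    ref_fasta="reference_proteins.fasta",
)
# shlex.join (Python 3.8+) renders the list as a single shell-safe string.
command_line = shlex.join(cmd)
print(command_line)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains main.py.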

Classification Statistics Diagram 1

Classification Statistics Diagram 2