Example

MatthewThe edited this page Nov 7, 2016 · 6 revisions

To verify that Percolator is installed correctly, we can process one of the data sets used in Percolator’s original publication, which is available from the Noble lab web site.

First, install both the percolator and the percolator-converters packages from the latest release. When building from source, instead run cmake; make; make install in both the percolator root directory and the src/converters directory. Then run the command sequence below:


  $ mkdir test; cd test
  $ wget -q http://noble.gs.washington.edu/proj/percolator/data/yeast-01.sqt.tar.gz
  $ tar xvzf yeast-01.sqt.tar.gz
  $ sqt2pin -o pin.tab yeast-01.sqt yeast-01.shuffled.sqt
  $ percolator -X pout.xml pin.tab > yeast-01.psms
  Percolator version 1.15, Build Date Oct 21 2010 10:43:31
  Copyright (c) 2006-9 University of Washington. All rights reserved.
  Written by Lukas Käll (lukall@u.washington.edu) in the
  Department of Genome Sciences at the University of Washington.
  Issued command:
  percolator -E pin.xml -X pout.xml
  Started Thu Oct 21 13:22:36 2010 on unknown_host
  Hyperparameters fdr=0.01, Cpos=0, Cneg=0, maxNiter=10
  Train/test set contains 69705 positives and 69705 negatives, size ratio=1 and pi0=1
  selecting cpos by cross validation
  selecting cneg by cross validation
  Estimating 7086 over q=0.01 in initial direction
  Reading in data and feature calculation took 46.96 cpu seconds or 47 seconds wall time
  ---Training with Cpos selected by cross validation, Cneg selected by cross validation, fdr=0.01
  Iteration 1	: After the iteration step, 10212[1] target PSMs with q<0.01 were estimated by cross validation
  Iteration 2	: After the iteration step, 11256 target PSMs with q<0.01 were estimated by cross validation
  Iteration 3	: After the iteration step, 11564 target PSMs with q<0.01 were estimated by cross validation
  Iteration 4	: After the iteration step, 11667 target PSMs with q<0.01 were estimated by cross validation
  Iteration 5	: After the iteration step, 11706 target PSMs with q<0.01 were estimated by cross validation
  Iteration 6	: After the iteration step, 11714 target PSMs with q<0.01 were estimated by cross validation
  Iteration 7	: After the iteration step, 11716 target PSMs with q<0.01 were estimated by cross validation
  Iteration 8	: After the iteration step, 11741 target PSMs with q<0.01 were estimated by cross validation
  Iteration 9	: After the iteration step, 11742 target PSMs with q<0.01 were estimated by cross validation
  Iteration 10	: After the iteration step, 11747 target PSMs with q<0.01 were estimated by cross validation
  Obtained weights (only showing weights of first cross validation set)
  first line contains normalized weights, second line the raw weights
  lnrSp	 deltLCn deltCn	Xcorr	 Sp	   IonFrac Mass	   PepLen Charge1 Charge2 Charge3 enzN enzC enzInt lnNumSP dM	 absdM	   m0
  -0.454 0.094	 0.586	0.554[2] -0.044	   -0.0211 0.757   -0.647 0.138	  0.0316  -0.0599 1.19 1.22 -1.7   -0.0401 0.246 -0.222[3] -6.23
  -0.244 0.562	 6.71	0.921	 -0.000166 -0.13   0.00125 -0.118 1.35	  0.0632  -0.12	  3.01 2.79 -1.27  -1.29   0.382 -0.601	   7.92
  After all training done, 11621 target PSMs with q<0.01 were found when measuring on the test set
  Found 11621 target PSMs scoring over 1% FDR level on testset
  Merging results from 3 datasets
  Selecting pi_0=0.813 [4]
  Calibrating statistics - calculating q values
  New pi_0 estimate on merged list gives 11854 PSMs over q=0.01
  Calibrating statistics - calculating Posterior error probabilities (PEPs)
  Processing took 81.07 cpu seconds or 82 seconds wall time

We have labelled a few interesting features in the output above with bracketed numbers:
[1] The number of target PSMs with a q-value below 0.01 is estimated by cross-validation.
[2] The weight of Xcorr is positive, meaning that a higher Xcorr gives a better score. This is a good sign.
[3] The weight of absdM is negative, meaning that a large difference between observed and calculated mass gives a worse score. This is also a good sign.
[4] We estimate that 81.3% of the PSMs are incorrect matches (pi_0=0.813).
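
As a quick sanity check after the run, the PSM table written to yeast-01.psms can be inspected from the shell. The sketch below counts rows with a q-value below a threshold. It assumes the file is tab-delimited with a header row containing a column named q-value; that layout is an assumption and may differ between Percolator versions. The function name count_psms_below_q is hypothetical and not part of Percolator.

```shell
# Count rows of a tab-delimited PSM table whose q-value is below a threshold.
# Assumption: the first line is a header with a column named "q-value".
# count_psms_below_q is a hypothetical helper, not a Percolator command.
count_psms_below_q() {
  file=$1
  threshold=${2:-0.01}   # default threshold matches the fdr=0.01 used above
  awk -F '\t' -v t="$threshold" '
    NR == 1 {
      # Locate the q-value column by name rather than by position.
      for (i = 1; i <= NF; i++) if ($i == "q-value") col = i
      if (!col) { print "no q-value column found" > "/dev/stderr"; exit 1 }
      next
    }
    ($col + 0) < (t + 0) { n++ }
    END { if (col) print n + 0 }
  ' "$file"
}

# Usage (after the command sequence above has finished):
#   count_psms_below_q yeast-01.psms 0.01
```

The result can be compared against the PSM count that Percolator reports at q=0.01 in its log. Looking the column up by name keeps the sketch from depending on column order.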

System tests

The source code ships with a series of scripts that run system tests, verifying that several frequently used modes of operation have the intended behavior and performance. They only work if the binaries were built from source with CMake, and they require Python to be installed. To run them, execute ctest -V in the corresponding build directory, e.g. <build-directory>/percolator or <build-directory>/converters; a test report will be generated.
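
The prerequisites can be checked before invoking the tests. The following is a minimal sketch, not part of the Percolator distribution: missing_tools and run_system_tests are hypothetical helper names, and the interpreter name python is an assumption (on some systems it is python3).

```shell
# Print the name of every listed tool that is not found on PATH.
# missing_tools is a hypothetical helper, not shipped with Percolator.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Run the system tests in one build directory after checking prerequisites.
# The argument is a placeholder such as <build-directory>/percolator.
run_system_tests() {
  build_dir=$1
  # Assumption: the Python interpreter is installed under the name "python".
  missing=$(missing_tools ctest python)
  if [ -n "$missing" ]; then
    echo "missing required tools: $missing" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$build_dir" && ctest -V)
}

# Usage:
#   run_system_tests <build-directory>/percolator
#   run_system_tests <build-directory>/converters
```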