
ProteoStorm v06222018

Beyter and Lin et al. (2018). ProteoStorm: An Ultrafast Metaproteomics Database Search Framework. Cell Systems. 7, 463–467

Software Requirements

MSConvert is required for peak picking and for converting RAW files to MGF format. If you are not using the MSConvert GUI, include the following filter so that spectra are written with the expected TITLE format: --filter 'titleMaker <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState> File:"<SourcePath>", NativeID:"<Id>"'
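To check that converted spectra carry the expected titles, the value of a TITLE line can be matched against the filter template above. The following is a minimal Python sketch and is not part of ProteoStorm: the function name `parse_title` and the sample title are made up for illustration, and the sketch assumes the charge field may be empty.

```python
import re

# Mirrors the titleMaker template:
# <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState> File:"<SourcePath>", NativeID:"<Id>"
TITLE_RE = re.compile(
    r'^(?P<run_id>.+)\.(?P<scan_first>\d+)\.(?P<scan_second>\d+)\.(?P<charge>\d*)'
    r' File:"(?P<source_path>[^"]*)", NativeID:"(?P<native_id>[^"]*)"$'
)


def parse_title(title):
    """Return the fields of a TITLE value as a dict, or None if it does not match."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["scan_first"] = int(fields["scan_first"])
    fields["scan_second"] = int(fields["scan_second"])
    # Assumption: the charge state may be missing; keep None in that case.
    fields["charge"] = int(fields["charge"]) if fields["charge"] else None
    return fields


# Hypothetical title (the text after "TITLE="), made up for illustration.
example = 'run01.1234.1234.2 File:"run01.raw", NativeID:"scan=1234"'
print(parse_title(example))
```

A return value of None indicates that the title does not follow the template, which may mean the filter was not applied during conversion.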


Linux

  1. Anaconda 2.7 or Python 2.7 with numpy and psutil
$ wget
$ bash
  2. Java (1.8 or above)
$ sudo apt-get update
$ sudo apt-get install default-jre


Windows

  1. Anaconda 2.7
download from
  2. Java (1.8 or above)
download from
  3. Cygwin
download from
Add "C:\cygwin64\bin\" to Environment Variables --> System variables --> PATH
Add "export PATH=/cygdrive/c/Users/user/Anaconda2:$PATH" to .bashrc


Usage

python -u ./src/
	-D DatabaseDirectory (Directory containing database files in fasta format, use .fasta for the file extension)
	-S SpectralDirectory (Directory containing spectral datasets in MGF format, peak-picked and converted from RAW using MSConvert)
	-O OutputDirectory
	[-MSMS SubdirectoryName] (Name of metaproteomics dataset, Default: date_time)
	[-ms1t PrecursorMassTolerance] (e.g., 10, 20, 50, Default: 10)
	[-ms2t FragmentMassTolerance] (e.g., 0.015, 0.6, Default: 0.015)
	[-inst MS2DetectorID] (0: Low-res LCQ/LTQ, 1: Orbitrap/FTICR, 2: TOF, 3: Q-Exactive (Default))
	[-m FragmentMethodID] (1: CID, 3: HCD)
	[-S1spc S1SharedPeaksCount] (Default: 7)
	[-S2spc S2SharedPeaksCount] (Default: 7)
	[-genera 0/1] (0: Create refined protein DB using peptide-level FDR (Default), 1: genera-restriction approach)
Output: ProteoStorm_output.txt (Peptide-spectrum matches (PSMs) with p-values computed using the MS-GF+ generating function.)

If using the RefUP++ database, please see the Notes section below.
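Because the -D and -S options expect directories with specific contents, it can help to check the inputs before starting a long search. The following is a minimal Python sketch under stated assumptions, not part of ProteoStorm: the function name `check_inputs` is hypothetical, the `.fasta` extension comes from the usage text, and the `.mgf` extension for spectra is an assumption.

```python
import os


def check_inputs(database_dir, spectra_dir):
    """Return a list of problems found; an empty list means both inputs look usable."""
    problems = []
    expected = (
        ("database", database_dir, ".fasta"),  # extension stated in the usage text
        ("spectra", spectra_dir, ".mgf"),      # assumption: MGF files end in .mgf
    )
    for label, path, ext in expected:
        if not os.path.isdir(path):
            problems.append("%s directory not found: %s" % (label, path))
            continue
        if not any(name.endswith(ext) for name in os.listdir(path)):
            problems.append("no %s files in %s directory: %s" % (ext, label, path))
    return problems


if __name__ == "__main__":
    # Paths taken from the demo commands in this README.
    for problem in check_inputs("./example/fasta", "./example/mgf"):
        print(problem)
```

The check only looks at directory contents and file extensions; it does not verify that the files are valid FASTA or MGF.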


Test run

1. Download and extract the demo files into ./ProteoStorm/example
2. Download the mass distribution files into ./ProteoStorm/src/DBmassDistributions
3. Run either Command 1 or Command 2 (genera-restriction approach) as provided below.
4. The expected output files for the test runs are located at ./ProteoStorm/example/ProteoStorm_Out/prerun_demo and ./ProteoStorm/example/ProteoStorm_Out_GeneraRestrictionApproach/prerun_demo.


Linux

CoreModule2_PeptideFiltering_Linux-x86_64.exe was compiled on CentOS 6.10 with kernel version 2.6.32-754.2.1.el6.x86_64.

Command 1

python -u ./src/ --Database ./example/fasta --Spectra ./example/mgf --RemoveSpectra ./example/HS_matched_spectra.txt --SpectralDataset "demo" --output ./example/ProteoStorm_Out --PrecursorMassTolerance 10 --FragmentMassTolerance 0.015 --InstrumentID 3 --FragmentMethodID 3

Command 2 (genera-restriction approach)

python -u ./src/ --Database ./example/fasta_genera_restriction_approach --Spectra ./example/mgf --RemoveSpectra ./example/HS_matched_spectra.txt --SpectralDataset "demo" --output ./example/ProteoStorm_Out_GeneraRestrictionApproach --PrecursorMassTolerance 10 --FragmentMassTolerance 0.015 --InstrumentID 3 --FragmentMethodID 3 --GeneraRestrictionApproach 1 --refDBfdr 0.01 --PepMassDistribution ./src/DBmassDistributions/RefUp_2872778677.txt --database_partitions 400


Windows

Replace "C:/cygwin64/bin/run.exe" with the corresponding path on your system.

Command 1

python -u ./src/ --Database ./example/fasta --Spectra ./example/mgf --RemoveSpectra ./example/HS_matched_spectra.txt --SpectralDataset "demo" --output ./example/ProteoStorm_Out --PrecursorMassTolerance 10 --FragmentMassTolerance 0.015 --InstrumentID 3 --FragmentMethodID 3 --CygwinPATH "C:/cygwin64/bin/run.exe"

Command 2 (genera-restriction approach)

python -u ./src/ --Database ./example/fasta_genera_restriction_approach --Spectra ./example/mgf --RemoveSpectra ./example/HS_matched_spectra.txt --SpectralDataset "demo" --output ./example/ProteoStorm_Out_GeneraRestrictionApproach --PrecursorMassTolerance 10 --FragmentMassTolerance 0.015 --InstrumentID 3 --FragmentMethodID 3 --GeneraRestrictionApproach 1 --refDBfdr 0.01 --PepMassDistribution ./src/DBmassDistributions/RefUp_2872778677.txt --database_partitions 400 --CygwinPATH "C:/cygwin64/bin/run.exe"
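The demo commands above share most of their arguments and differ only in the genera-restriction options and the Cygwin path. The following is a minimal Python sketch of assembling such a command from a script; it is not part of ProteoStorm. The function name `build_command` is hypothetical, the script path is a placeholder that must be replaced with the actual ProteoStorm script under ./src/, and only flags that appear in the commands above are used. Adding --refDBfdr only together with the genera-restriction flag mirrors Command 2 and is an assumption about how the two relate.

```python
def build_command(script, database, spectra, output, dataset,
                  remove_spectra=None, ms1_tol=10, ms2_tol=0.015,
                  instrument_id=3, fragment_method_id=3,
                  genera_restriction=False, ref_db_fdr=0.01,
                  pep_mass_distribution=None, database_partitions=None,
                  cygwin_path=None):
    """Assemble the argument list for one demo-style run."""
    cmd = ["python", "-u", script, "--Database", database, "--Spectra", spectra]
    if remove_spectra:
        cmd += ["--RemoveSpectra", remove_spectra]
    cmd += [
        "--SpectralDataset", dataset,
        "--output", output,
        "--PrecursorMassTolerance", str(ms1_tol),
        "--FragmentMassTolerance", str(ms2_tol),
        "--InstrumentID", str(instrument_id),
        "--FragmentMethodID", str(fragment_method_id),
    ]
    if genera_restriction:
        # Mirrors Command 2 (assumption: these two flags belong together).
        cmd += ["--GeneraRestrictionApproach", "1", "--refDBfdr", str(ref_db_fdr)]
    if pep_mass_distribution:
        cmd += ["--PepMassDistribution", pep_mass_distribution]
    if database_partitions:
        cmd += ["--database_partitions", str(database_partitions)]
    if cygwin_path:
        # Windows only, as in the Windows commands above.
        cmd += ["--CygwinPATH", cygwin_path]
    return cmd


# "PATH_TO_SCRIPT" is a placeholder, not a real file name.
demo = build_command("PATH_TO_SCRIPT", "./example/fasta", "./example/mgf",
                     "./example/ProteoStorm_Out", "demo",
                     remove_spectra="./example/HS_matched_spectra.txt")
print(" ".join(demo))
```

The returned list can be passed to `subprocess.call` once the placeholder is replaced; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.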

Alternative configurations for ProteoStorm

Alternative configurations for ProteoStorm PDF download


Notes

If using the RefUP++ database, please include the following two parameters in your command.

--PepMassDistribution ./src/DBmassDistributions/RefUp_2872778677.txt

--database_partitions 400

If using the genera-restriction approach, the sequence headers in your protein fasta files should have the following format:


Example (fields are separated by tab characters, written here as \t): >NP_819020.1\tCoxiella\t227377
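Headers can be checked against the example above before running the genera-restriction approach. The following is a minimal Python sketch, not part of ProteoStorm: the function name `find_bad_headers` is hypothetical, and the rule it applies (exactly three non-empty tab-separated fields, the last one numeric) is inferred from the single example header rather than from a stated specification.

```python
def find_bad_headers(lines):
    """Return (line_number, header) pairs for headers that do not look like the example."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line.rstrip("\r\n")
        fields = header[1:].split("\t")
        # Assumption drawn from the example: three fields, the third numeric.
        looks_ok = (
            len(fields) == 3
            and all(field.strip() for field in fields)
            and fields[2].strip().isdigit()
        )
        if not looks_ok:
            bad.append((number, header))
    return bad


# The first header is the example from this README; the second is a made-up bad header.
sample = [
    ">NP_819020.1\tCoxiella\t227377\n",
    "MKT\n",
    ">NP_819020.1 Coxiella 227377\n",
    "MKT\n",
]
print(find_bad_headers(sample))
```

To check a file, pass an open file handle, for example `find_bad_headers(open(path))`, since iterating over a handle yields its lines.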





