ProteoStorm: An Ultrafast Metaproteomics Database Search Framework

miinslin/ProteoStorm

ProteoStorm v06222018

Beyter and Lin et al. (2018). ProteoStorm: An Ultrafast Metaproteomics Database Search Framework. Cell Systems. 7, 463–467

Software Requirements

MSConvert is required for peak-picking and for converting RAW files to MGF format. If you are not using the MSConvert GUI, include the following filter so that spectra get the expected TITLE format: --filter 'titleMaker <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState> File:"<SourcePath>", NativeID:"<Id>"'
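Before a long search it can help to confirm that the MGF files actually carry this TITLE format. The sketch below is not part of ProteoStorm; the names `TITLE_RE`, `check_title`, and `malformed_titles` are hypothetical, and the regular expression assumes that scan numbers and charge state are written as plain integers and that the two scan numbers are identical.

```python
import re

# Assumed shape of a TITLE line produced by the titleMaker filter above:
# TITLE=<RunId>.<ScanNumber>.<ScanNumber>.<ChargeState> File:"<SourcePath>", NativeID:"<Id>"
TITLE_RE = re.compile(
    r'^TITLE=(?P<run>.+)\.(?P<scan_a>\d+)\.(?P<scan_b>\d+)\.(?P<charge>\d+)'
    r' File:"(?P<path>[^"]*)", NativeID:"(?P<native_id>[^"]*)"\s*$'
)


def check_title(line):
    """Return the parsed fields of a TITLE line, or None if it does not match."""
    match = TITLE_RE.match(line)
    if match is None or match.group("scan_a") != match.group("scan_b"):
        return None
    return match.groupdict()


def malformed_titles(lines):
    """Collect TITLE lines that do not follow the assumed format."""
    return [ln.rstrip("\r\n") for ln in lines
            if ln.startswith("TITLE=") and check_title(ln) is None]
```

Calling `malformed_titles(open(path))` on each MGF file and checking for an empty list is one way to use it; a non-empty result suggests the file was converted without the filter.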

Linux

  1. Anaconda2 (Python 2.7), or Python 2.7 with numpy and psutil
$ wget https://repo.continuum.io/archive/Anaconda2-5.1.0-Linux-x86_64.sh
$ bash Anaconda2-5.1.0-Linux-x86_64.sh
  2. Java (1.8 or above)
$ sudo apt-get update
$ sudo apt-get install default-jre

Windows

  1. Anaconda2 (Python 2.7)
download from https://www.anaconda.com/download/?lang=en-us
  2. Java (1.8 or above)
download from https://java.com/en/download/
  3. Cygwin
download from https://cygwin.com/install.html
Add "C:\cygwin64\bin\" to the system PATH (Environment Variables > System variables > PATH)
Add "export PATH=/cygdrive/c/Users/user/Anaconda2:$PATH" to .bashrc, adjusting the path to your Anaconda2 install location
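A quick way to confirm the requirements above are visible to the interpreter is a small check like the following. This is a sketch, not something shipped with ProteoStorm: `find_on_path` and `missing_requirements` are hypothetical helper names, and the check only confirms that `numpy`, `psutil`, and a `java` executable can be found, not that Java is version 1.8 or above.

```python
import importlib
import os


def find_on_path(name):
    """Return the first executable called `name` (or `name`.exe) on PATH, else None."""
    for folder in os.environ.get("PATH", "").split(os.pathsep):
        for candidate in (name, name + ".exe"):
            full = os.path.join(folder, candidate)
            if os.path.isfile(full) and os.access(full, os.X_OK):
                return full
    return None


def missing_requirements(modules=("numpy", "psutil"), programs=("java",)):
    """List the Python modules and programs that could not be located."""
    missing = []
    for module in modules:
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(module)
    for program in programs:
        if find_on_path(program) is None:
            missing.append(program)
    return missing
```

Running `print(missing_requirements())` with the same Python that will launch ProteoStorm should print an empty list when everything is installed and on PATH.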

Usage

python -u ./src/ProteoStorm.py
	-D DatabaseDirectory (Directory containing database files in fasta format, use .fasta for the file extension)
	-S SpectralDirectory (Directory containing spectral datasets in MGF format, peak-picked and converted from RAW using MSConvert)
	-O OutputDirectory
	[-MSMS SubdirectoryName] (Name of metaproteomics dataset, Default: date_time)
	[-ms1t PrecursorMassTolerance] (e.g., 10, 20, 50, Default: 10)
	[-ms2t FragmentMassTolerance] (e.g., 0.015, 0.6, Default: 0.015)
	[-inst MS2DetectorID] (0: Low-res LCQ/LTQ, 1: Orbitrap/FTICR, 2: TOF, 3: Q-Exactive(Default))
	[-m FragmentMethodID] (1: CID, 3: HCD)
	[-S1spc S1SharedPeaksCount] (Default: 7)
	[-S2spc S2SharedPeaksCount] (Default: 7)
	[-genera 0/1] (0: Create refined protein DB using peptide-level FDR (Default), 1: genera-restriction approach)
	
Output: ProteoStorm_output.txt (Peptide-spectrum matches (PSMs) with p-values computed using the MS-GF+ generating function.)

If using the RefUP++ database, please see the Notes section below.

Demo

1. Download and extract the demo files into ./ProteoStorm/example
2. Download the mass distribution files into ./ProteoStorm/src/DBmassDistributions
3. Run either Command 1 or 2 (genera-restriction approach) as provided below.
4. The expected output files for the test runs are located at ./ProteoStorm/example/ProteoStorm_Out/prerun_demo and ./ProteoStorm/example/ProteoStorm_Out_GeneraRestrictionApproach/prerun_demo.

Linux

CoreModule2_PeptideFiltering_Linux-x86_64.exe was compiled on CentOS 6.10 with kernel version 2.6.32-754.2.1.el6.x86_64.

Command 1

python -u ./src/ProteoStorm.py --Database ./example/fasta --Spectra ./example/mgf --RemoveSpectra ./example/HS_matched_spectra.txt --SpectralDataset "demo" --output ./example/ProteoStorm_Out --PrecursorMassTolerance 10 --FragmentMassTolerance 0.015 --InstrumentID 3 --FragmentMethodID 3

Command 2 (genera-restriction approach)

python -u ./src/ProteoStorm.py --Database ./example/fasta_genera_restriction_approach --Spectra ./example/mgf --RemoveSpectra ./example/HS_matched_spectra.txt --SpectralDataset "demo" --output ./example/ProteoStorm_Out_GeneraRestrictionApproach --PrecursorMassTolerance 10 --FragmentMassTolerance 0.015 --InstrumentID 3 --FragmentMethodID 3 --GeneraRestrictionApproach 1 --refDBfdr 0.01 --PepMassDistribution ./src/DBmassDistributions/RefUp_2872778677.txt --database_partitions 400

Windows

Replace "C:/cygwin64/bin/run.exe" with the corresponding path on your system.

Command 1

python -u ./src/ProteoStorm.py --Database ./example/fasta --Spectra ./example/mgf --RemoveSpectra ./example/HS_matched_spectra.txt --SpectralDataset "demo" --output ./example/ProteoStorm_Out --PrecursorMassTolerance 10 --FragmentMassTolerance 0.015 --InstrumentID 3 --FragmentMethodID 3 --CygwinPATH "C:/cygwin64/bin/run.exe"

Command 2 (genera-restriction approach)

python -u ./src/ProteoStorm.py --Database ./example/fasta_genera_restriction_approach --Spectra ./example/mgf --RemoveSpectra ./example/HS_matched_spectra.txt --SpectralDataset "demo" --output ./example/ProteoStorm_Out_GeneraRestrictionApproach --PrecursorMassTolerance 10 --FragmentMassTolerance 0.015 --InstrumentID 3 --FragmentMethodID 3 --GeneraRestrictionApproach 1 --refDBfdr 0.01 --PepMassDistribution ./src/DBmassDistributions/RefUp_2872778677.txt --database_partitions 400 --CygwinPATH "C:/cygwin64/bin/run.exe"
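The four demo commands differ only in the database directory, the output directory, the genera-restriction parameters, and the Cygwin path. The sketch below assembles them as argument lists from the flags and values shown above; `build_demo_command` is a hypothetical helper, not part of ProteoStorm, and it assumes the command is run from the ProteoStorm directory.

```python
def build_demo_command(genera_restriction=False, cygwin_path=None):
    """Assemble one of the demo commands as an argument list.

    genera_restriction: False gives Command 1, True gives Command 2.
    cygwin_path: path to Cygwin's run.exe on Windows, or None on Linux.
    """
    if genera_restriction:
        database = "./example/fasta_genera_restriction_approach"
        output = "./example/ProteoStorm_Out_GeneraRestrictionApproach"
    else:
        database = "./example/fasta"
        output = "./example/ProteoStorm_Out"

    cmd = [
        "python", "-u", "./src/ProteoStorm.py",
        "--Database", database,
        "--Spectra", "./example/mgf",
        "--RemoveSpectra", "./example/HS_matched_spectra.txt",
        "--SpectralDataset", "demo",
        "--output", output,
        "--PrecursorMassTolerance", "10",
        "--FragmentMassTolerance", "0.015",
        "--InstrumentID", "3",
        "--FragmentMethodID", "3",
    ]
    if genera_restriction:
        # Extra parameters used by Command 2 in the demo.
        cmd += [
            "--GeneraRestrictionApproach", "1",
            "--refDBfdr", "0.01",
            "--PepMassDistribution", "./src/DBmassDistributions/RefUp_2872778677.txt",
            "--database_partitions", "400",
        ]
    if cygwin_path:
        # Windows only: point ProteoStorm at Cygwin's run.exe.
        cmd += ["--CygwinPATH", cygwin_path]
    return cmd
```

The resulting list can be passed to `subprocess.call(...)`, or joined with spaces to print the command for inspection before running it.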

Alternative configurations for ProteoStorm

Alternative configurations for ProteoStorm PDF download

Notes

If using the RefUP++ database, please include the following two parameters in your command.

--PepMassDistribution ./src/DBmassDistributions/RefUp_2872778677.txt

--database_partitions 400

If using the genera-restriction approach, the sequence headers in your protein fasta files should have the following format:

>[sequence_identifier]\t[genus]\t[ncbi_taxonomyID]

Example: >NP_819020.1\tCoxiella\t227377
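Headers can be checked against this format before running the genera-restriction approach. The sketch below assumes that `\t` in the format above stands for a literal tab character and that the taxonomy ID is a plain integer; `parse_genera_header` and `bad_headers` are hypothetical names, not ProteoStorm functions.

```python
def parse_genera_header(line):
    """Split a FASTA header into (sequence_identifier, genus, ncbi_taxonomyID).

    Returns None when the line is not a header with three tab-separated,
    non-empty fields ending in an integer taxonomy ID.
    """
    if not line.startswith(">"):
        return None
    fields = line[1:].rstrip("\r\n").split("\t")
    if len(fields) != 3:
        return None
    seq_id, genus, tax_id = fields
    if not seq_id or not genus or not tax_id.isdigit():
        return None
    return seq_id, genus, int(tax_id)


def bad_headers(lines):
    """Collect header lines that do not follow the assumed format."""
    return [ln.rstrip("\r\n") for ln in lines
            if ln.startswith(">") and parse_genera_header(ln) is None]
```

Passing each open .fasta file to `bad_headers` and checking for an empty list is one way to use it; sequence lines are ignored because only lines starting with `>` are examined.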
