- Introduction
- Input Data
- Sample Data
- Basic Installation
- JUMPptm Commands
- Test Data Exercise
- Input and Output Data Organization
JUMPptm aims to identify PTM events from unmatched spectra after conventional peptide analysis of the whole proteome. By default, JUMPptm assumes that the whole proteome data have been analyzed by the JUMP suite, which outputs the identified peptides/proteins and the high-quality (HQ) unmatched spectra with de novo tags. These HQ spectra (input #1) are taken as JUMPptm input and searched against a user-specified list of PTMs (input #2) with a multi-stage database search strategy using Comet. The PTM list can be guided by the results of an open search with MSFragger. JUMPptm exports PTM peptide identifications with TMT-based quantification. mzXML files can also be used as direct input instead of ms2 files.
NOTE: The ptm_pipeline.py script can run the entire pipeline either on a computational cluster, scheduling jobs with the LSF platform, or in standalone mode (select the corresponding option in the parameter file). As the PTM list grows or the number of mzXML files increases, running on the cluster is preferable to the standalone mode. For other platforms, users may edit the job submission functions.
Note that JUMPptm only supports 64-bit macOS and Linux. JUMPptm can also be run without a tags file: use the ms2 or mzXML files directly and turn off the tag path in the parameter file.
- Publication
- If you use JUMPptm as part of a publication, please cite the following reference:
Poudel, S., Vanderwall, D., Yuan, Z. F., Wu, Z., Peng, J., & Li, Y. (2022). JUMPptm: Integrated software for sensitive identification of post‐translational modifications and its application in Alzheimer's disease study. Proteomics, 2100369.
JUMPptm requires the following inputs for PTM analysis:
- ms2 files: ideally high-quality (HQ) spectra; however, ms2 or mzXML files containing all spectra can also be used (with lower sensitivity)
- PTM list (derived from open searches or chosen as PTMs of interest); this is set in the parameter file
- JUMP-derived de novo tags file (optional; if you do not have a tags file from JUMP, turn this input off in the parameter file by setting tags_input_path = 0)
NOTE: To facilitate subsequent stages of PTM analysis after default peptide analysis, the latest JUMP suite (v1.13.1; https://github.com/JUMPSuite/JUMP) outputs three files after analyzing a whole proteome dataset: i) the accepted unique proteins (.fasta format); ii) HQ unmatched spectra generated by the spectrum QC module (.ms2 format); and iii) de novo amino acid tags for each MS2 spectrum (.tags format). The identified proteins are used to generate a customized database to restrict the search space, and the latter two files are input files for JUMPptm. Note that all the HQ spectra are mass-corrected by the JUMP analysis, allowing narrow mass tolerance (e.g., < 6 ppm) for subsequent PTM identification.
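Because the tolerance is relative, the allowed absolute error grows with m/z: the mass error in parts per million is (observed - theoretical) / theoretical x 10^6. A minimal Python sketch of this check follows; the function names and the default 6 ppm window are illustrative assumptions, not JUMPptm's own code.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float = 6.0) -> bool:
    """True if the absolute ppm error is at or below the (assumed) tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# A precursor expected at m/z 1000.0000 and measured at 1000.0050 is off by 5 ppm,
# so it passes a 6 ppm window; 1000.0080 (8 ppm) does not.
print(round(ppm_error(1000.0050, 1000.0), 3))
print(within_tolerance(1000.0050, 1000.0))
print(within_tolerance(1000.0080, 1000.0))
```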
To evaluate JUMPptm, we provide a dataset composed of two fractions in the sample_data directory.
File name | Description | Total spectra / tags |
---|---|---|
w001.ms2 | High-quality (HQ) ms2 file | 792 |
w010.ms2 | High-quality (HQ) ms2 file | 654 |
w001.tags | JUMP-derived de novo tags | 792 |
w010.tags | JUMP-derived de novo tags | 654 |
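To confirm that the sample files were copied intact, the spectra in an ms2 file can be counted and compared with the table. The sketch below assumes the standard MS2 text format, in which every spectrum begins with a line whose first field is `S` (header lines start with `H`, charge lines with `Z`, and peak lines are plain m/z-intensity pairs); the helper name `count_ms2_spectra` is hypothetical and not part of JUMPptm.

```python
def count_ms2_spectra(path: str) -> int:
    """Count spectra in an MS2-format text file by counting its 'S' lines.

    Assumes one 'S <first scan> <last scan> <precursor m/z>' line per spectrum.
    """
    count = 0
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            # Only scan-header lines have 'S' as their first field.
            if fields and fields[0] == "S":
                count += 1
    return count
```

For example, `count_ms2_spectra("sample_data/w001.ms2")` should return the number of spectra in that fraction, assuming one `S` line per spectrum.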
A basic install is sufficient for multicore laptops, desktops, workstations, and servers. For a basic install, use the bootstrap.sh script provided in the repository:
- Place the JUMPptm distribution source in the desired location (call this <path to JUMPptm>).
- Change your working directory to the top level of the JUMPptm install and run
bash bootstrap.sh
- Obtaining JUMPptm source: You can obtain the latest version of JUMPptm from git. In the directory where you would like JUMPptm to be installed (call this directory <path to JUMPptm>), simply clone the git repository:
git clone https://github.com/surPoudel/JUMP-ptm.git
Note that JUMPptm does not support out-of-place installs; the JUMPptm git repository is the entire installation.
- Bootstrapping
To install the dependencies, we recommend Conda; we provide a bootstrapping script, bootstrap.sh, that downloads all dependencies and installs them alongside JUMPptm. Execute
./bootstrap.sh
and it will create a new directory JUMPptm in the current working directory. This directory contains the conda environment used by JUMPptm, and JUMPptm is set up to use the Perl and Python interpreters in that environment.
Once bootstrap.sh is finished, activate the conda environment:
conda activate $PWD/JUMPptm
Note: Comet (version 2021) binaries are included in this repository [1]. To use a different version of Comet, replace the binary while keeping the exact file name comet_linux_2021 or comet_mac_2021.
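Swapping in another Comet build amounts to placing the new binary under the expected name and keeping it executable. The sketch below illustrates this; only the two file names come from this README, while the `comet_dir` argument (wherever the bundled binaries live in your checkout) and the helper names are hypothetical.

```python
import platform
import shutil
import stat
from pathlib import Path
from typing import Optional

# File names taken from this README; the OS mapping is an illustrative assumption.
COMET_BINARIES = {"Linux": "comet_linux_2021", "Darwin": "comet_mac_2021"}


def comet_binary_name(system: Optional[str] = None) -> str:
    """Return the file name expected for the Comet binary on this OS."""
    system = system or platform.system()  # macOS is reported as "Darwin"
    if system not in COMET_BINARIES:
        raise RuntimeError(f"JUMPptm supports only 64-bit Linux and macOS, not {system!r}")
    return COMET_BINARIES[system]


def install_comet(new_binary: str, comet_dir: str, system: Optional[str] = None) -> Path:
    """Copy a different Comet build over the expected name and make it executable.

    comet_dir is hypothetical: point it at the folder holding the bundled binaries.
    """
    target = Path(comet_dir) / comet_binary_name(system)
    shutil.copyfile(new_binary, target)
    # Add execute permission for user, group, and others.
    target.chmod(target.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```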
Once the conda environment (JUMPptm) is activated:
- Make a working directory.
- Keep all the ms2 or mzXML files and the tags files (optional) in the same directory.
- Copy the parameter file (ptm_pipeline.params) from parameterFiles to the same directory.
- Make the necessary changes to the parameters (including the PTM search stages).
- Run the command below:
python <path to JUMPptm>/ptm_pipeline.py ptm_pipeline.params
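The set-up steps above can be scripted. This is a minimal sketch, assuming the parameter template sits at `parameterFiles/ptm_pipeline.params` under `<path to JUMPptm>` as described; `prepare_workdir` is a hypothetical helper, and it stops short of editing the parameters, which still has to be done before running.

```python
import shutil
from pathlib import Path
from typing import List


def prepare_workdir(workdir: str, jumpptm_dir: str, inputs: List[str]) -> List[str]:
    """Create a working directory, copy the inputs and the parameter template
    into it, and return the command line for running the pipeline there."""
    work = Path(workdir)
    work.mkdir(parents=True, exist_ok=True)
    for f in inputs:  # ms2/mzXML files and, optionally, .tags files
        shutil.copy(f, work)
    params = Path(jumpptm_dir) / "parameterFiles" / "ptm_pipeline.params"
    shutil.copy(params, work)
    # Edit <workdir>/ptm_pipeline.params (paths, PTM stages) before running.
    return ["python", str(Path(jumpptm_dir) / "ptm_pipeline.py"), "ptm_pipeline.params"]
```

After editing the parameters, the returned command could be run from the working directory with `subprocess.run(cmd, cwd=workdir, check=True)` inside the activated conda environment.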
1. Make a directory called Test.
2. Copy the w001.ms2 and w001.tags files from sample_data to Test.
3. Copy ptm_pipeline.params from parameterFiles to Test.
4. Open the parameter file copied in step 3 and change the following entries:
   - ms2_input_path = path of the folder that contains the sample data, e.g. /home/spoudel1/JUMP-ptm/sample_data
   - tags_input_path = path of the folder that contains the sample data, e.g. /home/spoudel1/JUMP-ptm/sample_data
   - database_name = full path of the sample database, e.g. /home/spoudel1/Database_AD_Banner/humanComprehensive_v2_ft_mc2_c57_TMTpro.fasta
   - pitfile = full path of the sample pitfile, e.g. /home/spoudel1/Database_AD_Banner/humanComprehensive_v2_ft_mc2_c57_TMTpro.pit
   The test database and pitfile can be downloaded here: database_pitfile
5. Execute
   python <path to JUMPptm>/ptm_pipeline.py ptm_pipeline.params
   e.g. python /home/spoudel/JUMP-ptm/ptm_pipeline.py ptm_pipeline.params
   Note: This automatically runs Stage_1 and Stage_2 (phosphorylation and deamidation).
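The parameter edits in the exercise can also be made programmatically. The sketch below assumes that ptm_pipeline.params stores each setting on its own line as `key = value`, optionally followed by a `#` comment; this matches the examples above but should be checked against the actual file. `set_params` is a hypothetical helper. Setting `tags_input_path` to `0` turns off the tags input, as noted in the Input Data section.

```python
import re
from pathlib import Path
from typing import Dict

# Assumed line layout: optional indent, key, "=", value, optional "# comment".
_PARAM_LINE = re.compile(r"^(\s*)([A-Za-z0-9_]+)(\s*=\s*)([^#]*?)(\s*#.*)?$")


def set_params(path: str, updates: Dict[str, str]) -> None:
    """Rewrite the values of the given keys in place, keeping any trailing comments."""
    p = Path(path)
    remaining = dict(updates)
    out = []
    for line in p.read_text().splitlines():
        m = _PARAM_LINE.match(line)
        if m and m.group(2) in remaining:
            indent, key, eq, _old_value, comment = m.groups()
            line = f"{indent}{key}{eq}{remaining.pop(key)}{comment or ''}"
        out.append(line)
    if remaining:  # refuse to write if a requested key does not exist
        raise KeyError(f"keys not found in {path}: {sorted(remaining)}")
    p.write_text("\n".join(out) + "\n")
```

For the test exercise, one might call `set_params("Test/ptm_pipeline.params", {"ms2_input_path": "<path to JUMPptm>/sample_data", "tags_input_path": "<path to JUMPptm>/sample_data"})`, replacing the placeholders with real paths.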
.
├── Pipeline_Results_OUTPUT_FOLDER # Output folder that contains pipeline results (prefixed by Pipeline_Results_)
│ ├── comet.params.new # comet search parameter file (template)
│ ├── ptm_pipeline.log # ptm pipeline log file
│ ├── ptm_pipeline.params # ptm pipeline parameter file is copied inside the results folder for record
│ ├── merge_and_consolidation # Folder with the results of merging and consolidating PSMs from each stage
│ │ ├── ID.txt # merged IDs from all stages
│ │ ├── jump_fq_merged.params # automatically generated quantification parameter file
│ │ ├── publications # merged filtering results for peptide identification
│ │ │ ├── id_all_pep.txt # merged peptides identification (all proteins)
│ │ │ └── id_uni_pep.txt # merged peptides identification (unique proteins)
│ │ ├── quan_HH_tmt10_human_comet # quantification results folder
│ │ │ └── publications # folder containing quantification of peptides
│ │ │ ├── id_all_pep_quan.txt # peptides mapped to all proteins
│ │ │ └── id_uni_pep_quan.txt # peptides mapped to unique protein
│ │ └── results_table
│ │ └── Pan_PTM_Quan_Table.xlsx # Pan PTM output excel file
│ ├── Stage_1 # Stage_1 search results based on parameter file description
│ │ ├── jump_fc_Stage_1_FDR_1.params # filtering parameter file for stage 1
│ │ ├── stage_1_comet.params # search comet parameter file (customized automatically by program based on parameter file)
│ │ ├── sum_Stage_1_FDR_1 # filtering result folder
│ │ │ ├── ID.txt # identified PSMs (target only) at the given FDR
│ │ │ ├── IDwDecoy.txt # identified PSMs (target + decoy) at the given FDR
│ │ │ ├── publications # folder containing tables file for unique peptide and all peptides
│ │ │ │ ├── id_all_pep_1FDR.txt
│ │ │ │ ├── id_all_pep.txt
│ │ │ │ ├── id_all_prot_1FDR.txt
│ │ │ │ ├── id_all_prot.txt
│ │ │ │ ├── id_uni_pep_1FDR.txt
│ │ │ │ ├── id_uni_pep.txt
│ │ │ │ ├── id_uni_prot_1FDR.txt
│ │ │ │ └── id_uni_prot.txt
│ │ │ └── simplified_report
│ │ │ └── id_uni_prot.txt
│ │ └── w001 # example fraction name searched by the pipeline -- program makes separate folder for each fraction
│ │ ├── comet.params # search parameter file is copied
│ │ ├── Results_start_scan_0_end_scan_0_min_tag_len_2 # folder containing tags matched intermediate files
│ │ │ ├── expect_minusLog10.pdf
│ │ │ ├── expect_minusLog10.png
│ │ │ ├── spectrum_tag_count.txt
│ │ │ ├── spectrum_unique_tag_table.txt
│ │ │ ├── tag_qc.params
│ │ │ ├── Total_Tag_matched.pdf
│ │ │ ├── Total_Tag_matched.png
│ │ │ ├── w001_reordered_final.pickle
│ │ │ ├── xcorr.pdf
│ │ │ └── xcorr.png
│ │ ├── search_log.txt # comet search log
│ │ ├── tag_match.log # tag match program log file
│ │ ├── tag_qc.params # tag match parameter file
│ │ ├── w001.1.pep.xml # pep.xml file output with tag match information
│ │ ├── w001.1.txt # search file in txt file format
│ │ └── w001.ms2 -> /home/spoudel1/conda_work/test/w001.ms2 # input ms2 softlinked to search fraction folder
│ └── Stage_2
│ .
│ .
│ .
│ .
│
├── ptm_pipeline.params # input parameter file
├── w001.1.tags # input tag file
└── w001.ms2 # input ms2 file
Pan_PTM_Quan_Table.xlsx: concatenated pan-PTM output file
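Because the output folder name varies from run to run, the merged result files are easiest to find by pattern. This sketch globs the layout shown in the tree above; the wildcard for the `quan_*` folder is an assumption (its name, `quan_HH_tmt10_human_comet` in the example, likely depends on the quantification settings), and `find_results` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, List


def find_results(workdir: str) -> Dict[str, List[Path]]:
    """Collect the main merged outputs from every Pipeline_Results_* folder."""
    work = Path(workdir)
    merged = "Pipeline_Results_*/merge_and_consolidation"
    return {
        "pan_ptm_table": sorted(work.glob(f"{merged}/results_table/Pan_PTM_Quan_Table.xlsx")),
        "merged_ids": sorted(work.glob(f"{merged}/ID.txt")),
        # Assumed: the quantification folder name always starts with "quan_".
        "quan_unique": sorted(work.glob(f"{merged}/quan_*/publications/id_uni_pep_quan.txt")),
    }
```

The Excel table can then be opened with, for example, `pandas.read_excel`, which needs the `openpyxl` package for .xlsx files.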
NOTE: The ID.txt file in the merge_and_consolidation folder is modified so that the results of different stages can be concatenated. Each peptide has one or more letters Z appended at its C-terminus to designate the stage: Z = Stage_1, ZZ = Stage_2, and so on. The original peptide sequence is also retained. This enables accurate quantification of peptides using only the PSMs that belong to a specific stage, so that PSMs remain unique per stage.
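The stage encoding described in this note can be illustrated with two small helpers. This is a hypothetical sketch, not JUMPptm's code: it assumes plain peptide sequences without flanking residues and that no original sequence ends in `Z`.

```python
from typing import Tuple


def add_stage_suffix(peptide: str, stage: int) -> str:
    """Append one 'Z' per stage number: Stage_1 -> 'Z', Stage_2 -> 'ZZ', ..."""
    if stage < 1:
        raise ValueError("stage numbers start at 1")
    return peptide + "Z" * stage


def split_stage_suffix(tagged: str) -> Tuple[str, int]:
    """Recover the original peptide and its stage number from a tagged sequence.

    Assumes the original sequence itself does not end in 'Z'.
    """
    peptide = tagged.rstrip("Z")
    return peptide, len(tagged) - len(peptide)


# The same peptide found in two stages becomes two distinct keys,
# so its PSMs are quantified separately per stage.
print(add_stage_suffix("SAMPLEK", 1), add_stage_suffix("SAMPLEK", 2))
print(split_stage_suffix("SAMPLEKZZ"))
```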
[1] Eng, Jimmy K., Tahmina A. Jahan, and Michael R. Hoopmann. "Comet: an open‐source MS/MS sequence database search tool." Proteomics 13.1 (2013): 22-24.
To submit bug reports and feature suggestions, please contact Suresh Poudel (suresh.poudel@stjude.org).