QRePS is a tool for shotgun proteomics that performs statistical analysis of NSAF (normalized spectral abundance factor) values from proteome analysis results. QRePS visualizes the results of statistical testing with a volcano plot and selects differentially regulated proteins (DRP) using one of several methods (listed below). Based on the selected proteins, QRePS calculates a set of proteomic metrics and performs GO term enrichment analysis using STRING. The new version of QRePS also calculates proteomic metrics and performs GO analysis for DirectMS1Quant results.
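For reference, NSAF is conventionally defined per sample as NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), where SpC is a protein's spectral count and L its sequence length. QRePS expects NSAF values to already be present in its input files, so the sketch below only illustrates the formula; the function name `nsaf` and the numbers are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein in one sample.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)
    """
    if len(spectral_counts) != len(lengths):
        raise ValueError("spectral_counts and lengths must have equal length")
    # Spectral abundance factor: counts normalized by protein length
    saf = [count / length for count, length in zip(spectral_counts, lengths)]
    total = sum(saf)
    if total == 0:
        raise ValueError("total spectral abundance factor is zero")
    # Normalize so that the values of one sample sum to 1
    return [value / total for value in saf]


# Three hypothetical proteins: spectral counts and sequence lengths (residues)
values = nsaf([10, 20, 5], [100, 400, 50])
print([round(v, 3) for v in values])  # [0.4, 0.2, 0.4]
```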
QRePS provides three methods to select DRP:
- static - fold change and FDR thresholds are given by the user
- semi-dynamic - the fold change threshold is given by the user, and the FDR threshold is calculated according to the outlier rule Q3 + 1.5 IQR
- dynamic - the lower and upper fold change thresholds are calculated as Q1 - 1.5 IQR and Q3 + 1.5 IQR, respectively; the FDR threshold is calculated as in semi-dynamic
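The three selection rules can be sketched as follows. This is a minimal illustration, not QRePS's own code, and it assumes that the IQR rules are applied to the distributions of log2 fold change and -log10(FDR) across all tested proteins, and that a user-given fold change acts as a symmetric threshold on |log2 fold change|. The helper names `iqr_bounds` and `select_drp` are hypothetical; the `regulation` argument mirrors the `--regulation {UP,DOWN,all}` option.

```python
import numpy as np


def iqr_bounds(values):
    """Outlier-rule bounds (Q1 - 1.5 IQR, Q3 + 1.5 IQR) of a 1-D array."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def select_drp(log2_fc, neg_log10_fdr, method, fold_change=None, alpha=None,
               regulation="all"):
    """Boolean mask of differentially regulated proteins.

    Assumption: the IQR rules are applied to the distributions of
    log2 fold change and -log10(FDR) over all tested proteins.
    """
    log2_fc = np.asarray(log2_fc, dtype=float)
    neg_log10_fdr = np.asarray(neg_log10_fdr, dtype=float)

    if method == "static":
        fdr_cut = -np.log10(alpha)  # user-given FDR threshold
    elif method in ("semi-dynamic", "dynamic"):
        fdr_cut = iqr_bounds(neg_log10_fdr)[1]  # Q3 + 1.5 IQR
    else:
        raise ValueError(f"unknown method: {method}")

    if method == "dynamic":
        lower, upper = iqr_bounds(log2_fc)  # Q1 - 1.5 IQR, Q3 + 1.5 IQR
    else:
        fc_cut = np.log2(fold_change)  # user-given fold change, assumed symmetric
        lower, upper = -fc_cut, fc_cut

    significant = neg_log10_fdr > fdr_cut
    up = significant & (log2_fc > upper)
    down = significant & (log2_fc < lower)
    if regulation == "UP":
        return up
    if regulation == "DOWN":
        return down
    return up | down


mask = select_drp([2.0, -1.5, 0.1, 0.3, -0.2], [3.0, 2.5, 0.5, 2.0, 0.1],
                  "static", fold_change=1.5, alpha=0.05)
print(mask.tolist())  # [True, True, False, False, False]
```

The dynamic rule adapts to the spread of each experiment, whereas the static rule keeps thresholds comparable across experiments.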
QRePS calculates the following metrics:
Install from PyPI
pip install QRePS
Alternatively, you can install directly from GitHub:
pip install git+https://github.com/kazakova/Metrics
usage: qreps [-h]
(--sample-file SAMPLE_FILE | --quantitation-file QUANTITATION_FILE | --ms1-file MS1_FILE)
[--pattern PATTERN] [--labels LABELS [LABELS ...]]
[--input-dir INPUT_DIR] [--output-dir OUTPUT_DIR]
[--imputation {kNN,MinDet}] [--max-mv MAX_MV]
[--thresholds {static,semi-dynamic,dynamic,ms1}]
[--regulation {UP,DOWN,all}] [--species SPECIES]
[--goplot-format GOPLOT_FORMAT] [--fold-change FOLD_CHANGE]
[--alpha ALPHA] [--fasta-size FASTA_SIZE] [--report REPORT]
options:
-h, --help show this help message and exit
--sample-file SAMPLE_FILE
Path to sample file.
--quantitation-file QUANTITATION_FILE
Path to quantitative analysis results file.
--ms1-file MS1_FILE Path to DirectMS1Quant results file.
--pattern PATTERN Input files common endpattern. Default is "_protein_groups.tsv".
--labels LABELS [LABELS ...]
Groups to compare.
--input-dir INPUT_DIR
--output-dir OUTPUT_DIR
Directory to store the results. Default value is current directory.
--imputation {kNN,MinDet}
Missing value imputation method.
--max-mv MAX_MV Maximum ratio of missing values.
--thresholds {static,semi-dynamic,dynamic,ms1}
DE thresholds method.
--regulation {UP,DOWN,all}
Target group of DE proteins.
--species SPECIES NCBI species identifier. Default value is 9606 (H. sapiens).
--goplot-format GOPLOT_FORMAT
GO plot output format. Options: "svg", "png", "both",
"none". Default: "svg"
--fold-change FOLD_CHANGE
Fold change threshold.
--alpha ALPHA False discovery rate threshold.
--fasta-size FASTA_SIZE
Number of proteins in database for enrichment calculation.
--report REPORT Generate report.txt file, default False.
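To make the structure of the usage line concrete (exactly one of the three input sources is required, and `--labels` accepts several values), here is a partial re-creation of the interface with Python's `argparse`. It is reconstructed from the help text above and is not QRePS's source: only a subset of options is included, defaults not stated in the help are left unset, and the integer type for `--species` is an assumption. Reading each `--labels` value as one comma-separated pair of groups is inferred from the example commands.

```python
import argparse


def build_parser():
    """Subset of the qreps command-line interface, reconstructed from its help text."""
    parser = argparse.ArgumentParser(prog="qreps")
    # Exactly one input source is required, as in the usage line
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--sample-file")
    source.add_argument("--quantitation-file")
    source.add_argument("--ms1-file")
    parser.add_argument("--pattern", default="_protein_groups.tsv")
    parser.add_argument("--labels", nargs="+")
    parser.add_argument("--input-dir")
    parser.add_argument("--output-dir", default=".")
    parser.add_argument("--imputation", choices=["kNN", "MinDet"])
    parser.add_argument("--thresholds",
                        choices=["static", "semi-dynamic", "dynamic", "ms1"])
    parser.add_argument("--regulation", choices=["UP", "DOWN", "all"])
    parser.add_argument("--species", type=int, default=9606)  # type is assumed
    parser.add_argument("--fold-change", type=float)
    parser.add_argument("--alpha", type=float)
    return parser


args = build_parser().parse_args([
    "--quantitation-file", "example_2/ms1diffacto_out_DE_A2780_0.5_sum_each_run.txt",
    "--labels", "Chemprot_0.5,Chemprot_K",
    "--thresholds", "semi-dynamic",
    "--fold-change", "1.5",
    "--regulation", "all",
])
# Assumption: each --labels value names one pair of groups to compare
comparisons = [tuple(label.split(",")) for label in args.labels]
print(comparisons)  # [('Chemprot_0.5', 'Chemprot_K')]
```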
QRePS can be used in three different ways:
- Perform quantitative analysis (--input-dir, --pattern, --imputation, --sample-file parameters)
- Use external quantitative analysis results (--quantitation-file)
- Use results of MS1-based quantitative analysis (--ms1-file)
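When QRePS performs the quantitative analysis itself, `--max-mv` and `--imputation` control how missing NSAF values are handled. The sketch below shows one plausible reading, not QRePS's implementation: proteins (rows) whose fraction of missing values exceeds `max_mv` are dropped, and the remaining gaps are filled MinDet-style, i.e. with a low quantile (here the 1st percentile) of each sample's observed values, as MinDet is usually described (for example in the R package imputeLCMD). The helper names and the protein-by-sample table layout are assumptions. For the kNN option, scikit-learn's `KNNImputer` is one widely used implementation of the general approach.

```python
import numpy as np
import pandas as pd


def filter_missing(nsaf, max_mv):
    """Drop proteins (rows) whose fraction of missing values exceeds max_mv."""
    mv_ratio = nsaf.isna().mean(axis=1)
    return nsaf.loc[mv_ratio <= max_mv]


def impute_min_det(nsaf, q=0.01):
    """MinDet-style imputation: fill each sample (column) with its q-th quantile."""
    # Series.quantile ignores NaN, so the quantile uses observed values only
    return nsaf.apply(lambda column: column.fillna(column.quantile(q)))


data = pd.DataFrame(
    {"s1": [0.01, np.nan, 0.03, np.nan],
     "s2": [0.02, 0.05, np.nan, np.nan],
     "s3": [0.01, 0.04, 0.02, np.nan]},
    index=["P1", "P2", "P3", "P4"],  # hypothetical protein identifiers
)
kept = filter_missing(data, max_mv=0.5)  # P4 has no observed values and is dropped
imputed = impute_min_det(kept)
```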
Input files for quantitative analysis should contain the following columns:
- 'dbname' (e.g. sp|P14866|HNRPL_HUMAN)
- 'description' (e.g. Heterogeneous nuclear ribonucleoprotein L OS=Homo sapiens OX=9606 GN=HNRNPL PE=1 SV=2)
- 'NSAF'
We suggest using Scavager protein_groups result files. If you use other files, specify which files should be taken from --input-dir by giving their common file ending with --pattern.
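Before running QRePS on files from a source other than Scavager, it can help to check that they carry the required columns. A minimal sketch with pandas, assuming tab-separated files (as the default `_protein_groups.tsv` pattern suggests); the helper `load_protein_groups` is hypothetical, and the extraction of the gene name from the `GN=` field only illustrates the UniProt-style description format, it is not necessarily what QRePS does.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ("dbname", "description", "NSAF")


def load_protein_groups(path_or_buffer):
    """Read a tab-separated protein groups table and check the required columns."""
    table = pd.read_csv(path_or_buffer, sep="\t")
    missing = [name for name in REQUIRED_COLUMNS if name not in table.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Gene name from the description, e.g. "... GN=HNRNPL PE=1 SV=2"
    table["gene"] = table["description"].str.extract(r"GN=(\S+)", expand=False)
    return table


example = io.StringIO(
    "dbname\tdescription\tNSAF\n"
    "sp|P14866|HNRPL_HUMAN\tHeterogeneous nuclear ribonucleoprotein L "
    "OS=Homo sapiens OX=9606 GN=HNRNPL PE=1 SV=2\t0.0012\n"
)
table = load_protein_groups(example)
print(table.loc[0, "gene"])  # HNRNPL
```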
The quantitation file should contain the columns 'log2(fold_change)', '-log10(fdr_BH)', 'Gene' and 'Protein'.
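If your external tool reports raw p-values rather than Benjamini-Hochberg adjusted ones, the required '-log10(fdr_BH)' column can be derived before passing the file with --quantitation-file. A sketch assuming you already have per-protein log2 fold changes and p-values from a test of your choice; the helper names are hypothetical, and the BH step-up adjustment is implemented directly with NumPy to keep the example self-contained.

```python
import numpy as np
import pandas as pd


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1
    adjusted_sorted = np.minimum(np.minimum.accumulate(ranked[::-1])[::-1], 1.0)
    adjusted = np.empty(n)
    adjusted[order] = adjusted_sorted
    return adjusted


def to_quantitation_table(genes, proteins, log2_fc, p_values):
    """Assemble a table with the column names expected by --quantitation-file."""
    return pd.DataFrame({
        "Gene": genes,
        "Protein": proteins,
        "log2(fold_change)": log2_fc,
        "-log10(fdr_BH)": -np.log10(benjamini_hochberg(p_values)),
    })


fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
print(fdr.round(4).tolist())  # [0.04, 0.0533, 0.0533, 0.2]
```

The resulting DataFrame can be written out with `to_csv(..., sep="\t", index=False)`; whether QRePS expects tabs or another delimiter in this file is an assumption worth checking against the example in /example.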
The MS1 file should be the *quant_full.tsv file from DirectMS1Quant results.
QRePS needs a sample file and at least one data file for each group to perform quantitative analysis. The sample file should be comma-separated and contain the columns 'File Name' and 'SampleID'.
The input directory can be given either with --input-dir or as part of 'File Name' in the sample file. If both are given, the directory given with --input-dir is used.
The pattern may or may not be included in 'File Name' (see example).
'SampleID' contains the labels of the groups to be compared; they should match those given with --labels.
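The sample file rules above can be sketched as a small resolver that groups input files by 'SampleID'. This is an illustration under stated assumptions, not QRePS's code: the helper `resolve_inputs` and the file names are hypothetical, and it assumes that a directory passed as `input_dir` replaces any directory part of 'File Name' and that the pattern is appended when 'File Name' does not already end with it.

```python
import csv
import io
import os


def resolve_inputs(sample_file, pattern="_protein_groups.tsv", input_dir=None):
    """Map each SampleID to its input file paths (hypothetical helper).

    Assumes input_dir, when given, replaces any directory part of 'File Name',
    and that the pattern is appended when 'File Name' lacks it.
    """
    groups = {}
    for row in csv.DictReader(sample_file):
        name = row["File Name"]
        if not name.endswith(pattern):
            name += pattern
        if input_dir is not None:
            name = os.path.join(input_dir, os.path.basename(name))
        groups.setdefault(row["SampleID"], []).append(name)
    return groups


sample_csv = io.StringIO(
    "File Name,SampleID\n"
    "data/run1,DBTRG_I\n"
    "data/run2_protein_groups.tsv,DBTRG_K\n"
)
groups = resolve_inputs(sample_csv, input_dir="example_1")
```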
QRePS produces the following files:
- volcano plot (volcano.png)
- missing value ratio distribution plot (NaN_distribution.png) (only if quantitative analysis is performed)
- summary table with the results of statistical testing (Quant_res.tsv)
- summary table with differentially regulated genes (DRF.tsv)
- summary table with calculated proteomic metrics (metrics.tsv)
- summary table with the results of GO terms enrichment analysis (GO_res.tsv)
- STRING network plot (GO_network.svg)
- report file (report.txt, if --report True is given)
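A volcano plot of the kind listed above plots log2 fold change against -log10(FDR) and highlights the selected proteins. The matplotlib sketch below shows the idea; it is not how QRePS draws volcano.png. The helper `volcano_plot` is hypothetical, and the axis labels reuse the quantitation file column names, on the assumption that Quant_res.tsv uses the same ones.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np


def volcano_plot(log2_fc, neg_log10_fdr, selected, path=None):
    """Scatter log2 fold change against -log10(FDR), highlighting selected proteins."""
    log2_fc = np.asarray(log2_fc, dtype=float)
    neg_log10_fdr = np.asarray(neg_log10_fdr, dtype=float)
    selected = np.asarray(selected, dtype=bool)
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.scatter(log2_fc[~selected], neg_log10_fdr[~selected], s=10, c="grey", label="other")
    ax.scatter(log2_fc[selected], neg_log10_fdr[selected], s=10, c="red", label="DRP")
    ax.set_xlabel("log2(fold_change)")
    ax.set_ylabel("-log10(fdr_BH)")
    ax.legend()
    if path is not None:
        fig.savefig(path, dpi=150)  # e.g. path="volcano_sketch.png"
    return fig


fig = volcano_plot([2.0, -1.5, 0.1], [3.0, 2.5, 0.5], [True, True, False])
```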
Example input and output files can be found in the /example directory.
- Quantitative analysis
qreps --sample-file example_1/a172_dbtrg_sample.csv --labels DBTRG_I,DBTRG_K A172_I,A172_K --input-dir example_1 --output-dir example_1 --imputation kNN --thresholds dynamic --regulation UP
- External quantitative analysis results
qreps --quantitation-file example_2/ms1diffacto_out_DE_A2780_0.5_sum_each_run.txt --labels Chemprot_0.5,Chemprot_K --output-dir example_2 --thresholds semi-dynamic --fold-change 1.5 --regulation all --report True
- DirectMS1Quant results analysis
qreps --ms1-file ms1quant_out_DE_A2780_5_DE_A2780_K1_quant_full.tsv --labels Chemprot_0.5,Chemprot_K --output-dir output --thresholds ms1 --regulation all --report True
Apart from the QRePS tool, this repository contains additional resources referenced in the article "PROTEOMICS-BASED SCORING OF CELLULAR RESPONSE TO STIMULI FOR IMPROVED CHARACTERIZATION OF SIGNALING PATHWAY ACTIVITY":
- Jupyter Notebooks with the original calculations (to analyze your own data, you can now simply use QRePS)