
Manipulating and exploring protein and proteomics data.


Install from GitHub using devtools::install_github.



See the DESCRIPTION file for a complete list of dependencies.

Getting started

Currently, the best way to get started is ?Proteins and the Pbase-data vignette. More documentation is on its way.


Pbase is under heavy development and is likely to change considerably in the near future. Suggestions and bug reports are welcome and can be filed as GitHub issues.

If you would like to contribute, please send pull requests directly for minor contributions and typos. For major contributions, we suggest getting in touch with the package maintainers first.


Assessing the redundancy of a protein fasta database

Given a protein fasta file, what is the maximal sensitivity that can be expected from a mass spectrometry experiment with 0, 1, ... missed cleavages? This should probably also include a filtering step for peptide flyability.
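As a rough illustration of the question above, the following base R sketch digests toy sequences in silico and reports the fraction of distinct peptides that map to exactly one protein. It does not use the Pbase API; `digest`, `unique_peptide_fraction` and the two sequences are hypothetical names and data made up for this example, and the cleavage rule (after K or R, not before P) is a common simplification of trypsin specificity.

```r
# Cleave after K or R unless the next residue is P (simplified trypsin rule),
# then join neighbouring fragments to emulate missed cleavages.
digest <- function(sequence, missed = 0:2) {
  # Mark every cleavage site with a comma, then split on the commas.
  marked <- gsub("([KR])(?!P)", "\\1,", sequence, perl = TRUE)
  fragments <- strsplit(marked, ",", fixed = TRUE)[[1]]
  n <- length(fragments)
  peptides <- character()
  for (m in missed) {
    if (m >= n) next  # not enough fragments for m missed cleavages
    joined <- vapply(seq_len(n - m), function(i) {
      paste(fragments[i:(i + m)], collapse = "")
    }, character(1))
    peptides <- c(peptides, joined)
  }
  peptides
}

# Fraction of distinct peptides that occur in exactly one protein.
unique_peptide_fraction <- function(proteins, missed = 0) {
  per_protein <- lapply(proteins, function(s) unique(digest(s, missed)))
  counts <- table(unlist(per_protein, use.names = FALSE))
  mean(as.vector(counts) == 1)
}

# Toy sequences, invented for illustration only.
proteins <- c(P1 = "MKAAKPRGGR", P2 = "MSTAAKPRGGRLLK")
digest(proteins[["P1"]], missed = 0)
unique_peptide_fraction(proteins, missed = 0)
unique_peptide_fraction(proteins, missed = 0:1)
```

A flyability filter, as suggested above, would be applied to the digested peptides before counting.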


Some literature about estimating detectability:

Liu et al. 2011:

Requirements for in-silico created peptides: missedCleavages = 0:2, length(peptides) >= 6, mass(peptides) < 6000 (Da)

Logistic Regression based on Hydrophobicity, Isoelectric point, length, molecular weight, average hydrophobicity, average isoelectric point

Webb-Robertson et al. 2007:

Requirements for in-silico created peptides: missedCleavages = 0:2, length(peptides) >= 6, mass(peptides) < 6000 (Da)

35 features: length, weight, # of (non-)polar, # of (un)charged, # of pos./neg. charged residues, hydrophobicity (different models), polarity (different models), bulkiness, AA singlet counts

Sanders et al. 2007:

Requirements for in-silico created peptides: length(peptides) >= 6

Features: Length, Charge, Isoelectric Point, Molecular Weight, Hydropathicity, Counts of each AA (20 Features), Percent composition of each AA (20 Features), Percent of polar, positive, negative, hydrophobic AA

take-home-message: a model of one species/dataset could not be transferred to another dataset (without dramatically decreasing the performance)

Mallick et al. 2007:

~1000 Features.

Some of the most discriminating properties: Total/Average net/positive charge, hydrophobic moment, isoelectric point, Histidine composition

take-home-message: The model of one species is comparable to another if the evolutionary distance is small (e.g. yeast and human) but you can't compare different devices/datasets (e.g. MALDI vs ESI)
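Several of the features listed in these papers are simple sequence statistics. The sketch below, in base R only, builds a small feature table (length, amino acid counts, percent positive and negative residues) after applying the length(peptides) >= 6 requirement. `peptide_features` is a hypothetical helper, not a Pbase function, and counting K, R and H as positive and D and E as negative is a simplifying assumption rather than the exact definition used in any of the cited models.

```r
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Build a per-peptide feature table from plain character sequences.
peptide_features <- function(peptides) {
  residues <- strsplit(peptides, "")
  # One row per peptide, one column per amino acid.
  counts <- t(vapply(residues, function(r) {
    as.integer(table(factor(r, levels = amino_acids)))
  }, integer(20)))
  colnames(counts) <- paste0("n_", amino_acids)
  len <- nchar(peptides)
  # Assumption: K, R, H count as positive; D, E as negative.
  positive <- rowSums(counts[, c("n_K", "n_R", "n_H"), drop = FALSE])
  negative <- rowSums(counts[, c("n_D", "n_E"), drop = FALSE])
  data.frame(peptide = peptides,
             length = len,
             pct_positive = 100 * positive / len,
             pct_negative = 100 * negative / len,
             counts,
             stringsAsFactors = FALSE)
}

# Toy peptides, invented for illustration only.
candidates <- c("AAKPRK", "DEGGHR", "GGR")
kept <- candidates[nchar(candidates) >= 6]  # length(peptides) >= 6
features <- peptide_features(kept)
features[, c("peptide", "length", "pct_positive", "pct_negative")]
```

A table like this could then serve as input to a classifier such as the logistic regression mentioned for Liu et al.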

Simple Rules

Mass: 500:4500

Length: 5:40

95% of all peptides are of length 5:30.

Average Isoelectric point: seq(0, 1.4)

Hydropathy/Hydrophobicity Kyte, Jack, and Russell F. Doolittle. "A simple method for displaying the hydropathic character of a protein." Journal of molecular biology 157.1 (1982): 105-132.
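The mass and length rules and the Kyte-Doolittle scale can be sketched in a few lines of base R. This is an illustration under stated assumptions, not Pbase code: `peptide_mass`, `gravy` and `passes_simple_rules` are hypothetical helpers, masses are taken to be monoisotopic and in Da, and the hydropathy score is the plain mean of the Kyte-Doolittle values over the peptide.

```r
# Monoisotopic residue masses (Da); one water is added per peptide.
residue_mass <- c(
  G = 57.02146, A = 71.03711, S = 87.03203, P = 97.05276, V = 99.06841,
  T = 101.04768, C = 103.00919, L = 113.08406, I = 113.08406, N = 114.04293,
  D = 115.02694, Q = 128.05858, K = 128.09496, E = 129.04259, M = 131.04049,
  H = 137.05891, F = 147.06841, R = 156.10111, Y = 163.06333, W = 186.07931)
water <- 18.01056

# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982).
kyte_doolittle <- c(
  A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5, Q = -3.5, E = -3.5,
  G = -0.4, H = -3.2, I = 4.5, L = 3.8, K = -3.9, M = 1.9, F = 2.8,
  P = -1.6, S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

peptide_mass <- function(peptides) {
  vapply(strsplit(peptides, ""), function(r) sum(residue_mass[r]) + water,
         numeric(1))
}

# Mean hydropathy over all residues of each peptide.
gravy <- function(peptides) {
  vapply(strsplit(peptides, ""), function(r) mean(kyte_doolittle[r]),
         numeric(1))
}

# TRUE for peptides within the mass (500:4500) and length (5:40) ranges.
passes_simple_rules <- function(peptides, mass_range = c(500, 4500),
                                length_range = c(5, 40)) {
  m <- peptide_mass(peptides)
  n <- nchar(peptides)
  m >= mass_range[1] & m <= mass_range[2] &
    n >= length_range[1] & n <= length_range[2]
}

# Toy peptides, invented for illustration only.
peptides <- c("GGR", "AAKPR", "MSTAAKPRGGR")
data.frame(peptide = peptides,
           mass = peptide_mass(peptides),
           gravy = gravy(peptides),
           keep = passes_simple_rules(peptides))
```

Peptides containing residues outside the 20 standard amino acids yield NA for mass and hydropathy in this sketch and would need separate handling.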

Selection of optimal heavy peptides for absolute quantitation

See Pavel's idea.

Protein domains

Available through the integration with the ensembldb package. See the Pbase-with-ensembldb vignette.

Mapping a Protein Sequence to a Genome Sequence

See the mapping vignette.

See also this document for additional examples and integration with RNA-seq data.


The package makes it easy to interact with AAString and AAStringSet instances, protein databases such as UniProt (and possibly biomaRt in the future) using protein identifiers, protein identification results (mzID or (devel) mzR packages) and possibly also MSnExp and MSnSet instances.

