protr is a comprehensive toolkit for generating numerical features of protein sequences, as described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>.

Paper Citation

Formatted citation:

Nan Xiao, Dong-Sheng Cao, Min-Feng Zhu, and Qing-Song Xu. (2015). protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics 31 (11), 1857-1859.

BibTeX entry:

@article{Xiao2015,
  author = {Xiao, Nan and Cao, Dong-Sheng and Zhu, Min-Feng and Xu, Qing-Song},
  title = {{protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences}},
  journal = {Bioinformatics},
  year = {2015},
  volume = {31},
  number = {11},
  pages = {1857--1859},
  doi = {10.1093/bioinformatics/btv042},
  issn = {1367-4803},
  url = {}
}


To install protr from CRAN:
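This uses base R's standard installer and only needs to be run once per R library:

```r
# Install the released version of protr from CRAN
install.packages("protr")
```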


Or try the latest version on GitHub:
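One way to do this is with the `remotes` package. The repository path `nanxstats/protr` used below is an assumption, so adjust it if the development version is hosted elsewhere:

```r
# install.packages("remotes")  # uncomment if remotes is not installed yet

# Assumed repository path; change it if the project lives elsewhere
remotes::install_github("nanxstats/protr")
```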


Browse the package vignette for a quick-start.

Shiny App

ProtrWeb, the Shiny web application built on protr, can be accessed from

ProtrWeb is a user-friendly web application for computing the protein sequence descriptors (features) provided by the protr package.

List of Supported Descriptors

Commonly used descriptors

  • Amino acid composition descriptors

    • Amino acid composition
    • Dipeptide composition
    • Tripeptide composition
  • Autocorrelation descriptors

    • Normalized Moreau-Broto autocorrelation
    • Moran autocorrelation
    • Geary autocorrelation
  • CTD descriptors

    • Composition
    • Transition
    • Distribution
  • Conjoint Triad descriptors

  • Quasi-sequence-order descriptors

    • Sequence-order-coupling number
    • Quasi-sequence-order descriptors
  • Pseudo amino acid composition (PseAAC)

    • Pseudo amino acid composition
    • Amphiphilic pseudo amino acid composition
  • Profile-based descriptors

    • Profile-based descriptors derived by PSSM (Position-Specific Scoring Matrix)
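As a rough sketch of how several of the descriptors listed above might be computed, the example below calls the corresponding `extract*()` functions with their default settings. It assumes protr is installed and uses the example FASTA file for UniProt entry P00750 that ships with the package; the choice of which descriptor sets to combine is purely illustrative.

```r
library(protr)

# Load an example sequence bundled with protr (UniProt entry P00750)
x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]

# Composition descriptors
aac <- extractAAC(x)  # amino acid composition
dc  <- extractDC(x)   # dipeptide composition

# Autocorrelation, CTD, and conjoint triad descriptors (default settings)
moran  <- extractMoran(x)
ctdc   <- extractCTDC(x)
ctriad <- extractCTriad(x)

# Quasi-sequence-order and pseudo amino acid composition
qso  <- extractQSO(x)
paac <- extractPAAC(x)

# Concatenate a few descriptor sets into one numeric feature vector
features <- c(aac, dc, ctdc, ctriad)
length(features)
```

Each function returns a named numeric vector, so descriptor sets can be concatenated with `c()` and stacked row-wise across sequences to form a feature matrix.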

Proteochemometric (PCM) modeling descriptors

  • Scales-based descriptors derived by principal components analysis
    • Scales-based descriptors derived by amino acid properties (AAindex)
    • Scales-based descriptors derived by 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.)
  • Scales-based descriptors derived by factor analysis
  • Scales-based descriptors derived by multidimensional scaling
  • BLOSUM and PAM matrix-derived descriptors
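For instance, the BLOSUM-derived descriptors could be generated roughly as follows. This is a minimal sketch assuming protr is installed; the parameter values (`k = 5` scales, `lag = 7`) are illustrative choices rather than recommendations.

```r
library(protr)

# Example sequence bundled with protr (UniProt entry P00750)
x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]

# BLOSUM62-derived descriptors: keep the first k = 5 scales, then apply
# auto cross covariance with lag = 7 to obtain a fixed-length vector
blosum <- extractBLOSUM(
  x, submat = "AABLOSUM62", k = 5, lag = 7, scale = TRUE, silent = TRUE
)
length(blosum)
```

Because the auto cross covariance step yields `k^2 * lag` values, sequences of different lengths map to vectors of the same length, which is what makes these descriptors convenient for PCM modeling.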

Similarity computation

Local and global pairwise sequence alignment for protein sequences:

  • Between two protein sequences
  • Parallelized pairwise similarity calculation with a list of protein sequences
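A hedged sketch of the parallelized calculation is shown below. It assumes protr is installed together with the Bioconductor alignment backend it relies on (such as Biostrings); the three peptide sequences are made up for illustration, and `cores = 2` is an arbitrary choice.

```r
library(protr)

# Toy peptide sequences made up for illustration
seqs <- list(
  "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
  "MKTAYIAKQRQISFVKSHFSRQDEERLGLIEVQ",
  "GSHMSLFDFFKNKGSAATATDRLKLILAKERTL"
)

# Pairwise local-alignment similarity with the BLOSUM62 substitution matrix
simmat <- parSeqSim(seqs, cores = 2, type = "local", submat = "BLOSUM62")
dim(simmat)
```

The result is a square matrix with one row and one column per input sequence, which can be passed on to clustering or kernel-based methods.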

GO semantic similarity measures:

  • Between two groups of GO terms / two Entrez Gene IDs
  • Parallelized pairwise similarity calculation with a list of GO terms / Entrez Gene IDs

Miscellaneous tools and datasets

  • Retrieve protein sequences from UniProt
  • Read protein sequences in FASTA format
  • Read protein sequences in PDB format
  • Sanity check of the amino acid types appearing in the protein sequences
  • Protein sequence segmentation
  • Auto cross covariance (ACC) for generating scales-based descriptors of the same length
  • 20+ pre-computed 2D and 3D descriptor sets for the 20 amino acids to use with the scales-based descriptors
  • BLOSUM and PAM matrices for the 20 amino acids
  • Meta information of the 20 amino acids
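A few of these helpers might be combined as in the sketch below, which reads a FASTA file, checks the sequence, and segments it. It assumes protr is installed and uses the example file bundled with the package; the residue `"R"` and window size `k = 5` are arbitrary illustrative values.

```r
library(protr)

# Read a FASTA file; readFASTA() returns a list of sequences
x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]

# Check that the sequence contains only the 20 standard amino acid types
protcheck(x)

# Extract windows of k residues on each side of every arginine ("R")
segs <- protseg(x, aa = "R", k = 5)
head(segs, 3)
```

Running `protcheck()` before any `extract*()` call is a cheap way to catch sequences with non-standard residue codes early.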


To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.