
protr


protr is a comprehensive toolkit for generating various numerical features of protein sequences, as described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>.

Paper Citation

Formatted citation:

Nan Xiao, Dong-Sheng Cao, Min-Feng Zhu, and Qing-Song Xu. (2015). protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics 31 (11), 1857-1859.

BibTeX entry:

@article{Xiao2015,
  author = {Xiao, Nan and Cao, Dong-Sheng and Zhu, Min-Feng and Xu, Qing-Song},
  title = {{protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences}},
  journal = {Bioinformatics},
  year = {2015},
  volume = {31},
  number = {11},
  pages = {1857--1859},
  doi = {10.1093/bioinformatics/btv042},
  issn = {1367-4803},
  url = {http://bioinformatics.oxfordjournals.org/content/31/11/1857}
}

Installation

To install protr from CRAN:

install.packages("protr")

Or try the latest version on GitHub:

# install.packages("devtools")
devtools::install_github("road2stat/protr")

Browse the package vignette for a quick start.

Shiny Web Application

ProtrWeb, the Shiny web application built on protr, can be accessed from http://protr.org.

ProtrWeb is a user-friendly web application for computing the protein sequence descriptors (features) presented in the protr package.

Descriptors List

Commonly used descriptors

  • Amino acid composition descriptors

    • Amino acid composition
    • Dipeptide composition
    • Tripeptide composition
  • Autocorrelation descriptors

    • Normalized Moreau-Broto autocorrelation
    • Moran autocorrelation
    • Geary autocorrelation
  • CTD descriptors

    • Composition
    • Transition
    • Distribution
  • Conjoint Triad descriptors

  • Quasi-sequence-order descriptors

    • Sequence-order-coupling number
    • Quasi-sequence-order descriptors
  • Pseudo amino acid composition (PseAAC)

    • Pseudo amino acid composition
    • Amphiphilic pseudo amino acid composition
  • Profile-based descriptors

    • Profile-based descriptors derived by PSSM (Position-Specific Scoring Matrix)
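To make the simplest of these descriptors concrete, the following is a minimal base R sketch of what amino acid composition and dipeptide composition compute. It is an illustration only, not the package's implementation: the helper names `aa_composition()` and `dipeptide_composition()` and the toy sequence are hypothetical, and the sketch assumes the input contains only the 20 standard one-letter residue codes. In protr itself, these descriptors are provided by `extractAAC()` and `extractDC()`.

```r
# The 20 standard amino acids in one-letter code
aa <- strsplit("ARNDCEQGHILKMFPSTWYV", "")[[1]]

# Amino acid composition: fraction of each residue type in the sequence
# (hypothetical helper, assumes only standard residues are present)
aa_composition <- function(x) {
  residues <- strsplit(x, "")[[1]]
  counts <- table(factor(residues, levels = aa))
  setNames(as.numeric(counts) / length(residues), aa)
}

# Dipeptide composition: fraction of each of the 400 possible
# adjacent residue pairs (hypothetical helper)
dipeptide_composition <- function(x) {
  residues <- strsplit(x, "")[[1]]
  n <- length(residues)
  pairs <- paste0(residues[-n], residues[-1])
  dipeptides <- as.vector(outer(aa, aa, paste0))
  counts <- table(factor(pairs, levels = dipeptides))
  setNames(as.numeric(counts) / (n - 1), dipeptides)
}

seq1 <- "MKVLAAGIVGLLLAQ" # toy sequence, for illustration only
aac <- aa_composition(seq1)
dc <- dipeptide_composition(seq1)

length(aac) # 20 values, one per amino acid
length(dc)  # 400 values, one per dipeptide
```

Both vectors have a fixed length regardless of the sequence length, which is what makes descriptors of this kind usable as input features for statistical models.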

Proteochemometric (PCM) modeling descriptors

  • Scales-based descriptors derived by principal components analysis
    • Scales-based descriptors derived by amino acid properties (AAindex)
    • Scales-based descriptors derived by 20+ classes of 2D and 3D molecular descriptors (Topological, WHIM, VHSE, etc.)
  • Scales-based descriptors derived by factor analysis
  • Scales-based descriptors derived by multidimensional scaling
  • BLOSUM and PAM matrix-derived descriptors

Similarity Computation

Local and global pairwise sequence alignment for protein sequences:

  • Between two protein sequences
  • Parallelized pairwise similarity calculation with a list of protein sequences
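As a rough illustration of alignment-based similarity, the sketch below scores a global alignment with the Needleman-Wunsch recurrence in base R and then fills a pairwise score matrix for a small set of sequences. It is deliberately simplified and is not how protr computes similarity: the function name `nw_score()`, the match/mismatch/gap scores, and the toy sequences are all assumptions made for the example, whereas the package's `twoSeqSim()` and `parSeqSim()` rely on substitution matrices such as BLOSUM62 and can run in parallel.

```r
# Global alignment score via the Needleman-Wunsch recurrence.
# Hypothetical helper with a simple match/mismatch/linear-gap scheme,
# used here instead of a substitution matrix to keep the sketch short.
nw_score <- function(a, b, match = 1, mismatch = -1, gap = -1) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  H <- matrix(0, n + 1, m + 1)
  H[, 1] <- gap * (0:n) # aligning a prefix of a against nothing
  H[1, ] <- gap * (0:m) # aligning a prefix of b against nothing
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s <- if (x[i] == y[j]) match else mismatch
      H[i + 1, j + 1] <- max(
        H[i, j] + s,       # align x[i] with y[j]
        H[i, j + 1] + gap, # gap in b
        H[i + 1, j] + gap  # gap in a
      )
    }
  }
  H[n + 1, m + 1]
}

# Pairwise score matrix for a list of toy sequences (computed serially)
seqs <- c(p1 = "ACD", p2 = "AD", p3 = "ACD")
sim <- sapply(seqs, function(a) sapply(seqs, function(b) nw_score(a, b)))
sim
```

Each cell of the matrix is independent of the others, which is why this kind of all-against-all computation parallelizes well across cores.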

GO semantic similarity measures:

  • Between two groups of GO terms / two Entrez Gene IDs
  • Parallelized pairwise similarity calculation with a list of GO terms / Entrez Gene IDs

Miscellaneous tools and datasets

  • Retrieve protein sequences from UniProt
  • Read protein sequences in FASTA format
  • Read protein sequences in PDB format
  • Sanity check of the amino acid types appearing in the protein sequences
  • Protein sequence segmentation
  • Auto cross covariance (ACC) for generating scales-based descriptors of the same length
  • 20+ pre-computed 2D and 3D descriptor sets for the 20 amino acids to use with the scales-based descriptors
  • BLOSUM and PAM matrices for the 20 amino acids
  • Meta information of the 20 amino acids
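The reading and sanity-check steps can be sketched in a few lines of base R. This is only an illustration of the idea under stated assumptions: `read_fasta_text()` and `is_standard_protein()` are hypothetical helpers, the FASTA records are made up, and the parser assumes every header line is followed by at least one sequence line. In protr, the corresponding functions are `readFASTA()` and `protcheck()`.

```r
# The 20 standard amino acids in one-letter code
aa <- strsplit("ARNDCEQGHILKMFPSTWYV", "")[[1]]

# Parse FASTA-formatted lines into a named character vector
# (hypothetical helper, assumes each header has at least one sequence line)
read_fasta_text <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header) # record index for every line
  seqs <- vapply(
    split(lines[!is_header], record[!is_header]),
    paste, character(1),
    collapse = ""
  )
  names(seqs) <- sub("^>", "", lines[is_header])
  seqs
}

# TRUE only if every residue is one of the 20 standard amino acids
# (hypothetical helper)
is_standard_protein <- function(x) {
  all(strsplit(x, "")[[1]] %in% aa)
}

# Toy records for illustration; the second contains the non-standard code X
fasta <- c(">P1 toy record", "MKVLA", "AGIV", ">P2 toy record", "MKXLA")
seqs <- read_fasta_text(fasta)
ok <- vapply(seqs, is_standard_protein, logical(1))

# Keep only sequences that pass the check before computing descriptors
clean <- seqs[ok]
```

Filtering before descriptor computation matters because most descriptors are defined only over the 20 standard amino acid types.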

Contribute

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.