Ronesh Sharma edited this page Aug 20, 2018 · 49 revisions

Welcome to the OPAL overview page. OPAL is a predictor of MoRFs in intrinsically disordered protein sequences.

OVERVIEW

Intrinsically disordered proteins lack a stable three-dimensional structure yet play a crucial role in many biological functions. Key to their function are the molecular recognition features (MoRFs) located within long disordered protein sequences. Identifying MoRF residues computationally requires two components: feature extraction and classification. Feature extraction derives features that represent a region of the protein sequence; the classifier then uses these features to predict the location of MoRF residues in the disordered region. Features representing a MoRF can be derived from the sequence in several ways, using syntactical and physicochemical properties, structural information, or evolutionary information. OPAL is developed to identify MoRFs in disordered protein sequences.

BENCHMARKING

OPAL is evaluated on the test sets previously used to evaluate the MoRF predictors MoRFpred, MoRFchibi and MoRFchibi-web. On these sets OPAL achieves the highest precision, F-measure and AUC of the predictors compared, which makes it the most accurate of them as a practical tool for MoRF prediction. The following table benchmarks OPAL against the other state-of-the-art MoRF predictors.

Predictors Precision F-measure AUC Efficiency (local machine) Efficiency (server) Multiple sequence alignments Combined component predictors
ANCHOR 0.156, 0.134 0.201, 0.212 0.605, 0.615 3.9e+6 - × ×
MoRFchibi 0.334, 0.210 0.316, 0.296 0.743, 0.712 10.5e+3 - × ×
MoRFpred 0.181, 0.147 0.226, 0.228 0.675, 0.620 - 48 ×
PROMIS 0.363, 0.332 0.329, 0.400 0.790, 0.818 220 - ×
MoRFchibi-light 0.431, 0.324 0.354, 0.392 0.777, 0.799 9.9e+3 - ×
MoRFchibi-web 0.495, 0.332 0.373, 0.399 0.805, 0.797 80 588
OPAL 0.530, 0.386 0.384, 0.436 0.816, 0.836 215 -

Precision and F-measure are given at TPR values of 0.3 and 0.5, respectively, for the EXP53all set; AUC is given for the TEST464 and EXP53all sets, respectively. Efficiency is given in residues per minute; the local machine is a 4-core i5 3.50 GHz desktop.
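The metrics in the table can be reproduced from per-residue scores and annotations. The following is a minimal sketch, not OPAL's own evaluation code: it assumes scores and 0/1 labels are available as parallel Python lists, and the function names `metrics_at_tpr` and `auc` are hypothetical. It sweeps the threshold downward until the requested TPR is reached, then reports precision and F-measure at that point; AUC is computed by direct pairwise comparison, which is quadratic and only suitable for small inputs.

```python
from typing import List, Tuple


def metrics_at_tpr(scores: List[float], labels: List[int],
                   target_tpr: float) -> Tuple[float, float, float]:
    """Return (threshold, precision, f_measure) at the highest threshold
    whose TPR is at least target_tpr. Labels are 1 (MoRF) or 0 (non-MoRF)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("labels contain no MoRF residues")
    # Rank residues from highest to lowest score and lower the threshold stepwise.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Residues with the same score are admitted together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr = tp / total_pos
        if tpr >= target_tpr:
            precision = tp / (tp + fp)
            f_measure = 2 * precision * tpr / (precision + tpr)
            return threshold, precision, f_measure
    raise ValueError("target_tpr must not exceed 1.0")


def auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve: the probability that a random MoRF residue
    scores higher than a random non-MoRF residue (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For the table's convention, `metrics_at_tpr` would be called once with `target_tpr=0.3` and once with `target_tpr=0.5` on the pooled residues of a test set.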

The table above also compares the prediction speed of each predictor. For predictors that do not require multiple sequence alignments (MSA), speed was measured over the entire test set on an i5 3.5 GHz computer. Predictors that require MSA were timed on a single 903-residue sequence from the test set (UniProt: Q38087). Predictors that could not be downloaded were timed on their prediction servers with the same sequence (UniProt: Q38087).

DATASETS

Train data

  • The training set has 421 sequences collected by Disfani et al. (2012), with 245,984 residues, of which 5,396 are MoRF residues. Each sequence in this set is annotated with a single MoRF section of 5 to 25 residues. This set is used to train the OPAL predictor.

Test data

Each sequence in the test and new sets is annotated with a single MoRF section of 5 to 25 residues. The test set is used to evaluate the OPAL predictor, and the two sets are combined into a single set (TEST464) to compare the state-of-the-art MoRF predictors.

Validation data

  • The EXP53 set has 53 non-redundant sequences assembled by Malhis et al. (2015), with 25,186 residues, of which 2,432 are MoRF residues. Because EXP53 sequences contain MoRFs of up to 70 residues, MoRFs are divided for evaluation into short MoRFs (up to 30 residues) and long MoRFs (longer than 30 residues). The sequences in this set have been experimentally verified to be disordered in isolation and contain more than one MoRF section per sequence. The EXP53 set is used to validate the OPAL predictor. For more details on the EXP53 set, see Malhis et al. (2015).
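The short/long split described for EXP53 can be sketched as follows. This is an illustration only: it assumes each sequence's annotation is a string of per-residue labels with `1` marking a MoRF residue, which is a hypothetical format and not necessarily the one used by the EXP53 files. The function names are also hypothetical; the 30-residue cutoff comes from the text above.

```python
from typing import List, Tuple

SHORT_MAX = 30  # MoRFs of up to 30 residues count as short (from the text)

Region = Tuple[int, int]  # 0-based, half-open (start, end)


def morf_regions(labels: str) -> List[Region]:
    """Return one (start, end) pair for each uninterrupted run of '1' labels."""
    regions: List[Region] = []
    start = None
    for i, ch in enumerate(labels):
        if ch == "1" and start is None:
            start = i                      # a MoRF section begins
        elif ch != "1" and start is not None:
            regions.append((start, i))     # the section ended at the previous residue
            start = None
    if start is not None:                  # section runs to the end of the sequence
        regions.append((start, len(labels)))
    return regions


def split_by_length(regions: List[Region]) -> Tuple[List[Region], List[Region]]:
    """Partition regions into (short, long) using the 30-residue cutoff."""
    short = [r for r in regions if r[1] - r[0] <= SHORT_MAX]
    long_ = [r for r in regions if r[1] - r[0] > SHORT_MAX]
    return short, long_
```

Because EXP53 sequences can contain more than one MoRF section, `morf_regions` returns a list, and a single sequence may contribute to both the short and the long group.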

OUTPUT

OPAL generates, for each residue, a propensity score for being a MoRF residue. If a threshold value is required, 0.58 is suggested. At this threshold, OPAL has a TPR of 0.518 and 0.432 and an FPR of 0.092 and 0.082 for the EXP53 and TEST464 sets, respectively.
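Applying the suggested threshold to the propensity scores might look like the sketch below. It assumes the scores have already been read into a list of floats (the OPAL output file format is not covered here) and that a residue scoring exactly at the threshold is called a MoRF residue; both are assumptions, and the function names are hypothetical.

```python
from typing import List, Tuple

SUGGESTED_THRESHOLD = 0.58  # value suggested in the text


def call_morf_residues(scores: List[float],
                       threshold: float = SUGGESTED_THRESHOLD) -> List[int]:
    """Convert propensity scores to binary calls: 1 = MoRF, 0 = non-MoRF.
    Assumption: a score equal to the threshold is called as MoRF."""
    return [1 if s >= threshold else 0 for s in scores]


def tpr_fpr(predicted: List[int], actual: List[int]) -> Tuple[float, float]:
    """Return (TPR, FPR) for binary calls against binary annotations."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr
```

Given annotated sequences, `tpr_fpr(call_morf_residues(scores), actual)` yields the kind of TPR and FPR figures quoted above; a lower threshold trades a higher TPR for a higher FPR.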

CITATION

Sharma, R., Tsunoda, T., Raicar, G., Patil, A. and Sharma, A., OPAL: Prediction of MoRF regions in Intrinsically disordered protein sequence, Bioinformatics, 2018.

Sharma, R., Kumar, S., Tsunoda, T., Patil, A. and Sharma, A., Predicting MoRFs in Protein Sequences using HMM Profiles, BMC Bioinformatics, 2016; 17 Suppl X, S14.

Disfani FM, Hsu WL, Mizianty MJ, Oldfield CJ, Xue B, Dunker AK, et al. MoRFpred, a computational tool for sequence-based prediction and characterization of short disorder-to-order transitioning binding regions in proteins. Bioinformatics, 2012 Jun 15; 28 (12): i75–i83.

Malhis N, and Gsponer J. Computational Identification of MoRFs in Protein Sequences. Bioinformatics (2015) 31 (11): 1738-1744.

Malhis N, Wong TCE, Nassar R, Gsponer J. Computational Identification of MoRFs in Protein Sequences Using Hierarchical Application of Bayes Rule. PLOS ONE (2015).

Malhis N, Jacobson M, and Gsponer J. MoRFchibi SYSTEM: Software Tools for the Identification of MoRFs in Protein sequences. Nucleic Acids Research (2016).
