Skip to content

Performing a regression task for estimating residue size based on given physicochemical properties of protein tertiary structures (CASP 5-9).

Notifications You must be signed in to change notification settings

msikorski93/Protein-Tertiary-Structure

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Protein-Tertiary-Structure

One of the most important topics of research in the field of molecular biology is the 3D reconstruction of macromolecules like proteins. The study of protein tertiary structure contributes to protein function and also used for medicine design and drug discovery. The prediction of protein structure is one of the most challenging tasks in bioinformatics. This caused in recent years soft computing technologies as the best possibilities to solve this type of tasks. To estimate residue size for each protein we used the following supervised learning algorithms:

  • artificial neural network,
  • random forest,
  • multi-layer perceptron,
  • gradient boosting.

Model MAE RMSE R2 Score
RandomForest 0.466395 0.641751 0.588156
GradientBoosting 0.485399 0.659230 0.565416
MLP 0.488006 0.667914 0.553890
ANN 0.505709 0.689411 0.524713

We achieved such results while evaluating the models. Overall, the scores are moderate. The dataset does not have much characteristics for developing machine learning models with high usefullness. The top regressor with highest performance was the random forest. Estimating residue size with this dataset is possible, yet still not accurate.

About

Performing a regression task for estimating residue size based on given physicochemical properties of protein tertiary structures (CASP 5-9).

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published