
GBDT is a machine learning model for identifying glutarylation sites


flyinsky6/GBDT_KgluSite


GBDT_KgluSite

In this paper, a new lysine glutarylation (Kglu) site prediction model, GBDT_Kglu, was proposed. It adopted seven feature encoding methods to convert protein sequences into numerical feature vectors: binary encoding (BE), BLOSUM62, EAAC, CTDC, PSSM, CKSAAP, and secondary structure information. The NearMiss-3 method was then used to handle the imbalanced dataset, and Elastic Net was used to filter redundant information from the features. Finally, a GBDT-based prediction model for identifying Kglu sites was built.
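The three-step pipeline described above (NearMiss-3 undersampling, Elastic Net feature selection, GBDT classification) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: NearMiss-3 is hand-written here as a simplified version of the two-step procedure behind imbalanced-learn's `NearMiss(version=3)`, positives (label 1) are assumed to be the minority class, the Elastic Net values `alpha` and `l1_ratio` are placeholders, and the helper names `nearmiss3` and `fit_pipeline` are hypothetical. Synthetic data from `make_classification` stands in for the encoded features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import ElasticNet
from sklearn.neighbors import NearestNeighbors


def nearmiss3(X, y, n_neighbors_ver3=3, n_neighbors=3):
    """Simplified NearMiss-3 undersampling of the negative class (label 0).

    Hand-written sketch (not imbalanced-learn); positives (label 1) are
    assumed to be the minority class.
    """
    X_pos, X_neg = X[y == 1], X[y == 0]
    # Step 1: short-list negatives that are among the n_neighbors_ver3
    # nearest negatives of at least one positive sample.
    nn_neg = NearestNeighbors(n_neighbors=min(n_neighbors_ver3, len(X_neg))).fit(X_neg)
    shortlist = np.unique(nn_neg.kneighbors(X_pos, return_distance=False).ravel())
    # Step 2: rank the short-listed negatives by their mean distance to their
    # n_neighbors nearest positives and keep the most distant ones.
    nn_pos = NearestNeighbors(n_neighbors=min(n_neighbors, len(X_pos))).fit(X_pos)
    dist, _ = nn_pos.kneighbors(X_neg[shortlist])
    order = np.argsort(-dist.mean(axis=1), kind="stable")
    keep = shortlist[order[: len(X_pos)]]  # at most as many negatives as positives
    X_bal = np.vstack([X_pos, X_neg[keep]])
    y_bal = np.concatenate([np.ones(len(X_pos), dtype=int), np.zeros(len(keep), dtype=int)])
    return X_bal, y_bal


def fit_pipeline(X, y, alpha=0.01, l1_ratio=0.5):
    """NearMiss-3 -> Elastic Net feature selection -> GBDT classifier."""
    X_bal, y_bal = nearmiss3(X, y)
    # Keep only features whose Elastic Net coefficient is (effectively) non-zero.
    selector = SelectFromModel(
        ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10000), threshold=1e-5
    )
    X_sel = selector.fit_transform(X_bal, y_bal)
    clf = GradientBoostingClassifier(random_state=0).fit(X_sel, y_bal)
    return selector, clf


if __name__ == "__main__":
    # Synthetic stand-in for the encoded Kglu features (~14% positives).
    X, y = make_classification(n_samples=700, n_features=30, n_informative=8,
                               weights=[0.86], random_state=0)
    selector, clf = fit_pipeline(X, y)
    print("features kept:", selector.get_support().sum())
    print("P(Kglu), first 3 samples:", clf.predict_proba(selector.transform(X[:3]))[:, 1])
```

Fitting the selector on the undersampled data, rather than on the full imbalanced set, mirrors the order of steps given in the description; whether the original model does the same is not stated here.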

Requirements

Backend = TensorFlow (1.14.0)
Keras (2.3.1)
NumPy (1.20.2)
scikit-learn (1.0.2)
pandas (1.3.5)
matplotlib (3.5.2)

Dataset

The data in DataSet are the original data before the split into training and test sets: 707 positive and 4,369 negative samples, each a sequence window of length 33 in which X stands for a virtual (padding) amino acid. Glutarylation.csv is the original dataset; Glutarylation208.csv was obtained by removing redundant data with CD-HIT and contains 208 proteins in total. The Train folder contains all training data, and the Test folder contains all independent test data.
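The released files already contain fixed-length windows. As an illustration of how such a 33-residue window can be built from a full protein sequence, the sketch below assumes 16 residues on each side of the central lysine (2 × 16 + 1 = 33) and pads positions beyond either terminus with `X`. The function names `extract_window` and `lysine_windows` are hypothetical, and the exact windowing used to produce the CSV files may differ.

```python
def extract_window(sequence, k_index, half_width=16, pad="X"):
    """Return the (2*half_width + 1)-residue window centred on sequence[k_index].

    Positions that fall outside the protein are filled with the pad
    character (assumed to be 'X', the virtual amino acid).
    """
    if sequence[k_index] != "K":
        raise ValueError(f"position {k_index} is {sequence[k_index]!r}, not lysine (K)")
    padded = pad * half_width + sequence + pad * half_width
    # After left-padding, the centre lysine sits at k_index + half_width,
    # so the window starts at (k_index + half_width) - half_width = k_index.
    return padded[k_index : k_index + 2 * half_width + 1]


def lysine_windows(sequence, half_width=16):
    """List (position, window) pairs for every lysine in the sequence."""
    return [(i, extract_window(sequence, i, half_width))
            for i, aa in enumerate(sequence) if aa == "K"]


if __name__ == "__main__":
    for pos, window in lysine_windows("MKAAK"):
        print(pos, window, len(window))
```

With 707 positive and 4,369 negative windows, the dataset contains roughly 6.2 negatives per positive, which is the imbalance the undersampling step is meant to correct.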

Feature

Seven features were used in the GBDT_KgluSite model. Two of them were generated by one_hot.py and CKSAAP.py, the PSSM feature was generated with PSI-BLAST, and the rest were obtained with iLearnPlus.
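For reference, the two sequence-based encodings produced by the repository's own scripts can be sketched as follows. This is an illustration under assumptions, not the contents of `one_hot.py` or `CKSAAP.py`: the BE alphabet is assumed to be the 20 standard amino acids plus `X`, and for CKSAAP the maximum gap `k_max=5` (giving 6 × 400 = 2,400 features) and the choice to skip pairs that contain `X` are placeholders. The function names are hypothetical.

```python
from itertools import product

import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard amino acids
BE_ALPHABET = AA + "X"        # assumption: X (virtual residue) gets its own bit
PAIRS = ["".join(p) for p in product(AA, repeat=2)]  # 400 ordered residue pairs


def binary_encoding(window):
    """BE / one-hot: one 21-bit vector per residue, flattened to 1-D."""
    index = {aa: i for i, aa in enumerate(BE_ALPHABET)}
    mat = np.zeros((len(window), len(BE_ALPHABET)))
    for pos, aa in enumerate(window):
        # Assumption: any non-standard residue is treated like X.
        mat[pos, index.get(aa, index["X"])] = 1.0
    return mat.ravel()


def cksaap(window, k_max=5):
    """CKSAAP: frequency of each residue pair separated by k residues, k = 0..k_max."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # pairs involving X are skipped
                counts[pair] += 1
                total += 1
        # Normalise by the number of valid pairs for this gap (0 if none).
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return np.array(features)


if __name__ == "__main__":
    w = "X" * 15 + "MKAAK" + "X" * 13
    print(binary_encoding(w).shape, cksaap(w).shape)
```

PSSM, BLOSUM62, EAAC, CTDC and secondary-structure features come from external tools (PSI-BLAST, iLearnPlus), so they are not sketched here.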

Model

GBDT_KgluSite.py can be used directly to predict glutarylation sites by loading the pretrained model GBDT_KgluSite.pickle.
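A minimal sketch of loading a pickled model and scoring samples is shown below. It assumes that GBDT_KgluSite.pickle holds a fitted scikit-learn classifier exposing `predict_proba`, with class 1 meaning a Kglu site, and that the input rows were built with the same encodings, feature selection and column order used in training. The helper names `load_model` and `predict_sites` and the 0.5 threshold are hypothetical, and GBDT_KgluSite.py itself may expose a different interface. Because the real model needs the real features, the demo pickles a toy GBDT instead.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def load_model(path="GBDT_KgluSite.pickle"):
    """Load a pickled classifier (assumed to be a fitted scikit-learn estimator)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def predict_sites(model, features, threshold=0.5):
    """Return P(Kglu) per sample and 0/1 calls at the given threshold.

    Assumes model.classes_ is [0, 1], so column 1 is the Kglu-site probability.
    """
    proba = model.predict_proba(np.asarray(features))[:, 1]
    return proba, (proba >= threshold).astype(int)


if __name__ == "__main__":
    # Toy stand-in: pickle and reload a small GBDT to exercise the code path.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5))
    y = (X[:, 0] > 0).astype(int)
    path = os.path.join(tempfile.mkdtemp(), "toy_model.pickle")
    with open(path, "wb") as fh:
        pickle.dump(GradientBoostingClassifier(random_state=0).fit(X, y), fh)
    proba, labels = predict_sites(load_model(path), X)
    print(proba[:3], labels[:3])
```

Pickled scikit-learn models are not guaranteed to load across library versions, so using the listed scikit-learn version (1.0.2) is the safest choice; pickle files should also only be loaded from trusted sources.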

Contact

Feel free to contact us if you need any help: flyinsky6@gmail.com
