StackGlyEmbed

Selected Feature Group

[Figure: selected feature group]

Training framework

[Figure: training framework]

Prediction framework

[Figure: prediction framework]

Data availability

All training and independent datasets are provided in the Dataset folder.

Environments

OS: Ubuntu 22.04.4 LTS

Python version: Python 3.9.19

Used libraries:

numpy==1.26.4
pandas==2.2.1
pytorch==2.2.2
xgboost==2.0.3
pickle5==0.0.11
scikit-learn==1.2.2
matplotlib==3.8.2
PyQt5==5.15.10
imblearn==0.0
skops==0.9.0
shap==0.45.1
IPython==8.18.1
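
To check whether an existing environment matches these pins, a small script along the following lines may help. It is only a sketch: `PINNED` and `find_mismatches` are hypothetical names that are not part of this repository, and only a subset of the list is included for brevity. Entries are looked up by distribution name, which can differ from the name written above (for example, PyTorch is usually installed from pip as `torch`), so add the remaining packages under whatever name your installer used.

```python
from importlib.metadata import PackageNotFoundError, version

# Subset of the pins listed above (illustrative, not exhaustive).
# Keys must be distribution names as known to the installer.
PINNED = {
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "xgboost": "2.0.3",
    "scikit-learn": "1.2.2",
    "matplotlib": "3.8.2",
    "skops": "0.9.0",
    "shap": "0.45.1",
}


def find_mismatches(pinned, lookup=version):
    """Return {name: (expected, installed_or_None)} for every mismatch.

    `lookup` defaults to importlib.metadata.version and is injectable
    so the function can be exercised without touching the environment.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # distribution is not installed
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dictionary when every listed package is installed at the pinned version; otherwise each entry shows the expected and the installed version.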

Reproduce results

  1. First, download all features. See the readme.txt in the all_features folder for instructions.

  2. The N-GlycositeAtlas and N-GlyDE folders contain the code for reproducing the results, along with the training scripts. If a folder includes a readme.txt, follow its instructions.

Prediction

Prerequisites

  1. Install ProteinBert by running the following commands:
pip3 install tensorflow tensorflow_addons numpy pandas h5py lxml pyfaidx
git clone https://github.com/nadavbra/protein_bert.git
cd protein_bert
git submodule init
git submodule update
python setup.py install
  2. transformers, PyTorch, and TensorFlow are required for extracting the embeddings.

  3. For more details, see the following GitHub repositories:

    ProtT5-XL-U50

    ProteinBert

    ESM2

Steps

  1. First, fill in dataset.txt, following the pattern shown below:
Protein_id,site_position_1,site_position_2,...,site_position_n
Fasta
  2. To predict N-linked glycosylation sites from a protein sequence, run extractFeatures.py to generate the features, then run predict.py to obtain the predictions.
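
The dataset.txt pattern from the first step can be read and written with a short helper such as the one below. This is a sketch under stated assumptions, not code from this repository: `Record`, `parse_dataset`, and `format_dataset` are hypothetical names, each protein is assumed to occupy exactly two lines (the comma-separated header, then the raw amino-acid sequence), and site positions are assumed to be 1-based indices into that sequence. Check these assumptions against extractFeatures.py before relying on it.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    protein_id: str
    sites: List[int]
    sequence: str


def parse_dataset(text: str) -> List[Record]:
    """Parse alternating header/sequence lines into records."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected header/sequence line pairs")
    records = []
    for header, sequence in zip(lines[0::2], lines[1::2]):
        # Header pattern: Protein_id,site_position_1,...,site_position_n
        protein_id, *positions = header.split(",")
        sites = [int(p) for p in positions]
        # Assumption: positions are 1-based indices into the sequence.
        for pos in sites:
            if not 1 <= pos <= len(sequence):
                raise ValueError(
                    f"{protein_id}: position {pos} is outside the sequence"
                )
        records.append(Record(protein_id, sites, sequence))
    return records


def format_dataset(records: List[Record]) -> str:
    """Serialize records back into the two-lines-per-protein pattern."""
    out = []
    for rec in records:
        out.append(",".join([rec.protein_id, *map(str, rec.sites)]))
        out.append(rec.sequence)
    return "\n".join(out) + "\n"
```

Running a hand-written dataset.txt through `parse_dataset` before feature extraction catches malformed headers and out-of-range positions early, and `format_dataset` can generate the file from records assembled in code.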

Reproduce previous paper metrics

Scripts for reproducing the results of the previous papers are provided in the Previous Paper codes folder.
