All training and independent test datasets are provided in the Dataset folder.
OS: Ubuntu 22.04.4 LTS
Python version: Python 3.9.19
Libraries used:
numpy==1.26.4
pandas==2.2.1
pytorch==2.2.2
xgboost==2.0.3
pickle5==0.0.11
scikit-learn==1.2.2
matplotlib==3.8.2
PyQt5==5.15.10
imblearn==0.0
skops==0.9.0
shap==0.45.1
IPython==8.18.1
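A short Python check can compare an existing environment against the versions listed above. This is a sketch, not part of the repository: `check_versions` and `PINNED` are hypothetical names. It also assumes that PyTorch and IPython were installed under their PyPI distribution names, `torch` and `ipython`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions from the list above. Assumption: PyTorch is installed
# as the "torch" distribution and IPython as "ipython" (their PyPI names).
PINNED = {
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "torch": "2.2.2",
    "xgboost": "2.0.3",
    "pickle5": "0.0.11",
    "scikit-learn": "1.2.2",
    "matplotlib": "3.8.2",
    "PyQt5": "5.15.10",
    "imblearn": "0.0",
    "skops": "0.9.0",
    "shap": "0.45.1",
    "ipython": "8.18.1",
}


def check_versions(pinned, lookup=version):
    """Return {package: (expected, found)} for every mismatch.

    `found` is None when the package is not installed. `lookup` defaults
    to importlib.metadata.version and can be replaced for testing.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Run it with the project's Python 3.9 interpreter. Builds that carry a local version suffix (for example `2.2.2+cu121` for CUDA wheels of torch) are reported as mismatches even when the base version matches.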
-
First, download all feature files. See the readme.txt in the all_features folder for instructions.
-
The N-GlycositeAtlas and N-GlyDE folders contain reproducible code along with the training scripts. Where a folder includes a readme.txt, follow its instructions.
- ProteinBERT is required. Install it as follows:
pip3 install tensorflow tensorflow_addons numpy pandas h5py lxml pyfaidx
git clone https://github.com/nadavbra/protein_bert.git
cd protein_bert
git submodule init
git submodule update
python setup.py install
-
The transformers, PyTorch, and TensorFlow libraries are required for extracting the embeddings.
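The text does not say how the extracted embeddings are turned into per-site features. One common approach is to cut a fixed-size window of per-residue embeddings centred on each candidate site. The sketch below shows that approach only as a hypothetical illustration. `site_window`, the 1-based position convention, zero padding, and the default `half_width=15` are all assumptions, not values taken from this repository.

```python
import numpy as np


def site_window(embeddings, site, half_width=15):
    """Return a (2 * half_width + 1, D) window of per-residue embeddings.

    embeddings: array of shape (L, D), one row per residue.
    site: 1-based residue position (assumed convention).
    Rows that fall outside the sequence are zero-padded.
    """
    length, dim = embeddings.shape
    center = site - 1  # convert the 1-based position to a 0-based index
    if not 0 <= center < length:
        raise ValueError(f"site {site} is outside a sequence of length {length}")
    window = np.zeros((2 * half_width + 1, dim), dtype=embeddings.dtype)
    start, end = center - half_width, center + half_width + 1
    # Clip the requested range to the sequence, then copy it into place.
    src_start, src_end = max(start, 0), min(end, length)
    window[src_start - start : src_end - start] = embeddings[src_start:src_end]
    return window
```

Flattening the window with `site_window(emb, pos).ravel()` gives a fixed-length vector that a classifier such as XGBoost can consume. The zero padding keeps that length constant for sites near either terminus.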
-
For further questions, you can visit the following GitHub repositories:
- First, fill in dataset.txt following the pattern below:
Protein_id,site_position_1,site_position_2,...,site_position_n
Fasta
- To predict N-linked glycosylation sites from a protein sequence, first run extractFeatures.py to generate the features, then run predict.py.
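The two steps above can be sketched in Python as follows. This is a hypothetical illustration, not the repository's code. `parse_dataset` and `candidate_sites` are invented names. The parser assumes that each protein in dataset.txt occupies two lines: the ID line, followed by the sequence on a single line. It also assumes that site positions are 1-based and should point at an asparagine. `candidate_sites` lists asparagines in the N-X-S/T sequon (X not proline), which is the consensus motif for N-linked glycosylation. The text does not say whether extractFeatures.py restricts candidates this way.

```python
import re

# N-X-S/T sequon: asparagine, then any residue except proline, then serine
# or threonine. The lookahead lets overlapping sequons (e.g. "NNST") match.
SEQUON = re.compile(r"N(?=[^P][ST])")


def parse_dataset(text):
    """Parse dataset.txt content into (protein_id, positions, sequence) tuples.

    Assumes each protein takes two consecutive non-blank lines: an ID line
    "Protein_id,pos_1,...,pos_n" followed by the sequence on one line.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected pairs of lines: an ID line, then a sequence line")
    records = []
    for header, sequence in zip(lines[0::2], lines[1::2]):
        protein_id, *fields = [f.strip() for f in header.split(",")]
        positions = [int(f) for f in fields if f]
        sequence = sequence.upper()
        for pos in positions:
            # Assumed 1-based positions that must land on an asparagine (N).
            if not 1 <= pos <= len(sequence) or sequence[pos - 1] != "N":
                raise ValueError(f"{protein_id}: position {pos} is not an N")
        records.append((protein_id, positions, sequence))
    return records


def candidate_sites(sequence):
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]
```

For example, `parse_dataset("P1,3,8\nMANGSKLNVT\n")` returns `[("P1", [3, 8], "MANGSKLNVT")]`, and `candidate_sites("MANGSKLNVT")` returns `[3, 8]`.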
The Previous Paper codes folder contains scripts for reproducing the results of the previous papers.