nazanintbtb/KDeep

KDeep: a k-mer-based deep learning approach for predicting DNA/RNA transcription factor binding sites

KDeep & KDeep+ WORKFLOW

[Figure: KDeep and KDeep+ workflow diagrams]

ABSTRACT

Given the importance of DNA/RNA-binding proteins in different cellular processes, finding their binding sites plays a crucial role in many applications, such as drug and vaccine design, protein design, and cancer control. Many studies target this problem and try to improve prediction accuracy with three strategies: complex neural-network structures, various types of inputs, and ML methods to extract input features. However, due to the growing volume of sequences, these methods face serious processing challenges. This paper therefore presents KDeep, which is based on CNN-LSTM and takes the primary form of DNA/RNA sequences as input. As the key feature improving the prediction accuracy, we propose a new encoding method, 2Lk, which includes two levels of k-mer encoding. Compared to the state-of-the-art methods, 2Lk not only increases the prediction accuracy of RNA/DNA binding sites, but also reduces the encoding memory consumption by up to 84%, improves the number of trainable parameters, and increases the interpretability of KDeep by about 79%.
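For readers unfamiliar with k-mers, the following is a minimal sketch of plain k-mer extraction and counting. It is only an illustration of the general idea; it is not the 2Lk encoding, whose details are not described in this README. The function names `kmers` and `kmer_frequency_vector` are hypothetical and do not come from the repository.

```python
from collections import Counter
from itertools import product
from typing import List


def kmers(sequence: str, k: int) -> List[str]:
    """Return all overlapping k-mers of a sequence (stride 1)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_frequency_vector(sequence: str, k: int, alphabet: str = "ACGT") -> List[int]:
    """Count every possible k-mer over the alphabet, in lexicographic order."""
    counts = Counter(kmers(sequence, k))
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


print(kmers("ACGTAC", 3))                  # ['ACG', 'CGT', 'GTA', 'TAC']
print(kmer_frequency_vector("ACGTAC", 1))  # [2, 2, 1, 1]
```

For RNA sequences, pass `alphabet="ACGU"`. The vector length grows as 4^k, which is one reason encoding memory matters for larger k.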

DNA ACCURACY

[Figure: ROC & PR accuracy]

[Figure: DNase ROC & PR accuracy]

[Figure: TF ROC & PR accuracy]

[Figure: Histone ROC & PR accuracy]

RNA ACCURACY

[Figure: RNA accuracy]

USAGE

Required packages

Python 3.7, tensorflow==2.8, plus CUDA and cuDNN if you have a GPU
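Before training, it may help to confirm what is actually installed. The snippet below is a small sketch, not part of the repository; `check_environment` is a hypothetical helper name. It reports the Python version, the TensorFlow version if TensorFlow can be imported, and the GPUs TensorFlow can see.

```python
import sys


def check_environment() -> dict:
    """Collect the Python version, TensorFlow version and visible GPUs."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "tensorflow": None,  # stays None if TensorFlow is not installed
        "gpus": [],          # stays empty on CPU-only machines
    }
    try:
        import tensorflow as tf
    except ImportError:
        return report
    report["tensorflow"] = tf.__version__
    report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report


print(check_environment())
```

An empty `gpus` list with TensorFlow installed usually means CUDA or cuDNN is missing or does not match the TensorFlow build.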

DNA

Train the model on the DNA dataset

To train the model, download the training, validation and testing sets from the DeepSEA dataset (you can download the datasets from here). After you have extracted the contents of the tar.gz file, move the 3 .mat files into the KDeep/data/ or KDeep+/data/ folder. Then run the commands below in order:

1. python preprocess_FCGR.py
2. python KDeep.py (or python KDeep+.py)
3. python test.py
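The three steps can also be chained from one script. This is a rough sketch under assumptions: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes the three scripts live in the working directory you pass in (for example the KDeep or KDeep+ folder). By default it only returns the commands without running anything.

```python
import subprocess
import sys
from typing import List


def build_commands(variant: str = "KDeep") -> List[List[str]]:
    """Return the three commands from the steps above, in order."""
    if variant not in ("KDeep", "KDeep+"):
        raise ValueError("variant must be 'KDeep' or 'KDeep+'")
    return [
        [sys.executable, "preprocess_FCGR.py"],
        [sys.executable, variant + ".py"],
        [sys.executable, "test.py"],
    ]


def run_pipeline(variant: str = "KDeep", workdir: str = ".", dry_run: bool = True) -> List[List[str]]:
    """Run the steps in order; stop at the first one that fails."""
    commands = build_commands(variant)
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(cmd, cwd=workdir, check=True)
    return commands


for cmd in run_pipeline("KDeep+"):
    print(" ".join(cmd[1:]))
```

Call `run_pipeline("KDeep", workdir="KDeep", dry_run=False)` to actually execute the steps; the `workdir` value here is an assumption about where the scripts sit.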

To test the KDeep or KDeep+ model without training

1. Test the trained model on your system:

Skip downloading the data from the DeepSEA link. You only need to download the test data from here and here, then extract the files and move them to the DNA\KDeep\data or DNA\KDeep+\data folder. Then download the KDeep model that I trained from here, or the KDeep+ model from here, and move it to the DNA\KDeep\model or DNA\KDeep+\model folder.

2. Test the trained model on Colab:

If you just want to test KDeep without training, go to Colab.

If you just want to test KDeep+ without training, go to Colab.

RNA

RNA dataset

Download the RNA_31 dataset and move it to the RNA\RNA_31 folder, and download the RNA_24 dataset and move it to the RNA\RNA_24 folder.

Train and test in Colab

Go to Colab and run the code step by step.

Pre-process section

For RNA-31:

python PreProcess.py

  • Enter the path to your experiment's training file (experience_train), e.g. RNA_31/train/1/sequences.fa
  • Enter the path to your experiment's test file (experience_test), e.g. RNA_31/test/1/sequences.fa
  • Enter fasta as the type of your data

For RNA-24:

python PreProcess.py

  • Enter the path to your experiment's training file (experience_train), e.g. RNA_24/1/ALKBH5_Baltz2012_train
  • Enter the path to your experiment's test file (experience_test), e.g. RNA_24/1/ALKBH5_Baltz2012_test
  • Enter text as the type of your data
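Because the script asks for its inputs interactively, repeated runs can be tedious. The sketch below feeds the three answers through standard input instead. It assumes PreProcess.py reads its prompts from stdin in the order listed above, which this README does not confirm; `preprocess_answers` and `run_preprocess` are hypothetical helper names.

```python
import subprocess
import sys


def preprocess_answers(train_path: str, test_path: str, data_type: str) -> str:
    """Build the stdin text: one answer per line, in prompt order."""
    if data_type not in ("fasta", "text"):
        raise ValueError("data_type must be 'fasta' or 'text'")
    return "\n".join([train_path, test_path, data_type]) + "\n"


def run_preprocess(answers: str, script: str = "PreProcess.py") -> subprocess.CompletedProcess:
    """Run the script and pipe the prepared answers to its prompts."""
    return subprocess.run([sys.executable, script], input=answers, text=True, check=True)


answers = preprocess_answers(
    "RNA_24/1/ALKBH5_Baltz2012_train",
    "RNA_24/1/ALKBH5_Baltz2012_test",
    "text",
)
print(repr(answers))
```

Pass the result to `run_preprocess(answers)` from the folder that contains PreProcess.py. If the script prompts in a different order, adjust the list accordingly.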

Training section

For RNA-31:

python Training.py

  • Enter 420 as the seed for learning
  • Enter the train number: 30000
  • Enter the valid number: 10000
  • Enter the batch size: 300
  • Enter 101 as the sequence length for RNA-31

For RNA-24:

python Training.py

  • Enter 0 as the seed for learning
  • Enter the train number (check the output of the pre-process section); for experiment 1, 'ALKBH5_Baltz2012', the train number is 2410
  • Enter the valid number (check the output of the pre-process section); for experiment 1, 'ALKBH5_Baltz2012', the valid number is 266
  • Enter the batch size, e.g. 300
  • Enter 375 as the sequence length for RNA-24

Note: If the model fails to train, reduce the batch size.
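If training fails at the suggested batch size, one simple way to pick the next value to try is to halve it repeatedly. This is only a sketch of that idea, reading the advice above as referring to the batch size; the helper name `batch_size_candidates` and the lower bound of 8 are assumptions, not values from the repository.

```python
from typing import List


def batch_size_candidates(start: int = 300, minimum: int = 8) -> List[int]:
    """Halve the batch size repeatedly until it would drop below `minimum`."""
    if start < 1 or minimum < 1:
        raise ValueError("start and minimum must be positive")
    sizes = []
    size = start
    while size >= minimum:
        sizes.append(size)
        size //= 2  # integer halving keeps the batch size a whole number
    return sizes


print(batch_size_candidates(300))  # [300, 150, 75, 37, 18, 9]
```

Try the values from left to right and enter the first one at which training runs.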

Test section

For RNA-31:

python Test.py

  • Enter the path to your experiment's test file (experience_test), e.g. RNA_31/test/1/sequences.fa
  • Enter fasta as the type of your data
  • Enter 101 as the sequence length for RNA-31

For RNA-24:

python Test.py

  • Enter the path to your experiment's test file (experience_test), e.g. RNA_24/1/ALKBH5_Baltz2012_test
  • Enter text as the type of your data
  • Enter 375 as the sequence length for RNA-24
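The sequence length entered above should match the data. For FASTA input, a quick check is to count how many sequences there are of each length. The following is a minimal sketch with hypothetical helper names (`read_fasta`, `length_histogram`); it assumes a plain, uncompressed FASTA file and is not taken from the repository.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequences may span several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_histogram(lines: Iterable[str]) -> Dict[int, int]:
    """Map each sequence length to the number of sequences with that length."""
    return dict(Counter(len(seq) for _, seq in read_fasta(lines)))


example = [">seq1", "ACGU", "ACGU", ">seq2", "ACG"]
print(length_histogram(example))  # {8: 1, 3: 1}
```

For a real file, pass an open file handle, for example `with open("RNA_31/test/1/sequences.fa") as fh: print(length_histogram(fh))`. If more than one length appears, the length you enter may need to cover the longest sequence.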

Extracted motif section

For RNA-31:

python Training.py

  • Enter the path to your experiment's test file (experience_test), e.g. RNA_31/test/1/sequences.fa
  • Enter fasta as the type of your data
  • Enter the batch size that was used in the training section

For RNA-24:

python Training.py

  • Enter the path to your experiment's test file (experience_test), e.g. RNA_24/1/ALKBH5_Baltz2012_test
  • Enter text as the type of your data
  • Enter the batch size that was used in the training section

CONTACT INFO

Somayyeh Koohi

Department of Computer Engineering

Sharif University of Technology

e-mail: koohi@sharif.edu

WWW: http://sharif.ir/~koohi/
