GitHub - linDing-groups/Deep-4mCW2V: A sequence-based deep learning approach to predict N4-methylcytosine sites in Escherichia coli

Deep-4mCW2V

Abstract

N4-methylcytosine (4mC) is a DNA modification that can regulate multiple biological processes, such as transcription regulation, DNA replication, and gene expression. Correctly identifying 4mC sites in genomic sequences can provide precise knowledge about their genetic roles. This study aimed to develop a deep learning-based model to predict 4mC sites in Escherichia coli (E. coli). In the proposed model, DNA sequences were encoded with the word embedding technique 'word2vec'. The resulting features were fed into a 1D convolutional neural network (CNN) to distinguish 4mC sites from non-4mC sites. On the independent dataset, the model achieved an overall accuracy of 0.861, approximately 4.3% higher than that of the existing model, 4mCCNN.

Required Packages

Python3 (tested 3.5.4)
jupyter (tested 1.0.0)
scikit-learn (tested 0.22.1)
pandas (tested 1.0.1)
numpy (tested 1.18.1)
gensim (tested 3.8.1)
sklearn (tested 0.19.1)
keras (tested 2.3.1)
tensorflow (tested 2.1.0)

For Feature Generation

W2V.py
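
The contents of W2V.py are not reproduced here. As a rough illustration of the word2vec encoding described in the abstract, the sketch below splits each DNA sequence into overlapping k-mers and treats them as "words". The function names, the k-mer size of 3, the embedding dimension of 100, and the other word2vec settings are assumptions made for illustration, not values taken from the repository.

```python
def kmers(sequence, k=3):
    """Split a DNA sequence into overlapping k-mers ("words")."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def train_word2vec(sequences, k=3, dim=100):
    """Train a word2vec model on k-mer "sentences" (requires gensim).

    k, dim, window, and sg are illustrative assumptions.
    """
    from gensim.models import Word2Vec  # imported lazily; optional dependency

    sentences = [kmers(s, k) for s in sequences]
    # gensim 3.8.x names this argument `size`;
    # gensim >= 4.0 renamed it to `vector_size`.
    return Word2Vec(sentences, size=dim, window=5, min_count=1, sg=1)


def embed(sequence, model, k=3):
    """Return one embedding vector per k-mer, in sequence order."""
    return [model.wv[word] for word in kmers(sequence, k)]


print(kmers("ACGTAC"))  # ['ACG', 'CGT', 'GTA', 'TAC']
```

Each sequence then becomes a matrix with one row per k-mer, which is the shape a 1D CNN expects as input.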

For Training the Model

Train_CNN_Model.py
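
For orientation, a minimal 1D CNN for binary classification of embedded sequences might look like the sketch below. This is not the architecture in Train_CNN_Model.py: the function name, the number of filters, the kernel size, the dropout rate, and the optimizer are all assumptions chosen to keep the example short.

```python
def build_cnn(num_tokens, embedding_dim):
    """Build a small 1D CNN for 4mC / non-4mC classification.

    All layer sizes below are illustrative assumptions.
    """
    from tensorflow import keras  # imported lazily
    from tensorflow.keras import layers

    model = keras.Sequential([
        # One row per k-mer, one column per embedding dimension.
        keras.Input(shape=(num_tokens, embedding_dim)),
        layers.Conv1D(filters=32, kernel_size=3, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dropout(0.5),
        # A single sigmoid unit gives the probability of a 4mC site.
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

A model built this way could be trained with `model.fit(features, labels, ...)`, where `features` has shape `(samples, num_tokens, embedding_dim)` and `labels` holds 0 or 1 for each sample.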

Loading the Model

Test.py
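
As a hedged sketch of what loading and evaluating a saved model can involve, the example below loads a Keras model file, thresholds its predicted probabilities, and computes the overall accuracy. The helper names and the 0.5 threshold are assumptions. `save_model.h5` is the file name listed in this repository, and whether it loads may depend on using keras and tensorflow versions compatible with those listed above.

```python
def predict_labels(probabilities, threshold=0.5):
    """Convert predicted probabilities into 0/1 class labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def load_and_score(model_path, features, y_true):
    """Load a saved Keras model and score it on held-out data."""
    from tensorflow import keras  # imported lazily

    model = keras.models.load_model(model_path)
    probabilities = model.predict(features).ravel()
    return accuracy(list(y_true), predict_labels(probabilities))


print(accuracy([1, 0, 1, 1], predict_labels([0.9, 0.2, 0.4, 0.7])))  # 0.75
```

With features prepared for the independent dataset, `load_and_score("save_model.h5", features, labels)` would return the overall accuracy as a fraction between 0 and 1.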

Note

When using input files with different sequences, adjust the corresponding parameters in the code to match.
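
One way to find out which values a new input file implies is to inspect it before editing the scripts. The sketch below parses FASTA records and derives the sequence length and the number of k-mers per sequence. The function names, the dictionary keys, and k = 3 are hypothetical, and which parameters in the scripts actually need changing is not specified here.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def input_parameters(records, k=3):
    """Derive length-dependent values from a set of FASTA records."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length: %s" % sorted(lengths))
    seq_len = lengths.pop()
    return {"sequence_length": seq_len,
            "tokens_per_sequence": seq_len - k + 1}
```

For example, `input_parameters(read_fasta(open("Train_Comb.fasta")))` reports the sequence length found in that file, or raises an error if the sequences are not all the same length.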

Citation:

Zulfiqar, Hasan, Zi-Jie Sun, Qin-Lai Huang, Shi-Shi Yuan, Hao Lv, Fu-Ying Dao, Hao Lin, and Yan-Wen Li. "Deep-4mCW2V: A sequence-based predictor to identify N4-methylcytosine sites in Escherichia coli." Methods (2021), doi: 10.1016/j.ymeth.2021.07.011.

Folders and files (57 commits)

134_vecs.csv
222_vecs.csv
222_vecs.npy
222_vecs.txt
Deep learning.png
Independent_Dataset
LICENSE
README.md
Test.py
Train_2Unsuper
Train_CNN_Model.py
Train_Comb.fasta
Train_Neg2Unsuper
Train_Pos2Unsuper
Train_model1
Training_Dataset
W2V.py
npy to csv.py
save_model.h5
