iGEM-CNN-Regression

This is a CNN regression model built on open-source transcription factor (TF) binding-score databases.

Data Source

Reddy TB, Riley R, Wymore F, Montgomery P, DeCaprio D, Engels R, Gellesch M, Hubble J, Jen D, Jin H, Koehrsen M, Larson L, Mao M, Nitzberg M, Sisk P, Stolte C, Weiner B, White J, Zachariah ZK, Sherlock G, Galagan JE, Ball CA, Schoolnik GK. TB database: an integrated platform for tuberculosis research. Nucleic Acids Res. 2009 Jan;37(Database issue):D499-508. doi: 10.1093/nar/gkn652. Epub 2008 Oct 3. PMID: 18835847; PMCID: PMC2686437.
Ivan Yevshin, Ruslan Sharipov, Semyon Kolmykov, Yury Kondrakhin, Fedor Kolpakov, GTRD: a database on gene transcription regulation—2019 update, Nucleic Acids Research, Volume 47, Issue D1, 08 January 2019, Pages D100–D105, https://doi.org/10.1093/nar/gky1128

Data preprocessing

DNA sequences and TFs are each built from a small, known set of components: four nucleotides for DNA, and the standard amino acids for a protein TF. This makes one-hot encoding a natural choice.
One-hot encoding represents each categorical value as a binary vector containing a single 1. Stacking these vectors turns a DNA sequence into a matrix that a deep learning model can take as input.
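The encoding described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's own preprocessing code. The alphabets, the one-row-per-position layout, and the choice to map unknown symbols such as "N" to an all-zero row are assumptions.

```python
# Minimal one-hot encoding sketch (illustrative; not the repository's code).
# Assumptions: one row per sequence position, one column per alphabet symbol,
# and unknown symbols (e.g. "N") become an all-zero row.

DNA_ALPHABET = "ACGT"
PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq, alphabet=DNA_ALPHABET):
    """Return a len(seq) x len(alphabet) matrix (list of lists) of 0/1 values."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    matrix = []
    for ch in seq.upper():
        row = [0] * len(alphabet)
        if ch in index:  # symbols outside the alphabet stay all zeros
            row[index[ch]] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for row in one_hot("GATN"):
        print(row)
    # [0, 0, 1, 0]
    # [1, 0, 0, 0]
    # [0, 0, 0, 1]
    # [0, 0, 0, 0]
```

Passing `PROTEIN_ALPHABET` instead encodes an amino-acid sequence the same way, giving 20 columns instead of 4.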

How to use it?

If you don't have a deep learning background, don't worry: getting started only takes a few minutes.

Make sure your computer has a working Python environment.
Installing Python is easy. If you need help, follow a Python installation tutorial first.
Already have Python?
Run the command below in a terminal (cmd on Windows) to get your own copy of the model:
$ git clone https://github.com/sysu-software-2020/iGEM-CNN-Regression.git

Then, from inside the cloned folder, install the Python dependencies so you don't have to manage packages by hand:
$ cd iGEM-CNN-Regression
$ pip install -r requirements.txt

As a last preparation step, change the path of the test data (./iGEM-CNN-Regression/data_process/test_file.csv) to its absolute path on your computer.
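Another option is to build the absolute path from the script's own location instead of typing it in by hand. The sketch below assumes the script sits at the repository root and reads `data_process/test_file.csv`. The names `REPO_ROOT` and `TEST_FILE` are hypothetical, so point them at wherever the prediction code actually loads its test data.

```python
# Sketch: derive the absolute test-data path instead of hard-coding it.
# REPO_ROOT and TEST_FILE are hypothetical names; this assumes the script
# lives in the repository root, next to the data_process folder.
from pathlib import Path

try:
    # Folder containing this script, as an absolute path.
    REPO_ROOT = Path(__file__).resolve().parent
except NameError:
    # __file__ is undefined in interactive sessions; use the working directory.
    REPO_ROOT = Path.cwd()

TEST_FILE = REPO_ROOT / "data_process" / "test_file.csv"

if __name__ == "__main__":
    print(TEST_FILE)  # absolute path on this machine
    if not TEST_FILE.exists():
        print("test_file.csv not found; check the path above")
```

This keeps the code working wherever the repository is cloned, at the cost of assuming a fixed folder layout.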

Then run the local program to predict your result, where:
1. YOUR_TF: the TF name.
2. YOUR_DNA: the binding-site sequence you want to score.
$ python predict_value.py YOUR_TF YOUR_DNA
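The two ways of calling the program (command-line arguments here, interactive prompts in the example below) can be handled by one small input function. This is a hypothetical sketch, not the repository's `predict_value.py`. The function name `parse_inputs` and the ACGT-only validation rule are assumptions. The prompt order (DNA first, then TF) mirrors the interactive example.

```python
# Hypothetical input front end: take TF and DNA from the command line
# (python script.py YOUR_TF YOUR_DNA), or prompt for both if they are missing.
# Not the repository's predict_value.py; validation rules are assumptions.

VALID_BASES = set("ACGT")


def parse_inputs(argv, ask=input):
    """Return (tf_name, dna) from argv, falling back to interactive prompts."""
    args = argv[1:]
    if len(args) >= 2:
        tf_name, dna = args[0], args[1]
    else:
        # Same prompt order as the interactive example: DNA first, then TF.
        dna = ask("Please input your DNA Sequence: ")
        tf_name = ask("Please input your TF name: ")
    tf_name, dna = tf_name.strip(), dna.strip().upper()
    bad = set(dna) - VALID_BASES
    if not dna or bad:
        raise ValueError(f"DNA must contain only A, C, G, T; found {sorted(bad)}")
    return tf_name, dna
```

Inside a script, `parse_inputs(sys.argv)` would accept both invocation styles. The `ask` parameter exists so that the prompts can be replaced in tests.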

Example:
$ python predict_value.py

Please input your DNA Sequence: GAACAACTAGCATCCCCGATAAGACGGAATAGAATAGTAAAGATTGTGATTCATTGGCAGGTCCATTGTCGCATTACTAAATCATAGGCATGGAAATTTCCAGTTCACCATGGAACGACGGT

Please input your TF name: P07269

The predicted score is:
predict_ reuslt : 2.345088
Want to train this deep learning model yourself? Change the training-data and test-data paths in the code so they match the file locations on your computer.

iGEM graphic model

Our deep learning framework is documented in the PDF linked below.

See our (raw) model here: https://github.com/Lorisyy/iGEM-CNN-Regression/blob/main/CNN_wiki_v2.pdf
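To show how a CNN can turn a one-hot matrix into a single regression score, here is a dependency-free forward-pass sketch. The layer sequence is 1D convolution, then ReLU, then global max pooling, then one linear output. This layer choice, the filter shapes, and all weights are illustrative assumptions. The repository's actual architecture is the one described in CNN_wiki_v2.pdf.

```python
# Dependency-free sketch of a 1D CNN regression forward pass:
# conv1d over a one-hot matrix -> ReLU -> global max pooling -> linear score.
# Layer choices and weights are illustrative, not the repository's model.


def one_hot(seq, alphabet="ACGT"):
    """L x C matrix with a single 1 per row for known symbols."""
    return [[1 if ch == a else 0 for a in alphabet] for ch in seq.upper()]


def conv1d(x, filters, biases):
    """x: L x C matrix; each filter is K x C. Returns one feature map per filter."""
    k = len(filters[0])
    out = []
    for w, b in zip(filters, biases):
        feature_map = []
        for start in range(len(x) - k + 1):  # "valid" convolution, stride 1
            s = b
            for offset in range(k):
                s += sum(wi * xi for wi, xi in zip(w[offset], x[start + offset]))
            feature_map.append(s)
        out.append(feature_map)
    return out


def predict(x, filters, biases, dense_w, dense_b):
    """Return one real-valued score for one encoded sequence."""
    maps = conv1d(x, filters, biases)
    # ReLU then global max pooling: one number per filter.
    pooled = [max(max(v, 0.0) for v in fm) for fm in maps]
    # Linear output layer produces the regression score.
    return sum(w * p for w, p in zip(dense_w, pooled)) + dense_b


if __name__ == "__main__":
    ga_filter = [[0, 0, 1, 0], [1, 0, 0, 0]]  # responds to the dinucleotide "GA"
    x = one_hot("TGAC")
    print(conv1d(x, [ga_filter], [0.0]))                 # [[0.0, 2.0, 0.0]]
    print(predict(x, [ga_filter], [0.0], [0.5], 1.0))    # 2.0
```

In a trained model, the filter weights, biases, and dense weights are learned by minimizing a regression loss such as mean squared error against the measured binding scores.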

Running details

Data processing

Run data_process/data_process.py, which converts the raw data to CSV and produces all_tf.csv, train_file.csv, and test_file.csv.
Run get_tf_txtfile.py to write the refined TF data to tf_txt.txt.
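The train/test step can be pictured as a seeded random split written to two CSV files. This is a sketch only: the column layout (`tf`, `dna`, `score`), the 80/20 ratio, the seed, and the helper names `split_rows` and `write_csv` are assumptions, not what data_process.py actually does.

```python
# Sketch of a reproducible random train/test split written to CSV files.
# Column names, the 80/20 ratio, the seed, and the helper names are
# assumptions, not data_process.py's actual behavior.
import csv
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows with a fixed seed; return (train_rows, test_rows)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def write_csv(path, rows, header=("tf", "dna", "score")):
    """Write a header line followed by one line per row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

For example, `train, test = split_rows(rows)` followed by `write_csv("train_file.csv", train)` and `write_csv("test_file.csv", test)`. A fixed seed makes the split reproducible between runs.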

If anything is unclear, please contact me: suyy26@mail2.sysu.edu.cn.
