hkimlab/DeepPrime


Developed by Hyongbum Henry Kim's lab

Python 3.6

About:

DeepPrime is a deep-learning-based tool for predicting prime editing efficiency, developed in the Laboratory of Genome Editing, Yonsei University.

It greatly expands upon DeepPE, the previous state-of-the-art pegRNA activity prediction model, which was limited to specific combinations of edit type and length.

DeepPrime is designed to predict the efficiencies of nearly all feasible pegRNA designs. We integrated a CNN and an RNN to extract inter-sequence features between the target DNA and the corresponding pegRNA. DeepPrime was trained on 259K pegRNAs with PBS lengths from 1 to 17 nt, RT lengths from 1 to 50 nt, edit positions from 1 to 30, and edit lengths from 1 to 3 nt.
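The training ranges above describe which designs the model saw. As a minimal sketch, a design can be checked against them before leaning on a prediction. The helper `outside_training_range` and its parameter names are hypothetical (not part of DeepPrime or GenET), and a design outside these ranges is not necessarily rejected by DeepPrime; it is simply less well covered by the training data.

```python
from typing import Dict, List, Tuple

# Ranges covered by the DeepPrime training data, as stated in the README.
# Hypothetical helper, not part of DeepPrime or GenET.
TRAINING_RANGES: Dict[str, Tuple[int, int]] = {
    "pbs_len": (1, 17),   # PBS length (nt)
    "rtt_len": (1, 50),   # RT template length (nt)
    "edit_pos": (1, 30),  # edit position
    "edit_len": (1, 3),   # edit length (nt)
}


def outside_training_range(**params: int) -> List[str]:
    """Return the names of the given parameters that fall outside TRAINING_RANGES.

    Raises KeyError for a parameter name that is not in TRAINING_RANGES.
    """
    flagged = []
    for name, value in params.items():
        low, high = TRAINING_RANGES[name]
        if not low <= value <= high:
            flagged.append(name)
    return flagged


print(outside_training_range(pbs_len=13, rtt_len=20, edit_pos=5, edit_len=1))  # []
print(outside_training_range(pbs_len=13, rtt_len=60, edit_pos=5, edit_len=4))  # ['rtt_len', 'edit_len']
```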

DeepPrime webtool

The webtool accommodates most applications with its default parameters, using the prime editing (PE) model most appropriate for your experimental conditions. It evaluates all possible prime editing guide RNAs (pegRNAs) for a given target by their predicted prime editing efficiency (the DeepPrime score).

Python package for using DeepPrime: GenET

GenET (Genome Editing Toolkit) is a library of Python functions for analyzing and evaluating data from genome editing experiments.

Installation

# Create virtual env for genet.
conda create -n genet python=3.10
conda activate genet

# Install genet
pip install genet

How to use DeepPrime with GenET

from genet.predict import DeepPrime

seq = 'CCGAGTTGGTTCATCATCATTCAACGGTGGCCGACGGGCTCATCACCACGCTCCATTATC(C/T)AGCCCCAAAGCGCAACAAGCCCACTGTCTATGGTGTGTCCCCCAACTACGACAAGTGGGA'

pegrna = DeepPrime(seq)

# check designed pegRNAs
pegrna.features.head()
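The input above marks the edit inline as `(C/T)`: 60 bp of flanking sequence, the reference base, the edited base, and 60 more bp. The sketch below splits that notation into the reference and edited sequences, which is handy when converting to the two-sequence CSV format used by the source-code version. `split_edit_notation` is a hypothetical helper, not a GenET function, and accepting an empty REF or ALT for insertions or deletions is an assumption about how indels are written.

```python
import re
from typing import Tuple

# Hypothetical helper, not part of GenET: parse "prefix(REF/ALT)suffix".
# Empty REF or ALT (for insertions/deletions) is accepted as an assumption.
EDIT_NOTATION = re.compile(r"([ACGT]*)\(([ACGT]*)/([ACGT]*)\)([ACGT]*)")


def split_edit_notation(seq: str) -> Tuple[str, str]:
    """Return (reference_sequence, edited_sequence) for one '(REF/ALT)' block."""
    match = EDIT_NOTATION.fullmatch(seq.upper())
    if match is None:
        raise ValueError("expected an ACGT sequence with exactly one '(REF/ALT)' block")
    prefix, ref, alt, suffix = match.groups()
    return prefix + ref + suffix, prefix + alt + suffix


seq = ('CCGAGTTGGTTCATCATCATTCAACGGTGGCCGACGGGCTCATCACCACGCTCCATTATC(C/T)'
       'AGCCCCAAAGCGCAACAAGCCCACTGTCTATGGTGTGTCCCCCAACTACGACAAGTGGGA')
ref_seq, edited_seq = split_edit_notation(seq)
print(len(ref_seq), len(edited_seq))  # 121 121
```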

output:

ID Spacer RT-PBS PBS_len RTT_len RT-PBS_len Edit_pos Edit_len RHA_len Target ... deltaTm_Tm4-Tm2 GC_count_PBS GC_count_RTT GC_count_RT-PBS GC_contents_PBS GC_contents_RTT GC_contents_RT-PBS MFE_RT-PBS-polyT MFE_Spacer DeepSpCas9_score
0 SampleName GGTTCATCATCATTCAACGG TAGATAATGGAGCGTGGTGATGAGCCCGTCGGCCACCGTTGAATG 7 38 45 37 1 1 AGTTGGTTCATCATCATTCAACGGTGGCCGACGGGCTCATCACCAC... ... -510.285 2 23 25 28.57143 60.52632 55.55556 -12.7 0 76.43662
1 SampleName GGTTCATCATCATTCAACGG TAGATAATGGAGCGTGGTGATGAGCCCGTCGGCCACCGTTGAATGA 8 38 46 37 1 1 AGTTGGTTCATCATCATTCAACGGTGGCCGACGGGCTCATCACCAC... ... -510.285 2 23 25 25 60.52632 54.34783 -11.4 0 76.43662
2 SampleName GGTTCATCATCATTCAACGG TAGATAATGGAGCGTGGTGATGAGCCCGTCGGCCACCGTTGAATGAT 9 38 47 37 1 1 AGTTGGTTCATCATCATTCAACGGTGGCCGACGGGCTCATCACCAC... ... -510.285 2 23 25 22.22222 60.52632 53.19149 -11.4 0 76.43662
3 SampleName GGTTCATCATCATTCAACGG TAGATAATGGAGCGTGGTGATGAGCCCGTCGGCCACCGTTGAATGATG 10 38 48 37 1 1 AGTTGGTTCATCATCATTCAACGGTGGCCGACGGGCTCATCACCAC... ... -510.285 3 23 26 30 60.52632 54.16667 -11.2 0 76.43662
4 SampleName GGTTCATCATCATTCAACGG TAGATAATGGAGCGTGGTGATGAGCCCGTCGGCCACCGTTGAATGATGA 11 38 49 37 1 1 AGTTGGTTCATCATCATTCAACGGTGGCCGACGGGCTCATCACCAC... ... -510.285 3 23 26 27.27273 60.52632 53.06122 -11.2 0 76.43662
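Several printed columns appear to be derived from others by simple arithmetic: RT-PBS_len equals PBS_len + RTT_len, and each GC_contents_* column equals the matching GC_count_* divided by the region length, times 100. This is an interpretation of the output above, not taken from GenET's source. The sketch below recomputes those columns from the count columns of rows 0 to 4; `derived_features` is a hypothetical name.

```python
from typing import Tuple


def derived_features(pbs_len: int, rtt_len: int, gc_pbs: int, gc_rtt: int,
                     gc_rt_pbs: int) -> Tuple[int, float, float, float]:
    """Recompute RT-PBS_len and the GC_contents_* columns (percent, 5 decimals)."""
    rt_pbs_len = pbs_len + rtt_len
    return (
        rt_pbs_len,
        round(100 * gc_pbs / pbs_len, 5),
        round(100 * gc_rtt / rtt_len, 5),
        round(100 * gc_rt_pbs / rt_pbs_len, 5),
    )


# (PBS_len, RTT_len, GC_count_PBS, GC_count_RTT, GC_count_RT-PBS) from rows 0-4 above
rows = [(7, 38, 2, 23, 25), (8, 38, 2, 23, 25), (9, 38, 2, 23, 25),
        (10, 38, 3, 23, 26), (11, 38, 3, 23, 26)]

for row in rows:
    # Prints RT-PBS_len, GC_contents_PBS, GC_contents_RTT, GC_contents_RT-PBS
    print(*derived_features(*row))
```

The recomputed values match the printed columns (for row 0: 45, 28.57143, 60.52632, 55.55556).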

Installation from source code:

The webtool accommodates most applications once you choose the model parameters most appropriate for your experimental conditions.

To process a large number of pegRNAs, researchers can download the zipped source code, install the required Python packages, and run DeepPrime on their local systems. We recommend a Linux-based OS.

1. Install Miniconda

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

2. Create and activate virtual environment

conda create -n dprime python=3.8
conda activate dprime

3. Install Required Python Packages

pip install tensorflow==2.8.0     #Use pip linked to the above python installation
pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio===0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip install biopython==1.78 
pip install pandas regex silence-tensorflow 
wget https://www.tbi.univie.ac.at/RNA/download/sourcecode/2_5_x/ViennaRNA-2.5.1.tar.gz
tar -zxvf ViennaRNA-2.5.1.tar.gz
cd ViennaRNA-2.5.1
./configure --with-python3	
make
make install

- OR -

conda install -c bioconda viennarna # using Miniconda

- OR -

pip install ViennaRNA

4. Download Source Code

wget https://github.com/hkimlab/DeepPrime/archive/main.zip
unzip main.zip
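Before running DeepPrime, it can help to confirm that the packages from step 3 are importable in the active environment. The sketch below uses only the standard library; the mapping from pip package names to import names (biopython to `Bio`, silence-tensorflow to `silence_tensorflow`, ViennaRNA to `RNA`) is an assumption, and `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List

# Assumed import names for the packages installed in step 3.
REQUIRED_MODULES = ("tensorflow", "torch", "torchvision", "torchaudio", "Bio",
                    "pandas", "regex", "silence_tensorflow", "RNA")


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level module names that cannot be found (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```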

Usage:

Input format (.csv file)

Columns: ID, unedited (reference) sequence (121 bp), edited sequence (121 bp), edit type (sub1, sub2, sub3, ins1, ..., del3)

ID RefSeq Edited Seq EditType
BRCA1e17_pos34_tat_CAT AATCCTTTGAGTGTTTTTCATTCTGCAGATGCTGAGTTTGTGTGTGAACGGACACTGAAATATTTTCTAGGAATTGCGGGAGGAAAATGGGTAGTTAGCTATTTCTGTAAGTATAATACTA AATCCTTTGAGTGTTTTTCATTCTGCAGATGCTGAGTTTGTGTGTGAACGGACACTGAAACATTTTCTAGGAATTGCGGGAGGAAAATGGGTAGTTAGCTATTTCTGTAAGTATAATACTA sub1
BRCA1e17_pos34_tat_CCA AATCCTTTGAGTGTTTTTCATTCTGCAGATGCTGAGTTTGTGTGTGAACGGACACTGAAATATTTTCTAGGAATTGCGGGAGGAAAATGGGTAGTTAGCTATTTCTGTAAGTATAATACTA AATCCTTTGAGTGTTTTTCATTCTGCAGATGCTGAGTTTGTGTGTGAACGGACACTGAAACCATTTCTAGGAATTGCGGGAGGAAAATGGGTAGTTAGCTATTTCTGTAAGTATAATACTA sub3
BRCA1e17_pos34_tat_CCC AATCCTTTGAGTGTTTTTCATTCTGCAGATGCTGAGTTTGTGTGTGAACGGACACTGAAATATTTTCTAGGAATTGCGGGAGGAAAATGGGTAGTTAGCTATTTCTGTAAGTATAATACTA AATCCTTTGAGTGTTTTTCATTCTGCAGATGCTGAGTTTGTGTGTGAACGGACACTGAAACCCTTTCTAGGAATTGCGGGAGGAAAATGGGTAGTTAGCTATTTCTGTAAGTATAATACTA sub3
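A malformed row is easier to catch before a long run than after. The sketch below checks rows laid out as in the example above: a 121-bp reference, a known edit type, and, for substitutions only, that the changed bases fit within the stated edit length. `check_row` and `check_csv` are hypothetical helpers, the "changes fit within an N-nt window" rule is an assumption, and insertion/deletion rows are only checked for a valid edit type because their exact layout is not spelled out here.

```python
import csv
import io
from typing import List, TextIO

# Hypothetical validator, not part of DeepPrime.
SEQ_LEN = 121
EDIT_TYPES = {kind + str(n) for kind in ("sub", "ins", "del") for n in (1, 2, 3)}


def check_row(seq_id: str, ref_seq: str, edited_seq: str, edit_type: str) -> List[str]:
    """Return a list of problems found in one input row (empty if none)."""
    problems = []
    if len(ref_seq) != SEQ_LEN:
        problems.append(f"{seq_id}: reference is {len(ref_seq)} bp, expected {SEQ_LEN}")
    if edit_type not in EDIT_TYPES:
        problems.append(f"{seq_id}: unknown edit type {edit_type!r}")
    elif edit_type.startswith("sub"):
        # Assumption: an N-nt substitution changes bases only within an N-nt window.
        if len(edited_seq) != len(ref_seq):
            problems.append(f"{seq_id}: substitution changes the sequence length")
        else:
            diffs = [i for i, (a, b) in enumerate(zip(ref_seq, edited_seq)) if a != b]
            if not diffs:
                problems.append(f"{seq_id}: edited sequence equals the reference")
            elif diffs[-1] - diffs[0] + 1 > int(edit_type[3:]):
                problems.append(f"{seq_id}: changes span more than {edit_type[3:]} nt")
    return problems


def check_csv(handle: TextIO) -> List[str]:
    """Check every data row of an open CSV file; the first row is treated as a header."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    problems = []
    for row in reader:
        if not row:
            continue  # ignore empty lines
        if len(row) != 4:
            problems.append(f"malformed row: {row}")
            continue
        problems.extend(check_row(*(field.strip() for field in row)))
    return problems


sample = io.StringIO("ID,RefSeq,EditedSeq,EditType\nbad,ACGT,ACGA,sub1\n")
print(check_csv(sample))  # ['bad: reference is 4 bp, expected 121']
```

For a real file, assuming it has a header row, open it with `open("./example_input/dp_core_test.csv", newline="")` and pass the handle to `check_csv`.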

Run Command:

python DeepPrime.py [-h] [-f INPUT_FILE] [-n NAME] [-p {PE2,PE2max,PE2max-e,PE4max,PE4max-e,NRCH_PE2,NRCH_PE2max,NRCH_PE4max}] [--cell_type {HEK293T,A549,DLD1,HCT116,HeLa,MDA-MB-231,NIH3T3}] [--pbs_min PBS_MIN] [--pbs_max PBS_MAX] [--jobs JOBS] [--progress]

Basic command: python DeepPrime.py -f [filename]

# example_input
python DeepPrime.py -f ./example_input/dp_core_test.csv
# example_input & choose PE4max system
python DeepPrime.py -f ./example_input/dp_core_test.csv -p PE4max
# example_input & choose PE2max system, cell type, and number of cores
python DeepPrime.py -f ./example_input/dp_core_test.csv -p PE2max --cell_type DLD1 --jobs 4

Optional arguments

-h or --help: Show the help message
-f or --input_file: Input file path (.csv file)
-n or --name: Name tag of the run (used as the results directory name)
-p or --pe_type: PE system. Choose one of the available PE systems (PE2, PE2max, PE2max-e, PE4max, PE4max-e, NRCH_PE2, NRCH_PE2max, NRCH_PE4max). Some cell types support only a subset of PE systems.
--cell_type: Cell type. Choose one of the available cell lines.
--pbs_min: Minimum PBS length (≥ 1)
--pbs_max: Maximum PBS length (≤ 17)
--jobs: Number of CPU cores to use
--progress: Show progress messages
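When scripting many runs, the command line can be assembled from the options above and checked against the documented PBS bounds (1 ≤ pbs_min ≤ pbs_max ≤ 17) before anything starts. `build_command` is a hypothetical helper, not shipped with DeepPrime; it passes only the flags listed above and leaves every omitted option to DeepPrime's own defaults.

```python
import shlex
from typing import List, Optional


def build_command(input_file: str, name: Optional[str] = None,
                  pe_type: Optional[str] = None, cell_type: Optional[str] = None,
                  pbs_min: Optional[int] = None, pbs_max: Optional[int] = None,
                  jobs: Optional[int] = None, progress: bool = False) -> List[str]:
    """Return an argv list for DeepPrime.py, validating the PBS length bounds."""
    low = 1 if pbs_min is None else pbs_min    # documented lower bound
    high = 17 if pbs_max is None else pbs_max  # documented upper bound
    if not 1 <= low <= high <= 17:
        raise ValueError("PBS lengths must satisfy 1 <= pbs_min <= pbs_max <= 17")
    cmd = ["python", "DeepPrime.py", "-f", input_file]
    optional = [("-n", name), ("-p", pe_type), ("--cell_type", cell_type),
                ("--pbs_min", pbs_min), ("--pbs_max", pbs_max), ("--jobs", jobs)]
    for flag, value in optional:
        if value is not None:  # only pass flags the caller actually set
            cmd += [flag, str(value)]
    if progress:
        cmd.append("--progress")
    return cmd


cmd = build_command("./example_input/dp_core_test.csv", pe_type="PE2max",
                    cell_type="DLD1", jobs=4)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

This prints the third example command above. To actually launch a run from the unzipped source directory, the list can be passed to `subprocess.run(cmd, check=True)`.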

Currently available PE models:

On-target

Cell type | PE system | Model
HEK293T | PE2 | DeepPrime_base
HEK293T | NRCH_PE2 | DeepPrime-FT: HEK293T, NRCH-PE2 with Optimized scaffold
HEK293T | NRCH_PE2max | DeepPrime-FT: HEK293T, NRCH-PE2max with Optimized scaffold
HEK293T | PE2 | DeepPrime-FT: HEK293T, PE2 with Conventional scaffold
HEK293T | PE2max-e | DeepPrime-FT: HEK293T, PE2max with Optimized scaffold and epegRNA
HEK293T | PE2max | DeepPrime-FT: HEK293T, PE2max with Optimized scaffold
HEK293T | PE4max-e | DeepPrime-FT: HEK293T, PE4max with Optimized scaffold and epegRNA
HEK293T | PE4max | DeepPrime-FT: HEK293T, PE4max with Optimized scaffold
A549 | PE2max-e | DeepPrime-FT: A549, PE2max with Optimized scaffold and epegRNA
A549 | PE2max | DeepPrime-FT: A549, PE2max with Optimized scaffold
A549 | PE4max-e | DeepPrime-FT: A549, PE4max with Optimized scaffold and epegRNA
A549 | PE4max | DeepPrime-FT: A549, PE4max with Optimized scaffold
DLD1 | NRCH_PE4max | DeepPrime-FT: DLD1, NRCH-PE4max with Optimized scaffold
DLD1 | PE2max | DeepPrime-FT: DLD1, PE2max with Optimized scaffold
DLD1 | PE4max | DeepPrime-FT: DLD1, PE4max with Optimized scaffold
HCT116 | PE2 | DeepPrime-FT: HCT116, PE2 with Optimized scaffold
HeLa | PE2max | DeepPrime-FT: HeLa, PE2max with Optimized scaffold
MDA-MB-231 | PE2 | DeepPrime-FT: MDA-MB-231, PE2 with Optimized scaffold
NIH3T3 | NRCH_PE4max | DeepPrime-FT: NIH3T3, NRCH-PE4max with Optimized scaffold

Off-target (manual in progress)

Cell type | PE system | Model
HEK293T | PE2-off | DeepPrime-Off: PE2 with conventional scaffold in HEK293T cells

For off-target analysis: currently, only the model trained on PE2 with the conventional scaffold in HEK293T cells can run the additional analysis that predicts off-target editing levels for specific pegRNAs.

On the webtool: first select the off-target-compatible model, PE2_Conv, and run your inputs. On the results page, tick the check box indicating that you are running the off-target-compatible analysis. Selecting individual rows auto-fills the pegRNA IDs, and the off-target sequences can be added to the text area as 74-bp sequences.
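Off-target sequences must be 74 bp long, so a quick pre-check before pasting them can save a failed submission. The sketch assumes one sequence per line in the text area (an assumption about the webtool's format) and that only A/C/G/T are allowed; `check_off_targets` is a hypothetical helper.

```python
import re
from typing import List

# Hypothetical pre-check: one off-target sequence per line (assumed format).
OFF_TARGET_LEN = 74
DNA = re.compile(r"[ACGT]+")


def check_off_targets(text: str) -> List[str]:
    """Return a problem message for each non-blank line that is not 74 bp of ACGT."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        seq = line.strip().upper()
        if not seq:
            continue  # ignore blank lines
        if not DNA.fullmatch(seq):
            problems.append(f"line {line_no}: non-ACGT characters")
        elif len(seq) != OFF_TARGET_LEN:
            problems.append(f"line {line_no}: {len(seq)} bp, expected {OFF_TARGET_LEN}")
    return problems


print(check_off_targets("A" * 74 + "\n" + "C" * 73))  # ['line 2: 73 bp, expected 74']
```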

From the source code: create an input file (e.g., dp_off_test.csv) and run

python DeepPrime.py -f <filename> -p PE2-off

Example:
python DeepPrime.py -f ./example_input/dp_off_test.csv -p PE2-off