# ORBIT targeting oligo design



In [1]:
!git clone https://github.com/scott-saunders/orbit.git

Cloning into 'orbit'...
remote: Enumerating objects: 72, done.
remote: Counting objects: 100% (72/72), done.
remote: Compressing objects: 100% (53/53), done.
remote: Total 72 (delta 19), reused 57 (delta 13), pack-reused 0
Unpacking objects: 100% (72/72), done.


In [2]:
%cd orbit/targeting_oligo_design_app

/content/orbit/targeting_oligo_design_app


In [3]:
!pip install -r requirements.txt

Collecting BioPython
  Downloading https://files.pythonhosted.org/packages/5a/42/de1ed545df624180b84c613e5e4de4848f72989ce5846a74af6baa0737b9/biopython-1.79-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (2.3MB)
Installing collected packages: BioPython
Successfully installed BioPython-1.79


In [5]:
import pandas as pd
import numpy as np
import Bio.SeqIO
import panel as pn
import orbit_tools as tools

pn.extension(comms='colab')

This notebook specifies a [panel](https://panel.holoviz.org/) app that makes it easy to design a targeting oligo for use with ORBIT genetics. To run this app you will need the following packages:

* `numpy`
* `pandas`
* `Bio` (installed from PyPI as `biopython`)
* `bokeh`
* `holoviews`
* `panel`

Once these are installed, you should be able to run all cells in this notebook and start the app in a new window. See the [ORBIT website](https://github.com/scott-saunders/orbit) for more details and instructions about ORBIT itself.

-------

First, let's import the *E. coli* K12 genome (GenBank accession number U00096.3) from a FASTA file:

In [6]:
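# The FASTA file is expected to hold one record (the whole chromosome);
# if it held several, only the last one would be kept.
for record in Bio.SeqIO.parse('sequencev3.fasta', "fasta"):
    genome = str(record.seq)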
    
print("Length genome: {}".format(len(genome)))
print("First 100 bases: {}".format(genome[:100]))

Length genome: 4641652
First 100 bases: AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAAT
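
Genomic coordinates from GenBank and EcoCyc are 1-based and inclusive, while Python slices are 0-based and half-open. The helper below is a minimal sketch of that conversion; `slice_genome` is a hypothetical name used only for illustration, and it is not part of `orbit_tools`.

```python
def slice_genome(genome: str, left: int, right: int) -> str:
    """Return the bases from `left` to `right`, both 1-based and inclusive."""
    if not 1 <= left <= right <= len(genome):
        raise ValueError(f"coordinates {left}..{right} fall outside 1..{len(genome)}")
    # Python slices are 0-based and half-open, so only the left end shifts by one.
    return genome[left - 1:right]


toy_genome = "AGCTTTTCATTCTGACTGCA"  # the first 20 bases printed above
print(slice_genome(toy_genome, 1, 4))   # AGCT
print(slice_genome(toy_genome, 5, 10))  # TTTCAT
```

The same call on the full `genome` string would return the sequence of any annotated region, given its left and right positions.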


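Next, let's import all of the annotated genes, downloaded from EcoCyc. A few simple transformations make the dataframe nicer to work with: rows with missing values are dropped, the coordinate columns get shorter names, and a center position and a label are added for each gene.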

In [7]:
df_genes = pd.read_csv("All_instances_of_Genes_in_Escherichia_coli_K-12_substr._MG1655.txt", sep='\t')

# Drop any gene with a missing field.
df_genes = df_genes.dropna()

# Make sure the coordinates are integers.
df_genes['Left-End-Position'] = df_genes['Left-End-Position'].astype(int)
df_genes['Right-End-Position'] = df_genes['Right-End-Position'].astype(int)

# Shorter column names for the coordinates, plus the midpoint of each gene.
df_genes['left_pos'] = df_genes['Left-End-Position']
df_genes['right_pos'] = df_genes['Right-End-Position']

df_genes['center_pos'] = df_genes[['left_pos', 'right_pos']].mean(axis=1)

df_genes = df_genes.drop(['Left-End-Position', 'Right-End-Position'], axis=1)

# Label for each gene: the gene name followed by a newline.
df_genes['gene_label'] = df_genes['Genes'] + '\n'

df_genes

Unnamed: 0,Genes,Direction,Product,left_pos,right_pos,center_pos,gene_label
0,3'ETS-<i>leuZ</i>,-,small regulatory RNA 3'ETS<sup><i>leuZ</i></sup>,1991748,1991814,1991781.0,3'ETS-<i>leuZ</i>\n
1,aaeA,-,aromatic carboxylic acid efflux pump membrane ...,3388194,3389126,3388660.0,aaeA\n
2,aaeB,-,aromatic carboxylic acid efflux pump subunit AaeB,3386221,3388188,3387204.5,aaeB\n
3,aaeR,+,LysR-type transcriptional regulator AaeR,3389520,3390449,3389984.5,aaeR\n
4,aaeX,-,DUF1656 domain-containing protein AaeX,3389134,3389337,3389235.5,aaeX\n
...,...,...,...,...,...,...,...
4534,zraR,+,Phosphorylated DNA-binding transcriptional act...,4203320,4204645,4203982.5,zraR\n
4535,zraS,+,ZraS-<i>N</i>-phospho-L-histidine // sensor hi...,4201926,4203323,4202624.5,zraS\n
4536,zupT,+,heavy metal divalent cation transporter ZupT,3182550,3183323,3182936.5,zupT\n
4537,zur,-,DNA-binding transcriptional repressor Zur,4259488,4260003,4259745.5,zur\n


You can see that for all 4,529 annotated genes we have the genomic coordinates, the name of the gene, and a brief description. 
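
With `left_pos` and `right_pos` in place, finding the genes around a locus is a single boolean mask. The sketch below shows one way to do it on a toy table built from a few of the rows above; `genes_in_window` and the 1 kb default flank are assumptions for illustration, and the selection logic inside `orbit_tools` may differ.

```python
import pandas as pd


def genes_in_window(df, left, right, flank=1000):
    """Return genes overlapping the region left-flank .. right+flank."""
    window_start = left - flank
    window_end = right + flank
    # A gene overlaps the window if it ends after the start and begins before the end.
    overlaps = (df["right_pos"] >= window_start) & (df["left_pos"] <= window_end)
    return df.loc[overlaps].sort_values("left_pos")


# Toy table using coordinates from the output above.
toy_genes = pd.DataFrame({
    "Genes": ["aaeA", "aaeB", "aaeR", "aaeX", "zur"],
    "left_pos": [3388194, 3386221, 3389520, 3389134, 4259488],
    "right_pos": [3389126, 3388188, 3390449, 3389337, 4260003],
})

nearby = genes_in_window(toy_genes, left=3388194, right=3389126)
print(list(nearby["Genes"]))  # ['aaeB', 'aaeA', 'aaeX', 'aaeR']
```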

Now we can declare the core of the panel app, which depends on 3 functions defined in `orbit_tools.py`:

* `plot_nearby()` takes some genomic coordinates and uses holoviews to plot the region from 1 kb upstream to 1 kb downstream of those positions. The plot is annotated with the gene information from `df_genes`.
* `get_target_oligo()` returns a targeting oligo that corresponds to the supplied genomic positions. It does more than slice out the right region of the genome: the oligo must target the lagging strand, so the function first has to work out which strand that is at the chosen locus.
* `get_pos_details()` formats and returns some of the informative details that the more general `get_target_oligo()` function uses.
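
To make the oligo layout concrete, here is a minimal sketch of how homology arms might be placed around attB. It is not the `orbit_tools` implementation: `sketch_target_oligo` is a hypothetical name, the attB sequence is replaced by a 38-base placeholder of `N`s, the even split of the homology between the two arms is an assumption, and the strand choice is left to a plain boolean rather than being computed from the position on the chromosome.

```python
ATTB_PLACEHOLDER = "N" * 38  # hypothetical stand-in, NOT the real attB sequence

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sketch_target_oligo(genome, left, right, homology=52,
                        attb=ATTB_PLACEHOLDER, use_reverse_strand=False):
    """Flank `attb` with the bases just outside left..right (1-based, inclusive)."""
    left_len = homology // 2         # assumption: arms split as evenly as possible
    right_len = homology - left_len
    arm_start = left - 1 - left_len  # 0-based start of the left arm
    arm_end = right + right_len      # 0-based exclusive end of the right arm
    if arm_start < 0 or arm_end > len(genome):
        # This sketch treats the sequence as linear and does not wrap around.
        raise ValueError("homology arm runs past the end of the sequence")
    left_arm = genome[arm_start:left - 1]
    right_arm = genome[right:arm_end]
    oligo = left_arm + attb + right_arm
    return reverse_complement(oligo) if use_reverse_strand else oligo


toy_genome = "A" * 30 + "GGGG" + "C" * 30
oligo = sketch_target_oligo(toy_genome, left=31, right=34)
print(len(oligo))  # 90 = 52 bases of homology + 38-base placeholder
```

With `use_reverse_strand=True` the same call returns the reverse complement, which is the form needed whenever the other strand has to be targeted.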

The code below wraps each of these functions in a "reactive" function that re-runs whenever one of its inputs changes. Between them they respond to 4 interactive parameters (the genome plot does not use `attB_dir`):

* `left_pos` - the left genomic coordinate
* `right_pos` - the right genomic coordinate
* `attB_dir` - the desired direction of the attB site 
* `homology` - the total length of homology to use for the oligo (in nucleotides)

In [8]:
left_pos_widget = pn.widgets.TextInput(name='Left Position', value='1000', width=200)
right_pos_widget = pn.widgets.TextInput(name='Right Position', value='1000', width=200)
dir_widget = pn.widgets.Select(name='attB Direction', options=['+', '-'], value='+', width=100)
homology_widget = pn.widgets.IntSlider(name='Homology', value=52, start=20, end=200, width=200)

# Each function below re-runs whenever one of the widgets it depends on changes;
# panel passes the current widget values in as positional arguments.
@pn.depends(left_pos_widget, right_pos_widget, homology_widget)
def reactive_plot_nearby(left_pos, right_pos, homology):
    return tools.plot_nearby(left_pos, right_pos, homology, df_genes=df_genes)

@pn.depends(left_pos_widget, right_pos_widget, dir_widget, homology_widget)
def reactive_get_target_oligo(left_pos, right_pos, attB_dir, homology):
    oligo = tools.get_target_oligo(left_pos, right_pos, genome, homology=homology, attB_dir=attB_dir, verbose=False)

    # Button that copies the bare oligo sequence to the clipboard in the browser.
    copy_source_button = pn.widgets.Button(name="Copy targeting oligo", button_type="primary", width=100)
    copy_source_code = "navigator.clipboard.writeText(source);"
    copy_source_button.js_on_click(args={"source": oligo}, code=copy_source_code)

    # Show the oligo with its 5' and 3' ends marked, above the copy button.
    return pn.Column("5'_" + oligo + "_3'", copy_source_button)

@pn.depends(left_pos_widget, right_pos_widget, homology_widget, dir_widget)
def reactive_get_pos_details(left_pos, right_pos, homology, attB_dir):
    return tools.get_pos_details(left_pos, right_pos, homology, attB_dir)
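
Because the two position widgets are `TextInput` boxes, their values are strings (the defaults are `'1000'`, not `1000`). How `orbit_tools` converts and checks them is not shown in this notebook, so the helper below is only a sketch of one way to validate such input before using it as a coordinate; `parse_position` is a hypothetical name.

```python
def parse_position(text: str, genome_length: int) -> int:
    """Convert widget text such as ' 1,991,748 ' to a 1-based genomic position."""
    cleaned = text.strip().replace(",", "")
    if not cleaned.isdecimal():
        raise ValueError(f"{text!r} is not a whole number")
    position = int(cleaned)
    if not 1 <= position <= genome_length:
        raise ValueError(f"{position} is outside 1..{genome_length}")
    return position


print(parse_position("1000", 4641652))         # 1000
print(parse_position(" 1,991,748 ", 4641652))  # 1991748
```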

We can test how our parameter input widgets will look:

In [9]:
param_input = pn.Column(
    pn.Row(left_pos_widget, right_pos_widget, dir_widget, homology_widget)
)

param_input

Let's also write some instructions to go with the app.

In [10]:
app_text = """This tool is currently implemented only for the *E. coli* K12 genome (GenBank accession number U00096.3). Please contact Scott Saunders for details or further questions (ssaunder@caltech.edu). 

**Instructions:**

1. Find the genomic coordinates of the modification you would like. [Ecocyc](https://ecocyc.org/) is recommended for simple gene deletions.
2. Input these positions to the app as `Left Position` and `Right Position` below. Check that the intended locus shows up in the genome plot. 
3. Choose which direction the attB sequence should go - either `+` or `-`. Typically this is the same direction as the gene of interest.
4. Decide how long your homology arms need to be and input with the `Homology` slider. Default is 52 bp total, which yields a 90 bp oligo (attB is 38 bp).
5. If the genome plot with the attB homology arms looks correct and the oligo sequence appears in the gray panel, then click `Copy targeting oligo` and order from IDT or an equivalent DNA supplier.

--------

"""

Finally, we can assemble the app by stacking a few panel components on top of each other:

* `param_input` the parameter input widgets from above
* `reactive_plot_nearby` the reactive genomic plot
* `reactive_get_pos_details` the reactive function to get oligo details
* `reactive_get_target_oligo` the reactive function to get the actual oligo sequence

In [11]:
orbit_app = pn.Column(
    "# ORBIT targeting oligo design",
    app_text,
    param_input,
    reactive_plot_nearby,
    pn.Column(reactive_get_pos_details, reactive_get_target_oligo ,background='WhiteSmoke')
)

#orbit_app

Then we can run the app. 

In Colab, you can use the app in a full window by clicking "Mirror cell in tab" at the top right of the app's code cell. Then click "Change page layout" (the button at the top right of the window) to make each tab full screen.

In [13]:
orbit_app

In [None]:
%load_ext watermark

In [None]:
%watermark -v -p numpy,Bio,pandas,bokeh,holoviews,panel

CPython 3.7.6
IPython 7.22.0

numpy 1.19.2
Bio 1.78
pandas 1.2.4
bokeh 1.4.0
holoviews 1.13.2
panel 0.8.3
