# Building a new sensor
1. Define a target sequence
2. Divide the sequence into windows of the same length as the trigger (36 nt)
3. For each possible trigger, define a toehold sensor sequence
4. Filter sequences that contain stop codons
5. Evaluate sequence properties using NUPACK and rank them

### Import dependencies

In [1]:
import os
import os.path
from subprocess import Popen, PIPE
import numpy as np
import pandas as pd
import time
import matplotlib.pyplot as plt
import seaborn as sns
import scipy
import statsmodels.api as sm
from skbio import Sequence



### Define library functions

In [2]:
def reversed_complement(sequence):
    mapping = {'A': 'U', 'G': 'C', 'U': 'A', 'C': 'G'}
    sequence_upper = sequence.upper()

    complement = ''
    for c in sequence_upper:
        complement += mapping[c]

    # reverse the sequence
    return complement[::-1]

def split_sequence(sequence, window):
    sequences = []
    limit = len(sequence) - window + 1

    for i in range(0, limit):
        sequences.append(sequence[i:window + i])

    return sequences

def no_stop(sequence):
    stop = ['UAA', 'UAG', 'UGA']

    for i in range(0, len(sequence), 3):
        if sequence[i:i + 3] in stop:
            return False

    return True

def possible_toehold_B(reg, rev):
    loop = 'GGACUUUAGAACAGAGGAGAUAAAGAUG'

    # What is the RBS?
    # Linker from Green (2019), modified with an extra A to create the MoClo C site;
    # written in the RNA alphabet (U, not T) to match the rest of the switch
    linker = 'ACCUGGCGGCAGCGCAAGAAGA'
    #linker = "AACCUGGCGGCAGCGCAAGAAGAUGCGUAAA"
    toeholds = []

    # Keep the design only if the first 11 nt of the trigger plus the linker
    # contain no in-frame stop codon
    if no_stop(reg[0:11] + linker):
        toeholds.append(rev + loop + reg[0:11] + linker)
        return toeholds

    # Sentinel value, filtered out in Step 4
    toeholds.append("STOP")
    return toeholds
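
As a quick sanity check of the helpers above, the sketch below runs them on a 9-nt toy sequence. The sequence `toy` is made up for illustration, and the three helpers are restated in compact form only so that the block runs on its own.

```python
# Compact restatements of the notebook helpers (same behaviour, shorter form).
def reversed_complement(sequence):
    mapping = {'A': 'U', 'G': 'C', 'U': 'A', 'C': 'G'}
    return ''.join(mapping[c] for c in reversed(sequence.upper()))

def split_sequence(sequence, window):
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]

def no_stop(sequence):
    # Codons are read in frame, starting at the first nucleotide.
    return all(sequence[i:i + 3] not in ('UAA', 'UAG', 'UGA')
               for i in range(0, len(sequence), 3))

toy = "AUGGCAUAA"                       # hypothetical 9-nt RNA
windows = split_sequence(toy, 6)        # 9 - 6 + 1 = 4 overlapping windows
rc = reversed_complement(toy)
stop_free = [no_stop(w) for w in windows]

print(windows)    # ['AUGGCA', 'UGGCAU', 'GGCAUA', 'GCAUAA']
print(rc)         # UUAUGCCAU
print(stop_free)  # [True, True, True, False]
```

Only the last window is rejected, because `UAA` falls in frame there; the same codon in another frame would go unnoticed, which is the intended behaviour of `no_stop`.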


### Step 1: Define a target sequence
In this case, we use the 5' end of glycoprotein 1 from PVY:

In [3]:
seq = "ATGGCAACTTACATGTCAACAATCTGTTTTGGTTCGTTTGAATGCAAGCTACCATACTCACCAGCCTCTTGCGAGCATATTGTGAAGGAACGAGAAGTGCCGGCTTCCGTTGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAAAATATGCTACTGTTCGTGTGCTCAAAAACGGTACTTTTACGTACCGATACAAGACTGATGCCCAGATAATGCGCATTCAGAAGAAACTGGAGAGGAAGGATAGGGAAGAATATCACTTCCAAATGGCCGCTCCTAGTATTGTGTCAAAAATTACTATAGCTGGCGGAGATCCTCCATCAAAGTCTGAGCCACAAGCACCAAGAGGGATCATTCATACAACTCCAAGGATGCGTAAAGTCAAGACACGCCCCATAATAAAGTTGACAGAAGGCCAGATGAATCACCTCATTAAGCAGATAAAACAGATTATGTCGGAGAAAAGAGGGTCTGTCCACTTAATTAGTAAGAAAACCACTCATGTTCAATATAAGAAGATACTTGGTGCATACTCCGCAGCGGTTCGAACTGCACATATGATGGGTTTGCGACGGAGAGTGGACTTCCGATGTGATATGTGGACAGTTGGACTTTTGCAACGTCTCGCTCGGACGGACAAATGGTCCAATCAAGTCCGCACTATCAACATACGAAGGGGTGATAGTGGAGTCATCTTGAACACAAAAAGCCTCAAAGGCCACTTTGGTAGAAGTTCAGGAGGCTTGTTCATAGTGCGTGGATCACACGAAGGGAAATTGTATGATGCACGTTCTAGAGTTACTCAGAGTATTTTAAACTCAATGATCCAGTTTTCGAATGCCGACAATTTTTGGAAGGGTCTGGACGGTAATTGGGCACGAATGAGATATCCTTCGGATCACACATGTGTAGCTGGTTTACCTGTCGAAGATTGTGGTAGGGTAGCTGCATTGATGGCACACAGTATCCTTCCGTGCTATAAGATAACTTGCCCCACCTGTGCTCAACAGTATGCCAGCTTGCCAGTTAGC"

Convert to RNA and determine the reverse complement

In [4]:
processed_sequence = seq.upper().replace('T', 'U') #/.replace(' ', '')
rc = reversed_complement(processed_sequence)
len(split_sequence(rc,36))

1015
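
The count above follows from the `limit = len(sequence) - window + 1` line in `split_sequence`: 1015 windows of 36 nt correspond to a 1050-nt input. A minimal sketch of that arithmetic, where `count_windows` is a hypothetical helper that is not part of the notebook:

```python
def count_windows(sequence_length, window):
    """Number of overlapping windows of size `window`, sliding 1 nt at a time."""
    return max(sequence_length - window + 1, 0)

n = count_windows(1050, 36)
print(n)  # 1015
```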

### Step 2: Determine 36-nucleotide sub-sequences
To do this, we generate all possible triggers for both the direct sequence and its reverse complement.

In [5]:
d_1 = {'Triggers': split_sequence(processed_sequence,36)}
df_1 = pd.DataFrame(data=d_1)
df_1["Sense"]="Direct"

d_2 = {'Triggers': split_sequence(rc,36)}
df_2 = pd.DataFrame(data=d_2)
df_2["Sense"]="Reversed Complement"
frames = [df_1, df_2]
result = pd.concat(frames)
#result

### Step 3: For each trigger, design a toehold sensor

In [6]:
toeholds=[]
for i in range(len(result.iloc[:,0])):
    toeholds.append((possible_toehold_B(result.iloc[i,0],reversed_complement(result.iloc[i,0])))[0])
a=pd.Series(toeholds)
result["Toehold Switch"]=a.values
#result
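
To make the layout of a switch explicit, the sketch below builds one design and slices it back into its four segments: 36 nt complementary to the trigger, the 28-nt loop, the first 11 nt of the trigger, and the 22-nt linker. The trigger is the first one listed in the results table of this notebook; `build_switch` and `split_switch` are hypothetical names used only for illustration, and the linker is converted to the RNA alphabet.

```python
LOOP = 'GGACUUUAGAACAGAGGAGAUAAAGAUG'
LINKER = 'ACCTGGCGGCAGCGCAAGAAGA'.replace('T', 'U')  # RNA alphabet

def reverse_complement(rna):
    mapping = {'A': 'U', 'G': 'C', 'U': 'A', 'C': 'G'}
    return ''.join(mapping[c] for c in reversed(rna))

def build_switch(trigger):
    # Same concatenation order as possible_toehold_B
    return reverse_complement(trigger) + LOOP + trigger[:11] + LINKER

def split_switch(switch, trigger_len=36):
    """Hypothetical helper: cut a switch back into its named segments."""
    a = trigger_len
    b = a + len(LOOP)
    c = b + 11
    return {'trigger_binding': switch[:a], 'loop': switch[a:b],
            'trigger_start': switch[b:c], 'linker': switch[c:]}

trigger = 'UGGCAACUUACAUGUCAACAAUCUGUUUUGGUUCGU'
switch = build_switch(trigger)
parts = split_switch(switch)

print(len(switch))                        # 36 + 28 + 11 + 22 nt
print({k: len(v) for k, v in parts.items()})
```

These offsets explain the slices used later in the notebook, for example `[36::]` for everything from the first base of the loop onwards.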

### Step 4: Remove sensors with STOP codons

In [7]:
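# .copy() gives an independent DataFrame, so the columns added in Step 5
# do not trigger pandas' SettingWithCopyWarning
df = result[result.iloc[:,2] != "STOP"].copy()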
#df

### Step 5: Evaluate sequence properties using NUPACK 

Scoring functions from Green (2019):
- Three-parameter fit (R² = 0.57): Fold change = -71.7·dfull_sensor - 49.1·dactive_sensor - 22.6·dbinding_site + 54.3
- Four-parameter fit (R² = 0.60): Fold change = -93.2·dfull_sensor - 43.3·dactive_sensor - 22.1·dbinding_site - 9.4·dmin_target + 61.3

Parameter definitions:
- dfull_sensor: ensemble defect of the full toehold switch sequence with respect to its designed secondary structure.
- dactive_sensor: ensemble defect of the subsequence that starts at the first base of the loop, evaluated against a completely single-stranded secondary structure.
- dbinding_site: ensemble defect calculated in an analogous manner from the pairwise binding probabilities of the complete target RNA sequence, with a completely single-stranded ideal secondary structure specified for the binding site region.
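
As a sketch, the two fits can be written as plain functions. The function names are hypothetical, the defect values in the usage example are illustrative, and `dmin_target` is taken as given because this notebook does not define or compute it.

```python
def fold_change_3p(dfull_sensor, dactive_sensor, dbinding_site):
    """Three-parameter fit; all terms are linear in the ensemble defects."""
    return (54.3
            - 71.7 * dfull_sensor
            - 49.1 * dactive_sensor
            - 22.6 * dbinding_site)

def fold_change_4p(dfull_sensor, dactive_sensor, dbinding_site, dmin_target):
    """Four-parameter fit; dmin_target is not computed in this notebook."""
    return (61.3
            - 93.2 * dfull_sensor
            - 43.3 * dactive_sensor
            - 22.1 * dbinding_site
            - 9.4 * dmin_target)

# Illustrative defects: lower defects give a higher predicted fold change.
example = fold_change_3p(0.1991, 0.3914, 0.2764)
print(round(example, 2))
```

Because every coefficient on a defect is negative, a design with all defects at zero reaches the intercept (54.3 or 61.3), which is the upper bound of each fit.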

### Library functions
- Calculation of the minimum free energy (MFE) secondary structure of a single RNA sequence
- NUPACK analysis

In [8]:
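def DG(sequence, result_path, wait):
    # Write the sequence to the NUPACK input file <result_path>pipo.in
    with open('{}pipo.in'.format(result_path), 'w') as file:
        file.write("{}\n".format(sequence))

    # Run NUPACK's mfe at 29 °C and block until it exits, so that the output
    # file is complete before it is read. `wait` is no longer needed and is
    # kept only so that existing calls keep working.
    proc = Popen(["mfe", "-T", "29", "{}pipo".format(result_path)], stdout=PIPE)
    proc.communicate()

    # Collect every line that is not a '%' comment, split on tabs
    semi_final = []
    with open("{}pipo.mfe".format(result_path)) as res:
        for r in res:
            r = r.strip('\n')
            if not r.startswith('%'):
                semi_final.append(r.split('\t'))

    # Remove the temporary files before returning
    os.remove("{}pipo.mfe".format(result_path))
    os.remove("{}pipo.in".format(result_path))

    # The free energy is the first field of the third non-comment line
    return float(semi_final[2][0])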
    
def complex_defect(sequence, secondary, result_path):
    # Write the sequence and its target dot-bracket structure to <result_path>toeh.in
    with open('{}toeh.in'.format(result_path), 'w') as file:
        file.write("{}\n".format(sequence))
        file.write("{}".format(secondary))

    defect_toeh = 0
    with Popen(["complexdefect", "{}toeh".format(result_path)], stdout=PIPE) as proc:
        res = proc.stdout.read().decode("utf-8").split('\n')
        # The value is read from the 16th line of the program output
        for count, l in enumerate(res, start=1):
            if count == 16:
                defect_toeh = float(l)

    os.remove("{}toeh.in".format(result_path))
    return defect_toeh

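def nupack_analysis(sequence, secondary_sensor,  window, sensor_type, result_path):
    # NOTE: not called anywhere in this notebook. It depends on
    # possible_toehold_A and single_streadness, which are not defined here,
    # and expects the possible_toehold_* helpers to return a
    # {target: toehold} mapping rather than the list returned above.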
    list_for_table = []

    processed_sequence = sequence.upper().replace('T', 'U').replace(' ', '')
    reg_sequences = split_sequence(processed_sequence, window)
    rev_comp_sequences = [reversed_complement(s) for s in reg_sequences]

    if sensor_type == 'A':
        target_toehold_map = possible_toehold_A(reg_sequences, rev_comp_sequences)
    else:
        target_toehold_map = possible_toehold_B(reg_sequences, rev_comp_sequences)

    sequence = sequence.upper().replace('T', 'U')
    single_streadness_sequence = single_streadness(sequence, result_path, wait=6)
    for target, toehold in target_toehold_map.items():
        id = sequence.index(target)

        target_defect = sum(single_streadness_sequence[id:id+36])/36
        toehold_defect = sum(single_streadness(toehold, result_path)[0:30])/30
        sensor_defect = complex_defect(toehold, secondary_sensor, result_path)

        score = 5*(1-target_defect) + 4*(1-toehold_defect) + 3*sensor_defect

        list_for_table.append(tuple([target[0:36], toehold, 1-target_defect, 1-toehold_defect, sensor_defect, score]))

    return list_for_table
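
Reading an output file is only safe once the external program has exited. The sketch below shows one way to block on a subprocess and capture its output. It deliberately uses the current Python interpreter as a stand-in command instead of NUPACK, and `run_and_capture` is a hypothetical helper, so nothing here reflects NUPACK's actual output format.

```python
import subprocess
import sys

def run_and_capture(command):
    """Run `command` (a list of arguments), wait for it to exit, return stdout.

    check=True raises CalledProcessError if the program exits with an error,
    instead of silently continuing with missing or partial results.
    """
    completed = subprocess.run(command, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, check=True)
    return completed.stdout.decode("utf-8")

# Stand-in for an external tool: a child Python process printing one number.
output = run_and_capture([sys.executable, "-c", "print(-12.3)"])
energy = float(output.strip())
print(energy)
```

Compared with a fixed `time.sleep`, blocking on the process takes only as long as the program needs and cannot read a half-written file.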

Structure of the designed optimal interaction between a sensor and its trigger:

In [9]:
linear_Structure="............................................................."
linear_Structure_25="........................."
secondary_sensor_B = '.........................(((((((((((...(((((............)))))...)))))))))))......................'
secondary_2=".......................................(((((............)))))....................................+...................................."
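
A target structure is only usable if it has exactly one symbol per nucleotide and balanced brackets. The sketch below checks both for a single-strand dot-bracket string and lists the base pairs it encodes. `pair_table` and `check_design` are hypothetical helpers, and the 9-nt hairpin is a toy example rather than one of the designs above.

```python
def pair_table(structure):
    """Return the sorted (i, j) base pairs of a single-strand dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == '(':
            stack.append(i)
        elif symbol == ')':
            if not stack:
                raise ValueError("unmatched ')' at position {}".format(i))
            pairs.append((stack.pop(), i))
        elif symbol != '.':
            raise ValueError("unexpected symbol {!r}".format(symbol))
    if stack:
        raise ValueError("unmatched '(' at position {}".format(stack[-1]))
    return sorted(pairs)

def check_design(sequence, structure):
    """Raise ValueError if lengths differ or brackets are unbalanced."""
    if len(sequence) != len(structure):
        raise ValueError("sequence has {} nt but structure has {} symbols"
                         .format(len(sequence), len(structure)))
    return pair_table(structure)

pairs = check_design("GGGAAACCC", "(((...)))")
print(pairs)  # [(0, 8), (1, 7), (2, 6)]
```

Running such a check before calling `complex_defect` would catch a structure string whose length no longer matches the switch, for example after changing the linker.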

In [27]:
dfull_sensor=[]

for i in range (len(df.Triggers)):
    dfull_sensor.append((complex_defect(df.iloc[i,2],secondary_sensor_B,"")))
dfull_sensor
P1=pd.Series(dfull_sensor)
df["dfull_sensor"]=P1.values




In [28]:
dactive_sensor=[]
for i in range (len(df.Triggers)):
   
    dactive_sensor.append((complex_defect(df.iloc[i,2][36::],linear_Structure,"")))
P2=pd.Series(dactive_sensor)
df["dactive_sensor"]=P2.values



In [29]:
dbinding_site=[]
for i in range (len(df.Triggers)):
   
    dbinding_site.append((complex_defect(df.iloc[i,2][0:25],linear_Structure_25,"")))
P3=pd.Series(dbinding_site)
df["dbinding_site"]=P3.values



In [33]:
df

| | Triggers | Sense | Toehold Switch | dfull_sensor | dactive_sensor | dbinding_site |
|---|---|---|---|---|---|---|
| 1 | UGGCAACUUACAUGUCAACAAUCUGUUUUGGUUCGU | Direct | ACGAACCAAAACAGAUUGUUGACAUGUAAGUUGCCAGGACUUUAGA... | 0.1991 | 0.3914 | 0.2764 |
| 2 | GGCAACUUACAUGUCAACAAUCUGUUUUGGUUCGUU | Direct | AACGAACCAAAACAGAUUGUUGACAUGUAAGUUGCCGGACUUUAGA... | 0.1997 | 0.3377 | 0.3174 |
| 3 | GCAACUUACAUGUCAACAAUCUGUUUUGGUUCGUUU | Direct | AAACGAACCAAAACAGAUUGUUGACAUGUAAGUUGCGGACUUUAGA... | 0.2047 | 0.3295 | 0.3325 |
| 5 | AACUUACAUGUCAACAAUCUGUUUUGGUUCGUUUGA | Direct | UCAAACGAACCAAAACAGAUUGUUGACAUGUAAGUUGGACUUUAGA... | 0.2104 | 0.3481 | 0.3730 |
| 6 | ACUUACAUGUCAACAAUCUGUUUUGGUUCGUUUGAA | Direct | UUCAAACGAACCAAAACAGAUUGUUGACAUGUAAGUGGACUUUAGA... | 0.2161 | 0.3973 | 0.4132 |
| 7 | CUUACAUGUCAACAAUCUGUUUUGGUUCGUUUGAAU | Direct | AUUCAAACGAACCAAAACAGAUUGUUGACAUGUAAGGGACUUUAGA... | 0.2038 | 0.3927 | 0.3361 |
| 8 | UUACAUGUCAACAAUCUGUUUUGGUUCGUUUGAAUG | Direct | CAUUCAAACGAACCAAAACAGAUUGUUGACAUGUAAGGACUUUAGA... | 0.2019 | 0.3833 | 0.2460 |
| 9 | UACAUGUCAACAAUCUGUUUUGGUUCGUUUGAAUGC | Direct | GCAUUCAAACGAACCAAAACAGAUUGUUGACAUGUAGGACUUUAGA... | 0.1916 | 0.3678 | 0.2038 |
| 10 | ACAUGUCAACAAUCUGUUUUGGUUCGUUUGAAUGCA | Direct | UGCAUUCAAACGAACCAAAACAGAUUGUUGACAUGUGGACUUUAGA... | 0.2157 | 0.3596 | 0.2173 |
| 11 | CAUGUCAACAAUCUGUUUUGGUUCGUUUGAAUGCAA | Direct | UUGCAUUCAAACGAACCAAAACAGAUUGUUGACAUGGGACUUUAGA... | 0.2008 | 0.3968 | 0.2938 |


In [39]:
# Three-parameter fit:
# 54.3 - 71.7*dfull_sensor - 49.1*dactive_sensor - 22.6*dbinding_site
df["score"] = (54.3
               - 71.7 * df["dfull_sensor"]
               - 49.1 * df["dactive_sensor"]
               - 22.6 * df["dbinding_site"])



In [48]:
df.sort_values("score", ascending=False,inplace=True)
df.to_csv("Ranked_designs.csv")



In [51]:
Dg_RBS_linker=[]
for i in range(len(df.iloc[:,3])):
    RBS_linker=df.iloc[i,2][48:96]
    Dg_RBS_linker.append(DG(RBS_linker,"",2))
P5=pd.Series(Dg_RBS_linker)
df["Dg_RBS_linker"]=P5.values



In [53]:
df.to_csv("Ranked_designs.csv")