# Oligo Design

For Nucleaseq, we want to design sequences with the following basic structure:

| Left primer | Left BC | Left buffer | Target$_n$ | Right buffer | Right fill | Right BC | Right primer |
| - | - | - | - | - | - | - | - |

where Target$_n$ is drawn from the set of all desired modified target sequences.

The set of Target$_n$ sequences is specific to each experiment, but there are some relatively standard sets. For example, most experiments will include all single- and double-mismatch sequences. On the other hand, the PAM structure differs for each CRISPR variant and may require custom sequence generation, and other nucleases may have entirely different needs.

To handle this, design.py provides a number of functions for standard modifications, which are called below, and a section is explicitly reserved for custom sequence generation: "Custom sequence functions". Once the set of target-generation functions is complete, they need to be added to the "Construct sequences" section below, in the manner shown by the included examples. Go through that section carefully to verify that the set of included sequences is correct.

The "Run parameters" section must also be updated according to the experimental requirements. One required parameter is a list of all canonical cut positions along the target sequence. Each position is written as a Python integer index followed by ".5" to indicate a cut between two bases. For instance, 18.5 denotes a cut between Python indices 18 and 19. The lower portion of this notebook uses this information to find appropriate primer sequences.
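As a minimal sketch of this convention (the helper name `split_at_cut` is illustrative and not part of design.py), a half-integer cut position is rounded up to obtain the slice index. The example uses the target D sequence and the 20.5 cut site that appear later in this notebook:

```python
import math

def split_at_cut(seq, cut_pos):
    """Split seq at a half-integer cut position such as 18.5."""
    # A cut at k + 0.5 falls between Python indices k and k + 1,
    # so the right-hand fragment starts at index ceil(cut_pos).
    idx = int(math.ceil(cut_pos))
    return seq[:idx], seq[idx:]

left, right = split_at_cut('TTTAGTGATAAGTGGAATGCCATGTGG', 20.5)
print(left + ' x ' + right)
```

This is the same rounding the notebook applies when it prints the cut sites for the chosen target.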

Finally, the user is expected to adjust the primers as necessary to fit their experimental conditions. See the "Replace primers" section for details.

## Sequence Motifs

### For this example, we want the following sequences and modifications:

* Target D-Nme2
* single mismatches
* double mismatches
* single insertions
* double insertions
* single deletions 
* double deletions
* scanning mismatch regions (using the complement) for every region length from 2 bp up to the full 27 bp target
* Perfect target with different buffer regions
* Perfect target with various barcodes
* Random negative control seqs
* Various 4N PAM subsets
* single mm and single ins seqs
* single mm and single del seqs

In [1]:
import time
notebook_start_time = time.time()

In [2]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [3]:
import sys
import yaml
import itertools
import random
import math
import logging
import scipy.misc
import editdistance
import numpy as np
from copy import deepcopy
from collections import defaultdict, Counter
from Bio.Seq import Seq

In [4]:
from nucleaseq.NucleaSeqOligo import NucleaSeqOligo
from nucleaseq.equalmarginalseqs import generate_clean_random_eqmarg_seqs
from nucleaseq import design, seqtools

In [5]:
master_seed = 43
random.seed(master_seed)
rand_seeds = [random.randint(0, 100) for i in range(10)]

# Run parameters

In [6]:
total_desired_seqs = 12472
n_err_detect_seqs = 150  # Number of random sequences for negative control
min_perfect_target_copies = 50

targets_fpath = '/shared/targets.yml'
target_name = 'D'

barcodes_fpath = '/mnt/marble/hdd2/hawkjo/barcode_and_nucleaseq/src/freebarcodes/barcodes/barcodes17-2.txt'
with open(barcodes_fpath) as f:  # close the file handle once the barcodes are read
    barcodes = [line.strip() for line in f]

min_primer_len = 21  # First length to try. Will go smaller if possible. Adjust this if the notebook is too slow.
max_primer_len = 30

bad_substrs = ['CC', 'GG', 'AAA', 'TTT'] # Forbidden subseqs in buffers or primers (here Cas9 or Cas12a PAMs)

nprocs = 20
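The `bad_substrs` rule can be illustrated with a short sketch. The helper name `contains_forbidden` is hypothetical; how design.py actually applies the list is not shown in this notebook.

```python
def contains_forbidden(seq, bad_substrs):
    """Return True if seq contains any of the forbidden subsequences."""
    return any(bad in seq for bad in bad_substrs)

bad_substrs = ['CC', 'GG', 'AAA', 'TTT']

# A candidate buffer without any forbidden subsequence is acceptable.
ok_buffer = contains_forbidden('ATCGA', bad_substrs)
# A candidate containing 'GG' would be rejected.
bad_buffer = contains_forbidden('ATGGA', bad_substrs)
print(ok_buffer, bad_buffer)
```

Note that CC/GG and AAA/TTT are reverse complements of each other, so a check on one strand also covers the opposite strand.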

In [7]:
h_cannonical_cut_sites = [20.5, 22.5, 26.5]  # Include all cut sites on all strands, absolute coordinates
fudge_factor = 5             # How far from a canonical cut site possible cuts may be
min_buffer_len = 5           # Min length of target-flanking buffers

pamtarg_coord_one_pos = 24   # Position in the target sequence just to the right of the PAM-target boundary
abs_cannonical_cut_sites = [int(math.ceil(ccs)) for ccs in h_cannonical_cut_sites]
cannonical_cut_sites = [ccs - pamtarg_coord_one_pos for ccs in abs_cannonical_cut_sites] # convert to pamtarg coords

print 'Consider cut sites within {} bp of pamtarg positions: {}'.format(fudge_factor, cannonical_cut_sites)

Consider cut sites within 5 bp of pamtarg positions: [-3, -1, 3]


In [8]:
with open(targets_fpath) as f:
    targets = yaml.safe_load(f)  # safe_load does not construct arbitrary Python objects
target = targets[target_name]
target_seeds = [target[:14], target[-13:]]

In [9]:
print 'Target {} ({} bp): {}'.format(target_name, len(target), target)
print 'Target seeds:'
for target_seed in target_seeds:
    print '    ({} bp): {}'.format(len(target_seed), target_seed)
print 'Canonical cut sites:'
for ccs in h_cannonical_cut_sites:
    ccs = int(math.ceil(ccs))
    print '    {} x {}'.format(target[:ccs], target[ccs:])

Target D (27 bp): TTTAGTGATAAGTGGAATGCCATGTGG
Target seeds:
    (14 bp): TTTAGTGATAAGTG
    (13 bp): GAATGCCATGTGG
Canonical cut sites:
    TTTAGTGATAAGTGGAATGCC x ATGTGG
    TTTAGTGATAAGTGGAATGCCAT x GTGG
    TTTAGTGATAAGTGGAATGCCATGTGG x 


# Static Sequences

In [10]:
bases = 'ACGT'

In [11]:
random.seed(rand_seeds[7])
cr1 = design.shuffled_equalish_bases(18, bad_substrs)
cr2 = design.shuffled_equalish_bases(18, bad_substrs)

buffer1 = design.get_buffer(5, 'left', target, bad_substrs)
buffer2 = design.get_buffer(5, 'right', target, bad_substrs)

print 'Temp left/right buffers:', buffer1, buffer2
print 'Temp primer 1: {} bp, {:.1f}% GC  {}'.format(len(cr1), 
                                               100*(cr1.count('C') + cr1.count('G'))/float(len(cr1)),
                                               cr1)
print 'Temp primer 2 (rev_comp): {} bp, {:.1f}% GC  {}'.format(len(cr2), 
                                               100*(cr2.count('C') + cr2.count('G'))/float(len(cr2)),
                                               cr2)

Temp left/right buffers: ATCGA ATGCA
Temp primer 1: 18 bp, 50.0% GC  AGATCGCAGCTCGTCAAT
Temp primer 2 (rev_comp): 18 bp, 50.0% GC  CGTCATAATGAACGCGCT


# Generation of Sequences for Pilot Experiment

In [12]:
from freebarcodes.editmeasures import simple_hamming_distance
import freebarcodes.seqtools as fbseqtools

## Custom sequence functions

In [13]:
interesting_pams = [
    'NNCTTN',
    'NNCCTN',
    'NNTCTN',
    'NNCGCN',
    'NNTTTN',
    'NNATTN',
    'NNGTTN',
    'NNGCTN',
    'NNACTN',
    'NNTTCN',
    'NNCATN',
    'NNTCCN',
    'NNCCCN',
]

def get_interesting_pam_seqs():
    assert target.startswith('TTTA'), target
    template_seq = target[4:]
    output = set()
    for pam in interesting_pams:
        pam_kernel = pam[2:5]
        for b1, b2, b3 in itertools.product(bases, repeat=3):
            output.add(b1 + b2 + pam_kernel + b3 + template_seq)
    return output

def next_different_base(samp):
    samp_bases = set(samp)
    assert len(samp_bases) < 4, samp
    for b in bases:
        if b not in samp_bases:
            return b
        
def get_one_mm_seq_per_pos(seq):
    output = set()
    for i in range(len(seq)):
        # Choose mismatch to be different from ref base and neighboring bases
        neighborhood = seq[max(i-1, 0):i+2]
        assert len(neighborhood) == 3 or i == 0 or i == len(seq) - 1, (i, neighborhood)
        output.add(seq[:i] + next_different_base(neighborhood) + seq[i+1:])
    return output
        
def get_one_ins_seq_per_pos(seq):
    output = set()
    for i in range(len(seq)):
        # Choose insertion to be different from either neighboring base
        neighborhood = seq[max(i-1, 0):i+1]
        assert len(neighborhood) == 2 or i == 0, (i, neighborhood)
        output.add(seq[:i] + next_different_base(neighborhood) + seq[i:])
    return output
                
def get_single_mm_single_del_seqs():
    output = set()
    for del_seq in fbseqtools.get_deletion_seqs(target, 1):
        output.update(get_one_mm_seq_per_pos(del_seq))
    return output

def get_single_mm_single_ins_seqs():
    output = set()
    for mm_seq in get_one_mm_seq_per_pos(target):
        output.update(get_one_ins_seq_per_pos(mm_seq))
    return output

## Construct sequences

In [14]:
log = logging.getLogger()
log.addHandler(logging.StreamHandler())
log.setLevel(logging.INFO)

In [15]:
# This cell is the heart of the sequence construction. It is a single cell to
# guarantee that all parts are always run together, which keeps the output reproducible.

#----------------------------------------------------------------------------------
# Setup the barcode pairs
#----------------------------------------------------------------------------------

def min_dist_to_target_seed(s):
    min_dists = []
    for target_seed in target_seeds:
        if len(s) > len(target_seed):
            slong, sshort = s, target_seed
        else:
            slong, sshort = target_seed, s
        slong_rc = seqtools.dna_rev_comp(slong)
        min_dists.append(min(editdistance.eval(sshort, sl[i:i+len(sshort)]) 
                             for i in range(len(slong) - len(sshort) + 1)
                             for sl in [slong, slong_rc]))
    return min(min_dists)

barcodes.sort(key=min_dist_to_target_seed, reverse=True) # Prefer barcodes less similar to target
err_barcodes = barcodes[:n_err_detect_seqs]
norm_barcodes = barcodes[n_err_detect_seqs:total_desired_seqs]
log.info('Max / min accepted / min barcode distances to target seed: {} / {} / {}\n'.format(
    min_dist_to_target_seed(err_barcodes[0]),
    min_dist_to_target_seed(norm_barcodes[-1]),
    min_dist_to_target_seed(barcodes[-1])
))

random.seed(rand_seeds[2])
random.shuffle(norm_barcodes)
random.shuffle(err_barcodes)

barcode_pairs, err_barcode_pairs = [], []
for i in xrange(len(norm_barcodes)):
    bc1 = norm_barcodes[i]
    i2 = (i+1) % len(norm_barcodes)
    bc2_rc = str(Seq(norm_barcodes[i2]).reverse_complement())
    barcode_pairs.append((bc1, bc2_rc))
for i in xrange(len(err_barcodes)):
    bc1 = err_barcodes[i]
    i2 = (i+1) % len(err_barcodes)
    bc2_rc = str(Seq(err_barcodes[i2]).reverse_complement())
    err_barcode_pairs.append((bc1, bc2_rc))

    
#----------------------------------------------------------------------------------
# Custom sequences
#----------------------------------------------------------------------------------

sequence_set = set()

# Interesting PAMs
interesting_pam_seqs = get_interesting_pam_seqs()
sequence_set.update(interesting_pam_seqs)
log.info('Interesting PAM seqs: {}'.format(len(interesting_pam_seqs)))

# Single mm and single del
single_mm_single_del_sequences = get_single_mm_single_del_seqs()
log.info('Single mm, Single del seqs: %d' % len(single_mm_single_del_sequences))
sequence_set.update(single_mm_single_del_sequences)

# Single mm and single ins
single_mm_single_ins_sequences = get_single_mm_single_ins_seqs()
log.info('Single mm, Single ins seqs: %d' % len(single_mm_single_ins_sequences))
sequence_set.update(single_mm_single_ins_sequences)


#----------------------------------------------------------------------------------
# Standard sequences
#----------------------------------------------------------------------------------

# All possible single mismatches and indels
single_mismatches = fbseqtools.get_mismatch_seqs(target, 1)
sequence_set.update(single_mismatches)
log.info('Single mismatch seqs: %d' % len(single_mismatches))

single_insertions = fbseqtools.get_insertion_seqs(target, 1)
sequence_set.update(single_insertions)
log.info('Single insertion seqs: %d' % len(single_insertions))

single_deletions = fbseqtools.get_deletion_seqs(target, 1)
sequence_set.update(single_deletions)
log.info('Single deletion seqs: %d' % len(single_deletions))

# Every stretch of mismatched sequence at each possible position in the target
c_stretch = set()
for stretch_size in range(2, len(target) + 1):
    complement_stretch = fbseqtools.get_stretch_of_complement_seqs(target, stretch_size)
    c_stretch |= complement_stretch
sequence_set.update(c_stretch)
log.info('Complement stretch seqs: %d' % len(c_stretch))

# PAMs with perfect target
#randomized_pams = set()
#for seq in fbseqtools.get_randomized_pam_seqs(target, 4, 5, '5p'):
#    randomized_pams.add('A' + seq)
#    if simple_hamming_distance(seq[1: 4], 'TTT') <= 1:
#        for base in bases:
#            randomized_pams.add(base + seq[1:])
#randomized_pams |= fbseqtools.get_randomized_pam_seqs(target, 3, 5, '3p')
#sequence_set.update(randomized_pams)
#log.info('6N PAMs: %d' % len(randomized_pams))

# Perfect Targets A-E and A-Csy
other_targets = set(targets.values())
log.info('Alternative targets: %d' % len(other_targets))
sequence_set.update(other_targets)

# double mismatches, deletions, and insertions 
double_mismatch_sequences = fbseqtools.get_mismatch_seqs(target, 2)
sequence_set.update(double_mismatch_sequences)
log.info('Double mismatch seqs: %d' % len(double_mismatch_sequences))

double_deletions = fbseqtools.get_deletion_seqs(target, 2)
log.info('Double deletion seqs: %d' % len(double_deletions))
sequence_set.update(double_deletions)

double_insertion_sequences = fbseqtools.get_insertion_seqs(target, 2)
log.info('Double insertion seqs: %d' % len(double_insertion_sequences))
sequence_set.update(double_insertion_sequences)

# Generate complete sequences from collected sequences
complete_sequences = set()
for sequence in sequence_set:
    left_barcode, right_barcode = barcode_pairs.pop()
    complete_sequences.add(NucleaSeqOligo(cr1, left_barcode, buffer1, sequence, buffer2, '', right_barcode, cr2))
    
# Perfect target with different buffer regions    
# Use `buf` rather than `buffer` to avoid shadowing the Python 2 builtin
alt_buffers = set([(buf, buf[::-1]) for buf in fbseqtools.get_randomized_stretch_seqs(buffer1, 2)
                   if 'AAA' not in buf and 'CCC' not in buf and 'GGG' not in buf and 'TTT' not in buf
                   and buf.startswith(buffer1[:3])])
good_gc_alt_buffers = set()
for a, b in alt_buffers:
    gc_content = float(a.count('C') + a.count('G')) / len(a)
    if gc_content <= 0.6:
        good_gc_alt_buffers.add((a, b))
for left_buffer, right_buffer in good_gc_alt_buffers:
    left_barcode, right_barcode = barcode_pairs.pop()
    complete_sequences.add(NucleaSeqOligo(cr1, left_barcode, left_buffer, target, right_buffer, '', right_barcode, cr2))
log.info('Alternative buffer with perfect target seqs: %d' % len(good_gc_alt_buffers))

# Perfect target with various barcodes
log.info('Perfect target with different barcode seqs: %d' % min_perfect_target_copies)
for _ in range(min_perfect_target_copies):
    left_barcode, right_barcode = barcode_pairs.pop()
    complete_sequences.add(NucleaSeqOligo(cr1, left_barcode, buffer1, target, buffer2, '', right_barcode, cr2))


# Error rate detection sequences    
len_err_detect_seq = (
    max(len(seq) for seq in complete_sequences) 
    - sum(len(s) for s in [cr1, left_barcode, '', right_barcode, cr2])
)
err_detect_seqs = generate_clean_random_eqmarg_seqs(nseq=n_err_detect_seqs,
                                                    seqlen=len_err_detect_seq)
err_min_dists = [min_dist_to_target_seed(seq) for seq in err_detect_seqs]
log.info('Before search min error detection seq dist to target seed quartiles: {} / {} / {} / {} / {}'.format(
    *map(int, np.percentile(err_min_dists, [0, 25, 50, 75, 100]))
))

for _ in range(50):
    test_err_detect_seqs = generate_clean_random_eqmarg_seqs(nseq=n_err_detect_seqs,
                                                             seqlen=len_err_detect_seq)
    if (min(min_dist_to_target_seed(seq) for seq in test_err_detect_seqs)
        > min(min_dist_to_target_seed(seq) for seq in err_detect_seqs)):
        err_detect_seqs = test_err_detect_seqs
        
err_min_dists = [min_dist_to_target_seed(seq) for seq in err_detect_seqs]
log.info('After search min error detection seq dist to target seed quartiles: {} / {} / {} / {} / {}'.format(
    *map(int, np.percentile(err_min_dists, [0, 25, 50, 75, 100]))
))
log.info('Error detection seqs: %d' % len(err_detect_seqs))
        
for err_seq in err_detect_seqs:
    left_barcode, right_barcode = err_barcode_pairs.pop()
    complete_sequences.add(NucleaSeqOligo(cr1, left_barcode, '', err_seq, '', '', right_barcode, cr2))

    
#----------------------------------------------------------------------------------
# Fill unused seqs
#----------------------------------------------------------------------------------

log.info('Remaining seqs to be filled: {}'.format(total_desired_seqs - len(complete_sequences)))
num_singles = len(single_deletions) + len(single_insertions) + len(single_mismatches)
singleton_copies = 1
while total_desired_seqs - len(complete_sequences) > num_singles:
    singleton_copies += 1
    for sequence in single_deletions | single_insertions | single_mismatches:
        left_barcode, right_barcode = barcode_pairs.pop()
        complete_sequences.add(NucleaSeqOligo(cr1, left_barcode, buffer1, sequence, buffer2, '', right_barcode, cr2))
log.info('Total copies/sequences of single error seqs: {}/{}'.format(singleton_copies, singleton_copies*num_singles))

added_perfects = 0
while total_desired_seqs > len(complete_sequences):
    added_perfects += 1
    left_barcode, right_barcode = barcode_pairs.pop()
    complete_sequences.add(NucleaSeqOligo(cr1, left_barcode, buffer1, target, buffer2, '', right_barcode, cr2))
log.info('Added perfect seqs: {}'.format(added_perfects))
log.info('Total perfect seqs: {}'.format(added_perfects 
                                              + min_perfect_target_copies
                                              + len(good_gc_alt_buffers)))
    
log.info('Total sequences generated: %d' % len(complete_sequences))
if barcode_pairs:
    log.warning('\n(%d unused barcode pairs)\n' % len(barcode_pairs))

Max / min accepted / min barcode distances to target seed: 9 / 6 / 2

Interesting PAM seqs: 832
Single mm, Single del seqs: 507
Single mm, Single ins seqs: 720
Single mismatch seqs: 81
Single insertion seqs: 82
Single deletion seqs: 20
Complement stretch seqs: 351
Alternative targets: 25
Double mismatch seqs: 3159
Double deletion seqs: 191
Double insertion seqs: 3315
Alternative buffer with perfect target seqs: 15
Perfect target with different barcode seqs: 50
Before search min error detection seq dist to target seed quartiles: 2 / 5 / 5 / 6 / 6
After search min error detection seq dist to target seed quartiles: 4 / 5 / 5 / 6 / 6
Error detection seqs: 150
Remaining seqs to be filled: 3038
Total copies/sequences of single error seqs: 17/3111
Added perfect seqs: 110
Total perfect seqs: 175
Total sequences generated: 12472
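The fill counts in the log can be reproduced by hand. The sketch below mirrors the logic of the fill loops, assuming every added oligo is distinct because each one consumes a fresh barcode pair; `plan_fill` is an illustrative name, not a function from the notebook.

```python
def plan_fill(remaining, num_singles):
    """Add whole rounds of single-error seqs, then fill with perfect targets."""
    copies = 1  # every single-error sequence is already present once
    while remaining > num_singles:
        copies += 1
        remaining -= num_singles
    # Whatever is left is filled with extra copies of the perfect target.
    return copies, remaining

# Logged counts: 81 single mismatches, 82 single insertions, 20 single deletions
num_singles = 81 + 82 + 20
copies, added_perfects = plan_fill(3038, num_singles)
total_single_seqs = copies * num_singles
total_perfects = added_perfects + 50 + 15  # plus barcode copies and alt-buffer variants
print(copies, total_single_seqs, added_perfects, total_perfects)
```

With 3038 slots remaining and 183 single-error sequences, 16 extra rounds fit, leaving 110 slots for perfect-target copies, in agreement with the logged values.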


In [16]:
oligo = list(complete_sequences)[0]
print 'Example oligo ({} bp):'.format(len(oligo))
oligo.pieces

Example oligo (107 bp):


['AGATCGCAGCTCGTCAAT',
 'ATTCATGTTCCTCAAGC',
 'ATCGA',
 'TTTAGTGATAAGTGGAATGCCCTGTGG',
 'ATGCA',
 '',
 'AGACGTACTGCTGCGCT',
 'CGTCATAATGAACGCGCT']
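The piece list above maps directly onto the eight-part layout described at the top of this notebook. As a minimal sketch (the `OligoPieces` tuple and its field names are illustrative, not the `NucleaSeqOligo` API), concatenating the pieces in order gives the full oligo, and comparing the target piece with target D locates the modification:

```python
from collections import namedtuple

# Field names follow the layout table; they are assumptions for illustration.
OligoPieces = namedtuple('OligoPieces', [
    'primer_left', 'barcode_left', 'buffer_left', 'target',
    'buffer_right', 'fill_right', 'barcode_right', 'primer_right',
])

example = OligoPieces(
    'AGATCGCAGCTCGTCAAT', 'ATTCATGTTCCTCAAGC', 'ATCGA',
    'TTTAGTGATAAGTGGAATGCCCTGTGG',
    'ATGCA', '', 'AGACGTACTGCTGCGCT', 'CGTCATAATGAACGCGCT',
)

# The full oligo is the in-order concatenation of the pieces.
sequence = ''.join(example)

# Positions where the target piece differs from the perfect target D.
perfect_target = 'TTTAGTGATAAGTGGAATGCCATGTGG'
mismatch_positions = [i for i, (a, b) in enumerate(zip(example.target, perfect_target))
                      if a != b]
print(len(sequence), mismatch_positions)
```

Here the example oligo carries a single mismatch at Python index 21 of the target.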

In [17]:
bad_gc_seqs = [seq for seq in complete_sequences if seq.gc_content >= 0.6 or seq.gc_content <= 0.4]
print '{:.1f}% of sequences have unusually high or low GC content'.format(100*float(len(bad_gc_seqs))/len(complete_sequences))

0.0% of sequences have unusually high or low GC content
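For reference, a GC-content check of this kind can be sketched as follows. This assumes `gc_content` is the fraction of G and C bases in the full oligo; the helper names are illustrative, and the thresholds match the cell above.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    return (seq.count('G') + seq.count('C')) / float(len(seq))

def has_extreme_gc(seq, low=0.4, high=0.6):
    """True if the GC fraction is at or outside the [low, high] bounds."""
    gc = gc_fraction(seq)
    return gc <= low or gc >= high

temp_primer = 'AGATCGCAGCTCGTCAAT'  # temporary left primer from earlier in the notebook
print('{:.1f}% GC'.format(100 * gc_fraction(temp_primer)))
```

The temporary primer contains 9 G/C bases out of 18, matching the 50.0% reported when it was generated.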


In [18]:
left_bcs = set(oligo._barcode_left for oligo in complete_sequences)
right_bc_rcs = set(str(Seq(oligo._barcode_right).reverse_complement()) for oligo in complete_sequences)
assert len(left_bcs) == len(right_bc_rcs)
print 'Number of seqs, barcodes:', len(left_bcs)

Number of seqs, barcodes: 12472
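The equal counts above follow from how the construction cell pairs barcodes: each barcode is used once on the left, with its successor in the shuffled list on the right, wrapping around at the end. A minimal sketch with toy barcodes (hypothetical values; the notebook additionally reverse-complements the right-hand member, which is omitted here for clarity):

```python
def ring_pairs(barcodes):
    """Pair each barcode with the next one in the list, wrapping around."""
    n = len(barcodes)
    return [(barcodes[i], barcodes[(i + 1) % n]) for i in range(n)]

toy_barcodes = ['AACG', 'CTGA', 'GGTA', 'TCAT']
pairs = ring_pairs(toy_barcodes)
lefts = [left for left, _ in pairs]
rights = [right for _, right in pairs]
print(pairs)
```

Because the pairing is a ring, every barcode appears exactly once as a left barcode and exactly once as a right barcode, and no oligo carries the same barcode on both sides as long as the list has at least two entries.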


In [19]:
raw_oligo_lens = set(map(len, complete_sequences))
print 'Raw oligos range in length from {} nt to {} nt'.format(min(raw_oligo_lens), max(raw_oligo_lens))

Raw oligos range in length from 103 nt to 115 nt


# Build Primers

In [20]:
random.seed(rand_seeds[9])
complete_sequences = design.update_buffers(
    complete_sequences, 
    min_primer_len, 
    min_buffer_len,
    target,
    bad_substrs,
    fudge_factor,
    abs_cannonical_cut_sites,
    pamtarg_coord_one_pos
)

print 'Oligo lengths:', set(map(len, complete_sequences))

Oligo lengths: set([138])


In [21]:
oligo = list(complete_sequences)[0]
print 'Example oligo:'
oligo.pieces

Example oligo:


['AGATCGCAGCTCGTCAAT',
 'ATTCATGTTCCTCAAGC',
 'CACTGA',
 'TTTAGTGATAAGTGGAATGCCCTGTGG',
 'TAGAGTACTAGCGAGCTATCTGCGCAC',
 'GCTCGAAT',
 'AGACGTACTGCTGCGCT',
 'CGTCATAATGAACGCGCT']

## Find usable primer prefixes

In [22]:
for primer_len in range(min_primer_len, max_primer_len + 1):
    print 'Trying', primer_len
    random.seed(rand_seeds[3])
    complete_sequences = design.update_buffers(
        complete_sequences, 
        primer_len, 
        min_buffer_len,
        target,
        bad_substrs,
        fudge_factor,
        abs_cannonical_cut_sites,
        pamtarg_coord_one_pos
    )
    print 'Oligo lengths:', set(map(len, complete_sequences))
    oligo = list(complete_sequences)[0]
    print 'Left/right buffers: {} / {}'.format(oligo._buffer_left, oligo._buffer_right)
    
    good_prefixes = design.find_good_prefixes(
        complete_sequences, 
        primer_len, 
        bad_substrs,
        cannonical_cut_sites,
        fudge_factor,
        nprocs
    )
    if len(good_prefixes) >= 2:
        break
    print 

assert len(good_prefixes) >= 2
print
print 'Success!'
print good_prefixes
print

if primer_len == min_primer_len:
    prev_good_prefixes = good_prefixes[:]
    for fake_primer_len in range(primer_len-1, 0, -1):
        print 'Trying buffers based on', fake_primer_len
        random.seed(rand_seeds[3])
        complete_sequences = design.update_buffers(
            complete_sequences, 
            fake_primer_len, 
            min_buffer_len,
            target,
            bad_substrs,
            fudge_factor,
            abs_cannonical_cut_sites,
            pamtarg_coord_one_pos
        )
        print 'Oligo lengths:', set(map(len, complete_sequences))
        oligo = list(complete_sequences)[0]
        print 'Left/right buffers: {} / {}'.format(oligo._buffer_left, oligo._buffer_right)

        good_prefixes = design.find_good_prefixes(
            complete_sequences, 
            primer_len, 
            bad_substrs,
            cannonical_cut_sites,
            fudge_factor,
            nprocs
        )
        if len(good_prefixes) >= 2:
            print
            print good_prefixes
            prev_good_prefixes = good_prefixes[:]
        else:
            break
        print 
    print
    print 'Failed, reverting to best previous:'
    good_prefixes = prev_good_prefixes
    fake_primer_len += 1
print good_prefixes

Trying 21
Oligo lengths: set([138])
Left/right buffers: CATGCA / TCGACATGCAGCGACTGCTAGTACTAG
.
Trying 22
Oligo lengths: set([140])
Left/right buffers: CGCATGA / TAGTAGCTCACGACTACTGCAGTGCAGT
.*******
Success!
['GCGTTGTTCGAGCTGTTGCGTT', 'AATGTGTTCGCTTCGTTGTTGT', 'TTCGCGCTGTAAGAGTGTGTTG', 'GCTGCAACTCAATGTAACTGTG', 'TCTTCTTGCTGTTCTTGTTACG', 'AACTCTATCTTCACGTCTTCGA', 'GAGCGAGCGACAAGCGTGCAAT']

['GCGTTGTTCGAGCTGTTGCGTT', 'AATGTGTTCGCTTCGTTGTTGT', 'TTCGCGCTGTAAGAGTGTGTTG', 'GCTGCAACTCAATGTAACTGTG', 'TCTTCTTGCTGTTCTTGTTACG', 'AACTCTATCTTCACGTCTTCGA', 'GAGCGAGCGACAAGCGTGCAAT']


# Replace primers

Test the above primers in your oligo analyzer of choice. If your experimental conditions require a longer primer, add bases as desired below. If you need shorter primers, do not change anything here, because the post-experiment analysis requires the full-length primer sequences. Instead, simply order primers with the inner bases removed as needed for your experiment.

In [23]:
left_primer = good_prefixes[0] + ''
right_primer = good_prefixes[1] + 'G'
cr_left = left_primer
cr_right = fbseqtools.dna_rev_comp(right_primer)

print 'Left / Right primers:'
print left_primer
print right_primer
print
print 'Hence, the oligos will have the form:'
print
print '{}    BC_left/Buf_left/Target/Buf_right/BC_right*    {}'.format(cr_left, cr_right)

Left / Right primers:
GCGTTGTTCGAGCTGTTGCGTT
AATGTGTTCGCTTCGTTGTTGTG

Hence, the oligos will have the form:

GCGTTGTTCGAGCTGTTGCGTT    BC_left/Buf_left/Target/Buf_right/BC_right*    CACAACAACGAAGCGAACACATT
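The right primer is stored in the oligo as its reverse complement, since it anneals to the opposite strand. A stdlib-only sketch of that step (the notebook itself uses `fbseqtools.dna_rev_comp`; the `rev_comp` helper below is a stand-in for illustration):

```python
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def rev_comp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return ''.join(COMPLEMENT[base] for base in reversed(seq))

left_primer = 'GCGTTGTTCGAGCTGTTGCGTT'
right_primer = 'AATGTGTTCGCTTCGTTGTTGT' + 'G'  # prefix plus the one added base

cr_left = left_primer             # left primer is used as-is
cr_right = rev_comp(right_primer)  # right primer goes in reverse-complemented
print(cr_left + ' ... ' + cr_right)
```

Applying `rev_comp` twice returns the original primer, which is a quick sanity check when editing the added bases.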


In [24]:
for oligo in complete_sequences:
    oligo._cr_left = cr_left
    oligo._cr_right = cr_right

In [25]:
print 'Oligo lengths:', set(map(len, complete_sequences))

Oligo lengths: set([149])


# Output

In [26]:
seq_fpath = 'target_{}_seqs.txt'.format(target_name)
exploded_seq_fpath = 'exploded_' + seq_fpath

with open(seq_fpath, 'w') as out:
    out.write('\n'.join([oligo.sequence for oligo in complete_sequences]))
with open(exploded_seq_fpath, 'w') as out:
    out.write('\n'.join([oligo.tab_delimited_str() for oligo in complete_sequences]))

In [27]:
total_time = time.time() - notebook_start_time

In [28]:
print 'Run time: {} seconds'.format(total_time)

Run time: 2123.42604494 seconds
