In [1]:
import coral as cr

# Overview

A common task in synthetic biology is to design and test several variants of a given biological part. In this case, we were interested in testing length variants of the pFUS1 promoter in yeast. We wanted to compare the expression of the most commonly reported promoter (435 bp) to much shorter (250 bp) and much longer (1000 bp) variants.

YCL027W is the systematic name for FUS1. To generate our promoters, we will take regions immediately upstream of the FUS1 start codon. We chose our sizes based on the Yeast Promoter Atlas entry for FUS1 (http://ypa.csbb.ntu.edu.tw/do?act=gene_by_kw&query=FUS1):
  * 250 bp: includes all STE12 and DIG1 binding sites
  * 435 bp: the exact YPA-predicted promoter length. Also includes the TATA box and an NRG1 binding site.
  * 1000 bp: overkill - includes a large chunk of the gene before FUS1

### Getting the region upstream of the FUS1 gene

Coral has several built-in ways of retrieving sequences. The simplest is to read a standard sequence file format such as GenBank (.gb, .ape) or FASTA (.fasta, .fa, .seq).

In this example, we read in the yeast chromosome containing FUS1 (chromosome 3), then retrieve the sequence upstream of FUS1.

In [2]:
# FUS1 is on chromosome 3
chr3 = cr.io.read_dna('./chr03.gb')

We could also have used online resources like the SGD or Yeast Promoter Atlas.

In [3]:
# First, we need to isolate the FUS1 coding region
fus1_features = []
for feature in chr3.features:
    if 'locus_tag' in feature.qualifiers:
        if 'YCL027W' in feature.qualifiers['locus_tag']:
            fus1_features.append(feature)
fus1_features

[FUS1 'gene' feature (71802 to 73341) on strand 0,
 FUS1(1) 'mRNA' feature (71802 to 73341) on strand 0,
 FUS1(2) 'CDS' feature (71802 to 73341) on strand 0]

There are several features, all describing the coding region of FUS1. If we were writing a general-purpose script, we would filter to features of type 'gene' or 'CDS', then grab either the start coordinate (if on strand 0) or the end coordinate (if on strand 1).

Here, though, all three features share the same start coordinate and lie on strand 0, so we can take any one of them and use its start coordinate as the position of the start codon.
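
For reference, the general-purpose approach described above might look like the following sketch. It deliberately uses plain Python strings and a hypothetical `SimpleFeature` tuple instead of Coral objects, so none of the names here (`SimpleFeature`, `pick_coding_feature`, `upstream_region`) are part of Coral's API. It assumes 0-based, end-exclusive coordinates and that strand 1 means the reverse strand. For a reverse-strand gene, the upstream region lies after the feature's end coordinate on the top strand and has to be reverse-complemented to read 5' to 3'.

```python
from collections import namedtuple

# Hypothetical stand-in for an annotated feature (not Coral's Feature class).
# Coordinates are 0-based and end-exclusive; strand 0 = forward, 1 = reverse.
SimpleFeature = namedtuple('SimpleFeature',
                           ['name', 'type', 'start', 'end', 'strand'])

_COMPLEMENT = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G', 'N': 'N'}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return ''.join(_COMPLEMENT[base] for base in reversed(seq))


def pick_coding_feature(features, wanted_types=('gene', 'CDS')):
    """Return the first feature whose type is in `wanted_types`."""
    for feature in features:
        if feature.type in wanted_types:
            return feature
    raise ValueError('no gene or CDS feature found')


def upstream_region(chromosome, feature, length):
    """Return `length` bases immediately 5' of `feature`, read 5'->3'."""
    if feature.strand == 0:
        begin = feature.start - length
        if begin < 0:
            raise ValueError('not enough sequence upstream of the feature')
        return chromosome[begin:feature.start]
    # Reverse strand: upstream is to the right of the feature on the top strand.
    finish = feature.end + length
    if finish > len(chromosome):
        raise ValueError('not enough sequence upstream of the feature')
    return reverse_complement(chromosome[feature.end:finish])


# Toy data, purely illustrative.
toy_chromosome = 'AACCGGTTAAGCAGCT'
toy_features = [
    SimpleFeature('toy(1)', 'mRNA', 4, 8, 0),
    SimpleFeature('toy(2)', 'CDS', 4, 8, 0),
]
forward_gene = pick_coding_feature(toy_features)
reverse_gene = forward_gene._replace(strand=1)

forward_promoter = upstream_region(toy_chromosome, forward_gene, 4)
reverse_promoter = upstream_region(toy_chromosome, reverse_gene, 4)
print('forward: {}, reverse: {}'.format(forward_promoter, reverse_promoter))
# prints "forward: AACC, reverse: GCTT"
```

Because FUS1 is on strand 0, only the first branch matters for this notebook, which is why a plain slice is enough here.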

In [4]:
feature = fus1_features[0]

upstream_lengths = [250, 435, 1000]

promoters = []
for length in upstream_lengths:
    promoter = chr3[feature.start - length:feature.start]
    promoter.name = "pFUS1({})".format(len(promoter))
    promoter.features = [cr.Feature(promoter.name, 0, len(promoter), "promoter")]
    promoters.append(promoter)

Done! We've designed all of our promoters. If we want to be particularly careful, we can do some checks on the outputs as well.

In [5]:
# Promoters should be 250, 435, and 1000 bp long
print('Promoter lengths: {}'.format([len(p) for p in promoters]))

# Each promoter should be a subsequence of the 1000 bp version - i.e.
# we should be able to find the 250 bp promoter in the 1000 bp one
subsets = [promoter in promoters[-1] for promoter in promoters]
print('Smaller promoters are contained within bigger ones: {}'.format(subsets))

# The FUS1 gene itself should not be part of the promoters
fus1 = chr3.extract(feature)
print('First ten bases of FUS1 in promoters?: {}'.format([fus1[:10] in p for p in promoters]))

Promoter lengths: [250, 435, 1000]
Smaller promoters are contained within bigger ones: [True, True, True]
First ten bases of FUS1 in promoters?: [False, False, False]
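
The length check above is worth keeping because of a slicing pitfall: if a gene sits closer to the start of the chromosome than the requested promoter length, `start - length` becomes negative. With plain Python strings, a negative start index counts from the end of the sequence, so the slice silently comes back empty or wrong instead of raising an error. Whether Coral's DNA objects behave the same way is an assumption we have not verified. FUS1 starts at 71802, so the problem does not arise here. A minimal sketch of an explicit guard, using plain strings and a hypothetical helper `safe_upstream`:

```python
def safe_upstream(sequence, start, length):
    """Return the `length` characters before index `start`, or raise."""
    if length > start:
        raise ValueError(
            'requested {} bp upstream but only {} bp are available'.format(
                length, start))
    return sequence[start - length:start]


toy = 'ABCDEFGHIJ'

# Naive slice: the start index is 3 - 5 = -2, which Python reads as
# "two from the end", so the slice [-2:3] is empty.
naive = toy[3 - 5:3]
print('naive slice: {!r}'.format(naive))                   # prints "naive slice: ''"

# Guarded slice: fine when enough upstream sequence exists...
print('guarded slice: {}'.format(safe_upstream(toy, 3, 3)))  # prints "guarded slice: ABC"

# ...and an explicit error when it does not.
try:
    safe_upstream(toy, 3, 5)
except ValueError as error:
    print('error: {}'.format(error))
```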
