# **How to: creating automatic overhangs**

This tutorial is for people who want to keep experimenting with the Golden Gate assembly method.

What if you could design your final plasmid without worrying about each separate part, and let a script add the restriction enzyme recognition sites, spacers, and overhangs for you? That is what this tutorial covers.

---

Authors: [**@GiovannaMaklouf**](https://twitter.com/giomaklouf), [**@IsaacG**](https://twitter.com/IsaacGuerreiro9)


# Configurations for this tutorial




First, let's set up the environment so that this tutorial runs successfully.


Colab notebooks use Python kernels to run each cell. However, because ***Poly*** is written in **Go (golang)**, we need to install and configure a few things in Colab before we can run Go code here.

### **1. To set up the Go environment, run the cell below:**

In [None]:
# this process may take a few minutes
!add-apt-repository ppa:longsleep/golang-backports -y
!apt update
!apt install golang-go
%env GOPATH=/root/go
!go get -u github.com/gopherdata/gophernotes
!cp ~/go/bin/gophernotes /usr/bin/
!npx degit gopherdata/gophernotes/kernel \
     /usr/local/share/jupyter/kernels/gophernotes


### **2. Download important data to run this tutorial**

In [None]:
!rm -rf $GOPATH/pkg/mod/github.com/!open-!science-!global
!rm -rf $GOPATH/pkg/mod/cache/download/github.com/!open-!science-!global
!go get -u github.com/Open-Science-Global/poly@e3e1c61

go: downloading github.com/Open-Science-Global/poly v0.11.3


In [None]:
!wget https://raw.githubusercontent.com/Open-Science-Global/friendzymes-cookbook/main/data/create-parts-with-overhangs/pht43-final.gb




### **3. Connect Colab Notebook to your GDrive (optional)**

The previous cell downloads the files only temporarily. If you want to keep them in a folder on your Drive for later analysis, or if you are already using this notebook with your own files, connect your Google Drive to Colab so that you can access, read, and save files permanently.
If you prefer, you can do this with the cell below:



In [None]:
from google.colab import drive
drive.mount('/content/drive')


## After running these steps, click **Runtime** in the menu bar, then **Change runtime type**, and select Go if it is not already selected.

This makes Colab use an open source Go kernel called gophernotes.

Now we are ready to work.

# **Creating automatic overhangs**

## **Importing packages and pre-requisites**

In [None]:
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
	"strconv"
	"sync"
	"math/rand"

	"github.com/Open-Science-Global/poly"
	"github.com/Open-Science-Global/poly/checks"
	"github.com/Open-Science-Global/poly/io/fasta"
	"github.com/Open-Science-Global/poly/io/genbank"
	"github.com/Open-Science-Global/poly/linearfold"
	"github.com/Open-Science-Global/poly/synthesis"
	"github.com/Open-Science-Global/poly/transform"
	"github.com/Open-Science-Global/poly/transform/codon"
	"github.com/Open-Science-Global/poly/finder"
)

## **1. Preparations for creating automatic overhangs: functions set up**


Run the cells below to define the functions that automatically create overhangs for the sequences of your choice.
For example: you have designed your final plasmid with everything you think the organism needs to reach your final objective, but it would be nice to have a tool that splits the whole project into parts, already organized by type and flanked by the connectors of your favorite assembly standard. Here we use the Friendzymes extended MoClo standard, but feel free to adapt the code to your own standard.


**Important points:**
*   Below we have ONE main function (AutomaticOverhangs) and 15 helper functions (randomDnaSequence, addBbsiStructureFoward, addBbsiStructureReverse, addOverhangs, createListPromoters, createListTerminator, createListFlankReverse, createListTargetSelection, createListEcoliSelectionOrigin, createListEcoliOrigin, createListRbs, createListEcoliSelection, createListCds, filterByType, generateOverhangs) that the main one relies on, so they all need to be defined as well.
*   Overall, you don't need to change anything to use these functions in the second step of this tutorial (2. Running), unless you want to use a different **assembly standard**. In that case, we highly recommend reading the code line by line and making the necessary changes.

In [None]:
// This function generates a random DNA sequence of the given length using a specific seed.
func randomDnaSequence(length int, seed int64) string {
	var dnaAlphabet = []rune("ATCG")
	rand.Seed(seed)

	randomSequence := make([]rune, length)

	for basepair := range randomSequence {
		randomIndex := rand.Intn(len(dnaAlphabet))
		randomSequence[basepair] = dnaAlphabet[randomIndex]
	}
	return string(randomSequence)
}

randomDnaSequence(10, int64(rand.Intn(100)))

CGTCCCTGAT
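
Note that the function above reseeds the shared global generator on every call. If you would rather keep each call isolated, one option is a local generator created with `rand.New(rand.NewSource(seed))`. The sketch below is only an illustration of that idea; the name `randomDnaLocal` is hypothetical and is not used anywhere else in this tutorial.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomDnaLocal is a hypothetical variant of randomDnaSequence that draws
// from its own generator instead of reseeding the global one.
func randomDnaLocal(length int, seed int64) string {
	const dnaAlphabet = "ATCG"
	generator := rand.New(rand.NewSource(seed))

	randomSequence := make([]byte, length)
	for i := range randomSequence {
		randomSequence[i] = dnaAlphabet[generator.Intn(len(dnaAlphabet))]
	}
	return string(randomSequence)
}

func main() {
	first := randomDnaLocal(10, 42)
	second := randomDnaLocal(10, 42)
	fmt.Println(first == second)            // the same seed gives the same sequence
	fmt.Println(len(randomDnaLocal(15, 7))) // the length matches the request
}
```

Because the generator is local, calling this function does not change what later calls to `rand.Intn` return elsewhere in the notebook.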

In [None]:
// 15 random bp -> BbsI cut site forward GAAGAC -> 2 bp -> BbsI overhang GGAG -> random 8 bp ->
// BsaI site forward -> BsaI overhang -> main sequence -> BsaI overhang 2 -> BsaI site reverse -> random 8 bp ->
// BbsI overhang CGCT -> 2 bp -> BbsI cut site reverse GTCTTC -> 15 random bp
func addBbsiStructureFoward(internalOverhang string) string {
  bbsiFoward := "GAAGAC" 
  bbsiOverhangFoward := "GGAG"
  randomFoward := randomDnaSequence(15, int64(rand.Intn(10000)))
  twoRandomFoward := randomDnaSequence(2, int64(rand.Intn(10000))) 
  eightRandomFoward := randomDnaSequence(8, int64(rand.Intn(10000))) 

  return randomFoward  + bbsiFoward + twoRandomFoward + bbsiOverhangFoward + eightRandomFoward + internalOverhang

}

func addBbsiStructureReverse(internalOverhang string) string {
  bbsiReverse := "GTCTTC"
  bbsiOverhangReverse := "CGCT"
  randomReverse := randomDnaSequence(15, int64(rand.Intn(10000)))
  twoRandomReverse := randomDnaSequence(2, int64(rand.Intn(10000)))
  eightRandomReverse := randomDnaSequence(8, int64(rand.Intn(10000)))

  return internalOverhang + eightRandomReverse + bbsiOverhangReverse + twoRandomReverse + bbsiReverse + randomReverse

}
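
To make the layout described in the comment easier to see, here is a small sketch that builds the same two flanks but replaces every random spacer with `N`. The function names `forwardFlank` and `reverseFlank` are hypothetical helpers for this illustration only; the site and overhang sequences are the ones used above.

```go
package main

import (
	"fmt"
	"strings"
)

// forwardFlank mirrors addBbsiStructureFoward with fixed "N" filler:
// 15 bp -> GAAGAC -> 2 bp -> GGAG -> 8 bp -> internal (BsaI) overhang.
func forwardFlank(internalOverhang string) string {
	return strings.Repeat("N", 15) + "GAAGAC" + strings.Repeat("N", 2) +
		"GGAG" + strings.Repeat("N", 8) + internalOverhang
}

// reverseFlank mirrors addBbsiStructureReverse with fixed "N" filler:
// internal (BsaI) overhang -> 8 bp -> CGCT -> 2 bp -> GTCTTC -> 15 bp.
func reverseFlank(internalOverhang string) string {
	return internalOverhang + strings.Repeat("N", 8) + "CGCT" +
		strings.Repeat("N", 2) + "GTCTTC" + strings.Repeat("N", 15)
}

func main() {
	// Internal overhangs of a promoter part: BsaI site + spacer + overhang.
	forward := forwardFlank("GGTCTCTGGAG")
	reverse := reverseFlank("TACTTGAGACC")

	fmt.Println(forward)
	fmt.Println(reverse)
	fmt.Println(strings.Index(forward, "GAAGAC")) // position of the BbsI site
	fmt.Println(strings.Index(reverse, "GTCTTC")) // position of the reverse BbsI site
}
```

In the real functions the `N` positions are filled by `randomDnaSequence`, so the printed flanks differ between runs while the site positions stay the same.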

The *addOverhangs* function below receives a full sequence and a list of annotations (called features in the GenBank standard), extracts each annotated part (a promoter, for example) from the full sequence, and adds the correct overhangs for a specific standard.

In [None]:
type MoCloPart struct {
  sequence string
  name string
  typePart string
}

// addOverhangs wraps the sequence of every feature in the given 5' and 3' overhangs.
func addOverhangs(sequence string, features []poly.Feature, fiveOverhang string, threeOverhang string) []MoCloPart {
  var parts []MoCloPart
  for _, feature := range features {
    annotationSequence := sequence[feature.SequenceLocation.Start:feature.SequenceLocation.End]
    parts = append(parts, MoCloPart{
                            strings.ToUpper(fiveOverhang + annotationSequence + threeOverhang),
                            feature.Attributes["label"],
                            feature.Type,
                          })
  }
  return parts
}
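
The slicing step can be tried without Poly. The sketch below uses a hypothetical `feature` struct as a stand-in for `poly.Feature` (only the fields needed here, with a 0-based start and an exclusive end, which is an assumption about how the coordinates are stored) and adds a range check that the original function does not perform.

```go
package main

import (
	"fmt"
	"strings"
)

// feature is a hypothetical stand-in for poly.Feature.
type feature struct {
	label    string
	partType string
	start    int // 0-based, inclusive (assumption)
	end      int // exclusive (assumption)
}

type part struct {
	sequence string
	name     string
	typePart string
}

// wrapFeatures cuts each feature out of the sequence and adds the overhangs.
func wrapFeatures(sequence string, features []feature, five, three string) ([]part, error) {
	var parts []part
	for _, f := range features {
		if f.start < 0 || f.end > len(sequence) || f.start > f.end {
			return nil, fmt.Errorf("feature %q has invalid range %d..%d", f.label, f.start, f.end)
		}
		core := sequence[f.start:f.end]
		parts = append(parts, part{strings.ToUpper(five + core + three), f.label, f.partType})
	}
	return parts, nil
}

func main() {
	plasmid := "aaaattgacaggggcccc"
	features := []feature{{"demo-promoter", "PR", 4, 10}}

	parts, err := wrapFeatures(plasmid, features, "GGAG", "TACT")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(parts[0].name, parts[0].typePart, parts[0].sequence)
}
```

Returning an error for a bad range is a design choice for this sketch; the tutorial's function would panic on an out-of-range slice instead.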

We also created functions for each specific part type to add overhangs following our specific assembly standard.

In [None]:
import "github.com/Open-Science-Global/poly"

func createListPromoters(sequence string, features []poly.Feature) []MoCloPart {
  spacer := "T"
  bsaiFoward := "GGTCTC" + spacer
  bsaiReverse := spacer + "GAGACC"
  fiveOverhang := bsaiFoward + "GGAG"
  threeOverhang := "TACT" + bsaiReverse
  return addOverhangs(sequence, features, addBbsiStructureFoward(fiveOverhang), addBbsiStructureReverse(threeOverhang))
}

// Note: when a promoter is next to an insulator, or follows a recombination site,
// we should use different overhangs and create another annotation type
// instead of PR.

func createListRbs(sequence string, features []poly.Feature) []MoCloPart {
  spacer := "T"
  bsaiFoward := "GGTCTC" + spacer
  bsaiReverse := spacer + "GAGACC"
  fiveOverhang := bsaiFoward + "TACT"
  threeOverhang := "GACC" + bsaiReverse
  return addOverhangs(sequence, features, addBbsiStructureFoward(fiveOverhang), addBbsiStructureReverse(threeOverhang))
}

func createListCds(sequence string, features []poly.Feature) []MoCloPart {
  spacer := "T"
  bsaiFoward := "GGTCTC" + spacer
  bsaiReverse := spacer + "GAGACC"
  fiveOverhang := bsaiFoward + "AATG"
  threeOverhang := "GCTT" + bsaiReverse
  return addOverhangs(sequence, features, addBbsiStructureFoward(fiveOverhang), addBbsiStructureReverse(threeOverhang))
}

// If a CDS is next to a C-terminal tag, we should use different overhangs
// and another annotation type instead of CS.

func createListTerminator(sequence string, features []poly.Feature) []MoCloPart {
  spacer := "T"
  bsaiFoward := "GGTCTC" + spacer
  bsaiReverse := spacer + "GAGACC"
  fiveOverhang := bsaiFoward + "GCTT"
  threeOverhang := "CGCT" + bsaiReverse
  return addOverhangs(sequence, features, addBbsiStructureFoward(fiveOverhang), addBbsiStructureReverse(threeOverhang))
}

func createListTargetSelection(sequence string, features []poly.Feature) []MoCloPart {
  spacer := "T"
  bsaiFoward := "GGTCTC" + spacer
  bsaiReverse := spacer + "GAGACC"
  fiveOverhang := bsaiFoward + "AAGG"
  threeOverhang := "ATGA" + bsaiReverse
  return addOverhangs(sequence, features, addBbsiStructureFoward(fiveOverhang), addBbsiStructureReverse(threeOverhang))
}

func createListEcoliSelection(sequence string, features []poly.Feature) []MoCloPart {
  spacer := "T"
  bsaiFoward := "GGTCTC" + spacer
  bsaiReverse := spacer + "GAGACC"
  fiveOverhang := bsaiFoward + "GCAA"
  threeOverhang := "ACTA" + bsaiReverse
  return addOverhangs(sequence, features, addBbsiStructureFoward(fiveOverhang), addBbsiStructureReverse(threeOverhang))
}

func createListEcoliOrigin(sequence string, features []poly.Feature) []MoCloPart {
  spacer := "T"
  bsaiFoward := "GGTCTC" + spacer
  bsaiReverse := spacer + "GAGACC"
  fiveOverhang := bsaiFoward + "ACTA"
  threeOverhang := "AAAA" + bsaiReverse
  return addOverhangs(sequence, features, addBbsiStructureFoward(fiveOverhang), addBbsiStructureReverse(threeOverhang))
}

func createListFlankReverse(sequence string, features []poly.Feature) []MoCloPart {
  spacer := "T"
  bsaiFoward := "GGTCTC" + spacer
  bsaiReverse := spacer + "GAGACC"
  fiveOverhang := bsaiFoward + "AAAA"
  threeOverhang := "AAGG" + bsaiReverse
  return addOverhangs(sequence, features, addBbsiStructureFoward(fiveOverhang), addBbsiStructureReverse(threeOverhang))
}

func createListEcoliSelectionOrigin(sequence string, features []poly.Feature) []MoCloPart {
  spacer := "T"
  bsaiFoward := "GGTCTC" + spacer
  bsaiReverse := spacer + "GAGACC"
  fiveOverhang := bsaiFoward + "GCAA"
  threeOverhang := "AAAA" + bsaiReverse
  return addOverhangs(sequence, features, addBbsiStructureFoward(fiveOverhang), addBbsiStructureReverse(threeOverhang))
}
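
The functions above differ only in their two overhangs, so if you adapt the code to another standard it may be easier to keep the overhangs in a single table. The sketch below is one possible way to do that, not the tutorial's implementation: `overhangsByTag` and `bsaiFlanks` are hypothetical names, and the table only lists the tags that `generateOverhangs` handles further down.

```go
package main

import "fmt"

// overhangPair holds the 5' and 3' BsaI overhangs of one part type.
type overhangPair struct {
	five  string
	three string
}

// overhangsByTag copies the overhangs used by the functions above.
var overhangsByTag = map[string]overhangPair{
	"PR":    {"GGAG", "TACT"},
	"RB":    {"TACT", "GACC"},
	"TS":    {"GCTT", "CGCT"},
	"ES":    {"GCAA", "ACTA"},
	"EO":    {"ACTA", "AAAA"},
	"FR":    {"AAAA", "AAGG"},
	"ES+EO": {"GCAA", "AAAA"},
}

// bsaiFlanks returns the BsaI site, spacer, and overhang for both ends of a part.
func bsaiFlanks(tag string) (string, string, error) {
	pair, ok := overhangsByTag[tag]
	if !ok {
		return "", "", fmt.Errorf("unknown part tag %q", tag)
	}
	const spacer = "T"
	return "GGTCTC" + spacer + pair.five, pair.three + spacer + "GAGACC", nil
}

func main() {
	five, three, err := bsaiFlanks("PR")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(five, three)

	// Neighbouring parts share an overhang: promoter 3' equals RBS 5'.
	fmt.Println(overhangsByTag["PR"].three == overhangsByTag["RB"].five)
}
```

With a table like this, switching to a different standard means editing one map instead of nine functions.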

*filterByType* filters annotations by a specific piece of metadata (the Type property of the feature object), so that we can automatically identify annotations by their part type and add the correct overhangs.

In [None]:
import "strings"

func filterByType(features []poly.Feature, partType string) []poly.Feature {
  var list []poly.Feature
  for _, feature := range features {
    if strings.ToUpper(feature.Type) == partType {
      list = append(list, feature)
    }
  }
  return list
}
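
One detail of the function above: it uppercases `feature.Type` but not `partType`, so a lowercase tag such as `"pr"` never matches. A minimal sketch of a comparison that ignores case on both sides, using `strings.EqualFold` and a hypothetical `feature` struct in place of `poly.Feature`, could look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// feature is a hypothetical stand-in for poly.Feature.
type feature struct {
	label    string
	partType string
}

// filterByTag keeps the features whose type matches the tag, ignoring case.
func filterByTag(features []feature, tag string) []feature {
	var matches []feature
	for _, f := range features {
		if strings.EqualFold(f.partType, tag) {
			matches = append(matches, f)
		}
	}
	return matches
}

func main() {
	features := []feature{
		{"promoter-a", "pr"},
		{"promoter-b", "PR"},
		{"terminator-a", "TS"},
	}
	for _, f := range filterByTag(features, "pr") {
		fmt.Println(f.label)
	}
}
```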

The *generateOverhangs* function receives a specific tag and generates the overhangs for every part of that type.

In [None]:
import "log"

func generateOverhangs(tag string, sequence string, features []poly.Feature) []MoCloPart {

  switch tag {
    case "PR": {
        return createListPromoters(sequence, features)
    }
    case "TS": {
        return createListTerminator(sequence, features)
    }
    case "FR": {
        return createListFlankReverse(sequence, features)
    }
    case "ES": {
        return createListEcoliSelection(sequence, features)
    }
    case "EO": {
        return createListEcoliOrigin(sequence, features)
    }
    case "RB": {
        return createListRbs(sequence, features)
    }
    case "ES+EO": {
        return createListEcoliSelectionOrigin(sequence, features)
    }
  }
  log.Fatal("Pass a tag of a possible part category!")
  return nil // never reached, because log.Fatal exits, but the Go compiler requires a return here
}

In [None]:
func AutomaticOverhangs(sequence poly.Sequence) []MoCloPart {
  tags := []string{"PR", "TS", "FR", "ES", "EO", "RB", "ES+EO"}

  var allParts []MoCloPart
  for _, tag := range tags {
    filteredParts := filterByType(sequence.Features, tag)
    if len(filteredParts) > 0 {
      parts := generateOverhangs(tag, sequence.Sequence, filteredParts)
      allParts = append(allParts, parts...)
    }
  }
  return allParts
}

## **2. Running**


Now that we have all the necessary functions, let's put everything together, starting from a final plasmid sequence that we called *pht43-final*.

In [None]:
// For this tutorial, we have already downloaded this in the settings part (in "Download important data to run this tutorial")
pht43 := genbank.Read("./pht43-final.gb")

Run the AutomaticOverhangs function on the final plasmid and take a look at the result: you get every annotation whose Type property matches one of the tags handled in the *generateOverhangs* function, with the overhangs already added.

In [None]:
parts := AutomaticOverhangs(pht43)
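
Because the spacers are random and the extracted features come from your own plasmid, a part can end up with an extra BsaI or BbsI recognition site by chance. The sketch below is one way to flag that, assuming each of the four site sequences should occur exactly once on the top strand of a finished part. `unexpectedSites` is a hypothetical helper and the example sequences are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Recognition sites as they appear on the top strand of a finished part:
// BbsI forward, BsaI forward, BsaI reverse, BbsI reverse.
var expectedOnce = []string{"GAAGAC", "GGTCTC", "GAGACC", "GTCTTC"}

// unexpectedSites reports every site that does not occur exactly once.
func unexpectedSites(partSequence string) []string {
	var problems []string
	upper := strings.ToUpper(partSequence)
	for _, site := range expectedOnce {
		if count := strings.Count(upper, site); count != 1 {
			problems = append(problems, fmt.Sprintf("%s x%d", site, count))
		}
	}
	return problems
}

func main() {
	n := func(k int) string { return strings.Repeat("N", k) }

	// A part with the tutorial's layout, "N" filler, and a made-up core TTGACA.
	clean := n(15) + "GAAGAC" + n(2) + "GGAG" + n(8) + "GGTCTCTGGAG" +
		"TTGACA" + "TACTTGAGACC" + n(8) + "CGCT" + n(2) + "GTCTTC" + n(15)

	// The same part with an extra BsaI site inside the core.
	clash := strings.Replace(clean, "TTGACA", "TTGGTCTCACA", 1)

	fmt.Println(len(unexpectedSites(clean))) // no problems expected
	fmt.Println(unexpectedSites(clash))      // reports the duplicated BsaI site
}
```

In the notebook you could apply the same idea to `part.sequence` for every element of `parts` before ordering any DNA.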

Now turn all these sequences into Fasta objects to create a FASTA file with all your sequences already in your assembly standard!

In [None]:
var fastas []fasta.Fasta

for _, part := range parts {
  fastas = append(fastas, fasta.Fasta{part.name + "|" + part.typePart, part.sequence})
}

fasta.Write(fastas, "./allPartsWithOverhangs.fasta")
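
If you want to preview what such a file looks like, or produce one without Poly, the sketch below formats records with the standard library only. The `record` type and `formatFasta` function are hypothetical, and the line width is an assumption for illustration; the file written by `fasta.Write` may wrap lines differently.

```go
package main

import (
	"fmt"
	"strings"
)

// record is a hypothetical stand-in for fasta.Fasta.
type record struct {
	name     string
	sequence string
}

// formatFasta renders records as FASTA text, wrapping sequences at width characters.
func formatFasta(records []record, width int) string {
	if width <= 0 {
		width = 80 // assumed default line width
	}
	var builder strings.Builder
	for _, r := range records {
		builder.WriteString(">" + r.name + "\n")
		for start := 0; start < len(r.sequence); start += width {
			end := start + width
			if end > len(r.sequence) {
				end = len(r.sequence)
			}
			builder.WriteString(r.sequence[start:end] + "\n")
		}
	}
	return builder.String()
}

func main() {
	// Header follows the tutorial's convention: part name, "|", part type.
	records := []record{{"demo-promoter|PR", "GGAGTTGACATACT"}}
	fmt.Print(formatFasta(records, 10))
}
```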