# **How to: optimize codons with Poly package and friendzymes toolkit**
In this notebook, you provide a CDS that contains one or more problematic sequences and get back a corrected CDS without them. The fix replaces codons with synonymous ones, so the encoded amino acid sequence stays the same.

*A kind reminder: this tutorial is NOT meant for non-coding sequences such as promoters, RBSs, and terminators. If you find problematic sequences in those, review them case by case and be careful not to lose biological meaning.*

---

Authors: [**@GiovannaMaklouf**](https://twitter.com/giomaklouf), [**@IsaacG**](https://twitter.com/IsaacGuerreiro9)



# Configurations for this tutorial




First, let's run some important settings so that you can complete this tutorial successfully.

Colab notebooks use Python kernels to run each cell. However, because ***Poly*** is written in **Go (golang)**, we need to install and configure a few things in Colab before we can run Go code here.

### **1. To set up the Go environment, run the cell below:**

In [None]:
# this process may take a few minutes
!add-apt-repository ppa:longsleep/golang-backports -y
!apt update
!apt install golang-go
%env GOPATH=/root/go
!go get -u github.com/gopherdata/gophernotes
!cp ~/go/bin/gophernotes /usr/bin/
!npx degit gopherdata/gophernotes/kernel \
     /usr/local/share/jupyter/kernels/gophernotes

0% [Working]            Get:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B]
Ign:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Get:3 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:4 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ Packages [69.5 kB]
Ign:5 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Get:6 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release [696 B]
Hit:7 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release
Get:8 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease [15.9 kB]
Get:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release.gpg [836 B]
Hit:10 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:12 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:13 htt

### **2. Download important data to run this tutorial**

In [None]:
!rm -rf $GOPATH/pkg/mod/github.com/!open-!science-!global
!rm -rf $GOPATH/pkg/mod/cache/download/github.com/!open-!science-!global
!go get -u github.com/Open-Science-Global/poly@e3e1c61

go: downloading github.com/Open-Science-Global/poly v0.11.3


In [None]:
!wget https://raw.githubusercontent.com/Open-Science-Global/friendzymes-cookbook/main/data/cds-fix/bsub-ecoli.json

--2021-10-20 21:19:29--  https://raw.githubusercontent.com/Open-Science-Global/friendzymes-toolkit/main/data/codon-table/bsub-ecoli.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4702 (4.6K) [text/plain]
Saving to: ‘bsub-ecoli.json’


2021-10-20 21:19:29 (47.1 MB/s) - ‘bsub-ecoli.json’ saved [4702/4702]



### **3. Connect Colab Notebook to your GDrive (not required)**

The previous cell downloads the files only temporarily. If you want to keep them in a folder on your Drive for later analysis, or if you are already using this notebook with your own files, connect your Google Drive to Colab so that you can access, read, and save files permanently.
If you prefer, you can do this with the cell below:



In [None]:
from google.colab import drive
drive.mount('/content/drive')

## After running these steps, click **Runtime** in the menu bar, then **Change Runtime Type**, and select Go if it is not already selected.

This makes Colab use an open-source Go kernel called gophernotes.

Now we are ready to work.

# **Importing packages and pre-requisites**

In [None]:
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"sync"

	"github.com/Open-Science-Global/poly"
	"github.com/Open-Science-Global/poly/checks"
	"github.com/Open-Science-Global/poly/io/fasta"
	"github.com/Open-Science-Global/poly/io/genbank"
	"github.com/Open-Science-Global/poly/linearfold"
	"github.com/Open-Science-Global/poly/synthesis"
	"github.com/Open-Science-Global/poly/transform"
	"github.com/Open-Science-Global/poly/transform/codon"
)

## **Fixing problematic sequences**

Keep in mind that the sequence you provide as input for codon optimization must be:

1.   a coding sequence (CDS);
2.   in DNA format, i.e. written with the letters A, T, C and G.
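
Both requirements can be checked before running the fix. The sketch below is a minimal, hypothetical helper (`checkCDS` is not part of Poly). It assumes that a CDS is simply a string of A, C, G and T whose length is a multiple of three, and it does not look at start or stop codons.

```go
package main

import (
	"fmt"
	"strings"
)

// checkCDS is a hypothetical helper (not a Poly function). It returns the
// first problem found in a candidate coding sequence, or nil if the sequence
// passes two basic checks: whole codons only, and DNA letters only.
func checkCDS(seq string) error {
	seq = strings.ToUpper(seq)
	if len(seq) == 0 || len(seq)%3 != 0 {
		return fmt.Errorf("length %d is not a positive multiple of 3", len(seq))
	}
	for i, base := range seq {
		if !strings.ContainsRune("ACGT", base) {
			return fmt.Errorf("unexpected character %q at position %d", base, i)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkCDS("ATGGGTCTCATTTAA")) // 15 DNA bases, passes
	fmt.Println(checkCDS("AUGGGUCUCAUUUAA")) // RNA letters (U) are rejected
	fmt.Println(checkCDS("ATGGGTCTCATTTA"))  // 14 bases, not whole codons
}
```

If the helper returns an error for your own sequence, fix the input first; the codon replacement step only makes sense on whole codons.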


In [None]:
// run this to set up subfunction 1 (add or remove restriction enzyme recognition sites as you wish)

func forbiddenSeqList() []string {
	BsaI := "GGTCTC"
	BbsI := "GAAGAC"
	BtgzI := "GCGATG"
	SapI := "GCTCTTC"
	BsmbI := "CGTCTC"
	AarI := "CACCTGC"
	PmeI := "GTTTAAAC"
	HindIII := "AAGCTT"
	PstI := "CTGCAG"
	XbaI := "TCTAGA"
	BamHI := "GGATCC"
	SmaI := "CCCGGG"
	KpnI := "GGTACC"
	SacI := "GAGCTC"
	SalI := "GTCGAC"
	EcoRI := "GAATTC"
	SphI := "GCATGC"
	AvrII := "CCTAGG"
	SwaI := "ATTTAAAT"
	AscI := "GGCGCGCC"
	FseI := "GGCCGGCC"
	PacI := "TTAATTAA"
	SpeI := "ACTAGT"
	NotI := "GCGGCCGC"
	SanDI_A := "GGGACCC"
	SanDI_T := "GGGTCCC"
	BglII := "AGATCT"
	XhoI := "CTCGAG"
	ClaI := "ATCGAT"

	List := []string{
		BsaI,
		BbsI,
		SapI,
		BsmbI,
		BtgzI,
		AarI,
		PmeI,
		HindIII,
		PstI,
		XbaI,
		BamHI,
		SmaI,
		KpnI,
		SacI,
		SalI,
		EcoRI,
		SphI,
		AvrII,
		SwaI,
		AscI,
		FseI,
		PacI,
		SpeI,
		NotI,
		SanDI_A,
		SanDI_T,
		BglII,
		XhoI,
		ClaI,
	}

	return List
}

In [None]:
// run this to set up subfunction 2 (homopolymer runs of six identical bases)

func homologySequencesFindProblems() []string {
	return []string{"AAAAAA", "CCCCCC"}
}
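
To get a feeling for what these lists are used for, here is a minimal sketch that scans a sequence for a site on both strands. `reverseComplement` and `findSites` are hypothetical helpers written for illustration, not Poly functions. Checking the reverse strand is an assumption made here; `synthesis.RemoveSequence` may work differently internally.

```go
package main

import "fmt"

// reverseComplement returns the reverse complement of an uppercase DNA string.
// Hypothetical helper for this sketch only.
func reverseComplement(seq string) string {
	complement := map[byte]byte{'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		out[len(seq)-1-i] = complement[seq[i]]
	}
	return string(out)
}

// findSites returns the start index of every window in seq that matches
// site or the reverse complement of site.
func findSites(seq, site string) []int {
	var hits []int
	rc := reverseComplement(site)
	for i := 0; i+len(site) <= len(seq); i++ {
		window := seq[i : i+len(site)]
		if window == site || window == rc {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	// First 30 bases of the example CDS used in this tutorial.
	seq := "ATGGGTCTCATTTTAGATGTGGATTATATC"
	for _, site := range []string{"GGTCTC", "GAATTC", "AAAAAA"} {
		fmt.Println(site, findSites(seq, site))
	}
}
```

Here only the BsaI site `GGTCTC` is found, starting at index 3, which is why that region is a candidate for a synonymous codon swap.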

In [None]:
// run this to set up the main function

func fixSequence(sequence string, codonTable codon.Table) string {
	// Build the functions that will remove unwanted properties from our sequence
	// Function #1: remove unwanted sequences such as restriction enzyme recognition sites
	// (the homopolymers from homologySequencesFindProblems are not part of this list)
	forbiddenSequences := forbiddenSeqList()
	removeSequenceFunc := synthesis.RemoveSequence(forbiddenSequences)

	// Function#2: Remove secondary structures
	removeSecondaryFunc := synthesis.RemoveHairpin(20, 200)

	// Function#3: Remove repetition greater than 10 inside the sequence
	removeRepeatFunc := synthesis.RemoveRepeat(10)

	// Add all those functions to a list and pass it to FixCds, which will take care of our problems
	var functions []func(string, chan synthesis.DnaSuggestion, *sync.WaitGroup)
	functions = append(functions, removeSequenceFunc, removeRepeatFunc, removeSecondaryFunc)

	fixedSeq, _, _ := synthesis.FixCds(":memory:", sequence, codonTable, functions)
	// FixCds removes the stop codon; it is not re-appended here, so add it back yourself if you need it
	return fixedSeq
}
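
As an illustration of the kind of problem Function #3 targets, the sketch below finds a substring of a given length that occurs twice in a sequence. It uses a simplified assumption about what a "repeat" is (exact, possibly overlapping copies on the same strand). `firstRepeat` is a hypothetical helper and is not necessarily how `synthesis.RemoveRepeat` is implemented.

```go
package main

import "fmt"

// firstRepeat scans seq with a sliding window of length k and returns the
// first window that was already seen earlier, together with the start index
// of both occurrences. ok is false when no window occurs twice.
// Hypothetical helper for this sketch only.
func firstRepeat(seq string, k int) (kmer string, first, second int, ok bool) {
	seen := make(map[string]int)
	for i := 0; i+k <= len(seq); i++ {
		window := seq[i : i+k]
		if j, found := seen[window]; found {
			return window, j, i, true
		}
		seen[window] = i
	}
	return "", 0, 0, false
}

func main() {
	// A histidine-rich stretch of the kind found in His-tags.
	hisTag := "CACCATCATCATCATCATCATCACCATCAC"
	kmer, first, second, ok := firstRepeat(hisTag, 10)
	fmt.Println(kmer, first, second, ok)

	// A stretch without any repeated 10-mer.
	_, _, _, ok = firstRepeat("ATGGGTCTCATTTTAGATGT", 10)
	fmt.Println(ok)
}
```

Runs of identical codons, such as His-tags, are a typical source of such repeats, and alternating between synonymous codons (for histidine, `CAT` and `CAC`) is one way to break them up.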


The codon table is very important for this step, because it tells us which synonymous codons are available when a problematic region needs correction.

For this tutorial, we already downloaded this data (`"bsub-ecoli.json"`) during setup (in "Download important data to run this tutorial"). Previous tutorials, such as Codon Optimization, show how to generate a codon table with the function `GenerateCodonTable()`, in case you want to build your own.

In [None]:
codonTable := codon.ReadCodonJSON("bsub-ecoli.json")
fmt.Println(codonTable)

{[TTG CTG ATT ATC ATA ATG GTG] [TAA TAG TGA] [{Q [{CAA 4471} {CAG 5528}]} {R [{CGT 1496} {CGC 2527} {CGA 1619} {CGG 2037} {AGA 1122} {AGG 1196}]} {N [{AAT 4311} {AAC 5688}]} {A [{GCT 2078} {GCC 2117} {GCA 2392} {GCG 3411}]} {E [{GAA 7067} {GAG 2932}]} {Y [{TAT 5296} {TAC 4703}]} {* [{TAA 2820} {TAG 1197} {TGA 5982}]} {T [{ACT 1952} {ACC 2695} {ACA 2266} {ACG 3085}]} {S [{TCT 1498} {TCC 1300} {TCA 1910} {TCG 2130} {AGT 1150} {AGC 2008}]} {C [{TGT 3531} {TGC 6468}]} {I [{ATT 3513} {ATC 3786} {ATA 2700}]} {M [{ATG 10000}]} {K [{AAA 5743} {AAG 4256}]} {D [{GAT 6146} {GAC 3853}]} {G [{GGT 2480} {GGC 3515} {GGA 2052} {GGG 1951}]} {P [{CCT 1914} {CCC 1597} {CCA 2618} {CCG 3870}]} {H [{CAT 4930} {CAC 5069}]} {W [{TGG 10000}]} {V [{GTT 2662} {GTC 2133} {GTA 2085} {GTG 3118}]} {F [{TTT 5416} {TTC 4583}]} {L [{TTA 1873} {TTG 2390} {CTT 1100} {CTC 0} {CTA 0} {CTG 2885}]}]}


874 <nil>
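
The printed table groups codons by amino acid and gives each codon a weight. To show how such a table can be read, the sketch below lists the synonymous alternatives for one codon, highest weight first, using the leucine entries printed above. The `codonWeight` type and the `alternatives` function are hypothetical and only mirror the printed structure. Skipping zero-weight codons is an assumption, and `FixCds` may pick its replacements by a different rule.

```go
package main

import (
	"fmt"
	"sort"
)

// codonWeight is a hypothetical type that pairs a codon with its weight.
type codonWeight struct {
	Triplet string
	Weight  int
}

// Leucine entries copied from the printed bsub-ecoli table.
var leucine = []codonWeight{
	{"TTA", 1873}, {"TTG", 2390}, {"CTT", 1100},
	{"CTC", 0}, {"CTA", 0}, {"CTG", 2885},
}

// alternatives returns every codon in options other than current, skipping
// zero-weight codons (an assumption of this sketch), highest weight first.
func alternatives(current string, options []codonWeight) []string {
	var candidates []codonWeight
	for _, option := range options {
		if option.Triplet != current && option.Weight > 0 {
			candidates = append(candidates, option)
		}
	}
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].Weight > candidates[j].Weight
	})
	triplets := make([]string, len(candidates))
	for i, candidate := range candidates {
		triplets[i] = candidate.Triplet
	}
	return triplets
}

func main() {
	fmt.Println(alternatives("CTC", leucine))
}
```

Any codon in the returned list encodes the same amino acid as the original, so swapping it in changes the DNA without changing the protein.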

In [None]:
// run this cell to load the example CDS to fix
// For this tutorial we will use the DNA sequence that encodes the enzyme Pfu-Sso7d,
// which is stored in the variable below. This sequence has 3420 base pairs.

PfuSeq := "ATGGGTCTCATTTTAGATGTGGATTATATCACAGAGGAAGGAAAGCCAGTTATACGTCTTTTCAAGAAGGAAAATGGGAAATTTAAGATAGAGCATGACCGTACATTCCGTCCGTATATCTATGCCCTGTTGCGTGATGATTCTAAAATCGAGGAAGTCAAGAAAATTACCGGCGAACGGCACGGTAAAATAGTCCGGATCGTGGACGTAGAAAAGGTAGAAAAGAAATTCCTGGGGAAACCGATAACTGTATGGAAGCTGTATCTTGAACATCCGCAAGACGTCCCAACTATTCGAGAGAAAGTTAGAGAACATCCGGCAGTGGTGGATATTTTCGAATATGACATACCGTTTGCCAAACGATATTTGATAGACAAAGGTCTGATCCCGATGGAAGGGGAAGAGGAGCTGAAAATTTTGGCGTTTGATATCGAAACTTTGTATCATGAAGGCGAAGAATTCGGTAAGGGCCCTATCATCATGATCAGTTATGCAGATGAAAACGAAGCTAAGGTGATTACGTGGAAGAACATAGATTTGCCTTATGTCGAGGTAGTGTCATCAGAGCGTGAGATGATCAAGCGCTTCTTGCGTATTATTCGTGAGAAAGATCCTGACATCATTGTTACCTATAATGGAGATTCATTTGACTTTCCTTATTTAGCTAAGCGTGCCGAGAAATTAGGCATTAAGCTTACCATTGGGAGAGATGGATCGGAACCGAAAATGCAGAGGATTGGCGACATGACGGCAGTTGAAGTGAAAGGAAGAATCCACTTTGACTTGTATCACGTGATAACGAGAACAATCAATCTGCCCACTTACACACTAGAAGCTGTTTATGAAGCCATATTTGGAAAACCCAAGGAAAAGGTATACGCAGACGAGATCGCGAAGGCGTGGGAGTCTGGTGAAAACTTAGAGAGGGTTGCAAAGTATTCTATGGAGGATGCAAAAGCTACGTACGAATTGGGTAAAGAGTTCTTGCCCATGGAGATACAGTTGTCCCGGCTGGTGGGACAACCCCTGTGGGATGTGTCGCGCTCGTCAACTGGCAACTTAGTTGAATGGTTTCTTTTGCGGAAGGCGTATGAACGCAACGAAGTCGCGCCAAATAAGCCAAGCGAGGAGGAATATCAAAGACGATTGAGGGAAAGTTACACCGGTGGCTTCGTTAAAGAACCTGAGAAAGGGTTGTGGGAAAACATCGTATACCTGGACTTCAGGGCATTATACCCTAGCATCATCATTACGCACAACGTGAGCCCCGATACATTGAATCTTGAGGGGTGCAAGAATTACGATATCGCCCCGCAGGTAGGCCATAAATTTTGCAAAGATATACCGGGCTTCATACCATCACTTTTAGGACACCTGTTAGAAGAACGGCAAAAGATTAAAACAAAGATGAAGGAAACACAGGACCCGATAGAGAAGATTCTGTTAGATTACCGGCAAAAGGCAATCAAACTTTTAGCAAACAGCTTTTACGGGTACTACGGGTACGCAAAAGCACGATGGTATTGCAAAGAATGTGCAGAATCAGTCACTGCATGGGGTCGTAAGTATATAGAATTGGTCTGGAAGGAATTAGAAGAGAAGTTCGGCTTTAAAGTGTTATACATAGACACGGATGGTCTTTATGCCACTATTCCTGGTGGAGAAAGTGAAGAGATTAAGAAGAAAGCATTAGAGTTCGTTAAGTATATCAACTCGAAATTGCCCGGTCTGCTGGAACTGGAATACGAAGGTTTCTATAAGCGCGGATTCTTCGTAACGAAGAAGCGCTATGCGGTCATCGATGAAGAGGGAAAAGTTATTACGCGCGGTCTGGAGATAGTCCGTAGGGACTGGTCGGAAATCGCGAAAGAAACACAAGCGCGAGTGTTGGAAACCATCTTAAAGCACGGGGACGTTGAGGAGGCTGTTAGGATTGTAAAAGAAGTCATCCAGAAGCTGGCAAATTATGAAATCCCTCCTGAGAAACTG
GCCATCTACGAGCAAATTACTAGACCCTTGCATGAGTATAAGGCAATTGGTCCACATGTGGCGGTGGCTAAGAAACTTGCGGCGAAGGGTGTTAAAATCAAACCCGGCATGGTAATTGGATATATTGTGCTGAGGGGAGATGGTCCTATTTCCAATCGCGCCATATTAGCAGAGGAATATGATCCAAAGAAGCACAAATACGATGCTGAATACTATATCGAAAATCAGGTCTTACCCGCGGTATTGCGTATATTAGAGGGCTTTGGGTACCGCAAAGAGGACTTGCGGTATCAGAAAACGAGACAGGTGGGATTGACGTCTTGGTTAAATATAAAGAAGTCGGGGACGGGCGGAGGCGGCGCAACCGTAAAATTTAAGTACAAAGGCGAGGAGAAAGAAGTTGACATCAGCAAAATCAAGAAAGTGTGGCGCGTTGGGAAAATGATCTCGTTCACTTATGATGAAGGTGGCGGCAAAACCGGTCGTGGCGCCGTATCGGAGAAAGATGCACCAAAGGAACTTTTACAGATGTTAGAGAAACAGAAGAAAGGAGGTGGCTCAGGCGGCGGATCGGAAAACCTGTATTTTCAGGGTGGAGGCGGGTCCATGGTATCAAGTGGAGAAGATATTTTCTCCGGCCTGGTGCCTATCCTGATTGAGTTAGAGGGCGACGTTAATGGCCACAGATTTTCGGTTCGTGGCGAAGGATATGGTGATGCTTCAAATGGCAAATTGGAGATCAAATTTATTTGCACCACGGGTCGCTTGCCGGTCCCGTGGCCGACCCTGGTAACAACGCTGAGTTATGGAGTCCAGTGCTTTGCGAAATACCCAGAGCACATGCGCCAGAATGACTTCTTTAAGAGCGCAATGCCGGATGGCTATGTTCAGGAGCGCACGATCAGCTTTAAAGAGGACGGTACCTACAAAACTCGAGCAGAAGTTAAATTCGAAGGGGAGGCGCTTGTTAACCGGATCGACCTGAAGGGCTTAGAATTCAAAGAGGATGGAAACATCTTGGGTCACAAACTTGAATACTCATTTAATAGTCACTATGTGTATATTACGGCGGATAAGAACAGGAATGGACTGGAGGCACAGTTTCGAATCCGGCATAATGTCGATGATGGTTCCGTCCAATTGGCGGACCATTACCAGCAAAATACCCCGATCGGGGAGGGCCCCGTGTTGTTACCTGAACAGCACTATTTAACCACCAATAGCGTTCTTTCTAAAGATCCCCAAGAACGCAGGGACCACATGGTTCTAGTTGAATTTGTGACTGCTGCCGGTTTGAGCTTAGGAATGGATGAATTGTACAAAAGCGGTGGAGGCAGCCACCATCATCATCATCATCATCACCATCACTCTTCCAAGAAATCAGGTTCATACTCTGGTTCCAAAGGCTCAAAACGGCGGATTCTGTAATAA"
PfuSeq

ATGGGTCTCATTTTAGATGTGGATTATATCACAGAGGAAGGAAAGCCAGTTATACGTCTTTTCAAGAAGGAAAATGGGAAATTTAAGATAGAGCATGACCGTACATTCCGTCCGTATATCTATGCCCTGTTGCGTGATGATTCTAAAATCGAGGAAGTCAAGAAAATTACCGGCGAACGGCACGGTAAAATAGTCCGGATCGTGGACGTAGAAAAGGTAGAAAAGAAATTCCTGGGGAAACCGATAACTGTATGGAAGCTGTATCTTGAACATCCGCAAGACGTCCCAACTATTCGAGAGAAAGTTAGAGAACATCCGGCAGTGGTGGATATTTTCGAATATGACATACCGTTTGCCAAACGATATTTGATAGACAAAGGTCTGATCCCGATGGAAGGGGAAGAGGAGCTGAAAATTTTGGCGTTTGATATCGAAACTTTGTATCATGAAGGCGAAGAATTCGGTAAGGGCCCTATCATCATGATCAGTTATGCAGATGAAAACGAAGCTAAGGTGATTACGTGGAAGAACATAGATTTGCCTTATGTCGAGGTAGTGTCATCAGAGCGTGAGATGATCAAGCGCTTCTTGCGTATTATTCGTGAGAAAGATCCTGACATCATTGTTACCTATAATGGAGATTCATTTGACTTTCCTTATTTAGCTAAGCGTGCCGAGAAATTAGGCATTAAGCTTACCATTGGGAGAGATGGATCGGAACCGAAAATGCAGAGGATTGGCGACATGACGGCAGTTGAAGTGAAAGGAAGAATCCACTTTGACTTGTATCACGTGATAACGAGAACAATCAATCTGCCCACTTACACACTAGAAGCTGTTTATGAAGCCATATTTGGAAAACCCAAGGAAAAGGTATACGCAGACGAGATCGCGAAGGCGTGGGAGTCTGGTGAAAACTTAGAGAGGGTTGCAAAGTATTCTATGGAGGATGCAAAAGCTACGTACGAATTGGGTAAAGAGTTCTTGCCCATGGAGATAC

In [None]:
// run this line to actually fix the sequence

fixedSequence := fixSequence(PfuSeq, codonTable)
fixedSequence

ATGGGACTCATTTTAGATGTGGATTATATCACAGAGGAAGGAAAGCCAGTTATACGTCTTTTCAAGAAGGAAAATGGGAAATTTAAGATAGAGCATGACCGTACATTCCGTCCGTATATCTATGCCCTGTTGCGTGATGATTCTAAAATCGAGGAAGTCAAGAAAATTACCGGCGAACGGCACGGTAAAATAGTCCGGATCGTGGACGTAGAAAAGGTAGAAAAGAAATTCCTGGGGAAACCGATAACTGTATGGAAGCTGTATCTTGAACATCCGCAAGACGTCCCAACTATTCGAGAGAAAGTTAGAGAACATCCGGCAGTGGTGGATATTTTCGAATATGACATACCGTTTGCCAAACGATATTTGATAGACAAAGGTCTGATCCCGATGGAAGGGGAAGAGGAGCTGAAAATTTTGGCGTTTGATATCGAAACTTTGTATCATGAAGGCGAAGAGTTCGGTAAGGGCCCTATCATCATGATCAGTTATGCAGATGAAAACGAAGCTAAGGTGATTACGTGGAAGAACATAGATTTGCCTTATGTCGAGGTAGTGTCATCAGAGCGTGAGATGATCAAGCGCTTCTTGCGTATTATTCGTGAGAAAGATCCTGACATCATTGTTACCTATAATGGAGATTCATTTGACTTTCCTTATTTAGCTAAGCGTGCCGAGAAATTAGGCATTAAACTTACCATTGGGAGAGATGGATCGGAACCGAAAATGCAGAGGATTGGCGACATGACGGCAGTTGAAGTGAAAGGAAGAATCCACTTTGACTTGTATCACGTGATAACGAGAACAATCAATCTGCCCACTTACACACTAGAAGCTGTTTATGAAGCCATATTTGGAAAACCCAAGGAAAAGGTATACGCAGACGAGATCGCGAAGGCGTGGGAGTCTGGTGAAAACTTAGAGAGGGTTGCAAAGTATTCTATGGAGGATGCAAAAGCTACGTACGAATTGGGTAAAGAGTTCTTGCCCATGGAGATAC
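
Comparing the start of the two outputs shows the first change: `GGT` became `GGA`, which removes the BsaI site `GGTCTC`. The sketch below is one way to check that such a change is synonymous and to locate the changed codons. `translate` and `changedCodons` are hypothetical helpers that use the standard genetic code; they are not Poly functions.

```go
package main

import "fmt"

// Standard genetic code, one letter per codon, with codons ordered by the
// bases T, C, A, G at each of the three positions (TTT, TTC, TTA, TTG, ...).
const aminoAcids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

// translate converts an uppercase DNA sequence into a protein string.
// Codons containing letters other than A, C, G, T become 'X'.
func translate(seq string) string {
	index := map[byte]int{'T': 0, 'C': 1, 'A': 2, 'G': 3}
	protein := make([]byte, 0, len(seq)/3)
	for i := 0; i+3 <= len(seq); i += 3 {
		a, okA := index[seq[i]]
		b, okB := index[seq[i+1]]
		c, okC := index[seq[i+2]]
		if !okA || !okB || !okC {
			protein = append(protein, 'X')
			continue
		}
		protein = append(protein, aminoAcids[16*a+4*b+c])
	}
	return string(protein)
}

// changedCodons returns the 0-based codon positions at which the two
// sequences differ, comparing only up to the length of the shorter one.
func changedCodons(before, after string) []int {
	var positions []int
	for i := 0; i+3 <= len(before) && i+3 <= len(after); i += 3 {
		if before[i:i+3] != after[i:i+3] {
			positions = append(positions, i/3)
		}
	}
	return positions
}

func main() {
	// First 30 bases of the original and of the fixed sequence shown above.
	before := "ATGGGTCTCATTTTAGATGTGGATTATATC"
	after := "ATGGGACTCATTTTAGATGTGGATTATATC"
	fmt.Println(translate(before))
	fmt.Println(translate(after))
	fmt.Println(changedCodons(before, after))
}
```

The same two functions can be applied to the full `PfuSeq` and `fixedSequence` strings; identical translations mean that every replacement was synonymous.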

Then you can just save it with the code below and inspect it as you wish!

In [None]:
// wrap the fixed sequence in a FASTA record and write it to fixedSequence.fasta
var fixedList []fasta.Fasta
fixedList = append(fixedList, fasta.Fasta{"My fixed CDS sequence", fixedSequence})
fasta.Write(fixedList, "fixedSequence.fasta")